Abstract
Activities in business processes primarily depend on human behavior for completion. Due to human agency, the behavior underlying individual activities may occur in multiple phases and can vary in execution. As a result, the execution duration and nature of such activities may exhibit complex multimodal characteristics. Phase-type distributions are useful for analyzing the underlying behavioral structure, which may consist of multiple sub-activities. Delayed starts are also common in such activities, possibly due to a minimum task completion time or to prerequisite tasks. Consequently, the distribution of durations, or of certain components, does not start at zero but at a minimum value, below which the probability is zero. When phase-type models are used to fit such distributions, a large number of phases is often required, exceeding the actual number of sub-activities. This reduces the interpretability of the parameters and may also lead to optimization difficulties due to overparameterization. In this paper, we propose a smooth-delayed phase-type mixture model that introduces delay parameters to address the difficulty of fitting this kind of distribution. Because durations shorter than the delay should have zero probability, such hard truncation makes the delay parameter non-estimable under the Expectation–Maximization (EM) framework. To overcome this, we design a soft-truncation mechanism to improve model convergence. We further develop an inference framework that combines the EM algorithm, Bayesian inference, and Sequential Least Squares Programming for comprehensive and efficient parameter estimation. The method is validated on a synthetic dataset and two real-world datasets. Results demonstrate that the proposed approach achieves performance comparable to purely data-driven methods while remaining interpretable enough to reveal the potential structure underlying human-driven activities.
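The hard-truncation problem described in the abstract can be illustrated with a minimal sketch. It assumes a single Erlang component (one simple phase-type distribution) shifted by a delay; the function names, the example durations, and the logistic gate are illustrative assumptions, not the paper's model or its soft-truncation mechanism. The sketch shows only that a hard delay drives the log-likelihood to negative infinity as soon as the delay reaches the smallest observation, whereas a smooth gate stays strictly positive below the delay.

```python
import math

def erlang_pdf(t, k, rate):
    """Density of an Erlang(k, rate) distribution: k exponential phases in series."""
    if t < 0:
        return 0.0
    return rate ** k * t ** (k - 1) * math.exp(-rate * t) / math.factorial(k - 1)

def hard_delayed_pdf(t, delay, k, rate):
    """Erlang density shifted by `delay`; exactly zero for t < delay."""
    return erlang_pdf(t - delay, k, rate)

def log_likelihood(data, delay, k, rate):
    """Log-likelihood under the hard-delayed density (-inf if any density is zero)."""
    total = 0.0
    for t in data:
        p = hard_delayed_pdf(t, delay, k, rate)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total

def logistic_gate(t, delay, sharpness):
    """Smooth stand-in for the indicator 1[t >= delay] (illustrative assumption only)."""
    x = sharpness * (t - delay)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for negative x
    return e / (1.0 + e)

durations = [2.5, 3.0, 4.2]  # hypothetical activity durations
for delay in (2.0, 2.6):
    print(delay, log_likelihood(durations, delay, k=2, rate=1.0))
print(logistic_gate(1.5, delay=2.0, sharpness=5.0))
```

With these hypothetical durations, a delay of 2.0 gives a finite log-likelihood, while a delay of 2.6 exceeds the shortest duration and the log-likelihood collapses to negative infinity, so the objective offers no gradient information about the delay beyond that point.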
| Original language | English |
|---|---|
| Article number | 575 |
| Pages (from-to) | 1-28 |
| Number of pages | 28 |
| Journal | Algorithms |
| Volume | 18 |
| Issue number | 9 |
| Early online date | 11 Sept 2025 |
| Publication status | Published (in print/issue) - 30 Sept 2025 |
Bibliographical note
Publisher Copyright: © 2025 by the authors.
Data Access Statement
The two data sets presented in this study are openly available as follows: (1) Hospital Billing-Event Log: Mannhardt, Felix (2017): Hospital Billing-Event Log. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741 [32]. (2) BPI Challenge 2020: Domestic Declarations: van Dongen, Boudewijn (2020): BPI Challenge 2020: Domestic Declarations. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/uuid:3f422315-ed9d-4882-891f-e180b5b4feb5 [35].
Funding
This research was funded by Invest NI through the Advanced Research and Engineering Centre and was part-financed by the European Regional Development Fund under the Investment for Growth and Jobs Programme 2014–2020.
Keywords
- phase-type distribution
- mixture model
- Bayesian inference
- human-driven process
- process duration modeling