Next: 4.2.4.1 Results with the
Up: 4.2 D&C EM
Previous: 4.2.3 Computational efficiency of
4.2.4 Refined D&C EM fitting algorithm
The hyperexponential distribution has a complete monotone PDF. Hence it
provides better fits for data sets and distribution functions whose PDFs are
complete monotone. Example of distribution functions with complete monotone
PDFs are Pareto and Weibull distribution (with parameters as in Traces 4 and
5). Traces 1, 2, and 3 do not have complete monotone CDHs (equivalent to PDF
of a distribution). For the data sets with non-monotone CDHs, the
hyperexponential distribution is not the best choice among the special cases
of PH distributions.
A PH distribution constructed as a mixture of an Erlang and a
hyperexponential distribution is a better fit for data sets with CDH shapes
similar to Traces 1, 2, and 3, i.e., non-monotone with only one spike in the
PDF
[98] (shown also in Figure 4.8).
Figure 4.8:
Splitting of the non-monotone continuous data histogram (CDH).
 |
In order to fit the data set into a mixture of Erlang and hyperexponential
distributions, we first determine the portion of the data set CDH that
we need to fit into an Erlang. We start from the first bin of the CDH and
accumulate bins, in the same fashion described in Section 4.2,
until we take into consideration approximately 0.5% of the data set entries.
The value 0.5% accounts for just a ``small'' portion in the beginning of the
CDH4.3. We denote the index of the last bin of this accumulation
with
. In Figure 4.8, the first two bins marked
with ``E'' illustrate how we accumulate bins for the Erlang fitting. In the
example of Figure 4.8,
is equal to
.
The accumulated data is fitted into a 2-phase Erlang. Since the Erlang
distribution with 2 phases has only one parameter, i.e., the mean
of each of the exponential phases, we use the moment matching
method [43]
rather than the EM algorithm to determine this parameter.
Figure 4.9:
Refined D&C EM fitting algorithm.
|
|
Once the parameter of the Erlang distribution is determined, we continue with
the D&C EM algorithm as described in Figure 4.4 to fit the
entire data set into a hyperexponential distribution. Upon completion of the
fitting procedure, we merge the Erlang and hyperexponential fits by letting
the Erlang fit proceed the hyperexponential one in the final PH
distribution. We illustrate the structure of vector
and matrix
for the PH distribution in Eq.(4.1), assuming
3-phase hyperexponential and 2-phase Erlang fits. Recall that
and
are the parameters of a PH distribution as defined in
Subsection 2.6.
![\begin{displaymath}
\mbox{\boldmath {$\tau$}}= [1,~0,~0,~0,~0]~~~~~\mbox{and}~~~...
..._4 & 0 \\
0 & 0 & 0 & 0 & -\lambda_5 \\
\end{array}\right],
\end{displaymath}](img541.gif) |
(4.1) |
where
for
are the probabilities associated with
each phase of the hyperexponential distribution.
Figure 4.9 summarizes the refined D&C EM
algorithm that allows for fitting data sets with not completely monotone
CDHs into PH distributions. Observe that while we introduce
three new steps, i.e., Step 2, 3, and 7, we adjust the
refined D&C EM algorithm to accommodate both cases of monotone and
non-monotone densities. Step 2 determines if the CDH is non-monotone
and an Erlang fit is required. We emphasize that the refined D&C EM algorithm
provides improvements in fitting the body of the CDH and preserves the
tail-matching accuracy and computational efficiency of the original D&C EM.
Subsections
Next: 4.2.4.1 Results with the
Up: 4.2 D&C EM
Previous: 4.2.3 Computational efficiency of
Alma Riska
2003-01-13