next up previous
Next: 4.2.4.1 Results with the Up: 4.2 D&C EM Previous: 4.2.3 Computational efficiency of


4.2.4 Refined D&C EM fitting algorithm

The hyperexponential distribution has a complete monotone PDF. Hence it provides better fits for data sets and distribution functions whose PDFs are complete monotone. Example of distribution functions with complete monotone PDFs are Pareto and Weibull distribution (with parameters as in Traces 4 and 5). Traces 1, 2, and 3 do not have complete monotone CDHs (equivalent to PDF of a distribution). For the data sets with non-monotone CDHs, the hyperexponential distribution is not the best choice among the special cases of PH distributions. A PH distribution constructed as a mixture of an Erlang and a hyperexponential distribution is a better fit for data sets with CDH shapes similar to Traces 1, 2, and 3, i.e., non-monotone with only one spike in the PDF [98] (shown also in Figure 4.8).
Figure 4.8: Splitting of the non-monotone continuous data histogram (CDH).
\begin{figure}\centering {\psfig{figure=III/FIGS/histogram-Erlang.eps, width=0.65 \linewidth}}
\end{figure}

In order to fit the data set into a mixture of Erlang and hyperexponential distributions, we first determine the portion of the data set CDH that we need to fit into an Erlang. We start from the first bin of the CDH and accumulate bins, in the same fashion described in Section 4.2, until we take into consideration approximately 0.5% of the data set entries. The value 0.5% accounts for just a ``small'' portion in the beginning of the CDH4.3. We denote the index of the last bin of this accumulation with $Max$. In Figure 4.8, the first two bins marked with ``E'' illustrate how we accumulate bins for the Erlang fitting. In the example of Figure 4.8, $Max$ is equal to $2$. The accumulated data is fitted into a 2-phase Erlang. Since the Erlang distribution with 2 phases has only one parameter, i.e., the mean $\lambda_{Er}$ of each of the exponential phases, we use the moment matching method [43] rather than the EM algorithm to determine this parameter.

Figure 4.9: Refined D&C EM fitting algorithm.
\framebox{\begin{minipage}[t]{5.5in}
\begin{tabbing}
9\=999\=9999999999999999999...
...\> ~~~~ the Erlang and the hyperexponential fits\\
\end{tabbing}\end{minipage}}

Once the parameter of the Erlang distribution is determined, we continue with the D&C EM algorithm as described in Figure 4.4 to fit the entire data set into a hyperexponential distribution. Upon completion of the fitting procedure, we merge the Erlang and hyperexponential fits by letting the Erlang fit proceed the hyperexponential one in the final PH distribution. We illustrate the structure of vector $\mbox{\boldmath {$\tau$}}$ and matrix ${\mathbf{T}}$ for the PH distribution in Eq.(4.1), assuming 3-phase hyperexponential and 2-phase Erlang fits. Recall that $\mbox{\boldmath {$\tau$}}$ and ${\mathbf{T}}$ are the parameters of a PH distribution as defined in Subsection 2.6.
\begin{displaymath}
\mbox{\boldmath {$\tau$}}= [1,~0,~0,~0,~0]~~~~~\mbox{and}~~~...
..._4 & 0 \\
0 & 0 & 0 & 0 & -\lambda_5 \\
\end{array}\right],
\end{displaymath} (4.1)

where $p_i$ for $1 \leq i \leq 3$ are the probabilities associated with each phase of the hyperexponential distribution.

Figure 4.9 summarizes the refined D&C EM algorithm that allows for fitting data sets with not completely monotone CDHs into PH distributions. Observe that while we introduce three new steps, i.e., Step 2, 3, and 7, we adjust the refined D&C EM algorithm to accommodate both cases of monotone and non-monotone densities. Step 2 determines if the CDH is non-monotone and an Erlang fit is required. We emphasize that the refined D&C EM algorithm provides improvements in fitting the body of the CDH and preserves the tail-matching accuracy and computational efficiency of the original D&C EM.



Subsections
next up previous
Next: 4.2.4.1 Results with the Up: 4.2 D&C EM Previous: 4.2.3 Computational efficiency of
Alma Riska 2003-01-13