Table 4.2 lists the means and the coefficients of variation (CVs) of the original
data sets, together with several hyperexponential fits obtained with the EM, FW,
and D&C EM algorithms. In Table 4.2, an entry labeled with a phase count followed by ``ph'' means
that the EM or FW algorithm fits the entire data set into a hyperexponential
with that many phases. Observe that the D&C EM models match the mean of the traces
with a maximal error of 4%.
The coefficient of variation is more difficult to match, since it
depends on both the first and the second moments. Nevertheless, the D&C EM models
match it with a maximal error of 20% (Trace 2).
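The comparison above rests on computing the mean and CV both from a trace and from a fitted hyperexponential. The sketch below is a minimal illustration, assuming a hyperexponential is described by mixing probabilities p_i and rates lambda_i, so that its k-th raw moment is the standard sum of p_i * k! / lambda_i^k. The function names are hypothetical, and the sample CV here uses the population variance (dividing by n), which is an assumption rather than the exact estimator used for Table 4.2.

```python
import math


def hyperexp_moment(probs, rates, k):
    """k-th raw moment of a hyperexponential: sum_i p_i * k! / lambda_i**k."""
    return sum(p * math.factorial(k) / lam ** k for p, lam in zip(probs, rates))


def hyperexp_mean_cv(probs, rates):
    """Mean and coefficient of variation of a fitted hyperexponential."""
    m1 = hyperexp_moment(probs, rates, 1)
    m2 = hyperexp_moment(probs, rates, 2)
    # CV = standard deviation / mean, with variance = E[X^2] - E[X]^2
    return m1, math.sqrt(max(0.0, m2 - m1 ** 2)) / m1


def sample_mean_cv(data):
    """Empirical mean and CV of a trace (population variance, divides by n)."""
    n = len(data)
    m1 = sum(data) / n
    m2 = sum(x * x for x in data) / n
    return m1, math.sqrt(max(0.0, m2 - m1 ** 2)) / m1


def relative_error(fitted, actual):
    """Relative error of a fitted statistic with respect to the data value."""
    return abs(fitted - actual) / abs(actual)
```

Because the CV combines the first two moments, a small error in the second moment can produce a larger CV error than the mean error alone suggests. Note also that any hyperexponential has a CV of at least 1.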
The EM algorithm alone could not generate results for Traces 4 and 5 within
a reasonable amount of computation time (less than a week) on a
Pentium III 800 MHz processor with 1 GB of memory. Since
Traces 4 and 5 are synthetically generated from Weibull distributions,
we instead fit the underlying Weibull distribution functions into hyperexponential distributions
using the FW algorithm and compare them with the D&C EM fits.
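To make the cost of the plain EM baseline concrete, the following is a minimal sketch of the standard EM iteration for a k-phase hyperexponential (a mixture of exponentials). It is not the D&C EM method and not the FW algorithm; the geometric initialization of the rates and the underflow guards are hypothetical choices made for illustration. Each iteration touches every data point once per phase, which is why running EM directly on very large traces can become prohibitively slow.

```python
import math


def em_hyperexp(data, k, iters=100):
    """Fit a k-phase hyperexponential to positive data with plain EM.

    Returns (probs, rates). Sketch only: fixed iteration count, no
    convergence test, hypothetical initialization.
    """
    n = len(data)
    mean = sum(data) / n
    # Hypothetical initialization: rates spread geometrically around 1/mean.
    rates = [(1.0 / mean) * 4.0 ** (i - (k - 1) / 2.0) for i in range(k)]
    probs = [1.0 / k] * k
    tiny = 1e-300  # guards against log(0) and division by zero
    for _ in range(iters):
        sum_r = [0.0] * k   # expected number of points assigned to each phase
        sum_rx = [0.0] * k  # responsibility-weighted sum of observations
        for x in data:
            # E-step: log of p_i * lambda_i * exp(-lambda_i * x) per phase
            logs = [math.log(p) + math.log(lam) - lam * x
                    for p, lam in zip(probs, rates)]
            top = max(logs)  # log-sum-exp trick avoids underflow in the tail
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            for i in range(k):
                r = w[i] / total
                sum_r[i] += r
                sum_rx[i] += r * x
        # M-step: closed-form updates of mixing weights and rates
        probs = [max(s / n, tiny) for s in sum_r]
        rates = [max(s, tiny) / max(sx, tiny) for s, sx in zip(sum_r, sum_rx)]
    return probs, rates
```

A useful property of this update is that, after every M-step, the fitted mean sum_i p_i / lambda_i equals the sample mean exactly (up to rounding), so EM-based fits tend to reproduce the first moment well.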
The results of Table 4.2 show that the D&C EM technique matches
the statistical properties of the data sets better than the
EM and FW algorithms do.
D&C EM fits also match the higher moments of the data sets more closely.
Table 4.3 presents the relative errors of the fitted
third moments with respect to the actual third moments of all five data sets
(we omit the absolute values because they are very large and hard to
read).
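The third-moment comparison can be sketched as follows, assuming the same hyperexponential parameterization (probabilities p_i, rates lambda_i), for which E[X^3] is the sum of 6 p_i / lambda_i^3. The function names are hypothetical. Since the third moment grows with the cube of the observations, its absolute values for heavy-tailed traces become very large, which is why a relative error is the more readable quantity.

```python
def hyperexp_third_moment(probs, rates):
    """Third raw moment of a hyperexponential: sum_i 6 * p_i / lambda_i**3."""
    return sum(6.0 * p / lam ** 3 for p, lam in zip(probs, rates))


def sample_third_moment(data):
    """Empirical third raw moment of a trace."""
    return sum(x ** 3 for x in data) / len(data)


def third_moment_relative_error(probs, rates, data):
    """Signed relative error of the fitted third moment w.r.t. the data."""
    actual = sample_third_moment(data)
    return (hyperexp_third_moment(probs, rates) - actual) / actual
```

For example, an exponential with rate 1 has a third moment of 6, so against data [1, 2, 3] (empirical third moment 12) the signed relative error is -0.5.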
Figure 4.5 plots the PDF, CDF, and CCDF of each data set together with the D&C EM, EM, and FW models. The PDF plots for all five data sets are shown in the first column of graphs in Figure 4.5 (note the log scale of the x-axis). The PDF of Trace 1 is heavily jagged, which is characteristic of real trace data and makes matching the PDF more challenging. D&C EM offers accurate fits for all traces. The fits for Traces 2 and 3, both the D&C EM and the EM ones, do not match the body of the data PDFs well. This happens because Traces 2 and 3 do not have monotone PDFs, while the hyperexponential distribution has a completely monotone PDF [30].
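The monotonicity argument can be made explicit. Writing k for the number of phases and p_i, \lambda_i for the mixing probabilities and rates (notation assumed here for illustration), differentiating the hyperexponential PDF term by term shows that every derivative alternates in sign, so the density is strictly decreasing and peaks at x = 0:

```latex
f(x) = \sum_{i=1}^{k} p_i \lambda_i e^{-\lambda_i x},
\qquad p_i \ge 0,\; \sum_{i=1}^{k} p_i = 1,\; \lambda_i > 0,

(-1)^n f^{(n)}(x) = \sum_{i=1}^{k} p_i \lambda_i^{\,n+1} e^{-\lambda_i x} \ge 0
\quad \text{for all } n \ge 0,\; x \ge 0,

\text{in particular } f'(x) = -\sum_{i=1}^{k} p_i \lambda_i^{2} e^{-\lambda_i x} < 0 .
```

Hence no hyperexponential, whatever its number of phases, can reproduce a PDF that rises to an interior mode, as the bodies of Traces 2 and 3 do.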
The CDF plots for all traces (middle column of graphs in Figure 4.5) show that D&C EM provides a good match for the body of the distribution for all traces. To investigate the accuracy of the fits in the tail of the distribution, we present the CCDF plots in log-log scale (third column of graphs in Figure 4.5). Note that even in the tail of the distribution, which reflects the observed variability of the data sets, D&C EM generates models that closely match the data set characteristics.
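The tail comparison in the CCDF plots can be sketched as below, assuming a hyperexponential with probabilities p_i and rates lambda_i, whose CCDF is P[X > x] = sum_i p_i exp(-lambda_i x) (the CDF in the middle column is one minus this value). The function names are hypothetical. The empirical CCDF drops points where the tail probability is zero, because those cannot be drawn on a log-scale axis.

```python
import math
from bisect import bisect_right


def hyperexp_ccdf(probs, rates, x):
    """P[X > x] for a hyperexponential: sum_i p_i * exp(-lambda_i * x)."""
    return sum(p * math.exp(-lam * x) for p, lam in zip(probs, rates))


def empirical_ccdf(data):
    """(x, P[X > x]) at each distinct observed value, zero tails dropped."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for x in sorted(set(xs)):
        tail = (n - bisect_right(xs, x)) / n  # fraction strictly above x
        if tail > 0:  # a zero tail cannot be shown on a log-scale axis
            points.append((x, tail))
    return points


def tail_comparison(probs, rates, data):
    """Triples (x, empirical CCDF, model CCDF) suitable for a log-log plot."""
    return [(x, emp, hyperexp_ccdf(probs, rates, x))
            for x, emp in empirical_ccdf(data)]
```

Plotting both CCDF columns on log-log axes stretches the region where the tail probability is small, so discrepancies in the high-variability part of the trace become visible even when the CDFs look identical.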