Again, we base our technique on the analysis of the data set CDH.
Contrary to D&C EM, D&C MM requires determining the number of
partitions before splitting the data into subsets.
Partition boundaries are determined such that each partition has the
same expected value, namely the expected value of the entire data set
divided by the number of partitions.
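The partitioning step can be sketched as follows. This is a minimal illustration, not the algorithm of Figure 4.11: it assumes that "expected value of a partition" means the partition's contribution to the overall mean, so that equal shares correspond to equal partial sums of the sorted samples, and it works on raw samples rather than on the CDH. The function name `partition_by_mean_share` is hypothetical.

```python
def partition_by_mean_share(samples, num_partitions):
    """Split sorted samples so that each partition contributes roughly
    an equal share to the overall mean (assumed reading of the rule)."""
    data = sorted(samples)
    total = sum(data)
    target = total / num_partitions  # share of the total sum per partition
    partitions = []
    current = []
    running = 0.0
    for x in data:
        current.append(x)
        running += x
        # Close the current partition once the cumulative sum reaches the
        # next multiple of the target; the last partition takes the rest.
        closed = len(partitions)
        if closed < num_partitions - 1 and running >= target * (closed + 1):
            partitions.append(current)
            current = []
    if current:
        partitions.append(current)
    return partitions
```

With a long-tailed data set, a single very large sample can exceed several shares at once, in which case this sketch returns fewer partitions than requested.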
Once the data set is partitioned, we compute the CV of each subset of data.
If the CV of a subset of data is less than 1.0, i.e., the CV of the exponential
distribution, we fit that subset of data into a hypoexponential
distribution. If the CV is greater than 1.0, we fit that subset of data
into a hyperexponential distribution.
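The selection rule above can be sketched in a few lines. The helper names are hypothetical, the population variance is used for the CV, and the handling of a CV exactly equal to 1.0 (a plain exponential) is an assumption, since the text does not cover that case.

```python
import math


def coefficient_of_variation(values):
    """Return the standard deviation divided by the mean (population form)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance) / mean


def choose_family(values):
    """Pick the distribution family for one partition from its CV."""
    cv = coefficient_of_variation(values)
    if cv < 1.0:
        return "hypoexponential"
    if cv > 1.0:
        return "hyperexponential"
    # Assumption: the text does not say what to do when CV == 1.0.
    return "exponential"
```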
We fit each subset of data into a PH distribution using the Newton-Raphson
method of moment matching, which is described in detail in
Appendix C and [98].
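The Newton-Raphson procedure itself is given in Appendix C and is not reproduced here. As a simpler stand-in that conveys the idea of moment matching, the sketch below uses well-known closed-form two-moment matches: a two-phase hyperexponential with balanced means for CV > 1, and two exponential stages in series for a squared CV between 0.5 and 1. These closed forms, and the function names, are substitutes chosen for illustration and are not the method used in this work.

```python
import math


def fit_h2_balanced_means(mean, cv):
    """Match mean and CV with a two-phase hyperexponential (needs CV > 1).

    Returns [(p1, rate1), (p2, rate2)] with balanced means p1/rate1 == p2/rate2.
    """
    c2 = cv * cv
    if c2 <= 1.0:
        raise ValueError("hyperexponential match needs CV > 1")
    p1 = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    p2 = 1.0 - p1
    return [(p1, 2.0 * p1 / mean), (p2, 2.0 * p2 / mean)]


def fit_hypo_two_stage(mean, cv):
    """Match mean and CV with two exponential stages in series.

    Two stages can only reach squared CVs in [0.5, 1); smaller CVs
    would need more stages. Returns the two stage rates.
    """
    c2 = cv * cv
    if not 0.5 <= c2 < 1.0:
        raise ValueError("two-stage match needs 0.5 <= CV^2 < 1")
    root = math.sqrt(2.0 * c2 - 1.0)
    stage_mean_a = 0.5 * mean * (1.0 + root)
    stage_mean_b = 0.5 * mean * (1.0 - root)
    return [1.0 / stage_mean_a, 1.0 / stage_mean_b]
```

Both functions reproduce the first two moments exactly; matching higher moments, or CVs below the two-stage limit, requires more stages and an iterative solver.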
The resulting PH distribution is a mixture of hypoexponential and
hyperexponential distributions.
We formally present the D&C MM algorithm in Figure 4.11.
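The mixture structure mentioned above has a standard matrix form: if each partition is fitted by a PH distribution with initial vector alpha_i and sub-generator T_i, the mixture has an initial vector made of the weighted alpha_i and a block-diagonal sub-generator. The sketch below builds that representation with plain lists. The function name `merge_ph` is hypothetical, and taking the weights as the fraction of the data falling in each partition is an assumption.

```python
def merge_ph(components, weights):
    """Combine PH components into one mixture representation.

    components: list of (alpha, T), alpha a list, T a square list of lists.
    weights: mixing probabilities, assumed to sum to 1.
    Returns (alpha, T) of the mixture with a block-diagonal T.
    """
    size = sum(len(alpha) for alpha, _ in components)
    merged_alpha = []
    merged_t = [[0.0] * size for _ in range(size)]
    offset = 0
    for (alpha, t_matrix), weight in zip(components, weights):
        # Scale each component's initial vector by its mixing weight.
        merged_alpha.extend(weight * a for a in alpha)
        # Copy the component's sub-generator onto the diagonal block.
        for i, row in enumerate(t_matrix):
            for j, value in enumerate(row):
                merged_t[offset + i][offset + j] = value
        offset += len(alpha)
    return merged_alpha, merged_t
```

Because the off-diagonal blocks are zero, a sample that starts in one component never moves to another, which is what makes the result a mixture.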
To illustrate the fitting methodology, we selected a data set containing the sizes of the requested files, i.e., the service process, measured during one entire day at the World Cup'98 Web site. We split the data set into four partitions and present in Figure 4.12 the PH distribution resulting from the merging of the four individual fittings. Of those, the first one is a two-stage hyperexponential, while the other three are hypoexponential, with the last one very close to an Erlang. The numbers written inside each stage are the rates of the corresponding exponential distributions, while those on the arcs are the branching probabilities. To assess the quality of the overall PH fitting, we plot the PDF and CDF of the data and of our fit in Figure 4.13. We also evaluate the accuracy of the fitting from the queueing system perspective and present the results in Table 4.5, assuming an M/PH/1 server with Poisson arrivals and the fitted PH distribution as the service process. We conclude that D&C MM is a fast and accurate approach to fit data sets into PH distributions.
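One simple way to compare a fit with the data from the queueing perspective is the mean waiting time in an M/G/1 queue, which by the Pollaczek-Khinchine formula depends only on the arrival rate and the first two moments of the service time. The sketch below is an illustration under that assumption. It does not claim to reproduce the metrics of Table 4.5, and the function names are hypothetical.

```python
def mixture_moments(weights, component_moments):
    """First two moments of a mixture from (m1, m2) pairs of its components."""
    m1 = sum(w * m[0] for w, m in zip(weights, component_moments))
    m2 = sum(w * m[1] for w, m in zip(weights, component_moments))
    return m1, m2


def mg1_mean_waiting_time(arrival_rate, service_m1, service_m2):
    """Pollaczek-Khinchine mean time in queue for Poisson arrivals.

    service_m1 and service_m2 are the first and second raw moments
    of the service time.
    """
    utilization = arrival_rate * service_m1
    if utilization >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return arrival_rate * service_m2 / (2.0 * (1.0 - utilization))
```

Evaluating `mg1_mean_waiting_time` once with the empirical moments of the data and once with the moments of the fitted PH distribution gives a quick consistency check; metrics that depend on the tail, such as queue length distributions, need the full PH representation.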