

4.2.1.1 Traces

We have selected five highly variable data sets to test our approach. The first data set (indicated as "Trace 1") is a trace from the 1998 World Soccer Cup Web site (footnote 4.2). It contains the sizes of the files requested by clients from this Web site over the course of an entire day. The other four traces are synthetically generated from analytic models that closely approximate Web server traffic [3]. Traces 2 and 3 are generated from Lognormal distributions with shape parameters 1.85 and 1.5, respectively, and the same scale parameter 7.0. Traces 4 and 5 are generated from Weibull distributions with shape parameters 0.25 and 0.35, respectively, and the same scale parameter 9.2. The statistical characteristics of these data sets are shown in Table 4.1.

Table 4.1: Statistical characteristics of the data sets (CV denotes the coefficient of variation).

Trace   Entries    Unique entries   Mean      CV
1       16045065   12122            4407.81   7.28
2       25000      25000            6358.23   5.87
3       25000      25000            3459.86   3.13
4       25000      22969            227.27    7.36
5       25000      24298            47.50     3.86


The number of entries and the number of unique entries in each data set affect the performance of D&C EM, since the running time of the EM algorithm depends on these parameters [72]. Observe that the real trace has fewer unique entries than the synthetically generated data sets, even though it contains far more entries overall.
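
The quantities reported in Table 4.1 can be illustrated with a minimal Python sketch. It is not the generator used to produce Traces 2 through 5, and the parameterization is an assumption: we read the Lognormal "scale parameter" as the mean of the underlying normal distribution and its "shape parameter" as the standard deviation, and we read the Weibull parameters as the usual scale and shape. The function names `trace_stats` and `synthetic_traces` and the seed are hypothetical. Since the samples are random, the resulting values will only be of the same order as those in the table, not identical to them.

```python
import random
import statistics


def trace_stats(trace):
    """Return (entries, unique entries, mean, CV) for a sequence of sizes.

    CV is the coefficient of variation: standard deviation divided by mean.
    The population standard deviation is used here (an assumption).
    """
    mean = statistics.fmean(trace)
    cv = statistics.pstdev(trace) / mean
    return len(trace), len(set(trace)), mean, cv


def synthetic_traces(n=25000, seed=0):
    """Draw samples resembling Traces 2-5 under the assumed parameterization."""
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    return {
        # lognormvariate(mu, sigma): mu taken as the "scale", sigma as the "shape"
        2: [rng.lognormvariate(7.0, 1.85) for _ in range(n)],
        3: [rng.lognormvariate(7.0, 1.5) for _ in range(n)],
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape
        4: [rng.weibullvariate(9.2, 0.25) for _ in range(n)],
        5: [rng.weibullvariate(9.2, 0.35) for _ in range(n)],
    }


if __name__ == "__main__":
    for trace_id, trace in synthetic_traces().items():
        entries, unique, mean, cv = trace_stats(trace)
        print(f"{trace_id}  {entries}  {unique}  {mean:.2f}  {cv:.2f}")
```

Under this reading, the analytic means of the four models are of the same order as the sample means in Table 4.1, which is why we adopt it here. The sample CV of such heavy-tailed data varies noticeably from one run to the next.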


Alma Riska 2003-01-13