Next: 7.3.2 Fittings using FW Up: 7.3 World Cup 1998 Previous: 7.3 World Cup 1998


7.3.1 Fitting request sizes into a distribution

Analysis of Internet traffic at different levels of the communication infrastructure shows that many processes in Internet-related systems are highly variable and best characterized by heavy-tailed distributions [7,8,4,3,30]. Detailed analysis of server logs [7,3] shows that the sizes of requested files follow hybrid distributions, where the body is best described by a lognormal distribution and the tail by a power-tailed distribution [3].

To check for the heavy-tail property, we used Boston University's aest tool, which verifies and estimates the heavy-tailed portion of a distribution [28]. Using the scaling estimator methodology, the tool helps identify the portion of the data set that exhibits power-tailed behavior by showing graphically the part of the tail where the heavy-tailed behavior is present. The selection of the point where the power-tailed behavior starts is significant because it affects the computation of the parameters of the distribution. Figure 7.2 shows the results of the scaling analysis for the service process of a representative day of the dataset, i.e., day 80. Considering the tail portion of the plots, we see that the curves are close to linear for requests larger than $1$ MByte, suggesting that the heavy-tailed portion of the dataset begins at around $1$ MByte. Based on this observation, we conclude that the empirical distribution is best approximated by a hybrid model that combines a lognormal distribution for the body of the data with a power-tailed distribution for its tail [3].
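The "close to linear on a log-log scale" check described above can be illustrated with a minimal Python sketch. It is not the aest tool and does not implement the scaling estimator; it only computes the empirical ccdf and, as a rough stand-in, fits a least-squares line to the log-log ccdf above a chosen cutoff. The function names `empirical_ccdf` and `tail_slope` and the `cutoff` parameter are hypothetical, introduced here for illustration only.

```python
import math


def empirical_ccdf(sizes):
    """Return (x, P[X > x]) pairs for each distinct value in sizes."""
    xs = sorted(sizes)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # For tied values keep only the last occurrence, so that the
        # count n - i - 1 is the number of samples strictly above x.
        if i + 1 < n and xs[i + 1] == x:
            continue
        points.append((x, (n - i - 1) / n))
    return points


def tail_slope(sizes, cutoff):
    """Rough tail-index estimate: negated least-squares slope of
    log10(ccdf) versus log10(x), using only points with x > cutoff.

    This is a plain regression, NOT the scaling estimator used by aest.
    """
    pts = [(math.log10(x), math.log10(p))
           for x, p in empirical_ccdf(sizes)
           if x > cutoff and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two tail points above the cutoff")
    m = len(pts)
    mean_u = sum(u for u, _ in pts) / m
    mean_v = sum(v for _, v in pts) / m
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    return -sxy / sxx
```

With request sizes in bytes, a call such as `tail_slope(sizes, 1e6)` would correspond to the 1 MByte cutoff chosen above. A regression of this kind is sensitive to the few largest observations, which is one reason a dedicated estimator may be preferable.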

Figure 7.2: Tail characterization for day 80. The various curves in the figure show the ccdf of the dataset on a log-log scale for successive aggregations of the dataset by factors of two. The figure illustrates that the shape of the tail (i.e., for size $> 10^6$) is close to linear; its slope suggests the parameter $\alpha $ of the power-tailed distribution. The `+' signs on the plot indicate the points used to compute $\alpha $. The elimination of points for each successive aggregation indicates the presence of a long tail.

After identifying the two portions of the trace, we need to estimate the parameters of each portion. The body of the distribution is considered lognormal with PDF:

\begin{displaymath}
f(x) = \frac{1}{bx\sqrt{2\pi}}\exp{ \left(
\frac{-(\ln{x}-a)^2}{2b^2} \right)} .
\end{displaymath}

We compute $b> 0$ (i.e., the shape parameter), and $a\in (-\infty, \infty)$ (i.e., the scale parameter) using the maximum likelihood estimators [48]:

\begin{displaymath}
\hat{a} = \frac{\sum_{i=1}^{n}\ln X_i}{n},~~~
\hat{b} = \left[ \frac{\sum_{i=1}^{n}(\ln X_i-\hat{a})^2}{n}
\right]^{\frac{1}{2}} ,
\end{displaymath}

where $X_i$ for $1 \leq i \leq n$ are the sample data. The trace is heavy-tailed with tail index $\alpha $ if its complementary CDF (ccdf) satisfies:

\begin{displaymath}
P[X>x] \sim x^{-\alpha}, ~~~~x \rightarrow \infty, ~~~~0 < \alpha < 2,
\end{displaymath}

where $X$ is the random variable describing the request size. In our study, we compute $\alpha $ via the aest tool.
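As an illustration of the maximum likelihood estimators $\hat{a}$ and $\hat{b}$ for the lognormal body, the following Python sketch computes both from a list of positive samples. The function name `lognormal_mle` is hypothetical, and the sketch assumes that the caller has already restricted the samples to the body of the trace (e.g., requests of at most 1 MByte).

```python
import math


def lognormal_mle(samples):
    """Return (a_hat, b_hat) for a lognormal fit.

    a_hat is the mean of ln X_i; b_hat is the square root of the mean
    squared deviation of ln X_i from a_hat (divisor n, not n - 1).
    All samples must be strictly positive.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    logs = [math.log(x) for x in samples]
    n = len(logs)
    a_hat = sum(logs) / n
    b_hat = math.sqrt(sum((v - a_hat) ** 2 for v in logs) / n)
    return a_hat, b_hat
```

Note the divisor $n$ rather than $n-1$: this matches the estimator as written above, not the unbiased sample variance.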

Once the data is approximated by a hybrid distribution, we apply the FW algorithm [30], which approximates a heavy-tailed distribution with a hyperexponential one. Since the body and the tail of the hybrid distribution are both heavy-tailed, yet described by different distribution functions, we apply the algorithm to each component separately and then combine the two fittings into a single hyperexponential, weighting each part accordingly. The weights of the two hyperexponential distributions, corresponding to the lognormal and the power-tailed portions of the original data, are the probabilities that a request is for a file of size less than or equal to, or greater than, $1$ MByte, respectively. These weights are computed from the empirical data. The final result is a hyperexponential fitting for the entire data set.
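The weighting step can be sketched in Python as follows. The sketch does not implement the FW algorithm itself; it assumes, as an illustration only, that each fitting is available as a list of (probability, rate) pairs describing a hyperexponential, and it shows how two such fittings could be merged using the empirical weight of the body. The names `body_weight`, `combine_hyperexponentials`, and `hyperexp_ccdf` are hypothetical.

```python
import math


def body_weight(sizes, cutoff):
    """Empirical probability that a request size is <= cutoff."""
    return sum(1 for s in sizes if s <= cutoff) / len(sizes)


def combine_hyperexponentials(body, tail, w_body):
    """Merge two hyperexponential fittings into one.

    body and tail are lists of (probability, rate) pairs, each with
    probabilities summing to 1. Phases of the body fitting are scaled
    by w_body and phases of the tail fitting by 1 - w_body, so the
    result is again a hyperexponential with probabilities summing to 1.
    """
    if not 0.0 <= w_body <= 1.0:
        raise ValueError("w_body must lie in [0, 1]")
    combined = [(w_body * p, r) for p, r in body]
    combined += [((1.0 - w_body) * p, r) for p, r in tail]
    return combined


def hyperexp_ccdf(phases, x):
    """P[X > x] for a hyperexponential given as (probability, rate) pairs."""
    return sum(p * math.exp(-r * x) for p, r in phases)
```

Because a probabilistic mixture of hyperexponentials is itself hyperexponential, the combined list has one phase per phase of the two input fittings, and its ccdf is the weighted sum of the two component ccdfs.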


Alma Riska 2003-01-13