Analysis of Internet traffic at different levels of the communication infrastructure shows that many processes in Internet-related systems are highly variable and well characterized by heavy-tailed distributions [7,8,4,3,30]. Detailed analysis of server logs [7,3] shows that requested file sizes are best described by hybrid distributions, in which the body follows a lognormal distribution and the tail follows a power-tailed distribution [3].
To check for the heavy-tail property, we used Boston University's
aest tool, which verifies and estimates the heavy-tailed portion
of a distribution [28].
Using the scaling estimator methodology, the tool
identifies the portion of the data set that exhibits power-tailed
behavior
and shows graphically where in the tail of the distribution
that behavior is present.
The selection of the point where the power-tailed behavior starts is
significant because it affects the computation of the
parameters of the distribution.
Figure 7.2 shows the results of the scaling analysis
for the service process of a representative day of the dataset, i.e.,
day 80.
Considering the tail portion of the plots, for requests larger than
1 MByte, we see that they are close to linear, suggesting that
the heavy-tailed portion of the dataset begins at around 1 MByte.
Based on this observation, we conclude that the empirical distribution
is best approximated by a hybrid model that combines
a lognormal distribution for the body of the data
with a power-tailed distribution for its tail [3].
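The linearity check described above can be illustrated with a much simpler technique than the scaling estimator used by aest: a least-squares line through the log-log empirical complementary distribution of the observations above a chosen cutoff. The sketch below is not the aest algorithm. The function name, the synthetic Pareto sample, and the value 1.2 for the tail index are assumptions made purely for illustration.

```python
import math
import random


def loglog_tail_slope(sizes, threshold):
    """Estimate the tail index alpha from the slope of the log-log
    empirical CCDF of the observations strictly above `threshold`.

    This is a plain least-squares fit, NOT the scaling estimator of aest.
    """
    tail = sorted(s for s in sizes if s > threshold)
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two observations above the threshold")
    # Empirical CCDF conditional on exceeding the threshold:
    # P[X >= x_i | X > threshold] = (n - i) / n for the i-th smallest value.
    xs = [math.log10(x) for x in tail]
    ys = [math.log10((n - i) / n) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # A power tail P[X > x] ~ x^(-alpha) is a line of slope -alpha.
    return -sxy / sxx


# Synthetic stand-in for request sizes (hypothetical data, not the dataset
# analyzed in the text): Pareto samples starting at a 1 MByte cutoff.
rng = random.Random(0)
true_alpha = 1.2
threshold = 1_000_000  # bytes
sizes = [threshold * (1.0 - rng.random()) ** (-1.0 / true_alpha)
         for _ in range(20000)]

alpha_hat = loglog_tail_slope(sizes, threshold)
print(f"estimated tail index: {alpha_hat:.2f}")
```

If the points above the cutoff fall close to a straight line, the estimated slope is stable under small changes of the cutoff. A slope that drifts as the cutoff moves suggests that the power-tailed region starts elsewhere.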
Once the data is approximated by a hybrid distribution, we apply the
FW algorithm [30] to approximate a heavy-tailed distribution
with a hyperexponential one.
Since the body and the tail of the hybrid distribution are heavy-tailed,
yet described by different distribution functions, we apply the algorithm
to each component separately and finally combine both fittings into
a single hyperexponential, weighting each part accordingly.
The weights of the two hyperexponential distributions, corresponding to
the lognormal and the power-tailed portions of the original data, are given
by the probabilities that a request is for a file of size less than or equal
to, or greater than,
1 MByte, respectively. These weights are computed
from the empirical data. The final result is a hyperexponential
fit of the entire data set.
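The weighting step can be sketched as follows, assuming each hyperexponential is represented as a list of (probability, rate) phases and that the two component fits describe the sizes conditional on falling below or above the cutoff. The helper names and all numeric values are hypothetical and are not taken from the dataset or from the FW algorithm's output.

```python
def empirical_body_weight(sizes, threshold):
    """Fraction of observed sizes that are <= threshold."""
    return sum(1 for s in sizes if s <= threshold) / len(sizes)


def combine_hyperexponentials(body, tail, p_body):
    """Merge two hyperexponentials into one.

    body, tail: lists of (probability, rate) phases, each summing to 1.
    p_body:     weight of the body, i.e. P[size <= threshold].
    """
    if not 0.0 <= p_body <= 1.0:
        raise ValueError("p_body must lie in [0, 1]")
    scaled_body = [(p_body * p, rate) for p, rate in body]
    scaled_tail = [((1.0 - p_body) * p, rate) for p, rate in tail]
    return scaled_body + scaled_tail


def hyperexp_mean(phases):
    """Mean of a hyperexponential: sum of p_i / rate_i."""
    return sum(p / rate for p, rate in phases)


# Hypothetical component fits (rates in 1/bytes), for illustration only.
body_fit = [(0.7, 1 / 2e3), (0.3, 1 / 5e4)]
tail_fit = [(0.9, 1 / 2e6), (0.1, 1 / 5e7)]

# Hypothetical observed request sizes in bytes, with a 1 MByte cutoff.
observed = [500, 12_000, 80_000, 900_000, 3_000_000]
p_body = empirical_body_weight(observed, 1_000_000)

combined = combine_hyperexponentials(body_fit, tail_fit, p_body)
print(combined)
print(hyperexp_mean(combined))
```

Because the phase probabilities of each component sum to one, the scaled probabilities of the combined distribution also sum to one, and its mean is the weighted average of the two component means.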