Next: 4.2 D&C EM
Up: 4. Data Fitting Algorithms
Previous: 4. Data Fitting Algorithms
4.1 Long-tailed behavior
There is abundance of evidence [3,50] that the service
process in Internet-related systems is heavy-tailed.
This characteristic affects the complexity of such systems
(e.g., Web servers, routers).
As a motivation, we first present the effects of high variability on
user perceived performance by analyzing the behavior of an elementary
queueing system whose service process is long-tailed.
We use an M/G/1-type queue, i.e., a single server queue with general service
process, admitting Markovian arrivals. We run several experiments, that
are distinguished from each other by the characteristics of the service
process only: the mean of the service process, i.e., the first moment, is
fixed, while the coefficient of variation, i.e., the second moment, varies
within a certain range. The M/G/1 queue is solved using the
matrix-geometric method outlined in Chapter 3.
Figure 4.1 illustrates the effects of the
service process variability on the expected mean slowdown4.1 as a function of the arrival rates.
As the arrival rate in the system increases, the average slowdown for the
M/G/1 queue with the highest variability in the service process increases
much faster than the average slowdown of the queues with less
variability in their service process.
The same behavior is also observed in Figure 4.1(b),
where we plot the average queue length as a function of the variability in
the service process.
We observe that for high arrival rates the effect of the highly variable
service process on performance is more severe than for the case of low
arrival rates.
This simple example clearly shows that capturing the variability in the service
process is essential for accurate performance analysis.
Figure 4.1:
Effects in the performance of queuing system with highly variable
service process.
 |
Long-tailed behavior in a data set is commonly approximated by distributions
such as Lognormal, Weibull, and Pareto.
The analysis of queueing systems that admit these distributions as either
arrival or service process is complex and, usually, intractable.
PH distributions (along with their special case of Coxian and hyperexponential
distributions) offer an attractive alternative that can capture variability
via a tractable distribution.
Their only ``drawback'' is that they require more parameters than Lognormal or
Weibull distributions, making their estimation procedures more complex.
To avoid the complicated fitting algorithms and yet benefit from their
tractability, modelers often use Coxian or hyperexponential distributions
with only two phases. In such cases, there are only few, i.e., 2 or 3,
parameters to estimate, and the fitting is usually done by matching the first
and the second moments only [43, pages 141-143].
Below we show that a simple PH distribution, such as a two-phase hyperexponential
model, does not capture the complex behavior of the service process in
Internet-related systems.
To fit an empirical distribution with a certain mean and CV into a 2-phase
hyperexponential distribution, which is a mixture of two
exponentials only, we need to calculate the following parameters:
(
),
, and
, as the probability density function of
a hyperexponential distribution is given by:
One can match the mean
and the coefficient of variation
of the data set with the mean
and the coefficient of variation
of the hyperexponential distribution, using the following
formulas [43, pages 141-143]:
This is a classic case of a system of two equations and three
unknowns. This system can be solved by guessing one of its unknowns (in this
case the easiest one to guess is
since
).
To illustrate that such a simple model does not capture the complex behavior
of the queueing systems with heavy-tailed service process, we fit a
data set into a two-phase hyperexponential using the above method.
The data consists of the sizes of the requested files processed by a Web
server (i.e., one entire day of server logs from the 1998 World Soccer Cup site,
whose characteristics we evaluate in more detail later in this dissertation).
Assuming that the requests arrive to the system according to a
Poisson process and the service process is determined by the sizes of the
requested files, we model the Web server as an M/H
/1 queue.
To assess the accuracy of the results obtained from the analytic solution
of this queueing model (solved using the matrix-geometric method outlined
in Chapter 3),
we simulate the same single server queue using an identical arrival process
but now the
service process is driven by the original trace data. We present our
findings in Figure 4.2.
First, in Figure 4.2(a), we present the average queue
length in the server as function of the arrival rate. We observe that for
different guessed values of
the models compute exactly the same average
queue length for each arrival rate and match the respective simulations
result. Note that for performance metrics such as the average queue length,
simple models are sufficient. We go a step further and analyze the queue
length distribution. We present the body and the tail
of the queue length distribution (measured for high server utilization) in
Figures 4.2(b) and (c), respectively.
None of the queue length distributions of the analytic models follows the
shape of the queue length distribution obtained by the trace-driven
simulation.
Figure 4.2:
Average queue length, the body, and the tail of the
queue length distribution (80% server utilization).
 |
With the examples presented in this section, we emphasize the
complexity in the behavior of systems that operate under a
highly variable service process and the importance of capturing
it correctly for accurate performance analysis results.
In the following sections, we present fitting
techniques that capture correctly the long-tailed behavior of the
data sets and the respective queueing systems.
Next: 4.2 D&C EM
Up: 4. Data Fitting Algorithms
Previous: 4. Data Fitting Algorithms
Alma Riska
2003-01-13