next up previous
Next: 4.2 D&C EM Up: 4. Data Fitting Algorithms Previous: 4. Data Fitting Algorithms


4.1 Long-tailed behavior

There is abundance of evidence [3,50] that the service process in Internet-related systems is heavy-tailed. This characteristic affects the complexity of such systems (e.g., Web servers, routers). As a motivation, we first present the effects of high variability on user perceived performance by analyzing the behavior of an elementary queueing system whose service process is long-tailed. We use an M/G/1-type queue, i.e., a single server queue with general service process, admitting Markovian arrivals. We run several experiments, that are distinguished from each other by the characteristics of the service process only: the mean of the service process, i.e., the first moment, is fixed, while the coefficient of variation, i.e., the second moment, varies within a certain range. The M/G/1 queue is solved using the matrix-geometric method outlined in Chapter 3.

Figure 4.1 illustrates the effects of the service process variability on the expected mean slowdown4.1 as a function of the arrival rates. As the arrival rate in the system increases, the average slowdown for the M/G/1 queue with the highest variability in the service process increases much faster than the average slowdown of the queues with less variability in their service process. The same behavior is also observed in Figure 4.1(b), where we plot the average queue length as a function of the variability in the service process. We observe that for high arrival rates the effect of the highly variable service process on performance is more severe than for the case of low arrival rates. This simple example clearly shows that capturing the variability in the service process is essential for accurate performance analysis.

Figure 4.1: Effects in the performance of queuing system with highly variable service process.
\begin{figure}\centering {\psfig{figure=III/FIGS/variability-effects.eps, width=0.95 \linewidth}}
\end{figure}

Long-tailed behavior in a data set is commonly approximated by distributions such as Lognormal, Weibull, and Pareto. The analysis of queueing systems that admit these distributions as either arrival or service process is complex and, usually, intractable. PH distributions (along with their special case of Coxian and hyperexponential distributions) offer an attractive alternative that can capture variability via a tractable distribution. Their only ``drawback'' is that they require more parameters than Lognormal or Weibull distributions, making their estimation procedures more complex. To avoid the complicated fitting algorithms and yet benefit from their tractability, modelers often use Coxian or hyperexponential distributions with only two phases. In such cases, there are only few, i.e., 2 or 3, parameters to estimate, and the fitting is usually done by matching the first and the second moments only [43, pages 141-143]. Below we show that a simple PH distribution, such as a two-phase hyperexponential model, does not capture the complex behavior of the service process in Internet-related systems.

To fit an empirical distribution with a certain mean and CV into a 2-phase hyperexponential distribution, which is a mixture of two exponentials only, we need to calculate the following parameters: $p$ ( $0 \leq p \leq 1$), $\lambda_1$, and $\lambda_2$, as the probability density function of a hyperexponential distribution is given by:

\begin{displaymath}
h_2(x) = p \lambda_1 {\mathbf{e}}^{-\lambda_1 x} + (1-p) \la...
...thbf{e}}^{-\lambda_2 x},
x \geq 0, ~~\lambda_1,\lambda_2 > 0.
\end{displaymath}

One can match the mean $m_d$ and the coefficient of variation $cv_d$ of the data set with the mean $m_h$ and the coefficient of variation $cv_h$ of the hyperexponential distribution, using the following formulas [43, pages 141-143]:

\begin{displaymath}
m_d=m_h=\frac{p}{\lambda_1} + \frac{1-p}{\lambda_2} ~~~~\mbo...
...\frac{2p}{\lambda_1^2} + \frac{2(1-p)}{\lambda_2^2}}{m_d^2}-1.
\end{displaymath}

This is a classic case of a system of two equations and three unknowns. This system can be solved by guessing one of its unknowns (in this case the easiest one to guess is $p$ since $0 \leq p \leq 1$). To illustrate that such a simple model does not capture the complex behavior of the queueing systems with heavy-tailed service process, we fit a data set into a two-phase hyperexponential using the above method. The data consists of the sizes of the requested files processed by a Web server (i.e., one entire day of server logs from the 1998 World Soccer Cup site, whose characteristics we evaluate in more detail later in this dissertation). Assuming that the requests arrive to the system according to a Poisson process and the service process is determined by the sizes of the requested files, we model the Web server as an M/H$_2$/1 queue. To assess the accuracy of the results obtained from the analytic solution of this queueing model (solved using the matrix-geometric method outlined in Chapter 3), we simulate the same single server queue using an identical arrival process but now the service process is driven by the original trace data. We present our findings in Figure 4.2. First, in Figure 4.2(a), we present the average queue length in the server as function of the arrival rate. We observe that for different guessed values of $p$ the models compute exactly the same average queue length for each arrival rate and match the respective simulations result. Note that for performance metrics such as the average queue length, simple models are sufficient. We go a step further and analyze the queue length distribution. We present the body and the tail of the queue length distribution (measured for high server utilization) in Figures 4.2(b) and (c), respectively. None of the queue length distributions of the analytic models follows the shape of the queue length distribution obtained by the trace-driven simulation.
Figure 4.2: Average queue length, the body, and the tail of the queue length distribution (80% server utilization).
\begin{figure}\centering {\psfig{figure=III/FIGS/motivation-ql-dist.eps, width=0.99 \linewidth}}
\end{figure}

With the examples presented in this section, we emphasize the complexity in the behavior of systems that operate under a highly variable service process and the importance of capturing it correctly for accurate performance analysis results. In the following sections, we present fitting techniques that capture correctly the long-tailed behavior of the data sets and the respective queueing systems.


next up previous
Next: 4.2 D&C EM Up: 4. Data Fitting Algorithms Previous: 4. Data Fitting Algorithms
Alma Riska 2003-01-13