We consider the following model of a distributed server environment.
We assume a fixed number
of back-end servers with the same processing
power, each serving requests in first-come-first-serve order.
We further assume that each server has an unbounded queue.
Requests arrive to the dispatcher according to a Poisson process.
The dispatcher is responsible for distributing the jobs
among the various back-end servers according to a scheduling policy.
We also assume that the dispatcher can derive the request duration
(the size of the file) from the name of the file requested.
We consider the following two load balancing policies
(in neither case the dispatcher uses feedback from the
individual back-end servers to better balance the load, measured in
``bytes to be transferred'',
among them):
We first consider the performance of the random policy under Poisson
arrivals and service process that is driven by the entries of the
synthetically-generated requested file sizes described in
Subsection 7.3.2. Recall
that one of the curves in Figure 7.5
corresponds to the actual stream of requested file sizes corresponding to
day 57 of the World Cup server logs.
In the analysis of this section, we assume that the overall arrival
rate to the dispatcher is
, and that there are eight back-end
servers. Since the processing time for each request is linear to request
size, we approximate the service process
at the Web server by the hyperexponential distribution of requested
file sizes and model its performance using an M/Hr/1 queue. In the case
of the random policy, each Web server in the cluster is model by the
same M/Hr/1 queue with arrival rate
. An M/Hr/1 queue results
in a QBD which we analyze using ETAQA-QBD (see Section
5.3).
In most of our experiments, slowdown is the metric of interest that we choose for evaluation of cluster performance. Average request slowdown, the ratio of response time to service time for a job, is a fundamental measure of responsiveness in Web servers, since users are more willing to wait for ``large'' requests than for ``small'' ones. We start our evaluation by analyzing the performance of the random load balancing policy. Our objective is to be able to understand how the variable service process affects the performance of the random policy and identify possible improvements.
The average request slowdown for the random policy is illustrated in
Figure 7.6(a). Although the system saturates at the same
value of
regardless of the variability in the service process
(recall that all data sets have the same mean request size), the average
request slowdown differs dramatically from service process to service
process, especially in the range of medium-to-high system utilization.
Figure 7.6(b)
illustrates the average queue length at each server as a function of the
variability in the service process for various arrival
rates7.4.
The figure further confirms that the higher the variability in the service
process, the more dramatic the average queue build-up is.
![]() |
To further understand the behavior of the system under the random balancing
policy, we look closer at the range of request sizes that contribute to the
queue build-up and consequently to the performance degradation.
Our analytical model allows us to further explore the system behavior
by analyzing how the queue length builds up.
Figure 7.7 sketches the CTMC that models a back-end server,
an M/Hr
/1 server (for presentation clarity, not all arcs are
illustrated in the picture, but the reader can visualize the shape of
the Markov chain and, most importantly, identify the parts of the CTMC
that correspond to the power-law portion of the workload and the lognormal
portion of the workload).
The model of Figure 7.7 is analyzed using ETAQA-QBD which allows the exact computation of the stationary system queue length. Figure 7.8 illustrates the contribution to the overall queue length from the instances when short requests get stuck behind long ones, i.e., behind requests for files larger than 1MB (files that are part of the power-tailed portion of the distribution) and 100 KBytes (the tail part of the lognormal distribution together with the power-tailed part of the distribution). Since the FW algorithm ``splits'' the distribution (each portion corresponding to one phase of the hyperexponential distribution), it is possible to calculate approximately the contribution to the queue length of instances of queue built-ups because short requests wait for longer ones to be served. Figure 7.8(a) illustrates that, at medium-to-high load, the queue built-up due to requests from the power-tailed portion begin served is about 20% of the overall system queue length (even if the frequency of these requests is almost negligible). This percentage is much larger at smaller arrival rates.
We also note that if the lognormal portion has low variability, the queue build-up is dominated by instances of short requests waiting behind long ones, which we call power-tailed queue. This can be explained by examining the contribution to the queue build-up by the tail of the lognormal distribution. Figure 7.8(b) shows that the requests from the tail of the lognormal distribution play an important role on the performance (requests for files larger than 100 KBytes dominate the queue across the entire range of arrival rates) and illustrates that for higher variabilities in the service process, the system queue length due to large yet rare requests is significant.
![]() |
These last observations suggest that it may be appropriate to assign requests to specific servers according to their sizes. We conjecture that by reserving servers for scheduling requests of similar sizes, we ensure that no severe imbalances in the utilization of each of the servers occur.
The fitting provided by the FW algorithm for the requested files sizes provides a hyperexponential distribution with a special property: each exponential phase corresponds to a certain range of request (file) sizes. Thus, we use the hyperexponential distribution to make an educated guess on distributing the incoming requests across the back-end servers, ensuring that the variance of the service time distribution of the requests served by each server is kept as low as possible. Figure 7.9 illustrates this size-based policy.
By applying the size-based policy to our data sets with the stream of
requested files sizes, we notice that a single server suffices to serve
the power-tailed portion and the tail of the lognormal portions of the
request file sizes distribution. The body of the lognormal portion must
instead be served by the remaining seven servers.
Figure 7.10 illustrates the average queue length of the
hosts using either the size-based or the random policy for two fixed arrival
rates,
and
, representing medium and high
load, respectively.
In contrast to the random policy, the average queue length with the size-based policy does not increase as the variability in the service process increases. This indicates that the size-based policy, being aware of the heavy-tailed behavior in the service process, sustains better the variability effects in the Web server performance. Figure 7.10 shows similar results with respect to the expected request slowdown. We conclude that taking into consideration the properties of the service process in the load balancing policy, as we do in the sized-based policy, insures better overall performance of the Web server cluster.