We consider the following model of a distributed server environment. We assume a fixed number of back-end servers with the same processing power, each serving requests in first-come-first-served order from an unbounded queue. Requests arrive at the dispatcher according to a Poisson process, and the dispatcher is responsible for distributing the jobs among the back-end servers according to a scheduling policy. We also assume that the dispatcher can derive the request duration (i.e., the size of the requested file) from the name of the file requested. We consider the following two load balancing policies (in neither case does the dispatcher use feedback from the individual back-end servers to better balance the load, measured in ``bytes to be transferred'', among them):
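The mechanics of this model can be made concrete with a short simulation sketch. The exponential service sizes and the parameter values below are placeholders for illustration only, not the workload analyzed in this chapter:

```python
import random

def simulate_random_dispatch(n_servers=8, lam=5.0, mean_size=1.0,
                             n_jobs=10_000, seed=1):
    """Toy simulation of the model above: Poisson arrivals at a
    dispatcher, uniformly random assignment to one of n_servers
    FCFS queues of unbounded capacity.  Service times stand in for
    file sizes; exponential sizes here are a placeholder, not the
    trace-driven workload used in the chapter."""
    random.seed(seed)
    free_at = [0.0] * n_servers            # time each server next becomes idle
    t = 0.0
    total_slowdown = 0.0
    for _ in range(n_jobs):
        t += random.expovariate(lam)       # Poisson arrival process
        size = random.expovariate(1.0 / mean_size)
        s = random.randrange(n_servers)    # random policy: uniform pick
        start = max(t, free_at[s])         # FCFS within the chosen queue
        free_at[s] = start + size
        total_slowdown += (free_at[s] - t) / size   # response / service
    return total_slowdown / n_jobs
```

Swapping the line that picks `s` is all it takes to experiment with a different dispatching policy.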
We first consider the performance of the random policy under Poisson arrivals and a service process driven by the entries of the synthetically generated requested file sizes described in Subsection 7.3.2. Recall that one of the curves in Figure 7.5 corresponds to the actual stream of requested file sizes for day 57 of the World Cup server logs. In the analysis of this section, we fix the overall arrival rate to the dispatcher and assume eight back-end servers. Since the processing time of each request is linear in the request size, we approximate the service process at the Web server by the hyperexponential distribution of requested file sizes and model its performance using an M/Hr/1 queue. In the case of the random policy, each Web server in the cluster is modeled by the same M/Hr/1 queue, with arrival rate equal to one eighth of the overall rate. An M/Hr/1 queue gives rise to a QBD process, which we analyze using ETAQA-QBD (see Section 5.3).
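Since M/Hr/1 is a special case of M/G/1, its mean waiting time and mean queue length are also available in closed form via the Pollaczek-Khinchine formula, which makes for a useful sanity check on the matrix-analytic solution. A minimal sketch for a two-phase hyperexponential; the rates and branching probability below are illustrative, not the fitted World Cup values:

```python
def mh2_metrics(lam, p, mu1, mu2):
    """Mean waiting time EW and mean number in system EN for an
    M/H2/1 queue via Pollaczek-Khinchine (exact for M/G/1, hence
    for M/Hr/1).  p is the probability that a job draws the
    rate-mu1 exponential phase; parameters are placeholders."""
    ES  = p / mu1 + (1 - p) / mu2                 # mean service time
    ES2 = 2 * p / mu1**2 + 2 * (1 - p) / mu2**2   # second moment of H2
    rho = lam * ES
    assert rho < 1, "queue must be stable"
    EW = lam * ES2 / (2 * (1 - rho))   # P-K mean waiting time
    EN = lam * (EW + ES)               # mean number in system (Little's law)
    return EW, EN
```

With `p = 1` the distribution collapses to a single exponential and the formulas reduce to the familiar M/M/1 results, which is an easy correctness check.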
In most of our experiments, slowdown is the metric we choose for evaluating cluster performance. The average request slowdown, the ratio of a job's response time to its service time, is a fundamental measure of responsiveness in Web servers, since users are more willing to wait for ``large'' requests than for ``small'' ones. We start our evaluation by analyzing the performance of the random load balancing policy. Our objective is to understand how the variable service process affects the performance of the random policy and to identify possible improvements.
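The definition of slowdown is straightforward to state in code:

```python
def slowdown(arrival_time, completion_time, service_time):
    """Slowdown of a single request: response time (completion minus
    arrival, i.e., waiting plus service) divided by service time.
    A job that never waits has slowdown exactly 1."""
    return (completion_time - arrival_time) / service_time
```

For example, a 1-second request that waits 4 seconds has slowdown 5, while a 100-second request suffering the same 4-second wait has slowdown 1.04, which is precisely why slowdown captures the intuition that waiting hurts ``small'' requests more than ``large'' ones.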
The average request slowdown for the random policy is illustrated in Figure 7.6(a). Although the system saturates at the same value of the arrival rate regardless of the variability in the service process (recall that all data sets have the same mean request size), the average request slowdown differs dramatically from one service process to another, especially in the range of medium-to-high system utilization. Figure 7.6(b) illustrates the average queue length at each server as a function of the variability in the service process for various arrival rates. The figure further confirms that the higher the variability in the service process, the more dramatic the average queue build-up.
To further understand the behavior of the system under the random balancing policy, we look more closely at the range of request sizes that contribute to the queue build-up and consequently to the performance degradation. Our analytical model allows us to explore the system behavior by analyzing how the queue length builds up. Figure 7.7 sketches the CTMC that models a back-end server as an M/Hr/1 queue (for presentation clarity, not all arcs are illustrated, but the reader can visualize the shape of the Markov chain and, most importantly, identify the parts of the CTMC that correspond to the power-law portion and the lognormal portion of the workload).
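The repetitive structure of this chain can be written down explicitly as the block matrices of a QBD process. A minimal sketch for a two-phase hyperexponential (r = 2), where the level is the queue length and the phase records the exponential branch of the job in service; the rates passed in are made-up placeholders:

```python
def mh2_qbd_blocks(lam, p, mu1, mu2):
    """Block matrices of the QBD underlying an M/H2/1 queue.
    A0: level up (an arrival, phase unchanged);
    A2: level down (a service completion, after which the next job
        draws a fresh phase with probabilities alpha);
    A1: diagonal holding rates, so each row of A0+A1+A2 sums to zero.
    Parameters are illustrative, not fitted values."""
    alpha = [p, 1.0 - p]                  # phase-selection probabilities
    mu = [mu1, mu2]                       # per-phase service rates
    A0 = [[lam, 0.0], [0.0, lam]]
    A2 = [[mu[i] * alpha[j] for j in range(2)] for i in range(2)]
    A1 = [[-(lam + mu[i]) if i == j else 0.0 for j in range(2)]
          for i in range(2)]
    return A0, A1, A2
```

These are exactly the repeating blocks that a matrix-analytic solver such as ETAQA-QBD consumes; the generator-row sums being zero is the basic consistency check.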
The model of Figure 7.7 is analyzed using ETAQA-QBD, which allows exact computation of the stationary system queue length. Figure 7.8 illustrates the contribution to the overall queue length from instances in which short requests get stuck behind long ones, i.e., behind requests for files larger than 1 MByte (files in the power-tailed portion of the distribution) or larger than 100 KBytes (the tail of the lognormal portion together with the power-tailed portion). Since the FW algorithm ``splits'' the distribution (each portion corresponding to one phase of the hyperexponential distribution), we can approximately calculate the contribution to the queue length of queue build-ups caused by short requests waiting for longer ones to be served. Figure 7.8(a) illustrates that, at medium-to-high load, the queue build-up due to requests from the power-tailed portion being served accounts for about 20% of the overall system queue length (even though the frequency of these requests is almost negligible). This percentage is much larger at smaller arrival rates.
We also note that if the lognormal portion has low variability, the queue build-up is dominated by instances of short requests waiting behind long ones, which we call the power-tailed queue. This can be explained by examining the contribution of the tail of the lognormal distribution to the queue build-up. Figure 7.8(b) shows that requests from the tail of the lognormal distribution play an important role in performance (requests for files larger than 100 KBytes dominate the queue across the entire range of arrival rates) and that, for higher variability in the service process, the system queue length due to large yet rare requests is significant.
These last observations suggest that it may be appropriate to assign requests to specific servers according to their sizes. We conjecture that by reserving servers for requests of similar sizes, we ensure that no severe imbalances occur in the utilization of the individual servers.
The fitting provided by the FW algorithm for the requested file sizes yields a hyperexponential distribution with a special property: each exponential phase corresponds to a certain range of request (file) sizes. Thus, we use the hyperexponential distribution to make an educated guess on how to distribute the incoming requests across the back-end servers, ensuring that the variance of the service time distribution of the requests served by each server is kept as low as possible. Figure 7.9 illustrates this size-based policy.
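A dispatcher implementing such a policy can be sketched in a few lines. The 100 KByte cutoff and the round-robin spreading over the remaining servers are illustrative assumptions, not the exact policy evaluated here:

```python
import itertools

def make_size_based_dispatcher(threshold_bytes=100_000, n_body_servers=7):
    """Sketch of a size-based policy: because each exponential phase
    of the fitted hyperexponential covers a range of file sizes,
    requests above a cutoff go to a dedicated server while the rest
    are spread over the remaining servers.  The cutoff value and the
    round-robin spreading are assumptions for illustration."""
    rr = itertools.cycle(range(n_body_servers))   # body servers 0..n-1
    def dispatch(size_bytes):
        if size_bytes > threshold_bytes:
            return n_body_servers                 # dedicated large-request server
        return next(rr)
    return dispatch
```

With eight servers, large but rare requests are confined to the last server, so short requests never queue behind them, which is exactly the pathology identified for the random policy above.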
Applying the size-based policy to our data sets with the stream of requested file sizes, we find that a single server suffices to serve the power-tailed portion and the tail of the lognormal portion of the requested file size distribution, while the body of the lognormal portion is served by the remaining seven servers. Figure 7.10 illustrates the average queue length of the hosts under either the size-based or the random policy for two fixed arrival rates, representing medium and high load, respectively.
In contrast to the random policy, the average queue length under the size-based policy does not increase as the variability in the service process increases. This indicates that the size-based policy, being aware of the heavy-tailed behavior of the service process, better withstands the effects of variability on Web server performance. Figure 7.10 shows similar results with respect to the expected request slowdown. We conclude that taking the properties of the service process into account in the load balancing policy, as the size-based policy does, ensures better overall performance of the Web server cluster.