EQUILOAD is based on the observation that directing tasks of similar
size to the same server reduces the request slowdown in a Web server.
In a cluster of
web servers, EQUILOAD
requires partitioning the possible request sizes
into
intervals,
,
, up to
, so that server
,
,
is responsible for satisfying the requests with size falling
in the
interval.
EQUILOAD relies on accurate computation of its parameters
and, in Subsection 7.5.3, we demonstrated that
EQUILOAD performance degrades if its parameters do not reflect the
current workload characteristics.
In Subsection 7.6.1, we discussed how the workload can
change its characteristics across not only successive days but also
within a single day. Therefore, a dynamic adjustment of the
boundaries is imperative for high performance.
One simple solution to the above problem is to use the system history,
more precisely the last
requests seen by the system, to build the DDH
needed to determine the boundaries for the allocation of the next
requests (recall that EQUILOAD determines the policy parameters
using the DDH of the request sizes).
must be chosen wisely: it should be
neither too small (since it must ensure that the computed DDH is statistically
significant) or too large (since it must allow adapting to workload
fluctuations).
In Section 7.6.3, we provide a refined algorithm and use
the requests history in a geometrically-discounted fashion as a better
prediction technique.
An additional algorithmic modification is necessary to ensure good performance,
given that the boundaries are computed using an empirical distribution.
If a significant portion of the requests consists of a few popular files,
it may not be possible to select
distinct boundaries and still
ensure that each interval
corresponds to a fraction
of the load.
To solve this problem, we introduce ``fuzzy'' boundaries.
Formally, we associate a probability
to every ``fuzzy'' boundary point
, for
, expressing the portion of the requests for files
of size
that is to be served by server
.
The remaining portion
of requests for this file size is served by
server
, or even additional servers
(for the 1998 World Cup data, sometimes we had to extend a fuzzy boundary
beyond two servers to accommodate a very popular file).
Figure 7.18 illustrates ADAPTLOAD which, unlike
the algorithm EQUILOAD of Figure 7.11, uses past requests
information to determine (fuzzy) boundary points for the future.
Also, since ADAPTLOAD is an online algorithm, it must manipulate DDHs
efficiently (in linear time).
Thus, instead of storing a DDH with a vector of size equal to the number
of files, we use a vector with a constant number
of bins,
so that, for
, bin
accumulates the total number of bytes
due to requests for files with size between
and
,
in the (portion of the) trace being considered7.7.
Thus a DDH is now expressed simply as
.
Accordingly, the (possibly fuzzy) boundaries are expressed in terms of bin
indices, not actual file sizes.