A cache is fundamentally a block of memory used to store data items that are frequently requested. Over the years, different paradigms have evolved for how best to utilize the available memory. Most conventional caching algorithms, such as LRU, RANDOM and FIFO, have been designed and analyzed on a simple (isolated) cache, as shown in Figure 4.1 (a). In this model, whenever a request arrives, the corresponding data item is fetched and cached if it is not already in the cache. Where to place this item, and which item to evict if needed, determine the nature of the caching algorithm. New caching algorithms have been proposed in the past few years and have been shown, often through numerical studies, to perform better than the classical model. Two dimensions have been explored. On the one hand, the memory block can be divided into two or more levels, with a hierarchical algorithm attempting to ensure that more popular content items get cached in the higher levels. For example, a simple 2-level cache is shown in Figure 4.1 (b), and it has been empirically observed that under an

simple cache of the same size. On the other hand, a meta-cache that simply stores content identities is an approach that can be used to better learn popularity without wasting memory on caching the actual data items. The idea is illustrated in Figure 4.1 (c) with one level of meta-caching. Meta-caches are an efficient way of ensuring that only popular items are ever cached, and empirical observations suggest that, when coupled with an appropriate caching algorithm, they too are quite effective at increasing the hit rate. However, in both cases, it is not clear how multi-level caches and meta-caches enhance the hit probability, or what impact they have on the convergence to stationarity of the caching scheme. In this section, we first characterize the performance of an isolated cache through τ-distance and mixing time to study the adaptability of these algorithms. In the next section, we use this technique to study how the number of cache levels and cache partitions impacts performance. Finally, we use the insights gained in this process to design a new caching algorithm that combines ideas from multi-level caches and meta-caches, as shown in Figure 4.1 (d). We design an algorithm for this cache structure and name it Adaptive-LRU (A-LRU).

4.1.2 Related Work

Caching algorithms have mostly been studied analytically under the IRM. Explicit results for the stationary distribution and hit probability of LRU, FIFO, RANDOM and CLIMB [9, 25, 35, 55, 85] have been derived under IRM. However, these results are only useful for small caches due to the computational complexity of solving for the stationary distribution. Several approximations have been proposed to analyze caches of a reasonably large size [27, 82]. A notable one is the Time-To-Live (TTL) approximation, which was first introduced for LRU under IRM [20] and has been further generalized to other situations [12, 30, 34, 70, 82]. Theoretical support for the accuracy of the TTL approximation was presented in [12]. A rich literature also studies the performance of caching algorithms in terms of hit probability based on real trace simulations, e.g., [63, 70, 71, 95], and we do not attempt to provide an overview here.

4.1.3 Organization

The next section contains some technical preliminaries and representative caching algorithms. We derive the steady-state distributions of the algorithms in Section 4.3 and identify hit probabilities in Section 4.4. We consider our new notion of τ-distance in Section 4.5 and mixing time in Section 4.6. We combine the two notions and investigate the learning error in Section 4.7. We conclude in Section 4.8.

4.2 Technical Preliminaries

4.2.1 Traffic Model

To compare caching algorithms, we must first specify a model of how items are referenced. For most of our analysis, we consider the simplest and most widely used stochastic model, the Independent Reference Model (IRM) [25]. In our numerical investigations, we will also consider three more realistic request processes: a Markov-modulated request process, a YouTube request trace [95], and one request trace from the IRCache project [13]. In IRM, the request process $\{r_1, r_2, \cdots\}$ is a sequence of independent, identically distributed random variables with a fixed probability distribution

$$P(r_t = i) = p_i, \quad i \in \{1, \cdots, n\}, \quad t \in \{1, 2, \cdots\}, \tag{4.1}$$

where $r_t$ is the item referenced by the $t$-th request, and there are $n$ different items.

Without loss of generality (w.l.o.g.), we assume that the items are numbered so that the probabilities are in non-increasing order, i.e., $p_1 \ge p_2 \ge \cdots \ge p_n$.
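As an illustration of the IRM, the following minimal Python sketch draws an i.i.d. request trace from a given popularity vector. The function name `irm_requests`, the seed argument, and the three-item distribution are assumptions made for this example only; they do not come from the experiments in this chapter.

```python
import random
from collections import Counter

def irm_requests(p, num_requests, seed=0):
    """Draw i.i.d. requests r_1, ..., r_T with P(r_t = i) = p[i - 1].

    Items are labelled 1..n, matching the numbering in (4.1).
    """
    rng = random.Random(seed)          # local generator, so runs are repeatable
    items = list(range(1, len(p) + 1))
    return rng.choices(items, weights=p, k=num_requests)

# Hypothetical popularity vector in non-increasing order (p1 >= p2 >= p3).
p = [0.5, 0.3, 0.2]
trace = irm_requests(p, 10000)

# Empirical frequencies should be close to p for a long enough trace.
freq = Counter(trace)
empirical = [freq[i] / len(trace) for i in range(1, len(p) + 1)]
```

Because every request is drawn independently from the same distribution, the trace has no temporal locality beyond what the popularities themselves induce, which is the property that makes IRM analytically tractable.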

4.2.2 Popularity Law

Although our analytical results do not depend on any specific popularity law, for our numerical investigations we will use a Zipf-like distribution, as this family has been frequently observed in real traffic measurements and is widely used in performance evaluation studies in the literature [19]. For a Zipf-like distribution, the probability of requesting the $i$-th most popular item is $p_i = A/i^{\alpha}$, where $\alpha$ is the Zipf parameter, which depends on the application considered [31], and $A$ is the normalization constant chosen so that $\sum_{i=1}^{n} p_i = 1$ when there are $n$ unique items in the system.
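The normalization can be made concrete with a short Python sketch. The helper name `zipf_popularity` and the values n = 5, α = 1 are illustrative assumptions, not parameters used in this chapter.

```python
def zipf_popularity(n, alpha):
    """Return [p_1, ..., p_n] with p_i = A / i**alpha and sum(p) == 1."""
    weights = [i ** (-alpha) for i in range(1, n + 1)]
    A = 1.0 / sum(weights)             # normalization constant
    return [A * w for w in weights]

# Illustrative values only: 5 items with Zipf parameter alpha = 1.
p = zipf_popularity(n=5, alpha=1.0)
```

For α = 1 and n = 5, the sum of the unnormalized weights is 1 + 1/2 + 1/3 + 1/4 + 1/5 = 137/60, so A = 60/137 ≈ 0.438. The resulting probabilities are automatically in non-increasing order, consistent with the numbering convention of Section 4.2.1.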

4.2.3 Caching Algorithms

There exist a large number of caching algorithms, with the difference being in their choice of insertion or eviction rules. In this section, we consider the following representative algorithms.

LRU: [34] When there is a request for item i, there are two cases: (1) i is not in the cache (cache miss), in which case i is inserted in the first position of the cache, all other items move back one position, and the item that was in the last position of the cache is evicted; (2) i is in position j of the cache (cache hit), in which case i moves to the first position of the cache, and all items that were in positions 1 to j − 1 move back one position.

FIFO: FIFO differs from LRU when a cache hit occurs on an item that was in position j. In FIFO, this item does not change its position.

RANDOM: RANDOM differs from FIFO when a cache miss occurs. In RANDOM, the requested item is inserted in a randomly selected position, and the item that was in that position is evicted.

CLIMB: [25, 85] The difference between CLIMB and LRU is when a cache hit occurs on an item that was in position j. In CLIMB, this item is inserted in position j + 1, and the item that was in position j + 1 moves to position j.
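The insertion and eviction rules of LRU, FIFO and RANDOM above can be sketched as a small trace-driven simulator. This is a minimal illustration under stated assumptions, not the implementation used for the results in this chapter: the function name `simulate` is hypothetical, the cache is a Python list whose index 0 plays the role of position 1, and while the cache is not yet full every policy simply inserts at the front (the text describes only the full-cache behavior). The sketch covers these three policies only.

```python
import random

def simulate(policy, trace, cache_size, seed=0):
    """Return the hit ratio of `policy` ('LRU', 'FIFO' or 'RANDOM') on `trace`."""
    if policy not in ("LRU", "FIFO", "RANDOM"):
        raise ValueError("unknown policy: " + policy)
    rng = random.Random(seed)
    cache = []   # cache[0] is position 1 (front); cache[-1] is the last position
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            if policy == "LRU":
                # Hit in LRU: move the item to the first position.
                cache.remove(item)
                cache.insert(0, item)
            # Hit in FIFO or RANDOM: positions are unchanged.
        elif len(cache) < cache_size:
            # Assumption: during warm-up, fill empty slots from the front.
            cache.insert(0, item)
        elif policy == "RANDOM":
            # Miss in RANDOM: overwrite a uniformly chosen position.
            cache[rng.randrange(cache_size)] = item
        else:
            # Miss in LRU or FIFO: evict the last position, insert at the front.
            cache.pop()
            cache.insert(0, item)
    return hits / len(trace)

# Tiny hand-checkable trace with a cache of size 2.
demo = [1, 2, 1, 3, 1]
lru_ratio = simulate("LRU", demo, 2)     # hits on the 3rd and 5th requests
fifo_ratio = simulate("FIFO", demo, 2)   # hit on the 3rd request only
```

On this trace, LRU keeps item 1 because the hit refreshes its position, whereas FIFO evicts it when item 3 arrives; this is exactly the difference between the two hit rules. The list-based lookup is linear in the cache size, which is acceptable for illustration but not for large caches.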

Remark 1 LRU has been widely used due to its good performance and ease of implementation. FIFO and RANDOM have been used to replace LRU in some scenarios since they are easier to implement while offering reasonably good performance. CLIMB has been numerically shown to have a higher hit ratio than LRU, at the expense of taking longer than LRU to reach its steady state.