
This chapter presents the experimental results of the EPO algorithm under a variety of configurations, including write buffer size, flash page size and number of elements in an SSD. The primary focus is the performance gain that EPO achieves in response time and throughput. We measured both metrics for four schemes: EPO, LRU, BPLRU [25] and NoCache. The simulation study uses three real-world system traces, Financial1 [29], Financial2 [29] and TPC-C [30]. To evaluate EPO further, we also used synthetically generated workloads that consist entirely of random write requests; their purpose is to show how EPO performs in a purely random write environment. The chapter describes the experimental settings, the nature of the traces used, the impact of write buffer size, page size and number of elements on the performance of an SSD, and how EPO outperforms the other three algorithms.

All simulation experiments are conducted in three independent stages, executed sequentially: pre-processing, reshaping and feeding. In the pre-processing stage, the pre-processor (Figure 4.1a) performs four tasks. (1) The real-world traces contain both read and write requests. Because these experiments focus only on write requests, the first task of the pre-processor is to remove all read requests from the traces. (2) The original traces are very long, containing millions of requests, and would take a long time to simulate. We therefore use only the first few million write requests, and the second task of the pre-processor is to truncate the original workloads accordingly. (3) In the TPC-C trace, the LBA range is too large for the SSD configuration we used. The pre-processor evenly shrinks the trace's logical address space so that the logical address of each write request can be mapped onto the SSD. The logical address spaces of the Financial1 and Financial2 traces already fit the configuration, so the pre-processor leaves them unchanged. (4) As discussed in Chapter 4, the SSD extension for DiskSim models the SSD as an HDD and groups a number of sectors into a page. The pre-processor therefore makes each request size a multiple of the page size; in these experiments, the request size must be a multiple of eight sectors (i.e., 4KB). The output of the pre-processor is then suitable for the SSD storage system.
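The four pre-processing tasks can be illustrated with a minimal Python sketch. It is not the pre-processor used in these experiments: the `Request` record, the `preprocess` function and its parameters are hypothetical names, and rounding each request out to page boundaries is only one possible way to make sizes a multiple of eight sectors.

```python
from dataclasses import dataclass
from typing import Iterable, List

SECTORS_PER_PAGE = 8  # eight 512-byte sectors = one 4KB page


@dataclass
class Request:
    time: float     # arrival time in seconds
    lba: int        # logical block address, in sectors
    sectors: int    # request size, in sectors
    is_write: bool


def preprocess(trace: Iterable[Request], max_writes: int,
               target_sectors: int) -> List[Request]:
    """Apply the four tasks to a trace (hypothetical record layout)."""
    # (1) Drop read requests; (2) keep only the first max_writes writes.
    writes = [r for r in trace if r.is_write][:max_writes]
    if not writes:
        return []

    # (3) Shrink the logical address space evenly, but only when the
    # trace reaches beyond the assumed device size (target_sectors).
    span = max(r.lba + r.sectors for r in writes)
    scale = min(1.0, target_sectors / span)

    out = []
    for r in writes:
        lba = int(r.lba * scale)
        # (4) Round the start down and the end up to page boundaries,
        # so the size becomes a multiple of eight sectors.
        start = lba - lba % SECTORS_PER_PAGE
        end = lba + r.sectors
        end += -end % SECTORS_PER_PAGE
        out.append(Request(r.time, start, end - start, True))
    return out
```

A production tool would also need to clamp requests that cross the end of the device after scaling; the sketch omits this for brevity.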

In the reshaping stage, the traces pass through the different buffer management schemes. Each scheme reshapes the trace according to its management policy before the trace is fed to the SSD. In addition to EPO, we implemented three other schemes. All four buffer management schemes (NoCache, LRU, BPLRU and EPO) were implemented and run on a Dell PowerEdge 1900 server with two quad-core Intel® E5310 1.60 GHz processors and 8GB of FB-DIMM memory. After the FTL maps the logical address of each request to a physical address, the requests are buffered in the write buffer B (Figure 4.1b) and managed by the individual scheme, such as EPO. The output of the write buffer is a rearranged request set that contains all write requests evicted from the buffer according to the victim selection policy of the buffer management scheme. In the final stage, this rearranged request set is fed to the SSD extension of the DiskSim simulator.

We evaluate the four buffer management schemes by running simulations over three real-world traces, Financial1 [29], Financial2 [29] and TPC-C [30], which have been widely used in prior research. The statistics of these traces are listed in Table 5.1.

Table 5.1. Statistics – Real World Traces

Workload                 Financial1    Financial2    TPC-C
Number of writes         2,000,000     650,000       2,000,000
Mean write size (KB)     3.9           2.9           10.2
Writes per second        62.76         10.52         4337.83
Write size range (KB)    0.5-3148.5    0.5-256.5     0.5-1024

Financial1 and Financial2 are taken from OLTP [31] applications running at two large financial institutions [29]. Financial1 is a write-dominant trace containing more than 60% write requests, while Financial2 is a read-dominant trace containing more than 80% read requests. The average write request size is 3.9KB in Financial1 and 2.9KB in Financial2, both smaller than our 4KB page size requirement, so the pre-processor resizes all requests from Financial1 and Financial2 to 4KB. TPC-C is an I/O trace collected on a storage system connected to a Microsoft SQL Server via a storage area network [30]. Its average write size is larger than 10KB, which is also unsuitable for these experiments, so the pre-processor truncates each request to 4KB. Table 5.1 shows that TPC-C is a very intensive workload, with more than 4000 requests arriving per second. Its requests are spread over the whole storage system and show little temporal or spatial locality [32]; the workload is highly random. Financial1 and Financial2 are much less intensive than TPC-C and show some temporal and spatial locality: more than 60 requests arrive per second in Financial1, and about 10 per second in Financial2. Financial1 is less random than TPC-C, and the randomness of Financial2 lies between that of TPC-C and Financial1. We selected these three traces so that the EPO scheme can be evaluated under different degrees of access randomness. Because of time constraints, we use two million requests for the Financial1 and TPC-C traces and 0.65 million requests for the Financial2 trace.

We now turn to the three schemes against which EPO is compared. BPLRU [25] was proposed recently and has been well received by researchers. LRU, in contrast, is a legacy policy that is widely implemented for cache management in real systems. We also ran the simulation with NoCache, a system without a cache in which all requests go directly to the SSD storage system. The purpose of implementing and testing these schemes is to compare EPO both with previously proposed schemes and with a system that has no cache. Each scheme is described briefly below:

1. BPLRU (Block Padding Least Recently Used): The strategy taken by BPLRU can be seen as read-modify-write, and its granularity is an SSD block, which in our case contains 64 pages. When a victim must be chosen from the buffer to accommodate a newly arrived request, BPLRU selects a block that has not been referenced recently. BPLRU then looks for the pages of that block that have not yet been read from flash, reads those pages, assembles a full valid block and writes it back to flash. This is why it is called a read-modify-write policy. The policy has two drawbacks: (1) Performance depends heavily on spatial locality. If pages within the same block are not requested, then performance drops. (2) Read-modify-write greatly amplifies the number of read and write requests, which further degrades performance. This merge operation is very expensive.

2. LRU (Least Recently Used): A widely used legacy scheme in which the victim is selected based on its access time in the buffer.

3. NoCache: Tests the system without a cache. All requests from the host go directly to the SSD storage system.
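The victim selection of the two cached baselines can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation evaluated in this chapter: the class names are hypothetical, the buffer capacity is counted in pages, and the BPLRU sketch models only block-level LRU eviction with block padding as described above, omitting any further optimizations of the published algorithm [25].

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 64  # block size used in these experiments


class LRUWriteBuffer:
    """Page-granularity LRU: evict the least recently written page."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page number -> None, oldest first

    def write(self, page):
        """Buffer one page write; return the pages flushed to flash."""
        if page in self.pages:
            self.pages.move_to_end(page)  # hit: refresh recency only
            return []
        flushed = []
        if len(self.pages) >= self.capacity:
            victim, _ = self.pages.popitem(last=False)
            flushed.append(victim)
        self.pages[page] = None
        return flushed


class BPLRUWriteBuffer:
    """Block-granularity LRU with block padding on eviction."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.blocks = OrderedDict()  # block number -> set of page offsets
        self.size = 0                # buffered pages across all blocks

    def write(self, page):
        """Return (pages read for padding, pages written) for this write."""
        block, offset = divmod(page, PAGES_PER_BLOCK)
        buffered = self.blocks.setdefault(block, set())
        self.blocks.move_to_end(block)  # the touched block is most recent
        if offset in buffered:
            return 0, 0
        reads = writes = 0
        if self.size >= self.capacity:
            _, victim = self.blocks.popitem(last=False)
            self.size -= len(victim)
            reads = PAGES_PER_BLOCK - len(victim)  # padding pages from flash
            writes = PAGES_PER_BLOCK               # whole block written back
        self.blocks.setdefault(block, set()).add(offset)
        self.size += 1
        return reads, writes
```

With a two-page buffer, writing pages 0, 64 and 128 (three different blocks) makes the BPLRU sketch evict a block that holds a single buffered page, which costs 63 padding reads and 64 page writes. This illustrates why low spatial locality makes the merge operation expensive.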

In these experiments, different configurations are used to evaluate the performance of the EPO scheme. The configuration parameters are given in Table 5.2. The number of elements in the SSD varies from 16 to 64, so the capacity of the SSD ranges from 64GB to 256GB, as each element has a capacity of 4GB. The buffer is kept as small as possible in view of its price per bit and varies from 4MB to 32MB. Each element contains two dies, and each die contains four planes. Each plane contains 2048 blocks, and each block has 64 pages.

Table 5.2. Simulation Parameters

Parameter                              Default    Varied
Write buffer capacity (MB)             8          4, 8, 16, 32
Number of elements                     48         16, 32, 48, 64
Number of planes in an element         8
Page size (KB)                         4          1, 2, 4
Flash block size (pages)               64
Element capacity (GB)                  4
Flash SSD capacity (GB)                192        64, 128, 192, 256
Block erase latency (μs)               1500
Page read latency (μs)                 25
Page write latency (μs)                200
Chip transfer latency per byte (μs)    0.025
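The capacities in Table 5.2 follow directly from the SSD geometry stated above. The short Python check below uses only values from the text and the table, for the default 4KB page size; the function names are illustrative.

```python
DIES_PER_ELEMENT = 2
PLANES_PER_DIE = 4
BLOCKS_PER_PLANE = 2048
PAGES_PER_BLOCK = 64
PAGE_SIZE_KB = 4  # default page size


def element_capacity_gb():
    """Capacity of one element, derived from its geometry."""
    pages = (DIES_PER_ELEMENT * PLANES_PER_DIE
             * BLOCKS_PER_PLANE * PAGES_PER_BLOCK)
    return pages * PAGE_SIZE_KB / (1024 * 1024)


def ssd_capacity_gb(elements):
    """Capacity of an SSD built from the given number of elements."""
    return elements * element_capacity_gb()


def buffer_pages(buffer_mb):
    """Number of 4KB pages that fit in a write buffer of buffer_mb."""
    return buffer_mb * 1024 // PAGE_SIZE_KB


for n in (16, 32, 48, 64):
    print(n, ssd_capacity_gb(n))  # 64.0, 128.0, 192.0 and 256.0 GB
print(buffer_pages(8))            # 2048 pages in the default 8MB buffer
```

The default 8MB buffer therefore holds 2048 pages, against roughly 50 million pages in the default 192GB SSD.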

As Table 5.2 shows, a block erase takes far longer than a page read or write. The number of invalidated blocks that must be erased grows with the randomness of the write requests, because a write request that updates a flash page makes the old page invalid. It is therefore important to reshape the write requests and make them as sequential as possible to improve performance.

We now present the experimental results of EPO and the other algorithms for the three traces under varying SSD configurations. The goal is to compare EPO against two other cache management schemes, LRU and BPLRU, and against a system with no cache. We first examined the impact of write buffer size on all four schemes; the results are shown in Figure 5.1.

Figure 5.1. Performance impact of write buffer size on four schemes.

Figure 5.1 shows the mean response time in milliseconds and the throughput in MB/s. This experiment was conducted with the write buffer size varying from 4MB to 32MB. For all four schemes, mean response time and throughput barely change as the buffer grows, because even a 32MB buffer is very small for an enterprise-level workload and is filled very quickly by arriving requests. Nevertheless, EPO is always better than the other schemes, which shows how EPO exploits element-level parallelism. For Financial1, EPO reduces the mean response time by 38.9%, 33.11% and 44.63% relative to NoCache, LRU and BPLRU respectively, and increases system throughput by 63.77%, 49.5% and 79.8% respectively. For Financial2, EPO outperforms all three schemes, reducing the mean response time by 32.53%, 32.28% and 50.17% and improving system throughput by 47.16%, 47.67% and 100.44% relative to NoCache, LRU and BPLRU respectively; that is, EPO roughly doubles the throughput of BPLRU in this experiment. For TPC-C, EPO reduces the mean response time by 42.26%, 37.63% and 99.92% and increases throughput by 73.19%, 60.31% and 12188.45% relative to NoCache, LRU and BPLRU respectively. The TPC-C results show that EPO far outperforms BPLRU. Overall, increasing the buffer size does little to reduce the mean response time or increase the throughput, which means that the size of the write buffer has little impact on a totally random workload.
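The percentages above can be read with the conventional definitions of relative reduction and relative increase. The text does not state the formulas explicitly, so the Python sketch below is an assumption about how such figures are usually derived; the measurements passed to the first two calls are hypothetical and are not taken from Figure 5.1.

```python
def reduction_pct(baseline, improved):
    """Percentage by which a lower-is-better metric (response time) drops."""
    return 100.0 * (baseline - improved) / baseline


def increase_pct(baseline, improved):
    """Percentage by which a higher-is-better metric (throughput) grows."""
    return 100.0 * (improved - baseline) / baseline


def speedup_factor(increase_percent):
    """Convert a percentage increase into a multiplicative factor."""
    return 1.0 + increase_percent / 100.0


# Hypothetical measurements: 2.0 ms -> 1.0 ms and 10 MB/s -> 20 MB/s.
print(reduction_pct(2.0, 1.0))    # 50.0
print(increase_pct(10.0, 20.0))   # 100.0

# Reported throughput increases expressed as factors.
print(round(speedup_factor(100.44), 4))    # 2.0044
print(round(speedup_factor(12188.45), 4))  # 122.8845
```

Under these definitions, the 12188.45% throughput increase over BPLRU on TPC-C corresponds to a factor of about 123.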

We also conducted experiments with different SSD page sizes to see the impact on all four algorithms. Three page sizes were used: 1KB, 2KB and 4KB. Figure 5.2 shows the results for varying page size.

As Figure 5.2 shows, flash page size has a significant impact on the performance of all four algorithms. Recall that every request in all three traces is 4KB, so when the SSD page size is configured as 1KB or 2KB, the pre-processor divides each request into four or two requests respectively. The number of requests in the trace therefore quadruples for a 1KB page size and doubles for a 2KB page size. In all configurations EPO outperforms the other three schemes. In the EPO scheme, for smaller page sizes, each request is divided into several requests, all of which are treated independently, which is why the response time does not change much with page size. Throughput, however, increases as the page size grows from 1KB to 4KB, because a larger flash page improves write efficiency and decreases the number of block erasures [9].
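The request splitting described above can be sketched as follows. The function name and the `(lba, sectors)` tuple layout are illustrative assumptions; the sketch relies only on the 512-byte sector implied by "eight sectors, i.e. 4KB".

```python
SECTOR_BYTES = 512


def split_request(lba, size_kb=4, page_kb=1):
    """Split one request into page-sized requests.

    lba is the starting address in sectors; the result is a list of
    (lba, sectors) tuples, one per flash page.
    """
    sectors_per_page = page_kb * 1024 // SECTOR_BYTES
    count = size_kb // page_kb
    return [(lba + i * sectors_per_page, sectors_per_page)
            for i in range(count)]


print(split_request(0, page_kb=1))  # four 2-sector requests
print(split_request(0, page_kb=2))  # two 4-sector requests
print(split_request(0, page_kb=4))  # one 8-sector request
```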

Finally, an important experiment varies the number of elements in the SSD. Its results indicate how well the algorithms scale.

Figure 5.2. Performance impact of flash page size on four schemes.

The purpose of the experiments with different numbers of elements in an SSD is to assess scalability. The simulation is run with 16 to 64 elements, a default buffer size of 8MB and a page size of 4KB. The results show that EPO and BPLRU scale well for all three traces, Financial1, Financial2 and TPC-C, when the number of elements increases from 16 to 48: EPO reduces its mean response time by 42.3%, while BPLRU reduces its mean response time by 66.1%. Figure 5.3 shows that EPO outperforms NoCache, LRU and BPLRU by 20.5%, 12% and 38.3% respectively. None of the four algorithms improves once the number of elements increases beyond 48. The reason is that the footprint of the Financial1, Financial2 and TPC-C traces becomes very small relative to the enlarged capacity of the SSD as elements are added, so adding more elements does not help in these experiments. This shows that the scalability of the EPO algorithm depends heavily on the type and nature of the workload.

Figure 5.3. Performance impact of number of elements on four schemes.

We also conducted experiments with synthetically generated workloads. We used three workloads with different distributions of request arrival time and logical block address. In synthetic benchmark1, arrival time is Poisson distributed [33] and logical block address is normally distributed [34]. In synthetic benchmark2, arrival time is Poisson distributed while logical block address is uniformly distributed [35]. In synthetic benchmark3, both arrival time and logical block address are uniformly distributed. The request size is fixed at one page (4KB, or eight sectors, in these experiments). Each trace contains one million requests, all of which are writes, so these synthetic benchmarks are 100% random write workloads. Figure 5.4 shows the effect of varying the cache size on each benchmark.
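A generator for workloads of this kind can be sketched in Python. Everything beyond the distributions named above is an assumption made for illustration: the function name, the arrival rate, the mean and standard deviation of the normal address distribution, and the reading of "Poisson distributed arrival time" as a Poisson process with exponential inter-arrival times.

```python
import random

SECTORS_PER_PAGE = 8  # one 4KB page per request


def synthetic_trace(n, total_pages, arrival="poisson", address="normal",
                    rate=1000.0, seed=0):
    """Return n write requests as (time, lba, sectors) tuples.

    rate is the assumed mean number of requests per second.
    """
    rng = random.Random(seed)
    t = 0.0
    trace = []
    for _ in range(n):
        if arrival == "poisson":
            t += rng.expovariate(rate)        # exponential inter-arrival gap
        else:
            t += rng.uniform(0.0, 2.0 / rate)  # uniform gap, same mean
        if address == "normal":
            # Assumed centre and spread; clipped to the device range.
            page = int(rng.gauss(total_pages / 2, total_pages / 8))
            page = min(max(page, 0), total_pages - 1)
        else:
            page = rng.randrange(total_pages)
        trace.append((t, page * SECTORS_PER_PAGE, SECTORS_PER_PAGE))
    return trace


# The three benchmark shapes, scaled down to 1000 requests each.
bench1 = synthetic_trace(1000, 1 << 20, "poisson", "normal")
bench2 = synthetic_trace(1000, 1 << 20, "poisson", "uniform")
bench3 = synthetic_trace(1000, 1 << 20, "uniform", "uniform")
```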

Figure 5.4. Performance impact of write buffer size on four schemes for synthetic benchmarks.

As Figure 5.4 shows, EPO outperforms all three other schemes, NoCache, LRU and BPLRU. The simulation is run on the three synthetically generated workloads with cache sizes of 4MB, 8MB, 16MB and 32MB. The purpose of these experiments is to compare the performance of EPO with the other three schemes in a different workload environment. The synthetically generated workloads also resemble the real-world traces. The LRU scheme does not perform better than NoCache: because the workload is totally random, the cache fills very quickly, before the algorithm can observe any spatial or temporal locality. The same is true for EPO, but its strategic method of selecting victims makes EPO stand out and results in better performance. BPLRU continues to perform poorly, as it did for the real-world traces: the random workload greatly amplifies the number of writes, and each eviction also requires reading the old block. Another configuration parameter varied was the SSD page size. In enterprise systems, the page size of an SSD can be 1KB, 2KB or 4KB depending on the requirement. The experimental results for this configuration are shown in Figure 5.5.

Figure 5.5. The mean response time and throughput of three different synthetic benchmarks for varying page size in SSD flash memory.

The results show that EPO achieves a shorter response time and better throughput. As the graphs show, the performance of EPO improves as the page size increases from 1KB to 4KB. Enterprise systems receive very heavy workloads, which is why 4KB should be the ideal page size for enterprise applications. Finally, experiments are conducted with a varying number of elements in the SSD. In this case, the page size is set to 4KB and the cache size to 8MB by default. EPO again performs better than the other three schemes (see Figure 5.6). Across all configurations, EPO shows a shorter response time and better throughput, and thus provides a better solution for random write workloads.

Figure 5.6. The mean response time and throughput of three different synthetic benchmarks for varying number of elements in an SSD.

