LA MISA: UN LUGAR DE LA FIESTA Y EL ENCUENTRO

RADIOS LIBRES LAS VOCES DE LA GENTE

4. DESARROLLOS CONCEPTUALES

4.1 LA MISA: UN LUGAR DE LA FIESTA Y EL ENCUENTRO

This chapter has conducted a comparative analysis of the runtime characterization of a rep- resentative set of RDF stores, namely, Jena, Sesame, RDF-3X and Virtuoso. We have described the dynamics and behaviors of the query execution on the basis of experimental data and queries derived from the BSBM benchmark.

The main findings of this work are the following: (1) Investing in query optimization pays off in general, but, in SPARQL, it is easy to arrive at a situation in which the runtime performance is dominated by optimization. (2) Planning failures are potentially catas- trophic. In our experiments, although RDF-3X was the fastest system in most queries, failure in a single query resulted in it having the worst overall performance. (3) None of the

3.6 Conclusions 55 0 1 0 0 2 0 0 1 0 4 1 0 5 1 0 6 1 0 7 N u m b e r o f lo o k u p s N u m b e r o f t r i p l e s ( M i l l i o n ) J e n a S P S e s a m e S P R D F - 3 X S P J e n a E P S e s a m e E P R D F - 3 X E P

Fig. 3.10 The number of triple lookups of Query 5 by varying the number of triples. For each dataset, Jena and Sesame performs nearly the same, are much smaller than RDF-3X.

0 1 0 0 2 0 0 1 0 3 1 0 4 1 0 5 1 0 6 1 0 7 N u m b e r o f re a d i n p a g e s N u m b e r o f t r i p l e s ( M i l l i o n ) J e n a S P S e s a m e S P R D F - 3 X S P J e n a E P S e s a m e E P R D F - 3 X E P

Fig. 3.11 The number of read in pages of Query 5 by varying the number of triples.

RDF stores examined can exploit modern parallel architectures for single queries. This is expected to have a very negative effect on analytical workloads. (4) Using very fast stor- age, in most cases, did not have the expected impact on performance. This indicates that either the datasets used were completely served by data in memory and caching techniques performed adequately, or that query processing in RDF stores is actually CPU-bound.

All these investigations and findings demonstrate detailed aspects of triple stores and also provide deeper understanding for their behaviors during query execution. This helps us to confirm the main modules and the data flows of our proposed analytical framework as presented in the Figure 1.8 of Chapter 1 in a parallel case. Additionally, the results presented in this chapter show that standalone stores could encounter serious performance bottlenecks

in the presence of big RDF data. Therefore, in the following Chapter 4, we will investigate how to efficiently process large datasets using distributed systems.

Chapter 4 Design and Evaluation of Parallel

Hashing over Large-scale Data

4.1 Introduction

The previous chapter has provided a detailed analysis of the runtime characteristics of triple stores. The collected results have also demonstrated that RDF stores meet performance bottlenecks in the face of huge RDF data as the limitation of sequential implementations. We aim to apply parallel techniques to large-scale RDF management systems, namely, we have to find efficient parallel strategies to process big data at first. For example, the detailed parallelism patterns or thread cooperation strategies etc. over a distributed system need to be considered. As hash tables are commonly used in high-performance analytical data processing systems, which often run on servers with large amounts of memory, and they have been employed in the implementations of encoding, joins and indexing of our proposed framework, the focus of this chapter is on investigating efficient parallel hashing algorithms for processing massive data.

In fact, hash tables are the dominant structure for applications that require efficient map- pings, such as database indexing, object caching and string interning. The O(1) expected time for most critical operations puts them at a significant advantage to competing methods, especially for large data problems. Regardless, similar to other big data problems, as applications grow in scale, parallel hashing on multiple CPUs and/or machines is becoming increasingly important. Currently, there are two dominant parallel hashing frameworks that are widely used and studied: distributed and thread-level parallel hashing.

For the first framework, as shown in Figure 4.1, the threads at each computation node (either logical or physical) build their own hash tables first, and then process the initial

...

data computation node thread hash table

Fig. 4.1 Distributed-level parallelism.

...

Fig. 4.2 Thread-level parallelism

...

Item Distributed in Groups Globally

...

4.1 Introduction 59

partitioned data (refer as keys for simplification throughout this chapter) through accessing a local or remote hash table(s). In general, this access is determined by hash values of the processed keys. This approach is very popular in distributed systems. Considering the target for high performance computing, in the following we only discuss the conditions of full parallelism, rather than the hash tables used in peer-to-peer systems, for example, the commonly studied Distributed Hash Tables (DHTs) [107].

In thread-level hashing, (Figure 4.2), a single hash table is constructed on the single underlying platform, and multiple available threads operate with coordination on that table in parallel. This particular model is widely studied for multithreaded platforms which range in scale from commodity servers to supercomputers. As there exists no costly network communication (though possible NUMA) under this scheme, it always performs very fast.

The two parallel schemes scale in terms of processing large numbers of items by em- ploying new nodes or threads. However, both approaches meet performance issues when processing massive data. With distributed hashing, the large number of frequent and irreg- ular remote accesses of hash operations across computational nodes is costly in terms of communication. Moreover, when the processed data has significant skew, the performance of such parallel implementations will dramatically decrease because all the popular keys will flood into a small number of nodes and cause hot spots. For parallel hashing on multithreaded architecture platforms, the cooperation between threads can efficiently balance the workloads, regardless, both for the skewed or non-skewed data, the associated scalability is bound by the limit on the number of threads available, the availability of specialized hardware predicates and possible memory contention. Furthermore, memory and I/O eventually also become bottlenecks at very large scale.

In general terms, the memory hierarchy of modern clusters consists of a distributed memory level (across nodes) and a shared memory level (multiple hardware threads/cores accessing the memory of a single node). We are proposing a structured parallel hashing (SPH) framework (shown in Figure 4.3) that blends distributed hashing and shared-memory hashing, divided into two phases: (1) items are grouped and distributed globally by each thread, and (2) hash tables are constructed on each node and each of them is only accessed by a local thread(s).

The primary idea is a straightforward bulk-operation scheme, however, in so far as we are aware the approach has not been previously described in the literature. Intuitively, this method has two advantages: (a) reduced remote memory access, load imbalancing and the associated time-cost arising from memory allocation, table locks and communication in distributed hashing, and (b) support for high scalability compared to thread-level hashing

(there are no hardware limitations as our approach operates using predicates available on all platforms).

In fact, such bulk operations are widely applicable. For example, we have implemented joins for parallel data processing using a similar approach in the following chapters (also see [29, 31]). Namely, tuples of an input relation are redistributed to all the computation nodes. From that basis, local hash tables are created for lookup conducted by the other relation. In such application scenarios, the following three questions arising from the proposed framework are becoming to be interesting:

• performance: will the responsible implementations be scalable and can they achieve comparable performance or even outperform the other two approaches?

• parallelism: how will the performance change with varying the number of threads over each hash table, if the whole available threads are fixed for a given system? • impact factors: how will the high-level data distribution as well as the underlying

hash table designs impact on the performance?

The answers will give us an insight of the underlying hash implementation as well as an option to further improve the performance of applications using hash tables over distributed memory.

This chapter makes three main contributions. First, we propose a simple high-level parallel hashing framework, structured parallel hashing, targeting efficient processing of massive data on distributed memory. Second, we conduct a theoretical analysis of the scheme and present an efficient parallel hashing algorithm based on it. Finally, we evaluate on an experimental configuration consisting of up to 192 cores (16 nodes) and large datasets of up to 16 billion items (long integers). The experimental results demonstrate that the proposed approach is efficient and scalable. It is orders of magnitude faster than conventional distributed hashing methods, and also achieves comparable performance with a shared memory supercomputer-based approach, on a socket-for-socket basis.

The rest of this chapter is organized as follows: In Section 4.2, we conduct a theoretical analysis of different hashing frameworks. We present an efficient parallel hashing algorithm in Section 4.3. In Section 4.4, we experimentally evaluate our work, followed by a discussion in Section 4.5 and the conclusions in Section 4.6.

In document La Misa # 3 (folclórica) del maestro Sixto Arango Gallo : un encuentro festivo con la comunidad del Carmen de Viboral (página 53-61)