EL TRIBUNAL INTERNACIONAL SOBRE LA INFANCIA AFECTADA POR LA GUERRA Y LA POBREZA

The philosophy of MapReduce is to provide a flexible framework that can be used to solve different problems. Therefore, MapReduce does not provide a

CHAPTER 2. LITERATURE REVIEW

MapReduce join implementations

θ-join

Equi-join

Repartition

join Semi-join Map-only join Broadcast join Partition join Multi-way join Multiple MapReduce jobs Replicated join

Figure 2.1: Join Implementations on MapReduce

query language, expecting the users to implement their customized map and reduce functions. While this provides considerate flexibility, it adds to the complexity of application development. To make MapReduce easier to use, a number of high-level languages have been developed, some of which are SQL-like (HiveQL [101], Tenzing [31]), others are data flow languages (Pig Latin [83]), and some are declarative machine learning language (SystemML [50]). Among these languages, HiveQL is the most popular one as SQL-like language has been used for years in data management system. In this section, we review how the SQL operators are implemented using the MapReduce interface. Simple operators such as select and project can be easily supported in the map function, while complex ones, such as theta-join [82], equi-join [26] and multi-way join[108, 62] require significant effort.

The projection and filtering can be easily implemented by adding a few conditions in the map function to filter the unnecessary columns and tuples. The implementation of aggregation was discussed in the the original MapRe- duce paper. The mapper extracts an aggregation key for each incoming tuple (transformed into key/value pair). The tuples with the same aggregation key are shuffled to the same reducers, and the aggregation function (e.g., sum, min) is applied to these tuples. Join operator implementations have attracted by far the most attention, as it is one of the most expensive operators and a better implementation may potentially lead to a significant performance improvement. Therefore, in this section, we will focus our discussion on the join operator. We

1 1 1 2 2 2 3 3 3 4 4 4 R S

Figure 2.2: Matrix-to-reducer mapping for cross-product

summarize the existing join algorithms in Figure2.1.

2.2.1 Theta-Join

Theta-join [113] is a join operator where the join condition θ belongs to {<, ≤ , =, ≥, >, 6=}. It is a very expensive database operator, since |R onθ S| is close to |R|×|S| and few optimization techniques are available. To efficiently implement theta-join on MapReduce, the |R| × |S| tuples should be evenly distributed on the r reducers, which means that each reducer generates about the same number of results: |R|×|S|_r . To achieve this goal, a randomized algorithm, 1-Bucket- Theta algorithm, was proposed [82] that evenly partitions the join matrix into buckets (Figure 2.2), and assigns each bucket to only one reducer to eliminate the duplicate computation, while also ensuring that all the reducers are assigned the same number of buckets to balance the load.

2.2.2 Equi-Join

Equi-join is a special case of θ-join where θ equals to “=”. The strategies for MapReduce implementations of the equi-join operator follows earlier parallel database implementations [90]. Given tables R and S, the equi-join operator creates a new result table by combining the columns of R and S based on the equality comparisons over one or more column values. There are three variations of equi-join implementations (Figure2.1): repartition join, semijoin- based join, and map-only join (joins that only require map side processing).

Repartition Join [26] is the default join algorithm for MapReduce in Hadoop. The two tables are partitioned in the map phase, followed by shuffling the tuples with the same key to the same reducer that joins the tuples.

CHAPTER 2. LITERATURE REVIEW

Semijoin-based join has been well studied in parallel database systems (e.g., [24]), and it is natural to implement it on MapReduce [26]. The semijoin operator implementation consists of three MapReduce jobs. The first is a full MapReduce job that extracts the unique join keys from one of the relations, say R, where the map task extracts the join key of each tuple and shuffles the identical keys to the same reducer, and the reduce task eliminates the duplicate keys and stores the results in DFS as a set of files (u0, u1, ..., uk). The second job is a map-only job that produces the semijoin results S0 _{= S n R). In this job,} since the files that store the unique keys of R are small, they are broadcast to each mapper and locally joined with the part of S (called data chunk ) assigned to that mapper. The third job is also a map-only job where S0 is broadcast to all the mappers and locally joined with R.

Map-only join can be used if the tables are already co-partitioned based on the join key. In this case, for a specific join key, all tuples of R and S are co-located in the same node. The scheduler loads the co-partitioned data chunks of R and S in the same mapper to perform a local join, and the join can be processed entirely on the map side without shuffling the data to the reducers. This co-partitioning strategy has been adopted in many systems such as Hadoop++ [46].

2.2.3 Multi-Way Join

The multi-way join can be executed as a sequence of equi-joins, each of which is performed by one MapReduce job. The result of each MapReduce job is treated as input for the next MapReduce job. As different join orders lead to different query plans with significantly different performance, it is important to find the best join order for a multi-way join. The first step is to collect the statistics of the data (e.g., in [60], the problem of efficiently building histogram on MapReduce was investigated), and the second step is to estimate the processing cost of each possible plan using a cost model. Using the estimated cost for each binary join in the join tree, we can step-by-step calculate the cost of the multi-way join.

Many plan generation and selection algorithms that were developed for re- lational DBMSs can be directly applied here to find the optimal plan. These optimization algorithms can be further improved in a MapReduce system [108];

in particular, more elaborate algorithms may be deployed. As MapReduce jobs usually run for a long time, justifying more elaborate algorithms (i.e., longer query optimization time) if they can reduce query execution time. In addi- tion, instead of considering only the left-deep plans, the bushy plans are often considered for their efficiency.

In document SENTENCIA INTERNACIONAL (página 52-57)