The structure of the rowkey is of utmost importance. Effective rowkey design isn’t just about what goes into the rowkey, but also about where elements are positioned in the rowkey. You’ve already seen two cases of how the structure impacted read performance in the examples you’ve been working on.
First was the relationship table design, where you put the relationship type between the two user IDs. It didn’t work well because the reads became inefficient. You had to read (at least from the disk) all the information for both types of relationships for any given user even though you only needed information for one kind of relationship. Moving the relationship type information to the front of the key solved that problem and allowed you to read only the data you needed.
Second was the twit stream table, where you put the reverse timestamp as the second part of the key and the user ID as the first. That allowed you to scan based on user IDs and limit the number of rows to fetch. Reversing that order meant you could no longer seek by user ID: you had to scan a time range for the twits, and that range contained twits from every user who had posted something in it.
For the sake of creating a simple example, consider the reverse timestamps to be in the range 1..10. There are three users in the system: TheRealMT, TheFakeMT, and Olivia. If the rowkey contains the user ID as the first part, the rowkeys look like the following (in the order that HBase tables store them):
Olivia1
Olivia2
Olivia5
Olivia7
Olivia9
TheFakeMT2
TheFakeMT3
TheFakeMT4
TheFakeMT5
TheFakeMT6
TheRealMT1
TheRealMT2
TheRealMT5
TheRealMT8
But if you flip the order of the key and put the reverse timestamp as the first part, the rowkey ordering changes:
1Olivia
1TheRealMT
2Olivia
2TheFakeMT
2TheRealMT
3TheFakeMT
4TheFakeMT
5Olivia
5TheFakeMT
5TheRealMT
6TheFakeMT
7Olivia
8TheRealMT
9Olivia
Getting the last n twits for any user now involves scanning the entire time range because you can no longer specify the user ID as the start key for the scanner.
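To make the difference concrete, here is a minimal sketch that simulates both key layouts in plain Python. It doesn't use the HBase client: a sorted list stands in for the table, and the function names (last_n_user_first, last_n_time_first) are hypothetical helpers invented for this illustration. Each function returns the matching rowkeys along with the number of rows it had to read.

```python
from bisect import bisect_left

# Sample twits as (user ID, reverse timestamp), matching the rowkeys listed above.
TWITS = [
    ("Olivia", 1), ("Olivia", 2), ("Olivia", 5), ("Olivia", 7), ("Olivia", 9),
    ("TheFakeMT", 2), ("TheFakeMT", 3), ("TheFakeMT", 4), ("TheFakeMT", 5),
    ("TheFakeMT", 6),
    ("TheRealMT", 1), ("TheRealMT", 2), ("TheRealMT", 5), ("TheRealMT", 8),
]

# HBase stores rows sorted by rowkey; a sorted list stands in for the table.
user_first = sorted(f"{user}{ts}" for user, ts in TWITS)
time_first = sorted(f"{ts}{user}" for user, ts in TWITS)


def last_n_user_first(user, n):
    """Seek to the user's first row and read forward until n rows are found."""
    start = bisect_left(user_first, user)  # the user ID serves as the start key
    rows, scanned = [], 0
    for key in user_first[start:]:
        if len(rows) == n or not key.startswith(user):
            break
        scanned += 1
        rows.append(key)
    return rows, scanned


def last_n_time_first(user, n):
    """No usable start key: read from the beginning and filter by user."""
    rows, scanned = [], 0
    for key in time_first:
        if len(rows) == n:
            break
        scanned += 1
        if key.endswith(user):
            rows.append(key)
    return rows, scanned
```

For TheRealMT with n = 2, the user-first layout reads exactly 2 rows, whereas the timestamp-first layout reads 5 rows to find the same 2 twits, and the gap widens as more users write into the same time range.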
Now look back at the time-series data example, where you added a salt as a prefix to the timestamp to form the rowkey. That was done to distribute the load across multiple regions at write time. You had only a few ranges to scan at read time when you were looking for data from a particular time range. This is a classic example of using the placement of the information to achieve distribution across the regions.
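The salting idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact salt function used earlier in the chapter: the salt here is simply the timestamp modulo an assumed bucket count (NUM_BUCKETS), timestamps are zero-padded to an assumed fixed width (WIDTH) so they sort correctly as strings, and salted_rowkey and scan_ranges are hypothetical helper names.

```python
NUM_BUCKETS = 4  # assumption: number of salt buckets, e.g. one per region
WIDTH = 10       # assumption: fixed width so timestamps sort correctly as strings


def salted_rowkey(timestamp):
    """Prefix the timestamp with a salt so consecutive writes spread across key ranges."""
    salt = timestamp % NUM_BUCKETS
    return f"{salt}|{timestamp:0{WIDTH}d}"


def scan_ranges(start_ts, stop_ts):
    """Return one (start_key, stop_key) pair per bucket covering [start_ts, stop_ts)."""
    return [
        (f"{salt}|{start_ts:0{WIDTH}d}", f"{salt}|{stop_ts:0{WIDTH}d}")
        for salt in range(NUM_BUCKETS)
    ]
```

Consecutive timestamps get different prefixes and so land in different parts of the key space at write time. Reading a time range costs one scan per bucket, which is the "few ranges" trade-off described above.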
TIP Placement of information in your rowkey is as important as the information you choose to put into it.
We have explored several concepts about HBase table design in this chapter thus far. You may be at a place where you understand everything and are ready to go build your application. Or you may be trying to look at what you just learned through the lens of what you already know in the form of relational database table modeling. The next section is to help you with that.
4.6 From relational to non-relational
You’ve likely used relational database systems while building applications and been involved in the schema design. If that’s not the case and you don’t have a relational database background, skip this section. Before we go further into this conversation, we need to emphasize the following point: There is no simple way to map your relational database knowledge to HBase. It’s a different paradigm of thinking.
If you find yourself in a position to migrate from a relational database schema to HBase, our first recommendation is don’t do it (unless you absolutely have to). As we have said on several occasions, relational databases and HBase are different systems and have different design properties that affect application design. A naïve migration from relational to HBase is tricky. At best, you’ll create a complex set of HBase tables to represent what was a much simpler relational schema. At worst, you’ll miss important but subtle differences steeped in the relational system’s ACID guarantees. Once an application has been built to take advantage of the guarantees provided by a relational database, you’re better off starting from scratch and rethinking your tables and how they can serve the same functionality to the application.
Mapping from relational to non-relational isn’t a topic that has received much attention so far. There is a notable master’s thesis⁵ that explores this subject. But we can draw some analogies and try to make the learning process a little easier. In this section, we’ll map relational database modeling concepts to what you’ve learned so far about modeling HBase tables. Things don’t necessarily map 1:1, and these concepts are evolving and being defined as the adoption of NoSQL systems increases.
5 Ian Thomas Varley, “No Relation: The Mixed Blessing of Non-Relational Databases,” Master’s thesis, http://mng.bz/7avI.