Desarrollo de un Prototipo del Modelo de Integración de las
4.1 Instrumentación del Componente Ontológico
It is fairly common to find recommendations online that state the index key columns should be ordered based on their cardinality (or selectivity), the idea being to reduce the number of pages that have to be read to match a set of filtering or grouping columns. However, selecting the appropriate key column order is never as straightforward as most online content makes it seem, and you won't generally find a lot of guidance about when and why you might choose to create an index that has the columns in an order that is different from the cardinality order. For example, is it better to create multiple indexes, where each individual index has the optimal key column order, based on column selec- tivity, or to create a single index that covers multiple queries, but has a less selective column order?
Ultimately, your decision should be based on the type of database for which the index is being implemented. For a data warehouse, where there are significantly more read operations than write operations, the multiple indexes option may be appropriate. For an OLTP system, where there are more writes than reads, a less selective index that covers multiple queries using a less than optimal column order may be most appropriate. The most important point to understand though, regarding index key column order, is that a query cannot seek on an index unless the query filters on a left-based subset of the index key. To demonstrate this point, let's say we have the index and queries shown in Listing 5.1.
CREATE INDEX idx_Test ON TestTable (Col2, Col1, Col3)
SELECT 1
FROM TestTable
WHERE Col1 = @Var1 AND Col2 = @Var2 AND Col3 = @Var3
SELECT 1
FROM TestTable
WHERE Col1 = @Var1 AND Col3 = @Var3
SELECT 1
FROM TestTable
WHERE Col2 = @Var2 AND Col3 = @Var3
Listing 5.1: Various queries against a simple three-column index.
The first query can seek effectively on the index because it filters on all three columns of the index (the order of clauses in the WHERE clause is irrelevant).
The second query cannot seek on that index; the leading column of the index is Col2 and that query does not filter on Col2. The query can use that index, but only via a scan. The third query can seek on the index, but the seek is not as efficient as it could be, as SQL can only seek for Col2; it would have to do a secondary filter for Col3, since Col3 is not the second column in the index.
If I wanted to create the minimum number of indexes that could allow SQL Server to resolve all three queries as efficiently as possible, i.e. with index seeks, then I could create an index for each query, or I could try several combinations of two indexes, with differing index key ordering. The selectivity of the various columns would help choose which pair were the most appropriate. Some of the options are shown in Listing 5.2.
--3 possible pairs of indexes
CREATE INDEX idx_Test1
ON TestTable (Col1, Col3, Col2)
CREATE INDEX idx_Test1
ON TestTable (Col2, Col3)
--OR—
CREATE INDEX idx_Test
ON TestTable (Col2, Col3, Col1)
CREATE INDEX idx_Test1
ON TestTable (Col1, Col3)
--OR--
CREATE INDEX idx_Test
ON TestTable (Col3, Col1, Col2)
CREATE INDEX idx_Test1
ON TestTable (Col2, Col3)
Listing 5.2: Three possible pairs of indexes.
Let's look at a quick AdventureWorks example. Listing 5.3 shows three queries, each with a different predicate in the WHERE clause.
SELECT BusinessEntityID , PersonType , FirstName , MiddleName , LastName , EmailPromotion
FROM Person.Person AS p
WHERE FirstName = 'Carol' AND PersonType = 'SC'
SELECT BusinessEntityID ,
FirstName ,
LastName
FROM Person.Person AS p
AND Title = 'Ms.'
SELECT BusinessEntityID ,
PersonType ,
EmailPromotion
FROM Person.Person AS p
WHERE Title = 'Mr.'
AND FirstName = 'Paul' AND LastName = 'Shakespear'
Listing 5.3: Three queries against AdventureWorks.
We could create three indexes, each one perfectly suited to a single query, and in a data warehouse environment that may indeed be the best option. In an OLTP environment where the number of indexes should be kept low, in order to maintain good INSERT performance, it may not be such a good idea.
In terms of selectivity, the LastName column is the most selective, followed closely by FirstName. The Title and PersonType columns have a much lower selectivity, each having only six distinct values in the table. In this case, we could create just the two indexes shown in Listing 5.4, and have all the queries in Listing 5.3 perform very well.
CREATE INDEX idx_Person_FirstNameLastNameTitleType
ON Person.Person (FirstName, LastName, Title, PersonType)
CREATE INDEX idx_Person_TypeTitle
ON Person.Person (PersonType, Title)
Listing 5.4: Two indexes, designed based on column selectivity.
The first index satisfies the first and third queries. It's not perfect for the first one, but given how selective the FirstName column is, it's likely to be good enough. I've chosen to have FirstName as the leading column despite having slightly worse selectivity than LastName, because if I put LastName as the leading column then the first query would be unable to seek on it and I would need a third index to completely satisfy all queries.
With the second index, the order of columns is arbitrary. Neither the queries nor the selectivity shows a preferred order, so either way works. In a real environment, the order would probably be decided by other indexes or queries.
It should be clear even from this relatively simple example that determining the optimal order of columns for an index can be a complex process and is not something that should be decided on without sufficient analysis and investigation.
More on index selectivity
SQL Server MVP, and Technical Reviewer for this book, Gail Shaw, discusses this topic in further detail on her two blog posts: "Index columns, selectivity and equality predicates" (http://sqlinthewild. co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/) and "Index columns, selectivity and inequality predicates" (http://sqlinthewild.co.za/index.php/2009/02/06/ index-columns-selectivity-and-inequality-predicates/).