1. FRAMES OF REFERENCE
1.3. THEORETICAL FRAMEWORK
1.3.5. Teacher Training
Every time query execution has to reach outside its own layer, it pays a cost in time. When a storage engine is accessed, reading the data takes time. When the storage engine has to go to the file system (for example, on a file system cache miss), that interaction takes time. When the file system has to use slower hardware (for example, a disk seek instead of a read from memory), the query takes longer to complete. Each of these costs can be reduced, but for the most part the greatest gain in query execution performance comes from indexes.
Indexes
Indexing improves data-access performance by allowing quick access to the rows in a table. Indexes can also speed up UPDATE and DELETE operations by providing a faster means to locate the affected rows. An index is created on one or more columns of a table. Besides being smaller than the original table (because it has fewer columns), an index is optimized for quick searching, usually as a balanced tree. When no index exists, or when the SQL statement submitted to the database does not match any index, a full table scan is executed. A full table scan reads all rows in a table to find a specific row or set of rows, which can be extremely inefficient.
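The difference between a full table scan and an index lookup can be observed with EXPLAIN. The following is a minimal sketch, not taken from the guide: the customer table and the idx_customer_last_name index are hypothetical names, and the access type that EXPLAIN reports depends on the data loaded and on the server version.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE customer (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  last_name VARCHAR(50)  NOT NULL,
  city      VARCHAR(50)  NOT NULL,
  PRIMARY KEY (id)
);

-- No index on last_name yet: with representative data loaded,
-- EXPLAIN typically reports type = ALL (a full table scan).
EXPLAIN SELECT * FROM customer WHERE last_name = 'Tran';

-- Add an index on the column used in the WHERE clause.
CREATE INDEX idx_customer_last_name ON customer (last_name);

-- The same query can now locate matching rows through the index;
-- EXPLAIN typically reports type = ref and key = idx_customer_last_name.
EXPLAIN SELECT * FROM customer WHERE last_name = 'Tran';
```

Comparing the rows column of the two EXPLAIN results gives a rough idea of how many rows the server expects to examine in each case.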
[Figure: Full Table Scan vs. Utilizing Indexes. Axes: Number of Rows and Milliseconds. Series: Full Table Scan, Utilizing Indexes.]
Thi Nguyet Tran (moonฺtran@spts21ฺcomฺvn) has a non-transferable
license to use this Student Guideฺ
Index Issues
The following list describes some index issues to be aware of:
• Speed versus maintenance - Indexes speed up data retrieval but are expensive to maintain. Each added index increases the time needed to write data, because the index must be kept consistent with the table.
○ UNIQUE indexes - A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if a new row of data is added with a key that matches an existing row. If the values in a column are unique, declare the index with the UNIQUE clause; this forces the column to be distinct and also improves the efficiency of searching on that column.
• Index selectivity - The more selective an index is, the more benefit is obtained from using it. The more unique values an indexed field contains, the better the chance that the index finds the value being searched for quickly. The selectivity of a field can be estimated with the following SQL statement:
mysql> SELECT 1.0/NULLIF(COUNT(DISTINCT index_field), 0) FROM table_name;
The NULLIF is placed into the SQL to ensure that there is no chance of a divide-by-zero error. The result is the reciprocal of the number of distinct values in the field. A result of 1.00 means that every row holds the same value and is the worst possible result; a result of 0.50 means that the field holds only two distinct values, so the effectiveness of the index is still extremely poor. The closer the result is to 0.00, the more effective the index.
For large datasets, this SELECT statement can take a long time to run. Using the INFORMATION_SCHEMA database, the following SQL lists, for each non-unique index, the percentage of rows whose indexed value is not unique:
mysql> SELECT TABLES.TABLE_SCHEMA, TABLES.TABLE_NAME,
    -> STATISTICS.INDEX_NAME,
    -> CONCAT((1 - CARDINALITY/TABLE_ROWS)*100, '%') AS `Rows per value`
    -> FROM information_schema.STATISTICS
    -> JOIN information_schema.TABLES
    -> USING (TABLE_SCHEMA, TABLE_NAME)
    -> WHERE non_unique=1 AND TABLE_ROWS/CARDINALITY IS NOT NULL;
○ Cardinality - Highly duplicated data should not be indexed (for example, boolean data types and columns that represent gender, state abbreviations, or country codes). However, a heavily skewed data distribution can make an index useful when looking for some values and not useful when looking for others.
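The UNIQUE index and selectivity points above can be tried on a small table. The following is a minimal sketch under stated assumptions: the member table, its columns, its index name, and its rows are all hypothetical and exist only to make the arithmetic visible.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE member (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email  VARCHAR(100) NOT NULL,
  gender CHAR(1)      NOT NULL,
  UNIQUE INDEX idx_member_email (email)
);

INSERT INTO member (email, gender) VALUES
  ('a@example.com', 'F'),
  ('b@example.com', 'M'),
  ('c@example.com', 'F'),
  ('d@example.com', 'M');

-- Rejected with a duplicate-key error (error 1062), because the
-- UNIQUE index requires every email value to be distinct.
INSERT INTO member (email, gender) VALUES ('a@example.com', 'M');

-- email has four distinct values: 1.0 / 4 = 0.25
SELECT 1.0/NULLIF(COUNT(DISTINCT email), 0) AS email_result FROM member;

-- gender has two distinct values: 1.0 / 2 = 0.50
SELECT 1.0/NULLIF(COUNT(DISTINCT gender), 0) AS gender_result FROM member;
```

On a realistically sized table the email result approaches 0.00 as rows are added, while the gender result stays at 0.50 no matter how large the table grows, which is why a column of that kind is a poor candidate for an index.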
Altering the execution of a query
When the user understands the data better than the optimizer does, MySQL provides two ways to alter how a query is executed:
• USE INDEX(index_list) - This clause tells the MySQL server to evaluate only the indexes in the index_list when determining which index (if any) is best for the query. If the MySQL server estimates that a full table scan would give the best results, it remains free to choose one.
• FORCE INDEX(index_list) - This clause works like USE INDEX but also treats a full table scan as very expensive, so MySQL chooses one of the indexes in the index_list whenever one of them can be used to find the rows, even if the server estimates that a table scan would give better results. This clause is appropriate when an indexed column has low overall cardinality or a skewed distribution and the query searches for one of its rarer values.
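Both clauses are written directly after the table name in the FROM clause. The following is a minimal sketch of the syntax; the orders table and the two index names are hypothetical and are not part of the guide's sample data.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE orders (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  status      VARCHAR(10)  NOT NULL,
  INDEX idx_orders_customer (customer_id),
  INDEX idx_orders_status (status)
);

-- USE INDEX: the optimizer considers only idx_orders_status
-- (idx_orders_customer is ignored); a full table scan is still possible.
SELECT * FROM orders USE INDEX (idx_orders_status)
WHERE status = 'FAILED' AND customer_id = 42;

-- FORCE INDEX: MySQL uses idx_orders_status if the index can be
-- used to find the matching rows.
SELECT * FROM orders FORCE INDEX (idx_orders_status)
WHERE status = 'FAILED';
```

Prefixing either SELECT with EXPLAIN shows which index, if any, the server actually chose, which is a reasonable check before leaving a hint in application code.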
• Short keys - Shorter keys have benefits over longer keys because the server does not have to scan a long value for each comparison. However, a key that is too short reduces index selectivity.
• Integer data types - Keys that are based on the integer data type make the best indexes, not only for index operations but also for joins and other types of database operations.
• Dead indexes - Avoid indexes that are never used by any query. They cause unnecessary overhead, and removing them improves overall efficiency, especially during updates, deletes, and inserts.
• Duplicate indexes - Avoid more than one index on the same column(s). The optimizer must determine which one to use, and each extra index adds maintenance as the data changes.
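Some of the points in this list can be checked with SQL. The following is a sketch under stated assumptions: the article table and its index are hypothetical, and the last two statements assume a server that ships the sys schema (MySQL 5.7 or later) with the Performance Schema enabled, which may be newer than the version this guide describes.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE article (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- short integer key
  title VARCHAR(255) NOT NULL,
  INDEX idx_article_title (title(20))  -- indexes only the first 20 characters
);

-- Compare how well a 20-character prefix distinguishes rows against the
-- full column. If prefix_ratio is much lower than full_ratio, the key is
-- too short to stay selective.
SELECT COUNT(DISTINCT LEFT(title, 20)) / COUNT(*) AS prefix_ratio,
       COUNT(DISTINCT title) / COUNT(*)           AS full_ratio
FROM article;

-- Assumption: the sys schema is installed. Lists indexes that have not
-- been used since the server started (candidates for dead indexes).
SELECT * FROM sys.schema_unused_indexes;

-- Assumption: the sys schema is installed. Lists indexes that are
-- duplicated or made redundant by another index on the same table.
SELECT * FROM sys.schema_redundant_indexes;
```

The unused-index view only reflects activity since the last server restart, so an index that appears there should be checked against infrequent workloads (for example, monthly reports) before it is dropped.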
Composite Indexes
MySQL uses multiple-column indexes in such a way that queries are fast when a known value is specified for the first column of the index in a WHERE clause, even if values are not specified for the other columns. The following list of characteristics describes composite indexes:
• Indexes can be created on several columns.
• They can be used for searches on just the first column(s).
• An index on (a,b,c) can be used for searches on [(a), (a,b), (a,b,c)].
In summary, a composite index spans multiple columns, where each column is sorted within the values of the preceding column. For example, the following shows what a composite index on the Continent, Region, and Name columns of the Country table would look like when it is stored:
Continent   Region                     Name
Antarctica  Antarctica                 Antarctica
Antarctica  Antarctica                 Bouvet Island
...
Africa      Central Africa             Equatorial Guinea
Africa      Central Africa             Gabon
Africa      Central Africa             Sao Tome and Principe
Africa      Eastern Africa             British Indian Ocean Territory
Africa      Eastern Africa             Burundi
Africa      Eastern Africa             Comoros
...
Africa      Western Africa             Togo
Asia        Eastern Asia               China
Asia        Eastern Asia               Hong Kong
...
Asia        Southern and Central Asia  Uzbekistan
Europe      Baltic Countries           Estonia
Europe      Baltic Countries           Latvia
Europe      Baltic Countries           Lithuania
Europe      British Islands            Ireland
Europe      British Islands            United Kingdom
Europe      Eastern Europe             Belarus
...
In this scenario, the Continent column is sorted first, the Region column is sorted within each Continent value, and the Name column is sorted within each Region value. This ordering allows the MySQL server to locate records quickly when a query specifies the first column, optionally together with the following columns in index order. This type of index would not be useful if the third column were searched without the first and second columns.
Leftmost index prefixes
In a table that has a composite (multiple-column) index, MySQL can use leftmost index prefixes of that index. A leftmost prefix of a composite index consists of one or more of the initial columns of the index. MySQL's capability to use leftmost index prefixes enables you to avoid creating unnecessary indexes. In the (a,b,c) example above, (a) and (a,b) are leftmost prefixes of the index, so separate indexes on those columns are not needed.
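The leftmost-prefix rule can be checked with EXPLAIN against the Country table used in the composite index example. The following is a minimal sketch: the index name idx_country_crn is hypothetical, and whether the optimizer actually picks the index for a given query depends on the data and the server version.

```sql
-- Assumes the Country table from the composite index example;
-- idx_country_crn is a hypothetical index name.
CREATE INDEX idx_country_crn ON Country (Continent, Region, Name);

-- Leftmost prefix (Continent): the index can be used.
EXPLAIN SELECT * FROM Country
WHERE Continent = 'Europe';

-- Leftmost prefix (Continent, Region): the index can be used.
EXPLAIN SELECT * FROM Country
WHERE Continent = 'Europe' AND Region = 'Baltic Countries';

-- All three columns: the index can be used.
EXPLAIN SELECT * FROM Country
WHERE Continent = 'Europe' AND Region = 'Baltic Countries' AND Name = 'Latvia';

-- Name alone is not a leftmost prefix: the index cannot be used to
-- look up the row, so expect a scan instead.
EXPLAIN SELECT * FROM Country
WHERE Name = 'Latvia';
```

Because the first three queries are all served by the one composite index, separate indexes on (Continent) or on (Continent, Region) would only add maintenance cost.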