SQL performance
- Read the source code of HashMap in C#
- Read the source code of LinkedList in C#
- Unlike the index, the table data is stored in a heap structure and is not sorted at all. (Note: the database "heap" just means unordered storage; it is not the min-/max-heap data structure, which is partially sorted.)
Do developers need to know this?
SQL separates the what from the how. However, developers do need to know the how, because the access path is what influences performance the most, and it is developers, not DBAs, who know it.
Structure (anatomy of an index)
The primary purpose of an index is to provide an ordered representation of the indexed data. It is, however, not possible to store the data sequentially because an insert statement would need to move the following entries to make room for the new one. Moving large amounts of data is very time-consuming, so the insert statement would be very slow. The solution to the problem is to establish a logical order that is independent of the physical order in memory.
Doubly linked list (leaf nodes):
- manages the logical order between leaf nodes; it is a redundant, ordered copy of the unordered table data.
- a leaf node is saved in a database block or page (the database's smallest storage unit). All blocks are of the same size, typically a few kilobytes.
- a leaf node contains several ordered index entries.
- each index entry consists of the indexed columns (the key) and refers to the corresponding table row (via ROWID or RID).
- the index order is maintained on two different levels: the index entries within each leaf node, and the leaf nodes among each other using a doubly linked list.
B+ Tree (branch nodes)
- a balanced search tree
The tree traversal is a very efficient operation—so efficient that I refer to it as the first power of indexing.
That is primarily because of the tree balance, which allows accessing all elements with the same number of steps, and secondly because of the logarithmic growth of the tree depth.
That means that the tree depth grows very slowly compared to the number of leaf nodes. Real world indexes with millions of records have a tree depth of four or five. A tree depth of six is hardly ever seen.
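To make the logarithmic growth concrete (illustrative numbers, not from the source): if every branch and leaf node holds about 300 entries, a tree of depth 4 can address roughly 300^4 ≈ 8 billion index entries, and doubling the data adds at most one more level.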
Slow Indexes
An index lookup requires three steps: (1) the tree traversal; (2) following the leaf node chain; (3) fetching the table data. The tree traversal is the only step that has an upper bound for the number of accessed blocks—the index depth. The other two steps might need to access many blocks—they cause a slow index lookup.
The Oracle database has three distinct operations that describe a basic index lookup:
- INDEX UNIQUE SCAN: performs the tree traversal only. (unique constraint)
- INDEX RANGE SCAN: performs the tree traversal and follows the leaf node chain to find all matching entries. (non-unique constraint)
- TABLE ACCESS BY INDEX ROWID: this operation is (often) performed for every matched record from a preceding index scan operation. It fetches only one table block per ROWID at a time.
- TABLE ACCESS FULL: this operation does not traverse the index tree at all; it scans the whole table and can read multiple blocks per IO. It can therefore sometimes be faster than an index lookup because it fetches blocks in batches.
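As a sketch (assuming an employees table with a unique index on employee_id and a non-unique index on subsidiary_id; all names are illustrative), these queries typically map to the operations above in an Oracle execution plan:

```sql
-- INDEX UNIQUE SCAN, then TABLE ACCESS BY INDEX ROWID for the one match
SELECT * FROM employees WHERE employee_id = 123;

-- INDEX RANGE SCAN along the leaf node chain, then one
-- TABLE ACCESS BY INDEX ROWID per matched entry
SELECT * FROM employees WHERE subsidiary_id = 30;

-- TABLE ACCESS FULL: no usable index, but multi-block reads
SELECT * FROM employees WHERE phone_number LIKE '%555%';
```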
over indexing
Every index causes ongoing maintenance. Function-based indexes are particularly troublesome because they make it very easy to create redundant indexes.
Always aim to index the original data as that is often the most useful information you can put into an index.
query optimizer
The query optimizer, or query planner, is the database component that transforms an SQL statement into an execution plan. This process is also called compiling or parsing. There are two distinct optimizer types.
Cost-based optimizers (CBO) generate many execution plan variations and calculate a cost value for each plan. The cost calculation is based on the operations in use and the estimated row numbers. In the end the cost value serves as the benchmark for picking the “best” execution plan.
Rule-based optimizers (RBO) generate the execution plan using a hard-coded rule set. Rule based optimizers are less flexible and are seldom used today.
How does the cost-based optimizer work?
A cost-based optimizer uses statistics about tables, columns, and indexes. Most statistics are collected on the column level: the number of distinct values, the smallest and largest values (data range), the number of NULL occurrences and the column histogram (data distribution). The most important statistical value for a table is its size (in rows and blocks). The most important index statistics are the tree depth, the number of leaf nodes, the number of distinct keys and the clustering factor (see Chapter 5, "Clustering Data").
The optimizer uses these values to estimate the selectivity of the where clause predicates. If there are no statistics available—for example because they were deleted—the optimizer uses default values. The default statistics of the Oracle database suggest a small index with medium selectivity.
The following plan shows that the INDEX RANGE SCAN will return 40 rows. It is a gross underestimate, as there are 1000 employees working for this subsidiary.

```sql
-- pk: (subsidiary_id, employee_id); plan sketch reconstructed for
-- illustration, note the Rows estimate of 40 from the default statistics
-- | Id | Operation                    | Name         | Rows |
-- |  0 | SELECT STATEMENT             |              |   40 |
-- |  1 |  TABLE ACCESS BY INDEX ROWID | EMPLOYEES    |   40 |
-- | *2 |   INDEX RANGE SCAN           | EMPLOYEES_PK |   40 |
```
The execution plan with statistics uses the column histograms to estimate the number of rows returned from the index lookup—the result is used for the cost calculation.
Column histograms are most useful if the values are not uniformly distributed. For columns with uniform distribution, it is often sufficient to divide the number of distinct values by the number of rows in the table. This method also works when using bind parameters.
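To inspect the statistics the optimizer works with in Oracle, you can query the standard dictionary views (a sketch; the table name is illustrative):

```sql
-- table size in rows and blocks
SELECT num_rows, blocks FROM user_tables WHERE table_name = 'EMPLOYEES';

-- column-level statistics: distinct values, NULL count, histogram type
SELECT column_name, num_distinct, num_nulls, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'EMPLOYEES';

-- index statistics: tree depth, leaf blocks, distinct keys, clustering factor
SELECT index_name, blevel, leaf_blocks, distinct_keys, clustering_factor
  FROM user_indexes
 WHERE table_name = 'EMPLOYEES';
```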
What if the estimates are wrong?
Contradicting estimates often indicate problems with the statistics.

```sql
-- the range scan is estimated at 40 rows, while the subsequent table
-- access is estimated at 100 rows: a contradiction, since the table
-- access cannot return more rows than the scan feeding it. A strong
-- hint that the statistics are stale.
```
The Oracle database maintains the information about the number of distinct column values as part of the table statistics. These figures are reused if a column is part of multiple indexes.
Statistics for a function-based index (FBI) are also kept on table level as virtual columns. Although the Oracle database collects the index statistics for new indexes automatically (since release 10g), it does not update the table statistics. For this reason, the Oracle documentation recommends updating the table statistics after creating a function-based index:
After creating a function-based index, collect statistics on both the index and its base table using the DBMS_STATS package. Such statistics will enable Oracle Database to correctly decide when to use the index.
My personal recommendation goes even further: after every index change, update the statistics for the base table and all its indexes. That might, however, also lead to unwanted side effects. Coordinate this activity with the database administrators (DBAs) and make a backup of the original statistics.
Statistics
How to gather the statistics?
- manually
  - in Oracle (see DBMS_STATS):
    EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15, cascade => TRUE);
  - in Hive/Spark (see the Spark cost-based optimizer and its cost-based join reordering):
    - statistics must be generated before query execution
    - some configs must be enabled: spark.sql.cbo.enabled, spark.sql.cbo.joinReorder.enabled, spark.sql.statistics.histogram.enabled, …
    - table-level statistics: ANALYZE TABLE xxx COMPUTE STATISTICS
    - column-level statistics: ANALYZE TABLE xxx COMPUTE STATISTICS FOR COLUMNS col1, col2
- automatically
  - By default, Oracle 10g automatically gathers optimizer statistics using a scheduled job called GATHER_STATS_JOB. By default this job runs within a maintenance window between 10 P.M. and 6 A.M. on week nights and all day on weekends. (see automatic statistics collection)
  - There seems to be no automatic statistics collection in Spark/Hive.
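A minimal Spark SQL sketch tying the above together (the table and column names are illustrative; the configs and ANALYZE syntax are those listed above):

```sql
-- enable the cost-based optimizer and histogram collection
SET spark.sql.cbo.enabled=true;
SET spark.sql.cbo.joinReorder.enabled=true;
SET spark.sql.statistics.histogram.enabled=true;

-- table-level statistics (row count, size in bytes)
ANALYZE TABLE employees COMPUTE STATISTICS;

-- column-level statistics (distinct values, min/max, NULL count, histogram)
ANALYZE TABLE employees COMPUTE STATISTICS FOR COLUMNS subsidiary_id, last_name;
```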
Optimization Points
The B-Tree traversal is the first power of indexing.
Clustering is the second power of indexing.
Pipelined order by is the third power of indexing.
Where
concatenated index (Index order)
Rule of thumb: index for equality first—then for ranges.
Always aim to index the original data as that is often the most useful information you can put into an index.
Avoid function-based indexing for expressions that cannot be used as access predicates.
Avoid select * and fetch only the columns you need.
In general, a database can use a concatenated index when searching with the leading (leftmost) columns. An index with three columns can be used when searching for the first column, when searching with the first two columns together, and when searching using all columns.
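A sketch of the equality-first rule (table and column names are assumptions for illustration): with an equality condition on subsidiary_id and a range condition on date_of_birth, the equality column should come first so both conditions narrow the scanned index range.

```sql
-- equality column first, range column second
CREATE INDEX emp_sub_dob ON employees (subsidiary_id, date_of_birth);

-- both predicates act as access predicates on this index
SELECT first_name, last_name
  FROM employees
 WHERE subsidiary_id = 27
   AND date_of_birth >= DATE '1971-01-01'
   AND date_of_birth <  DATE '1972-01-01';

-- with the reversed column order (date_of_birth, subsidiary_id), only
-- the date range would narrow the scan; subsidiary_id would degrade to
-- a filter predicate checked on every leaf entry in that range
```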
Even though the two-index solution delivers very good select performance as well, the single-index solution is preferable. It not only saves storage space, but also the maintenance overhead for the second index. The fewer indexes a table has, the better the insert, delete and update performance.
```sql
CREATE UNIQUE INDEX employees_pk ON employees (employee_id, subsidiary_id);
```

```sql
-- served by the index above on both columns; a query on the leading
-- column employee_id alone could use it too
-- (where clause reconstructed for illustration)
SELECT first_name, last_name, date_of_birth
  FROM employees
 WHERE employee_id   = 123
   AND subsidiary_id = 30;
```
The access predicates are the start and stop conditions for an index lookup. They define the scanned index range.
Index filter predicates are applied during the leaf node traversal only. They do not narrow the scanned index range.
The appendix explains how to recognize access predicates in MySQL, SQL Server and PostgreSQL.
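A sketch of the difference (index and table names are illustrative): with a concatenated index on (a, b, c), predicates on a and c cannot both define the scanned range because b is unconstrained.

```sql
CREATE INDEX t_abc ON t (a, b, c);

SELECT *
  FROM t
 WHERE a = 1   -- access predicate: start/stop condition of the index scan
   AND c = 3;  -- filter predicate: checked on every visited leaf entry,
               -- does not narrow the range because b is unconstrained
```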
Function-based indexes
Almost all databases support function-based indexes or computed-column indexes (see db support on function-based indexes). In the implementation, the computed result is what gets put into the index.
Although we can create function-based indexes, we must watch out for over-indexing. When a range condition combines multiple columns, use a redundant condition on the most significant column to avoid some function-based indexes.
Requirements:
- Only functions that always return the same result for the same parameters—functions that are deterministic—can be indexed.
- Besides being deterministic, PostgreSQL and the Oracle database require functions to be declared deterministic when used in an index, so you have to use the keyword DETERMINISTIC (Oracle) or IMMUTABLE (PostgreSQL).
An index whose definition contains functions or expressions is a so-called function-based index (FBI). Instead of copying the column data directly into the index, a function-based index applies the function first and puts the result into the index. As a result, the index stores the names in all caps notation.
The database can use a function-based index if the exact expression of the index definition appears in an SQL statement.
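A sketch of the all-caps example this paragraph alludes to (names illustrative, following the common case-insensitive-search pattern):

```sql
CREATE INDEX emp_up_name ON employees (UPPER(last_name));

-- can use the index: the statement contains the exact indexed expression
SELECT first_name, last_name
  FROM employees
 WHERE UPPER(last_name) = UPPER('winand');

-- cannot use it: LOWER(last_name) is a different expression
SELECT first_name, last_name
  FROM employees
 WHERE LOWER(last_name) = 'winand';
```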
Warning:
Sometimes ORM tools use UPPER and LOWER without the developer's knowledge. Hibernate, for example, injects an implicit LOWER for case-insensitive searches.
```sql
-- this function cannot be used in an index, since it is not
-- deterministic: its result depends on the current date
-- (body reconstructed for illustration)
CREATE FUNCTION get_age(date_of_birth DATE)
RETURN NUMBER
AS
BEGIN
  RETURN TRUNC(MONTHS_BETWEEN(SYSDATE, date_of_birth) / 12);
END;
```
Parameterized queries
benefits:
Security
- Bind variables are the best way to prevent SQL injection.
Performance
- Databases with an execution plan cache like SQL Server and the Oracle database can reuse an execution plan when executing the same statement multiple times. It saves effort in rebuilding the execution plan but works only if the SQL statement is exactly the same. If you put different values into the SQL statement, the database handles it like a different statement and recreates the execution plan. When using bind parameters you do not write the actual values but instead insert placeholders into the SQL statement. That way the statements do not change when executing them with different values.
- this will only work if the data is equally distributed (for skewed data, the better plan may change with each query value; see query optimizer). However, generating and evaluating all execution plan variants is a huge effort that does not pay off if you get the same result in the end anyway. Not using bind parameters is like recompiling a program every time.
- Example: Unevenly distributed status codes like “todo” and “done” are a good example. The number of “done” entries often exceeds the “todo” records by an order of magnitude. Using an index only makes sense when searching for “todo” entries in that case.
- Similar to the reuse exchange (ReusedExchange) in a Spark execution plan.
You should always use bind parameters except for values that shall influence the execution plan. In all reality, there are only a few cases in which the actual values affect the execution plan. You should therefore use bind parameters if in doubt—just to prevent SQL injections.
Cursor Sharing and Forced Parameterization
The more complex the optimizer and the SQL query become, the more important execution plan caching becomes. Many databases use a shared execution plan cache like DB2, the Oracle database, or SQL Server.
The SQL Server and Oracle databases have features to automatically replace the literal values in a SQL string with bind parameters. These features are called CURSOR_SHARING (Oracle) or forced parameterization (SQL Server). Both features are workarounds for applications that do not use bind parameters at all. Enabling these features prevents developers from intentionally using literal values.
Example (see also the PreparedStatement class documentation):

```java
// JDBC sketch: connection setup omitted; the query and values are
// reconstructed for illustration
int subsidiary_id = 30;
PreparedStatement stmt = connection.prepareStatement(
    "SELECT first_name, last_name FROM employees WHERE subsidiary_id = ?");
stmt.setInt(1, subsidiary_id);       // bind the value: the SQL text never changes
ResultSet rs = stmt.executeQuery();
```
Heuristic execution plan
When the data is unevenly distributed, the shared plan cache may introduce new problems and bugs. The optimizer may use some smart logic to make it work better.
The Oracle database uses a shared execution plan cache (“SQL area”) and is fully exposed to the problem described in this section.
Oracle introduced the so-called bind peeking with release 9i. Bind peeking enables the optimizer to use the actual bind values of the first execution when preparing an execution plan. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.
Release 11g introduced adaptive cursor sharing to further improve the situation. This feature allows the database to cache multiple execution plans for the same SQL statement. Further, the optimizer peeks the bind parameters and stores their estimated selectivity along with the execution plan. When the cache is subsequently accessed, the selectivity of the current bind values must fall within the selectivity ranges of a cached execution plan to be reused. Otherwise the optimizer creates a new execution plan and compares it against the already cached execution plans for this query. If there is already such an execution plan, the database replaces it with a new execution plan that also covers the selectivity estimates of the current bind values. If not, it caches a new execution plan variant for this query — along with the selectivity estimates, of course.
SQL Server has similar policies, and adds hints to help with the decision (see smart optimizer logic in other databases):
SQL Server uses so-called parameter sniffing. Parameter sniffing enables the optimizer to use the actual bind values of the first execution during parsing. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.
SQL Server 2005 added new query hints to gain more control over parameter sniffing and recompiling. The query hint RECOMPILE bypasses the plan cache for a selected statement. OPTIMIZE FOR allows the specification of actual parameter values that are used for optimization only. Finally, you can provide an entire execution plan with the USE PLAN hint.
The original implementation of the OPTION(RECOMPILE) hint had a bug so it did not consider all bind variables. The new implementation introduced with SQL Server 2008 had another bug, making the situation very confusing. Erland Sommarskog has collected all the relevant information covering all SQL Server releases.
Like
LIKE filters can only use the characters before the first wild card during tree traversal. The remaining characters are just filter predicates that do not narrow the scanned index range. A single LIKE expression can therefore contain two predicate types: (1) the part before the first wild card as an access predicate; (2) the other characters as a filter predicate.
```sql
-- 'WIN%D': the prefix WIN is the access predicate; the trailing D is
-- only a filter predicate (where clause reconstructed for illustration)
SELECT first_name, last_name, date_of_birth
  FROM employees
 WHERE last_name LIKE 'WIN%D';
```
index merge
Is it better to create one index for each column or a single index for all columns of a where clause?
One index with multiple columns is better—that is, a concatenated or compound index.
One index scan is faster than two.
Many database products offer a hybrid solution between B-tree and bitmap indexes. In the absence of a better access path, they convert the results of several B-tree scans into in-memory bitmap structures. Those can be combined efficiently. The bitmap structures are not stored persistently but discarded after statement execution, thus bypassing the problem of the poor write scalability. The downside is that it needs a lot of memory and CPU time. This method is, after all, an optimizer’s act of desperation.
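A sketch of that trade-off (names illustrative): with two single-column indexes the optimizer may resort to the bitmap conversion described above, while one concatenated index answers the same query with a single scan.

```sql
-- two single-column indexes: may end in an index merge / bitmap conversion
CREATE INDEX t_a ON t (a);
CREATE INDEX t_b ON t (b);

-- one concatenated index: a single range scan serves the same query
CREATE INDEX t_ab ON t (a, b);

SELECT * FROM t WHERE a = 1 AND b = 2;
```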
bitmap index
Bitmap indexes are almost unusable for online transaction processing (OLTP).
The advantage of bitmap indexes is that they can be combined rather easily. That means you get decent performance when indexing each column individually. Conversely if you know the query in advance, so that you can create a tailored multi-column B-tree index, it will still be faster than combining multiple bitmap indexes.
By far the greatest weakness of bitmap indexes is the ridiculous insert, update and delete scalability. Concurrent write operations are virtually impossible. That is no problem in a data warehouse because the load processes are scheduled one after another. In online applications, bitmap indexes are mostly useless.
Partial index
With partial (PostgreSQL) or filtered (SQL Server) indexes you can specify which rows are indexed. This decreases the index size and thus speeds up the index search.
The where clause of a partial index can become arbitrarily complex. The only fundamental limitation is about functions: you can only use deterministic functions, as is the case everywhere in an index definition. SQL Server has, however, more restrictive rules and allows neither functions nor the OR operator in index predicates.
```sql
CREATE INDEX messages_todo ON messages (receiver) WHERE processed = 'N';
```
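The matching query (a sketch following the classic to-do-queue pattern; the predicate must repeat the index's where clause so the optimizer knows the partial index covers all candidate rows):

```sql
-- served by the small partial index: only unprocessed rows are indexed
SELECT message
  FROM messages
 WHERE processed = 'N'
   AND receiver = ?;
```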
Obfuscated conditions
NULL in Oracle
NULL is handled differently in many databases, e.g. you must write is null instead of = null. But in Oracle there are more oddities.
In Oracle:
- The empty string '' is treated and stored as NULL.
- A single-column index is in fact a partial index, since all rows whose indexed column is NULL are excluded from the index.
- For concatenated indexes, however, a row is only excluded when all of its index columns are NULL.
- For a query that requests NULL rows, Oracle only uses the index when it knows the index contains every row. When the indexed column is nullable, Oracle assumes some rows are missing from the index, and then for some queries (e.g. count(*)) the index won't be used. Oracle can know an index is complete in the following ways:
  - a NOT NULL constraint on the indexed column
  - a non-null constant expression in the index definition
  - some internal functions that Oracle knows (user-defined functions are black boxes to Oracle); a NOT NULL constraint on the underlying column may still be required, e.g. for UPPER(last_name).
Solutions to index NULL in Oracle:
```sql
-- use a concatenated index; for the first index below, last_name must
-- have a NOT NULL constraint so that every row appears in the index
CREATE INDEX emp_dob_name ON employees (date_of_birth, last_name);

-- alternative: append a non-null constant expression, so no row can
-- ever be entirely NULL (definitions reconstructed for illustration)
CREATE INDEX emp_dob ON employees (date_of_birth, '1');
```
date
Oracle's DATE type always includes a time part, so most of the time you should:
- specify the exact range
- create a plain index on the date column (instead of a function-based index) and avoid wrapping the column in a function on the left-hand side of the query condition.
```sql
-- this will use the index: an explicit range on the raw column
-- (queries reconstructed for illustration)
SELECT *
  FROM sales
 WHERE sale_date >= TO_DATE('1970-01-01', 'YYYY-MM-DD')
   AND sale_date <  TO_DATE('1970-01-02', 'YYYY-MM-DD');

-- this will NOT use a plain index on sale_date: the column is wrapped
-- in a function on the left-hand side
SELECT *
  FROM sales
 WHERE TRUNC(sale_date) = TO_DATE('1970-01-01', 'YYYY-MM-DD');
```
Implicit type conversion in queries
The database may apply implicit type conversions to a query (e.g. the LIKE case above). When this happens, the index stops working because the indexed expression has changed, and the implicit conversion itself also costs a little performance.
Avoid implicit type conversion, e.g. with numeric strings:
- select * from test where numeric_string = 42 is in fact select * from test where to_number(numeric_string) = 42
```sql
-- In Spark it's the same: always convert types explicitly
-- (examples reconstructed for illustration)
SELECT * FROM test WHERE numeric_string = '42';             -- compare string to string
SELECT * FROM test WHERE CAST(numeric_string AS INT) = 42;  -- conversion made explicit
```
use redundant condition/column
Although we can create function-based indexes, we must watch out for over-indexing. When a range condition combines multiple columns, a redundant condition on the most significant column lets a plain index be used and avoids another function-based index.
```sql
-- this will use the index, too: the second condition is redundant, but
-- it is on the most significant column, so the plain index on
-- date_column can narrow the scan first
-- (MySQL-flavored example, reconstructed for illustration)
SELECT *
  FROM t
 WHERE ADDTIME(date_column, time_column) > DATE_ADD(NOW(), INTERVAL -1 DAY)
   AND date_column >= DATE(DATE_ADD(NOW(), INTERVAL -1 DAY));
```
Join
There are 3 kinds of join:
- nested loop join:
  - often the least useful for large inputs; Spark has it only as a fallback (broadcast nested loop join for non-equi joins)
  - join order is important here
  - the N+1 problem in ORMs is essentially this join executed row by row from the application
  - use case: when the outer loop has very few records and the inner loop can use an index, it can be really fast
- hash join:
  - build the smaller dataset into a hash table. In Spark, broadcast hash join and shuffle hash join both use a hash join in the second join stage.
  - the hash table size is the key to performance. Shrink it both vertically (use more filters and add indexes on the independent filters) and horizontally (project only the columns in use).
  - use case: often used when joining a big table with a small table
- sort merge join:
  - similar to Spark's sortMergeJoin
  - use case: joining two tables of similar size
```sql
-- this will use a as the outer (driving) loop
-- (Oracle hint syntax, query reconstructed for illustration)
SELECT /*+ LEADING(a) USE_NL(b) */ *
  FROM a
  JOIN b ON b.a_id = a.id;
```
join order
pipelined execution
Clustered Index
Clustering data means to store consecutively accessed data closely together so that accessing it requires fewer IO operations.
index clustering factor:
The index clustering factor is an indirect measure of the probability that two succeeding index entries refer to the same table block. The optimizer takes this probability into account when calculating the cost value of the TABLE ACCESS BY INDEX ROWID operation.
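To look the clustering factor up in Oracle (a sketch; USER_INDEXES and USER_TABLES are standard dictionary views, the table name is illustrative). As a rule of thumb, a value near the table's block count means well clustered; a value near the row count means poorly clustered:

```sql
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,    -- clustering_factor close to this: well clustered
       t.num_rows   -- clustering_factor close to this: poorly clustered
  FROM user_indexes i
  JOIN user_tables  t ON t.table_name = i.table_name
 WHERE i.table_name = 'EMPLOYEES';
```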
Index-only scan
clustered index: only use it when you have one index on the table. If a table has more than one index, the others become secondary indexes, which are very inefficient.
index-based access on a heap table:
Secondary index on an IOT:
use index to avoid sorting
Order by
pipelined order by
If the index order corresponds to the order by clause, the database can omit the explicit sort operation.
Databases can read indexes in both directions.
When using mixed ASC and DESC modifiers in the order by clause, you must define the index likewise in order to use it for a pipelined order by.
This does not affect the index's usability for the where clause.
```sql
-- uses the index, omits the sort: the mixed ASC/DESC index matches the
-- order by clause (reconstructed for illustration)
CREATE INDEX sales_dt_pr ON sales (sale_date ASC, product_id DESC);

SELECT sale_date, product_id, quantity
  FROM sales
 WHERE sale_date = TRUNC(SYSDATE) - 1
 ORDER BY sale_date ASC, product_id DESC;
```
Figure 6.5 Database/Feature Matrix
Group by
pipelined group by
Two group by algorithms:
- Hash algorithm: aggregates the input records in a temporary hash table. Once all input records are processed, the hash table is returned as the result.
- Sort/group algorithm: first sorts the input data by the grouping key so that the rows of each group follow each other in immediate succession. Afterwards, the database just needs to aggregate them.
In general, both algorithms need to materialize an intermediate state, so they are not executed in a pipelined manner. Nevertheless the sort/group algorithm can use an index to avoid the sort operation, thus enabling a pipelined group by.
```sql
-- omits the sort by using the index: an index on (sale_date, product_id)
-- delivers one day's rows already ordered by the grouping key, so the
-- sort/group algorithm runs pipelined (reconstructed for illustration)
SELECT product_id, SUM(eur_value)
  FROM sales
 WHERE sale_date = TRUNC(SYSDATE) - 1
 GROUP BY product_id;
```
Partial Results
Pipeline execution
Inform the database whenever you don’t need all rows.
The database can only optimize a query for a partial result if it knows this from the beginning.
```sql
-- the respective top-N syntax just aborts execution after fetching ten
-- rows, and the sort is omitted by using an index on sale_date
-- (reconstructed for illustration)
SELECT *
  FROM sales
 ORDER BY sale_date DESC
 FETCH FIRST 10 ROWS ONLY;
```
Control the index scale!
Index maintenance is expensive (tree balancing, block size, block splits, ...). For example, during an insert, once the correct leaf node has been identified, the database confirms that there is enough free space left in this node. If not, the database splits the leaf node and distributes the entries between the old and a new node. This also affects the corresponding branch node, which must reference the new node as well. Needless to say, the branch node can run out of space too, so it might have to be split as well. In the worst case, the database has to split all nodes up to the root node. This is the only case in which the tree gains an additional layer and grows in depth.
Nevertheless, the performance without indexes is so good that it can make sense to temporarily drop all indexes while loading large amounts of data—provided the indexes are not needed by any other SQL statements in the meantime. This can unleash a dramatic speed-up which is visible in the chart and is, in fact, a common practice in data warehouses.
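A hedged sketch of that bulk-load practice in Oracle (index name illustrative; relies on skip_unusable_indexes and should be coordinated with the DBAs):

```sql
-- stop maintaining the index during the bulk load
ALTER INDEX sales_dt_pr UNUSABLE;

-- ... run the large INSERT / data load here ...

-- rebuild once at the end instead of updating the tree row by row
ALTER INDEX sales_dt_pr REBUILD;
```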
Horizontal scalability
Bigger hardware is not always faster—but it can usually handle more load.
With more nodes you have complex consistency problems to solve, plus network latency and even firewall latency between nodes. More hardware is really not always faster; it can even be slower.
Performance has two dimensions: response time and throughput.
More hardware will typically not improve query response time.
Proper indexing is the best way to improve query response time.
Questions
- How’s the tree traversal going? especially on each node, branch node, leaf node?
- For leaf node,
- if it’s index unique scan, then it will locate a leaf node, and then use binary search or traverse that leaf node (from the image, it should be using tragersing), that will touch the target.
- If it’s index range scan, then it will locate a start_leaf_node (maybe end_leaf_node by the sort order), and then it can determine all the rows to fetch.
- If decide to not use index, it will traverse all the table rows
- For branch node,
- if it’s the index unique scan, then it will traverse the balance tree to get to the only target leaf node.
- if it’s the index range scan, then it will traverse the balance tree to get to the start_leaf_node (and the end_leaf_node ===> maybe optional)
- if it’s no index scan, then it will not traverse any branch nodes, and to to the table directly
- If we traverse the branch node & the leaf node in the same way, and the branch node & the leaf node contain the same count of entries, then they are the same for step (1) and step (2)
- for unique scan, yes, it is. For range scan, no, you need to traverse several leaf nodes sequentially.
- For equality conditions on a concatenated index, does the column order in the where clause matter?
  - No. What matters is the order in the index. Once there is a range condition in the where clause, the index column order matters.
- Why does the LIKE operator only use the characters before the first wild card during tree traversal?
  - Because traversal needs known leading characters to navigate the ordered tree; with no leading characters there is nothing to compare against at each branch.
- If we create separate indexes on multiple columns, will an index search use all of them or just pick one?
  - It may use several and then trigger an index merge.
  - The database must scan both indexes first and then combine the results. The duplicate index lookup alone already involves more effort because the database has to traverse two index trees. Additionally, the database needs a lot of memory and CPU time to combine the intermediate results.
- how does the bitmap index work?
- Why is there a nested loop join? How does Oracle choose among the 3 kinds of join? Is join order important for the hash join and the sort merge join?
  - Nested loop join: when the outer loop has few records and the inner loop can use an index, it may be very fast, whereas a sort merge join must sort and merge, which can be slower.
  - Sort merge join: often used when joining two tables of similar size.
  - Hash join: often used when joining a big table with a small table.
  - Join order: the optimizer appears to optimize this itself.
- How to share this knowledge?
  - By knowing the index structure, what improvements can we make?
  - What kinds of strategy improve performance?
    - use the index
    - joins (order, hash join, sort merge join)
    - clustering data
    - decreasing I/O
    - some principles
- Can Hive have indexes?
  - It had indexes at first, but they were dropped in 3.0 because they are of little use in Hive.
  - Why did Hive remove indexes?
    - Because when data is saved in columnar file formats (Parquet/ORC), the data is indexed by nature to a degree.
    - And there are materialized views with automatic rewriting. These improve performance to some extent, and in that sense work similarly to indexing.
  - see "What is columnar storage" (什么是列式存储)
practice
```sql
-- concatenated index order (pk1, pk2, pk3): which predicates can use it?
-- (queries reconstructed for illustration)
CREATE INDEX t_pk123 ON t (pk1, pk2, pk3);

SELECT * FROM t WHERE pk1 = 1;                         -- uses the index
SELECT * FROM t WHERE pk1 = 1 AND pk2 = 2;             -- uses the index
SELECT * FROM t WHERE pk1 = 1 AND pk2 = 2 AND pk3 = 3; -- uses the index
SELECT * FROM t WHERE pk2 = 2 AND pk3 = 3;             -- no access predicate: pk1 missing
```
Sharing notes
- Today the focus is on the how of SQL. Why care about the how?
  - SQL separates the what from the how. A SQL statement describes the what, so do we need to care about the how?
    - Yes. How the SQL is written directly influences performance, often to a large degree, so we need to care.
- What does the database storage structure look like? (only after dissecting the structure can we know the how)
  - index structure: B+ tree + doubly linked list
  - index types:
    - concatenated index
    - function-based index
    - clustered index
    - partial index
- Query optimizer (the how)
  - cost-based optimizer
  - execution plan
    - how the execution plan is chosen
    - how the statistics are gathered
- Given the indexes we define, how are queries executed? (the how, with examples)
  - simple equality query (uses the index)
  - simple range query (uses the index)
  - function-based query (uses the index? — need to change the index)
  - concatenated index query, wrong column order (uses the index? — no)
  - concatenated index query, correct order, all columns (uses the index)
  - concatenated index query, correct order, first few columns (uses the index)
  - clustered index (single table, single index)
  - clustered index (single table, multiple indexes)
  - order by (uses the index)
  - group by (uses the index)
  - parameterized queries (why they are better)
  - like
  - implicit type conversion
- Some key points
  - index for equality first—then for ranges.
  - keep the scanned index range as small as possible
  - take care to avoid over-indexing.
  - always aim to index the original data, as that is often the most useful information you can put into an index.
  - avoid function-based indexing for expressions that cannot be used as access predicates.
  - avoid select * and fetch only the columns you need.
  - prefer a concatenated index over multiple single-column indexes on a table
    - the index column order is very important
  - prefer parameterized queries
  - avoid implicit type conversion
  - only use a clustered index when you have one index on the table
  - be careful with like, especially when the leading characters are wildcards