In ASE, statistics are used by the Optimiser to determine the appropriate Query Plan. Wrong or out-of-date statistics can cause the wrong query plan to be selected which can lead to massive performance issues. It is thus very important to consider statistics in regular database maintenance activities and make sure they are updated on a regular basis.
There are a few ways in which you can make sure that the statistics are up-to-date and support the query optimizer in choosing the optimal query plan:
In the end of the article, I’ll also shortly discuss the performance considerations required when choosing the appropriate command.
Update Statistics
Using the update statistics command, you can update the statistics about the distribution of key values for a table, an index on the table, specific columns of the table or on partitions.
Only reference the table
If you only reference a table name:
update statistics mytable
it will update statistics for the leading columns of all indexes on the table. The leading column of a composite index is the first column in the index definition. This is the most important one since the index cannot be used if only the second column is used in the WHERE clause but not the first one.
Here’s an example of such an index:
create index mytable_comp_ind on maytable(col1, col2)
In this case, col1 is the leading column for this index.
Explicitely reference an index
If you explicitely reference an index on the table:
update statistics mytable mytable_comp_ind
it will update the statistics for the leading column of the specified index i.e. it will update the statistics on col1.
Explicitely reference a partition
If you explicitely reference a partition on the table:
update statistics mytable partition mypartition
it will update the density information on all columns of all indexes of the partition. It will additionally also create histograms for all leading columns of indexes of the partition. You can also provide a list of column names. In this case it will create histograms for the first column and densities for the composite columns.
Note that only local indexes of the partition are considered. Global indexes are not considered.
Also note that updating the statistics for a partition also updates the global statistics i.e. only local indexes are considered but global statistics are updated with the gathered data.
Explicitely reference columns
If you explicitely reference a list of columns:
update statistics mytable (col1, col2)
it will create histograms for the first referenced column and density information for all column groups with the first column as leading column.
Note that performing such an update statistics has two consequences:
- A table scan is performed
- A sort is performed, meaning that you need to have enough space in tempdb
So this update statistics does have a high impact on the performance of the server and requires more space in tempdb.
Also note that in many cases statistics are mostly useful for columns referenced by indexes (especially as leading columns). Updating statistics for other columns create an overhead. But in some cases it is required and better than using update all statistics.
Update Index Statistics
The command update index statistics updates the statistics of all columns of an index:
update index statistics mytable mytable_comp_ind
The difference between this command and:
update statistics mytable(col1, col2)
is that the latter focuses on the lead column of the index. The former creates histograms not only for the leading column of the index but for all columns specified in the index definition.
You can also update the statistics for all columns of all indexes on the table by omitting the index name:
update index statistics mytable
Note that it is not exactly the same as calling update index statistics for each index on the table since the latter updates the statistics of columns referenced by multiple indexes multiple times. So if you intend to update the statistics for all indexes, it’s more performant to omit the index name than issuing one command per index.
Here also you can specify a data or index partition.
From a performance point view, this means that:
- An index scan will be performed to update the statistics of the leading column of the index
- For the other columns, a table scan for each of them will be required followed by a sorting based on a table in tempdb
This is basically the same problem as when using update statistics on columns which are not the leading column of an index.
Update All Statistics
This command creates statistics for all columns of the specified table:
update all statistics mytable
A partition can also additionally be specified.
From a performance point of view, this means that:
- An index scan will be performed for each column which is the leading column of an index
- A table scan will be performed for each other table, creating a work table in tempdb which will then be sorted
Update Table Statistics
The update table statistics is kind of a different beast. It doesn’t update the data in systabstats but only table or partition level statistics in sysstatistics i.e. it does not affect column level statistics.
Here the syntax:
update table statistics mytable
Or with a partition name:
update table statistics mytable mypartition
Modifying Statistics
sp_modifystats is a system stored procedure which can be used to update density information for a table of a column group.
There are two uses of this command.
Modify density information
You can use the MODIFY_DENSITY parameter to change the cell density information for columns:
sp_modifystats mytable, "col1%", range, absolute, "0.5"
This will set the range cell density to 0.5 for the column col1 as well as for all column group with density information having col1 as leading column.
The column group parameter can either contain:
- “all” meaning that all column groups should be updated
- A column name meaning that only the density information for this column should be updated
- A list of columns meaning that the density information of this specified column group should be updated
- A column name or column list including wildcards as shown above
Note that the next update statistics will reset the density information. So it’s a good idea to always call sp_modifystats after an update statistics.
Handling skew in data
Please refer to my previous post regarding data skew, their effect on queries and how to handle them: Data skew and query plans.
Performance
Updates of statistics are a required maintenance activities if you want to keep your queries fast. On the other hand, updating statistics also has a non-negligeable impact on performance and load.
In general the leading column of indexes are the critical ones. So you need to make sure they are always up-to-date.
But in some cases it does make sense to have up-to-date statistics also for other columns: If you have a WHERE clause also containing this column not being part of an index and 99% of the values in this column are the same, this will greatly impact how joins are done. Without statistics on this column, a default distribution will be assumed and the wrong join might be selected.
So I’d recommend:
- Using “update statistics” instead of the other commands to optimize the leading columns of indexes
- Using “update statistics” on specific columns which are not leading column in an index, in case you see that ASE chooses the wrong join because it assumes a default distribution of data and it is not the case. Use it only if really required as it creates a huge load on the system
- Avoid using “update all statistics”. It generally makes more sense to use dedicated commands to update what needs to be updated
More information regarding sampling, histogram steps and the degree of parallelism will be added later…