Engine-Independent Table Statistics
The engine-independent table statistics feature was first implemented in MariaDB 10.0.1 and was first enabled for queries by default in MariaDB 10.4.
Introduction
Before MariaDB 10.0, the MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however it had some deficiencies:
- Storage engines provided poor statistics (this was fixed to some degree with the introduction of Persistent Statistics).
- The statistics were supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
- There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.
Engine-independent table statistics lift these limitations.
- Statistics are stored in regular tables in the
mysql
database.- it is possible for a DBA to read and update the values.
- More data is collected/used.
Statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.
Use or update of data from these tables is controlled by use_stat_tables variable. Possible values are listed below:
Value | Meaning |
---|---|
'never' | The optimizer doesn't use data from statistics tables. Default for MariaDB 10.4.0 and below. |
'complementary' | The optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine. |
'preferably' | Prefer the data from statistics tables, if it's not available there, use the data from the storage engine. |
'complementary_for_queries' | Same as complementary , but for queries only (to avoid needlessly collecting for ANALYZE TABLE). From MariaDB 10.4.1. |
'preferably_for_queries' | Same as preferably , but for queries only (to avoid needlessly collecting for ANALYZE TABLE). Available and default from MariaDB 10.4.1. |
Collecting Statistics with the ANALYZE TABLE Statement
The ANALYZE TABLE statement can be used to collect table statistics. For example:
ANALYZE TABLE table_name;
When the ANALYZE TABLE statement is executed, MariaDB makes a call to the table's storage engine, and the storage engine collects its own statistics for the table. The specific behavior depends on the storage engine. For InnoDB, see InnoDB Persistent Statistics for more information.
When the ANALYZE TABLE statement is executed, MariaDB may also collect engine-independent statistics for the table. The specific behavior depends on the value of the use_stat_tables system variable. Engine-independent statistics will only be collected by the ANALYZE TABLE statement if one of the following is true:
- The use_stat_tables system variable is set to
complementary
orpreferably
. - The ANALYZE TABLE statement includes the
PERSISTENT FOR
clause.
In MariaDB 10.4 and later, the use_stat_tables system variable is set to preferably_for_queries
by default. With this value, engine-independent statistics are used by default, but they are not collected by default. If you want to use engine-independent statistics with the default configuration, then you will have to collect them by executing the ANALYZE TABLE statement and by specifying the PERSISTENT FOR
clause. It is recommended to collect engine-independent statistics on as-needed basis, so typically one will not have engine-independent statistics for all indexes/all columns.
Engine-independent statistics are collected by doing full table and full index scans, and this process can be quite expensive.
Collecting Statistics for Specific Columns or Indexes
The syntax for the ANALYZE TABLE statement has been extended with the PERSISTENT FOR
clause. This clause allows one to collect engine-independent statistics only for particular columns or indexes. This clause also allows one to collect engine-independent statistics, regardless of the value of the use_stat_tables system variable. For example:
ANALYZE TABLE table_name PERSISTENT FOR ALL;
Statistics for columns using the BLOB and TEXT data types are not collected. If a column using one of these types is explicitly specified, then a warning is returned.
Examples of Statistics Collection
-- update all engine-independent statistics for all columns and indexes ANALYZE TABLE tbl PERSISTENT FOR ALL; -- update specific columns and indexes: ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...); -- empty lists are allowed: ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (); ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...); -- the following will only update mysql.table_stats fields: ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (); -- when use_stat_tables is set to 'COMPLEMENTARY' or 'PREFERABLY', -- a simple ANALYZE TABLE collects engine-independent statistics for all columns and indexes. SET SESSION use_stat_tables='COMPLEMENTARY'; ANALYZE TABLE tbl;
Manual Updates to Statistics Tables
Statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.
It is possible to update statistics tables manually. One should modify the table(s) with regular INSERT/UPDATE/DELETE statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue FLUSH TABLES command.
A few scenarios where one might need to update statistics tables manually:
- Deleting the statistics. Currently, the ANALYZE TABLE command will collect the statistics, but there is no special command to delete statistics.
- Running ANALYZE on a different server. ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from statistics tables on the slave and apply it on the master.
- In some cases, knowledge of the database allows one to compute statistics manually in a more efficient way than ANALYZE does. One can compute the statistics manually and put it into the database.
See Also
- Index Statistics
- InnoDB Persistent Statistics
- Histogram-based Statistics
- JSON histograms (mariadb.org blog)
© 2021 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/engine-independent-table-statistics/