Histogram-Based Statistics

MariaDB starting with 10.4.3

Histograms were introduced in MariaDB 10.0, and are collected by default from MariaDB 10.4.3.

Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations. Before their introduction, all conditions on non-indexed columns were ignored when searching for the best execution plan. Histograms can be collected for both indexed and non-indexed columns, and are made available to the optimizer.

Histogram statistics are stored in the mysql.column_stats table, which stores data for engine-independent table statistics, and so are essentially a subset of engine-independent table statistics.

Consider this example, using the following query:

SELECT * FROM t1,t2 WHERE t1.a=t2.a and t2.b BETWEEN 1 AND 3;

Let's assume that

  • table t1 contains 100 records
  • table t2 contains 1000 records
  • there is a primary index on t1(a)
  • there is a secondary index on t2(a)
  • there is no index defined on column t2.b
  • the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Before histograms were introduced, the optimizer would choose the plan that:

  • accesses t1 using a table scan
  • accesses t2 using index t2(a)
  • checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and performs 100 index look-ups.

With histograms available, the optimizer can choose the following, more efficient plan:

  • accesses table t2 in a table scan
  • checks the condition t2.b BETWEEN 1 AND 3
  • accesses t1 using index t1(a)

This plan also examine all rows from t2, but it performs only 10 look-ups to access 10 rows of table t1.

System Variables

There are a number of system variables that affect histograms.

histogram_size

The histogram_size variable determines the size, in bytes, from 0 to 255, used for a histogram. This is effectively the number of bins for histogram_type=SINGLE_PREC_HB or number of bins/2 for histogram_type=DOUBLE_PREC_HB. If it is set to 0 (the default for MariaDB 10.4.2 and below), no histograms are created when running an ANALYZE TABLE.

histogram_type

The histogram_type variable determines whether single precision (SINGLE_PREC_HB) or double precision (DOUBLE_PREC_HB) height-balanced histograms are created. From MariaDB 10.4.3, double precision is the default. For MariaDB 10.4.2 and below, single precision is the default. From MariaDB 10.7, JSON_HB, JSON-format histograms, are accepted.

optimizer_use_condition_selectivity

The optimizer_use_condition_selectivity controls which statistics can be used by the optimizer when looking for the best query execution plan.

  • 1 Use selectivity of predicates as in MariaDB 5.5.
  • 2 Use selectivity of all range predicates supported by indexes.
  • 3 Use selectivity of all range predicates estimated without histogram.
  • 4 Use selectivity of all range predicates estimated with histogram.
  • 5 Additionally use selectivity of certain non-range predicates calculated on record sample.

From MariaDB 10.4.1, the default is 4. Until MariaDB 10.4.0, the default is 1.

See Also

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.

© 2021 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/histogram-based-statistics/