一、统计信息分类
Database statistics are produced by:
– Using the DBMS_STATS package (recommended)
– Running the ANALYZE command
• System statistics are produced by using the
  DBMS_STATS package.
• Statistics are stored in the data dictionary tables
   and cached in the shared pool.
• CBO converts statistics into costs
    CBO优化器将统计信息转化为成本计算

prior to use in costing execution plans.
• You can gather statistics for tables, indexes, and
   table columns separately.
为了基于成本CBO计算,需要对tables,indexes,table columns进行单独的数据统计



二、.DBMS_STAT包含如下内容的统计信息
DBMS_STATS statistics that are generated include the following:
• Table statistics
  Number of rows:表行数
  Number of blocks:
  Average row length:avg_lenth
  Global statistics (if requested)
  User statistics (if requested)
  Sample size
   Last analyzed
• Column statistics
  Number of distinct values (NDV) in column: 列的位置值
  Number of nulls in column:列里面包含的null值
  Data distribution (histogram):数据的分布(直方图)

• Index statistics
  Number of leaf blocks:叶节点block数
  Levels                         二元高度
  Clustering factor          集簇因子
• System statistics (discussed later in this lesson)

  I/O performance and utilization
  CPU performance and utilization

2.1Table表统计信息


DBA_TABLES and DBA_TAB_[SUB]PARTITIONS
• Used to determine:
– Table and (sub)partition access cost
– Join cardinality
– Join order
• Some of the statistics generated are:
– Row count (NUM_ROWS)
– Block count (BLOCKS) Exact

dba_tables
1)NUM_ROWS行数量
This is the basis of cardinality computations. Row count is especially important if the
table is the driving table of a nested loops join; the row count defines how many times
the inner table will be probed
2)BLOCKS
Blocks is the number of used data blocks. Block count in combination with
DB_FILE_MULTIBLOCK_READ_COUNT gives the base table access cost as
described in an earlier lesson.
3)AVG_ROW_LEN
This is the average length of a row in the table in bytes.

4)EMPTY_BLOCKS(HWK高水位线下,空闲的blocks)
Number of empty (never used) data blocks in the table. This is the number of blocks
between the used data blocks and the high-water mark.
5)AVG_SPACE(空闲block空间的平均大小)
This is the average amount of free space, in bytes, in a data block allocated to the
table.
6)CHAIN_CNT
This is the number of rows in the table that are chained from one data block to
another, or that have migrated to a new block, requiring a link to preserve the old
ROWID.


2.2Index Statistics(索引统计信息)
DBA_INDEXES and DBA_IND_[SUB]PARTITIONS
• Used to decide:
– Full table scan versus index scan
• Statistics generated are:
– B*-tree level (BLEVEL) Exact
– Leaf block count (LEAF_BLOCKS)
– Clustering factor (CLUSTERING_FACTOR)
– Distinct keys (DISTINCT_KEYS)


In general, in order to select an index access, the CBO requires a predicate on the
prefix of the index columns. However, in case there is no predicate and all columns
referenced in the query are present in an index, the CBO considers using a full index
scan versus a full table scan.
1)一般情况下,CBO为了走索引,必须要求谓词(where)中带有一个前导的索引的字段;
    比如t(id,name,loc),ind on(id)
    select id,name from t where id=xxx  (id为谓词)
2)特殊的情况,查询语句没有谓词(没有where),同时查询的所有列都包含在一个索引中(可能是组合索引),那么CBO
  会根据具体情况走Index Full Scan或者Table Full Scan(取决于全扫表索引叶子的成本和Table Full Scan的成本)
注:Index Range Scan就是叶子节点到叶子节点的范围扫描
如:
--表
 OBJECT_ID OBJECT_NAME                                                                                                                      OBJECT_TYPE         STATUS
---------- -------------------------------------------------------------------------------------------------------------------------------- ------------------- -------
        20 ICOL$                                                                                                                                                    TABLE               VALID
--索引

 OWNER      TABLE_NAME                INDEX_NAME                COLUMN_NAME          COL_LE DESC UNIQUENESS BLEVEL LEAF_BLOCKS   NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR
---------- ------------------------- ------------------------- -------------------- ------ ---- ---------- ------ ----------- ---------- ------------- -----------------
TEST       T_OBJ                     IND_OBJ                   OBJECT_ID                  22     ASC  NONUNIQUE  1              234      34271         34270               422
TEST       T_OBJ                     IND_OBJ                   OBJECT_NAME          128    ASC  NONUNIQUE  1              234      34271         34270               422
TEST       T_OBJ                     IND_OBJ                   OBJECT_TYPE          19       ASC  NONUNIQUE  1              234      34271         34270               422



(object_id包含null值,null不能对索引产生影响导致走全表)

1)
SQL> select object_id from t_obj;

68539 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 172510092

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 34271 |   167K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T_OBJ | 34271 |   167K|    55   (0)| 00:00:01 |              ---全表扫描
---------------------------------------------------------------------------


2)Alter table t_obj modify object_id not null;
演示对于组合索引的各个字段查询,都是ind full scan
select obj_id from t_obj;

Execution Plan
----------------------------------------------------------
Plan hash value: 2699271081

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         | 34271 |   167K|    53   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_OBJ | 34271 |   167K|    53   (0)| 00:00:01 |
--------------------------------------------------------------------------------
select object_name from t_obj;

Execution Plan
----------------------------------------------------------
Plan hash value: 2699271081

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         | 34271 |   301K|    53   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_OBJ | 34271 |   301K|    53   (0)| 00:00:01 |
--------------------------------------------------------------------------------


select object_type from t_obj;
Execution Plan
----------------------------------------------------------
Plan hash value: 2699271081

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         | 34271 |   301K|    53   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_OBJ | 34271 |   301K|    53   (0)| 00:00:01 |
--------------------------------------------------------------------------------

SQL> select object_id,object_name,object_type from t_obj
  2  where object_id=3444 and object_type='TABLE';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3193802408

----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |     1 |    39 |     2   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IND_OBJ |     1 |    39 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

SQL> select * from t_obj  where object_id=10237 and object_name='T' and object_type='TABLE';


Execution Plan
----------------------------------------------------------
Plan hash value: 2263293454

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |    43 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_OBJ   |     1 |    43 |     2   (0)| 00:00:01 |       (组合索引所有字段都用到,类似唯一定位?)
|*  2 |   INDEX RANGE SCAN          | IND_OBJ |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------


2.3Column Statistics
DBA_TAB_COL_STATISTICS and DBA_TAB_HISTOGRAMS
DBA_[SUB]PART_COL_STATISTICS
DBA_[SUB]PART_HISTOGRAMS
• Count of distinct values of the column
   (NUM_DISTINCT)
• Low value (LOW_VALUE) Exact
• High value (HIGH_VALUE) Exact
• Number of NULLS (NUM_NULLS)

NUM_DISTINCT
Used in selectivity calculations (for example, 1/NDV)
LOW_VALUE and HIGH_VALUE
The CBO assumes uniform distribution of values between low and high values for all
data types. These values are used to determine range selectivities.
NUM_NULLS
Helps with selectivity of nullable columns and IS NULL and IS NOT NULL
predicates
DENSITY
Only relevant for histograms; used as selectivity estimate for nonpopular values
Above columns remain for backward compatibility with Oracle7. This information is
now in the {TAB|PART}_COL_STATISTICS views. This view now picks up these
values from HIST_HEAD$ rather than COL$.
NUM_BUCKETS
The number of buckets in the column histogram



2.4Default Statistics

  

Default statistics can be important if the CBO is forced when insufficient statistics
have been gathered or some objects are missing statistics. With bind variables, you
often have no alternative to the use of the defaults due to the lack of other
information. Because the statistics are hard coded, there is a high likelihood that they
are inaccurate. All objects should be sufficiently analyzed prior to use of the CBO.
默认的统计信息对于CBO会很重要,尤其是在当前没有足够的统计信息的场合&局部统计信息丢失的场景。
Default Statistics (from kke.h) for Oracle7.3.3 (and later)
Selectivity for relations on indexed columns .009
Selectivity for = on indexed columns .        004
Multiblock read factor                          8
Multiblock write factor                         8
Remote table average row length               100
# of blocks                                   100
Scan cost                                      13
Index levels                                    1
Number leaf blocks/key                          1