MySQL机制之排序算法

在很多场景中，我们都要求从数据库中查询出来的数据是有序的，SQL语法为order by。如果在查询使用到了排序，那么在EXPLAIN中，Extra 这个字段中会存在Using filesort。

mysql> explain select city,name,age from t where city='杭州' order by name limit 1000  ;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+----------------+
|  1 | SIMPLE      | t     | NULL       | ref  | city          | city | 66      | const |    1 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+----------------+

那么它在MySQL中是怎么实现的？什么参数会影响它呢？

全字段排序

通常情况下，语句执行流程如下：

初始化sort_buffer，确认需要放入的字段。(city,name,age)
从索引中查询满足条件的数据。
如果需要进行回表查询，那么会根据ID到聚簇索引中去查询到所需的字段。并将字段数据放入到sort_buffer中。
继续查询下一条记录，重复以上动作。直到将所有满足条件的数据都查询出来。
对sort_buffer中的数据按照排序字段进行排序。
将排序结果的前1000行返回给客户端。

按照name排序的这个过程，可能在内存中完成，也可能需要使用外部排序。这取决于排序所需要的内存和参数sort_buffer_size

sort_buffer_size，就是 MySQL 为排序开辟的内存（sort_buffer）的大小。如果要排序的数据量小于 sort_buffer_size，排序就在内存中完成。但如果排序数据量太大，内存放不下，则不得不利用磁盘临时文件辅助排序。

rowid排序

在上面的排序算法中，只对原表的数据读取了一遍，剩下的操作都是在sort_buffer和临时文件中执行。
但是，如果需要返回的字段很多的话，sort_buffer中需要保存所有的字段，此时sort_buffer中保存不了几条数据，需要使用很多个临时文件，排序效率太低了。

于是MySQL引入了max_length_for_sort_data参数来进行控制，如果需要排序的字段总长度超过这个值，就会使用rowid排序算法。

max_length_for_sort_data，是 MySQL 中专门控制用于排序的行数据的长度的一个参数。它的意思是，如果单行的长度超过这个值，MySQL 就认为单行太大，要换一个算法。

在rowid排序中，整个排序过程就变成：

初始化 sort_buffer，确定放入两个字段，即name和id
从索引中查找满足条件的数据的这两个字段值
取出这两个值保存到sort_buffer中
继续查询下一条记录，重复以上动作。直到将所有满足条件的数据都查询出来。
对sort_buffer中的数据进行排序
遍历排序结果，取前 1000 行，并按照 id 的值回到原表中取出 city、name 和 age 三个字段返回给客户端。

优先队列排序

MySQL 5.6 版本引入的一个新的排序算法，即：优先队列排序算法
在一个排序场景中，我们可能只需要前面几条的数据。如果采用的是上面的两种排序算法，那么需要对所有数据进行排序，这样造成了无谓的排序损耗。
于是MySQL引入了优先队列排序算法，只保存前面的几条数据信息。

扩展:如何确认排序语句是否使用了临时文件

optimizer_trace 功能可以让我们方便的查看优化器生成执行计划的整个过程:
主要会体现以下阶段的具体计算：

prepare阶段
optimize阶段
execute阶段
基于成本的优化主要集中在optimize阶段
单表查询来说，我们主要关注optimize阶段的”rows_estimation”这个过程，这个过程深入分析了对单表查询的各种执行方案的成本
对于多表连接查询来说，我们更多需要关注”considered_execution_plans”这个过程，这个过程里会写明各种不同的连接方式所对应的成本

/* 打开optimizer_trace，只对本线程有效 */
SET optimizer_trace='enabled=on'; 

/* @a保存Innodb_rows_read的初始值 */
select VARIABLE_VALUE into @a from  performance_schema.session_status where variable_name = 'Innodb_rows_read';

/* 执行语句 */
select city, name,age from t where city='杭州' order by name limit 1000; 

/* 查看 OPTIMIZER_TRACE 输出 */
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G

/* @b保存Innodb_rows_read的当前值 */
select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';

/* 计算Innodb_rows_read差值 */
select @b-@a;

optimizer_trace输出信息：

mysql> SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G
*************************** 1. row ***************************
                            QUERY: select city, name,age from t where city='杭州' order by name limit 1000
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `t`.`city` AS `city`,`t`.`name` AS `name`,`t`.`age` AS `age` from `t` where (`t`.`city` = '杭州') order by `t`.`name` limit 1000"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`t`.`city` = '杭州')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "multiple equal('杭州', `t`.`city`)"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "multiple equal('杭州', `t`.`city`)"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "multiple equal('杭州', `t`.`city`)"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`t`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
              {
                "table": "`t`",
                "field": "city",
                "equals": "'杭州'",
                "null_rejecting": false
              }
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`t`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 1,
                    "cost": 2.45
                  },
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "city",
                      "usable": true,
                      "key_parts": [
                        "city",
                        "id"
                      ]
                    }
                  ],
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "skip_scan_range": {
                    "potential_skip_scan_indexes": [
                      {
                        "index": "city",
                        "usable": false,
                        "cause": "query_references_nonkey_column"
                      }
                    ]
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "city",
                        "ranges": [
                          "杭州 <= city <= 杭州"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 1,
                        "cost": 0.61,
                        "chosen": true
                      }
                    ],
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    }
                  },
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "city",
                      "rows": 1,
                      "ranges": [
                        "杭州 <= city <= 杭州"
                      ]
                    },
                    "rows_for_plan": 1,
                    "cost_for_plan": 0.61,
                    "chosen": true
                  }
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "city",
                      "rows": 1,
                      "cost": 0.35,
                      "chosen": true
                    },
                    {
                      "access_type": "range",
                      "range_details": {
                        "used_index": "city"
                      },
                      "chosen": false,
                      "cause": "heuristic_index_cheaper"
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 1,
                "cost_for_plan": 0.35,
                "chosen": true
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`t`.`city` = '杭州')",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`t`",
                  "attached": "(`t`.`city` = '杭州')"
                }
              ]
            }
          },
          {
            "optimizing_distinct_group_by_order_by": {
              "simplifying_order_by": {
                "original_clause": "`t`.`name`",
                "items": [
                  {
                    "item": "`t`.`name`"
                  }
                ],
                "resulting_clause_is_simple": true,
                "resulting_clause": "`t`.`name`"
              }
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`t`",
                "original_table_condition": "(`t`.`city` = '杭州')",
                "final_table_condition   ": null
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`t`"
              }
            ]
          },
          {
            "considering_tmp_tables": [
              {
                "adding_sort_to_table": "t"
              }
            ]
          }
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "sorting_table": "t",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`t`.`name`"
              }
            ],
            "filesort_priority_queue_optimization": {
              "limit": 1000
            },
            "filesort_execution": [
            ],
            "filesort_summary": {
              "memory_available": 262144,
              "key_size": 264,
              "row_size": 406,
              "max_rows_per_buffer": 633,
              "num_rows_estimate": 1057,
              "num_rows_found": 0,
              "num_initial_chunks_spilled_to_disk": 0,
              "peak_memory_used": 0,
              "sort_algorithm": "none",
              "sort_mode": "<varlen_sort_key, packed_additional_fields>"
            }
          }
        ]
      }
    }
  ]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.29 sec)

《MySQL实战45讲(16、17)》