我现在有一个数据库, 表 notice 大概 100W row 左右的数据
表结构如下:
CREATE TABLE public.notice
(
id integer NOT NULL DEFAULT nextval('notice_id_seq'::regclass),
title character varying(200) COLLATE pg_catalog."default" NOT NULL,
tags integer[] DEFAULT '{}'::integer[],
public_time timestamp without time zone,
content text COLLATE pg_catalog."default",
type integer NOT NULL,
site character(20) COLLATE pg_catalog."default",
url character varying(1024) COLLATE pg_catalog."default",
area character(10) COLLATE pg_catalog."default",
CONSTRAINT notice_pkey PRIMARY KEY (id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
索引如下:
CREATE INDEX area_type_time_desc
ON public.notice USING btree
(area COLLATE pg_catalog."default" bpchar_pattern_ops, type, public_time DESC)
TABLESPACE pg_default;
如下查询语句:
explain analyze select id, public_time, title, area from notice where area ~ '^110' and type=31 limit 10
输出:
"Limit (cost=0.43..39.49 rows=10 width=122) (actual time=0.930..0.977 rows=10 loops=1)"
" -> Index Scan using area_type_time_desc on notice (cost=0.43..67845.25 rows=17369 width=122) (actual time=0.928..0.971 rows=10 loops=1)"
" Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))"
" Filter: (area ~ '^110'::text)"
"Planning Time: 0.532 ms"
"Execution Time: 1.020 ms"
可以看出索引是生效的!
当加上排序:
explain analyze select id, public_time, title, area from notice where area ~ '^110' and type=31 order by public_time desc limit 10
输出:
"Limit (cost=51918.94..51918.97 rows=10 width=122) (actual time=214.522..214.530 rows=10 loops=1)"
" -> Sort (cost=51918.94..51962.36 rows=17369 width=122) (actual time=214.520..214.522 rows=10 loops=1)"
" Sort Key: public_time"
" Sort Method: top-N heapsort Memory: 29kB"
" -> Bitmap Heap Scan on notice (cost=2012.57..51543.60 rows=17369 width=122) (actual time=30.013..205.214 rows=18635 loops=1)"
" Recheck Cond: (type = 31)"
" Filter: (area ~ '^110'::text)"
" Heap Blocks: exact=15625"
" -> Bitmap Index Scan on area_type_time_desc (cost=0.00..2008.23 rows=17362 width=0) (actual time=23.107..23.107 rows=18635 loops=1)"
" Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))"
"Planning Time: 0.636 ms"
"Execution Time: 214.591 ms"
排序耗时非常长, 没有用到索引排序
我尝试只建立一个 public_time desc 的索引, 查询不加任何条件只有一个 order by 同样排序要很耗时
请前辈们指正一下我哪里用的用问题, 我查了好多文档, 没有找到解决方案, 特来求助
1
zbinlin 2019-03-23 23:05:10 +08:00
调整下 work_mem 或 maintenance_work_mem,重新建索引试试。
|
2
xiangyuecn 2019-03-23 23:38:37 +08:00
一楼正解,调大排序内存,虽然不懂 pg。Sort Method: top-N heapsort Memory: 29kB,这句翻译过来不知道是不是字面意思,1.8 万数据这么点内存去排序,200 毫秒算是蛮快了。
|
3
reus 2019-03-24 00:06:22 +08:00
work_mem 太小可能是原因,但 work_mem 小并没有导致排序慢,而是触发了 recheck cond,所以才慢
Bitmap Index Scan 这一步,如果 work_mem 太小,它不会返回 row id 而是 page id,这样就需要 Bitmap Heap Scan 里 recheck 这一步,因为一个 page 可能有不符合条件的 row。 排序并不慢,Bitmap Heap Scan 实际跑了 205.214 毫秒,加上排序是 214.520 ,占比不高。 |
4
kyf0722 OP @zbinlin
pg 默认的 work_mem 是 4M maintenance_work_mem 是 64M 我修改当前 session 的 work_mem 到 128M, maintenance_work_mem 到 512M 然后 drop index, 再重新 create index 同样上面带 order by 的查询语句执行结果 ``` "Limit (cost=51977.56..51977.58 rows=10 width=122) (actual time=126.678..126.685 rows=10 loops=1)" " -> Sort (cost=51977.56..52021.04 rows=17394 width=122) (actual time=126.676..126.678 rows=10 loops=1)" " Sort Key: public_time DESC" " Sort Method: top-N heapsort Memory: 29kB" " -> Bitmap Heap Scan on notice (cost=2017.69..51601.68 rows=17394 width=122) (actual time=35.036..117.511 rows=18635 loops=1)" " Recheck Cond: (type = 31)" " Filter: (area ~ '^110'::text)" " Heap Blocks: exact=15625" " -> Bitmap Index Scan on area_type_public_time_desc (cost=0.00..2013.34 rows=17386 width=0) (actual time=28.123..28.123 rows=18635 loops=1)" " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))" "Planning Time: 0.615 ms" "Execution Time: 126.819 ms" ``` 比刚刚确实快一些, 但是排序似乎还是和索引没有关系, 如果我把 order by 的 public_time 替换为 id, 可以在 2-5ms 内查出数据 @xiangyuecn Sort Method: top-N heapsort Memory: 29kB 中的 29kB 应该是排序使用了 29K 的内存, 我试过如果数据量巨大, 内存不够的时候会显示 disk size, 也就是用到了磁盘 |
5
kyf0722 OP @reus 这位仁兄说的很有道理
我索性直接把 work_mem 加到了 512M, 但是执行结果更加匪夷所思 "Limit (cost=118704.89..118704.92 rows=10 width=134) (actual time=688.049..688.057 rows=10 loops=1)" " -> Sort (cost=118704.89..118816.86 rows=44786 width=134) (actual time=688.047..688.049 rows=10 loops=1)" " Sort Key: public_time DESC" " Sort Method: top-N heapsort Memory: 27kB" " -> Bitmap Heap Scan on notice (cost=1869.15..117737.08 rows=44786 width=134) (actual time=47.015..663.260 rows=44366 loops=1)" " Filter: ((area ~ '^110'::text) AND (type <> 31))" " Rows Removed by Filter: 18635" " Heap Blocks: exact=39610" " -> Bitmap Index Scan on area_type_public_time_desc (cost=0.00..1857.96 rows=62153 width=0) (actual time=28.356..28.356 rows=63001 loops=1)" " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar))" "Planning Time: 0.578 ms" "Execution Time: 688.124 ms" 无法理解了!!, 多次执行都差不多 |
6
kyf0722 OP @reus 抱歉, 错了, 请忽略上条回复. work_mem 加到 512M 之后, 的执行结果
"Limit (cost=51977.56..51977.58 rows=10 width=134) (actual time=112.037..112.044 rows=10 loops=1)" " -> Sort (cost=51977.56..52021.04 rows=17394 width=134) (actual time=112.034..112.037 rows=10 loops=1)" " Sort Key: public_time DESC" " Sort Method: top-N heapsort Memory: 29kB" " -> Bitmap Heap Scan on notice (cost=2017.69..51601.68 rows=17394 width=134) (actual time=30.136..103.225 rows=18635 loops=1)" " Recheck Cond: (type = 31)" " Filter: (area ~ '^110'::text)" " Heap Blocks: exact=15625" " -> Bitmap Index Scan on area_type_public_time_desc (cost=0.00..2013.34 rows=17386 width=0) (actual time=22.173..22.173 rows=18635 loops=1)" " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))" "Planning Time: 0.571 ms" "Execution Time: 112.108 ms" 还是触发了 Recheck |
7
reus 2019-03-24 11:47:14 +08:00
如果存储是 SSD,用 SET random_page_cost = 1; 看有没有改善。bitmap index scan 的目的是减少不必要的 page read,因为贵。但如果随机读 page 的成本低,planner 会倾向于直接读而不是用 bitmap index scan。
还有个可能的原因是物理存储位置太分散,也会用 bitmap index scan,试下 vacuum full notice |
8
encro 2019-06-19 16:33:58 +08:00
这个组合索引顺序有点问题,先 type,再 area,在这个查询里,area 索引应该失效了。
|
9
encro 2019-06-19 16:36:31 +08:00
area 因为是模糊查询,导致 public_time 也失效了。
|