PostgreSQL并集操作 UNION 和 UNION ALL
并集操作 UNION 和 UNION ALL 异同
PostgreSQL并集操作 UNION 和 UNION ALL
PostgreSQL并集操作union和union all
UNION ALL:使用两个结果集,并将他们合并成为一个结果集;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
SELECT city_id,city FROM city WHERE country_id = 23
city_id | city
---------+---------------
46 | Baicheng
47 | Baiyin
80 | Binzhou
109 | Changzhou
... | ...
502 | Suihua
535 | Tianjin
536 | Tiefa
537 | Tieli
540 | Tongliao
574 | Weifang
578 | Xiangfan
579 | Xiangtan
580 | Xintai
581 | Xinxiang
584 | Yantai
587 | Yinchuan
588 | Yingkou
590 | Yuncheng
591 | Yuzhou
592 | Zalantun
594 | Zaoyang
599 | Zhoushan
(53 rows)
1
2
3
4
5
6
7
8
9
10
11
12
SELECT city_id,city FROM city WHERE city_id > 500;
city_id | city
---------+----------------------
501 | Sucre
502 | Suihua
... | ...
596 | Zaria
597 | Zeleznogorsk
598 | Zhezqazghan
599 | Zhoushan
600 | Ziguinchor
(100 rows)
那么这两个集合做 UNION ALL 就是把他们合并在一起结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT city_id,city FROM city WHERE country_id = 23
UNION ALL
SELECT city_id,city FROM city WHERE city_id > 500;
city_id | city
---------+----------------------
46 | Baicheng
47 | Baiyin
80 | Binzhou
109 | Changzhou
136 | Datong
139 | Daxian
145 | Dongying
157 | Emeishan
159 | Enshi
166 | Ezhou
... | ...
502 | Suihua
... | ...
594 | Zaoyang
599 | Zhoushan
501 | Sucre
502 | Suihua
... | ...
599 | Zhoushan
600 | Ziguinchor
(153 rows)
可以看到 UNION ALL 就是简单的将两个结果集合并在了一起,甚至都没有进行排序,我们看一下它的执行计划:
1
2
3
4
5
6
7
Append (cost=4.56..21.01 rows=153 width=13)
-> Bitmap Heap Scan on city (cost=4.56..10.22 rows=53 width=13)
Recheck Cond: (country_id = 23)
-> Bitmap Index Scan on idx_fk_country_id (cost=0.00..4.55 rows=53 width=0)
Index Cond: (country_id = 23)
-> Index Scan using city_pkey on city city_1 (cost=0.28..10.03 rows=100 width=13)
Index Cond: (city_id > 500)
从执行计划可以看到,两个集合的合并仅仅通过 Append 完成,也就是把第二个结果集直接加到了第一个结果集后面;
UNION:实现UNION ALL,然后依据所有列来排列结果,并删除重复的行。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
SELECT city_id,city FROM city WHERE country_id = 23
UNION
SELECT city_id,city FROM city WHERE city_id > 500;
city_id | city
---------+----------------------
502 | Suihua
207 | Hohhot
563 | Valle de Santiago
240 | Jinchang
136 | Datong
462 | Sanya
540 | Tongliao
175 | Fuzhou
595 | Zapopan
572 | Vitria de Santo Anto
553 | Uijongbu
515 | Tabuk
546 | Tsaotun
516 | Tafuna
598 | Zhezqazghan
518 | Taizz
469 | Shaoguan
511 | Syrakusa
241 | Jining
249 | Junan
537 | Tieli
559 | Uttarpara-Kotrung
543 | Toulon
299 | Leshan
575 | Witten
510 | Syktyvkar
594 | Zaoyang
574 | Weifang
47 | Baiyin
525 | Tangail
302 | Liaocheng
532 | Tegal
520 | Tallahassee
577 | Wroclaw
551 | Udine
586 | Yerevan
600 | Ziguinchor
576 | Woodridge
174 | Fuyu
535 | Tianjin
501 | Sucre
549 | Tychy
558 | Usolje-Sibirskoje
503 | Sullana
530 | Tartu
560 | Vaduz
199 | Hami
568 | Vijayawada
554 | Uluberia
193 | Haining
291 | Laiwu
159 | Enshi
517 | Taguig
583 | Yangor
581 | Xinxiang
556 | Uruapan
562 | Valle de la Pascua
523 | Tanauan
569 | Vila Velha
547 | Tsuyama
591 | Yuzhou
593 | Zanzibar
526 | Tanshui
541 | Torren
592 | Zalantun
587 | Yinchuan
550 | Udaipur
565 | Vancouver
512 | Szkesfehrvr
573 | Warren
579 | Xiangtan
596 | Zaria
588 | Yingkou
46 | Baicheng
578 | Xiangfan
545 | Trshavn
298 | Lengshuijiang
527 | Tanza
533 | Tel Aviv-Jaffa
539 | Tonghae
145 | Dongying
597 | Zeleznogorsk
585 | Yaound
531 | Teboksary
589 | York
582 | Yamuna Nagar
536 | Tiefa
139 | Daxian
524 | Tandil
166 | Ezhou
599 | Zhoushan
561 | Valencia
507 | Sungai Petani
584 | Yantai
566 | Varanasi (Benares)
528 | Tarlac
506 | Sumy
407 | Pingxiang
277 | Korla
519 | Talavera
534 | Tete
505 | Sumqayit
468 | Shanwei
552 | Ueda
544 | Toulouse
529 | Tarsus
432 | Rizhao
293 | Laohekou
555 | Urawa
109 | Changzhou
564 | Valparai
513 | Tabora
509 | Surakarta
538 | Tokat
571 | Vinh
557 | Usak
514 | Tabriz
157 | Emeishan
521 | Tama
504 | Sultanbeyli
590 | Yuncheng
570 | Vilnius
422 | Qinhuangdao
362 | Nanyang
542 | Touliu
242 | Jinzhou
508 | Sunnyvale
580 | Xintai
471 | Shenzhen
507 | Sungai Petani
584 | Yantai
566 | Varanasi (Benares)
528 | Tarlac
506 | Sumy
407 | Pingxiang
277 | Korla
519 | Talavera
534 | Tete
505 | Sumqayit
468 | Shanwei
552 | Ueda
544 | Toulouse
529 | Tarsus
432 | Rizhao
293 | Laohekou
555 | Urawa
109 | Changzhou
564 | Valparai
513 | Tabora
509 | Surakarta
538 | Tokat
571 | Vinh
557 | Usak
514 | Tabriz
157 | Emeishan
521 | Tama
504 | Sultanbeyli
590 | Yuncheng
570 | Vilnius
422 | Qinhuangdao
362 | Nanyang
542 | Touliu
242 | Jinzhou
508 | Sunnyvale
580 | Xintai
471 | Shenzhen
567 | Vicente Lpez
210 | Huaian
80 | Binzhou
548 | Tuguegarao
332 | Meixian
(135 rows)
我们直观的可以看到UNION返回的行数(135)小于UNION ALL返回的行数(153),而他们相差的数正好是第一个集合(country_id = 23)中 city_id > 500 的行数 18;也就是说UNION做了去重,我们看一下UNION的执行计划:
1
2
3
4
5
6
7
8
9
HashAggregate (cost=21.78..23.31 rows=153 width=36)
Group Key: city.city_id, city.city
-> Append (cost=4.56..21.01 rows=153 width=36)
-> Bitmap Heap Scan on city (cost=4.56..10.22 rows=53 width=13)
Recheck Cond: (country_id = 23)
-> Bitmap Index Scan on idx_fk_country_id (cost=0.00..4.55 rows=53 width=0)
Index Cond: (country_id = 23)
-> Index Scan using city_pkey on city city_1 (cost=0.28..10.03 rows=100 width=13)
Index Cond: (city_id > 500)
可以看到 Append 往下,UNION和UINION ALL是完全一样的,不同的是UNION多了个HashAggregate。HashAggregate 完成去重(Unique/Distinct)的核心逻辑:
利用哈希表(Hash Table)在内存中记录已经出现过的数据行。
具体到上述查询(对 city_id 和 city 进行去重),它的工作流程如下:
- 建立哈希表:Postgres 在内存中开辟一块空间。
- 扫描数据:它从下方的
Append节点(包含两个子查询的结果)依次读取每一行。 - 计算哈希值:对
Group Key(即city_id和city的组合)计算哈希值。 - 查重与插入:
- 拿着哈希值去哈希表中查找。
- 如果没找到:说明这一行是第一次出现,将其存入哈希表。
- 如果找到了:说明是重复行,直接丢弃(因为这是去重操作)。
- 输出结果:当所有数据扫描完毕,哈希表中留下的就是唯一的行,最后一次性输出。
但是我们从上面的结果可以看到一个现象,那就是原本有序的数据被打乱成无序数据了,这一点也是HashAggregate的重要特征:输入数据集合无需有序,输出结果也是无序的;
This post is licensed under
CC BY 4.0
by the author.