Post

PostgreSQL并集操作 UNION 和 UNION ALL

并集操作 UNION 和 UNION ALL 异同

PostgreSQL并集操作 UNION 和 UNION ALL

PostgreSQL并集操作union和union all

UNION ALL:使用两个结果集,并将他们合并成为一个结果集;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
SELECT city_id,city FROM city WHERE country_id = 23
 city_id |     city
---------+---------------
      46 | Baicheng
      47 | Baiyin
      80 | Binzhou
     109 | Changzhou
     ... | ...
     502 | Suihua
     535 | Tianjin
     536 | Tiefa
     537 | Tieli
     540 | Tongliao
     574 | Weifang
     578 | Xiangfan
     579 | Xiangtan
     580 | Xintai
     581 | Xinxiang
     584 | Yantai
     587 | Yinchuan
     588 | Yingkou
     590 | Yuncheng
     591 | Yuzhou
     592 | Zalantun
     594 | Zaoyang
     599 | Zhoushan
(53 rows)
1
2
3
4
5
6
7
8
9
10
11
12
SELECT city_id,city FROM city WHERE city_id > 500;
city_id |         city
---------+----------------------
     501 | Sucre
     502 | Suihua
		 ... | ...
     596 | Zaria
     597 | Zeleznogorsk
     598 | Zhezqazghan
     599 | Zhoushan
     600 | Ziguinchor
(100 rows)

那么这两个集合做 UNION ALL 就是把他们合并在一起结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT city_id,city FROM city WHERE country_id = 23
UNION ALL
SELECT city_id,city FROM city WHERE city_id > 500;
 city_id |         city
---------+----------------------
      46 | Baicheng
      47 | Baiyin
      80 | Binzhou
     109 | Changzhou
     136 | Datong
     139 | Daxian
     145 | Dongying
     157 | Emeishan
     159 | Enshi
     166 | Ezhou
		 ... | ...
     502 | Suihua
		 ... | ...
     594 | Zaoyang
     599 | Zhoushan
     501 | Sucre
     502 | Suihua
	   ... | ...
     599 | Zhoushan
     600 | Ziguinchor
(153 rows)

可以看到 UNION ALL 就是简单的将两个结果集合并在了一起,甚至都没有进行排序,我们看一下它的执行计划:

1
2
3
4
5
6
7
Append  (cost=4.56..21.01 rows=153 width=13)
  ->  Bitmap Heap Scan on city  (cost=4.56..10.22 rows=53 width=13)
        Recheck Cond: (country_id = 23)
        ->  Bitmap Index Scan on idx_fk_country_id  (cost=0.00..4.55 rows=53 width=0)
              Index Cond: (country_id = 23)
  ->  Index Scan using city_pkey on city city_1  (cost=0.28..10.03 rows=100 width=13)
        Index Cond: (city_id > 500)

从执行计划可以看到,两个集合的合并仅仅通过 Append 完成,也就是把第二个结果集直接加到了第一个结果集后面;

UNION:实现UNION ALL,然后依据所有列来排列结果,并删除重复的行。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
SELECT city_id,city FROM city WHERE country_id = 23
UNION
SELECT city_id,city FROM city WHERE city_id > 500;
 city_id |         city
---------+----------------------
     502 | Suihua
     207 | Hohhot
     563 | Valle de Santiago
     240 | Jinchang
     136 | Datong
     462 | Sanya
     540 | Tongliao
     175 | Fuzhou
     595 | Zapopan
     572 | Vitria de Santo Anto
     553 | Uijongbu
     515 | Tabuk
     546 | Tsaotun
     516 | Tafuna
     598 | Zhezqazghan
     518 | Taizz
     469 | Shaoguan
     511 | Syrakusa
     241 | Jining
     249 | Junan
     537 | Tieli
     559 | Uttarpara-Kotrung
     543 | Toulon
     299 | Leshan
     575 | Witten
     510 | Syktyvkar
     594 | Zaoyang
     574 | Weifang
      47 | Baiyin
     525 | Tangail
     302 | Liaocheng
     532 | Tegal
     520 | Tallahassee
     577 | Wroclaw
     551 | Udine
     586 | Yerevan
     600 | Ziguinchor
     576 | Woodridge
     174 | Fuyu
     535 | Tianjin
     501 | Sucre
     549 | Tychy
     558 | Usolje-Sibirskoje
     503 | Sullana
     530 | Tartu
     560 | Vaduz
     199 | Hami
     568 | Vijayawada
     554 | Uluberia
     193 | Haining
     291 | Laiwu
     159 | Enshi
     517 | Taguig
     583 | Yangor
     581 | Xinxiang
     556 | Uruapan
     562 | Valle de la Pascua
     523 | Tanauan
     569 | Vila Velha
     547 | Tsuyama
     591 | Yuzhou
     593 | Zanzibar
     526 | Tanshui
     541 | Torren
     592 | Zalantun
     587 | Yinchuan
     550 | Udaipur
     565 | Vancouver
     512 | Szkesfehrvr
     573 | Warren
     579 | Xiangtan
     596 | Zaria
     588 | Yingkou
      46 | Baicheng
     578 | Xiangfan
     545 | Trshavn
     298 | Lengshuijiang
     527 | Tanza
     533 | Tel Aviv-Jaffa
     539 | Tonghae
     145 | Dongying
     597 | Zeleznogorsk
     585 | Yaound
     531 | Teboksary
     589 | York
     582 | Yamuna Nagar
     536 | Tiefa
     139 | Daxian
     524 | Tandil
     166 | Ezhou
     599 | Zhoushan
     561 | Valencia
     507 | Sungai Petani
     584 | Yantai
     566 | Varanasi (Benares)
     528 | Tarlac
     506 | Sumy
     407 | Pingxiang
     277 | Korla
     519 | Talavera
     534 | Tete
     505 | Sumqayit
     468 | Shanwei
     552 | Ueda
     544 | Toulouse
     529 | Tarsus
     432 | Rizhao
     293 | Laohekou
     555 | Urawa
     109 | Changzhou
     564 | Valparai
     513 | Tabora
     509 | Surakarta
     538 | Tokat
     571 | Vinh
     557 | Usak
     514 | Tabriz
     157 | Emeishan
     521 | Tama
     504 | Sultanbeyli
     590 | Yuncheng
     570 | Vilnius
     422 | Qinhuangdao
     362 | Nanyang
     542 | Touliu
     242 | Jinzhou
     508 | Sunnyvale
     580 | Xintai
     471 | Shenzhen
     507 | Sungai Petani
     584 | Yantai
     566 | Varanasi (Benares)
     528 | Tarlac
     506 | Sumy
     407 | Pingxiang
     277 | Korla
     519 | Talavera
     534 | Tete
     505 | Sumqayit
     468 | Shanwei
     552 | Ueda
     544 | Toulouse
     529 | Tarsus
     432 | Rizhao
     293 | Laohekou
     555 | Urawa
     109 | Changzhou
     564 | Valparai
     513 | Tabora
     509 | Surakarta
     538 | Tokat
     571 | Vinh
     557 | Usak
     514 | Tabriz
     157 | Emeishan
     521 | Tama
     504 | Sultanbeyli
     590 | Yuncheng
     570 | Vilnius
     422 | Qinhuangdao
     362 | Nanyang
     542 | Touliu
     242 | Jinzhou
     508 | Sunnyvale
     580 | Xintai
     471 | Shenzhen
     567 | Vicente Lpez
     210 | Huaian
      80 | Binzhou
     548 | Tuguegarao
     332 | Meixian
(135 rows)

我们直观的可以看到UNION返回的行数(135)小于UNION ALL返回的行数(153),而他们相差的数正好是第一个集合(country_id = 23)中 city_id > 500 的行数 18;也就是说UNION做了去重,我们看一下UNION的执行计划:

1
2
3
4
5
6
7
8
9
HashAggregate  (cost=21.78..23.31 rows=153 width=36)
  Group Key: city.city_id, city.city
  ->  Append  (cost=4.56..21.01 rows=153 width=36)
        ->  Bitmap Heap Scan on city  (cost=4.56..10.22 rows=53 width=13)
              Recheck Cond: (country_id = 23)
              ->  Bitmap Index Scan on idx_fk_country_id  (cost=0.00..4.55 rows=53 width=0)
                    Index Cond: (country_id = 23)
        ->  Index Scan using city_pkey on city city_1  (cost=0.28..10.03 rows=100 width=13)
              Index Cond: (city_id > 500)

可以看到 Append 往下,UNION和UINION ALL是完全一样的,不同的是UNION多了个HashAggregate。HashAggregate 完成去重(Unique/Distinct)的核心逻辑:

利用哈希表(Hash Table)在内存中记录已经出现过的数据行。

具体到上述查询(对 city_idcity 进行去重),它的工作流程如下:

  1. 建立哈希表:Postgres 在内存中开辟一块空间。
  2. 扫描数据:它从下方的 Append 节点(包含两个子查询的结果)依次读取每一行。
  3. 计算哈希值:对 Group Key(即 city_idcity 的组合)计算哈希值。
  4. 查重与插入
    • 拿着哈希值去哈希表中查找。
    • 如果没找到:说明这一行是第一次出现,将其存入哈希表。
    • 如果找到了:说明是重复行,直接丢弃(因为这是去重操作)。
  5. 输出结果:当所有数据扫描完毕,哈希表中留下的就是唯一的行,最后一次性输出。

但是我们从上面的结果可以看到一个现象,那就是原本有序的数据被打乱成无序数据了,这一点也是HashAggregate的重要特征:输入数据集合无需有序,输出结果也是无序的;

This post is licensed under CC BY 4.0 by the author.