単純にやると group by して最大値の値を取り出してそれで where するという 2 段階な考え方
window 関数がいいよっという紹介をみたので 調べてみるとグループごとにソートしてランクとか順番のデータの列を作ることができるみたい
便利そうだけど対象テーブルの行全部にランクの列を追加した一時テーブル作ってそこから検索ってあんまり速くなさそう
気になったので実際に試してみた
データは 30000 件ちょっと
id 列が主キーで group_id 列はインデックスあり
PARTITION BY
explain analyze
select "id", "group_id"
from (select "id", "group_id", row_number() OVER (PARTITION BY group_id ORDER BY id DESC) as n from "tbl") as "sub"
where "n" = 1
"Subquery Scan on sub (cost=3678.01..4868.84 rows=183 width=8) (actual time=29.683..41.953 rows=3 loops=1)"
" Filter: (sub.n = 1)"
" Rows Removed by Filter: 36705"
" -> WindowAgg (cost=3678.01..4410.83 rows=36641 width=16) (actual time=29.681..40.401 rows=36708 loops=1)"
" -> Sort (cost=3678.01..3769.61 rows=36641 width=8) (actual time=29.671..31.515 rows=36708 loops=1)"
" Sort Key: tbl.group_id, tbl.id DESC"
" Sort Method: quicksort Memory: 3087kB"
" -> Seq Scan on tbl (cost=0.00..900.41 rows=36641 width=8) (actual time=0.024..7.945 rows=36708 loops=1)"
"Planning time: 0.178 ms"
"Execution time: 42.396 ms"
MAX
explain analyze
select * from tbl where id = ANY (select max(id) from tbl group by group_id)
"Nested Loop (cost=1083.97..1108.64 rows=3 width=80) (actual time=15.348..15.352 rows=3 loops=1)"
" -> HashAggregate (cost=1083.68..1083.71 rows=3 width=4) (actual time=15.331..15.332 rows=3 loops=1)"
" Group Key: max(tbl_1.id)"
" -> HashAggregate (cost=1083.62..1083.64 rows=3 width=8) (actual time=15.327..15.328 rows=3 loops=1)"
" Group Key: tbl_1.group_id"
" -> Seq Scan on tbl tbl_1 (cost=0.00..900.41 rows=36641 width=8) (actual time=0.020..5.036 rows=36708 loops=1)"
" -> Index Scan using tbl_pkey on tbl (cost=0.29..8.31 rows=1 width=80) (actual time=0.005..0.005 rows=1 loops=3)"
" Index Cond: (id = (max(tbl_1.id)))"
"Planning time: 0.410 ms"
"Execution time: 15.421 ms"
plan 的には PARTITION BY のほうが速いけど実際の結果は MAX のほうが速かった
何回かやっても一緒
短いし速いしなら MAX でよさそう