Forcing index seeks with LATERAL joins
When a query filters a table by an IN (...) list of values, the PostgreSQL planner estimates how many rows the
predicate matches and chooses a plan from that estimate. When the estimate is wrong, the planner can choose a
sequential scan or hash join over an index scan and degrade the query by orders of magnitude. Rewriting the query as a
LATERAL join removes the planner’s option to make that choice, forcing one index seek per value regardless of the
row-count estimate.
Efficient IN operator queries. The pattern described here covers a narrower case: a single
filtered lookup where the planner mis-costs the IN predicate itself. It also requires a specific
sorting which in is not needed in some cases like DELETE queries.
Symptoms
Reach for this pattern when all of the following hold:
- A query filters a table on a column using
WHERE col IN (<list>)orWHERE col IN (SELECT ... LIMIT n), where the list of values is supplied at runtime (for example, a batch of parent IDs). - The filtered column has moderate cardinality: values that are common enough that the planner over-estimates how many rows match, but not so common that they appear in the most-common-values (MCV) statistics.
EXPLAIN (ANALYZE, BUFFERS)shows aSeq ScanorHash Joinwhere you expected anIndex Scan, often returning far fewer rows than the planner’s estimated row count.- The query is intermittently slow: fast for some value sets, multi-second for others, because the plan flips with the estimate.
A useful tell: the query reads many buffers and returns few (or zero) rows.
Why MATERIALIZED does not help
A common first attempt is to wrap the inner query in a CTE with MATERIALIZED to fence off the planner. This does not
help here, because the sequential scan is on the subquery’s own table access, not on a join the CTE boundary would
separate. Materializing the result of a seq scan still runs the seq scan.
Rewrite the query as a LATERAL join
Restructure the query so each value drives its own index seek through a LATERAL join over a VALUES list:
Before: IN (...) subquery
DELETE FROM "issues"
WHERE ("issues"."id") IN (
SELECT "issues"."id" FROM "issues"
WHERE "issues"."project_id" IN (1, 2, 3, ...)
LIMIT 1000
)The planner is free to satisfy the inner SELECT with a sequential scan when it mis-estimates the project_id IN (...)
selectivity.
After: LATERAL join
DELETE FROM "issues"
WHERE ("issues"."id") IN (
SELECT "lateral_rows"."id"
FROM (VALUES (1), (2), (3), ...) AS parent("project_id"),
LATERAL (
SELECT "issues"."id" FROM "issues"
WHERE "issues"."project_id" = "parent"."project_id"
LIMIT 1000
) lateral_rows
LIMIT 1000
)Because the inner SELECT now filters on a single project_id value (= "parent"."project_id"), the planner has no
multi-value estimate to get wrong: an equality predicate on an indexed column resolves to an index seek. The list is
evaluated once per value through the LATERAL join.
Preserving limits
- The outer
LIMITpreserves the original cap on the total number of rows processed across all values. - An inner
LIMITbounds the work done per value, so a single hot value cannot dominate the batch.
When not to use this pattern
This is a targeted fix, not a default. Prefer the plain IN form when:
- The list is small. With a handful of values, the per-value
LATERALinvocation can cost more than a single set-based scan. The pattern pays off when the value list is large enough that a misestimatedINpredicate flips to a seq scan. - The
INform already produces a stable index-scan plan. Verify withEXPLAINon real data (using Database Lab) before rewriting. Do not apply it speculatively. - The column is genuinely low-selectivity, so most rows match. If a seq scan is actually the right plan, forcing per-value seeks is slower.
Forcing a plan removes the planner’s ability to adapt, so confirm the rewrite wins on representative data and at the batch sizes you use in production.
Examples
LooseForeignKeys::CleanerServicerewrote its child-row lookup from anIN (...)predicate to aLATERALjoin to stop the loose-foreign-keys cleaner from regressing to sequential scans on high-traffic child tables (for exampleci_build_names). See merge request !235721 and the example plans linked from it.- The same structural fix was applied earlier to the
commit_shas_from_metadatalookup. See merge request !239187.