2026 · Vancouver
Simon Fraser University — Vancouver Campus

Row Pattern Recognition Deep Dive

A guided tour through ISO/IEC 19075-5 · SQL R020 in PostgreSQL — what we built, what's still unwritten, and how it actually runs.

Companion to the poster on display at PGConf.dev 2026.
Skim the standard, walk through worked examples, browse the design, then drive a live NFA simulator with your own patterns.
Discussion: pgsql-hackers thread · latest patch (v47) · commitfest #4460.
Tatsuo Ishii — SRA OSS K.K.  ishii@postgresql.org
Henson Choi — INZENT CO., LTD.  assam@inzent.com


§1. The Standard, the Subset, and the Open Questions

1.1 Standard scope

Row Pattern Recognition was introduced into SQL by ISO/IEC 9075-2:2016 and is described in the companion document ISO/IEC 19075-5:2021 (Part 5, "Row Pattern Recognition").

The standard defines two surfaces for the same underlying machinery. Feature R010 places a MATCH_RECOGNIZE clause in the FROM list to produce a derived table; Feature R020 folds the same pattern logic into a WINDOW specification to produce window-function output. The two surfaces share the bulk of their syntax and all of their semantics, but they are functionally distinct entry points — a database may implement either one or both.

The patch series under discussion here implements a subset of R020 only; the table-clause form is deliberately out of scope.

The R020 surface is small but expressive. A query attaches a pattern to a window by adding three clauses inside the window specification: PATTERN describes a regular expression over named pattern variables, DEFINE supplies the Boolean predicate that identifies each variable, and AFTER MATCH SKIP controls how successive matches are positioned within the partition.
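To make that concrete before the worked examples, here is a minimal sketch of the R020 surface. The table and column names are the hypothetical stock(company, tdate, price) table that §2.3 uses, and the clause order follows that example:

SELECT company, tdate, price,
  first_value(price) OVER w AS run_start_price
FROM stock
WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (LOW UP+)
  DEFINE
    LOW AS price <= 100,
    UP  AS price > PREV(price)
);

PATTERN names a run of rows (one LOW row followed by one or more UP rows), DEFINE says what LOW and UP mean, and AFTER MATCH SKIP PAST LAST ROW resumes the search after each match. The ordinary window function first_value(price) then reports, at the row that starts each match, the price the run began at; rows inside a match but not at its start produce NULL, as §2.3 traces in detail.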

Optional MEASURES, SUBSET, INITIAL/SEEK, and the auxiliary CLASSIFIER() function round out the feature. (The standard's MATCH_NUMBER() function belongs to the R010 surface only — 19075-5 §6.3.3 (6) specifically excludes it from R020.)

The patch covers enough of this vocabulary to make non-trivial queries work, but stops short of several constructs whose implementation depends on infrastructure that has not yet been built.

The remainder of this section partitions the standard's vocabulary into what the patch already supports and what it intentionally leaves for later. The lists below reflect the current state of the patch series.

1.2 Implemented features

The implemented vocabulary is sufficient to express the canonical V-shape, W-shape, and reversal patterns that motivate row pattern recognition in the first place. It also covers every standard quantifier — including all seven reluctant variants *?, +?, ??, {n}?, {n,}?, {,m}?, {n,m}? — and the four navigation primitives that DEFINE conditions need to compare adjacent rows.
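As an illustration of that quantifier vocabulary (a sketch only, with SPIKE, COOL, and CALM as invented variable names over the same stock table used in §2.3), a bounded quantifier and a reluctant one can sit in the same pattern:

PATTERN (SPIKE{1,3} COOL*? CALM)
DEFINE
  SPIKE AS price > PREV(price),
  COOL  AS price < PREV(price),
  CALM  AS volume < 1000
-- SPIKE{1,3}: one to three rising rows
-- COOL*?: zero or more falling rows, as few as possible (reluctant);
--         because a falling, low-volume row satisfies both COOL and CALM,
--         the reluctant quantifier hands over to CALM at the first opportunity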

PATTERN clause
Defines the row pattern inside a window specification.
Regex: + * ? |
Standard quantifiers and alternation.
Regex: ( ) grouping
Parenthesised sub-patterns.
Regex: {n} {n,} {,m} {n,m}
Bounded repetition counts.
Reluctant *? +? ?? {n}? {n,}? {,m}? {n,m}?
All seven reluctant (non-greedy) variants the standard defines (excluded from absorption optimization).
DEFINE clause
Boolean conditions identifying each pattern variable.
Universal row pattern variable
Unqualified column references in DEFINE (e.g. bare Price) resolve to the current row, per 19075-5 §4.10/§4.16.
PREV, NEXT (with offset)
Reach n rows before/after the current row (default 1); the inner argument is an arbitrary value expression evaluated at the navigated row.
FIRST, LAST (with offset)
Reach a match-relative row; FIRST(expr, n) targets the row n rows after the match start, LAST(expr, n) the row n rows before the most recent match row.
Compound navigation (four forms)
Outer PREV/NEXT step layered onto an inner FIRST/LAST: PREV(FIRST(expr)), NEXT(FIRST(expr)), PREV(LAST(expr)), NEXT(LAST(expr)) — both the inner and outer step accept their own offset.
INITIAL
Pattern matches must start at the current row (default).
AFTER MATCH SKIP PAST LAST ROW
Default skip mode; eligible for context absorption.
AFTER MATCH SKIP TO NEXT ROW
Allows overlapping matches; disables absorption.

1.3 Not implemented

The features that remain unimplemented fall into three loose groupings. The first — and by far the largest — is the family of constructs centred on MEASURES: the MEASURES clause itself, SUBSET, CLASSIFIER(), pattern-qualified column references such as B.price, and pattern-qualified navigation such as LAST(B.price) or PREV(B.price).

These are not independent gaps so much as different views of a single missing piece: the executor would have to retain a per-match history — a record, for every row in the match, of which pattern variable it was mapped to — and none of these constructs has a meaningful definition without it. CLASSIFIER() reads the variable name out of that record; B.price reads the price column of the rows whose record entry says B; LAST(B.price) walks the same set of rows and picks the last one; SUBSET U = (A, B) defines a virtual variable that aggregates over both buckets; and MEASURES evaluates expressions like AVG(U.Price) using exactly that aggregation.

The current executor evaluates DEFINE predicates row by row but does not record the resulting variable assignments anywhere — there is nothing to query later. Building the history infrastructure is therefore the prerequisite for the entire group; the individual features fall out as small projections once the records exist.

The second grouping concerns alternative skip targets: AFTER MATCH SKIP TO label, AFTER MATCH SKIP TO FIRST var, and AFTER MATCH SKIP TO LAST var. These also depend on match history, since the executor must be able to point at a specific labelled row inside the most recent match.

The remaining items are a heterogeneous tail: the SEEK keyword (the alternative to INITIAL), the empty pattern (), the exclusion form {- … -}, and the order-insensitive PERMUTE operator.

MEASURES
Named per-match output expressions, e.g. MEASURES LAST(B.Price) AS Bottom, AVG(U.Price) AS Avg; in R020 surfaced via OVER like a window function. Accepts the FINAL / RUNNING keywords (19075-5 §5.4), which in R020 collapse to the same value since measures are always evaluated on the last row of the match (19075-5 §6.9 (3)).
SUBSET
Named unions of pattern variables, e.g. SUBSET U = (A, B, C). Usable wherever a pattern variable can be referenced — MEASURES, pattern-qualified refs in DEFINE, CLASSIFIER(U) — all of which need match history.
CLASSIFIER()
Returns which pattern variable a given row was matched as. Defined for both DEFINE and MEASURES (19075-5 §5.9); the answer at any row is the variable name recorded in the match history for that row.
Pattern-qualified column reference
Bare B.price in DEFINE or MEASURES — the value of the column from the row mapped as the named variable.
Pattern-qualified navigation
Restrict navigation to rows matched as a named variable, e.g. LAST(B.price), FIRST(B.price), PREV(B.price), NEXT(B.price).
SEEK
Match may start anywhere in the window (vs. INITIAL).
AFTER MATCH SKIP TO label
Skip to a labelled row.
AFTER MATCH SKIP TO FIRST var
Skip to the first row matched as the named variable.
AFTER MATCH SKIP TO LAST var
Skip to the last row matched as the named variable.
Regex: {- -}
Exclusion — removes matched rows from the per-match output.
Regex: ()
Empty pattern — matches the empty sequence.
PERMUTE(...)
Order-insensitive matching across listed variables.
Aggregate functions in DEFINE
Aggregates such as SUM, AVG, COUNT are not permitted in DEFINE predicates. The standard allows them as running aggregates over the partial match (per-row evaluation against rows already mapped to a variable), which depends on the same per-match history that MEASURES/CLASSIFIER() require.

Four further points are worth noting here, since they are easy to mistake for omissions.

The first is that the pattern anchors ^ and $ are not permitted with RPR in the WINDOW clause by the standard itself (19075-5 §6.13: "the anchors (^ and $) are not permitted with row pattern matching in windows"; with the underlying definition at 19075-5 §4.14.1); their absence is conformance, not a gap.

The second is that MATCH_NUMBER() is likewise excluded from R020 by the standard (19075-5 §6.3.3 (6) and 19075-5 §6.9 (1)) — it is part of the R010 surface only, and its absence here is again conformance rather than a missing feature.

The third is a set of structural prohibitions the standard imposes on R020 (19075-5 §6.17): row pattern recognition cannot be nested within another row pattern recognition; MEASURES and DEFINE may not contain outer references; subqueries inside MEASURES or DEFINE may not reference row pattern variables; and RPR may not be used inside a recursive query.

The fourth is that MATCH_RECOGNIZE itself — Feature R010, the table-clause surface of the same machinery — is out of scope for this patch. Whether PostgreSQL eventually adds R010 will be a question for a future series, not a measure of this one.

Provenance. The implemented and not-implemented lists in this section reflect the current state of the patch series — specifically v47 (2026-05-02).

§2. Examples — How It Actually Runs

The examples in this section build up incrementally. We start with the conceptual reason that row-pattern matching is genuinely different from string-pattern matching, then introduce the four navigation functions that make DEFINE conditions interesting, and finally walk through three end-to-end traces: a single match (NFA movement), context absorption in the easy case, and the case where absorption becomes unsafe.

The internal mechanism that makes navigation cheap — the 1-slot tuple swap — belongs to the executor's design and is covered in the next section, not here.

2.1 Why row patterns differ from text patterns

A text regular expression matches a stream of characters, and at every position there is exactly one symbol to consider. The runtime question — "does the current symbol equal 'A'?" — has a single yes/no answer. Backtracking automata exploit this: at most one branch survives per character, and the cost of ambiguity is paid by re-reading the input.

A row pattern is different in a fundamental way. A row is not a single symbol; it is a tuple of columns evaluated against a set of Boolean predicates, the DEFINE conditions. Two predicates can be true of the same row at once, and nothing in the standard says they must be mutually exclusive. Consider:

DEFINE
  A AS price > 100,
  B AS volume > 1000,
  C AS price > 200

A row with price = 150, volume = 2000 satisfies both A and B but not C. The pattern matcher cannot collapse this to a single state — two pattern variables are live, and any pattern element that names either of them may advance. The NFA must therefore carry multiple simultaneous states per partition row, not one.

[Figure: text regex vs. row pattern. A text regex advances a single state through the symbols A, B, C, one symbol per position and one transition each; a row pattern evaluates one row (price = 150, volume = 2000) against several DEFINE predicates at once, satisfying A (price > 100) and B (volume > 1000) but not C (price > 200), so two states stay alive.]
Text regex collapses to one state per symbol; a row can satisfy several DEFINE predicates at once, so multiple states stay alive in parallel.

This single observation drives the rest of the architecture. The execution state is a list of contexts, each context is a list of states, and at every row the runtime asks of every state, "given the variables that are true here, where can I go?" The forks that result are why the runtime needs several layers of pruning — state deduplication within a context, absorption across contexts, and the other rules surveyed in §3.6 — without which the number of live states and contexts grows with the partition and pattern matching becomes super-linear.

2.2 Navigation functions

DEFINE conditions that reference only the current row are limited; almost every interesting pattern needs to compare the current row to its neighbours or to the rows already accepted earlier in the match. The standard provides four navigation functions for this, and the patch implements all of them.

PREV(expr [, n])
Evaluates expr at the row n rows before the current row (default n = 1). Used for "today vs. yesterday" comparisons.
NEXT(expr [, n])
Evaluates expr at the row n rows after the current row. Forward look-ahead; uncommon in window form because the window is monotone.
FIRST(expr [, n])
Evaluates expr at the n-th matched row of the current match, counting from the start. "Compare to the row that started this match."
LAST(expr [, n])
Evaluates expr at the n-th most recently matched row. "Compare to the most recent matched row."

The four primitives compose: PREV and NEXT can wrap a FIRST or LAST call, giving four compound forms whose semantics are "go to a match-relative row, then step a fixed distance away from it." This is what allows a DEFINE expression to read, for example, the row immediately before the match started.

PREV(FIRST(expr [, n]) [, m])
Step m rows back from the match's n-th row (default m = 1). "What was the value just before this match began?"
NEXT(FIRST(expr [, n]) [, m])
Step m rows forward from the match's n-th row. Reaches further into the match without depending on the current row.
PREV(LAST(expr [, n]) [, m])
Step m rows back from the n-th most recently matched row. "Compare to a row shortly before the latest match."
NEXT(LAST(expr [, n]) [, m])
Step m rows forward from the n-th most recently matched row.

Two practical points are worth making here. First, navigation can be compounded: PREV(FIRST(price)) evaluates its argument at the row immediately before the match started, and the patch supports this. Second, navigation has consequences for the executor's optimizations. PREV and NEXT are relative to the current row and are always safe; FIRST and offset variants of LAST are relative to the match boundaries, which interacts with context absorption and forces the runtime to keep some contexts alive that it would otherwise discard. The mechanism behind that interaction is the topic of §2.6.
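A hedged sketch of what compound navigation buys in practice, with ABOVE as an invented variable name and §2.3's stock table assumed:

PATTERN (ABOVE+)
DEFINE
  -- every row of the run must close above the price of the row
  -- immediately before the run started: FIRST(price) is the match's
  -- first row, and the outer PREV(...) steps one row further back
  ABOVE AS price > PREV(FIRST(price))

Because the condition depends on FIRST, this is also exactly the kind of DEFINE that §2.6 shows to be absorption-unsafe.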

2.3 Worked example 1 — NFA movement

Goal of this example: Show the row-by-row NFA state evolution on a simple, non-absorbing pattern. No optimizations are involved; the trace is what the runtime would do without any of them.

SELECT company, tdate, price,
  first_value(price) OVER w AS start_price,
  last_value(price)  OVER w AS end_price
FROM stock
WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW
       AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (START UP+ DOWN+)
  DEFINE
    UP   AS price > PREV(price),
    DOWN AS price < PREV(price)
);

The pattern flattens to four elements:

[0] START  quant 1..1   next → 1
[1] UP     quant 1..∞   next → 2
[2] DOWN   quant 1..∞   next → 3
[3] #FIN

For the price series 100, 110, 120, 115, 108, 130:

Row | Price | True vars   | Action
0   | 100   | START       | New context. START matches and immediately exits (max=1); state advances to UP+.
1   | 110   | START, UP   | UP matches. Advance forks: one state loops at UP, another exits to DOWN+.
2   | 120   | START, UP   | UP matches in the looping state and forks again. The DOWN state from row 1 dies (120 ≮ 110).
3   | 115   | START, DOWN | UP fails on the looping state and dies. The DOWN state from row 2 matches. Sole live state.
4   | 108   | START, DOWN | DOWN matches. Advance forks: loop at DOWN, and exit to #FIN — the FIN state is a match candidate over rows 0–4.
5   | 130   | START, UP   | The looping DOWN state fails (130 ≮ 108). With no other live state, the FIN candidate is finalized as the match. A fresh context starts at row 5 and advances to UP+ but never sees a DOWN before partition end.

The key point: at row 3, the row satisfies both START (always true) and DOWN, but the only state that survived row 2 is on the DOWN-exit branch, so only the UP → DOWN transition is taken. The multi-state nature of §2.1 is visible as fan-out at every UP match (state could stay at UP+ or advance toward DOWN+). Greedy preferment is "loop again before exit", so under enough data the looping branches would extend the match further; here the looping DOWN dies at row 5 (130 ≮ 108), and the earlier FIN candidate (rows 0–4) — produced when DOWN exited at row 4 — is finalized as the match.

The query's result follows directly from this trace. Under RPR semantics, the window functions first_value(price) and last_value(price) are evaluated only at the row that started the match — every other row in the match produces NULL for these window functions, since its reduced frame is empty. The output for our data therefore looks like the table the poster shows in its top-right panel:

Row | Price | start_price | end_price
0   | 100   | 100         | 108
1   | 110   | NULL        | NULL
2   | 120   | NULL        | NULL
3   | 115   | NULL        | NULL
4   | 108   | NULL        | NULL
5   | 130   | NULL        | NULL

Row 0 is the match start, so its frame covers rows 0–4 and the window functions report the V-shape's opening price ($100) and bottom price ($108). Rows 1–4 are inside the match but not its start, so they receive NULL. Row 5 (price $130) is outside any match and likewise receives NULL.

2.4 Worked example 2 — Alternation and lexical order

Goal of this example: Show how a single context carries multiple parallel states when one row satisfies more than one alternative — and how the standard breaks ties.

The (A | B) form gives the matcher a choice: at any row, the two alternatives are tested independently, and any number of them may match. This is where the multi-state nature of §2.1 becomes visible inside a single context — not across contexts (that is absorption), but along parallel branches the executor explores in lockstep.

PATTERN ((UP | HIGH) DONE)
DEFINE
  UP   AS price > PREV(price),
  HIGH AS price > 100,
  DONE AS volume > 1000

Imagine a row where the price has both risen and exceeded $100 — both UP and HIGH are true. Each alternative spawns its own state: one walking the UP branch, one walking the HIGH branch. They proceed in parallel until DONE resolves them.

Row | True vars | Live states
R   | UP, HIGH  | one state on the UP branch, one on the HIGH branch — both at "next: DONE"
R+1 | DONE      | both states reach #FIN at the same row

Both branches produce a match of the same length over the same rows, leaving the matcher with two candidate paths — one labelled UP, DONE and one labelled HIGH, DONE. The standard resolves this by lexical order: among the alternatives written in (UP | HIGH), the one written first wins, regardless of match length. Because UP appears before HIGH, the surviving path is UP, DONE.

Lexical order is what makes CLASSIFIER() well-defined when it is eventually implemented — it lets the runtime tell the user "this row matched UP, not HIGH" even when both predicates were true. Lexical order is the primary rule for alternation: a lex-earlier branch wins over a lex-later one even if the lex-later branch would produce a longer match, and a lex-later (possibly shorter) branch can still win if every lex-earlier branch dies before reaching FIN. Greedy length is decided within a single quantifier — given two completions of the same alternation branch, the greedy quantifier prefers more iterations.
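For reference, the fragment above drops into a complete query the same way as the §2.3 example. This is a sketch, with the output routed through an ordinary window function because MEASURES and CLASSIFIER() are not implemented:

SELECT company, tdate, price, volume,
  first_value(price) OVER w AS entry_price
FROM stock
WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN ((UP | HIGH) DONE)
  DEFINE
    UP   AS price > PREV(price),
    HIGH AS price > 100,
    DONE AS volume > 1000
);

Which branch won stays internal to the matcher; reporting it would take CLASSIFIER(), which §1.3 lists as not yet implemented.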

2.5 Worked example 3 — Context absorption (same fate)

Goal of this example: Show why the naive strategy of keeping one live context per potential match start costs O(n²) work over a partition, and how absorption collapses it to a single context.

The simplest pattern that exhibits absorption is PATTERN (A+) with DEFINE A AS TRUE. Every row matches A, and the standard requires the matcher to consider every row as a possible match start. Naively, this means N contexts for an N-row partition. Take a five-row partition (any data — the predicate is constant true):

Row | Naive contexts                     | With absorption
0   | C1[A:1]                            | C1[A:1]
1   | C1[A:2], C2[A:1]                   | C1[A:2] — C2 absorbed
2   | C1[A:3], C2[A:2], C3[A:1]          | C1[A:3]
3   | C1[A:4], C2[A:3], C3[A:2], C4[A:1] | C1[A:4]
4   | five contexts                      | C1[A:5]

The number of live contexts drops from O(n) to O(1). The absorption rule that justifies the discard is straightforward when the quantifier is unbounded:

Absorption rule. Two contexts whose live state is at the same pattern element, where the older context has a count ≥ the newer one, have identical futures under an unbounded quantifier. The newer context can be discarded; whatever match it would eventually find, the older context will find a longer or equal one.

The poster's lower-left panel ("① Context Absorption") is exactly this rule visualised over five rows.

A subtle but important point hides in the rule: the discard is safe because the predicate A AS TRUE evaluates to the same value at every row regardless of which context is asking. The same is true of any predicate that references only the current row or a fixed offset from it — including PREV. The next example switches to a concrete week of prices, using a PREV-based predicate, and §2.6 will reuse exactly that week to make the symmetry between safe and unsafe absorption obvious:

PATTERN (RISE+)
DEFINE RISE AS price > PREV(price)

Take the trading week $100, $108, $112, $116, $110 from Monday through Friday — four rises followed by a sudden drop. Suppose C1 starts on Tuesday (the first day where RISE matches: $108 > $100), and the executor speculatively also tracks C2 starting on Wednesday. Each row's RISE condition compares the row's price to its immediate predecessor — a partition-level fact, not a context-level one — so the two contexts are forced to compute the same Boolean at every row they share:

Day | Price | C1 — Tue start: price > PREV(price) | C2 — Wed start: price > PREV(price)
Mon | $100  |                                      |
Tue | $108  | $108 > $100 ✓ (just started)         |
Wed | $112  | $112 > $108 ✓                        | $112 > $108 ✓ (just started)
Thu | $116  | $116 > $112 ✓                        | $116 > $112 ✓
Fri | $110  | $110 > $116 ✗ — finalize             | $110 > $116 ✗ — finalize

Same fate: at every row C1 and C2 share, they evaluate identical expressions and produce identical results — and on Friday they die at the very same comparison. Whatever future C2 has, C1 has too. Absorbing C2 into C1 throws nothing away.

The story breaks the instant the predicate begins to depend on each context's own boundaries — and that is what §2.6 is about.

2.6 Worked example 4 — When absorption becomes unsafe

Goal of this exampleShow what changes when DEFINE references FIRST — the absorption rule no longer holds, and the runtime must keep contexts alive.

Suppose an analyst wants to find consecutive trading days where a stock stayed within ten dollars of the day the run started — a kind of consolidation window. The pattern and DEFINE look like this:

PATTERN (STABLE+)
DEFINE STABLE AS price < FIRST(price) + 10
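
In full query form (a sketch over the same stock table, with the match reported through the window functions the patch already supports), the consolidation window reads:

SELECT company, tdate, price,
  first_value(price) OVER w AS run_start_price,
  last_value(price)  OVER w AS run_end_price
FROM stock
WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (STABLE+)
  DEFINE STABLE AS price < FIRST(price) + 10
);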

The condition now compares the current row's price to the price at the start of the current run. Two runs that started on different days have different FIRST(price) values, so two states at the same pattern element and the same count are no longer interchangeable: their futures depend on the boundary that absorption was about to discard.

Take the very same trading week as in §2.5 — $100, $108, $112, $116, $110 from Monday through Friday. The runtime again keeps two candidate runs alive at the same time: C1 started on Monday, C2 started on Tuesday (every row is a potential run start under STABLE+).

Day | Price | C1 — Mon start, FIRST = $100, price < $100 + 10 | C2 — Tue start, FIRST = $108, price < $108 + 10
Mon | $100  | $100 < $110 ✓                                    |
Tue | $108  | $108 < $110 ✓                                    | $108 < $118 ✓ (just started)
Wed | $112  | $112 < $110 ✗ — finalize Mon–Tue                 | $112 < $118 ✓
Thu | $116  |                                                  | $116 < $118 ✓
Fri | $110  |                                                  | $110 < $118 ✓ (still running)

Different fates: C1 dies on Wednesday with a two-day run (Mon–Tue); C2 keeps running all the way through Friday. Same prices, same form of question — but different ceilings derived from their own match starts. On the very same day, the two contexts reached opposite conclusions.

Under absorption, C2 would have been merged into C1 on Tuesday — the merged context keeps only one ceiling, so C2's distinct view (ceiling $118, the four-day run still alive at Friday's close) is no longer recoverable from internal state. C2 has to stay alive because it is an independent match candidate the runtime may still need: once C1's match ends on Wednesday, the next attempt has to pick up from a still-running C2 rather than start over from scratch. Whenever a DEFINE predicate carries match-start dependency, the planner therefore disables absorption preemptively.

(When the leading context's match does succeed, contexts that started inside its accepted span — under the default AFTER MATCH SKIP PAST LAST ROW — are simply discarded; they are kept alive only so the runtime has somewhere to fall back to if the leading match fails.)

The dependency table on the poster's lower-right panel ("② Navigation") summarises which navigation forms create match-start dependency:

Navigation       | Match-start dep. | Absorbable?
PREV, NEXT       | none             | yes
LAST (no offset) | none             | yes
LAST with offset | boundary check   | no
FIRST (any)      | direct           | no

The two examples in §2.5 and §2.6 reduce to a single rule. Same fate is what makes absorption safe: if every context at the same pattern element will compute the same answer to every future predicate, only the oldest needs to be kept. Different fates make absorption unsafe: as soon as predicates branch on context-private state — which is exactly what FIRST and offset-bearing LAST do — every live context represents a future that no other context can recover, and discarding any of them risks producing a wrong answer.

The planner detects this distinction at compile time and decides per-context whether absorption applies. This is also why the poster's benchmark in panel ③ stays linear in both the success and the failure cases: when absorption is safe the runtime collapses contexts, and when it isn't, the planner accepts the multi-context cost rather than risk a wrong answer.

The benchmark numbers on the poster are the same algorithm playing out at scale. On the success pattern (A+ B+ C+ D), both PostgreSQL and Trino scale O(n) once a match is confirmed, and PostgreSQL's lead — roughly 16× to 33× — is mostly the JVM gap.

On the failure pattern (A+ B+ C+ E), Trino has no absorption and degrades to O(n²) chasing every potential match start; at 100,000 rows it takes more than five hours, while PostgreSQL still finishes in 92 ms, a speed-up of about 217,000×.

That gap is not engineering polish — it is precisely the same-fate / different-fate distinction from §2.5 and §2.6, applied to every row of every potential match start in the partition.

2.7 Worked example 5 — When SKIP disables absorption

Goal of this example: Show a second way absorption can fail: not because the predicate splits, but because the output semantics demand every match be reported separately.

The previous example broke absorption from the data side: FIRST in DEFINE made each context evaluate predicates differently. Absorption can also break from the output side, and the AFTER MATCH SKIP clause is what controls it.

Consider the pattern PATTERN (A+) with DEFINE A AS TRUE over five rows that all match A. Under the default AFTER MATCH SKIP PAST LAST ROW, the matcher finds the longest match starting at the earliest row and then jumps past it; any context that started inside that match is implicitly discarded as redundant — exactly the situation absorption is designed to handle. The output is a single match, rows 0–4, and the runtime needs only one live context.

Switch the skip mode to AFTER MATCH SKIP TO NEXT ROW and the contract changes:

PATTERN (A+)
DEFINE A AS TRUE
AFTER MATCH SKIP TO NEXT ROW

Now every potential start position must be reported separately, even when matches overlap. For the same five rows the runtime is required to emit five matches: rows 0–4, 1–4, 2–4, 3–4, and 4–4. None of these can be replaced by a "longer one starting earlier", because the standard says the user wants all of them.
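Embedded in a full query, again as a sketch in the same shape as §2.3's, the overlapping-match contract looks like this:

SELECT company, tdate, price,
  first_value(tdate) OVER w AS match_start_day
FROM stock
WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP TO NEXT ROW
  PATTERN (A+)
  DEFINE A AS TRUE
);

Every row now starts its own match, so every row reports its own tdate as match_start_day; under the default skip mode only the first row of each non-overlapping match would.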

Row | SKIP PAST LAST ROW                            | SKIP TO NEXT ROW
0   | match starts; rows 0–4 will be the only match | match starts at row 0
1   | (inside match 0)                              | match starts at row 1 — must be kept alive
2   | (inside match 0)                              | match starts at row 2 — must be kept alive
3   | (inside match 0)                              | match starts at row 3 — must be kept alive
4   | match 0 finalizes (rows 0–4)                  | five matches finalize: 0–4, 1–4, 2–4, 3–4, 4–4

Different output, different fates: under AFTER MATCH SKIP TO NEXT ROW, every late-starting context is its own output row. Two contexts at the same pattern element are no longer redundant — they are both required outputs, and discarding one would quietly drop a match the user asked for.

Notice that the predicate has not changed. A AS TRUE evaluates the same way at every row regardless of which context is asking, so the §2.5 same-fate condition is still met. What changes is the output requirement: even contexts with identical futures must coexist because they correspond to distinct rows of the result. The planner therefore disables context absorption whenever AFTER MATCH SKIP TO NEXT ROW is in effect, regardless of the DEFINE clause.

Putting §2.6 and §2.7 side by side gives the full picture of when absorption fails:

Data side · §2.6
Predicate evaluates differently per context.
Triggered by FIRST or offset-bearing LAST in DEFINE.
Output side · §2.7
Output requires every match start as a separate row.
Triggered by AFTER MATCH SKIP TO NEXT ROW.

Either condition is enough to disable absorption for the affected contexts. When neither is in effect — the default AFTER MATCH SKIP PAST LAST ROW with DEFINE conditions that use only PREV, NEXT, or bare LAST — the runtime collapses to a single context per pattern position and stays linear all the way through.

§3. Design — From Parser to Executor

Row Pattern Recognition is implemented as three stages that hand work off through well-defined intermediate forms. The parser turns SQL text into a pattern tree and a list of DEFINE predicates; the planner compiles the tree into a flat array of pattern elements and decides which of them can participate in context absorption; the executor runs the array against the partition row by row in a three-phase loop. Each stage has its own data shape, and most of the design ingenuity is at the boundaries: a flat NFA that fits in cache, a navigation model that reuses a single tuple slot rather than materializing one per reference, and an absorption rule that keeps the number of live contexts constant, turning O(n²) work into O(n).

SQL text
  │
  │  parser stage
  ▼    validate frame
       build pattern tree
       type-check DEFINE

pattern tree + DEFINE list
  │
  │  planner stage
  ▼    optimize the tree
       compile to flat NFA array
       decide absorbability

flat NFA + absorption flags
  │
  │  executor stage
  ▼    per-row engine:
       Match → Absorb → Advance

match result:
  start row, length, success/fail

The sections below walk down this pipeline. §3.1 covers the parser and the shape of the pattern tree; §3.2 covers the compilation that turns the tree into the flat NFA; §3.3 covers the navigation model that DEFINE predicates use to look at neighbouring rows; §3.4 covers match-boundary handling — the SKIP, INITIAL, and bounded-frame rules that fix where a match starts and ends; §3.5 is the three-phase per-row engine; §3.6 collects all the pruning mechanisms that keep the state space bounded; and §3.7 surveys what the implementation surfaces in EXPLAIN output.

3.1 Parser — building the pattern tree

The parser recognizes pattern recognition by the presence of a PATTERN clause inside a window specification. Its first job is frame validation, since RPR imposes constraints that ordinary window queries do not: the frame mode must be ROWS (not RANGE or GROUPS), the start boundary must be CURRENT ROW, and EXCLUDE options are forbidden. These are conformance checks, not optimizations — RPR's notion of a frame is the match span, and the standard does not contemplate filling it with anything other than the matched rows.
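A sketch of what frame validation accepts and rejects; the accepted form is the one every example in §2 uses, and the rejected variants are listed as comments rather than as exact error messages:

WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING   -- accepted: ROWS, anchored at CURRENT ROW
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (START UP+ DOWN+)
  DEFINE UP AS price > PREV(price), DOWN AS price < PREV(price)
)
-- rejected at parse time when a PATTERN clause is present:
--   RANGE or GROUPS frame mode
--   a start boundary other than CURRENT ROW (e.g. 1 PRECEDING)
--   any EXCLUDE option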

Once the frame passes validation, the parser turns the PATTERN clause into a tree built from four kinds of node — a variable reference, a sequence (concatenation), an alternation, and a group (parenthesised sub-pattern). Each node carries the quantifier as three numbers — a lower bound, an upper bound (possibly infinite), and a flag for reluctant matching:

Source | Tree encoding
A      | VAR(A, min=1, max=1)
A+     | VAR(A, min=1, max=∞)
A*     | VAR(A, min=0, max=∞)
A{3,5} | VAR(A, min=3, max=5)
A+?    | VAR(A, min=1, max=∞, reluctant=true)

Each DEFINE predicate is then type-checked against the partition's columns and coerced to a Boolean expression. Two practical things happen as part of this. First, every column referenced by a DEFINE predicate is registered as part of the query's output requirements, so the planner propagates those columns through to the executor stage even if the surrounding query does not select them — without that, the runtime would have nothing to evaluate against. Second, variables that appear in the PATTERN but never in the DEFINE are implicitly true: they match every row.
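The second point in miniature, using the same convention the §2.3 example relies on for its START variable:

PATTERN (START UP+)
DEFINE UP AS price > PREV(price)
-- START never appears in DEFINE, so it is implicitly true:
-- it matches whatever row the attempt begins on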

3.2 Compilation — from AST to a flat NFA

The planner turns the parser's tree into the data structure the executor will run: a flat array of pattern elements addressed by index. The compilation proceeds as a six-step pipeline:

compile(astTree):
  1. optimize the AST
  2. measure size and depth
  3. allocate the element array
  4. fill from the AST
       (assign next/jump pointers)
  5. finalize — set up FIN sentinel
  6. mark absorption-eligible elements

The flat shape is chosen for a simple reason: the executor needs to traverse the pattern many times per partition, and a contiguous index-addressable array is the cheapest data structure to walk. Steps 1 and 6 are the interesting ones — step 1 because it determines how big the array will be, and step 6 because it decides whether the absorption optimization in §2.5 will engage at all.

AST optimization

Optimization pays off twice: once in the static element count of the flat array, and again in every row processed at runtime. Every transform reduces the state space the runtime must enumerate. The optimizer applies eight rewrite rules in succession until the AST stops changing:

SEQ flattening
SEQ(A, SEQ(B, C)) → SEQ(A, B, C)
Consecutive variable merging
A A → A{2}
A{2,3} A{1,2} → A{3,5}
Consecutive group merging
(A B)+ (A B)+ → (A B){2,∞}
Consecutive ALT merging
(A | B) (A | B) (A | B) → (A | B){3}
Prefix/suffix absorption
A B (A B)+ → (A B){2,∞}
ALT flattening & dedup
(A | (B | C)) → (A | B | C)
(A | B | A) → (A | B)
Quantifier multiplication
(A+)+ → A+
(A{2,3}){5} → A{10,15}
Single-child unwrap
SEQ(A) → A
(A){1,1} → A

Quantifier multiplication is the only transform that requires a safety check: the optimizer collapses only in three safe cases — both quantifiers unbounded ((A+)+ → A+), outer exact ((A{2,3}){5} → A{10,15}), or the child trivial {1,1} ((A){2,5} → A{2,5}). Other combinations can introduce gaps the flat form would miss — e.g., (A{2}){2,3} accepts only {4, 6}, but a naive A{4,6} would also accept 5 — so the optimizer leaves them intact.

Element shape

Each element of the flat array represents a single position in the pattern. There are five logical kinds: a variable reference (the only kind that consumes a row); group begin and group end markers, which open and close a parenthesised sub-pattern; an alternation marker, which heads a list of branches; and a finish marker at the end of the pattern.

Every element also carries a depth (its group nesting level), the quantifier (a min and max repetition count, possibly infinite), and two transition pointers — next, "where to go after consuming this element," and jump, "where to skip to" (used by alternation to chain branches, by group begin to bypass the body when the quantifier allows zero, and by group end to loop back to the body).

For PATTERN ((A B)+) the compiled array looks like this:

PATTERN ((A B)+) compiles to:

[0] BEGIN  depth 0  quant 1..∞
    next → 1  jump → 4
    (opens group; jump skips
     to FIN when min = 0)

[1] A      depth 1  quant 1..1
    next → 2

[2] B      depth 1  quant 1..1
    next → 3

[3] END    depth 0  quant 1..∞
    next → 4  jump → 1
    (closes group; jump loops back)

[4] FIN    pattern complete

The pattern reads left-to-right via next, with jump handling the non-linear edges. At idx 3 the END's jump points back to idx 1, which is how the unbounded outer quantifier loops; at idx 0 the BEGIN's jump would skip past the END to idx 4 if the group were optional. The executor never has to construct a graph on the fly — it just follows these two pointers as it walks the array.

Per-element attributes

Beyond shape, each element carries four logical attributes that direct the runtime's behavior at that position:

Reluctant
Reverses the order of quantifier expansion. Greedy quantifiers try "loop again" before "exit"; reluctant quantifiers try "exit" first. Carried by the variable for A+?; carried by the group's BEGIN and END for ((…)+?).
Empty-loopable
Set on group ends whose body is nullable ((A?)*, (A? B?)+, (A | B*)). Tells the runtime to add a fast-forward exit alongside the normal loop-back, so the cycle guard does not kill legitimate matches in groups that can produce empty iterations.
In-absorbable-region
Marks every element that lies within an absorption-eligible region — used by the runtime to track whether the current state is still in safe territory.
Absorption-comparison-point
Marks the specific element where absorption comparisons should run. For a simple A+ it lands on the variable; for an unbounded group like (A B)+ it lands on the group end alone.

The split between "in-region" and "comparison-point" matters because absorption only makes sense at points where iterations close. Inside the body of (A B)+, the runtime is mid-iteration and the count has not yet reached its final value for that round; comparing here would mean comparing incomparable values. The state must reach the group end before the runtime can decide. The first attribute therefore says "you are still in absorbable territory"; the second says "you have reached the comparison point — go check now."

Absorbability analysis

Step 6 of the compilation is what gives §2.5's "same fate" rule its compile-time witness. The decision is layered:

isAbsorbable(query):
  if SKIP mode != SKIP PAST LAST ROW
      → return false
  if frame end != UNBOUNDED FOLLOWING
      → return false
  if any DEFINE depends on match_start
      → return false
  walk the AST and mark
  elements that satisfy
  a structural case

The first three checks are query-level: they correspond exactly to the conditions of §2.7 (output side: the SKIP mode), the bounded-frame restriction (a frame boundary breaks monotonicity), and §2.6 (data side: FIRST or offset-bearing LAST in DEFINE). When any of them fails, the analysis sets no flags and absorption is disabled query-wide. When they all pass, the AST walk admits three structural shapes:

Case 1 — Simple unbounded variable
A+, A*, A{2,∞}
Each iteration is exactly one row. An earlier context's count is always ≥ a later one's at every shared position.
Case 2 — Fixed-length group, unbounded outer
(A B)+, (A B{2})+, ((A (B C){2}){2})+
Every child has min == max, so the body is semantically equivalent to its unrolled {1,1} form — (A B{2})+ behaves like (A B B)+. Each iteration consumes a fixed number of rows; count dominance still holds.
Case 3 — Group whose body starts with an unbounded variable
(A+ B)+
The leading unbounded variable is itself absorbable (Case 1) and shields earlier contexts. Absorption stops as soon as the state moves past A — the rest of the body has no monotonicity guarantee, so flags are set on A only.

The structural cases not covered by these three shapes are just as instructive. A B+ cannot be absorbed by the current rule because the leading A consumes a row before the unbounded part begins, so contexts starting one row apart are at different positions inside the unbounded body. (A follow-up "PREFIX absorption" extension that handles fixed-length prefixes via a shadow path has been designed and is planned for a separate patch.) Reluctant quantifiers like A+? are excluded by construction: the absorption rule assumes greedy semantics, where longer matches subsume shorter ones, and reluctant matching reverses that direction.

The result is a per-element decision rather than a per-pattern one. A single query plan can have absorption enabled for the leading A+ of patterns like (A+ B+ C) or (A+ B+)+ — the latter is just Case 3 applied to its leading element — and disabled for everything after; the runtime simply consults the comparison-point attribute on the current state's element each time it considers an absorption pass. Once a state moves into a non-absorbable region, absorption is permanently off for that state — exactly what §2.5 and §2.6 need to be true at the algorithmic level.

3.3 Navigation — the 1-slot tuple swap

DEFINE expressions are ordinary SQL expressions evaluated against a row, but they may include PREV, NEXT, FIRST, or LAST — references that point at different rows. The rows themselves are already buffered in the window's tuplestore; what the executor still has to manage is the tuple slot the SQL expression machinery reads from, since column references inside the expression are bound to a slot at plan time. The implementation reuses a single navigation slot for every navigation call: the slot is swapped in before evaluating the inner expression and restored after, so the rest of the SQL expression machinery never knows the difference.

The model the executor sees is small: there is a current row slot that holds the row the DEFINE expression is being evaluated against, and a navigation slot that the runtime can temporarily redirect to a different row. Around any navigation call the runtime sets up the navigation slot, evaluates the inner expression as if it were reading the current row, then restores the original row. Pseudocode:

eval_navigation(call):
  targetPos = compute_target_position(call)
  if targetPos is outside its valid range:
      return NULL

  save current_row_slot
  fetch row at targetPos
    into current_row_slot
  result = eval_inner_expression()
  restore current_row_slot
  return result

The trick is that there is exactly one slot to save and restore. The inner expression — whatever it is, including arithmetic, function calls, or other column references — runs against the swapped slot using the same evaluation path it would for the current row. No alternate evaluator, no shadow slot, no tuple copy.

Compound navigation is flattened at parse time so that the swap still happens once. PREV(FIRST(price)) is recognized as a two-step navigation — "go to the first matched row, then step one row earlier" — and stored as a single navigation call with a compound kind. The runtime computes the target position in two stages but performs only one slot swap to fetch the final row:

compute_target_position(call):

  # current-row relative
  PREV(n):
      return currentPos − n
  NEXT(n):
      return currentPos + n

  # match-relative
  FIRST(n):
      return matchStart + n
  LAST(n):
      return lastMatchedRow − n

  # compound: match-rel, then step
  PREV(FIRST(n), m):
      return (matchStart + n) − m
  NEXT(FIRST(n), m):
      return (matchStart + n) + m
  PREV(LAST(n), m):
      return (lastMatchedRow − n) − m
  NEXT(LAST(n), m):
      return (lastMatchedRow − n) + m

  validate against the appropriate range
  (match range for FIRST/LAST inner,
   partition range for outer step)

A position cache short-circuits the tuplestore fetch when several navigation calls in the same DEFINE target the same row — common in expressions like price > PREV(price) AND volume > PREV(volume) where both calls resolve to the previous row. The cache stores only "last fetched position," and the swap itself remains a single operation.

The classification of navigation calls is the planner's contribution to the absorption decision. The planner walks every DEFINE expression and sorts each variable into one of two buckets, based on whether all contexts will compute the same Boolean at the same row, or each context will compute its own. The bucket determines two things at runtime: how often the variable is evaluated (once shared, or once per affected context — §3.5 Phase 1), and whether the surrounding state is eligible for context absorption (§2.5 vs §2.6).

Shared evaluation · absorption-safe Every context sees the same Boolean at every row — same fate (§2.5).
  • No navigation
  • PREV / NEXT only
  • LAST without offset
  • Compound with inner LAST (no offset)
Per-context evaluation · absorption-unsafe Contexts with different match starts compute different answers — different fates (§2.6).
  • LAST with offset
  • FIRST (any form)
  • Compound with inner FIRST
  • Compound with inner LAST (with offset)

The classification is performed at plan time and stored alongside each DEFINE variable, so the runtime spends no time deciding — it simply reads the bucket for each variable as it processes a row.

Retention budget

Navigation reaches into rows that the window-function machinery has otherwise streamed past. To keep those rows available, the executor is built on top of a tuplestore that retains a sliding window of recent rows; the question is how wide that window has to be. The patch decides at compile time, from two complementary offsets:

Lookback budget
How far backward from the current row any DEFINE call could reach.
Contributors: PREV, LAST with offset, PREV(LAST(...)), NEXT(LAST(...))
Lookahead-from-start budget
How far forward (or backward, when negative) from the oldest live context's match start any DEFINE call could reach.
Contributors: FIRST, PREV(FIRST(...)), NEXT(FIRST(...))

At run time the tuplestore's trim mark is set to whichever of two positions is earlier — current row minus the lookback budget, and the oldest live context's match start plus the lookahead budget. Anything before that mark cannot be reached by any navigation call in any live context, and the tuplestore is free to drop it. The two Nav Mark counters that EXPLAIN reports (§3.7) — Lookback and Lookahead — are the measured peaks of the two budgets, the deepest the executor ever had to reach in either direction over the course of the query.
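For instance, a sketch with JUMP as an invented variable: a DEFINE like the following pushes the lookback budget to at least three rows, because PREV(price, 3) can reach three rows behind whichever row is being tested.

PATTERN (JUMP+)
DEFINE JUMP AS price > PREV(price, 3)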

3.4 Match boundaries — SKIP, INITIAL, and bounded frame

A successful match is recorded as a small bundle of values: a validity flag, a success/failure flag, the row at which the match starts, and the number of rows it consumed. When the validity flag is set, subsequent queries against the executor — "is this row inside a match?" — can answer in O(1) by inspecting the bundle. A length of zero is a real outcome, not an error: it represents a pattern that matched without consuming any row, which the executor must distinguish from "no match has been attempted at this position yet."

The AFTER MATCH SKIP clause decides where the next match attempt starts. AFTER MATCH SKIP PAST LAST ROW moves to the row after the match end, producing the non-overlapping output most queries want and enabling the absorption optimization.

AFTER MATCH SKIP TO NEXT ROW moves only to the row after the match start, allowing overlapping matches; the planner therefore disables absorption for the entire query plan when this mode is in effect.

The two skip targets the standard also defines — AFTER MATCH SKIP TO FIRST var and AFTER MATCH SKIP TO LAST var — depend on per-match history that this patch does not retain. They are not present in the grammar at all, so the parser raises a generic syntax error if either is supplied.

The same is true of SEEK, the spec's alternative to INITIAL. Under SEEK, a match attempt starting at row R may succeed at any row from R to the end of the frame, not just at R itself. The patch implements only INITIAL: every potential match anchors at a specific row. The parser raises an error if SEEK is requested. Bounded frames receive their own treatment — when the user writes ROWS BETWEEN CURRENT ROW AND N FOLLOWING rather than UNBOUNDED FOLLOWING, the executor short-circuits any context whose match has reached the boundary by forcing a mismatch, and absorption is disabled because the boundary breaks the monotonicity assumption that absorption rests on.
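A sketch of the bounded-frame case. The only change from the §2.3 query is the frame end; any context whose match reaches the ten-row boundary is cut off there by a forced mismatch, and absorption is disabled for the whole query:

WINDOW w AS (
  PARTITION BY company ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (START UP+ DOWN+)
  DEFINE UP AS price > PREV(price), DOWN AS price < PREV(price)
)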

3.5 The three-phase per-row engine

The executor's per-row driver is invoked by the surrounding window-function machinery whenever it needs to know whether a given row belongs to a matched frame. The driver finds or creates the context for the current start position, evaluates every DEFINE predicate once against the current row to produce a per-variable Boolean array, then drives the NFA forward by one row. The forward step itself is three phases in a fixed order — Match, Absorb, Advance — wrapped in the same outer loop:

processRow(currentPos):

  # Phase 1 — MATCH (convergence)
  for each context:
      if context exceeds bounded frame:
          force mismatch (finalize early)
          continue
      if matchStartRow differs from
         the shared evaluation position:
          re-evaluate match-start-
          dependent DEFINE vars (§3.3)
      match(context, varMatched)

  # Phase 2 — ABSORB
  if pattern is absorbable:
      refresh each context's flags
      absorb_contexts()

  # Phase 3 — ADVANCE (divergence)
  for each context:
      advance(context, currentPos)

The order is not a stylistic choice. Match must run first because absorption compares post-match counts; running Absorb before Match would compare states that are about to die. Advance must run last because it is the only phase that creates new states — it expands every surviving state through epsilon transitions until each one reaches a variable waiting for the next row. Running Absorb after Advance would mean comparing diverged successors, missing the point at which states are most cleanly comparable.

Phase 1 — Match

Match is a "convergence" phase: states either survive with an incremented count, or die. A subtle point is that for variables sitting in an absorbable region, Match also performs a small amount of forward progress, so that Absorb can compare cleanly. The cover condition only fires at the absorbable comparison point — the END of the unbounded group — so states that have just matched mid-iteration variables (like B inside (A B)+) need to be walked up to that comparison point during Match itself; otherwise Absorb finds nothing eligible to compare and the optimization never engages.

match(context, varMatched):
  for each state in context:
      elem = pattern[state.elemIdx]
      if elem is not a variable:
          continue   # advance handles it

      if not varMatched[elem.varId]:
          drop state   # dead branch
          continue

      state.counts[elem.depth] += 1

      # Inline forward progress so the
      # next phase can compare at the
      # comparison-point element rather
      # than mid-iteration.
      if elem is in-region but not
         the comparison point,
         and reached its max count,
         and elem.next is a group end:

          walk the END chain:
            increment outer group count
            advance state.elemIdx to END
            continue while still in-region,
              must-exit, and next is END
          (stops at the comparison point
           or at a still-loopable element)

The end-chain walk is what makes nested fixed-length groups absorbable. In ((A B){2})+, when B reaches its max (B is {1,1}), the inner group's count must increment; if that count also reaches its max — closing the inner {2} — the outer group's count must increment too, and so on, until the state lands on the outermost absorption point — the END of the unbounded outer group (the +). Doing this work in Match lets Absorb compare against contexts that have already consolidated their post-iteration counts.

Phase 2 — Absorb

The absorb pass walks contexts from newest (tail) to oldest (head). For each in-progress context that is fully absorbable, it scans backwards for an older in-progress context that covers it, and frees the newer one if found. Because contexts are kept in creation order and each row creates at most one context, "newer" and "older" really do mean "started later" and "started earlier."

absorb_contexts():
  for ctx from tail backward:
      if ctx is finished
         or has any non-absorbable state:
          skip
      for older from ctx.prev backward:
          if older is finished
             or has no absorbable state:
              skip
          if covered(older, ctx):
              free(ctx)
              record absorption length
              break

covered(older, newer):
  for each state in newer:
      elem = pattern[state.elemIdx]
      if elem is not the comparison point:
          return false
      if no state in older with:
            same elemIdx
            and isAbsorbable
            and older.counts[depth]
                >= newer.counts[depth]:
          return false
  return true

Two micro-decisions follow from this. The first is that the cover check rejects immediately if any state in the newer context sits anywhere other than a comparison point — a comparison taken mid-iteration would not be meaningful.

The second is the asymmetric flag pair on each context: has-absorbable-state answers "could this context absorb a newer one?" and is monotonic (it can only go true→false as states die), while all-states-absorbable answers "could this context be absorbed?" and is dynamic (it flips back to true when a non-absorbable state is removed). Both flags are checked in constant time before the cover scan even starts, so absorption pays its full cost only on contexts that have a real chance of being absorbed.

Phase 3 — Advance

Advance is the "divergence" phase: each surviving state is expanded through epsilon transitions until every branch reaches either a variable waiting for the next row or the FIN sentinel. The expansion is depth-first, and the order in which sibling branches are walked is what makes the standard's preferment rule actually take effect — the lexically first branch is always added first, and the deduplication step at state insertion silently discards equivalent later additions.

advance(context, currentPos):
  pull out all current states;
  rebuild ctx.states from scratch
  for each state in lexical order:
      clear the visited-element bitmap
      advance_state(state)   # DFS

      # Preferment: once a DFS reaches FIN,
      # remaining states are less preferred
      # and can be dropped.
      if FIN reached and states remain:
          free the rest
          break

advance_state(state):
  walk via state.elemIdx,
  recurse through:
    ALT branches (in order),
    BEGIN (enter group; plus optional
           skip path if min = 0),
    END (loop back / exit / fast-forward —
         see below),
    FIN (record matchEndRow,
         save matchedState, and prune
         later contexts inside this
         candidate's range — see below),
  stopping at every variable encountered:
      add state to ctx.states
      (deduplicating by elemIdx + counts)

Recording a state that reached FIN does more than just bookmark the candidate match. Under AFTER MATCH SKIP PAST LAST ROW, the next reportable match must start strictly past the current one — so the moment a candidate is recorded, every subsequent context whose start row falls inside the candidate's range is pruned right away, even though the context that hit FIN may keep searching for a longer match through greedy fallback.

The pruning is safe because no matter how that search finishes (a longer match replaces the candidate, or the candidate stands), all the pruned contexts started inside a range that the next reportable match must skip past, and so could never produce output of their own.

This is one of two FIN-time pruning steps — the other, described earlier in this section as part of Advance, discards lexically-inferior states inside the same context.

The trickiest per-element logic lives in the END handler. When a group's iteration count is below the lower bound, the runtime must loop back; when it is at or above the upper bound, it must exit; in between, both options are valid and the quantifier's greediness decides which to try first:

advance_end(state, elem):
  count = state.counts[elem.depth]

  if count < elem.min:
      loop back into the body
      if elem is empty-loopable:
          # nullable body, see §3.2
          also clone a fast-forward
          state that exits the group
          with count satisfied
          (rescues legitimate matches
           that the cycle guard would
           otherwise kill)

  elif count >= elem.max:
      reset counts at this depth
      exit via next
      (END→END: increment outer
       END's count)

  else:
      # min <= count < max: fork
      build an exit state
      (counts at this depth reset)
      if greedy:
          loop first, then exit
          # prefer longer
      else (reluctant):
          exit first
          if exit reached FIN:
              drop the loop
              # prefer shorter
          else loop second

Every state added to a context goes through a deduplication check that compares its position and counts with the existing state list. Because Advance processes branches in DFS order, the state from the first branch of any alternation lands first — and any later branch producing the same position and counts is rejected at insertion. This is exactly what §2.4's lexical-order rule asks for, implemented at the bottom of the runtime without a separate pass.

Empty-loopable groups

A subtle case the runtime has to defuse is the nullable group — a group whose body can match zero rows because every child of the body is itself optional. Patterns like (A?)*, (A? B?)+, and (A | B*) all have this property: depending on the data, the body can complete an iteration without consuming any row at all. That is fine in principle, but it creates a real hazard for the NFA's epsilon expansion. If the body produces an empty match, the END element loops back to BEGIN, the body once again produces an empty match, BEGIN loops to END, and so on. Without something to stop it, Advance's DFS would never terminate.

The runtime stops it with a visited-element bitmap (one bit per pattern element), cleared before each state's DFS expansion: as soon as any element is visited a second time within the same expansion, that path is abandoned. The cycle guard is unconditional and cheap, but it has a side effect — it can also abandon paths that should reach the lower bound through legitimate empty iterations. Take (A?){2,3} with no A matching anywhere in the row stream:

desired behavior:
  iter 1: A? matches zero rows
          → END, count = 1 (under min)
  iter 2: A? matches zero rows
          → END, count = 2 (meets min)
  exit with a zero-length match

what the cycle guard does on its own:
  iter 1: A? skipped → END, count = 1
          → loop back to BEGIN
  iter 2: BEGIN already visited
          → DFS aborts
  count never reaches min
  → match fails (incorrect)

The fix is decided at compile time and acted on at run time. Whenever the planner sees a group whose body is nullable, it tags the END element of that group as empty-loopable. At run time, when the END handler is about to loop back because the iteration count is still below the lower bound, the empty-loop tag tells it to clone an additional fast-forward state alongside the normal loop-back path:

advance_end (count below min):

  primary path:
    loop back into the body
    (keeps the door open for a real,
     non-empty match in the next
     iteration)

  if elem is empty-loopable:
    fast-forward path:
      reset this depth's count
      exit the group via next,
      treating remaining required
      iterations as empty matches

The two paths play complementary roles. The primary loop-back is what catches the case where the body still has real matches to produce later — for instance, in (A?){2,3} followed by data where A matches only on later rows, the loop-back is what allows the second and third iterations to find non-empty matches. The fast-forward is what catches the case where the body never produces anything: it bypasses the cycle guard by exiting the group entirely, declaring "the rest of the required iterations are empty," and lets the match succeed with a zero-length body. Both states get added to the context; whichever one extends successfully wins, and the deduplication check at insertion time prevents either path from spawning more than its share of states.
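
A query of roughly this shape drives the fast-forward path (hypothetical data, again in the array-flag idiom; not a quote from the test suite). With no row matching A anywhere, only the fast-forward clone lets (A?){2,3} complete as a zero-length match instead of being aborted by the cycle guard:

SELECT id,
       first_value(id) OVER w AS ms,
       last_value(id)  OVER w AS me
FROM (VALUES
  (1, ARRAY['_']),
  (2, ARRAY['_'])) AS t(id, flags)
WINDOW w AS (ORDER BY id
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  INITIAL PATTERN ((A?){2,3})
  DEFINE A AS 'A' = ANY(flags));
-- with no A anywhere, the required iterations can only be satisfied
-- as empty matches; the fast-forward clone is what allows that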

Beyond the cycle guard, one more start-up trick disambiguates zero-row outcomes at the very start of a context. The context-creation step at every potential start row runs an initial advance through epsilon transitions before any row is consumed, using a position one row before the actual start. Any path that reaches the FIN sentinel through epsilon transitions alone — without consuming a row — therefore produces an end position less than the start position; that negative-span coordinate, combined with whether FIN was actually reached, encodes the difference between an empty match (length-0 match accepted) and an unmatched start without needing a separate flag.

3.6 How the state space stays bounded

The runtime's linearity is not the result of a single optimization. It is the cumulative effect of a layered set of pruning rules, each one catching a different cause of state-space growth at a different point in the per-row cycle. Some are decided at compile time and merely consulted at run time; others fire dynamically. Some kill individual states; others kill entire contexts. The sections above introduced each of them in context; the table below puts them on one page.

Failed predicate · Match
Variable states whose DEFINE evaluated false on the current row — dead branches.
Inline end-chain advance · Match
Variables that have hit their max count and would otherwise leave the state mid-iteration; advanced through fixed-length group ends so absorption can compare at the right point.
Context absorption · Absorb
Newer contexts whose every state is covered by an older context's state with a higher count — see §2.5 for the conceptual rule, §3.2 for the compile-time eligibility, §3.5 for the per-pair check.
State deduplication · Advance
States reached via different DFS branches that end up at the same position with the same counts — only the first (lexically earliest) survives; later equivalents are silently dropped, which is also how preferment is enforced (§2.4).
FIN early termination (within context) · Advance
States still pending in the DFS queue when one branch reaches FIN — by lexical order, all remaining states are less preferred and can be discarded immediately.
Candidate-match pruning (across contexts) · On FIN
Other contexts whose start row falls inside the candidate match's range — the context that hit FIN may still keep searching for a longer match (greedy fallback), but under AFTER MATCH SKIP PAST LAST ROW any context inside the candidate's range can no longer produce a reportable output, regardless of how that search ends, and is pruned right away.
Cycle guard · Advance
Epsilon expansions that revisit the same pattern element within the same DFS — would otherwise loop forever in nullable groups.
Empty-loop fast-forward · Advance
Legitimate empty-iteration matches the cycle guard would kill in nullable groups — bypasses the cycle by exiting the group with remaining required iterations declared empty.
Bounded-frame cutoff · Match
Contexts whose match has reached the user-specified frame upper bound — forced to mismatch so they cannot extend beyond it (§3.4).

Reading the entries by their phase badge traces the lifetime of a context: pruning fires at every row through the three main phases (Match, Absorb, Advance), and again at match completion (On FIN). Reading the descriptions instead groups the rules by what they target: dead branches, redundant contexts, equivalent states, infinite loops, and over-extension past user-imposed boundaries.

No single rule on its own would be enough. The cycle guard alone would kill legitimate matches in nullable groups; the empty-loop fast-forward alone would not stop infinite epsilon loops; absorption alone would over-consolidate when DEFINE references match-start; deduplication alone would not remove redundant contexts, only redundant states. The runtime stays linear in the cases that matter — PATTERN (A+ B+ C+ D) on 100,000 rows, the poster's panel ③ benchmark — only because every layer catches what the layers above it miss.
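
For scale testing of the kind shown in the poster's panel ③, something like the following reproduces the shape of that workload (a sketch only; the bench table, the thresholds, and the data generator are invented here and are not the poster's actual benchmark harness):

CREATE TABLE bench AS
  SELECT g AS id, (random() * 100)::int AS v
  FROM generate_series(1, 100000) AS g;

SELECT count(*)
FROM (
  SELECT first_value(id) OVER w AS ms
  FROM bench
  WINDOW w AS (ORDER BY id
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    INITIAL PATTERN (A+ B+ C+ D)
    DEFINE
      A AS v < 25,
      B AS v >= 25 AND v < 50,
      C AS v >= 50 AND v < 75,
      D AS v >= 75)
) AS s;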

3.7 EXPLAIN output

EXPLAIN ANALYZE on a query with RPR surfaces NFA-level statistics that are not present for ordinary window functions. Three groups of counters are emitted alongside the window operator:

NFA States:    peak, total, merged
NFA Contexts:  peak, total, pruned
NFA: matched     (len min/max/avg)
NFA: mismatched  (len min/max/avg)
NFA: absorbed    (len min/max/avg)
NFA: skipped     (len min/max/avg)
Nav Mark Lookback:   <n> | runtime | retain all
Nav Mark Lookahead:  <n> | runtime | retain all
  (only when the query uses FIRST,
   PREV(FIRST(...)), or NEXT(FIRST(...)))

Peak, total, and merged are direct instrumentation of the runtime: how many states were ever live simultaneously, how many were created over the query's lifetime, and how many were merged away by deduplication. The match-length histograms separate four outcomes — successful matches, failed match attempts, absorbed contexts, and contexts that were pruned (skipped) without being evaluated — and reporting them with min/max/avg makes performance pathologies visible at a glance: a healthy run shows most contexts as either matched or absorbed, with mismatched lengths small.

The two Nav Mark counters report the tuplestore retention budget that §3.3 derives at compile time. Lookback is the deepest backward reach across PREV, LAST with offset, and the compound forms with inner LAST; Lookahead is the deepest forward (or backward, when negative) reach measured from the oldest live context's match start, contributed by FIRST and the compound forms with inner FIRST.

Each counter prints as a fixed integer when the offset is a constant, "runtime" when the offset is a non-constant expression that has to be evaluated each call, and "retain all" when the runtime needs an unbounded budget. Lookahead is only emitted when the query actually uses match-start-relative navigation.

Together these counters make it possible to debug RPR performance without attaching gdb to the backend.
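
An example of what to run (the stock table and its columns are hypothetical, and no counter values are shown because they depend on the data). The constant PREV offset of 3 is the deepest backward reach in the query, so the Lookback counter prints as a fixed integer rather than "retain all":

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT first_value(price) OVER w, last_value(price) OVER w
FROM stock
WINDOW w AS (ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  INITIAL PATTERN (START UP+)
  DEFINE
    START AS TRUE,
    UP    AS price > PREV(price, 3));
-- the NFA States / NFA Contexts lines and the matched / mismatched /
-- absorbed / skipped histograms appear alongside the window operator;
-- Nav Mark Lookback shows a fixed integer because the offset is constant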

Beyond the counters, EXPLAIN also faithfully reproduces the original PATTERN and DEFINE clauses, including reluctant quantifiers, group repetition, and the AFTER MATCH SKIP option. The implementation goes to some length to make this round-trip stable so that pg_dump and pg_upgrade can preserve RPR objects without semantic drift — the regression suite under rpr_explain verifies it case by case (see §4).

§4. Tests — Coverage Map

The patch ships with five regression suites that together exercise every layer described in §3 — roughly 13,000 lines of SQL, each suite focused on a different concern. The split is deliberate: keeping parser concerns, runtime correctness, planner interactions, and output formatting in separate files makes failures easier to localise, and prevents an unrelated change in one layer from accidentally invalidating tests in another. The five suites are:

rpr
End-to-end query semantics — realistic window scenarios on synthetic stock data (V-shape, W-shape, consecutive rises, reversals).
rpr_base
Parser layer — keyword acceptance, syntax shapes, quantifiers, navigation parsing, error messages, and pg_dump/pg_upgrade round-trip stability.
rpr_nfa
NFA runtime — the three-phase loop, absorption in every absorbable shape, and context lifecycle edge cases.
rpr_explain
Output formatting — NFA statistics, pattern deparse, identifier quoting, format stability across reload.
rpr_integration
Planner interactions — guards that prevent unrelated window optimizations from corrupting RPR semantics.

4.1 rpr — end-to-end scenarios

The scenario suite is the public face of the test set: it uses a synthetic stock-price dataset of about 1,600 rows and runs realistic queries against it — V-shape recovery, W-shape (double bottom), consecutive rises and falls, reversal patterns, multi-symbol partitions. It is the only suite where the inputs and outputs read like queries a user might actually write; the others are deliberately reductive, focused on a single layer at a time.
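
A query in the spirit of this suite looks roughly like the following (a sketch, not a copy of the regression file; the stock table, its columns, and the variable names are illustrative). For each row that starts a V-shaped recovery, it reports where the recovery ends:

SELECT company, tdate, price,
       first_value(tdate) OVER w AS v_start,
       last_value(tdate)  OVER w AS v_end
FROM stock
WINDOW w AS (
  PARTITION BY company
  ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  INITIAL
  PATTERN (START DOWN+ UP+)
  DEFINE
    START AS TRUE,
    DOWN  AS price < PREV(price),
    UP    AS price > PREV(price));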

Because these queries combine every layer (parser, planner, executor, EXPLAIN), a single failure in rpr rarely tells you where the bug lives. The four suites that follow are the bisection: a failure in rpr plus a passing rpr_base rules out the parser; plus a passing rpr_nfa narrows it to scenario-specific data shapes; plus a passing rpr_integration rules out planner interference; and any deparse drift shows up in rpr_explain.

4.2 rpr_base — the parser surface

The base suite is the largest, and it is largest for a reason: it is responsible for proving that every legal piece of syntax in §1.2 actually parses, every illegal piece in §1.3 is rejected with a useful error, and every accepted form survives a serialization round trip. The bulk of it is shaped as small focused snippets — one per syntactic feature — rather than long realistic queries, because the goal is coverage rather than scenario realism.

The serialization tests warrant special attention. RPR objects (views, materialized views, pg_dump output) must round-trip through the catalog representation without semantic drift, including subtleties like the reluctant flag on a quantifier or the exact form of a compound navigation expression. A small set of serialization-specific objects (the rpr_serial_v* views and their backing tables) is intentionally left in place at the end of the run, so that the surrounding regression infrastructure can pick them up and exercise pg_dump and pg_upgrade against them. The rest of the test scaffolding is dropped as usual. Bugs caught this way (such as a reluctant flag being lost across deparse and re-parse) only surface when the round-trip is exercised end to end.
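
The shape of those checks is easy to sketch (the view, its backing table, and the column names below are placeholders, not the actual rpr_serial_v* fixtures): create a view whose window carries the features that are easiest to lose in deparse, then read the definition back and confirm nothing drifted:

CREATE TABLE rpr_roundtrip_t (id int, v int);

CREATE VIEW rpr_roundtrip_v AS
  SELECT id, first_value(id) OVER w AS ms
  FROM rpr_roundtrip_t
  WINDOW w AS (ORDER BY id
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    INITIAL PATTERN (A+? B{2,3})   -- reluctant flag and bounds must survive
    DEFINE
      A AS v > PREV(v),
      B AS v < PREV(v));

-- the deparsed definition must reproduce the reluctant quantifier
-- and the repetition bounds exactly
SELECT pg_get_viewdef('rpr_roundtrip_v'::regclass, true);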

4.3 rpr_nfa — the runtime engine

This is the suite that exercises every mechanism described in §3.5 and §3.6. Its tests follow a consistent pattern: an input table whose rows are explicit Boolean arrays declaring which DEFINE variables match at each row, paired with a pattern that probes a specific runtime behavior. The Boolean-array idiom decouples the runtime test from the DEFINE-evaluation machinery — what is being tested is "given that these variables match here, does the NFA loop produce the expected match span?" rather than "does the DEFINE expression evaluator correctly compute these Booleans?" The DEFINE evaluator is tested elsewhere (rpr_base for parsing, rpr for end-to-end behavior).

A typical test fixture looks like this — a column of variable-name arrays where each entry says which DEFINE variables fire on that row, and a pattern whose DEFINE clauses test for those names directly:

SELECT id, flags,
  first_value(id) OVER w AS ms,
  last_value(id)  OVER w AS me
FROM (VALUES
  (1, ARRAY['A']),
  (2, ARRAY['A','B']),
  (3, ARRAY['A','B']),
  (4, ARRAY['B']),
  (5, ARRAY['_']),
  (6, ARRAY['A'])) AS t(id, flags)
WINDOW w AS (ORDER BY id
  ROWS BETWEEN CURRENT ROW
       AND UNBOUNDED FOLLOWING
  INITIAL PATTERN (A+ B+)
  DEFINE
    A AS 'A' = ANY(flags),
    B AS 'B' = ANY(flags));

Reading the array column down is reading the test scenario directly. Rows 2 and 3 carry both names — A and B both fire there, so the NFA has a real choice about where to switch from the A-leg to the B-leg. The expected match (A at rows 1–3 followed by B at row 4, under the standard's greedy preferment) is fixed by those flags alone, with no expression evaluation of any consequence here — = ANY is just the trivial layer that exposes the array column to DEFINE, not what the test exercises. Writing the same scenario with a numeric predicate (price > PREV(price) and similar) would tangle the NFA test with the predicate evaluator's behavior; the array idiom keeps the two cleanly separate, and a failure here points directly at the NFA loop.

The absorption coverage is particularly thorough, because absorption is the optimization most likely to silently produce wrong answers if it engages when it should not. Tests cover every shape from §3.2's case analysis.

Beyond absorption, the suite covers the context lifecycle: overlapping contexts under AFTER MATCH SKIP TO NEXT ROW, failed-context cleanup mid-stream, partition-end finalization of incomplete contexts, and contexts encountered after a candidate match has already been recorded in the head context. Each of these maps to a specific pruning rule in §3.6, and the tests are written to fail loudly if the rule misfires (either by leaving redundant contexts alive, or by killing a context that should have produced output).
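
For instance, the overlapping-context case can be probed with a fixture of this shape (hypothetical rows, in the same array idiom as above): under AFTER MATCH SKIP TO NEXT ROW the second match is allowed to begin inside the first one's span, so both rows 1 and 2 report matches:

SELECT id,
       first_value(id) OVER w AS ms,
       last_value(id)  OVER w AS me
FROM (VALUES
  (1, ARRAY['A']),
  (2, ARRAY['A','B']),
  (3, ARRAY['B'])) AS t(id, flags)
WINDOW w AS (ORDER BY id
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP TO NEXT ROW
  INITIAL PATTERN (A B)
  DEFINE
    A AS 'A' = ANY(flags),
    B AS 'B' = ANY(flags));
-- expected: row 1 matches rows 1-2 and row 2 matches rows 2-3;
-- under SKIP PAST LAST ROW the second match would not be reported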

4.4 rpr_explain — output stability

EXPLAIN output is part of the user-facing contract — third-party tools (psql autocompletion, monitoring dashboards, log scrapers) parse it, and changes that look cosmetic can break them. The rpr_explain suite verifies three things: that the NFA statistics block is emitted in its documented shape, that PATTERN and DEFINE deparse faithfully (including identifier quoting, reluctant quantifiers, and group repetition), and that the output stays stable across dump and reload.

Like rpr_base, this suite intentionally leaves its objects in place at the end of the run so pg_dump and pg_upgrade coverage extends to the explain-side artefacts as well.

4.5 rpr_integration — planner interactions

PostgreSQL's planner has many optimizations that operate on window queries — frame canonicalisation, run-condition pushdown, identical-window deduplication, unused-output projection, moving-aggregate inverse transitions — and each one of them was designed without RPR in mind. Most of them are unsafe to apply when a window has a PATTERN clause: the frame is part of the match contract, the matched output is no longer monotonic in any obvious way, and two windows with the same spec but different DEFINEs produce genuinely different results. The integration suite verifies that each of these optimizations is correctly disabled or bypassed for RPR windows:

Frame canonicalisation
The planner normally rewrites ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING into ROWS UNBOUNDED PRECEDING for monotone aggregates. RPR's frame is the match span, not an aggregation window, and must stay verbatim.
Run-condition pushdown
A monotone WHERE filter on a window function's output can normally be pushed into the window operator as a stop-condition. For RPR this would terminate pattern matching prematurely, possibly cutting off matches mid-extension.
Window deduplication (RPR vs non-RPR)
Two windows with identical ORDER BY and frame would normally be merged into one. An RPR window and a non-RPR window can never share state because the RPR side carries match results.
Window deduplication (different DEFINE)
Two RPR windows with the same PATTERN but different DEFINE clauses produce different matches and must remain distinct, even though their structural shape is identical.
Unused-output projection
Even if the surrounding query never reads the window's per-row output, the RPR window cannot be removed: the pattern matcher's side effects (which rows are inside which match) feed reduced-frame computations elsewhere in the plan.
Moving-aggregate inverse transitions
Window aggregates with an inverse transition function can normally be evaluated incrementally as the frame moves. RPR's reduced frame is not a sliding window; the inverse transition path would produce wrong results.

The pattern across these tests is the same: each provides a non-RPR baseline that triggers the optimization (and verifies that EXPLAIN shows it being applied), then runs an RPR query of structurally similar shape and verifies that the optimization is not applied. The two halves together prove that the planner guard is doing real work rather than passing vacuously.
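
For the window-deduplication pair, that two-halves shape looks roughly like this (the demo_rows table, its columns, and the patterns are illustrative placeholders, not the literal suite contents):

-- baseline: two windows with identical specifications are normally
-- collapsed into a single window operator (visible in EXPLAIN)
EXPLAIN (COSTS OFF)
SELECT first_value(v) OVER w1, last_value(v) OVER w2
FROM demo_rows
WINDOW w1 AS (ORDER BY id),
       w2 AS (ORDER BY id);

-- RPR variant: same PATTERN, different DEFINE; the two windows
-- produce different matches and must stay separate
EXPLAIN (COSTS OFF)
SELECT first_value(v) OVER w1, first_value(v) OVER w2
FROM demo_rows
WINDOW w1 AS (ORDER BY id
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    INITIAL PATTERN (A+)
    DEFINE A AS v > PREV(v)),
       w2 AS (ORDER BY id
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    INITIAL PATTERN (A+)
    DEFINE A AS v < PREV(v));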

This suite is also the reason §3.4 is short. The "match boundaries" mechanisms — reduced frame, SKIP, INITIAL, bounded-frame cutoff — are tested operationally elsewhere; what rpr_integration verifies is that no other optimizer stage tampers with them on the way through. A passing rpr_integration is what lets the rest of the suites trust that their inputs reach the executor unmolested.