
Three Converter Bugs That Hide in Recursive CTEs

Why an AI-powered T-SQL to PostgreSQL pipeline keeps tripping on the same three shapes — and the deterministic scrubs that fix them.

Steve Harlow · April 23, 2026 · 10 min read

A pilot run two days ago put a failing set of 76 datasets at an 84.0% effective pass rate through ReportBridge's T-SQL to PostgreSQL conversion pipeline. A fresh run today put the same set at 89.6%. Three converter changes were responsible. None of them touched the model, the prompt, or the schema context. All three were deterministic scrubs that ran on the output of the AI step.

Each scrub closed a different failure shape. All three shapes cluster around recursive CTEs — the place where T-SQL's forgiving type system meets PostgreSQL's strict one and the converter has to bridge two languages that look almost identical but disagree about what counts as legal.

This is what the three bugs were, what the fixes look like, and why a deterministic scrub kept beating prompt engineering even after multiple rounds of trying.

Bug 1: Recursive-arm type drift

A recursive CTE in PostgreSQL has two arms joined by UNION ALL. The base arm seeds the recursion. The recursive arm references the CTE itself and adds the next row. PostgreSQL requires both arms to return columns of the same type. T-SQL also requires this in theory, but its type promotion is looser — it will frequently merge an integer base with a varchar recursive arm if the values look numeric.

The AI converter preserves the column shapes faithfully, which means the type drift gets carried into PostgreSQL output that then fails:

PostgreSQL after conversion (fails)

WITH RECURSIVE yl AS (
  SELECT (SELECT MIN(year) FROM tests) AS year   -- subquery returns integer
  UNION ALL
  SELECT yl.year + 1 FROM yl WHERE yl.year < 2025 -- arithmetic returns ??
)
SELECT * FROM yl

-- ERROR: UNION types integer and bigint cannot be matched

After deterministic scrub

WITH RECURSIVE yl AS (
  SELECT (SELECT MIN(year) FROM tests)::INTEGER AS year
  UNION ALL
  SELECT (yl.year + 1)::INTEGER FROM yl WHERE yl.year < 2025
)
SELECT * FROM yl

The scrub looks at recursive CTEs whose base arm is either a scalar subquery or an explicit integer cast (the two shapes that cause type drift), then wraps both the base and the recursive-arm arithmetic in ::INTEGER. It's gated tightly so it doesn't fire on plain-literal bases, where drift is impossible, and it's idempotent: running it twice produces identical output.
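A minimal sketch of what such a scrub can look like, not ReportBridge's actual implementation — the function name and patterns here are hypothetical, and real inputs would need broader matching. The gate mirrors the description above: it only runs on statements containing a recursive CTE, and both substitutions are written to be no-ops on already-cast output.

```python
import re

def scrub_recursive_arm_types(sql: str) -> str:
    if "WITH RECURSIVE" not in sql.upper():
        return sql  # gate: nothing to do on non-recursive statements
    # Base arm: wrap a scalar-subquery base in ::INTEGER if not cast already.
    sql = re.sub(
        r"(\(SELECT [^\n]*?\))(?!::) AS (\w+)",
        r"\1::INTEGER AS \2",
        sql,
    )
    # Recursive arm: wrap simple column arithmetic in a matching cast.
    sql = re.sub(
        r"SELECT (\w+\.\w+ [-+] \d+) FROM",
        r"SELECT (\1)::INTEGER FROM",
        sql,
    )
    return sql
```

Running it on the failing query above yields the scrubbed version, and running it a second time changes nothing — the idempotency property the post describes.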

Why not handle this in the prompt? We tried. Two prompt rules telling the model to align UNION-arm types in recursive CTEs improved the pass rate by about a third of what the scrub achieved, with significant variance run-to-run. The model got it right most of the time. The scrub gets it right every time.

Bug 2: Year-int-to-date cast

A common T-SQL pattern is to compute a year as an integer and then cast it to a date for use as a range boundary. The model translates this verbatim into a PostgreSQL pattern that compiles but evaluates to nonsense:

PostgreSQL after conversion (fails)

-- AI emits this from T-SQL CAST(@year - 1 AS DATE):
SELECT CAST((CAST('2020' AS INTEGER) - 1) AS TEXT)::date

-- ERROR: invalid input syntax for type date: "2019"

After deterministic scrub

-- Rewritten to PG-native MAKE_DATE
SELECT MAKE_DATE(CAST('2020' AS INTEGER) - 1, 1, 1)

The scrub recognizes the pattern CAST(<int-expr> AS TEXT)::date and rewrites it to MAKE_DATE(<expr>, 1, 1). PostgreSQL won't coerce a four-character year string to a date, but it will happily build a date from an integer year, month, and day. The scrub fired once on this pilot — and revealed an unrelated identifier bug underneath it that the date error had been masking.
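A sketch of that rewrite (the function name is hypothetical, and this is an illustration of the technique rather than the production scrub). Because the inner expression can itself contain a nested CAST, a plain regex with fixed nesting depth breaks; a small paren-balanced scan handles arbitrary nesting. Unlike the output shown above, this sketch keeps the expression's outer parentheses, which is semantically equivalent.

```python
def scrub_year_to_date(sql: str) -> str:
    out, i = [], 0
    while True:
        j = sql.find("CAST(", i)
        if j == -1:
            out.append(sql[i:])
            return "".join(out)
        # Walk to the close paren that matches this CAST's open paren.
        depth, k = 1, j + 5
        while depth and k < len(sql):
            depth += {"(": 1, ")": -1}.get(sql[k], 0)
            k += 1
        inner = sql[j + 5 : k - 1]
        if inner.upper().endswith(" AS TEXT") and sql[k : k + 6] == "::date":
            # Our shape: rewrite to MAKE_DATE(<expr>, 1, 1).
            out.append(sql[i:j] + f"MAKE_DATE({inner[:-8]}, 1, 1)")
            i = k + 6  # consume past ::date
        else:
            out.append(sql[i : j + 5])  # not our shape; keep scanning inside
            i = j + 5
```

A CAST that doesn't end in `AS TEXT)::date` is left untouched, which is the guard condition that keeps the scrub from firing on ordinary text casts.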

That second-order benefit is one of the reasons we lean on deterministic scrubs: they unblock downstream errors quickly instead of hiding them behind a recurring first-line failure.

Bug 3: CTE-with-column-list parsing

This one wasn't in the AI step at all. It was in the tooling that runs around the AI step — specifically the "missing relations" check that decides which queries need migration work versus a converter fix.

T-SQL frequently uses a CTE-with-column-list shape that PostgreSQL also supports:

WITH MaxMailSendDateList (ScanDate, MaxMailSendDate) AS (...)
, UpdatedMailSendDate(CaseId, ScanDate, MailSendDate) AS (...)
,RapidPassData AS (...)              -- no space between , and name
SELECT * FROM RapidPassData

A naive parser looks for WITH name AS ( and , name AS (. It misses three things that real T-SQL throws at it:

  • Column lists. The (col1, col2) between the name and the AS keyword. Without handling this, every CTE that uses one looks like a missing table.
  • No-space comma. Subsequent CTEs sometimes have no whitespace between the , delimiter and the CTE name. Comma-no-space is legal SQL and the AI converter preserves it from the source.
  • Inline comments between name and AS. Reports often have a one-line comment listing the column names that the actual CTE produces, sitting between the name and the AS keyword.

Tightening one regex closed all three. The migration gap report for the same corpus dropped from 8 false-positive missing-table entries to zero. Datasets that had been mistakenly tagged NEEDS_MIGRATION rejoined the scorable pool and several passed on the next run.
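The tightened extractor can be sketched as a single pattern covering all three shapes — optional column list, no-space comma, and an inline comment between the name and AS. This is an illustrative reconstruction, not the actual ReportBridge regex; real-world SQL would need further guards against false positives.

```python
import re

CTE_NAME = re.compile(
    r"(?:\bWITH\s+|,\s*)"        # first CTE after WITH, or a later one after ','
    r"(\w+)"                     # the CTE name
    r"\s*(?:\([^)]*\))?"         # optional (col1, col2, ...) column list
    r"\s*(?:--[^\n]*\n\s*)?"     # optional one-line comment before AS
    r"AS\s*\(",
    re.IGNORECASE,
)

def cte_names(sql: str) -> set[str]:
    # Names defined as CTEs in this query; anything referenced but not
    # in this set (or the schema) is a genuine missing table.
    return {m.group(1) for m in CTE_NAME.finditer(sql)}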

The pattern: catch the shape in code, leave the rest to the model

Every one of these three fixes shares a structure. Look at the failures, not the passes. Group failures by error message. When a cluster has the same shape across multiple inputs, write a regex and a transformation that runs after the model. Add tests that assert both the positive case and the guard case (does the scrub fire on this shape, and does it leave that shape alone). Ship.
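The positive-plus-guard test shape can be sketched like this, using a deliberately simplified one-liner version of the bug-2 scrub (all names here are hypothetical):

```python
import re

def scrub(sql: str) -> str:
    # Simplified: handles only a non-nested CAST(<expr> AS TEXT)::date.
    return re.sub(r"CAST\(([^()]+) AS TEXT\)::date", r"MAKE_DATE(\1, 1, 1)", sql)

def test_scrub_fires():
    # Positive case: the target shape is rewritten.
    assert scrub("CAST(yr - 1 AS TEXT)::date") == "MAKE_DATE(yr - 1, 1, 1)"

def test_scrub_holds_guard():
    # Guard case: an ordinary TEXT cast without ::date is left alone.
    src = "CAST(note AS TEXT)"
    assert scrub(src) == src
```

The guard case is what keeps the scrub library safe to grow: every new pattern ships with proof that it leaves neighboring shapes untouched.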

The scrubs are not a substitute for prompt rules. They're the layer below them. Prompt rules teach the model what to aim for. Scrubs catch the cases where the model misses anyway. The two together drive the floor up faster than either one alone.

The pipeline now has thirteen of these scrubs, each covering one recurring failure shape. Adding a fourteenth typically takes under an hour: write the failing test that captures the input and the desired output, write the regex, watch the test go green, run the full pilot to confirm no regressions. The full-pilot run is the slow part — usually 40 to 50 minutes — but the unit-test gate catches mistakes long before the pilot does.

What it adds up to

The three changes that drove this pilot from 84.0% to 89.6% shipped together as two commits. Each was small. Each was deterministic. Each was unit-tested before the pilot ran. The re-pilot took about 50 minutes; the code work took about three hours including tests.

The remaining gap is a mix of identifier hallucinations and unparenthesized recursive-arm shapes that the regex hasn't been widened to cover yet. Both are class-level fixes — the next two scrubs will close them. After that the residual is genuine schema work: tables that exist in the source database but not yet in the target.

None of this is glamorous. It's also the reason the pass rate keeps moving in the right direction.

FAQ

Why are recursive CTEs harder to convert than regular CTEs?

Recursive CTEs have two arms — a base case and a recursive case — that are joined by UNION ALL. PostgreSQL is strict about both arms returning columns of the same type. T-SQL is more forgiving, so source queries often have type drift between arms that the engine smooths over. When an AI converter rewrites the syntax, the type drift is often preserved, and PostgreSQL rejects what SQL Server happily accepted. On top of that, the RECURSIVE keyword itself is easy to lose — both T-SQL and PostgreSQL use WITH for non-recursive CTEs, so the keyword is positional and easy for a language model to drop on the second or third CTE in a chain.

Why use deterministic scrub passes instead of fixing the prompt?

Both have their place. Prompt rules teach the model what to do, but they're probabilistic — the same prompt with the same input can produce different outputs. Deterministic post-processing scrubs run on the output and either match a pattern or don't. For recurring failure shapes (UNION-arm type mismatch, missing RECURSIVE on subsequent CTEs, an integer-to-date cast pattern), a 30-line regex-based scrub closes the gap in a way the prompt couldn't reach reliably. The rule is: catch class-level shapes in code, leave novel cases to the model.

What is a CTE-with-column-list and why does it matter for the gap report?

PostgreSQL and T-SQL both support an optional column list between a CTE name and the AS keyword: WITH NameData (col1, col2) AS (...). Reports converted from T-SQL preserve this shape because it carries column-name information forward. A naive 'find missing tables' parser that only matches WITH name AS (...) will treat NameData as a missing table and pollute the migration gap report with false positives. The fix is one regex that allows an optional (col1, col2, ...) and an even smaller fix that allows zero whitespace between the comma delimiter and a subsequent CTE name.

How do I tell whether my own pipeline has these bugs?

Look at the failures, not the passes. Pull every query that errored in PostgreSQL, group by error message, and look for these three patterns: UNION types ... cannot be matched (likely UNION-arm drift in a recursive CTE), syntax error at or near WITH (likely a lost RECURSIVE keyword on a non-first CTE), or a 'missing table' name in your gap report that's actually a CTE alias defined earlier in the same query (CTE-with-column-list parsing bug). The first two are model output issues fixable with deterministic scrubs; the third is in your migration tooling, not the model.
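The grouping step is mechanical enough to script. A sketch of one way to do it (names are hypothetical): normalize each error message so that differing identifiers and literals collapse into one shape, then count.

```python
import re
from collections import Counter

def error_shape(msg: str) -> str:
    # Collapse quoted identifiers/literals so messages cluster by shape,
    # e.g. 'invalid input syntax for type date: "2019"' and '..."2018"'
    # become the same bucket.
    return re.sub(r'"[^"]*"', '"..."', msg)

def top_failure_shapes(errors: list[str], n: int = 3) -> list[tuple[str, int]]:
    return Counter(error_shape(e) for e in errors).most_common(n)
```

Any bucket with more than a couple of entries is a candidate for a deterministic scrub; singleton buckets are the novel cases left to the model.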

Migrating SSRS reports to PostgreSQL?

ReportBridge runs the AI conversion, the deterministic scrubs, and the manual-correction tooling end-to-end. Apply for the beta to see your own corpus through the pipeline.
