
Validation Beats Conversion: The SSRS Migration Lesson That Saves Your Week

Steve Harlow · April 22, 2026 · 9 min read

Here's the SSRS migration war story I'd take back if I could: we shipped a converted report that passed every syntactic check, ran cleanly in production, and produced numbers that looked completely reasonable — until someone on the business side noticed the totals were off by 8%.

The bug was a single clause: a TOP 1 in a T-SQL correlated subquery had been stripped during conversion, and the compensating LIMIT 1 never got added. The subquery quietly started returning multiple rows, and the surrounding query consumed them without complaint. The report still ran. The answer was just wrong.

That's the bug that taught me validation matters more than conversion.

Two kinds of conversion failures

There are two failure modes in automated SQL conversion. Only one of them is visible:

Mode 1: Hard failure
  → Syntax error, missing function, bad reference
  → Easy to detect: the query won't compile
  → Shows up in the conversion dashboard immediately
  → Gets fixed before shipping

Mode 2: Silent failure
  → Compiles fine. Runs fine. Returns wrong data.
  → Easy to ship accidentally
  → Shows up weeks later when a user notices
  → Gets blamed on "data quality issues"

Every conversion pipeline catches Mode 1. The market differentiator is how well you catch Mode 2. A converter that reports a 95% pass rate but ships undetected silent failures is worse than one that reports 80% and catches them, because the first one creates false confidence.

The three-layer validation model

After getting burned on the TOP 1 bug, we built a three-layer validation pass that runs after every conversion. Each layer catches a different class of silent failure.

Layer 1: EXPLAIN validation (cheap, fast)

Run EXPLAIN on the converted PostgreSQL query before executing it for real: no data returned, no full scan. Plan-time validation catches missing tables and columns, bad function references, and type errors that slipped through conversion. Pair it with EXPLAIN ANALYZE on a small data sample to also surface runtime errors, like a scalar subquery that now returns multiple rows.

EXPLAIN ANALYZE
SELECT e.name,
  (SELECT r.review_date FROM reviews r
   WHERE r.emp_id = e.id
   ORDER BY r.review_date DESC) AS last_review
FROM employees e;

-- ERROR: more than one row returned by a subquery
-- used as an expression

-- Plain EXPLAIN only plans the query (~5ms); the
-- ANALYZE pass executes it on the sample, which is
-- what trips the multi-row error above. Together the
-- two filter out ~40% of silent failures.
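A minimal Layer 1 sketch in Python, using the stdlib sqlite3 module as a stand-in engine (a real pipeline would run EXPLAIN against PostgreSQL through a driver such as psycopg; the function and table names here are illustrative):

```python
import sqlite3

def explain_validate(conn, sql):
    # Layer 1 sketch: ask the engine to plan the query without
    # executing it. Returns None on success, else the error text.
    # (sqlite3 stands in for PostgreSQL here.)
    try:
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")

assert explain_validate(conn, "SELECT name FROM employees") is None
# A column the converter mangled fails at plan time:
assert "no such column" in explain_validate(conn, "SELECT nme FROM employees")
```

The point of the sketch is the shape of the check: compile, don't execute, and treat any planner error as an immediate conversion failure.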

Layer 2: Row-count comparison (medium cost)

Run the original T-SQL against SQL Server and the converted PostgreSQL against PostgreSQL, on the same source data. Compare row counts. Differences are immediate red flags.

Original T-SQL:    1,247 rows
Converted PG:      1,249 rows
Difference:        +2 rows  ← FLAG

Typical causes:
  • JOIN type mismatch (INNER vs LEFT)
  • NULL handling in WHERE clause
  • DISTINCT dropped or added during conversion
  • Date-range filter boundary difference

Row-count divergence catches the largest class of silent failures. It's not perfect — two different queries can return the same row count but different data — but it's a cheap and high-signal check.
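The comparison itself is trivial once both counts are in hand. A minimal sketch, assuming the counts came from running the original query on SQL Server and the converted query on PostgreSQL against the same source data (function and report names are illustrative):

```python
def rowcount_check(report, original_rows, converted_rows):
    # Layer 2 sketch: same source data, two engines, compare counts.
    # A non-zero delta flags the report for human review.
    delta = converted_rows - original_rows
    return None if delta == 0 else f"{report}: {delta:+d} rows -- FLAG"

# The example from the text: two extra rows slipped in.
assert rowcount_check("quarterly_totals", 1247, 1249) == \
    "quarterly_totals: +2 rows -- FLAG"
assert rowcount_check("quarterly_totals", 1247, 1247) is None
```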

Layer 3: Checksum comparison on pinned samples (expensive, rare)

For a curated set of sample records, compute a deterministic hash of the output rows (sorted by primary key) from both systems. Compare. Divergence here catches type coercion bugs, sort-order issues, and subtle numeric precision differences.

-- T-SQL: CAST(amount AS DECIMAL(10,2))
-- PG:    amount::numeric(10,2)
--
-- Both return numbers that LOOK the same, but the two
-- engines can resolve the .5 boundary differently
-- (round half to even vs. round half away from zero).
-- Checksum diverges on the rows where amount ends in .5.
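The boundary behavior is easy to see outside the database. A quick Python illustration of the two rounding modes on a tie value (the amount is made up for the demo):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

amount = Decimal("2.665")  # illustrative value ending in 5

# Round half away from zero vs. round half to even ("banker's"):
half_up   = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
half_even = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

assert half_up   == Decimal("2.67")
assert half_even == Decimal("2.66")
assert half_up != half_even  # one cent apart: enough to diverge a checksum
```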

-- PostgreSQL side:
SELECT encode(sha256(convert_to(
         string_agg(col1 || '|' || col2 || '|' || col3,
                    E'\n' ORDER BY pk),
         'UTF8')), 'hex') AS output_hash
FROM converted_output;

-- Compute the same hash over the T-SQL output
-- (e.g. with HASHBYTES('SHA2_256', ...)) and compare.

Layer 3 is expensive — you're running two full queries and comparing their output. Reserve it for a pinned sample set of 5-20 reports that represent the critical business metrics. Don't run it against everything.
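The same canonicalize-then-hash idea works client-side, which is handy when the two result sets are fetched into the validation harness. A minimal sketch (function name and sample rows are illustrative):

```python
import hashlib

def output_checksum(rows):
    # Layer 3 sketch: canonicalize the result set (sort by primary
    # key, delimit columns, join rows) and hash it. Equal hashes are
    # evidence of equivalent output on this sample.
    canonical = "\n".join(
        "|".join(str(col) for col in row)
        for row in sorted(rows, key=lambda r: r[0])
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

tsql_rows = [(1, "Ana", "10.25"), (2, "Bo", "3.50")]
pg_rows   = [(2, "Bo", "3.50"), (1, "Ana", "10.25")]  # same data, new order

# Sorting by primary key makes row order irrelevant:
assert output_checksum(tsql_rows) == output_checksum(pg_rows)

pg_drift = [(1, "Ana", "10.25"), (2, "Bo", "3.49")]   # one cent of drift
assert output_checksum(tsql_rows) != output_checksum(pg_drift)
```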

The reframe: conversion is a hypothesis, validation is evidence

Once you have the three layers in place, the mental model shifts. Converted SQL isn't a finished artifact — it's a hypothesis that the original query and the new query are equivalent. Validation is the evidence that confirms or rejects the hypothesis.

This changes what "migration complete" means. In a conversion-only world, complete means "the pipeline ran without errors." In a validation-first world, complete means "every converted query has evidence of equivalence on sample data."

The second definition is slower to reach. It's also the only one that actually ships without ticking time bombs.

If you're starting a migration right now

  1. Build the three validation layers before you build the conversion pipeline. It sounds backwards. It isn't.
  2. Curate a pinned sample set of 5-20 reports that cover your critical business metrics. These are your Layer 3 truth set.
  3. Run validation on every conversion. Not a subset. All of them.
  4. Surface validation failures in the same UI as conversion failures. Users shouldn't need a separate workflow for "runs but wrong" vs "doesn't run."
  5. Track Mode 2 failures separately in your metrics. A pipeline that reduced silent failures from 3% to 0.1% is shipping a fundamentally different product than one stuck at 3%.
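Point 5 amounts to a two-bucket classification. A minimal sketch of how a pipeline might separate the two failure modes in its metrics (the dict shape and names are illustrative, not ReportBridge's API):

```python
def classify_failures(results):
    # Keep Mode 1 ("doesn't run") and Mode 2 ("runs but wrong")
    # as separate metrics. `results` maps a report name to whether
    # its converted query compiled and whether it passed validation.
    hard   = sorted(n for n, r in results.items() if not r["compiled"])
    silent = sorted(n for n, r in results.items()
                    if r["compiled"] and not r["validated"])
    return {"hard_failures": hard, "silent_failures": silent}

summary = classify_failures({
    "sales":   {"compiled": True,  "validated": True},
    "budget":  {"compiled": False, "validated": False},
    "payroll": {"compiled": True,  "validated": False},  # the dangerous one
})
assert summary == {"hard_failures": ["budget"],
                   "silent_failures": ["payroll"]}
```

Surfacing both buckets in the same dashboard keeps "runs but wrong" from hiding behind a green pass rate.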

The conversion vendors who win the long game are the ones who invest in validation first. Fast conversion that ships wrong answers is a liability; slower conversion with evidence of correctness is an asset. Pick the asset every time.

Run the three-layer validation on your own reports

ReportBridge runs EXPLAIN validation, row-count comparison, and checksum comparison out of the box on every converted report. See which of your queries have silent failures.

Try ReportBridge

FAQ

What is a silent bug in a SQL migration?

A silent bug is a converted query that compiles, executes without errors, and returns plausible-looking data — but the data is wrong. Common causes: a subquery that should have returned one row returning many, a JOIN that lost a filter, a date function that interprets boundaries differently. These bugs don't surface in smoke tests; they surface weeks later when someone compares specific values.

Can you catch silent bugs with automated validation?

Mostly, yes. Three layers catch 95%+ of silent bugs: (1) EXPLAIN on the converted PostgreSQL before execution — cheap, catches malformed plans; (2) row-count comparison between original and converted queries on the same data — catches most lost filters and missing LIMITs; (3) checksum comparison of a pinned sample set — catches type coercion and sort-order differences. The remaining 5% needs human review on specific business metrics.

Isn't this just regression testing?

It's regression testing with a specific focus: comparing output between two different SQL dialects on the same source data. Traditional regression tests run against the same database. Migration validation runs the same question against two different databases and compares answers. The tooling looks similar; the interpretation of failures is different.

How much overhead does validation add to a migration?

In our pipeline, validation runs in parallel with conversion and adds 10-15% to total migration time. For a 200-report library, that's an afternoon. Against the cost of shipping a silent bug into production — a wrong quarterly report, a misallocated budget, a compliance filing off by a thousand dollars — the overhead is negligible.