Rollback in Production: Tag-Based Strategies and CI Validation

Part 16 of 18

May 03, 2026 Abhay 9 min read


Article 9 covered what rollback commands exist. This article covers how to make rollback reliable in production — where a slow or broken rollback costs real money and real users. The difference between knowing the commands and having a working rollback strategy is process: mandatory tagging, pre-generated rollback files, CI gates, and a decision tree that removes guesswork during an incident.

Why Rollback Fails in Production

Rollback failures in production share a small set of root causes:

No tag was set before deployment — rollback --tag is unavailable; the on-call engineer is now doing arithmetic with rollbackCount
Rollback blocks were never written — futureRollbackSQL would have caught this, but it wasn’t in the pipeline
Rollback was never tested — the block exists but the SQL is wrong; discovered during the incident
Data was deleted — dropTable or DELETE cannot be undone by Liquibase; database backup is the only option
Rollback creates a dependency conflict — a later changeset depends on something the rollback removes

All five are preventable. None of them require heroics — they require process enforced before deployment.

The Mandatory Deployment Runbook

This runbook is not optional. Every production deployment that includes database migrations follows it in order.

Before deployment

# 1. Validate changelog
liquibase validate

# 2. Confirm pending changesets
liquibase status

# 3. Preview the forward SQL
liquibase updateSQL --output-file=/tmp/forward-${VERSION}.sql
# Review /tmp/forward-${VERSION}.sql

# 4. Preview the rollback SQL
liquibase futureRollbackSQL --output-file=/tmp/rollback-${VERSION}.sql
# Review /tmp/rollback-${VERSION}.sql — this is your undo script

# 5. Take a database snapshot
liquibase snapshot \
  --snapshot-format=json \
  --output-file=snapshots/pre-deploy-${VERSION}.json

# 6. TAG — no deployment proceeds without this
liquibase tag --tag=${VERSION}
# e.g., liquibase tag --tag=v1.2.0

Steps 3 and 4 produce files that must be reviewed by a human. Steps 5 and 6 are non-negotiable machine steps — the snapshot is your forensic record, the tag is your rollback anchor.
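Steps 1 through 6 can be chained so that any failing step, or an empty VERSION, stops the pipeline before update is ever reached. The sketch below is POSIX sh and assumes the pipeline exports VERSION. The pre_deploy function and the LIQUIBASE override variable are hypothetical names, added so the sequence can be dry-run against a stub instead of the real CLI. The human review of the two preview files still happens between this script and step 7.

```shell
#!/bin/sh
# Hypothetical pre-deployment wrapper for runbook steps 1-6.
# LIQUIBASE defaults to the real CLI; point it at a stub for a dry run.
LIQUIBASE="${LIQUIBASE:-liquibase}"

pre_deploy() {
  version="$1"
  # No version means no tag, and no tag means no rollback anchor.
  if [ -z "$version" ]; then
    echo "ERROR: VERSION is not set; refusing to deploy without a tag" >&2
    return 1
  fi
  # Each step must succeed before the next one runs.
  $LIQUIBASE validate || return 1
  $LIQUIBASE status || return 1
  $LIQUIBASE updateSQL --output-file="/tmp/forward-${version}.sql" || return 1
  $LIQUIBASE futureRollbackSQL --output-file="/tmp/rollback-${version}.sql" || return 1
  $LIQUIBASE snapshot --snapshot-format=json \
    --output-file="snapshots/pre-deploy-${version}.json" || return 1
  # Step 6 runs last, so a tag only exists if every earlier step passed.
  $LIQUIBASE tag --tag="$version" || return 1
  echo "Pre-deployment steps passed for ${version}; review the preview files before update."
}
```

Usage: pre_deploy "$VERSION" || exit 1. Writing the tag as the final call means a half-finished pre-deployment never leaves a misleading rollback anchor behind.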

Deployment

# 7. Apply migrations
liquibase update

# 8. Confirm 0 pending changesets
liquibase status

# 9. Verify application health
# Run smoke tests, check APM, watch error rates for 5 minutes

If deployment fails (rollback decision)

# 10. Review what changed
liquibase history

# 11. Preview rollback (the file from step 4 should already exist)
liquibase rollbackSQL --tag=${VERSION}

# 12. Execute rollback
liquibase rollback --tag=${VERSION}

# 13. Verify rollback
liquibase status    # Should show the rolled-back changesets as pending again
liquibase history   # Rolled-back changesets should be absent
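Steps 11 and 12 can be guarded so the rollback only executes after a fresh preview has been written to disk and someone has explicitly confirmed it. A POSIX sh sketch; rollback_to_tag, the CONFIRM variable, the LIQUIBASE override and the /tmp preview path are hypothetical choices, not Liquibase features.

```shell
#!/bin/sh
# Hypothetical guarded rollback for runbook steps 11 and 12.
# LIQUIBASE defaults to the real CLI; set CONFIRM=yes only after reading the preview.
LIQUIBASE="${LIQUIBASE:-liquibase}"

rollback_to_tag() {
  tag="$1"
  if [ -z "$tag" ]; then
    echo "ERROR: no tag given" >&2
    return 1
  fi
  preview="/tmp/rollback-preview-${tag}.sql"
  rm -f "$preview"   # never reuse a stale preview from an earlier run
  # Step 11: write the preview to a file so it can be read, shared and archived.
  $LIQUIBASE rollbackSQL --tag="$tag" --output-file="$preview" || return 1
  if [ ! -s "$preview" ]; then
    echo "ERROR: rollback preview $preview is missing or empty" >&2
    return 1
  fi
  echo "Rollback preview written to $preview"
  # Step 12: execute only after an explicit confirmation.
  if [ "${CONFIRM:-no}" != "yes" ]; then
    echo "CONFIRM is not 'yes'; stopping after the preview" >&2
    return 2
  fi
  $LIQUIBASE rollback --tag="$tag"
}
```

Run it once to produce the preview, read the file, then export CONFIRM=yes and run it again. The separate exit code 2 lets a wrapper tell "stopped for review" apart from a genuine failure.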

The Pre-Generated Rollback File

The most underused Liquibase setting in Spring Boot production deployments: spring.liquibase.rollback-file.

# application-prod.yml
spring:
  liquibase:
    tag: ${APP_VERSION}
    rollback-file: /var/log/app/rollback-${APP_VERSION}.sql

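When Spring Boot starts and runs Liquibase, the integration does two things in sequence:

Before applying anything, writes the rollback SQL for the pending changesets to /var/log/app/rollback-v1.2.0.sql (for APP_VERSION=v1.2.0)
Runs update, passing ${APP_VERSION} as the tag argument

Check what that tag argument does on your Liquibase version before relying on it for rollback --tag. Liquibase's Spring integration hands it to the update call as an update-to-tag target, which is not the same as running liquibase tag; confirm with liquibase history that a v1.2.0 tag row exists after startup, or keep the explicit tag step from the runbook.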

The rollback file is generated at deployment time, not when the incident happens. The on-call engineer finds a ready-made SQL file. They review it, hand it to a DBA if needed, and execute:

mysql -h prod-host -u lb_user -p ecommerce < /var/log/app/rollback-v1.2.0.sql

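This bypasses Liquibase entirely for the rollback execution: the SQL is already generated and sitting on disk, and it includes the statements that delete the rolled-back rows from DATABASECHANGELOG, so change tracking stays consistent. No Liquibase CLI is required on the production host during an incident.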
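Before handing the file to a DBA, it helps to know exactly which changesets it will un-record. A small review aid in POSIX sh, assuming the generated SQL removes each reverted changeset with a line of the form DELETE FROM ...DATABASECHANGELOG WHERE ID = '<id>' AND AUTHOR = '<author>' ...; check that assumption against a real rollback file from your Liquibase version. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical review aid: list the changesets a pre-generated rollback file
# will remove from DATABASECHANGELOG, as "id (author)", in file order.
# Assumes lines such as:
#   DELETE FROM ecommerce.DATABASECHANGELOG WHERE ID = '<id>' AND AUTHOR = '<author>' AND FILENAME = '...';
list_reverted_changesets() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "ERROR: $file is missing or empty" >&2
    return 1
  fi
  grep "^DELETE FROM .*DATABASECHANGELOG WHERE ID = '" "$file" \
    | sed -e "s/.*WHERE ID = '\([^']*\)' AND AUTHOR = '\([^']*\)'.*/\1 (\2)/"
}
```

For example, list_reverted_changesets /var/log/app/rollback-v1.2.0.sql, then compare the list with the DATABASECHANGELOG rows written by this deployment.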

For deployments that run Liquibase outside Spring Boot startup (for example from the CLI in a Maven-built pipeline):

# Generate rollback file as a pre-deployment step
liquibase futureRollbackSQL \
  --output-file=/deployments/rollback-${VERSION}.sql

# Archive it with the deployment artefacts (plain cp cannot write to S3; this uses the AWS CLI)
aws s3 cp /deployments/rollback-${VERSION}.sql s3://your-bucket/deployments/rollback-${VERSION}.sql

CI Pipeline Gates

Every PR that adds a changeset must pass these gates before merge. No exceptions.

# .github/workflows/db-migration-gates.yml
name: Database Migration Gates

on:
  pull_request:
    paths:
      - 'src/main/resources/db/**'

jobs:
  migration-gates:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: ecommerce_ci
          MYSQL_USER: lb_user
          MYSQL_PASSWORD: lb_pass
        ports: ["3306:3306"]
        options: >-
          --health-cmd="mysqladmin ping -h localhost"
          --health-interval=10s
          --health-timeout=5s
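          --health-retries=5

    env:
      # Liquibase 4.x reads connection settings from LIQUIBASE_COMMAND_* variables
      LIQUIBASE_COMMAND_URL: jdbc:mysql://localhost:3306/ecommerce_ci?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
      LIQUIBASE_COMMAND_USERNAME: lb_user
      LIQUIBASE_COMMAND_PASSWORD: lb_pass
      # Assumed master changelog path (Spring Boot's default location); adjust to your layout
      LIQUIBASE_COMMAND_CHANGELOG_FILE: src/main/resources/db/changelog/db.changelog-master.yaml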

    steps:
      - uses: actions/checkout@v4

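      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'

      # The gates below assume the Liquibase CLI (with a MySQL JDBC driver) is on PATH;
      # add an install step here if your runner image does not provide it.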

      - name: Gate 1 — Validate changelog syntax
        run: liquibase validate

      - name: Gate 2 — Preview forward SQL
        run: |
          liquibase updateSQL --output-file=/tmp/forward.sql
          echo "=== Forward SQL Preview ==="
          cat /tmp/forward.sql          

      - name: Gate 3 — Validate rollback exists for all pending changesets
        run: |
          liquibase futureRollbackSQL --output-file=/tmp/rollback.sql
          echo "=== Rollback SQL Preview ==="
          cat /tmp/rollback.sql          

      - name: Gate 4 — Apply and test rollback round-trip
        run: liquibase updateTestingRollback

      - name: Gate 5 — Apply cleanly (final state)
        run: liquibase update

      - name: Gate 6 — Verify 0 pending after apply
        run: |
          # Tolerate singular and plural wording of the status message
          PENDING=$(liquibase status 2>&1 | grep -Eo '[0-9]+ change ?sets? ha(s|ve) not been applied' | head -n 1 | awk '{print $1}')
          if [ "${PENDING:-0}" != "0" ]; then
            echo "ERROR: ${PENDING} changesets still pending after update"
            exit 1
          fi

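Gate 3 (futureRollbackSQL) is the critical gate. The command fails, and so fails the PR, if any pending changeset has no rollback block and uses a change type Liquibase cannot reverse automatically (raw sql, dropTable and similar); changes such as createTable or addColumn get auto-generated rollbacks. This is the enforcement mechanism that prevents rollback blocks from being skipped during development where they are actually needed.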

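Gate 4 (updateTestingRollback) proves rollback executes. It is not enough for the SQL to be generated; it must actually run without errors on the test database. Because updateTestingRollback applies the pending changesets, rolls them back and applies them again, Gate 5 should find nothing left to apply: it confirms the clean final state rather than doing new work.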

Rollback Decision Tree

During an incident, guesswork is the enemy. Use this decision tree:

Deployment failed or post-deployment issues detected
│
├─ Is the problem in the application code (not the database)?
│   └─ YES → Roll back application code only; database migration stays
│
├─ Is the database migration partially applied?
│   ├─ Check: liquibase history, then liquibase status
│   ├─ All changesets applied → incident is post-migration
│   └─ Some changesets pending → migration failed mid-run
│       └─ Fix the failing changeset or roll back what ran
│
├─ Was a tag set before deployment?
│   ├─ YES → liquibase rollbackSQL --tag=vX.Y.Z → review → liquibase rollback --tag=vX.Y.Z
│   └─ NO  →
│       ├─ Use rollback file if it was pre-generated (check /var/log/app/)
│       ├─ Count how many changesets ran: liquibase history → liquibase rollbackCount N
│       └─ Use date: liquibase rollbackToDate "YYYY-MM-DD HH:MM:SS"
│
├─ Did the migration delete data (dropTable, DELETE)?
│   └─ YES → Liquibase rollback restores structure, NOT data
│            → Restore from database backup, then apply corrected changeset
│
└─ Is rollback blocked by Liquibase itself?
    ├─ Checksum mismatch → was the deployed changeset modified after deployment?
    │   ├─ YES → Restore the changeset file to its deployed content (git history)
    │   └─ NO  → clearCheckSums only if the difference is confirmed cosmetic, then retry
    └─ Stuck waiting for the changelog lock → confirm no other Liquibase process is running, then liquibase releaseLocks and retry

Print this and put it in the incident response runbook. An outage is not the time to be reasoning through Liquibase commands.
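The branches that depend only on yes/no facts can be made literally mechanical. A hypothetical POSIX sh encoding of part of the tree: it covers the application-only, data-loss, tag and fallback branches, and leaves out the partial-apply and blocked-rollback branches. It checks data loss before the tag because a Liquibase rollback cannot bring deleted data back whether or not a tag exists. The function name and argument order are illustrative.

```shell
#!/bin/sh
# Hypothetical encoding of part of the rollback decision tree.
# Each argument is a "yes"/"no" answer gathered during the incident:
#   $1 problem is in application code only
#   $2 migration deleted data (dropTable, DELETE, TRUNCATE)
#   $3 a tag was set before deployment
#   $4 a pre-generated rollback file exists
next_rollback_action() {
  app_only="$1"; data_deleted="$2"; tagged="$3"; has_file="$4"
  if [ "$app_only" = "yes" ]; then
    echo "roll back application code only; keep the migration"
  elif [ "$data_deleted" = "yes" ]; then
    # Structure can be recreated, data cannot: backups take priority.
    echo "restore from backup, then apply a corrected changeset"
  elif [ "$tagged" = "yes" ]; then
    echo "liquibase rollbackSQL --tag=<version>, review, then liquibase rollback --tag=<version>"
  elif [ "$has_file" = "yes" ]; then
    echo "review and run the pre-generated rollback file"
  else
    echo "count changesets in liquibase history, then liquibase rollbackCount N"
  fi
}
```

For example, next_rollback_action no no yes yes prints the tag-based rollback sequence.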

When Rollback Is Not the Answer

Some situations require a forward fix (new changeset) rather than rollback:

When data was deleted: dropTable, DELETE, TRUNCATE cannot be undone by Liquibase. The rollback changeset recreates the structure — the data is gone. Options:

Restore from pre-deployment backup (then replay any writes that happened since)
Write a forward changeset that reconstructs the data from available sources

When rollback would break application code that is already deployed: If the new application version depends on the new schema, rolling back the schema while the new app version is still running creates a different problem. Coordinate app rollback with schema rollback.

When the migration was successful but the feature has a bug: Deploy a fix forward. Rollback is for when the migration itself is the problem, not for reverting features.

When the changeset has no rollback and no backup: Write a corrective forward changeset. Document the incident. Add the rollback block that was missing.

Checksum Mismatches During Rollback

If someone edited a deployed changeset (even a seemingly cosmetic edit can change its checksum), Liquibase detects a checksum mismatch and refuses to run commands that validate the changelog, including update and rollback. This is the worst time to discover a checksum problem.

Resolution:

# 1. Identify which changeset has the mismatch
liquibase validate

# 2. Option A: Restore the changeset file to its deployed state
#    (check git history for the original content)
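git log --oneline -- path/to/changeset.yaml        # find the commit that was actually deployed
git show "${DEPLOYED_COMMIT}:path/to/changeset.yaml" > /tmp/original.yaml   # DEPLOYED_COMMIT: hash from the log
diff /tmp/original.yaml path/to/changeset.yaml     # compare, then restore the deployed content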

# 3. Option B: Clear checksums and let Liquibase recompute
#    (only safe if the changeset change was purely cosmetic — whitespace/comment only)
liquibase clearCheckSums

# 4. Retry rollback
liquibase rollbackSQL --tag=${VERSION}
liquibase rollback --tag=${VERSION}

This is why the rule “never modify a deployed changeset” exists: a checksum mismatch discovered mid-rollback adds a blocking problem to an incident that is already racing the clock.

Rollback Monitoring: Know When It’s Done

After executing rollback, verify completion explicitly:

# The rolled-back changesets should be pending again
liquibase status

# The DATABASECHANGELOG should not have the rolled-back rows
liquibase history

# The schema should match the pre-deployment snapshot
liquibase diff \
  --url=jdbc:mysql://prod-host:3306/ecommerce \
  --reference-url="offline:mysql?snapshot=snapshots/pre-deploy-${VERSION}.json"

The diff against the pre-deployment snapshot is the definitive structural check. If diff reports no differences, the schema is in the same state as before the deployment started. diff compares structure, not row data, so it says nothing about data lost in the meantime.
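The "no differences" check can be scripted so the incident channel gets a pass or fail instead of a wall of text. A POSIX sh sketch, assuming the liquibase diff output was saved to a file and uses the plain-text report format in which an empty category reads like Missing Table(s): NONE. diff_is_clean is a hypothetical name. An empty or unrecognised report is treated as a failure, so a diff that never ran cannot pass.

```shell
#!/bin/sh
# Hypothetical post-rollback check on a saved "liquibase diff" report.
# Assumes the plain-text format where an empty category reads "Missing Table(s): NONE".
diff_is_clean() {
  report="$1"
  category='^(Missing|Unexpected|Changed) .*\(s\):'
  # An empty or unrecognised report must not count as "no differences".
  if ! grep -Eq "$category" "$report" 2>/dev/null; then
    echo "ERROR: no diff categories found in $report" >&2
    return 1
  fi
  # Any category line that does not end in ": NONE" is a difference.
  dirty=$(grep -E "$category" "$report" | grep -v ': NONE$' || true)
  if [ -n "$dirty" ]; then
    echo "Schema differs from the pre-deployment snapshot:" >&2
    printf '%s\n' "$dirty" >&2
    return 1
  fi
  echo "No structural differences from the pre-deployment snapshot."
  return 0
}
```

For example, redirect the diff command above into /tmp/post-rollback-diff.txt, then run diff_is_clean /tmp/post-rollback-diff.txt.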

Common Mistakes

Treating tag as optional: Teams that skip tagging on “small” deployments eventually have an incident on a small deployment. There is no such thing as a safe deployment that doesn’t need rollback capability. Tag every time, without exception.

Not reviewing futureRollbackSQL output: The CI gate fails only when futureRollbackSQL errors, and printing the SQL to the job log does not force a human to read it. Post the rollback SQL to the PR or the deployment record, somewhere a reviewer has to look at it before approving.

Rolling back schema without rolling back application: If the application is still running with the new code that expects the new schema, rolling back the schema breaks the application in a different way. Schema rollback requires simultaneous application rollback (blue-green deployments handle this cleanly — covered in Article 17).

Best Practices

Tag before every production deployment — write it into the deployment pipeline so it cannot be skipped
Pre-generate rollback files at deployment time — rollback-file in Spring Boot or futureRollbackSQL output archived with deployment artefacts
futureRollbackSQL as a mandatory CI gate — fails the PR if any changeset has no rollback block
updateTestingRollback in CI — proves rollback executes, not just that SQL was generated
Print the decision tree — put it in the runbook; decisions during incidents must be mechanical, not creative
diff against pre-deployment snapshot after rollback — the only definitive proof that the schema is back to its pre-deployment state

What You’ve Learned

Production rollback reliability comes from process: mandatory tagging, pre-generated rollback files, CI gates
The deployment runbook order: validate → status → updateSQL → futureRollbackSQL → snapshot → tag → update → verify
spring.liquibase.rollback-file generates the rollback SQL at deployment time, ready for incidents
futureRollbackSQL in CI is the gate that enforces rollback blocks before merge, not during an incident
updateTestingRollback in CI proves rollback executes — generation is not enough
Data loss (dropTable, DELETE) cannot be recovered by Liquibase rollback — database backups are the safety net
Checksum mismatches during rollback require restoring the original changeset content (or, for purely cosmetic edits, clearing checksums) before proceeding

Next: Article 17 — Zero-Downtime Deployments: The Expand-Contract Pattern — how to deploy schema changes without taking the application offline, using column renames and table restructuring as worked examples.