Rollback in Production: Tag-Based Strategies and CI Validation
Article 9 covered what rollback commands exist. This article covers how to make rollback reliable in production — where a slow or broken rollback costs real money and real users. The difference between knowing the commands and having a working rollback strategy is process: mandatory tagging, pre-generated rollback files, CI gates, and a decision tree that removes guesswork during an incident.
Why Rollback Fails in Production
Rollback failures in production share a small set of root causes:
- No tag was set before deployment: rollback --tag is unavailable, and the on-call engineer is now doing arithmetic with rollbackCount
- Rollback blocks were never written: futureRollbackSQL would have caught this, but it wasn't in the pipeline
- Rollback was never tested: the block exists but the SQL is wrong, which is only discovered during the incident
- Data was deleted: dropTable or DELETE cannot be undone by Liquibase; database backup is the only option
- Rollback creates a dependency conflict: a later changeset depends on something the rollback removes
All five are preventable. None of them require heroics — they require process enforced before deployment.
The Mandatory Deployment Runbook
This runbook is not optional. Every production deployment that includes database migrations follows it in order.
Before deployment
# 1. Validate changelog
liquibase validate
# 2. Confirm pending changesets
liquibase status
# 3. Preview the forward SQL
liquibase updateSQL --output-file=/tmp/forward-${VERSION}.sql
# Review /tmp/forward-${VERSION}.sql
# 4. Preview the rollback SQL
liquibase futureRollbackSQL --output-file=/tmp/rollback-${VERSION}.sql
# Review /tmp/rollback-${VERSION}.sql — this is your undo script
# 5. Take a database snapshot
liquibase snapshot \
  --snapshot-format=json \
  --output-file=snapshots/pre-deploy-${VERSION}.json
# 6. TAG — no deployment proceeds without this
liquibase tag --tag="${VERSION}"
# e.g., liquibase tag --tag=v1.2.0
Steps 3 and 4 produce files that must be reviewed by a human. Steps 5 and 6 are non-negotiable machine steps — the snapshot is your forensic record, the tag is your rollback anchor.
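A pipeline can at least refuse to tag when the two review files were never produced. The following is a minimal sketch under stated assumptions: the function name require_artifacts and the directory argument are hypothetical, and the file names simply follow the forward-${VERSION}.sql and rollback-${VERSION}.sql pattern from steps 3 and 4. It checks that the files exist and are non-empty; it cannot check that a human read them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Liquibase): refuse to continue unless
# both preview files exist and are non-empty.
require_artifacts() {
  local version="$1" dir="${2:-/tmp}"
  if [ -z "$version" ]; then
    echo "VERSION is not set" >&2
    return 2
  fi
  local f
  for f in "$dir/forward-$version.sql" "$dir/rollback-$version.sql"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  echo "artifacts present for $version"
}
```

A deployment script could call require_artifacts "$VERSION" immediately before the tag step and abort on a non-zero exit code.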
Deployment
# 7. Apply migrations
liquibase update
# 8. Confirm 0 pending changesets
liquibase status
# 9. Verify application health
# Run smoke tests, check APM, watch error rates for 5 minutes
If deployment fails (rollback decision)
# 10. Review what changed
liquibase history
# 11. Preview rollback (the file from step 4 should already exist)
liquibase rollbackSQL --tag=${VERSION}
# 12. Execute rollback
liquibase rollback --tag=${VERSION}
# 13. Verify rollback
liquibase status # Should show the rolled-back changesets as pending again
liquibase history # Rolled-back changesets should be absent
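Steps 11 to 13 can be wrapped so that the preview always happens before the execution. This is a sketch, not a prescribed tool: rollback_to_tag, LIQUIBASE_BIN, and CONFIRM are hypothetical names introduced here for illustration. Only the liquibase subcommands and the --tag flag come from the runbook itself.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for runbook steps 11-13. LIQUIBASE_BIN and CONFIRM
# are illustrative names, not Liquibase settings.
rollback_to_tag() {
  local version="$1" lb="${LIQUIBASE_BIN:-liquibase}"
  if [ -z "$version" ]; then
    echo "usage: rollback_to_tag <tag>" >&2
    return 2
  fi
  # Step 11: always print the rollback SQL first.
  "$lb" rollbackSQL --tag="$version" || return 1
  if [ "${CONFIRM:-no}" != "yes" ]; then
    echo "Preview only. Re-run with CONFIRM=yes to execute."
    return 0
  fi
  # Steps 12-13: execute, then show which changesets are pending again.
  "$lb" rollback --tag="$version" || return 1
  "$lb" status
}
```

Setting LIQUIBASE_BIN=echo turns the function into a rehearsal that prints the commands instead of running them, which is useful for drills when no database is available.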
The Pre-Generated Rollback File
The most underused Liquibase feature in production: spring.liquibase.rollback-file.
# application-prod.yml
spring:
  liquibase:
    tag: ${APP_VERSION}
    rollback-file: /var/log/app/rollback-${APP_VERSION}.sql
When Spring Boot starts and Liquibase runs update, it simultaneously:
- Tags the database with ${APP_VERSION}
- Writes the rollback SQL to /var/log/app/rollback-v1.2.0.sql
The rollback file is generated at deployment time, not when the incident happens. The on-call engineer finds a ready-made SQL file. They review it, hand it to a DBA if needed, and execute:
mysql -h prod-host -u lb_user -p ecommerce < /var/log/app/rollback-v1.2.0.sql
This bypasses Liquibase entirely for the rollback execution — the SQL is already generated, reviewed, and sitting on disk. No Liquibase CLI required on the production host during an incident.
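Because the file is piped straight into the database, a quick scan before execution is cheap insurance. The sketch below lists statements that destroy data so the reviewer looks at them first. The function name destructive_statements is hypothetical, and the filter assumes that bookkeeping deletes in generated rollback SQL target a table whose name starts with DATABASECHANGELOG.

```shell
#!/usr/bin/env bash
# Hypothetical pre-execution scan of a pre-generated rollback file.
# Prints "line:statement" for DROP TABLE, TRUNCATE, and DELETE FROM,
# skipping deletes against the DATABASECHANGELOG bookkeeping table.
destructive_statements() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 2
  fi
  grep -inE '^[[:space:]]*(DROP[[:space:]]+TABLE|TRUNCATE|DELETE[[:space:]]+FROM)' "$file" \
    | grep -viE 'DELETE[[:space:]]+FROM[[:space:]]+[^[:space:]]*DATABASECHANGELOG' \
    || true
}
```

An empty result does not mean the file is safe; it only means none of these three statement types start a line. A DROP TABLE is expected when the forward changeset created that table, so the output is a prompt for review rather than a verdict.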
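For deployments that run Liquibase outside Spring Boot (for example from a Maven or CLI-driven pipeline), generate and archive the file explicitly:
# Generate rollback file as a pre-deployment step
liquibase futureRollbackSQL \
  --output-file=/deployments/rollback-${VERSION}.sql
# Archive it with the deployment artefacts
# (plain cp cannot write to s3:// URLs; this assumes the AWS CLI is installed)
aws s3 cp /deployments/rollback-${VERSION}.sql s3://your-bucket/deployments/rollback-${VERSION}.sql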
CI Pipeline Gates
Every PR that adds a changeset must pass these gates before merge. No exceptions.
# .github/workflows/db-migration-gates.yml
name: Database Migration Gates
on:
  pull_request:
    paths:
      - 'src/main/resources/db/**'
jobs:
  migration-gates:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: ecommerce_ci
          MYSQL_USER: lb_user
          MYSQL_PASSWORD: lb_pass
        ports: ["3306:3306"]
        options: >-
          --health-cmd="mysqladmin ping -h localhost"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5
    env:
      # Liquibase 4.4+ reads command arguments from LIQUIBASE_COMMAND_* variables
      LIQUIBASE_COMMAND_URL: jdbc:mysql://localhost:3306/ecommerce_ci?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
      LIQUIBASE_COMMAND_USERNAME: lb_user
      LIQUIBASE_COMMAND_PASSWORD: lb_pass
    steps:
      - uses: actions/checkout@v4
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
      # The gates below assume the Liquibase CLI is on PATH and the changelog
      # file is configured in liquibase.properties; neither step is shown here.
      - name: Gate 1 - Validate changelog syntax
        run: liquibase validate
      - name: Gate 2 - Preview forward SQL
        run: |
          liquibase updateSQL --output-file=/tmp/forward.sql
          echo "=== Forward SQL Preview ==="
          cat /tmp/forward.sql
      - name: Gate 3 - Validate rollback exists for all pending changesets
        run: |
          liquibase futureRollbackSQL --output-file=/tmp/rollback.sql
          echo "=== Rollback SQL Preview ==="
          cat /tmp/rollback.sql
      - name: Gate 4 - Apply and test rollback round-trip
        run: liquibase updateTestingRollback
      - name: Gate 5 - Apply cleanly (final state)
        run: liquibase update
      - name: Gate 6 - Verify 0 pending after apply
        run: |
          PENDING=$(liquibase status 2>&1 | grep "changesets have not been applied" | awk '{print $1}')
          if [ "${PENDING:-0}" != "0" ]; then
            echo "ERROR: ${PENDING} changesets still pending after update"
            exit 1
          fi
Gate 3 (futureRollbackSQL) is the critical gate. It fails the PR if any pending changeset cannot be rolled back, meaning it has no explicit rollback block and its change type has no automatic rollback. This is the enforcement mechanism that prevents rollback blocks from being skipped during development.
Gate 4 (updateTestingRollback) proves rollback executes. It is not enough for the SQL to be generated — it must actually run without errors on the test database.
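The Gate 6 check in the workflow treats a missing status line as zero pending, so the parsing is worth isolating where it can be tested. Below is a small sketch of that parsing step. pending_count is a hypothetical name, and the function assumes the same "N changesets have not been applied" wording that the workflow's grep relies on.

```shell
#!/usr/bin/env bash
# Hypothetical parser for `liquibase status` output read from stdin.
# Prints the pending changeset count, or 0 when no
# "have not been applied" line is present.
pending_count() {
  awk '/have not been applied/ { n = $1 + 0; found = 1 }
       END { print (found ? n : 0) }'
}
```

A gate built on this should also check the exit status of liquibase status itself, so that a connection failure is not mistaken for zero pending changesets.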
Rollback Decision Tree
During an incident, guesswork is the enemy. Use this decision tree:
Deployment failed or post-deployment issues detected
│
├─ Is the problem in the application code (not the database)?
│   └─ YES → Roll back application code only; database migration stays
│
├─ Is the database migration partially applied?
│   └─ Check: liquibase history, then liquibase status
│       ├─ All changesets applied → incident is post-migration
│       └─ Some changesets pending → migration failed mid-run
│           └─ Fix the failing changeset or roll back what ran
│
├─ Was a tag set before deployment?
│   ├─ YES → liquibase rollbackSQL --tag=vX.Y.Z → review → liquibase rollback --tag=vX.Y.Z
│   └─ NO →
│       ├─ Use rollback file if it was pre-generated (check /var/log/app/)
│       ├─ Count how many changesets ran: liquibase history → liquibase rollbackCount N
│       └─ Use date: liquibase rollbackToDate "YYYY-MM-DD HH:MM:SS"
│
├─ Does rollback involve data deletion (dropTable, DELETE)?
│   └─ YES → Liquibase rollback restores structure, NOT data
│       → Restore from database backup, then apply corrected changeset
│
└─ Is rollback blocked by a checksum mismatch?
    └─ Check: was the deployed changeset modified after deployment?
        ├─ YES → Fix the changeset file to match deployed state
        └─ NO → Suspect a stale lock instead: run liquibase releaseLocks, then retry
Print this and put it in the incident response runbook. During an outage is not the time to be reasoning about Liquibase commands.
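For teams that prefer the tree in executable form, the main branches can be encoded as a lookup. This is one possible reading of the tree, not a definitive ordering: rollback_method and its four yes/no arguments are hypothetical, and the function deliberately evaluates the data-deletion question before the tag question so that a tag never hides the need for a backup restore.

```shell
#!/usr/bin/env bash
# Hypothetical encoding of the decision tree. Each argument is "yes" or "no":
#   $1 problem is in application code only
#   $2 rollback would need deleted data restored
#   $3 a tag was set before deployment
#   $4 a pre-generated rollback file exists
rollback_method() {
  local app_only="$1" data_loss="$2" has_tag="$3" has_file="$4"
  if [ "$app_only" = "yes" ]; then
    echo "app-rollback-only"
  elif [ "$data_loss" = "yes" ]; then
    echo "restore-backup"
  elif [ "$has_tag" = "yes" ]; then
    echo "rollback-tag"
  elif [ "$has_file" = "yes" ]; then
    echo "rollback-file"
  else
    echo "rollback-count-or-date"
  fi
}
```

For example, rollback_method no no no yes selects the pre-generated file, which matches the first option under the NO branch of the tag question.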
When Rollback Is Not the Answer
Some situations require a forward fix (new changeset) rather than rollback:
When data was deleted: dropTable, DELETE, TRUNCATE cannot be undone by Liquibase. The rollback changeset recreates the structure — the data is gone. Options:
- Restore from pre-deployment backup (then replay any writes that happened since)
- Write a forward changeset that reconstructs the data from available sources
When rollback would break application code that is already deployed: If the new application version depends on the new schema, rolling back the schema while the new app version is still running creates a different problem. Coordinate app rollback with schema rollback.
When the migration was successful but the feature has a bug: Deploy a fix forward. Rollback is for when the migration itself is the problem, not for reverting features.
When the changeset has no rollback and no backup: Write a corrective forward changeset. Document the incident. Add the rollback block that was missing.
Checksum Mismatches During Rollback
If someone edited a deployed changeset, even in a way that looks trivial, Liquibase detects a checksum mismatch and changelog validation fails, which blocks rollback as well as update. An incident is the worst time to discover a checksum problem.
Resolution:
# 1. Identify which changeset has the mismatch
liquibase validate
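# 2. Option A: Restore the changeset file to its deployed state
# (HEAD~1 is a placeholder; use the commit or release tag that was actually deployed)
git show HEAD~1:path/to/changeset.yaml > /tmp/original.yaml
# Compare with the current file, then restore the original content
diff /tmp/original.yaml path/to/changeset.yaml
# 3. Option B: Clear checksums and let Liquibase recompute
# (only safe if the changeset change was purely cosmetic, whitespace or comment only;
# this clears the stored checksum of every changeset, not just the edited one)
liquibase clearCheckSums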
# 4. Retry rollback
liquibase rollbackSQL --tag=${VERSION}
liquibase rollback --tag=${VERSION}
This is why the rule “never modify a deployed changeset” exists. Checksum mismatches during a rollback are a time-sensitive problem in a time-sensitive situation.
Rollback Monitoring: Know When It’s Done
After executing rollback, verify completion explicitly:
# The rolled-back changesets should be pending again
liquibase status
# The DATABASECHANGELOG should not have the rolled-back rows
liquibase history
# The schema should match the pre-deployment snapshot
liquibase diff \
  --url=jdbc:mysql://prod-host:3306/ecommerce \
  --reference-url="offline:mysql?snapshot=snapshots/pre-deploy-${VERSION}.json"
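The diff against the pre-deployment snapshot is the strongest schema-level verification. If diff reports no differences, the schema has the same structure it had before the deployment started. Note that diff compares structure, not row data, so it cannot confirm that the data is unchanged.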
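Reading a long diff report by eye during an incident is error-prone, so the check can be reduced to a filter. The sketch below assumes the plain-text diff report lists each category on a line such as "Missing Table(s): NONE" and prints only the categories that are not NONE. Both the function name diff_categories_with_changes and that output format are assumptions to verify against the Liquibase version in use.

```shell
#!/usr/bin/env bash
# Hypothetical filter for the text report of `liquibase diff`, read from stdin.
# Assumes category lines start with Missing, Unexpected, or Changed and end
# in NONE when there is no difference. Prints the categories that differ.
diff_categories_with_changes() {
  grep -E '^(Missing|Unexpected|Changed) ' \
    | grep -vE ':[[:space:]]*NONE[[:space:]]*$' \
    || true
}
```

Empty output suggests that no category differs. Any printed line names a category to inspect in the full report.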
Common Mistakes
Treating tag as optional: Teams that skip tagging on “small” deployments eventually have an incident on a small deployment. There is no such thing as a safe deployment that doesn’t need rollback capability. Tag every time, without exception.
Not reviewing futureRollbackSQL output: The CI gate runs futureRollbackSQL and checks it doesn’t error — but it doesn’t force a human to read it. Add a step that outputs the rollback SQL to the PR or deployment log so a reviewer has to see it.
Rolling back schema without rolling back application: If the application is still running with the new code that expects the new schema, rolling back the schema breaks the application in a different way. Schema rollback requires simultaneous application rollback (blue-green deployments handle this cleanly — covered in Article 17).
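For the second mistake above, one low-effort option on GitHub Actions is to append the rollback SQL to the job summary, which is displayed on the workflow run page. The sketch below is an assumption-laden example: publish_rollback_sql is a hypothetical name, GITHUB_STEP_SUMMARY is the file path that GitHub Actions provides to each step, and the function falls back to stdout when that variable is unset.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: append the rollback SQL to the GitHub Actions
# job summary so reviewers see it without downloading an artifact.
# Falls back to stdout when GITHUB_STEP_SUMMARY is not set.
publish_rollback_sql() {
  local file="$1" target="${GITHUB_STEP_SUMMARY:-/dev/stdout}"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  {
    echo "### Rollback SQL"
    echo
    echo '~~~sql'
    cat "$file"
    echo '~~~'
  } >> "$target"
}
```

Calling publish_rollback_sql /tmp/rollback.sql at the end of Gate 3 would place the SQL in front of anyone who opens the run, although it still cannot force them to read it.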
Best Practices
- Tag before every production deployment: write it into the deployment pipeline so it cannot be skipped
- Pre-generate rollback files at deployment time: rollback-file in Spring Boot, or futureRollbackSQL output archived with deployment artefacts
- futureRollbackSQL as a mandatory CI gate: fails the PR if any changeset has no rollback block
- updateTestingRollback in CI: proves rollback executes, not just that SQL was generated
- Print the decision tree: put it in the runbook; decisions during incidents must be mechanical, not creative
- diff against the pre-deployment snapshot after rollback: the definitive schema-level check that rollback is complete
What You’ve Learned
- Production rollback reliability comes from process: mandatory tagging, pre-generated rollback files, CI gates
- The deployment runbook order: validate → status → updateSQL → futureRollbackSQL → snapshot → tag → update → verify
- spring.liquibase.rollback-file generates the rollback SQL at deployment time, ready for incidents
- futureRollbackSQL in CI is the gate that enforces rollback blocks before merge, not during an incident
- updateTestingRollback in CI proves rollback executes; generation is not enough
- Data loss (dropTable, DELETE) cannot be recovered by Liquibase rollback; database backups are the safety net
- Checksum mismatches during rollback require restoring the original changeset content before proceeding
Next: Article 17 — Zero-Downtime Deployments: The Expand-Contract Pattern — how to deploy schema changes without taking the application offline, using column renames and table restructuring as worked examples.