Diff, Snapshot, and Reverse Engineering: Onboarding Existing Databases
Most Liquibase tutorials start with a blank database. Most real projects start with a production database that has been evolving for years without version control. This article covers the complete workflow for adopting Liquibase on an existing database — generating a baseline changelog, bootstrapping Liquibase tracking, detecting drift between environments, and maintaining snapshots for offline comparisons.
The Onboarding Problem
An existing database has no DATABASECHANGELOG table. Running liquibase update with a fresh changelog would try to create tables that already exist, causing immediate failures. The solution has two phases:
- Capture the current state — generate a changelog that represents the database as it exists today
- Bootstrap Liquibase — tell Liquibase that the current state is already applied, without re-running the DDL
Phase 1: Generate a Baseline Changelog
generateChangelog
generateChangelog connects to an existing database and generates a changelog file representing its current schema. It is equivalent to a diff between the live database and an empty database.
liquibase generateChangelog \
--changelog-file=db/changelog/baseline/baseline.yaml \
--url=jdbc:mysql://localhost:3306/ecommerce \
--username=lb_user \
--password=lb_pass
Output: a YAML file with one changeset per database object (tables, indexes, foreign keys, views, stored procedures). Each changeset gets an auto-generated ID.
What it captures:
- Tables and columns (with types, defaults, nullability)
- Primary keys, unique constraints
- Indexes and foreign keys
- Views (as
createViewchangesets) - Stored procedures and functions (if the database supports exporting them)
What it does not capture well:
ON UPDATE CURRENT_TIMESTAMP(MySQL-specific, not in the standard DDL model)AUTO_INCREMENTstarting values- Column comments
CHECKconstraints (MySQL pre-8.0.16 silently ignores these)- Table-level charset and collation (may not appear in the generated changeset)
- Triggers (often incomplete or missing)
The generated file needs review before use. It is a starting point, not production-ready output.
Cleaning up the generated changelog
Open the generated file and inspect it. Common cleanup tasks:
1. Remove auto-generated changeset IDs and replace with your convention:
Generated:
- changeSet:
id: "1712345678901-1"
author: lb_user (generated)
Cleaned:
- changeSet:
id: "baseline-001"
author: abhay
comment: Baseline — users table as of 2026-06-01
2. Add charset/collation to createTable changesets:
Generated:
- createTable:
tableName: users
columns: [...]
Cleaned — add modifySql:
- createTable:
tableName: users
columns: [...]
modifySql:
- dbms: mysql
append:
value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC"
3. Add ON UPDATE CURRENT_TIMESTAMP back as a sql changeset for updated_at columns that had it — the generated changeset will have dropped it.
4. Remove objects you don’t want tracked — temporary tables, legacy views that are about to be dropped, etc.
5. Add rollback blocks where they are missing (the generated file often lacks rollback for complex changesets).
Limiting what generateChangelog captures
Use --diff-types to control which object types are included:
# Only capture tables and views, skip indexes and stored procedures
liquibase generateChangelog \
--changelog-file=db/changelog/baseline/baseline.yaml \
--diff-types=tables,views
Available diff types: tables, columns, indexes, foreignkeys, primarykeys, uniqueconstraints, data, views, sequences, storedprocedures.
For most onboarding scenarios, start with tables only, then add indexes and foreignkeys after reviewing the table definitions. Views and stored procedures are better managed with runOnChange: true changesets written manually.
Phase 2: Bootstrap Liquibase on Existing Databases
After generating and cleaning the baseline changelog, you have two options for telling Liquibase that these changesets have already been applied.
Option A: changelogSync (recommended for simple cases)
changelogSync marks all changesets in the changelog as applied in DATABASECHANGELOG without executing any SQL:
liquibase changelogSync \
--changelog-file=db/changelog/baseline/baseline.yaml
After this runs, liquibase status reports 0 pending changesets. Future liquibase update runs will skip the baseline and only apply new changesets.
Limitation: changelogSync marks the entire changelog file at once. If the database is slightly different from the changelog (e.g., some tables are missing), it will be marked as if those tables were created — and subsequent runs won’t try to create them, leaving the schema inconsistent.
Option B: MARK_RAN preconditions (recommended for complex cases)
Add preconditions to each baseline changeset so they self-detect whether their work is already done:
- changeSet:
id: "baseline-001"
author: abhay
preConditions:
onFail: MARK_RAN
not:
- tableExists:
tableName: users
changes:
- createTable:
tableName: users
columns: [...]
This approach is more robust because each changeset checks before running. Tables that exist are skipped (MARK_RAN); tables that don’t exist are created. This handles the case where the database is partially in sync with the changelog.
Combining both approaches: For a large existing database, run changelogSync first to bootstrap quickly, then add MARK_RAN preconditions to future changesets that need to be idempotent.
Snapshot: Capturing Database State for Later Comparison
snapshot saves the current database schema to a JSON file without connecting to a database:
liquibase snapshot \
--snapshot-format=json \
--output-file=snapshots/ecommerce-prod-20260601.json
The snapshot captures:
- All tables, columns, types, defaults
- Indexes, foreign keys, constraints
- Views, sequences
Use cases for snapshots:
- Offline diff: Compare two database states without needing both databases connected simultaneously
- Pre-deployment baseline: Capture prod state before a deployment for rollback comparison
- Drift detection: Compare a snapshot taken last week against today’s schema
Using a snapshot in diff
# Compare current database against a saved snapshot
liquibase diff \
--url=jdbc:mysql://localhost:3306/ecommerce \
--username=lb_user \
--password=lb_pass \
--reference-url="offline:mysql?snapshot=snapshots/ecommerce-prod-20260601.json"
The offline:mysql?snapshot= URL tells Liquibase to use the JSON snapshot as the reference database. This is how you detect schema drift between a production deployment and the current state.
Pre-deployment snapshot workflow
# Step 1: Capture state before deployment
liquibase snapshot \
--snapshot-format=json \
--output-file=snapshots/pre-deploy-v1.2.0.json
# Step 2: Deploy
liquibase tag v1.2.0
liquibase updateSQL # Review
liquibase update # Deploy
# Step 3: If something went wrong, compare current state to pre-deploy snapshot
liquibase diff \
--url=jdbc:mysql://prod-host:3306/ecommerce \
--reference-url="offline:mysql?snapshot=snapshots/pre-deploy-v1.2.0.json"
This gives you a forensic record of exactly what changed during the deployment.
diffChangelog: Generate a Changelog from Schema Drift
diffChangelog compares two databases (or a database and a snapshot) and generates a changelog of the differences — not a full schema, just the delta:
# Compare dev to prod and generate the missing changesets
liquibase diffChangelog \
--changelog-file=db/changelog/migrations/drift-fix.yaml \
--url=jdbc:mysql://prod-host:3306/ecommerce \
--reference-url=jdbc:mysql://dev-host:3306/ecommerce \
--reference-username=lb_user \
--reference-password=lb_pass
Workflow: Dev database has changes that production is missing. diffChangelog generates the changesets needed to bring production in sync with dev.
Output example:
databaseChangeLog:
- changeSet:
id: "diff-1717200000000-1"
author: lb_user (generated)
changes:
- addColumn:
tableName: users
columns:
- column:
name: phone
type: VARCHAR(20)
As with generateChangelog, the output must be reviewed and cleaned before use — rename IDs, add rollback, add modifySql for MySQL options.
diffChangelog vs writing changesets manually
| Approach | Best For |
|---|---|
diffChangelog | Capturing drift that happened outside Liquibase (hotfixes, DBA changes) |
| Manual changesets | Normal development workflow |
diffChangelog is a recovery tool, not a development workflow. If developers are regularly using diffChangelog to generate changesets, it means schema changes are happening outside Liquibase — which defeats the purpose of version control.
Detecting Drift Between Environments
Use diff regularly as part of your release process to detect schema drift before it causes a deployment failure:
# Staging should match production — find any differences
liquibase diff \
--url=jdbc:mysql://staging-host:3306/ecommerce \
--username=lb_user \
--password=lb_pass \
--reference-url=jdbc:mysql://prod-host:3306/ecommerce \
--reference-username=lb_user \
--reference-password=lb_pass
A clean output:
Diff Results:
Reference Database: jdbc:mysql://prod-host:3306/ecommerce
Comparison Database: jdbc:mysql://staging-host:3306/ecommerce
Differences: NONE
Any reported differences are either:
- Expected — staging has migrations that haven’t been deployed to prod yet
- Unexpected — drift caused by manual changes, rollback failures, or inconsistent deployments
Make liquibase diff between staging and prod a step in your release checklist.
Complete Onboarding Workflow
Here is the full sequence for onboarding an existing production MySQL database:
# Step 1: Generate baseline changelog from production
liquibase generateChangelog \
--changelog-file=db/changelog/baseline/baseline.yaml \
--url=jdbc:mysql://prod-host:3306/ecommerce \
--diff-types=tables,columns,indexes,foreignkeys,primarykeys,uniqueconstraints
# Step 2: Review and clean the generated file
# - Fix changeset IDs
# - Add charset/collation via modifySql
# - Add missing rollback blocks
# - Remove objects you don't want tracked
# Step 3: Test against a copy of production
liquibase validateChangeLog \
--changelog-file=db/changelog/baseline/baseline.yaml
# Step 4: Bootstrap Liquibase on production
# (mark the baseline as already applied without running DDL)
liquibase changelogSync \
--changelog-file=db/changelog/baseline/baseline.yaml \
--url=jdbc:mysql://prod-host:3306/ecommerce
# Step 5: Verify the baseline is registered
liquibase status # Should show 0 pending changesets
# Step 6: Tag the baseline
liquibase tag baseline-2026-06-01
# Step 7: All future changes go in new changelog files
# db/changelog/migrations/2026/06/20260602-001-add-phone-column.yaml
Common Mistakes
Using the generated changelog as-is without review: generateChangelog produces a functional but incomplete representation of the schema. MySQL-specific features like ON UPDATE CURRENT_TIMESTAMP, charset/collation, and triggers are often missing or incorrect. Always review the output before committing it.
Running changelogSync without verifying the schema matches: changelogSync marks changesets as applied unconditionally. If the database schema doesn’t actually match the changelog (e.g., a column is a different type), Liquibase won’t know — it will just skip those changesets. Use diff before changelogSync to verify the schema matches.
Using diffChangelog as the normal development workflow: Generating changesets from drift means schema changes happened outside Liquibase. This is a recovery path, not a development pattern. All planned schema changes should be written as explicit changesets first, then applied to the database.
Best Practices
generateChangelogis the starting point, not the finished product — always review, clean, and test it before committingsnapshotbefore every production deployment — it is your forensic record and enables offline diff comparison after the factchangelogSyncwithdiffvalidation — rundiffbetween the generated baseline and the live database before bootstrapping to confirm they match- Tag immediately after
changelogSync—liquibase tag baseline-YYYY-MM-DDgives you a rollback point from day one - Use
diffChangelogonly for drift recovery — not as a routine changeset generation method - Add
MARK_RANpreconditions to baseline changesets for robustness when environments might be partially in sync
What You’ve Learned
generateChangelogreverse-engineers an existing database into a Liquibase changelog — it needs review and cleanup before usechangelogSyncmarks changesets as applied without executing DDL — the bootstrap step for existing databasessnapshotsaves database state to JSON for offline comparison — capture it before every production deploymentdiffcompares two live databases or a database against a snapshot — use it to detect driftdiffChangeloggenerates a delta changelog from schema differences — a recovery tool, not a development workflow- The full onboarding sequence: generate → review → validate →
changelogSync→ tag → continue with normal workflow
Next: Article 16 — Rollback in Production: Tag-Based Strategies and CI Validation — operationalising rollback so it is fast, tested, and reliable when you need it at 2am.