MySQL-Specific Patterns: Character Sets, Engines, Large Table Migrations
Liquibase’s change types generate standard SQL. MySQL has quirks — character set requirements that differ from the standard, a storage engine option absent from other databases, row format constraints that affect index key length, and DDL locking behaviour that makes a 10-second table lock acceptable in dev and catastrophic in production.
This article covers the MySQL-specific patterns that you will need once you move from tutorial projects to real production databases.
Character Sets: Always utf8mb4
MySQL has two encodings that look similar:
utf8— MySQL’s legacy 3-byte encoding. It is actuallyutf8mb3— it cannot store 4-byte Unicode characters (emoji, some CJK characters, mathematical symbols). This is not real UTF-8.utf8mb4— Full 4-byte UTF-8. This is what you want for any user-facing data.
The confusion: MySQL named their 3-byte encoding utf8, which makes it look correct. Code that works in dev (where no emoji are inserted) silently corrupts data in production when a user enters an emoji in their name or a review.
Setting utf8mb4 at table creation
Liquibase’s createTable change type does not have a charset attribute — you must use modifySql or a sql changeset:
Method 1 — modifySql to append to the generated CREATE TABLE:
- changeSet:
id: "20260615-001"
author: abhay
comment: Create payments table with utf8mb4
changes:
- createTable:
tableName: payments
columns:
- column:
name: id
type: BIGINT UNSIGNED
autoIncrement: true
constraints:
primaryKey: true
nullable: false
- column:
name: order_id
type: BIGINT UNSIGNED
constraints:
nullable: false
- column:
name: method
type: ENUM('card','paypal','bank_transfer','crypto')
constraints:
nullable: false
- column:
name: status
type: ENUM('pending','completed','failed','refunded')
defaultValue: pending
constraints:
nullable: false
- column:
name: amount
type: DECIMAL(12,2)
constraints:
nullable: false
- column:
name: gateway_reference
type: VARCHAR(255)
- column:
name: created_at
type: DATETIME
defaultValueComputed: CURRENT_TIMESTAMP
constraints:
nullable: false
modifySql:
- dbms: mysql
append:
value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC"
rollback:
- dropTable:
tableName: payments
modifySql appends text to the generated SQL before it is sent to MySQL. The dbms: mysql restriction ensures this append only happens on MySQL connections — H2 in tests is unaffected.
Method 2 — Full sql changeset (more explicit):
- changeSet:
id: "20260615-001"
author: abhay
changes:
- sql:
sql: >
CREATE TABLE payments (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
order_id BIGINT UNSIGNED NOT NULL,
method ENUM('card','paypal','bank_transfer','crypto') NOT NULL,
status ENUM('pending','completed','failed','refunded') NOT NULL DEFAULT 'pending',
amount DECIMAL(12,2) NOT NULL,
gateway_reference VARCHAR(255),
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
) ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_unicode_ci
ROW_FORMAT=DYNAMIC;
rollback:
- dropTable:
tableName: payments
The sql changeset gives full control over the DDL at the cost of losing cross-database portability (H2 tests won’t run this). Use modifySql when you want Liquibase change types to manage the column definitions and just need to add MySQL-specific table options.
Collation: unicode_ci vs 0900_ai_ci
MySQL 8.0 changed the default utf8mb4 collation from utf8mb4_general_ci to utf8mb4_0900_ai_ci. The two behave differently for string comparisons and sorting. Explicitly specifying utf8mb4_unicode_ci in your changelogs avoids collation drift when databases are created on different MySQL versions.
utf8mb4_unicode_ci is compatible across MySQL 5.7 and 8.x. If you’re on MySQL 8+ only, utf8mb4_0900_ai_ci is faster and more correct — but specify it explicitly rather than relying on the default.
Converting an existing table to utf8mb4
- changeSet:
id: "20260616-001"
author: abhay
comment: Convert users table from utf8 to utf8mb4
changes:
- sql:
sql: >
ALTER TABLE users
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
rollback:
- sql:
sql: >
ALTER TABLE users
CONVERT TO CHARACTER SET utf8
COLLATE utf8_general_ci;
CONVERT TO CHARACTER SET converts the table and all its columns simultaneously. It is a full table rewrite — it acquires an exclusive lock for the duration. On large tables, run this during a maintenance window or use the Percona extension (covered below).
InnoDB ROW_FORMAT and Key Length
MySQL InnoDB has a maximum index key prefix length of 767 bytes by default (with the older COMPACT row format). This matters because:
- A
VARCHAR(255)column withutf8mb4collation takes up to255 × 4 = 1020 bytes— larger than the 767-byte limit - Trying to create an index on such a column fails with
ERROR 1071: Specified key was too long; max key length is 767 bytes
The fix is ROW_FORMAT=DYNAMIC (or COMPRESSED), which increases the key prefix limit to 3072 bytes:
modifySql:
- dbms: mysql
append:
value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC"
MySQL 8.0 uses DYNAMIC as the default row format, so this is only a concern on MySQL 5.7 or when innodb_default_row_format has been changed from the default. Setting it explicitly in changelogs removes the dependency on server configuration.
modifySql: Customising Generated SQL
modifySql is Liquibase’s escape hatch for database-specific SQL generation. It modifies the SQL that Liquibase generates from a change type before executing it.
append
Appends text to the end of the generated SQL:
modifySql:
- dbms: mysql
append:
value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci"
prepend
Prepends text to the beginning of the generated SQL:
modifySql:
- dbms: mysql
prepend:
value: "/* ecommerce migration */ "
replace
Replaces a string in the generated SQL:
modifySql:
- dbms: mysql
replace:
replace: "INT"
with: "INT UNSIGNED"
context-scoped modifySql
modifySql can be scoped to a context, just like a changeset:
modifySql:
- dbms: mysql
context: prod
append:
value: " ROW_FORMAT=COMPRESSED" # Compressed format only in prod
- dbms: mysql
context: dev
append:
value: " ROW_FORMAT=DYNAMIC" # Dynamic format in dev
Large Table Migrations: The Locking Problem
MySQL’s ALTER TABLE acquires metadata locks during DDL operations. For small tables, this is milliseconds. For a 50-million-row table:
- Adding a column with a default: MySQL 8.x can do this instantly (in-place) if the default is not NULL
- Adding a
NOT NULLcolumn without a default: full table copy, minutes to hours - Adding an index on a large column: full table scan, minutes
CONVERT TO CHARACTER SET: full table rewrite, potentially hours
During a full table copy, writes to the table are blocked. In production, this means an outage.
MySQL Online DDL
MySQL 8.x supports Online DDL for many operations — the table remains readable and writable during the operation. The lock level depends on the operation:
| Operation | Lock Level (MySQL 8.x) |
|---|---|
| Add column with DEFAULT | No lock (instant) |
| Add column NOT NULL | Shared (reads ok, writes blocked) or No lock depending on type |
| Add index | No lock for secondary indexes (online) |
| Drop column | Shared lock |
| Change column type | Exclusive lock (full table copy) |
CONVERT TO CHARACTER SET | Exclusive lock (full table copy) |
For operations that require an exclusive lock, Online DDL is not available and you need an alternative approach.
Using the Liquibase Percona Extension
The liquibase-percona extension replaces table-locking DDL operations with Percona’s pt-online-schema-change (pt-osc) tool. pt-osc creates a shadow table, applies the change to it, copies data in chunks (keeping the original table available), creates triggers to replicate ongoing writes, then swaps the tables atomically.
Install the extension:
<!-- pom.xml -->
<dependency>
<groupId>com.github.liquibase-percona</groupId>
<artifactId>liquibase-percona</artifactId>
<version>4.28.0.0</version>
</dependency>
Or place the JAR in Liquibase’s lib/ directory for CLI use.
Use in a changeset:
- changeSet:
id: "20260617-001"
author: abhay
comment: Add phone column to users — uses pt-osc for large table
changes:
- addColumn:
tableName: users
columns:
- column:
name: phone
type: VARCHAR(20)
constraints:
nullable: true
rollback:
- dropColumn:
tableName: users
columnName: phone
With the Percona extension installed, addColumn on a MySQL table is automatically routed through pt-osc instead of a direct ALTER TABLE. No changeset changes are required — the extension intercepts the operation transparently.
Control Percona extension behaviour with system properties:
# liquibase.properties
# Fail if pt-osc is not found (set to false to fall back to regular ALTER)
liquibase.percona.failIfNoPT=true
# Additional pt-osc options
liquibase.percona.ptParameters=--chunk-size=500 --sleep=0.1 --max-load=Threads_running=100
Limitations of pt-osc:
- Tables must have a primary key or unique index (required for safe chunking)
- Tables that already have triggers cannot use pt-osc (triggers conflict with pt-osc’s own replication triggers)
- Requires
pt-online-schema-changebinary installed on the server running Liquibase
For tables with triggers, use gh-ost (GitHub’s online schema change tool) instead, invoked via a sql changeset that shells out to gh-ost.
MySQL-Specific Column Types
Liquibase’s standard change types do not cover all MySQL column types. Use sql changeset for:
JSON columns
- changeSet:
id: "20260618-001"
author: abhay
changes:
- addColumn:
tableName: orders
columns:
- column:
name: metadata
type: JSON
type: JSON works in Liquibase’s addColumn for MySQL 5.7.8+. JSON columns cannot have default values in MySQL.
Generated columns
- changeSet:
id: "20260618-002"
author: abhay
changes:
- sql:
sql: >
ALTER TABLE users
ADD COLUMN email_domain VARCHAR(255)
GENERATED ALWAYS AS (SUBSTRING_INDEX(email, '@', -1)) STORED;
rollback:
- dropColumn:
tableName: users
columnName: email_domain
Generated columns require a sql changeset — Liquibase’s addColumn change type cannot express GENERATED ALWAYS AS.
Full-text indexes
- changeSet:
id: "20260618-003"
author: abhay
preConditions:
onFail: MARK_RAN
dbms:
type: mysql
changes:
- sql:
sql: >
ALTER TABLE products
ADD FULLTEXT INDEX ft_products_search (name, description);
rollback:
- sql:
sql: ALTER TABLE products DROP INDEX ft_products_search;
The dbms: mysql precondition ensures this changeset is skipped on H2 in tests.
Common Mistakes
Not specifying utf8mb4 explicitly: Relying on MySQL’s default character set means your character set depends on server configuration. A database created on a MySQL 5.7 server defaults to latin1. The same changelog run on MySQL 8.0 defaults to utf8mb4. Explicit charset in every CREATE TABLE removes this environment dependency.
Adding large indexes without considering locking: Adding a secondary index on a 100M-row table without the Percona extension locks the table for potentially 10-20 minutes. Always estimate the impact of index additions on large tables before deploying to production.
Using pt-osc on tables with triggers: pt-osc uses triggers internally to replicate writes during the copy. If the table already has triggers, pt-osc fails with “Table has triggers.” Use gh-ost or a maintenance window ALTER TABLE instead.
Best Practices
- Always specify
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMICin everyCREATE TABLEviamodifySqlorsqlchangeset - Use
modifySqlwithdbms: mysqlto keep Liquibase change types for column management and only append MySQL-specific table options - Install the Percona extension in any environment that has tables over 10M rows — make it the default, not the exception
- Check Online DDL lock level for every planned production migration — MySQL documentation lists lock requirements per operation
dbms: mysqlprecondition on MySQL-specific changesets (FULLTEXT, generated columns) to ensure they skip on H2 tests
What You’ve Learned
utf8mb4is real UTF-8; MySQL’sutf8is 3-byte and corrupts emoji silently — always specifyutf8mb4explicitlymodifySqlappends, prepends, or replaces in Liquibase-generated SQL — the escape hatch for MySQL-specific table optionsROW_FORMAT=DYNAMICis required forutf8mb4indexes onVARCHAR(255)columns in MySQL 5.7- MySQL 8.x Online DDL reduces locking for most operations, but full table copies still block writes
- The Liquibase Percona extension routes DDL through
pt-oscfor zero-downtime large table changes - Generated columns, FULLTEXT indexes, and JSON require
sqlchangeset — Liquibase change types cannot express them
Next: Article 15 — Diff, Snapshot, and Reverse Engineering: Onboarding Existing Databases — using Liquibase’s inspection tools to bring an existing production database under version control.