Part 14 of 18

MySQL-Specific Patterns: Character Sets, Engines, Large Table Migrations

Liquibase’s change types generate standard SQL. MySQL has quirks — character set requirements that differ from the standard, a storage engine option absent from other databases, row format constraints that affect index key length, and DDL locking behaviour that makes a 10-second table lock acceptable in dev and catastrophic in production.

This article covers the MySQL-specific patterns that you will need once you move from tutorial projects to real production databases.


Character Sets: Always utf8mb4

MySQL has two encodings that look similar:

  • utf8 — MySQL’s legacy 3-byte encoding. It is actually utf8mb3 — it cannot store 4-byte Unicode characters (emoji, some CJK characters, mathematical symbols). This is not real UTF-8.
  • utf8mb4 — Full 4-byte UTF-8. This is what you want for any user-facing data.

The confusion: MySQL named their 3-byte encoding utf8, which makes it look correct. Code that works in dev (where no emoji are inserted) silently corrupts data in production when a user enters an emoji in their name or a review.

Setting utf8mb4 at table creation

Liquibase’s createTable change type does not have a charset attribute — you must use modifySql or a sql changeset:

Method 1 — modifySql to append to the generated CREATE TABLE:

- changeSet:
    id: "20260615-001"
    author: abhay
    comment: Create payments table with utf8mb4
    changes:
      - createTable:
          tableName: payments
          columns:
            - column:
                name: id
                type: BIGINT UNSIGNED
                autoIncrement: true
                constraints:
                  primaryKey: true
                  nullable: false
            - column:
                name: order_id
                type: BIGINT UNSIGNED
                constraints:
                  nullable: false
            - column:
                name: method
                type: ENUM('card','paypal','bank_transfer','crypto')
                constraints:
                  nullable: false
            - column:
                name: status
                type: ENUM('pending','completed','failed','refunded')
                defaultValue: pending
                constraints:
                  nullable: false
            - column:
                name: amount
                type: DECIMAL(12,2)
                constraints:
                  nullable: false
            - column:
                name: gateway_reference
                type: VARCHAR(255)
            - column:
                name: created_at
                type: DATETIME
                defaultValueComputed: CURRENT_TIMESTAMP
                constraints:
                  nullable: false
    modifySql:
      - dbms: mysql
        append:
          value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC"
    rollback:
      - dropTable:
          tableName: payments

modifySql appends text to the generated SQL before it is sent to MySQL. The dbms: mysql restriction ensures this append only happens on MySQL connections — H2 in tests is unaffected.

Method 2 — Full sql changeset (more explicit):

- changeSet:
    id: "20260615-001"
    author: abhay
    changes:
      - sql:
          sql: >
            CREATE TABLE payments (
              id               BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
              order_id         BIGINT UNSIGNED NOT NULL,
              method           ENUM('card','paypal','bank_transfer','crypto') NOT NULL,
              status           ENUM('pending','completed','failed','refunded') NOT NULL DEFAULT 'pending',
              amount           DECIMAL(12,2) NOT NULL,
              gateway_reference VARCHAR(255),
              created_at       DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
              PRIMARY KEY (id)
            ) ENGINE=InnoDB
              DEFAULT CHARSET=utf8mb4
              COLLATE=utf8mb4_unicode_ci
              ROW_FORMAT=DYNAMIC;            
    rollback:
      - dropTable:
          tableName: payments

The sql changeset gives full control over the DDL at the cost of losing cross-database portability (H2 tests won’t run this). Use modifySql when you want Liquibase change types to manage the column definitions and just need to add MySQL-specific table options.

Collation: unicode_ci vs 0900_ai_ci

MySQL 8.0 changed the default utf8mb4 collation from utf8mb4_general_ci to utf8mb4_0900_ai_ci. The two behave differently for string comparisons and sorting. Explicitly specifying utf8mb4_unicode_ci in your changelogs avoids collation drift when databases are created on different MySQL versions.

utf8mb4_unicode_ci is compatible across MySQL 5.7 and 8.x. If you’re on MySQL 8+ only, utf8mb4_0900_ai_ci is faster and more correct — but specify it explicitly rather than relying on the default.

Converting an existing table to utf8mb4

- changeSet:
    id: "20260616-001"
    author: abhay
    comment: Convert users table from utf8 to utf8mb4
    changes:
      - sql:
          sql: >
            ALTER TABLE users
            CONVERT TO CHARACTER SET utf8mb4
            COLLATE utf8mb4_unicode_ci;            
    rollback:
      - sql:
          sql: >
            ALTER TABLE users
            CONVERT TO CHARACTER SET utf8
            COLLATE utf8_general_ci;            

CONVERT TO CHARACTER SET converts the table and all its columns simultaneously. It is a full table rewrite — it acquires an exclusive lock for the duration. On large tables, run this during a maintenance window or use the Percona extension (covered below).


InnoDB ROW_FORMAT and Key Length

MySQL InnoDB has a maximum index key prefix length of 767 bytes by default (with the older COMPACT row format). This matters because:

  • A VARCHAR(255) column with utf8mb4 collation takes up to 255 × 4 = 1020 bytes — larger than the 767-byte limit
  • Trying to create an index on such a column fails with ERROR 1071: Specified key was too long; max key length is 767 bytes

The fix is ROW_FORMAT=DYNAMIC (or COMPRESSED), which increases the key prefix limit to 3072 bytes:

modifySql:
  - dbms: mysql
    append:
      value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC"

MySQL 8.0 uses DYNAMIC as the default row format, so this is only a concern on MySQL 5.7 or when innodb_default_row_format has been changed from the default. Setting it explicitly in changelogs removes the dependency on server configuration.


modifySql: Customising Generated SQL

modifySql is Liquibase’s escape hatch for database-specific SQL generation. It modifies the SQL that Liquibase generates from a change type before executing it.

append

Appends text to the end of the generated SQL:

modifySql:
  - dbms: mysql
    append:
      value: " ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci"

prepend

Prepends text to the beginning of the generated SQL:

modifySql:
  - dbms: mysql
    prepend:
      value: "/* ecommerce migration */ "

replace

Replaces a string in the generated SQL:

modifySql:
  - dbms: mysql
    replace:
      replace: "INT"
      with: "INT UNSIGNED"

context-scoped modifySql

modifySql can be scoped to a context, just like a changeset:

modifySql:
  - dbms: mysql
    context: prod
    append:
      value: " ROW_FORMAT=COMPRESSED"  # Compressed format only in prod
  - dbms: mysql
    context: dev
    append:
      value: " ROW_FORMAT=DYNAMIC"     # Dynamic format in dev

Large Table Migrations: The Locking Problem

MySQL’s ALTER TABLE acquires metadata locks during DDL operations. For small tables, this is milliseconds. For a 50-million-row table:

  • Adding a column with a default: MySQL 8.x can do this instantly (in-place) if the default is not NULL
  • Adding a NOT NULL column without a default: full table copy, minutes to hours
  • Adding an index on a large column: full table scan, minutes
  • CONVERT TO CHARACTER SET: full table rewrite, potentially hours

During a full table copy, writes to the table are blocked. In production, this means an outage.

MySQL Online DDL

MySQL 8.x supports Online DDL for many operations — the table remains readable and writable during the operation. The lock level depends on the operation:

OperationLock Level (MySQL 8.x)
Add column with DEFAULTNo lock (instant)
Add column NOT NULLShared (reads ok, writes blocked) or No lock depending on type
Add indexNo lock for secondary indexes (online)
Drop columnShared lock
Change column typeExclusive lock (full table copy)
CONVERT TO CHARACTER SETExclusive lock (full table copy)

For operations that require an exclusive lock, Online DDL is not available and you need an alternative approach.

Using the Liquibase Percona Extension

The liquibase-percona extension replaces table-locking DDL operations with Percona’s pt-online-schema-change (pt-osc) tool. pt-osc creates a shadow table, applies the change to it, copies data in chunks (keeping the original table available), creates triggers to replicate ongoing writes, then swaps the tables atomically.

Install the extension:

<!-- pom.xml -->
<dependency>
  <groupId>com.github.liquibase-percona</groupId>
  <artifactId>liquibase-percona</artifactId>
  <version>4.28.0.0</version>
</dependency>

Or place the JAR in Liquibase’s lib/ directory for CLI use.

Use in a changeset:

- changeSet:
    id: "20260617-001"
    author: abhay
    comment: Add phone column to users — uses pt-osc for large table
    changes:
      - addColumn:
          tableName: users
          columns:
            - column:
                name: phone
                type: VARCHAR(20)
                constraints:
                  nullable: true
    rollback:
      - dropColumn:
          tableName: users
          columnName: phone

With the Percona extension installed, addColumn on a MySQL table is automatically routed through pt-osc instead of a direct ALTER TABLE. No changeset changes are required — the extension intercepts the operation transparently.

Control Percona extension behaviour with system properties:

# liquibase.properties
# Fail if pt-osc is not found (set to false to fall back to regular ALTER)
liquibase.percona.failIfNoPT=true

# Additional pt-osc options
liquibase.percona.ptParameters=--chunk-size=500 --sleep=0.1 --max-load=Threads_running=100

Limitations of pt-osc:

  • Tables must have a primary key or unique index (required for safe chunking)
  • Tables that already have triggers cannot use pt-osc (triggers conflict with pt-osc’s own replication triggers)
  • Requires pt-online-schema-change binary installed on the server running Liquibase

For tables with triggers, use gh-ost (GitHub’s online schema change tool) instead, invoked via a sql changeset that shells out to gh-ost.


MySQL-Specific Column Types

Liquibase’s standard change types do not cover all MySQL column types. Use sql changeset for:

JSON columns

- changeSet:
    id: "20260618-001"
    author: abhay
    changes:
      - addColumn:
          tableName: orders
          columns:
            - column:
                name: metadata
                type: JSON

type: JSON works in Liquibase’s addColumn for MySQL 5.7.8+. JSON columns cannot have default values in MySQL.

Generated columns

- changeSet:
    id: "20260618-002"
    author: abhay
    changes:
      - sql:
          sql: >
            ALTER TABLE users
            ADD COLUMN email_domain VARCHAR(255)
            GENERATED ALWAYS AS (SUBSTRING_INDEX(email, '@', -1)) STORED;            
    rollback:
      - dropColumn:
          tableName: users
          columnName: email_domain

Generated columns require a sql changeset — Liquibase’s addColumn change type cannot express GENERATED ALWAYS AS.

Full-text indexes

- changeSet:
    id: "20260618-003"
    author: abhay
    preConditions:
      onFail: MARK_RAN
      dbms:
        type: mysql
    changes:
      - sql:
          sql: >
            ALTER TABLE products
            ADD FULLTEXT INDEX ft_products_search (name, description);            
    rollback:
      - sql:
          sql: ALTER TABLE products DROP INDEX ft_products_search;

The dbms: mysql precondition ensures this changeset is skipped on H2 in tests.


Common Mistakes

Not specifying utf8mb4 explicitly: Relying on MySQL’s default character set means your character set depends on server configuration. A database created on a MySQL 5.7 server defaults to latin1. The same changelog run on MySQL 8.0 defaults to utf8mb4. Explicit charset in every CREATE TABLE removes this environment dependency.

Adding large indexes without considering locking: Adding a secondary index on a 100M-row table without the Percona extension locks the table for potentially 10-20 minutes. Always estimate the impact of index additions on large tables before deploying to production.

Using pt-osc on tables with triggers: pt-osc uses triggers internally to replicate writes during the copy. If the table already has triggers, pt-osc fails with “Table has triggers.” Use gh-ost or a maintenance window ALTER TABLE instead.


Best Practices

  • Always specify ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC in every CREATE TABLE via modifySql or sql changeset
  • Use modifySql with dbms: mysql to keep Liquibase change types for column management and only append MySQL-specific table options
  • Install the Percona extension in any environment that has tables over 10M rows — make it the default, not the exception
  • Check Online DDL lock level for every planned production migration — MySQL documentation lists lock requirements per operation
  • dbms: mysql precondition on MySQL-specific changesets (FULLTEXT, generated columns) to ensure they skip on H2 tests

What You’ve Learned

  • utf8mb4 is real UTF-8; MySQL’s utf8 is 3-byte and corrupts emoji silently — always specify utf8mb4 explicitly
  • modifySql appends, prepends, or replaces in Liquibase-generated SQL — the escape hatch for MySQL-specific table options
  • ROW_FORMAT=DYNAMIC is required for utf8mb4 indexes on VARCHAR(255) columns in MySQL 5.7
  • MySQL 8.x Online DDL reduces locking for most operations, but full table copies still block writes
  • The Liquibase Percona extension routes DDL through pt-osc for zero-downtime large table changes
  • Generated columns, FULLTEXT indexes, and JSON require sql changeset — Liquibase change types cannot express them

Next: Article 15 — Diff, Snapshot, and Reverse Engineering: Onboarding Existing Databases — using Liquibase’s inspection tools to bring an existing production database under version control.