We have a mission-critical MySQL database that has been hosted on premise for many years. As the data grew it passed 10T, and maintaining the database cost us a lot of effort, so we decided to migrate it to AWS RDS.


Since the data is mission critical, there was no way to take the database offline for the migration: we had to migrate while the database was still accepting online transactions, with no data loss allowed. This was very challenging when the task landed on my shoulders, as I'm not a DBA. I first tried AWS DMS, but DMS was too slow for such a large database, so I figured out another way to complete the task, described below.

Amazon Aurora supports restoring from a Percona XtraBackup backup natively, so I could use XtraBackup to export the whole database physically and record the backup's binlog position, restore that backup into AWS, and then make the new RDS instance a slave of the current on-premise MySQL to sync up the changes made since the backup. With this approach the on-premise database keeps serving online transactions, and we can switch databases once the slave has caught up with the master.

Step 1: MySQL binlog settings

The whole migration is estimated to take about 10 days, so we must retain enough binlog to hold all the transactions generated during that window, so that the new slave can catch up later.
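On MySQL 5.5 this is a retention setting rather than a size setting. A minimal sketch, where the 14-day value is my assumption (the 10-day estimate plus margin):

```shell
# On the on-premise master: confirm binlogs are enabled, then keep
# ~14 days of logs (10-day migration estimate plus margin -- adjust).
mysql -u admin -p -e "SHOW VARIABLES LIKE 'log_bin';"
mysql -u admin -p -e "SET GLOBAL expire_logs_days = 14;"
# To survive a restart, also set it in /etc/my.cnf under [mysqld]:
#   log-bin          = mysql-bin
#   server-id        = 1
#   expire_logs_days = 14
```

Also make sure the binlog volume has room for ~10 days of write traffic before committing to these settings.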

Step 2: attach a backup disk

Attach a backup disk to the on-premise database instance. The original disk holds around 10T of data, so we attach another 12T disk to store the XtraBackup output.
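Roughly, provisioning the new disk looks like this; the /dev/sdb device name is an assumption, so check lsblk on your own machine first:

```shell
lsblk                     # find the newly attached 12T device (assumed /dev/sdb)
mkfs.xfs /dev/sdb         # XFS copes well with multi-terabyte files
mkdir -p /mnt/data4
mount /dev/sdb /mnt/data4
df -h /mnt/data4          # confirm ~12T is available
```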

Step 3: install XtraBackup

yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum list | grep percona
yum install percona-xtrabackup-24

Step 4: back up the database into the backup folder

This takes about 75 hours for a 10T database. The command below connects to the local MySQL as user admin, streams the data into /mnt/data4/backup in tar format, and splits the output into 500 GB files:

screen -R backup
time xtrabackup --backup --user=admin --password=xxxxxxx --stream=tar --target-dir=/mnt/data4/backup | split -d --bytes=500GB - /mnt/data4/backup/masterdata.tar

Step 5: upload the data to AWS S3

After the backup we need to upload the data to S3 so that AWS RDS can read it. The upload takes about 48 hours to the US East region:

screen -R backup
cd /mnt/data4
time aws s3 cp --recursive backup s3://some_buckeet/backup
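Before moving on, it's worth a quick sanity check that every split part arrived intact; one way to compare, assuming the same bucket and prefix as above:

```shell
# Local side: number of parts and total bytes
find /mnt/data4/backup -type f | wc -l
du -sb /mnt/data4/backup
# S3 side: --summarize appends "Total Objects" and "Total Size" lines
aws s3 ls --recursive --summarize s3://some_buckeet/backup | tail -n 2
```

The object count and total size should match the local figures before you start the restore.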

Step 6: create the RDS database from the XtraBackup files in S3

  • go to aws console https://console.aws.amazon.com/rds/home?region=us-east-1#databases:
  • click the "Restore from S3" button
  • S3 bucket: some_buckeet
  • S3 prefix: backup
  • Engine type: Amazon Aurora
  • Version: Aurora (MySQL 5.6) 1.23.0 (our on-premise MySQL is 5.5, and this restore can only take us one major version up, so we target MySQL 5.6 compatibility now; we can migrate to Aurora 2 afterwards)
  • IAM role: aurora-restore-from-s3; we need to create a role so that the Aurora instance can read data from the given S3 bucket
  • ... the rest are the standard database creation options
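The same restore can also be scripted with the AWS CLI. This is a sketch only: the cluster and instance identifiers, account ID, instance class, and the exact 5.5.x source version are placeholders, not values from the original setup:

```shell
# Restore an Aurora cluster directly from the XtraBackup files in S3
aws rds restore-db-cluster-from-s3 \
    --db-cluster-identifier migrated-cluster \
    --engine aurora \
    --master-username admin \
    --master-user-password 'xxxxxxx' \
    --s3-bucket-name some_buckeet \
    --s3-prefix backup \
    --s3-ingestion-role-arn arn:aws:iam::123456789012:role/aurora-restore-from-s3 \
    --source-engine mysql \
    --source-engine-version 5.5.40
# A cluster has no compute until an instance is added to it:
aws rds create-db-instance \
    --db-instance-identifier migrated-instance \
    --db-cluster-identifier migrated-cluster \
    --engine aurora \
    --db-instance-class db.r4.2xlarge
```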

The DB restore takes about 88 hours. After the database restores successfully, the RDS console shows the binlog position of the backup in the Events tab, like this:

Recovered from Binary log filename 'mysql-bin.nnnnn', position 'mmmmmm'.

This information is very important: we use it to set up master-slave replication, so that RDS can catch up with the on-premise MySQL database.

Step 7: RDS catches up with the on-premise database

After the restore from S3, it's time to catch up with the on-premise data and make the two databases identical:

  • allow public traffic to the RDS instance, so that it can communicate with the on-premise database
  • put a security group in place that only allows the on-premise database to communicate with RDS
  • connect to RDS and set up the slave as below
-- set the binlog position
CALL mysql.rds_set_external_master ('on premise mysql ip', 3306,'rdsrepl', 'xxxx', 'mysql-bin.nnnnn', mmmmmm, 0);
-- start the replication
CALL mysql.rds_start_replication;
-- check slave status
SHOW SLAVE STATUS \G
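One thing the calls above assume is that the rdsrepl account already exists on the on-premise master with the REPLICATION SLAVE privilege. A sketch of that, plus a crude loop to watch the catch-up; the RDS endpoint below is a placeholder, not the real hostname:

```shell
# On the on-premise master: create the replication account used by
# rds_set_external_master (host pattern and password are placeholders).
mysql -u admin -p -e "CREATE USER 'rdsrepl'@'%' IDENTIFIED BY 'xxxx';"
mysql -u admin -p -e "GRANT REPLICATION SLAVE ON *.* TO 'rdsrepl'@'%';"

# Watch the slave catch up (endpoint is a placeholder):
while sleep 60; do
  mysql -h migrated-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com \
        -u admin -p'xxxxxxx' -e "SHOW SLAVE STATUS\G" \
    | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
done
```

When Seconds_Behind_Master stays near 0 and both threads show Yes, the databases are in sync.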

The catch-up takes about 60 hours. Note there is a known bug in MySQL 5.6 that requires us to update the DB parameter group:

set query_cache_type to 0 to turn off the query cache, otherwise replication will stop at “invalidating query cache entries”; see details at https://bugs.mysql.com/bug.php?id=60696
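The parameter change can be made from the console or the CLI; a sketch with an assumed parameter group name:

```shell
# Parameter group name is a placeholder; if query_cache_type is a static
# parameter in your engine version, use ApplyMethod=pending-reboot instead.
aws rds modify-db-parameter-group \
    --db-parameter-group-name migrated-db-params \
    --parameters "ParameterName=query_cache_type,ParameterValue=0,ApplyMethod=immediate"
```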

Step 8: switch the application endpoint to the new RDS

We have come to an exciting point: the two databases are now identical, so we can switch the application DB URL during a standard maintenance window.
During the maintenance window, we do the following:

  • stop the online applications
  • stop the DB replication
  • update the DB URL to point to RDS
  • start the online applications
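On the RDS side, stopping and detaching replication uses the built-in stored procedures; the endpoint below is a placeholder:

```shell
RDS=migrated-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com   # placeholder
# Stop applying changes from the on-premise master
mysql -h "$RDS" -u admin -p -e "CALL mysql.rds_stop_replication;"
# Once the switch is final, remove the external master configuration
mysql -h "$RDS" -u admin -p -e "CALL mysql.rds_reset_external_master;"
```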

Now it's time to celebrate!