We have a mission-critical MySQL database that has been hosted on premises for many years. As the data grew it passed 10T, and maintaining the database was costing us a lot of effort, so we decided to migrate it to AWS RDS.
Because the data is mission critical, there was no way to take the database offline for the migration: it had to keep accepting online transactions the whole time, with no data loss allowed. That made the task quite challenging when it landed on my shoulders, as I'm not a DBA. I first tried AWS DMS, but DMS was far too slow for a database this large, so I worked out another way to complete the task, described below.
AWS RDS supports the Percona XtraBackup format natively, so I could use XtraBackup to take a physical backup of the whole database and record the binlog position at backup time, restore that backup into AWS, and then make the RDS instance a replication slave of the current on-premise MySQL to sync the changes made since the backup. With this approach the on-premise database keeps serving online transactions, and we can switch over once the slave has caught up with the master.
Step 1 MySQL binlog settings
The whole migration is estimated to take about 10 days, so we must make sure the master retains enough binlogs to cover that entire window, otherwise the slave will not be able to catch up later.
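Here is a minimal sketch of the relevant settings in my.cnf on the on-premise master (the concrete values are my assumptions; size them against your daily binlog volume and the free space on the binlog volume):

# /etc/my.cnf on the on-premise master (MySQL 5.5)
[mysqld]
server-id        = 1
log_bin          = mysql-bin
expire_logs_days = 15     # keep binlogs well past the ~10 day migration window
max_binlog_size  = 1G     # rotate at 1 GB; retention is controlled by expire_logs_days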
Step 2 attach a backup disk
Attach a backup disk to the on-premise database instance. The original data is around 10T, so we must attach another 12T disk to store the XtraBackup output.
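A rough sketch of preparing the new disk, assuming it shows up as /dev/sdb and we mount it at /mnt/data4 (device name and mount point are assumptions; check yours with lsblk):

mkfs.xfs /dev/sdb                # format the new 12T disk
mkdir -p /mnt/data4
mount /dev/sdb /mnt/data4        # mount it where the backup will be written
mkdir -p /mnt/data4/backup
df -h /mnt/data4                 # confirm the capacity is actually there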
Step 3 install XtraBackup
yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum list | grep percona
yum install percona-xtrabackup-24
Step 4 back up the database into the backup folder
This will take about 75 hours for a 10T database. The command below connects to the local MySQL instance as user admin, streams the backup in tar format into /mnt/data4/backup, and splits the output into 500GB chunks.
screen -R backup
time xtrabackup --backup --user=admin --password=xxxxxxx --stream=tar --target-dir=/mnt/data4/backup | split -d --bytes=500GB - /mnt/data4/backup/masterdata.tar
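One thing worth capturing here: xtrabackup prints the binlog coordinates we will need in step 7 to its log output when the backup finishes. A small sketch of keeping that output, assuming we simply add a stderr redirect to the command above:

time xtrabackup --backup --user=admin --password=xxxxxxx --stream=tar --target-dir=/mnt/data4/backup 2> /mnt/data4/xtrabackup.log | split -d --bytes=500GB - /mnt/data4/backup/masterdata.tar
# the end of the log should contain a line like:
#   MySQL binlog position: filename 'mysql-bin.nnnnn', position 'mmmmmm'
grep -i 'binlog position' /mnt/data4/xtrabackup.log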
Step 5 upload the data to AWS S3
After the backup completes, we need to upload the data to S3 so that AWS RDS can read it. The upload takes about 48 hours to the US East region.
screen -R backup
cd /mnt/data4
time aws s3 cp --recursive backup s3://some_bucket/backup
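As a quick sanity check before moving on, we can compare what landed in S3 with the local backup folder (same bucket and prefix as above):

aws s3 ls s3://some_bucket/backup --recursive --summarize --human-readable
du -sh /mnt/data4/backup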
Step 6 create the RDS database from the XtraBackup files in S3
- go to the AWS console: https://console.aws.amazon.com/rds/home?region=us-east-1#databases:
- click the "Restore from S3" button
- S3 bucket: some_bucket
- S3 prefix: backup
- Engine type: Amazon Aurora
- Version: Aurora (MySQL 5.6) 1.23.0 (since we run MySQL 5.5 on premises, we can only go up one major version at this step; we can upgrade to Aurora 2 afterwards)
- IAM role: aurora-restore-from-s3; we need to create a role so that the Aurora instance can read the data from the given S3 bucket (see the sketch after this list)
- ... plus the rest of the standard create-database options
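Here is a rough sketch of creating that IAM role from the CLI instead of the console (the role name, the policy file names and the exact permission set are my assumptions; adjust them to your account and bucket):

# trust policy: let the RDS service assume the role
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "rds.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name aurora-restore-from-s3 --assume-role-policy-document file://trust.json

# permission policy: read-only access to the backup bucket and prefix
cat > s3-read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:ListBucket", "s3:GetObject", "s3:GetObjectVersion"],
    "Resource": ["arn:aws:s3:::some_bucket", "arn:aws:s3:::some_bucket/*"]
  }]
}
EOF
aws iam put-role-policy --role-name aurora-restore-from-s3 --policy-name s3-backup-read --policy-document file://s3-read.json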
The DB restore will take about 88 hours. After the database is restored successfully, the RDS console shows the binlog position it recovered to, as an event like the one below in the Events tab:
Recovered from Binary log filename 'mysql-bin.nnnnn', position 'mmmmmm'.
This information is very important; we will use it to set up master-slave replication so that RDS can catch up with the on-premise MySQL database.
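The same event can also be pulled from the CLI; a sketch, assuming the restored instance is named aurora-migrated (if the event shows up at the cluster level instead, switch to --source-type db-cluster and the cluster identifier):

aws rds describe-events --source-type db-instance --source-identifier aurora-migrated --duration 10080 | grep -i 'binary log'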
Step 7 let RDS catch up with the on-premise database
After the database has been restored from S3, it's time to catch up with the on-premise data and make the two databases identical.
- enable public accessibility on the RDS instance, so that it can communicate with the on-premise database
- put a security group in place that only allows the on-premise database to communicate with RDS
- connect to RDS and set up the slave as below (the replication user must already exist on the master; see the sketch after these commands)
-- set the binlog position
CALL mysql.rds_set_external_master ('on premise mysql ip', 3306,'rdsrepl', 'xxxx', 'mysql-bin.nnnnn', mmmmmm, 0);
-- start the replication
CALL mysql.rds_start_replication;
-- check slave status
SHOW SLAVE STATUS \G
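For completeness: the rdsrepl user referenced above has to exist on the on-premise master before replication can start. A minimal sketch, run on the master (the '%' host pattern is an assumption; restricting it to the RDS address is safer):

mysql -u admin -p <<'SQL'
CREATE USER 'rdsrepl'@'%' IDENTIFIED BY 'xxxx';
GRANT REPLICATION SLAVE ON *.* TO 'rdsrepl'@'%';
FLUSH PRIVILEGES;
SQL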
The catch-up will take about 60 hours. There is a known bug in MySQL 5.6 that requires updating the DB parameter group: set query_cache_type to 0 to turn off the query cache, otherwise replication will stall at "invalidating query cache entries"; see https://bugs.mysql.com/bug.php?id=60696 for details.
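A sketch of making that change from the CLI, assuming the instance uses a custom DB parameter group called aurora-migrated-params (doing it in the console works just as well):

aws rds modify-db-parameter-group --db-parameter-group-name aurora-migrated-params --parameters "ParameterName=query_cache_type,ParameterValue=0,ApplyMethod=immediate"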
Step 8 switch the application endpoint to the new RDS
Now comes the exciting part: the two databases are identical, and we can switch the application's DB URL during a standard maintenance window.
During the maintenance window we do the following:
- stop the online applications
- stop the DB replication (see the sketch after this list)
- update the DB URL to point to RDS
- start the online applications
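The replication teardown on the RDS side is just two stored procedure calls; a minimal sketch, run once the applications are stopped and SHOW SLAVE STATUS reports no lag (rds_reset_external_master clears the master settings, so hold off on it until you're sure you won't roll back):

mysql -h <rds-endpoint> -u admin -p <<'SQL'
CALL mysql.rds_stop_replication;
CALL mysql.rds_reset_external_master;
SQL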
Now it's time to celebrate!