Late in 2023, AWS informed us that our Aurora servers needed to be migrated due to the deprecation of MySQL Community major version 5.7, and recommended moving to major version 8.0.

[!cite] AWS Announcement

[!note] Check how companies are unaware of these costs and how they could be impacting your billing: AWS Billing spike due to RDS Extended Support

*Image: AWS RDS Extended Support cost*

That day during our stand-up, the Scrum Master and the CTO assigned me the task of evaluating the impact of this notification, and I immediately raised my concern about the costs the company would have to pay if we did not meet the deadline. At first this didn't seem like much ($0.10 per vCPU per hour), so I decided to invest that day in putting together all the numbers to make sure we all understood the real impact.

The following day, during my turn in the stand-up, I explained the numbers to the team, which services were involved, and my action plan to complete the migration. The following table shows the additional costs we would have had to pay every month:

| Instance Type | Instances | vCPUs | Additional Monthly Cost (1st & 2nd Year) | Additional Monthly Cost (3rd Year) |
|---|---|---|---|---|
| db.r5.4xlarge | 2 | 32 (16 x 2) | $2,380 ($0.10 x 32 vCPU x 744 h) | $4,761.6 ($0.20 x 32 vCPU x 744 h) |
| db.t3.small | 12 | 24 (2 x 12) | $1,785.6 ($0.10 x 24 vCPU x 744 h) | $3,571.2 ($0.20 x 24 vCPU x 744 h) |
| Total (monthly) | 14 | 56 | $4,165.6 | $8,332.8 |
| Total (yearly) | | | $49,987.2 | $99,974.4 |

This was two and three times the monthly budget for databases at that time, and after the numbers were discussed, I got the green light and the migration became a high priority.
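
If you want to reproduce the arithmetic, here is a minimal sketch assuming only the instance counts and the $0.10/$0.20 per-vCPU-hour Extended Support rates from the table (744 is the number of hours in a 31-day month; the totals differ slightly from the table because the db.r5.4xlarge row is rounded there):

```python
# Sketch of the RDS Extended Support cost math behind the table above.
# Rates: $0.10 per vCPU-hour (years 1-2), $0.20 per vCPU-hour (year 3).
HOURS_PER_MONTH = 744  # 31-day month, as used in the table

fleet = [
    # (instance type, instance count, vCPUs per instance)
    ("db.r5.4xlarge", 2, 16),
    ("db.t3.small", 12, 2),
]

for rate, label in ((0.10, "years 1-2"), (0.20, "year 3")):
    monthly = sum(count * vcpus * rate * HOURS_PER_MONTH for _, count, vcpus in fleet)
    print(f"{label}: ${monthly:,.2f}/month -> ${monthly * 12:,.2f}/year")
```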

As you can see from the number of instances, there were a lot of engines to migrate, and my action plan was the following:

  1. Start migrating the lower environments first (DEV, QA, STAGING, PRE-PROD).
  2. Migrate one engine daily.
  3. Back up every engine before migrating it (see the snapshot-and-upgrade sketch after this list).
  4. Notify IT members about lower-environment migrations before migrating any engine.
  5. After each migration:
    • Measure the downtime.
    • Check whether there were any incidents after migrating.
    • Check the application logs related to the migrated engine.
    • Check CPU and RAM usage (see the CloudWatch sketch after this list).
  6. If all was good, start migrating another lower-environment engine.
  7. After migrating all the lower-environment engines, wait one week before touching production and keep watching:
    • Check whether there were any incidents after migrating.
    • Check the application logs related to every migrated engine.
    • Check CPU and RAM usage.
  8. If all was good:
    1. Define a timeframe in which to proceed, reducing the impact of the migration to a minimum.
    2. Notify Board members about the migration and the estimated downtime.
    3. Start migrating the production services.
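
For steps 2 and 3, the per-engine backup and upgrade can be scripted. Below is a minimal boto3 sketch, assuming an Aurora MySQL cluster; the cluster identifier and the target engine version are placeholders, so confirm the exact 8.0-compatible version available in your region (describe-db-engine-versions) and try it on a lower environment first.

```python
"""Minimal sketch: snapshot an Aurora MySQL cluster, then upgrade it in place.

The cluster identifier and target version below are placeholders; verify the
8.0-compatible versions available to you before running anything like this.
"""
import time

import boto3

rds = boto3.client("rds")

CLUSTER_ID = "my-aurora-cluster"            # hypothetical identifier
TARGET_VERSION = "8.0.mysql_aurora.3.05.2"  # example 8.0-compatible version

# 1. Take a manual cluster snapshot before touching the engine.
snapshot_id = f"{CLUSTER_ID}-pre-80-upgrade-{int(time.time())}"
rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier=snapshot_id,
    DBClusterIdentifier=CLUSTER_ID,
)
rds.get_waiter("db_cluster_snapshot_available").wait(
    DBClusterSnapshotIdentifier=snapshot_id,
)

# 2. Kick off the in-place major version upgrade.
rds.modify_db_cluster(
    DBClusterIdentifier=CLUSTER_ID,
    EngineVersion=TARGET_VERSION,
    AllowMajorVersionUpgrade=True,
    ApplyImmediately=True,
)

# 3. Poll until the cluster reports "available" again (this is the downtime window).
while True:
    cluster = rds.describe_db_clusters(DBClusterIdentifier=CLUSTER_ID)["DBClusters"][0]
    print("cluster status:", cluster["Status"])
    if cluster["Status"] == "available":
        break
    time.sleep(30)
```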
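
For the "check CPU and RAM usage" steps, the standard RDS CloudWatch metrics (CPUUtilization and FreeableMemory) are enough for a before/after comparison. A small sketch, again with a placeholder cluster identifier:

```python
"""Sketch: compare average CPU and freeable memory for an Aurora cluster in
CloudWatch for the day before vs. the day after a migration. The cluster
identifier is a placeholder; FreeableMemory is reported in bytes."""
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
CLUSTER_ID = "my-aurora-cluster"  # hypothetical identifier


def datapoints(metric_name, start, end, period=3600):
    """Hourly Average/Maximum datapoints for a cluster-level RDS metric."""
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBClusterIdentifier", "Value": CLUSTER_ID}],
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=["Average", "Maximum"],
    )
    return response["Datapoints"]


def avg(points):
    return sum(p["Average"] for p in points) / max(len(points), 1)


now = datetime.now(timezone.utc)
for name in ("CPUUtilization", "FreeableMemory"):
    before = datapoints(name, now - timedelta(days=2), now - timedelta(days=1))
    after = datapoints(name, now - timedelta(days=1), now)
    print(f"{name}: avg before={avg(before):,.1f}, avg after={avg(after):,.1f}")
```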

After all the lower-environment migrations, and before moving forward with production, I was pretty confident in my notes and in the steps needed to complete the task seamlessly. The migration was indeed successful, but we encountered two issues afterwards:

  1. The RDS engine CPU usage increased by around 15-20%, with some spikes.
    • This was resolved after cleaning up a bunch of legacy queries that behaved differently with the new engine version.
  2. Late that night, the data analysis team complained that the binary logs were not replicating to a third-party Big Data application.
    • Despite the late hour, and the fact that they were impatient about something we could have solved the next day (they were not analyzing data at that moment), this was easily fixed by creating the missing parameter group, and the replication resumed (see the sketch below).
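
For readers curious about that fix, recreating the parameter group and re-enabling binary logging can also be scripted. A minimal sketch, assuming the binlog setting lives in an aurora-mysql8.0 cluster parameter group, that ROW is the format the downstream tool expects, and with placeholder names for the cluster, the parameter group, and the writer instance:

```python
"""Sketch: recreate a cluster parameter group with binary logging enabled and
attach it to the upgraded cluster. Names/values are placeholders; binlog_format
is a static parameter, so the writer instance needs a reboot to pick it up."""
import boto3

rds = boto3.client("rds")

CLUSTER_ID = "my-aurora-cluster"                 # hypothetical identifier
PARAM_GROUP = "my-aurora-mysql80-binlog"         # hypothetical name
WRITER_INSTANCE_ID = "my-aurora-cluster-writer"  # hypothetical writer instance

# 1. Create a custom cluster parameter group for the new engine family.
rds.create_db_cluster_parameter_group(
    DBClusterParameterGroupName=PARAM_GROUP,
    DBParameterGroupFamily="aurora-mysql8.0",
    Description="aurora-mysql 8.0 cluster params with binlog enabled",
)

# 2. Enable row-based binary logging (static parameter -> pending-reboot).
rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName=PARAM_GROUP,
    Parameters=[{
        "ParameterName": "binlog_format",
        "ParameterValue": "ROW",
        "ApplyMethod": "pending-reboot",
    }],
)

# 3. Attach the group to the cluster, then reboot the writer so it takes effect.
rds.modify_db_cluster(
    DBClusterIdentifier=CLUSTER_ID,
    DBClusterParameterGroupName=PARAM_GROUP,
    ApplyImmediately=True,
)
rds.reboot_db_instance(DBInstanceIdentifier=WRITER_INSTANCE_ID)
```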

This was a great experience with databases, and I hope that in the near future we can put together some hands-on videos explaining how to perform these migrations.