Data warehouse and migration

Natasha Ong
This is some text inside of a div block.
4 min read

In a nutshell:

Amazon Redshift is a data warehousing service that you can use for big data analytics. It helps you collect data from many sources and understand the relationships and trends across them.
Amazon Database Migration Service (DMS) facilitates secure and easy migration of databases to AWS. The migration's source and target databases don't have to be of the same type (a real life saver)!

Data Warehouses

Modern databases are optimised for fast, real-time data. This makes relational databases great for daily analysis - for example, "How many customers do we have right now?" - but historical analytics require a different approach.

What is historical analytics? What's the problem?

  • Historical analytics answers questions like "show me how x has improved ever since we started" or "show me last hour's sales numbers across all the stores."
  • Data collection never stops.
  • The volume of data that will go through historical analytics will overwhelm even the strongest traditional relational database.
  • Even worse, the variety of data can be a problem. You want to run analytics against data coming from different sources, like your inventory, financial, and sales systems? A single query against multiple databases sounds nice, but databases don't handle them easily.

What's the solution?

Once data becomes too complex to handle with traditional relational databases, you've entered the world of data warehouses. Data warehouses are designed specifically for complex data and looking back in time.

Amazon Redshift

Amazon Redshift = data warehousing as a service. When you need big data business intelligence solutions, Redshift gets you started with a single API call. Less time waiting for results, more time getting answers.

  • Redshift can handle large datasets and is massively scalable:
  • Redshift servers in multiple petabyte* sizes are very common.
  • In fact, in cooperation with Amazon Redshift Spectrum, you can directly run a single SQL query against exabytes** of unstructured data running in data lakes***.
  • Offers 10 times high performance than traditional databases for business intelligence**** workloads.
Are you familiar with expressing storage in GB? For example, saying a phone has 200GB of storage. Well, a terabyte is 1,024 gigabytes! Then, a petabyte = 1,024 terabytes.  
An exabyte is even larger than a petabyte. It's equal to 1,024 petabytes!
Data lakes hold heaps of raw data. Raw data is unprocessed and unstructured information that is collected and stored in its original form. It's typically disorganised and not ready to use.
Business intelligence means using technology to turning data into valuable insights that helps businesses make decisions.

AWS Database Migration Service (AWS DMS)

We've been talking about databases and the various database options on AWS. These options are very exciting for all kinds of businesses and users, and make a huge difference to how tech companies run.

But what happens if you have a database that's on-premises or in the cloud already? Does that mean you have to start from scratch or does AWS have a magical way to help you migrate your existing database?

Thankfully, AWS offers a service called Amazon Database Magic.

Oops, scratch that. It's actually Amazon Database Migration Service (DMS). AWS DMS lets you migrate relational databases, non-relational databases, and other types of data storage.

You move data between a source database and a target database, which can be of the same type or different types.

  • Same types i.e. homogeneous migrations: for example, an on-premise MySQL database (source) to Amazon RDS for MySQL (target). The process is fairly straightforward since schema structures, data types, and database code is compatible between source and target.
  • Different types i.e. heterogeneous migrations: for example, an on-premise MySQL database (source) to Amazon Aurora (target). Heterogeneous migrations requires you to do some schema conversion first, before migrating. You can use the handy AWS Schema Conversion Tool for this, which we'll cover later in the course.

Key facts

  • The source database can be:
  • Located on-premises
  • Running on Amazon EC2 Instances
  • An Amazon RDS database.
  • During the migration, your source database remains operational, reducing downtime for anything that relies on the database.

Use cases for AWS DMS

  • Developers wanting to test their apps using real, live data without affecting their current users. In this case, you use DMS to migrate a copy of your production database to your dev or test environments, either once-off or continuously.
  • Combining databases into a single database.
  • Sending ongoing copies of your data to different target sources instead of doing a one-time migration. This could be for disaster recovery or because of geographic separation.
  • Making regular copies of your data and sending them to different places, as a safety net for your information. This is useful for two main reasons:

a) Disaster recovery: these copies are like backup plans for your data. If something goes wrong with your main data source, these copies step in to save the day.

b) Geographic separation: Sometimes, you want to make sure people from different areas can quickly access your data. If your data is hosted in Australia, it would take a long time for someone in London to access your data... unless there's a copy somewhere in Europe! These copies help speed things up for them.