Backing Up & Restoring Very Large Data Streams

Datetime: 2017-04-20 07:05:47    Topic: Database

Continuing our series of customer stories, this one highlights how we helped an industrial manufacturer protect a very large DataStax Enterprise environment hosted in AWS.

Summary: A leading industrial manufacturer makes Talena a core component of their business-critical testing infrastructure

Industry: Manufacturing. A Fortune 500 manufacturing company is a leading provider of industrial machines and components for commercial and military use. Recently the company embarked on a digital transformation project to unlock the value of the large streams of data generated from these components to provide incredible insights and operational value to their customers.

Big Data Environment: To realize this vision, the organization deployed a Big Data NoSQL platform powered by DataStax Enterprise (DSE) to capture machine test data. The database is hosted in Amazon AWS and continuously ingests data as machines undergo testing. The customer runs two 32-node Cassandra databases, one for production and the other for testing, and all data is stored in a single 48-terabyte table that grows every day as new data is ingested into the system.

Challenges: The Cassandra database consists of one keyspace containing a single large 48 TB table. First, backing up and recovering this table was unreliable due to frequent failures during the backup process. The customer was using EBS storage as the backup destination, and that cost was rising so rapidly that they began looking for alternatives to their backup and recovery strategy. Second, the customer needed to copy data from the production database to the test cluster every six hours so that the testing team had access to fresh production data. This process was performed manually using internally developed scripts, and significant engineering cycles went into developing, running, and debugging it to ensure test data reached the test team in a timely manner.

Solution: The customer has deployed a Talena cluster in Amazon AWS to address their backup & recovery needs as well as to copy data from the production database to the test database. Given the unique database environment that contained a single large 48 terabyte table, Talena’s incremental-forever technology proved very efficient and resilient by only moving data changes during the backup process. This made the backup process very reliable and fast for the customer. To reduce backup storage costs, the customer used the native Talena Amazon S3 integration and storage optimization capabilities. Using Talena, the customer is now able to store backup data on Amazon S3 at a significantly lower cost compared with Amazon EBS. In addition, the Talena de-duplication technology reduces the overall amount of data that needs to be copied and stored in Amazon S3, further reducing the backup storage costs.
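To make the de-duplication idea concrete, here is a minimal sketch of content-addressed chunk storage, the general technique behind backup de-duplication. This is an illustrative toy, not Talena's actual implementation: each backup stream is split into fixed-size chunks, chunks are keyed by their SHA-256 hash, and a chunk shared by many streams (or many backup generations) is stored only once.

```python
import hashlib

def dedup_store(streams, chunk_size=4):
    """Store fixed-size chunks keyed by content hash; identical chunks are kept once.

    Returns (store, manifests): `store` maps hash -> chunk bytes, and each
    manifest is the ordered list of chunk hashes needed to rebuild a stream.
    Hypothetical sketch; real systems use large, often variable-size chunks.
    """
    store = {}
    manifests = []
    for data in streams:
        manifest = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # skip chunks already present
            manifest.append(digest)
        manifests.append(manifest)
    return store, manifests

def restore(store, manifest):
    """Rebuild a stream from its manifest of chunk hashes."""
    return b"".join(store[h] for h in manifest)
```

With two 12-byte streams that share their first 8 bytes, the store holds only 4 unique chunks instead of 6, which is the effect that lowers the amount of data copied to and kept in Amazon S3.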

The same Talena cluster mirrors production data to the test Cassandra environment. The customer automates the entire process with a mirroring workflow that copies the large table from the production database to the test database every six hours. After the first full copy, all subsequent transfers to the test cluster are incremental only, resulting in much faster transfers and significantly lower network bandwidth utilization.
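The incremental-only transfers above can be sketched in terms of Cassandra's storage model: SSTable data files are immutable once written, so a file already present on the target never needs re-transfer. A mirroring pass therefore reduces to a set difference over file names. This is an assumed mechanism for illustration, not Talena's actual workflow, and the file names are made up.

```python
def plan_incremental_sync(source_files, target_files):
    """Plan one incremental mirroring pass between two clusters.

    Because SSTable data files are immutable, a file present on both sides
    needs no copy; only files new on the source are transferred, and files
    compacted away upstream can be removed from the target.
    """
    source, target = set(source_files), set(target_files)
    return {
        "copy": sorted(source - target),    # new SSTables since last pass
        "delete": sorted(target - source),  # removed by compaction upstream
    }
```

Run every six hours, a pass like this moves only the SSTables written since the previous pass, which is why the transfers are fast and light on network bandwidth compared with a full copy of a 48 TB table.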