Wednesday, July 6, 2022
  • Login
  • Register
IT Skills You Need
  • Home
  • Monitoring
  • DevOps
  • AI
  • Big-data
  • Automation
  • Agile
  • ITIL
  • AIOps
No Result
View All Result
  • Home
  • Monitoring
  • DevOps
  • AI
  • Big-data
  • Automation
  • Agile
  • ITIL
  • AIOps
No Result
View All Result
IT Skills You Need
No Result
View All Result
Home Big-Data

How to evaluate the benefits of AQUA for your Amazon Redshift workloads

admin by admin
18 February 2022
in Big-Data
0 0
0
How to evaluate the benefits of AQUA for your Amazon Redshift workloads
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter

Amazon Redshift is the cloud data warehouse of choice for tens of thousands of customers, who use it to analyze exabytes of data to gain business insights. Customers have been asking us for better performance at scale as the volume, variety, velocity, and veracity of their data grows. We have added several features to Amazon Redshift that enable you to get up to three times better price performance with Amazon Redshift than other cloud data warehouses.

In addition, we launched AQUA (Advanced Query Accelerator) for Amazon Redshift to help you cost-effectively run analytics at the new scale of data. AQUA is a distributed and hardware-accelerated cache that enables Amazon Redshift to run an order of magnitude faster than other enterprise cloud data warehouses by automatically boosting certain types of queries. AQUA uses AWS-designed processors with AWS Nitro chips adapted to speed up data encryption and compression, and custom analytics processors, implemented in FPGAs, to accelerate operations such as scans, filtering, and aggregation. AQUA is available with the RA3.16xlarge, RA3.4xlarge, and RA3.xlplus nodes at no additional charge and requires no code changes. You can enable AQUA for your existing Amazon Redshift RA3 clusters or launch a new AQUA-enabled RA3 cluster via the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). To learn more about AQUA, see Working with AQUA (Advanced Query Accelerator).

This post walks you through steps to quantify expected benefits from AQUA for Amazon Redshift for your workloads. We cover the methodology used for testing AQUA and share the scripts, a sample dataset, and queries so you can test AQUA in your own environment. We have published scripts on GitHub (along with a README file), which you need as you follow the steps in this post. Download all scripts to your working directory from where you intend to connect to your Redshift clusters.

Solution overview

At a high-level, the AQUA test process involves the following steps:

  1. Create a test cluster to evaluate AQUA and make sure that AQUA is turned on. For instructions, see Working with AQUA (Advanced Query Accelerator).
  2. Analyze your workload for AQUA and capture eligible queries. To learn more about the types of queries accelerated by AQUA, refer to When does Amazon Redshift use AQUA to run queries?
  3. Run the workload with AQUA activated and deactivated on your test cluster.
  4. Compare performance results.

This approach is a straightforward yet practical way to measure the performance gain you can expect to get for your workload by turning on AQUA.

With the exception of one workload parsing and analyzing script (aqua_capture_query.sh), you can run the rest of the scripts on a test cluster. We recommend using a test cluster to minimize the impact to your production cluster.

Create a test cluster

Create a snapshot from a production cluster that has read-heavy workloads and restore it as test cluster.

Analyze your workload and capture eligible queries

Run aqua_capture_query.sh on the production cluster to determine the queries suitable for acceleration by AQUA. We recommend choosing workloads with SELECT queries that use LIKE or SIMILAR TO functions in the WHERE clause and scan, filter, and aggregate large datasets. aqua_capture_query.sh scans the query history on your production cluster and captures queries that AQUA can accelerate.

The script runtime may exceed several minutes depending on the selected time interval (analyze_starttime and analyze_endtime), size of the cluster, and workload complexity. We suggest restricting the script runtime by limiting the time interval to the lesser of your workload runtime or 3 hours. Alternately, you can evaluate AQUA using the Amazon Reviews sample dataset, which we demonstrate later in this post.

aqua_capture_query.sh saves an output file named aqua_eligible_queries to your working directory and contains a subset of most suitable AQUA-eligible queries.

The following is a sample output by the script:

select count(*) from amazon_reviews where product_title SIMILAR TO '%lap%' group by star_rating ORDER BY star_rating desc;
select count(*) from amazon_reviews where product_title ilike '%e%|%E%' or customer_ID like '3%__%45__3';

If your workload history doesn’t have enough AQUA-eligible queries, the script reports no eligible queries found. If this happens, run the script with different date/time parameters. If you still don’t see any queries, you can try using the sample dataset and queries provided in this post. For more information about AQUA-eligible queries, refer to When does Amazon Redshift use AQUA to run queries?

Run the workload on your test cluster

Run aqua_execute_query.sh on your test cluster. The script runs the captured queries on your cluster repeatedly to get consistent performance by reducing the impact of runtime difference due to environmental factors with AQUA activated and deactivated. The script records the start date/time and end date/time to a file named workload_datetime.txt in your working directory.

Compare performance results

When aqua_execute_query.sh script is complete, run aqua_perf_compare.sh, which generates a CSV file named aqua_benefit in your working directory. The following table summarizes the sample output.

Query ID Amazon Redshift with AQUA turned onQuery ID Amazon Redshift with AQUA turned offRuntime in seconds with AQUA turned onRuntime in seconds with AQUA turned offSpeedup
(Column C/Column B)
115319411533702.759.422.0
1153214115345622.8104.94.6
3346293346311.6664335.822.25
3348503346721.7129726.915.7
3349843349981.8505126.914.5
3348303347936.4788487.213.4
3348283347406.697487.113

This table shows the query identifiers and runtime of the queries with AQUA activated and deactivated. You can compare the benefits offered by AQUA by reviewing the speedup column.

Example use case with the Amazon Reviews dataset

To test AQUA with the Amazon Reviews sample dataset, perform the following steps:

  1. Create a two-node RA3.4xlarge cluster by issuing the following command:
    aws redshift create-cluster --cluster-identifier test-amazon-reviews --node-type ra3.4xlarge --number-of-nodes 2 --master-username adminuser --master-user-password <xxpasswordxx> --aqua-configuration-status enabled

  2. Create a test database on our Amazon Redshift cluster by issuing the following command:
    CREATE DATABASE TestDB;

  3. Load the table with the Amazon Reviews dataset by running the script load_amazon_sentiments_data.sql.
  4. Run a few AQUA-eligible queries (similar to the following) multiple times with and without AQUA activated using the script execute_test_queries.sh:
    select count(*) from amazon_reviews WHERE product_title SIMILAR TO '%lap%' or product_title SIMILAR TO '%hope%' or product_title SIMILAR TO '%nice%' or product_title SIMILAR TO '%soa%';

More evaluation queries are available on the GitHub repo.

  1. Run aqua_perf_compare.sh on the test cluster.

The following table shows that AQUA accelerated the queries 5–22 times faster.

Query ID Amazon Redshift with AQUA turned onQuery ID Amazon Redshift with AQUA turned offRuntime in seconds with AQUA turned onRuntime in seconds with AQUA turned offSpeedup
(Column C/Column B)
3642023642172.3341151.5620722.09073
3349843349981.8505126.914.5
3348303347936.4788487.213.4
3348283347406.697487.113

Summary

This post provides a straightforward yet practical way to measure the performance gain you can expect to get for your workload by turning on AQUA. You can use the scripts provided and test with your own dataset and queries, or use the sample dataset and queries to determine AQUA’s impact.

We continue to invest and launch new capabilities like AQUA for Amazon Redshift to make sure Amazon Redshift continues to improve as your data warehouse needs grow. For AQUA, we continue to add hardware acceleration for more SQL operators, functions, predicates, data types, and file formats so more scans, filters, and aggregations can be pushed down to AQUA. Pushdowns to AQUA remain transparent so Amazon Redshift decides when to push queries down to AQUA to take advantage of hardware acceleration. And when queries don’t get pushed down to AQUA, they continue to run on Amazon Redshift as before.

We invite you to test AQUA for yourself and share the findings.


About the Authors

Dinesh Kumar is a Database Engineer at AWS focusing on Amazon AQUA. He works with customers to build highly scalable data warehouse and high performant database solutions. Outside work, he enjoys gardening and spending time with his family.

Manan Goel is a Product Go-To-Market Leader for AWS Analytics Services including Amazon Redshift & AQUA at AWS. He has more than 25 years of experience and is well versed with databases, data warehousing, business intelligence, and analytics. Manan holds a MBA from Duke University and a BS in Electronics & Communications engineering.

Tags: awsbig-dataData sciencescientiste

Get real time update about this post categories directly on your device, subscribe now.

Unsubscribe
How Netflix Utilizes User’s Data to Create Personalized User Experience

How Netflix Utilizes User’s Data to Create Personalized User Experience

30 October 2021
Are You Up for an AP Challenge?

Are You Up for an AP Challenge?

29 September 2021
Acronyms

Acronyms

9 November 2021
The 5 best team chat apps in 2022

The 5 best team chat apps in 2022

5 May 2022
Top Trends for Big Data Analytics in Healthcare

Top Trends for Big Data Analytics in Healthcare

2 December 2021
How Big Data and Location Intelligence Open up New Horizons for Businesses and Change the World

How Big Data and Location Intelligence Open up New Horizons for Businesses and Change the World

6 December 2021
Important Considerations When Migrating to a Data Lake

Important Considerations When Migrating to a Data Lake

30 March 2022

8 Data-Driven Strategies to Improve Customer Engagement

16 May 2022

How automation helped Halo Cars scale

5 November 2021

Meet Atlas Authority – Easy Agile Partner

22 July 2020

Agile 101: A Beginner’s Guide to Agile Methodology

31 May 2021

Introduction to Energy Analytics 101

22 June 2022

8 Reasons Data-Driven Companies Are Utilizing Email Marketing

29 June 2022

Digitally preserving and restoring Ancient Olympia as it stood over 2,000 years ago

22 November 2021

Tools for Progress During Agile Transformation

30 November 2021

Handy Tips #19: Preventing alert storms with trigger dependencies

6 January 2022
IT Skills You Need

© 2022 IT SKILLS YOU NEED

Navigate Site

  • Activity
  • Classifieds
  • Groups
  • Home
  • Members

Follow Us

No Result
View All Result
  • Home

© 2022 IT SKILLS YOU NEED

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

*By registering into our website, you agree to the Terms & Conditions and Privacy Policy.
All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In

Add New Playlist

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy and Cookie Policy.