What is Amazon Athena?
Amazon Athena is an interactive query service provided by Amazon Web Services (AWS) that allows users to analyze data directly in Amazon Simple Storage Service (S3) using standard SQL queries. Athena is serverless, meaning you don’t need to manage any infrastructure. You simply point Athena at your data stored in S3, define the schema, and run SQL queries.
What is AWS Glue?
AWS Glue is a fully managed, serverless data integration service provided by Amazon Web Services (AWS) that makes it easy to prepare and transform data for analytics. It automates much of the effort involved in data extraction, transformation, and loading (ETL) workflows, helping you to quickly move and process data for storage in data lakes, data warehouses, or other data stores.
In this blog, we will query and analyze data stored in an S3 bucket using SQL statements.
Task 1: Setup workgroup.
STEP 1: Navigate to Amazon Athena.
- Select “Analyze your data using PySpark and Spark SQL”.
- Click on Launch notebook editor.
STEP 2: Select Workgroups and click on Create workgroup.
STEP 3: Workgroup name: Enter your choice of name.
- Description: Enter your choice of description.
- Select Athena SQL as the analytics engine and choose Automatic.
STEP 4: Select the S3 bucket where query results will be stored.
STEP 5: You will see the created workgroup, myworkgroup.
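For readers who prefer scripting, the workgroup setup above can also be sketched with boto3, the AWS SDK for Python. This is a minimal sketch, not the method used in this walkthrough: it assumes boto3 is installed and AWS credentials are configured, and the bucket name, results prefix, and region are hypothetical placeholders.

```python
def build_workgroup_request(name, description, results_bucket):
    """Build the keyword arguments for Athena's create_work_group call."""
    return {
        "Name": name,
        "Description": description,
        "Configuration": {
            # Athena writes query results under this S3 prefix.
            "ResultConfiguration": {
                "OutputLocation": f"s3://{results_bucket}/athena-results/"
            }
        },
    }


def create_workgroup(request, region="us-east-1"):
    """Send the request to Athena. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above has no dependencies

    athena = boto3.client("athena", region_name=region)
    return athena.create_work_group(**request)


workgroup_request = build_workgroup_request(
    name="myworkgroup",
    description="Workgroup for querying CSV data in S3",
    results_bucket="my-example-bucket",  # hypothetical bucket name
)
print(workgroup_request)
# To actually create the workgroup: create_workgroup(workgroup_request)
```

The request is built separately from the API call so that you can inspect it before anything is created in your account.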
Task 2: Create a database in Glue.
STEP 1: Navigate to AWS Glue and click on Databases.
- Click on Add database.
STEP 2: Enter the database name.
- Click on Create database.
STEP 3: You will see the created database.
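The same database can be created from code. The sketch below uses the Glue client's create_database call in boto3. The database name mydatabase and the region are assumptions for illustration, and running the actual call requires boto3 and configured AWS credentials.

```python
def build_database_request(name, description=""):
    """Build the keyword arguments for Glue's create_database call."""
    database_input = {"Name": name}
    if description:
        database_input["Description"] = description
    return {"DatabaseInput": database_input}


def create_database(request, region="us-east-1"):
    """Send the request to Glue. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue", region_name=region)
    return glue.create_database(**request)


# "mydatabase" is a hypothetical name; use the one you entered in STEP 2.
database_request = build_database_request(
    "mydatabase", description="Database for the Athena walkthrough"
)
print(database_request)
# To actually create the database: create_database(database_request)
```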
Task 3: Create a table in Glue.
STEP 1: Select Tables and click on Add table.
STEP 2: Enter the table name and select the database.
STEP 3: Select Standard AWS Glue table.
- Data store: S3.
- Data location: My account.
- Path: the path to your S3 bucket.
- Data format: CSV.
STEP 4: Schema: Define or upload schema.
- Click on Add.
STEP 5: Enter the column name and data type.
STEP 6: Enter the name and data type (column 1).
STEP 7: After entering the schema, click on Next.
STEP 8: Click on Create.
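The table definition entered in the console so far corresponds roughly to a single Glue create_table request. The sketch below shows one way to express it with boto3, assuming a comma-delimited CSV file. The database, table, bucket path, and column names are hypothetical, and the input format, output format, and SerDe class are the standard Hive classes for delimited text rather than values taken from this walkthrough.

```python
def build_csv_table_request(database, table, s3_path, columns):
    """Build the keyword arguments for Glue's create_table call.

    columns is a list of (name, type) pairs, e.g. [("id", "int")].
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",  # the data stays in S3
            "Parameters": {"classification": "csv"},
            "StorageDescriptor": {
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
                "Location": s3_path,
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat": (
                    "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
                ),
                "SerdeInfo": {
                    "SerializationLibrary": (
                        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
                    ),
                    "Parameters": {"field.delim": ","},
                },
            },
        },
    }


def create_table(request, region="us-east-1"):
    """Send the request to Glue. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue", region_name=region)
    return glue.create_table(**request)


# All names below are hypothetical placeholders.
table_request = build_csv_table_request(
    database="mydatabase",
    table="mytable",
    s3_path="s3://my-example-bucket/data/",
    columns=[("id", "int"), ("name", "string")],
)
print(table_request["TableInput"]["StorageDescriptor"]["Columns"])
# To actually create the table: create_table(table_request)
```

Each (name, type) pair plays the same role as one row in the console's schema editor.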
STEP 9: In the Athena query editor, select the database and table.
STEP 10: Click on Preview table.
STEP 11: The query editor will automatically generate the SQL statement for querying the first 10 rows.
- The result of the query is shown below.
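The preview query can also be run outside the console. The sketch below builds a SELECT statement limited to 10 rows and submits it through boto3's Athena client, polling until the query finishes. It is an illustration under assumptions: the database, table, workgroup, and region are placeholders, the one-second polling interval is arbitrary, and the run_query function only works with boto3 installed and AWS credentials configured.

```python
import time


def build_preview_query(database, table, limit=10):
    """Return a SQL statement that selects the first `limit` rows."""
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)};'


def run_query(sql, database, workgroup, region="us-east-1"):
    """Run a query in Athena and return the result rows."""
    import boto3  # imported lazily so the builder works without boto3

    athena = boto3.client("athena", region_name=region)
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a final state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query ended in state {state}")
    results = athena.get_query_results(QueryExecutionId=query_id)
    return results["ResultSet"]["Rows"]


# Hypothetical names; replace them with your own database and table.
preview_sql = build_preview_query("mydatabase", "mytable")
print(preview_sql)
# To run it: rows = run_query(preview_sql, "mydatabase", "myworkgroup")
```

Because the workgroup already has a query result location, the call does not need to pass an output location of its own.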
Conclusion.
In this guide, we’ve walked through querying data stored in Amazon S3 using Amazon Athena, a serverless and cost-effective way to work with large datasets. Athena runs SQL queries directly on data in S3, without the need to load it into a database, which makes large datasets more accessible to anyone familiar with SQL. Because Athena is serverless, there’s no infrastructure to provision, configure, or maintain, and it scales automatically based on the complexity of your queries.