Querying Amazon S3 Data with Athena: A Step-by-Step Guide.

What is Amazon Athena?

Amazon Athena is an interactive query service provided by Amazon Web Services (AWS) that allows users to analyze data directly in Amazon Simple Storage Service (S3) using standard SQL queries. Athena is serverless, meaning you don’t need to manage any infrastructure. You simply point Athena at your data stored in S3, define the schema, and run SQL queries.

What is AWS Glue?

AWS Glue is a fully managed, serverless data integration service provided by Amazon Web Services (AWS) that makes it easy to prepare and transform data for analytics. It automates much of the effort involved in data extraction, transformation, and loading (ETL) workflows, helping you to quickly move and process data for storage in data lakes, data warehouses, or other data stores.

In this blog, we will query and analyze data stored in an S3 bucket using SQL statements in Athena, with the database and table defined in AWS Glue.

Task 1: Set up a workgroup.

STEP 1: Navigate to Amazon Athena.

  • Select “Analyze your data using PySpark and Spark SQL”.
  • Click on Launch notebook editor.
Screenshot 2025 01 22 132707

STEP 2: Select Workgroups and click on Create workgroup.

Screenshot 2025 01 22 134048

STEP 3: Enter the workgroup details.

  • Workgroup name: Enter a name of your choice.
  • Description: Enter a description of your choice.
  • Select Athena SQL as the engine type and Automatic for query engine upgrades.
Screenshot 2025 01 22 134349

STEP 4: Select your S3 bucket as the location for query results.

Screenshot 2025 01 22 134411

STEP 5: You will see the newly created workgroup (myworkgroup in this example).

Screenshot 2025 01 22 134445
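
If you prefer to script Task 1 instead of clicking through the console, the same workgroup can be created with boto3. This is only a minimal sketch under a few assumptions: the bucket name and description are placeholders, the helper function names are made up for illustration, and "AUTO" is used as the API counterpart of the automatic engine option. The request is built as a plain dictionary so you can inspect it before anything is sent to AWS.

```python
def build_workgroup_request(name, description, result_bucket):
    """Build keyword arguments for Athena's CreateWorkGroup API call."""
    return {
        "Name": name,
        "Description": description,
        "Configuration": {
            # Where Athena writes query results (the bucket chosen in STEP 4).
            "ResultConfiguration": {"OutputLocation": f"s3://{result_bucket}/"},
            # "AUTO" lets Athena pick the engine version automatically.
            "EngineVersion": {"SelectedEngineVersion": "AUTO"},
        },
    }


def create_workgroup(request):
    """Send the request to AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("athena").create_work_group(**request)


# Hypothetical values, for illustration only.
workgroup_request = build_workgroup_request(
    "myworkgroup", "Workgroup for querying S3 data", "my-athena-results-bucket"
)
print(workgroup_request["Configuration"]["ResultConfiguration"]["OutputLocation"])
```

Calling create_workgroup(workgroup_request) would then perform the actual creation in your account.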

Task 2: Create a database in Glue.

STEP 1: Navigate to AWS Glue and click on Databases.

  • Click on Add database.
Screenshot 2025 01 22 135235

STEP 2: Enter the database name.

  • Click on Create database.
Screenshot 2025 01 22 135335

STEP 3: You will see the newly created database.

Screenshot 2025 01 22 135355
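
Task 2 can also be done from code. The sketch below assumes boto3 is installed and uses a made-up database name and description; the helper names are illustrative, not part of any AWS library. The name is lowercased up front because Glue stores database names in lowercase for Hive compatibility.

```python
def build_database_request(name, description=""):
    """Build keyword arguments for Glue's CreateDatabase API call."""
    return {
        "DatabaseInput": {
            # Glue folds database names to lowercase when storing them.
            "Name": name.lower(),
            "Description": description,
        }
    }


def create_database(request):
    """Send the request to AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("glue").create_database(**request)


# Hypothetical values, for illustration only.
database_request = build_database_request("MyDatabase", "Database for S3 data")
print(database_request["DatabaseInput"]["Name"])
```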

Task 3: Create a table in Glue.

STEP 1: Select Tables and click on Add table.

Screenshot 2025 01 22 135426

STEP 2: Enter the name and select the database.

Screenshot 2025 01 22 135523

STEP 3: Select Standard AWS Glue table.

  • Data store: S3.
  • Data location: my account.
  • Path: your S3 bucket.
  • Data format: CSV.
Screenshot 2025 01 22 135837
Screenshot 2025 01 22 135849

STEP 4: Under Schema, choose Define or upload schema.

  • Click on Add.
Screenshot 2025 01 22 140431

STEP 5: Enter the name and data type.

Screenshot 2025 01 22 140526

STEP 6: Enter the name and data type (column 1).

Screenshot 2025 01 22 140616

STEP 7: After entering the schema, click on Next.

Screenshot 2025 01 22 140646

STEP 8: Click on Create.

Screenshot 2025 01 22 140716
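
For reference, here is roughly what the table definition from the steps above looks like when expressed as a Glue CreateTable request in Python. Treat it as a sketch: the database, table, S3 path, and column names are all made up, the helper names are illustrative, and the SerDe settings shown are the common ones for plain comma-separated files, which may differ from what the console picks for your data.

```python
def build_csv_table_request(database, table, s3_path, columns):
    """Build keyword arguments for Glue's CreateTable call for a CSV table.

    columns is a list of (name, type) pairs, for example [("id", "int")].
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "Parameters": {"classification": "csv"},
            "StorageDescriptor": {
                # One entry per column defined in the schema step.
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
                # S3 folder that holds the CSV files.
                "Location": s3_path,
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat": (
                    "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
                ),
                "SerdeInfo": {
                    "SerializationLibrary": (
                        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
                    ),
                    "Parameters": {"field.delim": ","},
                },
            },
        },
    }


def create_table(request):
    """Send the request to AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("glue").create_table(**request)


# Hypothetical values, for illustration only.
table_request = build_csv_table_request(
    "mydatabase",
    "mytable",
    "s3://my-data-bucket/input/",
    [("id", "int"), ("name", "string")],
)
print(table_request["TableInput"]["StorageDescriptor"]["Columns"])
```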

STEP 9: In the Athena query editor, select the database and table.

Screenshot 2025 01 22 140942

STEP 10: Click on Preview Table.

Screenshot 2025 01 22 141449

STEP 11: The query editor will automatically generate the SQL statement for querying the first 10 rows.

  • The result of the query is shown below.
Screenshot 2025 01 22 141546
Screenshot 2025 01 22 141716
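
The preview of the first 10 rows can be reproduced outside the console as well. The sketch below builds a similar SELECT statement and, optionally, runs it through boto3 by starting the query, polling until it finishes, and fetching the rows. The database, table, and workgroup names are placeholders, and the helper functions are illustrative rather than anything Athena itself provides.

```python
import time


def build_preview_query(database, table, limit=10):
    """Return a SELECT statement that previews the first rows of a table."""
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


def run_query(sql, database, workgroup, poll_seconds=1.0):
    """Run a query in Athena and return the raw result rows.

    Needs boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so the builder above works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a final state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


# Hypothetical names, for illustration only.
preview_sql = build_preview_query("mydatabase", "mytable")
print(preview_sql)
```

Calling run_query(preview_sql, "mydatabase", "myworkgroup") would execute the statement in your account; in the returned rows, the first entry normally holds the column headers.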

Conclusion.

In this guide, we’ve walked through querying data stored in Amazon S3 using Amazon Athena, a serverless and cost-effective way to work with large datasets in S3. Athena runs SQL queries directly on the data in S3, without the need to load it into a database, which makes large datasets more accessible to anyone familiar with SQL. Because Athena is serverless, there is no infrastructure to provision, configure, or maintain, and it scales automatically with your queries.
