top of page

Tutorial blog to use Amazon Athena to query data from S3

Writer's picture: Vaibhav DeshpandeVaibhav Deshpande

Updated: Oct 25, 2023

Learning Objective:

  • Learn to use Amazon Athena to query data from S3

Step 1:

To create an Athena database:

a. Go to AWS console and create an S3 bucket called athenabucket221.

b. In AWS Management Console, go to Athena. If you haven’t accessed Athena before, choose Explore the query editor. Otherwise, Athena opens directly in query editor.

c. Set the query result location in Edit Settings.

c. In Manage settings, click on Browse S3 and Choose the previously created S3 bucket. Click on Save.

d. Click on Editor to go back to query editor.

e. Create a database named mydatabase01 by typing the following statement in the right side query panel. Run the query.

CREATE DATABASE mydatabase01

f. Choose mydatabase01 from Database list in left hand panel.

Step 2:

  • To create a table: For this tutorial we will be using sample AmazonCloudfront log data stored at s3://athena-examples-myregion/cloudfront/plaintext/ (myregion is to be replaced by your region name eg. ap-south-1). The data is in tab separated format.

a. Create a new query pane by pressing on + and type the following query in it.

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (

`Date` DATE,

Time STRING,

Location STRING,

Bytes INT,

RequestIP STRING,

Method STRING,

Host STRING,

Uri STRING,

Status INT,

Referrer STRING,

os STRING,

Browser STRING,

BrowserVersion STRING

)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'

WITH SERDEPROPERTIES (

"input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"

) LOCATION 's3://athena-examples-myregion/cloudfront/plaintext/';

b. Replace the myregion in the LOCATION statement with your region name eg. ap-south-Run the query.

Step 3:

To run a query:

a. Open a new query tab and give the following SQL statement. Run the query.

SELECT os, COUNT(*) count

FROM cloudfront_logs

WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05'

GROUP BY os

b. To save the results of the query to a .csv file, choose Download results.

c. To view or run previous queries, choose the Recent queries tab.

d. To save result of previous queries, select queries from Recent queries tab and click on Download result.

e. To download SQL query strings to csv, click on Download csv.

Note: Delete the database using below command

DROP DATABASE mydatabase CASCADE;


Also S3 bucket.


Was this document helpful? How can we make this document better? Please provide your insights. You can download PDF version for reference.




We provide the best AWS training from Pune, India.

For more information on aws certification contact us now.


Recent Posts

See All

16 Comments


Piyush Patil
Piyush Patil
Jul 14, 2024

Done.


Like

Yogita Shinde
Yogita Shinde
May 13, 2024

useful blog


Like

Rucha Kulkarni
Rucha Kulkarni
Feb 15, 2024

Well explained


Like

Tanaya Yalrute
Tanaya Yalrute
Feb 15, 2024

very useful sir


Like

Shruti Bhosekar
Shruti Bhosekar
Feb 14, 2024

Very interesting blog!

Like
bottom of page