top of page
Writer's pictureVaibhav Deshpande

Tutorial blog to use Amazon Athena to query data from S3

Updated: Oct 25, 2023

Learning Objective:

  • Learn to use Amazon Athena to query data from S3

Step 1:

To create an Athena database:

a. Go to AWS console and create an S3 bucket called athenabucket221.

b. In AWS Management Console, go to Athena. If you haven’t accessed Athena before, choose Explore the query editor. Otherwise, Athena opens directly in query editor.

c. Set the query result location in Edit Settings.

c. In Manage settings, click on Browse S3 and Choose the previously created S3 bucket. Click on Save.

d. Click on Editor to go back to query editor.

e. Create a database named mydatabase01 by typing the following statement in the right side query panel. Run the query.

CREATE DATABASE mydatabase01

f. Choose mydatabase01 from Database list in left hand panel.

Step 2:

  • To create a table: For this tutorial we will be using sample AmazonCloudfront log data stored at s3://athena-examples-myregion/cloudfront/plaintext/ (myregion is to be replaced by your region name eg. ap-south-1). The data is in tab separated format.

a. Create a new query pane by pressing on + and type the following query in it.

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (

`Date` DATE,

Time STRING,

Location STRING,

Bytes INT,

RequestIP STRING,

Method STRING,

Host STRING,

Uri STRING,

Status INT,

Referrer STRING,

os STRING,

Browser STRING,

BrowserVersion STRING

)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'

WITH SERDEPROPERTIES (

"input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"

) LOCATION 's3://athena-examples-myregion/cloudfront/plaintext/';

b. Replace the myregion in the LOCATION statement with your region name eg. ap-south-Run the query.

Step 3:

To run a query:

a. Open a new query tab and give the following SQL statement. Run the query.

SELECT os, COUNT(*) count

FROM cloudfront_logs

WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05'

GROUP BY os

b. To save the results of the query to a .csv file, choose Download results.

c. To view or run previous queries, choose the Recent queries tab.

d. To save result of previous queries, select queries from Recent queries tab and click on Download result.

e. To download SQL query strings to csv, click on Download csv.

Note: Delete the database using below command

DROP DATABASE mydatabase CASCADE;


Also S3 bucket.


Was this document helpful? How can we make this document better? Please provide your insights. You can download PDF version for reference.




We provide the best AWS training from Pune, India.

For more information on aws certification contact us now.


Recent Posts

See All

16 Comments


Done.


Like

useful blog


Like

Well explained


Like

very useful sir


Like

Very interesting blog!

Like
bottom of page