Author: Ishwarya Balasubramaniyan
Downloading, extracting, and uploading files to AWS S3 can be done with a Python script. The question is how to do this without a local machine or any other server machine serving as the place where the zip folder is downloaded and unpacked. This is where AWS EC2 comes in: the Python script that downloads, extracts, and uploads the files to S3 is hosted on an EC2 instance, whose configuration is chosen from the many instance options AWS provides.
Once the script is hosted on EC2 and a crontab job is scheduled to trigger it every day, the files are uploaded regularly from the EC2 instance to the configured S3 bucket. Each file can be deleted from the instance once it has been copied to the S3 bucket, which frees disk space on the instance. This way, the EC2 instance serves only as the staging point where files are extracted from the zip folder.
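The flow described above can be sketched as follows. This is a minimal illustration, not the actual gdelt_script.py: the function names, the example URL, the bucket name, and the key prefix are all hypothetical placeholders, and boto3 is assumed to be installed with credentials available on the instance.

```python
import io
import os
import tempfile
import urllib.request
import zipfile


def download_zip(url):
    """Fetch the zip archive at `url` and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def extract_upload_cleanup(zip_bytes, work_dir, upload):
    """Extract each file in the archive, pass it to `upload`, then delete it.

    `upload(local_path, key)` is any callable that copies one file to S3.
    Returns the list of keys that were uploaded.
    """
    uploaded = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if member.endswith("/"):  # skip directory entries
                continue
            local_path = archive.extract(member, path=work_dir)
            upload(local_path, member)
            os.remove(local_path)  # free disk space on the instance
            uploaded.append(member)
    return uploaded


def make_s3_uploader(bucket, prefix=""):
    """Return an uploader backed by boto3 (imported lazily)."""
    import boto3  # assumption: boto3 is installed on the instance

    s3 = boto3.client("s3")

    def upload(local_path, key):
        s3.upload_file(local_path, bucket, prefix + key)

    return upload


def main():
    # Hypothetical values; replace with the real source URL and bucket.
    url = "https://example.com/data.zip"
    uploader = make_s3_uploader("my-example-bucket", prefix="raw/")
    with tempfile.TemporaryDirectory() as work_dir:
        data = download_zip(url)
        print(extract_upload_cleanup(data, work_dir, uploader))
```

Passing the uploader in as a callable keeps the extraction and cleanup logic independent of S3, so it can be tried locally with a stand-in function before calling main() from the scheduled script.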
Step 1: Create EC2 Instance
Create an EC2 Instance in AWS. For that, follow the steps: Link
Save the ssh key pair in a folder.
Step 2: Grant Access to the .pem File on the Local Machine
Run the following commands, replacing the path with the path to your own .pem file.
In Windows PowerShell:
PS C:\Users\ishwaryabb\Downloads\Mini Project> cd AWS
PS C:\Users\ishwaryabb\Downloads\Mini Project\AWS> $path = ".\sshkeypair.pem"
PS C:\Users\ishwaryabb\Downloads\Mini Project\AWS> icacls.exe $path /reset
PS C:\Users\ishwaryabb\Downloads\Mini Project\AWS> icacls.exe $path /GRANT:R "$($env:USERNAME):(R)"
PS C:\Users\ishwaryabb\Downloads\Mini Project\AWS> icacls.exe $path /inheritance:r
Note: Keep the sshkeypair.pem file in a password-protected user directory.
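On macOS or Linux, the same restriction is usually applied with chmod 400 instead of icacls. A Python equivalent is sketched below. The helper name restrict_key_file is hypothetical, and the code is only meaningful on POSIX file systems, not on Windows.

```python
import os
import stat


def restrict_key_file(path):
    """Make the key file readable by its owner only (same as chmod 400)."""
    os.chmod(path, stat.S_IRUSR)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

For example, restrict_key_file("sshkeypair.pem") would be called once after the key pair is saved.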
Step 3: Connect to EC2 Instance via CMD
Select the EC2 instance in the AWS console and click the Connect button.
In the SSH client tab, copy the example command.
Now open the cmd prompt (or any other CLI) and navigate to the directory where the ssh key pair .pem file is stored.
Paste the copied command and run it from that directory.
ssh -i "sshkeypair.pem" ec2-user@ec2-13-233-247-25.ap-south-1.compute.amazonaws.com
Once this command is run (with proper access in case of non-root users), the CLI will be connected to the EC2 instance.
Note: Replace sshkeypair.pem with the name under which you saved your .pem file.
Step 4: Create Directory and Upload Python Script
Create the directories to organize the python script and other project-related files.
Use the scp command to upload the .py script from the local machine to the directory on the EC2 instance.
scp -i "C:\Users\ishwaryabb\Downloads\Mini Project\AWS\sshkeypair.pem" "C:\Users\ishwaryabb\Downloads\Mini Project\AWS\gdelt_script.py" ec2-user@ec2-13-233-247-25.ap-south-1.compute.amazonaws.com:/home/ec2-user/Project/script
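If the upload needs to be repeated, the scp call can be wrapped in a small Python helper. The sketch below uses hypothetical function names, and the default remote user ec2-user is an assumption based on the /home/ec2-user path used above. It only builds and runs the same scp command shown in this step.

```python
import subprocess


def build_scp_command(key_path, local_file, host, remote_dir, user="ec2-user"):
    """Return the scp argument list for copying one file to the instance."""
    return ["scp", "-i", key_path, local_file, f"{user}@{host}:{remote_dir}"]


def upload_script(key_path, local_file, host, remote_dir, user="ec2-user"):
    """Run scp and raise CalledProcessError if the copy fails."""
    command = build_scp_command(key_path, local_file, host, remote_dir, user)
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string means paths containing spaces, such as "Mini Project", need no extra quoting.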
Step 5: Install Python 3 & Download all the supporting python packages in the root directory
Run the following commands to install python 3, configure pip and install the required packages.
sudo yum install python3
sudo yum install python35-pip
Example: pip install boto3
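Before scheduling the script, it can help to confirm that the packages it imports are actually visible to the python3 interpreter. A minimal sketch follows. The helper name missing_packages is hypothetical, and boto3 appears only because it is the example package named above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means every listed package was found.
    print(missing_packages(["boto3"]))
```

Running this with the same python3 that cron will use catches the case where a package was installed for a different interpreter.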
Step 6: Run the Python Script
Run the python script using the python3 script_name.py command.
Step 7: Schedule a crontab
Schedule a crontab to run the Python script. The instance uses the UTC timezone, so the entry below runs the script every day at 22:36 UTC. The following commands can be run from any directory on the EC2 instance.
sudo crontab -e – Edit your crontab file, or create one if it doesn't already exist. Add the following line to it:
36 22 * * * /usr/bin/python3 /home/ec2-user/Project/script/gdelt_script.py
sudo crontab -l – Display your crontab file.
sudo crontab -r – Remove your crontab file.
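Once all steps are completed and the EC2 instance is in a running state, the scheduled Python script will run every day at the scheduled time. The crontab entry is stored on the instance's root volume, so on an EBS-backed instance it normally survives a stop and start. After restarting the instance, confirm the entry with sudo crontab -l and configure it again if it is missing.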
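Because the schedule is expressed in UTC, it is easy to misjudge when the job will fire in local time. The sketch below computes the next run of a daily entry such as 36 22 * * *. The function name next_daily_run is hypothetical, and it handles only the simple "minute hour * * *" form, not general cron syntax.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=22, minute=36):
    """Return the next time a daily 'minute hour * * *' entry fires after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    run_utc = next_daily_run(datetime.now(timezone.utc))
    print("Next run (UTC):  ", run_utc)
    print("Next run (local):", run_utc.astimezone())
```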
Downloading, extracting, and copying files to S3 is achieved in seven steps. First, an EC2 instance is created in AWS with a configuration based on the maximum size of the file to be downloaded. Next, the local system is configured so that it can connect to the EC2 instance via the command line interface. Directories are then created on the instance as needed, and the Python script is uploaded to them. Python 3 and all the libraries the script requires are installed on the instance via the CLI, and the script is run from the command line. Finally, the script is scheduled using crontab so that it runs at the scheduled time every day.