Author: Syam Patnala
Project Overview:
In this project, we implement a supervised ML model to predict the number of COVID cases, deaths, and recoveries for the upcoming three months.
For the ML, we have used a Stacked LSTM model.
A Stacked LSTM architecture is an LSTM model composed of multiple LSTM layers. Each LSTM layer feeds a sequence output rather than a single value output to the next LSTM layer: specifically, one output per input time step instead of one output time step for all input time steps.
LSTM stands for 'long short-term memory'. LSTMs are an extension of recurrent neural networks (RNNs) used in deep learning that are capable of learning long-term dependencies, mainly in sequence prediction problems.
Prerequisites:
Python version 3.7 or above
Snowflake Account
AWS Cloud Services - Amazon EC2, Amazon S3, AWS Lambda, Amazon CloudWatch, AWS Secrets Manager
Introduction:
Snowflake is a fully managed SaaS (software as a service) that provides a single platform for data warehousing, data lakes, data engineering, data science, data application development, and the secure sharing and consumption of real-time / shared data.
Amazon Elastic Compute Cloud (EC2) is a part of Amazon's cloud-computing platform, Amazon Web Services, and allows users to rent virtual computers on which to run their own applications.
Amazon S3, or Amazon Simple Storage Service, is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. You can collect and access all of your performance and operational data, in the form of logs and metrics, from a single platform rather than monitoring each server, network, or database separately.
AWS Secrets Manager is used to store keys and credentials in a centralized, secure place. It helps you protect the secrets needed to access your applications, services, and other IT resources, and it lets you easily manage, rotate, and retrieve database credentials, API keys, and many other secrets throughout their lifecycle. Users and applications retrieve secrets with a call to the Secrets Manager API, eliminating the need to hard-code sensitive information in plain text. Additionally, Secrets Manager lets you manage access to secrets using fine-grained permissions and audit secret rotation centrally for resources in the AWS Cloud, in third-party services, and on-premises.
AWS Lambda is a serverless compute service in the AWS Cloud that lets you run code without provisioning or managing servers. Lambda runs your code on highly available compute infrastructure and performs all of the administration of the compute resources, including server and operating-system maintenance, automatic scaling and capacity provisioning, code monitoring, and logging. With Lambda, you can run code for virtually any type of application or backend service; all you need to do is supply your code in one of the languages that Lambda supports.
Implementation:
The ML code works on live COVID data pulled from Snowflake.
To secure the Snowflake credentials, we used AWS Secrets Manager (reference: https://www.youtube.com/watch?v=jgQaGhx_YaQ).
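The secret itself (named ml_creds, holding the Snowflake connection fields as JSON) has to exist before it can be retrieved. It can be created once in the AWS console, or programmatically; the following is a one-time sketch using boto3, where the placeholder values are illustrative, not from the project:
import json
import boto3
# One-time setup: store the Snowflake credentials as a JSON secret.
# The secret name 'ml_creds' and the key names match what the retrieval
# code below expects; the values here are placeholders.
client = boto3.client('secretsmanager', region_name='ap-south-1')
client.create_secret(
    Name='ml_creds',
    SecretString=json.dumps({
        'user': '<snowflake-user>',
        'password': '<snowflake-password>',
        'account': '<snowflake-account>',
        'warehouse': '<warehouse>',
        'database': '<database>',
        'schema': '<schema>'
    })
)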
Below is the code snippet that uses AWS Secrets Manager to hide the Snowflake credentials and then retrieves the COVID data:
import json
import boto3
import snowflake.connector
# Retrieve the Snowflake credentials from AWS Secrets Manager
client = boto3.client('secretsmanager', region_name='ap-south-1')
response = client.get_secret_value(SecretId='ml_creds')
cred = json.loads(response['SecretString'])
# Connection string, built from the retrieved credentials
conn = snowflake.connector.connect(
    user=cred['user'],
    password=cred['password'],
    account=cred['account'],
    warehouse=cred['warehouse'],
    database=cred['database'],
    schema=cred['schema']
)
# Create a cursor and pull the COVID data into a pandas DataFrame
cur = conn.cursor()
cur.execute("select * from COVID_DATA;")
dft = cur.fetch_pandas_all()
dft = dft.drop_duplicates()
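The training code further down assumes the series has already been scaled to [0, 1] and split into sliding windows of 90 time steps (input_shape=(90, 1), look_back=90, and names such as cases_scaler, cases_test_data, cases_X_train). That preparation step is not shown in the article's snippets; a minimal sketch of how it is typically done (the 'CASES' column name and 0.65 split ratio are assumptions) is:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# Scale the raw case counts to [0, 1] (assumed source of cases_scaler)
cases = dft['CASES'].values.reshape(-1, 1)
cases_scaler = MinMaxScaler(feature_range=(0, 1))
cases = cases_scaler.fit_transform(cases)
# Build 90-step sliding windows: each sample is 90 consecutive days
# of input and the following day as the target
def create_dataset(dataset, time_step=90):
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:i + time_step, 0])
        y.append(dataset[i + time_step, 0])
    return np.array(X), np.array(y)
train_size = int(len(cases) * 0.65)
cases_train_data, cases_test_data = cases[:train_size], cases[train_size:]
cases_X_train, cases_y_train = create_dataset(cases_train_data)
cases_X_test, cases_y_test = create_dataset(cases_test_data)
# The LSTM expects input of shape (samples, time steps, features)
cases_X_train = cases_X_train.reshape(cases_X_train.shape[0], 90, 1)
cases_X_test = cases_X_test.reshape(cases_X_test.shape[0], 90, 1)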
Below is a sample of the Stacked LSTM code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
# Three stacked LSTM layers; the first two return full sequences so that
# the next LSTM layer receives one output per input time step
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(90, 1)))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
# Train on the cases data
model.fit(cases_X_train, cases_y_train, validation_data=(cases_X_test, cases_y_test), epochs=100, batch_size=64, verbose=1)
# Check the TensorFlow version in use
import tensorflow as tf
print(tf.__version__)
### Let us do the prediction and then check the performance metrics
cases_train_predict = model.predict(cases_X_train)
cases_test_predict = model.predict(cases_X_test)
## Transform the predictions back to the original scale
cases_train_predict = cases_scaler.inverse_transform(cases_train_predict)
cases_test_predict = cases_scaler.inverse_transform(cases_test_predict)
### Calculate the RMSE performance metric (the targets are inverse-transformed
### as well, so both sides of the comparison are on the original scale)
import math
from sklearn.metrics import mean_squared_error
cases_y_train_orig = cases_scaler.inverse_transform(cases_y_train.reshape(-1, 1))
print(math.sqrt(mean_squared_error(cases_y_train_orig, cases_train_predict)))
### Test data RMSE
cases_y_test_orig = cases_scaler.inverse_transform(cases_y_test.reshape(-1, 1))
print(math.sqrt(mean_squared_error(cases_y_test_orig, cases_test_predict)))
### Plotting
# Shift the train predictions for plotting
look_back = 90
trainPredictPlot = np.empty_like(cases)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(cases_train_predict) + look_back, :] = cases_train_predict
# Shift the test predictions for plotting
testPredictPlot = np.empty_like(cases)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(cases_train_predict) + (look_back * 2) + 1:len(cases) - 1, :] = cases_test_predict
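The two arrays above are prepared for charting, but the snippet never actually draws them; a minimal plotting sketch, assuming matplotlib, would be:
import matplotlib.pyplot as plt
# Actual series plus the shifted train/test predictions on one chart
plt.plot(cases_scaler.inverse_transform(cases), label='actual')
plt.plot(trainPredictPlot, label='train prediction')
plt.plot(testPredictPlot, label='test prediction')
plt.legend()
plt.show()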
# Seed the forecast with the last 90 values of the test data
# (in this run, index 159 onward gives the final 90-step window)
x_input = cases_test_data[159:].reshape(1, -1)
temp_input = list(x_input)
temp_input = temp_input[0].tolist()
# Predict the next 90 days (three months), feeding each prediction
# back in as input for the following step
lst_output = []
n_steps = 90
i = 0
while i < 90:
    if len(temp_input) > 90:
        # Slide the window forward by dropping the oldest value
        x_input = np.array(temp_input[1:])
        print("{} day input {}".format(i, x_input))
        x_input = x_input.reshape((1, n_steps, 1))
        yhat = model.predict(x_input, verbose=0)
        print("{} day output {}".format(i, yhat))
        temp_input.extend(yhat[0].tolist())
        temp_input = temp_input[1:]
        lst_output.extend(yhat.tolist())
        i = i + 1
    else:
        x_input = x_input.reshape((1, n_steps, 1))
        yhat = model.predict(x_input, verbose=0)
        print(yhat[0])
        temp_input.extend(yhat[0].tolist())
        print(len(temp_input))
        lst_output.extend(yhat.tolist())
        i = i + 1

print(lst_output)
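The forecast values in lst_output are still on the scaled [0, 1] range. The snippet below refers to df3 (cases), df4 (deaths), and df5 (recoveries) without defining them; presumably they hold the inverse-transformed forecasts, produced along these lines (df4 and df5 would come from the analogous deaths/recoveries models):
# Convert the scaled forecasts back to real case counts (assumed step)
df3 = cases_scaler.inverse_transform(lst_output)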
We stored the predicted data in CSV format and uploaded the files directly to an S3 bucket using Python, implemented as follows:
import pandas as pd
# Flatten each one-element prediction row into a plain string;
# x is assumed to hold the forecast dates (a DataFrame with a DATE
# column built elsewhere in the full code)
df3 = [''.join(map(str, l)) for l in df3]
case = pd.DataFrame({'CASES': df3})
case.index = x["DATE"]
case.to_csv("Cases.csv")
df4 = [''.join(map(str, l)) for l in df4]
death = pd.DataFrame({'DECEASED': df4})
death.index = x["DATE"]
death.to_csv("Deaths.csv")
df5 = [''.join(map(str, l)) for l in df5]
recover = pd.DataFrame({'RECOVERED': df5})
recover.index = x["DATE"]
recover.to_csv("Recovered.csv")
s3 = boto3.resource(service_name='s3')
for bucket in s3.buckets.all():
    print(bucket.name)

s3.Bucket('ml-covid-prediction').upload_file(Filename='Cases.csv', Key='COVID_Prediction/Cases.csv')
s3.Bucket('ml-covid-prediction').upload_file(Filename='Deaths.csv', Key='COVID_Prediction/Deaths.csv')
s3.Bucket('ml-covid-prediction').upload_file(Filename='Recovered.csv', Key='COVID_Prediction/Recovered.csv')
Deploying the ML Model on AWS EC2:
To deploy everything in one place and run it at a specific time, we used an Amazon EC2 instance with the Ubuntu operating system.
We then uploaded our ML file to that instance and created a scheduled task with crontab, which executes the file at a specific time, as sketched below.
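For illustration, a crontab entry that runs the script every day at 06:00 UTC might look like this (the path and script name are placeholders, not taken from the project):
# Hypothetical crontab line: run the forecast script daily at 06:00
0 6 * * * /usr/bin/python3 /home/ubuntu/covid_ml.py >> /home/ubuntu/covid_ml.log 2>&1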

This will work only while the instance is running.
The instance is needed only while the code is running, and the longer it stays up, the more it costs.
So we used Lambda functions to start and stop the instance around the particular time at which the crontab job runs, as in the sketch below.
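A minimal sketch of such a pair of Lambda handlers, assuming the instance ID and region are passed in through environment variables (the variable and function names here are illustrative, not from the project):
import os
import boto3

ec2 = boto3.client('ec2', region_name=os.environ.get('REGION', 'ap-south-1'))
INSTANCE_ID = os.environ['INSTANCE_ID']  # hypothetical env var with the EC2 instance ID

def start_handler(event, context):
    # Triggered shortly before the crontab schedule
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    return 'started'

def stop_handler(event, context):
    # Triggered after the ML job has had time to finish
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    return 'stopped'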


To run the Lambda functions at a specific time, we use Amazon CloudWatch: scheduled CloudWatch rules invoke the Lambda functions that start and stop the instance, and within that running window the crontab job executes the ML file.
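Such schedules are usually set up in the console, but for completeness, here is a sketch of creating a CloudWatch Events rule with boto3 that fires the start function shortly before the 06:00 crontab job (the rule name, Lambda ARN, and cron expression are illustrative):
import boto3

events = boto3.client('events', region_name='ap-south-1')
# Hypothetical rule: fire five minutes before the crontab job
events.put_rule(
    Name='start-covid-ml-instance',
    ScheduleExpression='cron(55 5 * * ? *)'
)
events.put_targets(
    Rule='start-covid-ml-instance',
    Targets=[{'Id': 'start-lambda', 'Arn': '<start-lambda-function-arn>'}]
)
A matching rule would invoke the stop function after the job's expected runtime.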

Output:
Below are the outputs of the ML code.
For each graph, the X-axis represents the number of days, and the Y-axis represents the total number of cases, deaths, or recoveries in multiples of 1,00,00,000 (ten million); due to limited space, the axis is labeled 0, 2, 4, and so on.
Total Predicted Cases:

The above picture shows the graph for the total number of cases, in which the red line indicates the predicted cases and the blue line represents the actual (past) cases.
Total Predicted Deaths:

The above picture shows the graph for the total number of deaths, in which the red line indicates the predicted deaths and the blue line represents the actual (past) deaths.
Total Predicted Recoveries:

The above picture shows the graph for the total number of recoveries, in which the red line indicates the predicted recoveries and the blue line represents the actual (past) recoveries.
Full Code:
Conclusion:
This project predicts the overall number of COVID cases, deaths, and recoveries for the upcoming three months.
We used the Stacked LSTM model because the added depth of stacked layers typically yields more accurate results on sequence prediction tasks.
References:
About the Stacked LSTM model - https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
AWS Secrets Manager - https://www.youtube.com/watch?v=jgQaGhx_YaQ
Accessing AWS S3 from Python using boto3 - https://realpython.com/python-boto3-aws-s3/
Working with S3 in Python using boto3 - https://hands-on.cloud/working-with-s3-in-python-using-boto3/