Predicting The COVID Cases, Deaths And Recoveries For The Upcoming 3 Months

Author: Syam Patnala


Project Overview:
  • In this project, we implement a supervised ML model to predict the number of COVID cases, deaths and recoveries for the upcoming 3 months.

  • For the model, we have used a Stacked LSTM.

  • A Stacked LSTM architecture is an LSTM model composed of multiple LSTM layers, where each LSTM layer passes a sequence output, rather than a single value output, to the next LSTM layer in the stack. In particular, it emits one output per input time step, instead of one output time step for all input time steps.

  • LSTM stands for ‘long short-term memory’, a type of network used in Deep Learning. It is an extension of recurrent neural networks (RNNs) that is capable of learning long-term dependencies, mainly in sequence prediction problems.
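The difference between a sequence output and a single-value output can be seen directly from the layer output shapes. A minimal sketch using Keras (the layer width of 8 here is illustrative, not the value used in this project):

```python
import numpy as np
import tensorflow as tf

# A batch of one sequence: 90 time steps, 1 feature per step
x = np.zeros((1, 90, 1), dtype="float32")

seq_out = tf.keras.layers.LSTM(8, return_sequences=True)(x)  # one output per time step
last_out = tf.keras.layers.LSTM(8)(seq_out)                  # one output for the whole sequence

print(seq_out.shape)   # (1, 90, 8): a sequence, valid input for the next stacked LSTM
print(last_out.shape)  # (1, 8): a single vector, valid input for a Dense head
```

Because the intermediate layer emits a full sequence, another LSTM layer can be stacked directly on top of it.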


Prerequisites:
  • Python version 3.7 or above

  • Snowflake Account

  • AWS Cloud Services - Amazon EC2, Amazon s3, AWS Lambda, AWS Cloud Watch, AWS secrets manager

Introduction:
  • Snowflake is a fully managed SaaS (software as a service) that provides one platform for data warehousing, data lakes, data engineering, data science, data application development, and secure sharing and consumption of real-time / shared data.

  • Amazon Elastic Compute Cloud (EC2) is part of Amazon.com’s cloud-computing platform, Amazon Web Services, and enables users to rent virtual computers on which to run their own applications.

  • Amazon S3, or Amazon Simple Storage Service, is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.

  • Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. You can collect and access all of your performance and operational data in the form of logs and metrics from a single platform, instead of monitoring each server, network, or database separately.

  • AWS Secrets Manager is used to store keys and credentials in a centralized, secure place. AWS Secrets Manager helps you protect secrets that are used for accessing your applications, services, and other IT resources. The service enables you to easily manage, rotate, and retrieve database credentials, API keys, and many other secrets throughout their lifecycle. Users and applications retrieve secrets with a call to the Secrets Manager APIs, eliminating the need to embed sensitive information in plain text. Additionally, Secrets Manager enables you to manage access to secrets using fine-grained permissions, and to audit secret rotation centrally for resources in the AWS Cloud, third-party services, and on-premises.

  • AWS Lambda is a “serverless” compute service in the AWS cloud that enables you to run code without provisioning or managing servers. Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, automatic scaling and capacity provisioning, and code monitoring and logging. With Lambda, you can run code for virtually any type of application or backend service. All you need to do is supply your code in one of the languages that Lambda supports.


Implementation:
  • We have an ML script that works on live COVID data retrieved from Snowflake.

  • To secure the Snowflake credentials, we have used AWS Secrets Manager (Reference: https://www.youtube.com/watch?v=jgQaGhx_YaQ).

  • Below is the code snippet that uses AWS Secrets Manager to hide the Snowflake credentials, and then retrieves the COVID data:

import json

import boto3
import snowflake.connector

# Fetch the Snowflake credentials from AWS Secrets Manager
client = boto3.client('secretsmanager', region_name='ap-south-1')
response = client.get_secret_value(
    SecretId='ml_creds'
)
cred = json.loads(response['SecretString'])

# Connection string
conn = snowflake.connector.connect(
    user=cred['user'],
    password=cred['password'],
    account=cred['account'],
    warehouse=cred['warehouse'],
    database=cred['database'],
    schema=cred['schema']
)

# Create cursor
cur = conn.cursor()

# Execute SQL statement
cur.execute("select * from COVID_DATA;")
dft = cur.fetch_pandas_all()
dft = dft.drop_duplicates()

  • Below is a sample of the Stacked LSTM code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(90, 1)))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

model.summary()

model.fit(cases_X_train, cases_y_train,
          validation_data=(cases_X_test, cases_ytest),
          epochs=100, batch_size=64, verbose=1)

### Let us do the prediction and then check the performance metrics
cases_train_predict = model.predict(cases_X_train)
cases_test_predict = model.predict(cases_X_test)

## Transform back to original form
cases_train_predict = cases_scaler.inverse_transform(cases_train_predict)
cases_test_predict = cases_scaler.inverse_transform(cases_test_predict)

### Calculate RMSE performance metrics
import math
from sklearn.metrics import mean_squared_error

### Train data RMSE
math.sqrt(mean_squared_error(cases_y_train, cases_train_predict))

### Test data RMSE
math.sqrt(mean_squared_error(cases_ytest, cases_test_predict))

### Plotting
# shift train predictions for plotting
look_back = 90
trainPredictPlot = np.empty_like(cases)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(cases_train_predict)+look_back, :] = cases_train_predict

# shift test predictions for plotting
testPredictPlot = np.empty_like(cases)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(cases_train_predict)+(look_back*2)+1:len(cases)-1, :] = cases_test_predict
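The shifted arrays built above are then typically drawn with matplotlib (the resulting plots appear in the Output section). A minimal, self-contained sketch, with dummy data standing in for the project's `cases`, `trainPredictPlot` and `testPredictPlot`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, e.g. on an EC2 instance
import matplotlib.pyplot as plt

look_back = 90
# Dummy stand-ins for the real arrays, just to show the overlay technique
cases = np.cumsum(np.random.rand(300)).reshape(-1, 1)
train_plot = np.full_like(cases, np.nan)
test_plot = np.full_like(cases, np.nan)
train_plot[look_back:200] = cases[look_back:200]
test_plot[200:] = cases[200:]

# NaN-padded arrays let all three curves share one x-axis
plt.plot(cases, label="actual")
plt.plot(train_plot, label="train predictions")
plt.plot(test_plot, label="test predictions")
plt.xlabel("days")
plt.ylabel("total cases")
plt.legend()
plt.savefig("cases_prediction.png")
```

Padding the prediction arrays with NaN is what shifts each curve to its correct position on the shared time axis.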

len(cases_test_data)

# Seed the forecast with the tail of the test data
x_input = cases_test_data[159:].reshape(1, -1)
x_input.shape

temp_input = list(x_input)
temp_input = temp_input[0].tolist()

# demonstrate prediction for the next 90 days
lst_output = []
n_steps = 90
i = 0
while (i < 90):
    if (len(temp_input) > 90):
        x_input = np.array(temp_input[1:])
        print("{} day input {}".format(i, x_input))
        x_input = x_input.reshape(1, -1)
        x_input = x_input.reshape((1, n_steps, 1))
        yhat = model.predict(x_input, verbose=0)
        print("{} day output {}".format(i, yhat))
        # slide the window forward: append the prediction, drop the oldest value
        temp_input.extend(yhat[0].tolist())
        temp_input = temp_input[1:]
        lst_output.extend(yhat.tolist())
        i = i + 1
    else:
        x_input = x_input.reshape((1, n_steps, 1))
        yhat = model.predict(x_input, verbose=0)
        print(yhat[0])
        temp_input.extend(yhat[0].tolist())
        print(len(temp_input))
        lst_output.extend(yhat.tolist())
        i = i + 1

print(lst_output)


  • We have stored the predicted data in CSV format and uploaded the files directly to an S3 bucket using Python, implemented as follows:


df3 = [''.join(map(str, l)) for l in df3]
case = pd.DataFrame({'CASES': df3})
case.index = x["DATE"]
case.to_csv("Cases.csv")

df4 = [''.join(map(str, l)) for l in df4]
death = pd.DataFrame({'DECEASED': df4})
death.index = x["DATE"]
death.to_csv("Deaths.csv")

df5 = [''.join(map(str, l)) for l in df5]
recover = pd.DataFrame({'RECOVERED': df5})
recover.index = x["DATE"]
recover.to_csv("Recovered.csv")

s3 = boto3.resource(
    service_name='s3'
)
for bucket in s3.buckets.all():
    print(bucket.name)

s3.Bucket('ml-covid-prediction').upload_file(Filename='Cases.csv', Key='COVID_Prediction/Cases.csv')
s3.Bucket('ml-covid-prediction').upload_file(Filename='Deaths.csv', Key='COVID_Prediction/Deaths.csv')
s3.Bucket('ml-covid-prediction').upload_file(Filename='Recovered.csv', Key='COVID_Prediction/Recovered.csv')


Deploying ML Model in AWS EC2 Service:
  • To deploy everything in one place and run it at a specific time, we used an Amazon EC2 instance running the Ubuntu operating system.

  • We then uploaded our ML script to that instance and created a crontab task that executes the script at a specific time.



  • This works only while the instance is running.

  • The instance is useful only while the code is running, and leaving it running longer than necessary costs more.

  • So we used Lambda functions to start and stop the instance around the time at which the crontab job runs.
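The crontab setup described above can be sketched as follows; the script path and the 01:00 schedule here are placeholders, not the project's actual values:

```shell
# On the EC2 instance, open the crontab editor:
#   crontab -e
# Then add an entry of this shape (placeholder path and time):
# run the prediction script daily at 01:00, appending output to a log file
0 1 * * * /usr/bin/python3 /home/ubuntu/covid_prediction.py >> /home/ubuntu/cron.log 2>&1
```

The five leading fields are minute, hour, day of month, month, and day of week, so the schedule must fall inside the window when the Lambda functions keep the instance running.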


Lambda function for stopping the instance

Lambda function for starting the instance
  • Now, to run the Lambda functions at a specific time, we use AWS CloudWatch: CloudWatch triggers the Lambda functions that start and stop the instance, and while the instance is running, the crontab job fires and executes the ML script.


CloudWatch rules to run the Lambda functions
Output:

Below are the outputs for the ML code:

For each graph, the X-axis represents the number of days and the Y-axis represents the total number of cases, deaths and recoveries in multiples of 1,00,00,000 (to save space, the Y-axis labels are shown as 0, 2, 4, …).


Total Predicted Cases:

  • The above picture shows a graph of the total number of cases, in which the red line indicates the predicted cases and the blue line represents the actual (past) cases.


Total Predicted Deaths:


  • The above picture shows a graph of the total number of deaths, in which the red line indicates the predicted deaths and the blue line represents the actual (past) deaths.


Total Predicted Recoveries:


  • The above picture shows a graph of the total number of recoveries, in which the red line indicates the predicted recoveries and the blue line represents the actual (past) recoveries.


Full Code:

Conclusion:
  • This project predicts the overall number of COVID cases, deaths and recoveries for the upcoming 3 months.

  • We have used a Stacked LSTM model, whose additional depth helps it learn the sequence better and produce more accurate results.

