Author: Syed Hasan
Text recognition has become an easy way to pull text out of images through OCR. Many models exist that can extract OCR data from an image, even when the text is set in different fonts.
Here, we focus on handwritten text rather than printed text, because handwriting is harder to extract and its recognition accuracy is still an active research topic. Building such recognition from scratch also requires a large amount of coding and development work, so we will use the ready-made models of the cloud giants. All three of them, i.e., Google, AWS, and Azure, offer their own recognition services. Here we will discuss Amazon Rekognition.
APIs that can be used for image recognition:
GCP: Vision API
AWS: Amazon Rekognition
Azure: Computer Vision
After reading multiple articles and weighing the pros and cons of each API, we found that Rekognition holds the upper hand in image processing.
We will try to read the text in the image below and save it as a CSV file.
1. Process Workflow
The workflow goes as follows. Our input is an image of handwritten sheets, stored in an S3 bucket. Once the image lands in the bucket, Lambda is triggered and the function runs. The image is processed with the Rekognition API through Python's Boto3, and all the detected text is collected into a list; conditions and parameters are then applied to pick out the text we need. That text is written to a CSV, and the CSV is stored in S3 using our Boto3 client.
2. AWS configuration and Lambda setup
We need a Lambda function with a Python runtime to run our code. Lambda is a serverless compute service provided by AWS. For input, we use an S3 bucket to which images are added; once an image is added, the bucket triggers the Lambda function.
To create a Lambda function, we need an AWS user, plus a role and permissions under which the function will run. We can create the role manually and attach it to the function at creation time, or let Lambda create a role for us and use that.
We need two main permissions for the Lambda function: one for S3 and one for Rekognition.
Because S3 also holds our output, read-only access is not enough. Instead, we use FullAccess, which gives us write access as well.
Once the Lambda function is created, a role is created for it. If we did not create a role manually, we can attach our two permissions to that role instead.
After the Lambda function is created, we add the S3 input by clicking Add Trigger on the left side of the Lambda UI.
Using the S3 event trigger, we connect our S3 input bucket to our function.
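As a minimal sketch of what the trigger hands to the function, the snippet below reads the bucket name and file name out of an S3 event, assuming the standard S3 event notification layout. The helper name extract_s3_objects is hypothetical and is not taken from the original code.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return a (bucket, key) pair for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    # Entry point invoked by Lambda; here it only reports what was uploaded.
    return {"uploaded": extract_s3_objects(event)}
```

Decoding the key matters for file names that contain spaces or special characters; without it, a later lookup of the object in S3 may fail.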
We have now created our code environment and are ready to start our development.
3. Code Development
We are using Python for our development. The Boto3 package is used to access AWS services.
The code is annotated with numbered points for a quick understanding of the flow.
#1 We connect to the AWS Rekognition service through the Boto3 client once, up front, so that the code does not need to reconnect anywhere later.
#2 The lambda_handler function is the main function: Lambda invokes it when the S3 trigger fires, and it passes the details of the uploaded image, including the file name, on as parameters.
#3 The create_csv function calls the detect_text function, where the image processing takes place. Data such as session_name, host_name, event_date, and roll is extracted, and the CSV file is created from the resulting list. The file is then added to the new S3 bucket using the put_object method.
#4 The detect_text method calls the Rekognition detect_text API and extracts all the text detected in an image. The API returns detections both as words and as lines, so we can work with either. We identify the roll number by its format: a word made up of five digits.
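Point #4 can be sketched roughly as follows. This is an illustration under assumptions, not the original code: the Rekognition client is passed in as a parameter (in the Lambda it would be created with boto3.client("rekognition")), the helper find_roll_numbers is hypothetical, and "five digits" is read here as a word consisting of exactly five digits.

```python
import re


def detect_text(rekognition, bucket, key):
    """Run Rekognition text detection on an image stored in S3.

    Returns two lists: the detected lines and the detected words.
    """
    response = rekognition.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines, words = [], []
    for detection in response["TextDetections"]:
        # Each detection is tagged as either a whole LINE or a single WORD.
        if detection["Type"] == "LINE":
            lines.append(detection["DetectedText"])
        elif detection["Type"] == "WORD":
            words.append(detection["DetectedText"])
    return lines, words


def find_roll_numbers(words):
    """Keep only the words that look like a roll number (assumed: exactly five digits)."""
    return [word for word in words if re.fullmatch(r"\d{5}", word)]
```

Passing the client in keeps the function easy to exercise with a stub. Handwriting is often misread, so in practice it may be worth checking each detection's Confidence value before trusting a match.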
After the loops and conditions have run, we have the final list of our data, which is then added to S3 in CSV format, as mentioned in step #3.
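The CSV step might look like the sketch below. The column order, the one-row-per-roll layout, and the function names build_csv and upload_csv are assumptions made for illustration; only the field names and the use of put_object come from the description above. The S3 client is passed in as a parameter and would be boto3.client("s3") inside the Lambda.

```python
import csv
import io


def build_csv(session_name, host_name, event_date, rolls):
    """Build the CSV text in memory: a header row, then one row per roll number."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["session_name", "host_name", "event_date", "roll"])
    for roll in rolls:
        writer.writerow([session_name, host_name, event_date, roll])
    return buffer.getvalue()


def upload_csv(s3, bucket, key, csv_text):
    """Store the CSV text as an object in the output bucket."""
    s3.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
```

Building the file in memory avoids writing to local disk inside the Lambda, which suits small outputs like this one.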
Image Input added in S3:
With the above-mentioned steps, we can complete our text recognition process.
As handwritten text recognition is a step beyond standard OCR, we used the Amazon Rekognition service, which made our work quite easy. Recognizing handwriting is still a research topic, but with a ready-made tool we were able to extract the text as Python lists, which we then used to pick out our required parameters. By using Lambda and S3 triggers, we were able to identify text and record the data in CSV format. The CSV is then used to ingest the data into Snowflake with the standard Snowpipe ingestion technique.
Here, we conclude that moving handwritten text from images into Snowflake is possible using AWS and Python.