Authors: Vipul Tripathi and Kodukula Sirisha
Overview
Optical Character Recognition (OCR) is a technology that converts images of text, such as scanned documents or photographs, into machine-readable text. It analyzes an image, locates the text in it, and recognizes the characters, so that software can search, edit, and process content that was previously readable only by people. The use of OCR has grown considerably in recent years, and it is now an important tool in industries such as document management, financial services, healthcare, and education.
Applications of OCR
Document Management: It converts physical documents, such as invoices, contracts, and receipts, into digital formats for storage and retrieval. This makes documents easier to find and manage and reduces the risk of loss or damage.
Financial Services: It can be used to process large amounts of financial data, such as bank statements and invoices, to extract relevant information and reduce manual data entry errors.
Healthcare: It is used to automate the reading of medical records and of the text that accompanies medical images, such as X-ray and MRI reports. This helps healthcare professionals find the information they need faster and reduces the risk of transcription errors.
Education: It is also used in the education sector to convert textbooks, study materials, and exam papers into digital formats for students to access and study on their devices.
How we used OCR
The gist of our use case:
We have a website that asks users to enter details from their medical report for a given disease. Once the user submits the details, our ML model is triggered and predicts whether the patient has that disease. Initially, the whole process was manual: users had to type in every value carefully, because a single typo could affect the final result. So we decided to automate the process end to end, from the patient's medical report to our ML model.
Now, users can upload images of a medical report directly from their phone to our website. The image is stored in an S3 bucket; from there, the parameters are read from the report with OCR, and the ML model predicts whether the user has the disease and presents the final result.
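The step of reading parameters out of a report can be pictured with a small sketch. It assumes, purely for illustration, that the OCR text contains one "Name: value" pair per line; the pattern, the function name extract_parameters, and the sample values are hypothetical and are not taken from our production code.

```python
import re

# Hypothetical report layout: one "Name: value" (or "Name = value") pair per line.
PARAM_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*[:=]\s*(-?\d+(?:\.\d+)?)")


def extract_parameters(ocr_text: str) -> dict:
    """Collect numeric name/value pairs from OCR text, one pair per line."""
    params = {}
    for line in ocr_text.splitlines():
        match = PARAM_PATTERN.match(line)
        if match:
            # Normalize "Blood Pressure" to "blood_pressure" for use as a feature name.
            name = match.group(1).strip().lower().replace(" ", "_")
            params[name] = float(match.group(2))
    return params


# Made-up sample text standing in for the OCR output of a report.
sample_text = """Patient Report
Glucose: 148 mg/dL
Blood Pressure: 72
BMI = 33.6
"""

print(extract_parameters(sample_text))
```

Lines without a numeric value, such as the report title, are skipped. A real report would likely need a layout-specific parser and validation before the values reach the model.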
Final Implementation with Explanation
For optical character recognition, we have used the docTR library.
docTR library: It provides a Predictor, which is made up of two parts:
PreProcessor: This module transforms the inputs into a form that the deep learning model can consume directly.
Model: A deep learning model, implemented in TensorFlow or PyTorch, paired with its own post-processor to generate structured and reusable outputs.
Steps
1. Installing the docTR library is simple; we just need to run pip install python-doctr in our Colab notebook (prefixed with "!" when run in a notebook cell).
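A quick way to confirm that the installation worked is to check whether the package can be found. This is a minimal sketch; the helper name doctr_installed is our own, while doctr is the import name of the python-doctr package.

```python
import importlib.util


def doctr_installed() -> bool:
    """Return True if the 'doctr' package can be found in this environment."""
    return importlib.util.find_spec("doctr") is not None


if doctr_installed():
    print("docTR is available")
else:
    print("docTR not found; install it with: pip install python-doctr")
```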


2. After installing the library, we import ocr_predictor from doctr.models and create a predictor. The pretrained argument is set to True so that the pre-trained model weights are loaded.
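A minimal sketch of this step follows. The wrapper function build_predictor is a name chosen here for illustration; ocr_predictor and its pretrained argument come from docTR itself.

```python
def build_predictor():
    """Create docTR's end-to-end OCR predictor with pre-trained weights."""
    # Imported inside the function so that defining this helper does not
    # require docTR to be installed yet. The weights are downloaded the
    # first time the predictor is built, which needs network access.
    from doctr.models import ocr_predictor

    return ocr_predictor(pretrained=True)


# Usage, once docTR is installed:
# model = build_predictor()
```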

3. Now we can provide the input, which can come from several kinds of source: a web link, a PDF, a single image, or multiple images.
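docTR loads each kind of input through its DocumentFile class. The sketch below picks a loader based on the source; the helpers source_kind and load_document are hypothetical names, and the dispatch rules (URL prefix, .pdf suffix) are a simplifying assumption.

```python
from pathlib import Path


def source_kind(source) -> str:
    """Classify an input as 'images', 'url', 'pdf', or 'image'."""
    if isinstance(source, (list, tuple)):
        return "images"
    text = str(source)
    if text.startswith(("http://", "https://")):
        return "url"
    if Path(text).suffix.lower() == ".pdf":
        return "pdf"
    return "image"


def load_document(source):
    """Load pages with the docTR loader that matches the kind of source."""
    from doctr.io import DocumentFile  # imported lazily; requires docTR

    kind = source_kind(source)
    if kind == "images":
        return DocumentFile.from_images([str(path) for path in source])
    if kind == "url":
        # Renders the web page at this address into pages.
        return DocumentFile.from_url(str(source))
    if kind == "pdf":
        return DocumentFile.from_pdf(str(source))
    return DocumentFile.from_images(str(source))


# Usage, once docTR is installed (the file name is a placeholder):
# doc = load_document("report.jpg")
```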

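4. Finally, we pass the loaded document to the predictor. The result is stored in a variable and can then be read as plain text or exported as a nested dictionary for further processing.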
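With docTR, calling the predictor on a document returns a result object whose render() method gives plain text and whose export() method gives a nested dictionary of pages, blocks, lines, and words. The sketch below flattens such a dictionary into words with their confidence scores; the helper name and the sample dictionary are hand-written for illustration and are trimmed to the keys the helper uses.

```python
def words_from_export(exported: dict) -> list:
    """Flatten an exported OCR result into (word, confidence) pairs."""
    words = []
    for page in exported.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    words.append((word["value"], word["confidence"]))
    return words


# With docTR installed, the real values would come from:
#   result = model(doc)
#   text = result.render()
#   exported = result.export()
# Below is a hand-written stand-in for `exported`.
sample_export = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Glucose", "confidence": 0.98},
                                {"value": "148", "confidence": 0.95},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}

print(words_from_export(sample_export))
```

Keeping the confidence next to each word makes it possible to flag low-confidence values for manual review before they reach the prediction model.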

5. We can then use ordinary Python database functions to store the result in a database.
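As one possible way to do this, the sketch below stores the recognized text and the exported dictionary in SQLite using Python's built-in sqlite3 module. The table name, column names, and report identifier are assumptions made for the example; the article does not specify which database is used.

```python
import json
import sqlite3


def save_result(conn: sqlite3.Connection, report_id: str, text: str, exported: dict) -> None:
    """Insert or update the OCR output for one report."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_results ("
        "report_id TEXT PRIMARY KEY, text TEXT NOT NULL, raw_json TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO ocr_results (report_id, text, raw_json) VALUES (?, ?, ?)",
        # default=str is a defensive fallback for any value json cannot encode.
        (report_id, text, json.dumps(exported, default=str)),
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
save_result(conn, "report-001", "Glucose 148", {"pages": []})

row = conn.execute(
    "SELECT text FROM ocr_results WHERE report_id = ?", ("report-001",)
).fetchone()
print(row[0])
```

Using the report identifier as the primary key means that reprocessing the same report overwrites the earlier row instead of creating a duplicate.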

Conclusion
OCR is a widely adopted technology that makes it possible to digitize and store important documents with little manual effort. Implementing OCR in Python is straightforward, especially with the python-doctr library. Its pre-trained models can be fine-tuned for specific datasets or trained from scratch when large amounts of labeled data are available.