Author: Yuvaraj Puranam
Date of Publication: 30th June 2023
Abstract: Named Entity Recognition (NER) is the task of extracting entities from text and labeling each one with its entity type. When the text is embedded in an image, Optical Character Recognition (OCR) plays an important role: it extracts the text from the image so that the entities present in it can be identified. Recognizing entities from images involves several steps. The first step is image recognition, which scans the image and identifies its configuration. Images with a configuration above 30 are considered and the rest are ignored. The OCR software then analyzes the scanned image, categorizing the light areas as background and the dark areas as text. The second step is preprocessing, in which the OCR software cleans the image, removes errors, and prepares it for reading. The third step is to recognize the text using Natural Language Processing (NLP) techniques, including word embeddings and machine learning. The next step is to tag the text using ‘BIO’ tagging. After analysis, the system converts the obtained data into a categorized file. The categorized file is then given as input to a Bidirectional Encoder Representations from Transformers (BERT) model, which is trained on the data, and accuracy is measured on the test data.
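The BIO tagging step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional reading of BIO (B = beginning of an entity, I = inside an entity, O = outside any entity). The function name `bio_tag`, the span format, the labels `ORG` and `DATE`, and the sample sentence are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Tuple


def bio_tag(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign BIO tags to tokens.

    Each span is (start, end_exclusive, label) in token indices.
    This span format is an assumption made for this sketch.
    """
    tags = ["O"] * len(tokens)          # default: token is outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the same entity
    return tags


# Hypothetical OCR output, already split into tokens.
tokens = ["Invoice", "issued", "by", "Acme", "Corp", "on", "30", "June", "2023"]
spans = [(3, 5, "ORG"), (6, 9, "DATE")]

tags = bio_tag(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Writing one token and tag per line, as the loop does, is one common layout for a categorized file that a token classification model such as BERT can be trained on; the file format actually used in the paper is not stated in the abstract.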
Reference: