Author: Avantik Tiwari
Date of Publication: 20th March 2023
Abstract: In this paper, we propose a multi-task, end-to-end optimised system that combines object detection and OCR (Optical Character Recognition) models with the aim of assisting visually impaired people. Unlike traditional approaches, we do not build a training dataset from scratch: we use an already trained model for object detection, and for OCR we use the Tesseract tool, which converts text appearing in images into machine-readable text; the outputs of the object detection and OCR stages are then converted into audio. In the first step, the object detection model is trained on the COCO dataset and implemented with Fast R-CNN, a region-based detector, and YOLOv5, a single-stage detector; these models can also be trained and tested on a custom dataset. In the second step, the image output by the object detection model is passed to the OCR stage, where the text describing what is present in the image is extracted using the Tesseract tool, and that text is then converted into audio.
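The following is a minimal sketch of the pipeline described above (detection, OCR, then audio). The paper does not specify the implementation libraries, so the choices here are assumptions: the Ultralytics YOLOv5 hub model with COCO weights for detection, pytesseract as the interface to Tesseract, and gTTS as a stand-in text-to-speech engine.

```python
# Hypothetical sketch of the described pipeline; library choices are assumptions.
import torch
import cv2
import pytesseract
from gtts import gTTS

def describe_image(image_path: str, audio_path: str = "description.mp3") -> str:
    # Load a YOLOv5 model pretrained on COCO from the Ultralytics hub.
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

    # Run object detection and collect the class names of detected objects.
    results = model(image_path)
    labels = results.pandas().xyxy[0]["name"].tolist()

    # Run OCR on the same image with Tesseract.
    image = cv2.imread(image_path)
    ocr_text = pytesseract.image_to_string(image).strip()

    # Compose a spoken description from both sources.
    parts = []
    if labels:
        parts.append("I can see: " + ", ".join(sorted(set(labels))))
    if ocr_text:
        parts.append("The text reads: " + ocr_text)
    description = ". ".join(parts) if parts else "Nothing was detected."

    # Convert the combined description into an audio file.
    gTTS(description).save(audio_path)
    return description

if __name__ == "__main__":
    print(describe_image("scene.jpg"))
```

In this sketch the detection labels and the OCR output are merged into a single sentence before speech synthesis, mirroring the two-step flow of the abstract (detection output feeding the OCR stage, then audio conversion).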