Authors: Ashwani Kumar, Anita Devi, Er. Pooja
Date of Publication: 15th December 2024
Abstract: Object detection is a computer vision technique that lets a system locate and recognize objects in an image or video stream by drawing a rectangular bounding box around each one. This work describes a real-time object detection model that combines deep learning with text-to-speech conversion. Detection is performed with YOLOv8, which is known for its accuracy and processing speed and is trained on the COCO dataset. The model is implemented in Python with OpenCV, which provides a broad range of computer vision routines. For each recognized object, the system shows its label on the screen, converts the label to speech with the Google Text-to-Speech (gTTS) API, and plays the resulting audio with the Playsound library. The integrated system's efficiency and versatility make it well suited to assistive technologies.
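The pipeline described in the abstract (detect, display the label, convert it to speech, play the audio) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the `ultralytics` package and its COCO-pretrained `yolov8n.pt` checkpoint, a webcam at index 0, and network access for gTTS. The function names `describe_labels`, `speak`, and `run` are hypothetical, as is the choice to speak only when the set of detected labels changes.

```python
def describe_labels(labels):
    """Build one short sentence from detected class labels (hypothetical helper)."""
    unique = list(dict.fromkeys(labels))  # drop duplicates, keep first-seen order
    if not unique:
        return ""
    return "Detected " + ", ".join(unique)


def speak(text, path="label.mp3"):
    """Convert text to an MP3 file with gTTS, then play it with playsound."""
    from gtts import gTTS            # imported lazily; gTTS needs network access
    from playsound import playsound
    gTTS(text=text, lang="en").save(path)
    playsound(path)                  # blocks until playback finishes


def run(source=0, weights="yolov8n.pt"):
    """Detect objects in a video stream, show boxes, and announce new labels."""
    import cv2
    from ultralytics import YOLO     # assumed YOLOv8 implementation

    model = YOLO(weights)            # assumed COCO-pretrained checkpoint
    cap = cv2.VideoCapture(source)   # 0 = default webcam (assumption)
    last_spoken = ""
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = model(frame, verbose=False)[0]
        labels = [result.names[int(c)] for c in result.boxes.cls]
        cv2.imshow("detections", result.plot())    # frame with boxes and labels drawn
        sentence = describe_labels(labels)
        if sentence and sentence != last_spoken:   # avoid repeating on every frame
            speak(sentence)
            last_spoken = sentence
        if cv2.waitKey(1) & 0xFF == ord("q"):      # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```

Calling `run()` starts the loop on the default camera. Because `playsound` blocks while audio plays, this sketch pauses the video during speech; a real-time system would likely move playback to a separate thread.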
Reference :