Authors: S. T. Patil, Atharva Narote, Nilesh Binnar, Chaitanya Phad, Payal Rathod
Date of Publication: 17th January 2024
Abstract: Image captioning, the task of generating natural-language descriptions for images, has garnered substantial interest in natural language processing and computer vision. This literature review examines the combination of VGG (Visual Geometry Group) convolutional networks and recurrent neural networks, particularly Long Short-Term Memory (LSTM) networks, for the image captioning task. The integration of pre-trained CNNs, renowned for extracting hierarchical features from images, with LSTM networks, capable of modelling sequential data and generating coherent textual descriptions, forms the crux of numerous state-of-the-art image captioning systems. In this study, we investigate the BLEU (Bilingual Evaluation Understudy) metric as a means of quantitatively assessing the quality of captions generated by neural network-based image captioning models.
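To make the evaluation metric concrete, the following is a minimal sketch of sentence-level BLEU as described in the abstract: clipped n-gram precision combined via a geometric mean with a brevity penalty. The tokenised sentences, the add-one smoothing, and the function name `bleu` are illustrative assumptions, not the paper's implementation; in practice, library implementations (e.g. NLTK's `sentence_bleu`) are typically used.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights (illustrative sketch).

    candidate: list of tokens; references: list of token lists.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing keeps the geometric mean defined for short captions
        # (an assumption here; standard BLEU reports 0 when any precision is 0).
        precisions.append((clipped + 1) / (total + 1))
    # Brevity penalty against the closest-length reference.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "a dog runs on the beach".split()
refs = ["a dog is running along the beach".split()]
print(round(bleu(cand, refs), 3))  # a score in (0, 1); higher means closer overlap
```

A generated caption that exactly matches a reference scores 1.0, while captions sharing only a few words score near 0, which is what makes BLEU usable as a quantitative proxy for caption quality.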