Author : Nikie Jo E. Deocampo
Date of Publication : 7th December 2023
Abstract: Lip-reading has gained interest for its potential to revolutionize human-computer interaction, improve accessibility, and enhance surveillance systems. This paper proposes a hybrid approach that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to improve lip-reading accuracy for Tagalog. To support development and evaluation, we collected a dataset of 450 videos featuring 50 known phrases spoken by nine native Tagalog speakers. The hybrid CNN-LSTM approach leverages the ability of CNNs to extract visual features and the ability of LSTMs to model temporal dependencies. Recent studies have demonstrated the effectiveness of such hybrid models in lip-reading tasks. Our focus is on training and optimizing the hybrid model on the collected dataset. Evaluation involves testing on unseen video sequences using frame-level accuracy and phrase-level recognition rates. The outcomes of this research can advance lip-reading technology for Tagalog by demonstrating improved accuracy and robustness. The findings have implications for communication accessibility, human-computer interaction, and surveillance systems. The collected dataset also serves as a resource for future Tagalog lip-reading research.
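The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define either measure, so everything here is an assumption: each frame is taken to carry the phrase label of its video, a video's phrase prediction is taken to be the majority vote over its per-frame predictions, and the function names and toy labels are hypothetical rather than drawn from the paper or its dataset.

```python
from collections import Counter


def frame_accuracy(pred_frames, true_frames):
    """Fraction of frames whose predicted label equals the reference label."""
    if len(pred_frames) != len(true_frames):
        raise ValueError("prediction and reference must have the same length")
    if not true_frames:
        return 0.0
    correct = sum(p == t for p, t in zip(pred_frames, true_frames))
    return correct / len(true_frames)


def phrase_from_frames(pred_frames):
    """Collapse per-frame labels into one phrase label.

    Majority vote is an assumed aggregation rule, not the paper's.
    """
    return Counter(pred_frames).most_common(1)[0][0]


def phrase_recognition_rate(videos):
    """Fraction of videos whose aggregated prediction matches the true phrase.

    videos: list of (pred_frames, true_phrase) pairs, one per test video.
    """
    if not videos:
        return 0.0
    hits = sum(phrase_from_frames(pred) == truth for pred, truth in videos)
    return hits / len(videos)


# Hypothetical toy predictions for two unseen test videos.
videos = [
    (["salamat", "salamat", "kumusta"], "salamat"),
    (["kumusta", "paalam", "paalam"], "kumusta"),
]

first_pred, first_truth = videos[0]
first_video_frame_acc = frame_accuracy(first_pred, [first_truth] * len(first_pred))
overall_phrase_rate = phrase_recognition_rate(videos)
```

Under these assumptions a video can score well at the frame level yet still be misrecognized at the phrase level, or the reverse, which is one reason to report both measures.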
Reference :