Authors: Pawan Kumar R, Dr Nagaraja GSh
Date of Publication: 11th November 2024
Abstract: Automated question tagging improves the discoverability and organization of content in online forums, particularly on sites like Cross Validated Stack Exchange, where accurate categorization of questions is crucial for efficient information retrieval. This paper investigates the performance of several machine learning (ML) models in predicting tags for questions, using a dataset sourced from Cross Validated Stack Exchange. We employed four ML models to classify questions: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Recurrent Neural Networks (RNN). Among these, the CNN performed best, with an accuracy of 89%. To explore whether large language models (LLMs) could further improve accuracy, we trained BERT on the same dataset, achieving an overall accuracy of 93%. Our findings suggest that BERT, a transformer-based model, outperforms the CNN, LSTM, GRU, and RNN models on this task, exceeding the best of them by four percentage points of accuracy. This study highlights the potential of LLMs in automated question tagging and offers insights for future research and practical applications.
Reference: