Open Access Journal

ISSN : 2394-2320 (Online)

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Monthly Journal for Computer Science and Engineering

Multi-Class Email Categorization in Enterprise Environments: A Study of Traditional SVM and Transformer-Based XLNET Models

Authors: Vishal Tyagi, Sarthak Marwah, Dr. S. Nagadevi, Dr. Sindhuja M, Anand M

Date of Publication: 5th May 2025

Abstract: As the adoption and volume of electronic communication continue to rise, effective email management is becoming essential, especially for businesses. This research applies both a conventional and a more sophisticated machine learning technique to multi-class email classification. The baseline is a Support Vector Machine (SVM) model with an F1-score of 62.01% and an accuracy of 67.07%. The key performance metrics reported for the SVM model are a Mean Squared Error (MSE) of 2.8710, a Root Mean Squared Error (RMSE) of 1.6944, a Mean Absolute Error (MAE) of 0.9885, a Standard Deviation (SD) of 1.6741, a Correlation Coefficient (R) of 0.4144, and a Coefficient of Determination (R2) of 0.4320. Given its adequate performance and reasonable consistency, the model is suitable for structured email categorization tasks. For comparison, an XLNet-based Large Language Model (LLM) approach is tuned on the same dataset to leverage contextual embeddings for classification. The LLM achieves an accuracy of 69.98% (rounded to 70%) and an F1-score of 54.89%, so it trails the SVM in F1-score while exhibiting slightly higher accuracy. Its reported statistics are 2.0538 for MSE, 1.4331 for RMSE, 0.7142 for MAE, 1.4287 for SD, 0.4197 for R, and 0.0244 for R2. Even though the LLM has more contextual awareness than the SVM model, it appears to perform similarly overall. Compared with SVMs and other low-complexity ML approaches to structured categorization, LLMs rely on considerably more complex contextual embeddings. The comparative analysis highlights the strengths and limitations of both approaches, offering insights into their scope, applications, and deployment scenarios.
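The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: it assumes class labels are encoded as integer indices (so that MSE, RMSE, MAE, and R2 can be computed on them at all), and it assumes a macro-averaged F1-score, since the abstract does not state which averaging was used. The function names and the toy labels are hypothetical. The last lines only check that each reported RMSE is the square root of the corresponding reported MSE.

```python
import math

def error_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R2 on integer-encoded class labels (an assumption)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = mse * n                                    # residual sum of squares
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae,
            "r2": 1 - ss_res / ss_tot}

def accuracy(y_true, y_pred):
    """Fraction of emails assigned to the correct class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy labels for three email classes (not data from the paper).
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 1]
toy = error_metrics(y_true, y_pred)
toy_acc = accuracy(y_true, y_pred)
toy_f1 = macro_f1(y_true, y_pred)

# Consistency check on the reported values: RMSE should equal sqrt(MSE).
svm_rmse_ok = abs(math.sqrt(2.8710) - 1.6944) < 1e-4
llm_rmse_ok = abs(math.sqrt(2.0538) - 1.4331) < 1e-4
```

Accuracy and F1 can move in opposite directions, as they do between the two models here, because accuracy counts every email equally while a macro-averaged F1 weights every class equally regardless of how many emails it contains.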
