Keywords: spam detection, e-mail, machine learning, TF–IDF, Naive Bayes, SVM, deep learning

Abstract

Email spam remains a persistent problem: it degrades the user experience, consumes resources, and enables fraud and phishing. This work examines both traditional and contemporary machine learning approaches to email spam detection, and proposes an experimental pipeline with a repeatable methodology for model training, evaluation, and comparison. We implement and evaluate several algorithms, namely Multinomial Naive Bayes, Logistic Regression, Support Vector Machines, Decision Trees, Random Forests, and a basic deep-learning baseline (bi-LSTM), using popular datasets (Enron, SpamAssassin, Ling-Spam) and standard preprocessing (tokenization, TF–IDF, header-feature extraction). We report standard evaluation measures (accuracy, precision, recall, F1-score, ROC-AUC) and explore the trade-offs among performance, interpretability, and computational cost. The paper concludes with a discussion of deployment issues, limitations, and future research directions.
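As a rough illustration of the simplest classifier and of the evaluation measures named above, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization and raw term counts rather than TF–IDF, uses only the Python standard library, and trains on a handful of made-up messages. All function names (`tokenize`, `train_multinomial_nb`, `predict`, `binary_metrics`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization (an assumption; a real pipeline
    # would also handle punctuation and e-mail headers).
    return text.lower().split()

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {tok for doc in docs for tok in tokenize(doc)}
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[c] / len(docs))
        log_lik = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
        model[c] = (log_prior, log_lik)
    return model

def predict(model, text):
    # Score each class by log prior plus summed token log likelihoods;
    # tokens never seen in training are skipped.
    scores = {}
    for c, (log_prior, log_lik) in model.items():
        scores[c] = log_prior + sum(log_lik[t] for t in tokenize(text) if t in log_lik)
    return max(scores, key=scores.get)

def binary_metrics(y_true, y_pred, positive="spam"):
    # Confusion-matrix counts with "spam" as the positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy, made-up messages for illustration only
# (not drawn from Enron, SpamAssassin, or Ling-Spam).
train_docs = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached for review",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [
    "claim free money",
    "monday project meeting",
    "free report for review",
    "free money for project",
]
test_labels = ["spam", "ham", "ham", "ham"]

model = train_multinomial_nb(train_docs, train_labels)
predictions = [predict(model, d) for d in test_docs]
metrics = binary_metrics(test_labels, predictions)
print(predictions)
print(metrics)
```

On this toy data the last test message, a legitimate one that happens to contain "free" and "money", is flagged as spam. Accuracy and recall hide that kind of error, while precision exposes it, which is one reason to report the measures together rather than accuracy alone.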

How to Cite
Patil, A., Jadhav, S., Nemade, S., & Masurekar, R. (2026). Comparative Study of Machine Learning Algorithms for E-mail Spam Detection. International Journal Of Mathematics And Computer Research, 14(03), 82-87. https://doi.org/10.47191/ijmcr/v14iSPC3.17