Email Spam Filtering Using Artificial Intelligence Techniques

Authors

  • B. Dhanalakshmi Department of Computer Science and Engineering, B.S.Abdur Rahman Crescent Institute of Science and Technology, Chennai, Tamil Nadu 600048, India.
  • Rajeshwari R R Department of Master of Business Administration, Dr. Ambedkar Institute of Technology, Bengaluru, Karnataka 560056, India.
  • Sanju A N Department of Computer Science and Engineering, BGS Institute of Technology, Adichunchanagiri University, Karnataka 581448, India.
  • Chintureena Thingom Department of Computer Science & Engineering, Aditya University, Surampalem, Andhra Pradesh 533437, India.
  • Vipul Devendra Punjabi Department of Computer Engineering, R. C. Patel Institute of Technology, Shirpur, Maharashtra 425405, India.
  • Vijayakumar B Department of Chemistry, Panimalar Engineering College, Chennai, Tamil Nadu 600123, India.
  • Shailendra Madansing Pardeshi Department of Computer Engineering, R. C. Patel Institute of Technology, Maharashtra 425405, India.
  • P. Venkatesan Department of Electrical and Electronics Engineering, Mahendra Institute of Technology, Tamil Nadu 637503, India.

DOI:

https://doi.org/10.24237/djes.2026.19102

Keywords:

Phishing detection, Machine learning, Naive Bayes, Email classification, URL classification

Abstract

Email phishing and spam pose considerable cybersecurity risks and call for detection methods that are trustworthy, effective, and practical. This work proposes an artificial intelligence (AI) based methodology for detecting email spam and phishing. It uses a two-stage binary classification system. In the first stage, the system classifies email content as malicious or non-malicious. In the second stage, it scans embedded URLs, which may or may not be phishing links. This modular design reduces the complexity of the feature space and allows the email and URL analyses to be optimized separately. The system uses 18,650 email samples and 549,346 URL samples from publicly accessible datasets, split 70% for training and 30% for testing. Preprocessing consisted of removing duplicates and null values, normalizing text, balancing classes, stemming, and extracting features using TF-IDF for emails and CountVectorizer for URLs. Four lightweight machine learning (ML) algorithms were evaluated: Naive Bayes, Decision Tree, Random Forest, and K-Nearest Neighbors. Naive Bayes achieved the highest baseline accuracy, 96% in email classification and 97% in URL classification. Random Forest, on the other hand, was more resilient to adversarial attacks and generalized better. The selected model was integrated with Gmail for real-time inbox detection, where it reached an accuracy of 85% in real-world use. The results show that combining lightweight machine learning, a modular design, and relatively clean preprocessing can yield effective, scalable detectors for both phishing and spam email.
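
The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small multinomial Naive Bayes written from scratch over raw token counts (closer to the CountVectorizer stage than to TF-IDF), omits stemming and class balancing, and uses toy training data. The names `TinyNaiveBayes`, `scan_email`, `URL_RE`, and the labels are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical URL pattern; real URL extraction is usually more involved.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a stand-in for
    the normalization and stemming steps, which are not reproduced here)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

class TinyNaiveBayes:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            counts = self.token_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

def scan_email(body, email_model, url_model):
    """Stage 1 classifies the email text; stage 2 classifies each embedded URL."""
    return {
        "email": email_model.predict(body),
        "urls": {url: url_model.predict(url) for url in URL_RE.findall(body)},
    }

# Toy training data (assumed for illustration; not from the paper's datasets).
email_model = TinyNaiveBayes().fit(
    ["Win a free prize now, claim your cash reward",
     "Urgent: verify your account password immediately",
     "Meeting agenda attached for the project review",
     "Lunch on Friday? Let me know your schedule"],
    ["malicious", "malicious", "benign", "benign"],
)
url_model = TinyNaiveBayes().fit(
    ["http://secure-login.verify-account.example.ru/update",
     "http://free-prize.claim-reward.example.xyz/win",
     "https://www.example.org/docs/guide",
     "https://university.example.edu/library/catalog"],
    ["phishing", "phishing", "legitimate", "legitimate"],
)

if __name__ == "__main__":
    message = "Urgent: claim your free cash prize at http://verify-account.example.xyz/login now"
    print(scan_email(message, email_model, url_model))
```

Keeping the two models separate mirrors the modular design: each stage has its own vocabulary and can be retrained or replaced (for example, with Random Forest) without touching the other.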

Published

2026-03-15

How to Cite

[1]
“Email Spam Filtering Using Artificial Intelligence Techniques”, DJES, vol. 19, no. 1, pp. 25–41, Mar. 2026, doi: 10.24237/djes.2026.19102.