Content Based Approach for Detecting Smishing Messages in Mobile Phones Using an Improved Convolutional Neural Networks Model

Rose Mueni Mbevi; John Kamau; Faith Mueni Musyoka

doi:10.51867/ajernet.6.2.17

Authors

Rose Mueni Mbevi Mount Kenya University, Kenya https://orcid.org/0009-0009-2537-4133
Dr. John Kamau Mount Kenya University, Kenya
Dr. Faith Mueni Musyoka University of Embu, Kenya https://orcid.org/0000-0002-9574-8235

Keywords:

CNN, Content Based Approach, Legitimate Messages, Smishing Detection, Smishing Messages

Abstract

Short Message Service (SMS) is a text messaging service that lets users send short messages from a mobile device. Over the last decade SMS has evolved into a very popular communication medium, and it has become a more effective mode of communication than email. Unfortunately, smishing (SMS phishing) has emerged as the most common type of spam because traditional detection methods have difficulty understanding the informal nature of these messages. This study developed an improved CNN-based model aimed at accurate detection of smishing on mobile devices, with deep learning theory as its theoretical basis. The University of California, Irvine (UCI) Machine Learning Repository contains datasets, domain theories, and data generators used by the machine learning community to empirically study machine learning algorithms. In this study, samples from the UCI Machine Learning Repository were used to build an experimental model. The dataset analyzed comprised 5,574 SMS messages, both spam and non-spam. The performance metrics used were precision, recall, F1-score, and accuracy. The CNN model evaluated had a larger number of hidden layers for better detection. It achieved an accuracy of 99.95%, indicating good performance and better detection of SMS spam (SMS phishing). The analysis shows that text preprocessing contributes greatly to improved detection accuracy and that the CNN outperforms traditional detection methods. It also shows that the sophisticated nature of smishing attacks makes advanced detection mechanisms necessary to prevent future SMS threats.
The authorities responsible for implementation should allocate resources to deploy these solutions, and roles should be defined for the different detection systems so that the detection tools perform well and continuously. The authorities should also continuously enhance detection mechanisms through feature enhancement, data augmentation, and regular performance evaluation. Staff training should be provided and performance benchmarks established. Users should contribute by giving feedback, reporting suspicious messages, and raising awareness among one another, while all other stakeholders should make their expertise and resources available to support this work.
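
The four performance metrics named in the abstract (precision, recall, F1-score, and accuracy) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function name classification_metrics, the Metrics container, the "spam"/"ham" label strings, and the toy labels in the usage example are all assumptions made for illustration. Only the standard definitions of the metrics are relied on.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute binary precision, recall, F1 and accuracy.

    `positive` is the label treated as the smishing/spam class.
    The "spam"/"ham" label names are an assumption for this sketch.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive (spam) class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard against empty denominators by returning 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, f1, accuracy)


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not from the dataset).
    y_true = ["spam", "ham", "spam", "ham", "ham"]
    y_pred = ["spam", "ham", "ham", "ham", "spam"]
    print(classification_metrics(y_true, y_pred))
```

On a dataset where spam is the minority class, accuracy alone can look high even when many smishing messages are missed, which is one reason to report precision, recall, and F1-score alongside it.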

