Sentiment Analysis and Text Classification for Automatic Detection of Harassment and Threats Using Artificial Intelligence

Authors

Kevin Alexander Mendoza Campoverde Universidad Técnica de Machala, Facultad de Ingeniería Civil, Machala, El Oro, Ecuador. https://orcid.org/0009-0007-2361-1276
Javier Valentin Hurtado Gonzalez Universidad Técnica de Machala, Facultad de Ingeniería Civil, Machala, El Oro, Ecuador. https://orcid.org/0009-0001-9648-0752
Rodrigo Fernando Morocho Román Universidad Técnica de Machala, Facultad de Ingeniería Civil, Machala, El Oro, Ecuador. https://orcid.org/0000-0003-0194-5033
Wilmer Braulio Rivas Asanza Universidad Técnica de Machala, Facultad de Ingeniería Civil, Machala, El Oro, Ecuador. https://orcid.org/0000-0002-2239-3664

DOI:

https://doi.org/10.33936/isrtic.v9i1.7470

Keywords:

Cyberbullying, Text classification, BERT, Logistic regression, Social media

Abstract

This paper compares two artificial intelligence models for detecting aggressive language on social networks: a traditional text classification model and a model based on deep neural networks. The two approaches were logistic regression over TF-IDF vectors and a BERT-based model adapted for natural language processing. The work followed the CRISP-DM methodology, from data preparation through to the final evaluation of the models. The data set showed class imbalances, which were corrected using the SMOTE technique. In the evaluation, BERT achieved better performance metrics, with an average F1 score of 0.93 compared with 0.83 for logistic regression. The metrics, together with a review of classification errors, made it clearer where each approach showed strengths or limitations. In summary, the results show that BERT offers significant advantages for content moderation on social networks, and they confirm that proper preprocessing and data balancing are key factors for improving performance in text classification problems.
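The abstract reports an "average F1" per model without stating how the average is taken. As a minimal sketch, assuming the figure is a macro average (the unweighted mean of the per-class F1 scores), the computation could look like the following. The function names, class labels, and toy predictions are hypothetical and are not taken from the paper.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only (not the paper's data).
y_true = ["aggressive", "aggressive", "neutral", "neutral", "neutral", "aggressive"]
y_pred = ["aggressive", "neutral", "neutral", "neutral", "aggressive", "aggressive"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

A macro average gives every class equal weight regardless of how many examples it has, which is one reason it is often preferred when a data set is imbalanced; a support-weighted average would give a different number on skewed data.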

Published

2025-05-21

How to Cite

Mendoza Campoverde, K.A., Hurtado Gonzalez, J.V., Morocho Román, R.F. and Rivas Asanza, W.B. 2025. Sentiment Analysis and Text Classification for Automatic Detection of Harassment and Threats Using Artificial Intelligence. Informática y Sistemas. 9, 1 (May 2025), 82–92. DOI: https://doi.org/10.33936/isrtic.v9i1.7470.

Issue

Vol. 9 No. 1: January - June (2025)

Section

Regular Papers

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Articles submitted to this journal for publication will be released for open access under a Creative Commons Attribution Non-Commercial No Derivative Works licence (http://creativecommons.org/licenses/by-nc-nd/4.0).

The authors retain copyright, and are therefore free to share, copy, distribute, perform and publicly communicate the work under the following conditions: Acknowledge credit for the work specified by the author and indicate if changes were made (you may do so in any reasonable way, but not in a way that suggests that the author endorses your use of his or her work). Do not use the work for commercial purposes. In case of remixing, transformation or development, the modified material may not be distributed.
