Generative Network-Based Data Balancing Applied to Agricultural Image Datasets

DOI:

https://doi.org/10.33936/isrtic.v9i2.7782

Keywords:

Deep learning, Class balancing, Convolutional networks, Synthetic data, Image segmentation

Abstract

Class imbalance in agricultural datasets limits the performance of classification models based on convolutional neural networks because it makes minority classes difficult to identify accurately. To mitigate this problem, the CRISP-DM methodology was adapted to generate synthetic data with Wasserstein generative adversarial networks with gradient penalty (WGAN-GP) from segmented defects of two avocado diseases (Scab and Anthracnose) that were extracted with computer vision techniques. These anomalies were integrated into images of healthy fruits to construct a balanced dataset. The InceptionV3 architecture with transfer learning was then used to train a classification model, and its performance was assessed on both the balanced and unbalanced datasets. Using the balanced dataset resulted in significant accuracy gains, particularly in disease classification, with a validation accuracy of 97.74%. This study shows that when real data collection is expensive or limited, synthetic data can be a useful way to increase the predictive power of models.
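The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `synthetic_quota` and `paste_defect`, the class counts, and the tiny grayscale images stored as nested lists are all assumptions made for illustration. The sketch only shows the general idea of counting how many synthetic images each minority class needs and of copying a masked defect patch onto a healthy image; the abstract does not specify how the paper blends defects into fruit images.

```python
from collections import Counter


def synthetic_quota(labels):
    """Return how many synthetic images each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def paste_defect(healthy, patch, mask, top, left):
    """Copy masked patch pixels onto a copy of a healthy image (grayscale nested lists)."""
    out = [row[:] for row in healthy]  # leave the original image untouched
    for i, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        for j, (value, keep) in enumerate(zip(patch_row, mask_row)):
            if keep:  # only pixels inside the segmented defect are copied
                out[top + i][left + j] = value
    return out


# Hypothetical label list: the counts are illustrative, not taken from the paper.
labels = ["healthy"] * 6 + ["scab"] * 2 + ["anthracnose"] * 3
quota = synthetic_quota(labels)

# Hypothetical 4x4 healthy image and 2x2 defect patch with a binary mask.
healthy = [[200] * 4 for _ in range(4)]
patch = [[30, 40], [50, 60]]
mask = [[1, 0], [1, 1]]
composite = paste_defect(healthy, patch, mask, top=1, left=1)
```

In a real pipeline the patch would presumably come from a trained generator and the images would be arrays rather than nested lists, but the quota logic and the mask-guided copy would follow the same pattern.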

Published

2025-09-12

How to Cite

Montesdeoca Espinoza, L.J., Zambrano Rojas, S.J., Pinargote Bravo, V.J. and Cedeño Valarezo, L.C. 2025. Generative Network-Based Data Balancing Applied to Agricultural Image Datasets. Informática y Sistemas. 9, 2 (Sep. 2025), 164–176. DOI: https://doi.org/10.33936/isrtic.v9i2.7782.

Section

Regular Papers