Generative Network-Based Data Balancing Applied to Agricultural Image Datasets
DOI:
https://doi.org/10.33936/isrtic.v9i2.7782
Keywords:
Deep learning, Class balancing, Convolutional networks, Synthetic data, Image segmentation
Abstract
Class imbalance in agricultural datasets limits the performance of classification models based on convolutional neural networks because it hinders accurate identification of minority classes. To mitigate this problem, the CRISP-DM methodology was adapted to generate synthetic data with Wasserstein generative adversarial networks with gradient penalty (WGAN-GP) from segmented defects of avocado diseases (Scab and Anthracnose) extracted with computer vision techniques. These defects were integrated into images of healthy fruit to construct a balanced dataset. The InceptionV3 architecture with transfer learning was then used to train a classification model, and its performance was assessed using both the balanced and unbalanced datasets. Using the balanced dataset resulted in significant accuracy gains, particularly in disease classification, with a validation accuracy of 97.74%. This study shows that, where real data collection is expensive or limited, synthetic data can be a useful way to increase the predictive power of models.
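The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how defects were merged into healthy images, so simple alpha blending with a mask is assumed here, and the function names (`synthetic_needed`, `paste_defect`), the class counts, and the toy images are all hypothetical.

```python
import numpy as np


def synthetic_needed(class_counts):
    """Number of synthetic images each class needs to match the largest class."""
    target = max(class_counts.values())
    return {name: target - n for name, n in class_counts.items()}


def paste_defect(healthy, defect, mask, top, left):
    """Blend a defect patch into a copy of a healthy-fruit image.

    healthy: (H, W, 3) uint8 image
    defect:  (h, w, 3) uint8 patch (for example, one generated lesion)
    mask:    (h, w) float array in [0, 1]; 1 keeps the defect pixel
    Assumption: plain alpha blending; the paper may use another method.
    """
    out = healthy.astype(np.float32)  # astype returns a new array
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    alpha = mask[..., None]  # broadcast the mask over the colour channels
    out[top:top + h, left:left + w] = alpha * defect + (1.0 - alpha) * region
    return np.clip(out, 0, 255).astype(np.uint8)


# Hypothetical class counts, for illustration only (not the paper's data).
counts = {"healthy": 500, "scab": 120, "anthracnose": 80}
needed = synthetic_needed(counts)

# Toy images: a uniform "healthy" image and a dark square "lesion".
healthy = np.full((64, 64, 3), 200, dtype=np.uint8)
defect = np.full((16, 16, 3), 40, dtype=np.uint8)
mask = np.ones((16, 16), dtype=np.float32)
composite = paste_defect(healthy, defect, mask, top=10, left=20)
```

In a pipeline of this kind, `needed` would tell the generator how many defect patches to produce per minority class, and each patch would be composited onto a healthy image before the classifier is trained.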
License
Copyright (c) 2025 Luis Jesús Montesdeoca Espinoza, Stalin Joel Zambrano Rojas, Victor Joel Pinargote Bravo, Luis Cristobal Cedeño Valarezo

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles submitted to this journal for publication will be released for open access under a Creative Commons Attribution Non-Commercial No Derivative Works licence (http://creativecommons.org/licenses/by-nc-nd/4.0).
The authors retain copyright, and are therefore free to share, copy, distribute, perform and publicly communicate the work under the following conditions: Acknowledge credit for the work as specified by the author and indicate if changes were made (you may do so in any reasonable way, but not in a way that suggests that the author endorses your use of his or her work). Do not use the work for commercial purposes. In case of remixing, transformation or development, the modified material may not be distributed.



