Abstract:
Mango, one of the most popular fruits worldwide, comprises many varieties that differ markedly in texture, taste, colour, and other attributes. Accurately identifying mango varieties and their contents, such as Vitamin C (Vit-C) and Titratable Acidity (TA), is essential both to satisfy consumer demand and to uphold consumer rights. In this study, two distinct Near-Infrared Spectroscopy (NIRS) datasets covering the 999.9-2500.2 nm range were integrated to classify Vit-C, TA, and four mango varieties: Cengkir, Kweni, Kent, and Palmer. To enhance spectral data quality, several pre-processing methods, including Multiplicative Scatter Correction (MSC), Savitzky-Golay Filtering (SGF), and Standard Normal Variate (SNV), were employed to correct spectral variations. Principal Component Analysis (PCA) was then used for dimensionality reduction, streamlining the spectral data and improving model efficiency. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to the spectral data to improve predictive performance for Vit-C and TA. Various Machine Learning (ML) classifiers were subsequently applied to classify mango Vit-C, TA, and variety, and an additional stacked generalization method yielded marginal performance improvements. The proposed system achieved a best F1 score of 95.0 percent for Vit-C, 84.0 percent for TA, and 100.0 percent for mango variety, with a 5-fold average F1 score of 96.8 percent.
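
As an illustration of the scatter-correction step named above, the following is a minimal sketch of SNV and MSC in Python with NumPy. It is not the study's implementation: the function names `snv` and `msc`, the synthetic spectra, and the explicit reference spectrum are assumptions made for this example only.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is regressed on the reference (row ~ intercept + slope * reference)
    and then corrected as (row - intercept) / slope. If no reference is given,
    the mean spectrum of the input is used (a common convention, assumed here).
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic data (assumption): one smooth "true" spectrum distorted by
# per-sample multiplicative gains and additive offsets, mimicking scatter.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
gains = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.1, -0.2, 0.05])
spectra = gains[:, None] * base + offsets[:, None]

snv_out = snv(spectra)                  # each row now has zero mean, unit variance
msc_out = msc(spectra, reference=base)  # each row is mapped back onto the reference
```

In this toy setting the distortions are exactly affine, so MSC recovers the reference spectrum for every sample; real NIRS spectra would only be approximately corrected.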