[Granitto2007] "Rapid and non-destructive identification of strawberry cultivars by direct PTR-MS headspace analysis and data mining techniques", Sensors and Actuators B: Chemical, vol. 121, no. 2: Elsevier, pp. 379–385, 2007.
Proton transfer reaction-mass spectrometry (PTR-MS) is a spectrometric technique that allows direct injection and analysis of mixtures of volatile compounds. Its coupling with data mining techniques provides a reliable and fast method for the automatic characterization of agroindustrial products. We test the validity of this approach to identify samples of strawberry cultivars by measurements of single intact fruits. The samples used were collected over 3 years and harvested in different locations. Three data mining techniques (random forests, penalized discriminant analysis and discriminant partial least squares) have been applied to the full PTR-MS spectra without any preliminary projection or feature selection. We tested the classification models in three different ways (leave-one-out and leave-group-out internal cross validation, and leaving a full year aside), thereby demonstrating that strawberry cultivars can be identified by rapid non-destructive measurements of single fruits. Performances of the different classification methods are compared.
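The validation schemes named in this abstract (leave-one-out, leave-group-out, and holding out a full year) can be sketched as index splitters. This is a minimal illustration and not the authors' code: the function names, the `Split` alias, and the harvest years below are hypothetical, and the abstract does not say how the groups were formed.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def leave_one_out(n_samples: int) -> Iterator[Split]:
    """Hold out each sample exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


def leave_group_out(groups: Sequence[Hashable]) -> Iterator[Split]:
    """Hold out all samples that share a group label, one group at a time.

    Groups are visited in order of first appearance.
    """
    for g in dict.fromkeys(groups):
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        yield train, test


# Hypothetical harvest years for six fruits (illustrative values only).
years = [2003, 2003, 2004, 2004, 2005, 2005]

loo_splits = list(leave_one_out(len(years)))   # six splits, one fruit each
year_splits = list(leave_group_out(years))     # three splits, one year each
```

Passing the harvest year as the group label gives the "full year aside" test, which is the strictest of the three because no fruit from the held-out season reaches the training set.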
[Granitto2006] "Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products", Chemometrics and Intelligent Laboratory Systems, vol. 83, no. 2: Elsevier, pp. 83–90, 2006.
In this paper we apply the recently introduced Random Forest-Recursive Feature Elimination (RF-RFE) algorithm to the identification of relevant features in the spectra produced by Proton Transfer Reaction-Mass Spectrometry (PTR-MS) analysis of agroindustrial products. The method is compared with the more traditional Support Vector Machine-Recursive Feature Elimination (SVM-RFE), extended to allow multiclass problems, and with a baseline method based on the Kruskal–Wallis statistic (KWS). In particular, we apply all selection methods to the discrimination of nine varieties of strawberries and six varieties of typical cheeses from Trentino Province, North Italy. Using replicated experiments, we estimate unbiased generalization errors. Our results show that RF-RFE outperforms SVM-RFE and KWS on the task of finding small subsets of features with high discrimination levels on PTR-MS data sets. We also show how selection probabilities and feature co-occurrence can be used to highlight the most relevant features for discrimination.
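As an illustration of the univariate baseline mentioned in this abstract, the sketch below ranks spectral features by the Kruskal–Wallis H statistic. It is not RF-RFE or SVM-RFE, and it is not taken from the paper: the function names and the toy data are hypothetical, and the H statistic is computed with average ranks and no tie correction, which may differ from what the authors used.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """H = 12 / (N (N + 1)) * sum_g(R_g^2 / n_g) - 3 (N + 1), no tie correction."""
    n = len(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, label in zip(average_ranks(values), labels):
        rank_sums[label] += rank
        counts[label] += 1
    between = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * between - 3.0 * (n + 1)


def rank_features(spectra: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> List[int]:
    """Feature indices ordered from most to least discriminative (largest H first)."""
    n_features = len(spectra[0])
    scores = [
        kruskal_wallis_h([row[j] for row in spectra], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data: six samples, three features, two classes (illustrative values only).
spectra = [
    [1.0, 5.0, 0.2],
    [2.0, 1.0, 0.9],
    [3.0, 4.0, 0.4],
    [7.0, 2.0, 0.8],
    [8.0, 6.0, 0.5],
    [9.0, 3.0, 0.7],
]
labels = ["A", "A", "A", "B", "B", "B"]

ranking = rank_features(spectra, labels)  # -> [0, 2, 1]
```

A ranking like this scores each feature in isolation, whereas the recursive elimination methods in the abstract retrain a multivariate classifier after each round of discarding features.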