Terms selection in Excel
This tutorial shows you how to select the most influential terms in Excel using XLSTAT.
Dataset to select terms with XLSTAT
This tutorial is based on a dataset that contains the lyrics of 94 songs by the American singer Taylor Swift. The dataset was extracted from the data science platform Kaggle, where it can be accessed.
Setting up the terms selection feature in XLSTAT
Select the XLSTAT / Text Mining / Terms selection command. The dialog box appears.
In the XLSTAT interface, select the response variable and the term frequencies from the document-term matrix.
Select song titles as document labels.
Click on OK.
Interpret the results of the terms selection
The first two charts show the coefficients and odds ratios for each term. They both show the importance of a term in the calculated model.
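The link between the two charts can be illustrated with a short sketch. It assumes the underlying model is a logistic regression, which the odds ratios and binomial deviance suggest but the tutorial does not state. Under that assumption, the odds ratio of a term is the exponential of its coefficient. The terms and coefficient values below are hypothetical and are not taken from the XLSTAT output.

```python
import math

# Hypothetical coefficients for three terms (illustrative values only,
# not read from the XLSTAT results).
coefficients = {"love": 0.8, "never": -0.5, "car": 0.0}


def odds_ratio(coefficient: float) -> float:
    """Odds ratio of a term in a logistic model: exp(coefficient)."""
    return math.exp(coefficient)


odds_ratios = {term: odds_ratio(b) for term, b in coefficients.items()}

for term, ratio in odds_ratios.items():
    print(f"{term}: coefficient={coefficients[term]:+.2f}, odds ratio={ratio:.3f}")
```

An odds ratio above 1 means the term pushes a document towards the Positive class, a ratio below 1 pushes it away, and a ratio of exactly 1 (coefficient of zero) means the term was not retained by the model.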
The next chart plots the binomial deviance as a function of the lambda value. The number of terms with a non-zero coefficient is displayed on the upper axis. The two optimal lambda values (minimum and 1se) are marked on the same chart. The number of terms with a non-zero coefficient depends on the lambda chosen when setting up the analysis.
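The difference between the two optimal lambda values can be sketched as follows. This is an illustration under an assumption: that XLSTAT follows the common convention in which the "minimum" lambda is the one with the lowest cross-validated deviance, and the "1se" lambda is the largest lambda whose deviance stays within one standard error of that minimum. All numbers below are hypothetical, and `select_lambdas` is a helper name made up for this example.

```python
# Hypothetical cross-validation results, one entry per lambda value
# (illustrative numbers only, not XLSTAT output).
lambdas = [0.20, 0.10, 0.05, 0.02, 0.01]
mean_deviance = [1.30, 1.09, 1.05, 1.08, 1.15]
std_error = [0.06, 0.05, 0.05, 0.06, 0.07]
nonzero_terms = [2, 5, 11, 24, 40]


def select_lambdas(lambdas, mean_deviance, std_error):
    """Return the indices of the 'minimum' and '1se' lambda values."""
    # Minimum rule: lambda with the lowest mean deviance.
    i_min = min(range(len(lambdas)), key=lambda i: mean_deviance[i])
    # 1se rule: largest lambda whose deviance is within one standard
    # error of the minimum deviance.
    threshold = mean_deviance[i_min] + std_error[i_min]
    candidates = [i for i in range(len(lambdas)) if mean_deviance[i] <= threshold]
    i_1se = max(candidates, key=lambda i: lambdas[i])
    return i_min, i_1se


i_min, i_1se = select_lambdas(lambdas, mean_deviance, std_error)
print("minimum:", lambdas[i_min], "terms kept:", nonzero_terms[i_min])
print("1se:    ", lambdas[i_1se], "terms kept:", nonzero_terms[i_1se])
```

With these made-up values, the 1se rule picks a larger lambda than the minimum rule and therefore keeps fewer terms, which is the trade-off the chart makes visible.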
The next table gives the confusion matrix obtained on the training sample, which measures the performance of the classifier. Here, 82% of the predictions on the training sample are correct.
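As a reminder of how a percentage of correct predictions is derived from a confusion matrix, here is a minimal sketch. The four counts are hypothetical and are not the values from the XLSTAT table. They are chosen only so that the arithmetic is easy to follow.

```python
# Hypothetical confusion matrix counts (not the values from the tutorial).
true_positive = 45   # Positive documents predicted Positive
false_negative = 5   # Positive documents predicted Negative
false_positive = 13  # Negative documents predicted Positive
true_negative = 37   # Negative documents predicted Negative


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of correct predictions: diagonal of the matrix over the total."""
    return (tp + tn) / (tp + fn + fp + tn)


acc = accuracy(true_positive, false_negative, false_positive, true_negative)
print(f"Correct predictions: {acc:.0%}")
```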
Finally, the predictions are displayed along with the probability of belonging to the Positive class.
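The sketch below shows how such a probability could be computed for one document. It assumes a logistic model, where the probability is the logistic function of the intercept plus the sum of each coefficient times the frequency of its term. The intercept, coefficients, term frequencies and the 0.5 decision threshold are all hypothetical. The tutorial does not state which threshold XLSTAT applies.

```python
import math

# Hypothetical model and document (illustrative values only).
intercept = -0.4
coefficients = {"love": 0.8, "never": -0.5}      # terms kept by the model
term_counts = {"love": 3, "never": 1, "car": 2}  # frequencies in one song


def positive_probability(intercept, coefficients, term_counts):
    """Probability of the Positive class under a logistic model."""
    # Terms without a coefficient (here "car") do not contribute to the score.
    score = intercept + sum(
        coef * term_counts.get(term, 0) for term, coef in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


probability = positive_probability(intercept, coefficients, term_counts)
# Assumed decision rule: Positive when the probability is at least 0.5.
prediction = "Positive" if probability >= 0.5 else "Negative"
print(f"P(Positive) = {probability:.3f} -> {prediction}")
```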