Agglomerative Hierarchical Clustering

By Simone Bregaglio | Jan 25, 2016 02:04PM CET

Dear developers,
Could you give an exhaustive description of the algorithm used to determine the number of clusters in automatic truncation mode when agglomerative hierarchical clustering is performed? I have looked through many scientific papers that use XLSTAT, and they all repeat the same standard sentence: the "automatic truncation is based on the entropy and tries to create homogeneous groups". Could you give us further details?
Thank you in advance and best regards.




By Jean Paul | Jan 25, 2016 03:55PM CET | XLSTAT Agent

The “entropy” option is based on the largest decrease in Shannon’s entropy between a node and the next one.
XLSTAT minimizes 1 / (entropy(node i-1) - entropy(node i)), which is equivalent to maximizing the entropy decrease entropy(node i-1) - entropy(node i). The truncation is applied at the node where this minimum is reached.
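
As a rough illustration of the entropy rule, here is a minimal Python sketch of the selection step only. It assumes the Shannon entropy has already been computed for each node and is passed in as a list ordered by node; how XLSTAT computes the entropy at a node is not described in this thread. The function name and the sample values are hypothetical.

```python
def entropy_truncation_index(entropies):
    """Return the index i that minimizes 1 / (entropies[i-1] - entropies[i]).

    `entropies` is a list of per-node entropy values, ordered by node
    (an assumed input; the thread does not say how these are computed).
    Only strictly positive decreases are considered, so the reciprocal
    is always defined. Returns None if entropy never decreases.
    """
    best_i, best_score = None, float("inf")
    for i in range(1, len(entropies)):
        drop = entropies[i - 1] - entropies[i]
        if drop <= 0:
            continue  # no decrease here, the criterion is undefined
        score = 1.0 / drop  # the quantity being minimized
        if score < best_score:
            best_i, best_score = i, score
    return best_i


# Made-up entropy values, for illustration only.
node_entropies = [2.0, 1.9, 1.2, 1.1, 1.0]
print(entropy_truncation_index(node_entropies))  # prints 2
```

In this example the largest decrease is between the nodes at positions 1 and 2, so the reciprocal is smallest there and the truncation would be placed at that node.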

The “inertia” option is based on the largest difference in inertia (simply measured on the dendrogram’s Y axis) between (node i) and (node i-1).
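
The inertia rule can be sketched in the same way. This minimal Python example assumes the node levels read off the dendrogram's Y axis are available as a list sorted in increasing order; the function name and the sample heights are hypothetical, and this is not XLSTAT's actual implementation.

```python
def largest_inertia_gap(heights):
    """Return (i, gap) where gap = heights[i] - heights[i-1] is the largest.

    `heights` is a list of node levels on the dendrogram's Y axis,
    in increasing order (an assumed input format).
    If several gaps tie, the first one is returned.
    """
    if len(heights) < 2:
        raise ValueError("need at least two node heights")
    gaps = [(heights[i] - heights[i - 1], i) for i in range(1, len(heights))]
    gap, i = max(gaps, key=lambda pair: pair[0])
    return i, gap


# Made-up node heights for a dendrogram built on 6 observations (5 merges).
node_heights = [0.5, 0.8, 1.0, 3.5, 4.0]
i, gap = largest_inertia_gap(node_heights)
print(i, gap)  # prints 3 2.5
```

If the dendrogram is cut between node i-1 and node i, the first i merges are kept. With 6 observations and i = 3, that would leave 6 - 3 = 3 clusters, assuming the cut is placed inside the largest gap.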

Jean Paul

