The goal of fusionclust is to conduct clustering and feature screening in large scale cluster analysis problems. In particular, fusionclust provides the Big Merge Tracker (BMT) and COSCI algorithms for convex clustering and feature screening using an ℓ1 fusion penalty.
BMT is a computationally efficient path algorithm that relies on a convex relaxation of the k-means clustering criterion and is effective at determining the number of clusters or modes in a univariate problem. COSCI (COnvex Screening for Cluster Information), on the other hand, is a non-parametric method for ranking features and screening out non-informative ones in large scale cluster analysis problems. It enjoys a perfect screening property: under mild regularity conditions on the densities of the features, COSCI screens out all the non-informative features with high probability.
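For orientation, an ℓ1 fusion criterion for univariate observations $x_1, \dots, x_n$ is commonly written in the form below. The symbols $\alpha_i$ (fitted cluster centre for observation $i$) and $\lambda$ (tuning parameter) are notation assumed here for illustration and are not taken from this README, so the exact formulation used by the package may differ.

```latex
\min_{\alpha_1, \dots, \alpha_n} \;
  \frac{1}{2} \sum_{i=1}^{n} (x_i - \alpha_i)^2
  \; + \; \lambda \sum_{i < j} \lvert \alpha_i - \alpha_j \rvert
```

Under this form, increasing $\lambda$ pulls the fitted values $\alpha_i$ together until some of them coincide. Observations that share a fitted value can be read as belonging to the same cluster, and a path algorithm follows these merges as $\lambda$ varies.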
See the two references below for more details on these algorithms.
Installation
You can:
- install the release version of fusionclust from CRAN with install.packages("fusionclust").
- install the development version of fusionclust from GitHub with devtools::install_github("trambakbanerjee/fusionclust").
Usage
Check out the included vignette demo-fusionclust for illustrative examples.
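As a minimal sketch, the vignette can be opened from an R session using the standard `vignette()` and `browseVignettes()` functions from base R. This assumes the package is already installed and was built with its vignettes; a GitHub install may skip vignettes unless they are explicitly requested.

```r
# Open the vignette named in this README (assumes fusionclust is installed
# together with its vignettes).
vignette("demo-fusionclust", package = "fusionclust")

# Alternatively, list every vignette shipped with the package in a browser.
browseVignettes(package = "fusionclust")
```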
References
[1.] Feature Screening in Large Scale Cluster Analysis
Banerjee, T., Mukherjee, G. and Radchenko, P. Journal of Multivariate Analysis, Volume 161, 2017, Pages 191-212
[2.] Convex clustering via ℓ1 fusion penalization
Radchenko, P. and Mukherjee, G. J. R. Stat. Soc. Ser. B Stat. Methodol., 2017