CSUMI

Component Selection Using Mutual Information (CSUMI)

The idea of CSUMI is to perform dimensionality reduction on a high-dimensional data set without losing too much information. To do this, we first perform dimension reduction but keep more dimensions than we ultimately want; for example, PCA leaves the user with the task of deciding which components are worth looking at. We then use a mutual-information-based approach that takes into account a given covariate of interest to decide which of these dimensions to keep and which to discard. More details can be found in our paper. A Python module implementing our method is available here, with a help file here.
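To make the two-step idea concrete, here is a minimal sketch in Python using only NumPy. It is not the CSUMI module and does not reproduce the procedure from the paper: the function names (pca_scores, mutual_information, select_components), the quantile binning, and the simple plug-in mutual information estimate are all assumptions chosen for illustration. It computes more principal components than are wanted, scores each one by its estimated mutual information with a discrete covariate, and keeps the highest-scoring ones.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def mutual_information(scores, labels, n_bins=8):
    """Plug-in estimate, in bits, of the mutual information between one
    component (binned at its quantiles) and a discrete covariate.
    The binning scheme is an illustrative assumption."""
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1)[1:-1])
    binned = np.digitize(scores, edges)  # bin indices 0 .. n_bins - 1
    _, label_idx = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    joint = np.zeros((n_bins, label_idx.max() + 1))
    np.add.at(joint, (binned, label_idx), 1)  # contingency table of counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())


def select_components(X, labels, n_candidates, n_keep):
    """Compute n_candidates components, then keep the n_keep components
    with the highest estimated mutual information with the covariate."""
    scores = pca_scores(X, n_candidates)
    mi = np.array([mutual_information(scores[:, j], labels)
                   for j in range(n_candidates)])
    keep = np.argsort(mi)[::-1][:n_keep]
    return keep, mi


# Synthetic data: the covariate shifts a feature that is NOT the
# highest-variance direction, so ranking by variance alone would miss it.
rng = np.random.default_rng(0)
n_samples = 400
labels = rng.integers(0, 2, n_samples)
X = rng.normal(size=(n_samples, 5))
X[:, 0] *= 10.0          # large variance, unrelated to the covariate
X[:, 3] += 4.0 * labels  # smaller variance, driven by the covariate

keep, mi = select_components(X, labels, n_candidates=4, n_keep=1)
print("estimated MI per component (bits):", np.round(mi, 3))
print("selected component index:", keep)
```

In this toy data the first principal component captures the most variance but carries almost no information about the covariate, which is the situation the mutual information step is meant to handle. A plug-in estimate on binned values is biased upward for small samples, so the scores are better used for ranking components than as absolute quantities.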

Any questions or concerns can be directed to seanken at mit dot edu.