To run CSUMI on the command line:

You need Python installed, along with numpy and scipy.

You need a data file, dataFile, containing all the data from the experiment. Its first line contains two numbers, n and m, where n is the number of samples and m is the number of dimensions. Each subsequent line contains the data for one sample (e.g. RNA-Seq or other expression data).

You also need an annotation file, annFile, containing the annotation information for each sample. The i-th line of this file contains the annotations for the i-th sample, comma separated.

Example: Suppose we have a data set of three samples whose RNA-Seq data are the vectors (1,2,3,4,5), (6,7,8,9,10) and (11,12,13,14,15). Assume we are interested in two annotations, one for tissue type and one for tumor vs normal status. Assume the first sample is a brain tumor sample and the other two are normal lung samples. Then the data file would look like:

3 5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15

The first line tells us there are 3 samples and 5 dimensions, and each remaining row is one sample.

Meanwhile, the annFile would look like: 

brain,tumor
lung,normal
lung,normal
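
If your data is already in Python, the two input files can be generated rather than written by hand. The following is only a sketch: the function name write_csumi_inputs and its arguments are hypothetical (not part of CSUMI), and it assumes space-separated values in the data file, as in the example above.

```python
def write_csumi_inputs(samples, annotations, data_path, ann_path):
    """Write a data file and an annotation file in the layout described above.

    samples:     list of equal-length numeric rows, one row per sample.
    annotations: list of annotation lists, one per sample, in the same order.
    (Hypothetical helper; space-separated data values are an assumption
    taken from the example in this README.)
    """
    if len(samples) != len(annotations):
        raise ValueError("need exactly one annotation row per sample")
    n = len(samples)
    m = len(samples[0]) if n else 0
    if any(len(row) != m for row in samples):
        raise ValueError("all samples must have the same number of dimensions")

    with open(data_path, "w") as f:
        # Header line: number of samples, then number of dimensions.
        f.write(f"{n} {m}\n")
        for row in samples:
            f.write(" ".join(str(v) for v in row) + "\n")

    with open(ann_path, "w") as f:
        # Line i holds the comma-separated annotations of sample i.
        for labels in annotations:
            f.write(",".join(labels) + "\n")
```

Calling it with the three vectors and the annotations from the example, and the paths data.txt and ann.txt, should reproduce the two files shown above.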


We also need the name of a file to save the result in, saveFile. Finally, we need the discretization parameter, k (see paper).

To run the program from the command line type:

python CSUMI.py dataFile annFile saveFile k

where dataFile is the name of the data file, annFile and saveFile are the names of the annotation file and the output file, and k is as above. For example, if the data is saved to data.txt and the annotations to ann.txt, you want the result saved to save.txt, and k=6, you would write


python CSUMI.py data.txt ann.txt save.txt 6
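
The same call can be made from another Python script, for instance to try several values of k. This is a minimal sketch, assuming CSUMI.py is in the current directory; csumi_command and run_csumi are hypothetical helper names, and the only interface used is the command line shown above.

```python
import subprocess
import sys


def csumi_command(data_file, ann_file, save_file, k, script="CSUMI.py"):
    """Build the argument list for the command line shown above."""
    # sys.executable is the Python interpreter running this script.
    return [sys.executable, script, data_file, ann_file, save_file, str(k)]


def run_csumi(data_file, ann_file, save_file, k, script="CSUMI.py"):
    """Run CSUMI.py as a child process.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    subprocess.run(csumi_command(data_file, ann_file, save_file, k, script),
                   check=True)
```

For the example above, run_csumi("data.txt", "ann.txt", "save.txt", 6) issues the same command.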

The result appears in the saveFile, with one row for each PC. Each row contains the CSUMI scores for that PC, with one column for each annotation. For example, the first row might look like

Score PC 1: .2 .3

This means that PC 1 has a CSUMI score of .2 with respect to the first annotation (covariate) and .3 with respect to the second.
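
To work with the scores in Python, the saveFile can be read back into a dictionary. The sketch below assumes every row has exactly the form of the example row ("Score PC <number>:" followed by whitespace-separated scores); the real output may differ in detail, and parse_csumi_scores is a hypothetical name, not part of CSUMI.

```python
import re

# Assumed row layout, taken from the example row in this README.
_ROW = re.compile(r"^Score PC\s+(\d+):\s*(.*)$")


def parse_csumi_scores(text):
    """Map each PC number to its list of scores, one per annotation."""
    scores = {}
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match is None:
            continue  # skip blank lines or lines in an unexpected format
        pc = int(match.group(1))
        scores[pc] = [float(tok) for tok in match.group(2).split()]
    return scores
```

Given the contents of save.txt as a string, parse_csumi_scores returns the scores keyed by PC number, with the scores in the same order as the annotation columns.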