This is MSARi, the code used to implement an RNA secondary-structure
detection algorithm described in Coventry, Kleitman and Berger, PNAS
101(33):12102-7.  A number of far more accurate detection programs have
been developed since I wrote this, and you may want to try them instead:

RNAz:  <http://www.tbi.univie.ac.at/~wash/RNAz/>
PFOLD: <http://web.mit.edu/alex_c/Public/pfold-search.tgz>

MSARi has been tested on Debian and Solaris machines.  It requires
Python 2.4 and the Python Numeric package, which can be obtained from
<http://sf.net/projects/numpy>.

Scoring alignments
******************

There is a script in bin/score.py which you can use to score your own
alignments.  Pass it the path to a fasta file containing the alignment
you wish to evaluate.  It will print out a list of the 20 most
significant basepairs, and an estimate of the total significance of the
alignment.  For instance:

who% bin/score.py bin/eg.fa 
[(1.4095055640784277e-10, (120, 278)),
 (1.4095055640784277e-10, (120, 279)),
 ...]
New pair likelihood
120 278 1.40950556408e-10
New pair likelihood
98 297 1.9290666843e-10
New pair likelihood
165 181 1.65875994819e-07
New pair likelihood
139 262 5.3780937071e-07
New pair likelihood
150 193 1.64757952487e-06
3.9964310725e-39
who% 

The number printed at the bottom, 3.996e-39 in this case, is the
estimated probability of the alignment, given the null hypothesis that
the columns were mutating independently of each other.  The lower the
number, the more strongly MSARi is flagging the alignment as containing
conserved secondary structure.  Displayed above that are two lists of
positions with interesting compensatory mutations.  The first list gives
the null-hypothesis probabilities for pairs of positions, and is in the
form

[(probability, (position index 1, position index 2)), ...]

The second list gives the pairs MSARi chose when building up evidence
for compensatory mutations in the entire alignment, and is in the form

New pair likelihood
<position index 1> <position index 2> <probability>

For 10- and 15-sequence alignments, score.py will choose the appropriate
configuration for you.  For alignments with a different number of
sequences, you will need to do further benchmarking to choose the
optimal Bonferroni criterion.  You can change the Bonferroni factor
that score.py uses to choose significant basepairs by passing the
--bfactor option.  For example:

who% bin/score.py --bfactor=0 bin/eg.fa 
[(1.4095055640784277e-10, (120, 278)), ...]
1
who% 

In this case, with a Bonferroni factor of 0, none of the pairs are
deemed to exhibit sufficiently significant compensatory mutations, so
they are all discarded and the score printed for the alignment is 1.
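
The sample output above suggests how the final number relates to the
selected pairs: in the first example it equals the product of the five
probabilities listed under "New pair likelihood", and in the second,
where no pairs are selected, it is 1 (the empty product).  This is an
observation about the sample output, not a statement about MSARi's
internals.  The sketch below checks it.  The function
parse_score_output is hypothetical and not part of MSARi, and the code
assumes a current Python 3 (3.8 or later) rather than the Python 2.4
that MSARi itself requires.

```python
import math


def parse_score_output(text):
    """Return (chosen_pairs, aggregate) parsed from score.py output text.

    chosen_pairs is a list of (index1, index2, probability) tuples taken
    from the line after each "New pair likelihood" line; aggregate is the
    number on the last non-blank line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    chosen = []
    for prev, cur in zip(lines, lines[1:]):
        if prev == "New pair likelihood":
            i, j, p = cur.split()
            chosen.append((int(i), int(j), float(p)))
    return chosen, float(lines[-1])


# Output copied from the first example above, without the leading list.
example = """\
New pair likelihood
120 278 1.40950556408e-10
New pair likelihood
98 297 1.9290666843e-10
New pair likelihood
165 181 1.65875994819e-07
New pair likelihood
139 262 5.3780937071e-07
New pair likelihood
150 193 1.64757952487e-06
3.9964310725e-39
"""

chosen, aggregate = parse_score_output(example)
product = math.prod(p for _, _, p in chosen)
print(len(chosen), math.isclose(product, aggregate, rel_tol=1e-6))
```

The comparison uses a relative tolerance because the printed
probabilities are rounded to about twelve significant digits.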

Test code
*********

To run the test code, you will need RNAfold and ClustalW in your path.
RNAfold is part of the ViennaRNA package, which is available from
<http://www.tbi.univie.ac.at/~ivo/RNA/>.  MSARi has been tested with
version 1.5 of ViennaRNA.  ClustalW is available from
<ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/>.  MSARi has been
tested with version 1.83 of ClustalW.

Once you've unpacked MSARi, change the path names at the top of
MSARi/RNA/MSARi/tests/general_test.py to suit you: 

  - msaripath should specify the directory containing the distribution.
    E.g., if unpacking the tar file results in a directory
    "/var/tmp/MSARi", it should be set to that.
    
  - msadir specifies the directory in which the multiple sequence
    alignments (MSAs) should be saved.
    
  - output_path specifies the file to which the results should be saved.
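
As an illustration, the three settings might look like this.  Only the
"/var/tmp/MSARi" location comes from the example above; the msadir and
output_path values are hypothetical placeholders, and the exact form
general_test.py expects (for instance, whether a trailing slash is
needed) should be checked against the file itself.

```python
import os

# Directory created by unpacking the tar file (example from this README).
msaripath = "/var/tmp/MSARi"

# Hypothetical locations; choose any writable directory and file.
msadir = os.path.join(msaripath, "msas")
output_path = os.path.join(msaripath, "results.txt")
```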
  
You may also wish to change the definition of output_file_base in
MSARi/RNA/structure/ViennaOdds.py.  If you run MSARi a lot, the
directory it specifies may need to be cleaned out from time to time.

Depending on whether you are working with 10 or 15 sequences, you may
wish to change bonferroni_factor in MSARi/RNA/MSARi/MSA/BaseMSA.py to
alter the criterion for whether a base pair is kept for consideration:
for 10-sequence MSAs, I have found the best setting to be 0.2, while
for 15-sequence MSAs, I used 0.005.
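
The two settings above can be kept in a small lookup keyed on the
number of sequences in a FASTA file.  This is only a convenience
sketch: the names below are hypothetical and not part of MSARi, the
code assumes a current Python 3, and it deliberately refuses to guess
for other alignment sizes, which need their own benchmarking.

```python
# Values quoted in this README; the names are hypothetical helpers.
BONFERRONI_FACTORS = {10: 0.2, 15: 0.005}


def count_fasta_sequences(fasta_text):
    """Count records in FASTA text by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def suggested_bonferroni_factor(n_sequences):
    """Return the README's setting, or raise if none has been benchmarked."""
    if n_sequences not in BONFERRONI_FACTORS:
        raise ValueError(
            "no benchmarked Bonferroni factor for %d sequences" % n_sequences
        )
    return BONFERRONI_FACTORS[n_sequences]


# A made-up 10-sequence alignment, used only to exercise the helpers.
fasta_text = "".join(">seq%d\nGGCAUGCC\n" % k for k in range(10))
n = count_fasta_sequences(fasta_text)
print(n, suggested_bonferroni_factor(n))
```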

When you run the test program, it randomly chooses appropriate
10-sequence sets of SRP and RNase P orthologs, constructs MSAs from them
using ClustalW, shuffles and realigns the MSAs to generate controls, and
passes the MSAs to MSARi, which prints output about them as follows.
Lines beginning with "#" are comments describing the output for the
purposes of this README file.  Other lines are actual output.
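
Before the sample output, a brief aside on the control step.  One way
to shuffle an MSA is to apply a single random permutation to its
columns, so that every column keeps its contents but loses its
position.  The sketch below shows that idea, followed by the gap
removal needed before realigning.  It is an assumption about the
general technique, not MSARi's own test code; the function names are
hypothetical and the code assumes a current Python 3.

```python
import random


def shuffle_columns(aligned, seed=None):
    """Apply one random column permutation to every row of an alignment."""
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all rows of an alignment must have equal length")
    order = list(range(width))
    random.Random(seed).shuffle(order)
    return ["".join(row[k] for k in order) for row in aligned]


def strip_gaps(aligned):
    """Remove gap characters so the sequences can be realigned."""
    return [row.replace("-", "") for row in aligned]


# A made-up three-sequence alignment, used only for illustration.
msa = ["GGCA-UGCC", "GGGAAUCCC", "GCCA-UGGC"]
control = shuffle_columns(msa, seed=0)
unaligned = strip_gaps(control)
```

Because the same permutation is used for every row, the set of columns
is unchanged; only their order differs.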

# A list of the basepairing positions at which highly significant
# conservation of complementarity was observed.

[(4.648887255716372e-08, (87, 273)),
 (4.648887255716372e-08, (87, 274)),
 (7.3648055811144704e-08, (88, 272)),
 (6.706992502393583e-07, (86, 274)),
 (6.706992502393583e-07, (86, 275)),
 (1.1742805160278991e-06, (89, 271)),
 (1.1742805160278991e-06, (89, 272)),
 (6.7525551652239524e-06, (110, 257)),
 (1.1065241090421662e-05, (271, 289)),
 (1.2630476060459888e-05, (146, 292)),
 (1.890526033251962e-05, (147, 163)),
 (1.890526033251962e-05, (147, 164)),
 (1.9445171995935284e-05, (148, 161)),
 (1.9445171995935284e-05, (148, 162)),
 (1.9445171995935284e-05, (148, 163)),
 (2.9151082835306695e-05, (85, 275)),
 (2.9151082835306695e-05, (85, 276)),
 (2.9151082835306695e-05, (85, 277)),
 (3.719352546816023e-05, (146, 163)),
 (3.719352546816023e-05, (146, 164))]

# A list of the basepairing positions chosen by MSARi for its aggregate
# score.  These are greedily chosen from the above list so as not to
# form pseudoknots.

New pair likelihood
87 273 4.64888725572e-08
New pair likelihood
110 257 6.75255516522e-06
New pair likelihood
147 163 1.89052603325e-05
New pair likelihood
127 236 8.7403432487e-05

# The same analysis, for the control MSA constructed from the previous
# one by shuffling its columns.  Note that no basepairs are chosen for
# MSARi's aggregate score, as none are deemed to be significant enough.

[(0.00011720227243939334, (217, 239)),
 (0.00018042606433363268, (131, 173)),
 (0.00024637211912581714, (129, 174)),
 (0.00024637211912581714, (129, 176)),
 (0.00036603876620786452, (206, 292)),
 (0.00040230814431166644, (86, 107)),
 (0.00066283473494388478, (54, 192)),
 (0.00074123471495198738, (133, 171)),
 (0.00074123471495198738, (133, 172)),
 (0.00074123471495198738, (133, 173)),
 (0.0008298594717761873, (122, 228)),
 (0.00084932988694634006, (147, 166)),
 (0.00084932988694634006, (147, 167)),
 (0.00086640018032222278, (131, 174)),
 (0.00088792095160283478, (73, 234)),
 (0.00096097303955531362, (141, 161)),
 (0.00097856778543196671, (64, 83)),
 (0.00098208916126060969, (171, 219)),
 (0.001073218253118137, (133, 209)),
 (0.001073218253118137, (133, 210))]

# The aggregate scores of the two MSAs just analyzed.  These scores are
# recorded in the output_path file.  The MSAs generated are recorded in
# the msadir directory.


[5.187144008843207e-22, 1]
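
The greedy, pseudoknot-free selection mentioned in the sample output
above can be sketched as follows: take candidate pairs in order of
increasing probability, and keep a pair only if it shares no position
with, and does not cross, any pair already kept.  This is a minimal
illustration of that idea, not MSARi's actual code.  MSARi evidently
applies further criteria as well: in the output above it does not
select (88, 272), although that pair nests inside (87, 273) and this
sketch would accept it.  The function names are hypothetical and the
code assumes a current Python 3.

```python
def crosses(a, b):
    """True if pairs a and b interleave (i < k < j < l), forming a pseudoknot."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def greedy_nested_pairs(scored_pairs):
    """Greedily keep the most significant mutually compatible pairs.

    scored_pairs is a list of (probability, (index1, index2)) tuples in
    the format of the first list printed by MSARi.
    """
    chosen = []
    for _, pair in sorted(scored_pairs):
        used = {pos for c in chosen for pos in c}
        if pair[0] in used or pair[1] in used:
            continue  # a position can pair with at most one partner
        if any(crosses(pair, c) for c in chosen):
            continue  # would form a pseudoknot with a kept pair
        chosen.append(pair)
    return chosen


# A subset of the first list in the sample output above.
candidates = [
    (4.648887255716372e-08, (87, 273)),
    (4.648887255716372e-08, (87, 274)),
    (6.7525551652239524e-06, (110, 257)),
    (1.2630476060459888e-05, (146, 292)),
    (1.890526033251962e-05, (147, 163)),
]
selected = greedy_nested_pairs(candidates)
print(selected)
```

On this subset the sketch keeps (87, 273), (110, 257) and (147, 163):
(87, 274) is skipped because position 87 is already used, and
(146, 292) is skipped because it crosses (87, 273).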

