REAPR: Realignment for Prediction of Structural Non-coding RNA
Documentation
Please follow the Installation instructions to install a stand-alone
version of REAPR and the required software packages Vienna RNA, RNAz,
LocARNA, and (optionally) squid in a GNU/Linux (or other Unix-like)
environment.
REAPR runs with Python v2.7 or higher (see Usage). In REAPR, configure
the locations of the above software packages with
Command: python configure.py [-h] [--locarna-prefix LOCARNA_PREFIX]
                             [--rnaz-prefix RNAZ_PREFIX] [--alistat ALISTAT]
                             [--compalignp COMPALIGNP]
  --help            Display help message.
  --locarna-prefix  Installation directory of LocARNA package.
  --rnaz-prefix     Installation directory of RNAz package.
  --alistat         Path to alistat command from squid package.
Usage
Formatting the WGA before running REAPR:
(1) The WGA must be in MAF format. Each alignment block of the WGA must
be in a separate MAF file.
(2) Create a file with two columns separated by tabs. The first
column lists the names of the WGA blocks, and the second column lists
the locations of the corresponding MAF alignment files.
(3) Create a file with a list of all species in the WGA, separated by
newlines. The species names must match those used in MAF alignment
files.
(4) Create a file with a Newick-format tree, without branch lengths,
of all species. This tree is used as the guide for progressive
alignment by LocARNA.
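The input files described in steps (2) to (4) can be written by hand or
generated with a short script. The following is a minimal sketch, not
part of REAPR: the function names, the rule that a block is named after
its MAF file, and the example species names are all assumptions made for
illustration.

```python
import os
import re


def write_block_table(maf_paths, table_path):
    """Write the two-column, tab-separated table of step (2)."""
    with open(table_path, "w") as handle:
        for path in maf_paths:
            # Assumption: the block name is the file name without its extension.
            name = os.path.splitext(os.path.basename(path))[0]
            handle.write("%s\t%s\n" % (name, path))


def write_species_list(species, list_path):
    """Write the newline-separated species list of step (3)."""
    with open(list_path, "w") as handle:
        for name in species:
            handle.write(name + "\n")


def tree_leaf_names(newick):
    """Return the names found in a Newick string without branch lengths.

    Assumption: the tree has no internal node labels, so every name
    that appears in the string is a leaf (species) name.
    """
    return [token for token in re.split(r"[(),;\s]+", newick) if token]


def write_guide_tree(newick, species, tree_path):
    """Write the guide tree of step (4) after checking it covers all species."""
    if sorted(tree_leaf_names(newick)) != sorted(species):
        raise ValueError("guide tree and species list do not match")
    with open(tree_path, "w") as handle:
        handle.write(newick + "\n")
```

For example, `write_guide_tree("((hg19,mm9),canFam2);", ["hg19", "mm9", "canFam2"], "tree.newick")`
writes the tree only if it names exactly the listed species.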
Run REAPR by calling
python REAPR.py [options]
with options
  -h --help           Show this help message and exit.
  -a --alignments     Space-separated list of WGA alignment block files.
  -s --species        Space-separated list of species in WGA. Species names
                      must be the same as those listed in the alignment
                      block files.
  -g --guide-tree     Species guide tree (in Newick format, without branch
                      lengths) for progressive alignment by LocARNA.
  -o --output-folder  Directory to write output files. (Default: present
                      working directory)
  -d --delta          Space-separated list of realignment deviations.
                      (Default: 20)
  -t --threshold      Stability filter threshold. Filter out windows whose
                      mean MFE z-score is above this threshold. (Default: -1)
  -p --processes      Number of cores to use for multiprocessing. (Default: 1)
  -r --ram-disk       Location of RAM disk to write temporary files.
                      Specifying a RAM disk minimizes random access on
                      disk storage. This is highly recommended, as
                      REAPR will write many small files. (Note: this
                      is typically /dev/shm on Ubuntu and other Linux
                      systems.)
     --alistat        Compute sequence identities of alignments using alistat.
     --compalignp     Compute change between original alignment and realignment
                      using compalignp.
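When REAPR is launched from a script, the command line can be assembled
as an argument list. The sketch below only builds the list and uses the
option names documented above; the helper name build_reapr_command and
all example values are hypothetical, and the values are passed through
exactly as given, so whether an option expects a path or a literal value
should be checked against `python REAPR.py --help`.

```python
def build_reapr_command(alignments, species, guide_tree, output_folder=None,
                        deltas=None, threshold=None, processes=None,
                        ram_disk=None, alistat=False, compalignp=False):
    """Assemble an argument list for REAPR.py from the documented options."""
    cmd = ["python", "REAPR.py"]
    cmd += ["-a"] + [str(a) for a in alignments]
    cmd += ["-s"] + [str(s) for s in species]
    cmd += ["-g", str(guide_tree)]
    # Optional settings are added only when given, so REAPR's own
    # defaults apply otherwise.
    if output_folder is not None:
        cmd += ["-o", str(output_folder)]
    if deltas:
        cmd += ["-d"] + [str(d) for d in deltas]
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    if processes is not None:
        cmd += ["-p", str(processes)]
    if ram_disk is not None:
        cmd += ["-r", str(ram_disk)]
    if alistat:
        cmd.append("--alistat")
    if compalignp:
        cmd.append("--compalignp")
    return cmd
```

The resulting list can be handed to `subprocess.call` from the REAPR
directory, which avoids shell quoting problems with the space-separated
arguments.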
Output
REAPR will generate the following files in the folder specified with
--output-folder:
(1) A 'wga' folder containing the resulting files of an RNAz screen on the
original WGA.
(2) A table 'original_wga.tab' containing a summary of the RNAz screen
on the original WGA.
(3) A 'loci' folder containing the resulting files from realigning loci and
running an RNAz screen on the realignments.
(4) For every delta specified for --delta, a table 'locarna.g..tab'
containing a summary of the RNAz screen on the realignments.
(5) A table 'summary.tab' containing a summary of REAPR. It includes
the RNAz score of every locus based on its alignment in the
original WGA and after realignment. If --alistat is specified, it
also includes the sequence identities, computed by alistat, of the
loci in the original WGA and after realignment. If --compalignp is
specified, it also includes how different the realignment is from
the original, computed using compalignp.
The first line of every table contains a header describing every column.
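Because each table starts with a header line, it can be loaded into
records keyed by column name. The following is a minimal sketch that
assumes the tables are tab-separated (suggested by the '.tab' extension
but not stated above); the function name read_table is hypothetical, and
the actual column names should be taken from the header of the file
itself.

```python
import csv


def read_table(path):
    """Read a REAPR output table into a header list and a list of row dicts.

    Assumption: columns are separated by tabs. All values are kept as
    strings; convert scores to float where needed.
    """
    with open(path) as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)  # first line describes every column
        rows = [dict(zip(header, row)) for row in reader]
    return header, rows
```

Calling `header, rows = read_table("summary.tab")` then gives the column
names in `header` and one dictionary per locus in `rows`.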