This page explains the meaning of the keywords and features used on our website.
The type refers to the functional class of an RNA, as annotated in the source database. For example, transfer RNA (tRNA) and transfer messenger RNA (tmRNA) are two different types of RNA. Each individual sequence in our database has an RNA type if one is available from the original source. Note: some types are duplicated because several databases were merged; we intend to fix this in an update.
The reference database is the original database from which our data was downloaded. We downloaded our data from 7 different sources: The Comparative RNA Web site (CRW), tmRNA Database (tmRNA), Signal Recognition Particle Database (SRP), Sprinzl tRNA Database (SPR), The RNase P Database (RNP), The RNA Family Database (RFAM), and RCSB Protein Data Bank (PDB). You can determine the source of each individual RNA from its bpRNA ID. For example, bpRNA_CRW_1 shows that this RNA comes from the CRW database and is RNA number 1 in our list for that reference database.
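As an illustration of the ID convention above, the following minimal Python sketch splits a bpRNA ID into its source and number. The function name `parse_bprna_id` and the assumption that every ID has exactly the form `bpRNA_<source>_<number>` are ours; only the abbreviations come from the list above, and their exact spelling inside real IDs is not confirmed here.

```python
# Abbreviations as listed in the text above. That real IDs use exactly
# these spellings is an assumption made for illustration.
SOURCES = {
    "CRW": "Comparative RNA Web site",
    "tmRNA": "tmRNA Database",
    "SRP": "Signal Recognition Particle Database",
    "SPR": "Sprinzl tRNA Database",
    "RNP": "RNase P Database",
    "RFAM": "RNA Family Database",
    "PDB": "RCSB Protein Data Bank",
}


def parse_bprna_id(bprna_id):
    """Split an ID such as 'bpRNA_CRW_1' into (abbreviation, source name, number)."""
    # Unpacking raises ValueError if the ID does not have exactly three parts.
    prefix, source, number = bprna_id.split("_")
    if prefix != "bpRNA":
        raise ValueError(f"not a bpRNA ID: {bprna_id!r}")
    if source not in SOURCES:
        raise ValueError(f"unknown reference database: {source!r}")
    return source, SOURCES[source], int(number)


print(parse_bprna_id("bpRNA_CRW_1"))  # ('CRW', 'Comparative RNA Web site', 1)
```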
This field shows the sequence length in number of nucleotides for each RNA in our database.
Each individual RNA has its own sequence. In the Advanced Search, you can search the whole database for a specific sequence or partial sequence and find the list of similar sequences.
Most sequences in our database have a specified taxonomic domain that was extracted from the NCBI Taxonomy database along with the phylogenetic lineage. In some cases the domain could not be extracted from the reference database; these are simply set to "NA". Otherwise, the domains are listed as Archaea, Bacteria, Eukaryota, and Viruses.
This field describes the organism from which the RNA is derived. Similar to Phylogenetic Classes, for some RNAs in our database the Organism is Not Available (NA) because it could not be extracted from the original database. You can search RNAs by the desired Organism in the Advanced Search.
The method is the methodology that was used to generate a given RNA secondary structure. All sources except PDB used "Comparative Sequence Analysis" to derive the RNA secondary structures. Structures extracted from the PDB database are derived from experimental techniques such as X-ray diffraction and NMR spectroscopy, which are specified in this field.
This field shows the technique used to validate the structure of an RNA. For the PDB database, NMR and X-Ray are the two validation techniques. For the RFAM data, the validation can be based on the published article or "predicted" along with the prediction algorithm. Lastly, if the validation technique is not mentioned in the reference data, this value is set to "Unknown."
Some RNAs in our database have a PubMed ID, which specifies the manuscript ID of an article related to that specific RNA.
The PubMed Central ID (PMCID) is a unique identifier for PMC articles and is available for some sequences in bpRNA.
(single letter code: B)
Bulges are defined as an unpaired sequence on one strand that is flanked on both sides by base pairs whose partners are adjacent on the other strand. More formally, the bulge is flanked by base pairs at (i,j) and (i+1,k), where positions j and k lie on either side of the bulge. Several bulge-related features can be searched: the number of bulges, the length of a bulge loop, the bulge loop sequence, and whether a bulge is involved in pseudoknot formation. To search for a specific bulge loop length, users can choose among three options: between two lengths, greater than or equal to some value, or less than or equal to some value. Similar options are available for searching by the number of bulge loops in each RNA.
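The formal definition above can be illustrated with a short Python sketch. It is not the bpRNA implementation: the function names `pair_table` and `find_bulges` are hypothetical, the input is assumed to be a pseudoknot-free dot-bracket string using only `(`, `)`, and `.`, and positions are 1-based.

```python
def pair_table(dot_bracket):
    """Map each paired 1-based position to its partner (parentheses only)."""
    partner, stack = {}, []
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opener = stack.pop()
            partner[opener] = pos
            partner[pos] = opener
    return partner


def find_bulges(dot_bracket):
    """Return (start, stop) of each bulge, 1-based and inclusive."""
    partner = pair_table(dot_bracket)
    bulges = []
    for i, j in partner.items():
        if i > j:
            continue  # visit each pair once, with i as the 5' position
        # Case 1: (i, j) and (i+1, k) are pairs; the bulge lies between k and j.
        k = partner.get(i + 1)
        if k is not None and i + 1 < k < j - 1:
            if all(p not in partner for p in range(k + 1, j)):
                bulges.append((k + 1, j - 1))
        # Case 2: mirror image; (h, j-1) is a pair and the bulge lies between i and h.
        h = partner.get(j - 1)
        if h is not None and i + 1 < h < j - 1:
            if all(p not in partner for p in range(i + 1, h)):
                bulges.append((i + 1, h - 1))
    return sorted(bulges)


print(find_bulges("((((...)).))"))   # [(10, 10)]
print(find_bulges("((..((...))))"))  # [(3, 4)]
```

In the first example the flanking base pairs are (2,11) and (3,9), so the bulge is the single unpaired position 10 between k=9 and j=11.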
(single letter code: H)
Hairpin loops are defined as unpaired sequences that are flanked by a base pair. More formally, the hairpin loop associated with a base pair at (i,j) has a sequence from position i+1 to j-1. The searchable features related to hairpin loops are the number of loops, the length of the loop, the hairpin loop sequence, and whether the loop is associated with pseudoknot formation. The search offers several options for finding hairpins by size and by the number of hairpins in an RNA secondary structure.
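A minimal Python sketch of this definition is shown below. It assumes a pseudoknot-free dot-bracket string with 1-based positions; the function name `hairpin_loops` is hypothetical, and skipping closing pairs that enclose no unpaired bases is a choice made only for this sketch.

```python
def hairpin_loops(sequence, dot_bracket):
    """Return (start, stop, loop_sequence) for each hairpin loop, 1-based inclusive."""
    stack, loops = [], []
    for j, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            # 1-based positions i+1 .. j-1 correspond to the 0-based slice [i:j-1].
            inner = dot_bracket[i:j - 1]
            if inner and set(inner) == {"."}:
                loops.append((i + 1, j - 1, sequence[i:j - 1]))
    return loops


print(hairpin_loops("GGCGAAAGCC", "(((....)))"))  # [(4, 7, 'GAAA')]
```

Here the closing base pair is (3,8), so the loop runs from position 4 to position 7.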
(single letter code: M)
Multiloops, also called multi-branched loops, are 3 or more unpaired sequences that form a cycle connected by their flanking base pairs. The searchable features related to multiloops are the length and the number of multiloops, the multiloop sequence, and whether the loop is associated with pseudoknot formation. Users can search by length range and by the number of loops of this type in the RNA secondary structures of bpRNA. Multiloops are presented in order from 5' to 3'.
Note: Multiloops with length zero exist and can be searched using the advanced search options. In structure type files and in bpRNA, the start and stop positions for loops of length 0 are presented in reverse order. In other words, if there is a multiloop of length 0 between positions 21 and 22, then the start and stop for the multiloop are presented as start=22, stop=21. The advantage of this is that it preserves the normal equation for the length of a loop, length=stop-start+1. In this example, the length is then 21-22+1=0.
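The length convention in this note can be checked with a few lines of Python. The helper names `loop_length` and `empty_loop_coordinates` are hypothetical and exist only for this illustration.

```python
def loop_length(start, stop):
    """Loop length under the convention length = stop - start + 1."""
    return stop - start + 1


def empty_loop_coordinates(position_before):
    """Coordinates for a length-0 loop that sits right after `position_before`.

    Hypothetical helper: start is one past stop, so the length formula gives 0.
    """
    return position_before + 1, position_before


start, stop = empty_loop_coordinates(21)
print(start, stop, loop_length(start, stop))  # 22 21 0
print(loop_length(5, 9))                      # 5
```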
(single letter code: I)
Internal loops are defined as a pair of unpaired sequences in between two base pairs. The searchable features related to internal loops are the number of internal loops in an RNA structure, the length of an internal loop, and whether the internal loop contributes to pseudoknot formation. The length and number of occurrences can be searched by a range (between, greater than or equal to, or less than or equal to some value(s)).
(single letter code: K)
Pseudoknots exist in a structure when there are base pairs (i,j) and (i',j') such that i<i'<j<j' or i'<i<j'<j. We define the pseudoknots in bpRNA as the minimum set of base pairs that, when removed, produce a pseudoknot-free structure. The number of pseudoknots in an RNA secondary structure, the number of base pairs involved in a pseudoknot, and the loop types are searchable.
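The crossing condition above translates directly into code. The Python sketch below only tests whether base pairs cross; it does not compute the minimum set of base pairs to remove, which is a harder problem and is not attempted here. The function names `pairs_cross` and `crossing_pairs` are hypothetical.

```python
from itertools import combinations


def pairs_cross(pair_a, pair_b):
    """True if the two base pairs satisfy i < i' < j < j' or i' < i < j' < j."""
    i, j = sorted(pair_a)
    k, l = sorted(pair_b)
    return i < k < j < l or k < i < l < j


def crossing_pairs(base_pairs):
    """List every pair of base pairs that cross each other."""
    return [(a, b) for a, b in combinations(base_pairs, 2) if pairs_cross(a, b)]


print(pairs_cross((1, 10), (5, 15)))  # True: 1 < 5 < 10 < 15
print(pairs_cross((1, 10), (2, 9)))   # False: nested
print(crossing_pairs([(1, 10), (2, 9), (5, 15)]))
# [((1, 10), (5, 15)), ((2, 9), (5, 15))]
```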
There are two types of exterior features: dangling ends and external loops.
(single letter code: E)
The first type of exterior feature is "dangling ends", or just "ends", which are the unpaired bases at the beginning or at the end of the sequence.
(single letter code: X)
The other type of exterior feature is "external loops", which are similar to multiloop branches, except they do not form a closed cycle connected by common flanking base pairs. The number of bases in the exterior regions, as well as their length and sequence, are searchable in bpRNA.
(single letter code: S)
A stem is defined as a stretch of neighboring base pairs, not interrupted by bulges, internal loops, or any unpaired nucleotides. The number of stems and the length of stems in an RNA secondary structure can be searched using the advanced search. Moreover, the 5' and the 3' sequences are searchable across the bpRNA data.
Note: The length of a stem is simply the number of base pairs in the stem.
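As a sketch of the stem definition and its length, the Python snippet below groups base pairs into stems, treating (i,j) and (i+1,j-1) as neighboring base pairs. The function name `find_stems` is hypothetical, and the input is assumed to be a list of (i,j) tuples with i<j; this is an illustration, not the bpRNA implementation.

```python
def find_stems(base_pairs):
    """Group base pairs into stems: runs where (i, j) is followed by (i+1, j-1)."""
    stems = []
    # Sorting by i places the pair (i-1, j+1), if present, directly before (i, j).
    for i, j in sorted(base_pairs):
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems


# Base pairs of the structure "((..((...))))".
stems = find_stems([(1, 13), (2, 12), (5, 11), (6, 10)])
print(stems)                         # [[(1, 13), (2, 12)], [(5, 11), (6, 10)]]
print([len(stem) for stem in stems])  # [2, 2]
```

The unpaired positions 3 and 4 interrupt the run, so the four base pairs form two stems of length 2 each.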
A segment is defined as a stretch of neighboring base pairs, but unlike a stem, a segment can contain bulges and internal loops. A segment cannot contain a multiloop. The searchable features related to segments are the number of segments per RNA and the number of base pairs in each segment; both are accessible using the advanced search option.
Each RNA secondary structure consists of one or more segments that are color-coded in this image. We alternate between the 9 colors listed below to represent the segments in each structure. PK segments are color-coded in gray (color number 10).
1. AB
2. AB
3. AB
4. AB
5. AB
6. AB
7. AB
8. AB
9. AB
10. AB
RNA secondary structures can be color-coded by structure type, such as hairpins, bulges, and multiloops. Each structure type is represented in a different color (see below):
1. AB Stems (S)
2. AB Hairpin Loops (H)
3. AB Bulges (B)
4. AB Internal Loops (I)
5. AB Multiloops (M)
6. AB External Loops (X)
7. AB Dangling Ends (E)
The linear representation of the RNA secondary structures clearly shows the various page numbers across the sequence. Page number 1 is a nested representation (no pseudoknot). The page number goes up if there are multiple crossing pseudoknots along with the nested structures. Each page number is color-coded in this representation (see below):
AB Page Number 1
AB Page Number 2
AB Page Number 3
AB Page Number 4
AB Page Number 5
AB Page Number 6
AB Page Number 7