MultipleAlignment-class {Biostrings} | R Documentation |
MultipleAlignment objects
Description
The MultipleAlignment class is a container for storing multiple sequence alignments.
Usage
## Constructors:
DNAMultipleAlignment(x=character(), start=NA, end=NA, width=NA,
    use.names=TRUE, rowmask=NULL, colmask=NULL)
RNAMultipleAlignment(x=character(), start=NA, end=NA, width=NA,
    use.names=TRUE, rowmask=NULL, colmask=NULL)
AAMultipleAlignment(x=character(), start=NA, end=NA, width=NA,
    use.names=TRUE, rowmask=NULL, colmask=NULL)
## Read functions:
readDNAMultipleAlignment(filepath, format)
readRNAMultipleAlignment(filepath, format)
readAAMultipleAlignment(filepath, format)
## Write functions:
write.phylip(x, filepath)
## ... and more (see below)
Arguments
x |
Either a character vector (with no NAs), or an XString, XStringSet or XStringViews object containing strings with the same number of characters. When writing a Phylip file with write.phylip, x must be a MultipleAlignment object. |
start , end , width |
Either |
use.names |
|
filepath |
A character vector (of arbitrary length when reading, of length 1
when writing) containing the paths to the files to read or write.
Note that special values like |
format |
Either |
rowmask |
a NormalIRanges object that will set masking for rows |
colmask |
a NormalIRanges object that will set masking for columns |
Details
The MultipleAlignment class is designed to hold and represent multiple sequence alignments. The rows and columns within an alignment can be masked for ad hoc analyses.
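As a minimal sketch of the container described above, the following builds a small alignment from a named character vector and then masks one row and two columns. The sequences and their names are made up for illustration; the calls are the constructor and accessors documented on this page, plus IRanges() from the IRanges package, which is attached together with Biostrings.

```r
library(Biostrings)

# Toy alignment: three made-up sequences of equal width (illustrative only)
seqs <- c(seq1 = "ACGT-ACGTA",
          seq2 = "ACGTTACG-A",
          seq3 = "AC-TTACGTA")
aln <- DNAMultipleAlignment(seqs)
dim(aln)        # number of sequences, number of alignment columns

# Mask the first row and columns 5 to 6; 'append' is passed explicitly
# so the sketch does not rely on its default value
rowmask(aln, append = "replace") <- IRanges(start = 1, end = 1)
colmask(aln, append = "replace") <- IRanges(start = 5, end = 6)

maskeddim(aln)  # masked rows, masked columns
nchar(aln)      # unmasked columns, i.e. ncol(aln) - maskedncol(aln)
```

Masking leaves the stored sequences unchanged: unmasked(aln) still returns all three rows at full width.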
Accessor methods
In the code snippets below, x is a MultipleAlignment object.
- unmasked(x): The underlying XStringSet object containing the multiple sequence alignment.
- rownames(x): NULL or a character vector of the same length as x containing a short user-provided description or comment for each sequence in x.
- rowmask(x), rowmask(x, append, invert) <- value: Gets and sets the NormalIRanges object representing the masked rows in x. The append argument takes "union", "replace" or "intersect" to indicate how to combine the new value with rowmask(x). The invert argument takes a logical value indicating whether or not to invert the new mask. The value argument can be of any class that is coercible to a NormalIRanges via the as function.
- colmask(x), colmask(x, append, invert) <- value: Gets and sets the NormalIRanges object representing the masked columns in x. The append argument takes "union", "replace" or "intersect" to indicate how to combine the new value with colmask(x). The invert argument takes a logical value indicating whether or not to invert the new mask. The value argument can be of any class that is coercible to a NormalIRanges via the as function.
- maskMotif(x, motif, min.block.width=1, ...): Returns a MultipleAlignment object with a modified column mask based upon motifs found in the consensus string, where the consensus string keeps all the columns but drops the masked rows.
    - motif: The motif to mask.
    - min.block.width: The minimum width of the blocks to mask.
    - ...: Additional arguments for matchPattern.
- maskGaps(x, min.fraction, min.block.width): Returns a MultipleAlignment object with a modified column mask based upon gaps in the columns. In particular, this mask is defined by min.block.width or more consecutive columns that have min.fraction or more of their non-masked rows containing gap codes.
    - min.fraction: A value in [0, 1] that indicates the minimum fraction needed to call a gap in the consensus string (default is 0.5).
    - min.block.width: A positive integer that indicates the minimum number of consecutive gaps to mask, as defined by min.fraction (default is 4).
- nrow(x): Returns the number of sequences aligned in x.
- ncol(x): Returns the number of characters in each aligned sequence in x, i.e. the number of alignment columns.
- dim(x): Equivalent to c(nrow(x), ncol(x)).
- maskednrow(x): Returns the number of masked aligned sequences in x.
- maskedncol(x): Returns the number of masked aligned characters in x.
- maskeddim(x): Equivalent to c(maskednrow(x), maskedncol(x)).
- maskedratio(x): Equivalent to maskeddim(x) / dim(x).
- nchar(x): Returns the number of unmasked aligned characters in x, i.e. ncol(x) - maskedncol(x).
- alphabet(x): Equivalent to alphabet(unmasked(x)).
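The way append and invert combine a new mask with an existing one, and the column rule applied by maskGaps, can be sketched on toy data. The sequences below are made up for illustration, and append is always passed explicitly so that the sketch does not depend on its default.

```r
library(Biostrings)

# Five made-up sequences of equal width (illustrative only)
aln <- DNAMultipleAlignment(c(s1 = "ACGTAC", s2 = "ACGTAC", s3 = "ACG-AC",
                              s4 = "AC--AC", s5 = "ACGTAC"))

# "replace": the new value becomes the row mask (rows 1-2)
rowmask(aln, append = "replace") <- IRanges(1, 2)

# "union": rows 1-2 already masked, row 4 is added
rowmask(aln, append = "union") <- IRanges(4, 4)
n_union <- maskednrow(aln)

# "intersect": keep only masked rows that also fall in rows 2-5 (rows 2 and 4)
rowmask(aln, append = "intersect") <- IRanges(2, 5)
n_intersect <- maskednrow(aln)

# invert = TRUE: mask everything except rows 2-3 (rows 1, 4 and 5)
rowmask(aln, append = "replace", invert = TRUE) <- IRanges(2, 3)
n_inverted <- maskednrow(aln)

# maskGaps: columns 5-8 are gaps in 3 of 4 rows (fraction 0.75) and form a
# block of 4 consecutive columns, so they meet both thresholds
gappy <- DNAMultipleAlignment(c(g1 = "ACGT----ACGT", g2 = "ACGT----ACGT",
                                g3 = "ACGT----ACGT", g4 = "ACGTACGTACGT"))
gapMasked <- maskGaps(gappy, min.fraction = 0.5, min.block.width = 4)
colmask(gapMasked)
```

Raising min.block.width above 4 in this sketch should leave the column mask empty, since the gap block is only four columns wide.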
Coercion
In the code snippets below, x (called from in the as() calls) is a MultipleAlignment object.
- as(from, "DNAStringSet"), as(from, "RNAStringSet"), as(from, "AAStringSet"), as(from, "BStringSet"): Creates an instance of the specified XStringSet object subtype that contains the unmasked regions of the multiple sequence alignment in from.
- as.character(x, use.names): Converts x to a character vector containing the unmasked regions of the multiple sequence alignment. use.names controls whether or not rownames(x) should be used to set the names of the returned vector (default is TRUE).
- as.matrix(x, use.names): Returns a character matrix containing the "exploded" representation of the unmasked regions of the multiple sequence alignment. use.names controls whether or not rownames(x) should be used to set the row names of the returned matrix (default is TRUE).
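A short sketch of how the coercions above treat masks: masked rows and masked columns are both left out of the result. The toy sequences are made up for illustration.

```r
library(Biostrings)

# Three made-up sequences of equal width (illustrative only)
aln <- DNAMultipleAlignment(c(s1 = "ACGTAC", s2 = "ACGAAC", s3 = "ACG-AC"))
rowmask(aln, append = "replace") <- IRanges(3, 3)  # mask row 3
colmask(aln, append = "replace") <- IRanges(1, 2)  # mask columns 1-2

chr <- as.character(aln)        # one string per unmasked row
mat <- as.matrix(aln)           # one letter per cell, unmasked rows and columns
dss <- as(aln, "DNAStringSet")  # same regions as an XStringSet

chr
dim(mat)
width(dss)
```

To recover the full, unmasked sequences, use unmasked(aln) rather than a coercion.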
Utilities
In the code snippets below, x is a MultipleAlignment object.
- consensusMatrix(x, as.prob, baseOnly): Creates an integer matrix containing the column frequencies of the underlying alphabet, with masked columns being represented with NA values. If as.prob is TRUE, then probabilities are reported, otherwise counts are reported (the default). If baseOnly is TRUE, then the non-base letters are collapsed into an "other" category.
- consensusString(x, ...): Creates a consensus string for x with the symbol "#" representing a masked column. See consensusString for details on the arguments.
- consensusViews(x, ...): Similar to the consensusString method. It returns an XStringViews on the consensus string containing subsequence contigs of non-masked columns. Unlike the consensusString method, the masked columns in the underlying string contain a consensus value rather than the "#" symbol.
- alphabetFrequency(x, as.prob, collapse): Creates an integer matrix containing the row frequencies of the underlying alphabet. If as.prob is TRUE, then probabilities are reported, otherwise counts are reported (the default). If collapse is TRUE, then returns the overall frequency instead of the frequency by row.
- detail(x, invertColMask, hideMaskedCols): Allows for a full pager-driven display of the object so that masked columns and rows can be removed and the entire sequence can be visually inspected. If hideMaskedCols is set to its default value of TRUE, then the output will hide all the masked columns. Otherwise, all columns will be displayed along with a row to indicate the masking status. If invertColMask is TRUE, then any displayed mask will be flipped so as to represent things in a way consistent with Phylip-style files instead of the mask that is actually stored in the MultipleAlignment object. Note that invertColMask is ignored if hideMaskedCols is set to its default value of TRUE, since in that case it does not make sense to show any masking information in the output. Masked rows are always hidden in the output.
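The effect of a column mask on the consensus utilities can be sketched as follows. The toy alignment is made up for illustration; it has no gaps and a single masked column.

```r
library(Biostrings)

# Three made-up sequences of equal width (illustrative only)
aln <- DNAMultipleAlignment(c(s1 = "ACGTAC", s2 = "ACGTAC", s3 = "ACGAAC"))
colmask(aln, append = "replace") <- IRanges(4, 4)  # mask column 4

# Column counts; the masked column is reported as NA
cm <- consensusMatrix(aln, baseOnly = TRUE)
cm[, 3:5]

# Consensus string; the masked column is shown as "#"
cs <- consensusString(aln)
cs

# Views on the consensus covering the runs of non-masked columns
consensusViews(aln)
```

Because the masked column holds NA in the consensus matrix, downstream summaries such as colSums() need to account for missing values (for example with na.rm=TRUE).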
Display
The letters in a DNAMultipleAlignment or RNAMultipleAlignment object
are colored when displayed by the show()
method. Set global
option Biostrings.coloring
to FALSE to turn off this coloring.
Author(s)
P. Aboyoun and M. Carlson
See Also
XStringSet-class, MaskedXString-class
Examples
## create an object from file
origMAlign <-
  readDNAMultipleAlignment(filepath =
                             system.file("extdata",
                                         "msx2_mRNA.aln",
                                         package="Biostrings"),
                           format="clustal")
## list the names of the sequences in the alignment
rownames(origMAlign)
## rename the sequences to be the underlying species for MSX2
rownames(origMAlign) <- c("Human","Chimp","Cow","Mouse","Rat",
"Dog","Chicken","Salmon")
origMAlign
## See a detailed pager view
if (interactive()) {
detail(origMAlign)
}
## operations to mask rows
## For columns, just use colmask() and do the same kinds of operations
rowMasked <- origMAlign
rowmask(rowMasked) <- IRanges(start=1,end=3)
rowMasked
## remove row masks
rowmask(rowMasked) <- NULL
rowMasked
## "select" rows of interest
rowmask(rowMasked, invert=TRUE) <- IRanges(start=4,end=7)
rowMasked
## or mask the rows that intersect with masked rows
rowmask(rowMasked, append="intersect") <- IRanges(start=1,end=5)
rowMasked
## TATA-masked
tataMasked <- maskMotif(origMAlign, "TATA")
colmask(tataMasked)
## automatically mask columns based on consecutive gaps
autoMasked <- maskGaps(origMAlign, min.fraction=0.5, min.block.width=4)
colmask(autoMasked)
autoMasked
## calculate frequencies
alphabetFrequency(autoMasked)
consensusMatrix(autoMasked, baseOnly=TRUE)[, 84:90]
## get consensus values
consensusString(autoMasked)
consensusViews(autoMasked)
## cluster the masked alignments
sdist <- stringDist(as(autoMasked,"DNAStringSet"), method="hamming")
clust <- hclust(sdist, method = "single")
plot(clust)
fourgroups <- cutree(clust, 4)
fourgroups
## write out the alignment object (with current masks) to Phylip format
write.phylip(x = autoMasked, filepath = tempfile("foo.txt",tempdir()))