pid {Biostrings} | R Documentation
Percent Sequence Identity
Description
Calculates the percent sequence identity for a pairwise sequence alignment.
Usage
pid(x, type="PID1")
Arguments
x | a PairwiseAlignments object.
type | the type of percent sequence identity to compute. One of "PID1", "PID2", "PID3", or "PID4"; see Details.
Details
Since there is no universal definition of percent sequence identity, the pid function calculates this statistic according to one of the following definitions, selected with the type argument:
"PID1": 100 * (identical positions) / (aligned positions + internal gap positions)
"PID2": 100 * (identical positions) / (aligned positions)
"PID3": 100 * (identical positions) / (length shorter sequence)
"PID4": 100 * (identical positions) / (average length of the two sequences)
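To make the difference between the four denominators concrete, the following base R sketch applies the formulas above to two gapped alignment strings. It is an illustration only, not the Biostrings implementation: pid_sketch, aln1, and aln2 are hypothetical names. It assumes a global alignment given as two equal-length strings with "-" as the gap character, that every gap column counts as an internal gap, and that "aligned positions" means columns with a residue in both sequences.

```r
# Hypothetical helper (not part of Biostrings): percent identity from two
# gapped alignment strings of equal length, using "-" as the gap character.
pid_sketch <- function(aligned1, aligned2,
                       type = c("PID1", "PID2", "PID3", "PID4")) {
  type <- match.arg(type)
  a <- strsplit(aligned1, "")[[1]]
  b <- strsplit(aligned2, "")[[1]]
  stopifnot(length(a) == length(b))

  gap <- a == "-" | b == "-"           # columns with a gap in either row
  identical_pos <- sum(a == b & !gap)  # identical residues
  aligned_pos <- sum(!gap)             # matches plus mismatches
  gap_pos <- sum(gap)                  # assumed to be internal gaps
  len1 <- sum(a != "-")                # ungapped length of sequence 1
  len2 <- sum(b != "-")                # ungapped length of sequence 2

  denominator <- switch(type,
    PID1 = aligned_pos + gap_pos,
    PID2 = aligned_pos,
    PID3 = min(len1, len2),
    PID4 = (len1 + len2) / 2
  )
  100 * identical_pos / denominator
}

# Toy alignment: 7 identical columns, 1 mismatch, 3 gap columns.
aln1 <- "ACGT--ACGTA"
aln2 <- "ACGTTTAC-TC"
sapply(c("PID1", "PID2", "PID3", "PID4"),
       function(t) pid_sketch(aln1, aln2, type = t))
```

On the same alignment the four definitions give noticeably different values, which is why the type should always be reported alongside the number.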
Value
A numeric vector containing the specified sequence identity measures.
Author(s)
P. Aboyoun
References
A. May, Percent Sequence Identity: The Need to Be Explicit, Structure 2004, 12(5):737.
G. Raghava and G. Barton, Quantification of the variation in percentage identity for protein sequence alignments, BMC Bioinformatics 2006, 7:415.
See Also
pairwiseAlignment, PairwiseAlignments-class, match-utils
Examples
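library(Biostrings)

# Two DNA sequences of different lengths
s1 <- DNAString("AGTATAGATGATAGAT")
s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

# Align with the default scoring and report the default measure ("PID1")
palign1 <- pairwiseAlignment(s1, s2)
palign1
pid(palign1)

# Align with a custom substitution matrix and report "PID4"
palign2 <-
  pairwiseAlignment(s1, s2,
    substitutionMatrix =
      nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
palign2
pid(palign2, type = "PID4")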