Specify an algorithm for SPOA
Align sequences using SPOA (and optionally get the consensus)
SpoaAlgo( algorithm = c("local", "global", "semi.global"), m = 5L, n = -4L, g = -8L, e = g, q = g, c = q ) spoaAlign(seq, algo = SpoaAlgo(), ...) # S3 method for character spoaAlign(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...) # S3 method for XStringSet spoaAlign(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...) # S3 method for QualityScaledXStringSet spoaAlign(seq, algo = SpoaAlgo(), w = 1, ...) # S3 method for ShortReadQ spoaAlign(seq, algo = SpoaAlgo(), w = 1, ...) # S3 method for derep spoaAlign(seq, algo = SpoaAlgo(), ...) spoaConsensus(seq, algo = SpoaAlgo(), ...) # S3 method for character spoaConsensus(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...) # S3 method for XStringSet spoaConsensus(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...) # S3 method for ShortRead spoaConsensus(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...) # S3 method for QualityScaledXStringSet spoaConsensus(seq, algo = SpoaAlgo(), w = 1, ...) # S3 method for ShortReadQ spoaConsensus(seq, algo = SpoaAlgo(), w = 1, ...) # S3 method for derep spoaConsensus(seq, algo = SpoaAlgo(), ...)
algorithm | ( |
---|---|
m | (non-negative |
n | (non-positive |
g | (non-positive |
e | (non-positive |
q | (non-positive |
c | (non-positive |
seq | ( |
algo | ( |
... | additional parameters passed to methods |
w | ( |
qual | ( |
an object of class "SpoaAlgo
", suitable for passing to
spoaAlign()
or spoaConsensus()
For spoaConsensus()
, either a character
string or the
appropriate Biostrings::XString
, depending on the class of seq
.
For spoaAlign()
, either a
character
vector or a Biostrings::MultipleAlignment
, depending on
the class of seq
. If seq
is a BStringSet
(i.e., an XStringSet
which
is not specifically DNA, RNA, or AA) then the result is also a BStringSet
,
since there is no corresponding "BMultipleAlignment
" class. If seq
is a
ShortRead::ShortRead
object, the output is a
Biostrings::DNAMultipleAlignment
.
The general (convex) gap penalty formula is:
min(g + (i - i) * e, q + (i - 1) * c)
For q == g
and c == e
(as is the case unless q
or c
is explicitly
set), this simplifies to the affine gap penalty:
g + (i - 1) * e
If, additionally, e == g
(the default), then this is the linear gap
penalty:
g * i
Assigning a weight w to a sequence is equivalent to including that sequence w times; this is primarily useful for dereplicated sequences, where all sequences are unique, and the weight represents their multiplicity.
Note that POA is not order-independent, so results may differ when a set of
non-unique sequences is dereplicated. For the most predictable results, it
is recommended to sort dereplicated sequences in decreasing order of
abundance (as in dada2::derepFastq()
).
If seq
is an object which includes quality scores
(Biostrings::QualityScaledXStringSet
, ShortRead::ShortReadQ
,
or dada2::derep
)
then the alignment consensus is weighted using the quality scores. The method
used by SPOA uses the integer quality scores as per-position weights
analogous to sequence weights; i.e., if two different bases align in the same
position, one in a single sequence with quality score 40 and the other in a
single sequence with quality score 20, then this is equivalent to the first
base occurring at that position in 40 sequences without quality scores, and
the second base occurring at that position in 20 sequences without quality
scores.
It is possible to use a combination of sequence weights and quality scores.
In this case, in order for a dereplicated sequence set to give the same
results as the non-dereplicated sequences, the quality score for each of the
dereplicated sequences should be the mean of the quality scores for the
corresponding non-dereplicated sequences. This is the method used to generate
quality scores for dereplicated sequences in dada2::derepFastq()
.
sequences <- c( "CATAAAAGAACGTAGGTCGCCCGTCCGTAACCTGTCGGATCACCGGAAAGGACCCGTAAAGTGATAATGAT", "ATAAAGGCAGTCGCTCTGTAAGCTGTCGATTCACCGGAAAGATGGCGTTACCACGTAAAGTGATAATGATTAT", "ATCAAAGAACGTGTAGCCTGTCCGTAATCTAGCGCATTTCACACGAGACCCGCGTAATGGG", "CGTAAATAGGTAATGATTATCATTACATATCACAACTAGGGCCGTATTAATCATGATATCATCA", "GTCGCTAGAGGCATCGTGAGTCGCTTCCGTACCGCAAGGATGACGAGTCACTTAAAGTGATAAT", "CCGTAACCTTCATCGGATCACCGGAAAGGACCCGTAAATAGACCTGATTATCATCTACAT" ) spoaAlign(sequences)#> [1] "CATAAAAGAACG------T---------AGGTCGCCCGTCCGTAACC-T-GTCG-G-ATCACCGG-AA-A--G-G--A-CC-CGTAAAGTGATAATGAT-------------" #> [2] "------------------------ATAAAGGCAGTCGCTCTGTAAGC-T-GTCG-A-TTCACCGGAAAGATGGCGTTA-CCACGTAAAGTGATAATGATTAT----------" #> [3] "-ATCAAAGAACG------T-----------GTAGCCTGTCCGTAATC-T-AGCGCATTTCACACG--AGA---C-----CCGCGTAATGGG---------------------" #> [4] "------------CGTAAAT---------AGGTAAT-GAT-TATCAT--T-A-C--ATATCACAAC--T-A--G-G----GC-CGT-AT-TAATCATGA-TATCATCA-----" #> [5] "-------------------GTCGC-TAGAGGCA-T-CGTGAGTC-GC-T--TCC-G--T-ACCGCAAGGATGACG--AGTCACTTAAAGTGATAAT----------------" #> [6] "---------------------------------------CCGTAACCTTCATCG-G-ATCACCGG-AA-A--G-G--A-CC-CGTAAA-TAGACCTGATTATCATC-TACAT"spoaConsensus(sequences)#> [1] "CATCAAAGAACGTAGGTAGCCCGTCCGTAACCTGTCGGATCACCGGAAAGAGGACCCGTAAAGTGATAATGATTATCATCTACAT"