and also regular old alignment
Bastian Schiffthaler, Nicolas Delhomme
Aligner | Read length (bp) | Gapped | Splice-aware | 3rd gen support? |
---|---|---|---|---|
BBMap | Any | Yes | Yes | Yes |
Bowtie | <50 | No | No | No |
Bowtie2 | ? | Yes* | No | No |
BWA | Any | Yes | No | Partial |
Minimap2 | Any | Yes | No | Yes |
GMAP/GSNAP | <300** | Yes | Yes | Yes |
HISAT2*** | Yes | Yes | Yes | No |
STAR | Yes | Yes | Yes | No |
*Not "true" gapped alignment
**Applies to GSNAP (the limit can be changed at compile time)
***Can use SNP information. Optimized for human, but can be adapted
https://www.ecseq.com/support/ngs/best-RNA-seq-aligner-comparison-of-mapping-tools
* The time shown includes the index loading step, which dominates for some tools. This step becomes less influential (or even negligible) when mapping real-life datasets (>10 million reads).
**By default, BBMap takes as much memory as the system provides. The minimum requirement for the genome used here is 24 GB.
Col | Name | Description |
---|---|---|
1 | QNAME | Query template name |
2 | FLAG | Bitwise flag |
3 | RNAME | Reference sequence name |
4 | POS | 1-based leftmost mapping position |
5 | MAPQ | Mapping quality |
6 | CIGAR | CIGAR string |
7 | RNEXT | Reference name of mate |
8 | PNEXT | Position of the mate |
9 | TLEN | Observed template length |
10 | SEQ | Segment sequence |
11 | QUAL | Sequence PHRED quality |
12+ | (optional) | Additional data: TAG:TYPE:VALUE |
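The FLAG column packs several yes/no properties of a read into one integer, one bit per property. A minimal sketch of decoding it in bash is shown here. The bit meanings follow the SAM specification; the function name `decode_flag` and the lowercase labels are illustrative choices, not part of any tool.

```bash
# Decode a SAM FLAG value into the names of the bits that are set.
# Bit order follows the SAM specification (0x1, 0x2, 0x4, ... 0x800).
decode_flag() {
    local flag=$1
    local names=(paired proper_pair unmapped mate_unmapped
                 reverse mate_reverse first_in_pair second_in_pair
                 secondary qc_fail duplicate supplementary)
    local i out=()
    for i in "${!names[@]}"; do
        # Test bit i of the flag with bitwise AND.
        if (( flag & (1 << i) )); then
            out+=("${names[i]}")
        fi
    done
    echo "${out[*]}"
}

decode_flag 99   # paired proper_pair mate_reverse first_in_pair
```

The `samtools flags` subcommand offers a similar lookup if samtools is installed.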
samtools view <Alignment SAM/BAM/CRAM>
FCC1L3GACXX:1:1308:5586:93026#
99
Potra000013
27834
254
100M
=
27953
219
CCCCGTTAGTACCATTTGAGTTCTCAACAGCCTGCTCCTGCTCCAATTTTCTCTTCTCCTTTTTCTTCTTCTTCTCTGATTTAGCATCCTCTGAAGCACC
@@CFFDDFHDHFHGHHIIGIIIEGHIHGGIGII@HEHIIIGGII9?FGHIIIGGIGIIIGGGIIIIIIIIIIIIIHICHFEHEHFFFFFCEECCEEDDDD
NH:i:1
HI:i:1
AS:i:196
nM:i:1
MD:Z:100
NM:i:0
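In a real SAM file, the fields of the record above sit on one tab-separated line. The sketch below rebuilds that line and pulls out single columns and optional tags. The helper names `sam_field` and `sam_tag` are hypothetical, and SEQ and QUAL are replaced by `*` (which the SAM format allows) only to keep the example short.

```bash
# The example record as an array; SEQ and QUAL are shortened to '*'.
fields=('FCC1L3GACXX:1:1308:5586:93026#' 99 Potra000013 27834 254 100M '='
        27953 219 '*' '*' NH:i:1 HI:i:1 AS:i:196 nM:i:1 MD:Z:100 NM:i:0)

# Join the array with tabs to get one SAM line.
record=$(IFS=$'\t'; echo "${fields[*]}")

# sam_field <record> <column number>: print one mandatory column (1 to 11).
sam_field() {
    printf '%s\n' "$1" | cut -f "$2"
}

# sam_tag <record> <TAG>: print the VALUE of the first TAG:TYPE:VALUE match.
# Tag names are case-sensitive, so nM and NM are different tags.
sam_tag() {
    printf '%s\n' "$1" | awk -F'\t' -v tag="$2" '{
        for (i = 12; i <= NF; i++)
            if (substr($i, 1, 3) == tag ":") { print substr($i, 6); exit }
    }'
}

sam_field "$record" 2    # FLAG  -> 99
sam_field "$record" 6    # CIGAR -> 100M
sam_tag "$record" AS     # -> 196
```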
https://mikelove.wordpress.com/2016/09/26/rna-seq-fragment-sequence-bias/
Select most likely transcript among a set of candidates
Experimental datasets are generally more complex and include reads that originate from segments that are not part of the annotated transcripts.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02151-8#Sec21
Salmon re-samples counts in equivalence classes to estimate the uncertainty of its abundance estimates.
Terminus can collapse transcripts that have too much uncertainty in the abundance estimation into a group for which the abundance can be estimated accurately.
The group is analyzed as a unit.
Fishpond uses uncertainty estimates for differential transcript and gene expression.
Some hints:
#!/usr/bin/env bash
salmon index --help
salmon quant --help-reads
TRANSCRIPTS=~/raw_data/reference/Pabies1.0-all.phase.gff3.CDS.fa
SEQDATADIR=~/raw_data/trimmomatic
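One possible way to build on these hints is sketched below, under several assumptions that are not in the original: the reads are paired-end and named `*_1.fq.gz` / `*_2.fq.gz`, and the `INDEX` and `OUTDIR` directory names are hypothetical. The script prints the Salmon commands by default and only runs them when `DRY_RUN=0` is set, so the file pairing can be checked first. Compare the options against the output of the two `--help` calls above before running it.

```bash
# Paths from the hints; INDEX and OUTDIR are hypothetical names.
TRANSCRIPTS=${TRANSCRIPTS:-$HOME/raw_data/reference/Pabies1.0-all.phase.gff3.CDS.fa}
SEQDATADIR=${SEQDATADIR:-$HOME/raw_data/trimmomatic}
INDEX=${INDEX:-salmon_index}
OUTDIR=${OUTDIR:-salmon_quant}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [[ ${DRY_RUN:-1} == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the Salmon index from the transcript FASTA file.
build_index() {
    run salmon index -t "$TRANSCRIPTS" -i "$INDEX"
}

# Quantify every read pair found in SEQDATADIR (assumed file naming).
quantify_all() {
    local fwd rev sample
    for fwd in "$SEQDATADIR"/*_1.fq.gz; do
        [[ -e $fwd ]] || continue              # skip if nothing matched
        rev=${fwd%_1.fq.gz}_2.fq.gz
        sample=$(basename "${fwd%_1.fq.gz}")
        # -l A asks Salmon to detect the library type automatically.
        run salmon quant -i "$INDEX" -l A -1 "$fwd" -2 "$rev" \
            -o "$OUTDIR/$sample"
    done
}

build_index
quantify_all
```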