and also regular old alignment
Bastian Schiffthaler, Nicolas Delhomme
| Aligner | Length | Gapped | Splice-aware | 3rd gen support? |
|---|---|---|---|---|
| BBMap | Any | Yes | Yes | Yes |
| Bowtie | <50 | No | No | No |
| Bowtie2 | ? | Yes* | No | No |
| BWA | Any | Yes | No | Partial |
| Minimap2 | Any | Yes | No | Yes |
| GMAP/GSNAP | <300** | Yes | Yes | Yes |
| HISAT2*** | Yes | Yes | Yes | No |
| STAR | Yes | Yes | Yes | No |
*Not "true" gapped alignment
**GSNAP (value can be changed during compilation)
***Can use SNP info. Optimized for human, but can be adapted
https://www.ecseq.com/support/ngs/best-RNA-seq-aligner-comparison-of-mapping-tools
* The time shown includes the index loading step, which dominates for some tools. This step becomes less influential, or even negligible, when mapping real-life datasets (>10 million reads).
**By default, BBMap takes as much memory as the system provides. The minimum requirement for the genome used here is 24 GB.
| Col | Name | Description |
|---|---|---|
| 1 | QNAME | Query template name |
| 2 | FLAG | Bitwise flag |
| 3 | RNAME | Reference sequence name |
| 4 | POS | 1-based leftmost mapping position |
| 5 | MAPQ | Mapping quality |
| 6 | CIGAR | CIGAR string |
| 7 | RNEXT | Reference name of mate |
| 8 | PNEXT | Position of the mate |
| 9 | TLEN | Observed template length |
| 10 | SEQ | Segment sequence |
| 11 | QUAL | Sequence PHRED quality |
| 12+ | (optional) | Additional data: TAG:TYPE:VALUE |
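
The FLAG column packs several yes/no properties of a read into one integer. As a minimal sketch, the bits defined in the SAM specification can be tested one by one with shell arithmetic. The function name `decode_flag` and the short bit labels are hypothetical helpers chosen for illustration; they are not part of samtools.

```shell
#!/usr/bin/env bash
# decode_flag: hypothetical helper that lists the SAM FLAG bits set in an integer.
# The labels are listed in bit order: 0x1, 0x2, 0x4, ... 0x800.
decode_flag() {
    local flag="$1"
    local out=""
    local bit=1
    local name
    for name in paired proper_pair unmapped mate_unmapped reverse mate_reverse \
                first_in_pair second_in_pair secondary qc_fail duplicate supplementary; do
        # Keep the label when the corresponding bit is set in the flag.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

decode_flag 99   # prints: paired,proper_pair,mate_reverse,first_in_pair
```

A FLAG of 99 is 1 + 2 + 32 + 64, so the read is paired, mapped in a proper pair, its mate is on the reverse strand, and it is the first read of the pair.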
samtools view <Alignment SAM/BAM/CRAM>
FCC1L3GACXX:1:1308:5586:93026#
99
Potra000013
27834
254
100M
=
27953
219
CCCCGTTAGTACCATTTGAGTTCTCAACAGCCTGCTCCTGCTCCAATTTTCTCTTCTCCTTTTTCTTCTTCTTCTCTGATTTAGCATCCTCTGAAGCACC
@@CFFDDFHDHFHGHHIIGIIIEGHIHGGIGII@HEHIIIGGII9?FGHIIIGGIGIIIGGGIIIIIIIIIIIIIHICHFEHEHFFFFFCEECCEEDDDD
NH:i:1
HI:i:1
AS:i:196
nM:i:1
MD:Z:100
NM:i:0
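
The TLEN value of this record can be checked by hand from POS, PNEXT and the reference length covered by the mate. The sketch below assumes the mate also aligned as `100M`; that is an assumption, since the mate's record is not shown here. It also assumes the read is the leftmost of the pair. The helper names `cigar_ref_len` and `template_length` are hypothetical.

```shell
#!/usr/bin/env bash
# cigar_ref_len: reference bases covered by a CIGAR string.
# Only M, D, N, = and X consume reference positions.
cigar_ref_len() {
    printf '%s\n' "$1" | grep -oE '[0-9]+[MIDNSHP=X]' | awk '
        {
            op = substr($0, length($0), 1)          # operation letter
            n  = substr($0, 1, length($0) - 1) + 0  # operation length
            if (op ~ /[MDN=X]/) total += n
        }
        END { print total + 0 }'
}

# template_length: distance from the leftmost base of this read to the
# rightmost base of its mate, for a read that lies upstream of its mate.
# mate end = PNEXT + mate_len - 1, so TLEN = mate end - POS + 1.
template_length() {
    local pos="$1"
    local pnext="$2"
    local mate_cigar="$3"   # ASSUMPTION: must be taken from the mate's record
    local mate_len
    mate_len=$(cigar_ref_len "$mate_cigar")
    echo $(( pnext + mate_len - pos ))
}

template_length 27834 27953 100M   # prints: 219
```

With POS 27834, PNEXT 27953 and a 100 bp mate, the fragment spans 27953 + 100 - 27834 = 219 bases, which matches the TLEN field of the record.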
https://mikelove.wordpress.com/2016/09/26/rna-seq-fragment-sequence-bias/
Select most likely transcript among a set of candidates
Experimental datasets are generally more complex and include reads that originate from segments that are not part of the annotated transcripts.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02151-8#Sec21
Salmon re-samples counts in equivalence classes to estimate uncertainty in abundance estimation
Terminus can collapse transcripts whose abundance estimates are too uncertain into a group for which the abundance can be estimated accurately.
The group is then analyzed as a unit.
Fishpond uses uncertainty estimates for differential transcript and gene expression.
Some hints:
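#!/usr/bin/env bash
# Show the available options for building an index and for quantifying reads
salmon index --help
salmon quant --help-reads
# Transcript sequences (FASTA) to index
TRANSCRIPTS=~/raw_data/reference/Pabies1.0-all.phase.gff3.CDS.fa
# Directory containing the trimmed reads (Trimmomatic output)
SEQDATADIR=~/raw_data/trimmomatic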
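
One way the hints could be combined is sketched below; treat it as a starting point rather than the reference solution. The index and output directory names and the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` file naming are assumptions and need to be adapted to the actual Trimmomatic output. `-l A` asks salmon to detect the library type automatically.

```shell
#!/usr/bin/env bash
set -e

TRANSCRIPTS=~/raw_data/reference/Pabies1.0-all.phase.gff3.CDS.fa
SEQDATADIR=~/raw_data/trimmomatic
INDEX=salmon_index    # ASSUMPTION: hypothetical index directory name
OUTDIR=salmon_quant   # ASSUMPTION: hypothetical output directory name

# sample_name: strip the directory and the assumed forward-read suffix.
sample_name() {
    basename "$1" _1.fq.gz
}

if command -v salmon >/dev/null 2>&1; then
    # Build the transcriptome index once.
    salmon index -t "$TRANSCRIPTS" -i "$INDEX"

    # Quantify each read pair against the index.
    for fwd in "$SEQDATADIR"/*_1.fq.gz; do
        [ -e "$fwd" ] || continue   # skip if the pattern matched nothing
        sample=$(sample_name "$fwd")
        rev="$SEQDATADIR/${sample}_2.fq.gz"
        salmon quant -i "$INDEX" -l A -1 "$fwd" -2 "$rev" \
            --validateMappings -o "$OUTDIR/$sample"
    done
else
    echo "salmon not found on PATH; nothing to do" >&2
fi
```

Each sample gets its own output directory, so the per-sample `quant.sf` files can be collected afterwards without name clashes.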