【2.2.4.2】Illumina Variant Calling Pipeline
Briefly, the Illumina variant calling pipeline is as follows: After extracting FASTQ files from the SRA Normalized data, the reads are trimmed based on paired/single-end status using trimmomatic. Next they are aligned to the SARS-CoV-2 reference (NC_045512.2) using HISAT2 and variants are called using GATK. Low quality variant calls are then filtered-out, the calls are normalized, then the calls are annotated for their protein effect using snpeff, and the VCF file validated.
ASTQ extraction
fasterq-dump --split-files $acc
Read Trimming
Paired-End
java -jar /usr/local/trimmomatic/0.33/trimmomatic-0.33.jar PE -phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Single-End
java -jar /usr/local/trimmomatic/0.33/trimmomatic-0.33.jar SE -phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Alignment
hisat2 --no-spliced-alignment --no-unal -x $ref -q | samtools view -Sb -F256 - | samtools sort > $output.bam
samtools index $output.bam
Variant Calling
gatk HaplotypeCaller -R $ref -I $input.bam -O $output.gvcf --minimum-mapping-quality 10 --ploidy 2 -ERC BP_RESOLUTION
Variant Filtering
gatk VariantFiltration -R $ref -V $input -O $output \
--filter-name "lowAD10" \
--filter-expression 'vc.getGenotype("{wildcards.acc}").getAD().1 < 10' \
--filter-name "lowQUAL100" \
--filter-expression 'QUAL < 100'
--filter-name "genomeEnd" \
--filter-expression 'POS > 29850' \
--filter-name "highFS60" \
--filter-expression 'FS >= 60.0' \
--filter-name "lowQD2.0" \
--filter-expression 'QD < 2.0' \
--filter-name "lowReadPosRankSum4.0" \
--filter-expression 'ReadPosRankSum < -4.0' \
--filter-name "highSOR4.0" \
--filter-expression 'SOR >= 4.0'
Variant Normalization
gatk LeftAlignAndTrimVariants --split-multi-allelics \
-R $ref \
-V $input \
-O $output
Variant Annotation
snpeff ann -nodownload -formatEff -classic -noStats -noLog -quiet -no-upstream -no-downstream -c $snpeff_config sars2 -v $input
参考资料
这里是一个广告位,,感兴趣的都可以发邮件聊聊:tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn