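[2.3] Common parameters for NGS quality control

I keep saying I will write about how to do NGS quality control properly, but I have not had time to organize it. For now I will just list some parameters here and keep updating the post later.

I. Data quality parameters

1. AB (allele balance)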

Allele Balance for homozygous calls (A/(A+O)) where A is the allele (ref or alt) and O is anything other

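Default thresholds:

het : [0.25, 0.7]
homo: [0.75, 1]

Problem:

Are calls in [0.2, 0.25) and (0.7, 0.75) really unusable?

Solution:

Pathogenic variants that fall in these ranges are listed in the QC report as filtered-out pathogenic variants.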
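A minimal sketch of how the AB thresholds and the two borderline ranges could be applied. The function name and the "gray" / "fail" labels are made up for illustration; only the numeric ranges come from the text above.

```python
def classify_allele_balance(ab,
                            het_range=(0.25, 0.7),
                            hom_range=(0.75, 1.0),
                            gray_zones=((0.2, 0.25), (0.7, 0.75))):
    """Classify an allele-balance value as 'het', 'hom', 'gray' or 'fail'.

    'gray' marks the borderline ranges that should be reported
    in the QC report instead of being dropped silently.
    """
    if het_range[0] <= ab <= het_range[1]:
        return "het"
    if hom_range[0] <= ab <= hom_range[1]:
        return "hom"
    # The het/hom checks run first, so the shared boundaries never reach this loop
    for low, high in gray_zones:
        if low <= ab <= high:
            return "gray"
    return "fail"


for value in (0.5, 0.9, 0.72, 0.1):
    print(value, classify_allele_balance(value))
```

A pathogenic variant classified as "gray" would then go into the "filtered-out pathogenic variants" part of the report for manual review.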

2. DP

Approximate read depth; some reads may have been filtered.

Default threshold: >= 20

3. QUAL

The Phred-scaled probability that a REF/ALT polymorphism exists at this site given the sequencing data. Because the Phred scale is -10 * log10(1-p), where p is the probability that the polymorphism exists, a value of 10 indicates a 1 in 10 chance of error, while a value of 100 indicates a 1 in 10^10 chance.

Default: >= 30
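The Phred conversion is easy to check by hand. A small sketch (the function names are made up for illustration):

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p_error):
    """Convert an error probability (0 < p_error <= 1) to the Phred scale."""
    return -10 * math.log10(p_error)


# QUAL 10 is a 1 in 10 chance of error, QUAL 30 is 1 in 1000
for q in (10, 30, 100):
    print(q, phred_to_error_prob(q))
```

So the default cutoff of 30 keeps sites where the chance that the site is not really polymorphic is at most about 1 in 1000.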

4. QD (Variant Confidence/Quality by Depth)

Variant confidence (QUAL) normalized by read depth.

Default: >= 2

5. FS

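Phred-scaled p-value from Fisher's exact test to detect strand bias. It checks whether the variant is supported by reads from only one strand; a real variant should be seen on both the forward and the reverse strand. The higher the value, the stronger the strand bias.

Default: SNP <= 60; Indel <= 200 (variants with FS above these values are filtered out)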
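To make the idea concrete, here is a rough sketch of an FS-like score computed from the four strand counts with a two-sided Fisher's exact test, using only the standard library. This is not GATK's exact implementation (GATK applies extra normalization to the table), and the 1e-320 floor is just an assumption to avoid taking the log of zero. Note that FS is an upper bound: higher means more bias.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def fisher_strand(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand-bias p-value; 0 means no evidence of bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * math.log10(max(p, 1e-320)))


print(fisher_strand(10, 10, 10, 10))  # alt reads on both strands
print(fisher_strand(20, 20, 20, 0))   # alt reads only on the forward strand
```

The second call gives a much larger score than the first, which is the pattern the FS filter is designed to catch.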

6. ReadPosRankSum

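Bases near the ends of reads are less reliable, but some variant sites simply happen to fall near a read end. We cannot decide whether to keep a site just from how many bases it is away from the end of the read, so the score compares the positions of the ref and alt alleles within their reads (Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias). The closer the value is to 0, the more trustworthy the site; a negative value means the alt allele sits closer to the read ends than the ref allele, and a positive value means the opposite.

Default: Indel >= -20; SNP >= -8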
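The rank sum Z-score itself can be sketched in a few lines. The version below uses the normal approximation without a tie correction of the variance, so it will not reproduce GATK's numbers exactly; the input here is assumed to be the distance of the variant base from the nearest read end, which is an illustration choice.

```python
import math


def rank_sum_z(alt_values, ref_values):
    """Z-score of the Wilcoxon rank-sum test (normal approximation).

    Negative z: alt_values tend to be smaller than ref_values.
    Returns None when one of the groups is empty.
    """
    n_alt, n_ref = len(alt_values), len(ref_values)
    if n_alt == 0 or n_ref == 0:
        return None
    pooled = sorted([(v, "alt") for v in alt_values] +
                    [(v, "ref") for v in ref_values])
    rank_sum_alt = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # tied values share the mean of ranks i+1..j
        n_alt_tied = sum(1 for k in range(i, j) if pooled[k][1] == "alt")
        rank_sum_alt += avg_rank * n_alt_tied
        i = j
    u_alt = rank_sum_alt - n_alt * (n_alt + 1) / 2
    mean_u = n_alt * n_ref / 2
    sd_u = math.sqrt(n_alt * n_ref * (n_alt + n_ref + 1) / 12)
    return (u_alt - mean_u) / sd_u


# alt bases sit 1-3 bp from the read end, ref bases are well inside the reads
print(rank_sum_z([1, 2, 3], [40, 55, 60, 72]))
```

With alt bases consistently closer to the read ends the score is negative, matching the interpretation above.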

7. MQ

RMS mapping quality (root mean square of the mapping quality of reads across all samples).

Default: SNP >= 40
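For reference, the root mean square is just the square root of the mean of the squared mapping qualities. A tiny sketch (the function name is made up):

```python
import math


def rms_mapping_quality(mapqs):
    """Root mean square of a non-empty list of read mapping qualities."""
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


print(rms_mapping_quality([60, 60, 60]))
print(rms_mapping_quality([60, 60, 0]))
```

Because the values are squared before averaging, reads with high mapping quality weigh more than they would in a plain mean.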

8. MQRankSum(SNP)

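Rank sum test for mapping qualities of REF versus ALT reads. The closer the score is to 0, the better; the more negative it is, the higher the mapping quality of the reads carrying the ref base compared with those carrying the alt base.

Default: >= -12.5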
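Put together, the defaults in this section amount to a simple hard filter. The sketch below assumes the annotations have already been parsed into a plain dict (no VCF library is used), and the function and constant names are made up. FS is treated as an upper bound, in line with GATK's hard-filtering recommendations; every other annotation is a lower bound.

```python
MIN_QUAL = 30.0
MIN_DP = 20
UPPER_BOUND_KEYS = {"FS"}  # fail when the value is ABOVE the limit

THRESHOLDS = {
    "SNP": {"QD": 2.0, "FS": 60.0, "MQ": 40.0,
            "MQRankSum": -12.5, "ReadPosRankSum": -8.0},
    "INDEL": {"QD": 2.0, "FS": 200.0, "ReadPosRankSum": -20.0},
}


def failed_filters(variant_type, qual, info):
    """Return the names of the failed checks; an empty list means PASS."""
    failed = []
    if qual < MIN_QUAL:
        failed.append("QUAL")
    limits = {"DP": MIN_DP, **THRESHOLDS[variant_type]}
    for key, limit in limits.items():
        value = info.get(key)
        if value is None:
            continue  # annotation missing, e.g. rank sum tests need ref and alt reads
        if key in UPPER_BOUND_KEYS:
            is_bad = value > limit
        else:
            is_bad = value < limit
        if is_bad:
            failed.append(key)
    return failed


good = {"DP": 35, "QD": 12.1, "FS": 1.3, "MQ": 60.0,
        "MQRankSum": 0.2, "ReadPosRankSum": -0.4}
print(failed_filters("SNP", 812.0, good))
print(failed_filters("SNP", 812.0, {"DP": 10, "FS": 75.0}))
```

Returning the list of failed checks, rather than a bare True/False, makes it easy to show in the QC report why a variant was dropped.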

II. Pathogenicity prediction scores

Tool - details
SIFT (sift) 77593284 D: deleterious (sift<=0.05); T: tolerated (sift>0.05)
PolyPhen 2 HDIV (pp2_hdiv) 72533732 D: probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452)
PolyPhen 2 HVar (pp2_hvar) 72533732 D: probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.908); B: benign (pp2_hvar<=0.446)
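The cutoffs above map directly to the D / P / B / T labels. A short sketch, with made-up function names:

```python
def sift_prediction(score):
    """D (deleterious) if sift <= 0.05, otherwise T (tolerated)."""
    return "D" if score <= 0.05 else "T"


def polyphen2_prediction(score, model="hdiv"):
    """D (probably damaging), P (possibly damaging) or B (benign)."""
    # (lower bound for D, lower bound for P) per PolyPhen 2 model
    cutoffs = {"hdiv": (0.957, 0.453), "hvar": (0.909, 0.447)}
    damaging, possibly = cutoffs[model]
    if score >= damaging:
        return "D"
    if score >= possibly:
        return "P"
    return "B"


print(sift_prediction(0.01), polyphen2_prediction(0.92, "hdiv"),
      polyphen2_prediction(0.92, "hvar"))
```

The same score can get a different label from HDIV and HVar, because the two models use different cutoffs.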

