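【4.1.2】Annovar: Variant Annotation
Once the variants are known, we need to annotate their functional effects. ANNOVAR annotates these variants and produces an easy-to-read list of annotated variants.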
1. Introduction
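Official site: http://annovar.openbioinformatics.org/en/latest/
Supported genomes include human hg18, hg19 and hg38, as well as mouse, worm, fly, yeast and many others. ANNOVAR provides the following kinds of annotation:
- Gene-based: determine whether an SNP or CNV alters the protein-coding sequence or changes amino acids
- Region-based: determine whether a variant falls in specific genomic regions
- Filter-based: annotate variants with attributes from specific databases
- Other functions
Software download address: http://bejerano.stanford.edu/MCAP/ (a university email address is required to download)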
2. Downloading the databases
- Latest databases: http://annovar.openbioinformatics.org/en/latest/
- Download instructions: http://annovar.openbioinformatics.org/en/latest/user-guide/download/
Option 1: via annotate_variation.pl
perl annotate_variation.pl -downdb -buildver hg19 -webfrom annovar mcap humandb/
Converting and indexing the downloaded mcap database:
awk '{if(NR!=1) print $1 "\t" $2 "\t" $2}' mcap_v1_0.txt >mcap_v1_0_change.txt
perl Annovar_index.pl mcap_v1_0_change.txt 500
perl annotate_variation.pl -downdb -buildver hg19 -webfrom annovar dbscsnv11 humandb/
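The awk one-liner on mcap_v1_0.txt above writes POS twice, as Start and End, because an SNV covers a single base. ANNOVAR filter databases (compare the hg19_ljb26_all.txt header in 3.3) normally also carry Ref, Alt and the score columns after the coordinates. As a sketch, assuming a tab-separated input with one header line and the column order CHROM, POS, REF, ALT, SCORE (an assumption; check the header of your own file), a conversion that keeps all five columns could look like this. The function name `to_annovar_db` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a tab-separated table with the (assumed)
# columns CHROM, POS, REF, ALT, SCORE and one header line into the
# Chr/Start/End/Ref/Alt/score layout of ANNOVAR filter databases.
# An SNV covers one base, so Start and End are both POS.
to_annovar_db() {
    local in_file="$1" score_name="${2:-score}"
    awk -F '\t' -v OFS='\t' -v name="$score_name" '
        NR == 1 { print "#Chr", "Start", "End", "Ref", "Alt", name; next }
        { print $1, $2, $2, $3, $4, $5 }
    ' "$in_file"
}

# Usage (not run here):
#   to_annovar_db mcap_v1_0.txt mcap > hg19_mcap.txt
```

The resulting file could then be indexed with the same Annovar_index.pl call as above.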
Option 2: via aria2c
aria2c http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp33a.txt.idx.gz
aria2c http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp33a.txt.gz
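Both aria2c commands above follow the same pattern: `<buildver>_<database>.txt.gz` plus its `.txt.idx.gz` index under the same download directory. A small sketch (the function name `annovar_db_urls` is hypothetical) that prints the two URLs for any build and database name, assuming the database actually ships an index file as dbnsfp33a does, so both can be handed to aria2c in a single call through its `-i` (input file) option:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the data and index URLs for one ANNOVAR database,
# following the pattern of the aria2c commands above:
#   <base>/<buildver>_<db>.txt.gz and <base>/<buildver>_<db>.txt.idx.gz
annovar_db_urls() {
    local buildver="$1" db="$2"
    local base="http://www.openbioinformatics.org/annovar/download"
    printf '%s/%s_%s.txt.gz\n' "$base" "$buildver" "$db"
    printf '%s/%s_%s.txt.idx.gz\n' "$base" "$buildver" "$db"
}

# Usage (not run here): fetch both files into humandb/ with up to
# 4 connections per server, then decompress them.
#   annovar_db_urls hg19 dbnsfp33a > urls.txt
#   aria2c -x 4 -d humandb -i urls.txt
#   gunzip humandb/hg19_dbnsfp33a.txt.gz humandb/hg19_dbnsfp33a.txt.idx.gz
```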
Option 3: build your own database and index it
revel_all_chromosomes.csv, obtained from JF:
awk -F "," '{if(NR==1){print "#" $1 "\t" $2 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 } else {print $1 "\t" $2 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 } }' revel_all_chromosomes.csv >hg19_revel.txt
perl Annovar_index.pl hg19_revel.txt 1000
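Before indexing a self-built file such as hg19_revel.txt, a quick consistency check can catch mistakes in the awk conversion. A sketch (the function name `check_annovar_db` is hypothetical) that assumes a tab-separated file with a `#` header line and SNV-only rows (in the awk conversion above, Start and End are both copied from the position column), and reports the first line that breaks either rule:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a self-built ANNOVAR filter database.
# Assumes: tab-separated, line 1 is a header starting with "#",
# and every data row is an SNV, so Start ($2) must equal End ($3).
# Prints the first problem and returns 1, or returns 0 if the file looks consistent.
check_annovar_db() {
    awk -F '\t' '
        NR == 1 {
            if ($0 !~ /^#/) { print "line 1: header does not start with #"; bad = 1; exit }
            ncol = NF
            next
        }
        NF != ncol { printf "line %d: %d columns, expected %d\n", NR, NF, ncol; bad = 1; exit }
        $2 != $3   { printf "line %d: Start (%s) != End (%s)\n", NR, $2, $3; bad = 1; exit }
        END { exit bad }
    ' "$1"
}

# Usage (not run here):
#   check_annovar_db hg19_revel.txt && perl Annovar_index.pl hg19_revel.txt 1000
```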
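Having downloaded databases several ways, I prefer option 2 for the larger ones, since aria2c shows how much has been downloaded and at what speed.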
3. Annotation
/bioinfo/software/bin/table_annovar.pl -buildver hg19 -protocol refGene,cytoBand,mcap -operation g,r,f -nastring . -remove -otherinfo --vcfinput S1570169.g.vcf /home/qqin/download/annovar/humandb
Notes:
The -operation argument tells ANNOVAR which operation to use for each of the protocols:
- g means gene-based
- r means region-based
- f means filter-based
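Each entry of `-protocol` is paired, by position, with the entry of `-operation` at the same index, so the two comma-separated lists need the same number of elements. A pre-flight sketch (the function name `check_protocols` is hypothetical, not part of ANNOVAR) that prints each pairing and fails on a count mismatch, for example `-protocol mcap` with `-operation g,r,f`:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: pair each -protocol entry with the
# -operation entry at the same position and fail if the counts differ.
check_protocols() {
    local protocols="$1" operations="$2"
    local -a p o
    IFS=',' read -r -a p <<< "$protocols"
    IFS=',' read -r -a o <<< "$operations"
    if [ "${#p[@]}" -ne "${#o[@]}" ]; then
        echo "error: ${#p[@]} protocol(s) but ${#o[@]} operation(s)" >&2
        return 1
    fi
    local i
    for i in "${!p[@]}"; do
        case "${o[$i]}" in
            g) echo "${p[$i]}: gene-based" ;;
            r) echo "${p[$i]}: region-based" ;;
            f) echo "${p[$i]}: filter-based" ;;
            *) echo "${p[$i]}: ${o[$i]}" ;;   # other operation codes are printed as-is
        esac
    done
}

# Usage (not run here):
#   check_protocols refGene,cytoBand,mcap g,r,f
```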
3.1 Gene-based annotation
-operation: g
qqin@lizard:[program]$head /bioinfo/software/packages/annovar-2015.06.17/humandb/hg19_refGene.txt
19 NM_001291929 chr11 - 89057521 89223909 89059923 89223852 17 89057521,89069012,89070614,89073230,89075241,89088129,89106599,89133184,89133382,89135493,89155069,89165951,89173855,89177302,89182607,89184952,89223774, 89060044,89069113,89070683,89073339,89075361,89088211,89106660,89133247,89133547,89135710,89155150,89166024,89173883,89177400,89182692,89185063,89223909, 0 NOX4 cmpl cmpl 2,0,0,2,2,1,0,0,0,2,2,1,0,1,0,0,0,
985 NM_016039 chr14 + 52456227 52471420 52456357 52471234 8 52456227,52458034,52460440,52465211,52466425,52468515,52470911,52471079, 524
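hg19_refGene.txt uses the UCSC refGene table layout: a leading `bin` column, then name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2 (the gene symbol), cdsStartStat, cdsEndStat and exonFrames, as in the NOX4 record above. A sketch (the function name `refgene_summary` is hypothetical) that assumes this column order with tab separators and prints the symbol, transcript, 1-based transcript coordinates, strand, exon count and transcript span in bp:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise refGene records (files given as arguments, or stdin).
# Assumed columns (UCSC refGene with a leading bin column, tab-separated):
#  1 bin  2 name  3 chrom  4 strand  5 txStart  6 txEnd  7 cdsStart  8 cdsEnd
#  9 exonCount  10 exonStarts  11 exonEnds  12 score  13 name2 ...
refgene_summary() {
    awk -F '\t' -v OFS='\t' '{
        # txStart is 0-based and txEnd is exclusive:
        # the 1-based range is txStart+1..txEnd and the span is txEnd - txStart.
        print $13, $2, $3 ":" ($5 + 1) "-" $6, $4, $9, $6 - $5
    }' "$@"
}

# Usage (not run here):
#   head -n 5 humandb/hg19_refGene.txt | refgene_summary
```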
3.2 Region-based annotation
-operation: r
qqin@lizard:[program]$head /bioinfo/software/packages/annovar-2015.06.17/humandb/hg19_cytoBand.txt
chr1 0 2300000 p36.33 gneg
chr1 2300000 5400000 p36.32 gpos25
chr1 5400000 7200000 p36.31 gneg
chr1 7200000 9200000 p36.23 gpos25
chr1 9200000 12700000 p36.22 gneg
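Region-based annotation against cytoBand amounts to finding the interval that contains a variant. UCSC cytoBand coordinates are 0-based and half-open, so a 1-based position `pos` falls in a band when `start < pos <= end`. A minimal sketch of that lookup (the function name `which_band` is hypothetical; it illustrates the idea and is not ANNOVAR's own code):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cytoband containing a 1-based position.
# cytoBand columns: chrom, chromStart (0-based), chromEnd (exclusive), band, stain.
# Returns 1 if no band matches.
which_band() {
    local chrom="$1" pos="$2" file="$3"
    awk -v c="$chrom" -v p="$pos" '
        $1 == c && $2 < p && p <= $3 { print $4; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$file"
}

# Usage (not run here):
#   which_band chr1 5400001 humandb/hg19_cytoBand.txt
```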
3.3 Filter-based annotation
-operation: f
qqin@lizard:[program]$head /bioinfo/software/packages/annovar-2015.06.17/humandb/hg19_ljb26_all.txt
#Chr Start End Ref Alt SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred FATHMM_score FATHMM_pred RadialSVM_score RadialSVM_pred LR_score LR_pred VEST3_score CADD_raw CADD_phred GERP++_RS phyloP46way_placental phyloP100way_vertebrate SiPhy_29way_logOdds
1 35138 35138 T A . . . . . . . . 1.000 N . . . . . . . . . -0.886 0.467 0.742 0.593 0.339 3.824
1 35138 35138 T G . . . . . . . . 1.000 N . . . . . . . . . -0.996 0.267 0.742 0.593 0.339 3.824
1 35139 35139 T A . . . . . . . . 1.000 N
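A filter database like hg19_ljb26_all.txt is a tab-separated table keyed on Chr, Start, End, Ref and Alt, with one column per score. For spot-checking a single variant, a sketch (the function name `db_lookup` is hypothetical) that finds a column by its header name and prints its value. It scans the whole file linearly, whereas ANNOVAR can use the accompanying .idx index for fast lookups, so it is only meant for small checks:

```shell
#!/usr/bin/env bash
# Hypothetical spot-check helper: print one named score for one variant from an
# ANNOVAR filter database (tab-separated, header line starting with "#Chr").
# Exit status: 0 found, 1 variant not found, 2 no such column.
db_lookup() {
    local file="$1" chr="$2" start="$3" ref="$4" alt="$5" column="$6"
    awk -F '\t' -v chr="$chr" -v start="$start" -v ref="$ref" -v alt="$alt" -v col="$column" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "no such column: " col > "/dev/stderr"; err = 1; exit }
            next
        }
        $1 == chr && $2 == start && $4 == ref && $5 == alt { print $idx; found = 1; exit }
        END { if (err) exit 2; if (!found) exit 1 }
    ' "$file"
}

# Usage (not run here):
#   db_lookup humandb/hg19_ljb26_all.txt 1 35138 T A SiPhy_29way_logOdds
```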
LJB (dbNSFP) non-synonymous variant annotation
This database includes SIFT scores, PolyPhen2 HDIV scores, PolyPhen2 HVAR scores, LRT scores, MutationTaster scores, MutationAssessor scores, FATHMM scores, GERP++ scores, PhyloP scores and SiPhy scores.
To make future updates easier, this database has now been renamed dbnsfp30a:
annotate_variation.pl -downdb -webfrom annovar -buildver hg19 dbnsfp30a humandb/
table_annovar.pl ex1.avinput humandb/ -protocol dbnsfp30a -operation f -build hg19 -nastring .
To annotate with just one of these scores, you can download that database on its own. The keywords used for downloading these data include: ljb23_sift, ljb23_pp2hdiv, ljb23_pp2hvar, ljb23_lrt, ljb23_mt, ljb23_ma, ljb23_fathmm, ljb23_metasvm, ljb23_metalr, ljb23_gerp++, ljb23_phylop, ljb23_siphy, ljb23_all. The ljb23_all database includes ALL scores, and it is very useful in table_annovar.pl.
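Fetching several of the single-score databases is a loop over those keywords. A sketch with a dry-run switch (the function name `download_ljb` and the `DRY_RUN` variable are hypothetical; the annotate_variation.pl options are the ones used in the download commands earlier in this post):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: download selected single-score databases with
# annotate_variation.pl. With DRY_RUN=1 the commands are only printed.
download_ljb() {
    local buildver="$1" dbdir="$2"
    shift 2
    local kw
    local -a cmd
    for kw in "$@"; do
        cmd=(perl annotate_variation.pl -downdb -buildver "$buildver" -webfrom annovar "$kw" "$dbdir")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "${cmd[*]}"
        else
            "${cmd[@]}" || return 1   # stop at the first failed download
        fi
    done
}

# Usage (not run here):
#   DRY_RUN=1 download_ljb hg19 humandb/ ljb23_sift ljb23_pp2hdiv
#   download_ljb hg19 humandb/ ljb23_sift ljb23_pp2hdiv
```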
LJB23 annotation results explained
Score (dbtype) | # variants in LJB23 build hg19 | Categorical Prediction |
---|---|---|
SIFT (sift) | 77593284 | D: Deleterious (sift<=0.05); T: tolerated (sift>0.05) |
PolyPhen 2 HDIV (pp2_hdiv) | 72533732 | D: Probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452) |
PolyPhen 2 HVar (pp2_hvar) | 72533732 | D: Probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.909); B: benign (pp2_hvar<=0.446) |
LRT (lrt) | 68069321 | D: Deleterious; N: Neutral; U: Unknown |
MutationTaster (mt) | 88473874 | "A" ("disease_causing_automatic"); "D" ("disease_causing"); "N" ("polymorphism"); "P" ("polymorphism_automatic") |
MutationAssessor (ma) | 74631375 | H: high; M: medium; L: low; N: neutral. H/M means functional and L/N means non-functional |
MetaSVM (metasvm) | 82098217 | D: Deleterious; T: Tolerated |
MetaLR (metalr) | 82098217 | D: Deleterious; T: Tolerated |
GERP++ (gerp++) | 89076718 | higher scores are more deleterious |
PhyloP (phylop) | 89553090 | higher scores are more deleterious |
SiPhy (siphy) | 88269630 | higher scores are more deleterious |
ljb2_pp2hvar should be used for diagnosing Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles.
ljb2_pp2hdiv should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data.
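The categorical predictions in the LJB23 table are cut-offs on the numeric scores. A sketch (the function names `sift_pred` and `pp2_hdiv_pred` are hypothetical) applying the SIFT and PolyPhen 2 HDIV thresholds from the table. The table leaves a gap between 0.452 and 0.453 for HDIV, which this sketch assigns to P, and a `.` (the missing value set by `-nastring .`) is passed through unchanged:

```shell
#!/usr/bin/env bash
# Hypothetical helpers applying the cut-offs from the LJB23 table.
# SIFT: D (deleterious) if score <= 0.05, else T (tolerated).
sift_pred() {
    awk -v s="$1" 'BEGIN {
        if (s == ".") print "."              # missing score stays missing
        else print ((s + 0 <= 0.05) ? "D" : "T")
    }'
}

# PolyPhen 2 HDIV: D if >= 0.957, B if <= 0.452, otherwise P.
pp2_hdiv_pred() {
    awk -v s="$1" 'BEGIN {
        if (s == ".") { print "."; exit }    # missing score stays missing
        s += 0
        if (s >= 0.957) print "D"
        else if (s <= 0.452) print "B"
        else print "P"
    }'
}

# Usage (not run here):
#   sift_pred 0.03; pp2_hdiv_pred 0.6
```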
4. Discussion
4.1 Interpreting the annotation results
If anything is unclear, refer to:
- https://brb.nci.nih.gov/seqtools/colexpanno.html
- https://annovar.openbioinformatics.org/en/latest/user-guide/gene/
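The main table written by table_annovar.pl is a tab-separated `*multianno.txt` file whose refGene columns include Func.refGene (e.g. exonic, intronic) and ExonicFunc.refGene (e.g. nonsynonymous SNV, synonymous SNV). A sketch (the function name `exonic_nonsyn` is hypothetical) that keeps the header plus rows that are exactly `exonic` and `nonsynonymous SNV`, locating the columns by name. Rows with combined values such as `exonic;splicing` are not matched by this exact comparison:

```shell
#!/usr/bin/env bash
# Hypothetical filter for a table_annovar.pl *multianno.txt file (tab-separated,
# one header line). Keeps the header and rows whose Func.refGene is "exonic"
# and whose ExonicFunc.refGene is "nonsynonymous SNV". Columns are found by name.
exonic_nonsyn() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Func.refGene") f = i
                if ($i == "ExonicFunc.refGene") e = i
            }
            if (!f || !e) { print "missing refGene columns" > "/dev/stderr"; err = 1; exit }
            print
            next
        }
        $f == "exonic" && $e == "nonsynonymous SNV"
        END { if (err) exit 1 }
    ' "$@"
}

# Usage (not run here):
#   exonic_nonsyn sample.hg19_multianno.txt > sample.nonsyn.txt
```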
My personal WeChat official account (I'm lazy, so it is rarely updated) is a place to ask questions; if I don't reply in time, email me at tiehan@sina.cn