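Once variant sites have been identified, we need to annotate their function. ANNOVAR annotates these variants and produces an easy-to-read list of variant sites.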

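1. Introduction

Official website: http://annovar.openbioinformatics.org/en/latest/
Supported genomes: human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others
Annotation types:

  • Gene-based: determines whether a SNP or CNV changes the protein-coding sequence or an amino acid
  • Region-based: determines whether a variant falls within specific genomic regions
  • Filter-based: annotates variants with attributes from specific databases
  • Other functions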

Software download: http://bejerano.stanford.edu/MCAP/
(an institutional email address is required to download)

2. Downloading databases

  • Latest database list: http://annovar.openbioinformatics.org/en/latest/
  • Database download instructions: http://annovar.openbioinformatics.org/en/latest/user-guide/download/

Method 1: via annotate_variation.pl

perl annotate_variation.pl -downdb -buildver hg19 -webfrom annovar mcap humandb/

Method 2: via aria2c
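
As a rough sketch of Method 2, the snippet below prints the two URLs for one database (the data file and its index) so they can be passed to aria2c. The base URL and the `.txt.gz` / `.txt.idx.gz` naming are assumptions based on what annotate_variation.pl usually reports while downloading, so check them against the official download page first. The helper name `annovar_db_urls` is hypothetical.

```shell
#!/bin/sh
# Assumed location of ANNOVAR-hosted databases (verify before use).
ANNOVAR_BASE_URL="http://www.openbioinformatics.org/annovar/download"

# Hypothetical helper: print the data and index URLs for one database.
# Usage: annovar_db_urls <buildver> <dbname>
annovar_db_urls() {
    buildver="$1"
    dbname="$2"
    printf '%s/%s_%s.txt.gz\n' "$ANNOVAR_BASE_URL" "$buildver" "$dbname"
    printf '%s/%s_%s.txt.idx.gz\n' "$ANNOVAR_BASE_URL" "$buildver" "$dbname"
}

# Download into humandb/ with resume (-c) and up to 4 connections per server (-x 4).
# aria2c reads the URL list from stdin when given "-i -".
# Uncomment to run; the files still need to be unpacked with gunzip afterwards.
# annovar_db_urls hg19 mcap | aria2c -c -x 4 -d humandb/ -i -
```

aria2c reports progress and speed for each file while it runs.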

Method 3: build the database yourself and index it

Starting from revel_all_chromosomes.csv, obtained from JF:
awk -F "," '{if(NR==1){print "#" $1 "\t" $2 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 } else {print $1 "\t" $2 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 } }' revel_all_chromosomes.csv > hg19_revel.txt
perl Annovar_index.pl hg19_revel.txt 1000
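
Before converting the full file, it can help to run the same awk logic on a tiny sample and inspect the result. The sketch below assumes the CSV has the seven columns the command above relies on (chr, hg19_pos, ref, alt, aaref, aaalt, REVEL). The sample rows and the file names `revel_sample.csv` and `hg19_revel_sample.txt` are illustrative only.

```shell
#!/bin/sh
# Tiny made-up input with the 7 columns the awk command expects
# (values are illustrative, not taken from the real file).
cat > revel_sample.csv <<'EOF'
chr,hg19_pos,ref,alt,aaref,aaalt,REVEL
1,35142,G,A,T,M,0.027
1,35142,G,C,T,R,0.035
EOF

# Same conversion as above: duplicate the position as start and end,
# switch to tab-separated output, and prefix the header line with '#'.
awk -F ',' 'BEGIN { OFS = "\t" }
    { line = $1 OFS $2 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6 OFS $7 }
    NR == 1 { print "#" line; next }
    { print line }' revel_sample.csv > hg19_revel_sample.txt

cat hg19_revel_sample.txt
```

Each output row should have eight tab-separated fields: chromosome, start, end, ref, alt, the two amino-acid columns, and the score.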

After several downloads, I prefer Method 2 for larger databases, because you can see how much has been downloaded and how fast it is going.

3. Annotation

Notes:

3.1 Gene-based annotation

Set -operation to g

3.2 Region-based annotation

Set -operation to r

3.3 Filter-based annotation

Set -operation to f
qqin@lizard:[program]$head /bioinfo/software/packages/annovar-2015.06.17/humandb/hg19_ljb26_all.txt

LJB* (dbNSFP) non-synonymous variants annotation

This database includes SIFT scores, PolyPhen2 HDIV scores, PolyPhen2 HVAR scores, LRT scores, MutationTaster scores, MutationAssessor scores, FATHMM scores, GERP++ scores, PhyloP scores and SiPhy scores.

To make future updates easier, this database has now been renamed dbnsfp30a.

To annotate with only one of these scores, you can download that database on its own. The keywords used for downloading these data include: ljb23_sift, ljb23_pp2hdiv, ljb23_pp2hvar, ljb23_lrt, ljb23_mt, ljb23_ma, ljb23_fathmm, ljb23_metasvm, ljb23_metalr, ljb23_gerp++, ljb23_phylop, ljb23_siphy, ljb23_all. The ljb23_all database includes ALL scores, and it is very useful in table_annovar.pl.
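
The three operation types (g, r, f) can be combined in one table_annovar.pl run, where -protocol and -operation are comma-separated lists with one operation per protocol. The sketch below only checks that the two lists have the same length and prints the command instead of running it. The input file `example/ex1.avinput`, the output prefix `myanno`, and the helper name `build_table_annovar_cmd` are assumptions for illustration, and each protocol must match a database that is actually present in humandb/.

```shell
#!/bin/sh
# Hypothetical helper: print a table_annovar.pl command after checking that
# -protocol and -operation have the same number of comma-separated entries.
# Usage: build_table_annovar_cmd <input> <dbdir> <protocols> <operations>
build_table_annovar_cmd() {
    input="$1"; dbdir="$2"; protocols="$3"; operations="$4"
    n_proto=$(printf '%s\n' "$protocols" | awk -F ',' '{ print NF }')
    n_oper=$(printf '%s\n' "$operations" | awk -F ',' '{ print NF }')
    if [ "$n_proto" -ne "$n_oper" ]; then
        echo "error: $n_proto protocols but $n_oper operations" >&2
        return 1
    fi
    echo "perl table_annovar.pl $input $dbdir -buildver hg19 -out myanno -remove -protocol $protocols -operation $operations -nastring ."
}

# refGene is gene-based (g), cytoBand is region-based (r), ljb26_all is filter-based (f).
build_table_annovar_cmd example/ex1.avinput humandb/ refGene,cytoBand,ljb26_all g,r,f
```

Printing the command first makes it easy to review the protocol list before starting a long annotation run.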

LJB23 annotation results in detail

Score (dbtype)              # variants in LJB23 build hg19  Categorical prediction
SIFT (sift)                 77593284                        D: deleterious (sift<=0.05); T: tolerated (sift>0.05)
PolyPhen 2 HDIV (pp2_hdiv)  72533732                        D: probably damaging (pp2_hdiv>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452)
PolyPhen 2 HVar (pp2_hvar)  72533732                        D: probably damaging (pp2_hvar>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.908); B: benign (pp2_hvar<=0.446)
LRT (lrt)                   68069321                        D: deleterious; N: neutral; U: unknown
MutationTaster (mt)         88473874                        A: "disease_causing_automatic"; D: "disease_causing"; N: "polymorphism"; P: "polymorphism_automatic"
MutationAssessor (ma)       74631375                        H: high; M: medium; L: low; N: neutral. H/M means functional and L/N means non-functional
FATHMM (fathmm)             70274896                        D: deleterious; T: tolerated
MetaSVM (metasvm)           82098217                        D: deleterious; T: tolerated
MetaLR (metalr)             82098217                        D: deleterious; T: tolerated
GERP++ (gerp++)             89076718                        higher scores are more deleterious
PhyloP (phylop)             89553090                        higher scores are more deleterious
SiPhy (siphy)               88269630                        higher scores are more deleterious
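
To make the cutoffs concrete, here is a minimal sketch that maps a raw score to its categorical prediction for two of the rows in the table (SIFT and PolyPhen 2 HDIV). The helper name `classify_score` is hypothetical, and the function is only an illustration of the thresholds, not part of ANNOVAR.

```shell
#!/bin/sh
# Hypothetical helper: map a raw score to the categorical prediction using the
# cutoffs from the table above. Supports the dbtypes sift and pp2_hdiv only.
# Usage: classify_score <dbtype> <score>
classify_score() {
    awk -v dbtype="$1" -v score="$2" 'BEGIN {
        score += 0   # force numeric comparison
        if (dbtype == "sift") {
            # Lower SIFT scores are more damaging.
            if (score <= 0.05) print "D"
            else               print "T"
        } else if (dbtype == "pp2_hdiv") {
            # Higher PolyPhen 2 scores are more damaging.
            if (score >= 0.957)      print "D"
            else if (score >= 0.453) print "P"
            else                     print "B"
        } else {
            print "unsupported dbtype: " dbtype > "/dev/stderr"
            exit 1
        }
    }'
}

classify_score sift 0.02        # prints D
classify_score pp2_hdiv 0.5     # prints P
```

Note the opposite directions: a low SIFT score is deleterious, whereas a high PolyPhen 2 score is damaging.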

ljb2_pp2hvar is used for the diagnosis of Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles.

ljb2_pp2hdiv is used when evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data.

References:
http://doc-openbio.readthedocs.io/projects/annovar/en/latest/user-guide/filter/?highlight=sift
