[8.2] Converting genomic coordinates to gene names (pyensembl)

Coordinates to genes

1. pyensembl

1.1 Installing pyensembl

activate3
pip install pyensembl 

Download the GTF annotation file

mkdir -p /data/database/genome/hg38/gtf
cd /data/database/genome/hg38/gtf
wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.refGene.gtf.gz
gzip -d hg38.refGene.gtf.gz
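
# Drop lines containing "fix" (records on fix-patch contigs, whose names end in _fix)
grep -v fix /data/database/genome/hg38/gtf/hg38.refGene.gtf >/data/database/genome/hg38/gtf/hg38.refGene_remove_fix.gtf

# Drop lines containing "alt" (records on alternate-locus contigs, whose names end in _alt)
grep -v alt /data/database/genome/hg38/gtf/hg38.refGene_remove_fix.gtf >/data/database/genome/hg38/gtf/hg38.refGene_remove.gtf

# Drop lines containing "MIR" (microRNA genes)
grep -v MIR /data/database/genome/hg38/gtf/hg38.refGene_remove.gtf >/data/database/genome/hg38/gtf/hg38.refGene_remove_3.gtf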
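
The three grep -v passes above match their pattern anywhere on the line rather than in a specific field. A minimal Python sketch of a field-aware filter follows. It assumes the intent is to drop records on contigs ending in `_fix` or `_alt` and genes whose symbol starts with `MIR`, and that the symbol is stored in the `gene_name` attribute. The names `keep_gtf_line` and `filter_gtf` are hypothetical helpers, not part of pyensembl.

```python
import re

# Assumption: the gene symbol is stored in the gene_name attribute (column 9).
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def keep_gtf_line(line):
    """Return True if a GTF line should be kept in the filtered file."""
    if line.startswith("#"):
        return True  # keep header/comment lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False  # not a complete 9-column GTF record
    contig, attributes = fields[0], fields[8]
    # Drop fix-patch and alternate-locus contigs by contig name only.
    if contig.endswith(("_fix", "_alt")):
        return False
    # Drop genes whose symbol starts with MIR (assumed to mean microRNA genes).
    match = GENE_NAME_RE.search(attributes)
    if match and match.group(1).startswith("MIR"):
        return False
    return True


def filter_gtf(src_path, dst_path):
    """Copy src_path to dst_path, keeping only lines accepted by keep_gtf_line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if keep_gtf_line(line):
                dst.write(line)
```

Like the grep version, the `MIR` prefix test also removes host genes whose symbols begin with `MIR`; adjust the condition if those should stay.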

1.2 Python usage example

import os
import sys
import pyensembl
from pyensembl.genome import Genome

# Directory pyensembl should use for its cache
os.environ['PYENSEMBL_CACHE_DIR'] = '/data/tmp'

# Show which pyensembl installation was imported
print(sys.modules['pyensembl'])


def get_genname_by_loc():
  data = Genome(
      reference_name='hg38',
      annotation_name='features',
      # gtf_path_or_url specifies the path (or URL) of the GTF file
      gtf_path_or_url='/data/database/genome/hg38/gtf/hg38.refGene_remove_3.gtf')
  # Parse the GTF and construct a database of genomic features.
  # Building the index really just creates a SQLite database.
  data.index()

  gene_names = data.gene_ids_at_locus(contig='chr12', position=25245365)
  # gene_names = data.gene_names_at_locus(contig='chr12', position=2524)
  # exon_ids = data.exon_ids_of_gene_name('KRAS')

  print(gene_names)
  # print(exon_ids)


if __name__ == '__main__':
  get_genname_by_loc()
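
When many positions need annotating, the single lookup above can be wrapped in a loop. This is a minimal sketch, assuming `genome` is an already indexed `Genome` object. `annotate_loci` is a hypothetical helper name, and the only pyensembl method it relies on is `gene_names_at_locus(contig=..., position=...)`.

```python
def annotate_loci(genome, loci):
    """Map each (contig, position) pair to a sorted list of gene names.

    `genome` is any object exposing gene_names_at_locus(contig=, position=),
    such as an indexed pyensembl Genome.
    """
    result = {}
    for contig, position in loci:
        names = genome.gene_names_at_locus(contig=contig, position=position)
        # Drop empty names and duplicates; sort for stable output.
        result[(contig, position)] = sorted({name for name in names if name})
    return result
```

For example, `annotate_loci(data, [('chr12', 25245365)])` would return a dictionary keyed by `('chr12', 25245365)`, where `data` is the indexed `Genome` object. A position that overlaps no gene maps to an empty list.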


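I run a personal official account (公众号). I am rather lazy and rarely update it, but you can ask questions there. If I am slow to reply, email me at tiehan@sina.cn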
