【3.5.2】gtfToGenePred
1. Download
URL: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
cd /data/software
wget -c http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/gtfToGenePred
chmod 755 /data/software/gtfToGenePred
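Before moving on, it can help to confirm that the download succeeded and the file is executable. The following is only a minimal sketch; the function name check_tool is hypothetical and not part of the UCSC tools.

```shell
# check_tool: hypothetical helper (not part of the UCSC tools).
# Succeeds only if the given path is a regular, non-empty file
# with the executable bit set.
check_tool() {
    tool_path="$1"
    if [ -f "$tool_path" ] && [ -s "$tool_path" ] && [ -x "$tool_path" ]; then
        echo "ok: $tool_path"
    else
        echo "missing or not executable: $tool_path" >&2
        return 1
    fi
}
```

With the paths used above, it would be called as: check_tool /data/software/gtfToGenePred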
2. Usage
2.1 Program description
gtfToGenePred - convert a GTF file to a genePred
usage:
gtfToGenePred gtf genePred
options:
-genePredExt - create a extended genePred, including frame
information and gene name
-allErrors - skip groups with errors rather than aborting.
Useful for getting infomation about as many errors as possible.
-ignoreGroupsWithoutExons - skip groups contain no exons rather than
generate an error.
-infoOut=file - write a file with information on each transcript
-sourcePrefix=pre - only process entries where the source name has the
specified prefix. May be repeated.
-impliedStopAfterCds - implied stop codon in after CDS
-simple - just check column validity, not hierarchy, resulting genePred may be damaged
-geneNameAsName2 - if specified, use gene_name for the name2 field
instead of gene_id.
-includeVersion - it gene_version and/or transcript_version attributes exist, include the version
in the corresponding identifiers.
2.2 Downloading a genePred table from the web
The CHOPCHOP script needs a table to look up genomic coordinates if you want to supply gene names rather than coordinates. To get an example genePred table from the UCSC Table Browser:
Select organism and assembly
Select group: Genes and Gene Predictions
Select track: RefSeq Genes or Ensembl Genes
Select table: refFlat or ensGene
Select region: genome
Select output format: all fields from selected table
Fill name with extension ".gene_table", e.g. danRer10.gene_table
Get output
That is a lot of manual steps, so it is easier to build the table locally:
cd /data/database/homo/genepred
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz
gunzip *.gz
/data/software/gtfToGenePred -genePredExt -geneNameAsName2 gencode.v29.annotation.gtf hg38.genePred
head -n 1 hg38.genePred
ENST00000456328.2 chr1 + 11868 14409 14409 14409 3 11868,12612,13220, 12227,12721,14409, 0 DDX11L1 none none -1,-1,-1,
The output has no header row. With -genePredExt, the columns are, in order:
name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
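Before adding a header, the headerless file written by gtfToGenePred can be sanity-checked. The sketch below assumes the 15-column -genePredExt layout shown above and tab-separated fields; check_genepred is a hypothetical helper name, and the check is illustrative rather than a full validator.

```shell
# check_genepred: hypothetical helper for a headerless genePredExt file.
# Every row must have 15 tab-separated fields, and exonCount (column 8)
# must equal the number of entries in exonStarts (9) and exonEnds (10).
check_genepred() {
    awk -F '\t' '
        NF != 15 {
            printf "line %d: expected 15 fields, got %d\n", NR, NF
            bad++
            next
        }
        {
            # Both lists end with a comma, so split() yields one extra empty element.
            n_starts = split($9, s, ",") - 1
            n_ends   = split($10, e, ",") - 1
            if (n_starts != $8 || n_ends != $8) {
                printf "line %d: exonCount %s does not match exon lists\n", NR, $8
                bad++
            }
        }
        END { exit (bad > 0) }
    ' "$1"
}
```

It would be called as: check_genepred hg38.genePred
The function prints one message per problem row and returns a non-zero status if any row fails.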
sed '1i\name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount\texonStarts\texonEnds\tscore\tname2\tcdsStartStat\tcdsEndStat\texonFrames' hg38.genePred > hg38.gene_table
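The point of the gene table is to look up coordinates by gene name. As a rough illustration of that lookup (not CHOPCHOP's own code), the sketch below reads the column positions from the header row and prints the matching transcripts; lookup_gene is a hypothetical helper name, and the table is assumed to be tab-separated.

```shell
# lookup_gene: hypothetical helper (not CHOPCHOP's implementation).
# Usage: lookup_gene TABLE GENE
# Prints chrom, strand, txStart, txEnd and transcript name for every row
# whose name2 column equals GENE. Column positions come from the header
# row, so the header must be present.
lookup_gene() {
    awk -F '\t' -v OFS='\t' -v gene="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["name2"]) == gene {
            print $(col["chrom"]), $(col["strand"]), $(col["txStart"]), $(col["txEnd"]), $(col["name"])
        }
    ' "$1"
}
```

It would be called as: lookup_gene hg38.gene_table DDX11L1
Note that genePred txStart is zero-based, so it is one less than the start position in the GTF file.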
I have a personal public account, but I rarely update it. You can ask questions there, and if I am slow to reply, email me at tiehan@sina.cn