【6.5】根据坐标提取序列(samtools/fastacmd/twoBitToFa)
一、网页版
hg38版本:
http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:82316360,82316389
hg19版本:
https://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:1563707,1563797
二、samtools
cd /data/database/genome/hg38
wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
#建立index
samtools faidx hg19.fa
#提取坐标序列
samtools faidx hg19.fa chr1:12953463-12953463
三、 fastacmd
target_seq = '%s -d %s -p F -s "lcl|%s" -L%s,%s' % (
fastacmd, genome, chrom, start, end)
四、twoBitToFa
C04
4.1 下载程序
uname -a
Linux c04 4.17.5-1.el7.elrepo.x86_64 #1 SMP Sun Jul 8 10:40:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
http://hgdownload.soe.ucsc.edu/admin/exe/,选择linux.x86_64/
下载地址:http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/
wget -c http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
chmod +x /data/software/tools/twoBitToFa
添加环境变量
vim /etc/profile
export PATH=/data/software/tools/:$PATH
4.2 下载 2bit files
下载链接 http://hgdownload.soe.ucsc.edu/downloads.html#human
人的基因组下载地址: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/
wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
4.3 参数介绍
### kent source version 388 ###
twoBitToFa - Convert all or part of .2bit file to fasta
usage:
twoBitToFa input.2bit output.fa
options:
-seq=name Restrict this to just one sequence.
-start=X Start at given position in sequence (zero-based).
-end=X End at given position in sequence (non-inclusive).
-seqList=file File containing list of the desired sequence names
in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
where coordinates are half-open zero-based, i.e. [start,end).
-noMask Convert sequence to all upper case.
-bpt=index.bpt Use bpt index instead of built-in one.
-bed=input.bed Grab sequences specified by input.bed. Will exclude introns.
-bedPos With -bed, use chrom:start-end as the fasta ID in output.fa.
-udcDir=/dir/to/cache Place to put cache for remote bigBed/bigWigs.
Sequence and range may also be specified as part of the input
file name using the syntax:
/path/input.2bit:name
or
/path/input.2bit:name
or
/path/input.2bit:name:start-end
具体案例:
/data/software/tools/twoBitToFa -seq=chr1 -start=119228553 -end=119228577 /data/database/genome/hg38/hg38.2bit tes.fa
这里是一个广告位,,感兴趣的都可以发邮件聊聊:tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn