【6.5】根据坐标提取序列(samtools/fastacmd/twoBitToFa)

一、网页版

hg38版本:

http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:82316360,82316389

hg19版本:

https://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:1563707,1563797

二、samtools

cd /data/database/genome/hg38
wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

#建立index
samtools faidx hg19.fa

#提取坐标序列
samtools faidx hg19.fa chr1:12953463-12953463

三、 fastacmd

target_seq = '%s -d %s -p F -s "lcl|%s" -L%s,%s' % (
fastacmd, genome, chrom, start, end)

四、twoBitToFa

C04

4.1 下载程序

uname -a

Linux c04 4.17.5-1.el7.elrepo.x86_64 #1 SMP Sun Jul 8 10:40:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

http://hgdownload.soe.ucsc.edu/admin/exe/,选择linux.x86_64/

下载地址:http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/

wget -c http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa

chmod +x /data/software/tools/twoBitToFa

添加环境变量

vim /etc/profile
export PATH=/data/software/tools/:$PATH

4.2 下载 2bit files

下载链接 http://hgdownload.soe.ucsc.edu/downloads.html#human

人的基因组下载地址: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit

4.3 参数介绍

### kent source version 388 ###
twoBitToFa - Convert all or part of .2bit file to fasta

usage:
   twoBitToFa input.2bit output.fa
options:
   -seq=name       Restrict this to just one sequence.
   -start=X        Start at given position in sequence (zero-based).
   -end=X          End at given position in sequence (non-inclusive).
   -seqList=file   File containing list of the desired sequence names 
                   in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
                   where coordinates are half-open zero-based, i.e. [start,end).
   -noMask         Convert sequence to all upper case.
   -bpt=index.bpt  Use bpt index instead of built-in one.
   -bed=input.bed  Grab sequences specified by input.bed. Will exclude introns.
   -bedPos         With -bed, use chrom:start-end as the fasta ID in output.fa.
   -udcDir=/dir/to/cache  Place to put cache for remote bigBed/bigWigs.

Sequence and range may also be specified as part of the input
file name using the syntax:
      /path/input.2bit:name
   or
      /path/input.2bit:name
   or
      /path/input.2bit:name:start-end

具体案例:

/data/software/tools/twoBitToFa -seq=chr1 -start=119228553 -end=119228577 /data/database/genome/hg38/hg38.2bit tes.fa
药企,独角兽,苏州。团队长期招人,感兴趣的都可以发邮件聊聊:tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn