6.3 perl--Extract each sequence and name the output file after the sequence name

July 30, 2014 perl

Suppose a file has the following content:

>A
ACTGACGT
>B
CGFSFTACGAT
>C
AGAGAGAGAAGAGAGAGAGA

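I want to split it at each > into separate files. Each file is named after the letter that follows >, and contains the > line plus the line after it.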

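Approach:

There are two points to watch in this script:

Recognize >, then extract each sequence's name and the sequence that follows it
Name each generated file after the sequence name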
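
The two points above can be sketched in isolation before looking at the full scripts. This is only a minimal sketch: parse_fasta is a hypothetical helper name, not part of either script below, and it assumes that sequence names consist of word characters (\w) only.

```perl
use strict;
use warnings;

# parse_fasta is a hypothetical helper for illustration only.
# It takes a list of lines and returns a list of [name, sequence] pairs
# in the order the records appear in the input.
sub parse_fasta {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^>(\w+)/) {
            # Point 1: the capture group keeps the name without the leading >
            push @records, [$1, ''];
        }
        elsif (@records) {
            # Any other line belongs to the most recent record
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Point 2: the captured name doubles as the output file name
for my $rec (parse_fasta(">A\n", "ACTGACGT\n", ">B\n", "CGFSFTACGAT\n")) {
    my ($name, $seq) = @$rec;
    print "would write file '$name' with header >$name and sequence $seq\n";
}
```

Keeping the records in an array instead of a hash preserves the input order, which keys on a hash does not guarantee.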

Script 1:

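#!/usr/bin/perl
use warnings;
use strict;

my (%hash, $key, $outputfile);
open(IN, '<', $ARGV[0]) or die "Can't open '$ARGV[0]': $!";

while (<IN>) {
    chomp;
    if (/^>(\w+)/) {
        $key = $1;    # capture the name without the leading >
    }
    $hash{$key} .= $_ unless /^>/;    # append sequence lines to the current record
}
close IN;

foreach (keys %hash) {
    $outputfile = $_;
    open(OUT, '>', $outputfile) or die "Can't open '$outputfile': $!";
    # I first put the open() line above at the top of the script,
    # and Perl kept warning that $outputfile was uninitialized
    print OUT ">$_\n$hash{$_}\n";    # $hash{$_}, not $hash{$key}: $key still holds the last name read
    close OUT;
}
exit;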
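
Script 1 collects every sequence in %hash before it writes anything. As a hedged alternative, the sketch below writes each record as soon as it is read, so nothing has to be held in memory. split_fasta_streaming and its $out_dir parameter are hypothetical names introduced for illustration, and unlike script 1 this version copies sequence lines unchanged instead of joining them into one line.

```perl
use strict;
use warnings;
use File::Spec;

# split_fasta_streaming is a hypothetical helper for illustration only.
# It writes each record to "$out_dir/<name>" as soon as it is read and
# returns the number of files it created.
sub split_fasta_streaming {
    my ($in_path, $out_dir) = @_;
    open(my $in, '<', $in_path) or die "Can't open '$in_path': $!";
    my ($out, $count) = (undef, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>(\w+)/) {
            close $out if $out;    # finish the previous record
            my $path = File::Spec->catfile($out_dir, $1);
            open($out, '>', $path) or die "Can't open '$path': $!";
            $count++;
        }
        # Header and sequence lines alike go to the current file;
        # lines before the first header are skipped
        print {$out} $line if $out;
    }
    close $out if $out;
    close $in;
    return $count;
}
```

A call such as split_fasta_streaming($ARGV[0], '.') would split the input file into the current directory. Two records with the same name would overwrite each other here, just as they are merged under one key in script 1.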

Script 2:

#!/usr/bin/perl -w
use strict;
my $Usage = "
#################### Program Information ###################
This program chops each sequence out of the input FASTA
file and writes it to a separate *.seq file.
------------------------------------------------------------
Usage: [PERL] <$0> < Input FASTA File >
------------------------------------------------------------

                           E N D                           

############################################################

";

if (!defined($ARGV[0])) {
    die $Usage;
}
my (%hash, $key, $outputfile, $inputfile);
$inputfile = $ARGV[0];  # this line was missing before, which is why the script kept failing
my ($nums1, $nums2) = (0, 0);

open(IN, '<', $inputfile) or die "Can't open $inputfile: $!\n";

while (<IN>) {
    chomp();
    if (/^>/) {
        s/>//g;    # remove > so the rest of the line becomes the name
        $key = $_;
        $nums1++;
    } else {
        # keep only lines that start with a nucleotide letter
        $hash{$key} .= $_ if /^[AUGCTNaugctn]+/;
    }
}
close IN;

foreach (keys %hash) {
#   s/\W//gi;

    $outputfile = $_ . ".seq";
    open(OUT, '>', $outputfile) or die "Can't open $outputfile! Check FASTA file sequence name, which may contain illegal characters in file system!\n";
    print OUT ">$_\n$hash{$_}\n";
    close OUT;
    $nums2++;
}

print "[ $0 ] program processed $nums1 sequences from FASTA file and $nums2 were written to file!\n";

exit;
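
The error message in script 2 warns that a sequence name may contain characters the file system rejects, and the commented-out s/\W//gi hints at one way to deal with that. A minimal sketch of the idea follows. safe_filename is a hypothetical helper, the sample header is made up, and replacing each run of non-word characters with an underscore (instead of deleting them) is my assumption, not something either script does.

```perl
use strict;
use warnings;

# safe_filename is a hypothetical helper for illustration only.
# It turns a FASTA header into a name that is safe to use as a file name.
sub safe_filename {
    my ($name) = @_;
    $name =~ s/^>//;                     # drop the leading > if it is still there
    $name =~ s/\W+/_/g;                  # replace runs of non-word characters with _
    $name =~ s/^_+|_+$//g;               # trim underscores left at either end
    $name = 'unnamed' if $name eq '';    # assumed fallback for empty names
    return "$name.seq";
}

# Made-up header containing a space and a slash
print safe_filename('>seq 1/a'), "\n";
```

One caveat of this design: two different headers can collapse to the same file name, so a real script would need to detect such collisions before writing.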

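ps:

Thanks to my QQ friend 无声 for revising my first script
Thanks to my QQ friend 让人不爽 for providing the second script

This is the first script I wrote myself, and with it I am officially declaring war on perl. Personally I prefer the first script, but the second one follows a more standard layout.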


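I also have a personal WeChat public account. I am a bit lazy and rarely update it, but you can ask questions there; if I am slow to reply, email me at tiehan@sina.cn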