pysam是python中用来读写SAM/BAM文件的一个模块。底层是用C写的

1. 文件的读取

import pysam

def mutate(args):
    assert mutend > mutstart, "mutation start must occur before mutation end: " + mutid

    bamfile = pysam.Samfile(args.bamFileName, 'rb')  #读取bam文件

    for pcol in bamfile.pileup(reference=chrom, start=mutstart, end=mutend, max_depth=int(args.maxdepth), truncate=True):
#提取包含chrom:mutstart-mutend这个区段的所有reads。pcol.pos 提取碱基的位置;包含mutstart,不包含mutend;truncate=True,指定到我们指定的碱基位置
        if pcol.pos:
            num_good_reads=0
            num_better_reads=0
            refbase = reffile.fetch(chrom, pcol.pos-1, pcol.pos)  
            #提取pcol.pos的碱基
            for pread in pcol.pileups:  #提取一条序列pread
                if avoid is not None and pread.alignment.qname in avoid:
                    print "WARN\t" + now() + "\t" + region + "\tdropped mutation due to read in --avoidlist", pread.alignment.qname
     
                if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048):
  # is_secondary==true if not primary alignment
  #query_position==position of the read base at the pileup site, 0-based. None if is_del or is_refskip is set

                    if pread.alignment.is_read2:
                        pairname = 'S' # read is second in pair
                    if not pread.alignment.is_paired:
                        pairname = 'U' # read is unpaired
                    outreads_qual[extqname]=pread.alignment.qual #质量数据

    bamfile.close()

报错:

  1. ValueError: quality and sequence mismatch: 148 != 151
    原因: pysam读取后的seq和qual长度必须一样啊!!

2.文件的写入

注:
先整理到这里,后面用的时候,再回来梳理

参考资料:
http://pysam.readthedocs.org/en/latest/glossary.html#term-column
http://pysam.readthedocs.org/en/latest/api.html#pysam.AlignmentFile.pileup

发表评论

电子邮件地址不会被公开。 必填项已用*标注