# 【2.3】biopython pairwise2实现两两序列比对

## 一、pairwise2.align介绍

Pairwise sequence alignment使用动态规划算法。

global + XX
local + XX


XX是2个字符的代码，表示它所采用的参数。

• x 没有参数。 相同的字符得分为1，否则为0。
• m 匹配分数是相同字符的分数，否则不匹配得分。
• d 字典返回任意一对字符的分数。
• c 回调函数返回分数。

gap 惩罚参数是：

• x 无gap罚款。
• s 同样的开放和延长两个序列的gap罚分。
• d 序列具有不同的开放和延长间隙罚分。
• c 回调函数返回间隙罚分。

from Bio import pairwise2
alignments = pairwise2.align.globalxx("ACCGT", "ACG")


>>> from Bio.pairwise2 import format_alignment
>>> print(format_alignment(*alignments[0]))
ACCGT
| ||
A-CG-
Score=3
<BLANKLINE>


align函数中包含的参数：

• Two sequences: strings, Biopython sequence objects or lists. Lists are useful for supplying sequences which contain residues that are encoded by more than one letter.
• penalize_extend_when_opening: boolean (default: False). Whether to count an extension penalty when opening a gap. If false, a gap of 1 is only penalized an “open” penalty, otherwise it is penalized “open+extend”.
• penalize_end_gaps: boolean. Whether to count the gaps at the ends of an alignment. By default, they are counted for global alignments but not for local ones. Setting penalize_end_gaps to (boolean, boolean) allows you to specify for the two sequences separately whether gaps at the end of the alignment should be counted.
• gap_char: string (default: ‘-’). Which character to use as a gap character in the alignment returned. If your input sequences are lists, you must change this to [‘-’].
• force_generic: boolean (default: False). Always use the generic, non-cached, dynamic programming function (slow!). For debugging.
• score_only: boolean (default: False). Only get the best score, don’t recover any alignments. The return value of the function is the score. Faster and uses less memory.
• one_alignment_only: boolean (default: False). Only recover one alignment.

## 二、其他例子

### 2.1 找到两个序列之间的最佳全局对齐。 相同的字符给出1分。 对于不匹配或差距，不会扣除任何积分。

>>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
...     print(format_alignment(*a))
ACCGT
| ||
A-CG-
Score=3
<BLANKLINE>
ACCGT
|| |
AC-G-
Score=3
<BLANKLINE>


### 2.2 和以前一样，但是局部对齐。 请注意，format_alignment仅显示序列的对齐部分以及起始位置。

>>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
...     print(format_alignment(*a))
1 ACCG
| ||
1 A-CG
Score=3
<BLANKLINE>
1 ACCG
|| |
1 AC-G
Score=3
<BLANKLINE>


### 2.3 做全局比对。 相同的字符给出2个点，每个不相同的字符扣除1个点。 不要惩罚gaps。

>>> for a in pairwise2.align.globalmx("ACCGT", "ACG", 2, -1):
...     print(format_alignment(*a))
ACCGT
| ||
A-CG-
Score=6
<BLANKLINE>
ACCGT
|| |
AC-G-
Score=6
<BLANKLINE>


### 2.4 与上述相同，除了现在打开间隙时扣除0.5分，并且在延长时扣除0.1分。

>>> for a in pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1):
...     print(format_alignment(*a))
ACCGT
| ||
A-CG-
Score=5
<BLANKLINE>
ACCGT
|| |
AC-G-
Score=5
<BLANKLINE>


### 2.5 根据惩罚，一个序列中的间隙可能跟随另一个序列中的间隙。如果您不喜欢这种行为，请增加缺口开始的罚分

>>> for a in pairwise2.align.globalms("A", "T", 5, -4, -1, -.1):
...     print(format_alignment(*a))
A-
<BLANKLINE>
-T
Score=-2
<BLANKLINE>
>>> for a in pairwise2.align.globalms("A", "T", 5, -4, -3, -.1):
...     print(format_alignment(*a))
A
.
T
Score=-4
<BLANKLINE>


## 2.6 对齐函数还可以使用已包含在Biopython中的已知矩阵（来自Bio.SubsMat的MatrixInfo）：

>>> from Bio.SubsMat import MatrixInfo as matlist
>>> matrix = matlist.blosum62
>>> for a in pairwise2.align.globaldx("KEVLA", "EVL", matrix):
...     print(format_alignment(*a))
KEVLA
|||
-EVL-
Score=13
<BLANKLINE>


### 2.7 使用参数c，您可以定义自己的匹配和间隙函数。 例如。 定义仿射对数间隙函数并使用它：

>>> from math import log
>>> def gap_function(x, y):  # x is gap position in seq, y is gap length
...     if y == 0:  # No gap
...         return 0
...     elif y == 1:  # Gap open penalty
...         return -2
...     return - (2 + y/4.0 + log(y)/2.0)
...
>>> alignment = pairwise2.align.globalmc("ACCCCCGT", "ACG", 5, -4,
...                                      gap_function, gap_function)


## 三、我的案例

def get_similarity_pct(query, target):
alignments = pairwise2.align.globalxx(query, target) # 全局比对，相同的残基就给1分，不同和gap不扣分
seq_length = min(len(query), len(target))
matches = alignments[0][2]
percent_match = (matches / seq_length) * 100
return percent_match


def align_aa(query, target):
alignments = pairwise2.align.globalds(query, target, MatrixInfo.blosum62, -10, -2)
return alignments[0]

def print_aln(aln):
"""
DIQMTQSPSSLSASVGDRVTITCRASENIYSYLAWYQQKPGKSPKLLIYNARTLAEGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQYHSGSPLPFGGGTKVEIKR
||||||||||||||||||||||||||||||||||||||||||.|||||||||||||||||||||||||||||||||||||||.||||||||||||||
DIQMTQSPSSLSASVGDRVTITCRASENIYSYLAWYQQKPGKVPKLLIYNARTLAEGVPSRFSGSGSGTDFTLTISSLQPEDVATYYCQYHSGSPLP-----------
Score=465
"""
print(format_alignment(*aln))

print_aln(align_aa(query, target))

print(align_aa(’YVIYSLH‘, ‘LAISDQTKHA’))


('YVI--YSLH-', 'LAISDQTKHA', -13.0, 0, 10)
# 第一个是query序列，第二个是target序列，第三列是得分，后面两个是比对的起始和终止位置