[2.2.1] Computing protein sequence statistics with EMBOSS pepstats

1. Introduction

pepstats can compute the following properties:

  • Molecular weight
  • Number of residues
  • Average residue weight
  • Charge
  • Isoelectric point
  • For each type of amino acid: number, molar percent, DayhoffStat
  • For each physico-chemical class of amino acid: number, molar percent
  • Probability of protein expression in E. coli inclusion bodies
  • Molar extinction coefficient (A280)
  • Extinction coefficient at 1 mg/ml (A280)

2. Usage

Options:

[sam@g02 view]$ pepstats --help
Calculate statistics of protein properties
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Protein sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outfile]           outfile    [*.pepstats] Pepstats program output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -aadata             datafile   [Eamino.dat] Amino acid properties
   -mwdata             datafile   [Emolwt.dat] Molecular weight data for amino
                                  acids
   -pkdata             datafile   [Epk.dat] Values of pKa for amino acids
   -[no]termini        boolean    [Y] Include charge at N and C terminus
   -mono               boolean    [N] Use monoisotopic weights

   General qualifiers:
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose

Example output:

PEPSTATS of LACI_ECOLI from 1 to 360

Molecular weight = 38590.16  		Residues = 360   
Average Residue Weight  = 107.195 	Charge   = 1.5   
Isoelectric Point = 6.3901
A280 Molar Extinction Coefficients  = 22920 (reduced)   23045 (cystine bridges)
A280 Extinction Coefficients 1mg/ml = 0.594 (reduced)   0.597 (cystine bridges)
Improbability of expression in inclusion bodies = 0.660

Residue		Number		Mole%		DayhoffStat
A = Ala		44		12.222 		1.421  	
B = Asx		0		0.000  		0.000  	
C = Cys		3		0.833  		0.287  	
D = Asp		17		4.722  		0.859  	
E = Glu		15		4.167  		0.694  	
F = Phe		4		1.111  		0.309  	
G = Gly		22		6.111  		0.728  	
H = His		7		1.944  		0.972  	
I = Ile		18		5.000  		1.111  	
J = ---		0		0.000  		0.000  	
K = Lys		11		3.056  		0.463  	
L = Leu		41		11.389 		1.539  	
M = Met		10		2.778  		1.634  	
N = Asn		12		3.333  		0.775  	
O = ---		0		0.000  		0.000  	
P = Pro		14		3.889  		0.748  	
Q = Gln		28		7.778  		1.994  	
R = Arg		19		5.278  		1.077  	
S = Ser		32		8.889  		1.270  	
T = Thr		19		5.278  		0.865  	
U = ---		0		0.000  		0.000  	
V = Val		34		9.444  		1.431  	
W = Trp		2		0.556  		0.427  	
X = Xaa		0		0.000  		0.000  	
Y = Tyr		8		2.222  		0.654  	
Z = Glx		0		0.000  		0.000  	

Property	Residues		Number		Mole%
Tiny		(A+C+G+S+T)		120		33.333
Small		(A+B+C+D+G+N+P+S+T+V)	197		54.722
Aliphatic	(A+I+L+V)		137		38.056
Aromatic	(F+H+W+Y)		21		 5.833
Non-polar	(A+C+F+G+I+L+M+P+V+W+Y)	200		55.556
Polar		(D+E+H+K+N+Q+R+S+T+Z)	160		44.444
Charged		(B+D+E+H+K+R+Z)		69		19.167
Basic		(H+K+R)			37		10.278
Acidic		(B+D+E+Z)		32		 8.889
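
The class totals in the property table are sums of the per-residue counts listed above it. The sketch below recomputes a few of them from the LACI_ECOLI counts in the example output. The dictionary and function names are illustrative only and are not part of EMBOSS.

```python
# Per-residue counts copied from the LACI_ECOLI example output.
COUNTS = {
    "A": 44, "B": 0, "C": 3, "D": 17, "E": 15, "F": 4, "G": 22, "H": 7,
    "I": 18, "K": 11, "L": 41, "M": 10, "N": 12, "P": 14, "Q": 28,
    "R": 19, "S": 32, "T": 19, "V": 34, "W": 2, "Y": 8, "Z": 0,
}

# A subset of the physico-chemical classes shown in the property table.
CLASSES = {
    "Tiny": "ACGST",
    "Aliphatic": "AILV",
    "Aromatic": "FHWY",
    "Charged": "BDEHKRZ",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def class_stats(counts, members):
    """Return (number, mole percent) for one residue class."""
    total = sum(counts.values())
    number = sum(counts.get(code, 0) for code in members)
    return number, 100.0 * number / total


for name, members in CLASSES.items():
    number, mole = class_stats(COUNTS, members)
    print(f"{name:10s} {number:4d} {mole:7.3f}")
```

The counts sum to the 360 residues reported in the header, so the mole percentages use the same denominator as the pepstats table.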

Command line:

pepstats ${sequence}.txt 1-pepstats.txt
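
Once the report file has been written, it is often handy to pull the numbers out for downstream analysis. The following is a minimal sketch of a parser for the text layout shown in the example output. The function name parse_pepstats and the regular expressions are my own assumptions based on that layout, not an EMBOSS API, and the embedded SAMPLE is a shortened copy of the example report.

```python
import re

# Shortened copy of the example report (tabs written as \t).
SAMPLE = (
    "PEPSTATS of LACI_ECOLI from 1 to 360\n"
    "\n"
    "Molecular weight = 38590.16  \t\tResidues = 360\n"
    "Average Residue Weight  = 107.195 \tCharge   = 1.5\n"
    "Isoelectric Point = 6.3901\n"
    "\n"
    "Residue\t\tNumber\t\tMole%\t\tDayhoffStat\n"
    "A = Ala\t\t44\t\t12.222 \t\t1.421\n"
    "C = Cys\t\t3\t\t0.833  \t\t0.287\n"
)

# "Label = number" pairs; several may share one line.
SCALAR = re.compile(r"([A-Za-z][A-Za-z0-9 ]*?)\s*=\s*(-?\d+(?:\.\d+)?)")
# Rows of the per-residue table, e.g. "A = Ala  44  12.222  1.421".
RESIDUE = re.compile(r"^([A-Z]) = (\S+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)")


def parse_pepstats(text):
    """Return (scalars, residues) parsed from a pepstats text report."""
    scalars, residues = {}, {}
    for line in text.splitlines():
        row = RESIDUE.match(line)
        if row:
            code, _name, number, mole, dayhoff = row.groups()
            residues[code] = {
                "number": int(number),
                "mole_percent": float(mole),
                "dayhoff_stat": float(dayhoff),
            }
            continue
        for key, value in SCALAR.findall(line):
            scalars[key.strip()] = float(value)
    return scalars, residues


scalars, residues = parse_pepstats(SAMPLE)
print(scalars["Molecular weight"], residues["A"]["number"])
```

To use it on a real report, read the output file (for example 1-pepstats.txt) and pass its contents to parse_pepstats. For the two-valued A280 lines only the first number would be captured, so those lines would need a dedicated pattern.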

3. Terminology

3.1 extinction coefficient

The molar absorption coefficient, also called the molar extinction coefficient, measures how strongly a substance absorbs light at a given wavelength. It is denoted by the symbol "ε".
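
As a rough illustration, the A280 values in the example output can be reproduced from the Trp, Tyr and Cys counts. The per-residue coefficients below (5500 for Trp, 1490 for Tyr, 125 per cystine bridge, in M^-1 cm^-1) are commonly cited values that I am assuming here. They match the sample numbers, but I have not checked them against the pepstats source or data files. The function names are illustrative.

```python
# Assumed molar extinction coefficients at 280 nm (M^-1 cm^-1).
# These are commonly cited values, not read from EMBOSS data files.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def a280_molar_extinction(n_trp, n_tyr, n_cys, cystine_bridges=False):
    """Estimate the molar extinction coefficient at 280 nm."""
    eps = n_trp * EXT_TRP + n_tyr * EXT_TYR
    if cystine_bridges:
        # Each cystine bridge uses two cysteines; an odd one is left over.
        eps += (n_cys // 2) * EXT_CYSTINE
    return eps


def a280_at_1mg_per_ml(eps, molecular_weight):
    """Absorbance of a 1 mg/ml solution in a 1 cm path."""
    return eps / molecular_weight


# LACI_ECOLI from the example output: W = 2, Y = 8, C = 3, MW = 38590.16
reduced = a280_molar_extinction(2, 8, 3)
bridged = a280_molar_extinction(2, 8, 3, cystine_bridges=True)
print(reduced, bridged)
print(round(a280_at_1mg_per_ml(reduced, 38590.16), 3))
print(round(a280_at_1mg_per_ml(bridged, 38590.16), 3))
```

With these assumed coefficients the sketch gives 22920 and 23045 for the molar values and 0.594 and 0.597 at 1 mg/ml, which agrees with the report.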

3.2 Dayhoff

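DayhoffStat is the amino acid's molar percentage divided by its Dayhoff statistic. The Dayhoff statistic is the amino acid's relative occurrence per 1000 residues, normalised to 100. The Dayhoff statistics are read from the EMBOSS data file Edayhoff.freq.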
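
The definition above comes down to a single division. The sketch below checks it against two rows of the example output. The Dayhoff values used (8.6 for Ala, 2.9 for Cys) are assumptions on my part: they are consistent with the sample table, but I have not read them from Edayhoff.freq. The function names are illustrative.

```python
def mole_percent(count, total_residues):
    """Molar percentage of one amino acid in the sequence."""
    return 100.0 * count / total_residues


def dayhoff_stat(mole_pct, dayhoff_value):
    """Mole percent divided by the residue's Dayhoff statistic."""
    return mole_pct / dayhoff_value


# Assumed Dayhoff statistics (not read from Edayhoff.freq).
DAYHOFF = {"A": 8.6, "C": 2.9}

# LACI_ECOLI has 360 residues, with 44 Ala and 3 Cys.
for code, count in (("A", 44), ("C", 3)):
    pct = mole_percent(count, 360)
    print(code, round(pct, 3), round(dayhoff_stat(pct, DAYHOFF[code]), 3))
```

A DayhoffStat above 1 means the residue is more frequent in this sequence than in the Dayhoff reference composition, and a value below 1 means it is rarer.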

3.3 inclusion bodies

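The probability of expression in inclusion bodies is sometimes described as a measure of solubility. However, a recombinant protein expressed in E. coli can end up either soluble in the cytoplasm or insoluble in inclusion bodies. If the Harrison model predicts that a given protein is likely to be expressed in inclusion bodies, that does not mean it cannot be obtained in soluble form in the cytoplasm. One example: the Thermotoga maritima cell division protein FtsA with a C-terminal His-tag has a Harrison probability of 58% for expression in inclusion bodies, yet large amounts of soluble protein were found in the E. coli cytosol (F. van den Ent and J. Lowe, EMBO J. 19, 5300-5307, 2000). Whether a protein ends up in inclusion bodies depends not only on its sequence but also on many other factors, such as the E. coli strain, the incubation temperature, the type of expression vector, the promoter strength, and the growth medium.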

3.4 Other web tools

https://web.expasy.org/cgi-bin/protparam/protparam

References

http://bar.utoronto.ca/cgi-bin/emboss/help/pepstats
