[2.2.1] Computing protein sequence statistics with EMBOSS pepstats

1. Introduction

Properties that pepstats can calculate:

  • Molecular weight
  • Number of residues
  • Average residue weight
  • Charge
  • Isoelectric point
  • For each type of amino acid: number, molar percent, DayhoffStat
  • For each physico-chemical class of amino acid: number, molar percent
  • Probability of protein expression in E. coli inclusion bodies
  • Molar extinction coefficient (A280)
  • Extinction coefficient at 1 mg/ml (A280)
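As a rough illustration of the composition-based items in this list, the sketch below counts residues, computes mole percent, and totals a few physico-chemical classes. The class definitions are copied from the Property table in the example output further down; the function name `composition` and the test peptide are made up for illustration, and this is not a description of how pepstats itself is implemented.

```python
from collections import Counter

# Class membership copied from the "Property" table of the pepstats report.
CLASSES = {
    "Tiny": "ACGST",
    "Aliphatic": "AILV",
    "Aromatic": "FHWY",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def composition(seq):
    """Return (n_residues, per-residue stats, per-class stats).

    Each stats value is a (number, mole_percent) tuple.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    residues = {aa: (c, 100.0 * c / n) for aa, c in counts.items()}
    classes = {}
    for name, members in CLASSES.items():
        total = sum(counts[aa] for aa in members)
        classes[name] = (total, 100.0 * total / n)
    return n, residues, classes


if __name__ == "__main__":
    # Hypothetical 10-residue peptide, used only to exercise the function.
    n, residues, classes = composition("MKRHDEAGWY")
    print("Residues =", n)
    for name, (number, mole_pct) in classes.items():
        print(f"{name:<10}{number:>4}{mole_pct:>8.3f}")
```

Keeping the classes in a plain dictionary makes it easy to add the remaining rows (Small, Non-polar, Polar, Charged) from the same table.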

2. Usage

Options:

[sam@g02 view]$ pepstats --help
Calculate statistics of protein properties
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Protein sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outfile]           outfile    [*.pepstats] Pepstats program output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -aadata             datafile   [Eamino.dat] Amino acid properties
   -mwdata             datafile   [Emolwt.dat] Molecular weight data for amino
                                  acids
   -pkdata             datafile   [Epk.dat] Values of pKa for amino acids
   -[no]termini        boolean    [Y] Include charge at N and C terminus
   -mono               boolean    [N] Use monoisotopic weights

   General qualifiers:
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose

Example output:

PEPSTATS of LACI_ECOLI from 1 to 360

Molecular weight = 38590.16         Residues = 360   
Average Residue Weight  = 107.195   Charge   = 1.5   
Isoelectric Point = 6.3901
A280 Molar Extinction Coefficients  = 22920 (reduced)   23045 (cystine bridges)
A280 Extinction Coefficients 1mg/ml = 0.594 (reduced)   0.597 (cystine bridges)
Improbability of expression in inclusion bodies = 0.660

Residue     Number      Mole%       DayhoffStat
A = Ala     44      12.222      1.421   
B = Asx     0       0.000       0.000   
C = Cys     3       0.833       0.287   
D = Asp     17      4.722       0.859   
E = Glu     15      4.167       0.694   
F = Phe     4       1.111       0.309   
G = Gly     22      6.111       0.728   
H = His     7       1.944       0.972   
I = Ile     18      5.000       1.111   
J = ---     0       0.000       0.000   
K = Lys     11      3.056       0.463   
L = Leu     41      11.389      1.539   
M = Met     10      2.778       1.634   
N = Asn     12      3.333       0.775   
O = ---     0       0.000       0.000   
P = Pro     14      3.889       0.748   
Q = Gln     28      7.778       1.994   
R = Arg     19      5.278       1.077   
S = Ser     32      8.889       1.270   
T = Thr     19      5.278       0.865   
U = ---     0       0.000       0.000   
V = Val     34      9.444       1.431   
W = Trp     2       0.556       0.427   
X = Xaa     0       0.000       0.000   
Y = Tyr     8       2.222       0.654   
Z = Glx     0       0.000       0.000   

Property    Residues        Number      Mole%
Tiny        (A+C+G+S+T)     120     33.333
Small       (A+B+C+D+G+N+P+S+T+V)   197     54.722
Aliphatic   (A+I+L+V)       137     38.056
Aromatic    (F+H+W+Y)       21       5.833
Non-polar   (A+C+F+G+I+L+M+P+V+W+Y) 200     55.556
Polar       (D+E+H+K+N+Q+R+S+T+Z)   160     44.444
Charged     (B+D+E+H+K+R+Z)     69      19.167
Basic       (H+K+R)         37      10.278
Acidic      (B+D+E+Z)       32       8.889
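Downstream scripts usually need these numbers rather than the text report. Below is a minimal parsing sketch, assuming the report layout shown above (header lines of `name = value` pairs followed by a residue table). `parse_pepstats` is a hypothetical helper, not part of EMBOSS, and the embedded sample is a shortened copy of the report above.

```python
import re

SAMPLE = """\
PEPSTATS of LACI_ECOLI from 1 to 360

Molecular weight = 38590.16         Residues = 360
Average Residue Weight  = 107.195   Charge   = 1.5
Isoelectric Point = 6.3901

Residue     Number      Mole%       DayhoffStat
A = Ala     44      12.222      1.421
W = Trp     2       0.556       0.427
"""

# Header fields to look for; names are taken verbatim from the report.
HEADER_FIELDS = (
    "Molecular weight",
    "Residues",
    "Average Residue Weight",
    "Charge",
    "Isoelectric Point",
)

# One residue row: code, three-letter name, number, mole%, DayhoffStat.
RESIDUE_ROW = re.compile(
    r"^([A-Z]) = (\S+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)[ \t]*$", re.M
)


def parse_pepstats(text):
    """Parse one pepstats report into (header dict, residue dict)."""
    header = {}
    for name in HEADER_FIELDS:
        m = re.search(re.escape(name) + r"\s*=\s*(-?\d+(?:\.\d+)?)", text)
        if m:
            header[name] = float(m.group(1))
    residues = {}
    for code, abbrev, number, mole, dayhoff in RESIDUE_ROW.findall(text):
        residues[code] = {
            "name": abbrev,
            "number": int(number),
            "mole_pct": float(mole),
            "dayhoff_stat": float(dayhoff),
        }
    return header, residues


if __name__ == "__main__":
    header, residues = parse_pepstats(SAMPLE)
    print(header)
    print(residues["A"])
```

The sketch handles a single report; an output file covering several sequences would first need to be split on the `PEPSTATS of` lines.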

Command line:

pepstats ${sequence}.txt 1-pepstats.txt
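The same call can be scripted. The sketch below builds the argument list with the explicit `-sequence` and `-outfile` qualifiers from the help text above, and only runs it when a `pepstats` executable is found on `PATH`. The file names are placeholders and `build_pepstats_command` is a hypothetical helper.

```python
import shutil
import subprocess


def build_pepstats_command(sequence, outfile, mono=False, termini=True):
    """Build the pepstats argument list using qualifiers from `pepstats --help`."""
    cmd = ["pepstats", "-sequence", sequence, "-outfile", outfile]
    if mono:
        cmd.append("-mono")        # use monoisotopic weights
    if not termini:
        cmd.append("-notermini")   # ignore charge at the N and C termini
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace them with real paths.
    cmd = build_pepstats_command("protein.txt", "1-pepstats.txt")
    if shutil.which("pepstats"):
        subprocess.run(cmd, check=True)
    else:
        print("pepstats not found; would run:", " ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.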

3. Terminology

3.1 extinction coefficient

The molar absorption coefficient, also called the molar extinction coefficient and written ε, measures how strongly a substance absorbs light at a given wavelength. pepstats reports it at 280 nm (A280).
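To make the two A280 figures concrete, here is a small sketch. It assumes the widely used per-residue values at 280 nm (Trp 5500, Tyr 1490, cystine 125, in M^-1 cm^-1). These constants are an assumption rather than something stated in this post, although they do reproduce the LACI_ECOLI numbers in the example output (2 Trp, 8 Tyr, 3 Cys, molecular weight 38590.16). The 1 mg/ml value is taken as the molar coefficient divided by the molecular weight.

```python
# Assumed per-residue molar extinction values at 280 nm (M^-1 cm^-1).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def molar_extinction(n_trp, n_tyr, n_cys=0, cystine_bridges=False):
    """Estimate the A280 molar extinction coefficient from residue counts."""
    eps = EPS_TRP * n_trp + EPS_TYR * n_tyr
    if cystine_bridges:
        # Every pair of cysteines is assumed to form one cystine bridge.
        eps += EPS_CYSTINE * (n_cys // 2)
    return eps


def extinction_1mg_ml(eps, molecular_weight):
    """Absorbance of a 1 mg/ml solution: molar coefficient / molecular weight."""
    return eps / molecular_weight


if __name__ == "__main__":
    mw = 38590.16  # LACI_ECOLI molecular weight from the example output
    for bridges in (False, True):
        eps = molar_extinction(2, 8, 3, cystine_bridges=bridges)
        label = "cystine bridges" if bridges else "reduced"
        print(f"{eps} ({label})  {extinction_1mg_ml(eps, mw):.3f}")
```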

3.2 Dayhoff

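DayhoffStat is an amino acid's molar percentage divided by its Dayhoff statistic. The Dayhoff statistics are read from the EMBOSS data file Edayhoff.freq; each one is the amino acid's relative occurrence per 1000 residues, normalised to 100.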
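A short sketch of that calculation follows. The Dayhoff statistics in `DAYHOFF` were back-calculated from the example output (Mole% divided by DayhoffStat), not read from Edayhoff.freq, so treat them as illustrative; only four residues are included.

```python
# Dayhoff statistics inferred from the example report (assumption, see text).
DAYHOFF = {"A": 8.6, "C": 2.9, "M": 1.7, "W": 1.3}


def dayhoff_stat(count, total, dayhoff):
    """Return (mole percent, DayhoffStat) for one amino acid."""
    mole_pct = 100.0 * count / total
    return mole_pct, mole_pct / dayhoff


if __name__ == "__main__":
    # Residue counts for LACI_ECOLI (360 residues) from the example output.
    counts = {"A": 44, "C": 3, "M": 10, "W": 2}
    for aa, count in counts.items():
        mole_pct, stat = dayhoff_stat(count, 360, DAYHOFF[aa])
        print(f"{aa}  {count:>3}  {mole_pct:.3f}  {stat:.3f}")
```

A DayhoffStat above 1 means the residue is more frequent in this protein than the Dayhoff reference frequency; below 1 means it is rarer.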

3.3 inclusion bodies

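The probability of expression in inclusion bodies is sometimes described as a measure of solubility. However, a recombinant protein expressed in E. coli can end up either soluble in the cytoplasm or insoluble in inclusion bodies. If the Harrison model predicts that a given protein is likely to be expressed in inclusion bodies, that does not mean it is impossible to obtain it in soluble form in the cytoplasm. One example: the cell division protein FtsA from Thermotoga maritima with a C-terminal His-tag has a Harrison probability of 58% for expression in inclusion bodies, yet plenty of soluble protein is found in the E. coli cytosol (F. van den Ent and J. Lowe, EMBO J. 19, 5300-5307, 2000). Whether a protein is expressed in inclusion bodies depends not only on its sequence but also on many other factors, such as the E. coli strain, the incubation temperature, the type of expression vector, the strength of the promoter, and the growth medium.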

3.4 Other web tools

https://web.expasy.org/cgi-bin/protparam/protparam

References

http://bar.utoronto.ca/cgi-bin/emboss/help/pepstats

