【5.1.2.2】Signal peptide prediction: SignalP

Official site: http://www.cbs.dtu.dk/services/SignalP/

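The SignalP 5.0 server predicts the presence of signal peptides and the location of their cleavage sites in proteins from Archaea, Gram-positive bacteria, Gram-negative bacteria and Eukarya. In bacteria and archaea, SignalP 5.0 can distinguish three types of signal peptides: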

  • Sec/SPI: “standard” secretory signal peptides transported by the Sec translocon and cleaved by Signal Peptidase I (Lep)
  • Sec/SPII: lipoprotein signal peptides transported by the Sec translocon and cleaved by Signal Peptidase II (Lsp)
  • Tat/SPI: Tat signal peptides transported by the Tat translocon and cleaved by Signal Peptidase I (Lep)
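
To keep the three classes straight, here is a small Python lookup table. It is only an illustration. The labels are the ones used in the prediction summary. The rule that eukaryotic proteins only get Sec/SPI is an assumption inferred from the sentence above; it is not something SignalP exposes as an API.

```python
from typing import Dict, List, Tuple

# Signal-peptide classes from the list above: label -> (translocon, peptidase).
# Labels are written the way the prediction summary names them.
SP_TYPES: Dict[str, Tuple[str, str]] = {
    "SP(Sec/SPI)": ("Sec", "Signal Peptidase I (Lep)"),
    "LIPO(Sec/SPII)": ("Sec", "Signal Peptidase II (Lsp)"),
    "TAT(Tat/SPI)": ("Tat", "Signal Peptidase I (Lep)"),
}

# Allowed values of the -org option.
ORGANISMS = ("arch", "gram+", "gram-", "euk")


def possible_types(org: str) -> List[str]:
    """Signal-peptide classes that can be reported for a given -org value.

    Assumption: the three-way split applies to archaea and bacteria only;
    for eukaryotes ('euk') just Sec/SPI is predicted.
    """
    if org not in ORGANISMS:
        raise ValueError(f"unknown organism group: {org!r}")
    if org == "euk":
        return ["SP(Sec/SPI)"]
    return list(SP_TYPES)
```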

SignalP 5.0 is based on a deep convolutional and recurrent neural network architecture that includes a conditional random field.

Each protein sequence must be at least 10 amino acids long, and at most 5000 proteins can be submitted at a time.
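
You may have more than 5000 proteins, or some very short fragments. In that case you can pre-process the FASTA file yourself. The Python sketch below is a hypothetical helper and is not shipped with SignalP. It drops sequences under 10 residues and splits the rest into files of at most 5000 proteins. The names `read_fasta`, `filter_and_chunk`, `to_fasta` and `split_file` are made up for this example.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:  # ignore text before the first header
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def filter_and_chunk(records: List[Record], min_len: int = 10,
                     max_per_chunk: int = 5000) -> List[List[Record]]:
    """Drop sequences shorter than min_len, then split the rest into chunks."""
    kept = [r for r in records if len(r[1]) >= min_len]
    return [kept[i:i + max_per_chunk] for i in range(0, len(kept), max_per_chunk)]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA (one sequence line per record)."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


def split_file(path: str, out_prefix: str = "chunk") -> int:
    """Write <out_prefix>_1.fasta, <out_prefix>_2.fasta, ...; return the count."""
    with open(path) as fh:
        chunks = filter_and_chunk(read_fasta(fh.read()))
    for n, chunk in enumerate(chunks, 1):
        with open(f"{out_prefix}_{n}.fasta", "w") as out:
            out.write(to_fasta(chunk))
    return len(chunks)
```

For example, `split_file("proteins.fasta")` would write `chunk_1.fasta`, `chunk_2.fasta`, and so on. Each file can then be submitted separately.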

I. Download and installation

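Request SignalP on the download page below. Fill in your email address and the other details, and a download link will be sent to your mailbox: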

http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp

The file I downloaded was signalp-5.0b.Linux.tar.gz. Extract it and enter the folder:

	tar -xzvf signalp-5.0b.Linux.tar.gz
	cd signalp-5.0b/

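The extracted folder contains two subfolders, 'bin' and 'lib', and the executable is in bin. Add bin to the PATH environment variable by appending the following lines to /etc/profile. Adjust the path to wherever you extracted SignalP:

	vim /etc/profile
	# signalp
	export PATH=$PATH:/data/software/signalp/signalp-5.0b/bin

Reload the profile so that the change takes effect:

	source /etc/profile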

Note: the bin and lib folders must stay in the same parent directory (*/bin/signalp and */lib/). Do not move the signalp executable out on its own.

Test whether the installation succeeded:

	signalp -h

If the usage message is printed, SignalP has been installed successfully.
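
You can also script the two installation checks: signalp is on PATH, and bin and lib sit side by side. This is a minimal Python sketch that assumes the `<root>/bin/signalp` plus `<root>/lib/` layout described above. The function names are hypothetical.

```python
import os
import shutil
from typing import Optional


def find_signalp() -> Optional[str]:
    """Return the full path of the first 'signalp' executable on PATH, or None."""
    return shutil.which("signalp")


def lib_next_to_bin(signalp_path: str) -> bool:
    """True if signalp sits in <root>/bin/ and a <root>/lib/ folder exists."""
    real = os.path.realpath(signalp_path)  # follow symlinks to the real binary
    bin_dir = os.path.dirname(real)
    root = os.path.dirname(bin_dir)
    return (os.path.basename(bin_dir) == "bin"
            and os.path.isdir(os.path.join(root, "lib")))


def check_install() -> str:
    """Summarize both checks as a short message."""
    path = find_signalp()
    if path is None:
        return "signalp not found on PATH"
    if not lib_next_to_bin(path):
        return f"{path}: no lib/ folder next to bin/"
    return f"OK: {path}"
```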

Optionally, install the man page: move or copy the ‘signalp.1’ file to an appropriate location in your manual system. If you need a compiled version, try running:

	man -d signalp.1 | compress >signalp.Z
	or:
	neqn signalp.1 | tbl | nroff -man | col | compress >signalp.Z

II. Usage

Example runs on the bundled test file, with short and long output:

	> signalp -fasta test/euk10.fsa -org euk -format short -prefix euk_10_short
	> signalp -fasta test/euk10.fsa -org euk -format long -prefix euk_10_long

Input:

1. Input proteins

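The input proteins must be in FASTA format, with sequences in the one-letter amino acid code. Input is case-insensitive, e.g. A C D E F G H I K L M N P Q R S T V W Y and X (unknown). Any other characters are converted to X, and whitespace and digits are ignored.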
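
You may want to clean sequences before submission. The Python sketch below mimics the input rules just described. It is a reading of those rules, not SignalP's own parser. In particular, treating letters such as B, Z or U as non-standard, so that they become X, is an assumption based on the 20-letter list.

```python
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def normalize_sequence(raw: str) -> str:
    """Upper-case, skip whitespace and digits, map anything else non-standard to X."""
    out = []
    for ch in raw.upper():
        if ch.isspace() or ch.isdigit():
            continue
        out.append(ch if ch in STANDARD else "X")
    return "".join(out)
```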

2. Parameters

	# Required:
	-fasta string
	      Input file with the protein sequences in FASTA format.

	# Optional:
	-batch int
	      Number of sequences run simultaneously. Larger values are faster but use more memory; the default of 10000 uses about 1.5 GB.
	-format string
	      Output format: 'long' also produces prediction plots, 'short' does not. (default "short")
	-gff
	      Generate a GFF3 file.
	-mature
	      Make a FASTA file with the mature sequences.
	-org string
	      Organism. Archaea: 'arch', Gram-positive: 'gram+', Gram-negative: 'gram-' or Eukarya: 'euk' (default "euk")
	-plot string
	      Plots output format. When long output selected, choose between 'png', 'eps' or 'none' to get just a tabular file. (default "png")
	-prefix string
	      Output files prefix. (default "Input file prefix")
	-stdout
	      Write the prediction summary to the STDOUT.
	-tmp string
	      Specify temporary file directory. (default "System default tmpdir")
	-verbose
	      Verbose output. Specify '-verbose=false' to avoid printing. (default true)
	-version
	      Print version information.
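
When calling SignalP from a pipeline, it helps to build the command line in one place. The Python sketch below only uses flags listed above (`-fasta`, `-org`, `-format`, `-prefix`, `-batch`, `-gff`, `-mature`). The wrapper functions themselves are hypothetical and assume `signalp` is on PATH.

```python
import subprocess
from typing import List, Optional

ORGS = ("arch", "gram+", "gram-", "euk")   # values accepted by -org
FORMATS = ("short", "long")                 # values accepted by -format


def build_signalp_cmd(fasta: str, org: str = "euk", fmt: str = "short",
                      prefix: Optional[str] = None, batch: Optional[int] = None,
                      gff: bool = False, mature: bool = False) -> List[str]:
    """Assemble a SignalP 5.0 argument list from the options listed above."""
    if org not in ORGS:
        raise ValueError(f"-org must be one of {ORGS}")
    if fmt not in FORMATS:
        raise ValueError(f"-format must be one of {FORMATS}")
    cmd = ["signalp", "-fasta", fasta, "-org", org, "-format", fmt]
    if prefix:
        cmd += ["-prefix", prefix]
    if batch is not None:
        cmd += ["-batch", str(batch)]
    if gff:
        cmd.append("-gff")
    if mature:
        cmd.append("-mature")
    return cmd


def run_signalp(fasta: str, **options) -> None:
    """Run signalp; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_signalp_cmd(fasta, **options), check=True)
```

For example, `run_signalp("test/euk10.fsa", org="euk", fmt="short", prefix="euk_10_short")` corresponds to the `-format short` example command shown earlier.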

3. Output formats

The user can obtain the results of the run in various formats:

    - A prediction summary (a tabular file containing 1. the protein prediction (SP(Sec/SPI) / LIPO(Sec/SPII) / TAT(Tat/SPI) / OTHER) and the associated likelihood probability, and 2. the cleavage site position and its likelihood probability. NOTE: a cleavage site position of "?" means that the cleavage site is out of range, probably because the input is a protein fragment.)
    - Processed entries fasta (a FASTA file containing the sequences of the proteins predicted to have signal peptides, with the signal peptide removed).
    - Processed entries gff3 (a GFF3 file showing the signal peptide features of the proteins predicted to have signal peptides).
    - A plot showing three likelihood probabilities: SP(Sec/SPI) / LIPO(Sec/SPII) / TAT(Tat/SPI) (depending on which type of signal peptide is predicted), CS (the cleavage site) and OTHER (the probability that the sequence does not have any kind of signal peptide).
    - A tabular file with the numeric likelihood probabilities used in the plot.
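
You can parse the prediction summary with a few lines of Python for downstream use. This sketch rests on several assumptions, so check them against your own output, since the layout may differ between versions:

- The file is tab-separated.
- Its last `#` comment line holds the column names (ID, Prediction, one column per class, CS Position).
- The cleavage-site field looks like `CS pos: 22-23. AQA-SV. Pr: 0.9012`, meaning the cut falls between residues 22 and 23.

The `mature_sequence` helper is a hypothetical stand-in for the `-mature` output.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed cleavage-site format, e.g. "CS pos: 22-23. AQA-SV. Pr: 0.9012".
CS_RE = re.compile(r"CS pos:\s*(\d+)-(\d+)\.?.*?Pr:\s*([0-9.]+)")


def parse_summary(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-separated SignalP prediction summary into row dicts."""
    header: Optional[List[str]] = None
    rows: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # the last comment line before the data is taken as the header
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("data row found before any '#' header line")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def cleavage_site(cs_field: Optional[str]) -> Optional[Tuple[int, float]]:
    """(last residue of the signal peptide, probability), or None if absent/'?'."""
    m = CS_RE.search(cs_field or "")
    if m is None:
        return None
    return int(m.group(1)), float(m.group(3))


def mature_sequence(seq: str, cs_field: Optional[str]) -> Optional[str]:
    """Cut the signal peptide off seq (1-based site -> 0-based slice)."""
    site = cleavage_site(cs_field)
    return None if site is None else seq[site[0]:]
```

For example: `with open("euk_10_short_summary.signalp5") as fh: rows = parse_summary(fh)`. The file name here is only a guess based on the `-prefix` option.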

4. Other

In case of technical problems (bugs etc.) please contact jjalma@dtu.dk.

Questions on the scientific aspects of the SignalP method should go to Henrik Nielsen, henni@dtu.dk.

III. How it works

By default the server produces the following output for each input sequence. One annotation is attributed to each protein, the one that has the highest probability. On the plot, three marginal probabilities are reported, i.e. SP(Sec/SPI) / LIPO(Sec/SPII) / TAT(Tat/SPI) (depending on what type of signal peptide is predicted), CS (the cleavage site) and OTHER (the probability that the sequence does not have any kind of signal peptide).
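
The "one annotation per protein" rule is simply an argmax over the class probabilities. Below is a minimal Python illustration. The helper names are hypothetical, and the sketch assumes the probabilities in a summary row are stored as strings under the class labels.

```python
from typing import Dict, Mapping

CLASSES = ("SP(Sec/SPI)", "LIPO(Sec/SPII)", "TAT(Tat/SPI)", "OTHER")


def annotate(probs: Mapping[str, float]) -> str:
    """Return the class with the highest probability (first one wins on ties)."""
    if not probs:
        raise ValueError("no class probabilities given")
    return max(probs, key=lambda c: probs[c])


def annotate_row(row: Mapping[str, str]) -> str:
    """Apply annotate() to the class columns present in a summary row."""
    probs: Dict[str, float] = {c: float(row[c]) for c in CLASSES if row.get(c)}
    return annotate(probs)
```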

IV. Discussion

4.1 How to tell whether a UniProt sequence contains a signal peptide

https://genome.ucsc.edu/cgi-bin/hgc?hgsid=946014281_iFMzAky07atA4lQnWVFWgZgTF2qX&c=NC_045512v2&l=0&r=29903&o=27393&t=27438&g=unipCov2LocSignal&i=Signal+peptide
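
UniProtKB records annotate signal peptides as features of type "Signal", so one way to check an entry is to look for that feature. The sketch below assumes the JSON layout returned by the UniProt REST API (`https://rest.uniprot.org/uniprotkb/<accession>.json`). `fetch_entry` needs network access and is not exercised here. Both function names are hypothetical.

```python
import json
from typing import List, Tuple
from urllib.request import urlopen


def signal_peptides(entry: dict) -> List[Tuple[int, int]]:
    """(start, end) of every 'Signal' feature in a UniProtKB JSON entry.

    Assumed layout: {"features": [{"type": "Signal", "location":
    {"start": {"value": 1}, "end": {"value": 22}}}, ...]}
    """
    spans = []
    for feat in entry.get("features", []):
        if feat.get("type") == "Signal":
            loc = feat["location"]
            spans.append((loc["start"]["value"], loc["end"]["value"]))
    return spans


def fetch_entry(accession: str) -> dict:
    """Download one entry (assumed endpoint; requires network access)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    with urlopen(url) as resp:
        return json.load(resp)
```

An empty list from `signal_peptides(fetch_entry(acc))` means that the entry carries no annotated signal peptide.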

V. Errors

References

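Pharmaceutical company, unicorn startup, Suzhou. Our team is always hiring; if you are interested, email me at tiehan@sina.cn.
Personal WeChat official account (rarely updated). You can ask questions there; if I don't reply promptly, email me at tiehan@sina.cn.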