【3.5.3】MEGAN

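MEGAN is a very powerful piece of software. When working with metagenomic data, we usually face three basic computational tasks: taxonomic analysis, functional analysis, and comparative analysis. These are also known as the "who is out there?", "what are they doing?", and "how do they compare?" questions. They pose immense conceptual and computational challenges, and there is a great need for new bioinformatics tools and methods to address them, which is why this software was created. Does this sound a bit like an advertisement?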

1. Installing MEGAN

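We first need to align the data against the NCBI-NR database with blastx, and then load the alignment results into MEGAN for taxonomic analysis. For example (here blastp is run on predicted protein sequences against RefSeq protein):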

/sam/blast/bin/blastp -query assembly.orfs.hmm.faa -db /sam/blast/db/refseq_protein/refseq_protein -evalue 1e-5 -num_threads 60 -max_target_seqs 5 -outfmt 5 -out assembly.orfs.hmm.blast.xml

/sam/megan/MEGAN +g -x "import blastfile= assembly.orfs.hmm.blast.xml meganfile=temp.rma;recompute toppercent=5;recompute minsupport=1;update;collapse rank=Species;update;select nodes=all;export what=CSV format=readname_taxonpath separator=tab file=assembly.orfs.hmm.blast.tax.txt;update;close"
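
The long command string passed to -x above is easier to maintain if it is assembled from its individual steps. The following is a minimal Python sketch of that idea; the function name build_megan_commands and its parameters are hypothetical, and it only builds the string (it does not run MEGAN). The commands themselves are the ones used in the example above.

```python
def build_megan_commands(blast_xml, rma_file, out_file,
                         toppercent=5, minsupport=1, rank="Species"):
    """Return the semicolon-separated command string to pass to MEGAN via -x.

    Hypothetical helper: it only reproduces the sequence of commands
    used in this article, with the file names and thresholds as parameters.
    """
    steps = [
        f"import blastfile={blast_xml} meganfile={rma_file}",
        f"recompute toppercent={toppercent}",
        f"recompute minsupport={minsupport}",
        "update",
        f"collapse rank={rank}",
        "update",
        "select nodes=all",
        "export what=CSV format=readname_taxonpath separator=tab "
        f"file={out_file}",
        "update",
        "close",
    ]
    return ";".join(steps)


commands = build_megan_commands(
    "assembly.orfs.hmm.blast.xml",
    "temp.rma",
    "assembly.orfs.hmm.blast.tax.txt",
)
print(commands)
```

Keeping each step as one list entry makes it easy to change a threshold or the collapse rank without editing a single long line by hand.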

Download: http://ab.inf.uni-tuebingen.de/data/software/megan4/download/welcome.html

Then, in a terminal, run:

sh MEGAN unix.sh (the exact file name depends on the version)

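MEGAN requires an academic email address for registration; after registering, you are sent a verification code (the license key). For some reason, the latest version I downloaded, MEGAN5, rejected the code no matter how I entered it, so in the end I had to go back to MEGAN4. Later I emailed the person in charge (huson@informatik.uni-tuebingen.de), and this was the reply: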

Best way to do this under linux or mac os is to open a terminal,
type
cat >megan5-license.txt
then to paste the text into the terminal
and then to press control-d to finish

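PS: Bioinformatics software comes with all sorts of odd download and installation procedures. What troubled me most with this one was that I had no academic email address, and it accepts nothing else. Together with the MEGAN5 problem, this cost me quite some time. Nothing surprising, nothing surprising.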

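2. MEGAN4 command-line parameters

The parameters of MEGAN4 command mode

To make sense of the MEGAN parameters used in the biotech literature, I have summarized them below: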

/sam/megan/MEGAN +g -x "import blastfile= assembly.orfs.hmm.blast.xml meganfile=temp.rma;recompute toppercent=5;recompute minsupport=1;update;collapse rank=Species;update;select nodes=all;export what=CSV format=readname_taxonpath separator=tab file=assembly.orfs.hmm.blast.tax.txt;update;close"

Program usage:

+g    Turn on non-GUI mode (i.e. command mode)
-f < String> (default=""): MEGAN file (separate multiple files using '|')
-fs < String> (default=""): Synonyms file
-fg < String> (default=""): GI lookup file
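-p < String> (default="/Users/huson/Library/Preferences/Megan.def"): Properties file. Specifies the configuration file; by default the GUI's configuration file is used. Add this parameter if you want to use a different one.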
-m < int> (default=0): minimum score
+w < switch> (default=true): show message window
-x    Followed by the commands to execute, enclosed in double quotes
-E < switch> (default=false): Quit if exception thrown in non-gui mode
-V < switch> (default=false): show version string
-S < switch> (default=false): silent mode
-d < switch> (default=false): debug mode
+s < switch> (default=true): show startup splash screen
-h < switch> (default=false): Show usage
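
As an illustration of how these options fit together, here is a small Python sketch that assembles the argument list for a non-GUI MEGAN run. The helper name megan_argv is hypothetical; only the +g, -p and -x options from the usage text above are used. The list could be handed to subprocess.run, which is deliberately not done here because it requires a local MEGAN installation.

```python
def megan_argv(megan_path, commands, properties_file=None):
    """Build the argument list for a non-GUI MEGAN run.

    megan_path      -- path to the MEGAN launcher (an assumption about
                       your installation, e.g. /sam/megan/MEGAN)
    commands        -- semicolon-separated command string for -x
    properties_file -- optional properties file for -p
    """
    argv = [megan_path, "+g"]          # +g: command mode, as described above
    if properties_file is not None:
        argv += ["-p", properties_file]
    argv += ["-x", commands]           # passed as one argument, no shell quoting needed
    return argv


argv = megan_argv("/sam/megan/MEGAN", "open file='x.rma';update;quit")
print(argv)
```

Passing the command string as a single list element avoids the shell-quoting problems that arise when file names inside the MEGAN commands are themselves quoted.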

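Note that commands are processed one line at a time. If several commands are put on a single line, especially commands that affect the taxonomy or the data structures, update commands must be inserted between them to chain the commands.

For example: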

open file='/Users/huson/data/megan/x.rma'
exportimage file='/Users/huson/data/megan/x.pdf' format=PDF replace=true
quit

can be written as

open file='x.rma';update;exportimage file='x.pdf' format=PDF replace=true;
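
The conversion from a multi-line script to a single line can be sketched in Python as follows. The function name to_one_line is hypothetical, and inserting update between every pair of commands is a conservative assumption: it is more than strictly necessary for commands that do not change the data or the tree.

```python
def to_one_line(script):
    """Join a multi-line MEGAN script into one line, with update in between.

    Blank lines are skipped and trailing semicolons are removed before
    joining, so the result always ends in exactly one semicolon.
    """
    commands = [line.strip().rstrip(";")
                for line in script.splitlines()
                if line.strip()]
    return ";update;".join(commands) + ";"


script = """
open file='x.rma'
exportimage file='x.pdf' format=PDF replace=true
"""
print(to_one_line(script))
# prints: open file='x.rma';update;exportimage file='x.pdf' format=PDF replace=true;
```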

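For example, to save an image of the SEED analysis, we must open the SEED viewer, change the command context to the SEED viewer, set the image size, select the tree nodes, and then export the image. Here is an example: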

open file='/Users/huson/data/megan/x/x.rma'
show window=seedviewer
set context=seedviewer
set windowsize=1000 x 1000
select nodes=all
uncollapse subtrees
exportimage file='/Users/huson/data/megan/x/x.pdf' format=PDF replace=true
quit

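In the GUI, you can also open the option you are interested in and then choose Window > Command-Line Syntax... to see the command that corresponds to it.

Note that in the commands below, the parts in bold black are explanations, and the parts in red are the ones used in my command above.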

Available commands (context=mainviewer):

File menu:

new;- Open a new empty document
open file=< filename> [readonly={false|true}] [fixlinks={true|false}];

- Open a MEGAN file (ending on .rma, .meg or .megan)

import blastfile=< name>[,< name>,< name>,...] [fastafile=< name>[,< name>,< name>,...]] meganfile=< name> [maxmatches=< num>]
    [minscore=< num>] [toppercent=< num>] [winscore=< num>] [minsupport=< num>]
    [mincomplexity=< num>] [useseed={true|false}] [usekegg={true|false}] [paired={false|true}
    [suffix1=< string> suffix2=< string>]] [textstoragepolicy={0|1|2}]
    [blastformat={GUESS|BLASTX|BLASTN|BLASTP|BLASTXML|BLASTTAB|RDP-Assignment-Detail|RDP-Standalone|SILVA|SAM}];

- Import BLAST (or RDP or Silva or SAM) and reads files to create a new MEGAN file

save file=< filename> [summary={false|true}]; - Save current data set

exportimage file=< filename> [format={eps|svg|gif|png|jpg|pdf}] [replace={false|true}] [textasshapes={false|true}];

- Export content of window to an image file

show window=pagesetup; - Setup the page for printing

show window=print; - Print the main panel

extract what=document file=< megan-filename> [sparsefile={false|true}] [data={Taxonomy|SEED|KEGG}] [ids=< numbers...>]
    [names=< names...>] [allbelow={false|true}];

- Extract all reads and matches on or below selected node(s) to a new document

extract what=reads outdir=< directory> outfile=< filename-template> [data={Taxonomy|SEED|KEGG}]
    [ids=< SELECTED|numbers...>] [names=< names...>] [allbelow={false|true}]; - Extract reads for the selected nodes

import csv={reads|summary} separator={comma|tab} file=< fileName> [toppercent=< num>] [taxonomy={true|false}]
    [seed={false|true}] [kegg={false|true}] [useRefSeq={false|true}] [minscore=< num>] [minsupport=< num>];

- Load data in comma-separated-values (CSV) format: READ_NAME,CLASS-NAME,SCORE or CLASS,COUNT(,COUNT...)

import format=biome file=< fileName>; - Import data from a table in BIOME format

show window=properties; - Show document properties

close;- Close the window

Export sub-menu:

export what=CSV format={readname_taxonname|readname_taxonid|readname_taxonpath|taxonname_count|taxonpath_count|taxonid_count|taxonname_readname|
    taxonpath_readname|taxonid_readname|taxonname_length|taxonpath_length|taxonid_length|readname_refseqid|readname_seedname|
    readname_seedpath|seedname_count|seedpath_count|seedname_length|seedpath_length|seedname_readname|seedpath_readname|
    readname_keggname|readname_keggpath|keggname_count|keggpath_count|keggname_length|keggpath_length|keggname_readname|keggpath_readname}
    separator={comma|tab} file=< filename>;

- Export assignments of reads to nodes to a CSV (comma-separated values) file

export what=reads [data={Taxonomy|SEED|KEGG}] file=< filename>;

- Export all reads to a text file (or only those for selected nodes, if any selected)

export what=matches [data={Taxonomy|SEED|KEGG}] file=< filename>;

- Export all matches to a text file (or only those for selected nodes, if any selected)
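
A file written by export what=CSV format=readname_taxonpath separator=tab can be summarized with a few lines of Python. This is a sketch under assumptions: it assumes one read per line with the read name in the first column and the taxon path in the second, and it treats the path as an opaque string because its internal layout is not described here. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_reads_per_path(handle):
    """Count how many reads were assigned to each taxon path.

    handle is any open text file (or file-like object) containing
    tab-separated rows: read name, taxon path.
    """
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        taxon_path = row[1]
        counts[taxon_path] += 1
    return counts


# Made-up sample standing in for assembly.orfs.hmm.blast.tax.txt
sample = io.StringIO(
    "read1\tBacteria;Firmicutes\n"
    "read2\tBacteria;Firmicutes\n"
    "read3\tArchaea\n"
)
counts = count_reads_per_path(sample)
for path, n in counts.most_common():
    print(n, path)
```

For a real export, replace the StringIO object with open("assembly.orfs.hmm.blast.tax.txt", newline="").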

Edit menu:

show window=formatter; - Format nodes and edges
show findtoolbar={true|false}; - Open the Find toolbar

Preferences sub-menu:

set db=< string> user=< string> password=< string>;
- Set postgres database name and user authorization
set showlegend={true|false}; - Show legend identifying different datasets

Select menu:

select nodes=all; - Select all nodes
select nodes=none; - Deselect all nodes
select nodes=previous; - Select from previous window
select nodes=leaves; - Select all leaves
select nodes=internal; - Select all internal nodes
select nodes=intermediate; - Select all intermediate nodes
select nodes=subtree; - Select subtree
select nodes=subleaves; - Select all leaves below
select nodes=invert; - Invert selection

Level sub-menu:

select rank=Kingdom; - Select Kingdom
select rank=Phylum; - Select Phylum
select rank=Class; - Select Class
select rank=Order; - Select Order
select rank=Family; - Select Family
select rank=Varietas; - Select Varietas
select rank=Genus; - Select Genus
select rank=Species_group; - Select Species_group
select rank=Subspecies; - Select Subspecies
select rank=Species; - Select Species

Options menu:

recompute [minsupport=< number>] [minscore=< number>] [toppercent=< number>] [winscore=< number>] [mincomplexity=< number>]
    [pairedreads={false|true}] [useseed={false|true}] [usekegg={false|true}]; - Rerun the LCA analysis with different parameters

set totalreads=< num>; - Set the total number of reads in the analysis (will initiate recalculation of all classifications)

list summary={all|selected}; - List summary of hits for selected nodes of tree

compare mode={absolute|relative|merge} [ignore_unassigned={false|true}] [pid=< number>,...] [meganfile=< filename>,...];

- Open compare dialog to produce a comparison of multiple datasets

set order=< number> < number>...; - Change the order of datasets in a comparison view

show window=colorpalette; - Edit the color palette used in comparison views
show webpage taxon=< name|id>; - Open NCBI Taxonomy web site in browser
inspector taxa=selected; - Inspect the read-to-taxon assignments
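
To give a feeling for what the minscore and toppercent parameters of recompute control, here is a simplified Python sketch of a lowest-common-ancestor (LCA) assignment for a single read. It is an illustration of the general idea only, not MEGAN's implementation: minsupport, winscore and mincomplexity are not modelled, the function name lca_assign is hypothetical, and the lineages and scores are made-up toy data with far fewer ranks than the real NCBI taxonomy.

```python
def lca_assign(hits, lineages, minscore=0.0, toppercent=5.0):
    """Assign one read to the LCA of its best-scoring hits.

    hits     -- list of (taxon, bit score) pairs for the read
    lineages -- dict mapping taxon to its path of names from the root
    Returns the name of the LCA, or None if no hit passes minscore.
    """
    kept = [(taxon, score) for taxon, score in hits if score >= minscore]
    if not kept:
        return None
    best = max(score for _, score in kept)
    cutoff = best * (1.0 - toppercent / 100.0)   # within toppercent of the best
    paths = [lineages[taxon] for taxon, score in kept if score >= cutoff]

    lca = None
    for names in zip(*paths):                    # walk down from the root
        if all(name == names[0] for name in names):
            lca = names[0]
        else:
            break
    return lca


# Toy data, invented for this example
lineages = {
    "Escherichia coli": ["root", "Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"],
    "Salmonella enterica": ["root", "Bacteria", "Proteobacteria", "Salmonella", "Salmonella enterica"],
    "Bacillus subtilis": ["root", "Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"],
}
hits = [("Escherichia coli", 200.0), ("Salmonella enterica", 195.0), ("Bacillus subtilis", 120.0)]

print(lca_assign(hits, lineages, toppercent=5.0))    # prints: Proteobacteria
print(lca_assign(hits, lineages, toppercent=50.0))   # prints: Bacteria
```

A larger toppercent lets weaker hits take part in the LCA computation, which tends to push the assignment towards the root; a smaller value gives more specific but less conservative assignments.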

Taxon Disabling sub-menu:

enable taxa=all; - Enable all taxa
disable taxa={selected|< name,..>}; - disable all selected taxa or the named ones
enable taxa={selected|< name,...>}; - enable all selected taxa or the named ones
list taxa=disabled; - List all disabled taxa

Layout menu:

set autolayoutlabels={true|false}; - Layout labels
set scaleby=assigned; - Scale nodes by number of reads assigned to taxon
set scaleby=summarized; - Scale nodes by number of reads assigned to and below a taxon
set maxnoderadius=< num>; - Set the maximum node radius in pixels
set zoom=selected; - Zoom to the selection
set zoom=fit; - Contract tree vertically
set zoom=full; - Expand tree vertically
set nodedrawer=circle; - Draw data as circles
set nodedrawer=piechart; - Draw data as pie charts
set nodedrawer=heatmap; - Draw data as heat maps
set nodedrawer=barchart; - Draw nodes as bars
set drawer={Cladogram,Phylogram}; - Draw tree as cladogram with all leaves aligned right
set drawleavesonly={true|false}; - Only draw leaves

Expand/Contract sub-menu:

expand direction=horizontal; - Expand view horizontally

contract direction=horizontal; - Contract view horizontally

expand direction=vertical; - Expand view vertically

contract direction=vertical; - Contract view vertically

Highlight Differences sub-menu:

set highlightdifferences={true|false} [correction={none|bonferroni|holm_bonferroni}];

- In a comparison of exactly two datasets, highlight statistically significant differences, using no correction

set comparison_highlight_color=< number>;

- Set the pairwise comparison highlight color

Tree menu:

collapse nodes=selected; - Collapse selected nodes

collapse level=< num>; - Collapse all nodes at given depth in tree

uncollapse nodes={all|selected|subtree}; - Uncollapse selected nodes

nodelabels names={true|false}; - Display the full names of taxa

nodelabels ids={true|false}; - Display the NCBI ids of taxa

nodelabels assigned={true|false}; - Display the number of reads assigned to a taxon

nodelabels summarized={true|false}; - Display the total number of hits to a taxon and its descendants

show labels=selected; - Show labels for selected nodes

hide labels=selected; - Hide labels for selected nodes

show intermediate=< bool>; - Show intermediate labels at nodes of degree 2

Collapse At Taxonomic Level sub-menu:

collapse rank=Kingdom; - Collapse Kingdom

collapse rank=Phylum; - Collapse Phylum

collapse rank=Class; - Collapse Class

collapse rank=Order; - Collapse Order

collapse rank=Family; - Collapse Family

collapse rank=Varietas; - Collapse Varietas

collapse rank=Genus; - Collapse Genus

collapse rank=Species_group; - Collapse Species_group

collapse rank=Subspecies; - Collapse Subspecies

collapse rank=Species; - Collapse Species

Window menu:

show window=howtocite; - Show how to cite the program

show window=website; - Go to the program website

show window=register; - Show registration window

show window=message; - Open the message window

set windowsize=< width> x < height>; - Set the window size

show window=inspector; - Open inspector window

show window=mainviewer; - Brings the main viewer to the front

show window=seedviewer; - Opens the SEED Analyzer

show window=keggviewer; - Opens the KEGG Analyzer

show chart data={taxonomy|SEED|KEGG|attributes}; - Chart assigned reads

show wordCloud data={taxonomy|SEED|KEGG|attributes}; - WordCloud based on assigned reads

show window=network; - Open a network comparison window

show rarefaction data={taxonomy|seed|kegg}; - Compute a rarefaction curve

help [keyword]; - Shows syntax help for commands

Additional commands:

exportimage-old file=< filename> [format={eps|svg|gif|png|jpg|pdf}] [replace={false|true}] [textasshapes={false|true}];

- Export content of window to an image file


list assignments; - List the number of reads assigned to each level of the taxonomy

load colorfile=< filename>;- Load dataset colors from a file (format: one RGB color per line)

load gi2taxfile=< filename>; - Load the GI mapping file gi_taxid_nucl.bin, downloaded from the MEGAN website

load synonymsfile=< filename>; - Load a file of taxon-name synonyms

load treefile=< filename> [mapfile=< filename>]; - Load the taxonomy .tre and .map files (e.g. ncbi.tre and ncbi.map)

mp-analyzer what={lca-ranks|compare} infile=< filename> outfile=< filename>; - Compute the rank at which the LCA is found for each mate-pair, or preprocess comparison

quit; - Quit the program

replacelinks [old=< filename> new=< filename>] [...]; - Replace links to source files

select ids=< ids...>; - Select the nodes for the given ids

select name=< names...>; - Select the named nodes

set context=< window-name>; - Choose command context, i.e. the window that should parse the subsequent commands

set dir=< directory>; - Set the current directory

set margin [left=< number>] [right=< number>] [bottom=< number>] [top=< number>]; - Set margins used in tree visualization

set proxy=< string> port=< number> user=< string> password=< string>; - Set proxy credentials

set scaleby=none; - Do not scale nodes

set usekegg={true|false}; - Turn KEGG analysis on or off

set usepercentidentity={false|true}; - Adjust assignment based on best percent identity of matches, using the following minimum requirements:
    Species 97%, Genus 95%, Family 90%, Order 85%, Class 80%, Phylum 75%

set useseed={true|false}; - Turn SEED analysis on or off

setprop < name>=< value>; - Set a property

show chart=taxavsseed; - Chart taxa vs SEED

show histogram taxonid=< num>; - Shows the distribution of matches for a given taxon

show window=about; - About MEGAN and the authors

show window=checkforupdate; - Check for an update of the program

show window=cogs; - Open COG window

show window=comparisonstats; - Open dialog to produce a statistical comparison of two datasets

show window=fixlinks; - Fix missing links to source BLAST and reads files

show window=webservice; - Open metagenomic files from the MEGAN-DB website

tofront; - Bring window to front

update [reprocess={false|true}] [reset={false|true}] [reinduce={false|true}];

- Update data. If nothing specified, assumes reinduce=true

version; - Show version info
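
The minimum percent-identity requirements listed for set usepercentidentity above can be written down as a small lookup. The Python sketch below returns the most specific rank that a given best percent identity would allow. The thresholds are the ones from the command description; the function name most_specific_rank is hypothetical, and returning None below 75% is an assumption, since the behaviour in that case is not described here.

```python
# Thresholds from the description of "set usepercentidentity",
# ordered from the most specific rank to the least specific one.
MIN_IDENTITY = [
    ("Species", 97.0),
    ("Genus", 95.0),
    ("Family", 90.0),
    ("Order", 85.0),
    ("Class", 80.0),
    ("Phylum", 75.0),
]


def most_specific_rank(percent_identity):
    """Return the most specific rank allowed for this percent identity."""
    for rank, minimum in MIN_IDENTITY:
        if percent_identity >= minimum:
            return rank
    return None  # assumption: below 75% no listed rank is allowed


for pid in (98.5, 96.0, 82.0, 60.0):
    print(pid, most_specific_rank(pid))
```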

References

  • The manual bundled with the software