【3.5.3】MEGAN
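MEGAN is a very powerful piece of software. When working with metagenomic data, we routinely face three basic computational tasks: taxonomic analysis, functional analysis, and comparative analysis. These are also known as the "who is out there?", "what are they doing?", and "how do they compare?" questions. They pose major conceptual and computational challenges and call for new bioinformatics tools and methods, which is why this software was developed. Does that sound a bit like an advertisement?
1. Installing MEGAN
The data first have to be aligned against a protein reference database (for example NCBI-NR with blastx); the alignment result is then imported into MEGAN for taxonomic analysis. The example below runs blastp on predicted protein sequences against RefSeq protein instead: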
/sam/blast/bin/blastp -query assembly.orfs.hmm.faa -db /sam/blast/db/refseq_protein/refseq_protein -evalue 1e-5 -num_threads 60 -max_target_seqs 5 -outfmt 5 -out assembly.orfs.hmm.blast.xml
/sam/megan/MEGAN +g -x "import blastfile= assembly.orfs.hmm.blast.xml meganfile=temp.rma;recompute toppercent=5;recompute minsupport=1;update;collapse rank=Species;update;select nodes=all;export what=CSV format=readname_taxonpath separator=tab file=assembly.orfs.hmm.blast.tax.txt;update;close"
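The last step of the MEGAN command above writes a tab-separated file via export what=CSV format=readname_taxonpath. As a rough sketch of how such a file might be summarized, the snippet below counts reads per taxon path. It assumes that each line holds a read name, a tab, and then the taxon path; the sample rows and the file name sample.tax.txt are made up for illustration, and the exact path syntax that MEGAN writes may differ.

```shell
#!/bin/sh
# Hypothetical sample of the exported file: read name, a tab, then the taxon path.
printf 'read1\troot;Bacteria;Proteobacteria;\n'  > sample.tax.txt
printf 'read2\troot;Bacteria;Firmicutes;\n'     >> sample.tax.txt
printf 'read3\troot;Bacteria;Proteobacteria;\n' >> sample.tax.txt

tab=$(printf '\t')

# Print "count<TAB>taxon path", most frequent path first.
count_by_path() {
    awk -F '\t' '
        NF >= 2 { count[$2]++ }                      # tally reads per taxon path
        END { for (p in count) printf "%d\t%s\n", count[p], p }
    ' "$1" | sort -t "$tab" -k1,1nr -k2,2
}

counts=$(count_by_path sample.tax.txt)
echo "$counts"
```

For the real run, replace sample.tax.txt with assembly.orfs.hmm.blast.tax.txt.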
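Download: http://ab.inf.uni-tuebingen.de/data/software/megan4/download/welcome.html
Then type in a terminal:
sh MEGAN_unix_<version>.sh (the installer file name depends on the version you downloaded)
MEGAN requires an academic (educational institution) email address for registration; after registering you are sent a license key. For some reason the latest version I downloaded, MEGAN5, would not accept the key no matter how I entered it, so in the end I had to fall back to MEGAN4. Later I emailed the person in charge (huson@informatik.uni-tuebingen.de), and this was the reply:
Best way to do this under linux or mac os is to open a terminal,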
type
cat >megan5-license.txt
then to paste the text into the terminal
and then to press control-d to finish
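If you would rather not paste interactively, a here-document does the same thing in one step. This is only a sketch of the shell mechanics: LICENSE-TEXT-FROM-EMAIL is a placeholder for the text you received, not a real license.

```shell
#!/bin/sh
# Write the license text to megan5-license.txt without pressing control-d.
# LICENSE-TEXT-FROM-EMAIL is a placeholder; paste your own license text in its place.
cat > megan5-license.txt <<'EOF'
LICENSE-TEXT-FROM-EMAIL
EOF

# Show what was written, as a quick sanity check.
cat megan5-license.txt
```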
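P.S. Bioinformatics software comes with all sorts of odd download and installation procedures. What troubled me most with this one was that I had no academic email address and it accepts nothing else; together with the MEGAN5 problem, that cost me quite some time. Nothing to be surprised about, really.
2. MEGAN4 command-line parameters
The parameters of MEGAN4's command mode
To understand the MEGAN parameters used in the biotech literature, I summarized them as follows:
/sam/megan/MEGAN +g -x "import blastfile= assembly.orfs.hmm.blast.xml meganfile=temp.rma;recompute toppercent=5;recompute minsupport=1;update;collapse rank=Species;update;select nodes=all;export what=CSV format=readname_taxonpath separator=tab file=assembly.orfs.hmm.blast.tax.txt;update;close"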
Program usage:
+g turns on non-graphical mode (i.e. command-line mode)
-f < String> (default=""): MEGAN file (separate multiple files using '|')
-fs < String> (default=""): Synonyms file
-fg < String> (default=""): GI lookup file
-p < String> (default="/Users/huson/Library/Preferences/Megan.def"): Properties file. By default MEGAN keeps using the properties file of the graphical interface; add this option if you want to use a different one.
-m < int> (default=0): minimum score
+w < switch> (default=true): show message window
-x is followed by the commands to execute, enclosed in double quotes; put whatever you want MEGAN to run inside the ""
-E < switch> (default=false): Quit if exception thrown in non-gui mode
-V < switch> (default=false): show version string
-S < switch> (default=false): silent mode
-d < switch> (default=false): debug mode
+s < switch> (default=true): show startup splash screen
-h < switch> (default=false): Show usage
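Note that commands are processed line by line. If you put several commands on a single line, especially commands that affect the taxonomy or the data structures, you need to insert update commands between them.
For example: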
open file='/Users/huson/data/megan/x.rma'
exportimage file='/Users/huson/data/megan/x.pdf' format=PDF replace=true
quit
can be written as
open file='x.rma';update;exportimage file='x.pdf' format=PDF replace=true;
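The conversion from a multi-line command list to a single -x string can be sketched with a small shell helper. The function name join_megan_commands is my own, and the helper takes the conservative route of putting update between every pair of commands, which is more than the example above strictly needs.

```shell
#!/bin/sh
# Join MEGAN commands (one per line on stdin) into a single string for -x,
# inserting "update" between consecutive commands.
# join_megan_commands is a hypothetical helper name, not part of MEGAN.
join_megan_commands() {
    awk '
        NF == 0 { next }                    # skip blank lines
        { sub(/;[ \t]*$/, "") }             # drop a trailing semicolon if present
        n++ > 0 { printf ";update;" }       # separator before every command but the first
        { printf "%s", $0 }
        END { if (n > 0) print ";" }        # terminate the last command
    '
}

one_liner=$(printf '%s\n' \
    "open file='x.rma'" \
    "exportimage file='x.pdf' format=PDF replace=true" \
    "quit" | join_megan_commands)

echo "$one_liner"
```

The resulting string can then be passed as the quoted argument of -x, in the same way as the import example earlier in this post.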
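For example, to save the image of a SEED analysis, we have to open the SEED viewer, switch the command context to the SEED viewer, set the window size, select the nodes of the tree, and then export the image. Here is an example: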
open file='/Users/huson/data/megan/x/x.rma'
show window=seedviewer
set context=seedviewer
set windowsize=1000 x 1000
select nodes=all
uncollapse subtrees
exportimage file='/Users/huson/data/megan/x/x.pdf' format=PDF replace=true
quit
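In the graphical interface you can also open the window you are interested in and then choose Window → Command-Line Syntax… to see the commands available in that context.
In the command list below, the text after each "-" is the explanation. The commands used in my earlier example are import, recompute, update, collapse rank, select nodes, export, and close.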
Available commands (context=mainviewer):
File menu:
new;- Open a new empty document
open file=< filename> [readonly={false|true}] [fixlinks={true|false}];
- Open a MEGAN file (ending on .rma, .meg or .megan)
import blastfile=< name>[,< name>,< name>,...] [fastafile=< name>[,< name>,< name>,...]] meganfile=< name> [maxmatches=< num>]
[minscore=< num>] [toppercent=< num>] [winscore=< num>] [minsupport=< num>]
[mincomplexity=< num>] [useseed={true|false}] [usekegg={true|false}] [paired={false|true}
[suffix1=< string> suffix2=< string>]] [textstoragepolicy={0|1|2}]
[blastformat={GUESS|BLASTX|BLASTN|BLASTP|BLASTXML|BLASTTAB|RDP-Assignment-Detail|RDP-Standalone|SILVA|SAM}];
- Import BLAST (or RDP or Silva or SAM) and reads files to create a new MEGAN file
save file=< filename> [summary={false|true}]; - Save current data set
exportimage file=< filename> [format={eps|svg|gif|png|jpg|pdf}] [replace={false|true}] [textasshapes={false|true}];
- Export content of window to an image file
show window=pagesetup; - Setup the page for printing
show window=print; - Print the main panel
extract what=document file=< megan-filename> [sparsefile={false|true}] [data={Taxonomy|SEED|KEGG}] [ids=< numbers...>]
[names=< names...>] [allbelow={false|true}];
- Extract all reads and matches on or below selected node(s) to a new document
extract what=reads outdir=< directory> outfile=< filename-template> [data={Taxonomy|SEED|KEGG}]
[ids=< SELECTED|numbers...>] [names=< names...>] [allbelow={false|true}]; - Extract reads for the selected nodes
import csv={reads|summary} separator={comma|tab} file=< fileName> [toppercent=< num>] [taxonomy={true|false}]
[seed={false|true}] [kegg={false|true}] [useRefSeq={false|true}] [minscore=< num>] [minsupport=< num>];
- Load data in comma-separated-values (CSV) format: READ_NAME,CLASS-NAME,SCORE or CLASS,COUNT(,COUNT...)
import format=biome file=< fileName>; - Import data from a table in BIOME format
show window=properties; - Show document properties
close;- Close the window
Export sub-menu:
export what=CSV format={readname_taxonname|readname_taxonid|readname_taxonpath|taxonname_count|taxonpath_count|taxonid_count|taxonname_readname|
taxonpath_readname|taxonid_readname|taxonname_length|taxonpath_length|taxonid_length|readname_refseqid|readname_seedname|
readname_seedpath|seedname_count|seedpath_count|seedname_length|seedpath_length|seedname_readname|seedpath_readname|
readname_keggname|readname_keggpath|keggname_count|keggpath_count|keggname_length|keggpath_length|keggname_readname|keggpath_readname}
separator={comma|tab} file=< filename>;
- Export assignments of reads to nodes to a CSV (comma-separated values) file
export what=reads [data={Taxonomy|SEED|KEGG}] file=< filename>;
- Export all reads to a text file (or only those for selected nodes, if any selected)
export what=matches [data={Taxonomy|SEED|KEGG}] file=< filename>;
- Export all matches to a text file (or only those for selected nodes, if any selected)
Edit menu:
show window=formatter; - Format nodes and edges
show findtoolbar={true|false}; - Open the Find toolbar
Preferences sub-menu:
set db=< string> user=< string> password=< string>;
- Set postgres database name and user authorization
set showlegend={true|false}; - Show legend identifying different datasets
Select menu:
select nodes=all; - Select all nodes
select nodes=none; - Deselect all nodes
select nodes=previous; - Select from previous window
select nodes=leaves; - Select all leaves
select nodes=internal; - Select all internal nodes
select nodes=intermediate; - Select all intermediate nodes
select nodes=subtree; - Select subtree
select nodes=subleaves; - Select all leaves below
select nodes=invert; - Invert selection
Level sub-menu:
select rank=Kingdom; - Select Kingdom
select rank=Phylum; - Select Phylum
select rank=Class; - Select Class
select rank=Order; - Select Order
select rank=Family; - Select Family
select rank=Varietas; - Select Varietas
select rank=Genus; - Select Genus
select rank=Species_group; - Select Species_group
select rank=Subspecies; - Select Subspecies
select rank=Species; - Select Species
Options menu:
recompute [minsupport=< number>] [minscore=< number>] [toppercent=< number>] [winscore=< number>] [mincomplexity=< number>]
[pairedreads={false|true}] [useseed={false|true}] [usekegg={false|true}]; - Rerun the LCA analysis with different parameters
set totalreads=< num>; - Set the total number of reads in the analysis (will initiate recalculation of all classifications)
list summary={all|selected}; - List summary of hits for selected nodes of tree
compare mode={absolute|relative|merge} [ignore_unassigned={false|true}] [pid=< number>,...] [meganfile=< filename>,...];
- Open compare dialog to produce a comparison of multiple datasets
set order=< number> < number>...; - Change the order of datasets in a comparison view
show window=colorpalette; - Edit the color palette used in comparison views
show webpage taxon=< name|id>; - Open NCBI Taxonomy web site in browser
inspector taxa=selected; - Inspect the read-to-taxon assignments
Taxon Disabling sub-menu:
enable taxa=all; - Enable all taxa
disable taxa={selected|< name,..>}; - disable all selected taxa or the named ones
enable taxa={selected|< name,...>}; - enable all selected taxa or the named ones
list taxa=disabled; - List all disabled taxa
Layout menu:
set autolayoutlabels={true|false}; - Layout labels
set scaleby=assigned; - Scale nodes by number of reads assigned to taxon
set scaleby=summarized; - Scale nodes by number of reads assigned to and below a taxon
set maxnoderadius=< num>; - Set the maximum node radius in pixels
set zoom=selected; - Zoom to the selection
set zoom=fit; - Contract tree vertically
set zoom=full; - Expand tree vertically
set nodedrawer=circle; - Draw data as circles
set nodedrawer=piechart; - Draw data as pie charts
set nodedrawer=heatmap; - Draw data as heat maps
set nodedrawer=barchart; - Draw nodes as bars
set drawer={Cladogram,Phylogram}; - Draw tree as cladogram with all leaves aligned right
set drawleavesonly={true|false}; - Only draw leaves
Expand/Contract sub-menu:
expand direction=horizontal; - Expand view horizontally
contract direction=horizontal; - Contract view horizontally
expand direction=vertical; - Expand view vertically
contract direction=vertical; - Contract view vertically
Highlight Differences sub-menu:
set highlightdifferences={true|false} [correction={none|bonferroni|holm_bonferroni}];
- In a comparison of exactly two datasets, highlight statistically significant differences, using no correction
set comparison_highlight_color=< number>;
- Set the pairwise comparison highlight color
Tree menu:
collapse nodes=selected; - Collapse selected nodes
collapse level=< num>; - Collapse all nodes at given depth in tree
uncollapse nodes={all|selected|subtree}; - Uncollapse selected nodes
nodelabels names={true|false}; - Display the full names of taxa
nodelabels ids={true|false}; - Display the NCBI ids of taxa
nodelabels assigned={true|false}; - Display the number of reads assigned to a taxon
nodelabels summarized={true|false}; - Display the total number of hits to a taxon and its descendants
show labels=selected; - Show labels for selected nodes
hide labels=selected; - Hide labels for selected nodes
show intermediate=< bool>; - Show intermediate labels at nodes of degree 2
Collapse At Taxonomic Level sub-menu:
collapse rank=Kingdom; - Collapse Kingdom
collapse rank=Phylum; - Collapse Phylum
collapse rank=Class; - Collapse Class
collapse rank=Order; - Collapse Order
collapse rank=Family; - Collapse Family
collapse rank=Varietas; - Collapse Varietas
collapse rank=Genus; - Collapse Genus
collapse rank=Species_group; - Collapse Species_group
collapse rank=Subspecies; - Collapse Subspecies
collapse rank=Species; - Collapse Species
Window menu:
show window=howtocite; - Show how to cite the program
show window=website; - Go to the program website
show window=register; - Show registration window
show window=message; - Open the message window
set windowsize=< width> x < height>; - Set the window size
show window=inspector; - Open inspector window
show window=mainviewer; - Brings the main viewer to the front
show window=seedviewer; - Opens the SEED Analyzer
show window=keggviewer; - Opens the KEGG Analyzer
show chart data={taxonomy|SEED|KEGG|attributes}; - Chart assigned reads
show wordCloud data={taxonomy|SEED|KEGG|attributes}; - WordCloud based on assigned reads
show window=network; - Open a network comparison window
show rarefaction data={taxonomy|seed|kegg}; - Compute a rarefaction curve
help [keyword]; - Shows syntax help for commands
Additional commands:
exportimage-old file=< filename> [format={eps|svg|gif|png|jpg|pdf}] [replace={false|true}] [textasshapes={false|true}];
- Export content of window to an image file
list assignments; - List the number of reads assigned to each level of the taxonomy
load colorfile=< filename>;- Load dataset colors from a file (format: one RGB color per line)
load gi2taxfile=< filename>; - Load the GI mapping file gi_taxid_nucl.bin, downloaded from the MEGAN website
load synonymsfile=< filename>; - Load a file of taxon-name synonyms
load treefile=< filename> [mapfile=< filename>]; - Load the taxonomy .tre and .map files (e.g. ncbi.tre and ncbi.map)
mp-analyzer what={lca-ranks|compare} infile=< filename> outfile=< filename>; - Compute the rank at which the LCA is found for each mate-pair, or preprocess comparison
quit; - Quit the program
replacelinks [old=< filename> new=< filename>] [...]; - Replace links to source files
select ids=< ids...>; - Select the nodes for the given ids
select name=< names...>; - Select the named nodes
set context=< window-name>; - Choose command context, i.e. the window that should parse the subsequent commands
set dir=< directory>; - Set the current directory
set margin [left=< number>] [right=< number>] [bottom=< number>] [top=< number>]; - Set margins used in tree visualization
set proxy=< string> port=< number> user=< string> password=< string>; - Set proxy credentials
set scaleby=none; - Do not scale nodes
set usekegg={true|false}; - Turn KEGG analysis on or off
set usepercentidentity={false|true}; - Adjust assignment based on best percent identity of matches, using the following minimum requirements:
Species 97%, Genus 95%, Family 90%, Order 85%, Class 80%, Phylum 75%
set useseed={true|false}; - Turn SEED analysis on or off
setprop < name>=< value>; - Set a property
show chart=taxavsseed; - Chart taxa vs SEED
show histogram taxonid=< num>; - Shows the distribution of matches for a given taxon
show window=about; - About MEGAN and the authors
show window=checkforupdate; - Check for an update of the program
show window=cogs; - Open COG window
show window=comparisonstats; - Open dialog to produce a statistical comparison of two datasets
show window=fixlinks; - Fix missing links to source BLAST and reads files
show window=webservice; - Open metagenomic files from the MEGAN-DB website
tofront; - Bring window to front
update [reprocess={false|true}] [reset={false|true}] [reinduce={false|true}];
- Update data. If nothing specified, assumes reinduce=true
version; - Show version info
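The minimum requirements listed above for set usepercentidentity can be read as a lookup table from percent identity to the most specific rank a read may be assigned to. The sketch below only restates that table in shell; the function name rank_for_identity is my own, and how MEGAN applies these thresholds internally is not covered by the command list.

```shell
#!/bin/sh
# Most specific rank permitted for a given best percent identity, using the thresholds
# quoted for "set usepercentidentity": Species 97%, Genus 95%, Family 90%,
# Order 85%, Class 80%, Phylum 75%.
# rank_for_identity is a hypothetical helper, not a MEGAN command.
rank_for_identity() {
    awk -v pid="$1" 'BEGIN {
        if      (pid >= 97) print "Species"
        else if (pid >= 95) print "Genus"
        else if (pid >= 90) print "Family"
        else if (pid >= 85) print "Order"
        else if (pid >= 80) print "Class"
        else if (pid >= 75) print "Phylum"
        else                print "none"     # below every listed threshold
    }'
}

for pid in 99.1 96.2 82 60; do
    printf '%s\t%s\n' "$pid" "$(rank_for_identity "$pid")"
done
```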
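References
- The user manual bundled with the software
I have a personal WeChat public account, but I am lazy and rarely update it. You can ask questions there; if I am slow to reply, email me at tiehan@sina.cn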