[3.3.2] Sequence clustering (MMseqs2)
1. Introduction
2. Download and installation
wget https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz;
tar xvfz mmseqs-linux-avx2.tar.gz;
export PATH=$(pwd)/mmseqs/bin/:$PATH
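The tarball above is the AVX2 build. MMseqs2 also publishes SSE4.1 and SSE2 builds for older CPUs, so it can be worth checking the CPU flags before downloading. A minimal sketch, assuming an x86-64 Linux machine with /proc/cpuinfo; the helper name pick_build is my own and not part of MMseqs2:

```shell
# pick_build FLAGS: print the tarball name that matches a CPU flag string.
# (hypothetical helper, not shipped with MMseqs2)
pick_build() {
    case " $1 " in
        *" avx2 "*)   echo "mmseqs-linux-avx2.tar.gz" ;;
        *" sse4_1 "*) echo "mmseqs-linux-sse41.tar.gz" ;;
        *)            echo "mmseqs-linux-sse2.tar.gz" ;;
    esac
}

# Read the flags of the first CPU; the string stays empty if /proc/cpuinfo is missing.
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
echo "Suggested download: https://mmseqs.com/latest/$(pick_build "$flags")"
```

If the suggestion is not the AVX2 build, swap the file name in the wget and tar commands accordingly.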
3. Usage
mmseqs easy-cluster examples/DB.fasta result tmp
# Cluster output
# - result_rep_seq.fasta: Representatives
# - result_all_seq.fasta: FASTA-like per cluster
# - result_cluster.tsv: Adjacency list
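The adjacency list in result_cluster.tsv has two tab-separated columns: the cluster representative and one member per line. A quick way to see how large each cluster is can be sketched as below, assuming sequence identifiers contain no whitespace; the function name cluster_sizes is my own:

```shell
# cluster_sizes TSV: print "representative<TAB>member count", largest cluster first.
# Assumes column 1 is the representative and IDs contain no whitespace.
cluster_sizes() {
    cut -f1 "$1" | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2 "\t" $1 }'
}
```

For example, `cluster_sizes result_cluster.tsv | head` lists the ten largest clusters.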
Important parameters: --min-seq-id, --cov-mode and -c
--min-seq-id FLOAT List matches above this sequence identity (for clustering) (range 0.0-1.0) [0.000]
-c FLOAT List matches above this fraction of aligned (covered) residues (see --cov-mode) [0.800]
examples:
# Cascaded clustering of FASTA file
mmseqs cluster sequenceDB clusterDB tmp
#                    --cov-mode
# Sequence           0     1     2
# Q: MAVGTACRPA    60%   IGN   60%
# T: -AVGTAC---    60%  100%   IGN
# Cutoff -c 0.7      -     +     -
#        -c 0.6      +     +     +
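The percentages in the table can be reproduced by hand. As I read it, mode 0 divides the aligned residues by the length of the longer sequence, mode 1 by the target length, and mode 2 by the query length. A minimal sketch of that arithmetic (the function name coverage is hypothetical, and this only mirrors the table, not the MMseqs2 source code):

```shell
# coverage MODE ALIGNED QLEN TLEN: print the covered fraction that -c is compared against.
coverage() {
    awk -v m="$1" -v a="$2" -v q="$3" -v t="$4" 'BEGIN {
        if (m == 0)      d = (q + 0 > t + 0 ? q : t)   # mode 0: longer of the two sequences
        else if (m == 1) d = t                         # mode 1: target length
        else             d = q                         # mode 2: query length
        printf "%.2f\n", a / d
    }'
}

# Q has 10 residues, T has 6, and 6 residues are aligned.
for mode in 0 1 2; do
    echo "--cov-mode $mode: $(coverage "$mode" 6 10 6)"
done
```

Only mode 1 gives a value above 0.7 here, which matches the single "+" in the -c 0.7 row.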
--threads: number of parallel threads
--rescore-mode INT Rescore diagonals with:
0: Hamming distance
1: local alignment (score only)
2: local alignment
3: global alignment
4: longest alignment fulfilling window quality criterion [0]
--cluster-mode INT 0: Set-Cover (greedy)
1: Connected component (BLASTclust)
2,3: Greedy clustering by sequence length (CDHIT) [0]
4. My example
/data/software/mmseqs/mmseqs/bin/mmseqs cluster examples/DB.fasta clusterRes tmp --min-seq-id 0.93 -c 0.8 --cov-mode 0 --threads 30 --rescore-mode 3
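The command above passes a FASTA path to cluster, while the usage line in the previous section names the first argument sequenceDB. A route that builds the database explicitly and then exports a TSV could look like the sketch below. It assumes the createdb and createtsv modules of a standard MMseqs2 install; the function name cluster_workflow, the MMSEQS override variable, and the seqDB/cluDB/clusters.tsv file names are my own choices:

```shell
# cluster_workflow FASTA OUTDIR TMPDIR
# Sketch: FASTA -> sequence DB -> cluster DB -> TSV, with the same flags as my example.
# Set MMSEQS to the binary path if mmseqs is not on PATH.
cluster_workflow() {
    local fasta="$1" outdir="$2" tmpdir="$3"
    local mmseqs="${MMSEQS:-mmseqs}"

    mkdir -p "$outdir" || return 1

    # 1. Convert the FASTA file into an MMseqs2 sequence database.
    "$mmseqs" createdb "$fasta" "$outdir/seqDB" || return 1

    # 2. Cluster the database.
    "$mmseqs" cluster "$outdir/seqDB" "$outdir/cluDB" "$tmpdir" \
        --min-seq-id 0.93 -c 0.8 --cov-mode 0 --threads 30 --rescore-mode 3 || return 1

    # 3. Export the clusters as representative/member pairs.
    "$mmseqs" createtsv "$outdir/seqDB" "$outdir/seqDB" "$outdir/cluDB" "$outdir/clusters.tsv"
}
```

A call would then be `MMSEQS=/data/software/mmseqs/mmseqs/bin/mmseqs cluster_workflow examples/DB.fasta clusterOut tmp`.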
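I have a personal WeChat official account, but I am lazy and rarely update it. You can ask questions there; if I am slow to reply, email me at tiehan@sina.cn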