Predicting adapters with minion

What should we do if we don't know which adapter was used for our data?

minion, developed by The European Bioinformatics Institute (EBI), can be used to predict the adapter.

1. Installation

To install minion, see the EBI Kraken website.

Linux binaries can be downloaded from:

http://wwwdev.ebi.ac.uk/enright-dev/kraken/reaper/binaries/reaper-13-100/linux/

2. Usage

Usage is very simple:

minion search-adapter -i SRR.fastq

Example of the adapter prediction output:

criterion=sequence-density
sequence-density=52.19
sequence-density-rank=1
fanout-score=31.57
fanout-score-rank=1
prefix-density=54.75
prefix-fanout=30.1
sequence=TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACACACACATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAA


criterion=fanout-score
sequence-density=52.19
sequence-density-rank=1
fanout-score=31.57
fanout-score-rank=1
prefix-density=54.75
prefix-fanout=30.1
sequence=TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACACACACATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAA
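
The output above is a series of key=value lines grouped into one block per criterion, so it is easy to load into a script. Below is a minimal sketch of a parser, assuming the blocks always look like the example shown here; the function name parse_minion_output is hypothetical and not part of minion, and the sample text is a shortened version of the output above.

```python
from typing import Dict, List


def parse_minion_output(text: str) -> List[Dict[str, str]]:
    """Split key=value output into one dict per candidate block."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Blank lines separate candidate blocks.
            if current:
                records.append(current)
                current = {}
            continue
        if "=" not in line:
            continue  # skip anything that is not key=value
        key, value = line.split("=", 1)
        if key == "criterion" and current:
            # A new block started without a separating blank line.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


# Shortened sample; the sequences are truncated for illustration only.
sample = """\
criterion=sequence-density
sequence-density=52.19
sequence=TGGAATTCTCGGGTGCC

criterion=fanout-score
fanout-score=31.57
sequence=TGGAATTCTCGGGTGCC
"""

candidates = parse_minion_output(sample)
for candidate in candidates:
    print(candidate["criterion"], candidate["sequence"])
```

All values are kept as strings; convert the scores with float() if you need to compare them.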

In most cases, take the first prediction (criterion=sequence-density) as the adapter. Why?

"The two criteria are unfortunately necessary due to the varying characteristics of 3' adapter sequence in different experimental protocols. The first criterion is frequency of occurrence; the second criterion incorporates a fan-out measure that captures the typical characteristic of 3' adapter sequence of being attached to a multitude of different prefixes. When inferring adapters (i.e. without using -adapter) two candidate sequences will be shown. The second will start with the line criterion=fanout-score. The second should only be considered if the first candidate is clearly a biological sequence. This can be established by using one of the BLAST interfaces provided for example by NCBI and ENSEMBL."

In other words, consider the second sequence (criterion=fanout-score) only when the first candidate turns out to be a biological sequence rather than an adapter. You can check this with the BLAST services at NCBI or ENSEMBL.
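
The selection rule can be written down as a small helper. This is only a sketch of the rule as described here: the function name choose_adapter and the first_is_biological flag are hypothetical, the flag has to be set by hand after a BLAST check, and the two example sequences are made up for illustration.

```python
from typing import Dict, List


def choose_adapter(candidates: List[Dict[str, str]],
                   first_is_biological: bool = False) -> str:
    """Return the sequence-density candidate unless it was judged biological."""
    by_criterion = {c["criterion"]: c for c in candidates}
    # Fall back to the fanout-score candidate only when a manual BLAST
    # check showed that the first candidate is a biological sequence.
    key = "fanout-score" if first_is_biological else "sequence-density"
    return by_criterion[key]["sequence"]


# Made-up candidates, for illustration only.
example = [
    {"criterion": "sequence-density", "sequence": "TGGAATTCTCGGGTGCCAAGG"},
    {"criterion": "fanout-score", "sequence": "ACGTACGTACGT"},
]

print(choose_adapter(example))
print(choose_adapter(example, first_is_biological=True))
```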

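3. Discussion

Always search for the adapter reported by minion on Google or Baidu to confirm that it is a known adapter.

minion's output is only a prediction and is not guaranteed to be the real adapter.

Once the adapter is confirmed, remove it with cutadapt or a similar tool, as described in the earlier post.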
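
As a reminder of what the trimming step looks like, here is a minimal sketch that builds a cutadapt command line. It relies only on cutadapt's -a (3' adapter) and -o (output file) options; the helper name cutadapt_command and the output file name are hypothetical, and the adapter string is simply the first 21 bases of the predicted sequence above, used as an assumption for illustration.

```python
import shlex
from typing import List


def cutadapt_command(adapter: str, fastq_in: str, fastq_out: str) -> List[str]:
    """Build the argument list for trimming a 3' adapter with cutadapt."""
    return ["cutadapt", "-a", adapter, "-o", fastq_out, fastq_in]


# Assumed adapter: the first 21 bases of the minion prediction.
cmd = cutadapt_command("TGGAATTCTCGGGTGCCAAGG", "SRR.fastq", "SRR.trimmed.fastq")
print(shlex.join(cmd))
```

To actually run it, pass cmd to subprocess.run(cmd, check=True) on a machine where cutadapt is installed.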
