【5.2.1】nanopolish
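Third-party nanopore analysis tools tend to have names starting with "nano", and base-correction tools tend to be called "polish". True to its name, nanopolish can polish the bases of a nanopore assembly. Beyond that, it can extract basecalled reads from FAST5 files, align signal-level events to a reference, call variants, and detect methylation.
Software home page: https://github.com/jts/nanopolish
Official documentation: https://nanopolish.readthedocs.io/en/latest/index.html
I. Download and installation: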
git clone --recursive https://github.com/jts/nanopolish.git
cd nanopolish
make
htslib also needs to be installed separately: https://github.com/samtools/htslib
git clone https://github.com/samtools/htslib.git
cd htslib
autoreconf -i # Build the configure script and install files it uses
./configure # Optional but recommended, for choosing extra functionality
make
make install
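Before moving on, it can help to confirm that the tools used in the rest of this section are reachable from the shell. The following is only a minimal sketch: the function name missing_tools is hypothetical, and the tool list simply mirrors the commands used in the example further down.

```shell
# Print every command from the argument list that is NOT found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used later in this section; adjust the list to your own workflow.
missing_tools nanopolish minimap2 samtools
```

If nanopolish is listed as missing after a successful build, the binary is probably still sitting in the build directory and that directory has not been added to PATH.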
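II. Main features
nanopolish can perform many kinds of analysis. It ships with detailed documentation and example data. Once it is installed, running nanopolish with no arguments on the command line prints the available subcommands and options.
- call-methylation: detect methylation signal
- eventalign: align signal-level events to the k-mers of a reference
- extract: extract reads in FASTA/FASTQ format from FAST5 files
- index: build an index linking reads to their signal data
- phase-reads: phase reads, i.e. determine whether a read comes from the paternal or the maternal haplotype of a diploid
- polya: estimate poly(A) tail length from direct RNA sequencing
- variants: detect SNPs and other variants
- vcf2fasta: generate a consensus sequence from a VCF
Basic commands: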
nanopolish call-methylation: predict genomic bases that may be methylated
nanopolish variants: detect SNPs and indels with respect to a reference genome
nanopolish variants --consensus: calculate an improved consensus sequence for a draft genome assembly
nanopolish eventalign: align signal-level events to k-mers of a reference genome
III. Usage example
Most nanopolish functions need the raw fast5 files as input. The example below uses nanopolish to detect methylated sites.
1. First, download the test data.
wget http://s3.climb.ac.uk/nanopolish_tutorial/methylation_example.tar.gz
tar -xvf methylation_example.tar.gz
cd methylation_example
2. Build the index
nanopolish index -d fast5_files/ output.fastq
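A quick way to confirm that indexing finished is to look for the files it writes next to the reads. The sketch below assumes, as I understand the nanopolish documentation, that four files are created with the suffixes .index, .index.fai, .index.gzi and .index.readdb; the function name missing_index_files is hypothetical, and the suffix list should be adjusted if your version behaves differently.

```shell
# Print each expected index file that does not exist next to the reads file.
# The suffix list is an assumption based on the nanopolish documentation.
missing_index_files() {
    local reads="$1" suffix
    for suffix in index index.fai index.gzi index.readdb; do
        [ -e "$reads.$suffix" ] || printf '%s\n' "$reads.$suffix"
    done
}
```

Calling missing_index_files output.fastq inside methylation_example should print nothing once step 2 has completed.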
3. Align the reads to the reference genome
minimap2 -a -x map-ont reference.fasta output.fastq | samtools sort -T tmp -o output.sorted.bam
samtools index output.sorted.bam
4. Detect methylated sites with nanopolish
nanopolish call-methylation -t 8 -r output.fastq -b output.sorted.bam -g reference.fasta -w "chr20:5,000,000-10,000,000" > methylation_calls.tsv
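5. Summarize the calls into per-site methylation frequencies (the script is in the scripts/ directory of the nanopolish repository, so adjust the path if you run it from elsewhere)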
scripts/calculate_methylation_frequency.py methylation_calls.tsv > methylation_frequency.tsv
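The frequency table can then be narrowed down to well-supported, highly methylated sites. The following is a sketch rather than part of nanopolish: the function name filter_methylation is hypothetical, and it assumes the header of methylation_frequency.tsv contains columns named methylated_frequency and called_sites, which is what I believe the script writes. Columns are looked up by name, and the function stops with an error if either is absent.

```shell
# Usage: filter_methylation FILE MIN_FREQ MIN_SITES
# Keeps the header plus every row with methylated_frequency >= MIN_FREQ
# and called_sites >= MIN_SITES. Column names are assumptions; see above.
filter_methylation() {
    awk -F'\t' -v min_freq="$2" -v min_sites="$3" '
        NR == 1 {
            # Map column names to positions so column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("methylated_frequency" in col) || !("called_sites" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            f = col["methylated_frequency"]
            s = col["called_sites"]
            print
            next
        }
        ($f + 0 >= min_freq + 0) && ($s + 0 >= min_sites + 0)
    ' "$1"
}
```

For example, filter_methylation methylation_frequency.tsv 0.8 5 > high_confidence_sites.tsv keeps sites covered by at least 5 calls with a methylation frequency of at least 0.8. Both thresholds are illustrative, not recommendations.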
IV. Errors
4.1 Link error: undefined references to lzma and BZ2 symbols
g++ -o nanopolish -g -O3 -std=c++11 -fopenmp -fsigned-char -D_FILE_OFFSET_BITS=64 -I./include -I./htslib -I./minimap2 -I./src -I./src/hmm -I./src/thirdparty -I./src/thirdparty/scrappie -I./src/common -I./src/alignment -I./src/pore_model -I./src/io -I./src/basemods -I./eigen/ -I./slow5lib/include/ -fPIC src/main/nanopolish.o src/nanopolish_variant_db.o src/nanopolish_vcf2fasta.o src/nanopolish_polya_estimator.o src/nanopolish_call_methylation.o src/nanopolish_haplotype.o src/nanopolish_squiggle_read.o src/nanopolish_detect_polyi.o src/nanopolish_read_db.o src/nanopolish_methyltrain.o src/nanopolish_raw_loader.o src/nanopolish_index.o src/nanopolish_scorereads.o src/training_core.o src/nanopolish_phase_reads.o src/nanopolish_fast5_check.o src/nanopolish_call_variants.o src/nanopolish_train_poremodel_from_basecalls.o src/hmm/nanopolish_duration_model.o src/hmm/nanopolish_profile_hmm.o src/hmm/nanopolish_profile_hmm_r9.o src/hmm/nanopolish_transition_parameters.o src/hmm/nanopolish_profile_hmm_r7.o src/common/fs_support.o src/common/nanopolish_bam_processor.o src/common/nanopolish_alphabet.o src/common/logsum.o src/common/nanopolish_iupac.o src/common/nanopolish_klcs.o src/common/nanopolish_common.o src/common/nanopolish_variant.o src/common/nanopolish_bam_utils.o src/alignment/nanopolish_eventalign.o src/alignment/nanopolish_anchor.o src/alignment/nanopolish_alignment_db.o src/pore_model/nanopolish_model_names.o src/pore_model/nanopolish_pore_model_set.o src/pore_model/nanopolish_poremodel.o src/io/nanopolish_fast5_processor.o src/io/nanopolish_fast5_loader.o src/io/nanopolish_fast5_io.o src/basemods/nanopolish_basemods.o src/thirdparty/fet.o src/thirdparty/stdaln.o src/thirdparty/scrappie/event_detection.o src/thirdparty/scrappie/scrappie_common.o ./htslib/libhts.a ./minimap2/libminimap2.a ./slow5lib/lib/libslow5.a ./lib/libhdf5.a -lz -lrt -ldl
./htslib/libhts.a(hts.o): In function `decompress_peek_xz':
/data/software/htslib/hts.c:347: undefined reference to `lzma_easy_decoder_memusage'
/data/software/htslib/hts.c:347: undefined reference to `lzma_stream_decoder'
/data/software/htslib/hts.c:355: undefined reference to `lzma_code'
/data/software/htslib/hts.c:362: undefined reference to `lzma_end'
/data/software/htslib/hts.c:357: undefined reference to `lzma_end'
./htslib/libhts.a(cram_io.o): In function `cram_compress_by_method':
/data/software/nanopolish/htslib/cram/cram_io.c:1776: undefined reference to `BZ2_bzBuffToBuffCompress'
./htslib/libhts.a(cram_io.o): In function `lzma_mem_deflate':
/data/software/nanopolish/htslib/cram/cram_io.c:1293: undefined reference to `lzma_stream_buffer_bound'
/data/software/nanopolish/htslib/cram/cram_io.c:1299: undefined reference to `lzma_easy_buffer_encode'
./htslib/libhts.a(cram_io.o): In function `cram_uncompress_block':
/data/software/nanopolish/htslib/cram/cram_io.c:1612: undefined reference to `BZ2_bzBuffToBuffDecompress'
./htslib/libhts.a(cram_io.o): In function `lzma_mem_inflate':
/data/software/nanopolish/htslib/cram/cram_io.c:1315: undefined reference to `lzma_easy_decoder_memusage'
/data/software/nanopolish/htslib/cram/cram_io.c:1315: undefined reference to `lzma_stream_decoder'
/data/software/nanopolish/htslib/cram/cram_io.c:1333: undefined reference to `lzma_code'
/data/software/nanopolish/htslib/cram/cram_io.c:1346: undefined reference to `lzma_code'
/data/software/nanopolish/htslib/cram/cram_io.c:1357: undefined reference to `lzma_end'
/data/software/nanopolish/htslib/cram/cram_io.c:1362: undefined reference to `lzma_end'
/data/software/nanopolish/htslib/cram/cram_io.c:1357: undefined reference to `lzma_end'
./htslib/libhts.a(arith_dynamic.o): In function `arith_compress_to':
/data/software/nanopolish/htslib/htscodecs/htscodecs/arith_dynamic.c:879: undefined reference to `BZ2_bzBuffToBuffCompress'
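Solution:
1. First make sure xz-libs and bzip2 are installed. Linking also needs their development packages (xz-devel and bzip2-devel on Fedora/RHEL-family systems).
dnf install xz-libs xz-devel bzip2 bzip2-devel
2. If the symbols still cannot be found, name the libraries explicitly by appending -llzma -lbz2 to the end of the link command: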
g++ -o nanopolish -g -O3 -std=c++11 -fopenmp -fsigned-char -D_FILE_OFFSET_BITS=64 -I./include -I./htslib -I./minimap2 -I./src -I./src/hmm -I./src/thirdparty -I./src/thirdparty/scrappie -I./src/common -I./src/alignment -I./src/pore_model -I./src/io -I./src/basemods -I./eigen/ -I./slow5lib/include/ -fPIC src/main/nanopolish.o src/nanopolish_variant_db.o src/nanopolish_vcf2fasta.o src/nanopolish_polya_estimator.o src/nanopolish_call_methylation.o src/nanopolish_haplotype.o src/nanopolish_squiggle_read.o src/nanopolish_detect_polyi.o src/nanopolish_read_db.o src/nanopolish_methyltrain.o src/nanopolish_raw_loader.o src/nanopolish_index.o src/nanopolish_scorereads.o src/training_core.o src/nanopolish_phase_reads.o src/nanopolish_fast5_check.o src/nanopolish_call_variants.o src/nanopolish_train_poremodel_from_basecalls.o src/hmm/nanopolish_duration_model.o src/hmm/nanopolish_profile_hmm.o src/hmm/nanopolish_profile_hmm_r9.o src/hmm/nanopolish_transition_parameters.o src/hmm/nanopolish_profile_hmm_r7.o src/common/fs_support.o src/common/nanopolish_bam_processor.o src/common/nanopolish_alphabet.o src/common/logsum.o src/common/nanopolish_iupac.o src/common/nanopolish_klcs.o src/common/nanopolish_common.o src/common/nanopolish_variant.o src/common/nanopolish_bam_utils.o src/alignment/nanopolish_eventalign.o src/alignment/nanopolish_anchor.o src/alignment/nanopolish_alignment_db.o src/pore_model/nanopolish_model_names.o src/pore_model/nanopolish_pore_model_set.o src/pore_model/nanopolish_poremodel.o src/io/nanopolish_fast5_processor.o src/io/nanopolish_fast5_loader.o src/io/nanopolish_fast5_io.o src/basemods/nanopolish_basemods.o src/thirdparty/fet.o src/thirdparty/stdaln.o src/thirdparty/scrappie/event_detection.o src/thirdparty/scrappie/scrappie_common.o ./htslib/libhts.a ./minimap2/libminimap2.a ./slow5lib/lib/libslow5.a ./lib/libhdf5.a -lz -lrt -ldl -llzma -lbz2
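With that, the link succeeded. Why do the libraries have to be named explicitly? libhts.a is a static archive, and a static archive does not record which other libraries it depends on. Every library it uses (here liblzma and libbz2) therefore has to appear on the final link line, after libhts.a.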
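To see at a glance which libraries a log like the one above points at, the undefined symbol names can be mapped to linker flags. This is a small sketch with a hypothetical function name, suggest_link_flags; it only knows the two symbol prefixes that occur in this error (lzma_ from liblzma and BZ2_ from libbz2).

```shell
# Read a linker log on stdin and print the -l flags suggested by the
# undefined symbols it mentions. Only lzma_* and BZ2_* are recognized.
suggest_link_flags() {
    awk '
        /undefined reference to/ {
            if ($0 ~ /lzma_/) need["-llzma"] = 1
            if ($0 ~ /BZ2_/)  need["-lbz2"] = 1
        }
        END {
            flags = ""
            if ("-llzma" in need) flags = flags " -llzma"
            if ("-lbz2" in need)  flags = flags " -lbz2"
            sub(/^ /, "", flags)
            print flags
        }
    '
}

# Example: make 2>&1 | suggest_link_flags
```

For the log shown in this section it prints -llzma -lbz2, the same two flags that were appended by hand.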
If bzip2 itself is missing, it can be built from source:
wget -c https://nchc.dl.sourceforge.net/project/bzip2/bzip2-1.0.6.tar.gz
tar -xzvf bzip2-1.0.6.tar.gz
cd bzip2-1.0.6
make -f Makefile-libbz2_so
make clean
make
make install
Source: https://blog.csdn.net/u013010499/article/details/113105907
References
- https://mp.weixin.qq.com/s?__biz=MzI2MjA1MDQxMg==&mid=2649710162&idx=1&sn=643885c153a3de63fcfe99aefddab9d9&chksm=f24af951c53d7047321a793a111086161c8f9e7d5f4c109082c22b90ba94c29cdb7eeb230a7d&scene=21#wechat_redirect
- https://github.com/jts/nanopolish
- https://github.com/samtools/htslib.git
- https://github.com/vcflib/vcflib/issues/254
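I run a personal WeChat official account, though I am lazy and rarely update it. You can ask questions there; if I am slow to reply, email me at tiehan@sina.cn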