【5.4.1】Nanopore_psU
I. Installation
git clone https://github.com/sihaohuanguc/Nanopore_psU.git
cd Nanopore_psU
pip install .
nanopsu -h
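Before running the pipeline it can help to confirm that the command-line tools are reachable. The sketch below is a small helper and not part of Nanopore_psU: the function name require_tools is made up here, and the tool list in the usage comment (minimap2 from the paper's methods, samtools inferred from the .bam and pileup outputs) is an assumption about what the pipeline needs.

```shell
# Report every command in the argument list that is not on PATH.
# Returns 0 if all were found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (the tool list is an assumption, adjust as needed):
# require_tools nanopsu minimap2 samtools || echo "install the tools listed above"
```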
II. Usage
1. Basecall with guppy to generate the fastq files
guppy_basecaller --input_path fast5 \
--recursive \
--save_path fastq \
--records_per_fastq 0 \
--flowcell FLO-MIN106 \
--kit SQK-RNA002 \
--qscore_filtering \
--min_qscore 7 \
--cpu_threads_per_caller 3 \
--num_callers 5
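A quick sanity check after basecalling is to count the reads that were written. The helper below is only a sketch: the name count_fastq_reads is hypothetical, it assumes plain four-line FASTQ records (no wrapped sequence lines), and the path in the usage comment assumes that guppy, when run with --qscore_filtering, places the passing reads in a pass subfolder.

```shell
# Count records in one or more uncompressed FASTQ files,
# assuming exactly four lines per record.
count_fastq_reads() {
    cat "$@" | awk 'END { print int(NR / 4) }'
}

# Example (the path is an assumption about the guppy output layout):
# count_fastq_reads fastq/pass/*.fastq
```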
2. Alignment and pileup
nanopsu alignment -i path/of/fastq/ -r reference.fa
Generated files:
$ ls alignment/plus_strand/
collect.bam collect_pile.txt collect.sorted.bam reference.fa.fai
collect.fastq collect.sam reference.fa
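Judging from the output file names and from the minimap2 parameters quoted in the paper notes further down, the alignment step appears to correspond roughly to a minimap2 plus samtools pipeline. The sketch below is an assumption about equivalent manual commands, not the tool's actual implementation. The function name manual_alignment and the DRY_RUN switch are made up for illustration.

```shell
# Hypothetical manual equivalent of the alignment step.
# Set DRY_RUN=1 to print the commands instead of running them.
manual_alignment() {
    ref=$1
    fastq=$2
    run=${DRY_RUN:+echo}
    # Spliced alignment of direct RNA reads (parameters as quoted from the paper).
    $run minimap2 -ax splice -uf -k14 -o collect.sam "$ref" "$fastq"
    $run samtools view -b -o collect.bam collect.sam
    $run samtools sort -o collect.sorted.bam collect.bam
    $run samtools faidx "$ref"
    # Per-position pileup, the input for feature extraction.
    $run samtools mpileup -f "$ref" -o collect_pile.txt collect.sorted.bam
}

# Example: DRY_RUN=1; manual_alignment reference.fa collect.fastq
```

Printing the commands first makes it easy to compare them with what the tool actually produced before running anything.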
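3. Feature extraction
Remove the unaligned positions, i.e. those shown with the > symbol in the pileup (reference skips such as introns):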
nanopsu remove_intron
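To see how many reads skip a given position before removing them, the read-bases column of the pileup can be inspected directly. In samtools mpileup output, > and < mark a reference skip, and ^ is followed by one character that encodes mapping quality (which may itself be > or <, so it is stripped first). The sketch below only counts the markers; it is not what remove_intron does internally, and the function name is hypothetical.

```shell
# For each pileup line, print chromosome, position and the number of
# reads showing a reference skip (> or <) at that position.
count_ref_skips() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        bases = $5
        gsub(/\^./, "", bases)        # drop read-start markers and their MAPQ character
        n = gsub(/[<>]/, "", bases)   # gsub returns the number of replacements
        print $1, $2, n
    }' "$@"
}

# Example: count_ref_skips alignment/plus_strand/collect_pile.txt | head
```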
Features are extracted for a U position only if it is covered by at least 20 reads:
nanopsu extract_features
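The coverage rule can be approximated on the pileup file to estimate how many candidate positions will survive. This is a rough sketch rather than the tool's exact criterion: it assumes the standard mpileup columns (reference base in column 3, depth in column 4) and that U appears as T in a DNA-style reference FASTA. The function name filter_covered_u is made up.

```shell
# Keep pileup lines whose reference base is T (or U) and whose
# depth (column 4) is at least the threshold given as first argument.
filter_covered_u() {
    min_depth=$1
    shift
    awk -v min="$min_depth" 'BEGIN { FS = "\t" }
        toupper($3) ~ /^[TU]$/ && $4 + 0 >= min + 0' "$@"
}

# Example: filter_covered_u 20 alignment/plus_strand/collect_pile.txt | wc -l
```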
4. psU prediction
nanopsu prediction
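III. Notes on the paper
Nanopore data preprocessing
- All raw fast5 files generated during sequencing were basecalled with guppy (version 3.2.2+9fe0a78) using min_qscore 7.
- Reads were aligned with minimap2 (version 2.18-r1015) [37] using the parameters -ax splice -uf -k14. (Aligning with these settings does indeed yield more aligned reads.)
- The "error" features were extracted from the mpileup file with custom Python scripts (https://github.com/sihaohuanguc/Nanopore_psU).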
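To make the idea of an "error" feature concrete, the sketch below computes one plausible example, the mismatch fraction per position, from the read-bases column of an mpileup file. It is a simplified illustration under stated assumptions and does not reproduce the feature set of the authors' scripts: indel annotations and read-start/read-end markers are stripped, deletions and reference skips are ignored, and the function name is hypothetical.

```shell
# Print chromosome, position, number of called bases and mismatch
# fraction for every pileup line that has at least one called base.
mismatch_fraction() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        b = $5
        gsub(/\^./, "", b)                      # read-start marker plus MAPQ character
        while (match(b, /[+-][0-9]+/)) {        # indel: sign, length, then that many bases
            len = substr(b, RSTART + 1, RLENGTH - 1) + 0
            b = substr(b, 1, RSTART - 1) substr(b, RSTART + RLENGTH + len)
        }
        match_n = gsub(/[.,]/, "", b)           # bases agreeing with the reference
        mism_n = gsub(/[ACGTNacgtn]/, "", b)    # bases differing from the reference
        total = match_n + mism_n
        if (total > 0) printf "%s\t%s\t%d\t%.3f\n", $1, $2, total, mism_n / total
    }' "$@"
}

# Example: mismatch_fraction alignment/plus_strand/collect_pile.txt | head
```

The indel sequences are removed before counting so that inserted or deleted bases are not mistaken for mismatches.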
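IV. Errors
4.1 Error
No module named 'sklearn.ensemble.forest'
Solution: pin scikit-learn to an older release (the sklearn.ensemble.forest module path no longer exists in newer versions):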
pip install scikit-learn==0.23.2 --timeout=100000
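Whether the installed scikit-learn is old enough can be checked from its version string. The sketch below assumes, to the best of my knowledge, that the sklearn.ensemble.forest path is importable only in releases before 0.24; the function name has_legacy_forest_module is made up, and the usage comment assumes python points at the environment where nanopsu is installed.

```shell
# Succeed (exit status 0) if the given scikit-learn version string
# is older than 0.24, e.g. "0.23.2".
has_legacy_forest_module() {
    ver=$1
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # second version component
    [ "$major" -eq 0 ] && [ "$minor" -lt 24 ]
}

# Example:
# has_legacy_forest_module "$(python -c 'import sklearn; print(sklearn.__version__)')" \
#     || echo "scikit-learn is too new for the bundled model"
```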
4.2 Error
undefined reference to `lzma_end'
Solution: install the xz libraries. The failing link command and its output follow:
dnf install xz-libs
g++ -o nanopolish -g -O3 -std=c++11 -fopenmp -fsigned-char -D_FILE_OFFSET_BITS=64 -I./include -I./htslib -I./minimap2 -I./src -I./src/hmm -I./src/thirdparty -I./src/thirdparty/scrappie -I./src/common -I./src/alignment -I./src/pore_model -I./src/io -I./src/basemods -I./eigen/ -I./slow5lib/include/ -fPIC src/main/nanopolish.o src/nanopolish_variant_db.o src/nanopolish_vcf2fasta.o src/nanopolish_polya_estimator.o src/nanopolish_call_methylation.o src/nanopolish_haplotype.o src/nanopolish_squiggle_read.o src/nanopolish_detect_polyi.o src/nanopolish_read_db.o src/nanopolish_methyltrain.o src/nanopolish_raw_loader.o src/nanopolish_index.o src/nanopolish_scorereads.o src/training_core.o src/nanopolish_phase_reads.o src/nanopolish_fast5_check.o src/nanopolish_call_variants.o src/nanopolish_train_poremodel_from_basecalls.o src/hmm/nanopolish_duration_model.o src/hmm/nanopolish_profile_hmm.o src/hmm/nanopolish_profile_hmm_r9.o src/hmm/nanopolish_transition_parameters.o src/hmm/nanopolish_profile_hmm_r7.o src/common/fs_support.o src/common/nanopolish_bam_processor.o src/common/nanopolish_alphabet.o src/common/logsum.o src/common/nanopolish_iupac.o src/common/nanopolish_klcs.o src/common/nanopolish_common.o src/common/nanopolish_variant.o src/common/nanopolish_bam_utils.o src/alignment/nanopolish_eventalign.o src/alignment/nanopolish_anchor.o src/alignment/nanopolish_alignment_db.o src/pore_model/nanopolish_model_names.o src/pore_model/nanopolish_pore_model_set.o src/pore_model/nanopolish_poremodel.o src/io/nanopolish_fast5_processor.o src/io/nanopolish_fast5_loader.o src/io/nanopolish_fast5_io.o src/basemods/nanopolish_basemods.o src/thirdparty/fet.o src/thirdparty/stdaln.o src/thirdparty/scrappie/event_detection.o src/thirdparty/scrappie/scrappie_common.o ./htslib/libhts.a ./minimap2/libminimap2.a ./slow5lib/lib/libslow5.a ./lib/libhdf5.a -lz -lrt -ldl
./htslib/libhts.a(hts.o): In function `decompress_peek_xz':
/data/software/htslib/hts.c:347: undefined reference to `lzma_easy_decoder_memusage'
/data/software/htslib/hts.c:347: undefined reference to `lzma_stream_decoder'
/data/software/htslib/hts.c:355: undefined reference to `lzma_code'
/data/software/htslib/hts.c:362: undefined reference to `lzma_end'
/data/software/htslib/hts.c:357: undefined reference to `lzma_end'
./htslib/libhts.a(cram_io.o): In function `cram_compress_by_method':
/data/software/nanopolish/htslib/cram/cram_io.c:1776: undefined reference to `BZ2_bzBuffToBuffCompress'
./htslib/libhts.a(cram_io.o): In function `lzma_mem_deflate':
/data/software/nanopolish/htslib/cram/cram_io.c:1293: undefined reference to `lzma_stream_buffer_bound'
/data/software/nanopolish/htslib/cram/cram_io.c:1299: undefined reference to `lzma_easy_buffer_encode'
./htslib/libhts.a(cram_io.o): In function `cram_uncompress_block':
/data/software/nanopolish/htslib/cram/cram_io.c:1612: undefined reference to `BZ2_bzBuffToBuffDecompress'
./htslib/libhts.a(cram_io.o): In function `lzma_mem_inflate':
/data/software/nanopolish/htslib/cram/cram_io.c:1315: undefined reference to `lzma_easy_decoder_memusage'
/data/software/nanopolish/htslib/cram/cram_io.c:1315: undefined reference to `lzma_stream_decoder'
/data/software/nanopolish/htslib/cram/cram_io.c:1333: undefined reference to `lzma_code'
/data/software/nanopolish/htslib/cram/cram_io.c:1346: undefined reference to `lzma_code'
/data/software/nanopolish/htslib/cram/cram_io.c:1357: undefined reference to `lzma_end'
/data/software/nanopolish/htslib/cram/cram_io.c:1362: undefined reference to `lzma_end'
/data/software/nanopolish/htslib/cram/cram_io.c:1357: undefined reference to `lzma_end'
./htslib/libhts.a(arith_dynamic.o): In function `arith_compress_to':
/data/software/nanopolish/htslib/htscodecs/htscodecs/arith_dynamic.c:879: undefined reference to `BZ2_bzBuffToBuffCompress'
Append -llzma -lbz2 to the end of the link command:
g++ -o nanopolish -g -O3 -std=c++11 -fopenmp -fsigned-char -D_FILE_OFFSET_BITS=64 -I./include -I./htslib -I./minimap2 -I./src -I./src/hmm -I./src/thirdparty -I./src/thirdparty/scrappie -I./src/common -I./src/alignment -I./src/pore_model -I./src/io -I./src/basemods -I./eigen/ -I./slow5lib/include/ -fPIC src/main/nanopolish.o src/nanopolish_variant_db.o src/nanopolish_vcf2fasta.o src/nanopolish_polya_estimator.o src/nanopolish_call_methylation.o src/nanopolish_haplotype.o src/nanopolish_squiggle_read.o src/nanopolish_detect_polyi.o src/nanopolish_read_db.o src/nanopolish_methyltrain.o src/nanopolish_raw_loader.o src/nanopolish_index.o src/nanopolish_scorereads.o src/training_core.o src/nanopolish_phase_reads.o src/nanopolish_fast5_check.o src/nanopolish_call_variants.o src/nanopolish_train_poremodel_from_basecalls.o src/hmm/nanopolish_duration_model.o src/hmm/nanopolish_profile_hmm.o src/hmm/nanopolish_profile_hmm_r9.o src/hmm/nanopolish_transition_parameters.o src/hmm/nanopolish_profile_hmm_r7.o src/common/fs_support.o src/common/nanopolish_bam_processor.o src/common/nanopolish_alphabet.o src/common/logsum.o src/common/nanopolish_iupac.o src/common/nanopolish_klcs.o src/common/nanopolish_common.o src/common/nanopolish_variant.o src/common/nanopolish_bam_utils.o src/alignment/nanopolish_eventalign.o src/alignment/nanopolish_anchor.o src/alignment/nanopolish_alignment_db.o src/pore_model/nanopolish_model_names.o src/pore_model/nanopolish_pore_model_set.o src/pore_model/nanopolish_poremodel.o src/io/nanopolish_fast5_processor.o src/io/nanopolish_fast5_loader.o src/io/nanopolish_fast5_io.o src/basemods/nanopolish_basemods.o src/thirdparty/fet.o src/thirdparty/stdaln.o src/thirdparty/scrappie/event_detection.o src/thirdparty/scrappie/scrappie_common.o ./htslib/libhts.a ./minimap2/libminimap2.a ./slow5lib/lib/libslow5.a ./lib/libhdf5.a -lz -lrt -ldl -llzma -lbz2
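The missing flags can also be read off the linker log mechanically. The sketch below extracts the symbol names from "undefined reference" lines and maps the two prefixes seen in this log to their libraries (lzma_ symbols come from liblzma, BZ2_ symbols from libbz2). The function name suggest_link_flags is hypothetical and the mapping covers only these two prefixes.

```shell
# Read a linker log (files or stdin) and print the -l flags that
# would resolve the undefined lzma_* and BZ2_* symbols.
suggest_link_flags() {
    sed -n 's/.*undefined reference to .\([A-Za-z0-9_]*\).*/\1/p' "$@" |
        awk '/^lzma_/ { f["-llzma"] = 1 }
             /^BZ2_/  { f["-lbz2"] = 1 }
             END { for (k in f) print k }' |
        LC_ALL=C sort
}

# Example: make 2>&1 | suggest_link_flags
```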
Solution if libbz2 is not available: build and install bzip2 from source:
wget -c https://nchc.dl.sourceforge.net/project/bzip2/bzip2-1.0.6.tar.gz
tar -xzvf bzip2-1.0.6.tar.gz
cd bzip2-1.0.6
make -f Makefile-libbz2_so
make clean
make
make install
Personal official account (公众号): rarely updated, but questions can be posted there. If I am slow to reply, email me at tiehan@sina.cn