【4.3.2】HLA typing tool: HLA-HD
1. Installation
The software has to be requested by email: https://www.genome.med.kyoto-u.ac.jp/HLA-HD/
bowtie2 must be installed first.
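Before unpacking and installing, it can help to confirm that the dependency is actually on PATH. The snippet below is a minimal sketch of such a check; the helper name `missing_tools` is made up for illustration and is not part of HLA-HD.

```shell
# Print the name of every required command that is not found on PATH.
# "missing_tools" is a hypothetical helper, not something HLA-HD ships.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# bowtie2 is the dependency named above; extend the list as needed.
missing=$(missing_tools bowtie2)
if [ -n "$missing" ]; then
    echo "Missing before running install.sh: $missing" >&2
fi
```

If nothing is printed, bowtie2 was found and the installation steps can proceed.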
tar -xvf hlahd.1.5.0.tar.gz
# tar -zxvf hlahd.1.5.0.tar.gz
cd hlahd.1.5.0
sh install.sh
Update the dictionary. This is somewhat slow and takes a few minutes.
sh update.dictionary.sh
2. Running
ulimit -Sa
ulimit -n 1024
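ulimit -n sets the maximum number of open file descriptors. If your machine has plenty of memory, you can set this value higher; otherwise multi-threaded runs sometimes still fail with errors.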
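A soft limit can only be raised as far as the hard limit allows, so a fixed number may be rejected on some machines. The sketch below raises the soft open-file limit toward a target and caps it at the hard limit. The function name `raise_nofile` and the target 4096 are illustrative assumptions, not values recommended by HLA-HD.

```shell
# Raise the soft open-file limit toward a target, capped at the hard limit.
# "raise_nofile" and the target value are illustrative assumptions.
raise_nofile() {
    target=$1
    soft=$(ulimit -S -n)
    hard=$(ulimit -H -n)
    # Nothing to do if the current soft limit already meets the target.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
        return 0
    fi
    # Without extra privileges the soft limit cannot exceed the hard limit.
    if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
        target=$hard
    fi
    ulimit -S -n "$target"
}

raise_nofile 4096   # arbitrary example target
ulimit -S -n        # show the soft limit now in effect
```

The change only applies to the current shell and the processes started from it, so it has to be run in the same session (or script) that launches hlahd.sh.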
The official documentation says fastq.gz files have to be decompressed, but in practice compressed files run fine too!
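If compressed input does not work in your environment, the files can be decompressed first while keeping the originals. This is a minimal sketch; the helper name `decompress_keep` and the file name in the usage comment are hypothetical.

```shell
# Write a decompressed copy next to the .gz file and print its path.
# The compressed original is left in place.
# "decompress_keep" is a hypothetical helper name.
decompress_keep() {
    gz=$1
    out=${gz%.gz}
    gzip -dc "$gz" > "$out" && echo "$out"
}

# Usage (hypothetical file name):
#   decompress_keep data/sample_1.fastq.gz
```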
Example: fastq input
hlahd.sh -t [thread_num] -m [minimum length of reads] \
-c [trimming rate] \
-f [path_to freq_data directory] \
fastq_1 fastq_2 \
gene_split_filt path_to_dictionary_directory \
IDNAME[any name] output_directory
hlahd.sh -t 2 -m 100 -c 0.95 -f freq_data/ \
data/sample_1.fastq data/sample_2.fastq \
HLA_gene.split.txt dictionary/ \
sampleID estimation
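Since reads shorter than the -m value are ignored, it is worth checking how many reads would survive a given threshold before running. The following is a rough sketch that counts them with awk; the function name `count_reads_at_least` is made up for illustration, and it assumes a standard four-line-per-record FASTQ.

```shell
# Print "<kept> <total>": the number of reads whose sequence length is at
# least min_len, and the total number of reads.
# Assumes 4 lines per FASTQ record; accepts plain or gzip-compressed files.
# "count_reads_at_least" is a hypothetical helper name.
count_reads_at_least() {
    fq=$1
    min_len=$2
    case "$fq" in
        *.gz) gzip -dc "$fq" ;;
        *)    cat "$fq" ;;
    esac | awk -v m="$min_len" '
        NR % 4 == 2 { total++; if (length($0) >= m) kept++ }
        END { printf "%d %d\n", kept, total }'
}

# Usage with the example file and -m value from above:
#   count_reads_at_least data/sample_1.fastq 100
```

If the first number is much smaller than the second, a lower -m value may be needed for that dataset.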
If the input is a BAM file:
Using BAM files mapped to the human genome
If you already have reads mapped to the human genome, you can create FASTQ files of the MHC region and the unmapped reads using samtools and Picard tools as follows:
# Extract MHC region
# for GRCh38.p12:
samtools view -h -b sample.hgmap.sorted.bam chr6:28,510,120-33,480,577 > sample.mhc.bam
# for GRCh37:
samtools view -h -b sample.hgmap.sorted.bam chr6:28,477,797-33,448,354 > sample.mhc.bam
# Extract unmapped reads
samtools view -b -f 4 sample.sorted.bam > sample.unmap.bam
# Merge bam files
samtools merge -o sample.merge.bam sample.unmap.bam sample.mhc.bam
# Create fastq
java -jar picard.jar SamToFastq I=sample.merge.bam F=sample.hlatmp.1.fastq F2=sample.hlatmp.2.fastq
# Change fastq ID
cat sample.hlatmp.1.fastq | awk '{if(NR%4 == 1){O=$0;gsub("/1"," 1",O);print O}else{print $0}}' > sample.hla.1.fastq
cat sample.hlatmp.2.fastq | awk '{if(NR%4 == 1){O=$0;gsub("/2"," 2",O);print O}else{print $0}}' > sample.hla.2.fastq
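The last step rewrites read IDs such as "@read/1" into "@read 1", touching only the header line of each four-line record. A small sketch of the same rewrite as a reusable filter is shown below, so it can be tried on a toy record first. The function name `fix_read_id` is a made-up helper, and the toy record in the usage comment is invented test input.

```shell
# Read FASTQ on stdin and replace "/<mate>" with " <mate>" in header lines
# only (lines 1, 5, 9, ...). Sequence and quality lines are left untouched,
# which matters because "/" and digits are valid quality characters.
# "fix_read_id" is a hypothetical helper name.
fix_read_id() {
    mate=$1
    awk -v m="$mate" 'NR % 4 == 1 { gsub("/" m, " " m) } { print }'
}

# Usage on a toy record (invented data); the quality line contains "/1"
# but is not a header line, so only the first line is rewritten:
#   printf '@r1/1\nACGT\n+\nI/1I\n' | fix_read_id 1
```

Like the original command, this replaces every "/1" (or "/2") occurrence in the header line, not only the one at the end of the read name.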
Other HLA typing tools:
- HLA typing for 10X single-cell RNA-seq data: scHLAcount
- For WES data, reports only HLA-A, -B and -C: OptiType
- Covers a larger number of genes: HLA_scan
3. Usage
HLA-HD usage
export PATH=$PATH:/data/software/hlahd.1.5.0/bin/
/data/software/hlahd.1.5.0/bin/hlahd.sh -t 200 -m 100 -c 0.95 \
-f /data4/neoantigen/HLA/hlahd.1.5.0/freq_data \
R1.trimmed.fastq.gz R2.trimmed.fastq.gz \
/data4/neoantigen/HLA/hlahd.1.5.0/HLA_gene.split.txt \
/data4/neoantigen/HLA/hlahd.1.5.0/dictionary/ \
sample1 HLA-HD-result
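Once the run finishes, the class I alleles can be pulled out of the final result for downstream use. The sketch below rests on two assumptions that should be checked against your own output: that the final result for this run lands at HLA-HD-result/sample1/result/sample1_final.result.txt, and that the file is tab-separated with the gene name in column 1 and the called alleles in the following columns, with "-" or "Not typed" where there is no call. The function name `class1_alleles` is made up for illustration.

```shell
# Print the called alleles for genes A, B and C, one per line.
# ASSUMPTION: tab-separated rows of "<gene>\t<allele1>\t<allele2>", where a
# missing call appears as "-" or "Not typed". Verify against a real result
# file before relying on this.
# "class1_alleles" is a hypothetical helper name.
class1_alleles() {
    awk -F '\t' '
        $1 == "A" || $1 == "B" || $1 == "C" {
            for (i = 2; i <= NF; i++)
                if ($i != "-" && $i != "Not typed") print $i
        }' "$1"
}

# Usage (path layout is an assumption, see above):
#   class1_alleles HLA-HD-result/sample1/result/sample1_final.result.txt
```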
Parameter descriptions:
-m : A read whose length is shorter than this parameter is ignored. Default size is 100.
-t : Number of cores used to execute the program.
-c : Trimming option. If a match sequence is not found in the dictionary, trim the read until some sequence is matched to or reaches this ratio. Default is 1.0.
-f : Use information of allele frequencies. The default data exist in the installed directory (/hlahd.version/freq_data).
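Because -t is described as the number of cores, asking for more threads than the machine has is unlikely to help. A minimal sketch for capping the requested value at the number of online processors follows; the function name `cap_threads` is hypothetical, and `getconf _NPROCESSORS_ONLN` is assumed to be available (it is on typical Linux and macOS systems).

```shell
# Print the requested thread count, capped at the number of online CPUs.
# Falls back to 1 if the CPU count cannot be determined.
# "cap_threads" is a hypothetical helper name.
cap_threads() {
    requested=$1
    available=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || available=1
    # Guard against empty or non-numeric output.
    case "$available" in
        ''|*[!0-9]*) available=1 ;;
    esac
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Usage: pass the result to -t, for example
#   hlahd.sh -t "$(cap_threads 200)" ...
```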
Personal official account (公众号): I am a bit lazy and rarely update it, but you can ask questions there. If I am slow to reply, email me: tiehan@sina.cn