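[9.3.5.4] The bio-cwl-tools library
This one is seriously impressive: https://github.com/common-workflow-library/bio-cwl-tools
Questions:
- How can I find other CWL solutions for tricky problems?
Objectives:
- Learn about good resources for finding solutions to common problems
1. Pre-written tool descriptions
When you start a CWL workflow, it is advisable to check whether a CWL document already exists for the tools you want to use. bio-cwl-tools is a library of CWL documents for biology and life-science tools.
The CWL documents for the previous steps were already provided for you, but you can also find them in this library. In this episode, you will use the bio-cwl-tools library to add the last step to the workflow.
2. Adding a new step to the workflow
The last step of our workflow is counting the RNA-seq reads, for which we will use the featureCounts tool.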
Find the featureCounts tool in the bio-cwl-tools library. Have a look at the CWL document. Which inputs does this tool need? And what are the outputs of this tool?
The featureCounts CWL document can be found in the GitHub repo. It has 2 inputs, annotations (line 6) and mapped_reads (line 9), both of type File. The output of this tool is a file called featurecounts (line 21).
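To make the answer above concrete, the tool's interface can be summarised as the CWL fragment below. This is only a sketch of the shape described in the answer: it lists the two File inputs and the single File output by name, and deliberately leaves out the command line, input bindings, and output binding, which should be read from the featureCounts CWL document itself.

```yaml
# Abbreviated sketch of the featureCounts interface, NOT the real file.
# Only the names and types mentioned in the answer are shown; baseCommand,
# inputBinding, and outputBinding entries are intentionally omitted.
class: CommandLineTool
inputs:
  annotations:
    type: File      # gene model / annotation file
  mapped_reads:
    type: File      # aligned reads
outputs:
  featurecounts:
    type: File      # the read count table
```

These three names are what a workflow step refers to in its in and out fields.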
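We need a local copy of featureCounts in order to use it in our workflow. During setup we already imported it as a git submodule, so the tool should be located at bio-cwl-tools/subread/featureCounts.cwl.
Copy the rna_seq_workflow_2.cwl file to create rna_seq_workflow_3.cwl, and add the featureCounts tool to the workflow. Like the STAR tool, this tool needs more RAM than the default: running it requires at least 500 MiB. Use a requirements entry with ResourceRequirement to allocate ramMin: 500. Use the inputs and outputs from the previous exercise to connect this step to the previous steps.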
rna_seq_workflow_3.cwl
cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly_forward:
    type: File
    format: http://edamontology.org/format_1930  # FASTQ
  rna_reads_fruitfly_reverse:
    type: File
    format: http://edamontology.org/format_1930  # FASTQ
  ref_fruitfly_genome: Directory
  fruitfly_gene_model: File

steps:
  quality_control_forward:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly_forward
    out: [html_file]
  quality_control_reverse:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly_reverse
    out: [html_file]
  trim_low_quality_bases:
    run: bio-cwl-tools/cutadapt/cutadapt-paired.cwl
    in:
      reads_1: rna_reads_fruitfly_forward
      reads_2: rna_reads_fruitfly_reverse
      minimum_length: { default: 20 }
      quality_cutoff: { default: 20 }
    out: [ trimmed_reads_1, trimmed_reads_2, report ]
  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: trim_low_quality_bases/trimmed_reads_1
      ReverseReads: trim_low_quality_bases/trimmed_reads_2
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
      Overhang: { default: 36 }  # the length of the reads - 1
      Gtf: fruitfly_gene_model
    out: [alignment]
  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]
  count_reads:
    requirements:
      ResourceRequirement:
        ramMin: 500
    run: bio-cwl-tools/subread/featureCounts.cwl
    in:
      mapped_reads: index_alignment/bam_sorted_indexed
      annotations: fruitfly_gene_model
    out: [featurecounts]

outputs:
  quality_report_forward:
    type: File
    outputSource: quality_control_forward/html_file
  quality_report_reverse:
    type: File
    outputSource: quality_control_reverse/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
  featurecounts:
    type: File
    outputSource: count_reads/featurecounts
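The count_reads step above declares its memory need under requirements, so a runner that cannot satisfy it is expected to report an error. As a hedged alternative that is not part of the original exercise, CWL also lets a step list the same entry under hints, which a runner treats as advisory. A minimal sketch of how the step might look in that form:

```yaml
# Variant of the count_reads step, assuming an advisory memory request
# is acceptable. Everything except the `hints` key is unchanged.
steps:
  count_reads:
    hints:
      ResourceRequirement:
        ramMin: 500   # MiB; advisory here, mandatory under `requirements`
    run: bio-cwl-tools/subread/featureCounts.cwl
    in:
      mapped_reads: index_alignment/bam_sorted_indexed
      annotations: fruitfly_gene_model
    out: [featurecounts]
```

For this lesson the requirements form is the one to use, since the tool is described as needing at least 500 MiB.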
The workflow is complete; we only need to finish the YAML input file. Copy workflow_input_2.yml to workflow_input_3.yml and add the last entry, the fruitfly_gene_model file, to the input file.
workflow_input_3.yml
rna_reads_fruitfly_forward:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
rna_reads_fruitfly_reverse:
  class: File
  location: rnaseq/GSM461177_2_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index
fruitfly_gene_model:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
  format: http://edamontology.org/format_2306
Run:
cwltool --cachedir cache rna_seq_workflow_3.cwl workflow_input_3.yml
I have a personal public account, but I am lazy and rarely update it. You can ask questions there; if I am slow to reply, email me at tiehan@sina.cn