【9.3.5.4】The bio-cwl-tools library

This one is really great: https://github.com/common-workflow-library/bio-cwl-tools

Questions:

  • How do I find other CWL solutions for awkward problems?

Objectives:

  • Know good resources for finding solutions to common problems

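1. Pre-written tool descriptions

When you start a CWL workflow, it is a good idea to check whether a CWL document already exists for the tools you want to use. bio-cwl-tools is a library of CWL documents for biology and life-science tools.

The CWL documents for the previous steps were already provided for you, but you can also find them in this library. In this episode, you will use the bio-cwl-tools library to add the last step to the workflow.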

2. Adding a new step to the workflow

The last step of our workflow is counting the RNA-seq reads, for which we will use the featureCounts tool.

Find the featureCounts tool in the bio-cwl-tools library. Have a look at the CWL document. Which inputs does this tool need? And what are the outputs of this tool?

The featureCounts CWL document can be found in the GitHub repo. It has two inputs, annotations (line 6) and mapped_reads (line 9), both of type File. The output of this tool is a File called featurecounts (line 21).
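
To make the "which inputs, which outputs" question concrete, here is a minimal Python sketch that lists the ports of a CWL document once it has been parsed into a dictionary. The `summarize_ports` helper and the `tool_doc` dictionary are assumptions for illustration: `tool_doc` is a hand-written stand-in built from the description above, not the actual contents of featureCounts.cwl.

```python
from typing import Any, Dict


def summarize_ports(section: Any) -> Dict[str, Any]:
    """Map each port id to its type for a CWL `inputs` or `outputs` section.

    CWL allows the section to be a mapping (id -> type, or id -> {type: ...})
    or a list of {id: ..., type: ...} entries; both shapes are handled.
    """
    if isinstance(section, list):
        section = {entry["id"]: entry for entry in section}
    ports = {}
    for name, spec in (section or {}).items():
        # Shorthand form such as `annotations: File` has no nested mapping.
        ports[name] = spec["type"] if isinstance(spec, dict) else spec
    return ports


# Hypothetical stand-in for the parsed tool document (assumption, see lead-in).
tool_doc = {
    "class": "CommandLineTool",
    "inputs": {
        "annotations": {"type": "File"},
        "mapped_reads": {"type": "File"},
    },
    "outputs": {
        "featurecounts": {"type": "File"},
    },
}

print("inputs: ", summarize_ports(tool_doc["inputs"]))
print("outputs:", summarize_ports(tool_doc["outputs"]))
```

For a real file, you would first load bio-cwl-tools/subread/featureCounts.cwl with a YAML parser of your choice and pass the resulting `inputs` and `outputs` sections to the helper.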

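We need a local copy of featureCounts in order to use it in our workflow. We already imported the library as a git submodule during setup, so the tool should be located at bio-cwl-tools/subread/featureCounts.cwl.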

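Copy the rna_seq_workflow_2.cwl file to create rna_seq_workflow_3.cwl, then add the featureCounts tool to the workflow. Like the STAR tool, this tool needs more RAM than the default: at least 500 MiB to run. Use a requirements entry with ResourceRequirement to set ramMin to 500. Use the inputs and output from the previous exercise to connect this step to the previous steps.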

rna_seq_workflow_3.cwl

cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly_forward:
    type: File
    format: http://edamontology.org/format_1930  # FASTQ
  rna_reads_fruitfly_reverse:
    type: File
    format: http://edamontology.org/format_1930  # FASTQ
  ref_fruitfly_genome: Directory
  fruitfly_gene_model: File

steps:
  quality_control_forward:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly_forward
    out: [html_file]

  quality_control_reverse:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly_reverse
    out: [html_file]

  trim_low_quality_bases:
    run: bio-cwl-tools/cutadapt/cutadapt-paired.cwl
    in:
      reads_1: rna_reads_fruitfly_forward
      reads_2: rna_reads_fruitfly_reverse
      minimum_length: { default: 20 }
      quality_cutoff: { default: 20 }
    out: [ trimmed_reads_1, trimmed_reads_2, report ]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: trim_low_quality_bases/trimmed_reads_1
      ReverseReads: trim_low_quality_bases/trimmed_reads_2
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
      Overhang: { default: 36 }  # the length of the reads - 1
      Gtf: fruitfly_gene_model
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

  count_reads:
    requirements:
      ResourceRequirement:
        ramMin: 500
    run: bio-cwl-tools/subread/featureCounts.cwl
    in:
      mapped_reads: index_alignment/bam_sorted_indexed
      annotations: fruitfly_gene_model
    out: [featurecounts]

outputs:
  quality_report_forward:
    type: File
    outputSource: quality_control_forward/html_file
  quality_report_reverse:
    type: File
    outputSource: quality_control_reverse/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
  featurecounts:
    type: File
    outputSource: count_reads/featurecounts
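
The steps in this workflow are connected only through their `in` sources: a source written as `step/output` refers to another step's output, while a bare name refers to a workflow input. The sketch below transcribes those sources by hand and derives one valid execution order from them. It illustrates the dependency structure under those assumptions; it is not a description of how cwltool schedules steps internally, and `step_dependencies` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# `in` sources transcribed from rna_seq_workflow_3.cwl (defaults omitted).
step_sources = {
    "quality_control_forward": ["rna_reads_fruitfly_forward"],
    "quality_control_reverse": ["rna_reads_fruitfly_reverse"],
    "trim_low_quality_bases": [
        "rna_reads_fruitfly_forward",
        "rna_reads_fruitfly_reverse",
    ],
    "mapping_reads": [
        "ref_fruitfly_genome",
        "trim_low_quality_bases/trimmed_reads_1",
        "trim_low_quality_bases/trimmed_reads_2",
        "fruitfly_gene_model",
    ],
    "index_alignment": ["mapping_reads/alignment"],
    "count_reads": ["index_alignment/bam_sorted_indexed", "fruitfly_gene_model"],
}


def step_dependencies(sources_by_step):
    """Return {step: set of upstream steps} from `step/output` style sources."""
    deps = {}
    for step, sources in sources_by_step.items():
        # A source containing "/" names another step's output;
        # a bare name is a workflow-level input and adds no dependency.
        deps[step] = {src.split("/", 1)[0] for src in sources if "/" in src}
    return deps


dependencies = step_dependencies(step_sources)
run_order = list(TopologicalSorter(dependencies).static_order())

print(dependencies["count_reads"])  # {'index_alignment'}
print(run_order)  # one valid order; independent steps may appear in any position
```

The new count_reads step depends only on index_alignment, so it always lands after trimming, mapping, and indexing, while the two quality-control steps have no upstream steps at all.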

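The workflow is now complete; we only need to finish the YAML input file. Copy the workflow_input_2.yml file to workflow_input_3.yml and add the last entry to the input file, which is the fruitfly_gene_model file.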

workflow_input_3.yml

rna_reads_fruitfly_forward:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
rna_reads_fruitfly_reverse:
  class: File
  location: rnaseq/GSM461177_2_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index
fruitfly_gene_model:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
  format: http://edamontology.org/format_2306
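
If you prefer to script the copy-and-append step instead of editing by hand, a minimal Python sketch could look like the following. The `extend_input_file` helper is hypothetical, and the demonstration runs in a temporary directory with a shortened stand-in for workflow_input_2.yml, so none of your real files are touched.

```python
import shutil
import tempfile
from pathlib import Path

# The entry to append, copied from the workflow_input_3.yml listing.
GENE_MODEL_ENTRY = """\
fruitfly_gene_model:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
  format: http://edamontology.org/format_2306
"""


def extend_input_file(source: Path, target: Path, entry: str) -> None:
    """Copy `source` to `target`, then append `entry` as a new top-level key."""
    shutil.copyfile(source, target)
    existing = target.read_text(encoding="utf-8")
    # Make sure the appended key starts on its own line.
    separator = "" if existing.endswith("\n") or not existing else "\n"
    with target.open("a", encoding="utf-8") as handle:
        handle.write(separator + entry)


# Demonstration in a throwaway directory.
# The old file's content here is a shortened, hypothetical stand-in.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "workflow_input_2.yml"
    new = Path(tmp) / "workflow_input_3.yml"
    old.write_text(
        "ref_fruitfly_genome:\n"
        "  class: Directory\n"
        "  location: rnaseq/dm6-STAR-index\n",
        encoding="utf-8",
    )
    extend_input_file(old, new, GENE_MODEL_ENTRY)
    result = new.read_text(encoding="utf-8")

print(result)
```

To apply it to the real files, call `extend_input_file(Path("workflow_input_2.yml"), Path("workflow_input_3.yml"), GENE_MODEL_ENTRY)` from the directory that contains them.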

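Run the workflow (the --cachedir cache option lets cwltool reuse intermediate results cached from earlier runs):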

cwltool --cachedir cache rna_seq_workflow_3.cwl workflow_input_3.yml
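
If you want to launch the run from a script, the command above can be assembled as an argument list. The sketch below is one possible wrapper, assuming cwltool is on your PATH; `cwltool_command` and `run_workflow` are hypothetical helper names. It also uses cwltool's --validate option to check the document before the actual run. Only the command is printed here; nothing is executed until you call `run_workflow` yourself.

```python
import shutil
import subprocess
from typing import List, Optional


def cwltool_command(workflow: str, inputs: str,
                    cachedir: Optional[str] = None,
                    validate_only: bool = False) -> List[str]:
    """Build the argument list for a cwltool invocation."""
    command = ["cwltool"]
    if validate_only:
        # --validate checks the document without running it,
        # so no input file is passed.
        return command + ["--validate", workflow]
    if cachedir is not None:
        command += ["--cachedir", cachedir]
    return command + [workflow, inputs]


def run_workflow(workflow: str, inputs: str, cachedir: str = "cache") -> None:
    """Validate first, then run; raises CalledProcessError if either call fails."""
    if shutil.which("cwltool") is None:
        raise RuntimeError("cwltool was not found on PATH")
    subprocess.run(cwltool_command(workflow, inputs, validate_only=True), check=True)
    subprocess.run(cwltool_command(workflow, inputs, cachedir=cachedir), check=True)


command = cwltool_command("rna_seq_workflow_3.cwl", "workflow_input_3.yml",
                          cachedir="cache")
print(" ".join(command))
```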
