【9.3.5.5】Debugging Workflows
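Questions:
- How can I check my CWL file for errors?
- How can I get more information to help solve an error?
- What are common error messages when using CWL?
Objectives:
- Check a CWL file for errors
- Output debugging information
- Interpret and fix common error messages
When working on a CWL workflow, you will probably encounter errors, and many different kinds are possible. Always read the error message in your terminal: it tells you the type of error and the line of code that contains it. Some of these errors are explained in this episode.
As a first step, you can check whether your CWL script contains any errors by running it with the --validate flag.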
cwltool --validate CWL_SCRIPT.cwl
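A script can pass validation and still fail when it runs. If you encounter an error, the best practice is to run the workflow with the --debug flag, which gives you detailed information about the error.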
cwltool --debug CWL_SCRIPT.cwl
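The routine above (validate first, then run with debug output) can also be scripted. The following is a minimal Python sketch, assuming cwltool is installed and on your PATH. The helper names build_command and validate_then_debug are hypothetical and not part of cwltool; only the --validate and --debug flags come from the text above.

```python
import subprocess


def build_command(cwl_script, inputs=None, flag=None):
    """Build a cwltool command line as a list of arguments."""
    command = ["cwltool"]
    if flag is not None:
        command.append(flag)  # e.g. "--validate" or "--debug"
    command.append(cwl_script)
    if inputs is not None:
        command.append(inputs)
    return command


def validate_then_debug(cwl_script, inputs):
    """Validate the document; if it is valid, run it with debug output.

    Returns the exit code of the last cwltool call (0 means success).
    """
    validation = subprocess.run(build_command(cwl_script, flag="--validate"))
    if validation.returncode != 0:
        return validation.returncode  # fix the reported errors first
    run = subprocess.run(build_command(cwl_script, inputs, flag="--debug"))
    return run.returncode
```

Calling validate_then_debug("CWL_SCRIPT.cwl", "workflow_input.yml") would stop after the validation step if the document is invalid, so the debug run only happens for documents that at least parse correctly.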
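1. YAML errors
First, errors in the YAML syntax. It is easy to make mistakes when writing code.
Some very common YAML errors are:
Tabs
Using tabs instead of spaces. In YAML files, indentation must be done with spaces, not tabs. Download and run the example tab-error.cwl, which contains a tab character.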
cwltool tab-error.cwl workflow_input.yml
ERROR Tool definition failed validation:
while scanning for the next token
file:///tab-error.cwl:5:1: found character '\t' that cannot start any token
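Tabs are hard to see in most editors, so it can help to search for them before running cwltool. The sketch below is one way to do that in plain Python. The function name find_tab_indentation and the small example document are assumptions made for illustration; the example is not the tutorial's tab-error.cwl.

```python
def find_tab_indentation(text):
    """Return (line, column) pairs, 1-based, for tabs in leading whitespace."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        tab_index = indent.find("\t")
        if tab_index != -1:
            problems.append((line_number, tab_index + 1))
    return problems


# Hypothetical document with a tab at the start of line 4.
example = "cwlVersion: v1.2\nclass: Workflow\ninputs:\n\tmessage: string\n"
for line_number, column in find_tab_indentation(example):
    print(f"line {line_number}, column {column}: tab used for indentation")
```

The positions use the same line:column convention as the cwltool message, so they can be compared directly with its output.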
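Field Name Typos
Typos in field names. It is easy to forget, for example, the capital letters in a field name. A typo in a field name is reported as an invalid field. In the example below, outputsource should be outputSource.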
rna_seq_workflow_fieldname_fail.cwl
cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputsource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
workflow_input_debug.yml
rna_reads_fruitfly:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index
Run:
cwltool rna_seq_workflow_fieldname_fail.cwl workflow_input_debug.yml
ERROR Tool definition failed validation:
rna_seq_workflow_fieldname_fail.cwl:1:1: Object `rna_seq_workflow_fieldname_fail.cwl` is not valid
because
tried `Workflow` but
rna_seq_workflow_fieldname_fail.cwl:35:1: the `outputs` field is not valid because
rna_seq_workflow_fieldname_fail.cwl:36:3: item is invalid because
rna_seq_workflow_fieldname_fail.cwl:38:5: invalid field `outputsource`, expected one of:
'label', 'secondaryFiles', 'streamable', 'doc', 'id',
'format', 'outputSource', 'linkMerge', 'pickValue', 'type'
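The error message already lists the valid field names, so the intended name can often be found mechanically. The sketch below uses Python's standard difflib module to pick the closest candidate. The list of names is copied from the message above; the function name suggest_field is hypothetical and is not something cwltool provides.

```python
import difflib

# Field names copied from the "expected one of" list in the error message.
EXPECTED_FIELDS = [
    "label", "secondaryFiles", "streamable", "doc", "id",
    "format", "outputSource", "linkMerge", "pickValue", "type",
]


def suggest_field(name, valid_fields=EXPECTED_FIELDS):
    """Return the most likely intended field name, or None if nothing is close."""
    # A wrong capital letter is a very common slip, so try that first.
    for field in valid_fields:
        if field.lower() == name.lower():
            return field
    # Otherwise fall back to fuzzy matching on the spelling.
    matches = difflib.get_close_matches(name, valid_fields, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest_field("outputsource"))  # prints outputSource
```

Checking the case-insensitive match first keeps the common "forgotten capital letter" case exact, and the fuzzy match only runs for other spelling slips.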
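Variable Name Typos
Typos in variable names. Similar to typos in field names, it is easy to make a mistake when referencing a variable. These errors are reported as a field that references an unknown identifier. In the example below, the source mapping_reads/alignments should be mapping_reads/alignment.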
rna_seq_workflow_varname_fail.cwl
cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignments
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputSource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
Run:
cwltool rna_seq_workflow_varname_fail.cwl workflow_input_debug.yml
Error:
ERROR Tool definition failed validation:
rna_seq_workflow_varname_fail.cwl:8:1: checking field `steps`
rna_seq_workflow_varname_fail.cwl:29:3: checking object
`rna_seq_workflow_varname_fail.cwl#index_alignment`
rna_seq_workflow_varname_fail.cwl:31:5: checking field `in`
rna_seq_workflow_varname_fail.cwl:32:7: checking object
`rna_seq_workflow_varname_fail.cwl#index_alignment/bam_sorted`
Field `source` references unknown identifier
`mapping_reads/alignments`, tried
file:///.../rna_seq_workflow_varname_fail.cwl#mapping_reads/alignments
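The check behind this message can be pictured as comparing every source against the set of identifiers the workflow defines: its inputs and each step's outputs in the form step/output. The sketch below illustrates that idea on a workflow written out as a Python dictionary (trimmed to the relevant fields). It is a simplified illustration with hypothetical function names, not cwltool's implementation, and it only handles sources given as a single string.

```python
def known_identifiers(workflow):
    """Collect workflow input names and 'step/output' names."""
    identifiers = set(workflow["inputs"])
    for step_name, step in workflow["steps"].items():
        for output_name in step["out"]:
            identifiers.add(f"{step_name}/{output_name}")
    return identifiers


def find_unknown_sources(workflow):
    """Return (step, input, source) for every source that is not defined."""
    known = known_identifiers(workflow)
    problems = []
    for step_name, step in workflow["steps"].items():
        for input_name, source in step["in"].items():
            if isinstance(source, dict):
                source = source.get("source")  # {default: 4} has no source
            if source is not None and source not in known:
                problems.append((step_name, input_name, source))
    return problems


# The failing workflow from above, reduced to inputs, in and out.
workflow = {
    "inputs": {"rna_reads_fruitfly": "File", "ref_fruitfly_genome": "Directory"},
    "steps": {
        "quality_control": {
            "in": {"reads_file": "rna_reads_fruitfly"},
            "out": ["html_file"],
        },
        "mapping_reads": {
            "in": {
                "RunThreadN": {"default": 4},
                "GenomeDir": "ref_fruitfly_genome",
                "ForwardReads": "rna_reads_fruitfly",
            },
            "out": ["alignment"],
        },
        "index_alignment": {
            "in": {"bam_sorted": "mapping_reads/alignments"},
            "out": ["bam_sorted_indexed"],
        },
    },
}
print(find_unknown_sources(workflow))
```

The only entry reported is the bam_sorted input of index_alignment, whose source mapping_reads/alignments does not match the declared output mapping_reads/alignment.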
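2. Wiring errors
A wiring error often occurs when you forget to add the output of a workflow step to the workflow's outputs section. This does not produce an error message, but the expected output will be missing from your output directory. To get the desired output, you have to run the workflow again. Best practice is to check the outputs section before running your script, to make sure every output you want is listed there.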
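Because a wiring mistake produces no error message, a small check before the run can help. The sketch below lists step outputs that are neither consumed by another step nor exposed in the outputs section. It assumes the workflow has already been loaded into a Python dictionary (for example with a YAML parser), and unused_step_outputs is a hypothetical helper, not part of cwltool. It only understands sources written as plain strings.

```python
def unused_step_outputs(workflow):
    """List step outputs that no workflow output or step input refers to."""
    used = set()
    for output in workflow["outputs"].values():
        used.add(output["outputSource"])
    for step in workflow["steps"].values():
        for source in step["in"].values():
            if isinstance(source, str):
                used.add(source)
    unused = []
    for step_name, step in workflow["steps"].items():
        for output_name in step["out"]:
            identifier = f"{step_name}/{output_name}"
            if identifier not in used:
                unused.append(identifier)
    return unused


# Hypothetical variant of the RNA-seq workflow in which qc_html was forgotten.
workflow = {
    "steps": {
        "quality_control": {
            "in": {"reads_file": "rna_reads_fruitfly"},
            "out": ["html_file"],
        },
        "mapping_reads": {
            "in": {"GenomeDir": "ref_fruitfly_genome",
                   "ForwardReads": "rna_reads_fruitfly"},
            "out": ["alignment"],
        },
        "index_alignment": {
            "in": {"bam_sorted": "mapping_reads/alignment"},
            "out": ["bam_sorted_indexed"],
        },
    },
    "outputs": {
        "bam_sorted_indexed": {
            "type": "File",
            "outputSource": "index_alignment/bam_sorted_indexed",
        },
    },
}
print(unused_step_outputs(workflow))
```

An unused output is not always a mistake, since intermediate results are sometimes dropped on purpose, so the list is best read as a reminder rather than as an error.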
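3. Type mismatch
A type error occurs when there is a mismatch between types. When you declare a variable in the inputs section, its type has to match both the type in the YAML inputs file and the type used in the workflow steps. The error message shows the lines where the mismatch occurs. In the example below, rna_reads_fruitfly is declared as int although the steps expect a File.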
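To make the matching rule concrete, here is a deliberately simplified Python sketch of a source/sink compatibility test. It treats a list of types as a set of alternatives and requires an exact match otherwise. The function name is_compatible is hypothetical, the real check in cwltool covers more cases, and the ForwardReads sink type is taken from the error message shown later in this section.

```python
def is_compatible(source_type, sink_type):
    """Return True if a value of source_type can feed a sink of sink_type.

    Simplified rules: a list is a union of alternatives, and any other
    type must match exactly.
    """
    if isinstance(sink_type, list):
        # The sink accepts the source if any one alternative accepts it.
        return any(is_compatible(source_type, option) for option in sink_type)
    if isinstance(source_type, list):
        # Every alternative the source may produce must be accepted.
        return all(is_compatible(option, sink_type) for option in source_type)
    return source_type == sink_type


forward_reads_sink = ["File", {"type": "array", "items": "File"}]
print(is_compatible("int", "File"))               # False
print(is_compatible("int", forward_reads_sink))   # False
print(is_compatible("File", forward_reads_sink))  # True
```

Under these rules an int source fails against both sinks, which mirrors the two "is incompatible" messages, while declaring the input as File satisfies both.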
rna_seq_workflow_type_fail.cwl
cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: int
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputSource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
Run:
cwltool rna_seq_workflow_type_fail.cwl workflow_input_debug.yml
Error:
ERROR Tool definition failed validation:
rna_seq_workflow_type_fail.cwl:5:3: Source 'rna_reads_fruitfly' of type "int" is incompatible
rna_seq_workflow_type_fail.cwl:12:7: with sink 'reads_file' of type "File"
rna_seq_workflow_type_fail.cwl:5:3: Source 'rna_reads_fruitfly' of type "int" is incompatible
rna_seq_workflow_type_fail.cwl:23:7: with sink 'ForwardReads' of type ["File", {"type":
"array", "items": "File"}]
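4. Format errors
Some files need a specific format declared in the YAML inputs file, for example the FASTQ file in the RNA-seq analysis. If the format is not specified, an error occurs. To specify formats you can use, for example, the EDAM ontology.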
rna_seq_workflow_debug.cwl
cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File
  ref_fruitfly_genome: Directory

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: rna_reads_fruitfly
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

outputs:
  qc_html:
    type: File
    outputSource: quality_control/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
workflow_input_undefined.yml
rna_reads_fruitfly:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index
Run:
cwltool rna_seq_workflow_debug.cwl workflow_input_undefined.yml
Error:
ERROR Exception on step 'mapping_reads'
ERROR [step mapping_reads] Cannot make job: Expected value of 'ForwardReads' to have format http://edamontology.org/format_1930 but
File has no 'format' defined: {
"class": "File",
"location": "file:///.../rnaseq/GSM461177_1_subsampled.fastqsanger",
"size": 142867948,
"basename": "GSM461177_1_subsampled.fastqsanger",
"nameroot": "GSM461177_1_subsampled",
"nameext": ".fastqsanger"
}
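This error only appears once the mapping_reads step is about to start, so it can be worth checking the inputs object beforehand. The sketch below flags every File entry that has no format field. It assumes the inputs file has already been loaded into a Python dictionary, and files_missing_format is a hypothetical helper. Note that it is stricter than necessary: a format is only required when the tool receiving the file declares one.

```python
def files_missing_format(inputs):
    """Return the names of File inputs that do not declare a 'format'."""
    missing = []
    for name, value in inputs.items():
        is_file = isinstance(value, dict) and value.get("class") == "File"
        if is_file and "format" not in value:
            missing.append(name)
    return missing


# Contents of workflow_input_undefined.yml written as a dictionary.
inputs = {
    "rna_reads_fruitfly": {
        "class": "File",
        "location": "rnaseq/GSM461177_1_subsampled.fastqsanger",
    },
    "ref_fruitfly_genome": {
        "class": "Directory",
        "location": "rnaseq/dm6-STAR-index",
    },
}
print(files_missing_format(inputs))
```

Adding format: http://edamontology.org/format_1930 to rna_reads_fruitfly, as in workflow_input_debug.yml, makes the list empty.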
Summary:
- Run the workflow with the --validate option to check for errors
- The --debug option will output more information
- ‘Wiring’ errors won’t necessarily yield an error message