Shared Infrastructure Features¶
wf2wf provides a comprehensive shared infrastructure that enhances all workflow importers with intelligent inference, interactive prompting, and resource management capabilities.
Overview¶
The shared infrastructure consists of several key components that work together to provide a consistent and enhanced user experience across all supported workflow formats:
Intelligent Inference: Automatically fills in missing information
Interactive Prompting: Guides users through configuration decisions
Resource Processing: Validates and optimizes resource specifications
Loss Integration: Detects and reports information loss during conversion
Environment Management: Adapts workflows for different execution environments
Intelligent Inference¶
What It Does¶
The intelligent inference system analyzes your workflow and automatically fills in missing information based on:
Command Analysis: Infers resource requirements from command content
Format Patterns: Applies format-specific best practices
Execution Environment: Adapts to target environment requirements
Content Analysis: Detects execution models and patterns
How It Works¶
# Automatic inference is enabled by default
wf2wf convert -i workflow.smk -o workflow.dag
# The system will automatically:
# - Infer missing resource requirements
# - Detect execution models
# - Apply environment-specific optimizations
# - Suggest improvements
Inference Examples¶
Resource Inference:
# Before inference
rule process:
    input: "data.txt"
    output: "result.txt"
    shell: "python heavy_analysis.py {input} > {output}"

# After inference (automatic)
rule process:
    input: "data.txt"
    output: "result.txt"
    resources:
        mem_mb=8192,  # Inferred from "heavy_analysis"
        cpu=4,        # Inferred from analysis type
        disk_mb=4096  # Inferred from file operations
    shell: "python heavy_analysis.py {input} > {output}"
Execution Model Detection:
# Automatic detection of execution model
wf2wf info workflow.smk
# Output:
# Execution Model: distributed_computing
# Detection Method: content_analysis
# Confidence: 0.85
# Indicators:
# - Multiple resource specifications
# - Container requirements
# - File transfer modes
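To make the idea of indicator-based detection concrete, here is a minimal Python sketch. It is not wf2wf's implementation: the task schema (dicts with optional `resources`, `container`, and `transfer_mode` keys), the fallback model name `shared_filesystem`, and the confidence formula (fraction of indicators present) are all assumptions made for illustration.

```python
def detect_execution_model(tasks):
    """Guess an execution model from simple content indicators.

    `tasks` is a list of dicts; the keys checked here are a hypothetical
    schema, not wf2wf's internal representation.
    """
    checks = {
        "Multiple resource specifications":
            sum(1 for t in tasks if t.get("resources")) > 1,
        "Container requirements":
            any(t.get("container") for t in tasks),
        "File transfer modes":
            any(t.get("transfer_mode") for t in tasks),
    }
    indicators = [name for name, hit in checks.items() if hit]
    # Assumed scoring: share of indicators that fired.
    confidence = len(indicators) / len(checks)
    # Assumed threshold and fallback name.
    model = "distributed_computing" if confidence >= 0.5 else "shared_filesystem"
    return model, confidence, indicators
```

A real detector would likely weight indicators differently; the point is only that each reported indicator corresponds to a concrete check on the workflow content.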
Interactive Prompting¶
When It’s Useful¶
Interactive mode is particularly helpful when:
Converting between different execution environments
Workflows have missing resource specifications
Container/environment specifications are incomplete
Error handling needs to be configured
File transfer modes need optimization
Enabling Interactive Mode¶
# Enable interactive mode
wf2wf convert -i workflow.smk -o workflow.dag --interactive
# Interactive mode with verbose output
wf2wf convert -i workflow.smk -o workflow.dag --interactive --verbose
Interactive Session Examples¶
Resource Specification:
Found 3 tasks without explicit resource requirements.
Distributed systems require explicit resource allocation.
Add default resource specifications? [Y/n]: Y
Applied default resources: CPU=1, Memory=2048MB, Disk=4096MB
Container Specification:
Found 2 tasks without container or conda specifications.
Distributed systems typically require explicit environment isolation.
Add container specifications or conda environments? [Y/n]: Y
Enable --auto-env to automatically build containers for these tasks.
Error Handling:
Found 4 tasks without retry specifications.
Distributed systems benefit from explicit error handling.
Add retry specifications for failed tasks? [Y/n]: Y
Applied default retry settings (2 retries)
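The prompt-then-apply pattern in these sessions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not wf2wf's code: the function names, the task dicts, and the `retries` key are hypothetical, and the `ask` parameter exists only so the prompt can be replaced in tests.

```python
def confirm(question, default=True, ask=input):
    """Ask a yes/no question; an empty answer selects the default."""
    suffix = "[Y/n]" if default else "[y/N]"
    answer = ask(f"{question} {suffix}: ").strip().lower()
    if not answer:
        return default
    return answer in ("y", "yes")


def apply_default_retries(tasks, retries=2, ask=input):
    """Offer to add a retry count to tasks that lack one.

    Returns the number of tasks that were updated.
    """
    missing = [t for t in tasks if "retries" not in t]
    if not missing:
        return 0
    print(f"Found {len(missing)} tasks without retry specifications.")
    if not confirm("Add retry specifications for failed tasks?", ask=ask):
        return 0
    for task in missing:
        task["retries"] = retries
    return len(missing)
```

Tasks that already specify retries are left untouched, so explicit settings in the source workflow take precedence over defaults.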
Resource Processing¶
Resource Validation¶
The resource processor validates specifications against target environments:
# Validate resources for cluster environment
wf2wf convert -i workflow.smk -o workflow.dag --validate-resources
# Output:
# ⚠ Resource validation found 2 issues:
# • task_1: Memory specification (16384MB) exceeds cluster limit (8192MB)
# • task_2: CPU specification (16) exceeds cluster limit (8)
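The kind of check behind these messages can be sketched as a comparison of each task's request against per-environment limits. The function below is a hypothetical illustration, not wf2wf's validator; the `mem_mb` and `cpu` keys and the shape of the `limits` dict are assumptions.

```python
def validate_resources(tasks, limits):
    """Return one message per resource request that exceeds its limit.

    `tasks` maps task names to resource dicts; `limits` holds the
    maximum allowed value per key. Both layouts are assumed.
    """
    labels = {"mem_mb": ("Memory", "MB"), "cpu": ("CPU", "")}
    issues = []
    for name, spec in tasks.items():
        for key, (label, unit) in labels.items():
            value, limit = spec.get(key), limits.get(key)
            if value is not None and limit is not None and value > limit:
                issues.append(
                    f"{name}: {label} specification ({value}{unit}) "
                    f"exceeds cluster limit ({limit}{unit})"
                )
    return issues


issues = validate_resources(
    {"task_1": {"mem_mb": 16384, "cpu": 4}, "task_2": {"mem_mb": 4096, "cpu": 16}},
    {"mem_mb": 8192, "cpu": 8},
)
for issue in issues:
    print("•", issue)
```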
Resource Profiles¶
Apply predefined resource profiles for different environments:
# Apply cluster profile
wf2wf convert -i workflow.smk -o workflow.dag --resource-profile cluster
# Available profiles:
# - shared: Light resources for shared filesystem
# - cluster: Standard cluster resources
# - cloud: Cloud-optimized resources
# - hpc: High-performance computing resources
# - gpu: GPU-enabled resources
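One plausible way to think about a profile is as a set of defaults merged under each task's explicit settings. The sketch below assumes that behavior; it is not taken from wf2wf, and the numbers in `PROFILES` are placeholders, not the tool's real profile values.

```python
# Placeholder numbers for illustration only; NOT wf2wf's real profile values.
PROFILES = {
    "shared": {"cpu": 1, "mem_mb": 1024},
    "cluster": {"cpu": 2, "mem_mb": 4096},
}


def apply_profile(task_resources, profile_name):
    """Fill in keys missing from task_resources using the named profile.

    Explicit task values win over profile defaults (an assumed policy).
    """
    if profile_name not in PROFILES:
        raise ValueError(f"unknown profile: {profile_name!r}")
    return {**PROFILES[profile_name], **task_resources}
```

Under this assumed policy, a task that requests `mem_mb=8192` keeps that value and only picks up the profile's `cpu` default.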
Resource Inference¶
Automatically infer resource requirements from command analysis:
# Enable resource inference
wf2wf convert -i workflow.smk -o workflow.dag --infer-resources
# The system analyzes commands like:
# - "bwa mem" → High memory, moderate CPU
# - "samtools sort" → High memory, moderate CPU
# - "python script.py" → Low memory, low CPU
# - "Rscript analysis.R" → Moderate memory, low CPU
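A simple way to picture command-based inference is a lookup table of command fragments. The Python sketch below mirrors only the four examples listed above; the table, the function name, and the first-match rule are assumptions for illustration, not wf2wf's actual rule set.

```python
# Illustrative only: a keyword table mirroring the examples above.
# More specific fragments come first because the first match wins.
COMMAND_HINTS = [
    ("bwa mem", ("high", "moderate")),
    ("samtools sort", ("high", "moderate")),
    ("Rscript", ("moderate", "low")),
    ("python", ("low", "low")),
]


def infer_tiers(command, default=("low", "low")):
    """Return (memory_tier, cpu_tier) for the first hint found in command."""
    for fragment, tiers in COMMAND_HINTS:
        if fragment in command:
            return tiers
    return default
```

The sketch returns coarse tiers rather than concrete megabyte or core counts, since mapping tiers to numbers depends on the target environment.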
Loss Integration¶
Loss Detection¶
The loss integration system automatically detects information that may be lost during conversion:
# Convert with loss detection
wf2wf convert -i workflow.smk -o workflow.dag --fail-on-loss
# Generate detailed loss report
wf2wf convert -i workflow.smk -o workflow.dag --report-md
Loss Report Example¶
# Conversion Report
## Information Loss Summary
### Preserved Information
- ✅ Task definitions and dependencies
- ✅ Resource specifications
- ✅ Container/environment specifications
- ✅ Input/output file specifications
### Potential Loss
- ⚠️ Snakemake wildcards → DAGMan parameter substitution
- ⚠️ Snakemake conda environments → DAGMan container specifications
- ⚠️ Snakemake threads specification → DAGMan CPU requirements
### Recommendations
- Review wildcard substitutions for correctness
- Verify container specifications match conda environments
- Confirm CPU requirements match thread specifications
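A report section like "Potential Loss" can be generated from a plain list of recorded losses. The following Python sketch shows one way to do that, along with an exit status that treats any recorded loss as a failure when requested. All names here are hypothetical, and the exit-status rule is an assumption about what a fail-on-loss option would mean, not a description of wf2wf's behavior.

```python
def render_loss_section(losses):
    """Render (source_feature, target_feature) pairs as a Markdown section."""
    lines = ["### Potential Loss"]
    lines += [f"- ⚠️ {src} → {dst}" for src, dst in losses]
    return "\n".join(lines)


def exit_status(losses, fail_on_loss=False):
    """Return 1 if losses were recorded and failing on loss is requested."""
    return 1 if (losses and fail_on_loss) else 0


losses = [
    ("Snakemake wildcards", "DAGMan parameter substitution"),
    ("Snakemake threads specification", "DAGMan CPU requirements"),
]
print(render_loss_section(losses))
```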
Environment Management¶
Execution Environment Adaptation¶
The system automatically adapts workflows for different execution environments:
# Convert for shared filesystem
wf2wf convert -i workflow.smk -o workflow.dag --target-env shared
# Convert for distributed computing
wf2wf convert -i workflow.smk -o workflow.dag --target-env distributed
# Convert for cloud computing
wf2wf convert -i workflow.smk -o workflow.dag --target-env cloud
Environment-Specific Optimizations¶
Shared Filesystem:
Minimal resource specifications
System-wide software dependencies
Basic error handling
Distributed Computing:
Explicit resource requirements
Container specifications
Sophisticated retry policies
File transfer mode optimization
Cloud Computing:
Cloud-optimized resource profiles
Container-based execution
Cost-optimized configurations
Best Practices¶
Using Shared Infrastructure¶
Always use interactive mode for complex conversions
Enable resource inference for workflows without explicit specifications
Validate resources against your target environment
Review loss reports to understand conversion implications
Use appropriate resource profiles for your target environment
Configuration Examples¶
# Comprehensive conversion with all features
wf2wf convert -i workflow.smk -o workflow.dag \
    --interactive \
    --infer-resources \
    --validate-resources \
    --resource-profile cluster \
    --target-env distributed \
    --report-md \
    --verbose
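When conversions are scripted, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience wrapper, not part of wf2wf; it only combines the command-line flags shown in this document.

```python
import shlex


def build_convert_command(src, dst, *, interactive=False, infer_resources=False,
                          validate_resources=False, resource_profile=None,
                          target_env=None, report_md=False, verbose=False):
    """Build the argv list for a `wf2wf convert` call (hypothetical helper)."""
    cmd = ["wf2wf", "convert", "-i", src, "-o", dst]
    switches = [
        ("--interactive", interactive),
        ("--infer-resources", infer_resources),
        ("--validate-resources", validate_resources),
        ("--report-md", report_md),
        ("--verbose", verbose),
    ]
    cmd += [flag for flag, enabled in switches if enabled]
    if resource_profile:
        cmd += ["--resource-profile", resource_profile]
    if target_env:
        cmd += ["--target-env", target_env]
    return cmd


cmd = build_convert_command(
    "workflow.smk", "workflow.dag",
    infer_resources=True, resource_profile="cluster", target_env="distributed",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with file names that contain spaces.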
Troubleshooting¶
Common Issues:
Resource validation failures: Adjust the specifications or use a different profile
Interactive prompts not appearing: Ensure the --interactive flag is used
Loss detection warnings: Review and address potential information loss
Inference not working: Check that the task commands contain content the inference system can analyze
Getting Help:
# Get detailed information about your workflow
wf2wf info workflow.smk
# Validate workflow before conversion
wf2wf validate workflow.smk
# Check for potential issues
wf2wf convert -i workflow.smk -o workflow.dag --dry-run
Compliance and Quality¶
All importers now score between 85 and 95 out of 100 for compliance with the shared infrastructure specification:
DAGMan: 95/100 (Reference implementation)
CWL: 95/100 (Enhanced with resource processing)
Snakemake: 90/100 (Complex format, excellent compliance)
Nextflow: 90/100 (Fully compliant)
WDL: 85/100 (Good compliance)
Galaxy: 85/100 (Good compliance)
This ensures consistent behavior and enhanced functionality across all supported workflow formats.