# Shared Infrastructure Features

wf2wf provides a comprehensive shared infrastructure that enhances all workflow importers with intelligent inference, interactive prompting, and resource management capabilities.

## Overview

The shared infrastructure consists of several key components that work together to provide a consistent and enhanced user experience across all supported workflow formats:

- **Intelligent Inference**: Automatically fills in missing information
- **Interactive Prompting**: Guides users through configuration decisions
- **Resource Processing**: Validates and optimizes resource specifications
- **Loss Integration**: Detects and reports information loss during conversion
- **Environment Management**: Adapts workflows for different execution environments

## Intelligent Inference

### What It Does

The intelligent inference system analyzes your workflow and automatically fills in missing information based on:

- **Command Analysis**: Infers resource requirements from command content
- **Format Patterns**: Applies format-specific best practices
- **Execution Environment**: Adapts to target environment requirements
- **Content Analysis**: Detects execution models and patterns

### How It Works

```bash
# Automatic inference is enabled by default
wf2wf convert -i workflow.smk -o workflow.dag

# The system will automatically:
# - Infer missing resource requirements
# - Detect execution models
# - Apply environment-specific optimizations
# - Suggest improvements
```

### Inference Examples

**Resource Inference:**

```python
# Before inference
rule process:
    input: "data.txt"
    output: "result.txt"
    shell: "python heavy_analysis.py {input} > {output}"

# After inference (automatic)
rule process:
    input: "data.txt"
    output: "result.txt"
    resources:
        mem_mb=8192,   # Inferred from "heavy_analysis"
        cpu=4,         # Inferred from analysis type
        disk_mb=4096   # Inferred from file operations
    shell: "python heavy_analysis.py {input} > {output}"
```

**Execution Model Detection:**

```bash
# Automatic detection of execution model
wf2wf info workflow.smk

# Output:
# Execution Model: distributed_computing
# Detection Method: content_analysis
# Confidence: 0.85
# Indicators:
#   - Multiple resource specifications
#   - Container requirements
#   - File transfer modes
```

## Interactive Prompting

### When It's Useful

Interactive mode is particularly helpful when:

- Converting between different execution environments
- Workflows have missing resource specifications
- Container/environment specifications are incomplete
- Error handling needs to be configured
- File transfer modes need optimization

### Enabling Interactive Mode

```bash
# Enable interactive mode
wf2wf convert -i workflow.smk -o workflow.dag --interactive

# Interactive mode with verbose output
wf2wf convert -i workflow.smk -o workflow.dag --interactive --verbose
```

### Interactive Session Examples

**Resource Specification:**

```
Found 3 tasks without explicit resource requirements.
Distributed systems require explicit resource allocation.
Add default resource specifications? [Y/n]: Y
Applied default resources: CPU=1, Memory=2048MB, Disk=4096MB
```

**Container Specification:**

```
Found 2 tasks without container or conda specifications.
Distributed systems typically require explicit environment isolation.
Add container specifications or conda environments? [Y/n]: Y
Enable --auto-env to automatically build containers for these tasks.
```

**Error Handling:**

```
Found 4 tasks without retry specifications.
Distributed systems benefit from explicit error handling.
Add retry specifications for failed tasks? [Y/n]: Y
Applied default retry settings (2 retries)
```

## Resource Processing

### Resource Validation

The resource processor validates specifications against target environments:

```bash
# Validate resources for cluster environment
wf2wf convert -i workflow.smk -o workflow.dag --validate-resources

# Output:
# ⚠ Resource validation found 2 issues:
#   • task_1: Memory specification (16384MB) exceeds cluster limit (8192MB)
#   • task_2: CPU specification (16) exceeds cluster limit (8)
```

### Resource Profiles

Apply predefined resource profiles for different environments:

```bash
# Apply cluster profile
wf2wf convert -i workflow.smk -o workflow.dag --resource-profile cluster

# Available profiles:
# - shared: Light resources for shared filesystem
# - cluster: Standard cluster resources
# - cloud: Cloud-optimized resources
# - hpc: High-performance computing resources
# - gpu: GPU-enabled resources
```

### Resource Inference

Automatically infer resource requirements from command analysis:

```bash
# Enable resource inference
wf2wf convert -i workflow.smk -o workflow.dag --infer-resources

# The system analyzes commands like:
# - "bwa mem" → High memory, moderate CPU
# - "samtools sort" → High memory, moderate CPU
# - "python script.py" → Low memory, low CPU
# - "Rscript analysis.R" → Moderate memory, low CPU
```

## Loss Integration

### Loss Detection

The loss integration system automatically detects information that may be lost during conversion:

```bash
# Convert and fail if information loss is detected
wf2wf convert -i workflow.smk -o workflow.dag --fail-on-loss

# Generate detailed loss report
wf2wf convert -i workflow.smk -o workflow.dag --report-md
```

### Loss Report Example

```markdown
# Conversion Report

## Information Loss Summary

### Preserved Information
- ✅ Task definitions and dependencies
- ✅ Resource specifications
- ✅ Container/environment specifications
- ✅ Input/output file specifications

### Potential Loss
- ⚠️ Snakemake wildcards → DAGMan parameter substitution
- ⚠️ Snakemake conda environments → DAGMan container specifications
- ⚠️ Snakemake threads specification → DAGMan CPU requirements

### Recommendations
- Review wildcard substitutions for correctness
- Verify container specifications match conda environments
- Confirm CPU requirements match thread specifications
```

## Environment Management

### Execution Environment Adaptation

The system automatically adapts workflows for different execution environments:

```bash
# Convert for shared filesystem
wf2wf convert -i workflow.smk -o workflow.dag --target-env shared

# Convert for distributed computing
wf2wf convert -i workflow.smk -o workflow.dag --target-env distributed

# Convert for cloud computing
wf2wf convert -i workflow.smk -o workflow.dag --target-env cloud
```

### Environment-Specific Optimizations

**Shared Filesystem:**

- Minimal resource specifications
- System-wide software dependencies
- Basic error handling

**Distributed Computing:**

- Explicit resource requirements
- Container specifications
- Sophisticated retry policies
- File transfer mode optimization

**Cloud Computing:**

- Cloud-optimized resource profiles
- Container-based execution
- Cost-optimized configurations

## Best Practices

### Using Shared Infrastructure

1. **Always use interactive mode** for complex conversions
2. **Enable resource inference** for workflows without explicit specifications
3. **Validate resources** against your target environment
4. **Review loss reports** to understand conversion implications
5. **Use appropriate resource profiles** for your target environment

### Configuration Examples

```bash
# Comprehensive conversion with all features
wf2wf convert -i workflow.smk -o workflow.dag \
    --interactive \
    --infer-resources \
    --validate-resources \
    --resource-profile cluster \
    --target-env distributed \
    --report-md \
    --verbose
```

### Troubleshooting

**Common Issues:**

1. **Resource validation failures**: Adjust specifications or use a different profile
2. **Interactive prompts not appearing**: Ensure the `--interactive` flag is used
3. **Loss detection warnings**: Review and address potential information loss
4. **Inference not working**: Check that task commands contain content the analysis can recognize

**Getting Help:**

```bash
# Get detailed information about your workflow
wf2wf info workflow.smk

# Validate workflow before conversion
wf2wf validate workflow.smk

# Check for potential issues
wf2wf convert -i workflow.smk -o workflow.dag --dry-run
```

## Compliance and Quality

All importers now achieve **85-95% compliance** with the shared infrastructure specification:

- **DAGMan**: 95/100 (Reference implementation)
- **CWL**: 95/100 (Enhanced with resource processing)
- **Snakemake**: 90/100 (Complex format, excellent compliance)
- **Nextflow**: 90/100 (Fully compliant)
- **WDL**: 85/100 (Good compliance)
- **Galaxy**: 85/100 (Good compliance)

This ensures consistent behavior and enhanced functionality across all supported workflow formats.
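
To make the command-based resource inference and limit validation described under Resource Processing more concrete, here is a minimal Python sketch. It is not wf2wf's implementation. The names `Resources`, `COMMAND_HINTS`, `infer_resources`, and `validate_resources` are hypothetical, and the numeric values in the hint table are illustrative assumptions. Only the default of CPU=1, Memory=2048MB, Disk=4096MB and the wording of the validation messages are taken from the examples in this document.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for a task's resource request."""
    cpu: int
    mem_mb: int
    disk_mb: int


# Defaults mirror the interactive-session example (CPU=1, Memory=2048MB, Disk=4096MB).
DEFAULT_RESOURCES = Resources(cpu=1, mem_mb=2048, disk_mb=4096)

# Assumed keyword table: the numbers are illustrative, not wf2wf's real heuristics.
COMMAND_HINTS: Dict[str, Resources] = {
    "bwa mem": Resources(cpu=4, mem_mb=8192, disk_mb=4096),        # high memory, moderate CPU
    "samtools sort": Resources(cpu=4, mem_mb=8192, disk_mb=4096),  # high memory, moderate CPU
    "rscript": Resources(cpu=1, mem_mb=4096, disk_mb=4096),        # moderate memory, low CPU
}


def infer_resources(command: str) -> Resources:
    """Return the first matching hint for a shell command, else the defaults."""
    lowered = command.lower()
    for keyword, resources in COMMAND_HINTS.items():
        if keyword in lowered:
            return resources
    return DEFAULT_RESOURCES


def validate_resources(task_id: str, requested: Resources, limits: Resources) -> List[str]:
    """List every requested value that exceeds the target environment's limit."""
    issues = []
    if requested.mem_mb > limits.mem_mb:
        issues.append(
            f"{task_id}: Memory specification ({requested.mem_mb}MB) "
            f"exceeds cluster limit ({limits.mem_mb}MB)"
        )
    if requested.cpu > limits.cpu:
        issues.append(
            f"{task_id}: CPU specification ({requested.cpu}) "
            f"exceeds cluster limit ({limits.cpu})"
        )
    return issues


if __name__ == "__main__":
    # Assumed cluster limits, matching the validation example (8192MB, 8 CPUs).
    cluster_limits = Resources(cpu=8, mem_mb=8192, disk_mb=100_000)
    inferred = infer_resources("bwa mem ref.fa reads.fq > aligned.sam")
    print(inferred)
    print(validate_resources("task_1", Resources(16, 16384, 4096), cluster_limits))
```

A simple substring table keeps the sketch easy to follow. A real analyzer would likely weigh several signals, such as input file sizes or thread flags, before settling on a value.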