Loss System API Reference

The wf2wf.loss package provides a comprehensive system for tracking, recording, validating, and reinjecting information loss during workflow format conversions.

Core Loss Functions

Basic Loss Recording

.. py:module:: wf2wf.loss.core

Core loss tracking and management functionality.

.. py:function:: as_list() -> ~typing.List[~wf2wf.loss.core.LossEntry] :module: wf2wf.loss.core

Return the current loss entries as a list.

.. py:function:: generate_summary() -> ~typing.Dict[str, ~typing.Any] :module: wf2wf.loss.core

Generate summary statistics for the current loss entries.
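The kind of statistics such a summary might contain can be sketched as follows. This is a minimal illustration, not the wf2wf implementation: the function name ``summarize``, the output keys, and the ``resources`` category are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count entries overall and per severity, category, and status.

    Hypothetical helper; the real generate_summary() may report other keys.
    """
    return {
        "total_entries": len(entries),
        "by_severity": dict(Counter(e.get("severity", "warn") for e in entries)),
        "by_category": dict(Counter(e.get("category", "unknown") for e in entries)),
        "by_status": dict(Counter(e.get("status", "lost") for e in entries)),
    }


# Three illustrative entries; only the keys used by summarize() are filled in.
sample = [
    {"severity": "warn", "category": "advanced_features", "status": "lost"},
    {"severity": "error", "category": "advanced_features", "status": "lost"},
    {"severity": "warn", "category": "resources", "status": "reapplied"},
]
summary = summarize(sample)
```

Grouping by severity first makes it easy to decide whether a conversion should be treated as failed (any ``error`` entries) or merely noisy.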

.. py:function:: record(json_pointer: str, field: str, lost_value: ~typing.Any, reason: str, origin: str = 'user', *, severity: str = 'warn', category: str = 'advanced_features', environment_context: ~typing.Dict[str, ~typing.Any] | None = None, adaptation_details: ~typing.Dict[str, ~typing.Any] | None = None, recovery_suggestions: ~typing.List[str] | None = None) -> None :module: wf2wf.loss.core

Append a loss entry recording that the field named ``field``, located at ``json_pointer``, could not be carried over to the target format.

:param json_pointer: JSON pointer to the field in the IR
:type json_pointer: str
:param field: Name of the field that was lost
:type field: str
:param lost_value: The value that could not be represented in the target format
:type lost_value: Any
:param reason: Human-readable reason for the loss
:type reason: str
:param origin: Whether the loss originated from user data or wf2wf processing
:type origin: str
:param severity: Severity level: info, warn, error
:type severity: str
:param category: Category of the lost information
:type category: str
:param environment_context: Environment-specific context for the loss
:type environment_context: Optional[Dict[str, Any]]
:param adaptation_details: Details about how the value was adapted
:type adaptation_details: Optional[Dict[str, Any]]
:param recovery_suggestions: Suggestions for recovering or working around the loss
:type recovery_suggestions: Optional[List[str]]

.. py:function:: reset() -> None :module: wf2wf.loss.core

Clear the in-memory loss buffer.
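How ``record``, ``as_list`` and ``reset`` interact can be pictured with a small stand-in for the in-memory buffer. The sketch below is not the wf2wf source: the ``_ENTRIES`` list and the initial ``"lost"`` status are assumptions, and only a subset of the documented parameters is modelled.

```python
from typing import Any, Dict, List

# Hypothetical module-level buffer standing in for wf2wf's internal state.
_ENTRIES: List[Dict[str, Any]] = []


def record(json_pointer: str, field: str, lost_value: Any, reason: str,
           origin: str = "user", *, severity: str = "warn",
           category: str = "advanced_features") -> None:
    """Append one loss entry to the in-memory buffer."""
    _ENTRIES.append({
        "json_pointer": json_pointer,
        "field": field,
        "lost_value": lost_value,
        "reason": reason,
        "origin": origin,
        "status": "lost",  # assumed initial status for a fresh entry
        "severity": severity,
        "category": category,
    })


def as_list() -> List[Dict[str, Any]]:
    """Return a shallow copy so callers cannot resize the buffer by accident."""
    return list(_ENTRIES)


def reset() -> None:
    """Clear the buffer, for example before starting a new conversion."""
    _ENTRIES.clear()


reset()
record("/tasks/align/priority", "priority", 10, "CWL lacks job priority field")
entries = as_list()
reset()
```

Because the buffer is process-wide state, calling ``reset()`` between conversions keeps entries from one workflow out of the next one's loss document.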

Loss Document Management

.. py:function:: create_loss_document(target_engine: str, source_checksum: str, environment_adaptation: ~typing.Dict[str, ~typing.Any] | None = None, **kwargs) -> ~typing.Dict[str, ~typing.Any] :module: wf2wf.loss.core

Create a comprehensive loss document with summary statistics.

.. py:function:: write(doc: ~typing.Dict[str, ~typing.Any], path: str | ~pathlib.Path, **kwargs) -> None :module: wf2wf.loss.core

Write an already-built loss document ``doc`` to the file at ``path``.

.. py:function:: write_loss_document(path: str | ~pathlib.Path, target_engine: str, source_checksum: str, environment_adaptation: ~typing.Dict[str, ~typing.Any] | None = None, **kwargs) -> None :module: wf2wf.loss.core

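Write a loss document to the file at ``path``. The function takes the same arguments as ``create_loss_document`` plus the output path.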
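The create-then-write flow can be sketched with plain JSON. The document layout below is an assumption: the ``entries`` key matches the usage examples later in this reference, while ``target_engine`` and ``source_checksum`` simply mirror the parameter names. The helper names ``build_loss_document`` and ``write_document`` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


def build_loss_document(entries: List[Dict[str, Any]], target_engine: str,
                        source_checksum: str,
                        environment_adaptation: Optional[Dict[str, Any]] = None
                        ) -> Dict[str, Any]:
    """Assemble a loss document; the key layout is assumed for illustration."""
    doc: Dict[str, Any] = {
        "target_engine": target_engine,
        "source_checksum": source_checksum,
        "entries": entries,
    }
    if environment_adaptation is not None:
        doc["environment_adaptation"] = environment_adaptation
    return doc


def write_document(doc: Dict[str, Any], path: Union[str, Path]) -> None:
    """Serialise the document as indented JSON with stable key order."""
    Path(path).write_text(json.dumps(doc, indent=2, sort_keys=True), encoding="utf-8")


entries = [{"json_pointer": "/tasks/align/priority", "field": "priority",
            "lost_value": 10, "reason": "CWL lacks job priority field"}]
doc = build_loss_document(entries, "cwl", "0" * 64)  # placeholder checksum

# Round-trip through a temporary file to show the document survives intact.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "workflow.loss.json"
    write_document(doc, out)
    reloaded = json.loads(out.read_text(encoding="utf-8"))
```

Sorting keys on output keeps the side-car diff-friendly when it is checked into version control next to the converted workflow.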

Loss Reinjection

.. py:function:: apply(workflow: Workflow, entries: List[LossEntry]) -> int :module: wf2wf.loss.core

Apply loss entries back to a workflow (reinjection).

This function attempts to reinject lost information back into the workflow IR, marking entries as ‘reapplied’ if successful.

:returns: Number of successfully applied entries
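Reinjection by JSON pointer can be illustrated on a plain nested dictionary. This is a simplified sketch under stated assumptions: the real function operates on a ``Workflow`` object rather than a dict, and the helper ``_tokens`` plus the rule that entries with a missing parent are skipped are choices made for the example.

```python
from typing import Any, Dict, List


def _tokens(pointer: str) -> List[str]:
    """Split an RFC 6901 JSON pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON pointer: {pointer!r}")
    # Per RFC 6901, '~1' must be decoded before '~0'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply(workflow: Dict[str, Any], entries: List[Dict[str, Any]]) -> int:
    """Write each lost_value back at its pointer; return how many succeeded."""
    applied = 0
    for entry in entries:
        tokens = _tokens(entry["json_pointer"])
        if not tokens:
            continue  # refuse to overwrite the whole document
        node: Any = workflow
        try:
            for token in tokens[:-1]:
                node = node[token]
        except (KeyError, TypeError):
            continue  # parent object is missing: leave the entry untouched
        if not isinstance(node, dict):
            continue  # this sketch only handles dict containers
        node[tokens[-1]] = entry["lost_value"]
        entry["status"] = "reapplied"
        applied += 1
    return applied


workflow = {"tasks": {"align": {"command": "bwa mem"}}}
entries = [
    {"json_pointer": "/tasks/align/priority", "lost_value": 10, "status": "lost"},
    {"json_pointer": "/tasks/missing/priority", "lost_value": 5, "status": "lost"},
]
count = apply(workflow, entries)
```

Only the first entry is reapplied here; the second keeps its ``lost`` status because the task it points at does not exist in the workflow.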

.. py:function:: detect_and_apply_loss_sidecar(workflow: Workflow, source_path: Path, verbose: bool = False) -> bool :module: wf2wf.loss.core

Detect and apply loss side-car during import.

This function looks for a loss side-car file next to the source file and applies any loss information to the workflow.

:param workflow: Workflow object to apply loss information to
:param source_path: Path to the source workflow file
:param verbose: Enable verbose logging

:returns: True if a loss side-car was found and applied, False otherwise
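Looking for a side-car next to the source file might work roughly as follows. The naming rule (replace the source suffix with ``.loss.json``) is an assumption inferred from the ``workflow.cwl`` / ``workflow.loss.json`` pair used in the examples; the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def find_sidecar(source_path: Path) -> Optional[Path]:
    """Return the side-car path next to source_path, or None if absent."""
    candidate = source_path.with_suffix(".loss.json")  # assumed naming rule
    return candidate if candidate.is_file() else None


def load_sidecar_entries(source_path: Path) -> List[Dict[str, Any]]:
    """Load entries from the side-car; unreadable files yield an empty list."""
    sidecar = find_sidecar(source_path)
    if sidecar is None:
        return []
    try:
        data = json.loads(sidecar.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return []  # a broken side-car should not abort the import
    entries = data.get("entries") if isinstance(data, dict) else None
    return entries if isinstance(entries, list) else []


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "workflow.cwl"
    source.write_text("cwlVersion: v1.2\n", encoding="utf-8")
    missing = load_sidecar_entries(source)  # no side-car written yet

    sidecar_path = source.with_suffix(".loss.json")
    sidecar_path.write_text(json.dumps({"entries": [{"field": "priority"}]}),
                            encoding="utf-8")
    found = load_sidecar_entries(source)

    sidecar_path.write_text("{not json", encoding="utf-8")
    broken = load_sidecar_entries(source)  # malformed JSON is tolerated
```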

.. py:function:: prepare(prev_entries: ~typing.List[~wf2wf.loss.core.LossEntry]) -> None :module: wf2wf.loss.core

Prepare for a new export cycle by remembering previously reapplied entries.
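One plausible reading of this step, given the ``lost_again`` status in the entry schema, is that an entry which was reapplied on import and is lost once more on the next export gets flagged differently from a first-time loss. The sketch below illustrates that interpretation only; the ``_PREV_REAPPLIED`` set and the ``status_for`` helper are hypothetical and not taken from wf2wf.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical memory of (json_pointer, field) pairs reapplied in the last import.
_PREV_REAPPLIED: Set[Tuple[str, str]] = set()


def prepare(prev_entries: List[Dict[str, Any]]) -> None:
    """Remember which entries were reapplied before the next export starts."""
    _PREV_REAPPLIED.clear()
    _PREV_REAPPLIED.update(
        (e["json_pointer"], e["field"])
        for e in prev_entries
        if e.get("status") == "reapplied"
    )


def status_for(json_pointer: str, field: str) -> str:
    """Status a newly recorded loss would receive under this interpretation."""
    return "lost_again" if (json_pointer, field) in _PREV_REAPPLIED else "lost"


prepare([
    {"json_pointer": "/tasks/align/priority", "field": "priority", "status": "reapplied"},
    {"json_pointer": "/tasks/align/gpu", "field": "gpu", "status": "lost"},
])
repeat = status_for("/tasks/align/priority", "priority")
first_time = status_for("/tasks/align/gpu", "gpu")
```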

Checksum and Validation

.. py:function:: compute_checksum(workflow: Workflow) -> str :module: wf2wf.loss.core

Compute SHA-256 checksum of workflow IR for loss tracking.
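A checksum of this kind is only useful if it is stable across runs, which usually means hashing a canonical serialisation. The sketch below assumes the IR can be represented as a JSON-compatible dictionary and uses sorted keys for canonical form; how wf2wf actually serialises the ``Workflow`` object is not specified here.

```python
import hashlib
import json
from typing import Any, Dict


def checksum_of(ir: Dict[str, Any]) -> str:
    """SHA-256 hex digest of a canonical JSON rendering of the IR (assumed form)."""
    canonical = json.dumps(ir, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content in a different key order, and one genuinely different workflow.
a = {"tasks": {"align": {"cpu": 4}}, "name": "demo"}
b = {"name": "demo", "tasks": {"align": {"cpu": 4}}}
c = {"name": "demo", "tasks": {"align": {"cpu": 8}}}

digest_a = checksum_of(a)
digest_b = checksum_of(b)
digest_c = checksum_of(c)
```

Key order does not affect the digest, while any change to a value does, so the checksum can tell whether a side-car still belongs to the workflow it was written for.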

.. py:function:: create_loss_sidecar_summary(workflow: Workflow, source_path: Path) -> Dict[str, Any] :module: wf2wf.loss.core

Create a summary of loss side-car information for a workflow.

:param workflow: Workflow object
:param source_path: Path to the source workflow file

:returns: Dictionary containing loss side-car summary information

Context Detection

Format-Specific Loss Detection

.. py:module:: wf2wf.loss.context_detection

Format-specific loss detection and environment-specific value handling.

.. py:class:: FormatLossDetector(source_format: str, target_format: str) :module: wf2wf.loss.context_detection

Detects format-specific losses during import/export cycles.

.. py:method:: FormatLossDetector.detect_environment_specific_losses(workflow: Workflow) -> List[Dict[str, Any]] :module: wf2wf.loss.context_detection

  Detect environment-specific value losses between formats.

.. py:attribute:: FormatLossDetector.source_format :module: wf2wf.loss.context_detection :type: str

.. py:attribute:: FormatLossDetector.target_format :module: wf2wf.loss.context_detection :type: str

.. py:function:: detect_format_specific_losses(workflow: Workflow, source_format: str, target_format: str) -> List[Dict[str, Any]] :module: wf2wf.loss.context_detection

Detect format-specific losses in a workflow.

Environment-Specific Value Handling

.. py:class:: EnvironmentLossRecorder(source_format: str, target_format: str, target_environment: str) :module: wf2wf.loss.context_detection

Records environment-specific losses with detailed context.

.. py:method:: EnvironmentLossRecorder.record_environment_specific_value_loss(json_pointer: str, field: str, env_value: EnvironmentSpecificValue, reason: str, *, severity: str = 'warn') -> None :module: wf2wf.loss.context_detection

  Record loss of environment-specific value with detailed context.

.. py:attribute:: EnvironmentLossRecorder.source_format :module: wf2wf.loss.context_detection :type: str

.. py:attribute:: EnvironmentLossRecorder.target_environment :module: wf2wf.loss.context_detection :type: str

.. py:attribute:: EnvironmentLossRecorder.target_format :module: wf2wf.loss.context_detection :type: str

.. py:function:: record_environment_specific_value_loss(json_pointer: str, field: str, env_value: EnvironmentSpecificValue, source_format: str, target_format: str, target_environment: str, reason: str, *, severity: str = 'warn') -> None :module: wf2wf.loss.context_detection

Record loss of environment-specific value with format context.

.. py:function:: restore_environment_specific_value(lost_value: Any, field_name: str, expected_type: type, target_environment: str) -> EnvironmentSpecificValue :module: wf2wf.loss.context_detection

Restore an EnvironmentSpecificValue from lost data.

.. py:function:: validate_environment_specific_value(value: ~typing.Any, field_name: str, expected_type: type) -> bool :module: wf2wf.loss.context_detection

Validate that a value is a proper EnvironmentSpecificValue.

Export Loss Detection

Format-Specific Export Losses

.. py:module:: wf2wf.loss.export

Export loss detection and recording for different workflow formats.

.. py:function:: detect_and_record_export_losses(workflow: ~wf2wf.core.Workflow, target_format: str, target_environment: str = 'shared_filesystem', verbose: bool = False) -> None :module: wf2wf.loss.export

Detect and record losses when converting to target format for specific environment.

Individual Format Loss Functions

.. py:function:: record_cwl_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None :module: wf2wf.loss.export

Record losses when converting to CWL format.

.. py:function:: record_dagman_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None :module: wf2wf.loss.export

Record losses when converting to DAGMan format.

.. py:function:: record_galaxy_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None :module: wf2wf.loss.export

Record losses when converting to Galaxy format.

.. py:function:: record_nextflow_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None :module: wf2wf.loss.export

Record losses when converting to Nextflow format.

.. py:function:: record_snakemake_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None :module: wf2wf.loss.export

Record losses when converting to Snakemake format.

.. py:function:: record_wdl_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None :module: wf2wf.loss.export

Record losses when converting to WDL format.
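The relationship between a single entry point and the per-format functions can be sketched as a dispatch table keyed by format name. Everything below is illustrative: the workflow is a plain dict, the recorders return pointers instead of calling ``record``, and raising ``ValueError`` for an unknown format is an assumed policy, not documented wf2wf behaviour.

```python
from typing import Any, Callable, Dict, List

Recorder = Callable[[Dict[str, Any], str], List[str]]


def _cwl_losses(workflow: Dict[str, Any], target_environment: str) -> List[str]:
    """Hypothetical rule: report every task that carries a priority."""
    return [f"/tasks/{name}/priority"
            for name, task in workflow["tasks"].items() if "priority" in task]


def _dagman_losses(workflow: Dict[str, Any], target_environment: str) -> List[str]:
    """Hypothetical rule: this sketch assumes nothing is lost for DAGMan."""
    return []


# One recorder per target format; new formats are added by extending the table.
_RECORDERS: Dict[str, Recorder] = {"cwl": _cwl_losses, "dagman": _dagman_losses}


def detect_export_losses(workflow: Dict[str, Any], target_format: str,
                         target_environment: str = "shared_filesystem") -> List[str]:
    """Look up the recorder for target_format and run it."""
    try:
        recorder = _RECORDERS[target_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported target format: {target_format!r}") from None
    return recorder(workflow, target_environment)


workflow = {"tasks": {"align": {"priority": 10}, "sort": {}}}
cwl_pointers = detect_export_losses(workflow, "CWL")
dagman_pointers = detect_export_losses(workflow, "dagman")
```

A table like this keeps the entry point unchanged when support for another format is added.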

Import Loss Detection

Loss Side-car Validation

.. py:module:: wf2wf.loss.import_

Import loss detection and validation for loss sidecars.

.. py:function:: validate_loss_entry(entry: ~typing.Dict[str, ~typing.Any], verbose: bool = False) -> bool :module: wf2wf.loss.import_

Validate a single loss entry.

:param entry: Dictionary containing loss entry data
:param verbose: Enable verbose logging

:returns: True if the loss entry is valid, False otherwise

.. py:function:: validate_loss_sidecar(loss_data: ~typing.Dict[str, ~typing.Any], workflow, verbose: bool = False) -> bool :module: wf2wf.loss.import_

Validate loss side-car data.

:param loss_data: Dictionary containing loss data
:param workflow: Workflow IR object (for checksum validation)
:param verbose: Enable verbose logging

:returns: True if the loss side-car is valid, False otherwise
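Since the workflow is passed in for checksum validation, a side-car check presumably compares a stored checksum with one computed from the current IR. The sketch below shows that idea under several assumptions: the IR is a JSON-compatible dict, the stored key is called ``source_checksum`` (mirroring the ``create_loss_document`` parameter), and the checksum is SHA-256 over sorted-key JSON.

```python
import hashlib
import json
from typing import Any, Dict


def checksum_of(ir: Dict[str, Any]) -> str:
    """SHA-256 over sorted-key JSON; an assumed canonical form."""
    canonical = json.dumps(ir, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_sidecar(loss_data: Any, workflow_ir: Dict[str, Any]) -> bool:
    """Accept the side-car only if it is well-formed and matches the workflow."""
    if not isinstance(loss_data, dict):
        return False
    if not isinstance(loss_data.get("entries"), list):
        return False
    # A mismatch means the side-car was written for a different workflow.
    return loss_data.get("source_checksum") == checksum_of(workflow_ir)


ir = {"name": "demo", "tasks": {"align": {"cpu": 4}}}
good = {"source_checksum": checksum_of(ir), "entries": []}
stale = {"source_checksum": "0" * 64, "entries": []}

good_ok = validate_sidecar(good, ir)
stale_ok = validate_sidecar(stale, ir)
malformed_ok = validate_sidecar({"entries": "oops"}, ir)
```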

Import Loss Detection Functions

.. py:function:: detect_and_record_import_losses(workflow: Workflow, source_format: str, target_environment: str = 'shared_filesystem', verbose: bool = False) -> None :module: wf2wf.loss.import_

Detect and record losses when importing from source format.

Data Structures

LossEntry

A typed dictionary wrapper for loss entries:

from typing import Any, Dict

class LossEntry(Dict[str, Any]):
    """Typed dict wrapper for a loss mapping entry with comprehensive IR support."""

Loss Entry Schema

Each loss entry contains:

  • json_pointer: JSON pointer to the field in the IR

  • field: Name of the field that was lost

  • lost_value: The value that could not be represented

  • reason: Human-readable reason for the loss

  • origin: Whether the loss came from user data or wf2wf processing

  • status: Current status (lost, lost_again, reapplied, adapted)

  • severity: Severity level (info, warn, error)

  • category: Category of lost information

  • environment_context: Environment-specific context (optional)

  • recovery_suggestions: Suggestions for recovery (optional)
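The schema above translates directly into a structural check. The sketch below is a hypothetical checker, not ``validate_loss_entry`` itself: treating every non-optional field as required and rejecting pointers that do not start with ``/`` are assumptions made for the example, while the allowed status and severity values come from the list above.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("json_pointer", "field", "lost_value", "reason",
                 "origin", "status", "severity", "category")
VALID_STATUS = ("lost", "lost_again", "reapplied", "adapted")
VALID_SEVERITY = ("info", "warn", "error")


def entry_problems(entry: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the entry looks valid."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]

    status = entry.get("status")
    if status is not None and status not in VALID_STATUS:
        problems.append(f"unknown status: {status!r}")

    severity = entry.get("severity")
    if severity is not None and severity not in VALID_SEVERITY:
        problems.append(f"unknown severity: {severity!r}")

    pointer = entry.get("json_pointer")
    if isinstance(pointer, str) and pointer and not pointer.startswith("/"):
        problems.append("json_pointer must start with '/'")
    return problems


good_entry = {
    "json_pointer": "/tasks/align/priority", "field": "priority",
    "lost_value": 10, "reason": "CWL lacks job priority field",
    "origin": "user", "status": "lost", "severity": "warn",
    "category": "advanced_features",
}
bad_entry = {"json_pointer": "tasks/align", "field": "priority", "severity": "fatal"}

good_problems = entry_problems(good_entry)
bad_problems = entry_problems(bad_entry)
```

Returning a list of problems rather than a bare boolean makes it easier to produce the detailed validation messages mentioned under Error Handling.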

Usage Examples

Basic Loss Recording Example

from wf2wf.loss import record, reset, as_list

# Start from an empty buffer so entries from earlier conversions are not mixed in
reset()

# Record a simple loss
record(
    json_pointer="/tasks/align/priority",
    field="priority",
    lost_value=10,
    reason="CWL lacks job priority field",
    origin="user",
    severity="warn",
)

# Get all recorded losses
losses = as_list()

Environment-Specific Loss Recording

from wf2wf.loss import record_environment_specific_value_loss

# Record loss of environment-specific value.
# `task` is assumed to be a task object taken from an already loaded workflow.
record_environment_specific_value_loss(
    json_pointer="/tasks/align/cpu",
    field="cpu",
    env_value=task.cpu,  # EnvironmentSpecificValue object
    source_format="snakemake",
    target_format="cwl",
    target_environment="shared_filesystem",
    reason="CWL has limited resource specification"
)

Export Loss Detection Example

from wf2wf.loss import detect_and_record_export_losses

# Detect and record all losses for a format
detect_and_record_export_losses(
    workflow=workflow,
    target_format="cwl",
    target_environment="shared_filesystem",
    verbose=True
)

Loss Reinjection Example

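from pathlib import Path

from wf2wf.loss import detect_and_apply_loss_sidecar

# Apply loss side-car during import.
# parse_workflow is a placeholder for whichever importer produced the workflow.
path = Path("workflow.cwl")
workflow = parse_workflow(path)
applied = detect_and_apply_loss_sidecar(workflow, path, verbose=True)  # True if a side-car was found and applied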

Custom Loss Detection

from wf2wf.loss import FormatLossDetector, EnvironmentLossRecorder

# Create custom detector
detector = FormatLossDetector("snakemake", "cwl")
losses = detector.detect_environment_specific_losses(workflow)

# Create custom recorder
recorder = EnvironmentLossRecorder("snakemake", "cwl", "shared_filesystem")
recorder.record_environment_specific_value_loss(
    json_pointer="/tasks/align/gpu",
    field="gpu",
    env_value=task.gpu,
    reason="CWL lacks GPU support"
)

Loss Document Creation

from wf2wf.loss import create_loss_document, write_loss_document, compute_checksum

# Create loss document
doc = create_loss_document(
    target_engine="cwl",
    source_checksum=compute_checksum(workflow),
    environment_adaptation={
        "source_environment": "shared_filesystem",
        "target_environment": "distributed_computing"
    }
)

# Write a loss document to file (takes the same arguments plus the output path)
write_loss_document(
    path="workflow.loss.json",
    target_engine="cwl",
    source_checksum=compute_checksum(workflow)
)

Side-car Validation Example

import json

from wf2wf.loss import validate_loss_sidecar, validate_loss_entry

# Validate entire side-car against the workflow IR (used for checksum validation)
with open("workflow.loss.json") as f:
    loss_data = json.load(f)
is_valid = validate_loss_sidecar(loss_data, workflow)

# Validate individual entry
entry_is_valid = validate_loss_entry(loss_data["entries"][0])

Error Handling

The loss system provides comprehensive error handling:

  • Invalid side-cars: Logged but don’t crash imports

  • Missing fields: Handled with sensible defaults

  • Type mismatches: Detected and reported

  • Validation failures: Detailed error messages

Performance Considerations

  • Loss detection is performed once per export

  • Side-car validation is lightweight

  • Reinjection is optimized for common cases

  • Memory usage scales with number of loss entries

Extension Points

The loss system is designed to be extensible:

  1. New formats: Add format-specific loss detection functions

  2. Custom categories: Define new loss categories

  3. Enhanced validation: Extend validation logic

  4. Custom restoration: Implement format-specific restoration logic