Loss System API Reference
The wf2wf.loss package tracks, records, validates, and reinjects information that is lost when a workflow is converted between formats.
Core Loss Functions
Basic Loss Recording
.. py:module:: wf2wf.loss.core
Core loss tracking and management functionality.
.. py:function:: as_list() -> ~typing.List[~wf2wf.loss.core.LossEntry]
   :module: wf2wf.loss.core

   Return the current loss entries as a list.
.. py:function:: generate_summary() -> ~typing.Dict[str, ~typing.Any]
   :module: wf2wf.loss.core

   Generate summary statistics for the current loss entries.
.. py:function:: record(json_pointer: str, field: str, lost_value: ~typing.Any, reason: str, origin: str = 'user', *, severity: str = 'warn', category: str = 'advanced_features', environment_context: ~typing.Dict[str, ~typing.Any] | None = None, adaptation_details: ~typing.Dict[str, ~typing.Any] | None = None, recovery_suggestions: ~typing.List[str] | None = None) -> None
   :module: wf2wf.loss.core

   Append a loss entry recording that the field at json_pointer was lost.

   :param json_pointer: JSON pointer to the field in the IR
   :type json_pointer: str
   :param field: Name of the field that was lost
   :type field: str
   :param lost_value: The value that could not be represented in the target format
   :type lost_value: Any
   :param reason: Human-readable reason for the loss
   :type reason: str
   :param origin: Whether the loss originated from user data or wf2wf processing
   :type origin: str
   :param severity: Severity level: info, warn, or error
   :type severity: str
   :param category: Category of the lost information
   :type category: str
   :param environment_context: Environment-specific context for the loss
   :type environment_context: Optional[Dict[str, Any]]
   :param adaptation_details: Details about how the value was adapted
   :type adaptation_details: Optional[Dict[str, Any]]
   :param recovery_suggestions: Suggestions for recovering or working around the loss
   :type recovery_suggestions: Optional[List[str]]
.. py:function:: reset() -> None
   :module: wf2wf.loss.core

   Clear the in-memory loss buffer.
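Together, record, as_list, and reset behave like an append-only buffer that is cleared between conversions. The sketch below models that behaviour with plain dicts under two assumptions: the buffer is a module-level list, and every new entry starts with status 'lost'. It is not wf2wf's implementation, and environment_context and adaptation_details are omitted for brevity.

```python
from typing import Any, Dict, List, Optional

# Hypothetical module-level buffer; wf2wf.loss.core's internals may differ.
_ENTRIES: List[Dict[str, Any]] = []


def record(json_pointer: str, field: str, lost_value: Any, reason: str,
           origin: str = "user", *, severity: str = "warn",
           category: str = "advanced_features",
           recovery_suggestions: Optional[List[str]] = None) -> None:
    """Append one loss entry; 'lost' as the initial status is an assumption."""
    _ENTRIES.append({
        "json_pointer": json_pointer,
        "field": field,
        "lost_value": lost_value,
        "reason": reason,
        "origin": origin,
        "status": "lost",
        "severity": severity,
        "category": category,
        "recovery_suggestions": recovery_suggestions or [],
    })


def as_list() -> List[Dict[str, Any]]:
    """Return a shallow copy so callers cannot mutate the buffer itself."""
    return list(_ENTRIES)


def reset() -> None:
    """Empty the buffer, e.g. before starting the next conversion."""
    _ENTRIES.clear()
```

Returning a copy from as_list keeps the buffer under the control of record and reset, which is why clearing the returned list in the usage below has no effect on later calls.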
Loss Document Management
.. py:function:: create_loss_document(target_engine: str, source_checksum: str, environment_adaptation: ~typing.Dict[str, ~typing.Any] | None = None, **kwargs) -> ~typing.Dict[str, ~typing.Any]
   :module: wf2wf.loss.core

   Create a comprehensive loss document with summary statistics.
.. py:function:: write(doc: ~typing.Dict[str, ~typing.Any], path: str | ~pathlib.Path, **kwargs) -> None
   :module: wf2wf.loss.core

   Write a loss document to a file.
.. py:function:: write_loss_document(path: str | ~pathlib.Path, target_engine: str, source_checksum: str, environment_adaptation: ~typing.Dict[str, ~typing.Any] | None = None, **kwargs) -> None
   :module: wf2wf.loss.core

   Write a loss document to a file. Takes the same arguments as create_loss_document, plus the output path.
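A loss document bundles the recorded entries with summary statistics. As a rough sketch, assuming the summary counts entries per severity, category, status, and origin, and that entries sit under an "entries" key (the key used in the side-car validation example later on this page), the pieces fit together like this. summarize and make_loss_document are hypothetical names; the real document layout may contain more fields.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def summarize(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count entries per severity, category, status, and origin."""
    summary: Dict[str, Any] = {"total_entries": len(entries)}
    for key in ("severity", "category", "status", "origin"):
        # Entries lacking a key are counted under "unknown" instead of being dropped.
        summary[f"by_{key}"] = dict(Counter(e.get(key, "unknown") for e in entries))
    return summary


def make_loss_document(entries: List[Dict[str, Any]], target_engine: str,
                       source_checksum: str,
                       environment_adaptation: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Bundle entries with their summary (hypothetical document layout)."""
    return {
        "target_engine": target_engine,
        "source_checksum": source_checksum,
        "environment_adaptation": environment_adaptation or {},
        "entries": entries,
        "summary": summarize(entries),
    }
```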
Loss Reinjection
.. py:function:: apply(workflow: Workflow, entries: List[LossEntry]) -> int
   :module: wf2wf.loss.core

   Apply loss entries back to a workflow (reinjection).

   This function attempts to reinject lost information into the workflow IR and marks each successfully applied entry as 'reapplied'.

   :returns: Number of successfully applied entries
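Reinjection hinges on resolving each entry's json_pointer inside the IR. A minimal sketch, assuming a dict-shaped IR rather than a Workflow object: pointers are decoded per RFC 6901 (~1 becomes /, then ~0 becomes ~), missing dict levels are created on the way down, and entries that land are marked 'reapplied'. set_by_pointer and apply_entries are hypothetical names, not wf2wf functions.

```python
from typing import Any, Dict, List


def _tokens(pointer: str) -> List[str]:
    """Split an RFC 6901 JSON pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    # RFC 6901 requires decoding '~1' before '~0'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def set_by_pointer(doc: Any, pointer: str, value: Any) -> bool:
    """Set the value at pointer, creating missing dict levels; False on failure."""
    try:
        tokens = _tokens(pointer)
        if not tokens:
            return False  # refuse to replace the whole document
        node = doc
        for tok in tokens[:-1]:
            if isinstance(node, list):
                node = node[int(tok)]
            elif isinstance(node, dict):
                node = node.setdefault(tok, {})
            else:
                return False
        last = tokens[-1]
        if isinstance(node, dict):
            node[last] = value
        elif isinstance(node, list):
            node[int(last)] = value
        else:
            return False
        return True
    except (ValueError, IndexError):
        return False


def apply_entries(ir: Dict[str, Any], entries: List[Dict[str, Any]]) -> int:
    """Reinject lost values and mark each success as 'reapplied'."""
    applied = 0
    for entry in entries:
        if entry.get("status") == "reapplied":
            continue  # already reinjected in an earlier pass
        if set_by_pointer(ir, entry["json_pointer"], entry["lost_value"]):
            entry["status"] = "reapplied"
            applied += 1
    return applied
```

Skipping entries that are already 'reapplied' keeps the returned count meaningful when the same side-car is applied twice.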
.. py:function:: detect_and_apply_loss_sidecar(workflow: Workflow, source_path: Path, verbose: bool = False) -> bool
   :module: wf2wf.loss.core

   Detect and apply a loss side-car during import.

   This function looks for a loss side-car file next to the source file and applies any loss information it contains to the workflow.

   :param workflow: Workflow object to apply loss information to
   :param source_path: Path to the source workflow file
   :param verbose: Enable verbose logging
   :returns: True if a loss side-car was found and applied, False otherwise
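The lookup rule is described only as "next to the source file". Assuming the naming used in the usage examples on this page (workflow.cwl paired with workflow.loss.json), a lookup could be sketched as follows. sidecar_path and load_sidecar are hypothetical helpers; the real naming rule may differ, for example for files without an extension.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def sidecar_path(source_path: Path) -> Path:
    """Assumed naming rule: workflow.cwl -> workflow.loss.json in the same directory."""
    return source_path.with_suffix(".loss.json")


def load_sidecar(source_path: Path) -> Optional[Dict[str, Any]]:
    """Return the parsed side-car, or None if it is missing or not valid JSON."""
    candidate = sidecar_path(source_path)
    if not candidate.is_file():
        return None
    try:
        return json.loads(candidate.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None  # an unreadable side-car should not abort the import
```

Returning None instead of raising mirrors the error-handling note further down: an invalid side-car is reported, not fatal.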
.. py:function:: prepare(prev_entries: ~typing.List[~wf2wf.loss.core.LossEntry]) -> None
   :module: wf2wf.loss.core

   Prepare for a new export cycle by remembering previously reapplied entries.
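The lost_again status suggests why prepare exists: a value that was reinjected on import and then lost again on the next export is worth flagging separately from a first-time loss. The sketch below encodes that reading as an assumption, keyed on json_pointer; prepare here is a stand-in and initial_status is a hypothetical helper, not part of wf2wf.

```python
from typing import Any, Dict, List, Set

# Hypothetical state: pointers whose values were reinjected during the last import.
_previously_reapplied: Set[str] = set()


def prepare(prev_entries: List[Dict[str, Any]]) -> None:
    """Remember which entries were reapplied before the next export starts."""
    _previously_reapplied.clear()
    _previously_reapplied.update(
        e["json_pointer"] for e in prev_entries if e.get("status") == "reapplied"
    )


def initial_status(json_pointer: str) -> str:
    """Status for a new loss: 'lost_again' if it was reapplied last cycle."""
    return "lost_again" if json_pointer in _previously_reapplied else "lost"
```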
Checksum and Validation
.. py:function:: compute_checksum(workflow: Workflow) -> str
   :module: wf2wf.loss.core

   Compute SHA-256 checksum of workflow IR for loss tracking.
.. py:function:: create_loss_sidecar_summary(workflow: Workflow, source_path: Path) -> Dict[str, Any]
   :module: wf2wf.loss.core

   Create a summary of loss side-car information for a workflow.

   :param workflow: Workflow object
   :param source_path: Path to the source workflow file
   :returns: Dictionary containing loss side-car summary information
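A SHA-256 checksum is only useful for loss tracking if the same IR always serializes to the same bytes. A minimal sketch, assuming a JSON-serializable dict IR: sort the keys and use compact separators before hashing with Python's hashlib. ir_checksum is a hypothetical name, and wf2wf's actual serialization of a Workflow object may differ.

```python
import hashlib
import json
from typing import Any


def ir_checksum(ir: Any) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not change the digest."""
    canonical = json.dumps(ir, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

default=str lets values such as paths be hashed by their string form instead of raising a TypeError.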
Context Detection
Format-Specific Loss Detection
.. py:module:: wf2wf.loss.context_detection
Format-specific loss detection and environment-specific value handling.
.. py:class:: FormatLossDetector(source_format: str, target_format: str)
   :module: wf2wf.loss.context_detection

   Detects format-specific losses during import/export cycles.

.. py:method:: FormatLossDetector.detect_environment_specific_losses(workflow: Workflow) -> List[Dict[str, Any]]
   :module: wf2wf.loss.context_detection

   Detect environment-specific value losses between formats.

.. py:attribute:: FormatLossDetector.source_format
   :module: wf2wf.loss.context_detection
   :type: str

.. py:attribute:: FormatLossDetector.target_format
   :module: wf2wf.loss.context_detection
   :type: str
.. py:function:: detect_format_specific_losses(workflow: Workflow, source_format: str, target_format: str) -> List[Dict[str, Any]]
   :module: wf2wf.loss.context_detection

   Detect format-specific losses in a workflow.
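One common way to structure format-specific detection is a capability table: for each target format, the IR fields it cannot represent. The sketch below assumes that design and a plain dict of tasks; the CWL entries only echo the reasons used in the usage examples on this page and are not wf2wf's actual rule set. UNSUPPORTED_FIELDS and detect_losses are hypothetical names.

```python
from typing import Any, Dict, List, Set

# Illustrative capability table, not wf2wf's real rules.
UNSUPPORTED_FIELDS: Dict[str, Set[str]] = {
    "cwl": {"priority", "gpu"},
}


def _escape(token: str) -> str:
    """Escape a reference token for an RFC 6901 JSON pointer ('~' first, then '/')."""
    return token.replace("~", "~0").replace("/", "~1")


def detect_losses(tasks: Dict[str, Dict[str, Any]], target_format: str) -> List[Dict[str, Any]]:
    """Return one loss record per set task field the target format cannot represent."""
    unsupported = UNSUPPORTED_FIELDS.get(target_format, set())
    losses: List[Dict[str, Any]] = []
    for task_id, task in sorted(tasks.items()):
        for field in sorted(unsupported.intersection(task)):
            if task[field] is None:
                continue  # an unset field loses nothing
            losses.append({
                "json_pointer": f"/tasks/{_escape(task_id)}/{_escape(field)}",
                "field": field,
                "lost_value": task[field],
                "reason": f"{target_format} cannot represent '{field}'",
            })
    return losses
```

Sorting tasks and fields makes the output deterministic, which keeps side-cars diff-friendly across runs.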
Environment-Specific Value Handling
.. py:class:: EnvironmentLossRecorder(source_format: str, target_format: str, target_environment: str)
   :module: wf2wf.loss.context_detection

   Records environment-specific losses with detailed context.

.. py:method:: EnvironmentLossRecorder.record_environment_specific_value_loss(json_pointer: str, field: str, env_value: EnvironmentSpecificValue, reason: str, *, severity: str = 'warn') -> None
   :module: wf2wf.loss.context_detection

   Record loss of environment-specific value with detailed context.

.. py:attribute:: EnvironmentLossRecorder.source_format
   :module: wf2wf.loss.context_detection
   :type: str

.. py:attribute:: EnvironmentLossRecorder.target_environment
   :module: wf2wf.loss.context_detection
   :type: str

.. py:attribute:: EnvironmentLossRecorder.target_format
   :module: wf2wf.loss.context_detection
   :type: str
.. py:function:: record_environment_specific_value_loss(json_pointer: str, field: str, env_value: EnvironmentSpecificValue, source_format: str, target_format: str, target_environment: str, reason: str, *, severity: str = 'warn') -> None
   :module: wf2wf.loss.context_detection

   Record loss of environment-specific value with format context.
.. py:function:: restore_environment_specific_value(lost_value: Any, field_name: str, expected_type: type, target_environment: str) -> EnvironmentSpecificValue
   :module: wf2wf.loss.context_detection

   Restore an EnvironmentSpecificValue from lost data.
.. py:function:: validate_environment_specific_value(value: ~typing.Any, field_name: str, expected_type: type) -> bool
   :module: wf2wf.loss.context_detection

   Validate that a value is a proper EnvironmentSpecificValue.
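The constructor of EnvironmentSpecificValue is not documented on this page, so the sketch below uses a hypothetical stand-in, EnvValue, that simply maps an environment name to a value. Under that assumption, validation checks every stored value against expected_type, and restoration accepts either a per-environment mapping or a bare scalar, which it attaches to target_environment. validate_env_value and restore_env_value are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EnvValue:
    """Hypothetical stand-in for EnvironmentSpecificValue: one value per environment."""
    values: Dict[str, Any] = field(default_factory=dict)

    def get(self, environment: str) -> Optional[Any]:
        return self.values.get(environment)


def validate_env_value(value: Any, expected_type: type) -> bool:
    """True if value is an EnvValue whose non-None entries all have expected_type."""
    if not isinstance(value, EnvValue):
        return False
    return all(v is None or isinstance(v, expected_type) for v in value.values.values())


def restore_env_value(lost_value: Any, expected_type: type, target_environment: str) -> EnvValue:
    """Rebuild an EnvValue from side-car data (a mapping or a bare scalar)."""
    if isinstance(lost_value, dict):
        restored = EnvValue(dict(lost_value))
    else:
        restored = EnvValue({target_environment: lost_value})
    if not validate_env_value(restored, expected_type):
        raise TypeError(f"restored value does not match {expected_type.__name__}")
    return restored
```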
Export Loss Detection
Format-Specific Export Losses
.. py:module:: wf2wf.loss.export
Export loss detection and recording for different workflow formats.
.. py:function:: detect_and_record_export_losses(workflow: ~wf2wf.core.Workflow, target_format: str, target_environment: str = 'shared_filesystem', verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Detect and record losses when converting to target format for specific environment.
Individual Format Loss Functions
.. py:function:: record_cwl_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Record losses when converting to CWL format.

.. py:function:: record_dagman_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Record losses when converting to DAGMan format.

.. py:function:: record_galaxy_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Record losses when converting to Galaxy format.

.. py:function:: record_nextflow_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Record losses when converting to Nextflow format.

.. py:function:: record_snakemake_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Record losses when converting to Snakemake format.

.. py:function:: record_wdl_losses(workflow: ~wf2wf.core.Workflow, target_environment: str, verbose: bool = False) -> None
   :module: wf2wf.loss.export

   Record losses when converting to WDL format.
Import Loss Detection
Loss Side-car Validation
.. py:module:: wf2wf.loss.import_
Import loss detection and validation for loss sidecars.
.. py:function:: validate_loss_entry(entry: ~typing.Dict[str, ~typing.Any], verbose: bool = False) -> bool
   :module: wf2wf.loss.import_

   Validate a single loss entry.

   :param entry: Dictionary containing loss entry data
   :param verbose: Enable verbose logging
   :returns: True if the loss entry is valid, False otherwise
.. py:function:: validate_loss_sidecar(loss_data: ~typing.Dict[str, ~typing.Any], workflow, verbose: bool = False) -> bool
   :module: wf2wf.loss.import_

   Validate loss side-car data.

   :param loss_data: Dictionary containing loss data
   :param workflow: Workflow IR object (for checksum validation)
   :param verbose: Enable verbose logging
   :returns: True if the loss side-car is valid, False otherwise
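Side-car validation combines a structural check with a checksum comparison, so that a side-car written for one version of a workflow is not applied to another. The sketch below assumes the entries live under "entries" and the checksum under "source_checksum" (the parameter name of create_loss_document), and it takes the expected checksum as a plain string because what exactly the checksum covers is not specified here. sidecar_is_valid is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def sidecar_is_valid(loss_data: Dict[str, Any], expected_checksum: Optional[str] = None,
                     verbose: bool = False) -> bool:
    """Collect every problem first, then report them all when verbose is set."""
    problems: List[str] = []
    entries = loss_data.get("entries")
    if not isinstance(entries, list):
        problems.append("'entries' must be a list")
    elif not all(isinstance(e, dict) and "json_pointer" in e for e in entries):
        problems.append("every entry needs a 'json_pointer'")
    if expected_checksum is not None and loss_data.get("source_checksum") != expected_checksum:
        problems.append("source_checksum does not match the workflow")
    if verbose:
        for problem in problems:
            print(f"invalid loss side-car: {problem}")
    return not problems
```

Collecting all problems before returning means a verbose run reports every defect at once instead of one per attempt.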
Import Loss Detection Functions
.. py:function:: detect_and_record_import_losses(workflow: Workflow, source_format: str, target_environment: str = 'shared_filesystem', verbose: bool = False) -> None
   :module: wf2wf.loss.import_

   Detect and record losses when importing from source format.
Data Structures
LossEntry
A typed dictionary wrapper for loss entries:
from typing import Any, Dict
class LossEntry(Dict[str, Any]):
    """Typed dict wrapper for a loss mapping entry with comprehensive IR support."""
Loss Entry Schema
Each loss entry contains:
json_pointer: JSON pointer to the field in the IR
field: Name of the field that was lost
lost_value: The value that could not be represented
reason: Human-readable reason for the loss
origin: Whether the loss came from user data or wf2wf processing
status: Current status (lost, lost_again, reapplied, adapted)
severity: Severity level (info, warn, error)
category: Category of lost information
environment_context: Environment-specific context (optional)
recovery_suggestions: Suggestions for recovery (optional)
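The schema above translates directly into a checklist. The sketch below treats the eight non-optional fields as required and checks status and severity against the listed values; that strictness is an assumption, since validate_loss_entry may also accept legacy or partially filled entries. entry_problems is a hypothetical name.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("json_pointer", "field", "lost_value", "reason",
                 "origin", "status", "severity", "category")
STATUSES = {"lost", "lost_again", "reapplied", "adapted"}
SEVERITIES = {"info", "warn", "error"}


def entry_problems(entry: Dict[str, Any]) -> List[str]:
    """List schema violations for one loss entry; an empty list means valid."""
    # lost_value may legitimately be None, so only key presence is checked.
    problems = [f"missing '{k}'" for k in REQUIRED_KEYS if k not in entry]
    pointer = entry.get("json_pointer")
    if pointer is not None and not (isinstance(pointer, str) and pointer.startswith("/")):
        problems.append("json_pointer must be a string starting with '/'")
    if "status" in entry and entry["status"] not in STATUSES:
        problems.append(f"unknown status {entry['status']!r}")
    if "severity" in entry and entry["severity"] not in SEVERITIES:
        problems.append(f"unknown severity {entry['severity']!r}")
    return problems
```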
Usage Examples
Basic Loss Recording Example
from wf2wf.loss import record, reset, as_list
# Start from an empty loss buffer
reset()
# Record a simple loss
record(
    json_pointer="/tasks/align/priority",
    field="priority",
    lost_value=10,
    reason="CWL lacks job priority field",
    origin="user",
    severity="warn",
)
# Get all recorded losses
losses = as_list()
Environment-Specific Loss Recording
from wf2wf.loss import record_environment_specific_value_loss
# Record loss of an environment-specific value
# (`task` is assumed to be a task from an already parsed workflow IR)
record_environment_specific_value_loss(
    json_pointer="/tasks/align/cpu",
    field="cpu",
    env_value=task.cpu,  # EnvironmentSpecificValue object
    source_format="snakemake",
    target_format="cwl",
    target_environment="shared_filesystem",
    reason="CWL has limited resource specification",
)
Export Loss Detection Example
from wf2wf.loss import detect_and_record_export_losses
# Detect and record all losses for a format
# (`workflow` is a wf2wf.core.Workflow IR object)
detect_and_record_export_losses(
    workflow=workflow,
    target_format="cwl",
    target_environment="shared_filesystem",
    verbose=True,
)
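Loss Reinjection Example
from pathlib import Path
from wf2wf.loss import detect_and_apply_loss_sidecar
path = Path("workflow.cwl")
# parse_workflow is a placeholder for whichever importer produces the Workflow IR
workflow = parse_workflow(path)
# Apply the loss side-car (if one exists next to the file) during import
if detect_and_apply_loss_sidecar(workflow, path, verbose=True):
    print("Reinjected information from the loss side-car")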
Custom Loss Detection
from wf2wf.loss import FormatLossDetector, EnvironmentLossRecorder
# `workflow` is a parsed Workflow IR and `task` is one of its tasks
# Create a custom detector
detector = FormatLossDetector("snakemake", "cwl")
losses = detector.detect_environment_specific_losses(workflow)
# Create a custom recorder
recorder = EnvironmentLossRecorder("snakemake", "cwl", "shared_filesystem")
recorder.record_environment_specific_value_loss(
    json_pointer="/tasks/align/gpu",
    field="gpu",
    env_value=task.gpu,
    reason="CWL lacks GPU support",
)
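Loss Document Creation
from wf2wf.loss import create_loss_document, write_loss_document, compute_checksum
from wf2wf.loss.core import write
# Compute the checksum once and reuse it
checksum = compute_checksum(workflow)
adaptation = {
    "source_environment": "shared_filesystem",
    "target_environment": "distributed_computing",
}
# Option 1: build the document in memory, then write it
doc = create_loss_document(
    target_engine="cwl",
    source_checksum=checksum,
    environment_adaptation=adaptation,
)
write(doc, "workflow.loss.json")
# Option 2: build and write the document in one call
write_loss_document(
    path="workflow.loss.json",
    target_engine="cwl",
    source_checksum=checksum,
    environment_adaptation=adaptation,
)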
Side-car Validation Example
import json
from wf2wf.loss import validate_loss_sidecar, validate_loss_entry
# Validate the entire side-car; the workflow IR is needed for checksum validation
with open("workflow.loss.json") as f:
    loss_data = json.load(f)
is_valid = validate_loss_sidecar(loss_data, workflow)
# Validate an individual entry
is_valid = validate_loss_entry(loss_data["entries"][0])
Error Handling
The loss system handles problems as follows:
Invalid side-cars: Logged without aborting the import
Missing fields: Handled with sensible defaults
Type mismatches: Detected and reported
Validation failures: The validation functions return False; pass verbose=True to log the details
Performance Considerations
Loss detection runs once per export
Side-car validation is lightweight
Reinjection is optimized for common cases
Memory usage grows with the number of loss entries held in the in-memory buffer; call reset() between conversions to clear it
Extension Points
The loss system is designed to be extensible:
New formats: Add format-specific loss detection functions
Custom categories: Define new loss categories
Enhanced validation: Extend validation logic
Custom restoration: Implement format-specific restoration logic
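For the "new formats" extension point, one conventional pattern is a registry that maps a format name to its detection function. The sketch below assumes that pattern with a decorator; register_detector, run_detector, and the "myengine" format are all hypothetical and do not correspond to a documented wf2wf hook.

```python
from typing import Any, Callable, Dict, List

LossDetector = Callable[[Dict[str, Any]], List[Dict[str, Any]]]
_DETECTORS: Dict[str, LossDetector] = {}


def register_detector(fmt: str) -> Callable[[LossDetector], LossDetector]:
    """Decorator that registers a loss detector for one target format."""
    def decorator(func: LossDetector) -> LossDetector:
        if fmt in _DETECTORS:
            raise ValueError(f"detector for {fmt!r} already registered")
        _DETECTORS[fmt] = func
        return func
    return decorator


@register_detector("myengine")  # hypothetical target format
def detect_myengine_losses(ir: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flag every task that requests a GPU, assuming the format cannot express it."""
    return [
        {"json_pointer": f"/tasks/{tid}/gpu", "field": "gpu",
         "lost_value": task["gpu"], "reason": "myengine has no GPU field"}
        for tid, task in ir.get("tasks", {}).items() if task.get("gpu")
    ]


def run_detector(fmt: str, ir: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the registered detector, or report no losses for unknown formats."""
    detector = _DETECTORS.get(fmt)
    return detector(ir) if detector else []
```

Rejecting duplicate registrations surfaces accidental double imports early instead of silently replacing a detector.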