Copy-on-Write Operations
ZnSocket provides efficient copy-on-write functionality through two primary mechanisms: Segments and List fallbacks. These features allow you to create logical copies of large datasets while only storing the differences, making them ideal for scenarios where you need to modify a few elements of a large collection without duplicating the entire dataset.
Overview
Copy-on-write (COW) is a resource management technique where data is not physically copied until it’s modified. ZnSocket implements this pattern to enable:
Memory efficiency: Only modified elements consume additional storage
Performance: Fast “copying” operations that don’t duplicate data
Flexibility: Ability to create multiple variations of a dataset
Data integrity: Original data remains unchanged
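As a language-neutral illustration of the idea, and not ZnSocket's implementation, the following sketch wraps a base list and keeps only modified elements in a separate dictionary. Reads, including slices, fall through to the base. `CowView` is a hypothetical name used only for this example.

```python
from collections.abc import Sequence
from typing import Any


class CowView:
    """Hypothetical copy-on-write view over a read-only base sequence."""

    def __init__(self, base: Sequence[Any]) -> None:
        self._base = base                    # shared data, never written to
        self.overrides: dict[int, Any] = {}  # only modified elements are stored here

    def __len__(self) -> int:
        return len(self._base)

    def _normalize(self, index: int) -> int:
        # Support negative indices and reject out-of-range ones, like a list
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return index

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice mixes overridden and original elements transparently
            return [self[i] for i in range(*index.indices(len(self)))]
        index = self._normalize(index)
        if index in self.overrides:
            return self.overrides[index]
        return self._base[index]

    def __setitem__(self, index: int, value: Any) -> None:
        # Writing never touches the base; it only adds or replaces an override
        self.overrides[self._normalize(index)] = value


base = ["a", "b", "c"]
view = CowView(base)
view[1] = "B"
print(view[:])         # ['a', 'B', 'c']
print(base)            # ['a', 'b', 'c']
print(view.overrides)  # {1: 'B'}
```

Only index 1 occupies new storage; indices 0 and 2 are still read from `base`.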
Implementation Approaches
ZnSocket offers two main approaches for copy-on-write operations:
Segments: Purpose-built for copy-on-write, using a piece-table architecture
List Fallbacks: Using fallback mechanisms for transparent copy-on-write
Segments: True Copy-on-Write
Segments provide the most efficient copy-on-write implementation using a piece table data structure.
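This page does not describe ZnSocket's internal storage layout. As a rough illustration of the piece-table idea only, the stdlib sketch below keeps a list of pieces, each pointing either at a range of the read-only origin or at an append-only buffer of new values; a write splits one piece into at most three. `Piece` and `PieceTable` are hypothetical names, not ZnSocket's API.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any


@dataclass
class Piece:
    source: str  # "origin" or "added"
    start: int   # first index in that source
    length: int  # number of consecutive elements


class PieceTable:
    """Hypothetical piece table over a read-only origin sequence."""

    def __init__(self, origin: Sequence[Any]) -> None:
        self.origin = origin
        self.added: list[Any] = []  # append-only buffer of new values
        self.pieces = [Piece("origin", 0, len(origin))] if len(origin) else []

    def __len__(self) -> int:
        return sum(p.length for p in self.pieces)

    def _locate(self, index: int) -> tuple[int, int]:
        """Return (piece position, offset within piece) for a logical index."""
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        for pos, piece in enumerate(self.pieces):
            if index < piece.length:
                return pos, index
            index -= piece.length
        raise AssertionError("unreachable")

    def __getitem__(self, index: int) -> Any:
        pos, offset = self._locate(index)
        piece = self.pieces[pos]
        buf = self.origin if piece.source == "origin" else self.added
        return buf[piece.start + offset]

    def __setitem__(self, index: int, value: Any) -> None:
        pos, offset = self._locate(index)
        piece = self.pieces[pos]
        self.added.append(value)
        new = Piece("added", len(self.added) - 1, 1)
        # Split the piece into [before] + [new] + [after]; empty parts are dropped
        before = Piece(piece.source, piece.start, offset)
        after = Piece(piece.source, piece.start + offset + 1, piece.length - offset - 1)
        self.pieces[pos:pos + 1] = [p for p in (before, new, after) if p.length]


origin = list(range(10))
table = PieceTable(origin)
table[3] = "x"
print([table[i] for i in range(len(table))])  # [0, 1, 2, 'x', 4, 5, 6, 7, 8, 9]
print(len(table.pieces))                      # 3
```

Overwriting the same index again only replaces the single-element piece; the superseded value stays in `added`, which is the usual piece-table trade-off.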
Basic Usage
import znsocket
from znjson.converter import NumpyConverter
client = znsocket.Client("http://localhost:5000")
# Create original dataset
original_data = znsocket.List(r=client, key="dataset", converter=[NumpyConverter])
original_data.extend([
{"name": "item_0", "value": 100, "metadata": {"type": "A"}},
{"name": "item_1", "value": 200, "metadata": {"type": "B"}},
{"name": "item_2", "value": 300, "metadata": {"type": "C"}},
])
# Create copy-on-write view using Segments
dataset_copy = znsocket.Segments(
r=client,
origin=original_data, # Reference to original data
key="dataset_copy" # Key for storing modifications
)
# Access works transparently
print(f"Length: {len(dataset_copy)}") # 3
print(f"First item: {dataset_copy[0]}") # From original
print(f"Second item: {dataset_copy[1]}") # From original
# Modify a single element
modified_item = dict(dataset_copy[1]) # Convert to regular dict
modified_item["value"] = 999
modified_item["metadata"]["modified"] = True
dataset_copy[1] = modified_item
# Now only the modified element is stored separately
print(f"Modified item: {dataset_copy[1]}") # Modified version
print(f"Original unchanged: {original_data[1]}") # Original version
print(f"Other items unchanged: {dataset_copy[0]}") # Still from original
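`dict(...)` in the example above makes a shallow copy: nested dictionaries such as `metadata` remain the same objects. That is harmless when each read returns a freshly deserialized item, but if you hold a local reference to an item, editing its nested dict through the copy also changes the original. A stdlib-only illustration, with `copy.deepcopy` as the safer choice for nested data:

```python
import copy

item = {"name": "item_1", "value": 200, "metadata": {"type": "B"}}
shallow = dict(item)                  # top-level copy only
shallow["metadata"]["modified"] = True
print(item["metadata"])               # {'type': 'B', 'modified': True}  (shared!)

item = {"name": "item_1", "value": 200, "metadata": {"type": "B"}}
deep = copy.deepcopy(item)            # copies nested containers as well
deep["metadata"]["modified"] = True
print(item["metadata"])               # {'type': 'B'}
```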
Advanced Segments Usage
# Working with complex nested data
complex_data = [
{
"id": i,
"config": {
"parameters": {"learning_rate": 0.01, "epochs": 100},
"metadata": {"created": "2025-01-01", "version": "1.0"}
},
"results": {"accuracy": 0.95, "loss": 0.05}
}
for i in range(1000) # Large dataset
]
original_experiments = znsocket.List(r=client, key="experiments")
original_experiments.extend(complex_data)
# Create experimental variation
experiment_variant = znsocket.Segments(
r=client,
origin=original_experiments,
key="experiment_variant_a"
)
# Modify specific experiments
for exp_id in [10, 50, 100]:
    experiment = dict(experiment_variant[exp_id])
    experiment["config"]["parameters"]["learning_rate"] = 0.001  # Different LR
    experiment["metadata"] = {"variant": "low_lr", "modified": True}
    experiment_variant[exp_id] = experiment
# Only 3 modified experiments are stored, 997 reference original
print(f"Total experiments: {len(experiment_variant)}") # 1000
print(f"Modified experiment: {experiment_variant[10]['metadata']['variant']}") # "low_lr"
print(f"Original unchanged: {original_experiments[10]['config']['metadata']['version']}") # "1.0"
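The loop above edits values two levels deep (`config` → `parameters` → `learning_rate`). A small helper can make such an edit on a deep copy and return it, which keeps the read-copy-write pattern explicit. `set_in` is a hypothetical helper written for this page, not part of ZnSocket.

```python
import copy
from collections.abc import Mapping, Sequence
from typing import Any


def set_in(item: Mapping[str, Any], path: Sequence[str], value: Any) -> dict:
    """Return a deep copy of `item` with the value at `path` replaced.

    Missing intermediate dicts are created; `item` itself is never mutated.
    """
    if not path:
        raise ValueError("path must not be empty")
    result = copy.deepcopy(dict(item))
    node = result
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return result


experiment = {"id": 10, "config": {"parameters": {"learning_rate": 0.01, "epochs": 100}}}
updated = set_in(experiment, ["config", "parameters", "learning_rate"], 0.001)
print(updated["config"]["parameters"])     # {'learning_rate': 0.001, 'epochs': 100}
print(experiment["config"]["parameters"])  # {'learning_rate': 0.01, 'epochs': 100}
```

With a Segments view, the same helper would be used as `experiment_variant[i] = set_in(experiment_variant[i], [...], value)`.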
List Fallbacks: Transparent Copy-on-Write
List fallbacks provide copy-on-write behavior using ZnSocket’s built-in fallback mechanism.
Basic Fallback Usage
# Create original dataset
original_data = znsocket.List(r=client, key="original_dataset")
original_data.extend([
{"name": "sample_0", "score": 10},
{"name": "sample_1", "score": 20},
{"name": "sample_2", "score": 30},
])
# Create copy using fallback mechanism
dataset_copy = znsocket.List(
r=client,
key="dataset_copy",
fallback="original_dataset", # Fall back to original
fallback_policy="frozen", # Read-only fallback
converter=[NumpyConverter]
)
# Access transparently falls back to original
print(f"Copy length: {len(dataset_copy)}") # 3 (from fallback)
print(f"First item: {dataset_copy[0]}") # From original
# Modify element - triggers copy-on-write
modified_item = dict(dataset_copy[1])
modified_item["score"] = 999
modified_item["source"] = "modified"
dataset_copy[1] = modified_item
# Copy-on-write behavior activated
print(f"Modified: {dataset_copy[1]['score']}") # 999
print(f"Original: {original_data[1]['score']}") # 20 (unchanged)
print(f"Fallback still works: {dataset_copy[0]}") # From original
Fallback Policies
# Frozen policy: Read-only fallback, copy-on-write for modifications
frozen_copy = znsocket.List(
r=client,
key="frozen_copy",
fallback="original_dataset",
fallback_policy="frozen"
)
# Copy policy: Full copy of fallback data on initialization
full_copy = znsocket.List(
r=client,
key="full_copy",
fallback="original_dataset",
fallback_policy="copy"
)
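The two policies can be pictured with a small in-memory model. This is a hypothetical sketch based only on the descriptions on this page (`"frozen"`: reads fall back, only modified elements are stored under the new key; `"copy"`: fallback data is copied when the list is created). It does not reproduce ZnSocket's storage logic, and the `store` dictionary merely stands in for the server's key space.

```python
from typing import Any

# Hypothetical stand-in for the server's key space
store: dict[str, Any] = {"original_dataset": [10, 20, 30]}


class FallbackListModel:
    """Toy model of the two fallback policies (non-negative indices only)."""

    def __init__(self, key: str, fallback: str, fallback_policy: str) -> None:
        if fallback_policy not in ("frozen", "copy"):
            raise ValueError(f"unknown policy: {fallback_policy}")
        self.key, self.fallback, self.policy = key, fallback, fallback_policy
        if fallback_policy == "copy":
            # "copy": duplicate every fallback element under the new key right away
            store[key] = list(store[fallback])
        else:
            # "frozen": the new key holds only modified elements, by index
            store[key] = {}

    def __len__(self) -> int:
        if self.policy == "copy":
            return len(store[self.key])
        return len(store[self.fallback])

    def __getitem__(self, index: int) -> Any:
        own = store[self.key]
        if self.policy == "frozen" and index not in own:
            return store[self.fallback][index]  # fall back to the original
        return own[index]

    def __setitem__(self, index: int, value: Any) -> None:
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        store[self.key][index] = value  # the fallback key is never written


frozen = FallbackListModel("frozen_copy", "original_dataset", "frozen")
full = FallbackListModel("full_copy", "original_dataset", "copy")
frozen[1] = 999
full[1] = 999
print(store["frozen_copy"])       # {1: 999}
print(store["full_copy"])         # [10, 999, 30]
print(store["original_dataset"])  # [10, 20, 30]
```

In this model, `"frozen"` stores one element after the write while `"copy"` stores all three from the start.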
JavaScript Integration
Copy-on-write data created from Python can also be read and modified from the JavaScript client:
import { createClient, List, Segments } from 'znsocket';
const client = createClient({ url: 'znsocket://127.0.0.1:5000' });
await client.connect();
// Work with the copy-on-write data from JavaScript
const datasetCopy = new List({ client, key: 'dataset_copy' });
// Access data (will use fallback or modifications as appropriate)
const length = await datasetCopy.length();
const firstItem = await datasetCopy.get(0);
const modifiedItem = await datasetCopy.get(1);
// Modify from JavaScript side
await datasetCopy.set(2, {
name: 'js_modified',
score: 777,
source: 'javascript'
});
// Use slice operations
const subset = await datasetCopy.slice(0, 2);
console.log('First two items:', subset);
Cross-Language Copy-on-Write Example
This example demonstrates copy-on-write behavior across Python and JavaScript:
Python side (creating original data and modifications):
import znsocket
client = znsocket.Client("http://localhost:5000")
# Create original dataset with Dict objects
lst = znsocket.List(r=client, key="test:data")
data = [
{"value": [1, 2, 3]},
{"value": [4, 5, 6]},
{"value": [7, 8, 9]},
{"value": [10, 11, 12]},
]
# Use pipeline for efficient batch operations
p = client.pipeline()
msg = []
for idx, value in enumerate(data):
    atoms_dict = znsocket.Dict(r=p, key=f"test:data/{idx}")
    for k, v in value.items():
        atoms_dict[k] = v
    msg.append(atoms_dict)
p.execute()
lst.extend(msg)
# Create copy-on-write view using Segments
segments = znsocket.Segments(r=client, origin=lst, key="test:data/segments")
# Modify a single element (copy-on-write)
value_to_modify = segments[2]
modified_value = value_to_modify.copy("test:data/segments/2")
modified_value["value"] = [100, 200, 300]
segments[2] = modified_value
# Original list remains unchanged: lst[2]["value"] == [7, 8, 9]
# Segments shows modification: segments[2]["value"] == [100, 200, 300]
JavaScript side (accessing and extending modifications):
import { createClient, Dict, List } from 'znsocket';
const client = createClient({ url: 'znsocket://127.0.0.1:5000' });
await client.connect();
// Access the original data created by Python
const lst = new List({ client, key: 'test:data' });
// Verify original data is accessible
const item2 = await lst.get(2);
console.log(await item2.get('value')); // [7, 8, 9] - original unchanged
// Access Python's copy-on-write modification
const modifiedSegment = new Dict({ client, key: 'test:data/segments/2' });
console.log(await modifiedSegment.get('value')); // [100, 200, 300] - Python modification
// Store a JavaScript-side modified copy in a separate Dict (the original list is not touched)
const jsModified = new Dict({ client, key: 'test:data/js_copy/1' });
await jsModified.clear();
await jsModified.set('value', [400, 500, 600]);
await jsModified.set('modified_by', 'javascript');
// Verify copy-on-write behavior
const originalItem1 = await lst.get(1);
console.log(await originalItem1.get('value')); // [4, 5, 6] - still original
console.log(await jsModified.get('value')); // [400, 500, 600] - JS modification
// Both languages can work with the same logical dataset
// while maintaining independent modifications
ListAdapter + Segments Integration
This example demonstrates copy-on-write with ListAdapter and Segments across languages:
Python side (ListAdapter setup and Segments modifications):
import znsocket
client = znsocket.Client("http://localhost:5000")
# Start with a regular Python list
original_data = [
{"name": "item_0", "score": 85, "category": "A"},
{"name": "item_1", "score": 92, "category": "B"},
{"name": "item_2", "score": 78, "category": "A"},
{"name": "item_3", "score": 96, "category": "C"},
{"name": "item_4", "score": 83, "category": "B"},
]
# Use ListAdapter to expose Python list via ZnSocket
znsocket.ListAdapter(
socket=client,
key="test:adapter_data",
object=original_data
)
# Create a List view of the adapted data
lst = znsocket.List(r=client, key="test:adapter_data")
# Create copy-on-write view using Segments
segments = znsocket.Segments(
r=client,
origin=lst,
key="test:adapter_segments"
)
# Create modified versions using copy-on-write
modified_dict = znsocket.Dict(r=client, key="test:adapter_segments/2")
modified_dict.clear()
modified_dict.update({
"name": "item_2_modified",
"score": 95,
"category": "A+",
"modified": True,
"source": "segments_copy"
})
segments[2] = modified_dict
# Original Python list remains unchanged: original_data[2]["score"] == 78
# Adapter list remains unchanged: lst[2]["score"] == 78
# Segments shows modification: segments[2]["score"] == 95
JavaScript side (accessing adapter data and creating more modifications):
import { createClient, Dict, List } from 'znsocket';
const client = createClient({ url: 'znsocket://127.0.0.1:5000' });
await client.connect();
// Access the ListAdapter data from JavaScript
const lst = new List({ client, key: 'test:adapter_data' });
// Verify original adapter data is accessible
const originalItems = [];
for (let i = 0; i < await lst.length(); i++) {
originalItems.push(await lst.get(i));
}
console.log('Original adapter data:', originalItems);
// Access Python's segment modification
const pythonModified = new Dict({ client, key: 'test:adapter_segments/2' });
console.log('Python modification:', {
name: await pythonModified.get('name'), // "item_2_modified"
score: await pythonModified.get('score'), // 95
source: await pythonModified.get('source') // "segments_copy"
});
// Store a JavaScript-side modified copy in a separate Dict (it is not inserted into the Segments view)
const jsModified = new Dict({ client, key: 'test:adapter_segments/1_js' });
await jsModified.clear();
await jsModified.set('name', 'item_1_js_enhanced');
await jsModified.set('score', 100);
await jsModified.set('category', 'S+');
await jsModified.set('enhanced_by', 'javascript');
// Verify copy-on-write behavior across languages
const stillOriginal = await lst.get(1);
console.log('Original unchanged:', stillOriginal.score); // 92
console.log('JS modification:', await jsModified.get('score')); // 100
// The Python list, the ListAdapter view, and all modifications coexist independently
Use Cases and Patterns
Scientific Computing
# Large simulation dataset
base_simulation = znsocket.List(r=client, key="base_sim")
# ... populate with expensive simulation results
# Create parameter variations
high_temp_sim = znsocket.Segments(r=client, origin=base_simulation, key="high_temp")
low_pressure_sim = znsocket.Segments(r=client, origin=base_simulation, key="low_pressure")
# Modify only specific conditions
# (temperature_sensitive_indices is a placeholder for your own selection)
for i in temperature_sensitive_indices:
    result = dict(high_temp_sim[i])
    result["temperature"] = result["temperature"] * 1.2
    high_temp_sim[i] = result
Data Preprocessing Pipelines
# Raw dataset
raw_data = znsocket.List(r=client, key="raw_data")
# Create preprocessing variants
normalized_data = znsocket.Segments(r=client, origin=raw_data, key="normalized")
filtered_data = znsocket.Segments(r=client, origin=raw_data, key="filtered")
# Apply transformations only where needed
# (threshold and preprocess() are placeholders for your own logic)
for i, sample in enumerate(raw_data):
    if sample["quality_score"] < threshold:
        cleaned_sample = preprocess(sample)
        filtered_data[i] = cleaned_sample
A/B Testing and Experimentation
# Base configuration
base_config = znsocket.List(r=client, key="base_config")
# Create test variants
variant_a = znsocket.List(
r=client,
key="variant_a",
fallback="base_config",
fallback_policy="frozen"
)
variant_b = znsocket.List(
r=client,
key="variant_b",
fallback="base_config",
fallback_policy="frozen"
)
# Modify only test parameters (config_index is a placeholder for the entry under test)
variant_a[config_index] = {"feature_x": True, "algorithm": "new_algo"}
variant_b[config_index] = {"feature_x": False, "algorithm": "baseline"}
Performance Considerations
Storage Efficiency
Segments: Only modified elements consume additional storage
List Fallbacks: Modified elements stored in new key, rest referenced
Memory Usage: Minimal overhead for unchanged data
Access Patterns
Read Operations: Efficient fallback to original data
Write Operations: Copy-on-write triggers only for modified elements
Slice Operations: Supported across both original and modified data
Network Efficiency
Large Datasets: Only deltas transmitted over network
Batch Operations: Modifications can be batched for efficiency
Compression: Automatic compression for large modifications
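Batching means collecting several writes and sending them together instead of paying one round trip each; the cross-language example earlier does this with `client.pipeline()` and `p.execute()`. The idea can be sketched without a server: a hypothetical `WriteBuffer` queues index/value pairs and hands them to a `flush` callback in one call. The callback is an assumption; in practice it would be whatever issues the pipelined writes.

```python
from typing import Any, Callable


class WriteBuffer:
    """Hypothetical buffer that groups index -> value writes into batches."""

    def __init__(self, flush: Callable[[dict[int, Any]], None], max_pending: int = 100) -> None:
        self._flush = flush
        self._max_pending = max_pending
        self._pending: dict[int, Any] = {}

    def set(self, index: int, value: Any) -> None:
        self._pending[index] = value  # a later write to the same index wins
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._flush(dict(self._pending))  # one call per batch
            self._pending.clear()

    def __enter__(self) -> "WriteBuffer":
        return self

    def __exit__(self, *exc: object) -> None:
        self.flush()  # send whatever is left when the block ends


batches: list[dict[int, Any]] = []
with WriteBuffer(batches.append, max_pending=2) as buf:
    buf.set(10, "a")
    buf.set(10, "b")   # replaces the pending value, still one entry
    buf.set(50, "c")   # second distinct index: a batch of 2 is flushed
    buf.set(100, "d")  # flushed when the with-block exits
print(batches)  # [{10: 'b', 50: 'c'}, {100: 'd'}]
```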
Best Practices
Choose the Right Approach
Use Segments for true copy-on-write with maximum efficiency
Use List Fallbacks for simpler scenarios with automatic fallback
Data Structure Design
# Good: Structured data that can be selectively modified
structured_data = {
    "metadata": {...},
    "parameters": {...},
    "results": {...},
}
# Avoid: Monolithic structures that require full replacement
monolithic_data = "large_serialized_blob"
Modification Patterns
# Good: Modify a copy of the item, then write it back
item_copy = dict(copy_segments[index])
item_copy["field"] = new_value
copy_segments[index] = item_copy
# Avoid: Direct mutation (may not trigger copy-on-write; the change can be
# lost on a temporary local copy or, for live Dict items, reach the shared original)
copy_segments[index]["field"] = new_value  # Problematic
Key Management
# Good: Descriptive keys for tracking variants
experiment_high_lr = znsocket.Segments(r=client, origin=base, key="exp_high_lr_v1")
# Good: Use timestamps or IDs for versioning
variant_key = f"experiment_{experiment_id}_{timestamp}"
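To see why the modification-pattern advice above matters, the sketch below simulates a view whose reads return freshly deserialized copies. The JSON round-trip is an assumption standing in for network transfer, and `SerializingView` is a hypothetical class. Mutating the returned object changes only a temporary copy, while read-copy-write persists.

```python
import json
from typing import Any


class SerializingView:
    """Hypothetical view that, like a remote store, hands out copies on every read."""

    def __init__(self, items: list[Any]) -> None:
        self._stored = [json.dumps(item) for item in items]  # serialized storage

    def __getitem__(self, index: int) -> Any:
        return json.loads(self._stored[index])  # a new object on each read

    def __setitem__(self, index: int, value: Any) -> None:
        self._stored[index] = json.dumps(value)


view = SerializingView([{"field": "old"}])

view[0]["field"] = "new"   # mutates a temporary copy only
print(view[0]["field"])    # old

item = dict(view[0])       # read
item["field"] = "new"      # modify the local copy
view[0] = item             # write back
print(view[0]["field"])    # new
```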
Error Handling and Edge Cases
# Handle missing original data
try:
    copy_segments = znsocket.Segments(r=client, origin=original_list, key="copy_segments")
except Exception as e:
    print(f"Original data not available: {e}")
    # Create a new list or otherwise handle the error
# Check if fallback is available (use a key distinct from the Segments key)
copy_list = znsocket.List(r=client, key="copy_list", fallback="original", fallback_policy="frozen")
if len(copy_list) == 0:
    print("No data available from fallback")
# Verify data integrity (unchanged_index and modified_index are placeholders)
assert copy_segments[unchanged_index] == original_list[unchanged_index]
assert copy_segments[modified_index] != original_list[modified_index]
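The assertions above check single indices. To check a whole copy at once, a hypothetical helper can compare element by element and return the indices that differ, which should match the set of indices you intentionally modified. It only relies on `len()` and integer indexing, so it also works on plain lists; with a remote view, each comparison may cost a network read.

```python
from collections.abc import Sequence
from typing import Any


def changed_indices(copy_view: Sequence[Any], original: Sequence[Any]) -> list[int]:
    """Return the indices at which `copy_view` differs from `original`."""
    if len(copy_view) != len(original):
        raise ValueError(f"length mismatch: {len(copy_view)} != {len(original)}")
    return [i for i in range(len(original)) if copy_view[i] != original[i]]


original = [{"score": 10}, {"score": 20}, {"score": 30}]
modified = [{"score": 10}, {"score": 999}, {"score": 30}]
print(changed_indices(modified, original))  # [1]
```

A typical check would then be `assert set(changed_indices(copy_segments, original_list)) == expected_modified`, where `expected_modified` is your own record of edited indices.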
Troubleshooting
Common Issues
- Fallback not working
Verify fallback key exists and contains data
Check fallback_policy is set correctly
Ensure original data is populated before creating copy
- Modifications not persisting
Confirm you’re modifying a copy of the data, not the original reference
Use dict() conversion for complex objects before modification
Verify the key has write permissions
- Performance issues
Monitor the number of modified elements vs. total elements
Consider batching modifications for large datasets
Use appropriate chunking for very large copy operations
- Memory usage concerns
Profile actual storage usage vs. expected savings
Consider cleanup of unused copy variants
Monitor original data lifecycle and copy dependencies
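For the storage checks listed above, a rough comparison of actual usage against expected savings can use serialized sizes. The sketch below treats JSON byte length as a proxy for what a full copy would cost versus storing only the modified elements; that proxy is an assumption and ignores per-key overhead on the server and any compression. `storage_estimate` is a hypothetical helper.

```python
import json
from collections.abc import Mapping, Sequence
from typing import Any


def storage_estimate(original: Sequence[Any], overrides: Mapping[int, Any]) -> dict[str, float]:
    """Estimate bytes for a full copy vs. a copy-on-write overlay (JSON-size proxy)."""
    full_copy = sum(len(json.dumps(item).encode()) for item in original)
    overlay = sum(len(json.dumps(value).encode()) for value in overrides.values())
    return {
        "full_copy_bytes": full_copy,
        "overlay_bytes": overlay,
        "modified_fraction": len(overrides) / len(original) if original else 0.0,
    }


data = [{"id": i, "value": i * 10} for i in range(1000)]
report = storage_estimate(data, {10: {"id": 10, "value": -1}, 50: {"id": 50, "value": -1}})
print(report["modified_fraction"])  # 0.002
```

As the modified fraction grows toward 1, the overlay approaches the size of a full copy and the savings disappear.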