Automatic Message Chunking¶
ZnSocket automatically handles large data transfers by splitting messages into smaller chunks when they exceed the configured size limit. This feature is transparent to users and works seamlessly with all ZnSocket operations.
Overview¶
When working with large datasets (such as numpy arrays, large lists, or complex nested structures), ZnSocket automatically detects when a message exceeds the size limit and splits it into manageable chunks. This ensures reliable transmission over connections with message-size constraints while keeping transfer overhead low.
How It Works¶
The chunking process involves several steps:
1. Size Detection: Before sending a message, ZnSocket calculates the serialized size.
2. Automatic Splitting: If the size exceeds the limit, the message is split into chunks.
3. Compression: Large messages are automatically compressed using gzip to reduce bandwidth.
4. Sequential Transmission: Chunks are sent one by one with metadata for reassembly.
5. Server Reassembly: The server receives the chunks and reconstructs the original message.
6. Error Recovery: Failed transmissions are automatically retried.
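The exact wire format is internal to ZnSocket; the sketch below only illustrates the idea behind steps 1-4 in plain Python. The chunk_payload helper, its header layout, and the NUL separator are hypothetical rather than znsocket API; for simplicity it compresses before splitting, and server reassembly and retries are omitted.

import gzip
import json
import uuid

def chunk_payload(message, max_size=1_000_000, compression_threshold=1024):
    """Hypothetical sketch of the chunking decision; not znsocket internals."""
    data = json.dumps(message).encode()  # size detection on the serialized form
    if len(data) >= compression_threshold:
        data = gzip.compress(data)  # compress large payloads to save bandwidth
    if len(data) <= max_size:
        return [data]  # small enough: send as a single message
    message_id = uuid.uuid4().hex  # shared id so the receiver can reassemble
    parts = [data[i:i + max_size] for i in range(0, len(data), max_size)]
    return [
        json.dumps({"id": message_id, "seq": n, "total": len(parts)}).encode()
        + b"\x00" + part  # per-chunk header followed by the payload slice
        for n, part in enumerate(parts)
    ]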
Configuration Options¶
You can configure chunking behavior when creating a client:
from znsocket import Client

# Configure chunking parameters
client = Client.from_url(
    "znsocket://127.0.0.1:5000",
    max_message_size_bytes=500000,  # 500KB limit (default: 1MB or server limit)
    enable_compression=True,        # enable gzip compression (default: True)
    compression_threshold=1024,     # compress messages larger than 1KB (default: 1KB)
)
Configuration Parameters¶
- max_message_size_bytes: maximum size of a single message before chunking (default: server limit or 1MB)
- enable_compression: whether to compress large messages (default: True)
- compression_threshold: minimum size at which compression activates (default: 1024 bytes)
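For example, compression can be disabled entirely, which may help when payloads are already compressed; whether this pays off depends on your data:

from znsocket import Client

# Turn off gzip compression, e.g. for payloads that are already compressed
client = Client.from_url(
    "znsocket://127.0.0.1:5000",
    enable_compression=False,
)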
Example Usage¶
Large Data Storage¶
import numpy as np
from znsocket import Client, Dict
# Connect with chunking enabled
client = Client.from_url("znsocket://127.0.0.1:5000")
# Create a large dataset
large_data = np.random.rand(1000, 1000) # ~8MB array
# Store the data - chunking happens automatically
data_dict = Dict(r=client, key="large_dataset")
data_dict["array"] = large_data # Automatically chunked and compressed
# Retrieve the data - chunks are automatically reassembled
retrieved_data = data_dict["array"]
Working with Lists¶
from znsocket import Client, List
import numpy as np

client = Client.from_url("znsocket://127.0.0.1:5000")
large_list = List(r=client, key="large_data_list")

# Add large items - each will be chunked if necessary
for i in range(10):
    large_array = np.random.rand(500, 500)  # ~2MB each
    large_list.append(large_array)  # automatically chunked
Pipeline Operations¶
Chunking also works seamlessly with pipeline operations:
from znsocket import Client, Dict
import numpy as np

client = Client.from_url("znsocket://127.0.0.1:5000")

# Use a pipeline for batch operations
pipeline = client.pipeline()

# Large data operations in the pipeline
for i in range(5):
    data_dict = Dict(r=pipeline, key=f"dataset_{i}")
    large_data = np.random.rand(800, 800)  # ~5MB each
    data_dict["array"] = large_data  # will be chunked during execution

# Execute all operations - chunking is handled automatically
pipeline.execute()
Performance Considerations¶
Chunk Size Optimization¶
The optimal chunk size depends on your network conditions:
- Fast, reliable networks: larger chunks (1-5MB) for better throughput
- Slow or unreliable networks: smaller chunks (100-500KB) for better reliability
- High-latency networks: moderate chunks (500KB-1MB) to balance overhead and reliability
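For example, a client on a slow or flaky connection could use a smaller limit; the value below is an illustrative starting point, not a tuned recommendation:

from znsocket import Client

# Favor reliability over throughput on an unreliable network
client = Client.from_url(
    "znsocket://127.0.0.1:5000",
    max_message_size_bytes=250_000,  # ~250KB per message before chunking
)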
Monitoring Chunking¶
You can monitor chunking activity by enabling debug logging:
import logging
# Enable debug logging for chunking
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("znsocket.client")
logger.setLevel(logging.DEBUG)
This will show messages like:
DEBUG:znsocket.client:Message size (8,000,000 bytes) exceeds limit (1,000,000 bytes). Using chunked transmission.
DEBUG:znsocket.client:Splitting message into 8 chunks
DEBUG:znsocket.client:Sent chunk 1/8 for message abc123
Troubleshooting¶
Common Issues¶
- Messages timing out
  - Reduce the chunk size via max_message_size_bytes
  - Increase client timeout settings
  - Check network stability
- High memory usage
  - Reduce data size before transmission
  - Process data in smaller batches (see the sketch after this list)
  - Consider alternative storage mechanisms for very large datasets
- Slow transmission
  - Enable compression if it is not already active
  - Increase the chunk size on stable networks
  - Use pipeline operations for batch transfers
- Chunks failing to reassemble
  - Check the server logs for error messages
  - Verify network stability
  - Ensure the server has sufficient memory
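As one mitigation for the high-memory case above, a large array can be stored as independent row blocks so that no single message (and no single reassembly buffer) holds the full dataset; the block size and key naming below are arbitrary choices:

import numpy as np
from znsocket import Client, Dict

client = Client.from_url("znsocket://127.0.0.1:5000")
data_dict = Dict(r=client, key="large_dataset_blocks")

big = np.random.rand(10_000, 1_000)  # ~80MB if sent as a single value

# Store row blocks under separate keys instead of one huge value
block_rows = 1_000
for start in range(0, big.shape[0], block_rows):
    data_dict[f"rows_{start}"] = big[start:start + block_rows]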