SplitTracr is a framework for distributed neural network inference that enables controlled partitioning of deep learning models across multiple devices. It provides per-layer performance metrics collection and network communication primitives for split computing research and experimentation.
Warning
tracr is currently an experimental framework intended to explore distributed AI inference patterns. While functional, it is primarily for research and educational purposes. The pickle module is used for compression and decompression, as each device is trusted. These pickle calls are internally flagged for conversion to safe functions in the future.
┌───────────────────────────────────────────────────────────────────────────────┐
│ SETUP PHASE │
├─────────────────────────┬───────────────────────────────┬─────────────────────┤
│ 1. Repository │ 2. Configuration │ 3. SSH Setup │
│ │ │ │
│ git clone │ cp devices_template.yaml │ ssh-keygen │
│ cd tracr │ devices_config.yaml │ ssh-copy-id │
│ python -m venv │ │ chmod 600 keys │
│ pip install │ Edit IP/user settings │ │
└─────────────────────────┴───────────────────────────────┴─────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────────────────┐
│ EXECUTION PHASE │
├─────────────────────────┬───────────────────────────────┬─────────────────────┤
│ 4. Server │ 5. Host Execution │ 6. Analysis │
│ │ │ │
│ python server.py │ python host.py │ Review metrics in │
│ │ --config config/NAME.yaml │ results directory │
│ │ │ │
└─────────────────────────┴───────────────────────────────┴─────────────────────┘
- Two networked devices: Server (higher compute capability) and Host/Edge device
- SSH access between devices: For secure communication and file transfer
- Python 3.10+: With required dependencies on both devices
- CUDA support: Recommended on server device for accelerated processing
- Clone and install dependencies on both devices
  git clone https://github.com/nbovee/tracr.git && cd tracr
  python3 -m venv venv && source venv/bin/activate
  pip install -r requirements.txt
  # alternatively, use the requirements-cu###.txt file for your CUDA version.
- Configure devices by copying and editing the template
  cp config/devices_template.yaml config/devices_config.yaml
  # Edit devices_config.yaml with proper IP addresses and credentials
- Set up SSH keys for secure communication
  mkdir -p config/pkeys/
  # Generate and deploy keys on both devices
  ssh-keygen -t rsa -b 4096 -f ~/.ssh/device_key
  ssh-copy-id -i ~/.ssh/device_key.pub user@other_device_ip
  cp ~/.ssh/device_key config/pkeys/keyname.rsa
  chmod 600 config/pkeys/*.rsa
- Execute the experiment
  # On Server - must start first
  python server.py
  # On Host/Edge device
  python host.py --config config/alexnetsplit.yaml
- Architecture Overview
- Technical Components
- Prerequisites
- Detailed Setup
- Running Experiments
- Extending SplitTracr
- Performance Optimization
- Troubleshooting
- License and Citation
SplitTracr implements a distributed neural network execution architecture with a host-server paradigm:
┌──────────────────┐ ┌──────────────────┐
│ │ │ │
│ Host (Edge) │ │ Server (Cloud) │
│ │ │ │
└──────┬───────────┘ └──────┬───────────┘
│ │
│ 1. Load configuration │ 1. Listen for connections
│ 2. Initialize model │ 2. Initialize matching model
│ 3. Process input to split layer │ 3. Wait for tensor data
│ │
│ Intermediate Tensor │
│ ─────────────────────────────────────────────► │
│ │
│ │ 4. Process from split layer
│ │ 5. Generate results
│ │
│ Results Tensor │
│ ◄───────────────────────────────────────────── │
│ │
│ 7. Post-process output │
│ 8. Generate visualization │
│ │
The framework comprises three core technical components:
- Model Hooking System: Fine-grained instrumentation of neural network execution
- Tensor Sharing Pipeline: Efficient transmission of intermediate model outputs
- Metrics Collection Framework: Comprehensive performance data acquisition
The hooking system instruments neural networks at the layer level, providing:
- Layer-specific interception: Pre-hooks and post-hooks at each layer boundary
- Dual execution modes: Edge mode (start→split) and Server mode (split→end)
- Early termination mechanism: Controlled execution cessation via hook exceptions
- Granular metrics: Timing, energy, and memory data captured per layer
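The hooking ideas above can be illustrated without any deep learning library. The following is a framework-agnostic sketch, not SplitTracr's actual code: `HookedPipeline`, `StopAtSplit`, and the toy layers are hypothetical names, and plain Python callables stand in for network layers. It shows pre-hooks and post-hooks timing each layer, and an exception raised from a post-hook ending edge-mode execution at the split layer.

```python
import time
from typing import Any, Callable, Dict, List


class StopAtSplit(Exception):
    """Raised by a post-hook to end edge-mode execution early (hypothetical name)."""

    def __init__(self, output: Any) -> None:
        super().__init__("split layer reached")
        self.output = output


class HookedPipeline:
    """Runs callables in order, firing a pre-hook and a post-hook around each one."""

    def __init__(self, layers: List[Callable[[Any], Any]], split_layer: int) -> None:
        self.layers = layers
        self.split_layer = split_layer
        self.timings: Dict[int, float] = {}  # layer index -> seconds
        self._start = 0.0

    def _pre_hook(self, idx: int) -> None:
        # Record the start time just before the layer runs.
        self._start = time.perf_counter()

    def _post_hook(self, idx: int, output: Any, edge_mode: bool) -> None:
        # Store the layer latency, then stop if this is the split layer.
        self.timings[idx] = time.perf_counter() - self._start
        if edge_mode and idx == self.split_layer:
            raise StopAtSplit(output)

    def run_edge(self, x: Any) -> Any:
        """Edge mode (start -> split): returns the intermediate output."""
        try:
            for idx, layer in enumerate(self.layers):
                self._pre_hook(idx)
                x = layer(x)
                self._post_hook(idx, x, edge_mode=True)
        except StopAtSplit as stop:
            return stop.output
        return x

    def run_server(self, x: Any) -> Any:
        """Server mode (split -> end): resumes after the split layer."""
        for idx in range(self.split_layer + 1, len(self.layers)):
            self._pre_hook(idx)
            x = self.layers[idx](x)
            self._post_hook(idx, x, edge_mode=False)
        return x


# Toy "layers" operating on a number instead of a tensor.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * 10]
edge = HookedPipeline(layers, split_layer=1)
server = HookedPipeline(layers, split_layer=1)

intermediate = edge.run_edge(5)            # runs layers 0 and 1 only
result = server.run_server(intermediate)   # runs layers 2 and 3
```

In a real PyTorch model the same pattern would typically be attached through the module hook API rather than an explicit loop; the loop here only keeps the sketch dependency-free.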
The tensor sharing pipeline implements a robust protocol for intermediate tensor transmission:
┌───────────────────┐ ┌───────────────────┐
│ Edge Device │ │ Server │
└────────┬──────────┘ └────────┬──────────┘
│ │
│ 1. Prepare tensor with metadata │
│ │
│ 2. Compress tensor │
│ - Serialization │
│ - Blosc compression │
│ │
│ 3. Encrypt compressed tensor (optional) │
│ - AES-GCM with nonce │
│ │
│ 4. Send tensor │
│ ─────────────────────────────────────────────► │
│ │
│ │ 5. Decrypt received tensor
│ │
│ │ 6. Decompress tensor
│ │ - Blosc decompression
│ │ - Deserialization
│ │
│ │ 7. Process tensor from
│ │ split layer to output
│ │
│ 8. Return result │
│ ◄─────────────────────────────────────────────── │
│ │
│ 9. Decrypt/decompress result │
│ │
│ 10. Final processing │
│ │
Key features include:
- Length-prefixed framing: Robust message boundary handling
- Configurable compression: ZSTD, LZ4, or BLOSCLZ with tensor-optimized filters
- Secure transmission (future work): Optional AES-GCM encryption
- Large tensor management: Chunked transfer for tensors exceeding buffer limits
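Length-prefixed framing and chunked transfer can be sketched with the standard library alone. This is an illustration under stated assumptions, not the framework's wire format: the 4-byte big-endian header, the `CHUNK_SIZE` value, and the function names are hypothetical, and `zlib` is swapped in for Blosc so the example has no third-party dependencies.

```python
import socket
import struct
import threading
import zlib

HEADER = struct.Struct("!I")  # assumed 4-byte big-endian length prefix
CHUNK_SIZE = 4096             # illustrative buffer limit


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Compress the payload, send its length, then send the body in chunks."""
    compressed = zlib.compress(payload, level=3)  # zlib stands in for Blosc
    sock.sendall(HEADER.pack(len(compressed)))
    for start in range(0, len(compressed), CHUNK_SIZE):
        sock.sendall(compressed[start:start + CHUNK_SIZE])


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return partial data."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(CHUNK_SIZE, n - len(buf)))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> bytes:
    """Read the length prefix, then the compressed body, and decompress it."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return zlib.decompress(_recv_exact(sock, length))


# Local round trip over a connected socket pair; the bytes below stand in
# for a serialized intermediate tensor.
edge_sock, server_sock = socket.socketpair()
payload = bytes(range(256)) * 64
sender = threading.Thread(target=send_message, args=(edge_sock, payload))
sender.start()
received = recv_message(server_sock)
sender.join()
edge_sock.close()
server_sock.close()
```

The length prefix tells the receiver where one message ends, so several tensors can share a stream without delimiter scanning.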
The metrics framework captures comprehensive performance data:
Metric Type | Measurements |
---|---|
Timing | Per-layer latency, network transfer time, end-to-end latency |
Energy | Power consumption, energy efficiency, communication energy cost |
Memory | Peak utilization, tensor dimensions, bandwidth requirements |
Network | Data volume, compression efficacy, throughput metrics |
Hardware | Processor utilization, thermal characteristics, clock frequency |
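Several of the network and timing figures in the table are simple ratios of measured quantities. The sketch below shows one way they might be derived; `TransferRecord` and its field names are hypothetical and are not taken from SplitTracr's metrics code, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TransferRecord:
    """One tensor transfer (hypothetical record layout)."""

    raw_bytes: int            # tensor size before compression
    sent_bytes: int           # bytes actually placed on the wire
    transfer_seconds: float   # measured network transfer time

    @property
    def compression_ratio(self) -> float:
        # Values above 1.0 mean compression reduced the payload.
        return self.raw_bytes / self.sent_bytes

    @property
    def throughput_mbps(self) -> float:
        # Megabits per second of compressed data.
        return self.sent_bytes * 8 / self.transfer_seconds / 1e6


def end_to_end_latency(edge_layers_s: Sequence[float],
                       server_layers_s: Sequence[float],
                       transfer_s: float) -> float:
    """Sum per-layer latencies on both devices plus the network transfer time."""
    return sum(edge_layers_s) + transfer_s + sum(server_layers_s)


record = TransferRecord(raw_bytes=4_000_000, sent_bytes=1_000_000, transfer_seconds=0.5)
total = end_to_end_latency([0.25, 0.25], [0.125, 0.125], record.transfer_seconds)
```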
- Python 3.10+
- SSH client/server (openssh-client/openssh-server)
- CUDA toolkit (recommended for server)
Linux (Debian/Ubuntu):
sudo apt update && sudo apt install -y openssh-server openssh-client
# CUDA: https://developer.nvidia.com/cuda-downloads
Windows (PowerShell):
# Enable OpenSSH in Settings > Optional Features
# OR
Add-WindowsCapability -Online -Name OpenSSH.Client~~~~0.0.1.0
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
# WSL2 if needed
wsl --install
tracr/
├── config/ # Configuration files
│ ├── pkeys/ # SSH keys directory
│ ├── devices_config.yaml
│ └── *split.yaml # Model configurations
├── data/ # Dataset storage
├── src/ # Source code
│ ├── api/ # Core API components
│ ├── experiment_design/ # Experiment implementations
│ └── utils/ # Utility functions
├── host.py # Host device entry point
└── server.py # Server entry point
- Initialize server:
python server.py
- Execute on host:
python host.py -c config/alexnetsplit.yaml
For single-device testing:
python server.py -l -c config/alexnetsplit.yaml
SplitTracr includes optimized configurations for:
Classification:
- AlexNet (alexnetsplit.yaml)
- ResNet (resnetsplit.yaml)
- VGG (vggsplit.yaml)
- EfficientNet (efficientnet_split.yaml)
- MobileNet (mobilenetsplit.yaml)
Object Detection:
- YOLOv8 (yolov8split.yaml)
- YOLOv5 (yolov5split.yaml)
Register custom models using the decorator pattern:
from typing import Any, Dict

import torch.nn as nn
from torch import Tensor

from experiment_design.models.registry import ModelRegistry


@ModelRegistry.register("my_custom_model")
class MyCustomModel(nn.Module):
    def __init__(self, model_config: Dict[str, Any], **kwargs) -> None:
        super().__init__()
        self.model = nn.Sequential(
            # Model architecture
        )

    def forward(self, x: Tensor) -> Tensor:
        return self.model(x)
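For readers unfamiliar with the decorator pattern, a registry of this kind is often a dictionary populated by a class decorator. The following is a generic sketch of that idea only; `SketchRegistry`, its `get` method, and `ToyModel` are hypothetical names and do not describe how SplitTracr's `ModelRegistry` is implemented.

```python
from typing import Callable, Dict, Type


class SketchRegistry:
    """Maps string names to classes (hypothetical stand-in for a model registry)."""

    _models: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(model_cls: Type) -> Type:
            if name in cls._models:
                raise ValueError(f"model '{name}' is already registered")
            cls._models[name] = model_cls
            return model_cls  # the class itself is returned unchanged
        return decorator

    @classmethod
    def get(cls, name: str) -> Type:
        """Look up a registered class by name."""
        if name not in cls._models:
            raise KeyError(f"unknown model '{name}'")
        return cls._models[name]


@SketchRegistry.register("toy_model")
class ToyModel:
    def __init__(self, model_config: dict) -> None:
        self.config = model_config


# A configuration file can then select a model by its string name.
model = SketchRegistry.get("toy_model")({"split_layer": 2})
```

The benefit of this design is that adding a model needs no edits to a central lookup table; importing the module that defines the class is enough to register it.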
Create dataset classes in the appropriate module:
from experiment_design.datasets.base import BaseDataset


class MyDataset(BaseDataset):
    def __init__(self, root):
        super().__init__(root)
        # Dataset initialization
Optimize tensor transmission with compression parameters:
compression:
  clevel: 3          # Compression level (1-9)
  filter: "SHUFFLE"  # Filter optimized for tensors
  codec: "ZSTD"      # Compression algorithm
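A small check on these settings before an experiment starts can catch typos early. The sketch below is an assumption about how such a check might look, not part of SplitTracr: `validate_compression` is a hypothetical helper, the defaults mirror the example above, and the allowed codecs are the three named earlier in this README.

```python
ALLOWED_CODECS = {"ZSTD", "LZ4", "BLOSCLZ"}  # codecs named in this README


def validate_compression(cfg: dict) -> dict:
    """Return a normalized copy of the compression settings or raise ValueError."""
    clevel = cfg.get("clevel", 3)
    codec = str(cfg.get("codec", "ZSTD")).upper()
    filter_name = str(cfg.get("filter", "SHUFFLE")).upper()

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(clevel, bool) or not isinstance(clevel, int) or not 1 <= clevel <= 9:
        raise ValueError(f"clevel must be an integer from 1 to 9, got {clevel!r}")
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"codec must be one of {sorted(ALLOWED_CODECS)}, got {codec!r}")

    return {"clevel": clevel, "filter": filter_name, "codec": codec}


settings = validate_compression({"clevel": 3, "filter": "SHUFFLE", "codec": "zstd"})
```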
Select optimal split points based on:
- Computational equilibrium: Balance processing loads between devices
- Tensor dimensionality: Minimize intermediate tensor size
- Layer characteristics: Avoid splitting recursive or residual blocks
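These criteria can be combined into a rough latency estimate per candidate split. The sketch below is a simplified model, not SplitTracr's selection logic: `best_split` is a hypothetical helper, the per-layer timings and tensor sizes are invented for illustration, and the cost of returning the result is ignored.

```python
from typing import FrozenSet, Sequence, Tuple


def best_split(edge_ms: Sequence[float],
               server_ms: Sequence[float],
               output_bytes: Sequence[int],
               bandwidth_bytes_per_ms: float,
               blocked: FrozenSet[int] = frozenset()) -> Tuple[int, float]:
    """Return (split index, estimated latency in ms).

    Layers 0..s run on the edge and layers s+1.. run on the server.
    Indices in `blocked` (e.g. layers inside a residual block) are skipped.
    """
    best_idx, best_cost = -1, float("inf")
    for s in range(len(edge_ms) - 1):  # keep at least one layer on the server
        if s in blocked:
            continue
        cost = (sum(edge_ms[:s + 1])                          # edge compute
                + output_bytes[s] / bandwidth_bytes_per_ms    # tensor transfer
                + sum(server_ms[s + 1:]))                     # server compute
        if cost < best_cost:
            best_idx, best_cost = s, cost
    return best_idx, best_cost


# Illustrative numbers only: a slow edge device, a fast server, and
# intermediate tensors that shrink with depth.
edge_ms = [4.0, 6.0, 8.0, 10.0]
server_ms = [1.0, 1.5, 2.0, 2.5]
output_bytes = [800_000, 200_000, 50_000, 4_000]

split, latency = best_split(edge_ms, server_ms, output_bytes,
                            bandwidth_bytes_per_ms=100_000)
```

With these numbers, splitting after layer 1 wins: a later split shrinks the tensor further but pays more edge compute than it saves in transfer time.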
Unit Testing
- If issues arise, the provided unit tests may give some insight into the error. Run the following, then narrow to individual test files for further detail:
  python -m unittest discover -s ./tests
  or, if using uv:
  uv run -m unittest discover -s ./tests
- SSH Key Configuration:
  - Verify permissions: ls -l config/pkeys/*.rsa
  - Test connectivity: ssh -i config/pkeys/key.rsa user@host
  - Check SSH daemon: systemctl status sshd
- Network Configuration:
  - Verify network connectivity between devices
  - Ensure ports are open in firewall settings
  - Check for IP address conflicts
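A quick way to test whether a port is reachable from the other device is a plain TCP connection attempt. The helper below is a generic sketch; `port_is_open` is a hypothetical name, and the demonstration uses a local listener because the ports SplitTracr uses depend on your configuration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Demonstration against a temporary local listener on an OS-assigned port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

reachable = port_is_open("127.0.0.1", port)
listener.close()
```

Running the same function from the host against the server's address and configured port distinguishes a firewall or routing problem from an application-level one.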
Model Execution Problems
- Split Layer Configuration:
  - Ensure split_layer < model depth
  - Verify layer compatibility for splitting
  - Check memory requirements for selected split
- Dataset Configuration:
  - Confirm path accuracy in configuration
  - Verify data format compatibility
  - Check permissions on data directories
Performance Optimization
- Resource Monitoring:
  - GPU monitoring: nvidia-smi -l 1
  - CPU utilization: top or htop
  - Network throughput: iftop
- Optimization Strategies:
  - Adjust batch size for memory constraints
  - Modify worker thread count
  - Experiment with different split points
This project is licensed under the MIT License - see LICENSE for details.
@inproceedings{bovee2025splitracr,
author = {Bovee, Nicholas and Ali, Izhar and Patapanchala, Gopi and Bitla, Suraj and Ho, Shen Shyang},
title = {SplitTracr: A Flexible Performance Evaluation Tool for Cooperative Inference and Split Computing},
booktitle = {Proceedings of the International Conference on Performance Engineering (ICPE)},
year = {2025},
address = {Toronto, Canada},
month = {May},
publisher = {ACM/SPEC}
}
- Nicholas Bovee, Izhar Ali, Gopi Patapanchala, Suraj Bitla, and Shen Shyang Ho, "SplitTracr: A Flexible Performance Evaluation Tool for Cooperative Inference and Split Computing," International Conference on Performance Engineering (ICPE), Toronto, Canada, May 5-9, 2025.
- Shen-Shyang Ho, Paolo Rommel Sanchez, Nicholas Bovee, Suraj Bitla, Gopi Krishna Patapanchala and Stephen Piccolo, "Poster: Computation Offloading for Precision Agriculture using Cooperative Inference," 8th IEEE International Conference on Fog and Edge Computing (ICFEC 2024), Philadelphia, PA, May 6-9, 2024.
- Nicholas Bovee, Stephen Piccolo, Suraj Bitla, Gopi Krishna Patapanchala and Shen-Shyang Ho, "Poster: SplitTracer: A Cooperative Inference Evaluation Toolkit for Computation Offloading on the Edge," 8th IEEE International Conference on Fog and Edge Computing (ICFEC 2024), Philadelphia, PA, May 6-9, 2024.
- Nicholas Bovee, Stephen Piccolo, Shen Shyang Ho, and Ning Wang, "Experimental test-bed for Computation Offloading for Cooperative Inference on Edge Devices," EdgeComm: The Fourth Workshop on Edge Computing and Communications (at ACM/IEEE Symposium on Edge Computing), December 9, 2023, Wilmington, DE.