quickcdc-cuda is a fast content-defined chunker for &[u8] slices with CUDA acceleration.
- For background, see AE: An Asymmetric Extremum Content Defined Chunking Algorithm by Yucheng Zhang et al.
- Modifications:
- Users may provide a salt, introducing cutpoint variation (files re-processed with different salt values produce different cutpoints).
- Warp forward (reduced window size): skips unnecessary processing before the minimum chunk size is reached.
- CUDA acceleration for parallel processing of large data sets.
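The core AE cutpoint rule can be sketched as follows. This is a simplified illustration of the idea from the paper, not this crate's actual implementation (which adds salting, warp forward, and CUDA): a position becomes a cut point when a local extremum stays maximal for a full window of subsequent bytes.

```rust
/// Simplified AE (Asymmetric Extremum) cutpoint sketch: a cut is placed
/// `window` bytes after a byte value that remains the running maximum.
fn ae_cutpoints(data: &[u8], window: usize) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut max_pos = start; // position of the current extremum
        let mut cut = None;
        let mut i = start + 1;
        while i < data.len() {
            if data[i] > data[max_pos] {
                max_pos = i; // new extremum; restart the window
            } else if i - max_pos >= window {
                cut = Some(i); // extremum survived a full window: cut here
                break;
            }
            i += 1;
        }
        match cut {
            Some(c) => {
                cuts.push(c);
                start = c; // continue scanning after the cut
            }
            None => break, // tail shorter than a window: no further cuts
        }
    }
    cuts
}

fn main() {
    // The extremum 5 at index 0 survives 3 bytes (cut at 3); the
    // extremum 9 at index 5 survives 3 bytes (cut at 8).
    let cuts = ae_cutpoints(&[5, 1, 1, 1, 1, 9, 1, 1, 1, 1], 3);
    println!("{:?}", cuts); // [3, 8]
}
```

Because the decision depends only on a small window of local byte values, the rule needs no backward scan on a cut, which is what makes it amenable to the parallel processing described below.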
This implementation leverages CUDA for parallel processing, which can significantly improve performance on systems with NVIDIA GPUs. A CPU fallback implementation is provided for systems without CUDA support.
- Original quickcdc: James Howard jrobhoward@gmail.com
- CUDA Implementation: Sayantan Das sdas.codes@gmail.com
The CUDA-accelerated version can deliver significant speedups over the CPU-only version, especially for large datasets; actual performance depends on your GPU hardware.
In our testing, the CUDA implementation showed:
- 2-5x speedup for files larger than 100MB
- Best performance with chunk sizes between 64KB and 256KB
- Diminishing returns for very small files due to GPU data transfer overhead
- Rust 2021 edition or later
- CUDA toolkit (for CUDA acceleration)
- NVIDIA GPU with CUDA support (for CUDA acceleration)
- Install the CUDA toolkit from NVIDIA's website: https://developer.nvidia.com/cuda-downloads
- Make sure the CUDA toolkit is in your PATH
- Set the CUDA_PATH environment variable to your CUDA installation directory
For Ubuntu/Debian:

```shell
sudo apt-get install nvidia-cuda-toolkit
# NVIDIA's own installer uses /usr/local/cuda; the Debian/Ubuntu
# package installs under /usr, so adjust CUDA_PATH accordingly
export CUDA_PATH=/usr/local/cuda
```
For macOS:
NVIDIA no longer provides CUDA toolkits for macOS (support ended with CUDA 10.2), and recent Macs do not ship with NVIDIA GPUs. Use the CPU fallback on macOS.
For Windows:

```shell
# Install the CUDA toolkit from the NVIDIA website
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.x
```
```shell
# Clone the repository
git clone https://github.com/yourusername/quickcdc-cuda.git
cd quickcdc-cuda

# Build the project
cargo build --release
```
```rust
use quickcdc_cuda;
use rand::Rng;

// Initialize CUDA (only needed once per application)
quickcdc_cuda::Chunker::init_cuda().unwrap();

let mut rng = rand::thread_rng();
let mut sample = [0u8; 1024];
rng.fill(&mut sample[..]);

let target_size = 64;
let max_chunksize = 128;
let salt = 15222894464462204665;

// Use CUDA-accelerated version
let chunker = quickcdc_cuda::Chunker::with_cuda(&sample[..], target_size, max_chunksize, salt).unwrap();
for x in chunker {
    println!("{}", x.len());
}

// Or use CPU version
let chunker = quickcdc_cuda::Chunker::with_params(&sample[..], target_size, max_chunksize, salt).unwrap();
for x in chunker {
    println!("{}", x.len());
}
```
```rust
use quickcdc_cuda;
use std::fs::File;
use std::io::Read;

// Initialize CUDA
quickcdc_cuda::Chunker::init_cuda().unwrap();

// Read a file
let mut file = File::open("large_file.bin").unwrap();
let mut buffer = Vec::new();
file.read_to_end(&mut buffer).unwrap();

// Process with CUDA
let target_size = 128 * 1024; // 128KB target chunk size
let max_size = 512 * 1024; // 512KB maximum chunk size
let salt = quickcdc_cuda::Chunker::get_random_salt();
let chunker = quickcdc_cuda::Chunker::with_cuda(&buffer, target_size, max_size, salt).unwrap();

// Process chunks
for (i, chunk) in chunker.enumerate() {
    println!("Chunk {}: {} bytes", i, chunk.len());
    // Process chunk data...
}
```
The project includes a command-line example that can process directories of files:
```shell
# CPU version
cargo run --release --example chunkdir_cuda -- /path/to/directory

# CUDA version
cargo run --release --example chunkdir_cuda -- /path/to/directory --cuda
```
- For general purpose use, a target chunk size of 64KB-128KB works well
- For large files (>1GB), larger chunk sizes (256KB-512KB) may improve performance
- For small files (<10MB), smaller chunk sizes (16KB-32KB) may be more appropriate
- The CUDA implementation performs best with large datasets
- For small files, the CPU implementation may be faster due to GPU data transfer overhead
- If processing many small files, consider batching them together before processing
- The CUDA implementation requires additional memory for GPU buffers
- For very large files, ensure your GPU has sufficient memory
- If processing files larger than GPU memory, consider chunking the file first
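The batching tip above can be sketched with a small helper. This is a hypothetical function, not part of the crate's API: it concatenates many small inputs into one buffer so a single chunking pass can amortize the GPU transfer overhead, while recording per-input offsets so results can be mapped back to their source.

```rust
/// Hypothetical batching helper (not part of quickcdc-cuda):
/// concatenates many small inputs into one buffer for a single
/// chunking pass. `offsets[i]` is where input `i` begins in the
/// combined buffer, so chunk positions can be mapped back per input.
fn batch_inputs(inputs: &[Vec<u8>]) -> (Vec<u8>, Vec<usize>) {
    let mut combined = Vec::new();
    let mut offsets = Vec::with_capacity(inputs.len());
    for input in inputs {
        offsets.push(combined.len());
        combined.extend_from_slice(input);
    }
    (combined, offsets)
}

fn main() {
    let files = vec![vec![1u8, 2], vec![3, 4, 5], vec![6]];
    let (combined, offsets) = batch_inputs(&files);
    println!("{:?} {:?}", combined, offsets); // [1, 2, 3, 4, 5, 6] [0, 2, 5]
}
```

Note that concatenation changes cutpoints near file joins, so this trade-off suits workloads where throughput matters more than exact per-file boundaries.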
The CUDA implementation parallelizes the chunking process by:
- Transferring the input data to the GPU
- Running a CUDA kernel that identifies potential chunk boundaries in parallel
- Collecting the results and sorting them to ensure correct order
- Iterating through the pre-computed boundaries when chunks are requested
This approach is particularly effective for large files where the overhead of GPU data transfer is outweighed by the parallel processing benefits.
The CUDA kernel divides the input data into blocks and processes them in parallel:
- Each thread examines a window of bytes to find potential chunk boundaries
- The kernel uses atomic operations to collect the results
- The host code sorts the boundaries to ensure correct ordering
- The chunker iterator uses these pre-computed boundaries to yield chunks
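The block-parallel strategy above can be mimicked on the CPU with scoped threads. This is an illustrative sketch, not the actual CUDA kernel, and it uses a toy boundary predicate (byte == 0xFF) in place of the real boundary test:

```rust
use std::thread;

/// CPU sketch of the kernel strategy: split the input into blocks,
/// scan each block for candidate boundaries in parallel, then sort
/// the merged results on the host to restore global order.
/// The `byte == 0xFF` predicate is a stand-in for the real test.
fn parallel_boundaries(data: &[u8], num_blocks: usize) -> Vec<usize> {
    let block_len = (data.len() + num_blocks - 1) / num_blocks;
    let mut found: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(block_len.max(1))
            .enumerate()
            .map(|(b, block)| {
                s.spawn(move || {
                    let base = b * block_len; // global offset of this block
                    block
                        .iter()
                        .enumerate()
                        .filter(|&(_, &byte)| byte == 0xFF)
                        .map(|(i, _)| base + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Merge per-block results (arrival order is nondeterministic
        // within a block's worth of results, hence the sort below).
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    found.sort_unstable(); // host-side sort restores global ordering
    found
}

fn main() {
    let data = [0u8, 0xFF, 7, 0xFF, 2, 0xFF];
    println!("{:?}", parallel_boundaries(&data, 2)); // [1, 3, 5]
}
```

The actual kernel replaces the per-block vectors with atomic appends into a shared result buffer, but the structure — independent block scans followed by a host-side sort — is the same.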
- Ensure CUDA toolkit is properly installed
- Check that your GPU supports CUDA
- Verify that CUDA_PATH environment variable is set correctly
- Make sure you have the correct CUDA toolkit version
- Check that your Rust toolchain supports the 2021 edition
- Ensure all dependencies are installed
- Try different chunk sizes
- Check GPU utilization with nvidia-smi
- For large files, ensure your GPU has sufficient memory
quickcdc-cuda is dual licensed under the MIT and Apache 2.0 licenses, the same licenses as the Rust compiler.