ucalyptus/quickcdc.cu

quickcdc-cuda

Summary

quickcdc-cuda is a fast content-defined chunker for &[u8] slices with CUDA acceleration.

  • For background, see "AE: An Asymmetric Extremum Content Defined Chunking Algorithm" by Yucheng Zhang et al.
  • Modification(s):
    • User may provide salt, introducing entropy / cutpoint variation (i.e. files re-processed with different salt values will produce different cutpoints).
    • Warp forward (reduced window size), skipping some unnecessary processing that happens before minimum chunk size is reached.
    • CUDA acceleration for parallel processing of large data sets.
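
To make the list above concrete, here is a minimal CPU-only sketch of an AE-style cut-point search with a salt and a forward skip. It is not the crate's implementation: the function name `ae_cut_point`, its parameters, the XOR-with-salt mixing step, and the exact cut position are all assumptions made for illustration.

```rust
/// Returns the length of the first chunk of `data`, using an AE-style scan.
///
/// Assumptions (not taken from quickcdc-cuda): each byte is mixed with the
/// low byte of `salt` via XOR, `window` is the AE window length, and the scan
/// starts at `skip` to mimic the "warp forward" idea.
fn ae_cut_point(data: &[u8], window: usize, skip: usize, max_size: usize, salt: u64) -> usize {
    let end = data.len().min(max_size);
    if end == 0 {
        return 0;
    }
    let mix = |b: u8| b ^ (salt as u8);
    let start = skip.min(end - 1);
    // Position and value of the largest mixed byte seen so far.
    let (mut max_pos, mut max_val) = (start, mix(data[start]));
    for (i, &b) in data.iter().enumerate().take(end).skip(start + 1) {
        let v = mix(b);
        if v > max_val {
            // A new maximum restarts the window.
            max_val = v;
            max_pos = i;
        } else if i == max_pos + window {
            // `window` bytes passed without a larger value: cut after byte `i`.
            return i + 1;
        }
    }
    // No cut point found before the size limit or the end of the input.
    end
}
```

Calling the function repeatedly on the remaining slice yields successive chunks. Because the salt changes how byte values compare, the same input can produce different cut points under different salts.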

This implementation leverages CUDA for parallel processing, which can significantly improve performance on systems with NVIDIA GPUs. For systems without CUDA support, a CPU fallback implementation is provided.

Performance

The CUDA-accelerated version can provide significant speedups compared to the CPU-only version, especially for large datasets. Performance will vary based on your GPU hardware.

In our testing, the CUDA implementation showed:

  • 2-5x speedup for files larger than 100MB
  • Best performance with chunk sizes between 64KB and 256KB
  • Diminishing returns for very small files due to GPU data transfer overhead
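
Since results depend on hardware, it can help to measure throughput on your own data. The helper below is a rough sketch, not part of the crate: `measure_throughput` is a hypothetical name, and it times any closure that chunks a buffer and returns the chunk count.

```rust
use std::time::Instant;

/// Runs `chunk_fn` over `data` once and returns
/// (number of chunks reported by the closure, throughput in MiB/s).
fn measure_throughput<F>(data: &[u8], chunk_fn: F) -> (usize, f64)
where
    F: FnOnce(&[u8]) -> usize,
{
    let started = Instant::now();
    let chunks = chunk_fn(data);
    // Guard against a zero reading on very fast runs.
    let secs = started.elapsed().as_secs_f64().max(1e-9);
    let mib = data.len() as f64 / (1024.0 * 1024.0);
    (chunks, mib / secs)
}
```

To compare the two code paths, pass one closure that counts the chunks from `Chunker::with_params(...)` and another that counts the chunks from `Chunker::with_cuda(...)`. A single run is noisy, so repeat the measurement and include the file sizes you actually expect.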

Requirements

  • Rust 1.56 or later (the first release that supports the 2021 edition)
  • CUDA toolkit (for CUDA acceleration)
  • NVIDIA GPU with CUDA support (for CUDA acceleration)

Installation

Setting up CUDA

  1. Install the CUDA toolkit from NVIDIA's website: https://developer.nvidia.com/cuda-downloads
  2. Make sure the CUDA toolkit is in your PATH
  3. Set the CUDA_PATH environment variable to your CUDA installation directory

For Ubuntu/Debian:

sudo apt-get install nvidia-cuda-toolkit
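# The distribution package typically installs under /usr/lib/cuda; NVIDIA's own
# installer uses /usr/local/cuda. Point CUDA_PATH at whichever exists.
export CUDA_PATH=/usr/lib/cuda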

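For macOS:

NVIDIA no longer ships the CUDA toolkit for macOS (CUDA 10.2 was the last release with macOS support), so CUDA acceleration is not available there; use the CPU fallback instead.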

For Windows:

rem Install CUDA toolkit from NVIDIA website
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.x
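
A quick way to confirm the environment before building is to check that `CUDA_PATH` points at a directory containing `nvcc`. The sketch below uses only the standard library. The function names are hypothetical, and it assumes the toolkit keeps `nvcc` under `bin/`, which may not hold for every installation layout.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Returns the path to `nvcc` under `cuda_root`, if a file exists there.
fn find_nvcc(cuda_root: &Path) -> Option<PathBuf> {
    let exe = if cfg!(windows) { "nvcc.exe" } else { "nvcc" };
    let candidate = cuda_root.join("bin").join(exe);
    candidate.is_file().then_some(candidate)
}

/// Reads CUDA_PATH and reports whether it looks like a toolkit installation.
fn check_cuda_path() -> Result<PathBuf, String> {
    let root = env::var_os("CUDA_PATH").ok_or_else(|| "CUDA_PATH is not set".to_string())?;
    let root = Path::new(&root);
    find_nvcc(root).ok_or_else(|| format!("no nvcc found under {}", root.display()))
}
```

Printing the result of `check_cuda_path()` from a small binary or a build script gives an early, readable error when the variable is unset or points at the wrong directory.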

Building the Project

# Clone the repository
git clone https://github.com/yourusername/quickcdc-cuda.git
cd quickcdc-cuda

# Build the project
cargo build --release

Usage Examples

Basic Example

use rand::Rng;

fn main() {
    // Initialize CUDA (only needed once per application)
    quickcdc_cuda::Chunker::init_cuda().unwrap();

    // Fill a small buffer with random bytes to chunk
    let mut rng = rand::thread_rng();
    let mut sample = [0u8; 1024];
    rng.fill(&mut sample[..]);
    let target_size = 64;
    let max_chunksize = 128;
    let salt = 15222894464462204665;

    // Use CUDA-accelerated version
    let chunker = quickcdc_cuda::Chunker::with_cuda(&sample[..], target_size, max_chunksize, salt).unwrap();
    for x in chunker {
        println!("{}", x.len());
    }

    // Or use CPU version
    let chunker = quickcdc_cuda::Chunker::with_params(&sample[..], target_size, max_chunksize, salt).unwrap();
    for x in chunker {
        println!("{}", x.len());
    }
}

Processing Files

use std::fs::File;
use std::io::Read;

fn main() {
    // Initialize CUDA
    quickcdc_cuda::Chunker::init_cuda().unwrap();

    // Read the whole file into memory
    let mut file = File::open("large_file.bin").unwrap();
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer).unwrap();

    // Process with CUDA
    let target_size = 128 * 1024; // 128 KiB target chunk size
    let max_size = 512 * 1024;    // 512 KiB maximum chunk size
    let salt = quickcdc_cuda::Chunker::get_random_salt();

    let chunker = quickcdc_cuda::Chunker::with_cuda(&buffer, target_size, max_size, salt).unwrap();

    // Process chunks
    for (i, chunk) in chunker.enumerate() {
        println!("Chunk {}: {} bytes", i, chunk.len());
        // Process chunk data...
    }
}
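
One common thing to do in the "process chunk data" step is to count duplicate chunks. The sketch below is an illustration only. `DedupStats` and `dedup_stats` are hypothetical names, and it assumes the chunker yields `&[u8]` slices, as the examples above suggest.

```rust
use std::collections::HashSet;

/// Summary of a deduplication pass over a sequence of chunks.
#[derive(Debug, Default, PartialEq)]
struct DedupStats {
    total_chunks: usize,
    unique_chunks: usize,
    unique_bytes: usize,
}

/// Counts how many chunks are distinct by content.
/// Slices are compared byte for byte, so hash collisions cannot merge chunks.
fn dedup_stats<'a, I>(chunks: I) -> DedupStats
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut seen: HashSet<&'a [u8]> = HashSet::new();
    let mut stats = DedupStats::default();
    for chunk in chunks {
        stats.total_chunks += 1;
        // `insert` returns true only the first time this content is seen.
        if seen.insert(chunk) {
            stats.unique_chunks += 1;
            stats.unique_bytes += chunk.len();
        }
    }
    stats
}
```

Under that assumption, the chunker from the example above could be passed straight to `dedup_stats`. A real deduplicating store would more likely keep a cryptographic digest per chunk than the chunk contents themselves.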

Command-line Example

The project includes a command-line example that can process directories of files:

# CPU version
cargo run --release --example chunkdir_cuda -- /path/to/directory

# CUDA version
cargo run --release --example chunkdir_cuda -- /path/to/directory --cuda

Performance Tuning

Chunk Size Selection

  • For general purpose use, a target chunk size of 64KB-128KB works well
  • For large files (>1GB), larger chunk sizes (256KB-512KB) may improve performance
  • For small files (<10MB), smaller chunk sizes (16KB-32KB) may be more appropriate
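
The guidance above can be turned into a small helper. This is a sketch under assumptions: the function name is hypothetical, the thresholds pick one value from each suggested range, and the 4x ratio between target and maximum size is borrowed from the file-processing example rather than being a documented rule.

```rust
const KIB: usize = 1024;
const MIB: u64 = 1024 * 1024;

/// Suggests (target_size, max_size) in bytes for a file of `file_len` bytes.
fn suggest_chunk_sizes(file_len: u64) -> (usize, usize) {
    let target = if file_len < 10 * MIB {
        32 * KIB // small files: 16-32 KiB range
    } else if file_len > 1024 * MIB {
        256 * KIB // files over 1 GiB: 256-512 KiB range
    } else {
        128 * KIB // general purpose: 64-128 KiB range
    };
    // Assumed ratio: maximum chunk size is four times the target.
    (target, target * 4)
}
```

Treat the result as a starting point and adjust it after measuring on representative data.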

CUDA Optimization

  • The CUDA implementation performs best with large datasets
  • For small files, the CPU implementation may be faster due to GPU data transfer overhead
  • If processing many small files, consider batching them together before processing
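
Batching small inputs could look like the following sketch, which concatenates them into one buffer and remembers where each input landed. `batch_inputs` is a hypothetical helper, not a crate function.

```rust
use std::ops::Range;

/// Concatenates several inputs into one buffer and records the byte range
/// that each input occupies inside it.
fn batch_inputs(inputs: &[&[u8]]) -> (Vec<u8>, Vec<Range<usize>>) {
    let total: usize = inputs.iter().map(|b| b.len()).sum();
    let mut batch = Vec::with_capacity(total);
    let mut ranges = Vec::with_capacity(inputs.len());
    for input in inputs {
        let start = batch.len();
        batch.extend_from_slice(input);
        ranges.push(start..batch.len());
    }
    (batch, ranges)
}
```

Chunks computed over the combined buffer can straddle file boundaries, so a caller that needs per-file chunks would still have to split them at the recorded ranges. Cut points near those boundaries may also differ from the ones produced by chunking each file on its own.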

Memory Usage

  • The CUDA implementation requires additional memory for GPU buffers
  • For very large files, ensure your GPU has sufficient memory
  • If a file is larger than GPU memory, consider splitting it into smaller segments first and processing each segment separately
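
For inputs that do not fit in GPU memory, one option is to walk the buffer in fixed-size segments and chunk each one separately. The helper below is a sketch: the name and the `budget_bytes` parameter are hypothetical, and the README does not state how much memory the GPU buffers need, so the budget has to be chosen conservatively by the caller.

```rust
/// Splits `data` into consecutive segments of at most `budget_bytes` bytes,
/// yielding each segment together with its offset in the original buffer.
fn segments_within_budget(
    data: &[u8],
    budget_bytes: usize,
) -> impl Iterator<Item = (usize, &[u8])> {
    // A zero budget would make `chunks` panic, so clamp it to one byte.
    let budget = budget_bytes.max(1);
    data.chunks(budget)
        .enumerate()
        .map(move |(i, segment)| (i * budget, segment))
}
```

Each segment can then be passed to the chunker on its own. Because a segment edge forces a cut, chunks near segment edges will generally differ from those of a single pass over the whole file.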

How It Works

The CUDA implementation parallelizes the chunking process by:

  1. Transferring the input data to the GPU
  2. Running a CUDA kernel that identifies potential chunk boundaries in parallel
  3. Collecting the results and sorting them to ensure correct order
  4. Iterating through the pre-computed boundaries when chunks are requested

This approach is particularly effective for large files where the overhead of GPU data transfer is outweighed by the parallel processing benefits.
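
Steps 3 and 4 can be sketched on the CPU as an iterator over a sorted list of cut points. The type `BoundaryChunks` below is hypothetical and only illustrates the idea; the crate's own iterator may be structured differently.

```rust
/// Yields chunks of `data` delimited by precomputed cut points.
struct BoundaryChunks<'a> {
    data: &'a [u8],
    cuts: Vec<usize>,
    next_cut: usize,
    pos: usize,
}

impl<'a> BoundaryChunks<'a> {
    /// `cuts` may arrive unordered (as from a parallel kernel); they are
    /// filtered to valid interior offsets, sorted, and deduplicated here.
    fn new(data: &'a [u8], mut cuts: Vec<usize>) -> Self {
        cuts.retain(|&c| c > 0 && c < data.len());
        cuts.sort_unstable();
        cuts.dedup();
        BoundaryChunks { data, cuts, next_cut: 0, pos: 0 }
    }
}

impl<'a> Iterator for BoundaryChunks<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        if self.pos >= self.data.len() {
            return None;
        }
        // Use the next cut point, or the end of the data once cuts run out.
        let end = match self.cuts.get(self.next_cut) {
            Some(&cut) => {
                self.next_cut += 1;
                cut
            }
            None => self.data.len(),
        };
        let chunk = &self.data[self.pos..end];
        self.pos = end;
        Some(chunk)
    }
}
```

Because all boundaries are known up front, iteration itself does no scanning and each call to `next` only slices the input.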

Technical Details

The CUDA kernel divides the input data into blocks and processes them in parallel:

  • Each thread examines a window of bytes to find potential chunk boundaries
  • The kernel uses atomic operations to collect the results
  • The host code sorts the boundaries to ensure correct ordering
  • The chunker iterator uses these pre-computed boundaries to yield chunks
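
The same pattern can be imitated on the CPU with scoped threads standing in for CUDA blocks and an atomic cursor standing in for the kernel's atomic append. This is only an analogy under stated assumptions: the README does not say how the kernel tests a position, so `is_candidate` is a hypothetical stand-in predicate, and `find_candidates_parallel` is not a crate function.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical boundary test: a byte marks a cut when its salted value has
/// all bits under `mask` cleared. The real kernel's test is not documented.
fn is_candidate(byte: u8, salt: u64, mask: u8) -> bool {
    (byte ^ salt as u8) & mask == 0
}

/// Scans `data` in blocks of `block_len` bytes, one thread per block, and
/// returns the sorted cut offsets (each offset is just past a candidate byte).
fn find_candidates_parallel(data: &[u8], block_len: usize, salt: u64, mask: u8) -> Vec<usize> {
    let block_len = block_len.max(1);
    // Worst-case output buffer, filled through a shared atomic cursor.
    let slots: Vec<AtomicUsize> = (0..data.len()).map(|_| AtomicUsize::new(0)).collect();
    let cursor = AtomicUsize::new(0);

    thread::scope(|s| {
        for (block_idx, block) in data.chunks(block_len).enumerate() {
            let (slots, cursor) = (&slots, &cursor);
            s.spawn(move || {
                let base = block_idx * block_len;
                for (i, &b) in block.iter().enumerate() {
                    if is_candidate(b, salt, mask) {
                        // Reserve a unique slot, then write the offset into it.
                        let slot = cursor.fetch_add(1, Ordering::Relaxed);
                        slots[slot].store(base + i + 1, Ordering::Relaxed);
                    }
                }
            });
        }
    });

    // All threads have joined; results arrive in arbitrary order, so sort.
    let found = cursor.load(Ordering::Relaxed);
    let mut cuts: Vec<usize> = slots[..found]
        .iter()
        .map(|slot| slot.load(Ordering::Relaxed))
        .collect();
    cuts.sort_unstable();
    cuts
}
```

The sort is needed because threads claim output slots in whatever order they happen to run, which mirrors why the host code sorts the boundaries. The sketch spawns one thread per block and allocates one slot per input byte, so it suits illustration and small inputs rather than production use.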

Troubleshooting

CUDA Initialization Fails

  • Ensure CUDA toolkit is properly installed
  • Check that your GPU supports CUDA
  • Verify that CUDA_PATH environment variable is set correctly

Build Errors

  • Make sure you have the correct CUDA toolkit version
  • Check that your Rust toolchain supports the 2021 edition (Rust 1.56 or later)
  • Ensure all dependencies are installed

Performance Issues

  • Try different chunk sizes
  • Check GPU utilization with nvidia-smi
  • For large files, ensure your GPU has sufficient memory

License

quickcdc-cuda is dual licensed under the MIT and Apache 2.0 licenses, the same licenses as the Rust compiler.
