Merge pull request #1 from CNugteren/development
Updated to version 2.0
CNugteren committed Jul 13, 2015
2 parents c513e6d + 48ab023 commit ece8586
Showing 7 changed files with 190 additions and 104 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -31,7 +31,7 @@
# CMake project details
cmake_minimum_required(VERSION 2.8.10)
project("Claduc" CXX)
set(Claduc_VERSION_MAJOR 1)
set(Claduc_VERSION_MAJOR 2)
set(Claduc_VERSION_MINOR 0)

# ==================================================================================================
26 changes: 23 additions & 3 deletions doc/api.md
@@ -61,6 +61,22 @@ Retrieves the maximum amount of on-chip scratchpad memory ('local memory') avail
* `std::string Capabilities() const`:
In case of the OpenCL back-end, this returns a list of the OpenCL extensions supported. For CUDA, this returns the device capability (e.g. SM 3.5).

* `size_t CoreClock() const`:
Retrieves the device's core clock frequency in MHz.

* `size_t ComputeUnits() const`:
Retrieves the number of compute units (OpenCL terminology) or multi-processors (CUDA terminology) in the device.

* `size_t MemorySize() const`:
Retrieves the global memory size (CUDA back-end) or the maximum amount of allocatable global memory per allocation (OpenCL back-end).

* `size_t MemoryClock() const`:
Retrieves the device's memory clock frequency in MHz (CUDA back-end) or 0 (OpenCL back-end).

* `size_t MemoryBusWidth() const`:
Retrieves the device's memory bus-width in bits (CUDA back-end) or 0 (OpenCL back-end).


* `bool IsLocalMemoryValid(const size_t local_mem_usage) const`:
Given a requested amount of local on-chip scratchpad memory, this method returns whether or not this is a valid configuration for this particular device.
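The two validity checks can be sketched as standalone functions. This is an illustrative mock, not the Claduc class itself: the device limits (`local_mem_size`, `max_item_sizes`, `max_group_size`) are plain parameters here rather than values queried from an OpenCL/CUDA device, and only the function names mirror the Claduc methods:

```cpp
#include <cstddef>
#include <vector>

// A local-memory request is valid when it fits in the device's scratchpad.
bool IsLocalMemoryValid(std::size_t local_mem_usage, std::size_t local_mem_size) {
  return local_mem_usage <= local_mem_size;
}

// A thread configuration is valid when each dimension respects the device's
// per-dimension limit and the total work-group size respects the overall limit.
bool IsThreadConfigValid(const std::vector<std::size_t> &local,
                         const std::vector<std::size_t> &max_item_sizes,
                         std::size_t max_group_size) {
  auto total = std::size_t{1};
  for (const auto &item : local) { total *= item; }
  for (auto i = std::size_t{0}; i < local.size(); ++i) {
    if (local[i] > max_item_sizes[i]) { return false; }
  }
  return total <= max_group_size;
}
```

For example, a 16x16 work-group (256 threads) is valid under a 1024-thread limit, while 64x32 (2048 threads) is not.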

@@ -87,8 +103,8 @@ Status of the run-time compiler. It can complete successfully (`kSuccess`), gene

Constructor(s):

* `Program(const Context &context, const std::string &source)`:
Creates a new OpenCL or CUDA program on a given context. A program is a collection of one or more device kernels which form a single compilation unit together. The device-code is passed as a string. Such a string can for example be generated, hard-coded, or read from file at run-time.
* `Program(const Context &context, std::string source)`:
Creates a new OpenCL or CUDA program on a given context. A program is a collection of one or more device kernels which form a single compilation unit together. The device-code is passed as a string. Such a string can for example be generated, hard-coded, or read from file at run-time. If passed as an r-value (e.g. using `std::move`), the device-code string is moved instead of copied into the class' member variable.
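The take-by-value-then-move constructor pattern can be shown with a minimal stand-in class (a hypothetical `MiniProgram`, not part of Claduc): an l-value argument is copied into the parameter and the caller's string is untouched, while an r-value is moved, avoiding a copy of potentially large device code:

```cpp
#include <string>
#include <utility>

// Minimal sketch of the by-value-plus-move pattern used by the Program
// constructor: the parameter 'source' absorbs either a copy (from an l-value)
// or the moved-in buffer (from an r-value), and is then moved into the member.
class MiniProgram {
 public:
  explicit MiniProgram(std::string source) : source_(std::move(source)) {}
  const std::string &source() const { return source_; }
 private:
  std::string source_;
};
```

Callers opt into the move explicitly, e.g. `MiniProgram(std::move(device_code))`; passing a named string without `std::move` still works and leaves the original intact.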

Public method(s):

@@ -98,6 +114,8 @@ This method invokes the OpenCL or CUDA compiler to build the program at run-time
* `std::string GetBuildInfo(const Device &device) const`:
Retrieves all compiler warnings and errors generated by the build process.

* `std::string GetIR() const`:
Retrieves the intermediate representation (IR) of the compiled program. When using the CUDA back-end, this returns the PTX-code. For the OpenCL back-end, this returns either an IR (e.g. PTX) or a binary. This is different per OpenCL implementation.

Claduc::Queue
-------------
@@ -192,7 +210,9 @@ Retrieves a new kernel from a compiled program. The kernel name is given as the
Public method(s):

* `template <typename T> void SetArgument(const size_t index, T &value)`:
Method to set a kernel argument. The argument itself (`value`) has to be a non-const l-value, since its address is passed to the OpenCL/CUDA back-end. The argument `index` specifies the position in the list of kernel arguments. The argument `value` can also be a Claduc::Buffer.
Method to set a kernel argument. The argument itself (`value`) has to be a non-const l-value, since its address is passed to the OpenCL/CUDA back-end. The argument `index` specifies the position in the list of kernel arguments. The argument `value` can also be a `Claduc::Buffer`.

* `template <typename... Args> void SetArguments(Args&... args)`: As above, but sets all arguments in one go, starting at index 0. This overwrites any previously set arguments. The parameter pack `args` takes any number of arguments of different types, including `Claduc::Buffer`.
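The recursive peeling of the parameter pack behind `SetArguments` can be demonstrated with a mock that records argument sizes instead of calling into OpenCL/CUDA (the `ArgRecorder` class below is purely illustrative; the real Claduc kernel forwards each argument to `clSetKernelArg`/`cuLaunchKernel` machinery):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel's argument list: each argument's byte
// size is recorded at its index rather than being passed to the back-end.
class ArgRecorder {
 public:
  template <typename T>
  void SetArgument(const std::size_t index, T &value) {
    if (sizes_.size() <= index) { sizes_.resize(index + 1); }
    sizes_[index] = sizeof(value);
  }
  // Sets all arguments in one go, starting at index 0, by peeling one
  // argument off the pack per recursive step.
  template <typename... Args>
  void SetArguments(Args&... args) { SetArgumentsRecursive(0, args...); }
  const std::vector<std::size_t> &sizes() const { return sizes_; }
 private:
  void SetArgumentsRecursive(const std::size_t) {}  // base case: empty pack
  template <typename T, typename... Args>
  void SetArgumentsRecursive(const std::size_t index, T &first, Args&... rest) {
    SetArgument(index, first);
    SetArgumentsRecursive(index + 1, rest...);
  }
  std::vector<std::size_t> sizes_;
};
```

A call like `rec.SetArguments(a, b, c)` thus expands to `SetArgument(0, a); SetArgument(1, b); SetArgument(2, c);` at compile time.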

* `size_t LocalMemUsage(const Device &device) const`:
Retrieves the amount of on-chip scratchpad memory (local memory in OpenCL, shared memory in CUDA) required by this specific kernel.
99 changes: 68 additions & 31 deletions include/clpp11.h
@@ -12,6 +12,11 @@
// Portability here means that a similar header exists for CUDA with the same classes and
// interfaces. In other words, moving from the OpenCL API to the CUDA API becomes a one-line change.
//
// Version 2.0 (2015-07-13):
// - New methods: Device::CoreClock, Device::ComputeUnits, Device::MemorySize, Device::MemoryClock,
// Device::MemoryBusWidth, Program::GetIR, Kernel::SetArguments
// - Allows device program string to be moved into Program at construction
//
// Version 1.0 (2015-07-09):
// - Initial version
//
@@ -145,16 +150,10 @@ class Device {
device_ = devices[device_id];
}

// Functions to retrieve device information
std::string Version() const {
return GetInfoString(CL_DEVICE_VERSION);
}
std::string Vendor() const {
return GetInfoString(CL_DEVICE_VENDOR);
}
std::string Name() const {
return GetInfoString(CL_DEVICE_NAME);
}
// Methods to retrieve device information
std::string Version() const { return GetInfoString(CL_DEVICE_VERSION); }
std::string Vendor() const { return GetInfoString(CL_DEVICE_VENDOR); }
std::string Name() const { return GetInfoString(CL_DEVICE_NAME); }
std::string Type() const {
auto type = GetInfo<cl_device_type>(CL_DEVICE_TYPE);
switch(type) {
@@ -164,29 +163,30 @@
default: return "default";
}
}
size_t MaxWorkGroupSize() const {
return GetInfo<size_t>(CL_DEVICE_MAX_WORK_GROUP_SIZE);
}
size_t MaxWorkGroupSize() const { return GetInfo<size_t>(CL_DEVICE_MAX_WORK_GROUP_SIZE); }
size_t MaxWorkItemDimensions() const {
return static_cast<size_t>(GetInfo<cl_uint>(CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS));
return GetInfo(CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS);
}
std::vector<size_t> MaxWorkItemSizes() const {
return GetInfoVector<size_t>(CL_DEVICE_MAX_WORK_ITEM_SIZES);
}
size_t LocalMemSize() const {
return static_cast<size_t>(GetInfo<cl_ulong>(CL_DEVICE_LOCAL_MEM_SIZE));
}
std::string Capabilities() const {
return GetInfoString(CL_DEVICE_EXTENSIONS);
}
std::string Capabilities() const { return GetInfoString(CL_DEVICE_EXTENSIONS); }
size_t CoreClock() const { return GetInfo(CL_DEVICE_MAX_CLOCK_FREQUENCY); }
size_t ComputeUnits() const { return GetInfo(CL_DEVICE_MAX_COMPUTE_UNITS); }
size_t MemorySize() const { return GetInfo(CL_DEVICE_GLOBAL_MEM_SIZE); }
size_t MemoryClock() const { return 0; } // Not exposed in OpenCL
size_t MemoryBusWidth() const { return 0; } // Not exposed in OpenCL

// Configuration-validity checks
bool IsLocalMemoryValid(const size_t local_mem_usage) const {
return (local_mem_usage <= LocalMemSize());
}
bool IsThreadConfigValid(const std::vector<size_t> &local) const {
auto local_size = size_t{1};
for (auto &item: local) { local_size *= item; }
for (const auto &item: local) { local_size *= item; }
for (auto i=size_t{0}; i<local.size(); ++i) {
if (local[i] > MaxWorkItemSizes()[i]) { return false; }
}
@@ -209,6 +209,13 @@
CheckError(clGetDeviceInfo(device_, info, bytes, &result, nullptr));
return result;
}
size_t GetInfo(const cl_device_info info) const {
auto bytes = size_t{0};
CheckError(clGetDeviceInfo(device_, info, 0, nullptr, &bytes));
auto result = cl_uint(0);
CheckError(clGetDeviceInfo(device_, info, bytes, &result, nullptr));
return static_cast<size_t>(result);
}
template <typename T>
std::vector<T> GetInfoVector(const cl_device_info info) const {
auto bytes = size_t{0};
@@ -220,9 +227,10 @@
std::string GetInfoString(const cl_device_info info) const {
auto bytes = size_t{0};
CheckError(clGetDeviceInfo(device_, info, 0, nullptr, &bytes));
auto result = std::vector<char>(bytes);
CheckError(clGetDeviceInfo(device_, info, bytes, result.data(), nullptr));
return std::string(result.data());
auto result = std::string{};
result.resize(bytes);
CheckError(clGetDeviceInfo(device_, info, bytes, &result[0], nullptr));
if (!result.empty() && result.back() == '\0') { result.pop_back(); } // size includes the terminating null
return result;
}
};

@@ -264,12 +272,11 @@ class Program {
// Note that there is no constructor based on the regular OpenCL data-type because of extra state

// Regular constructor with memory management
explicit Program(const Context &context, const std::string &source):
explicit Program(const Context &context, std::string source):
program_(new cl_program, [](cl_program* p) { CheckError(clReleaseProgram(*p)); delete p; }),
length_(source.length()) {
std::copy(source.begin(), source.end(), back_inserter(source_));
source_.push_back('\0');
source_ptr_ = source_.data();
length_(source.length()),
source_(std::move(source)),
source_ptr_(&source_[0]) {
auto status = CL_SUCCESS;
*program_ = clCreateProgramWithSource(context(), 1, &source_ptr_, &length_, &status);
CheckError(status);
@@ -297,17 +304,29 @@
auto bytes = size_t{0};
auto query = cl_program_build_info{CL_PROGRAM_BUILD_LOG};
CheckError(clGetProgramBuildInfo(*program_, device(), query, 0, nullptr, &bytes));
auto result = std::vector<char>(bytes);
CheckError(clGetProgramBuildInfo(*program_, device(), query, bytes, result.data(), nullptr));
return std::string(result.data());
auto result = std::string{};
result.resize(bytes);
CheckError(clGetProgramBuildInfo(*program_, device(), query, bytes, &result[0], nullptr));
if (!result.empty() && result.back() == '\0') { result.pop_back(); } // log size includes the terminating null
return result;
}

// Retrieves an intermediate representation of the compiled program
std::string GetIR() const {
auto bytes = size_t{0};
CheckError(clGetProgramInfo(*program_, CL_PROGRAM_BINARY_SIZES, sizeof(size_t), &bytes, nullptr));
auto result = std::string{};
result.resize(bytes);
auto result_ptr = &result[0]; // writable buffer: std::string::data() is const-qualified before C++17
CheckError(clGetProgramInfo(*program_, CL_PROGRAM_BINARIES, sizeof(char*), &result_ptr, nullptr));
return result;
}

// Accessor to the private data-member
const cl_program& operator()() const { return *program_; }
private:
std::shared_ptr<cl_program> program_;
size_t length_;
std::vector<char> source_;
std::string source_;
const char* source_ptr_;
};

@@ -526,7 +545,14 @@ class Kernel {
}
template <typename T>
void SetArgument(const size_t index, Buffer<T> &value) {
CheckError(clSetKernelArg(*kernel_, static_cast<cl_uint>(index), sizeof(cl_mem), &value()));
SetArgument(index, value());
}

// Sets all arguments in one go using parameter packs. Note that this overwrites previously set
// arguments using 'SetArgument' or 'SetArguments'.
template <typename... Args>
void SetArguments(Args&... args) {
SetArgumentsRecursive(0, args...);
}

// Retrieves the amount of local memory used per work-group for this kernel
@@ -551,6 +577,17 @@
const cl_kernel& operator()() const { return *kernel_; }
private:
std::shared_ptr<cl_kernel> kernel_;

// Internal implementation for the recursive SetArguments function.
template <typename T>
void SetArgumentsRecursive(const size_t index, T &first) {
SetArgument(index, first);
}
template <typename T, typename... Args>
void SetArgumentsRecursive(const size_t index, T &first, Args&... args) {
SetArgument(index, first);
SetArgumentsRecursive(index+1, args...);
}
};

// =================================================================================================