Skip to content

Architecture

Per-example structure

Each example follows the same host + kernel pattern:

File Role
<name>.c Host code: sets up OpenCL context, compiles kernel, manages buffers, launches kernel, reads results
<name>.cl Kernel code: GPU-side computation
clbuild.c Shared helper (see below)
defs.h Shared type definitions and includes

The standard host code flow is:

create device → create context/queue → build program
  → create buffers → set kernel args → enqueue kernel
  → read results → cleanup

Shared helper: clbuild.c and defs.h

common/clbuild.c provides two functions used by square_array, mandelbrot, auger, rng, and waste:

create_device()

Finds a GPU or CPU associated with the first available platform. Tries GPU first (CL_DEVICE_TYPE_GPU); falls back to CPU (CL_DEVICE_TYPE_CPU) if no GPU is found. A platform identifies a vendor's installation — a system may have an NVIDIA platform and an AMD platform simultaneously.

build_program(ctx, dev, filename)

Reads a .cl kernel file into a buffer, calls clCreateProgramWithSource, then clBuildProgram. On compile failure, retrieves and prints the build log via clGetProgramBuildInfo before exiting. The fourth parameter of clBuildProgram accepts compiler options similar to GCC flags (e.g. -DMACRO=VALUE, -cl-opt-disable).

Which examples use it

Uses clbuild.c Inlines kernel source instead
square_array, mandelbrot, auger, rng, waste Hello_World, add_numbers, sum_array

In the CMake build, clbuild.c and defs.h live in common/ and are compiled into a shared library linked by the examples above. Per-directory Makefiles use local copies of these files.

Serial comparison versions

square_array, mandelbrot, and waste each include a *_serial.c file — a plain C CPU implementation of the same computation — for direct CPU vs GPU performance comparison.