printf from within kernel?

Hi folks,
Just wondering whether it's possible to call printf from within a kernel.

The Apple OpenCL guide states that printf can be used for debugging. However, that sentence does not explicitly say it can be used for debugging on the GPU.

Certainly, it bombs with an error if you try it.

I see in the NVIDIA docs there is an emulation mode. Is that an alternative?

Many thanks for any advice offered.

Kind regards,
Max


Yes and No

You can use printf to debug a kernel if you're running it on the CPU, since the CPU has built-in support for printf. The CPU and GPU architectures differ, though, so there are some things you won't be able to test this way. As far as I know, GPUs don't currently have explicit support for printf, although some diagnostic information can be gathered from running the kernel on the GPU.

Hi there,
That's great, thanks for responding.

Do you have any experience with the NVIDIA emulation mode? Any idea whether it offers a full-blown GPU emulation?

Kind regards,
Max

Found this with a bit more digging in the NVIDIA docs...

NVIDIA_CUDA_Programming_Guide_2.3.pdf

[...] asynchronous, so to check for asynchronous errors, the application must synchronize in-between the kernel launch and the call to cudaGetLastError().

3.2.10 Debugging using the Device Emulation Mode
CUDA-GDB can be used to debug devices of compute capability greater than 1.0. (see the CUDA-GDB user manual for supported platforms). The compiler and runtime also supports an emulation mode for the purpose of debugging that can be used even in the absence of any CUDA-enabled device. When compiling an application in this mode (using the -deviceemu option), the device code is compiled for and runs on the host, allowing the programmer to use the host’s native debugging support to debug the application as if it were a host application. The preprocessor macro __DEVICE_EMULATION__ is defined in this mode. All code for an application, including any libraries used, must be compiled consistently either for device emulation or for device execution. Linking code compiled for device emulation with code compiled for device execution causes the following runtime error to be returned upon initialization: cudaErrorMixedDeviceExecution.
When running an application in device emulation mode, the programming model is emulated by the runtime. For each thread in a thread block, the runtime creates a thread on the host.
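
To make that last point concrete, here is a rough plain-C sketch of what "one host thread per thread in a block" could look like, and why printf becomes usable once the kernel body is ordinary host code. This is not NVIDIA's implementation. The names emulate_block, kernel_body, thread_args, BLOCK_SIZE and EMULATION_DEBUG are all made up for illustration, and EMULATION_DEBUG only stands in for the real __DEVICE_EMULATION__ macro mentioned above. It assumes a POSIX system with pthreads.

```c
/* Illustrative sketch only: imitates the idea of device emulation on the
   host. None of these names come from CUDA; all are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define BLOCK_SIZE 4        /* assumed number of threads in the block */
#define EMULATION_DEBUG 1   /* stand-in for __DEVICE_EMULATION__ */

typedef struct {
    int thread_idx;         /* plays the role of threadIdx.x */
    const float *in;
    float *out;
} thread_args;

/* The "kernel" body: each emulated thread doubles one element. */
static void *kernel_body(void *p)
{
    thread_args *a = (thread_args *)p;
    a->out[a->thread_idx] = 2.0f * a->in[a->thread_idx];
#if EMULATION_DEBUG
    /* Because this runs on the host, printf is just the normal libc call. */
    printf("thread %d: in=%f out=%f\n", a->thread_idx,
           a->in[a->thread_idx], a->out[a->thread_idx]);
#endif
    return NULL;
}

/* Start one host thread per emulated thread, then wait for all of them.
   Returns 0 on success, -1 if any thread could not be created or joined. */
int emulate_block(const float *in, float *out)
{
    pthread_t tids[BLOCK_SIZE];
    thread_args args[BLOCK_SIZE];
    int started = 0;
    int rc = 0;

    assert(in != NULL && out != NULL);

    for (int i = 0; i < BLOCK_SIZE; ++i) {
        args[i].thread_idx = i;
        args[i].in = in;
        args[i].out = out;
        if (pthread_create(&tids[i], NULL, kernel_body, &args[i]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; ++i) {
        if (pthread_join(tids[i], NULL) != 0)
            rc = -1;
    }
    return rc;
}
```

Calling emulate_block(in, out) from main with two float arrays of BLOCK_SIZE elements fills out with doubled values and prints one line per emulated thread. The order of those lines is not guaranteed, since the host threads run concurrently. Setting EMULATION_DEBUG to 0 compiles the printf out, which is roughly how a guard on __DEVICE_EMULATION__ would keep debug output out of a real device build.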