Products

Kernelize uses and actively supports the open-source Triton compiler and language. Triton is widely used to describe optimized GPU kernels, and we leverage it to quickly target and optimize for new AI accelerator hardware.

Triton already supports autotuning to search for supported and optimal kernel configurations, so the main features needed to target new hardware are a modular backend and a discovery-based runtime. Most AI frameworks and ML graph compilers already target Triton by default.

[Figure: Triton compiler architecture]
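
To make the autotuning hook concrete, here is a minimal, self-contained Triton kernel that opts into autotuning. The vector-add kernel and the two configurations are illustrative examples of the open-source Triton API, not Kernelize code.

```python
import torch
import triton
import triton.language as tl

# Autotune benchmarks each config the first time the kernel runs for a
# given key value and caches the fastest one.
@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 128}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_warps=8),
    ],
    key=["n_elements"],  # re-tune when the problem size changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # BLOCK_SIZE is filled in by whichever config autotune selects.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements)
    return out
```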

Kernelize Forge

Kernelize Forge is a modular backend for Triton that extends the compiler to target hardware beyond GPUs. Forge generates LLVM output from target-specific primitives and supports the autotuning process to find both what is supported and what is optimal. Forge is configured at compile time based on Nexus device discovery and autotuning results.
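
As a rough sketch of what "modular backend" means here: a backend object receives the capabilities reported by discovery, tells the autotuner what the target supports, and lowers Triton IR to LLVM through target-specific primitives. All names below (TargetCapabilities, AcceleratorBackend, and their methods) are hypothetical illustrations, not Forge's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of a modular backend; names are illustrative,
# not Forge's actual API.
@dataclass
class TargetCapabilities:
    # Filled in at compile time from device discovery.
    name: str
    max_shared_mem_bytes: int
    supported_dtypes: tuple[str, ...]

class AcceleratorBackend:
    """Lowers Triton IR to LLVM output via target-specific primitives."""

    def __init__(self, caps: TargetCapabilities):
        self.caps = caps

    def is_supported(self, config: dict) -> bool:
        # Lets the autotuner skip configurations the target cannot run,
        # e.g. tiles that exceed the device's shared memory.
        return config.get("shared_mem_bytes", 0) <= self.caps.max_shared_mem_bytes

    def lower_to_llvm(self, triton_ir: str) -> str:
        # Map Triton IR onto the target's primitives and emit LLVM IR.
        raise NotImplementedError("target-specific lowering goes here")
```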

Kernelize Nexus

Kernelize Nexus is a discovery-based runtime environment that plugs into any AI framework. It uses automated device discovery to find what is supported and autotuning to determine what is optimal. Nexus connects to a runtime plugin that drives the AI accelerator hardware. The discovery process lets hardware vendors add features over time, or test new hardware, without disclosing low-level hardware details in the compiler.
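
A minimal sketch of the discovery idea, assuming a vendor-supplied runtime plugin interface: the Protocol and function names here are assumptions for illustration, not Nexus's published API.

```python
from typing import Protocol

# Hypothetical plugin interface; names are illustrative, not Nexus's API.
class RuntimePlugin(Protocol):
    """Vendor-provided driver for one accelerator family."""

    def enumerate_devices(self) -> list[dict]:
        """Report available devices and the features each supports."""
        ...

    def launch(self, binary: bytes, grid: tuple[int, ...], args: tuple) -> None:
        """Run a compiled kernel on a discovered device."""
        ...

def discover(plugins: list[RuntimePlugin]) -> list[dict]:
    # Aggregate capabilities from every registered plugin. Because features
    # are reported at runtime, a vendor can extend its plugin over time
    # without exposing low-level hardware details inside the compiler.
    devices: list[dict] = []
    for plugin in plugins:
        devices.extend(plugin.enumerate_devices())
    return devices
```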

Please contact bryan@kernelize.ai if you would like to know more.