Our Products

Kernelize enables vLLM, Ollama, and SGLang to target new NPU, CPU, and GPU hardware, making AI inference significantly less expensive to run.

Significantly Lower Inference Costs

By enabling your inference platforms to target new hardware devices, Kernelize helps you run AI inference at a fraction of the cost: new NPUs, specialized CPUs, and optimized GPUs can deliver the same performance at significantly lower operational cost.

Powered by Triton

Triton is the key technology that makes cost-effective inference possible. Our products use Triton to generate optimized kernels for new hardware targets, so your inference platforms can run on less expensive hardware alternatives.
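As a taste of what these kernels look like, here is the canonical vector-add kernel from the Triton tutorials. It is written once in Python and lowered by the Triton compiler to the target's native code; running it assumes the triton and torch packages and a Triton-supported backend.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the inputs.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the tail when n isn't a multiple of BLOCK_SIZE
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

Because the kernel is expressed at this level rather than in a vendor's assembly language, retargeting means teaching the Triton compiler about the new hardware, not rewriting every kernel by hand.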

Learn More About Triton

Kernelize Nexus

Runtime optimization and layer support for inference platforms

Kernelize Nexus runs alongside the existing runtime in each supported platform, optimizing layers and adding support for layers that would otherwise be missing on new target inference hardware. A minimal sketch of the dispatch pattern follows the feature list.

Key Features:

  • Extends existing inference platform runtimes
  • Optimizes layers on new target inference hardware
  • Works with vLLM, Ollama, and SGLang
  • Seamless integration with existing workflows
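To make the integration concrete, here is a minimal, hypothetical sketch of the dispatch pattern described above: a registry of per-target layer overrides that the runtime consults, falling back to the platform's own implementation. The names (register_override, resolve, "example-npu") are illustrative, not the actual Nexus API.

    from typing import Callable, Dict, Tuple

    # Registry of per-target layer overrides. All names here are
    # illustrative stand-ins, not the Kernelize Nexus API.
    _OVERRIDES: Dict[Tuple[str, str], Callable] = {}

    def register_override(layer: str, target: str):
        """Decorator: register an optimized implementation of `layer` for `target`."""
        def decorator(fn: Callable) -> Callable:
            _OVERRIDES[(layer, target)] = fn
            return fn
        return decorator

    def resolve(layer: str, target: str, platform_default: Callable) -> Callable:
        """Prefer a target-specific override; otherwise keep the platform's own kernel."""
        return _OVERRIDES.get((layer, target), platform_default)

    # Example: route rms_norm to a tuned kernel on a hypothetical NPU,
    # while every other layer keeps the platform's default implementation.
    @register_override("rms_norm", "example-npu")
    def rms_norm_npu(x, weight, eps=1e-6):
        ...  # in practice this would call a Triton-generated kernel

An inference platform would call resolve("rms_norm", target, default_impl) at dispatch time, so layers with no override continue to run exactly as before.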

Kernelize Forge

Triton kernel generation for new hardware targets

Kernelize Forge works alongside existing kernel libraries, such as GGML, using Triton to generate optimized kernels for hardware those libraries don't natively support. A sketch of the selection pattern follows the feature list.

Key Features:

  • Uses Triton to target new hardware devices
  • Works with existing kernel libraries like GGML
  • Generates optimized kernels for NPUs, CPUs, and GPUs
  • Leverages existing Triton knowledge and tools
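The pattern Forge follows can be sketched in a few lines, again with illustrative names rather than a real API: consult the native kernel library first, and generate a Triton kernel only where the library has no coverage for the target.

    from typing import Callable, Dict, Tuple

    # Kernels the native library (e.g. GGML bindings) already provides,
    # keyed by (op, hardware target). Illustrative stand-in, not a real API.
    native_kernels: Dict[Tuple[str, str], Callable] = {
        # ("matmul", "x86-avx512"): ggml_matmul, ...
    }

    def generate_triton_kernel(op: str, target: str) -> Callable:
        """Compile a Triton kernel for `op` on `target`.

        Stub: a real implementation would emit and compile Triton code,
        e.g. the add_kernel shown earlier for op == "add".
        """
        raise NotImplementedError(f"no generator wired up for {op} on {target}")

    def select_kernel(op: str, target: str) -> Callable:
        native = native_kernels.get((op, target))
        if native is not None:
            return native  # the library already covers this op on this target
        return generate_triton_kernel(op, target)  # fill the gap with Triton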

How Kernelize Works

1. Extend Platforms

Kernelize Nexus integrates with existing inference platforms to add support for new hardware

2. Generate Kernels

Kernelize Forge uses Triton to generate optimized kernels for new hardware targets

3. Reduce Costs

Run your existing inference workloads on cost-effective hardware alternatives

Common Use Cases

Hardware Vendor Integration

Enable your new NPU or specialized hardware to work with popular inference platforms like vLLM and Ollama

Kernelize Nexus + Forge

Cost Optimization

Reduce inference costs by enabling platforms to run on more cost-effective hardware alternatives

Kernelize Nexus

Triton Kernel Development

Use Triton to generate optimized kernels for hardware that doesn't have native Triton support

Kernelize Forge

Performance Optimization

Optimize layers that are poorly supported by existing inference platforms on your target hardware

Kernelize Nexus + Forge

Ready to Reduce Your Inference Costs?

Get in touch to learn how Kernelize can help you run AI inference at significantly lower cost by enabling your platforms to target new hardware devices.

Contact Us