KOG

Technology

GPU Engineer

€85–130k ~AI est. · Paris, France · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“GPU Engineer at KOG. Skills: GPU kernels, LLM inference, low-level optimization. Perform experiments to understand GPU internals.”

What You'll Achieve.

Push generation speed

Industry & Context.

Technology

Problems you'll solve

Troubleshooting

What They're Looking For.

Must Have

Written GPU kernels, Show the code, PyTorch custom ops, Inline PTX or CDNA ISA, Latency-sensitive execution paths, MBU matters more than MFU, Inference engine components background, PhD with concrete GPU work

Nice to Have

Experience with LLM inference

What You'll Do.

Understand GPU internals

Find creative solutions

Write optimized GPU kernels

Contribute to monokernel pipeline

Work on low-level GPU optimization

Build profiling infrastructure

Scale stack to MoE models

Contribute to building AI agents

How You'll Work.

Team & Collaboration

Cross-functional teams

Full Job Description

ABOUT KOG

Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding). The hot path is a monokernel implemented with handwritten CUDA (with PTX inline assembly) on NVIDIA, and HIP (with CDNA ISA inline assembly) on AMD. We optimize at the low level with engine/kernel/model co-design, using reverse engineering to understand and exploit the details of how the GPU hardware works at the micro level. We are a team of 11 people, including 10 engineers and 4 PhDs.

Test it at http://playground.kog.ai. Read the technical details on the Kog Labs blog: https://blog.kog.ai.

WHAT YOU WILL WORK ON

You will perform experiments to understand GPU internals, find creative solutions to accelerate critical computational sections used in LLM inference, and write optimized GPU kernels accordingly. Then test, profile, and optimize again.

- Contribute to our monokernel pipeline, the single persistent GPU program that covers the full decode pass from QKV projection to LM head sampling, across AMD and NVIDIA architectures.
- Work on low-level GPU optimization, including impossibly-fast grid synchronizations and inter-GPU collectives, and optimized GEMM and attention kernels for specific batch sizes and context lengths.
- Build profiling infrastructure inside a monokernel, including custom instrumentation, device-timestamp frameworks, and per-stage analysis to translate machine behavior into concrete engineering decisions.
- Scale the stack to third-party MoE models such as DeepSeek v4 and Qwen 3 to push generation speed on the models that matter in production today.
- Contribute to building AI agents that will perform GPU Engineering research and kernel optimization autonomously, calibrated to hardware target and workload, starting from the inference foundations we are build

How to Apply on Ashby

Ashby is a fast, modern ATS; most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
