Kog

Engineering

Research Engineer

€65–95k (~AI est.) · Paris, France · Full Time · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Research Engineer at Kog. Skills: LLM inference, model architecture, inference optimization. Design new model architecture variants. Extend the Laneformer thesis.”

What You'll Achieve.

Measurable gains in generation speed; Measurable gains in model quality

Industry & Context.

Engineering

Problems you'll solve

Run experiments; Morph models; Turn findings into gains

What They're Looking For.

Must Have

Experience adapting model architectures, Understanding of how communication structure affects inference behavior, Fluency in Transformers and MoE, Ability to reason across trade-offs

Nice to Have

Experience in post-training methods, Preference optimization experience, Quantization experience

What You'll Do.

Design new model architecture variants

Extend Laneformer thesis

Explore inference-aware architectural variants

Own post-training pipeline

Adapt existing open-weight models

Scale stack to large MoE models

Write up research findings

Submit research papers

Present research at conferences

Contribute to building AI agents that will perform architecture research and training experiments

How You'll Work.

Team & Collaboration

Cross-functional teams

Communication Scope

Research papers; Conferences

Full Job Description

About Kog

Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).

We co-design the model architecture and the execution engine together. Our Laneformer model uses Delayed Tensor Parallelism (DTP), a novel architecture that restructures the Transformer dependency graph so inter-GPU communication overlaps with computation rather than blocking it. We pretrained a 2B-parameter DTP model on 6T tokens on 256 H100 GPUs.

We are a team of 11 people, including 10 engineers and 4 PhDs.

Test it at http://playground.kog.ai. Read the technical details on the Kog Labs blog: https://blog.kog.ai.

What you will work on

You will imagine, design and run experiments to understand how architectural decisions propagate through inference behavior, morph existing open-weight models into architecture variants optimized for speed, and turn findings into measurable gains in generation speed and model quality.

- Design new model architecture variants, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.
- Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and finding what compounds at scale.
- Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of existing open-weight models toward architecture variants optimized for inference speed.
- Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.
- Write up findings as research papers, submit them to top venues, and present them at conferences.
- Contribute to building AI agents that will perform architecture research and training experi
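For a sense of scale, the per-request throughput figures quoted in the About Kog paragraph can be read as an average time per generated token. The sketch below is only that arithmetic, assuming the quoted numbers are steady-state averages for a single request; the function and dictionary names are illustrative and do not come from any Kog tooling.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time per generated token, in milliseconds, for one request."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


# Throughput figures as quoted in the posting
# (FP16, batch size 1, no speculative decoding).
# The dictionary name and keys are illustrative labels, not Kog identifiers.
QUOTED_THROUGHPUT = {
    "8x AMD MI300X": 3000,
    "8x NVIDIA H200": 2100,
}


def latency_table(throughputs: dict) -> dict:
    """Map each node label to its average per-token latency in ms (3 decimals)."""
    return {
        node: round(per_token_latency_ms(tps), 3)
        for node, tps in throughputs.items()
    }


if __name__ == "__main__":
    for node, ms in latency_table(QUOTED_THROUGHPUT).items():
        print(f"{node}: {ms} ms per token")
```

At these rates each token has well under a millisecond of budget, which is why the posting treats inter-GPU communication as a first-order design constraint.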

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
