Kog
Engineering
Research Engineer
Neural analysis suggests this role is optimal for Mid+ candidates.
“Research Engineer at Kog. Skills: LLM inference, model architecture, inference optimization. Design new model architecture variants and extend the Laneformer thesis.”
What You'll Achieve.
Measurable gains in generation speed; measurable gains in model quality
Industry & Context.
Run experiments; morph models; turn findings into gains
What They're Looking For.
Must Have
Experience adapting model architectures, understanding of how communication structure shapes inference behavior, fluency in Transformers and MoE, ability to reason across trade-offs
Nice to Have
Experience with post-training methods, preference optimization, and quantization
What You'll Do.
Design new model architecture variants
Extend the Laneformer thesis
Explore inference-aware architectural variants
Own the post-training pipeline
Adapt existing open-weight models
Scale the stack to large MoE models
Write up research findings
Submit research papers
Present research at conferences
Contribute to building AI agents that perform architecture research and training experiments
How You'll Work.
Team & Collaboration
Cross-functional teams
Communication Scope
Research papers; conferences
Full Job Description
About Kog
Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).
We co-design the model architecture and the execution engine. Our Laneformer model uses Delayed Tensor Parallelism (DTP), a novel architecture that restructures the Transformer dependency graph so inter-GPU communication overlaps with computation rather than blocking it. We pretrained a 2B-parameter DTP model on 6T tokens on 256 H100 GPUs.
We are a team of 11 people, including 10 engineers and 4 PhDs. Test it at playground.kog.ai (http://playground.kog.ai). Read the technical details on the Kog Labs blog (https://blog.kog.ai).
What you will work on
You will imagine, design and run experiments to understand how architectural decisions propagate through inference behavior, morph existing open-weight models into architecture variants optimized for speed, and turn findings into measurable gains in generation speed and model quality.
- Design new model architecture variants, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.
- Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and finding what compounds at scale.
- Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of existing open-weight models toward architecture variants optimized for inference speed.
- Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.
- Write up findings as research papers, submit them to top venues, and present them at conferences.
- Contribute to building AI agents that will perform architecture research and training experiments.
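To make the DTP idea (communication overlapping computation instead of blocking it) concrete, here is a toy timing model. It is our own simplification, not Kog's DTP implementation. It assumes every layer costs a fixed `compute` time and a fixed all-reduce `comm` time, that all-reduces run one at a time on a single link, and (hypothetically) that layer `i` only needs the all-reduced output of layer `i - 1 - delay`. The function names `blocking_tp` and `delayed_tp` are illustrative.

```python
def blocking_tp(num_layers: int, compute: float, comm: float) -> float:
    """Blocking tensor parallelism: every layer waits for the previous
    layer's all-reduce, so compute and communication fully serialize."""
    total = 0.0
    for _ in range(num_layers):
        total += compute  # local matmuls on this GPU's weight shard
        total += comm     # blocking all-reduce of the partial outputs
    return total


def delayed_tp(num_layers: int, compute: float, comm: float, delay: int = 1) -> float:
    """Toy delayed-dependency schedule (an assumption, not Kog's DTP).

    Layer i may start once layer i-1's compute is done and the all-reduce
    of layer i-1-delay has finished. All-reduces run one at a time on a
    single link. Returns the time at which the last all-reduce finishes.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    compute_end = [0.0] * num_layers
    comm_end = [0.0] * num_layers
    for i in range(num_layers):
        prev_compute = compute_end[i - 1] if i >= 1 else 0.0
        j = i - 1 - delay  # index of the all-reduce this layer depends on
        needed_comm = comm_end[j] if j >= 0 else 0.0
        compute_end[i] = max(prev_compute, needed_comm) + compute
        # This layer's all-reduce starts when its compute is done and the link is free.
        link_free = comm_end[i - 1] if i >= 1 else 0.0
        comm_end[i] = max(compute_end[i], link_free) + comm
    return comm_end[-1] if num_layers else 0.0


if __name__ == "__main__":
    print(blocking_tp(4, 1.0, 1.0))  # 8.0
    print(delayed_tp(4, 1.0, 1.0))   # 5.0
```

Setting `delay=0` recovers the blocking schedule. Under these assumptions, whenever `comm <= compute` the delayed schedule finishes in `num_layers * compute + comm`: only the final all-reduce stays exposed. Real gains depend on kernel behavior and interconnect, which this model ignores.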
How to Apply on Ashby
- Ashby is a fast, modern applicant tracking system (ATS); most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically, so answer them completely.
- Location field affects geo-based screening; use your actual metro area.