Company
Technology
Senior ML Engineer (Token Factory)
Neural analysis suggests this role is optimal for Senior candidates.
“Senior ML Engineer (Token Factory). Skills: ML Engineering, Inference Optimization, LLM Architectures, GPU Optimization. Drive inference optimization efforts. Identify bottlenecks”
What You'll Achieve.
Improving throughput; Reducing latency; Reducing cost per token
Industry & Context.
Identify bottlenecks; Identify performance constraints; Guide architectural improvements
What They're Looking For.
Must Have
Understanding of machine learning fundamentals, Transformer architectures, Large language models, Hands-on experience profiling GPU workloads, Hands-on experience optimizing GPU workloads, Deep knowledge of GPU architecture, Memory hierarchy trade-offs, Compute vs. memory trade-offs, Experience with large-scale deep learning training, Distributed systems experience, Sharding strategies experience, Custom kernel development experience, Advanced proficiency in Python, Modern ML frameworks proficiency, Solid understanding of software engineering practices, Version control experience, CI/CD pipelines experience, Unit testing experience
Nice to Have
Nsight usage, PyTorch Profiler usage, Flash Attention familiarity, Quantization techniques familiarity
What You'll Do.
Drive inference optimization efforts
Implement performance improvements
Contribute to design of inference engines
Contribute to evolution of inference engines
Develop low-precision training pipelines
Develop low-precision inference pipelines
Productionize low-precision training pipelines
Productionize low-precision inference pipelines
Profile GPU workloads
Analyze GPU workloads
Identify performance constraints
Guide architectural improvements
Collaborate on distributed training systems
Collaborate on distributed inference systems
Contribute to engineering best practices
Contribute to testing practices
Contribute to CI/CD practices
Contribute to maintainable ML systems
How You'll Work.
Team & Collaboration
Highly technical teams; Cross-functional teams
Communication Scope
Communication skills
Full Job Description
## Accountabilities
- Drive inference optimization efforts by identifying bottlenecks and implementing performance improvements across diverse LLM architectures, improving throughput and reducing latency and cost per token.
- Contribute to the design and evolution of inference engines, including techniques such as speculative decoding, KV-cache optimization, and support for dense and MoE models.
- Develop and productionize low-precision training and inference pipelines (e.g., FP8, MXFP4) to maximize efficiency on large GPU clusters.
- Profile and analyze GPU workloads using modern tooling to identify performance constraints and guide architectural improvements.
- Collaborate on scalable distributed training and inference systems, including sharding strategies, custom kernels, and hardware-aware optimizations.
- Contribute to engineering best practices including testing, CI/CD, and maintainable production-grade ML systems.

## Requirements
- Strong understanding of machine learning fundamentals, particularly transformer architectures and large language models.
- Hands-on experience profiling and optimizing GPU workloads using tools such as Nsight or PyTorch Profiler.
- Deep knowledge of GPU architecture, including memory hierarchy and compute vs. memory trade-offs.
- Familiarity with key LLM concepts such as attention mechanisms, RoPE, KV-cache, Flash Attention, and quantization techniques.
- Experience with large-scale deep learning training, including distributed systems, sharding strategies, and custom kernel development.
- Strong software engineering skills, with advanced proficiency in Python and modern ML frameworks.
- Solid understanding of software engineering practices such as version control, CI/CD pipelines, and unit testing.
- Strong communication skills with the ability to collaborate effectively in highly technical, cross-functional teams.

## Benefits
- Competitive compensation package
- Strong career development and continuous learning opportunities
- Flexible work environment with high aut
How to Apply on Lever
- Lever uses a streamlined one-page form — apply in under 5 minutes.
- LinkedIn import works well; review parsed data before submitting.
- The cover letter field is optional but visible to reviewers — use it to differentiate.
- Referral codes from employees can significantly boost visibility of your application.