Sciforium
Technology
Distributed Training and Inference Engineer
Neural analysis suggests this role is optimal for Senior candidates.
“Distributed Training and Inference Engineer at Sciforium. Skills: Distributed training, ML infrastructure, Systems engineering, Performance optimization. Maintain ML libraries. Update ML frameworks”
Industry & Context.
Debugging complex issues; Troubleshooting; Performance analysis
What They're Looking For.
Must Have
5+ years industry experience, Bachelor's or Master's degree, Python, C++ programming, ML tooling familiarity, Distributed systems familiarity
Nice to Have
Extensive XLA/JAX stack experience, Familiarity with distributed serving, Familiarity with large-scale inference, GPU kernel optimization background, Accelerator-aware model partitioning background
What You'll Do.
Maintain ML libraries
Optimize ML libraries
Optimize ML frameworks
Build ML software stack
Maintain ML software stack
Improve ML software stack
Shard model implementations
Partition model implementations
Configure model implementations
Profile compilation graphs
Profile training workloads
Profile runtime execution
Eliminate performance bottlenecks
Troubleshoot hardware-software issues
Collaborate with research teams
Collaborate with infrastructure teams
Collaborate with kernel engineering teams
Improve system throughput
Improve system stability
Improve developer experience
How You'll Work.
Team & Collaboration
Collaborate with research; Collaborate with infrastructure; Collaborate with kernel engineering
Full Job Description
Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.

ABOUT THE ROLE

Sciforium is seeking a highly skilled Distributed Training and Inference Engineer to build, optimize, and maintain the critical software stack that powers our large-scale AI training and serving workloads. In this role, you will work across the entire machine learning infrastructure, from low-level CUDA/ROCm runtimes to high-level frameworks like JAX and PyTorch, to ensure our distributed training systems are fast, scalable, stable, and efficient.

This position is ideal for someone who loves deep systems engineering, debugging complex hardware–software interactions, and optimizing performance at every layer of the ML stack. You will play a pivotal role in enabling the training and deployment of next-generation LLMs and generative AI models.

WHAT YOU'LL DO

- Software Stack Maintenance: Maintain, update, and optimize critical ML libraries and frameworks, including JAX, PyTorch, CUDA, and ROCm, across multiple environments and hardware configurations.
- End-to-End Stack Ownership: Build, maintain, and continuously improve the entire ML software stack, from ROCm/CUDA drivers to high-level JAX/PyTorch tooling.
- Distributed System Optimization: Ensure all model implementations are efficiently sharded, partitioned, and configured for large-scale distributed training and serving.
- System Integration: Continuously integrate and validate modules for runtime correctness, memory efficiency, and scalability across multi-node GPU/accelerator clusters.
- Profiling & Performance Analysis: Conduct detailed profiling of compilation graphs, training workloads, and runtime execution to optimize performance and elimina
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.