Runware
Information Technology and Services
Staff Software Engineer - Inference & Performance
“Staff Software Engineer - Inference & Performance at Runware. Skills: Inference performance, Distributed systems, Performance engineering. Own end-to-end inference performance. Lead architecture of inference systems”
What You'll Achieve.
Achieve sub-one-second inference; Make Runware one of the fastest and most reliable inference platforms in the market
Industry & Context.
Troubleshooting; Root cause analysis
What They're Looking For.
Must Have
Excellent software engineering experience; Backend and systems development experience; Experience building and operating high-performance distributed systems; Deep understanding of asynchronous processing, queues, concurrency models, and back pressure; Intuition for performance trade-offs; Experience making and defending architectural decisions; Hands-on experience troubleshooting production issues; Ability to communicate clearly and influence across teams; Mentorship mindset; Desire to raise the technical bar
Nice to Have
Experience working on AI/ML inference platforms; Experience with GPU-backed workloads and performance-critical compute systems; Knowledge of model optimisation techniques; Experience with infrastructure-as-code and DevOps practices; Background in startups; Prior ownership of latency and throughput SLOs
What You'll Do.
Own end-to-end inference performance
Lead architecture and design of inference systems
Drive the platform toward sub-1-second inference where feasible
Identify bottlenecks across networking, services, storage, and GPU execution
Make high-impact architectural decisions
Partner with ML and model teams
Define performance budgets and success metrics
Ensure metrics are measured, visible, and actively improved
Lead deep-dive investigations into latency spikes, throughput degradation, and system-level performance issues
Influence and mentor engineers on performance engineering, distributed systems thinking, and operational excellence
Improve tooling, observability, and profiling capabilities
Advocate for best practices in engineering, testing, benchmarking, rollouts, and documentation
How You'll Work.
Team & Collaboration
Partnering with product, ML, and platform teams; Influencing across teams
Communication Scope
Communicate clearly
Full Job Description
We’re looking for a Staff Engineer to take technical ownership of latency, throughput, and reliability across Runware’s AI inference platform. This is a senior technical leadership role for someone who obsesses over performance at scale, from request ingress through GPU execution to result delivery, and who can consistently turn ambitious targets such as sub-one-second inference into production reality.

As a Staff Engineer, you will define and drive the architecture, standards, and execution needed to make Runware one of the fastest and most reliable inference platforms in the market. You will work deeply across backend services, distributed systems, GPU workloads, and infrastructure, partnering closely with product, ML, and platform teams.

This role is ideal for someone who enjoys operating at the intersection of systems design, performance engineering, and real-world scale, and who wants clear ownership over outcomes that matter directly to customers.

### What you’ll do

* Own end-to-end inference performance across the platform, with clear responsibility for latency, throughput, and reliability targets
* Lead the architecture and design of core inference systems, including request routing, async execution, queuing, GPU scheduling, and result delivery
* Drive the platform toward sub-1-second inference where feasible, identifying bottlenecks across networking, services, storage, and GPU execution
* Make high-impact architectural decisions with performance, scalability, and operational simplicity as first-class concerns
* Partner with ML and model teams to ensure models are production-ready from a performance perspective (cold starts, batching, memory usage, concurrency)
* Define performance budgets, SLAs, and success metrics, and ensure they are measured, visible, and actively improved
* Lead deep-dive investigations into latency spikes, throughput degradation, and system-level performance issues
* Influence and mentor engineers across teams on performance engineeri