Runware

Information Technology and Services

Staff Software Engineer - Inference & Performance

₹75–120L (AI estimate) · Remote · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Staff Software Engineer - Inference & Performance at Runware. Skills: Inference performance, Distributed systems, Performance engineering. Own end-to-end inference performance. Lead architecture of inference systems”

What You'll Achieve.

Achieve sub-one-second inference; make Runware a fast inference platform; make Runware a reliable inference platform

Industry & Context.

Information Technology and Services
Problems you'll solve

Troubleshooting; Root cause analysis

What They're Looking For.

Must Have

Excellent software engineering experience, Backend and systems development experience, Building and operating high-performance distributed systems, Deep understanding of asynchronous processing, Deep understanding of queues, Deep understanding of concurrency models, Deep understanding of back pressure, Intuition for performance trade-offs, Experience making architectural decisions, Experience defending architectural decisions, Hands-on experience troubleshooting production issues, Ability to communicate clearly, Ability to influence across teams, Mentorship mindset, Desire to raise technical bar

Nice to Have

Experience working on AI/ML inference platforms, Experience with GPU-backed workloads, Experience with performance-critical compute systems, Knowledge of model optimisation techniques, Experience with infrastructure-as-code, Experience with DevOps practices, Background in startups, Prior ownership of latency SLOs, Prior ownership of throughput SLOs

What You'll Do.

Own end-to-end inference performance

Lead architecture of inference systems

Lead design of inference systems

Drive the platform toward sub-one-second inference

Identify bottlenecks across networking

Identify bottlenecks across services

Identify bottlenecks across storage

Identify bottlenecks across GPU execution

Make high-impact architectural decisions

Partner with ML teams

Partner with model teams

Define performance budgets

Define success metrics

Ensure metrics are measured

Ensure metrics are visible

Ensure metrics are actively improved

Lead deep-dive investigations into latency spikes

Lead deep-dive investigations into throughput degradation

Lead deep-dive investigations into system-level performance issues

Influence engineers on performance engineering

Influence engineers on distributed systems thinking

Influence engineers on operational excellence

Mentor engineers on performance engineering

Mentor engineers on distributed systems thinking

Mentor engineers on operational excellence

Improve tooling capabilities

Improve observability capabilities

Improve profiling capabilities

Advocate for engineering best practices

Advocate for testing practices

Advocate for benchmarking practices

Advocate for rollout practices

Advocate for documentation practices

How You'll Work.

Team & Collaboration

Partnering with product teams; Partnering with ML teams; Partnering with platform teams; Influencing across teams

Communication Scope

Communicate clearly

Full Job Description

We’re looking for a Staff Engineer to take technical ownership of latency, throughput, and reliability across Runware’s AI inference platform. This is a senior technical leadership role for someone who obsesses over performance at scale, from request ingress through GPU execution to result delivery, and who can consistently turn ambitious targets such as sub-one-second inference into production reality.

As a Staff Engineer, you will define and drive the architecture, standards, and execution needed to make Runware one of the fastest and most reliable inference platforms in the market. You will work deeply across backend services, distributed systems, GPU workloads, and infrastructure, partnering closely with product, ML, and platform teams.

This role is ideal for someone who enjoys operating at the intersection of systems design, performance engineering, and real-world scale, and who wants clear ownership over outcomes that matter directly to customers.

### What you’ll do

* Own end-to-end inference performance across the platform, with clear responsibility for latency, throughput, and reliability targets
* Lead the architecture and design of core inference systems, including request routing, async execution, queuing, GPU scheduling, and result delivery
* Drive the platform toward sub-1 second inference where feasible, identifying bottlenecks across networking, services, storage, and GPU execution
* Make high-impact architectural decisions with performance, scalability, and operational simplicity as first-class concerns
* Partner with ML and model teams to ensure models are production-ready from a performance perspective (cold starts, batching, memory usage, concurrency)
* Define performance budgets, SLAs, and success metrics, and ensure they are measured, visible, and actively improved
* Lead deep-dive investigations into latency spikes, throughput degradation, and system-level performance issues
* Influence and mentor engineers across teams on performance engineeri