Annapurna Labs

Technology

Senior Software Engineer - ML Network Stack

$420–630k (AI estimate) · Tel Aviv-Yafo, Tel Aviv, Israel · Full time
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Software Engineer - ML Network Stack at Annapurna Labs. Skills: ML Network Stack, AI/ML Systems, Distributed Systems. Build infrastructure. Maintain infrastructure.”

Industry & Context.

Technology
Problems you'll solve

Solve hard problems

What They're Looking For.

Must Have

5+ years software development experience, 5+ years system design experience, 5+ years full SDLC experience, 3+ years mentoring experience, 3+ years SW/HW co-design experience

Nice to Have

Bachelor's degree in computer science, experience creating automated dashboards, experience creating visualizations

What You'll Do.

Maintain infrastructure

Automate software delivery

Spool up large clusters

Digest performance data

Invent automatic mechanisms

Manage infrastructure complexity

Evolve infrastructure

How You'll Work.

Team & Collaboration

HPC customers; ML customers

Process & Methodology

CI/CD tools

Full Job Description

We are seeking an experienced engineer to join our team that owns the network stack for EC2 distributed AI/ML systems. The team develops support for a variety of frameworks and communication libraries including NCCL, NVSHMEM, NIXL, NCCL GIN, and Perplexity kernels. Solid knowledge of Linux, networking, and performant coding is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC/RDMA interconnects is highly valued.

If you like solving hard problems, want to work with HPC and ML customers, iterate fast, and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML: you'll be working on features for the largest clusters, with the largest customers, for the largest AI models.

The organization you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.

Key job responsibilities

Be a senior engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale.

Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time.

Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads.

Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders.

Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers.

Manage the complexity of infrastructure that covers many instance types, software stacks
