Lambda

AI cloud infrastructure

Software Engineer - Fleet

$203–300k · San Francisco, California, United States · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is best suited to entry-level candidates.

The Brief

“Software Engineer - Fleet at Lambda. Skills: Go, Python, Linux, Configuration management, Infrastructure automation, Hardware debugging. Design, implement, and improve software that powers GPU fleet lifecycle management and machine configuration at scale. Build and enhance automation frameworks for machine provisioning, configuration management, and deployment”

Industry & Context.

AI cloud infrastructure

Problems you'll solve

Independently troubleshoot complex systems; Investigate failures across BIOS, BMC, firmware, networking, storage, and boot flows; Diagnose issues involving drivers, firmware, and hardware compatibility across GPU servers

Eligibility Requirements

Presence in our San Francisco/San Jose or Bellevue office location 4 days per week

What They're Looking For.

Must Have

2+ years of experience working with Go (Golang) or Python in production environments; 2+ years of experience with configuration management tools and practices; Comfortable working in Linux environments and debugging issues at the OS, hardware, and networking layers; Can independently troubleshoot complex systems and communicate effectively across software, infrastructure, and vendor teams

Nice to Have

Experience with Go in infrastructure, systems, or backend development; Hands-on experience with bare metal provisioning and lifecycle management, including technologies such as Redfish, BMC, IPMI, DHCP, and PXE; Experience diagnosing issues involving drivers, firmware, and hardware compatibility across GPU servers; Experience incorporating AI-assisted development tools into engineering workflows, including code generation, debugging, test development, and documentation; Experience building Linux distributions or managing OS customization and imaging; Familiarity with Ansible for system configuration and automation; Exposure to Kubernetes and container orchestration concepts

What You'll Do.

Design, implement, and improve software that powers GPU fleet lifecycle management and machine configuration at scale

Build and enhance automation frameworks for machine provisioning, configuration management, and deployment

Enable bring-up, validation, and production readiness for new server and accelerator platforms

Improve and refine workflows for bare metal provisioning, firmware updates, and system health monitoring

Investigate failures across BIOS, BMC, firmware, networking, storage, and boot flows

How You'll Work.

Team & Collaboration

Work closely with infrastructure, security, and product engineering teams to develop scalable and maintainable solutions; Communicate effectively across software, infrastructure, and vendor teams

Communication Scope

Communicate effectively across software, infrastructure, and vendor teams

Full Job Description

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU. If you'd like to build the world's best AI cloud, join us.

*Note: This position requires presence in our San Francisco/San Jose or Bellevue office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.

Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda cloud console, APIs, and systems, as well as internal tooling for system deployment, management, and maintenance.

What You’ll Do

- Develop and Maintain Production Systems: Design, implement, and improve software that powers GPU fleet lifecycle management and machine configuration at scale.
- Automate Infrastructure: Build and enhance automation frameworks for machine provisioning, configuration management, and deployment.
- Support New Hardware Introduction (NPI): Enable bring-up, validation, and production readiness for new server and accelerator platforms.
- Enhance Machine Lifecycle Processes: Improve and refine workflows for bare metal provisioning, firmware updates, and system health monitoring.
- Debug Hardware and Firmware Issues: Investigate failures across BIOS, BMC, firmware, networking, storage, and boot flows.
- Collaborate Across Teams: Work closely with infrastructure, security, and product engineering teams to develop scalable and maintainable solutions.

You

- Have 2+ years of experience working with Go (Golang) or Python in production environments.
- Have 2+ years of experience with configuration management tools and practices.
- Are comfortable working in Linux environments and debugging issues at the OS, hardware, and networking layers.
- Can independently troubleshoot complex systems and c


SKILL SIGNAL 30 detected · ranked by frequency

Infrastructure automation ×5

Go ×3

Python ×3

Linux ×3

Software development ×3

System design ×3

Hardware bring-up ×3

Hardware validation ×3

Firmware debugging ×3

Networking debugging ×3

Storage debugging ×3

Boot flow debugging ×3

OS customization ×3

Container orchestration ×3

Configuration management ×2

Hardware debugging ×2

Redfish ×2

BMC ×2

IPMI ×2

DHCP ×2

PXE ×2

Ansible ×2

Golang

Kubernetes

Configuration management tools and practices

Bare metal provisioning

Lifecycle management

System configuration

Automation

Configuration management tools

BEHAVIOURAL

Communicate effectively across software, infrastructure, and vendor teams

Role Details

Experience 2+ yrs

Level Entry

Work Mode Hybrid

Type FULL TIME

Category fleet-engineering

Salary Band 200k+

AI-Extracted Insights

Domain Areas

ai-cloud-infrastructure · gpu-fleet-lifecycle-management · machine-configuration-at-scale · bare-metal-provisioning · hardware-compatibility-across-gpu-servers

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
