Nscale

AI

Infrastructure Software Engineer, Fleet & Automation

$150–215k · United States · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Infrastructure Software Engineer, Fleet & Automation at Nscale. Skills: infrastructure automation, fleet operations, AI infrastructure, GPU cloud. Own technical architecture and roadmap development for workflow automation systems.”

What You'll Achieve.

Higher system availability; Reduced operational costs

Industry & Context.

AI

Problems you'll solve

Identify and resolve performance issues; Identify and resolve scalability issues; Troubleshoot large-scale infrastructure

What They're Looking For.

Must Have

  • Bachelor's degree in Computer Science
  • 5+ years relevant experience
  • Experience in utilizing languages such as C, C++, Java
  • Experience with scripting languages such as Python
  • Deep understanding of Linux operating systems
  • Networking fundamentals (TCP/IP, BGP)
  • Familiarity with configuration management tools
  • Experience building, running and debugging large-scale infrastructure
  • Experience with compute technologies
  • Experience with storage
  • Experience with hardware architecture
  • Experience integrating with infrastructure tooling

Nice to Have

  • Master's degree or PhD
  • Experience designing, analyzing and improving efficiency
  • Experience analyzing and improving scalability
  • Experience analyzing and improving performance
  • Direct experience with AI/HPC infrastructure
  • Experience with NVIDIA GPUs
  • Experience with InfiniBand
  • Experience with high-speed Ethernet fabrics
  • Experience with related management software
  • Experience with advanced observability systems
  • Experience with monitoring systems
  • Familiarity with cloud-native technologies
  • Familiarity with infrastructure-as-code principles
  • Demonstrated ability to integrate AI tools
  • Familiarity with SLOs/metrics measurement
  • Familiarity with logs/telemetry/metrics integration

What You'll Do.

Perform technical architecture, roadmap development, and implementation for workflow automation systems

Drive architecture decisions

Identify and resolve performance and scalability issues

Establish technology and product direction

Own end-to-end delivery of device provisioning, validation, testing, and remediation workflows

Design and build workflow orchestration systems

Partner with Infrastructure, Platform, and SRE teams

Translate operational needs into automation

Establish engineering standards for reliability, observability, and operational excellence

Help set up engineering best practices

Build production-grade Python systems

Assess impact to team software stack

Explore AI-driven process improvement and automation

Collaborate with cross-functional teams

Build efficient, interoperable, and maintainable automated systems

How You'll Work.

Team & Collaboration

Cross-functional teams; Infrastructure teams; Platform teams; SRE teams; Broader engineering team

Process & Methodology

Roadmap planning

Full Job Description

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.

Overview

As an Infrastructure Software Engineer for Fleet & Automation, you will be a critical member of the AI Infrastructure Operations team, responsible for ensuring the acceptance, performance, and scalability of our cutting-edge AI and High-Performance Computing (HPC) environments. Leveraging software engineering principles, you will focus on building and maintaining the control plane, tooling, and automation that supports Fleet Operations, Network Operations, and Observability functions. Your work will directly translate into higher system availability and reduced operational costs.

Key Responsibilities

  • Perform technical architecture, roadmap and implementation for workflow automation systems, driving architecture decisions that balance automation complexity, reliability, and maintainability.
  • Identify and resolve performance and scalability issues.
  • Establish technology and product direction in collaboration with other tech leads, managers, and senior leadership.
  • Own end-to-end delivery of device provisioning, validation, testing, and remediation workflows at scale.
  • Design and build workflow orchestration systems for hardware lifecycle m

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
