Nscale
AI
Infrastructure Software Engineer, Fleet & Automation
Neural analysis suggests this role is optimal for Mid+ candidates.
“Infrastructure Software Engineer, Fleet & Automation at Nscale. Skills: Infrastructure automation, Fleet operations, AI infrastructure, GPU cloud. Perform technical architecture. Perform roadmap development”
What You'll Achieve.
Higher system availability; Reduced operational costs
Industry & Context.
Identify and resolve performance issues; Identify and resolve scalability issues; Troubleshooting large-scale infrastructure
What They're Looking For.
Must Have
Bachelor's degree in Computer Science; 5+ years relevant experience; Experience in utilizing languages such as C, C++, Java; Experience with scripting languages such as Python; Deep understanding of Linux operating systems; Networking fundamentals (TCP/IP, BGP); Familiarity with configuration management tools; Experience building, running and debugging large-scale infrastructure; Experience with compute technologies; Experience with storage; Experience with hardware architecture; Experience integrating with infrastructure tooling
Nice to Have
Master's degree or PhD; Experience designing, analyzing and improving efficiency; Experience analyzing and improving scalability; Experience analyzing and improving performance; Direct experience with AI/HPC infrastructure; Experience with NVIDIA GPUs; Experience with InfiniBand; Experience with high-speed Ethernet fabrics; Experience with related management software; Experience with advanced observability systems; Experience with monitoring systems; Familiarity with cloud-native technologies; Familiarity with infrastructure-as-code principles; Demonstrated ability to integrate AI tools; Familiarity with SLOs/metrics measurement; Familiarity with logs/telemetry/metrics integration
What You'll Do.
Perform technical architecture
Perform roadmap development
Perform implementation for workflow automation systems
Drive architecture decisions
Identify performance issues
Identify scalability issues
Resolve performance issues
Resolve scalability issues
Establish technology direction
Establish product direction
Own end-to-end delivery of device provisioning
Own end-to-end delivery of validation workflows
Own end-to-end delivery of testing workflows
Own end-to-end delivery of remediation workflows
Design workflow orchestration systems
Build workflow orchestration systems
Partner with Infrastructure teams
Partner with Platform teams
Partner with SRE teams
Translate operational needs into automation
Establish engineering standards for reliability
Establish engineering standards for observability
Establish engineering standards for operational excellence
Help set up engineering best practices
Build production-grade Python systems
Assess impact to team software stack
Explore AI driven process improvement
Explore AI driven automation
Collaborate with cross-functional teams
Build efficient automated systems
Build interoperable automated systems
Build maintainable automated systems
How You'll Work.
Team & Collaboration
Cross-functional teams; Infrastructure teams; Platform teams; SRE teams; Broader engineering team
Process & Methodology
Roadmap planning
Full Job Description
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.
Overview
As an Infrastructure Software Engineer for Fleet & Automation, you will be a critical member of the AI Infrastructure Operations team, responsible for ensuring the acceptance, performance, and scalability of our cutting-edge AI and High-Performance Computing (HPC) environments. Leveraging software engineering principles, you will focus on building and maintaining the control plane, tooling, and automation that supports Fleet Operations, Network Operations, and Observability functions. Your work will directly translate into higher system availability and reduced operational costs.
Key Responsibilities
- Perform technical architecture, roadmap and implementation for workflow automation systems, driving architecture decisions that balance automation complexity, reliability, and maintainability.
- Identify and resolve performance and scalability issues.
- Establish technology and product direction in collaboration with other tech leads, managers, and senior leadership.
- Own end-to-end delivery of device provisioning, validation, testing, and remediation workflows at scale.
- Design and build workflow orchestration systems for hardware lifecycle m
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.