Nscale
Technology
Site Reliability Engineer
Neural analysis suggests this role is optimal for Mid+ candidates.
“Site Reliability Engineer at Nscale. Skills: Site Reliability Engineering, Automation, AI workloads. Build and improve automation.”
Industry & Context.
Troubleshooting; Incident response; Root cause analysis
What They're Looking For.
Must Have
2-5 years SRE/Systems/Software Engineering, 2+ years programming skills, Working knowledge of Linux, Working knowledge of networking, Working knowledge of distributed systems, Experience troubleshooting production issues
Nice to Have
Exposure to cloud platforms, Exposure to Kubernetes, Exposure to virtualized/bare-metal, Experience in AI workloads, Experience in GPU workloads, Experience in HPC, Basic understanding of high-performance networking, Exposure to production monitoring/alerting
What You'll Do.
Improve infrastructure
Support development of operational systems
Support development of platform services
Maintain monitoring dashboards
Participate in incident response
Participate in troubleshooting
Participate in post-incident reviews
Investigate performance issues
Resolve performance issues
Investigate reliability issues
Resolve reliability issues
Improve system stability
Contribute to availability
Contribute to scalability
Contribute to operational efficiency
Learn from senior engineers
How You'll Work.
Team & Collaboration
Collaborate with Engineering; Collaborate with Networking; Collaborate with Infrastructure
Full Job Description
About Nscale
Nscale is the GPU cloud engineered for AI, purpose-built to deliver high-performance, cost-efficient infrastructure for AI-native startups and global enterprises. We enable organizations to accelerate innovation, reduce the complexity of AI development, and achieve meaningful business outcomes through scalable, sustainable compute.
Our culture is defined by ownership, accountability, and rapid innovation. We operate with urgency and transparency, and every team member contributes to building the infrastructure powering the future of AI.
What You’ll Be Doing
- Help build and improve automation, tooling, and infrastructure that supports AI workloads
- Support the development of operational systems and platform services
- Assist in defining and maintaining basic SLOs/SLIs and monitoring dashboards
- Participate in incident response, troubleshooting, and post-incident reviews
- Investigate and help resolve performance and reliability issues across systems
- Collaborate with Engineering, Networking, and Infrastructure teams to improve system stability
- Contribute to improving availability, scalability, and operational efficiency
- Learn from senior engineers and grow your expertise in reliability engineering
What You Bring
- 2–5 years of experience in Site Reliability Engineering, Systems Engineering, or Software Engineering in Data Center Environment
- 2+ years programming skills (e.g., Python, Go, or similar) with interest in automation and tooling
- Working knowledge of Linux systems, networking concepts, and distributed systems
- Experience troubleshooting system or application issues in production environments
- Familiarity with monitoring or observability tools (e.g., logs, metrics, dashboards)
- Strong willingness to learn and improve reliability and operational practices
- Ability to work in fast-paced environments and collaborate across teams
Preferred Experience
- Exposure to cloud platforms, Kubernetes, or virtualized/bare-metal environments
- Experience in AI, GPU workloads, or h
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.