Nscale

Technology

Principal Observability Platform Engineer

$150–215k United States

The Brief

“Principal Observability Platform Engineer at Nscale. Skills: observability platform, platform engineering, infrastructure engineering, AI/ML infrastructure. Own the technical strategy and architecture for observability.”

What You'll Achieve.

Ensure the platform scales ahead of the business

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting; Diagnose failures

What They're Looking For.

Must Have

8+ years in SRE, 8+ years in infrastructure engineering, 8+ years in platform engineering, 8+ years in observability-focused roles, Operated observability infrastructure at scale, Proficient in Python, Proficient in Go, Comfortable owning complex systems end to end, Infrastructure-as-Code is default, Influence without authority

Nice to Have

Familiarity with GPU infrastructure, Familiarity with HPC environments, Slurm familiarity, Experience with high-volume streaming pipelines, Background in AI/ML infrastructure observability, Prior experience defining observability strategy

What You'll Do.

Own technical strategy for observability

Own architecture for observability

Drive platform decisions

Identify systemic gaps

Design platforms that make failure visible

Design platforms that make failure fast to diagnose

Partner with SRE teams

Partner with infrastructure teams

Partner with AI/ML teams

Embed observability natively

Define standards for engineers

Define patterns for engineers

Mentor observability team

Grow the observability team technically

Lead incident postmortems

Use postmortems for platform improvements

How You'll Work.

Team & Collaboration

Partner with SRE; Partner with infrastructure; Partner with AI/ML teams

Communication Scope

Explain tradeoffs clearly

Process & Methodology

Technical strategy, Architecture roadmap

Full Job Description

Principal Observability Platform Engineer – Nscale

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale simplifies AI development while enabling superior results, supporting strategic business outcomes such as cost management, rapid innovation, and environmental responsibility. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency while contributing to the technology that powers the future.

About the Role

As a Principal/Staff Observability Platform Engineer, you'll own the technical direction of Nscale's observability platform: the systems that give us deep visibility into GPU clusters, AI workloads, and the infrastructure running them. You treat observability as a product and a discipline, not a tooling exercise. You'll set the architectural roadmap, raise the engineering bar across teams, and ensure our platform scales ahead of the business, not behind it.

You understand that complexity is a cost. Solutions that require constant babysitting don't scale, and neither does operational burden. The platforms you build should be simple to operate, easy to understand, and self-evidently correct when something goes wrong.

This isn't a "maintain and operate" role. It's a "define, build, and lead" role.

What You'll Do

  • Own the technical strategy and architecture for observability across metrics, logs, traces, and alerting at scale.
  • Drive platform decisions that have multi-year impact: tooling, data models, ingestion patterns, retention, cardinality management.
  • Identify systemic gaps before they become incidents; design platforms that make failure visible and fast to diagnose.
  • Partner with SRE, infrastructure, and AI/ML teams to embed observability natively into how Nscale bui

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
