NVIDIA

Technology

Senior Site Reliability Engineering

₹35–55L ~AI est. Bengaluru, India FULL TIME Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Site Reliability Engineering at NVIDIA. Skills: Site Reliability Engineering, Storage Systems, Automation, Observability. Lead design, deployment, and operations of production NAS, SAN, and Object Storage platforms. Ensure reliability, performance, and security of storage platforms.”

Industry & Context.

Technology

Problems you'll solve

Troubleshooting; Root cause analysis; Data-driven analysis

Eligibility Requirements

On-call rotation

What They're Looking For.

Must Have

12+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering; Significant focus on storage systems; Hands-on experience with design, deployment, and operations of enterprise-grade NAS, SAN, and/or Object Storage platforms; Solid understanding of SRE concepts; Proficiency with Infrastructure as Code and configuration management tools; Proficiency with source control systems; Experience building and operating highly available, scalable infrastructure; Experience with automation for provisioning, monitoring, and remediation; Experience with container and virtualization platforms; Experience with modern CI/CD and version control tools; Scripting or programming skills; Excellent communication and collaboration skills; Ability to work effectively across distributed and cross-functional teams; Bachelor’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

Nice to Have

Experience with storage for high-performance computing; Experience with storage for AI/ML workloads; Experience with storage for large-scale data analytics; Proven ability to debug complex, distributed systems; Proven ability to debug storage performance issues; History of driving reliability improvements through data-driven analysis; History of driving reliability improvements through automation; Experience leading technical initiatives; Mentoring engineers; Acting as a technical lead on critical projects

What You'll Do.

Lead design, deployment, and operations of production NAS, SAN, and Object Storage platforms

Ensure reliability, performance, and security of storage platforms

Capture requirements from partner teams

Architect storage solutions

Drive end-to-end implementation for new and existing services

Develop, maintain, and improve automation for provisioning

Develop, maintain, and improve automation for configuration

Develop, maintain, and improve automation for monitoring

Develop, maintain, and improve automation for incident response

Develop, maintain, and improve automation for lifecycle management

Participate in on-call and incident response

Lead troubleshooting of complex storage and performance issues

Drive root cause analysis and preventive actions

Define and track SLOs/SLIs for storage services

Define and track error budgets for storage services

Use observability and analytics to continuously improve reliability

Use observability and analytics to continuously improve efficiency

Build and maintain runbooks for storage services

Build and maintain standard operating procedures for storage

Build and maintain comprehensive documentation for storage services

Analyze capacity and usage trends

Recommend scaling strategies

Recommend optimization strategies

Collaborate closely with SRE teams

Collaborate closely with infrastructure teams

Collaborate closely with networking teams

Collaborate closely with application teams

Mentor junior engineers

Drive adoption of SRE principles across the team

How You'll Work.

Team & Collaboration

Partner teams; SRE teams; Infrastructure teams; Networking teams; Application teams; Distributed teams; Cross-functional teams

Communication Scope

Technical communication

Process & Methodology

Technical initiatives

Full Job Description

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s an outstanding legacy of innovation that’s fueled by phenomenal technology – and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a Senior Site Reliability Engineer – Storage. In this role, you will own the reliability, performance, and scalability of our global NAS, SAN, and Object Storage platforms that power critical internal and external services. You will combine deep storage expertise with strong automation and SRE practices to design, build, and operate highly available storage systems at scale.

**What You Will Be Doing:**

* Lead design, deployment, and operations of production NAS, SAN, and Object Storage platforms, ensuring reliability, performance, and security.
* Capture requirements from partner teams, architect storage solutions, and drive end-to-end implementation for new and existing services.
* Develop, maintain, and improve automation for provisioning, configuration, monitoring, incident response, and lifecycle management of storage infrastructure.
* Participate in on-call and incident response, lead troubleshooting of complex storage and performance issues, and drive root cause analysis and preventive actions.
* Define and track SLOs/SLIs and error budgets for storage services, using observability and analytics to continuously improve reliability and efficiency.
* Build and maintain runbooks, standard operating procedures, and comprehensive documentation for storage services and automation.
* Analyze capacity and usage trends, perform forecasting, and recommend scaling or optimization strategies to


SKILL SIGNAL 32 detected · ranked by frequency

Storage Systems ×5

Automation ×5

Observability ×5

Incident response ×3

Root cause analysis ×3

SLO/SLI tracking ×3

Error budget management ×3

Analytics ×3

Capacity forecasting ×3

Optimization strategies ×3

Site Reliability Engineering ×2

Terraform ×2

Ansible ×2

Puppet ×2

SaltStack ×2

Docker ×2

Kubernetes ×2

NAS

SAN

Object Storage

Infrastructure as Code

Python

Shell

Storage expertise

SRE practices

System design

Capacity planning

Performance analysis

Trend analysis

Forecasting

Git

BEHAVIOURAL

Leadership

Role Details

Seniority senior

Experience 12+ yrs

Level Senior

Work Mode Remote

Type FULL TIME

Education Bachelor's

Salary Band 200k+

AI-Extracted Insights

Domain Areas

nas; san; object-storage; storage-systems; sre-concepts; high-performance-computing; ai-ml-workloads; data-analytics

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
