NVIDIA
AI/HPC systems
Manager, Solutions Architecture - Continuous Bringup and Optimization
Neural analysis suggests this role is optimal for Senior candidates.
“Manager, Solutions Architecture - Continuous Bringup and Optimization at NVIDIA. Skills: Solutions Architecture, Continuous Bringup and Optimization, AI factory infrastructures, GPU-accelerated systems, datacenter environments, NVIDIA GPU, CPU and networking technologies, data center architecture and operations. Lead a team dedicated to consulting, optimizing, and improving the resiliency of customer AI factory infrastructures, ensuring high service quality and operational perfection. Drive han”
What You'll Achieve.
- Ensure high service quality and operational perfection.
- Identify areas for efficiency gains and operational improvements.
- Enable smooth, scalable AI deployments.
- Detect bottlenecks, reduce downtime, and ensure system health at scale.
- Deliver resilient technical solutions.
Industry & Context.
Analytical, problem-solving, and decision-making skills, with the ability to identify root causes, drive continuous improvement, and deliver resilient technical solutions.
What They're Looking For.
Must Have
- Over 4 years leading teams and 8+ overall years in service operations in large data centers, focusing on infrastructure performance.
- Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
- Demonstrated technical leadership in data center, server, and network operations.
- Proficiency in both Japanese and English.
- Deep expertise in data center architecture and operations, including servers, GPUs, NICs, networking topologies, storage systems, and Linux-based environments.
- Analytical, problem-solving, and decision-making skills.
- Communication, time management, and organizational skills.
- Experience in leading complex projects, guiding technical teams, and meeting important metrics.
Nice to Have
- Deep familiarity with AI infrastructure and workflows, including training/inference pipelines, MLOps/DevOps tools, containerization (Docker, Kubernetes), and large-scale system deployments.
- Knowledge of data center infrastructure operations, including safety, security, environmental controls, and standard operating procedures.
- Interpersonal and collaboration skills.
What You'll Do.
- Lead a team dedicated to consulting, optimizing, and improving the resiliency of customer AI factory infrastructures, ensuring high service quality and operational perfection.
- Drive hands-on infrastructure analysis and tuning of complex GPU-accelerated systems, AI workloads, and datacenter environments, identifying areas for efficiency gains and operational improvements.
- Act as a technical authority on NVIDIA GPU, CPU and networking technologies, supporting customer discussions and architecture reviews.
- Establish and evolve optimization and monitoring methodologies, using analytics and tooling to detect bottlenecks, reduce downtime, and ensure system health at scale.
- Participate in customer-facing engagements, including roadmap sessions, post-deployment reviews, and incident retrospectives, helping to craft the customer experience and influence NVIDIA’s infrastructure strategy.
How You'll Work.
Team & Collaboration
- Work closely with internal teams (Engineering, Product, Sales) and customer collaborators to align infrastructure strategies with business goals, enabling smooth, scalable AI deployments.
- Support customer discussions and architecture reviews.
- Participate in customer-facing engagements.
- Interpersonal and collaboration skills, with the ability to lead discussions, influence outcomes, and build positive relationships with both internal and external collaborators.
Communication Scope
- Proficiency in both Japanese and English, demonstrating clear communication of technical topics across multicultural teams and with customers.
- Communication, time management, and organizational skills.
- Interpersonal and collaboration skills, with the ability to lead discussions, influence outcomes, and build positive relationships with both internal and external collaborators.
Process & Methodology
leading complex projects
Full Job Description
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. NVIDIA is looking for a Manager of Solution Architecture to lead the NVIDIA Infrastructure Specialist Team, Continuous Bringup and Optimization.

Academic and commercial groups around the world are using NVIDIA products to redefine deep learning and data analytics, and to power data centers. We are building many of the largest and fastest AI/HPC systems in the world! We are looking for someone with the ability to work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role will interact with customers, partners, and internal teams to analyze, define, and implement large-scale projects. The scope of these efforts includes a combination of Networking, System Design and Automation, as well as being the face to the customer!

**What you'll be doing:**

* Lead a team dedicated to consulting, optimizing, and improving the resiliency of customer AI factory infrastructures, ensuring high service quality and operational perfection.
* Drive hands-on infrastructure analysis and tuning of complex GPU-accelerated systems, AI workloads, and datacenter environments, identifying areas for efficiency gains and operational improvements.
* Work closely with internal teams (Engineering, Product, Sales) and customer collaborators to align infrastructure strategies with business goals, enabling smooth, scalable AI deployments.
* Act as a technical authority on NVIDIA GPU, CPU and networking technologies, supporting customer discussions and architecture reviews.
* Establish and evolve optimization and monitoring methodologies, using analytics and tooling to detect bottlenecks, reduce downtime, and ensure system health at scale.
* Participate in customer-facing engagements, including roadmap sessions, post-deployment reviews, and incident retrospectives, helping to cr
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.