NVIDIA

AI/HPC systems

Manager, Solutions Architecture - Continuous Bringup and Optimization

Tokyo, Japan FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Manager, Solutions Architecture - Continuous Bringup and Optimization at NVIDIA. Skills: Solutions Architecture, Continuous Bringup and Optimization, AI factory infrastructures, GPU-accelerated systems, datacenter environments, NVIDIA GPU, CPU and networking technologies, data center architecture and operations. Lead a team dedicated to consulting, optimizing, and improving the resiliency of customer AI factory infrastructures, ensuring high service quality and operational perfection. Drive han”

What You'll Achieve.

Ensuring high service quality and operational perfection; identifying areas for efficiency gains and operational improvements; enabling smooth, scalable AI deployments; detecting bottlenecks; reducing downtime; ensuring system health at scale; delivering resilient technical solutions.

Industry & Context.

AI/HPC systems
Problems you'll solve

Analytical, problem-solving, and decision-making skills, with the ability to identify root causes, drive continuous improvement, and deliver resilient technical solutions.

What They're Looking For.

Must Have

  • Over 4 years leading teams and 8+ years overall in service operations in large data centers, focusing on infrastructure performance.
  • Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
  • Demonstrated technical leadership in data center, server, and network operations.
  • Proficiency in both Japanese and English.
  • Deep expertise in data center architecture and operations, including servers, GPUs, NICs, networking topologies, storage systems, and Linux-based environments.
  • Analytical, problem-solving, and decision-making skills.
  • Communication, time management, and organizational skills.
  • Experience in leading complex projects, guiding technical teams, and meeting important metrics.

Nice to Have

  • Deep familiarity with AI infrastructure and workflows, including training/inference pipelines, MLOps/DevOps tools, containerization (Docker, Kubernetes), and large-scale system deployments.
  • Knowledge of data center infrastructure operations, including safety, security, environmental controls, and standard operating procedures.
  • Interpersonal and collaboration skills.

What You'll Do.

Lead a team dedicated to consulting, optimizing, and improving the resiliency of customer AI factory infrastructures, ensuring high service quality and operational perfection.

Drive hands-on infrastructure analysis and tuning of complex GPU-accelerated systems, AI workloads, and datacenter environments, identifying areas for efficiency gains and operational improvements.

Act as a technical authority on NVIDIA GPU, CPU and networking technologies, supporting customer discussions and architecture reviews.

Establish and evolve optimization and monitoring methodologies, using analytics and tooling to detect bottlenecks, reduce downtime, and ensure system health at scale.

Participate in customer-facing engagements, including roadmap sessions, post-deployment reviews, and incident retrospectives, helping to craft the customer experience and influence NVIDIA’s infrastructure strategy.

How You'll Work.

Team & Collaboration

  • Work closely with internal teams (Engineering, Product, Sales) and customer collaborators to align infrastructure strategies with business goals, enabling smooth, scalable AI deployments.
  • Support customer discussions and architecture reviews.
  • Participate in customer-facing engagements.
  • Interpersonal and collaboration skills, with the ability to lead discussions, influence outcomes, and build positive relationships with both internal and external collaborators.

Communication Scope

  • Proficiency in both Japanese and English, demonstrating clear communication of technical topics across multicultural teams and with customers.
  • Communication, time management, and organizational skills.
  • Interpersonal and collaboration skills, with the ability to lead discussions, influence outcomes, and build positive relationships with both internal and external collaborators.

Process & Methodology

leading complex projects

Full Job Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. NVIDIA is looking for a Manager of Solution Architecture to lead the NVIDIA Infrastructure Specialist Team, Continuous Bringup and Optimization.

Academic and commercial groups around the world are using NVIDIA products to redefine deep learning and data analytics, and to power data centers. We are building many of the largest and fastest AI/HPC systems in the world! We are looking for someone with the ability to work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role will interact with customers, partners, and internal teams to analyze, define, and implement large-scale projects. The scope of these efforts includes a combination of Networking, System Design, and Automation, and being the face to the customer!

What you'll be doing:

  • Lead a team dedicated to consulting, optimizing, and improving the resiliency of customer AI factory infrastructures, ensuring high service quality and operational perfection.
  • Drive hands-on infrastructure analysis and tuning of complex GPU-accelerated systems, AI workloads, and datacenter environments, identifying areas for efficiency gains and operational improvements.
  • Work closely with internal teams (Engineering, Product, Sales) and customer collaborators to align infrastructure strategies with business goals, enabling smooth, scalable AI deployments.
  • Act as a technical authority on NVIDIA GPU, CPU and networking technologies, supporting customer discussions and architecture reviews.
  • Establish and evolve optimization and monitoring methodologies, using analytics and tooling to detect bottlenecks, reduce downtime, and ensure system health at scale.
  • Participate in customer-facing engagements, including roadmap sessions, post-deployment reviews, and incident retrospectives, helping to cr

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
