NVIDIA

Artificial Intelligence

Senior System Software Engineer, NCCL - Partner Enablement

Zurich, Switzerland FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior System Software Engineer, NCCL - Partner Enablement at NVIDIA. Skills: NCCL, C/C++ programming, high performance networking, Linux, Python, parallel programming, communication runtime. Engage with our partners and customers to root cause functional and performance issues reported with NCCL. Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters”

Industry & Context.

Artificial Intelligence
Problems you'll solve

Root cause functional and performance issues reported with NCCL; isolate issues on new systems and platforms

What They're Looking For.

Must Have

B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience

Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)

Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, test design

Experience working with engineering or academic research community supporting HPC or AI

Practical experience with high performance networking: Infiniband/RoCE/Ethernet networks, RDMA, topologies, congestion control

Expert in Linux fundamentals and a scripting language, preferably Python

Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)

Adaptability and passion to learn new areas and tools

Flexibility to work and communicate effectively across different teams and timezones

Nice to Have

Experience conducting performance benchmarking and developing infrastructure on HPC clusters

Prior system administration experience, especially for large clusters

Experience debugging network configuration issues in large scale deployments

Familiarity with CUDA programming and/or GPUs

Good understanding of Machine Learning concepts and experience with Deep Learning frameworks such as PyTorch, TensorFlow

Deep understanding of technology

What You'll Do.

Engage with our partners and customers to root cause functional and performance issues reported with NCCL

Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters

Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)

Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters

Document and conduct trainings/webinars for NCCL

Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support

How You'll Work.

Team & Collaboration

Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support; communicate effectively across different teams and timezones

Communication Scope

Communicate effectively across different teams and timezones

Full Job Description

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.

Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications! We are looking for a motivated Partner Enablement Engineer to guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking (Infiniband, RoCE, Ethernet). This is an outstanding opportunity to get an end-to-end understanding of the AI networking stack. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

**What you will be doing:**

* Engage with our partners and customers to root cause functional and performance issues reported with NCCL
* Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
* Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
* Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
* Document and conduct trainings/webinars for NCCL
* Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support.

**What we need to see:**

* B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience. Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
* Excellent C/C++ programming skills, including debugging, profiling, code optimizati

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
