Saviynt

identity security

AI Platform Engineer, Training and Inference

Remote · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“AI Platform Engineer, Training and Inference at Saviynt. Skills: Ray ecosystem, distributed training on Ray, LLM inference mesh, model promotion lifecycle, RL training infrastructure, retrain pipeline, RAG retrieval integration. Own the Ray ecosystem end-to-end. Operate distributed training with Ray Train”

What You'll Achieve.

Enable Saviynt's identity products to deliver measurable AI-powered outcomes.

Industry & Context.

identity security

Problems you'll solve

Debugging real production failures, including NCCL timeouts, Plasma OOM, and Serve autoscaling lag.

What They're Looking For.

Must Have

  • Experience in ML engineering, including time in an ML platform or MLOps role
  • Production Ray depth: Ray Train, Serve, Core, and Data, having debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
  • LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton, with PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
  • Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
  • RL working knowledge: PPO, policy gradient, or RLHF; able to translate an algorithm into distributed compute primitives
  • Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation
  • Vector databases: pgvector or Qdrant, covering ANN index strategies, embedding upsert, and query latency tuning under inference load
  • Python and Flyte or an equivalent ML orchestrator

Nice to Have

Quantization: INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)

What You'll Do.

Own the Ray ecosystem end-to-end

Operate distributed training with Ray Train

Build and operate the LLM inference mesh with Ray Serve

Optimise inference performance

Design and operate the model routing layer

Build RL training infrastructure

Operate the full model promotion lifecycle

Operate the retrain pipeline

Integrate RAG retrieval into the inference mesh

Full Job Description

## Description

AI Platform Engineer – Training & Inference

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit www.saviynt.com.

The AI Platform team is building the compute layer that trains, evaluates, and serves every AI model at Saviynt. We need an ML Platform Engineer to own distributed training on Ray + H100s, the multi-engine LLM inference mesh (vLLM, SGLang, NVIDIA Triton), and the full model promotion lifecycle, from shadow mode through canary rollout to GA.

The AI Platform team's mission is to build a secure, scalable, product-agnostic AI foundation that enables Saviynt's identity products to deliver measurable AI-powered outcomes. Training & Inference is the engine: it turns data into deployed models that make Saviynt's products smarter.

What You Will Be Doing

  • Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
  • Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
  • Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
  • Optimise inf

How to Apply on Lever

  • Lever uses a streamlined one-page form — apply in under 5 minutes.
  • LinkedIn import works well; review parsed data before submitting.
  • The cover letter field is optional but visible to reviewers — use it to differentiate.
  • Referral codes from employees can significantly boost visibility of your application.
