Point Wild
cybersecurity
Principal Platform Engineer
“Principal Platform Engineer at Point Wild. Skills: GCP, GKE, Kubernetes, Terraform, Istio, Production ML workloads. Architect and lead the infrastructure strategy for a next-generation Production ML platform on Google Cloud. Design, deploy, and maintain elastically scaling cloud infrastructure (GCP) and containerization tools like Kubernetes for high-performance ML workloads.”
What You'll Achieve.
Ensure systems are elastic, secure, and resilient; ensure long-term model reliability
Industry & Context.
Solve some of the most challenging problems in the ML space
Participate in an on-call rotation; ensure compliance with standards such as SOC
What They're Looking For.
Must Have
- 8-10+ years in DevOps/Platform Engineering
- At least 2 years of experience specifically operating and maintaining production ML workloads
- Deep, hands-on experience with GCP (VPC-SC, IAM, Organization Policies)
- Deep, hands-on experience with GKE (Cluster topology, Helm, Kustomize, and in-cluster operators like ArgoCD)
- High proficiency with Istio (VirtualServices, mTLS, sidecar injection)
- High proficiency with API Gateways (specifically Kong)
- Expert-level Terraform skills, specifically using an Atlantis/GitOps workflow across a massive, multi-hundred-file estate
- Experience managing enterprise-grade identity and secrets (Auth0, Dex, ESO, or SOPS)
- Experience operating Airflow in production
- Experience operating an ML-serving stack (e.g., Triton, vLLM, MLflow)
- Comfortable managing Cloud SQL (PostgreSQL)
- Comfortable managing BigQuery
- Comfortable managing in-cluster datastores like Elasticsearch or ClickHouse
- At least an upper-intermediate level of spoken and written English
Nice to Have
- Past experience with continuous monitoring of model accuracy and detecting data/concept drift
- Experience with Ansible for cluster bootstrap and recovery
- Kubernetes (CKA/CKS) certifications
- GCP Professional Cloud Architect certifications
- GCP Security Engineer certifications
- Familiarity with Loki
- Familiarity with Grafana
- Familiarity with managing ClickHouse at scale
What You'll Do.
Architect and lead the infrastructure strategy for a next-generation Production ML platform on Google Cloud
Design, deploy, and maintain elastically scaling cloud infrastructure (GCP) and containerization tools like Kubernetes for high-performance ML workloads
Build automated pipelines for training, testing, and deploying machine learning models using tools like Jenkins, GitHub Actions, or Airflow
Implement observability tools to track model drift, accuracy, latency, and performance degradation in production
Implement comprehensive monitoring for system health (latency/uptime) alongside ML-specific metrics
Deploy tools that empower individual teams to monitor their workloads
Participate in an on-call rotation
Help manage security posture to ensure compliance with standards such as SOC
How You'll Work.
Team & Collaboration
Bridge the gap between data engineers, ML engineers, and backend and frontend engineers to ensure smooth production operation
Communication Scope
Upper-intermediate level of spoken and written English
Full Job Description
Point Wild helps customers monitor, manage, and protect against the risks associated with their identities and personal information in a digital world. Backed by WndrCo, Warburg Pincus, and General Catalyst, Point Wild is dedicated to creating the world’s most comprehensive portfolio of industry-leading cybersecurity solutions. Our vision is to become THE go-to resource for every cyber protection need individuals may face, today and in the future. Join us for the ride!

We’re looking for a Principal Platform Engineer to architect and lead the infrastructure strategy for our next-generation Production ML platform on Google Cloud. In this role, you will be the backbone of our high-performance machine learning workloads, ensuring our systems are elastic, secure, and resilient. You won’t just maintain the status quo; you’ll build the "paved road" for our engineers, automating everything from model deployment to complex networking perimeters. We are a high-trust, outcome-focused team that moves quickly to solve some of the most challenging problems in the ML space.

Core Responsibilities:
- Infrastructure Management: Design, deploy, and maintain elastically scaling cloud infrastructure (GCP) and containerization tools like Kubernetes for high-performance ML workloads.
- CI/CD Pipeline Development and Maintenance: Build automated pipelines for training, testing, and deploying machine learning models using tools like Jenkins, GitHub Actions, or Airflow.
- Model Monitoring & Maintenance: Implement observability tools to track model drift, accuracy, latency, and performance degradation in production.
- Collaboration: Bridge the gap between data engineers, ML engineers, and backend and frontend engineers to ensure smooth production operation.
- ML Observability: Implement comprehensive monitoring for system health (latency/uptime) alongside ML-specific metrics, such as feature drift, prediction accuracy, and data distribution shifts, to ensure long-term model reliability.
- Non ML workload and pro
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.