Shield AI
deep-tech
Principal Engineer, AI Infrastructure
“Principal Engineer, AI Infrastructure at Shield AI. Skills: AI Infrastructure, ML Infrastructure at Scale, Data Platform, Compute Strategy, MLOps, Model Deployment. Define and operate the core AI and data platform across training, simulation, data management, evaluation, and deployment. Own where and how workloads run across on-premise, cloud, and hybrid environments”
What You'll Achieve.
Faster iteration from idea to trained model to evaluated result
High utilization of compute resources with clear visibility into usage and cost
Simulation capacity that supports large-scale training without bottlenecks
Consistent end-to-end lifecycle: development, evaluation, deployment, monitoring, and retraining
Repeatable data loop: telemetry, scenario extraction, retraining, and redeployment
Reliable deployment of optimized models to edge systems
Broad platform adoption across autonomy programs
Repeatable approach for deploying AI infrastructure in customer environments
Representative performance targets: training iteration cycles measured in days, not weeks; sustained high utilization of GPU resources under production workloads
Industry & Context.
classified systems, air-gapped systems, SCIFs
What They're Looking For.
Must Have
Experience building and operating ML infrastructure at scale (100+ GPU clusters, distributed systems)
Experience defining compute strategy, including on-premise vs. cloud tradeoffs, capacity planning, and cost management
Understanding of ML workloads, including foundation models, RL/MARL, simulation-based training, and fine-tuning
Experience building data platforms with dataset versioning, lineage, and cataloging
Ability to debug and resolve system issues when needed
Nice to Have
Experience in defense or classified environments (e.g., air-gapped systems, SCIFs)
Experience with simulation-heavy ML systems (robotics, autonomy, or similar domains)
Experience deploying and optimizing models for edge hardware
Familiarity with HPC systems (schedulers, parallel storage, high-speed networking)
What You'll Do.
Define and operate the core AI and data platform across training, simulation, data management, evaluation, and deployment
Own where and how workloads run across on-premise, cloud, and hybrid environments
Drive capacity planning, ... and cost-per-compute decisions, including support for classified and air-gapped systems
Build infrastructure for distributed training (supervised learning, ... foundation models) and large-scale, multi-fidelity simulation
Ensure training and simulation systems operate together without bottlenecks
Ingest and manage multi-modal sensor data (EO, ...)
Establish dataset versioning, ... and classification-aware storage and access controls
Establish a consistent workflow for experiment tracking, ... and automated validation
Implement evaluation and V&V gates so models meet defined standards before deployment
Own the pipeline from training to deployment, including model optimization (e.g. ...), deployment to edge systems, ... and retraining triggers
Define how AI infrastructure is deployed in customer environments across on-premise, ... and sovereign settings
Establish a consistent approach that avoids one-off solutions while adapting to operational constraints
... and workflows across teams
Reduce duplication while maintaining flexibility where needed
Work directly with Hivemind and other autonomy teams to ensure the platform supports real workloads and evolves with program needs
How to Apply on Lever
- Lever uses a streamlined one-page form — apply in under 5 minutes.
- LinkedIn import works well; review parsed data before submitting.
- The cover letter field is optional but visible to reviewers — use it to differentiate.
- Referral codes from employees can significantly boost visibility of your application.