MEDAL

Technology

Site Reliability / Infrastructure Engineer

$150–275k · New York, New York, United States · FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Site Reliability / Infrastructure Engineer at MEDAL. Skills: Reliability engineering, Infrastructure management, Database scaling. Own reliability across GCP infrastructure. Drive improvements to availability”

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting

Eligibility Requirements

On-call rotation

What They're Looking For.

Must Have

Startup experience, Scaling databases, GCP experience, Terraform experience, Elasticsearch production experience, Incident response experience, GitHub Actions experience

Nice to Have

Kubernetes experience, IAM experience, Cloud Logging experience, Managed services ecosystem experience

What You'll Do.

Own reliability across GCP infrastructure

Drive improvements to availability

Drive improvements to latency

Lead incident response

Manage on-call rotations

Ensure recurrence prevention

Architect database scaling strategies

Execute database scaling strategies

Perform capacity planning

Translate requirements into infrastructure designs

Manage Terraform-managed GCP environment

Evolve Kubernetes cluster configurations

Own Elasticsearch cluster

Perform Elasticsearch capacity planning

Manage Elasticsearch sharding strategy

Manage index lifecycle

Perform Elasticsearch version upgrades

Tune Elasticsearch performance

Build observability across stack

Maintain observability across stack

Improve CI/CD reliability

Improve delivery pipelines

Harden secrets management

Harden network segmentation

How You'll Work.

Team & Collaboration

Partner with product engineering

Communication Scope

Flag issues clearly; Communicate rapidly; Lead postmortems; Write actionable postmortems

Full Job Description

THE COMPANY

MEDAL

Medal is the world’s largest and fastest-growing platform for gaming clips, where millions of gamers capture, share, and relive their best moments. Every year, our players record billions of clips, each representing a unique, action-packed highlight. We’re building the next generation of gaming communities: social, monetized, and creator-powered. Our mission is to design products that make sharing, discovering, and connecting around gaming moments seamless and fun. We raised a seed round of $133M from General Catalyst and Khosla to discover the next generation of intelligence.

THE ROLE

Medal's infrastructure handles billions of clips, video ingestion pipelines, and social features at a massive scale most engineers never get to touch. We're looking for an SRE who cares deeply about reliability and scalability. The work centers on reliability, incident response, scaling, and making sure our infrastructure keeps up with our growth. You'll own the on-call rotation, drive postmortems, and work directly with engineering teams to meet their infra needs.

The right person probably came through startups and scale-ups. You've been in the room when things broke at 2am, you've scaled databases under pressure, and you know the difference between a durable fix and a patch that buys you a week.

KEY RESPONSIBILITIES

- Own reliability across our GCP infrastructure: Kubernetes clusters, managed services, and data pipelines, driving measurable improvements to availability and latency
- Lead incident response end-to-end: on-call rotations, runbooks, postmortems, and the follow-through that makes sure the same thing doesn't happen twice
- Architect and execute database scaling strategies (sharding, replication, query optimization, and capacity planning) across MySQL and Postgres at meaningful scale
- Partner with product engineering to translate feature requirements into infrastructure designs that hold up as we grow
- Manage and evolve our Terraform-managed GCP environ

How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
