Epic Kids Inc.
Digital reading platform
Senior Software Engineer, Infrastructure
“Senior Software Engineer, Infrastructure at Epic Kids Inc. Skills: GCP infrastructure, Container platform (Docker, GKE), CI/CD pipelines, Observability stack, Terraform. Drive the stability and reliability of Epic's GCP infrastructure. Build and operate Epic's GCP infrastructure for high availability, scalability, and cost efficiency.”
What You'll Achieve.
Driving the stability, observability, and overall reliability of Epic's platform; Setting reliability standards; Hardening the systems; Making sure issues are caught early and resolved fast; Setting and tracking SLOs/SLIs; Reducing toil; Engineering out recurring sources of instability; Ensuring high availability, scalability, and cost efficiency of GCP infrastructure; Ensuring fast, safe, low-risk delivery across engineering teams; Ensuring signals are actionable, noise is low, and on-call has the context to resolve issues quickly; Focus on consistency, change safety, and reproducibility in Terraform; Reliability as a first-class consideration; Supporting compliance-aware infrastructure practices as we mature our SOC 2 and student-data compliance programs
Industry & Context.
Problem-solving skills; Troubleshoot service and platform issues; Incident response; Blameless post-mortems
Fully remote, US-based role; Working closely with a global, bilingual (English–Chinese) engineering team; Comfort participating in a frequent production on-call rotation
What They're Looking For.
Must Have
Bachelor's degree or higher in Computer Science, Software Engineering, or a related field; 5+ years of experience in infrastructure, platform, DevOps, or a related engineering role; Hands-on experience with GCP (GCE, GCS, VPC, IAM, Cloud Monitoring, and related services); Experience with Docker and Kubernetes (GKE): containerizing workloads, deploying to GKE, Helm, and cluster fundamentals; Experience with CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, or similar); Experience with an observability platform such as New Relic (metrics, logging, alerting, dashboards); Proficiency in Terraform for managing infrastructure as code; Scripting/programming skills in Python, Bash, or similar; Comfort participating in a frequent production on-call rotation; Track record of measurably improving reliability of production systems, e.g., defining SLOs, reducing incident frequency or MTTR, eliminating recurring failure modes; Problem-solving skills; Sense of ownership; Ability to work effectively in evolving systems; Fluency in English for daily collaboration and technical documentation; Proficiency in Mandarin Chinese to collaborate effectively with global engineering and business partners
Nice to Have
Experience operating workflow orchestration platforms (e.g., Dagster, Airflow) as a service for data or platform teams; Familiarity with the operational footprint of data platforms (warehouse infrastructure, job schedulers, batch workloads); Experience in distributed or global engineering teams; Working knowledge of compliance frameworks (e.g., SOC 2, FERPA, COPPA) and GRC tools
What You'll Do.
Drive the stability and reliability of Epic's GCP infrastructure
Build and operate Epic's GCP infrastructure for high availability
Manage and harden our Docker and GKE container platform
Maintain and improve CI/CD pipelines
Own and evolve the observability stack
Write and maintain Terraform to codify infrastructure
Contribute to capacity planning and architectural reviews
Champion platform security best practices
Support compliance-aware infrastructure practices
Operate the orchestration platform and supporting infrastructure
Collaborate with backend and data engineers to troubleshoot service and platform issues
Lead by example in a frequent on-call rotation: drive incident response, blameless post-mortems, and the follow-through that turns one-time outages into systemic, lasting reliability improvements
Provide guidance to developers on infrastructure concerns and best practices
How You'll Work.
Team & Collaboration
Partner closely with both our product engineering and data engineering teams; Collaborate with backend and data engineers to troubleshoot service and platform issues; Partner with data engineering to operate the orchestration platform and supporting infrastructure; Collaborate effectively with global engineering and business partners
Communication Scope
Fluency in English for daily collaboration and technical documentation; Proficiency in Mandarin Chinese to collaborate effectively with global engineering and business partners
Full Job Description
About Us
Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. With a vast library of high-quality books and learning resources from 250+ of the world’s top publishers, Epic empowers kids to explore their interests, build literacy skills, and develop a lifelong love of reading. Through personalized recommendations and built-in progress tracking, Epic helps children build confidence and curiosity, while giving parents and educators meaningful insight into each child’s learning journey. As Epic continues to grow, we are reimagining what reading can be through thoughtful technology, data, and global collaboration to make learning more engaging, accessible, and impactful.
Position Summary
The Senior Software Engineer, Infrastructure will play a key role in driving the stability, observability, and overall reliability of Epic's platform as we grow. You are an experienced engineer who works independently on complex infrastructure problems, makes sound technical decisions, and helps raise the bar for the engineers around you. You will own meaningful pieces of our GCP infrastructure, container platform, CI/CD pipelines, and observability stack: setting reliability standards, hardening the systems behind them, and making sure issues are caught early and resolved fast. You will partner closely with both our product engineering and data engineering teams to keep the platforms that power their applications and workflows running reliably. This is a fully remote, US-based role working closely with a global, bilingual (English–Chinese) engineering team.
Key Responsibilities
Drive the stability and reliability of Epic's GCP infrastructure: setting and tracking SLOs/SLIs, reducing toil, and engineering out recurring sources of instability
Build and operate Epic's GCP infrastructure for high availability, scalability, and cost efficiency
Manage and harden our Docker and GKE container platform, inc
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.