Lob

Senior Platform Engineer

$160–178k · United States · Remote Friendly

The Brief

“Senior Platform Engineer at Lob. Skills: observability, infrastructure optimization, Datadog, OpenTelemetry, HashiCorp Nomad, AWS, cost optimization, performance testing, platform reliability. Building and improving observability across distributed systems and services. Designing dashboards, alerting, metrics, tracing, and telemetry pipelines.”

What You'll Achieve.

Scale and improve the reliability, observability, performance, and cost efficiency of our platform infrastructure

Build highly visible, scalable, and operationally efficient systems while actively reducing unnecessary infrastructure spend

Improve telemetry, monitoring, performance testing, platform reliability, and cloud infrastructure efficiency

Optimize S3 storage utilization, lifecycle management, and storage costs

Improve infrastructure utilization and operational efficiency across Nomad workloads

Identify and implement AWS cost-saving opportunities

Improve incident detection, troubleshooting, and operational response capabilities

Drive infrastructure cost optimization initiatives

Recommend performance and cost efficiency improvements

Improve scalability, reliability, operational visibility, and infrastructure efficiency

Industry & Context.

Problems you'll solve

Troubleshooting and performance analysis across distributed systems

What They're Looking For.

Must Have

7+ years of experience in platform engineering, infrastructure engineering, or site reliability engineering

Hands-on experience with HashiCorp Nomad

Deep expertise with Datadog

Experience implementing and operating observability platforms using OpenTelemetry and modern monitoring tooling

Experience with Grafana or similar visualization and observability platforms

Understanding of distributed tracing, metrics, logging, and monitoring best practices

Experience building dashboards, alerts, telemetry pipelines, and operational visibility tooling

Experience identifying and implementing AWS cost optimization strategies in production environments

Knowledge of S3 optimization, lifecycle management, and storage cost reduction

Experience building and running performance/load testing environments

Troubleshooting and performance analysis skills across distributed systems

Experience operating infrastructure in AWS environments

Experience with Terraform and infrastructure-as-code practices

Experience balancing platform reliability, observability, and infrastructure cost efficiency at scale

Experience working with distributed and event-driven architectures using technologies such as Redis, SQS, or Temporal

Experience managing and tuning Elasticsearch or OpenSearch clusters

Experience working in fast-paced engineering environments

Nice to Have

Exposure to PostgreSQL RDS to Aurora migrations

Experience with Kubernetes

Experience with CI/CD systems and deployment automation

Experience with Go, Python, or TypeScript

What You'll Do.

Building and improving observability across distributed systems and services

Designing dashboards, alerting, metrics, tracing, and telemetry pipelines

Improving operational visibility using Datadog and OpenTelemetry

Helping evolve and mature the organization’s observability strategy and tooling

Supporting and improving HashiCorp Nomad orchestration environments

Identifying and implementing AWS cost-saving opportunities across compute, storage, and platform infrastructure

Improving infrastructure utilization and operational efficiency across Nomad workloads

Optimizing S3 storage utilization, lifecycle management, and storage costs

Designing and maintaining performance testing environments and tooling

Running load and performance tests to identify bottlenecks and scalability issues

Managing and tuning Elasticsearch/OpenSearch environments

Troubleshooting production performance issues across services

Partnering with engineering teams to improve platform reliability, scalability, and infrastructure efficiency

Lead observability initiatives across infrastructure and applications

Design and maintain monitoring

Build actionable visibility into platform health

Improve incident detection, troubleshooting, and operational response capabilities

Define observability standards and best practices across engineering teams

Drive infrastructure cost optimization initiatives across AWS services and platform environments

Analyze infrastructure utilization and recommend performance and cost efficiency improvements

Maintain and improve infrastructure-as-code standards and workflows

Maintain scalable performance testing environments and tooling

Execute and analyze load/performance testing initiatives

Support and improve Nomad-based orchestration environments

Troubleshoot complex production and infrastructure issues across distributed systems

Collaborate closely with engineering teams to improve scalability, reliability, operational visibility, and infrastructure efficiency

Create and maintain operational documentation and platform best practices

How You'll Work.

Team & Collaboration

Work closely with engineering teams to improve telemetry, monitoring, performance testing, platform reliability, and cloud infrastructure efficiency

Partner with engineering teams to improve platform reliability, scalability, and infrastructure efficiency

Collaborate closely with engineering teams to improve scalability, reliability, operational visibility, and infrastructure efficiency

Communication Scope

Communication and collaboration skills

Full Job Description

Lob was founded in 2013 by technical co-founders with a vision to connect the world one mailbox at a time. Today, we're transforming the way businesses use direct mail and bringing the power of technology to a traditionally manual channel. Our modern logistics and fulfillment engine helps businesses to build and scale high-quality, personalized direct mail programs without the operational burden. As we grow to meet the evolving needs of our customers and expand our product offerings, we’re building a team to shape the future of direct mail.

About The Role

We are looking for a Senior Platform Engineer to help scale and improve the reliability, observability, performance, and cost efficiency of our platform infrastructure. This role is focused on observability engineering and infrastructure optimization across AWS environments.

The ideal candidate has deep hands-on experience with Datadog, OpenTelemetry, and HashiCorp Nomad, and understands how to build highly visible, scalable, and operationally efficient systems while actively reducing unnecessary infrastructure spend. You will work closely with engineering teams to improve telemetry, monitoring, performance testing, platform reliability, and cloud infrastructure efficiency across a fast-moving distributed environment, including leveraging modern AI-driven tooling and operational workflows where appropriate.

What You’ll Work On

Building and improving observability across distributed systems and services

Designing dashboards, alerting, metrics, tracing, and telemetry pipelines

Improving operational visibility using Datadog and OpenTelemetry

Helping evolve and mature the organization’s observability strategy and tooling

Supporting and improving HashiCorp Nomad orchestration environments

Identifying and implementing AWS cost-saving opportunities across compute, storage, and platform infrastructure

Improving infrastructure utilization and operational efficiency across Nomad workloads

Optimizing S3 storage utilization,
