Grafana Labs
Observability
Staff Software Engineer - Grafana Cloud k6
“Staff Software Engineer - Grafana Cloud k6 at Grafana Labs. Skills: Staff Software Engineer, Grafana Cloud k6, performance testing, distributed systems, SRE practices, reliability engineering, SLIs/SLOs, error budgets, observability, incident management. Contribute hands-on to the codebase by designing and implementing production-quality software. Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems”
What You'll Achieve.
Establishing and scaling a cross-team culture of engineering excellence; setting standards and guiding adoption of engineering practices that improve reliability and operational ownership; contributing architectural and technical depth beyond operations; ensuring resilient, high-performing systems; supporting users who run performance tests at scale, load testing their systems with distributed tests from 15+ regions worldwide, using hundreds of thousands of virtual users sending millions of requests per second; ingesting huge volumes of data generated by k6 so that metrics from each test can be viewed, correlated, and analyzed.
Industry & Context.
On-call rotation, primary escalation point, in-person onboarding
What They're Looking For.
Must Have
- Programming background in a modern language (Python and Go are the primary languages, but prior experience with them is not required)
- Experience designing, building, and operating large-scale distributed systems
- Experience with SRE practices, including operating and evolving production systems at scale
- Understanding of reliability engineering concepts (e.g. incident management, observability, and failure modes)
- Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics
- Experience with test automation, including performance and functional testing
- Ability to influence engineering practices through clear technical communication, reviews, and collaboration
- Interpersonal skills and the ability to work effectively across teams
- Familiarity with modern software engineering processes and delivery practices
- Self-driven and comfortable operating with a high degree of autonomy and ambiguity
- Experience participating in blameless incident response and writing high-quality post-incident reviews
Nice to Have
- Experience with containerized and cloud-native systems (Docker, Kubernetes, AWS)
- Familiarity with observability tooling and platforms (e.g. the Grafana stack)
- Experience working with Python, Go, JavaScript, and/or Jsonnet
- Experience building or operating event-driven or asynchronous systems
- Interest in, or experience with, building testing frameworks or developer tooling
What You'll Do.
Contribute hands-on to the codebase by designing and implementing production-quality software
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems
Build and scale a culture of operational excellence by defining standards and coaching teams to own reliability and availability
Help mature SRE practices, including incident response, post-incident reviews (PIRs), and release/change management
Establish reliability frameworks such as SLIs/SLOs and error budgets, and use them to guide prioritization and engineering trade-offs
Provide visibility into system health through clear operational metrics and reliability reporting
Participate in the on-call rotation as a primary escalation point and contribute to incident resolution
Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration
Share knowledge through clear, high-quality documentation and technical communication, both internally and externally, to help teams build and operate systems more effectively
As the reliability foundation matures, grow into broader application and product development leadership, contributing architectural and technical depth beyond operations
How You'll Work.
Team & Collaboration
Working effectively across teams; cross-team collaboration
Communication Scope
Clear technical communication; clear, high-quality documentation; transparent communication
Full Job Description
Grafana Labs is a remote-first, open-source powerhouse. There are more than 20M users of Grafana, the open source visualization tool, around the globe, monitoring everything from beehives to climate change in the Alps. The instantly recognizable dashboards have been spotted everywhere from a NASA launch and Minecraft HQ to Wimbledon and the Tour de France. Grafana Labs also helps more than 3,000 companies, including Bloomberg, JPMorgan Chase, and eBay, manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).

We’re scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.

You may not meet every requirement, and that’s okay. If this role excites you, we’d love you to raise your hand for what could be a truly career-defining opportunity.

This is a remote opportunity, and we would be interested in applicants in Spain time zones.

Staff Software Engineer - Grafana Cloud k6

The Opportunity

We are the team behind Grafana k6, Grafana Cloud k6, and Grafana Cloud Synthetics, used by teams globally to ensure resilient, high-performing systems. This opportunity is with the Grafana Cloud k6 squad, who build and operate our performance testing product.

Grafana Cloud k6 is built around the OSS k6 and targeted at users looking to run performance tests at scale. Our enterprise and SaaS offerings allow customers to load test their systems by running distributed tests from 15+ regions worldwide, using hundreds of thousands of virtual users sending millions of requests per second. We ingest huge volumes of data generated by k6, which can be used to view, correlate and analyze