Forbes Advisor

personal finance

SRE Manager

Chennai, Tamil Nadu, India FULL TIME


The Brief

“SRE Manager at Forbes Advisor. Skills: SRE, DevOps, incident management, observability, cloud infrastructure. Lead and manage production & non-production support ensuring high availability and system reliability. Drive SRE best practices including incident management, root cause analysis, and continuous improvement”

What You'll Achieve.

Ensure high availability and system reliability; resolve impacting events quickly; monitor and maintain the uptime of critical systems in line with the defined SLOs and SLAs; track and report on incident metrics, identifying patterns and areas for systemic improvement

Industry & Context.

personal finance

Problems you'll solve

Apply analytical skills to drive root cause analysis and trend identification; identify and remove blockers, escalate appropriately, and maintain continuous momentum of troubleshooting efforts; diagnose and troubleshoot issues proactively

Eligibility Requirements

Experience managing critical incidents in a 24/7 production environment

What They're Looking For.

Must Have

12+ years of experience in SRE / DevOps

5+ years of working experience as a Site Reliability Engineer

Experience managing critical incidents in a 24/7 production environment

Experience with ServiceNow ITSM and on‑call incident coordination via PagerDuty / Zenduty (or comparable ITSM/on‑call tools)

Understanding of a wide breadth of technical concepts across SRE practices

Background in cloud-based systems and SRE practices is a must

Ability to use AI tools to synthesize communication, reports, and troubleshooting leads

Leadership and decision-making skills under pressure

Excellent verbal and written communication skills for both technical and non-technical audiences

Ability to manage multiple priorities and deadlines in high-stakes situations

Analytical skills to drive root cause analysis and trend identification

Familiarity with modern monitoring and incident management tools

Demonstrated ability to build consensus across diverse teams

Effective at maintaining calm and focus during critical situations

Knowledge of cloud infrastructure (e.g., AWS, Azure) and application architecture

Proven track record of improving incident management processes

Attention to detail in documentation and follow-through

Adept at facilitating collaboration across remote and global teams

Proactive in identifying operational risks and implementing preventive measures

Committed to continuous learning and process improvement

Ethical, dependable, and resilient in challenging scenarios

Nice to Have

Experience with at least one observability platform such as New Relic or Datadog preferred

Certification in AWS, ITIL, or related frameworks preferred

Experience in SaaS or technology product companies preferred

What You'll Do.

Lead and manage production & non-production support ensuring high availability and system reliability

Drive SRE best practices including incident management, root cause analysis, and continuous improvement

Assume ownership of major incidents and drive coordinating efforts to ensure quick resolution of impacting events

Collaborate with SRE team members for design and development of observability practices like Dashboarding, Logging, Metrics, Tracing, etc.

Collaborate with SRE team members to define Service Level Objectives (SLO) and agreements (SLA) of critical systems.

Monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.

Identify and remove blockers, escalate appropriately, and maintain continuous momentum of troubleshooting efforts.

Ensure adherence to established incident management processes and protocols.

Contribute to the improvement of incident response runbooks and documentation.

Own internal and external communications during major incidents.

Translate technical details into business-impact language (scope, severity, risk, ETA, confidence level).

Maintain clear and continuous communication with stakeholders during incidents, providing timely updates.

Ensure safe execution of mitigations, rollbacks, feature flags, and failovers

Lead post incident review meetings with stakeholders to confirm event details and assign problem investigators.

Track and report on incident metrics, identifying patterns and areas for systemic improvement.

Augment Change Managers and / or Problem Managers as required in the performance of those responsibilities.

How You'll Work.

Team & Collaboration

Collaborate with SRE team members for design and development of observability practices; Collaborate with SRE team members to define Service Level Objectives (SLO) and agreements (SLA); Lead post incident review meetings with stakeholders; Demonstrated ability to build consensus across diverse teams; Adept at facilitating collaboration across remote and global teams

Communication Scope

Excellent verbal and written communication skills for both technical and non-technical audiences; Own internal and external communications during major incidents; Translate technical details into business-impact language (scope, severity, risk, ETA, confidence level); Maintain clear and continuous communication with stakeholders during incidents, providing timely updates.

Process & Methodology

Manage multiple priorities and deadlines in high-stakes situations; incident management; root cause analysis; continuous improvement

Full Job Description

Forbes Advisor is a new initiative for consumers under the Forbes Marketplace umbrella that provides journalist- and expert-written insights, news and reviews on all things personal finance. We’re dedicated to helping turn aspirations into reality. We do this by providing consumers with the knowledge and research they need to make informed financial decisions they can feel confident in, so they can get back to doing the things they care about most.

WHAT YOU’LL DO:

* Lead and manage production & non-production support ensuring high availability and system reliability
* Drive SRE best practices including incident management, root cause analysis, and continuous improvement
* Assume ownership of major incidents and drive coordinating efforts to ensure quick resolution of impacting events.
* Collaborate with SRE team members for design and development of observability practices like Dashboarding, Logging, Metrics, Tracing, etc., to diagnose and troubleshoot issues proactively.
* Collaborate with SRE team members to define Service Level Objectives (SLO) and agreements (SLA) of critical systems, and monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
* Identify and remove blockers, escalate appropriately, and maintain continuous momentum of troubleshooting efforts.
* Ensure adherence to established incident management processes and protocols.
* Contribute to the improvement of incident response runbooks and documentation.
* Own internal and external communications during major incidents.
* Translate technical details into business-impact language (scope, severity, risk, ETA, confidence level).
* Maintain clear and continuous communication with stakeholders during incidents, providing timely updates.
* Ensure safe execution of mitigations, rollbacks, feature flags, and failovers
* Lead post incident review meetings with stakeholders to confirm event details and assign problem investigators.
* Track and report on incident metrics, iden


SKILL SIGNAL 35 detected · ranked by frequency

incident management ×5

cloud infrastructure ×5

SRE practices ×3

cloud-based systems ×3

observability practices ×3

Dashboarding ×3

Logging ×3

Metrics ×3

Tracing ×3

Service Level Objectives (SLO) ×3

Service Level Agreements (SLA) ×3

root cause analysis ×3

troubleshooting ×3

mitigations ×3

rollbacks ×3

feature flags ×3

failovers ×3

incident response runbooks ×3

ITSM ×3

on-call incident coordination ×3

application architecture ×3

SRE ×2

DevOps ×2

observability ×2

New Relic ×2

Datadog ×2

AWS

Azure

personal finance

journalistic insights

expert-written insights

financial decisions

BEHAVIOURAL

leadership

decision-making skills under pressure

communication skills

collaboration

calm and focus during critical situations

consensus building

resilience

Role Details

Experience 5–10 yrs

Level mid

Work Mode No

Type FULL TIME

Education Bachelor's or master's Degree and/or equivalent experience r

Category Engineering

AI-Extracted Insights

Domain Areas

personal-finance

cloud-based-systems

saas

Certifications

AWS

ITIL

How to Apply on SmartRecruiters

SmartRecruiters often includes a video screening step — check camera and mic permissions.
Link your GitHub or portfolio directly in the profile section for technical roles.
Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.
