Zafin

banking

Cloud Site Reliability Engineer (CSRE I)

India

The Brief

“Cloud Site Reliability Engineer (CSRE I) at Zafin. Skills: cloud technologies, strategic planning, incident management, reliability, scalability, performance, Azure, AKS, OpenShift, automation, monitoring, predictive analytics. Manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment. Design and implement strategic operational enhancements to improve resiliency and system reliability”

What You'll Achieve.

Ensure the reliability, scalability, and performance of Zafin’s cloud infrastructure and applications; drive innovative solutions and operational excellence; improve resiliency and system reliability; reduce error recurrence; achieve organizational objectives

Industry & Context.

banking
Problems you'll solve

Exceptional analytical and problem-solving abilities; resolution of complex technical issues; Root Cause Analysis (RCA)

What They're Looking For.

Must Have

8+ years of experience in cloud support, operations, or a related role

Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms

Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift

Proven leadership in managing automated deployment pipelines, including Azure DevOps

Mastery in enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools

Advanced scripting skills with PowerShell, Python, or similar languages

Extensive experience in incident management and defining SLAs for global production environments

In-depth knowledge of database management, particularly Postgres

Nice to Have

Master’s degree

Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert)

Experience with ITSM tools and processes (e.g., ServiceNow)

Comprehensive understanding of security and compliance in cloud environments

What You'll Do.

Manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment

Design and implement strategic operational enhancements to improve resiliency and system reliability

Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence

Optimize cloud infrastructure for high performance, scalability, and cost-effectiveness

Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution

Develop and execute automation strategies to streamline operational workflows and incident responses

Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies

How You'll Work.

Team & Collaboration

Represent the organization in external client escalation calls; collaborate with cross-functional teams

Communication Scope

Advanced communication and collaboration capabilities

Process & Methodology

Strategic planning; driving strategic initiatives

Full Job Description

The world’s top banks use Zafin’s integrated platform to drive transformative customer value. Powered by an innovative AI-powered architecture, Zafin’s platform seamlessly unifies data from across the enterprise to accelerate product and pricing innovation, automate deal management and billing, and create personalized customer offerings that drive expansion and loyalty. Zafin empowers banks to drive sustainable growth, strengthen their market position, and define the future of banking centered around customer value.

Cloud Site Reliability Engineer I (CSRE I)

Zafin is seeking a Cloud Site Reliability Engineer I (CSRE I) to lead strategic initiatives in ensuring the reliability, scalability, and performance of our cloud infrastructure and applications. This advanced role requires mastery in cloud technologies, strategic planning, and incident management to drive innovative solutions and operational excellence.

Key Responsibilities

Manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment.

Design and implement strategic operational enhancements to improve resiliency and system reliability.

Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence.

Represent the organization in external client escalation calls, providing expert guidance and solutions.

Optimize cloud infrastructure for high performance, scalability, and cost-effectiveness.

Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.

Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution.

Develop and execute automation strategies to streamline operational workflows and incident responses.

Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies.

Mentor and coach junior engineers, fostering a culture of continuous learn
