Barclays
Banking
Observability Service Engineer
Neural analysis suggests this role is optimal for Mid+ candidates.
“Observability Service Engineer at Barclays. Skills: Observability, Monitoring, Site Reliability Engineering (SRE), Automation, Distributed systems, Cloud platforms, Containerization, Microservices architectures, DevOps, Scripting. Effectively monitor and maintain the bank’s critical technology infrastructure. Resolve more complex technical issues, whilst minimising disruption to operations”
What You'll Achieve.
Minimise disruption to operations; Improve the service to customers and stakeholders; Ensure optimal performance; Maintain stability and drive efficiency; Ensure issues are known when they occur; Achieve the goals of the business; Enhance operational efficiency and incident response
Industry & Context.
Resolve more complex technical issues; Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues; Providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes; Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues; Create solutions based on sophisticated analytical thought comparing and selecting complex alternatives; In-depth analysis with interpretative thinking will be required to define problems and develop innovative solutions; Adopt and include the outcomes of extensive research in problem solving processes
What They're Looking For.
Must Have
- Proficiency in observability tools such as Datadog, Splunk, New Relic and Dynatrace.
- Knowledge and experience of Windows and UNIX/Linux platforms.
- Solid understanding of distributed systems, cloud platforms (AWS preferred), containerization (Docker, Kubernetes), and microservices architectures.
- Work experience in IT operations, monitoring, Site Reliability Engineering (SRE), or a dedicated observability role.
- Experience with DevOps tools such as Jenkins, Terraform, Chef and Ansible, and with CI/CD pipelines.
- Experience with scripting languages (Python, Bash, PowerShell) for automation and data manipulation.
- Able to identify risks and implement controls where necessary.
- Familiarity with database monitoring (SQL, NoSQL).
- Knowledge of networking concepts and protocols.
Nice to Have
- AWS or any other cloud certification.
- OpenTelemetry and custom telemetry integration.
- Exposure to ServiceNow Event Management administration.
What You'll Do.
Effectively monitor and maintain the bank’s critical technology infrastructure
Resolve more complex technical issues, whilst minimising disruption to operations
Provision of technical support for the service management function
Develop the support model and service offering
Execution of preventative maintenance tasks on hardware and software
Utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
Maintenance of a knowledge base
Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues
Providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes
Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues
Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency
Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur
Resolve business critical issues
Advocate preferred solutions to improve monitoring
Contribute and architect the road map for the Observability Tools
Keep stakeholders updated on the latest monitoring and observability trends
Demonstrate leadership in implementing these within defined timelines
Working closely with the SRE teams to reduce toil and manage Chaos Engineering
Leading and mentoring a group of skilled engineers
and alert optimization to enhance operational efficiency and incident response
How You'll Work.
Team & Collaboration
Collaborate with other areas of work, for business aligned support areas to keep up to speed with business activity and the business strategies.; Working closely with the SRE teams; Guide team members through structured assignments; Identify the need for the inclusion of other areas of specialisation to complete assignments; Train, guide and coach less experienced specialists
Communication Scope
Advise key stakeholders, including functional leadership teams and senior management, on functional and cross-functional areas of impact and alignment; Keep stakeholders updated on the latest monitoring and observability trends
Process & Methodology
Plan resources, budgets, and policies; Manage and maintain policies/processes; Deliver continuous improvements; Lead collaborative, multi-year assignments
Full Job Description
# **Job Description**

**Purpose of the role**

To effectively monitor and maintain the bank’s critical technology infrastructure and resolve more complex technical issues, whilst minimising disruption to operations.

**Accountabilities**

* Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients. Develop the support model and service offering to improve the service to customers and stakeholders.
* Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance.
* Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing.
* Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes.
* Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management.
* Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues.
* Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency. Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur.

**Vice President Expectations**

* To contribute or set strategy, drive requirements and make recommendations for change. Plan resources, budgets, and policies; manage and maintain policies/processes; deliver continuous improvements and escalate breaches of policies/procedures.
* If managing a team, they define jobs and responsibilities, planning for the department’s
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.