DevOps Engineer (SRE) - Wholesale Banking Tech Lending
ING is looking for an experienced IT Ops Engineer with a passion for Site Reliability Engineering to help build on our data-led ambition for the Wholesale Lending Business.
ING’s goal is to enable people to “do their thing” and empower them to stay a step ahead in life and business. We are one of the largest banks in Europe, and we continuously evolve to become one of the most innovative companies in the banking sector using proven, state-of-the-art technology.
Within Tribe Lending, we have set ourselves the mission of becoming fully digitized and data-led to enhance the experience of our clients and our employees.
Lending is at the core of ING’s business and is therefore also the driver of ING’s transformation journey.
In our drive to make complex Lending simple, reusing data, following the patterns it reveals and predicting the investments we make in our clients are key. We are tackling fundamental, high-impact problems in the financial services industry. Our Tribe consists of more than 150 professionals, and we are growing to accelerate this transformation journey. We are talking about making a fresh start, being creative, challenging the status quo and deferring decisions to data and algorithms. We are talking about game changers. So if this sounds exciting and nerve-racking at the same time, read on!
What is the role we are hiring for?
As the need for reliability grows with every critical application we add to our landscape, we are seeking an experienced Ops Engineer with SRE know-how to help our tribe become more reliability-driven by setting up the right design patterns and processes to ensure an observable, reliable and, where possible, self-healing product landscape.
Your Key Responsibilities:
1. Monitoring & Troubleshooting
- Monitoring the performance of our production systems using a host of monitoring tools
- Proactively identifying and troubleshooting issues such as software bugs, misconfigurations and performance bottlenecks, and coordinating their fixes
- Increasing availability and reliability of our production systems
- Coordinating chaos and performance testing
2. Capacity Management: Planning and Execution
- Coordinating capacity assessment and capacity planning with IT Engineering and IT Architecture
- Executing capacity management changes
3. Technical Risk and Health Assessment
- Continuously running technical health assessments on production infrastructure and systems to identify configuration items (CIs) deviating from the baseline
4. Service Level Management
- Actively monitoring SLAs and ensuring that services perform within promised SLAs
- Holding IT Engineering, Security and Architecture accountable for the remediation of any SLA degradation
5. IT Key Controls
- Ensuring that IT is ‘in CONTROL’ by holding IT groups accountable for adherence to the key controls
- Collating and providing necessary evidence to Auditors for these controls
6. Runners, Automation & Tooling
- Architecting, creating and automatically managing an army of ‘runners’ or ‘bots’ that fully automate tasks across infrastructure and applications, e.g. extracting production data, generating production reports, triggering event responses, etc.
- Identifying and automating manual operational tasks
- Building and integrating tools that will assist in improving system availability, reliability and performance
7. Incident & Problem Management
- Coordinating incident management and service restoration.
- Being part of the on-call team of engineers that supports production systems
- Working with BizDevOps squads on postmortems and assisting in identifying and fixing reliability issues
8. Disaster Recovery (DR) & Business Continuity Planning (BCP)
- Planning and managing the Disaster Recovery (DR) runbook and DR testing
9. Production Reporting
- Gathering relevant data and providing accurate production reporting on availability, reliability, performance and capacity
- Coordinating responses to occasional service requests from our business partners, which make up a small part of the job; for example, when a business unit requests the restore of a particular backup
10. IT-Risk Management
- IT risk management, based on an ING framework, is all about being in control of your assets: making sure implementations are secure and in line with design and security policies. We do what we promise!
- Translating security policies into requirements, and implementing and documenting security measures
- Identity and Access Management: making sure the process is followed and providing evidence as proof
- Vulnerability Management: making sure we react fast by implementing patches and solutions
- Making sure that requirements are incorporated by design and proactively monitoring developments
Who should apply?
At ING, we promote diversity not just because it is the right thing to do, but because it’s essential for delivering on our strategy. To stay a step ahead we need teams with a healthy mix of contrasting perspectives and backgrounds as they are more creative, faster to adapt and more inventive with their solutions. We strive to hire a workforce as diverse as the communities in which we operate, and we will consider every application, regardless of race, religion, color, national origin, sex, disability, or age.
If you are an experienced Ops Engineer with a proven track record of managing various stakeholders and applications simultaneously, mentoring and coaching colleagues to help them develop, and, of course, designing, automating and maintaining end-to-end ecosystems built on the following technologies, we would love to hear from you!
- Expert-level knowledge of Windows systems administration, including events/services and ASP.NET/.NET Core applications running on IIS
- Proficient level knowledge in Unix/Linux administration
- Proficient level knowledge in Tomcat application administration
- Expert level knowledge in PowerShell scripting and automation
- Proficient level knowledge in networking concepts and Windows networking
- Proficient level knowledge in monitoring and observability implementations especially on Windows stack
- Proficient level knowledge in RDBMS concepts and performance management (Oracle/MSSQL)
- Experience in Azure DevOps for automated build, test and deployment activities
- Experience in SCOM, ELK stack and Prometheus+Grafana
- Experience in changes/incident/problem management, and being on-call
- Proficient level knowledge of containerization and container orchestration technologies (K8s/OpenShift)
- Expert-level understanding of core SRE concepts (SLIs, SLAs, error budgets) and experience in implementing various SRE models within the organization
- A strong team player with good social and communication skills
- Proactive and assertive
- A learning attitude, not only in mastering new technologies but also on the interpersonal level. You have proven that you can both ask for and give feedback
- Analytical mindset, you bring structure in complex situations
- You break down technical problems, determine root causes and estimate risks, and, based on information and analysis, implement the most suitable solution
- You initiate and actively participate in refinement and brainstorming sessions by contributing to the backlog, coming in prepared, helping to make stories ready for the sprint, and monitoring the quality of deliverables using acceptance criteria or definition-of-done principles
- You continuously reflect on personal and team performance during retrospectives, honestly and transparently
Nice to haves
- Experience in enterprise scheduling solutions (TWSd, UAC)
- Experience in Python scripting
- Experience in Ansible, Azure DevOps
- Experience in Test automation
Additional optional criteria which will be considered separately as a “plus”:
- Strong plus: experience in applying SRE concepts to front-office banking processes (e.g. Credit Approval, Monitoring, etc.)
- You have a crystal-clear understanding of Agile ways of working (WoW) and Scrum methodologies
- You actively keep up to date with the latest developments and incorporate them into your work
- You embody our Orange Code in your professional manner: you take on activities and responsibilities and make them happen, you help others to be successful, and you stay one step ahead in anticipating the needs of your colleagues and stakeholders.