Site Reliability Engineering – TouchPoint Platform
Your working environment:
ING’s ambition is to be the number one digital banking brand in Europe, offering customers everywhere the same empowering, personalized and differentiating experience. A collaborative, communicative Site Reliability Engineer will change the way we’re working.
You will be working in a customer-centric company where banking applications are essential. For that reason, we want to guarantee their full availability. You will join a team in which you are an important member involved in the global digital banking transformation.
TouchPoint Platform is part of ING’s “Think Forward” strategy to become one truly global bank and is a key success factor on our path to becoming a financial services platform that extends beyond banking. Through our platform we provide a scalable foundation for platform business models and so position ING successfully in the new banking ecosystem.
The globally scalable banking platform will create a differentiating customer experience and cater for growth by leveraging the innovation and development power within ING.
TouchPoint SRE team
The TouchPoint Site Reliability Engineering (SRE) team is a multidisciplinary team of senior engineers with proven track records in development and operations across applications and infrastructure. The primary goal is to continuously and structurally improve the reliability and maintainability of the IT environments involved with the TouchPoint Platform, delivered and managed from different (international) ING domains.
Team vision: The TouchPoint Platform is our product, in Production, Acceptance and Test. Our technology setup, Way of Working (WoW) and practices ensure our platform is highly available, always responsive, and scalable towards ING entities around the globe without sacrificing velocity and agility. As we move forward we are setting the benchmark of excellence for operating platforms within BTP and ING as a whole.
TouchPoint SRE Responsibilities & Activities:
- Ensure Service Level Objective (SLO) levels are set and met
- Drive an Always Available mindset and behavior within the TouchPoint organization. Be able to recognize shortcomings in knowledge and expertise, and deliver the necessary resources, skills, guidance and training to DevOps teams where needed.
- Define and enhance standards for logging, monitoring and alerting, and actively monitor end-to-end platform performance through white-box and black-box monitoring tools.
- Improve incident response practices and be actively engaged in the response to escalated and critical incidents. On-call duty is currently not part of the job, but should not be an objection if and when required.
- Participate in Root Cause Analysis. Prioritize and implement the RCA recommendations through improvement plans with the responsible Squads / DevOps teams
- Drive continuous improvement on all services in the TouchPoint Platform through analysis of the current level of service, functional and technical setup, code, dev/ops practices and the underlying causes of incidents, underperformance, etc.
- Organize and coordinate platform tests such as DDoS, DR, Ceiling/Break, and penetration tests.
- Set up and maintain automated reporting and feedback loops
- Contribute to automating Build, Test and Deployment practices through the CI/CD pipeline
- Contribute to tuning application resources and updating highly available deployment patterns of (mostly) container- and VM-based environments.
- Initiate and contribute to new SRE initiatives like AI Ops, Chaos Engineering, migrations to Public Cloud, and Error Budgeting
- Initiate and participate in experiments with new tools and concepts, and evaluate their value against set goals
What we offer:
- Great salary and benefits such as a 13th month, 8% holiday allowance, personal allowances, etc.
- Challenging, professional and fun work environment.
- Working in an area which is of great importance to the strategy of ING
- An international environment
- A full-time position (40-hour week)
- Great training and education opportunities
You are an enthusiastic Software and/or Reliability Engineer with a focus on creating amazing solutions and frameworks. You have solid technical knowledge, and use it to formulate solutions and to support and coach other engineers. You have a passion for highly resilient and reliable software and really hate repetitive manual tasks that keep you from doing really cool stuff! You are able to inspire squads to spread the SRE mindset. You are enthusiastic about transferring your knowledge to others within your team, but also to all DevOps teams in the TouchPoint Tribe and the rest of ING.
- Operations expert: 5+ years of experience working with Agile DevOps principles
- Solid understanding of how technology setup and ITSM processes relate to service level objectives like Availability (time-based, successful call rate, response times), MTTR, and MTBF.
- Good understanding of microservices architecture and related high availability / resilience patterns, and experience building systems with multiple layers of redundancy to withstand failures in software, hardware, and network infrastructure.
- Proven experience:
  - working as a Site Reliability Engineer or DevOps engineer
  - scripting in at least one of the following: Ruby, Python, Bash, PowerShell
  - setting up build and deployment pipelines in Azure DevOps (ADO)
  - setting up white-box monitoring and formulating meaningful metrics for monitoring and reporting
- Able to coordinate/lead incident response and root cause analysis activities
- Understanding of IT Service Management processes (ING Global Way of Working) and the way they relate to SRE objectives
Prior work experience with tools:
- CI/CD Pipeline: Azure DevOps / Jenkins / GitLab
- Cloud computing and container orchestration: Linux VMs and Kubernetes container platforms. Knowledge of OpenShift and AKS, and related certifications, are a plus.
- TouchPoint service mesh and SDK/Merak
- Logging/monitoring/alerting: Kafka, ELK, Prometheus, and IAT. Experience with black-box monitoring tools like Rigor/Splunk and AI Ops tools like Loom is a plus.
- Backlog management: Azure Boards
- ITSM: ServiceNow (SNOW)
The ideal candidate has:
- A Bachelor’s or Master’s degree in computer science or a related field
- Experience coaching and training DevOps engineers on technical subjects
- Previous experience as a consumer of the TouchPoint Platform
- Understanding of the ING application risk journey