Business Function
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Job Objective
DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and for making continuous improvements to increase the platform's efficiency and resiliency. The SRE engineer will also develop automation to remove toil and increase the team's productivity.
Responsibilities
- Implementation and administration experience in Elastic Stack, Confluent Platform (Kafka), Prometheus, Grafana, NGINX & other Open APM tools is a plus
- Configure Elasticsearch index templates and mappings, and be comfortable with KQL (Kibana Query Language)
- Upgrade versions of the Elastic Stack, Confluent Platform and other open APM tools
- Proactively monitor the platform service availability and help fix issues
- Automate cluster management / routine tasks, optimize processes, and perform thorough testing to ensure quality.
- Set up monitoring, alerting, and metrics reporting; conduct performance testing, failover testing and capacity planning
- Design, build and maintain data engineering pipelines.
- Perform application maintenance and patching; strong experience with the Linux platform and container configuration is a plus.
- Collaborate with the Dev Leads to ensure that the dev team's needs are met through the CI/CD framework, component monitoring and stats, incident escalation, etc.
Deliverables
- Ensure on-time delivery of tasks and projects.
- Ensure continuous uptime of applications and services.
- Ensure no security or audit issues.
Job Dimensions
- Comply with bank standards to track and follow up on assigned projects.
- Cover all areas in application and infrastructure operations of the platform.
Requirements
- You should be a university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
- Strong communication skills and the ability to explain protocols and processes to the team and management.
- A passion for learning and using new technologies in the open source communities.
- Minimum of 5 years of related experience as an SRE in a large-scale organization.
- Experience in configuring log shipper agents to forward logs to Kafka and in using Logstash to enrich or transform data before sending it to Elasticsearch for indexing.
- Experience with CI/CD pipelines and tool sets like Bitbucket, Jenkins, JIRA.
- Working knowledge of Grafana, Prometheus, NGINX and the Elastic Stack (Elasticsearch / Logstash / Kibana / Beats), including data ingestion, management, monitoring & analytics. Able to perform L1/L2 ELK-related tasks.
- In-depth experience in Unix/Linux/Shell/Python scripting.
- Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
- Strong experience in developing high-quality, scalable, and extensible code (Python, shell scripting, etc.).
- Experience in writing SOPs (technical documentation) and familiarity with the Agile Scrum framework.
Apply Now
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognizes your achievements.