Overview
The Senior DevOps Engineer - Network oversees the network infrastructure and ensures that network operations integrate successfully with DevOps processes. This position plays a key role in designing, implementing, and maintaining network solutions in line with the organization's DevOps objectives.
As a member of the Platform Engineering team, you will be responsible for managing and supporting the infrastructure that drives our platform. The reliability and scalability of our technology are key to our success, and this position will work with our development and security teams to help design highly available and fault-tolerant systems.
In particular, you will focus on monitoring and optimizing our network performance to support the low-latency, high-throughput operation of our trading exchange.
Key Responsibilities
- Continuously improve the resiliency, throughput and latency profiles of our trading systems by working hand in hand with our trading technology teams
- Manage and support our AWS cloud infrastructure, EC2 instances and physical servers
- Develop and manage IaC to ensure consistency of our infrastructure
- Ensure security hardening of our OS builds and configurations
- Manage and maintain config management tooling to ensure consistency
- Integrate our stack with Kubernetes
- Ensure SRE best practices for design and operation of the stack
- Design, implement and test disaster recovery capabilities to ensure our business can continue to operate in the event of a technology failure
- Participate in an on-call rota for escalations
Required Qualifications
- Theoretical and practical networking knowledge, including but not limited to unicast and multicast routing protocols, the Linux kernel's TCP stack implementation, congestion avoidance/control (e.g. BBR), traffic control, network simulation, AWS VPC / TGW and Kubernetes VPC CNI. DPDK experience is a plus.
- Professional experience with kernel troubleshooting: strace, bpftrace, perf profiling/tracing, navigating / reading / building the relevant kernel code.
- Professional experience with userland monitoring (e.g. Thanos/Prometheus/Alertmanager), logging (e.g. Splunk/Loki), alerting, troubleshooting and profiling/tracing.
- Strong practical AWS knowledge, with a minimum of 5 years of SRE / DevOps experience supporting and managing Linux-based systems. A computer science or engineering degree is preferred; a strong understanding of fundamental computer science principles is required.
- Familiarity with Kubernetes / Ansible / Chef, and with one or more programming languages: Python, Golang, C, NodeJS.
Skills: devops, automation, cloud, security, networking, routing protocols