As an Observability Systems Engineer, you will help implement and manage best-in-class solutions with key vendor partnerships and internal development teams. Observability is more than just triggering alerts: it is understanding a system end to end, providing data-driven, actionable insights into that system, and helping to ensure that systems are continually improved by addressing root causes.
We are looking for a proactive and highly self-motivated individual who is as excited to go looking for new problems as they are to solve them. In this role, you will contribute to designing and implementing observability solutions in conjunction with architects, software engineers, data engineers, project management, and other SMEs.
RESPONSIBILITIES:
Ensure that all security, availability, confidentiality, and privacy policies and controls are adhered to
Plan, design, acquire, implement, integrate, and test observability solutions of moderate complexity composed of Windows, Linux, and SaaS-based front-end and back-end components that support the company infrastructure, business processes and operations, and/or network-based (cloud) product systems
Work individually and collaboratively to deliver solutions in live production systems
Support, maintain, and resolve problems for the tools and services owned by the Observability Automation Tools team in live production systems, with occasional on-call availability
Actively contribute to the configuration, layout and performance tuning of the production infrastructure
Support upgrade and go-live activities related to new customer onboarding projects
Attend and actively participate in weekly meetings
Participate in projects while working with the workgroup to achieve defined goals within set timelines
Perform IT functions such as application and infrastructure installation, patching, upgrades, and management related to the tools and services owned by the Observability Automation Tools team in live production systems
HELPFUL EXPERIENCE AND KNOWLEDGE:
Typically requires a Bachelor's degree in a relevant field and a minimum of 6 years of related experience; or an advanced degree with 4 years of experience; or equivalent related work experience
Proficient in modern observability technologies and methods, including OpenTelemetry and correlation of logs, metrics, and traces across multiple sources
Able to work with RESTful API web services, integrations, and webhooks
Proficient in working with various database and data storage back-ends, including SQL and NoSQL
Experience with vendor-sourced tools and technologies such as Splunk, LogicMonitor, Datadog, PagerDuty, AppDynamics, Rundeck, Apica, Selenium, and similar
Experience with cloud provider tools and technologies such as AWS, Azure, CloudWatch, Cloudflare, and similar
Experience with Linux and Windows scripting, including Bash/Perl/Python and PowerShell
Experience with graphing and analytics tools (Graphite/Grafana/Splunk)
Experience with CI/CD deployment
Recent experience evaluating and solving business problems with data-driven analysis of systems and incidents
Knowledge and experience in the fintech industry preferred
Excellent communication skills with the ability to remain patient with non-technical contacts
Knowledge of multiple infrastructure technologies including storage, network, backups, virtualization
Configuration management using Terraform and understanding of container technologies such as Docker
Experienced with Agile sprint methodologies
Ability to think multi-dimensionally about problems and proposed solutions, including how a single change can impact entire environments
Comfortable working in a complex collaborative group of teams
Desire to build solutions, drive adoption for your solutions, and integrate your solutions into larger ecosystems
Highly oriented toward internal and external customers; enjoys both teaching customers and learning from them
Ability to work on problems of diverse scope, where analysis of the situation requires a review of identifiable factors
Must be able to exercise judgment within defined procedures to determine appropriate action
Must have strong organizational and multitasking skills to prioritize workload in a fast-paced environment
Moderate-to-advanced knowledge of, and troubleshooting experience with, Windows Server, Linux, and cloud-based infrastructure and services
Experience with systems administration in public cloud environments such as AWS, Azure, or Google Cloud
Experience with configuration and system management tools in support of Linux systems
Date Posted: 21/07/2024
Job ID: 85862473