Job Summary:
Location: India - remote
Offices located in Bangalore, India
At OnSolve, the Systems Support Engineer III is responsible for the daily maintenance and upkeep of our customer-facing production applications. This role is a senior technical resource on an operational sustainment team in a fast-paced, problem-solving environment that supports a vast array of hardware and software applications. The Systems Support Engineer III solves problems while using their technical expertise to identify how to improve these ecosystems through optimization, automation, and modernization. This includes making configuration changes in response to customer support tickets, performing routine application and infrastructure maintenance tasks, and assisting in software upgrades in a 24x7x365 high-availability production environment. This role also includes being a member of the on-call rotation, supplying working expertise about the software stack's function and configuration in case of a system failure.
Responsibilities:
- Maintain a deep-level understanding of the layout, configuration, and management of the OnSolve production software stacks.
- Troubleshoot and resolve complex technical issues related to systems, applications, and infrastructure.
- Perform daily configuration and maintenance tasks against the production software stack environment including:
  - Certificate management
  - Domain registration and troubleshooting
  - Implementing patches and remediation of security vulnerabilities
- Assist with the management of monitoring and metrics systems (e.g., DataDog, PRTG, ELK, Prometheus).
- Create and maintain comprehensive documentation for systems and processes and perform team training exercises as needed.
- Participate in a 24/7 on-call rotation to address critical issues and incidents.
- Manage and support Kubernetes environments, with expertise in AWS EKS and Azure Kubernetes Service (AKS).
- Configure, monitor, and troubleshoot message broker systems (RabbitMQ).
- Administer and maintain both Linux and Windows server environments.
- Execute deployment processes for the applications.
- Work with Rancher for container orchestration and Helm charts for managing Kubernetes applications.
- Implement and manage continuous delivery pipelines using ArgoCD.
- Manage and troubleshoot network/infrastructure-related issues in a microservices environment.
- Configure and maintain system and application settings using Ansible and Terraform.
- Implement and maintain ELK (Elasticsearch, Logstash, Kibana) and OpenSearch for centralized logging and search capabilities.
- Assume responsibility for independently learning new technologies and practices with a DevOps focus.
- Other duties as assigned.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum of 6 years of experience as a Systems Support Engineer or in a similar role.
- Demonstrates a proactive and self-starting work style with a strong sense of urgency; capable of making independent decisions while adhering to established best practices.
- Self-motivated and adept at independently prioritizing and managing multiple client issues simultaneously.
- Proficient in diagnosing root causes rather than treating symptoms, with the ability to anticipate the impact on additional processes and systems.
- Hands-on experience in understanding and managing client-server and microservice-style software stacks.
- Expertise in working with cloud networks, including AWS and Azure platforms.
- Strong troubleshooting skills and ability to analyze and resolve complex technical issues.
- Experience with containerization and associated platforms (Kubernetes in particular).
- Experience with AWS EKS and Azure Kubernetes Service (AKS).
- Proficient in RabbitMQ administration and configuration.
- Familiarity with cloud services such as AWS and Azure, and their associated tools.
- Experience with monitoring tools like DataDog for system performance analysis.
- In-depth knowledge of Linux and Windows server environments.
- Expertise in certificate management and patching procedures.
- Proven experience in 24/7 on-call rotation support.
- Hands-on experience with deployment tools like Rancher, Helm charts, and ArgoCD.
- Solid understanding of microservices architecture and network configurations.
- Proficiency in configuration management tools such as Ansible and Terraform.
- Experience with ELK (Elasticsearch, Logstash, Kibana) and OpenSearch for log analysis and search.
- Passionate about automation, troubleshooting, and solving challenging problems.
- Ability to work cross-functionally with other teams on improvements to existing infrastructure to increase system stability and performance.
- Basic grasp of Layer 3-7 networking concepts and common protocols.
Bonus Skills:
- Experience with AWS architecture and AWS cloud migration, including hybrid implementations
- Able to understand and learn tools from top to bottom
- Capable of developing automation in multiple languages, such as C#, Python, and Bash
- Experience participating in Agile team workflows, using Kanban/Scrum processes
The above statements are intended to describe the general nature and level of work being performed by people assigned to this classification. They are not to be construed as an exhaustive list of all responsibilities, duties, and skills required of personnel so classified. All personnel may be required to perform duties outside of their normal responsibilities from time to time, as needed.