You will join a team of highly skilled engineers responsible for delivering Acceldata's support services. Our Site Reliability Engineers are trained to be active listeners and demonstrate empathy when customers encounter product issues. In our fun and collaborative environment, Site Reliability Engineers develop strong business, interpersonal, and technical skills to deliver high-quality service to our valued customers.
We're looking for someone who can:
- Improve the availability, scalability, performance, and reliability of enterprise production services for our products as well as our customers' data lake environments.
- Use your expertise to improve the reliability and performance of Hadoop data lake clusters and data management services. Like our products, our SREs are expected to be platform- and vendor-agnostic when implementing, stabilizing, and tuning Hadoop ecosystems.
- Provide implementation guidance, best-practices frameworks, and technical thought leadership to our customers for their Hadoop data lake implementation and migration initiatives.
- Be 100% hands-on and, as required, test, monitor, administer, and operate multiple data lake clusters across data centers.
- Troubleshoot issues across the entire stack: hardware, software, application, and network.
- Dive into problems with an eye to both immediate remediation and the follow-through changes and automation that will prevent future occurrences.
- Demonstrate exceptional troubleshooting and strong architectural skills, and describe problems and solutions clearly and effectively in both verbal and written formats.
What makes you the right fit for this position:
- Customer-focused, self-driven, and motivated, with a strong work ethic and a passion for problem-solving.
- 5+ years of experience designing, implementing, tuning, and managing services in distributed, enterprise-scale on-premises and public/private cloud environments.
- Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
- Hadoop cluster design, implementation, management, and performance-tuning experience with HDFS, YARN, Hive/Impala, Spark, Kerberos, and related Hadoop technologies is a must.
- Strong SQL/HQL query troubleshooting and tuning skills on Hive/HBase are a must.
- Strong capacity-planning experience for Hadoop ecosystems/data lakes is a must.
- Good to have hands-on experience with Kafka, Ranger/Sentry, NiFi, Ambari, Cloudera Manager, and HBase.
- Good to have data modeling, data engineering, and data security experience within the Hadoop ecosystem.
- Good to have deep JVM/Java debugging and tuning skills.
- Certification on any of the leading cloud providers (AWS, Azure, GCP) and/or Kubernetes is a big plus.