Tiger Analytics

Azure Platform Site Reliability Engineer (SRE)

Job Description

Who we are

Tiger Analytics is a global leader in AI and analytics, helping Fortune 1000 companies solve their toughest challenges. We offer full-stack AI and analytics services & solutions to empower businesses to achieve real outcomes and value at scale. We are on a mission to push the boundaries of what AI and analytics can do to help enterprises navigate uncertainty and move forward decisively. Our purpose is to provide certainty to shape a better tomorrow.
Our team of 4,000+ technologists and consultants is based in the US, Canada, the UK, India, Singapore and Australia, working closely with clients across CPG, Retail, Insurance, BFS, Manufacturing, Life Sciences, and Healthcare. Many of our team leaders rank in Top 10 and 40 Under 40 lists, exemplifying our dedication to innovation and excellence.
We are Great Place to Work-Certified (2022-24) and recognized by analyst firms such as Forrester, Gartner, HFS, Everest, ISG and others. We have been ranked among the best and fastest-growing analytics firms by Inc., Financial Times, Economic Times and Analytics India Magazine.

Key Responsibilities:

• Reliability & Performance: Ensure the reliability, availability, and performance of Azure-based platforms and services. Implement monitoring, alerting, and incident response strategies to address and mitigate issues proactively.

• Infrastructure Management: Design, deploy, and manage scalable and fault-tolerant Azure infrastructure. Utilize Infrastructure as Code (IaC) tools such as Terraform and Ansible for automated provisioning and configuration.

• Data Analytics: Ensure high performance and availability of data pipelines and analytics platforms.

• Machine Learning & Generative AI: Use AIOps to ensure that machine learning and generative AI systems on the Azure platform are scalable, secure, and optimized for performance.

• AKS Management: Architect, deploy, and manage Azure Kubernetes Service (AKS). Optimize AKS clusters for performance, scalability, and cost-efficiency, and ensure best practices for container orchestration and management.

• Automation & CI/CD: Develop and maintain automation workflows using Terraform and Ansible. Implement and manage CI/CD pipelines with Azure DevOps to streamline deployment processes and ensure continuous integration and delivery for day-to-day use by the project enablement team. Keep modules up to date with respect to versions and new policies introduced in the environment.

• Monitoring & Observability: Implement and maintain comprehensive monitoring and observability solutions using Azure Monitor, Application Insights, and other tools. Analyze metrics, logs, and traces to identify and resolve performance bottlenecks and reliability issues.

• Cost Optimization: Analyze and optimize Azure costs by implementing reservations, savings plans, and other cost-management strategies. Monitor usage patterns and provide recommendations for resource optimization and cost reduction.

• Capacity Planning: Perform capacity planning and forecasting to ensure adequate resources are available to meet demand. Implement scaling strategies and optimize resource utilization to balance performance and cost.

• Security & Compliance: Ensure that Azure environments adhere to security best practices and compliance requirements. Implement security measures, conduct regular audits, and address vulnerabilities proactively.

• Documentation & Knowledge Sharing: Create and maintain detailed documentation for operational processes, incident response procedures, and infrastructure designs. Share knowledge and provide training to team members and stakeholders.

• Collaboration & Stakeholder Engagement: Work closely with development teams, data scientists, and other stakeholders to understand requirements and deliver solutions that meet their needs. Communicate effectively on operational status, incidents, and improvements.

• Continuous Improvement: Stay current with emerging Azure technologies, data analytics, machine learning, generative AI, and industry trends. Identify opportunities for innovation and contribute to the development of new tools, processes, and best practices to enhance platform reliability and performance.

Required Qualifications:

• Technical Expertise: Extensive experience with Azure services, including Azure Data Analytics (Synapse Analytics, Data Lake, Data Factory, Power BI), Azure Machine Learning, Azure Cognitive Services, Generative AI technologies, and Azure Kubernetes Service (AKS). Proficiency in Terraform, Ansible, and Azure DevOps.

• Experience: 8+ years of experience in site reliability engineering, cloud operations, or a similar role with a focus on Azure technologies. Proven track record of managing large-scale, high-availability systems and supporting data analytics and AI solutions.

• Skills: Strong problem-solving and troubleshooting skills, with experience in incident management, performance optimization, and automation. Proficiency in scripting languages (e.g., PowerShell, Python) and CI/CD tools.

• Certifications: Microsoft Certified: Azure Solutions Architect Expert (AZ-305) or equivalent advanced certification required. Additional certifications in Azure Data Engineering, Machine Learning, or DevOps are a plus.

Desired Attributes:

• Operational Excellence: Demonstrated ability to maintain high standards of reliability and performance in complex cloud environments.

• Cost Optimization: Experience with cost management strategies, including reservations and savings plans, to optimize Azure expenditures.

• Leadership: Ability to lead incident response efforts, mentor team members, and drive continuous improvement initiatives.

• Customer Focus: Strong commitment to delivering high-quality solutions that meet stakeholder needs and enhance user experience.

• Innovative Mindset: Ability to drive innovation in data analytics, AI, and container management, applying creative solutions to complex challenges.

• Team Collaboration: Excellent interpersonal skills with the ability to work effectively with cross-functional teams and foster a collaborative work environment.

Date Posted: 11/10/2024

Job ID: 95935603

About Company

Marketing Analytics: Apply analytics to enhance the impact of your marketing spend; Customer Analytics: Analyze and map customer DNA for meaningful engagement.

Last Updated: 25-11-2024 08:51:39 PM