- Help design, build and continuously improve the client's online platform.
- Research, suggest and implement new technology solutions following best practices/standards.
- Take responsibility for the resiliency and availability of different products.
- Be a productive member of the team.
Requirements
Job Title: Senior Lead Java Developer (Spark, Azure, Redpanda Expert)
Location: Bangalore (with monthly travel to Gurgaon)
Experience Level: Senior/Lead (13+ years)
Job Overview:
We are looking for a Senior Lead Java Developer with extensive experience in Apache Spark, Azure Data Services (including ADLS, Azure Blob Storage, and Azure Databases), and Redpanda. The ideal candidate will be responsible for designing, architecting, and leading the development of scalable data processing pipelines using Java and Spark. You will collaborate with cross-functional teams to build data-intensive applications while promoting best practices in cloud-based architecture. If you are passionate about data engineering, distributed systems, and cloud technologies, this role is for you.
Key Responsibilities:
Design and Architect Distributed Data Processing Systems:
- Lead the design and implementation of large-scale Apache Spark jobs using Java to process vast datasets, including blockchain data streams and financial transactions.
- Architect high-performance, reliable data pipelines integrated with Azure services (ADLS, Blob Storage, and Azure Databases) for both real-time and batch processing.
Development and Optimization of Spark Jobs:
- Develop, optimize, and manage Spark jobs in Java (JVM-based), focusing on scalability, fault tolerance, and performance.
- Implement data validation, schema management, and Parquet file handling within Spark jobs for smooth integration with ADLS and Iceberg tables.
Integration with Redpanda:
- Lead the integration of Redpanda (or other Kafka-compatible brokers) for real-time data ingestion, ensuring reliable, low-latency communication between ingestion services and downstream processing.
- Collaborate with the engineering team to create Redpanda consumers for real-time data streaming, ensuring efficient consumption by Spark jobs.
Azure Cloud Services:
- Design cloud-native solutions utilizing Azure services like Azure Data Lake Storage (ADLS), Azure Blob Storage, and Azure SQL/NoSQL Databases.
- Work closely with the DevOps team to ensure seamless integration of Spark jobs with Azure, managing data ingestion, storage, and cost optimization for large datasets.
Lead and Mentor Engineering Teams:
- Provide leadership, mentorship, and technical guidance to a team of developers, ensuring adherence to best practices in Java development, Spark optimization, and cloud architecture.
- Conduct regular code reviews, architecture assessments, and performance tuning to maintain code quality and industry standards.
End-to-End Data Pipeline Management:
- Manage the entire lifecycle of data pipelines, from ingestion via Redpanda through data transformation, storage, and delivery using Azure Data Services and Spark.
- Ensure pipelines are monitored, secure, and highly available by integrating with tools like Prometheus, Azure Monitor, and Grafana.
Performance Tuning and Resource Optimization:
- Lead efforts in optimizing Spark job performance, reducing shuffling and improving memory usage for large-scale data processing.
- Implement caching strategies, partitioning, and task scheduling to maximize throughput and minimize processing time.
Required Qualifications:
- 13+ years of experience in software development with strong expertise in Java and JVM-based technologies.
- 5+ years of experience working with Apache Spark for data processing in Java.
- Extensive experience with Azure Data Services, including ADLS, Azure Blob Storage, and Azure Databases.
- In-depth expertise in distributed data processing and streaming frameworks, with hands-on experience in integrating message brokers like Redpanda or Kafka.
- Proven experience in cloud-native architecture design, specifically on Azure.
- Strong skills in data storage optimization, partitioning, and performance tuning in Spark.
- Leadership and mentoring experience with a proven ability to guide and develop high-performing engineering teams.
- Experience with CI/CD pipelines, containerization (Docker, Kubernetes), and DevOps practices.
- Strong understanding of security best practices for cloud-based data pipelines.
Preferred Qualifications:
- Experience with Iceberg tables and Parquet file formats in distributed processing environments.
- Familiarity with monitoring tools such as Prometheus, Grafana, or Azure Monitor for observability.
- Experience with streaming data processing using Redpanda and its integration with Spark jobs for real-time data flows.
- Familiarity with microservices architecture (Spring Boot) and cloud deployment strategies.
Benefits
- A challenging, innovative environment.
- Opportunities to learn where needed.