Data Engineer - Apache Spark

Talentgigs

full-time

Required skills

Python
Apache
Apache Spark
Azure
CI
data engineer
data modeling
Databricks
Docker
end-to-end
Kafka
Kubernetes
Parquet
Snowflake
Spark
SQL
state management

About the role

Website: talentgigs.in
Job details:

We are looking for a Senior Data Engineer with deep expertise in building end-to-end data platforms using open-source technologies. The ideal candidate must have hands-on experience designing and implementing scalable, distributed, and event-driven data architectures. This role is strictly focused on open-source data engineering ecosystems and is not suited for profiles primarily experienced in Databricks, Snowflake, or Azure Data Factory.

Key Responsibilities

Core Responsibilities

1. End-to-End Data Platform Engineering
• Architect and implement scalable, distributed data platforms.
• Design batch and real-time ingestion frameworks.
• Build reusable and modular data engineering frameworks.
• Ensure data reliability, scalability, and performance optimization.

2. Data Processing & Orchestration
• Develop pipelines using:
  o Apache Spark (Core, SQL, PySpark)
  o Apache Flink
  o Apache Airflow
• Implement event-driven architectures using Kafka/Flink ecosystems.
• Build robust CDC pipelines.
• Implement PySpark Structured Streaming with exactly-once semantics.

3. Lakehouse & Storage Architecture
• Design and implement Data Lakehouse solutions using:
  o Apache Iceberg
  o Delta Lake (including Change Data Feed)
• Implement:
  o Schema evolution strategies
  o Partitioning & compaction
  o Metadata management
• Optimize storage formats (Parquet, ORC, Avro).

4. Distributed Query & Analytics Layer
• Design and optimize distributed SQL query engines using:
  o Trino
  o PrestoDB
• Build high-performance analytical data stores using:
  o StarRocks
• Enable interactive analytics and federated query capabilities.
• Tune performance for low-latency analytical workloads.

5. Infrastructure & Deployment
• Deploy Spark/Flink clusters on Kubernetes.
• Containerize workloads using Docker.
• Implement CI/CD for data pipelines.
• Set up observability, logging, and monitoring frameworks.
• Work in Linux-based environments.

Required Qualifications

• 7+ years of hands-on data engineering experience.

• Strong expertise in:

• Apache Spark (Core + SQL + PySpark)

• PySpark Structured Streaming

• Apache Flink

• Apache Airflow

• Apache Iceberg

• Delta Lake (Change Data Feed)

• Trino or PrestoDB

• StarRocks

• Strong understanding of:

• Event-driven architectures

• Data Lakehouse principles

• Distributed systems design

• Streaming state management

• Exactly-once processing

• Strong SQL and data modeling skills.

• Proficiency in Python.

Click on Apply to know more.
