Data Engineer

CodeVyasa

Location: Greater Bengaluru Area
Job type: Full-time

Required skills

Python
Airflow
AWS
Apache
Apache Airflow
Apache Flink
Apache Spark
CI
data engineer
DevOps
ETL
Flink
full stack
Kafka
Kubernetes
Spark
SQL

About the role

Website: codevyasa.com

Job description

We are looking for a highly skilled Data Engineer (Bangalore, 3+ years of experience) with strong expertise in both batch and real-time data processing systems. The ideal candidate will have hands-on experience building scalable ETL/ELT pipelines, optimizing large-scale data workflows, and working with modern lakehouse architectures.

About Us

CodeVyasa is a mid-sized product engineering company that works with top-tier product and solutions organizations such as McKinsey, Walmart, RazorPay, and Swiggy. We are a team of 550+ engineers driving innovation across Product & Data Engineering, with a focus on Agentic AI, RPA, Full Stack, and GenAI-based solutions.

Key Responsibilities

Design, build, and optimize scalable ETL/ELT pipelines for batch and streaming data
Develop and maintain Apache Spark (PySpark, Spark SQL) jobs with a strong focus on performance tuning
Build and manage real-time data pipelines using Apache Flink and Kafka
Orchestrate workflows using Apache Airflow, including DAG design, retries, and monitoring
Implement data quality checks, SCD strategies, and incremental data processing
Work with lakehouse architectures (Delta Lake / Iceberg), including partitioning and compaction strategies
Optimize performance by addressing data skew, improving join strategies, and tuning queries
Deploy and manage data applications on AWS (S3, EMR, Athena)
Collaborate with DevOps teams to implement CI/CD pipelines and GitOps workflows
Manage containerized workloads using Kubernetes (EKS)

Must-Have Skills

Strong experience with Apache Spark (PySpark, Spark SQL) and performance tuning
Hands-on experience in ETL/ELT pipeline development (SCD, incremental loads, data validation)
Expertise in Apache Airflow (DAGs, scheduling, monitoring)
Experience with Apache Flink and Kafka for real-time streaming pipelines
Solid understanding of Lakehouse architectures (Iceberg / Delta Lake)
Advanced SQL and strong Python programming skills
Experience in data optimization techniques (handling skew, partitioning, indexing)
Hands-on experience with AWS ecosystem (S3, EMR, Athena)
Exposure to Kubernetes (EKS) and CI/CD (GitOps)

Good-to-Have Skills

Experience with dbt (data build tool)
Knowledge of CDC tools like Debezium or Flink CDC
Familiarity with data observability tools
Exposure to feature stores and ML data pipelines
Experience with schema registries (Avro, Protobuf)
Deep understanding of Delta Lake / Apache Iceberg operations

Why Join CodeVyasa?

Work on innovative, high-impact projects with leading global clients.
Exposure to modern technologies, scalable systems, and cloud-native architectures.
Continuous learning and upskilling opportunities through internal and external programs.
Supportive and collaborative work culture with flexible policies.
Competitive salary and comprehensive benefits package.
Free healthcare coverage for employees and dependents.
