CodeVyasa
Website: codevyasa.com
Job description
We are looking for a highly skilled Data Engineer (Bangalore, 3+ years of experience) with strong expertise in both batch and real-time data processing systems. The ideal candidate will have hands-on experience building scalable ETL/ELT pipelines, optimizing large-scale data workflows, and working with modern lakehouse architectures.
About Us
CodeVyasa is a mid-sized product engineering company that works with top-tier product and solutions organizations such as McKinsey, Walmart, Razorpay, Swiggy, and others. We are a team of 550+ engineers driving innovation across Product & Data Engineering, with a focus on Agentic AI, RPA, Full Stack, and GenAI-based solutions.
Key Responsibilities
- Design, build, and optimize scalable ETL/ELT pipelines for batch and streaming data
- Develop and maintain Apache Spark (PySpark, Spark SQL) jobs with a strong focus on performance tuning
- Build and manage real-time data pipelines using Apache Flink and Kafka
- Orchestrate workflows using Apache Airflow, including DAG design, retries, and monitoring
- Implement data quality checks, SCD strategies, and incremental data processing
- Work with lakehouse architectures (Delta Lake / Iceberg), including partitioning and compaction strategies
- Optimize performance by handling data skew, optimizing joins, and tuning queries
- Deploy and manage data applications on AWS (S3, EMR, Athena)
- Collaborate with DevOps teams to implement CI/CD pipelines and GitOps workflows
- Manage containerized workloads using Kubernetes (EKS)
Must-Have Skills
- Strong experience with Apache Spark (PySpark, Spark SQL) and performance tuning
- Hands-on experience in ETL/ELT pipeline development (SCD, incremental loads, data validation)
- Expertise in Apache Airflow (DAGs, scheduling, monitoring)
- Experience with Apache Flink and Kafka for real-time streaming pipelines
- Solid understanding of lakehouse architectures (Delta Lake / Iceberg)
- Advanced SQL and strong Python programming skills
- Experience in data optimization techniques (handling skew, partitioning, indexing)
- Hands-on experience with the AWS ecosystem (S3, EMR, Athena)
- Exposure to Kubernetes (EKS) and CI/CD (GitOps)
Good-to-Have Skills
- Experience with dbt (data build tool)
- Knowledge of CDC tools like Debezium or Flink CDC
- Familiarity with data observability tools
- Exposure to feature stores and ML data pipelines
- Experience with schema registries and serialization formats (Avro, Protobuf)
- Deep understanding of Delta Lake / Apache Iceberg operations
Why Join CodeVyasa?
- Work on innovative, high-impact projects with leading global clients.
- Exposure to modern technologies, scalable systems, and cloud-native architectures.
- Continuous learning and upskilling opportunities through internal and external programs.
- Supportive and collaborative work culture with flexible policies.
- Competitive salary and comprehensive benefits package.
- Free healthcare coverage for employees and dependents.
Click Apply to learn more.