Data Engineer

Healthark Insights

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Python
Airflow
AWS
Apache
Apache Airflow
data engineer
data lake
data modeling
Databricks
ETL
Lambda
Terraform

About the role


Website: healtharkinsights.com
Job details:

Company Detail

Founded in 2015, Healthark is India’s leading pure-play healthcare and life sciences consulting firm, offering global management consulting services. We are a team that blends deep domain expertise, scientific rigor, and a diverse skill set to address complex challenges for our clients. Our comprehensive range of services includes Growth Advisory, GCC Advisory, Strategy Formulation, Real-World Evidence (RWE), and Information, Data & Technology. Backed by a team of 150+ professionals, we have delivered over 1,000 projects across 60+ markets. Our growing portfolio includes 60+ clients, ranging from nimble startups to industry-leading global giants.

Operating out of key offices in Ahmedabad and Hyderabad, Healthark continues to drive global impact through deep domain expertise and strategic insights.

Position: Data Engineer

Experience: 4-8 Years

Location: Pune, Bangalore, Chennai

Working Days: Mon to Fri

Company URL: https://healthark.ai/

Role Overview

We are seeking a Data Engineer with 3–5 years of experience in building scalable data pipelines and modern data lake architectures using Databricks, PySpark, Airflow, and AWS services. The role involves developing batch and streaming pipelines, optimizing data processing workflows, and supporting analytics and reporting platforms.

Key Responsibilities

Design, develop, and maintain data pipelines using PySpark and Databricks.
Build and orchestrate workflows using Apache Airflow.
Develop and manage data lake solutions using Apache Iceberg.
Implement ingestion pipelines from multiple sources into AWS S3-based data lakes.
Work with AWS services such as S3, AWS Glue/Crawlers, Lambda, Lake Formation, SNS/SQS, and EventBridge.
Optimize Spark jobs and data processing performance.
Implement data quality checks and monitoring mechanisms.
Support CI/CD processes and infrastructure automation.

Required Skills

Strong experience in Python and PySpark.
Hands-on experience with Databricks.
Experience building pipelines using Apache Airflow.
Strong knowledge of the AWS data lake ecosystem.
Experience with Apache Iceberg or other modern table formats.
Good understanding of data modeling and ETL concepts.
Experience working with large datasets and distributed systems.

Preferred Skills

Exposure to Kafka-based streaming pipelines.
Experience with Terraform or Infrastructure as Code (IaC).
Understanding of data governance using Lake Formation.

Education

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

