Associate Vice President

Min Experience

12 years

Location

Bangalore, Karnataka, India

Job Type

Full-time

About the job

About the role

We're committed to bringing passion and customer focus to the business.

  • Design and build scalable data pipelines using PySpark, Python, and SQL for batch and real-time processing
  • Architect modern data platforms including Data Warehouses, Data Lakes, and Lakehouse configurations on AWS, Azure, or GCP
  • Develop and optimize ETL/ELT workflows with performance tuning, partitioning strategies, and data quality frameworks
  • Orchestrate complex data workflows using Airflow DAGs, managing dependencies and monitoring at scale (a minimal DAG sketch follows this list)
  • Implement data fabric architectures with robust data lineage, cataloging, and governance
  • Build data quality frameworks with automated validation, profiling, and anomaly detection
  • Work with platforms like Databricks, Snowflake, Redshift, dbt, and NoSQL databases to deliver optimized solutions
  • Deploy and manage data infrastructure on cloud platforms (AWS Glue, Athena, S3, Redshift, Lambda, EMR)
  • Establish CI/CD pipelines for data workflows using Git, Jenkins, and cloud-native deployment tools
  • Lead architecture design discussions, propose technical solutions, and define development standards and best practices
  • Create and enforce data engineering best practices including coding standards, testing frameworks, documentation, and deployment patterns
  • Build reusable frameworks, templates, and libraries to accelerate team productivity
  • Mentor data engineering teams on best practices for scalable data storage, processing, and data quality excellence
  • Ensure strict security, compliance, and data privacy throughout all data solutions
  • Collaborate with cross-functional teams including Data Scientists, Analytics Engineers, QA, and DevOps
  • Deliver solutions in Agile environments with JIRA for project management
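
For a flavor of the orchestration work above, here is a minimal Airflow DAG sketch (assuming Airflow 2.x; the DAG name, tasks, and schedule are hypothetical):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # e.g., land raw files from S3

    def transform():
        pass  # e.g., run a PySpark job over the raw data

    def load():
        pass  # e.g., publish curated tables to Redshift

    # Per-task retries give the pipeline basic fault tolerance
    default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_sales_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",              # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
        default_args=default_args,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Dependencies: extract runs before transform, which runs before load
        t_extract >> t_transform >> t_load

Dependency management and monitoring at scale build on exactly these primitives: task ordering, retries, schedules, and DAG-level alerting.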

What You Bring

  • 12+ years building production-grade data engineering solutions
  • Exceptional team leader who sets the stage for other data engineers to execute consistently using best practices
  • Strong expertise in Python and PySpark for distributed data processing
  • Advanced SQL proficiency including query optimization, window functions, CTEs, and performance tuning (see the PySpark window-function sketch after this list)
  • Deep experience with batch and real-time/streaming data systems (Spark Streaming, Kafka, Kinesis)
  • Hands-on experience with modern data platforms: Databricks, Snowflake, Redshift, BigQuery
  • Expertise in data modeling techniques: dimensional modeling, star/snowflake schemas, data vault
  • Strong knowledge of data warehousing and data lake architectures with hands-on implementation experience
  • Proficiency with Airflow for workflow orchestration, DAG design, and operational monitoring
  • Deep cloud platform experience (AWS, Azure, GCP) building scalable data solutions
  • Experience with data transformation tools like dbt for analytics engineering
  • Knowledge of NoSQL databases (DynamoDB, MongoDB, Cassandra) and when to use them
  • Understanding of data quality frameworks, data validation, and data profiling techniques
  • Experience with data lineage tools and metadata management (Apache Atlas, Collibra, DataHub)
  • Proficiency with version control (Git, CodeCommit) and CI/CD pipelines (Jenkins, CodePipeline)
  • Strong Unix/Linux and shell scripting skills for automation
  • Data governance and compliance knowledge (GDPR, HIPAA, data privacy regulations)
  • Performance optimization expertise including indexing, caching, and query tuning
  • Experience establishing coding standards, testing strategies, and documentation practices
  • Strong problem-solving skills with ability to diagnose issues and architect effective solutions
  • Proven ability to mentor junior engineers, lead technical discussions, and drive engineering excellence
  • Clear communicator who thrives in collaborative, Agile environments
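
As a concrete illustration of the window-function and CTE proficiency listed above, a minimal PySpark sketch might look like this (the sample data and column names are hypothetical):

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("window_demo").getOrCreate()

    orders = spark.createDataFrame(
        [("c1", "2024-01-01", 120.0),
         ("c1", "2024-01-05", 80.0),
         ("c2", "2024-01-03", 200.0)],
        ["customer_id", "order_date", "amount"],
    )

    # Per-customer window ordered by date: sequence number and running total
    w = Window.partitionBy("customer_id").orderBy("order_date")
    result = (
        orders
        .withColumn("order_seq", F.row_number().over(w))
        .withColumn(
            "running_total",
            F.sum("amount").over(
                w.rowsBetween(Window.unboundedPreceding, Window.currentRow)
            ),
        )
    )
    result.show()

The same logic in plain SQL would be a CTE feeding SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date); the DataFrame form above is its PySpark equivalent.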

Bonus Points

  • Life sciences or pharma domain knowledge
  • Cloud certifications (AWS Data Analytics, Azure Data Engineer, GCP Data Engineer)
  • Experience with streaming technologies (Kafka, Flink, Spark Structured Streaming); a minimal Structured Streaming sketch follows this list
  • Knowledge of machine learning pipelines and MLOps practices
  • Familiarity with data observability platforms (Monte Carlo, Great Expectations)
  • Experience with containerization (Docker, Kubernetes) for data workloads
  • Exposure to data mesh architecture patterns
  • Knowledge of graph databases (Neo4j) or vector databases for AI applications
  • Experience with reverse ETL and data activation platforms
  • Terraform or Infrastructure-as-Code for data infrastructure
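
As a sketch of the streaming item above, a minimal Spark Structured Streaming job reading from Kafka could look like this (the broker address, topic, and windowing choices are hypothetical, and the spark-sql-kafka connector must be on the classpath):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_stream_demo").getOrCreate()

    # Subscribe to a Kafka topic as an unbounded DataFrame
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "orders")                      # hypothetical topic
        .load()
    )

    # Kafka values arrive as bytes; cast them, then count events per minute,
    # tolerating up to 10 minutes of late data via the watermark
    counts = (
        events
        .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
        .withWatermark("timestamp", "10 minutes")
        .groupBy(F.window("timestamp", "1 minute"))
        .count()
    )

    query = (
        counts.writeStream
        .outputMode("update")
        .format("console")  # swap for a real sink (Delta, Kafka, etc.) in production
        .start()
    )
    query.awaitTermination()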

About the company

Strategic consulting and analytics for life sciences companies.

Skills

PySpark
Python
SQL
Airflow
Databricks
Snowflake
Redshift
dbt
AWS
Azure
GCP
Jenkins
Git
Kafka
Kinesis
Docker
Kubernetes