Associate Vice President

Min Experience

12 years

Location

Bangalore, Karnataka, India

Job Type

Full-time

About the job

About the role

We're committed to bringing passion and customer focus to the business.

  • Design and build scalable data pipelines using PySpark, Python, and SQL for batch and real-time processing
  • Architect modern data platforms including Data Warehouses, Data Lakes, and Lakehouse configurations on AWS, Azure, or GCP
  • Develop and optimize ETL/ELT workflows with performance tuning, partitioning strategies, and data quality frameworks
  • Orchestrate complex data workflows using Airflow DAGs, managing dependencies and monitoring at scale (a minimal DAG sketch follows this list)
  • Implement data fabric architectures with robust data lineage, cataloging, and governance
  • Build data quality frameworks with automated validation, profiling, and anomaly detection
  • Work with platforms like Databricks, Snowflake, Redshift, dbt, and NoSQL databases to deliver optimized solutions
  • Deploy and manage data infrastructure on cloud platforms (AWS Glue, Athena, S3, Redshift, Lambda, EMR)
  • Establish CI/CD pipelines for data workflows using Git, Jenkins, and cloud-native deployment tools
  • Lead architecture design discussions, propose technical solutions, and define development standards and best practices
  • Create and enforce data engineering best practices including coding standards, testing frameworks, documentation, and deployment patterns
  • Build reusable frameworks, templates, and libraries to accelerate team productivity
  • Mentor data engineering teams on best practices for scalable data storage, processing, and data quality excellence
  • Ensure strict security, compliance, and data privacy throughout all data solutions
  • Collaborate with cross-functional teams including Data Scientists, Analytics Engineers, QA, and DevOps
  • Deliver solutions in Agile environments with JIRA for project management
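
For a flavor of the orchestration work above, here is a minimal Airflow DAG sketch (assuming Airflow 2.x; the DAG name, tasks, and schedule are hypothetical):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # e.g., land raw files from S3

    def transform():
        pass  # e.g., run a PySpark job over the raw data

    def load():
        pass  # e.g., publish curated tables to Redshift

    # Per-task retries give the pipeline basic fault tolerance
    default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_sales_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",              # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
        default_args=default_args,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Dependencies: extract runs before transform, which runs before load
        t_extract >> t_transform >> t_load

Dependency management and monitoring at scale build on exactly these primitives: task ordering, retries, schedules, and DAG-level alerting.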

What You Bring

  • 12+ years building production-grade data engineering solutions
  • Exceptional team leader who sets the stage for other data engineers to execute consistently using best practices
  • Strong expertise in Python and PySpark for distributed data processing
  • Advanced SQL proficiency including query optimization, window functions, CTEs, and performance tuning (see the PySpark window-function sketch after this list)
  • Deep experience with batch and real-time/streaming data systems (Spark Streaming, Kafka, Kinesis)
  • Hands-on experience with modern data platforms: Databricks, Snowflake, Redshift, BigQuery
  • Expertise in data modeling techniques: dimensional modeling, star/snowflake schemas, data vault
  • Strong knowledge of data warehousing and data lake architectures with hands-on implementation experience
  • Proficiency with Airflow for workflow orchestration, DAG design, and operational monitoring
  • Deep cloud platform experience (AWS, Azure, GCP) building scalable data solutions
  • Experience with data transformation tools like dbt for analytics engineering
  • Knowledge of NoSQL databases (DynamoDB, MongoDB, Cassandra) and when to use them
  • Understanding of data quality frameworks, data validation, and data profiling techniques
  • Experience with data lineage tools and metadata management (Apache Atlas, Collibra, DataHub)
  • Proficiency with version control (Git, CodeCommit) and CI/CD pipelines (Jenkins, CodePipeline)
  • Strong Unix/Linux and shell scripting skills for automation
  • Data governance and compliance knowledge (GDPR, HIPAA, data privacy regulations)
  • Performance optimization expertise including indexing, caching, and query tuning
  • Experience establishing coding standards, testing strategies, and documentation practices
  • Strong problem-solving skills with ability to diagnose issues and architect effective solutions
  • Proven ability to mentor junior engineers, lead technical discussions, and drive engineering excellence
  • Clear communicator who thrives in collaborative, Agile environments
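
As a concrete illustration of the window-function and CTE proficiency listed above, a minimal PySpark sketch might look like this (the sample data and column names are hypothetical):

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("window_demo").getOrCreate()

    orders = spark.createDataFrame(
        [("c1", "2024-01-01", 120.0),
         ("c1", "2024-01-05", 80.0),
         ("c2", "2024-01-03", 200.0)],
        ["customer_id", "order_date", "amount"],
    )

    # Per-customer window ordered by date: sequence number and running total
    w = Window.partitionBy("customer_id").orderBy("order_date")
    result = (
        orders
        .withColumn("order_seq", F.row_number().over(w))
        .withColumn(
            "running_total",
            F.sum("amount").over(
                w.rowsBetween(Window.unboundedPreceding, Window.currentRow)
            ),
        )
    )
    result.show()

The same logic in plain SQL would be a CTE feeding SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date); the DataFrame form above is its PySpark equivalent.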

Bonus Points

  • Life sciences or pharma domain knowledge
  • Cloud certifications (AWS Data Analytics, Azure Data Engineer, GCP Data Engineer)
  • Experience with streaming technologies (Kafka, Flink, Spark Structured Streaming); a minimal Structured Streaming sketch follows this list
  • Knowledge of machine learning pipelines and MLOps practices
  • Familiarity with data observability platforms (Monte Carlo, Great Expectations)
  • Experience with containerization (Docker, Kubernetes) for data workloads
  • Exposure to data mesh architecture patterns
  • Knowledge of graph databases (Neo4j) or vector databases for AI applications
  • Experience with reverse ETL and data activation platforms
  • Terraform or Infrastructure-as-Code for data infrastructure
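
As a sketch of the streaming item above, a minimal Spark Structured Streaming job reading from Kafka could look like this (the broker address, topic, and windowing choices are hypothetical, and the spark-sql-kafka connector must be on the classpath):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_stream_demo").getOrCreate()

    # Subscribe to a Kafka topic as an unbounded DataFrame
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "orders")                      # hypothetical topic
        .load()
    )

    # Kafka values arrive as bytes; cast them, then count events per minute,
    # tolerating up to 10 minutes of late data via the watermark
    counts = (
        events
        .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
        .withWatermark("timestamp", "10 minutes")
        .groupBy(F.window("timestamp", "1 minute"))
        .count()
    )

    query = (
        counts.writeStream
        .outputMode("update")
        .format("console")  # swap for a real sink (Delta, Kafka, etc.) in production
        .start()
    )
    query.awaitTermination()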

About the company

Strategic consulting and analytics for life sciences companies.

Skills

PySpark
Python
SQL
Airflow
Databricks
Snowflake
Redshift
dbt
AWS
Azure
GCP
Jenkins
Git
Kafka
Kinesis
Docker
Kubernetes