GCP Data Engineer (Lead)

Infogain

Location: Mumbai, Maharashtra, India
Job type: Full-time

Required skills

Airflow
BigQuery
CI/CD
clustering
communication skills
compliance
data modeling
Dataflow
Dataproc
GCP
GCS
GitHub
Google Cloud
Oracle
SAS
Snowflake
Teradata
Vertex AI

About the role

Infogain

Website: infogain.com
Job details:
Roles & Responsibilities

Core Skills

Required Skills & Experience

9–12 years of experience in data engineering or analytics, including 3+ years of hands-on GCP experience.
Strong experience with PySpark, Dataproc, GCS, BigQuery, and JDBC ingestion.
Proven experience migrating SAS workloads to PySpark or SQL-based systems.
Hands-on knowledge of GCP Medallion architecture (Bronze/Silver/Gold).
Understanding of Dataplex, IAM, policy tags, and secure data handling.
Experience with CI/CD (Cloud Build/GitHub Actions) and workflow orchestration (Cloud Composer/Airflow).
Strong problem-solving ability, debugging skills, and ability to guide teams through technical challenges.

Preferred Skills

Experience with Vertex AI, MLOps, and ML pipeline deployment.
Knowledge of Delta/Iceberg/Hudi table formats on GCS.
Exposure to real-time ingestion (Pub/Sub, Dataflow).
Google Cloud Professional Data Engineer or Cloud Architect certification.

Soft Skills

Strong leadership and mentoring capabilities.
Excellent communication skills to support developers, architects, and business teams.
Ability to manage multiple priorities, resolve conflicts, and maintain steady progress under pressure.

Key Responsibilities

Hands-on Technical Leadership
Work with development teams daily to guide solution design, troubleshoot issues, and resolve technical blockers.
Enforce engineering best practices, coding standards, and architectural guidelines across all data pipelines and workloads.
Perform design and code reviews to ensure the quality, scalability, and reliability of the platform.

Data Engineering on GCP
Lead development of ingestion pipelines via direct JDBC connectivity from Oracle and Teradata into the Raw/Bronze layer on GCS.
Develop and optimize PySpark workloads on Dataproc for data cleansing, transformation, and harmonization into the Curated/Silver layer.
Contribute to the design of the Gold layer in BigQuery, including table structures, partitioning, clustering, and performance optimization.

Migration from SAS to GCP
Translate existing SAS logic into PySpark, ensuring functional parity, improved performance, and operational efficiency.
Provide guidance on PySpark coding patterns, UDFs, optimization strategies, shuffle/skew handling, and best practices for Dataproc jobs.

BigQuery Engineering & Optimization
Build and optimize SQL models, materialized views, and analytical datasets in BigQuery.
Apply query optimization techniques, cost controls, and data modeling best practices (star/snowflake schemas).
Implement row-level and column-level security (RLS/CLS) for secure reporting, and work with BI teams to integrate BigQuery into reporting tools.

Vertex AI & ML Support
Assist data scientists in building ML pipelines using Vertex AI (training, prediction, feature engineering).
Guide integration of feature pipelines from the Silver layer to Vertex AI Feature Store.
Ensure reproducibility, lineage, and model monitoring (drift, bias).

Data Governance & Security (Dataplex + IAM)
Implement and enforce governance standards using Dataplex, including cataloging, policy tags, and data domains.
Ensure datasets use appropriate IAM roles and tagging and meet compliance requirements (PII/PCI/PHI masking where needed).
Support lineage metadata, data quality (DQ) implementation, and documentation.

Operations, Monitoring & Cost Optimization
Optimize Dataproc clusters (autoscaling, preemptible VMs), GCS storage lifecycle policies, and BigQuery costs.
Establish monitoring dashboards, logs, alerts, and operational KPIs.
Troubleshoot and resolve production issues, ensuring high availability and reliability.

Experience

11-12 Years

Skills

Primary Skill: Data Engineering
Sub Skill(s): Data Engineering
Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery
