GCP Data Engineer (Lead)
Infogain
- Location
- Mumbai, Maharashtra, India
- Job type
- Full-time
Required skills
- Airflow
- BigQuery
- CI
- clustering
- communication skills
- compliance
- data modeling
- Dataflow
- Dataproc
- GCP
- GCS
- GitHub
- Google Cloud
- Oracle
- SAS
- Snowflake
- Teradata
- Vertex
About the role
Infogain
Website: infogain.com
Job details:
Required Skills & Experience
- 9–12 years of experience in data engineering or analytics, including 3+ years of hands-on work on GCP.
- Strong experience with PySpark, Dataproc, GCS, BigQuery, and JDBC ingestion.
- Proven experience migrating SAS workloads to PySpark or SQL-based systems.
- Hands-on knowledge of Medallion architecture (Bronze/Silver/Gold) on GCP.
- Understanding of Dataplex, IAM, policy tags, and secure data handling.
- Experience with CI/CD (Cloud Build/GitHub Actions) and workflow orchestration (Cloud Composer/Airflow).
- Strong problem-solving ability, debugging skills, and ability to guide teams through technical challenges.
Preferred Skills
- Experience with Vertex AI, MLOps, and ML pipeline deployment.
- Knowledge of Delta/Iceberg/Hudi table formats on GCS.
- Exposure to real-time ingestion (Pub/Sub, Dataflow).
- Google Cloud Professional Data Engineer or Cloud Architect certification.
Soft Skills
- Strong leadership and mentoring capabilities.
- Excellent communication skills to support developers, architects, and business teams.
- Ability to manage multiple priorities, resolve conflicts, and maintain steady progress under pressure.
Key Responsibilities
- Hands-on Technical Leadership
  - Work with development teams daily to guide solution design, troubleshoot issues, and resolve technical blockers.
  - Enforce engineering best practices, coding standards, and architectural guidelines across all data pipelines and workloads.
  - Perform design and code reviews to ensure the quality, scalability, and reliability of the platform.
- Data Engineering on GCP
  - Lead development of ingestion pipelines that use direct JDBC connectivity from Oracle and Teradata into the Raw/Bronze layer on GCS.
  - Develop and optimize PySpark workloads on Dataproc for data cleansing, transformation, and harmonization into the Curated/Silver layer.
  - Contribute to the design of the Gold layer in BigQuery, including table structures, partitioning, clustering, and performance optimization.
- Migration from SAS to GCP
  - Translate existing SAS logic into PySpark, ensuring functional parity, improved performance, and operational efficiency.
  - Provide guidance on PySpark coding patterns, UDFs, optimization strategies, shuffle/skew handling, and best practices for Dataproc jobs.
- BigQuery Engineering & Optimization
  - Build and optimize SQL models, materialized views, and analytical datasets in BigQuery.
  - Apply query optimization techniques, cost controls, and data modeling best practices (star/snowflake schemas).
  - Implement row-level and column-level security (RLS/CLS) for secure reporting, and work with BI teams to integrate BigQuery into reporting tools.
- Vertex AI & ML Support
  - Assist data scientists in building ML pipelines using Vertex AI (training, prediction, feature engineering).
  - Guide integration of feature pipelines from the Silver layer into Vertex AI Feature Store.
  - Ensure reproducibility, lineage, and model monitoring (drift, bias).
- Data Governance & Security (Dataplex + IAM)
  - Implement and enforce governance standards using Dataplex, including cataloging, policy tags, and data domains.
  - Ensure datasets follow proper IAM roles, tagging, and compliance requirements (PII/PCI/PHI masking where needed).
  - Support lineage metadata, data quality (DQ) implementation, and documentation.
- Operations, Monitoring & Cost Optimization
  - Optimize Dataproc clusters (autoscaling, preemptible VMs), GCS storage lifecycle policies, and BigQuery costs.
  - Establish monitoring dashboards, logs, alerts, and operational KPIs.
  - Troubleshoot and resolve production issues, ensuring high availability and reliability.
Skills
- Primary Skill: Data Engineering
- Sub Skill(s): Data Engineering
- Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery