GCP Data Architect (Senior)

Infogain

Location: Mumbai, Maharashtra, India
Job type: Full-time

Required skills

Airflow
BigQuery
clustering
compliance
data engineer
data modeling
data models
Dataflow
Dataproc
end-to-end
Flink
GCP
GCS
GitHub
Oracle
Parquet
SAS
SQL
SRE
Terraform
VPC
BI tools
Teradata
Vertex

About the role

Website: infogain.com
Job details:
Qualifications
15+ years in data/analytics engineering with 7+ years architecting solutions on public cloud; 5+ years hands-on with GCP.
Proven delivery of Medallion (Bronze/Silver/Gold) architectures on GCP with GCS + Dataproc + BigQuery at enterprise scale.
Expert in PySpark and Dataproc (job orchestration, autoscaling, cluster policies, tuning, troubleshooting).
Strong BigQuery expertise: storage/compute separation, partitioning, clustering, materialized views, BI Engine, slot management, RLS/CLS.
Hands-on experience with Vertex AI (Pipelines, Feature Store, training/serving, registry, monitoring) and MLOps best practices.
Experience implementing Dataplex for centralized governance (catalog, policy tags) and IAM for least privilege, plus data security/compliance controls.
Practical integration with Oracle and Teradata via JDBC; familiarity with CDC patterns and schema evolution.
CI/CD for data platforms (Cloud Build/GitHub Actions), orchestration (Cloud Composer/Airflow), and Infrastructure as Code (Terraform).
Deep understanding of data modeling, data quality, lineage, and observability for data systems.
Excellent communication, stakeholder management, and leadership across technical and business teams.
Google certifications: Professional Cloud Architect and/or Professional Data Engineer (highly preferred).
Experience modernizing SAS workloads and translating SAS macros/PROCs to PySpark/SQL on GCP.
Knowledge of streaming (Pub/Sub, Dataflow/Flink/Spark Structured Streaming) for near-real-time requirements.
Experience with VPC Service Controls, Private Service Connect, Organization Policy, Workload Identity Federation.
Familiarity with Delta/Iceberg/Hudi tables and open table formats on GCS; data sharing patterns (Analytics Hub).
Bachelor’s/Master’s in Computer Science, Engineering, Information Systems, or equivalent experience.

Job Description

Design and own the end-to-end analytics architecture on GCP, ensuring alignment with business, security, cost, and performance goals.
Implement a Medallion architecture: Bronze (raw) ingestion on GCS via JDBC from Oracle/Teradata; Silver (curated) transformations using PySpark on Dataproc; Gold layer optimized in BigQuery for analytics and BI.
Define canonical data models, storage formats (Parquet/ORC/Delta/Iceberg), and partitioning/clustering strategies.
Lead migration from SAS to PySpark, establish coding standards, and optimize Spark jobs.
Build JDBC ingestion pipelines with CDC, robust retries, and schema evolution; orchestrate workflows via Cloud Composer/Airflow and CI/CD.
Architect BigQuery models, manage cost/performance, enforce SLAs, and integrate securely with BI tools using RLS/CLS.
Define MLOps workflows on Vertex AI, including feature pipelines, automated training, deployment, and model monitoring for drift/bias.
Implement centralized governance via Dataplex (catalog, policy tags), IAM least privilege, VPC-SC, and data security/compliance controls.
Drive cost optimization, reliability/SRE practices, monitoring, DR/BCP, and FinOps governance.
Provide architectural leadership, mentor teams, set standards, and create roadmaps, ADRs, and executive-level communication.

Experience

14-16 Years

Skills

Primary Skill: Data Engineering
Sub Skill(s): Data Engineering
Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery
