GCP Data Architect (Senior)
Infogain
- Location: Mumbai, Maharashtra, India
- Job type: Full-time
Required skills
- Airflow
- BigQuery
- clustering
- compliance
- data engineer
- data modeling
- data models
- Dataflow
- Dataproc
- end-to-end
- Flink
- GCP
- GCS
- GitHub
- Oracle
- Parquet
- SAS
- SQL
- SRE
- Terraform
- VPC
- BI tools
- Teradata
- Vertex
About the role
Website: infogain.com
Job details:
Roles & Responsibilities
Core Skills
- Qualification
- 15+ years in data/analytics engineering with 7+ years architecting solutions on public cloud; 5+ years hands-on with GCP.
- Proven delivery of Medallion (Bronze/Silver/Gold) architectures on GCP with GCS + Dataproc + BigQuery at enterprise scale.
- Expert in PySpark and Dataproc (job orchestration, autoscaling, cluster policies, tuning, troubleshooting).
- Strong BigQuery expertise: storage/compute separation, partitioning, clustering, materialized views, BI Engine, slot management, RLS/CLS.
- Hands-on experience with Vertex AI (Pipelines, Feature Store, training/serving, registry, monitoring) and ML Ops best practices.
- Experience implementing Dataplex for centralized governance (catalog, policy tags), IAM for least privilege, and data security/compliance controls.
- Practical integration with Oracle and Teradata via JDBC; familiarity with CDC patterns and schema evolution.
- CI/CD for data platforms (Cloud Build/GitHub Actions), orchestration (Cloud Composer/Airflow), and Infrastructure as Code (Terraform).
- Deep understanding of data modeling, data quality, lineage, and observability for data systems.
- Excellent communication, stakeholder management, and leadership across technical and business teams.
- Google certifications: Professional Cloud Architect and/or Professional Data Engineer (highly preferred).
- Experience modernizing SAS workloads and translating SAS macros/PROCs to PySpark/SQL on GCP.
- Knowledge of streaming (Pub/Sub, Dataflow/Flink/Spark Structured Streaming) for near-real-time requirements.
- Experience with VPC Service Controls, Private Service Connect, Organization Policy, Workload Identity Federation.
- Familiarity with Delta/Iceberg/Hudi tables and open table formats on GCS; data sharing patterns (Analytics Hub).
- Bachelor’s/Master’s in Computer Science, Engineering, Information Systems, or equivalent experience.
Job Description
- Design and own the end-to-end analytics architecture on GCP, ensuring alignment with business, security, cost, and performance goals.
- Implement a Medallion architecture:
  - Bronze (Raw) ingestion on GCS via JDBC from Oracle/Teradata.
  - Silver (Curated) transformations using PySpark on Dataproc.
  - Gold optimized in BigQuery for analytics and BI.
- Define canonical data models, storage formats (Parquet/ORC/Delta/Iceberg), and partitioning/clustering strategies.
- Lead migration from SAS to PySpark, establish coding standards, and optimize Spark jobs.
- Build JDBC ingestion pipelines with CDC, robust retries, and schema evolution; orchestrate workflows via Cloud Composer/Airflow and CI/CD.
- Architect BigQuery models, manage cost/performance, enforce SLAs, and integrate securely with BI tools using RLS/CLS.
- Define ML Ops workflows on Vertex AI, including feature pipelines, automated training, deployment, and model monitoring for drift/bias.
- Implement centralized governance via Dataplex (catalog, policy tags), IAM least privilege, VPC-SC, and data security/compliance controls.
- Drive cost optimization, reliability/SRE practices, monitoring, DR/BCP, and FinOps governance.
- Provide architectural leadership, mentor teams, set standards, and create roadmaps, ADRs, and executive-level communication.
Skills
- Primary Skill: Data Engineering
- Sub Skill(s): Data Engineering
- Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery