Senior Data Engineer (ETL & AI Architecture)

Location

Mumbai, Maharashtra, India

Job Type

Full-time

About the job

About the role

NexGen Tech Solutions

Website: nexgentechsolutions.com
Job details:

Job Description: Senior Data Engineer (ETL & AI Architecture)

Experience: 6–8 Years

Location: Mumbai (Full-time from office)

Employment Type: Full Time

Reporting To: Lead – Data Analytics & AI


Role Purpose

We are seeking a highly skilled Data Engineer who goes beyond pipeline execution to architect and deliver robust, end-to-end data solutions. The role involves designing and implementing efficient Silver and Gold data layers, optimizing compute costs through deep parameter tuning, enforcing data quality and governance, and building a semantic layer that enables meaningful, consistent querying of enterprise data.

We value strong foundational data and engineering principles over tool-specific expertise. Candidates from Azure, AWS, or Google Cloud backgrounds are welcome, provided they possess a deep understanding of distributed computing and can optimize systems for performance, cost, reliability, and accuracy.

Key Responsibilities

1. Architecture & Data Modelling

  • Design & Strategy: Collaborate with stakeholders to design, document, and implement data structures across Bronze, Silver, and Gold layers to ensure scalability and faster insights.
  • Data Modelling: Develop extensible data models that decouple storage from compute for flexibility.
  • AI Readiness: Build semantic layers (metadata, relationships, context, feature stores) to support Large Language Models (LLMs) and AI use cases.
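For illustration only, the Bronze → Silver → Gold flow described above can be sketched in plain Python; the field names and cleaning rules here are hypothetical, not part of the role:

```python
# Hypothetical medallion-architecture sketch: raw events (Bronze) are
# cleaned into typed records (Silver), then aggregated into business
# metrics (Gold). Real implementations would use Spark/Delta tables.
from collections import defaultdict

bronze = [  # raw, untyped landing data
    {"user": "a1", "amount": "10.5", "ts": "2024-01-01"},
    {"user": "a1", "amount": "bad", "ts": "2024-01-02"},   # malformed row
    {"user": "b2", "amount": "7.0", "ts": "2024-01-01"},
]

def to_silver(rows):
    """Silver layer: enforce types, drop rows that fail validation."""
    out = []
    for r in rows:
        try:
            out.append({"user": r["user"], "amount": float(r["amount"]), "ts": r["ts"]})
        except (KeyError, ValueError):
            continue  # in practice, route bad rows to a quarantine table
    return out

def to_gold(rows):
    """Gold layer: business-level aggregate (total spend per user)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["user"]] += r["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'a1': 10.5, 'b2': 7.0}
```

Decoupling these layers is what lets storage and compute scale independently: each layer is a materialized table, not a step inside one job.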

2. Engineering, Performance Tuning & FinOps

  • Data Engineering: Implement ETL/ELT pipelines aligned with defined architecture.
  • Build scalable Silver aggregations and Gold metrics layers.
  • Enforce security (RBAC/ABAC), row/column-level controls, and PII handling.
  • Maintain data dictionaries, metadata, and lineage as part of delivery standards.
  • Implement proactive data quality checks.
  • Compute Optimization & Scalability: Optimize compute resources (memory, cores, partitions, executors) based on:
      • Data volume (GB to TB scale)
      • Transformation complexity
      • Data movement and network I/O
      • SLA requirements (batch vs. real-time)
  • Optimize read volumes and cost efficiency.
  • Design scalable architectures with minimal manual intervention.
  • BAU Management: Handle enhancements, bug fixes, and pipeline optimizations.
  • Port pipelines and data when technology stacks evolve.
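As a rough sketch of the partition and executor sizing this section alludes to (the 128 MB partition target, 4 cores per executor, and 2-wave scheduling below are common rules of thumb, not figures from this posting):

```python
import math

def plan_partitions(data_gb: float, target_partition_mb: int = 128) -> int:
    """Rule of thumb: size partitions near the object-store block size so
    tasks are neither tiny (scheduler overhead) nor huge (spill, stragglers)."""
    return max(1, math.ceil(data_gb * 1024 / target_partition_mb))

def plan_executors(partitions: int, cores_per_executor: int = 4,
                   waves: int = 2) -> int:
    """Aim for tasks to fill the cluster in a small number of 'waves'
    rather than provisioning one core per task."""
    return max(1, math.ceil(partitions / (cores_per_executor * waves)))

parts = plan_partitions(100)         # 100 GB input -> 800 partitions
print(parts, plan_executors(parts))  # 800 100
```

In Spark these heuristics map onto settings such as `spark.sql.shuffle.partitions` and executor core/memory configuration; actual values depend on transformation complexity, shuffle volume, and SLA.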

3. Operational Excellence

  • Data Quality: Implement automated frameworks (e.g., Great Expectations, dbt tests) to ensure data integrity.
  • Orchestration: Manage workflows and dependencies using tools like Airflow, Dagster, or ADF, including SLAs, retries, and alerting.
  • DevOps & CI/CD: Apply best practices including version control (Git), automated testing, and deployment pipelines.
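A minimal, framework-free sketch of the automated quality checks mentioned above; Great Expectations and dbt tests provide richer, declarative versions of the same idea, and the column names here are hypothetical:

```python
def check_not_null(rows, column):
    """Return indices of rows where the column is missing or None."""
    return [i for i, r in enumerate(rows) if r.get(column) is None]

def check_in_range(rows, column, lo, hi):
    """Return indices of rows whose value falls outside [lo, hi].
    Null values are left to the not-null check."""
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and not (lo <= r[column] <= hi)]

orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},   # fails the range check
    {"order_id": 3, "amount": None},   # fails the null check
]

null_failures = check_not_null(orders, "amount")
range_failures = check_in_range(orders, "amount", 0, 10_000)
print(null_failures, range_failures)  # [2] [1]
```

Wiring such checks into the orchestrator (Airflow/Dagster/ADF task that fails or alerts on violations) is what makes them "proactive" rather than after-the-fact audits.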

Skillset & Requirements

  • 5–8 years of experience in Data Engineering / Analytics Engineering, with at least 2 years in architecture and solution design.
  • Strong problem-solving ability with a practical, execution-focused mindset.
  • Experience preparing data for AI/LLM use cases (Vector DBs, Knowledge Graphs, Semantic Layers).
  • Expertise in data modelling (star and snowflake schemas) and modern open table formats for data lakes (Delta Lake, Iceberg, Hudi).
  • Strong understanding of distributed computing (Spark, Hive, BigQuery), including DAGs, partitioning, and shuffling.
  • Proven experience in performance tuning and troubleshooting large-scale systems.
  • Programming proficiency in SQL, Python, Spark (Scala is a plus).

Preferred / Good to Have

  • Experience with Generative AI architectures (RAG, Vector Databases).
  • Exposure to semantic/metric layer tools (LookML, Transform, MetricFlow).
  • Ability to prototype dashboards or analytics UI using modern AI tools.

Behavioral Attributes

  • High ethical standards
  • Strong ownership and accountability
  • Problem-solving mindset
  • First-principles thinking approach

Skills

Python
Airflow
AWS
automated testing
Azure
BigQuery
data analytics
data engineer
data lake
data models
data solutions
data structures
DevOps
ETL
Git
Google Cloud
Hive
Snowflake
Spark
SQL
version control