Tech Entrepreneurs Association of Mumbai (TEAM)
Website: mumbaitech.team
Job details:
Data Engineer - Hiring at Mumbai Tech Week
Mumbai | Full-time | 3-9 years of experience
COMPANIES HIRING FOR THIS ROLE
Rezolv | Fractal | Quantiphi
ABOUT THE ROLE
Mumbai Tech Week's job fair brings together 30+ leading tech and AI companies hiring across engineering roles. Companies at this event are looking for data engineers who can design and operate large-scale data pipelines, architect cloud-native data platforms, and deliver reliable, production-grade systems that power dashboards, ML models, and customer-facing products.
You'll work across batch and (increasingly) streaming systems, integrate with a wide variety of source systems, and ship pipelines that businesses, in some cases India's largest banks and Fortune 500 enterprises, rely on every day.
WHAT YOU'LL DO
- Design, build, and operate scalable batch and real-time data pipelines using Airflow and cloud-native orchestrators (Azure Data Factory, Cloud Composer, AWS Step Functions)
- Ingest data from diverse source systems — APIs, SFTP drops, file-based handoffs (CSV, fixed-width, JSON, Excel), direct DB pulls, and streaming events
- Architect and manage data lakes and warehouses on cloud platforms (ADLS, S3, BigQuery, Synapse, Redshift, Snowflake, ClickHouse)
- Develop ETL/ELT workflows using Python, PySpark, SQL, and Spark/Databricks for high-volume, high-velocity workloads
- Implement data quality frameworks, monitoring, alerting, SLAs, and governance so that pipelines stay reliable and idempotent
- Tune SQL queries, partitioning, indexing, and pipeline performance across OLTP and OLAP stores
- Partner with analytics, data science, and product teams to power BI dashboards, ML feature pipelines, and product features that depend on fresh data
- Contribute to architecture decisions, code reviews, documentation, and engineering best practices across the data platform
WHAT WE'RE LOOKING FOR
- 3-9 years of data engineering experience with production pipelines you can talk about in detail
- Strong Python (including PySpark) — clean, testable code; comfortable building, debugging, and refactoring DAGs
- Strong SQL — complex transformations, window functions, and query performance tuning
- Hands-on Airflow or equivalent orchestrators — DAG design, sensors, hooks, custom operators, retries, idempotency
- Solid grasp of ETL/ELT patterns — incremental loads, CDC, late-arriving data, slowly changing dimensions, reconciliation
- Hands-on experience with at least one major cloud platform (Azure, AWS, or GCP) and its data services — Azure Data Factory, Databricks, ADLS, Synapse; or Glue, EMR, Redshift, S3; or Dataflow, Dataproc, BigQuery, Pub/Sub
- Working knowledge of OLAP databases — ClickHouse, BigQuery, Synapse, Redshift, or Snowflake
- Strong understanding of data modeling (relational and dimensional), partitioning, and big data technologies
- Bias toward reliability: you treat pipelines as production software, not scripts
- Comfort with ambiguity and source system diversity; you build flexible, resilient ingestion without over-engineering
- Nice to have: streaming exposure (Kafka, Flink, Debezium), dbt or similar transformation tooling, BFSI/regulated-industry data experience, cloud certifications, and familiarity with PII handling and data security practices
SKILLS AND TOOLS
Python | PySpark | SQL | Airflow | Spark | Databricks | Azure Data Factory | ADLS | Synapse | BigQuery | Redshift | Snowflake | ClickHouse | PostgreSQL | Kafka | dbt | AWS | Azure | GCP
Click Apply to learn more.