Senior Developer, Data Engineer

Location: Hyderabad, Telangana, India
Job Type: Full-time

About the job

About the role

ICE
Website: ice.com

Job Description

Job Purpose

Intercontinental Exchange, Inc. (ICE) is seeking an experienced Senior Data Engineer to join the AI Centre of Excellence (AI CoE) team. The AI CoE drives the adoption of artificial intelligence and advanced analytics across ICE's global exchange and financial data business, bringing together data engineering, machine learning, and applied research to deliver intelligent solutions at scale. This role sits at the foundation of that work, ensuring the data infrastructure, pipelines, and platforms the AI CoE depends on are robust, scalable, and fit for purpose.

The successful candidate will take ownership of the orchestration layer that keeps our data moving reliably, contribute to the lakehouse architecture that provides clean and trusted data to analytical and ML consumers, and develop the transformation logic that converts raw inputs into high-quality, governed data products. The role requires close collaboration with data scientists, ML engineers, and business analysts, and carries meaningful input into the tooling decisions and engineering standards the team adopts.

We are looking for a candidate with demonstrable, production-scale experience in data engineering and pipeline orchestration. Technical depth is essential, as is the ability to produce well-structured, maintainable code and to operate effectively within a collaborative engineering team.

Responsibilities

  • Own Apache Airflow end-to-end. You will write DAGs that handle complex multi-step workflows across ingestion, transformation, validation, and delivery. That means getting the fundamentals right: idempotency, backfill safety, SLA alerting, dynamic task mapping, and thorough testing. You will also review other engineers' DAGs and raise the standard across the team. A brief illustrative DAG sketch follows this list.
  • Keep Airflow running well on Kubernetes. You will manage the deployment using KubernetesExecutor or CeleryKubernetesExecutor, handle Helm upgrades, tune pod templates and resource limits, and deal with scheduler performance issues before they become incidents.
  • Build ETL and ELT pipelines that hold up in production. Sources will vary: APIs, message queues, databases, object storage. The expectation is that pipelines are fault-tolerant, incremental where it makes sense, and straightforward to debug when something goes wrong.
  • Treat data quality as part of the job, not an afterthought. Schema validation, row-count reconciliation, and anomaly detection should be baked into pipelines from the start. Freshness and accuracy are part of your definition of done.
  • Design and maintain a lakehouse architecture using Apache Iceberg as the primary table format, following a Bronze, Silver, and Gold medallion structure. Schema evolution, partitioning strategies, and time-travel queries will be regular concerns, not edge cases.
  • Use Databricks to manage cluster compute, Delta Lake workflows, Unity Catalog, and scheduled jobs. You will keep things cost-efficient, well-organised, and easy for others to navigate.
  • Build and maintain dbt projects that sit alongside the Airflow orchestration layer. Models, tests, snapshots, and macros should be well-structured and documented. Test failures and freshness checks should surface where people can act on them.
  • Contribute reusable components to the shared platform: pipeline templates, custom Airflow operators, and utility libraries that make the whole team faster and more consistent.
  • Partner with data scientists and ML engineers to deliver feature pipelines and training datasets with the freshness and reproducibility their models need.
  • Instrument your work. Pipelines should have metrics, dashboards, and alerts that surface real problems early. You will own runbooks and participate in on-call cover.
  • Play an active part in architecture discussions, tooling evaluations, and mentoring. Your experience should benefit the team, not just your own workstream.
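
To illustrate the first bullet above, here is a minimal, hypothetical sketch of an idempotent, backfill-safe daily DAG that uses dynamic task mapping via the TaskFlow API. It assumes Airflow 2.4+ syntax; the DAG name, bucket paths, and venue list are placeholders for illustration, not ICE's actual pipelines.

    # Hypothetical sketch only: idempotent, backfill-safe daily DAG with dynamic
    # task mapping (Airflow 2.4+ TaskFlow API). All names are placeholders.
    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(
        schedule="@daily",
        start_date=datetime(2024, 1, 1),
        catchup=True,                 # backfills are expected to be safe to re-run
        max_active_runs=1,
        default_args={"retries": 2},
        tags=["example"],
    )
    def market_data_ingest():
        @task
        def list_partitions(data_interval_start=None) -> list[str]:
            # Derive the work list from the logical date so re-running the same
            # interval always yields the same partitions (idempotency).
            day = data_interval_start.strftime("%Y-%m-%d")
            return [f"s3://raw-bucket/trades/{day}/{venue}" for venue in ("nyse", "lse", "sgx")]

        @task
        def load_partition(path: str) -> str:
            # Placeholder for an idempotent load (e.g. MERGE or partition overwrite)
            # into the Bronze layer.
            print(f"loading {path}")
            return path

        # Dynamic task mapping: one mapped task instance per partition found at runtime.
        load_partition.expand(path=list_partitions())


    market_data_ingest()

Catchup with max_active_runs=1 keeps backfills ordered and repeatable; the same structure extends naturally to validation tasks and SLA callbacks.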

Knowledge And Experience

  • Apache Airflow: Expert level, with at least three years running Airflow in production. You should know the scheduler internals, executor types, XComs, Connections, Variables, pools, and the TaskFlow API well enough to make trade-off decisions and explain them clearly.
  • Airflow on Kubernetes: Real experience deploying and operating Airflow on Kubernetes via KubernetesExecutor or CeleryKubernetesExecutor: pod templates, resource tuning, persistent volume claims, Helm chart management, and upgrades that do not take the scheduler offline.
  • DAG Development: Strong Python applied to DAG authoring: dynamic task mapping, custom operators and sensors, cross-DAG triggers, parameterised pipelines, and proper test coverage. Writing testable DAGs should feel natural, not optional.
  • ETL and ELT Pipelines: Demonstrated experience building batch and incremental pipelines at scale. You understand transformation logic, data quality validation, and lineage capture, and you have shipped pipelines that other people maintain without you.
  • Data Lakehouse Architecture: A solid grasp of lakehouse design in practice: how to implement a medallion model, when to use which table format, and how to make data reliably available to downstream consumers without constant hand-holding.
  • Apache Iceberg: Hands-on knowledge of the Iceberg table spec: snapshot isolation, partition evolution, hidden partitioning, and time-travel queries. You have managed Iceberg tables in a real catalog environment, whether Hive metastore, a REST catalog, or otherwise. A short time-travel and partition-evolution sketch follows this list.
  • Databricks: Practical Databricks experience covering cluster management, Delta Lake, Workflows or Jobs, and Unity Catalog. Experience with Databricks Asset Bundles or MLflow is a welcome bonus.
  • dbt: Confident building dbt projects from the ground up: models, sources, tests, snapshots, seeds, and macros. You know how to wire dbt into Airflow, write documentation that people actually use, and make freshness checks meaningful.
  • Python: Strong engineering-grade Python: well-structured, tested, and packaged. Comfortable with pandas, PySpark, SQLAlchemy, and Pydantic. You write code other engineers want to read.
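
To illustrate the Iceberg item above, here is a minimal Spark SQL sketch of time travel and partition evolution. It assumes Spark 3.3+ with the Iceberg runtime and SQL extensions configured, and a catalog named "lake" with a silver.trades table; all names and the snapshot id are hypothetical.

    # Hypothetical sketch: Iceberg time travel and partition evolution via Spark SQL.
    # Assumes the Iceberg Spark runtime and SQL extensions are configured and that a
    # catalog named "lake" exists; table, column, and snapshot ids are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("iceberg-demo").getOrCreate()

    # Read the table as of an earlier snapshot id (time travel).
    spark.sql("SELECT * FROM lake.silver.trades VERSION AS OF 4873519261097364987").show()

    # Or as of a point in time.
    spark.sql("SELECT * FROM lake.silver.trades TIMESTAMP AS OF '2024-06-01 00:00:00'").show()

    # Partition evolution: change the spec going forward without rewriting old data.
    spark.sql("ALTER TABLE lake.silver.trades ADD PARTITION FIELD days(trade_ts)")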

Preferred Knowledge And Experience

  • Apache Kafka and Streaming: Experience with near-real-time ingestion using Kafka, Kafka Connect, or Flink, including landing data into lakehouse storage. A brief sketch of this pattern follows this list.
  • Apache Spark: PySpark proficiency for large-scale distributed processing, with a feel for memory management, shuffle optimisation, and partition sizing.
  • Kubernetes and OpenShift: Platform knowledge beyond Airflow: Helm, RBAC, and namespace management, enough to self-serve on infrastructure without leaning on platform teams.
  • Data Governance and Contracts: Familiarity with data contract approaches, catalogue tools such as DataHub or Collibra, and quality frameworks like Great Expectations or Soda.
  • Cloud Platforms: Experience with managed data services on AWS, Azure, or GCP: S3, Glue, ADLS, Synapse, GCS, Dataproc, and similar.
  • Infrastructure as Code: Ability to provision data infrastructure using Terraform or Pulumi.
  • ML Feature Engineering: An understanding of feature stores and experience building pipelines that serve model training and inference reliably.
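
For the streaming item in the first bullet above, here is a minimal PySpark Structured Streaming sketch of landing Kafka data in a Bronze lakehouse table. The broker, topic, checkpoint path, and table name are assumptions for illustration, and the spark-sql-kafka and Iceberg (or Delta) connectors are presumed to be available.

    # Hypothetical sketch: Kafka -> Bronze lakehouse table with Structured Streaming.
    # Broker, topic, checkpoint path, and table names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-bronze").getOrCreate()

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "trades")
        .option("startingOffsets", "earliest")
        .load()
    )

    # Kafka delivers key/value as binary; decode them and keep source metadata so
    # the Bronze layer remains a faithful, replayable record of what arrived.
    bronze = raw.select(
        col("key").cast("string").alias("key"),
        col("value").cast("string").alias("payload"),
        col("topic"),
        col("partition"),
        col("offset"),
        col("timestamp").alias("ingested_at"),
    )

    query = (
        bronze.writeStream.format("iceberg")   # or "delta" on Databricks
        .outputMode("append")
        .option("checkpointLocation", "s3://checkpoints/trades-bronze/")
        .trigger(processingTime="1 minute")
        .toTable("lake.bronze.trades")
    )
    query.awaitTermination()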

Skills

Python, Apache Airflow, Apache Kafka, Flink, Kubernetes, Helm, Databricks, Unity Catalog, Hive, ETL, AWS, Azure, GCS, Dataproc, Terraform, pandas, SQLAlchemy, data engineering, machine learning, artificial intelligence, advanced analytics