Lead Data Engineer

Inferyx

Location: Pune City, Maharashtra, India
Job type: Full-time

Required skills

Python
AWS
Apache
Apache Spark
Artificial Intelligence
Azure
business analytics
business objectives
data modeling
data models
data solutions
data structures
data warehouse
Databricks
design patterns
end-to-end
ETL
Google Cloud
Hadoop
Java
machine learning
MySQL
PostgreSQL
Snowflake
Spark
SQL
version control

About the role

Website: inferyx.com
Job details:

About the Company:

We are a global analytics company with a mission to empower enterprises to build scalable and robust applications and solutions based on artificial intelligence and machine learning. We are a team of data engineers and data scientists helping businesses gain actionable intelligence and make data-driven decisions. The Inferyx platform is an end-to-end data and analytics platform that lets you disrupt and accelerate with data.

Job Title: Lead Data Engineer

Experience: 8+ Years

Location: Pune

Employment Type: Full-time

Job Description:

We are looking for a Lead Data Engineer to design and develop scalable data pipelines and integrations that support continual increases in data volume and complexity. You will work with analytics and business teams to understand their needs, build pipelines, and improve the data models that feed BI and visualization tools.

Key Responsibilities:

Data Pipeline Development

Design, develop, and maintain scalable batch and streaming data pipelines using Apache Spark (PySpark/Scala) and Databricks. Build end-to-end ETL/ELT workflows for ingesting, transforming, and validating data from diverse source systems while ensuring data accuracy, reliability, and performance.

Data Modeling & Analytics Enablement

Design and maintain efficient data models, schemas, and curated datasets that support business analytics, reporting, and visualization tools. Optimize data structures for performance, scalability, and cost across lakehouse and data warehouse platforms.

Data Integration

Integrate data from multiple internal and external sources, including relational databases, APIs, flat files, and streaming sources. Ensure seamless and reliable data movement across cloud platforms, data lakes, and analytics systems.

Performance Optimization

Identify and resolve performance bottlenecks in Spark jobs, Databricks workloads, and data storage layers. Tune Spark configurations, optimize queries, and improve pipeline efficiency to support large-scale data processing.

Data Quality & Governance

Implement data quality checks, validation rules, and governance standards to ensure trustworthy data. Monitor data quality metrics and proactively address data issues in collaboration with stakeholders.

Collaboration & Stakeholder Engagement

Work closely with data analysts, data scientists, and business teams to understand requirements and deliver data solutions aligned with business objectives. Partner with platform and cloud teams to ensure architectural consistency and best practices.

Documentation & Best Practices

Document data pipelines, data models, and technical designs. Follow best practices for software development, version control, CI/CD, and deployment in distributed data environments.

Continuous Improvement

Stay current with emerging data engineering technologies, Spark and Databricks enhancements, and cloud data platform innovations. Drive automation and process improvements to increase reliability, scalability, and developer productivity.

Required Skills and Qualifications:

● Bachelor's degree in Computer Science, Engineering, or related field.

● 8+ years of experience in data engineering or related roles.

● Proficiency in programming languages such as Python, Java, or Scala.

● Strong SQL skills and experience with relational databases (e.g., MySQL, PostgreSQL).

● Experience with data warehousing concepts and technologies (e.g., Snowflake, Redshift).

● Familiarity with big data processing frameworks (e.g., Apache Spark, Hadoop).

● Hands-on experience with ETL tools and data integration platforms.

● Knowledge of cloud platforms such as AWS, Azure, or Google Cloud Platform.

● Understanding of data modeling principles and data warehousing design patterns.

● Excellent problem-solving skills and attention to detail.

● Strong communication and collaboration skills, with the ability to work effectively in a team environment.
