Data Engineer (Scala)

Avance Consulting

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Airflow
AWS
Apache
Apache Airflow
Apache Flink
Apache Kafka
Apache Spark
Azure
BigQuery
Cassandra
containerization
data analytics
data engineer
data ingestion
data lake
data solutions
Databricks
Dataproc
Docker
DynamoDB
ETL
Flink
GCP
GCS
Git
HBase
Kafka
Kubernetes
machine learning
NoSQL
product features
Snowflake
Spark
SQL

About the role

Website: avanceservices.com

Job Summary:

We are seeking a highly skilled and experienced Senior Scala Data Engineer to join our dynamic data team. In this role, you will be instrumental in designing, developing, and maintaining our next-generation data pipelines and platforms using Scala, Apache Spark, and cloud-native technologies. You will work on challenging problems involving large-scale data ingestion, transformation, and processing, contributing directly to our analytical capabilities and product features.

Key Responsibilities:

• Design & Development: Architect, build, and optimize robust, scalable, and efficient data pipelines using Scala and Apache Spark (Spark Core, Spark SQL, Spark Streaming).

• Data Ingestion: Develop solutions for ingesting high-volume, high-velocity data from various sources (e.g., relational databases, NoSQL databases, APIs, message queues like Kafka, log files) into our data lake/warehouse.

• Data Transformation: Implement complex data transformations, aggregations, and feature engineering logic to prepare data for analytics, machine learning models, and operational systems.

• Performance Optimization: Identify and resolve performance bottlenecks in Spark jobs and data pipelines, ensuring optimal resource utilization and execution times.

• Data Quality & Governance: Implement data validation, monitoring, and alerting mechanisms to ensure data accuracy, completeness, and consistency. Contribute to data governance best practices.

• Cloud Infrastructure: Leverage and optimize cloud services (e.g., AWS EMR/Glue, Azure Databricks/Synapse, GCP Dataproc/BigQuery) for data processing and storage.

• Automation & Orchestration: Design and implement automated workflows for data pipelines using tools like Apache Airflow, AWS Step Functions, or similar.

Required Qualifications:

• Experience: 5+ years of professional experience in data engineering, with a strong focus on building large-scale data solutions.

• Scala Expertise: Proven, advanced proficiency in the Scala programming language.

• Apache Spark: Deep hands-on experience with Apache Spark (Core, SQL, Streaming) for batch and real-time data processing.

• Cloud Platforms: Extensive experience with at least one major cloud provider (AWS, Azure, or GCP) and its relevant data services (e.g., AWS S3, EMR, Glue, Kinesis; Azure Data Lake, Databricks, Event Hubs; GCP GCS, Dataproc, Pub/Sub).

• Data Warehousing: Strong understanding of data warehousing concepts, dimensional modeling (star/snowflake schemas), and ETL/ELT processes.

• SQL: Expert-level SQL skills for data querying, manipulation, and optimization.

• Distributed Systems: Experience working with distributed systems and an understanding of their challenges (consistency, fault tolerance, concurrency).

• Version Control: Proficiency with Git and collaborative development workflows.

Nice-to-Haves:

• Streaming Technologies: Experience with real-time streaming platforms like Apache Kafka, Apache Flink, or Kinesis.

• Containerization & Orchestration: Experience with Docker, Kubernetes, and container orchestration for Spark applications.

• Data Orchestration Tools: Hands-on experience with Apache Airflow, Dagster, Prefect, or similar workflow management tools.

• NoSQL Databases: Experience with NoSQL databases such as Cassandra, MongoDB, DynamoDB, or HBase.

• Data Lakehouse/Modern DW: Experience with technologies like Delta Lake, Apache Iceberg, Snowflake, Redshift, or BigQuery.

• MLOps: Familiarity with MLOps principles and supporting data pipelines for machine learning models.

• CI/CD: Experience setting up and maintaining CI/CD pipelines for data engineering projects.

• Performance Tuning: Advanced knowledge of Spark performance tuning techniques, including memory management, shuffle optimization, and data partitioning strategies.

• Certifications: Relevant cloud certifications (AWS Certified Data Analytics, Azure Data Engineer Associate, GCP Professional Data Engineer) or Spark certifications.
