Avance Consulting
Website: avanceservices.com
Job details:
Location: Bengaluru
Job Summary:
We are seeking a highly skilled and experienced Senior Scala Data Engineer to join our dynamic data team. In this role, you will be instrumental in designing, developing, and maintaining our next-generation data pipelines and platforms using Scala, Apache Spark, and cloud-native technologies. You will work on challenging problems involving large-scale data ingestion, transformation, and processing, contributing directly to our analytical capabilities and product features.
Key Responsibilities:
• Design & Development: Architect, build, and optimize robust, scalable, and efficient data pipelines using Scala and Apache Spark (Spark Core, Spark SQL, Spark Streaming).
• Data Ingestion: Develop solutions for ingesting high-volume, high-velocity data from various sources (e.g., relational databases, NoSQL databases, APIs, message queues like Kafka, log files) into our data lake/warehouse.
• Data Transformation: Implement complex data transformations, aggregations, and feature engineering logic to prepare data for analytics, machine learning models, and operational systems.
• Performance Optimization: Identify and resolve performance bottlenecks in Spark jobs and data pipelines, ensuring optimal resource utilization and execution times.
• Data Quality & Governance: Implement data validation, monitoring, and alerting mechanisms to ensure data accuracy, completeness, and consistency. Contribute to data governance best practices.
• Cloud Infrastructure: Leverage and optimize cloud services (e.g., AWS EMR/Glue, Azure Databricks/Synapse, GCP Dataproc/BigQuery) for data processing and storage.
• Automation & Orchestration: Design and implement automated workflows for data pipelines using tools like Apache Airflow, AWS Step Functions, or similar.
Required Qualifications:
• Experience: 5+ years of professional experience in data engineering, with a strong focus on building large-scale data solutions.
• Scala Expertise: Proven advanced proficiency in the Scala programming language.
• Apache Spark: Deep hands-on experience with Apache Spark (Core, SQL, Streaming) for batch and real-time data processing.
• Cloud Platforms: Extensive experience with at least one major cloud provider (AWS, Azure, or GCP) and its relevant data services (e.g., AWS S3, EMR, Glue, Kinesis; Azure Data Lake, Databricks, Event Hubs; GCP GCS, Dataproc, Pub/Sub).
• Data Warehousing: Strong understanding of data warehousing concepts, dimensional modeling (star/snowflake schemas), and ETL/ELT processes.
• SQL: Expert-level SQL skills for data querying, manipulation, and optimization.
• Distributed Systems: Experience working with distributed systems and understanding of their challenges (consistency, fault tolerance, concurrency).
• Version Control: Proficiency with Git and collaborative development workflows.
Nice-to-Haves:
• Streaming Technologies: Experience with real-time streaming platforms like Apache Kafka, Apache Flink, or Kinesis.
• Containerization & Orchestration: Experience with Docker and Kubernetes, including running Spark applications in containers.
• Data Orchestration Tools: Hands-on experience with Apache Airflow, Dagster, Prefect, or similar workflow management tools.
• NoSQL Databases: Experience with NoSQL databases such as Cassandra, MongoDB, DynamoDB, or HBase.
• Data Lakehouse/Modern DW: Experience with technologies like Delta Lake, Apache Iceberg, Snowflake, Redshift, or BigQuery.
• MLOps: Familiarity with MLOps principles and supporting data pipelines for machine learning models.
• CI/CD: Experience setting up and maintaining CI/CD pipelines for data engineering projects.
• Performance Tuning: Advanced knowledge of Spark performance tuning techniques, including memory management, shuffle optimization, and data partitioning strategies.
• Certifications: Relevant cloud (AWS Certified Data Analytics, Azure Data Engineer Associate, GCP Professional Data Engineer) or Spark certifications.