Senior Data Engineer - ETL & ML Integration

Salary

₹15 - 30 LPA

Min Experience

4 years

Location

Mumbai

Job Type

Full-time

About the role

The Senior Data Engineer will be a key member of the Product Team, responsible for planning and building ETL data pipelines and for maintaining existing ones.

Responsibilities

  • Collaborate with cross-functional teams to identify data sources and requirements.
  • Design and implement data ingestion pipelines to collect data from various sources, including databases, APIs, and streaming platforms.
  • Develop and maintain ETL (Extract, Transform, Load) processes to prepare data for analysis and reporting.
  • Automate routine data tasks to improve efficiency and reliability.
  • Implement data security measures to protect sensitive data and ensure compliance with regulatory requirements.
  • Design data solutions that can scale to accommodate growing data volumes.
  • Participate in deploying ML models built by Data Scientists into production.

Skill Requirements

Must have:

  • Advanced proficiency in Python
  • Experience working with both SQL and NoSQL databases (Postgres, MongoDB, DynamoDB)
  • Experience working with AWS services such as S3, Lambda, Glue, and EMR
  • Experience handling file formats such as CSV, TSV, JSON, and Parquet
  • Hands-on experience creating and maintaining ETL data pipelines
  • At least 4 years of prior experience as a Data/ML Engineer at FinTech startups or data analytics companies
  • Understanding of basic ML models and statistical analytics

Good to have:

  • Proficiency in parallel computation techniques
  • Experience setting up distributed computing environments (Spark, Dask)
  • Good knowledge of data structures and algorithms
  • Experience designing database and data warehouse schemas
  • Experience deploying ML models into production
  • Knowledge of optimizing database setup and configuration

Skills

ETL
Data Pipelines
S3
AWS
Python
SQL