Data Engineer

Recruiterflow

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Python
AWS
Apache
Apache Spark
automated testing
containerization
data ingestion
data modeling
data models
database
Docker
DynamoDB
EC2
ETL
GitLab
Hadoop
Jenkins
Kafka
Kubernetes
Lambda
NoSQL
prototypes
Ray
React
Spark
SQL

About the role

Website: recruiterflow.com
Job details:
We are looking for a Data Engineer with 3+ years of experience, including hands-on work with Retrieval-Augmented Generation (RAG) frameworks. In this role, you will help drive cutting-edge AI solutions that enhance our recruitment technology. This is a unique opportunity to work on state-of-the-art AI features and build impactful solutions that help us bridge recruitment gaps.

Responsibilities

Design and Development of Data Pipelines: Create and maintain robust, scalable ETL/ELT pipelines for data ingestion, transformation, and loading from diverse sources into data lakes and warehouses using AWS services like S3, Glue, EMR, and Kinesis.
Vector Database Management: Design, implement, and manage vector databases for storing and retrieving high-dimensional vector embeddings, critical for AI/ML applications and semantic search.
Infrastructure and Deployment with Docker and AWS: Use Docker to containerize data processing jobs and applications to ensure consistency across development and production environments. Deploy and orchestrate these containerized applications using AWS services like Amazon ECR and potentially Kubernetes on EC2.
Data Modeling and Architecture: Develop and maintain data models for both SQL (Redshift, RDS) and NoSQL (DynamoDB) databases, including schema design and performance optimization for vector databases.
Collaboration and Support: Work closely with data scientists to transition prototypes to production systems, understand data requirements, and provide necessary infrastructure support.
Quality Assurance and Optimization: Implement data quality checks, monitor system performance, and troubleshoot issues in data flow to ensure data integrity, accuracy, and optimal performance.

Requirements

Experience: 3 to 5 years.
Education: A bachelor's or master's degree in computer science, engineering, information technology, or a related quantitative field.
Cloud Platform Expertise: Extensive experience with AWS cloud services, including S3, Lambda, EMR, Redshift, Glue, Kinesis, Athena, and IAM.
Programming Languages: Proficiency in Python and SQL for data manipulation, scripting, and application development.
Containerization and DevOps: Hands-on experience with Docker and familiarity with CI/CD pipelines (Jenkins, GitLab) for automated testing and deployment.
Database Knowledge: Strong understanding of data warehousing and ETL processes, and experience with various database types, including specific knowledge of vector databases.
Big Data Technologies: Experience with frameworks like Apache Spark, Hadoop, and Kafka for large-scale data processing.

Preferred Qualifications

Experience with graph databases and building knowledge graphs.
Familiarity with semantic search techniques and dense retrieval methodologies.
Experience with prompt engineering and few-shot learning principles.
Knowledge of distributed computing frameworks (e.g., Ray, Dask).
Experience with agent orchestration frameworks (ReAct, Plan-and-Execute, Reflexion, multi-agent collaboration, etc.) and evaluation of agent performance.
Exposure to MLOps practices and tools.

This job was posted by Yakshii Kapoor from Recruiterflow. Click on Apply to know more.
