Data Engineer

OWOW

Location: India
Job type: Full-time

Required skills

LangChain
Python
Airflow
Redshift
Apache Airflow
Apache Spark
backend
Bash
BigQuery
caching
CCPA
CI
compliance
containerization
customer journey
data architecture
data modeling
database design
Docker
end-to-end
ETL
experimental design
GitHub
Google Cloud
Kafka
Kubernetes
marketing technology
NoSQL
Pandas
Redis
Shell Scripting
Snowflake
Spark
SQL
state management
TensorFlow
Terraform
version control
PyTorch

About the role

OWOW

Website: owowtalents.com
Job details:

What You'll Build

Core Responsibilities

Data Architecture & Infrastructure (40%)

● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery)

● Build scalable data pipelines for real-time conversation processing and personalization

● Architect ETL/ELT workflows for data migration from legacy systems

● Implement data partitioning, sharding, and optimization strategies for high-throughput systems

● Create data governance frameworks ensuring quality, security, and compliance

Vector & Graph Database Systems (25%)

● Design and optimize Milvus vector collections for semantic search (1024-dim embeddings)

● Build graph schemas in Neo4j for customer journey mapping and persona relationships

● Implement HNSW indexing strategies and similarity search optimization

● Create hybrid search systems combining vector, full-text, and graph queries

● Monitor and tune database performance (query latency, throughput, resource utilization)

ML Data Infrastructure (20%)

● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)

● Create feature stores for GNN training (customer interactions, engagement signals)

● Implement data versioning and lineage tracking for ML experiments

● Design A/B testing data infrastructure with CUPED variance reduction

● Build real-time feature computation pipelines for contextual bandits

Analytics & Monitoring (15%)

● Design BigQuery schemas for marketing analytics and performance tracking

● Create materialized views and aggregation pipelines for real-time dashboards

● Implement data quality monitoring and anomaly detection

● Build observability infrastructure (Prometheus metrics, Grafana dashboards)

● Develop cost optimization strategies for cloud data warehousing

Technical Stack You'll Work With

Databases & Storage

● MongoDB (conversation state, active sessions)

● Redis (caching, rate limiting, real-time data)

● Milvus (vector embeddings, semantic search)

● Neo4j (customer journey graphs, persona networks)

● BigQuery (analytics warehouse, historical data)

Data Processing & Orchestration

● Apache Airflow or Prefect (workflow orchestration)

● Pandas, Polars (data transformation)

● Apache Spark (optional, for large-scale processing)

● dbt (data transformation and modeling)

ML/AI Data Pipeline

● vLLM (LLM inference serving)

● MLflow (model registry, experiment tracking)

● Sentence Transformers (embedding generation)

● PyTorch, TensorFlow (ML model training)

Cloud & Infrastructure

● Google Cloud Platform (BigQuery, Cloud Storage, Compute)

● Docker & Kubernetes (containerization, orchestration)

● Terraform (infrastructure as code)

● GitHub Actions or GitLab CI (CI/CD pipelines)

Programming & Tools

● Python 3.10+ (primary language)

● SQL (complex queries, query optimization)

● Shell scripting (Bash/Zsh)

● Git (version control)

Requirements

Must-Have Skills

● 5+ years of data engineering experience with production systems

● Expert-level SQL and database design skills

● Strong Python programming (async/await, type hints, testing)

● Experience with at least three types of database technology (SQL, NoSQL, vector, graph)

● Proven track record building high-scale data pipelines (>1M records/day)

● Deep understanding of data modeling (dimensional, normalized, denormalized)

● Experience with cloud data warehouses (BigQuery, Redshift, or Snowflake)

● Strong knowledge of data quality, validation, and governance

● Excellent debugging and optimization skills

Highly Desirable

● Experience with vector databases (Milvus, Pinecone, Weaviate, Qdrant)

● Experience with graph databases (Neo4j, ArangoDB, Neptune)

● Knowledge of embedding models and semantic search

● Experience with ML data pipelines (feature stores, model training data)

● Understanding of A/B testing and experimental design

● Experience with real-time streaming (Kafka, Pub/Sub, Kinesis)

● Knowledge of LLMs and conversational AI systems

● Experience with data migration projects (especially large-scale)

● Background in marketing technology or customer data platforms

Nice-to-Have

● Experience with PyTorch Geometric or graph neural networks

● Knowledge of marketing analytics (attribution, segmentation, personalization)

● Familiarity with LangChain, LangGraph, or agent frameworks

● Experience with cost optimization in cloud environments

● Contributions to open-source data engineering projects

● Experience with data compliance (GDPR, CCPA)

Key Projects You'll Own

Phase 1: Foundation

● Migrate 10M+ conversation vectors from Pinecone to Milvus

● Design and implement MongoDB schemas for real-time agent state

● Set up Neo4j graph database with customer journey models

● Create BigQuery data warehouse with partitioned tables

Phase 2: Optimization

● Build automated data quality monitoring system

● Implement Redis caching strategies targeting a 10x latency reduction

● Optimize vector search queries (target: <50ms p95 latency)

● Create real-time analytics dashboards (Grafana)

Phase 3: ML Infrastructure

● Build LLM fine-tuning data pipeline

● Implement feature store for GNN training

● Create A/B testing data infrastructure

● Design multi-armed bandit state management

Work Environment

● Collaborative team: Work with ML engineers, backend developers, and data scientists

● Modern stack: Latest technologies and tools

● Impact: Your work directly affects millions of marketing interactions

● Autonomy: Own your projects end-to-end

● Growth: Clear path to Senior/Lead/Principal roles

Click Apply to learn more.
