OWOW
Website: owowtalents.com
Job details:
What You'll Build
Core Responsibilities
Data Architecture & Infrastructure (40%)
● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery)
● Build scalable data pipelines for real-time conversation processing and personalization
● Architect ETL/ELT workflows for data migration from legacy systems
● Implement data partitioning, sharding, and optimization strategies for high-throughput systems
● Create data governance frameworks ensuring quality, security, and compliance
Vector & Graph Database Systems (25%)
● Design and optimize Milvus vector collections for semantic search (1024-dim embeddings)
● Build graph schemas in Neo4j for customer journey mapping and persona relationships
● Implement HNSW indexing strategies and similarity search optimization
● Create hybrid search systems combining vector, full-text, and graph queries
● Monitor and tune database performance (query latency, throughput, resource utilization)
ML Data Infrastructure (20%)
● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)
● Create feature stores for GNN training (customer interactions, engagement signals)
● Implement data versioning and lineage tracking for ML experiments
● Design A/B testing data infrastructure with CUPED variance reduction
● Build real-time feature computation pipelines for contextual bandits
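For candidates unfamiliar with the term, CUPED adjusts each user's experiment metric using a pre-experiment covariate so that treatment effects can be detected with less noise. The following is a minimal sketch of the standard adjustment, not OWOW's implementation; the function name `cuped_adjust` and its arguments are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    metric    -- in-experiment values per user (y); hypothetical input
    covariate -- pre-experiment values for the same users (x); hypothetical input
    theta is cov(x, y) / var(x), the slope that minimizes adjusted variance.
    """
    if len(metric) != len(covariate):
        raise ValueError("metric and covariate must have the same length")
    if not metric:
        raise ValueError("inputs must be non-empty")

    x_mean = fmean(covariate)
    y_mean = fmean(metric)

    # Population variance of x and covariance of x with y.
    var_x = fmean([(x - x_mean) ** 2 for x in covariate])
    cov_xy = fmean(
        [(x - x_mean) * (y - y_mean) for x, y in zip(covariate, metric)]
    )

    # A constant covariate carries no information; return the metric unchanged.
    if var_x == 0:
        return list(metric)

    theta = cov_xy / var_x
    return [y - theta * (x - x_mean) for x, y in zip(covariate, metric)]
```

The adjustment leaves the mean of the metric unchanged and lowers its variance whenever the covariate is correlated with the metric, which is why it suits A/B testing data infrastructure.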
Analytics & Monitoring (15%)
● Design BigQuery schemas for marketing analytics and performance tracking
● Create materialized views and aggregation pipelines for real-time dashboards
● Implement data quality monitoring and anomaly detection
● Build observability infrastructure (Prometheus metrics, Grafana dashboards)
● Develop cost optimization strategies for cloud data warehousing
Technical Stack You'll Work With
Databases & Storage
● MongoDB (conversation state, active sessions)
● Redis (caching, rate limiting, real-time data)
● Milvus (vector embeddings, semantic search)
● Neo4j (customer journey graphs, persona networks)
● BigQuery (analytics warehouse, historical data)
Data Processing & Orchestration
● Apache Airflow or Prefect (workflow orchestration)
● Pandas, Polars (data transformation)
● Apache Spark (optional - for large-scale processing)
● dbt (data transformation and modeling)
ML/AI Data Pipeline
● vLLM (LLM inference serving)
● MLflow (model registry, experiment tracking)
● Sentence Transformers (embedding generation)
● PyTorch, TensorFlow (ML model training)
Cloud & Infrastructure
● Google Cloud Platform (BigQuery, Cloud Storage, Compute)
● Docker & Kubernetes (containerization, orchestration)
● Terraform (infrastructure as code)
● GitHub Actions or GitLab CI (CI/CD pipelines)
Programming & Tools
● Python 3.10+ (primary language)
● SQL (complex queries, query optimization)
● Shell scripting (Bash/Zsh)
● Git (version control)
Requirements
Must-Have Skills
● 5+ years of data engineering experience with production systems
● Expert-level SQL and database design skills
● Strong Python programming (async/await, type hints, testing)
● Experience with at least 3 different database technologies (SQL, NoSQL, Vector, Graph)
● Proven track record building high-scale data pipelines (>1M records/day)
● Deep understanding of data modeling (dimensional, normalized, denormalized)
● Experience with cloud data warehouses (BigQuery, Redshift, or Snowflake)
● Strong knowledge of data quality, validation, and governance
● Excellent debugging and optimization skills
Highly Desirable
● Experience with vector databases (Milvus, Pinecone, Weaviate, Qdrant)
● Experience with graph databases (Neo4j, ArangoDB, Neptune)
● Knowledge of embedding models and semantic search
● Experience with ML data pipelines (feature stores, model training data)
● Understanding of A/B testing and experimental design
● Experience with real-time streaming (Kafka, Pub/Sub, Kinesis)
● Knowledge of LLMs and conversational AI systems
● Experience with data migration projects (especially large-scale)
● Background in marketing technology or customer data platforms
Nice-to-Have
● Experience with PyTorch Geometric or graph neural networks
● Knowledge of marketing analytics (attribution, segmentation, personalization)
● Familiarity with LangChain, LangGraph, or agent frameworks
● Experience with cost optimization in cloud environments
● Contributions to open-source data engineering projects
● Experience with data compliance (GDPR, CCPA)
Key Projects You'll Own
Phase 1: Foundation
● Migrate 10M+ conversation vectors from Pinecone to Milvus
● Design and implement MongoDB schemas for real-time agent state
● Set up Neo4j graph database with customer journey models
● Create BigQuery data warehouse with partitioned tables
Phase 2: Optimization
● Build automated data quality monitoring system
● Implement caching strategies (Redis) for 10x latency reduction
● Optimize vector search queries (target: <50ms p95 latency)
● Create real-time analytics dashboards (Grafana)
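As an illustration of how the vector search target above could be checked, here is a small sketch that computes a p95 latency with the nearest-rank method and compares it to the 50 ms goal. The helper names `percentile` and `meets_latency_target` are hypothetical, and the nearest-rank definition is an assumption; monitoring tools such as Prometheus may estimate percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing so whole-number ranks stay exact in floating point.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_target(latencies_ms, target_ms=50.0):
    """True when the p95 of the observed latencies is below the target (50 ms assumed here)."""
    return percentile(latencies_ms, 95) < target_ms


# Example: 100 hypothetical query latencies of 1..100 ms.
samples = list(range(1, 101))
print(percentile(samples, 95))        # 95
print(meets_latency_target(samples))  # False, since 95 ms is not below 50 ms
```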
Phase 3: ML Infrastructure
● Build LLM fine-tuning data pipeline
● Implement feature store for GNN training
● Create A/B testing data infrastructure
● Design multi-armed bandit state management
Work Environment
● Collaborative team: Work with ML engineers, backend developers, and data scientists
● Modern stack: Latest technologies and tools
● Impact: Your work directly affects millions of marketing interactions
● Autonomy: Own your projects end-to-end
● Growth: Clear path to Senior/Lead/Principal roles