ALOIS Solutions
Website: aloissolutions.com
Job details:
Job Title: Data Architect – Hybrid Graph ML + Graph RAG Platform
Level: Principal / Lead / Staff (IC6)
Employment Type: Full-Time, Permanent
Location: Remote – India
Experience Required: 15+ Years
About the Role
We are seeking a highly experienced Data Architect to lead the design of a next-generation Hybrid Graph ML + Graph RAG data platform powering a large-scale Tradecraft Evaluation Platform.
This role is responsible for architecting the unified knowledge representation layer that combines graph neural network (GNN) outputs with vector-based retrieval systems to support advanced reasoning, evaluation, and intelligence-style analytical workflows.
You will define the foundational data architecture for a platform operating on a massive knowledge graph containing approximately 10 billion records across 700+ data sources. The ideal candidate has deep expertise in large-scale ML/AI data systems, graph architectures, vector databases, lineage-driven dataset management, and production-grade RAG infrastructure.
This is a high-impact architecture role working closely with senior engineering leadership, ML teams, and platform architects.
Key Responsibilities
Core Responsibilities
- Design scalable, high-performance data architectures for complex ML/AI and analytical workloads
- Define enterprise-grade data modelling standards, schema governance, lineage, and versioning strategies
- Evaluate and select appropriate storage technologies based on workload patterns and retrieval requirements
- Establish data quality, integrity, observability, and compliance frameworks
- Review and approve pipeline and platform designs created by engineering teams
- Partner with ML engineers, platform engineers, and architects to align data infrastructure with system objectives
Project-Specific Responsibilities
Hybrid Graph ML + Graph RAG Architecture
- Architect the unified context layer integrating:
  - GNN-generated entity embeddings and anomaly features
  - Vector retrieval outputs from Graph RAG systems
  - Structured graph intelligence and procedural tradecraft reasoning
- Define convergence patterns between:
  - PyTorch Geometric feature pipelines
  - Weaviate vector retrieval systems
  - PostgreSQL-based evaluation datasets
Golden Dataset & Lineage Architecture
- Design the golden dataset architecture with:
  - Dataset versioning
  - Immutable lineage tracking
  - Evaluation/training/holdout pool separation
  - Physically isolated schemas and storage boundaries
- Implement full traceability from source ingestion through extraction, transformation, validation, and evaluation
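As an illustration of the pool separation described above, the following is a minimal sketch, not the platform's actual method. It assumes each record has a stable string identifier and that pools are assigned by hashing that identifier with a fixed salt, so a record stays in the same pool across dataset versions. The names `assign_pool`, `POOLS`, `split_salt`, and the 80/10/10 weights are hypothetical.

```python
import hashlib

# Hypothetical pool names and proportions, chosen only for illustration.
POOLS = ("training", "evaluation", "holdout")


def assign_pool(record_id: str, split_salt: str = "golden-v1",
                weights=(0.8, 0.1, 0.1)) -> str:
    """Deterministically map a record to exactly one pool.

    The same record_id and split_salt always give the same pool, so a
    record cannot drift from holdout into training between versions.
    """
    digest = hashlib.sha256(f"{split_salt}:{record_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as a fraction in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for pool, weight in zip(POOLS, weights):
        cumulative += weight
        if bucket < cumulative:
            return pool
    # Guard against floating-point rounding in the cumulative sum.
    return POOLS[-1]
```

Because the assignment depends only on the identifier and the salt, it can be recomputed at any pipeline stage and recorded alongside lineage metadata without a central lookup table.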
Knowledge Graph Integration
- Design the migration and integration strategy for a 10B+ record knowledge graph
- Define scalable subgraph extraction and retrieval patterns
- Optimize access strategies for read-only API integration across heterogeneous data sources
- Support entity resolution and investigative graph traversal use cases
Vector Database & RAG Systems
- Architect the Weaviate vector database schema and indexing strategy
- Define:
  - Chunking strategies (semantic, structural, hybrid)
  - Embedding model standards
  - Metadata schemas
  - Retrieval and re-ranking pipelines
- Design retrieval benchmarking frameworks for precision/recall evaluation against expert-curated test queries
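For context, the precision/recall evaluation mentioned above is often computed per query at a cutoff k. The sketch below is illustrative only. It assumes each expert-curated test query comes with a set of relevant document IDs and that the retriever returns a ranked list of IDs. The function name `precision_recall_at_k` and the sample IDs are hypothetical.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one test query.

    retrieved: ranked list of document IDs returned by the retriever
    relevant:  set of document IDs an expert marked as relevant
    k:         cutoff rank
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Example: two of the top three results are relevant,
# and two of the three relevant documents were found.
p, r = precision_recall_at_k(["d1", "d7", "d3", "d9"], {"d1", "d3", "d5"}, k=3)
```

Averaging these values over the full set of test queries gives a benchmark score that can be tracked across changes to chunking, embedding, or re-ranking.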
Graph ML Pipeline Design
- Define Graph ML feature engineering pipelines for:
  - Node attributes
  - Edge relationships
  - Temporal and behavioral patterns
- Support lightweight GNN training workflows in PyTorch Geometric or DGL
Compliance & Security Architecture
- Design physically separated data environments for regulatory and compliance requirements
- Implement schema-level and infrastructure-level access control strategies
- Support immutable audit and lineage requirements
Required Skills & Experience
Mandatory Qualifications
- 15+ years of experience in Data Engineering and Data Architecture
- 5+ years designing ML/AI data platforms at enterprise scale
- Expert-level PostgreSQL skills, including:
  - Partitioning
  - Materialized views
  - Triggers
  - Row-level security
  - Query optimization
- Strong Python expertise for large-scale data pipeline development
- Experience with cloud-managed data services on AWS, Azure, or GCP
Graph & Vector Database Experience
- Hands-on production experience with vector databases such as:
  - Weaviate
  - Pinecone
  - Qdrant
  - Milvus
- Experience designing scalable RAG architectures and retrieval systems
- Strong understanding of:
  - Embedding pipelines
  - Chunking strategies
  - Metadata-driven retrieval
  - Re-ranking techniques
- Experience working with graph databases and large-scale knowledge graph architectures
ML & Data Platform Architecture
- Experience designing data systems integrating:
  - Relational databases
  - Graph databases
  - Vector stores
- Experience working with billion-scale datasets
- Experience implementing dataset lineage, versioning, and governance systems
- Familiarity with Graph ML workflows using:
  - PyTorch Geometric
  - DGL
Compliance & Data Governance
- Experience designing physically separated data architectures for regulated environments
- Experience implementing infrastructure-level access controls and auditability
Preferred Qualifications
- Experience with Weaviate in Kubernetes environments
- Experience with TimescaleDB or equivalent time-series systems
- Experience with immutable audit stores such as Amazon QLDB
- AWS Data Analytics Specialty or equivalent certification
- Experience with:
  - Trade data systems
  - Corporate registration datasets
  - Entity resolution frameworks
  - HELM benchmark datasets
- Experience in government, intelligence, compliance, or FedRAMP-aligned environments
Strongly Preferred Background
Candidates with experience in any of the following domains will be highly valued:
- Intelligence analysis
- Signals intelligence
- Law enforcement analytics
- Financial crime investigation
- Link analysis platforms
- Adversarial entity resolution systems
- Mission-critical knowledge graph platforms
Success Indicators
Successful candidates will demonstrate:
- Proven deployment of production-grade vector databases for RAG systems
- Experience achieving measurable improvements in retrieval quality
- Experience integrating graph intelligence with LLM/RAG augmentation workflows
- Expertise in designing lineage-driven evaluation datasets
- Strong architectural ownership across large-scale distributed data systems
What We’re Looking For
We are looking for a deeply technical architect who can operate with Staff/Principal-level ownership, make foundational architecture decisions, and design scalable AI-native data infrastructure that will power downstream evaluation and intelligence workflows.
You should be equally comfortable discussing each of the following while collaborating with senior engineering and platform leadership:
- Graph feature engineering
- RAG retrieval optimization
- Data governance
- Billion-scale graph systems
- ML data pipelines
- Compliance-oriented data separation architectures