Data Architect

ALOIS Solutions

Location: India
Job type: Full-time

Required skills

Python
AWS
API
Azure
compliance
data analytics
data architecture
data pipeline
GCP
Helm
Kubernetes
Node
PyTorch

About the role

Website: aloissolutions.com
Job details:

Job Title: Data Architect – Hybrid Graph ML + Graph RAG Platform

Level: Principal / Lead / Staff (IC6)

Employment Type: Full-Time, Permanent

Location: Remote – India

Experience Required: 15+ Years

About the Role

We are seeking a highly experienced Data Architect to lead the design of a next-generation Hybrid Graph ML + Graph RAG data platform powering a large-scale Tradecraft Evaluation Platform.

This role is responsible for architecting the unified knowledge representation layer that combines graph neural network (GNN) outputs with vector-based retrieval systems to support advanced reasoning, evaluation, and intelligence-style analytical workflows.

You will define the foundational data architecture for a platform operating on a massive knowledge graph containing approximately 10 billion records across 700+ data sources. The ideal candidate has deep expertise in large-scale ML/AI data systems, graph architectures, vector databases, lineage-driven dataset management, and production-grade RAG infrastructure.

This is a high-impact architecture role working closely with senior engineering leadership, ML teams, and platform architects.

Key Responsibilities

Core Responsibilities

Design scalable, high-performance data architectures for complex ML/AI and analytical workloads
Define enterprise-grade data modelling standards, schema governance, lineage, and versioning strategies
Evaluate and select appropriate storage technologies based on workload patterns and retrieval requirements
Establish data quality, integrity, observability, and compliance frameworks
Review and approve pipeline and platform designs created by engineering teams
Partner with ML engineers, platform engineers, and architects to align data infrastructure with system objectives

Project-Specific Responsibilities

Hybrid Graph ML + Graph RAG Architecture

Architect the unified context layer integrating:
GNN-generated entity embeddings and anomaly features
Vector retrieval outputs from Graph RAG systems
Structured graph intelligence and procedural tradecraft reasoning
Define convergence patterns between:
PyTorch Geometric feature pipelines
Weaviate vector retrieval systems
PostgreSQL-based evaluation datasets

Golden Dataset & Lineage Architecture

Design the golden dataset architecture with:
Dataset versioning
Immutable lineage tracking
Evaluation/training/holdout pool separation
Physically isolated schemas and storage boundaries
Implement full traceability from source ingestion through extraction, transformation, validation, and evaluation

Knowledge Graph Integration

Design the migration and integration strategy for a 10B+ record knowledge graph
Define scalable subgraph extraction and retrieval patterns
Optimize access strategies for read-only API integration across heterogeneous data sources
Support entity resolution and investigative graph traversal use cases

Vector Database & RAG Systems

Architect the Weaviate vector database schema and indexing strategy
Define:
Chunking strategies (semantic, structural, hybrid)
Embedding model standards
Metadata schemas
Retrieval and re-ranking pipelines
Design retrieval benchmarking frameworks for precision/recall evaluation against expert-curated test queries

Graph ML Pipeline Design

Define Graph ML feature engineering pipelines for:
Node attributes
Edge relationships
Temporal and behavioral patterns
Support lightweight GNN training workflows in PyTorch Geometric or DGL

Compliance & Security Architecture

Design physically separated data environments for regulatory and compliance requirements
Implement schema-level and infrastructure-level access control strategies
Support immutable audit and lineage requirements

Required Skills & Experience

Mandatory Qualifications

15+ years of experience in Data Engineering and Data Architecture
5+ years designing ML/AI data platforms at enterprise scale
Expert-level PostgreSQL skills, including:
Partitioning
Materialized views
Triggers
Row-level security
Query optimization
Strong Python expertise for large-scale data pipeline development
Experience with cloud-managed data services on AWS, Azure, or GCP

Graph & Vector Database Experience

Hands-on production experience with vector databases such as:
Weaviate
Pinecone
Qdrant
Milvus
Experience designing scalable RAG architectures and retrieval systems
Strong understanding of:
Embedding pipelines
Chunking strategies
Metadata-driven retrieval
Re-ranking techniques
Experience working with graph databases and large-scale knowledge graph architectures

ML & Data Platform Architecture

Experience designing data systems integrating:
Relational databases
Graph databases
Vector stores
Experience working with billion-scale datasets
Experience implementing dataset lineage, versioning, and governance systems
Familiarity with Graph ML workflows using:
PyTorch Geometric
DGL

Compliance & Data Governance

Experience designing physically separated data architectures for regulated environments
Experience implementing infrastructure-level access controls and auditability

Preferred Qualifications

Experience with Weaviate in Kubernetes environments
Experience with TimescaleDB or equivalent time-series systems
Experience with immutable audit stores such as Amazon QLDB
AWS Data Analytics Specialty or equivalent certification
Experience with:
Trade data systems
Corporate registration datasets
Entity resolution frameworks
HELM benchmark datasets
Experience in government, intelligence, compliance, or FedRAMP-aligned environments

Strongly Preferred Background

Candidates with experience in any of the following domains will be highly valued:

Intelligence analysis
Signals intelligence
Law enforcement analytics
Financial crime investigation
Link analysis platforms
Adversarial entity resolution systems
Mission-critical knowledge graph platforms

Success Indicators

Successful candidates will demonstrate:

Proven deployment of production-grade vector databases for RAG systems
Experience achieving measurable improvements in retrieval quality
Experience integrating graph intelligence with LLM/RAG augmentation workflows
Expertise in designing lineage-driven evaluation datasets
Strong architectural ownership across large-scale distributed data systems

What We’re Looking For

We are looking for a deeply technical architect who can operate with Staff/Principal-level ownership, make foundational architecture decisions, and design scalable AI-native data infrastructure that will power downstream evaluation and intelligence workflows.

You should be equally comfortable discussing the following while collaborating with senior engineering and platform leadership:

Graph feature engineering
RAG retrieval optimization
Data governance
Billion-scale graph systems
ML data pipelines
Compliance-oriented data separation architectures

Click Apply to learn more.
