Data Architect - Performance Optimization

Algoworks

Location: India
Job type: Full-time

Required skills

Apache
Apache Spark
Artificial Intelligence
Azure
caching
clustering
data modeling
data solutions
Databricks
DevOps
end-to-end
ETL
multi-tenant
SQL

About the role

Algoworks

Website: algoworks.com
Job details:

Role: Data Architect - Performance Optimization

Location: India, Remote

Experience: 8 Years

About the company

Algoworks is an award-winning artificial intelligence, engineering services and experience transformation firm with offices across the United States, Europe, South America and India. We bring together a global team of engineers, architects, designers, researchers and operators united by rigor, accountability and a commitment to delivering measurable results.

For over 20 years, Algoworks has partnered with Fortune 500 organizations across the Americas, Europe and Asia to define, build and run technology that drives meaningful business outcomes. Our work combines human-centered design, engineering excellence and AI-powered capabilities to solve complex challenges with clarity and precision. Innovation, particularly in the responsible application of AI, is embedded in how teams approach problem-solving and continuous improvement.

At Algoworks, growth is continuous and closely tied to impact. Teams collaborate across geographies and disciplines, strengthening outcomes through shared insight and collective expertise. The culture values transparency, open dialogue and an environment where every voice is heard and contribution is recognized.

Through collaboration, accountability and a focus on results, Algoworks operates at the intersection of technology and people, building not only advanced systems but strong global teams that elevate performance and create lasting impact.

Role overview

We are seeking a highly skilled Data Architect – Performance Optimization Expert with deep expertise in SQL, PySpark, and Databricks performance tuning. This role focuses on handling large-scale data workloads, optimizing complex transformations, and designing highly efficient data pipelines within a Medallion Architecture (Bronze, Silver, Gold layers).

The ideal candidate will bring strong hands-on experience in query optimization, partitioning strategies, clustering, and managing high-volume read/write operations using modern Lakehouse technologies such as Delta Lake and Apache Iceberg.

Key responsibilities:

1. Data pipeline development and optimization

Design, develop, and optimize scalable data pipelines in Databricks using PySpark and SQL.
Optimize complex SQL queries and Spark jobs for performance, cost, and scalability.
Handle large-scale transformations and complex joins across TB–PB scale datasets.

2. Architecture and data modeling

Implement and maintain Medallion Architecture (Bronze, Silver, Gold layers).
Apply strong data modeling practices for efficient data processing and access.
Build reusable frameworks and enforce best practices for performance optimization.

3. Performance engineering

Define and enforce partitioning, bucketing, and clustering strategies (including Z-ordering).
Optimize read and write performance for large datasets.
Tune Spark jobs using techniques such as broadcast joins, caching and persistence, Adaptive Query Execution (AQE), and shuffle optimization.
Analyze execution plans (Spark UI, DAGs) and apply cost-based optimization techniques.
Address data skew, large joins and shuffle-intensive workloads.

4. Databricks and lakehouse optimization

Work extensively with Delta Lake and Apache Iceberg tables.
Optimize cluster configurations (autoscaling, instance types, memory tuning).
Manage Databricks jobs, workflows, and production pipelines.
Ensure efficient file sizing, compaction, and storage optimization.

5. Quality, reliability and troubleshooting

Identify performance bottlenecks and implement optimization strategies.
Ensure data quality, consistency, and reliability across pipelines.
Debug complex data processing and system performance issues.

6. Collaboration and stakeholder engagement

Collaborate with Analytics, BI, Product, and DevOps teams.
Translate business requirements into scalable and optimized data solutions.

Required skills and experience:

Core technical skills

Strong expertise in SQL query optimization.
Advanced hands-on experience with PySpark / Apache Spark.
Hands-on experience with Databricks platform.
Deep understanding of partitioning strategies, clustering (Z-ordering, indexing), and file sizing and compaction.
Experience with Delta Lake.
Experience with Apache Iceberg or similar open table formats.
Strong understanding of distributed data processing concepts.

Performance optimization expertise

Proven experience optimizing large joins, aggregations, and skewed data scenarios.
Strong knowledge of Spark execution plans, DAG analysis, and query pruning techniques.
Experience with cost-based optimization and performance tuning methodologies.

Data engineering and architecture

Strong experience in Medallion Architecture.
Experience building end-to-end ETL/ELT pipelines.
Understanding of batch and real-time data processing systems.
Familiarity with data modeling techniques.

Databricks and ecosystem

Expertise in cluster tuning and configuration.
Experience with Databricks Jobs, Workflows, and Notebooks.
Exposure to production-grade data pipeline orchestration.

Good to have skills:

Experience with Azure Data Factory, Microsoft Fabric, or Azure Synapse Analytics.
Experience in multi-tenant data architectures.
Exposure to data governance and security frameworks.

Desired attributes:

Strong problem-solving and analytical thinking.
Ability to debug and optimize complex systems.
Effective communication and stakeholder management skills.
Ability to work in a fast-paced, data-driven environment.
