Algoworks
Website:
algoworks.com
Job details:
Role: Data Architect - Performance Optimization
Location: India, Remote
Experience: 8 Years
About the company
Algoworks is an award-winning artificial intelligence, engineering services and experience transformation firm with offices across the United States, Europe, South America and India. We bring together a global team of engineers, architects, designers, researchers and operators united by rigor, accountability and a commitment to delivering measurable results.
For over 20 years, Algoworks has partnered with Fortune 500 organizations across the Americas, Europe and Asia to define, build and run technology that drives meaningful business outcomes. Our work combines human-centered design, engineering excellence and AI-powered capabilities to solve complex challenges with clarity and precision. Innovation, particularly in the responsible application of AI, is embedded in how teams approach problem-solving and continuous improvement.
At Algoworks, growth is continuous and closely tied to impact. Teams collaborate across geographies and disciplines, strengthening outcomes through shared insight and collective expertise. The culture values transparency, open dialogue and an environment where every voice is heard and contribution is recognized.
Through collaboration, accountability and a focus on results, Algoworks operates at the intersection of technology and people, building not only advanced systems but strong global teams that elevate performance and create lasting impact.
Role overview
We are seeking a highly skilled Data Architect – Performance Optimization Expert with deep expertise in SQL, PySpark, and Databricks performance tuning. This role focuses on handling large-scale data workloads, optimizing complex transformations, and designing highly efficient data pipelines within a Medallion Architecture (Bronze, Silver, Gold layers).
The ideal candidate will bring strong hands-on experience in query optimization, partitioning strategies, clustering, and managing high-volume read/write operations using modern Lakehouse technologies such as Delta Lake and Apache Iceberg.
Key responsibilities:
1. Data pipeline development and optimization
- Design, develop, and optimize scalable data pipelines in Databricks using PySpark and SQL.
- Optimize complex SQL queries and Spark jobs for performance, cost, and scalability.
- Handle large-scale transformations and complex joins across TB–PB scale datasets.
2. Architecture and data modeling
- Implement and maintain Medallion Architecture (Bronze, Silver, Gold layers).
- Apply strong data modeling practices for efficient data processing and access.
- Build reusable frameworks and enforce best practices for performance optimization.
3. Performance engineering
- Define and enforce partitioning, bucketing, and clustering strategies (including Z-ordering).
- Optimize read and write performance for large datasets.
- Tune Spark jobs using techniques such as:
  - Broadcast joins
  - Caching and persistence
  - Adaptive Query Execution (AQE)
  - Shuffle optimization
- Analyze execution plans (Spark UI, DAGs) and apply cost-based optimization techniques.
- Address data skew, large joins and shuffle-intensive workloads.
4. Databricks and lakehouse optimization
- Work extensively with Delta Lake and Apache Iceberg tables.
- Optimize cluster configurations (autoscaling, instance types, memory tuning).
- Manage Databricks jobs, workflows, and production pipelines.
- Ensure efficient file sizing, compaction, and storage optimization.
5. Quality, reliability and troubleshooting
- Identify performance bottlenecks and implement optimization strategies.
- Ensure data quality, consistency, and reliability across pipelines.
- Debug complex data processing and system performance issues.
6. Collaboration and stakeholder engagement
- Collaborate with Analytics, BI, Product, and DevOps teams.
- Translate business requirements into scalable and optimized data solutions.
Required skills and experience:
Core technical skills
- Strong expertise in SQL query optimization.
- Advanced hands-on experience with PySpark / Apache Spark.
- Hands-on experience with Databricks platform.
- Deep understanding of:
  - Partitioning strategies
  - Clustering (Z-ordering, indexing)
  - File sizing and compaction
- Experience with Delta Lake.
- Experience with Apache Iceberg or similar open table formats.
- Strong understanding of distributed data processing concepts.
Performance optimization expertise
- Proven experience optimizing large joins, aggregations, and skewed data scenarios.
- Strong knowledge of Spark execution plans, DAG analysis, and query pruning techniques.
- Experience with cost-based optimization and performance tuning methodologies.
Data engineering and architecture
- Strong experience in Medallion Architecture.
- Experience building end-to-end ETL/ELT pipelines.
- Understanding of batch and real-time data processing systems.
- Familiarity with data modeling techniques.
Databricks and ecosystem
- Expertise in cluster tuning and configuration.
- Experience with Databricks Jobs, Workflows, and Notebooks.
- Exposure to production-grade data pipeline orchestration.
Good to have skills:
- Experience with Azure Data Factory, Microsoft Fabric, or Azure Synapse Analytics.
- Experience in multi-tenant data architectures.
- Exposure to data governance and security frameworks.
Desired attributes:
- Strong problem-solving and analytical thinking.
- Ability to debug and optimize complex systems.
- Effective communication and stakeholder management skills.
- Ability to work in a fast-paced, data-driven environment.
Mandatory skills:
ETL | SQL | PySpark | Databricks | Delta Lake | Apache Iceberg | Microsoft Azure | Azure Data Factory | Microsoft Fabric | Azure Synapse Analytics