Data Engineer

Pace Wisdom Solutions

full-time

Required skills

Python
Apache
Apache Flink
containerization
CSV
data engineer
data lake
data modeling
end-to-end
ETL
Flink
Java
Kafka
metadata management
Parquet
SQL
uptime
warehouse management

About the role

Website: pacewisdom.com
Job details:

Location: Bangalore (Hybrid)

Role: Data Engineer – Business Insights

Experience Required

5–7+ years of experience in Data Engineering, Big Data, or Analytics Engineering.

Role Overview and Key Responsibilities

We are looking for a highly skilled Data Engineer to build and manage scalable data pipelines for our 4PL (Fourth-Party Logistics) Business Insights platform. The ideal candidate will design and implement robust ingestion, transformation, and analytics-ready data infrastructure that powers AI-driven business insights and operational intelligence.

Build end-to-end pipelines on the existing Kafka spine, using Debezium CDC and Apache Flink for streaming transformation, and support bulk ingestion from CSV and other flat-file sources.
Working experience with Apache Iceberg on Amazon S3 is required.
Familiarity with ClickHouse for building customer dashboards and with Trino/Athena for historical queries.
Design, develop, and maintain scalable data pipelines for ingesting logistics and operational data into the analytics platform.
Strong SQL skills and experience optimizing analytical queries.
Familiarity with containerization and cloud-native deployments.
Proficiency in Python, Scala, or Java.

Data Lake & Warehouse Management

Manage and optimize data flow from Kafka topics into S3-based storage layers.
Build ETL/ELT pipelines to transform and load data into ClickHouse for high-performance analytical querying.
Design partitioning, indexing, and schema strategies in ClickHouse for low-latency AI and BI workloads.

AI & Analytics Enablement

Enable AI agents and analytics applications to efficiently query ClickHouse datasets.
Ensure data quality, consistency, and availability for downstream AI-driven insights.
Collaborate with AI/ML teams to expose optimized datasets and semantic models.

Platform Reliability & Optimization

Monitor and optimize pipeline performance, storage efficiency, and query latency.
Implement observability, alerting, and retry mechanisms for ingestion pipelines.
Ensure scalability, fault tolerance, and data governance best practices.

Collaboration

Work closely with:
Product teams
Business Insights teams
AI/ML engineers
Platform engineering teams
Participate in architecture discussions and contribute to long-term data platform strategy.

Required Skills & Qualifications

Technical Skills

Strong experience in building distributed data pipelines.
Hands-on expertise with:

Data Engineering Concepts

ETL/ELT pipeline design
Data modeling for analytics
Data partitioning and indexing strategies
Schema evolution and metadata management
Monitoring and observability

Nice to Have

Experience with logistics, supply chain, or 4PL platforms.
Exposure to AI/LLM-based analytics systems.
Familiarity with vector search or AI retrieval architectures.
Experience with dbt or modern data stack tools.
Knowledge of Iceberg, Delta Lake, or Parquet optimization.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Experience working in high-scale analytics or real-time data environments.

Success Metrics

Reliable real-time and batch ingestion pipelines.
Optimized ClickHouse performance for AI-agent querying.
Reduced latency for analytics and reporting workloads.
High data quality and pipeline uptime.
