Viraaj HR Solutions Private Limited
Website: viraajhrsolutions.com
Job details:
About The Opportunity
We are a fast-scaling technology partner in the Data Engineering & Cloud Analytics space. We build scalable data pipelines and real-time processing systems for Fortune 500 clients across BFSI, Retail, and Telecom. Our engineers use PySpark at scale to ingest, transform, and serve petabytes of structured and semi-structured data across cloud platforms. The result is clean, reliable, low-latency data products that power AI/ML models, BI, and operational dashboards.
Role & Responsibilities
- Design, develop, and deploy PySpark-based ETL/ELT pipelines to process terabytes of batch and streaming data daily.
- Optimize Spark job performance by tuning partitioning, caching, skew handling, and resource allocation on Databricks, YARN, or EMR.
- Integrate PySpark jobs with cloud data lakes (S3, ADLS, GCS) and warehouses (Snowflake, Redshift, BigQuery).
- Collaborate with Data Scientists to engineer feature stores and prepare training datasets for ML models.
- Implement data validation, quality checks, and monitoring frameworks to ensure pipeline reliability and SLA compliance.
- Write clean, testable, reusable PySpark code, backed by unit and integration tests and wired into CI/CD pipelines.
Skills & Qualifications
Must-Have
- PySpark
- Spark SQL
- Databricks
- Delta Lake
- Python
- Cloud Platforms (AWS/Azure/GCP)
- SQL
- ETL/ELT Design
Preferred
- Kafka
- CI/CD Tools (Jenkins, GitHub Actions)
- MLflow or Feature Store experience
Benefits & Culture Highlights
- Work on high-scale, production-grade data systems for global clients.
- Learn from senior engineers through mentorship-led growth tracks.
- Competitive salary, performance-based bonuses, and on-site perks.
Skills: ETL, CI/CD, PySpark, Data