ThreatXIntel
Website: threatxintel.com
Job details:
We are looking for a highly skilled Data Engineer with strong hands-on experience in building scalable data pipelines using Python, SQL, Scala, and Apache Spark.
The ideal candidate should have deep expertise in batch and real-time data processing, along with experience in modern cloud-based data architectures (GCP preferred).
Key Responsibilities
- Design, develop, and optimize ETL/ELT pipelines for large-scale data processing
- Build scalable batch and real-time pipelines using Apache Spark (Spark SQL, DataFrames, Streaming)
- Work with streaming technologies such as Kafka, Flink, and Pub/Sub
- Implement and manage data lakes, data warehouses, and lakehouse architectures
- Tune performance (partitioning, join optimization, query tuning)
- Develop solutions on GCP (BigQuery, Pub/Sub, Dataflow, etc.)
- Automate workflows using Apache Airflow
- Implement data quality checks, monitoring, and alerting frameworks
- Collaborate with cross-functional teams for data integration and analytics use cases
- (Nice to have) Work with Docker, Kubernetes, and CI/CD pipelines
Required Skills
- Strong experience in Python, SQL, and Scala
- Hands-on expertise in Apache Spark (Batch + Streaming)
- Experience with Kafka, Flink, or Pub/Sub
- Knowledge of BigQuery, Snowflake, or Redshift
- Strong understanding of data modeling & data architecture
- Experience with GCP (preferred)
- Experience with Airflow orchestration
- Strong debugging and performance optimization skills
Nice to Have
- Experience with Docker & Kubernetes
- CI/CD pipeline exposure
- Experience in lakehouse architecture (Delta Lake, Iceberg, etc.)