About the Role
Key Responsibilities:
Work in an Agile/Scrum environment with a seasoned data engineering team.
Design, develop, test, and debug large-scale data infrastructure.
Build and migrate to a modern Data Lake & ETL infrastructure.
Develop data APIs and delivery services for operational and analytical applications.
Drive automation and optimization of internal processes and data delivery.
Support digital transformation for Hearst and CDS Global's Resin platform.
Required Skills & Experience:
Cloud & Data Tools: AWS (Redshift, RDS, S3, Glue, Athena, MWAA), Pentaho, Airflow
Programming & Query Languages: Python, SQL (strong fluency required)
Databases: PostgreSQL, MySQL, large-scale (multi-TB) database management
ETL/ELT Pipelines: Experience with Pentaho and AWS ETL tools
Data Architecture: Strong understanding of data warehousing and big data pipelines
Tools & Platforms: GitHub, Airflow (deep expertise); experience handling unstructured data
Big Data (Bonus): Apache Spark experience
Soft Skills: Strong analytical skills, cross-functional collaboration, offshore team coordination