
Data Engineer – AWS, PySpark, Advanced SQL, AWS Glue, Redshift

Location

Bengaluru, Karnataka, India

Job Type

Full-time

About the job

This job is sourced from a job board.

About the role

Company: Zorba AI
Website: zorbaconsulting.in

Job Description – Data Engineer (AWS, PySpark, Advanced SQL)

Role: Data Engineer

Experience: 5–9 Years

Location: Open

Notice Period: Immediate to 30 Days Preferred

Job Summary

We are looking for a highly skilled Data Engineer with strong expertise in AWS cloud technologies, PySpark, Advanced SQL, AWS Glue, Redshift, and S3. The ideal candidate should have hands-on experience designing scalable ETL/ELT pipelines, optimizing big data workloads, and building cloud-based data platforms for analytics and reporting solutions.

Key Responsibilities

  • Design, develop, and maintain scalable data pipelines using PySpark and AWS services (see the sketch after this list).
  • Build and optimize ETL/ELT workflows using AWS Glue.
  • Develop efficient data ingestion frameworks from multiple structured and unstructured data sources.
  • Create and optimize complex SQL queries, stored procedures, and transformations in Redshift.
  • Work extensively with Amazon S3 for data storage, partitioning, and lifecycle management.
  • Implement data quality checks, monitoring, logging, and error-handling mechanisms.
  • Optimize Spark jobs for performance, scalability, and cost efficiency.
  • Collaborate with Data Analysts, BI teams, and business stakeholders for data requirements.
  • Ensure data security, governance, and compliance standards are followed.
  • Participate in code reviews, deployment processes, and production support activities.
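
As a rough sketch of the pipeline work described above (ingestion from S3, a data-quality gate, a transformation, and a partitioned Parquet write), the following PySpark job illustrates the pattern. The bucket paths and column names are hypothetical placeholders, not details from this posting.

```python
# Minimal ETL sketch: S3 CSV in -> quality check -> aggregate -> partitioned Parquet out.
# All paths and columns below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest raw CSV files from S3.
raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/orders/")

# Data-quality gate: drop rows missing the primary key before they propagate.
clean = raw.filter(F.col("order_id").isNotNull())

# Transformation: daily revenue per customer.
daily = (
    clean.withColumn("amount", F.col("amount").cast("double"))
         .withColumn("order_date", F.to_date("order_ts"))
         .groupBy("customer_id", "order_date")
         .agg(F.sum("amount").alias("revenue"))
)

# Partitioned Parquet keeps downstream Redshift Spectrum/Athena scans cheap.
daily.write.mode("overwrite").partitionBy("order_date") \
     .parquet("s3://example-curated-bucket/daily_revenue/")
```

In practice a job like this would run as an AWS Glue job or on EMR, with CloudWatch capturing its logs and alarms covering the error-handling bullet above.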

Required Skills

Technical Skills

  • Strong experience in AWS Cloud Services.
  • Hands-on experience with:
    • AWS Glue
    • Amazon Redshift
    • Amazon S3
    • AWS IAM
    • CloudWatch
    • Lambda (Good to Have)
  • Strong expertise in PySpark and Spark SQL.
  • Advanced SQL knowledge (see the query sketch after this list), including:
    • Complex joins
    • Window functions
    • CTEs
    • Query optimization
    • Performance tuning
  • Experience building large-scale ETL/ELT pipelines.
  • Knowledge of data warehousing concepts and dimensional modeling.
  • Experience handling large datasets and distributed data processing.
  • Familiarity with Git, CI/CD pipelines, and Agile methodology.
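
To make the Advanced SQL bullet concrete, here is a small example combining a CTE and a window function, run through Spark SQL; the same query pattern carries over to Redshift. The orders table and its columns are hypothetical, and the view is assumed to be registered beforehand.

```python
# CTE + window function, expressed in Spark SQL (the pattern is identical in Redshift).
# Assumes an `orders` view was registered, e.g. orders_df.createOrReplaceTempView("orders").
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()

top_spenders = spark.sql("""
    WITH daily AS (                              -- CTE: one row per customer per day
        SELECT customer_id,
               CAST(order_ts AS DATE) AS order_date,
               SUM(amount)            AS revenue
        FROM   orders
        GROUP  BY customer_id, CAST(order_ts AS DATE)
    )
    SELECT customer_id, order_date, revenue,
           RANK() OVER (PARTITION BY order_date  -- window function: rank customers
                        ORDER BY revenue DESC) AS day_rank
    FROM   daily
""")
top_spenders.show()
```
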
Good to Have

  • Experience with Airflow or other orchestration tools (see the DAG sketch after this list).
  • Knowledge of Kafka/Kinesis streaming pipelines.
  • Exposure to Snowflake or Databricks.
  • Python scripting experience.
  • Experience in healthcare, finance, or retail domains.
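
As one way the orchestration skill above might look in practice, here is a minimal Airflow DAG that starts an AWS Glue job through boto3. The DAG id, schedule, and Glue job name are hypothetical, and the `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`).

```python
# Minimal Airflow DAG that triggers an AWS Glue job run once a day.
# Job name and DAG id are illustrative assumptions, not from this posting.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def start_glue_job():
    # boto3's Glue client starts an on-demand run of a named Glue job.
    glue = boto3.client("glue")
    run = glue.start_job_run(JobName="example-orders-etl")
    print(f"Started Glue job run {run['JobRunId']}")


with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    PythonOperator(task_id="run_glue_job", python_callable=start_glue_job)
```

Airflow's Amazon provider also ships a dedicated GlueJobOperator, which would be the more idiomatic choice in a real deployment; the plain boto3 call above just keeps the example self-contained.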

Educational Qualification

  • Bachelor’s/Master’s degree in Computer Science, Information Technology, or related field.

Preferred Candidate Profile

  • Strong analytical and problem-solving skills.
  • Excellent communication and stakeholder management abilities.
  • Ability to work independently in a fast-paced environment.
  • Experience working in production support and optimization activities.

Interview Focus Areas

  • PySpark transformations and optimization (see the join-tuning sketch after this list)
  • Advanced SQL query writing
  • AWS Glue architecture and workflows
  • Redshift performance tuning
  • Data modeling concepts
  • S3 partitioning and file formats (Parquet/ORC/CSV)
  • Real-time project scenarios and troubleshooting
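
As an illustration of the first focus area, PySpark transformations and optimization, the sketch below shows one of the most commonly discussed tunings: broadcasting a small dimension table to avoid a shuffle join. Paths and column names are hypothetical.

```python
# Join-tuning sketch: broadcast a small dimension table to avoid a shuffle join.
# All paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-tuning").getOrCreate()

facts = spark.read.parquet("s3://example-curated-bucket/daily_revenue/")  # large fact table
dims = spark.read.parquet("s3://example-curated-bucket/customers/")       # small dimension

# broadcast() ships the small table to every executor, replacing a
# sort-merge shuffle join with a cheap map-side broadcast join.
enriched = facts.join(broadcast(dims), on="customer_id", how="left")

# Repartition by the write key so output files line up with the partition layout.
enriched.repartition("order_date").write.mode("overwrite") \
        .partitionBy("order_date").parquet("s3://example-curated-bucket/enriched/")
```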

Skills: Redshift, AWS Glue, SQL, PySpark

Skills

Python
Agile
Airflow
AWS
CloudWatch
Compliance
CSV
Data Engineer
Data Ingestion
Data Modeling
Databricks
ETL
Git
Kafka
Lambda
Parquet
Production Support
Snowflake
SQL