Senior Spark Developer

Location: San Jose, CA

Job Type: Full-time

About the role

Introduction

A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.

Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.

We are seeking a skilled Spark developer to join our IBM Software team. As part of our team, you will be responsible for developing and maintaining high-quality software products, working with a variety of technologies and programming languages.

IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

Your Role And Responsibilities

  • Design, develop, and optimize big data applications using Apache Spark and Scala.
  • Architect and implement scalable data pipelines for both batch and real-time processing.
  • Collaborate with data engineers, analysts, and architects to define data strategies.
  • Optimize Spark jobs for performance and cost-effectiveness on distributed clusters.
  • Build and maintain reusable code and libraries for future use.
  • Work with various data storage systems like HDFS, Hive, HBase, Cassandra, Kafka, and Parquet.
  • Implement data quality checks, logging, monitoring, and alerting for ETL jobs.
  • Mentor junior developers and lead code reviews to ensure best practices.
  • Ensure security, governance, and compliance standards are adhered to in all data processes.
  • Troubleshoot and resolve performance issues and bugs in big data solutions.

Preferred Education

Bachelor's Degree

Required Technical And Professional Expertise

  • 12+ years of total software development experience.
  • 5+ years of hands-on experience with Apache Spark and Scala.
  • Proficiency in Scala with deep knowledge of functional programming.
  • Strong experience with distributed computing, parallel data processing, and cluster computing frameworks.
  • Strong problem-solving skills and the ability to work independently or as part of a team.
  • Experience with cloud platforms such as AWS, Azure, or GCP (especially EMR, Databricks, or HDInsight).
  • Solid understanding of Spark tuning, partitions, joins, broadcast variables, and performance optimization techniques.
  • Hands-on experience with Kafka, Hive, HBase, NoSQL databases, and data lake architectures.
  • Familiarity with CI/CD pipelines, Git, Jenkins, and automated testing.

Preferred Technical And Professional Experience

  • Experience with Databricks, Delta Lake, or Apache Iceberg.
  • Exposure to machine learning pipelines using Spark MLlib or integration with ML frameworks.
  • Contributions to open-source big data projects are a plus.
  • Excellent communication and leadership skills.
  • Understanding of data lake and lakehouse architectures.
  • Knowledge of Python, Java, or other backend languages is a plus.

Skills

Python
AWS
Apache
Apache Spark
automated testing
Azure
backend
Cassandra
compliance
data lake
data solutions
Databricks
ETL
functional programming
GCP
Git
HBase
Hive
Java
Jenkins
Kafka
machine learning
NoSQL
Parquet
Spark