Staff Software Engineer - Distributed Data Systems

Min Experience

5 years

Location

Amsterdam

Job Type

Full-time

About the job

About the role

Adyen runs a significant footprint of distributed systems, in both scale and variety. This covers distributed compute (Spark, Trino, Flink), distributed databases (Cassandra, Druid), and distributed file/object storage (HDFS, Ceph Object Gateway). These technologies are offered as a service internally to product engineering teams, supporting them in building and scaling world-class products. We're looking for an expert with deep knowledge of distributed systems, both to improve the operations and scalability of existing offerings and to introduce and mature new ones. The initial focus will be scaling and tuning our Hadoop and Spark infrastructure, in addition to iterating on our OLAP platform, where we currently use Apache Druid. Over time, we expect this expertise to be useful across a wider range of distributed systems internally.

What you'll do

You'll be asked both to build new as-a-service offerings and to improve existing ones. This role is perfect for you if you're passionate about one or all of the following:

- Design, build, and optimize data clusters to ensure scalability, fault tolerance, and high availability, covering both batch and streaming workloads.
- Improve the Spark, Hadoop, Kubernetes, Delta Lake, and Druid ecosystems internally.
- Evolve these open-source projects internally, with the intention of contributing the code upstream.
- Educate and grow Adyen's internal knowledge of these topics, covering both peer platform engineers and platform users.

Who you are

Must-have experience:

- Scaling and tuning large deployments of Spark-on-Kubernetes and Spark-on-Hadoop
- Hadoop and the HDFS protocol
- Designing and tuning shuffle-heavy systems, on YARN or on Kubernetes via remote shuffle services
- One of the lakehouse file formats (Delta, Iceberg, Hudi)
- OLAP technologies, covering at least one of ClickHouse, Apache Druid, Apache Pinot, or Apache Doris
- Open-source contributions to one of the must-have technologies, or to other common ones (e.g. Kafka, Cassandra, Trino)
- Team player with strong communication skills
- Ability to work closely with the diverse stakeholders you enable (analysts, data scientists, data engineers, etc.) and depend upon (infrastructure, security, etc.)
- Demonstrated ability to troubleshoot and resolve issues in large-scale production environments built on distributed systems

Nice-to-have experience:

- Next-generation and multi-modal data formats (e.g. LanceDB)
- Building self-service stateful platforms
- Open-source S3 alternatives (e.g. Ceph, MinIO)
- Native or accelerated runtimes for Spark (Apache DataFusion Comet, Apache Gluten, NVIDIA RAPIDS, etc.)

About the company

Adyen provides payments, data, and financial products in a single solution for customers like Meta, Uber, H&M, and Microsoft, making us the financial technology platform of choice. At Adyen, everything we do is engineered for ambition. For our teams, we create an environment with opportunities for our people to succeed, backed by the culture and support they need to truly own their careers. We are motivated individuals who tackle unique technical challenges at scale and solve them as a team. Together, we deliver innovative and ethical solutions that help businesses achieve their ambitions faster.

Skills

spark
hadoop
kubernetes
delta-lake
druid
cassandra
trino
kafka
ceph
minio