Staff Engineer

Salary

$264k - $285k

Min Experience

5 years

Location

Palo Alto, California, United States

Job Type

full-time

About the job

Overview

About the role

About Workato

Workato delivers enterprise infrastructure for the agentic era, redefining iPaaS and helping enterprises unify data, applications, processes, and AI into a single, governed platform. A leader in Enterprise MCP and trusted by 50% of the Fortune 500, Workato’s cloud-native architecture connects every application, data source, and process to power real-time orchestration at scale. With enterprise-grade security and continuous innovation at its core, Workato provides the trusted foundation for organizations to automate with confidence and operationalize AI across the business. To learn more, visit www.workato.com

Why join us?

Ultimately, Workato believes in fostering a flexible, trust-oriented culture that empowers everyone to take full ownership of their roles. We are driven by innovation and looking for team players who want to actively build our company.

But we also believe in balancing productivity with self-care. That’s why we offer all of our employees a vibrant and dynamic work environment, along with a multitude of benefits they can enjoy inside and outside of their work lives.

If this sounds right up your alley, please submit an application. We look forward to getting to know you!

Also, feel free to check out why:

Business Insider named us an “enterprise startup to bet your career on”
Forbes’ Cloud 100 recognized us as one of the top 100 private cloud companies in the world
Deloitte Tech Fast 500 ranked us as the 17th fastest growing tech company in the Bay Area, and 96th in North America
Quartz ranked us the #1 best company for remote workers

Workato Inc. seeks Staff Engineer in Palo Alto, CA

Job Duties:

Design and develop production-grade distributed services in Rust using async/Tokio, with focus on concurrency, performance, and scalability
Own the full service lifecycle from system design and implementation through deployment and operations
Build and optimize data-processing and transformation pipelines with emphasis on throughput, latency, and memory efficiency
Create and maintain integration tests with real service dependencies in containerized environments
Improve test determinism, stability, and reliability across distributed systems
Deploy and operate services across development, staging, and production environments using infrastructure-as-code practices
Implement safe rollout and rollback procedures using GitOps and CI/CD workflows
Develop and evolve observability systems including logs, metrics, and distributed tracing
Define service-level objectives (SLOs), configure alerts, and lead incident response and post-incident reviews
Design and maintain distributed cluster coordination systems using gossip-based membership and leader-election mechanisms for resilience and scalability
Plan and execute performance benchmarking and load testing, including capacity modeling and regression detection
Drive performance optimization initiatives across distributed services
Apply fuzz testing techniques to critical components to improve reliability and security
Practice chaos engineering in lower environments through fault injection, network partitioning, and resource pressure testing to validate resilience and recovery objectives
Participate in architecture reviews and code reviews
Contribute to technical design documents and RFCs
Mentor peers and collaborate cross-functionally on service integrations and stateful components
Full-time telecommuting permitted from anywhere in the United States

Minimum Requirements:

Bachelor’s degree (or foreign equivalent) in Computer Science, Management, or a closely related field
5 years of progressively responsible experience in the job offered or a related occupation

Special Skill Requirements:

3 years of experience with Rust, including Tokio, asynchronous programming, concurrency, performance optimization, and allocator profiling
2 years of experience with Apache DataFusion and Apache Arrow, including Parquet, data pipelines, query planning, and vectorized execution
3 years of experience creating integration tests with real dependencies using Docker and Testcontainers
2 years of experience with behavior-driven testing for distributed services using frameworks such as Gherkin and Cucumber
2 years of experience with performance benchmarking, including throughput and latency analysis, regression detection, and capacity planning
2 years of experience with load testing using Locust and wrk, including test scenario design, ramp-up strategies, and analysis of latency, throughput, and error rates
1 year of experience with chaos engineering and fault injection, including network partitions, process termination, and resource pressure testing for resilience validation
2 years of experience designing and scaling distributed backend services, including rate limiting, fair queuing, back-pressure control, cluster coordination, gossip-based membership protocols (e.g., SWIM/Chitchat), and leader election
3 years of experience with Kubernetes for production deployments, rollouts, and rollbacks across multiple environments
3 years of experience with Terraform and infrastructure-as-code practices for service provisioning and configuration
3 years of experience with advanced Redis patterns, including counters, streams/pub-sub, distributed locks, and idempotency controls
2 years of experience with PostgreSQL, including SQL optimization, JSON/JSONB, indexing, and locking, as well as columnar OLAP databases such as ClickHouse, including table engines, partitioning, and query tuning
2 years of experience with Ruby for backend and service tooling, including fuzz testing and library development
2 years of experience with Java or Kotlin for backend services
3 years of experience implementing observability and CI/CD systems, including Prometheus, OpenTelemetry, GitHub Actions, and ArgoCD
1 year of experience with chaos engineering and fault injection for distributed systems resilience validation

Salary: $264,514.00-$285,000.00 per annum. 40 hours per week; M-F, 9:00 a.m. to 5:00 p.m.

Must be legally authorized to work in the U.S. without sponsorship.

About the company

Enterprise platform for automating business workflows and integrating applications.

Skills

Rust

Tokio

Apache DataFusion

Apache Arrow

Parquet

Docker

Testcontainers

Gherkin

Cucumber

Locust

wrk

Kubernetes

Terraform

Redis

PostgreSQL

ClickHouse

Ruby

Java

Kotlin

Prometheus

OpenTelemetry

GitHub Actions

ArgoCD