Staff Site Reliability Engineer (SRE), Engineering Tools

Tesla

full-time

Required skills

Python
API
Atlassian
Autopilot
backend
Bash
CircleCI
CLI
Confluence
cURL
Docker
full stack
GitHub
Helm
incident response
Jenkins
Jira
Kubernetes
Maven
Nexus
NPM
production support
SaaS
Slack
SRE
SSO
Terraform
REST APIs

About the role

Website: tesla.com
About The Team
Engineering Tools owns and operates the on-prem developer platforms that every Tesla engineer depends on every day: GitHub Enterprise, JFrog Artifactory, GitHub Copilot (self-hosted), Cursor (on-prem), and the Atlassian suite (Jira Service Management + Confluence). We also run the AI-augmented support layer that fronts these platforms: a Mattermost support bot backed by our internal Nabu RAG platform, observability via OpenTelemetry, and a GitOps-driven Kubernetes deployment footprint.

If one of our systems is down, thousands of Tesla engineers stop shipping. We're hiring a Staff SRE to own the reliability, scalability, and operational maturity of that footprint.

Key Responsibilities

Platform administration: Manage GitHub Enterprise (Cloud and/or Server) organizations, teams, repos, branch protection rules, Actions runners, and Apps. Administer JFrog Artifactory repositories (local, remote, virtual), permissions, replication, and storage policies.

User support: Triage and resolve tickets covering access requests, repo migrations, build/artifact failures, authentication issues, and integrations. Define and meet SLAs.

Migrations & onboarding: Lead repo migrations into/out of GitHub (e.g., GitHub Migrations API, gh-migration tooling) and Artifactory repository imports/exports. Onboard new teams with templates and standards.

Automation: Build scripts and tooling (Bash, Python, Terraform, GitHub Actions, JFrog CLI) to automate provisioning, permission audits, cleanup, and reporting. Eliminate repetitive support work.

Reliability & monitoring: Monitor platform health, storage usage, runner capacity, and license consumption. Coordinate upgrades, patches, and incident response with the vendor.

Security & compliance: Enforce SSO/SAML, SCIM provisioning, secret scanning, signed commits, audit logging, and least-privilege access. Support SOC 2 / ISO audits.

Integrations: Maintain integrations with CI/CD (Jenkins, GitHub Actions, GitLab CI), SAST/SCA scanners, Jira, Slack, and internal developer portals.

Documentation & enablement: Write runbooks, FAQs, and self-service guides. Host office hours and training sessions for developers.

Required Qualifications

3+ years administering GitHub Enterprise (Cloud or Server) at scale (500+ users or 1000+ repos).

2+ years administering JFrog Artifactory (or comparable: Nexus, Cloudsmith, Harbor).

Strong scripting in Bash and Python; comfortable with REST APIs and curl/jq.

Working knowledge of Git internals (refs, packfiles, LFS, submodules) and ability to debug repo corruption, large-file issues, and merge problems.

Hands-on experience with at least one CI/CD system (GitHub Actions, Jenkins, GitLab CI, CircleCI).

Familiarity with SSO/SAML, SCIM, OIDC, and personal/fine-grained access tokens.

Excellent written communication - you can turn a confusing incident into a clear postmortem and a vague ticket into a fixable problem.

Preferred Qualifications

Experience with GitHub Migrations API, gh-migration-tool, or gei (GitHub Enterprise Importer).

Experience operating Artifactory in HA mode, with S3/blob storage, and Xray for vulnerability scanning.

Infrastructure-as-Code: Terraform providers for GitHub and Artifactory.

Container/package format expertise: Docker, npm, Maven, PyPI, Helm, Conan.

Familiarity with secret scanning tools (GitHub Advanced Security, GitGuardian, TruffleHog) and dependency management (Dependabot, Renovate).

Prior on-call or production support experience.

Exposure to GHAS, Copilot for Business, or Copilot Enterprise rollouts.

Bonus

Experience operating self-hosted LLM inference (Copilot Enterprise, on-prem Cursor backend, vLLM, or similar), RAG pipelines, or vector databases.

Soft Skills
Excellent written communication - you can write a post-mortem that engineering leadership reads to the end, and a runbook that a junior on-call can execute at 3 AM. Strong technical influence without authority; you raise the reliability bar across teams by example and through reviews, not by mandate. Calm under pressure during sev-1 incidents affecting thousands of engineers.

Education
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

Why This Role Is Different

Customer = every Tesla engineer. Your platforms unblock Vehicle Software, Autopilot, Energy, and Manufacturing teams. The impact of every reliability improvement compounds across the company.

On-prem by design. We don't outsource our critical paths to SaaS. You'll own the full stack - hardware, network, OS, platform, application, observability - and you'll have the authority to change it.

AI-augmented support. We're not just operating platforms; we're building the AI tooling (Nabu RAG + Mattermost support bot + Copilot/Cursor integrations) that lets a small SRE team serve a very large engineering org. You'll help shape that.

High autonomy, high ownership. Engineering Tools is small and senior-heavy. As a Staff SRE you'll set technical direction for multiple platforms - not just execute someone else's roadmap.

