Principal Engineer

Min Experience

8 years

Location

Denver or United States

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

Why This Role Exists

We operate a multi-tenant automotive SaaS platform serving thousands of dealer groups across the United States. Our backend — event-driven serverless on AWS (Lambda, EventBridge, DynamoDB, S3, Step Functions) — orchestrates everything from dealer onboarding to inventory management to real-time transaction processing. That platform works. Now we need to make it think.

We are building agentic AI systems: autonomous, tool-using agents that observe platform state, reason over dealer context, take action through production APIs, and learn from outcomes. These are not chatbots bolted onto a dashboard. They are first-class platform services — backed by AWS Bedrock, connected to production systems via MCP servers — that make decisions, execute workflows, and close loops without human intervention unless guardrails say otherwise.

This Principal Engineer owns that entire surface. You are not advising on AI strategy from a whiteboard. You are writing agent code, defining tool interfaces, building evaluation harnesses, setting cost and latency budgets, and shipping production AI workflows that touch real dealers and real money. You set the engineering patterns the team follows, you help make the build-vs-buy calls, and when an agent misbehaves at 2 AM, your architecture is what determines whether it fails safe or fails loud.

Scope & Scale

  • 5,000+ destination dealer tenants, each with isolated databases and per-tenant configuration.
  • Billions in annual Gross Merchandise Value (GMV) flowing through platform transactions.
  • Tens of thousands of API requests per minute across REST, SOAP, and event-driven integration surfaces.
  • Data pipelines spanning 6 integration domains with multi-protocol vendor connectivity.

What You Will Own

  • Ownership and core development of agentic AI systems — designing, building, and operating the AI agent infrastructure (AWS Bedrock, MCP servers) that powers intelligent automation across the platform. You are not advising on AI strategy; you are writing the agent code, defining the tool interfaces, building the evaluation harnesses, and shipping production AI workflows.
  • AI agent lifecycle end to end — from prompt engineering and tool-use design through guardrails, evaluation, cost optimization, and production observability. You own the patterns the team uses to build with AI: how agents connect to production systems, how we evaluate output quality, how we manage model costs at scale, and how we roll back when an agent misbehaves.
  • System design and technical decision-making for migration waves — from identity/tenant services through core domain extraction and frontend decomposition.
  • The dual-write framework, API Gateway traffic-splitting, and per-tenant feature flag rollout that make every migration step reversible.
  • Cross-cutting concerns: observability (OpenTelemetry, CloudWatch), security posture (Auth0 consolidation, IAM), and data architecture (DynamoDB single-table design, Aurora consolidation).
  • Mentoring and force-multiplying senior ICs — establishing patterns, reviewing designs, and raising the technical bar across 5 engineering teams.
  • Consolidation strategy for 30+ existing integrations, so that future integrations are easier to build.

Technical Environment

  • Cloud Services: High-availability AWS stack including Lambda, EventBridge, DynamoDB, S3, ECS Fargate, Aurora, API Gateway, CloudWatch, and Secrets Manager.
  • Development Languages: Modern Python and Java (Spring Boot) alongside TypeScript/React (Next.js 16) frontends, with legacy domain coverage in PHP/Laravel.
  • AI & Agentic Systems: Advanced agentic workflow orchestration utilizing lean AWS Bedrock AgentCore, MCP servers, or LangChain/LangGraph frameworks.
  • Data Engineering: Complex data architectures featuring DynamoDB single-table design, MySQL/Aurora, S3 data lakes, Glue Data Catalog, Athena, and data pipelines.
  • Infrastructure & Security: Enterprise-grade CI/CD and observability via CloudFormation, Auth0 consolidation, OpenTelemetry, and CircleCI.
  • Integration Surfaces: Multi-protocol connectivity spanning REST, SOAP/XML, EventBridge event-bus patterns, SES processing, and Playwright browser automation.

First 12 Months

  • Months 1–3: Immerse in the codebase. Audit the current architecture across all stacks. Publish the first Architecture Decision Record (ADR) for the next migration wave. Establish your design review cadence with the team.
  • Months 4–6: Drive the AI/agentic integration layer, with Bedrock-powered automation in at least one production workflow. Establish the patterns for how the team builds with AI going forward, covering both agentic insight retrieval and agentic workflow automation.
  • Months 7–9: Own and deliver the first migration wave end-to-end — from design doc through production cutover with dual-write validation. Stand up the observability baseline (OpenTelemetry instrumentation, dashboards, SLOs).
  • Months 10–12: Second migration wave in production. Architecture runway documented for the next 12 months. The team operates at a higher technical bar because of patterns you set.

About the company

Software platform for automotive dealership sales and finance automation.

Skills

AWS
Lambda
EventBridge
DynamoDB
StepFunctions
ECS
RDS
API Gateway
S3
CloudWatch
Aurora
Auth0
LangChain
Playwright
PHP
Python
Java
TypeScript