Software Engineer

Deltek

Full-time

Website: deltek.com
Job details:

Key Responsibilities:

Site Reliability & Platform Engineering

Design, build, and maintain the infrastructure and tooling that underpins Deltek’s SaaS platforms at scale.
Drive reliability improvements across the full stack, spanning application-level resilience patterns through to infrastructure-level fault tolerance.
Uphold and extend our IaC-first engineering culture, where all infrastructure changes are made through code and shipped to production via fully automated CI/CD pipelines.
Build and improve CI/CD pipelines to support safe, frequent deployments with automated rollback capabilities.
Develop internal tooling and automation to reduce toil and increase engineering self-service.

Observability & Performance

Design and maintain comprehensive observability solutions including logging, metrics, tracing, and alerting across our AWS-based infrastructure.
Proactively identify performance bottlenecks and reliability risks before they impact customers.
Conduct capacity planning and load testing to ensure systems can scale to meet demand.

Incident Management & On-Call Support

Participate in and own the on-call rotation: ensure fair distribution and adequate coverage across the team, and act as a first responder for production incidents affecting our SaaS platforms.
Lead incident response: triage, coordinate cross-team resolution, communicate clearly with stakeholders, and drive issues to resolution with a sense of urgency.
Own post-incident reviews, facilitate blameless post-mortems, identify root causes, and ensure action items are tracked and completed.
Take pride in leaving systems better than you found them, consistently reducing the frequency and impact of incidents over time.

Collaboration & Engineering Culture

Partner with software engineering teams to review system designs and architectures with a reliability lens.
Mentor and provide technical guidance to junior engineers on SRE practices, tooling, and operational excellence.
Contribute to a strong team culture: supportive, curious, and focused on doing great work while having fun.

Qualifications:

Education

Bachelor’s degree in Computer Science or a related field, or equivalent experience.

Experience

3-5 years of overall experience in software development, infrastructure engineering, or site reliability engineering.
3+ years of hands-on experience in an SRE, DevOps, or platform engineering role in a production SaaS environment.
3+ years applying an automation-first approach to problem-solving using configuration management tools and scripting.
Strong experience with AWS; familiarity with services such as EC2, EKS, RDS, S3, CloudWatch, and IAM.

Technical Skills

Infrastructure-as-Code expertise with Terraform.
Proficiency in at least one scripting/programming language (Python, Node.js, or similar) for automation and tooling development.
Strong understanding of networking fundamentals: DNS, load balancing, TLS, firewalls, and VPCs.
Experience with CI/CD pipelines and deployment automation.
Solid understanding of relational databases (PostgreSQL preferred) including query performance and operational concerns.
Hands-on experience with observability tooling (e.g., Prometheus/Grafana, CloudWatch, or similar).

Soft Skills

Strong communication skills: able to explain complex systems clearly, write crisp incident reports, and influence technical decisions across teams.
Calm under pressure, able to lead effectively during high-severity incidents.
Passion for reliability, operational excellence, and building systems that just work.
Commitment to reducing toil through thoughtful automation and process improvement.
Blameless, growth-oriented mindset with a focus on continuous improvement.
