Website: unicorn-workforce.com
Job details:
Job Description – Senior Software Engineer / SRE (Observability Focus)
Experience: 8+ Years Relevant Experience
Work Location: Remote
Shift Timing
Night Shift (PST time zone coverage required)
(UAN Mandatory)
Interview Process
- Final Round: Face-to-Face Mandatory
- Background Verification will be conducted by Jade
Job Summary
We are seeking an experienced Senior Software Engineer / SRE with strong expertise in observability platforms, Kubernetes, cloud infrastructure, and reliability engineering. The role focuses on platform reliability, monitoring, automation, and modernization initiatives, with particular emphasis on Datadog and Kubernetes-based environments.
This position combines Software Engineering (60–70%) and Site Reliability Engineering (30–40%) responsibilities to improve observability, scalability, reliability, and operational excellence across enterprise engineering platforms.
Key Responsibilities
- Support platform reliability, monitoring, and modernization initiatives.
- Provide operational and training support for Datadog and other enterprise observability platforms.
- Enhance system observability, reliability, scalability, and performance.
- Design and implement monitoring, alerting, and tracing solutions.
- Drive automation for operational tasks and monitoring frameworks.
- Support Kubernetes-based platform operations and observability integrations.
- Configure dashboards, alerts, logging, metrics, and APM monitoring.
- Integrate observability tools within CI/CD pipelines.
- Monitor and optimize containerized and microservices-based architectures.
- Manage observability platform configurations, integrations, and access controls.
- Perform proactive maintenance, troubleshooting, and platform improvement initiatives.
- Collaborate with engineering, DevOps, cloud, and infrastructure teams.
Required Skills & Expertise
Core Engineering & Platform Skills
- Strong proficiency in at least one of the following:
  - Python
  - JavaScript (Node.js)
  - Java
- Hands-on experience with:
  - API integrations
  - API design and consumption
  - RESTful services
- Strong experience working in Kubernetes environments, including:
  - Deployments
  - Operations
  - Monitoring
  - Troubleshooting
Observability & Monitoring
- Strong experience with:
  - Datadog (preferred)
  - Prometheus
  - Grafana
- Experience configuring:
  - Dashboards
  - Alerts
  - Tracing
  - Metrics
  - Logging
  - APM solutions
- Experience monitoring:
  - Microservices
  - Containerized applications
  - Distributed systems
Cloud & Infrastructure
- Hands-on experience with the AWS cloud platform.
- Experience integrating observability tools into cloud-native environments.
SRE & DevOps
- Experience integrating observability solutions within CI/CD pipelines.
- Ability to automate operational and monitoring tasks using scripting (Python preferred).
- Strong understanding of:
  - Reliability engineering
  - Scalability
  - Performance optimization
  - Incident management
Strongly Preferred Skills
- Experience owning and operating internal engineering platforms.
- Deep expertise with enterprise observability platforms.
- Strong experience with:
  - Datadog Agent installation
  - Integrations
  - API key management
  - Secure configurations
  - User roles & access control
- Proven track record of driving platform modernization and operational excellence.
Nice-to-Have Skills
- Familiarity with Go (Golang).
- Experience with additional observability platforms:
  - New Relic
  - Dynatrace
  - Elastic
  - Splunk Observability
Educational Qualification
- Bachelor’s or Master’s degree in:
  - Computer Science
  - Information Technology
  - Engineering
  - A related technical field
Key Competencies
- Site Reliability Engineering (SRE)
- Observability & Monitoring
- Kubernetes Operations
- AWS Cloud
- Datadog Administration
- Automation & Scripting
- CI/CD Integration
- Microservices Monitoring
- Platform Reliability
- Performance Optimization
- Incident Management
- Operational Excellence