Masscom Corporation
Website: masscomcorp.net
Job details:
We are looking for a hands-on Senior DevOps Engineer with deep expertise in Oracle Cloud Infrastructure (OCI) to serve as the primary technical owner of our cloud infrastructure. In this role, you aren't just an advisor; you are the individual responsible for the end-to-end lifecycle of our environment—including architecture, provisioning, routine maintenance, version upgrades, and rapid issue resolution. You will prioritize OCI-native solutions for our core workloads while managing the maintenance and performance of our secondary AWS footprint.
- This is a high-impact, individual contributor role for a practitioner with 7+ years of experience in direct infrastructure management.
- Availability: US daytime hours. Regular working hours are 9 am to 6 pm; on-call coverage runs 9 am to 9 pm every day, including weekends.
Responsibilities:
- Infrastructure Ownership: Directly manage the configuration, deployment, and health of the OCI environment. You are responsible for ensuring enterprise-scale reliability and technical performance.
- Maintenance & Upgrades: Own the lifecycle of all cloud resources. This includes performing regular patches, version upgrades (e.g., Kubernetes/OKE versions, database patching), and architectural refactoring to stay current with OCI best practices.
- Production Reliability & On-Call: Serve as the primary point of contact for production health. You must be available for a 7-day on-call rotation during US daytime hours to troubleshoot outages, resolve PagerDuty alerts, and perform root cause analysis.
- Hands-on Multi-Cloud Support: Execute the integration and day-to-day maintenance of secondary AWS environments, managing the plumbing (networking, storage, and compute) that connects OCI and AWS.
- Automation Execution: Build and maintain all infrastructure provisioning scripts using Terraform (OCI/AWS providers) and OCI Resource Manager. You are responsible for the integrity of the Infrastructure-as-Code codebase.
- Container Management: Manually manage and scale Oracle Container Engine for Kubernetes (OKE). This includes node pool maintenance, ingress controller configuration, and cluster security hardening.
- Security Implementation: Actively implement and tune security controls using OCI Cloud Guard and Maximum Security Zones. You are responsible for remediating any vulnerabilities or compliance gaps.
- Cost & Resource Tuning: Monitor and adjust OCI service limits and compute shapes (Flex) to ensure the environment is sized correctly for actual workload demands.
Required Skill Set:
- OCI Mastery: Deep, hands-on experience managing VCN, DRG, Service Gateway, Compute (Flex/Bare Metal), OKE, OCI Load Balancer, and Object Storage.
- AWS Proficiency: Hands-on experience maintaining EC2, VPC, S3, RDS, and EKS.
- Technical Maintenance: Proven experience in performing "zero-downtime" upgrades, database migrations, and infrastructure patching in live production environments.
- Incident Response: Expert-level experience with PagerDuty and incident command; able to work independently to resolve high-priority outages under pressure.
- Infrastructure as Code: Advanced Terraform skills (OCI Provider) and experience with OCI Resource Manager.
- Kubernetes Operations: Full-stack expertise in Oracle Container Engine for Kubernetes (OKE), including node pool management, CSI/CNI configuration, and security patching.
- Database Operations: Working knowledge of managing OCI Autonomous Database (ATP/ADW) and Exadata Cloud Service instances.
- Scripting: Proficiency in Python (OCI SDK) or Go to automate maintenance tasks and operational workflows.