Shuru
Website:
shurutech.com
Job details:
This is a remote position.
Shuru is a self-managed technology team specializing in accelerating visions through product, technology, and AI leadership. With a focus on bespoke execution, we deliver impactful solutions that are scalable and designed for success. At Shuru, we also build mobile solutions that meet and exceed customer expectations. Our collaborative and fast-paced environment encourages creativity and innovation.
We’re hiring a Senior DevOps Engineer to help take our cloud platform from pre-production to production readiness and scale.
In this role, you will work closely with our engineering and data teams to bring infrastructure under code, improve deployment pipelines, set up monitoring and alerting, and support the production deployment of data pipelines and risk oracle workloads. This is a high-ownership role for someone who can operate independently, make pragmatic infrastructure decisions, and help us balance speed, cost, scalability, and complexity as we build the foundation for our platform.
- Assess and harden our current platform setup, primarily on GCP, so our core infrastructure, application services, data workloads, and deployment pipelines are production-ready.
- Bring infrastructure under Infrastructure-as-Code using Terraform or similar, so cloud resources are defined in code, reviewed through Git, and reproducible across environments.
- Standardize development, staging, and production environments, including configuration, secrets, environment isolation, deployment patterns, rollback processes, and autoscaling.
- Design and operate the platform's networking layer: VPC architecture, private connectivity, load balancing, egress controls, and edge protection (e.g., Cloud Armor or equivalent WAF).
- Lead the decision on when Cloud Run is sufficient and when workloads justify GKE or another orchestration approach, balancing cost, scalability, reliability, complexity, and speed of delivery.
- Harden GitHub Actions pipelines across build, test, deploy, database migrations, image tagging, environment promotion, release validation, and rollback.
- Set up monitoring, logging, tracing, alerting, dashboards, and reliability targets across the platform, so issues surface quickly and are easy to diagnose.
- Set up and operate the platform capabilities needed to deploy AI models and agents in production, including model hosting, model/API gateways, prompt and version tracking, agent observability, evaluation workflows, cost monitoring, and reliability controls.
- Work with the data team to productionize data pipelines and risk oracle workloads, including serving, monitoring, scheduled/background runs, and future training workflows where needed.
- Establish secrets management, audit logging, IAM, and access patterns, with tested backup, restore, and disaster recovery procedures (with defined RPO/RTO targets) appropriate for a fintech handling sensitive financial deal data.
- Contribute to operational runbooks, on-call practices, incident response and postmortems, production readiness reviews (including load and performance testing), and infrastructure documentation.
- Monitor and optimize cloud costs as usage grows, especially across compute, databases, storage, vector infrastructure, and AI workloads.
- Collaborate with the CTO, CDO, and senior engineers on architecture decisions with infrastructure, reliability, security, cost, or operational impact.
Requirements
• 5+ years of DevOps, Platform, or SRE experience, ideally with at least 2 years working on GCP; experience with Vertex AI, AWS, or Azure is a plus.
• Hands-on production experience with Infrastructure-as-Code tools such as Terraform, Pulumi, CDK, or similar, including managing separate development, staging, and production environments, and helping set up consistent local or individual developer environments.
• Strong CI/CD experience, especially with GitHub Actions or similar, including build, test, release, rollback, and quality/security gates such as static analysis, dependency scanning, secret scanning, and container image scanning.
• Experience deploying and operating containerized services using Cloud Run, Kubernetes/GKE, ECS, or similar platforms; comfortable writing and optimizing Dockerfiles (multi-stage builds, image hardening) and managing container registries (Artifact Registry, ECR, or similar).
• Good judgment on when to use managed or serverless platforms versus Kubernetes or other orchestrated approaches, balancing cost, scalability, reliability, operational complexity, and speed of delivery.
• Experience operating production data and caching infrastructure, including Cloud SQL/Postgres, Redis/Memorystore, migrations, backup strategies, performance monitoring, and basic tuning.
• Experience setting up production monitoring, logging, alerting, dashboards, and reliability targets using cloud-native monitoring, Sentry, Datadog, Grafana, Prometheus, or similar.
• Solid understanding of cloud security fundamentals, including IAM, secrets management, audit logging, network controls, and backup/recovery.
• Experience supporting workflow orchestration or async task systems in production, such as Temporal, Celery, or similar.
• Experience supporting ML or AI inference workloads in production, with strong hands-on experience across vector databases (Weaviate a plus), retrieval infrastructure, AI application infrastructure, and managed AI platforms.
• Exposure to model or agent deployment patterns, including real-time and background inference workflows, monitoring, evaluation workflows, and agent observability.
Nice-to-Have
• Prior fintech, financial services, or other regulated-industry experience.
• Familiarity with financial services security and data privacy expectations, such as MAS TRM Guidelines, PDPA, GDPR, and ISO 27001-aligned security practices.
• Experience with managed AI training pipelines or broader model lifecycle workflows.
• Background or interest in climate finance, project finance, energy, or infrastructure sectors.
Benefits
- Work on global projects with clients from around the world.
- Be part of a remote-first culture: work from anywhere, with flexibility.
- Enjoy team-building activities and regular outings.
- Collaborate and grow in a supportive environment with opportunities to learn from senior engineers.
- Competitive salary and benefits package.
Click Apply to learn more.