Forage AI
Website: forage.ai
Job details:
Location: Remote (Work from Home)
About Forage AI
Forage AI builds next-generation systems for large-scale data collection, processing, and automation. Our platforms handle web crawling, document parsing, distributed data pipelines, and AI-driven data systems.
Our stack is primarily Python and cloud-native infrastructure on AWS, with exposure to GCP and Azure. We emphasize reliable infrastructure, scalable architecture, and strong engineering practices. Increasingly, we are integrating GenAI and AI agents into our systems.
We operate in a high-ownership environment, where engineers are responsible not only for building systems but also for ensuring they are reliable, scalable, and production-ready.
Role Overview:
We are looking for a DevOps Engineer with strong software engineering capabilities. This role focuses on designing and operating cloud infrastructure, CI/CD systems, deployment automation, and observability platforms, while also writing production-grade automation and platform tooling in Python.
You will work closely with developers to build reliable, scalable, and secure infrastructure that powers large-scale data processing systems.
This role requires someone who can think like both an infrastructure engineer and a software engineer.
Key Responsibilities:
Cloud Infrastructure & System Design
- Design and manage scalable cloud infrastructure primarily on AWS.
- Architect systems using services such as EC2, ECS/EKS, Lambda, S3, RDS, DynamoDB, SQS/SNS, and CloudWatch.
- Design cloud-native architectures that are resilient, scalable, and cost-efficient.
- Build systems that remain cloud-agnostic where possible.
CI/CD & Deployment Automation
- Build and maintain CI/CD pipelines for automated testing, build, and deployment.
- Improve deployment reliability and speed through automation and release strategies.
- Implement blue-green, rolling, and canary deployments.
Infrastructure as Code
- Manage infrastructure using Terraform, CloudFormation, or similar IaC tools.
- Automate infrastructure provisioning and environment setup.
Platform Engineering & Developer Productivity
- Build internal tools and automation scripts in Python to improve developer workflows.
- Create standardized deployment frameworks for applications and services.
- Support development teams with infrastructure design and operational best practices.
Observability & Reliability
- Implement logging, monitoring, alerting, and tracing systems.
- Maintain system reliability through proactive monitoring and incident response.
- Define and monitor SLOs, SLAs, and system health metrics.
Security & Best Practices
- Implement security best practices including IAM policies, secrets management, and least-privilege access.
- Ensure infrastructure and deployments follow secure and compliant patterns.
Collaboration
- Work closely with software engineers, QA teams, and data engineers.
- Participate in architecture discussions and infrastructure design reviews.
Required Qualifications:
- 5–8 years of experience in DevOps, infrastructure engineering, or backend platform engineering.
- Strong experience with AWS cloud infrastructure.
- Hands-on experience with CI/CD systems (GitHub Actions, Jenkins, GitLab CI, etc.).
- Experience with containerization technologies such as Docker.
- Exposure to container orchestration platforms such as Kubernetes or ECS.
- Strong scripting and programming skills in Python and SQL.
- Experience with Linux systems and networking fundamentals.
- Understanding of distributed systems and microservices architectures.
- Strong experience with Git and modern development workflows.
Preferred / Good to Have (Prioritized)
Infrastructure & DevOps
- Infrastructure as Code using Terraform or CloudFormation
- Experience running Kubernetes clusters in production
- Experience with AWS cost optimization and scaling strategies
Data Platforms
- Exposure to data pipelines and distributed data systems
- Experience with Airflow, Kafka, Spark, or large-scale ETL workflows
Observability & Reliability
- Experience with Prometheus, Grafana, ELK stack, or Datadog
Multi-Cloud Exposure
- Experience with GCP or Azure
GenAI / AI Infrastructure (Nice to have)
- Experience building infrastructure for LLM pipelines, vector databases, or AI systems
How We Work
- Engineers own systems end-to-end: design → build → deploy → operate
- Strong emphasis on automation, reliability, and observability
- Incremental delivery through small pull requests and clear design discussions
- Collaborative culture with high ownership and accountability
Work-from-Home Requirements:
- High-speed internet for calls and collaboration.
- A capable, reliable computer (modern CPU, 16 GB+ RAM).
- Headphones with clear audio quality.
- Stable power and backup arrangements.
Forage AI is an equal opportunity employer. We value curiosity, craftsmanship, and collaboration.
Click Apply to learn more.