AI Infrastructure Engineer

Salary

₹12 - 18 LPA

Min Experience

2 years

Location

Los Angeles, San Francisco, Palo Alto, Toronto

Job Type

Full-time

About the job

This job is sourced from a job board.

About the role

As an AI Infrastructure Engineer at HeyGen, you will be responsible for designing, building, and maintaining the infrastructure that powers our cutting-edge artificial intelligence models and applications. You will work closely with our AI research and engineering teams to ensure that our infrastructure is scalable, reliable, and efficient.

Key Responsibilities:

- Design and build scalable, fault-tolerant, and highly available infrastructure for AI workloads
- Implement and maintain distributed systems for data processing, model training, and inference
- Automate and optimize infrastructure provisioning and deployment processes
- Monitor and troubleshoot infrastructure issues to ensure high availability and performance
- Collaborate with cross-functional teams to identify and address infrastructure bottlenecks
- Research and evaluate new technologies and tools to improve the efficiency and reliability of our infrastructure

Requirements:

- Strong experience in building and managing cloud-based infrastructure, ideally on AWS, Google Cloud, or Azure
- Proficiency in infrastructure-as-code tools like Terraform, CloudFormation, or Ansible
- Expertise in container technologies like Docker and Kubernetes
- Familiarity with distributed systems, data processing frameworks (e.g., Apache Spark, Kafka), and machine learning infrastructure
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK)
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills

About the company

HeyGen is a leading artificial intelligence company that develops cutting-edge AI models and applications for a wide range of industries. Our mission is to push the boundaries of what's possible with AI and to create products that have a positive impact on the world.

Skills

aws
terraform
kubernetes
docker
spark
kafka
prometheus
grafana
elk