Backend Platform Engineer

Salary

₹30 - 50 LPA

Min Experience

2 years

Location

Bangalore

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

Backend Engineer (JD)

About Simplismart
At Simplismart, we're building the future of AI infrastructure.

ChatGPT boasts 100M weekly active users, and every major tech company is moving towards AI-based applications. After 2005, every company became an internet company; by 2030, every company will be an AI company. Simplismart aims to build the infrastructure that lets these companies harness the power of AI in their applications.

We have built the fastest inference engine for generative AI models, allowing users to fine-tune and deploy any generative model in their cloud or ours. Deploying ML models in-house is a huge pain, and our end-to-end platform offers easy orchestration and pipeline optimisation so that companies can spend time on their core product instead.

We're a fun team of veterans from Uber, AWS, Oracle Cloud, Google Search, and many more companies. We're well-funded and recently raised a Series A round led by Accel and some marquee angels (founders of Notion, Tata1mg, etc.).

Job Description
This role requires a strong background in machine learning, proficiency in relevant programming languages and tools, a willingness to embrace challenges, and a commitment to the best software development and testing practices. Additionally, familiarity with cloud platforms and a dedication to staying current with industry trends are essential for success in this role.

Who we are looking for:

  1. Python Experience: 3+ years of experience with Python.
  2. Generative AI Experience: You must be proficient with LLMs such as Llama and Mistral, as well as other generative AI models like Whisper and Stable Diffusion.
  3. Cloud Experience: You should be familiar with cloud computing platforms, with a preference for expertise in AWS and knowledge of platforms like Google Cloud Platform (GCP) or Microsoft Azure.
  4. Experience with Docker and Kubernetes: You should be proficient in Docker and Kubernetes.
  5. Test-Driven Development: Belief in and adherence to Test-Driven Development practices is essential. This means writing tests before writing code to ensure the quality and correctness of your work.
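To illustrate the red-green cycle this implies, here is a minimal, hypothetical sketch (the `batch_prompts` helper and its tests are illustrative, not part of Simplismart's codebase): the tests are written first, then the smallest implementation that makes them pass.

```python
# Test-driven development in miniature: the assertions below were written
# before the implementation, defining the expected behaviour up front.

def batch_prompts(prompts, batch_size):
    """Split a list of prompts into consecutive batches of at most batch_size."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

# The "red" tests, written before the implementation existed:
assert batch_prompts(["a", "b", "c", "d"], 2) == [["a", "b"], ["c", "d"]]
assert batch_prompts(["a", "b", "c"], 2) == [["a", "b"], ["c"]]  # last batch smaller
assert batch_prompts([], 2) == []                                # empty input
```

Writing the assertions first forces the function's contract (including the empty-input and uneven-split edge cases) to be decided before any code exists.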

Good to Have:

  1. Deep understanding of GPU Architecture: You should have a deep knowledge of GPU Architectures like A100, A10G, and T4 chips. Experience with CUDA is a plus.
  2. Familiarity with LLM optimization techniques: A working understanding of techniques such as quantization, speculative decoding, and continuous batching.
Responsibilities:

  1. Design and Develop Scalable Machine Learning Systems: You will collaborate with the tech team to design and build machine learning systems that are scalable and production-ready from the start. This involves the end-to-end development of machine learning models and pipelines; you should be able to deploy and benchmark an ML model in under 30 minutes.
  2. Conduct Extensive Research: You'll stay current with the latest machine learning technologies and research to identify the best approaches and tools for the job.
  3. Improve Metrics: You will develop strategies for improving metrics using real-world data, optimizing and fine-tuning machine learning models to achieve better results.
  4. Infrastructure Improvements: You'll help enhance and extend existing infrastructure, adding new features and optimizing performance.
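For flavour, the quantization technique mentioned under "Good to Have" can be sketched in a few lines. This is a toy symmetric int8 scheme for illustration only, not Simplismart's implementation:

```python
def quantize_int8(weights):
    """Toy symmetric per-tensor int8 quantization (illustrative only).

    Maps floats into the integer range [-127, 127] using a single scale
    derived from the largest absolute weight.
    """
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 for w, r in zip(weights, restored))
```

Production schemes (per-channel scales, zero points, activation calibration) are considerably more involved, but the core idea is this float-to-int mapping that shrinks memory and speeds up inference.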

Why should you join Simplismart?

Well, let's break away from the conventional perks and instead focus on what you WON’T experience here:

  • Legacy System Headaches: You won't have to endlessly grapple with outdated legacy systems that hinder your productivity and creativity.
  • Bossy Culture: At Simplismart, we believe in collaboration and empowerment, not hierarchy. You won't have a boss breathing down your neck; instead, you'll have colleagues who support your growth.
  • Dark Circles: Late nights and overwork are not the norm here. We prioritize work-life balance, ensuring you won't be sporting those tired, dark circles under your eyes.
  • Stagnation: Say goodbye to redundant and stagnant tasks. We thrive on innovation and dynamic challenges that keep you engaged and motivated.


About the company

About us
Fastest inference for generative AI workloads. Simplified orchestration via a declarative language similar to terraform. Deploy any open-source model and take advantage of Simplismart’s optimised serving. With a growing quantum of workloads, one size does not fit all; use our building blocks to personalise an inference engine for your needs.

API vs In-house

Renting AI via third-party APIs has clear downsides: data security, rate limits, unreliable performance, and inflated cost. Every company has different inferencing needs, and one size does not fit all: businesses need control to manage their cost vs. performance trade-offs. Hence the movement towards open source: businesses prefer small, niche models trained on relevant datasets over large generalist models that do not justify the ROI.

Need for MLOps platform

Deploying large models comes with its own hurdles: access to compute, model optimisation, scaling infrastructure, CI/CD pipelines, and cost efficiency, all requiring highly skilled machine learning engineers. Just as tooling supported the transitions to cloud and mobile, we need tools to support this shift towards generative AI. MLOps platforms simplify orchestration workflows for in-house deployment cycles. Two kinds of off-the-shelf solutions are readily available:

  1. Orchestration platforms with a model-serving layer: these do not offer optimised performance for all models, limiting users' ability to squeeze out performance.
  2. GenAI cloud platforms: GPU brokers offering no control over cost.

Enterprises need control. Simplismart's MLOps platform gives them the building blocks to assemble the inference setup they need. The inference engine lets businesses run each model at performant speed: it has been optimised at three levels (the model-serving layer, the infrastructure layer, and the model-GPU-chip interaction layer) and is further enhanced with a known model compilation technique.

Skills

Python
Backend