Sr. C++ Developer || AI || HPC || Bangalore

KSA INC

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

C++
compliance
embedded systems
Fusion
Linux
TensorFlow
Unix
PyTorch

About the role

Website: ksainc.in
Job details:
We are seeking an experienced C++ AI Inference Engineer to design, optimize, and deploy high-performance AI inference engines using modern C++ and processor-specific optimizations. You will collaborate with research teams to productionize cutting-edge AI model architectures for CPU-based inference.

Key Responsibilities

- Collaborate with research teams to understand AI model architectures and requirements
- Design and implement AI model inference pipelines using C++17/20 and SIMD intrinsics (AVX2/AVX-512)
- Optimize cache hierarchy, NUMA-aware memory allocation, and matrix multiplication (GEMM) kernels
- Develop operator fusion techniques and CPU inference engines for production workloads
- Write production-grade, thread-safe C++ code with comprehensive unit testing
- Profile and debug performance using Linux tools (perf, VTune, flamegraphs)
- Conduct code reviews and ensure compliance with coding standards
- Stay current with HPC, OpenMP, and modern C++ best practices

Required Technical Skills

Core Requirements:

- Modern C++ (C++17/20) with smart pointers, coroutines, and concepts
- SIMD intrinsics - AVX2 required, AVX-512 strongly preferred
- Cache optimization - L1/L2/L3 prefetching and locality awareness
- NUMA-aware programming for multi-socket systems
- GEMM/blocked matrix multiplication kernel implementation
- OpenMP 5.0+ for parallel computing
- Linux performance profiling (perf, valgrind, sanitizers)

Strongly Desired

- High-performance AI inference engine development
- Operator fusion and kernel fusion techniques
- HPC (High-Performance Computing) experience
- Memory management and allocation optimization

Qualifications

- Bachelor's/Master's degree in Computer Science, Electrical Engineering, or a related field
- 3-7+ years of proven C++ development experience
- Linux/Unix expertise with strong debugging skills
- Familiarity with linear algebra, numerical methods, and performance analysis
- Experience with multi-threading, concurrency, and memory management
- Strong problem-solving and analytical abilities

Preferred Qualifications

- Knowledge of PyTorch/TensorFlow C++ backends
- Real-time systems or embedded systems background
- ARM SVE, RISC-V vector extensions, or Intel ISPC experience

What You Will Work On

- Production-grade AI inference libraries powering LLMs and vision models
- CPU-optimized inference pipelines for sub-millisecond latency
- Cross-platform deployment across Intel Xeon, AMD EPYC, and ARM architectures
- Performance optimizations reducing inference costs by 3-5x

Skills: high-performance computing (HPC), C++, multithreading, SIMD
