Website:
ksainc.in
Job details:
We are seeking an experienced C++ AI Inference Engineer to design, optimize, and deploy high-performance AI inference engines using modern C++ and processor-specific optimizations. You will collaborate with research teams to productionize cutting-edge AI model architectures for CPU-based inference.
Key Responsibilities
- Collaborate with research teams to understand AI model architectures and requirements
- Design and implement AI model inference pipelines using C++17/20 and SIMD intrinsics (AVX2/AVX-512)
- Optimize for the cache hierarchy, implement NUMA-aware memory allocation, and tune matrix multiplication (GEMM) kernels
- Develop operator fusion techniques and CPU inference engines for production workloads
- Write production-grade, thread-safe C++ code with comprehensive unit testing
- Profile and debug performance on Linux using perf, Intel VTune, and flame graphs
- Conduct code reviews and ensure compliance with coding standards
- Stay current with HPC, OpenMP, and modern C++ best practices
Required Technical Skills
Core Requirements:
- Modern C++ (C++17/20), including smart pointers and C++20 coroutines and concepts
- SIMD intrinsics: AVX2 required, AVX-512 strongly preferred
- Cache optimization: L1/L2/L3 prefetching and data-locality awareness
- NUMA-aware programming for multi-socket systems
- GEMM/blocked matrix multiplication kernel implementation
- OpenMP 5.0+ for parallel computing
- Linux performance profiling (perf, Valgrind, sanitizers)
Strongly Desired
- High-performance AI inference engine development
- Operator fusion and kernel fusion techniques
- HPC (High-Performance Computing) experience
- Memory management and allocation optimization
Qualifications
- Bachelor's/Master's in Computer Science, Electrical Engineering, or a related field
- 3-7+ years of proven C++ development experience
- Linux/Unix expertise with strong debugging skills
- Familiarity with linear algebra, numerical methods, and performance analysis
- Experience with multi-threading, concurrency, and memory management
- Strong problem-solving and analytical abilities
Preferred Qualifications
- Knowledge of PyTorch/TensorFlow C++ backends
- Real-time systems or embedded systems background
- ARM SVE, RISC-V vector extensions, or Intel ISPC experience
What You Will Work On
- Production-grade AI inference libraries powering LLMs and vision models
- CPU-optimized inference pipelines targeting sub-millisecond latency
- Cross-platform deployment across Intel Xeon, AMD EPYC, and ARM architectures
- Performance optimizations that reduce inference costs by 3-5x
Skills: high-performance computing (HPC), C++, multithreading, SIMD