GPU Engineer (2-7 years of Experience) | Immediate Joiners

YMinds.AI

full-time

Required skills

C++
CUDA
deep learning
embedded systems
GPU
kernel
system integration
TensorFlow
PyTorch

About the role

Website: yminds.ai
Job details:

Our client is seeking a highly skilled GPU Programming Engineer to develop, optimize, and deploy GPU-accelerated solutions for high-performance deep learning workloads. The ideal candidate will have hands-on experience in GPU programming using technologies such as CUDA, HIP, ROCm, or OpenCL, along with strong expertise in parallel computing, performance optimization, and DL system integration. You will work closely with architects and engineering teams to build scalable, high-performance compute solutions across heterogeneous hardware platforms.

Key Responsibilities

Develop, optimize, and maintain GPU-accelerated modules for deep learning and HPC pipelines using CUDA, HIP, ROCm, or OpenCL.
Analyze and improve GPU kernel performance through profiling, benchmarking, and optimization techniques.
Optimize:
Memory access patterns
Compute throughput
Kernel execution efficiency
Shared memory utilization
Port CPU-based implementations to GPU platforms while ensuring scalability and correctness.
Work with system architects, software engineers, and AI teams to integrate GPU-accelerated solutions.
Profile and debug GPU applications using tools such as:
NVIDIA Nsight
rocprof
Perfetto
Contribute to performance tuning for AI/ML workloads and HPC systems.
Ensure software quality, maintainability, and high-performance standards across platforms.

Required Skills

Bachelor’s or Master’s degree in:
Computer Science
Electrical Engineering
Related technical fields
2–7 years of hands-on experience in GPU programming.

Strong expertise in:

CUDA
HIP / ROCm
OpenCL

Strong understanding of:

GPU architecture
Parallel programming models
Memory hierarchy
Shared memory & bank conflicts
Proficiency in C/C++ programming.
Experience working in Linux-based environments.
Familiarity with GPU profiling and performance tuning tools.
Strong debugging and analytical problem-solving skills.

Nice-to-Have Skills

Knowledge of SIMD programming.
Understanding of neural network operators.
Hands-on experience with:
PyTorch
TensorFlow
TensorRT
Exposure to deep learning optimization techniques such as:
Quantization
Pruning
Experience with high-performance computing (HPC) systems.
Familiarity with AI hardware accelerators and heterogeneous computing environments.

About YMinds.AI

YMinds.AI is a next-generation recruitment platform delivering pre-verified, industry-ready AI and engineering talent. By combining proprietary AI-driven assessments with expert human evaluation, we help organizations hire from the top 1% of talent across AI, HPC, Cloud, Embedded Systems, and Advanced Engineering domains. Our rigorous verification process ensures candidates are production-ready, enabling faster, smarter, and more confident hiring decisions.

Keywords

GPU Programming Engineer, CUDA, HIP, ROCm, OpenCL, HPC, Deep Learning, Parallel Computing, TensorRT, PyTorch, TensorFlow, C++, Linux, AI Acceleration, GPU Optimization

Hashtags

#GPUProgramming #CUDA #HPC #DeepLearning #ParallelComputing #AIEngineering #TensorRT #OpenCL #ROCm #MachineLearning #Linux #CPlusPlus #TechJobs #BangaloreJobs
