Website:
yminds.ai
Job details:
About the Role
Our client is seeking a highly skilled GPU Programming Engineer to develop, optimize, and deploy GPU-accelerated solutions for high-performance deep learning workloads. The ideal candidate will have hands-on experience in GPU programming using technologies such as CUDA, HIP, ROCm, or OpenCL, along with strong expertise in parallel computing, performance optimization, and DL system integration. You will work closely with architects and engineering teams to build scalable, high-performance compute solutions across heterogeneous hardware platforms.
Key Responsibilities
- Develop, optimize, and maintain GPU-accelerated modules for deep learning and HPC pipelines using CUDA, HIP, ROCm, or OpenCL.
- Analyze and improve GPU kernel performance through profiling, benchmarking, and optimization techniques.
- Optimize:
  - Memory access patterns
  - Compute throughput
  - Kernel execution efficiency
  - Shared memory utilization
- Port CPU-based implementations to GPU platforms while ensuring scalability and correctness.
- Work with system architects, software engineers, and AI teams to integrate GPU-accelerated solutions.
- Profile and debug GPU applications using tools such as:
  - NVIDIA Nsight
  - rocprof
  - Perfetto
- Contribute to performance tuning for AI/ML workloads and HPC systems.
- Ensure software quality, maintainability, and high-performance standards across platforms.
Required Skills
- Bachelor’s or Master’s degree in:
  - Computer Science
  - Electrical Engineering
  - Related technical fields
- 2–7 years of hands-on experience in GPU Programming.
- Strong expertise in and understanding of:
  - GPU architecture
  - Parallel programming models
  - Memory hierarchy
  - Shared memory & bank conflicts
- Proficiency in C/C++ programming.
- Experience working in Linux-based environments.
- Familiarity with GPU profiling and performance tuning tools.
- Strong debugging and analytical problem-solving skills.
Nice-to-Have Skills
- Knowledge of SIMD Programming.
- Understanding of Neural Network Operators.
- Hands-on experience with:
  - PyTorch
  - TensorFlow
  - TensorRT
- Exposure to Deep Learning optimization techniques such as:
  - Quantization
  - Pruning
- Experience with High Performance Computing (HPC) systems.
- Familiarity with AI hardware accelerators and heterogeneous computing environments.
About YMinds.AI
YMinds.AI is a next-generation recruitment platform delivering pre-verified, industry-ready AI and engineering talent. By combining proprietary AI-driven assessments with expert human evaluation, we help organizations hire from the top 1% of talent across AI, HPC, Cloud, Embedded Systems, and Advanced Engineering domains. Our rigorous verification process ensures candidates are production-ready, enabling faster, smarter, and more confident hiring decisions.
Keywords
GPU Programming Engineer, CUDA, HIP, ROCm, OpenCL, HPC, Deep Learning, Parallel Computing, TensorRT, PyTorch, TensorFlow, C++, Linux, AI Acceleration, GPU Optimization
Hashtags
#GPUProgramming #CUDA #HPC #DeepLearning #ParallelComputing #AIEngineering #TensorRT #OpenCL #ROCm #MachineLearning #Linux #CPlusPlus #TechJobs #BangaloreJobs