Website:
iudx.org.in
Job details:
Job Description – Lead Data Scientist
Skills: OpenCV, PyTorch / TensorFlow, SciPy Stack, Machine Learning, Deep Learning, Computer Vision, NLP, LLMs, Knowledge Graphs, Semantic Web Technologies, Healthcare Analytics, Distributed AI Systems, MLOps
Years of Experience: 5 to 10 years
Description
The Center of Data for Public Good (CDPG) is an interdisciplinary research center at the Indian Institute of Science (IISc), Bangalore, focusing on Data Science and AI for translational research in domains such as Traffic & Transportation, Air Quality, Geospatial Analytics, Agriculture, and Health Care.
We are building large-scale AI-driven systems and Digital Public Infrastructure platforms that combine diverse multimodal data sources, including ANPR (automatic number plate recognition), telco, AVLS (automatic vehicle location system), incident feeds, surveillance camera video, geospatial datasets, healthcare records, clinical documents, and mobility datasets. These systems support intelligent decision-making, semantic reasoning, digital twins, traffic modeling, healthcare analytics, and other AI applications for public good.
As part of these initiatives, we are looking for experienced Data Scientists with strong expertise in Computer Vision, Machine Learning, NLP, and Large Language Models to build scalable, production-grade AI systems and research-driven solutions.
Responsibilities
• Develop Computer Vision, AI, and ML techniques to detect and classify vehicles and other road objects, track and re-identify them, and generate KPIs for traffic analytics, for deployment at scale across approximately 6000 cameras.
• Build large-scale vision applications, optimizing YOLO-like models for VRAM usage, latency, throughput, and model size through pruning, quantization, architecture tuning, and deployment optimization.
• Deploy models on high-performance inference systems such as Triton Inference Server and optimize inference pipelines, down to the CUDA/TensorRT level where needed.
• Work on end-to-end MLOps including experiment tracking, architecture search, model tuning, reproducibility, inference deployment, distributed training, and scalable AI infrastructure.
• Understand, reproduce, and build on the latest research papers in Computer Vision, Deep Learning, multimodal AI, and semantic AI systems.
• Develop NLP and LLM-powered systems for applications including semantic search, question-answering systems, summarization, knowledge extraction, and document understanding.
• Design and implement retrieval systems using techniques such as RAG (Retrieval-Augmented Generation), Knowledge Graph Augmented Generation (KAG), vector databases, embeddings, and semantic reasoning pipelines.
• Build and maintain Knowledge Graphs and semantic data models using technologies such as RDF/OWL, SPARQL, graph databases, and healthcare interoperability standards.
• Develop AI pipelines for extracting structured information from unstructured and multimodal data sources including PDFs, scanned documents, healthcare claims, clinical notes, reports, and video feeds.
• Work on multimodal AI systems integrating Computer Vision, NLP, Knowledge Graphs, geospatial systems, and distributed analytics infrastructure.
• Contribute to scalable distributed AI systems involving technologies such as Ray, Dask, Kubernetes, distributed inference, and large-scale model serving.
• Collaborate closely with interdisciplinary researchers, engineers, domain experts, and public-sector stakeholders across mobility, healthcare, and urban systems domains.
Good to See on Your Resume
• Experience with YOLO and other object detection/recognition models, including architecture understanding, fine-tuning, optimization, and deployment at scale.
• Strong experience with PyTorch, TensorFlow, CUDA, TensorRT, Triton, and GPU optimization workflows.
• Experience with NLP and LLM frameworks such as Hugging Face Transformers, LangChain, LlamaIndex, Haystack, vLLM, or TensorRT-LLM.
• Experience building RAG pipelines, semantic search systems, embedding pipelines, vector databases, and agentic AI workflows.
• Experience with Knowledge Graphs, RDF/OWL, SPARQL, Neo4j, GraphDB, ontology engineering, or semantic reasoning systems.
• Familiarity with healthcare interoperability standards and terminologies such as FHIR, HL7, ICD-10, SNOMED-CT, LOINC, or OMOP.
• Experience with MLOps and distributed computing tools such as MLflow, Kubeflow, Ray, Dask, Airflow, or Kubernetes.
• Strong grasp of Computer Vision, Machine Learning, Deep Learning, NLP, linear algebra, probability, statistics, and optimization fundamentals.
• Relevant publications or open-source contributions in Computer Vision, NLP, Healthcare AI, Knowledge Representation, or Distributed AI Systems.
• Ability to translate state-of-the-art research papers into scalable production-grade systems and applied AI solutions.
• Proficiency in Python with experience in multiprocessing, asynchronous programming, distributed systems, C/C++ wrappers, and high-performance data pipelines.
• Experience working with multimodal datasets involving text, images, structured data, video streams, and geospatial information.
CDPG at IISc offers an opportunity to work on cutting-edge applied AI systems with real-world impact across mobility, healthcare, and digital public infrastructure.