Sri Shankara Cancer Hospital & Research Centre
Website: shankaracancerhospitals.org
Job details:
Position: Data Science Intern – High-Dimensional Data Preprocessing
Type: Internship (Full-Time Preferred)
Duration: 2–6 Months
Location: On-site
Positions Available: 4
Position Overview
The institute invites applications for the role of Data Science Intern to support the preprocessing and preparation of large-scale, high-dimensional clinical datasets used in research analytics and data science workflows.
The intern will assist in transforming raw clinical data into analysis-ready datasets through structured data cleaning, validation, transformation, and feature preparation. These datasets require careful handling of data quality, missing values, and structural consistency.
This internship is designed to provide hands-on experience in real-world data science pipelines, with a focus on the data preprocessing layer, which is foundational for statistical analysis, machine learning, and AI model development.
Alongside project work, interns will participate in a structured weekly seminar program aimed at strengthening theoretical understanding and providing exposure to advanced analytics concepts and career pathways.
Key Responsibilities
1. Clinical Data Preprocessing
● Clean and standardize large, high-dimensional clinical datasets (datasets with many variables per record).
● Perform preprocessing tasks such as:
○ Missing data detection and handling
○ Outlier identification
○ Data normalization and scaling
○ Data transformation and restructuring
● Prepare structured datasets suitable for statistical analysis and machine learning models.
2. Data Quality and Validation
● Identify inconsistencies, duplicate records, and structural anomalies in datasets.
● Develop validation checks using Excel and Python-based scripts.
● Assist in implementing standardized data quality control frameworks.
3. Data Workflow Support
● Assist in organizing datasets into structured formats suitable for analytics workflows.
● Maintain documentation for preprocessing pipelines and transformation steps.
● Support reproducible workflows for research analytics.
Learning and Development
Interns will participate in a Weekly Data Science and Analytics Seminar Series designed to strengthen conceptual understanding and provide exposure to advanced analytical methods.
Seminar topics may include:
● Foundations of Statistical Analysis
● Data science workflows and research data pipelines
● Artificial intelligence, machine learning, deep learning, and large language models (AI/ML/DL/LLMs)
● Handling high-dimensional datasets
● Career pathways in data science, analytics, and AI research
The seminar series aims to support career development and interdisciplinary exposure for students pursuing analytics and data science careers.
Eligibility Criteria
Education
Students currently pursuing, or who have recently completed, one of the following:
● BSc / BTech / BE / BCA
● MSc / MTech / MCA
● MBA in Business Analytics / Data Analytics
● Postgraduate programs in Data Science or Analytics, or equivalent analytics programs
Candidates who have completed specialized analytics certification programs (e.g., data science bootcamps or analytics training programs) may also apply.
Technical Skills
Required
● Advanced proficiency in Microsoft Excel
● Working knowledge of Python for data analysis
● Understanding of data cleaning and preprocessing concepts
Basic Familiarity
● R programming (in addition to Python)
● SQL
Preferred
● Experience with large datasets
● Familiarity with pandas and NumPy
● Exposure to data analysis or machine learning workflows