Data Scientist - Optical Character Recognition

Salary

₹20 - 30 LPA

Min Experience

5 years

Location

Bengaluru, Karnataka, India

Job Type

Full-time

About the job

This job is sourced from a job board.

About the role

Job Description: Data Scientist

Company: Mastech Digital

Location: Bangalore Urban, Karnataka, India

Position Type: Full Time

Duration: Permanent

Notice Period: Immediate Joiner / Serving Notice / Less than 30 Days

Experience: 5+ Years

About The Role

Mastech Digital is seeking a highly skilled and experienced Data Scientist to join our dynamic team. In this role, you will be responsible for developing and deploying advanced AI models, with a focus on OCR, LLMs, and computer vision. You will work within the AWS ecosystem, adhering to best practices for code quality, data security, and model deployment. This position requires a strong understanding of machine learning techniques, cloud technologies, and the ability to collaborate effectively with cross-functional teams.

Responsibilities/Duties

AI Model Development And Deployment

  • Train and fine-tune OCR models and Large Language Models (LLMs).
  • Develop and implement computer vision models for object detection and segmentation.
  • Deploy and maintain models in production, collaborating with software engineers.

Cloud Infrastructure And Architecture

  • Utilize AWS services, including SageMaker, Bedrock, Lambda, S3, and API Gateway, for model development and deployment.
  • Adhere to the AWS Well-Architected Framework for robust and scalable solutions.

Data Management And Security

  • Perform data cleaning and preprocessing to ensure high-quality training data.
  • Ensure data confidentiality and implement HIPAA compliance measures.

Software Development Practices

  • Follow internal best practices for code monitoring, testing, and version control.
  • Implement CI/CD pipelines using Jenkins and other relevant tools.
  • Conduct thorough QA and application testing.

Model Evaluation And Optimization

  • Perform robust testing of models to ensure accuracy and reliability.
  • Compare the feasibility of different models and select the most appropriate solution.
  • Fine-tune LLMs (Mistral, Llama, and other open-source models) and perform prompt tuning.

Collaboration And Communication

  • Collaborate with other data scientists to divide work and ensure timely project completion.
  • Meet project deadlines and provide regular updates in weekly/bi-weekly meetings.
  • Create data visualizations to communicate results to non-technical stakeholders.
  • Test and implement NER models.

Hugging Face And Related Technologies

  • Work with Hugging Face packages.

Skills

Programming And Data Science

  • Proficient in Python.
  • Strong SQL skills.
  • Experience with data cleaning and big data processing.
  • Experience with OCR and NER models.

Cloud Technologies (AWS)

  • Extensive experience with AWS SageMaker, Bedrock, Lambda, S3, and API Gateway.
  • Proficiency in using the Amazon Textract API.

Machine Learning And AI

  • Experience with training and fine-tuning LLMs (Mistral, Llama, etc.).
  • Proficiency in prompt tuning.
  • Experience with computer vision models for object detection and segmentation.

DevOps And CI/CD

  • Experience with CI/CD pipelines and version control systems.
  • Proficiency in using Jenkins.

Hugging Face

  • Familiarity with Hugging Face packages.

Qualifications

  • 5+ years of experience as a Data Scientist.
  • Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
  • Strong understanding of machine learning algorithms and techniques.
  • Excellent problem-solving and analytical skills.
  • Strong communication and collaboration skills.
  • Ability to work independently and as part of a team.

Preferred Qualifications

  • Experience with healthcare data and HIPAA compliance.
  • AWS certifications.
  • Experience with advanced computer vision techniques.

(ref:hirist.tech)

About the company

Mastech Digital

Skills

python
sql
ocr
ner
aws
sagemaker
bedrock
lambda
s3
api-gateway
textract
llm
prompt-tuning
computer-vision
object-detection
segmentation
ci-cd
jenkins
huggingface