Data Science - AI Document Understanding, Co-op

Min Experience

0 years

Location

Lehi, Utah, United States

Job Type

Part-time

About the job

This job is sourced from a job board.

About the role

Ancestry is seeking an exceptional and highly motivated Data Science Co-op to join our Content AI team, a dynamic group at the forefront of Document Understanding. You'll play a vital role in developing innovative AI models that extract and organize text and image information from billions of historical and genealogical records, enabling customers to discover, share, and connect with their family history.

As a Co-op on the Content AI team, you will build, train, and fine-tune models that process historical documents and surface meaningful, personalized insights that connect people to their ancestors. You will also work closely with engineering teams to train, optimize, and deploy models that support product development, customer success, and content creation across our Family History business.

What you will do

Innovate with State-of-the-Art AI: Implement and experiment with cutting-edge transformer and generative AI solutions for key Document Understanding tasks, including OCR, handwriting recognition, transcription, Named Entity Recognition (NER), Relation Extraction (RE), Coreference Resolution, Summarization, and Knowledge Graphs. You will work with diverse genealogical and historical collections spanning newspapers, city directories, family history books, and vital records (birth, marriage, death).

Analyze and Optimize Multi-Modal Models: Evaluate the performance of multi-modal models in zero-shot and few-shot learning scenarios for comprehensive document understanding.

Collaborate on Cloud Deployment: Partner closely with ML Ops and Data Science Engineers to deploy datasets, truth sets, models, and pipelines for training and inference in cloud environments.

Communicate Insights Effectively: Present your findings, deliverables, and proposed solutions clearly and confidently to technical and non-technical audiences, including teams, stakeholders, and executives.

Who you are

Currently pursuing an advanced degree (Master's or PhD preferred) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or a related quantitative field with a strong data focus.

Specialization in generative AI and LLMs, embeddings, LoRA, QLoRA, vector databases, transformer models, and Natural Language Processing (NLP), with software development expertise including data structures, distributed model training, and inference optimizations.

Strong proficiency in Python and relevant tools and libraries, including those for transformer models, multi-modal models, and general NLP (e.g., Hugging Face Transformers, agentic frameworks and workflows, LangChain, LangGraph, NLTK).

Familiarity with cloud platforms and related AI/ML services such as Google Gemini API, Vertex AI, AWS EC2, S3, SageMaker, Model Registry, and Bedrock is a plus.

About the company

When you join Ancestry, you join a human-centered company where every person's story is important. Ancestry®, the global leader in family history, empowers journeys of personal discovery to enrich lives. With our unparalleled collection of more than 40 billion records, over 3 million subscribers and over 23 million people in our growing DNA network, customers can discover their family story and gain a new level of understanding about their lives. Over the past 40 years, we've built trusted relationships with millions of people who have chosen us as the platform for discovering, preserving and sharing the most important information about themselves and their families.

Skills

Python
transformer models
LLMs
embeddings
LoRA
QLoRA
vector databases
NLP
data structures
distributed model training
inference optimizations