Senior Data Scientist

RealPage Investment Management

full-time

Required skills

Python
API
communication skills
data structures
deep learning
end-to-end
FastAPI
Flask
NLP
NumPy
Pandas
PostgreSQL
SQL
TensorFlow
PyTorch

About the role

Website: realpage.com
Job details:
Overview

We are looking for an end-to-end Data Scientist to design, build, and maintain ML-powered systems that solve core data quality and classification problems across the business. You will own the full lifecycle, from exploratory analysis and feature engineering through model training, deployment, and ongoing performance monitoring. The work spans entity resolution (identifying duplicate records across large datasets) and multi-class classification models that drive decision-making across a variety of business domains.

Responsibilities

What You'll Do

Own the end-to-end model lifecycle: problem framing, data exploration, feature engineering, model training, evaluation, deployment, and monitoring
Build and maintain entity resolution systems that detect duplicate records using supervised ML and string similarity techniques
Develop classification models that categorize unstructured or semi-structured data into meaningful business categories
Engineer features from messy, real-world text data — names, addresses, free-text fields — using string matching algorithms, phonetic encoding, n-grams, and other NLP techniques
Design candidate retrieval and indexing strategies to make models performant at scale
Tune thresholds, scoring logic, and rule-based overrides to balance precision and recall for production use cases
Maintain production model artifacts and data pipelines, ensuring models stay current as underlying data evolves
Collaborate with engineering and product teams to understand requirements and translate business problems into well-scoped modeling tasks

Qualifications

10+ years of experience building and deploying ML models end-to-end (not just notebooks)
Strong Python skills — pandas, NumPy, scikit-learn, XGBoost or similar gradient boosting frameworks
Hands-on experience with record linkage, entity resolution, or deduplication problems
Experience building classification models (binary and multi-class) on structured and semi-structured data
Deep familiarity with string similarity algorithms: edit distance, sequence matching, phonetic encoding, shingling
Strong feature engineering instincts — ability to extract signal from noisy, inconsistently formatted data
Comfort working with large serialized data structures and understanding memory/performance tradeoffs in production contexts
Experience with SQL and relational databases (PostgreSQL or similar)
Clear communication skills — ability to explain model behavior and tradeoffs to non-technical stakeholders

Nice to Have

Experience with blocking and indexing strategies for scalable record linkage
Background in NLP, text normalization, or information extraction
Familiarity with model serving in API contexts (Flask, FastAPI, or similar)
Experience in data quality, master data management, or marketplace domains
Exposure to deep learning frameworks (PyTorch, TensorFlow) for text classification
