RealPage Investment Management
Website:
realpage.com
Job details:
Overview
We are looking for an end-to-end Data Scientist to design, build, and maintain ML-powered systems that solve core data quality and classification problems across the business. You will own the full lifecycle, from exploratory analysis and feature engineering through model training, deployment, and ongoing performance monitoring. The work spans entity resolution (identifying duplicate records across large datasets) and multi-class classification models that drive decision-making across a variety of business domains.
Responsibilities
What You'll Do
- Own the end-to-end model lifecycle: problem framing, data exploration, feature engineering, model training, evaluation, deployment, and monitoring
- Build and maintain entity resolution systems that detect duplicate records using supervised ML and string similarity techniques
- Develop classification models that categorize unstructured or semi-structured data into meaningful business categories
- Engineer features from messy, real-world text data — names, addresses, free-text fields — using string matching algorithms, phonetic encoding, n-grams, and other NLP techniques
- Design candidate retrieval and indexing strategies to make models performant at scale
- Tune thresholds, scoring logic, and rule-based overrides to balance precision and recall for production use cases
- Maintain production model artifacts and data pipelines, ensuring models stay current as underlying data evolves
- Collaborate with engineering and product teams to understand requirements and translate business problems into well-scoped modeling tasks
Qualifications
- 10+ years of experience building and deploying ML models end-to-end (not just notebooks)
- Strong Python skills — pandas, NumPy, scikit-learn, XGBoost or similar gradient boosting frameworks
- Hands-on experience with record linkage, entity resolution, or deduplication problems
- Experience building classification models (binary and multi-class) on structured and semi-structured data
- Deep familiarity with string similarity algorithms: edit distance, sequence matching, phonetic encoding, shingling
- Strong feature engineering instincts — ability to extract signal from noisy, inconsistently formatted data
- Comfort working with large serialized data structures and understanding memory/performance tradeoffs in production contexts
- Experience with SQL and relational databases (PostgreSQL or similar)
- Clear communication skills — ability to explain model behavior and tradeoffs to non-technical stakeholders
Nice to Have
- Experience with blocking and indexing strategies for scalable record linkage
- Background in NLP, text normalization, or information extraction
- Familiarity with model serving in API contexts (Flask, FastAPI, or similar)
- Experience in data quality, master data management, or marketplace domains
- Exposure to deep learning frameworks (PyTorch, TensorFlow) for text classification