About the role
We are seeking a highly analytical and solution-oriented Python Developer with a strong background in data science and large language models (LLMs). The ideal candidate will possess deep expertise in troubleshooting complex issues and developing robust, scalable solutions. This role requires a blend of software engineering and data science skills in Natural Language Processing (NLP), with an emphasis on document text extraction, model execution, optimization, and implementation in production environments.
Responsibilities:
Design, develop, and deploy Python-based solutions to accurately extract and normalize information from documents
Use cloud-based text extraction models and LLMs from OpenAI and Anthropic
Work with NLP models for language processing and data classification
Evaluate and compare model outputs
Troubleshoot data pipelines and model performance, and implement effective solutions to performance and accuracy issues
Collaborate with cross-functional teams to understand requirements, design solutions, and deliver high-quality code.
Ensure code quality, documentation, and maintainability across projects.
Optimize existing models and systems for efficiency and scalability.
Requirements:
Minimum of 5 years of experience in Python with a focus on NLP
1 year of hands-on integration and model execution experience with LLMs from OpenAI and/or Anthropic
Experience working with relational databases such as PostgreSQL and NoSQL databases such as MongoDB
Engineering degree in Computer Science or equivalent
Solid understanding of software development best practices and data pipeline design
Experience with Git, CI/CD pipelines, and cloud deployment (AWS or Azure).
Strong problem-solving skills and ability to work in an agile environment.
Excellent communication and interpersonal skills.
Preferred Skills (Nice to Have):
Experience working with text extraction models (e.g., OCR, NLP-based extractors).
Experience implementing fuzzy (non-exact) matching logic using tools like OpenSearch or similar technologies.
Familiarity with international trade documents and domain-specific data.