Web Crawling & Scraping Engineer

Min Experience

0 years

Location

remote

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

We seek a highly skilled and motivated Web Crawling & Scraping Engineer. We crawl over 100k news websites daily, and we are looking for someone who is as passionate about web crawling as we are and curious to explore new ways of handling difficult websites and automating our crawling techniques.

Functions:

Crawling Platform:
Design, construct, test, and maintain robust, reliable, and scalable crawling pipeline infrastructure.
Add an automatic way of fixing non-working crawlers.
Provide metrics on website coverage.

Data Pipeline:
Design, construct, test, and maintain robust and reliable data pipeline infrastructure.
Write automation and unit tests.

Optimization:
Optimize server performance and resource utilization of the crawling infrastructure.
Regularly review and improve system performance and scalability.

Collaboration and Documentation:
Maintain accurate and up-to-date documentation of server configurations, procedures, and policies.
Provide technical support and training to team members as needed.

Example Tasks:

Introduce a new automatic way of crawling a website that does not work with existing techniques.
Come up with a way to determine why a specific crawler stopped working and fix it automatically.
Use LLM-based methods to improve crawling techniques.

About the company

NewsCatcher is a news API that provides access to news articles from over 100k global news sources. We are looking for a highly skilled and motivated Web Crawling & Scraping Engineer to join our team.

Skills

sql
nosql
kubernetes
docker
web crawling
web scraping