About the role
Project Description:
We are seeking an experienced freelancer to develop a fully automated web scraping and data extraction system that runs on a schedule with no manual operation. The ideal candidate is skilled in Python or comparable automation tools and has experience handling large-scale data collection efficiently.
Project Scope:
-Develop a fully automated web scraping system that runs on a predefined schedule without manual intervention.
-Extract structured data from multiple websites and store it in a database, CSV files, or other structured formats.
-Implement error handling, logging, and notification systems to ensure reliability (a minimal sketch covering these points follows this list).
-Handle dynamic content, CAPTCHAs, and other anti-scraping measures when necessary.
-Optimize performance for efficiency and scalability.
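For orientation, here is a minimal sketch of the kind of unattended job described in the scope above: one scheduled run that fetches a page, extracts structured rows, appends them to a CSV, and logs failures instead of crashing. The target URL, CSS selector, and file names are illustrative assumptions, not part of this brief.

import csv
import logging
import os

import requests
from bs4 import BeautifulSoup

logging.basicConfig(
    filename="scraper.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

TARGET_URL = "https://example.com/listings"  # placeholder target site
OUTPUT_CSV = "listings.csv"                  # placeholder output path


def scrape_once() -> None:
    """Fetch one page, extract structured rows, and append them to a CSV."""
    try:
        response = requests.get(TARGET_URL, timeout=30)
        response.raise_for_status()
    except requests.RequestException as exc:
        # Log and return so the scheduler can simply retry on the next run.
        logging.error("Fetch failed: %s", exc)
        return

    soup = BeautifulSoup(response.text, "html.parser")
    rows = [
        {"title": link.get_text(strip=True), "url": link.get("href", "")}
        for link in soup.select("a.listing")  # placeholder CSS selector
    ]

    write_header = not os.path.exists(OUTPUT_CSV)
    with open(OUTPUT_CSV, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["title", "url"])
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

    logging.info("Scraped %d rows from %s", len(rows), TARGET_URL)


if __name__ == "__main__":
    scrape_once()

The same structure extends naturally to notifications (e.g., emailing on repeated failures) and to other output targets listed in the scope.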
Requirements:
-Strong programming skills in Python (BeautifulSoup, Scrapy, Selenium, or similar tools).
-Experience with task scheduling and automation (e.g., cron jobs, Airflow, cloud-based scheduling); see the scheduling and storage sketch after this list.
-Ability to work around anti-scraping protections while ensuring compliance with ethical standards.
-Knowledge of databases (SQL, NoSQL) for storing extracted data.
-Experience with API integration as an alternative data collection method.
-Strong attention to detail for data accuracy and consistency.
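As one possible approach to the scheduling and database requirements, the sketch below pairs a standard cron entry with idempotent SQLite storage. The crontab line, script path, table name, and columns are assumptions for illustration; Airflow or a cloud scheduler would serve the same purpose.

import sqlite3

# Example crontab entry (assumption: the scraper lives at /opt/scraper/run.py
# and should run every day at 02:00):
#   0 2 * * * /usr/bin/python3 /opt/scraper/run.py >> /var/log/scraper.log 2>&1

DB_PATH = "scraped.db"  # placeholder database file


def store_rows(rows: list[dict]) -> None:
    """Upsert extracted rows into SQLite so repeated runs stay idempotent."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS listings (
                url        TEXT PRIMARY KEY,
                title      TEXT,
                scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
            )
            """
        )
        conn.executemany(
            "INSERT OR REPLACE INTO listings (url, title) VALUES (:url, :title)",
            rows,
        )


if __name__ == "__main__":
    # Minimal smoke test with dummy data.
    store_rows([{"url": "https://example.com/item/1", "title": "Example item"}])

Keying on the URL keeps repeated scheduled runs from duplicating rows, which supports the data accuracy and consistency requirement above.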
Preferred Skills:
-Experience working with serverless functions, cloud computing, or containerization (AWS Lambda, Google Cloud Functions, Docker, etc.).
-Knowledge of proxy management, rotating user agents, and CAPTCHA-solving techniques (a brief rotation sketch follows this list).
-Familiarity with real-time data pipelines and automated reporting.
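On the proxy and user-agent rotation point, a minimal sketch might look like the following. The proxy addresses and user-agent strings are placeholders; in practice they would come from a managed proxy pool or provider.

import random

import requests

PROXIES = [  # placeholder proxy pool
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]
USER_AGENTS = [  # small illustrative pool of browser strings
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]


def fetch_with_rotation(url: str) -> requests.Response:
    """Pick a random proxy and user agent for each request."""
    proxy = random.choice(PROXIES)
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(
        url,
        headers=headers,
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )


if __name__ == "__main__":
    response = fetch_with_rotation("https://example.com")
    print(response.status_code)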