Overview
About the Role
We are looking for a skilled Web Scraping Developer to build and maintain robust data extraction pipelines. You will work on complex scraping projects targeting diverse websites, handling dynamic content, and ensuring reliable data delivery at scale.
Key Responsibilities
Design, develop, and maintain scalable web scrapers and data extraction pipelines using Scrapy.
Handle JavaScript-rendered and dynamic web pages using Playwright for browser automation.
Deploy and manage scraping infrastructure on Ubuntu servers, including scheduling, monitoring, and proxy rotation.
Reverse-engineer site structures, APIs, and anti-bot mechanisms to ensure stable data collection.
Optimize scraper performance, manage concurrency, and handle large-scale data processing.
Maintain code quality, write reusable components, and document scraping logic and setup procedures.
Troubleshoot failures, adapt to site changes, and ensure high availability of data feeds.
Required Skills
Strong proficiency in Python and the Scrapy framework (spiders, middlewares, pipelines, item loaders).
Hands-on experience with Playwright (or Puppeteer/Selenium) for headless browser automation.
Solid understanding of Ubuntu/Linux environments, shell scripting, and server management.
Familiarity with web technologies: HTTP, DOM manipulation, AJAX, JSON, and XML.
Experience with proxy management, CAPTCHA solving, and bypassing anti-scraping measures.
Knowledge of databases (PostgreSQL, MongoDB, etc.) for storing extracted data.
Understanding of Git, Docker, and CI/CD workflows.
Nice to Have
Experience with distributed scraping architectures (Scrapyd, Celery, Kafka).
Familiarity with data validation, cleaning, and ETL processes.
Knowledge of cloud platforms (AWS, GCP, or Azure) for deployment.
What We Offer
Competitive salary and benefits.
Opportunity to work on challenging data extraction problems.
Collaborative and growth-oriented environment.
How to Apply
Send your resume and links to any relevant projects or GitHub repositories.