Data Engineer (Web Scraper) - Intern | Antal Tech Jobs
13 Weeks ago

Data Engineer (Web Scraper) - Intern

Pune, Maharashtra, India
Information Technology
Other
Clootrack

Overview

About the Role

We're looking for a skilled Web Scraping Data Engineer (Intern) to design and implement robust data extraction systems. In this role, you'll develop scalable crawling architectures to collect high-quality data while ensuring compliance with ethical standards and data regulations.

Key Responsibilities

  • Design and maintain efficient web crawling systems using frameworks such as Scrapy, Playwright, or Selenium
  • Implement data processing pipelines to clean, normalize, and structure extracted content
  • Optimize crawling strategies to improve efficiency while respecting website policies
  • Develop monitoring systems to identify and resolve scraping issues quickly
  • Deliver high-quality datasets for analysis and model training
  • Implement storage solutions for large-scale data management
  • Ensure compliance with data regulations and ethical scraping practices
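
As an illustration of what "respecting website policies" can look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser` to filter candidate URLs against a site's robots.txt and read its crawl delay. The robots.txt content, the `example-bot` user agent, the example.com URLs, and the helper names `build_policy` and `allowed_urls` are all hypothetical and not taken from the posting; a real crawler would fetch robots.txt from the target site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs the given user agent is permitted to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


policy = build_policy(ROBOTS_TXT)
candidates = [
    "https://example.com/products/1",
    "https://example.com/private/admin",
]

# URLs under /private/ are filtered out before any request is made.
to_crawl = allowed_urls(policy, "example-bot", candidates)

# Seconds to wait between requests, or None if robots.txt sets no delay.
delay = policy.crawl_delay("example-bot")
```

Scrapy offers a comparable built-in check through its `ROBOTSTXT_OBEY` setting, so a filter like this is mainly relevant for hand-rolled crawlers.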

Required Skills

  • Strong Python programming experience
  • Working knowledge of SQL is a plus
  • Hands-on experience with web scraping tools (BeautifulSoup, Scrapy, Selenium)
  • Proficiency with HTML, JavaScript, and HTTP protocols
  • Experience with data processing libraries (pandas, PySpark)
  • Familiarity with Linux/UNIX environments
  • Knowledge of version control systems and code review practices
  • Strong problem-solving abilities and attention to detail
  • Excellent communication skills (written and verbal English)

Good to Have (Optional)

  • Familiarity with AI frameworks (Hugging Face, LangChain, OpenAI)
  • Familiarity with LLM training pipelines and data requirements
  • Experience with text data augmentation and synthetic data generation


Preferred Qualifications

  • Experience with large-scale distributed crawling systems
  • Knowledge of proxy management and anti-bot evasion techniques
  • Familiarity with a cloud platform (AWS, GCP, or Azure)
  • Experience with containerization (Docker, Kubernetes)


What We Offer

  • Opportunity to work on cutting-edge data collection projects
  • Collaborative environment with talented engineers
  • Competitive compensation package
  • Professional growth and development opportunities

