Data Engineer (Web Scraper) - Intern | Antal Tech Jobs
7 Weeks ago

Data Engineer (Web Scraper) - Intern

Pune, Maharashtra, India
Information Technology
Other
Clootrack

Overview

About the Role

We're looking for a skilled Web Scraping Data Engineer (Intern) to design and implement robust data extraction systems. In this role, you'll develop scalable crawling architectures to collect high-quality data while ensuring compliance with ethical standards and data regulations.

Key Responsibilities

  • Design and maintain efficient web crawling systems using frameworks like Scrapy, Playwright, or Selenium
  • Implement data processing pipelines to clean, normalize, and structure extracted content
  • Optimize crawling strategies to improve efficiency while respecting website policies
  • Develop monitoring systems to identify and resolve scraping issues quickly
  • Deliver high-quality datasets for analysis and model training
  • Implement storage solutions for large-scale data management
  • Ensure compliance with data regulations and ethical scraping practices

Required Skills

  • Strong Python programming experience
  • Basic knowledge of SQL is good to have
  • Hands-on experience with web scraping tools (BeautifulSoup, Scrapy, Selenium)
  • Proficiency with HTML, JavaScript, and the HTTP protocol
  • Experience with data processing libraries (pandas, PySpark)
  • Familiarity with Linux/UNIX environments
  • Knowledge of version control systems and code review practices
  • Strong problem-solving abilities and attention to detail
  • Excellent communication skills (written and verbal English)

Good to Have (Optional)

  • Familiarity with AI frameworks (Hugging Face, LangChain, OpenAI)
  • Familiarity with LLM training pipelines and data requirements
  • Experience with text data augmentation and synthetic data generation


Preferred Qualifications

  • Experience with large-scale distributed crawling systems
  • Knowledge of proxy management and anti-bot evasion techniques
  • Familiarity with a cloud platform (AWS, GCP, or Azure)
  • Experience with containerization (Docker, Kubernetes)


What We Offer

  • Opportunity to work on cutting-edge data collection projects
  • Collaborative environment with talented engineers
  • Competitive compensation package
  • Professional growth and development opportunities
