Data Engineer - Web Scraping & Threat Intelligence | Antal Tech Jobs
3 Weeks ago

Data Engineer - Web Scraping & Threat Intelligence

Mumbai, Maharashtra, India
Information Technology
Full-Time
HEROIC Cybersecurity

Overview

About the Role

HEROIC Cybersecurity (HEROIC.com) is seeking a senior-level Threat Intelligence Data Engineer – Automated Collection & Dark Web Intelligence to design, build, and operate fully automated intelligence collection systems that power our AI-driven cybersecurity and breach intelligence platforms.

This role owns the end-to-end discovery, acquisition, and ingestion pipeline, which continuously crawls, extracts, indexes, and normalizes millions of new artifacts daily. These include documents, chats, forums, leaked datasets, repositories, threat actor communications, hacker marketplaces, unsecured infrastructure, and decentralized networks across the surface web, deep web, dark web, and anonymized networks.

Our Threat Research Team’s mission is aggressive: achieve near-total coverage of global breach and leak data with 99%+ automation. Your work directly enables HEROIC’s ability to identify exposures before they are weaponized.

What You Will Do

Automated Intelligence Collection & Discovery

  • Architect and operate large-scale, distributed crawling and discovery systems across:
      • Surface web, deep web, and dark web
      • Hacker forums, underground marketplaces, and breach communities
      • Chat platforms (Telegram, Discord, IRC, WhatsApp, etc.)
      • Paste sites, code repositories, and social platforms used for breach disclosure
  • Continuously discover, archive, and download newly released datasets, logs, credentials, and artifacts the moment they appear

Dark Web, Anonymized & Decentralized Networks

  • Build automated collectors and archivers for anonymized and decentralized networks, including Tor (.onion), I2P, ZeroNet, Freenet, IPFS, GNUnet, Lokinet, Yggdrasil, and similar systems
  • Design resilient workflows for unreliable, adversarial, or ephemeral data sources
  • Normalize and index data from non-traditional network protocols and formats

Infrastructure & Exposure Discovery

  • Develop automated scanning systems to identify:
      • Unsecured databases (Elasticsearch, MySQL, PostgreSQL, MongoDB, etc.)
      • Exposed cloud storage (S3, Azure, GCP, DigitalOcean Spaces)
      • Open FTP servers, backups, and misconfigured archives
  • Monitor and ingest data from file hosting and distribution platforms commonly used for breach dumps

Pipeline Engineering & Operations

  • Build ETL pipelines to clean, normalize, enrich, and index structured and unstructured data
  • Implement advanced anti-bot evasion strategies (proxy rotation, fingerprinting, CAPTCHA mitigation, session management)
  • Integrate collected intelligence into centralized databases and search systems
  • Design APIs and internal tooling to support downstream analysis and AI/ML workflows
  • Automate deployment, scaling, and monitoring using Docker, Kubernetes, and cloud infrastructure
  • Continuously optimize performance, reliability, and cost efficiency of crawler clusters

What We Are Looking For

  • Minimum 4 years of hands-on experience in data engineering, intelligence collection, crawling, or distributed data pipelines
  • Strong Python expertise and experience with frameworks such as Scrapy, Playwright, Selenium, or custom async systems
  • Proven experience operating high-volume, automated data collection systems in production
  • Deep understanding of web protocols, HTTP, DOM parsing, and adversarial scraping environments
  • Experience with asynchronous, concurrent, and distributed architectures
  • Familiarity with SQL and NoSQL databases (PostgreSQL, MongoDB, Elasticsearch, Cassandra)
  • Strong Linux/Unix and shell scripting skills, and experience with Git-based workflows
  • Experience deploying and operating systems using Docker, Kubernetes, AWS, or GCP
  • Excellent analytical, debugging, and problem-solving skills
  • Strong written and verbal communication skills

Preferred / High-Value Experience

  • Direct experience with dark web intelligence, breach data, OSINT, or threat research
  • Familiarity with Tor, I2P, underground forums, stealer logs, or credential ecosystems
  • Experience processing large breach datasets or stealer logs
  • Background working in adversarial data environments
  • Exposure to AI/ML-driven intelligence platforms

What We Can Offer

  • Position Type: Full-time
  • Location: Remote in India. Work from wherever you please! Your home, the beach, our offices, etc.
  • Compensation: USD 1,300–2,000 per month (depending on experience)
  • Professional Growth: Amazing upward mobility in a rapidly expanding company.
  • Innovative Culture: Be part of a team that leverages AI and cutting-edge technologies.

About Us:

HEROIC Cybersecurity (HEROIC.com) is building the future of cybersecurity. Unlike traditional cybersecurity solutions, HEROIC takes a predictive and proactive approach to intelligently secure our users before an attack or threat occurs. Our work environment is fast-paced, challenging, and exciting. At HEROIC, you’ll work with a team of passionate, engaged individuals dedicated to intelligently securing the technology of people all over the world.

Similar Jobs
23 Hours ago
Data Engineer
Fintech
  • 3 - 5 Yrs
  • Mumbai
Data Engineer | Mumbai | Full-Time | Experience: 3–6 Years | Budget: Up to ₹27 LPA | Industry: General Insurance (Digital-First Organization). We’re rebuilding insurance from the ground up: digital-first, transparent, fast, and fair. No legacy te...
1 Day ago
QA Manager
Fintech
  • 10 - 18 Yrs
  • Pune
Job Description We are seeking an experienced and dynamic QA Manager to lead our quality assurance team in delivering high-quality software products for our organization. The ideal candidate will have a strong background in manual and automation tes...
1 Day ago
Database Administrator (DBA)
Information Technology
  • Bangalore, Karnataka, India
This role is for one of our clients. Company Name: cloudtechner | Seniority level: Mid-Senior level | Min Experience: 5 years | Location: Gurgaon, NCR | Job Type: full-time. We are looking for an experienced and detail-oriented Database Administrator (DBA) to ...
1 Day ago
Salesforce Data Engineer
Information Technology
  • Bangalore, Karnataka, India
Description – Role Summary: We are seeking a highly skilled Salesforce Data Engineer with deep expertise in the Salesforce platform and a strong focus on building and operating Salesforce Data Cloud (D360) solutions. The ideal candidate will design, int...
1 Day ago
Business Analyst I
Information Technology
  • Bangalore, Karnataka, India
Through our dedicated associates, Conduent delivers mission-critical services and solutions on behalf of Fortune 100 companies and over 500 governments - creating exceptional outcomes for our clients and the millions of people who count on them. You ...
1 Day ago
Associate Software Engineer - Test Automation (Infra)
Information Technology
  • Bangalore, Karnataka, India
Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster. As one of the fastest-growing SaaS companies in history, we surpassed $2B in revenue in our last fiscal ...
1 Day ago
Interesting Job Opportunity: Data Analyst - SQL/Python
Information Technology
  • Bangalore, Karnataka, India
Description: We are seeking a skilled Data Analyst with strong expertise in Python, SQL, and Excel, coupled with a solid foundation in statistics and a good understanding of retail demand processes. The ideal candidate will be responsible for transformi...
1 Day ago
EY - GDS Consulting - AI and DATA - GCP Data Engineer - Senior
Information Technology
  • Bangalore, Karnataka, India
At EY, you’ll have the chance to build a career as unique as you are, with the global scale, support, inclusive culture and technology to become the best version of you. And we’re counting on your unique voice and perspective to help EY become even b...