Overview
The Role
We're looking for an Intern - Data Engineer who thinks programmatically and gets things done. You'll work across the full data lifecycle: sourcing and scraping data, cleaning and structuring it, and building and exposing APIs that put it to use. You are someone who can break down large, complex problems into small, actionable steps and follow through on them. We use Python heavily, but what matters more is how you think.
Key Responsibilities
- Scrape and ingest data from external APIs, websites, and third-party data providers
- Scope out new data sources and evaluate how they complement our existing ones
- Clean, normalize, and structure large volumes of heterogeneous data to enable search and analytics at scale
- Write maintainable, flexible code with empathy for your fellow developers
- Identify opportunities for automation
Required Skills
- Proficiency in Python, particularly for web scraping, data manipulation, and automation
- Solid understanding of HTTP request/response cycles, status codes, and authentication patterns
- Comfort working with structured and semi-structured data formats: SQL/NoSQL tables, CSVs, JSON, etc.
- Ability to write complex regular expressions and build parsers to extract usable data from messy PDFs, HTML, JSON, and other formats
Nice to Have
- Experience with Solr or other search systems (Elasticsearch, OpenSearch)
- Hands-on experience with message queues, ideally RabbitMQ or a similar system (SQS, Kafka)
- Familiarity with cloud providers like AWS or GCP
- Familiarity with image manipulation libraries and OCR/text extraction tools (e.g., Tesseract or Textract) for processing unstructured data
- Familiarity with Grafana or similar observability tooling
What We Value
You don't need to have worked in our specific domain before. We care more about how you approach problems than your resume. If you can take something ambiguous, break it down, and ship something reliable, you'll fit in here.
We are looking for someone who is available for a three-month summer internship.