Overview
We are looking for a Lead Data Engineer to design and build scalable, high-quality data products that power analytics, reporting, and AI use cases. You’ll play a key role in defining event modeling standards, building a trusted metrics layer, and developing modern transformation workflows using dbt. You’ll also contribute to emerging GenAI initiatives, drawing on Python and a foundational understanding of large language models (LLMs).
Key Responsibilities:
Data Modeling & Analytics Enablement
Build analytics data warehouses (DWH) using dimensional modeling, including slowly changing dimensions (SCD Type 1/2), incremental loading strategies, and star/snowflake schema design.
Design and implement scalable event data models that support product analytics and behavioral insights.
Develop and maintain a governed metrics layer (definitions, calculation logic, validation, and documentation).
Build and optimize a semantic layer that enables consistent reporting across BI tools and downstream consumers.
Partner with Sales, Marketing, Support, Product, and Engineering teams to define reliable, reusable datasets and business logic.
dbt & Transformation Development
Build and maintain transformation pipelines using dbt, including modular models, sources, and documentation; data tests (generic and custom); and incremental models with performance tuning.
Establish best practices around branching, deployment, and CI/CD for dbt projects.
Data Platform & Quality
Ensure high data quality through proactive testing, observability, and monitoring.
Improve dataset reliability and maintainability through naming conventions, contracts, and lineage management.
Troubleshoot pipeline issues and resolve data inconsistencies quickly and effectively.
GenAI & LLM Support
Support integration of data with LLM-based applications (e.g., data narration, metadata generation, dataset summarization).
Apply a basic understanding of LLM concepts such as embeddings, prompts, vector search, and token limits to guide data design.
Python Development
Build utilities, automation scripts, and data workflows using Python.
Use Python for validation frameworks, pipeline tooling, and integration across systems.
Required Qualifications:
6+ years of experience in Data Engineering or similar roles.
Strong experience in data warehousing.
Strong experience with event modeling (product events, behavioral data).
Proven ability to build and manage a metrics layer and semantic layer for consistent analytics.
Hands-on expertise with dbt for building production-grade transformation models.
Strong Python skills for data engineering workflows and automation.
Familiarity with GenAI concepts and modern AI/data workflows.
Basic understanding of LLMs, including how data is used in LLM applications.
Strong SQL skills and experience working with modern data warehouses (Snowflake/BigQuery/Redshift or similar).
Excellent communication skills and ability to collaborate with cross-functional stakeholders.
Preferred Qualifications (Nice to Have):
Experience building a semantic layer with a dedicated tool (e.g., dbt Semantic Layer, Cube, MetricFlow).
Experience with data orchestration tools (Airflow, Dagster, Prefect).
Familiarity with data observability tools (OpenMetadata, Monte Carlo, Datadog, etc.).
Experience supporting ML features, embeddings pipelines, or vector databases.
Experience working in product analytics ecosystems (Segment, Mixpanel, etc.).