Overview
Job Purpose
We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data
Engineer, you will play a key role in the design, development, and optimization of our data
pipelines, supporting the ingestion, transformation, and analysis of large and complex datasets.
This role involves working with a variety of data sources, including Azure SQL Database, Google
Analytics, Google Play Store, Apple App Store, Salesforce, and other cloud-based services, to
ensure seamless data flow across our platforms. You will work extensively with Databricks,
leveraging its workflows, cluster management, and Unity Catalog to streamline and govern data
processes.
Responsibilities
Data Ingestion & Integration:
• Ingest data from a variety of sources such as Azure SQL DB, Google Analytics, Google
Play Store, Apple App Store, Salesforce, and others.
• Develop and optimize ETL/ELT pipelines to transform data from CSV, JSON, SQL tables,
and APIs into usable formats.
• Work with REST APIs to pull data from various external sources and integrate it into our
data ecosystem.
Data Transformation & Modeling:
• Design and implement efficient data transformation processes to cleanse, aggregate, and
enrich data.
• Apply industry best practices for data modeling to ensure scalability, performance, and
data integrity.
• Collaborate with data analysts and data scientists to provide clean, high-quality datasets
for reporting and analysis.
Databricks & Cluster Management:
• Utilize Databricks for data processing, transformation, and orchestration tasks.
• Manage and optimize Databricks clusters for performance, reliability, and cost-effectiveness.
• Implement Databricks workflows to automate and streamline data pipelines.
• Use Unity Catalog for data governance and metadata management, ensuring compliance
and data access control.
Required Skills & Experience
Experience:
• 5+ years of hands-on experience in data engineering or a related field.
• Proven experience with Databricks and Databricks workflows, including cluster
management and data pipeline orchestration.
• Strong experience in data ingestion from SQL databases (Azure SQL DB), APIs (Google
Analytics, Google Play Store, Apple App Store, Salesforce), and file-based sources (CSV,
JSON).
Technical Skills:
• Proficiency in SQL for data manipulation and transformation.
• Experience with Python or Scala for writing and managing data workflows.
• Working knowledge of REST APIs for data integration.
• Experience in data transformation using Apache Spark, Delta Lake, or similar
technologies.
• Knowledge of cloud platforms such as Azure, with a focus on Azure SQL DB.
• Familiarity with Unity Catalog for metadata management and governance.
Data Engineering Best Practices:
• Understanding of data architecture, data pipelines, and the ETL/ELT process.
• Experience in data modeling, optimizing queries, and working with large datasets.
• Familiarity with data governance, metadata management, and data access controls.
Preferred Skills:
• Knowledge of Apache Kafka or other real-time streaming technologies.
• Experience with Data Lake or Data Warehouse technologies.
• Familiarity with additional data transformation tools such as Apache Airflow or dbt.
• Understanding of machine learning workflows and data pipelines.