Overview
Data Engineer cum Data Scientist (Databricks, Snowflake, Kafka)
The Opportunity:
We are seeking a talented and experienced Data Engineer cum Data Scientist to join our team. In this hybrid role, you will be instrumental in designing, developing, and maintaining our data infrastructure, building efficient ETL pipelines, and contributing to data modeling and architecture. You will also apply your data science expertise to build and deploy machine learning models, transforming raw data into actionable insights.
What You'll Do:
• Design, develop, and optimize scalable data pipelines using Databricks, Snowflake, and Kafka for real-time and batch processing (see the illustrative sketch after this list).
• Implement robust ETL processes to ingest, transform, and load data from various sources into our data warehouse.
• Develop and maintain data models and architecture that support analytics, reporting, and machine learning initiatives.
• Write clean, efficient, and well-documented code, primarily in Python and PySpark.
• Use SQL for data querying, manipulation, and optimization within Snowflake and other data stores.
• Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver effective solutions.
• Contribute to the full lifecycle of data science and ML model building, including data preparation, feature engineering, model training, evaluation, and deployment.
• Monitor and troubleshoot data pipelines and systems to ensure data quality, reliability, and performance.
• Stay up to date with the latest industry trends and technologies in data engineering, data science, and cloud platforms.
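To make the day-to-day concrete, here is a minimal, illustrative sketch of a streaming ingest job on this stack: Kafka into a Delta table on Databricks via PySpark Structured Streaming. The broker address, topic, payload schema, checkpoint path, and target table below are hypothetical placeholders, not our actual systems.

```python
# Illustrative sketch only: Kafka -> Delta streaming ingest on Databricks.
# All names (broker, topic, schema, paths, tables) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

# Assumed JSON payload schema for a hypothetical "orders" topic.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("created_at", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "orders")                     # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
parsed = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("event"))
    .select("event.*")
    .withColumn("ingested_at", F.current_timestamp())
)

# Append to a Delta table with checkpointing so the stream recovers after failures.
query = (
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder path
    .outputMode("append")
    .toTable("bronze.orders")  # placeholder target table
)
query.awaitTermination()
```

Production pipelines in this role layer schema enforcement, dead-letter handling, and monitoring on top of this core loop.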
What You'll Bring:
• Proven experience as a Data Engineer, Data Scientist, or in a similar role.
• Strong proficiency with Databricks for data processing and analytics.
• Extensive experience with Snowflake for data warehousing and performance optimization.
• Hands-on experience with Kafka for building real-time data streaming applications.
• Expertise in Python programming, including relevant data manipulation libraries (e.g., Pandas, NumPy).
• Solid experience with PySpark for distributed data processing.
• Advanced SQL skills for complex data querying and schema design.
• Demonstrated understanding of data modeling and architecture principles.
• Experience with the full lifecycle of data science and ML model building, from exploratory data analysis to model deployment (a sketch of this workflow follows this list).
• Familiarity with cloud platforms (e.g., AWS, Azure, GCP) is a plus.
• Excellent problem-solving, analytical, and communication skills.
• Ability to work independently and collaboratively in a fast-paced environment.
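For the model-building expectation above, here is a comparably minimal sketch of the train-and-evaluate step using Spark MLlib. The feature table, column names, and model choice are hypothetical placeholders chosen for illustration, not a prescribed approach.

```python
# Illustrative sketch only: train/evaluate step of the ML lifecycle on Spark.
# Table, feature, and label names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("churn-model").getOrCreate()

# Assumed feature table with numeric features and a binary `label` column.
df = spark.table("gold.customer_features")

# Hold out a test split before fitting anything.
train, test = df.randomSplit([0.8, 0.2], seed=42)

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["tenure_months", "monthly_spend"], outputCol="raw_features"),
    StandardScaler(inputCol="raw_features", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(train)

# Evaluate on the held-out split before any deployment decision.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```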
Bonus Points If You Have:
• Experience with CI/CD pipelines for data engineering workflows.
• Knowledge of data governance and data quality best practices.
• Familiarity with containerization technologies (e.g., Docker, Kubernetes).
• Contributions to open-source projects or a strong GitHub profile.
Education:
• Bachelor's or Master's degree in Computer Science, Engineering, Statistics, Mathematics, or a related quantitative field.