Overview
Required years of experience: 6-10 years of relevant experience
GENERAL DESCRIPTION OF ROLE:
Professionals in this group design and implement high-performance, scalable, and optimized data solutions for large, enterprise-wide data mining and processing workloads. They are responsible for the design and development of data flows and for Big Data platform deployment and implementation. Incumbents typically require expert knowledge of Databricks, Spark, SQL, and related technologies.
JOB RESPONSIBILITIES
· Design and propose end-to-end data pipelines (ingestion to storage to consumption) for data projects, as well as databases to support web-based applications.
· Design and implement data warehouses and data marts to serve data consumers.
· Execute the designs, taking them from the design stage through implementation, operationalization, and maintenance.
· Design and model databases: organize data at both the macro and micro levels and provide logical data models for consumers.
· Database performance tuning and data lifecycle management.
· Assist in the support and enhancement of existing data pipelines and databases.
SKILLS/COMPETENCIES REQUIRED
· 6-10 years of total experience working with data integration teams.
· 3+ years of in-depth experience developing data pipelines within an Apache Spark environment (preferably Databricks).
· 2+ years of active, in-depth work with Databricks.
· 2+ years of in-depth experience working with data warehouses, including knowledge of data warehouse modelling techniques.
· Strong knowledge of PySpark, Python, SQL, and distributed computing principles.
· Strong knowledge of data modelling, database technologies, and data warehousing.
· Experience designing and implementing ETL/ELT processes using SSIS or another ETL/ELT tool.
· Knowledge of cloud platforms (AWS or Azure) and big data technologies (Hadoop, Spark, etc.).
· Fluent in both complex SQL query performance tuning and database performance tuning.
· Understands the importance of performance and can implement best practices to ensure performance and maintainability in data-centric projects.
NICE TO HAVE:
· Some experience developing data solutions using native IaaS and PaaS services on AWS (Redshift, RDS, S3) is an advantage.