Overview
About the Role
As an Engineer on the Data Intelligence team, you will work with large-scale data pipelines and datasets that are foundational to how Uber makes decisions that improve the customer experience. You will work with petabytes of analytics data from Uber's multiple applications. Help us build the software systems and data models that enable data scientists to better understand user behavior, and thrive in Uber's data-driven culture.
About the Team
The Data Intelligence Platform team is responsible for designing the core foundational datasets that are critical to understanding customer needs and that help business teams make the right decisions on these critical problems. The team's mission is to ensure high quality across all critical analytics data flows in every Uber vertical, and to enable faster delivery of data needs by building standardized tools and frameworks for accurate analysis. We are currently revamping all critical analytical data flows across domains to build high-quality datasets and frameworks used throughout Uber.
Basic Qualifications
1. 3+ years of data engineering experience
2. Demonstrated experience working with large data volumes and backend services.
3. Good working knowledge of SQL (mandatory) and at least one other language (Java, Scala, or Python).
4. Working experience with ETL, data pipelines, data lakes, and data modeling fundamentals.
5. Good problem-solving and analytical skills.
6. Strong teamwork and collaboration skills.
Preferred Qualifications
1. Experience in data engineering and working with big data
2. Experience with ETL or streaming data, and with one or more of Kafka, HDFS, Apache Spark, Apache Flink, or Hadoop
3. Experience with backend services and familiarity with one of the major cloud platforms (AWS, Azure, Google Cloud, or Oracle Cloud) is a plus
What the Candidate Will Do
1. Define the Source of Truth (SOT) and dataset design for multiple Uber teams.
2. Identify unified data models in collaboration with Data Science teams.
3. Streamline processing of the original event sources and consolidate them into source-of-truth event logs.
4. Build and maintain real-time and batch data pipelines that consolidate and clean up usage analytics.
5. Build systems that monitor data loss across the different sources and improve data quality.
6. Own the quality and reliability of Tier-1 and Tier-2 datasets, including maintaining their SLAs, TTLs, and consumption.
7. Devise strategies to consolidate and compensate for data losses by correlating different sources.
8. Solve challenging data problems with cutting-edge designs and algorithms.
Competencies
Data Engineering
1. Fundamentals of data engineering and big data technologies
2. Pipeline creation and writing Spark jobs
3. Experience writing SQL queries and coding in other languages such as Scala, Java, or Python
Data Architecture & Design (REQUIRED)
1. Proficiency in designing data models
2. Understanding of SOA and microservices
3. Familiarity with AWS, Azure, GCP, or OCI cloud services