Job Summary
We are seeking an experienced GCP Data Engineer to design, develop, and maintain robust, scalable data pipelines on Google Cloud Platform. The ideal candidate will have strong expertise in Python and PySpark for large-scale data processing, ensuring data quality, performance, and reliability across our data platforms.
Key Responsibilities
· Design, build, and optimize scalable data pipelines and ETL processes using GCP services such as BigQuery, Dataflow, Dataproc, and Cloud Storage.
· Develop high-quality, efficient, and well-documented code in Python and PySpark for data processing, manipulation, and analytics.
· Use PySpark for distributed processing of large datasets to support analytical and business needs.
· Orchestrate data workflows with Cloud Composer (Apache Airflow) and automate processes using Python and shell scripting.
· Implement data modeling, data warehousing concepts, and data governance practices to ensure data integrity, security, and quality.
· Monitor, troubleshoot, and optimize data processing jobs and queries for performance, reliability, and cost efficiency.
· Work closely with data scientists, analysts, and cross-functional teams to gather requirements and translate them into technical solutions.
Required Qualifications & Skills
· Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
· 6+ years of proven experience in a data engineering role with a strong focus on GCP.
· 2+ years of hands-on experience with GCP data services including BigQuery, Cloud Storage, Dataflow, Dataproc, and Pub/Sub.
· Strong proficiency in Python and SQL is essential, with demonstrated expertise in PySpark for big data processing.
· Solid understanding of data modeling, ETL/ELT processes, and data warehousing principles.
· Excellent problem-solving, analytical, and communication skills.
· Experience with version control systems like Git.
· GCP Professional Data Engineer certification is a plus.