Overview
Position Title: GCP Data Engineer
Budget: 21-22 LPA
Experience: 5-7 Years
Location: Navi Mumbai
Share your resume at: navneet@sourcebae.com
We are seeking a highly skilled and experienced GCP Data Engineer to join our growing data team. The ideal candidate will be a self-motivated individual with a strong passion for data and a proven track record of designing, building, and maintaining scalable and robust data solutions on Google Cloud Platform. You will play a critical role in transforming raw data into actionable insights, enabling data-driven decision-making across the organization.
Responsibilities:
- Data Pipeline Development: Design, develop, implement, and maintain highly scalable and reliable ETL/ELT data pipelines using a variety of GCP services and programming languages.
- BigQuery Expertise: Leverage Google BigQuery as a primary data warehouse, including designing optimal schemas, writing complex and efficient SQL queries, optimizing query performance, and managing large datasets.
- Data Integration: Integrate data from diverse internal and external sources (structured, semi-structured, and unstructured) into the GCP environment, ensuring data quality and consistency.
- ETL/ELT Processes: Build, manage, and optimize ETL/ELT processes for data ingestion, transformation, and loading, utilizing tools like Dataflow, Dataproc (with PySpark), Cloud Composer (Apache Airflow), and Python scripting.
- Data Modeling & Warehousing: Design and implement efficient data models in BigQuery to support analytical and reporting needs, adhering to data warehousing best practices.
- Automation & Orchestration: Automate data workflows and pipelines using tools like Cloud Composer (Apache Airflow) and Cloud Functions to improve efficiency and reduce manual effort (a minimal orchestration sketch follows this list).
- Data Quality & Governance: Implement robust data quality checks, data lineage, and data governance frameworks to ensure high data integrity, reliability, and compliance with industry standards.
- Performance Optimization: Continuously monitor, troubleshoot, and optimize the performance and cost-efficiency of data pipelines and BigQuery queries.
- Collaboration: Work closely with data scientists, data analysts, business stakeholders, and other engineering teams to understand data requirements and deliver solutions that meet business objectives.
- Security & Compliance: Ensure data security, access controls, and compliance with relevant regulations (e.g., GDPR, HIPAA) across all data workflows.
- Documentation & Best Practices: Create and maintain clear and comprehensive documentation of data architectures, processes, and best practices. Participate in code reviews and knowledge sharing sessions.
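By way of illustration only, the pipeline and orchestration work described above might look like the following minimal Cloud Composer (Airflow) sketch: a daily DAG that loads raw files from Cloud Storage into a BigQuery staging table and then runs a SQL transformation. It assumes Airflow 2.4+ with the Google provider installed; every bucket, project, dataset, and table name is a hypothetical placeholder, not part of this role's actual stack.

```python
# Illustrative sketch only. All resource names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_events_elt",        # hypothetical pipeline name
    schedule="@daily",                # Airflow 2.4+ scheduling syntax
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Ingest: load newline-delimited JSON from GCS into a staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_events",
        bucket="example-raw-bucket",                       # hypothetical
        source_objects=["events/{{ ds }}/*.json"],
        destination_project_dataset_table="example_project.staging.events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform: run a SQL step that moves staging data into the warehouse.
    transform = BigQueryInsertJobOperator(
        task_id="transform_events",
        configuration={
            "query": {
                "query": (
                    "INSERT INTO `example_project.warehouse.events` "
                    "SELECT * FROM `example_project.staging.events`"
                ),
                "useLegacySql": False,
            }
        },
    )

    load_raw >> transform
```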
Skills & Qualifications:
- Experience: 5-7 years of professional experience in data engineering, with a significant focus on Google Cloud Platform (GCP).
- GCP Services: Strong hands-on expertise with key GCP data services, including:
  - BigQuery (Mandatory): Advanced SQL, schema design, query optimization, partitioning, and clustering (see the example following this section).
  - Dataflow / Apache Beam: Building and managing batch and streaming data pipelines.
  - Cloud Storage: Data storage and management.
  - Cloud Composer / Apache Airflow: Workflow orchestration and automation.
  - Cloud Functions / Pub/Sub: Real-time data processing and messaging (a plus).
- SQL: Strong SQL skills for complex querying, data transformation, and query optimization.
- ETL/ELT Concepts (Mandatory): Deep understanding of ETL/ELT processes, data integration patterns, and data warehousing principles.
- Data Modeling: Experience with dimensional and relational data modeling (Star Schema, Snowflake Schema).
- Version Control: Familiarity with version control systems (e.g., Git).
- Problem-Solving: Excellent analytical and problem-solving skills with a keen eye for detail.
- Communication: Strong verbal and written communication skills, with the ability to convey complex technical concepts to both technical and non-technical stakeholders.
- Collaboration: Ability to work effectively in a collaborative, fast-paced, and agile environment.
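For concreteness, the BigQuery schema-design skills called out above (partitioning and clustering in particular) might look like this minimal sketch using the google-cloud-bigquery Python client. The project, dataset, table, and column names are hypothetical placeholders.

```python
# Illustrative sketch only: create a date-partitioned, clustered BigQuery
# table. All names below are hypothetical, not part of the actual stack.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

schema = [
    bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("user_id", "STRING"),
    bigquery.SchemaField("event_name", "STRING"),
]

table = bigquery.Table("example_project.warehouse.events", schema=schema)

# Partition by day on the event timestamp so queries filtering on event_ts
# scan only the relevant partitions (lower cost, faster scans).
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)

# Cluster by the most common filter/join keys to co-locate related rows.
table.clustering_fields = ["user_id", "event_name"]

client.create_table(table)
```

Partitioning on the event timestamp plus clustering on common filter keys is a standard BigQuery pattern for keeping scan costs and query latency predictable on large tables.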
Preferred Qualifications (Nice to Have):
- GCP Professional Data Engineer certification.
- Experience with other cloud platforms (e.g., AWS, Azure).
- Knowledge of Linux.
- Familiarity with CI/CD pipelines and DevOps practices.
- Knowledge of data visualization tools (e.g., Looker, Tableau).
- Experience with data quality frameworks and observability tools.
This role offers an exciting opportunity to work on cutting-edge data solutions within a dynamic and innovative environment. If you are a dedicated and skilled GCP Data Engineer looking to make a significant impact, we encourage you to apply.