Key Responsibilities
- Data Pipeline Development: Design, build, and optimize scalable, secure, and reliable data
pipelines to ingest, process, and transform large volumes of structured and unstructured data.
- Data Architecture: Architect and maintain data storage solutions, including data lakes, data
warehouses, and databases, ensuring performance, scalability, and cost-efficiency.
- Data Integration: Integrate data from diverse sources, including APIs, third-party systems,
and streaming platforms, ensuring data quality and consistency.
- Performance Optimization: Monitor and optimize data systems for performance, scalability,
and cost, implementing best practices for partitioning, indexing, and caching.
- Collaboration: Work closely with data scientists, analysts, and software engineers to
understand data needs and deliver solutions that enable advanced analytics, machine
learning, and reporting.
- Data Governance: Implement data governance policies, ensuring compliance with data
security, privacy regulations (e.g., GDPR, CCPA), and internal standards.
- Automation: Develop automated processes for data ingestion, transformation, and validation
to improve efficiency and reduce manual intervention.
- Mentorship: Guide and mentor junior data engineers, fostering a culture of technical
excellence and continuous learning.
- Troubleshooting: Diagnose and resolve complex data-related issues, ensuring high
availability and reliability of data systems.
Required Qualifications
- Education: Bachelor's or Master's degree in Computer Science, Engineering, Data Science,
or a related field.
- Experience: 3+ years of experience in data engineering or a related role, with a proven track
record of building scalable data pipelines and infrastructure.
- Technical Skills:
- Proficiency in programming languages such as Python.
- Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra).
- Strong experience with cloud platforms (e.g., AWS, GCP) and cloud data warehouses
(e.g., Redshift, BigQuery, Snowflake).
- Hands-on experience with ETL/ELT and workflow orchestration tools (e.g., Apache Airflow,
Talend, Informatica) and data integration frameworks.
- Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka) and distributed
systems.
- Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a
plus.
- Soft Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
- Ability to work in a fast-paced, dynamic environment and manage multiple priorities.
- Certifications (optional but preferred): Cloud certifications (e.g., AWS Certified Data Analytics,
Google Professional Data Engineer) or relevant data engineering certifications.