Overview
We are looking for a skilled Data Engineer to join our team and play a key role in building and maintaining robust data pipelines that power AI-driven solutions. The ideal candidate will have strong expertise in data processing, ETL workflows, and real-time streaming, ensuring high-quality, reliable data for AI agents.
Responsibilities
- Clean, organize, and prepare data for AI agents to consume.
- Design, develop, and maintain ETL processes to ensure smooth data flow.
- Generate reports on data quality and implement continuous improvements.
- Identify and integrate structured, unstructured, and streaming data sources relevant to AI tasks.
- Apply data cleaning, labeling, and normalization techniques.
- Implement validation, deduplication, and error-handling mechanisms for data integrity.
- Collaborate with cross-functional teams to ensure data availability and reliability.
Required Skills & Qualifications
- Programming: Proficiency in Python and SQL.
- Big Data Frameworks: Strong knowledge of Apache Spark and Apache Flink.
- Streaming Platforms: Experience with Apache Kafka for real-time data processing.
- Workflow Orchestration: Hands-on experience with Apache Airflow.
- Data Handling: Familiarity with both structured and unstructured data.
- Data Quality: Experience implementing data quality checks for AI-driven systems.
- Cloud Platforms: Working experience with AWS, GCP, or Azure.
- Containerization: Experience with tools such as Docker and Kubernetes.
- Architecture: Knowledge of data lake and data warehouse architectures.
- AI/ML Pipelines: Exposure to AI/ML data pipelines.