Overview
Position Requirements:
● 4+ years of experience with PostgreSQL or another Unix-based relational database, and an advanced understanding of SQL.
● 2+ years of experience with Pentaho Data Integration or other ETL/ELT tools such as Talend, Informatica, or Apache Hop.
● 2+ years of hands-on experience with orchestration tools (Airflow, Control-M, AutoSys).
● 1+ years of hands-on experience with Python scripting for automation and data processing.
● Experience with cloud data platforms (Snowflake, Redshift, BigQuery) and big data technologies (Hadoop, Hive, Spark) is a plus.
● Strong knowledge of data architecture (dimensional modeling, pipelines, warehousing).
● Experience collaborating with non-technical teams (BI, Marketing, Finance) to translate business needs into technical solutions.
● Experience with reporting and data visualization platforms (Tableau, Pentaho BI) is an added advantage.
● Understanding of digital marketing transactional data (clickstream data, ad interaction data, email marketing data).
● Experience setting up infrastructure and defining architectural requirements, and an understanding of medical/clinical data.
● Leadership: Ability to mentor junior engineers and lead small projects independently.
● Strong communication and documentation skills.
● Willingness to learn new technologies.
● Preferred:
○ GCP/BigQuery, Airflow, or digital marketing data (clickstream, ad interactions).
○ Exposure to Linux and scheduling tools.
Role & Responsibilities:
● Design and build scalable ETL/ELT pipelines for structured/unstructured data (files, Postgres, Vertica, Hive).
● Analyze business requirements, design and implement the required data models, and build ETL/ELT strategies.
● Lead decision-making and planning for data architecture and engineering.
● Optimize data workflows (performance, cost, scalability) using orchestration tools like Airflow.
● Collaborate cross-functionally to align data solutions with business goals (e.g., BI dashboards, marketing analytics).
● Document and communicate technical designs to both technical and non-technical stakeholders.
● Troubleshoot and resolve data pipeline issues with minimal supervision.
● Mentor junior engineers and promote best practices in data engineering.