Overview
• 5+ years of experience in software/data engineering, including at least 2 years working with Databricks and Apache Spark
• Strong proficiency in Python, SQL, and PySpark
• Deep understanding of AWS and Azure cloud services
• Experience with the Databricks Lakehouse Platform, Databricks Workflows, Databricks SQL, and dbt
• Solid grasp of data lakehouse and data warehousing architectures
• Prior experience supporting AI/ML workflows, including training data pipelines and model deployment support
• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation
• Strong analytical and troubleshooting skills in a fast-paced, agile environment
• Excellent collaboration skills for interfacing with both technical and non-technical customer stakeholders
• Clear communicator with strong documentation habits
• Comfortable leading discussions, offering strategic input, and mentoring others
Key Responsibilities:
• The ideal candidate will have a strong background in building scalable data pipelines, optimizing big data workflows, and integrating Databricks with cloud services
• This role will play a pivotal part in enabling the customer’s data engineering and analytics initiatives—especially those tied to AI-driven solutions and projects—by implementing cloud-native architectures that fuel innovation and sustainability
• Partner directly with the customer’s data engineering team to design and deliver scalable, cloud-based data solutions
• Execute complex ad-hoc queries using Databricks SQL to explore large lakehouse datasets and uncover actionable insights
• Leverage Databricks notebooks to develop robust data transformation workflows using PySpark and SQL (a brief sketch of this kind of workflow follows this list)
• Design, develop, and maintain scalable data pipelines using Apache Spark on Databricks
• Build ETL/ELT workflows with AWS and Azure services
• Optimize Spark jobs for both performance and cost within the customer’s cloud infrastructure (see the second sketch after this list)
• Collaborate with data scientists, ML engineers, and business analysts to support AI and machine learning use cases, including data preparation, feature engineering, and model operationalization
• Contribute to the development of AI-powered solutions that improve operational efficiency, route optimization, and predictive maintenance in the waste management domain
• Implement CI/CD pipelines for Databricks jobs using GitHub Actions, Azure DevOps, or Jenkins
• Ensure data quality, lineage, and compliance through tools like Unity Catalog, Delta Lake, and AWS Lake Formation
• Troubleshoot and maintain production data pipelines
• Provide mentorship and share best practices with both internal and customer teams
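For illustration, below is a minimal PySpark sketch of the kind of notebook-based transformation workflow referenced above. The table and column names (bronze.pickup_events, silver.pickup_events_clean, event_id, event_ts, weight_kg) are hypothetical placeholders, not part of the customer's environment.

```python
# Minimal bronze-to-silver transformation sketch; all table and column
# names below are illustrative assumptions, not the customer's schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

raw = spark.read.table("bronze.pickup_events")

clean = (
    raw
    .dropDuplicates(["event_id"])                      # basic data-quality step
    .withColumn("event_date", F.to_date("event_ts"))   # derive a partition-friendly column
    .filter(F.col("weight_kg") > 0)                     # drop obviously invalid records
)

(clean.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("silver.pickup_events_clean"))
```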
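Likewise, a short sketch of the kind of Spark and Delta Lake tuning implied by the optimization responsibility. The configuration flags are standard Spark settings; the table name carries over from the hypothetical example above, and the right settings in practice depend on the workload and cluster sizing.

```python
# Illustrative tuning steps only; actual settings depend on the
# customer's workload, data volumes, and cluster configuration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Adaptive Query Execution lets Spark coalesce shuffle partitions and
# mitigate skew at runtime (enabled by default on recent Databricks runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# Compact small files and co-locate data on a frequently filtered column
# in the hypothetical silver table from the previous sketch.
spark.sql("OPTIMIZE silver.pickup_events_clean ZORDER BY (event_date)")
```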