Overview
Role: Senior Data Engineer – Databricks.
Experience: 5+ Years
Job Type: Contract
Contract Duration: 6 Months
Budget: 1.0 lakh per month
Location: Remote
JOB DESCRIPTION:
We are looking for a dynamic and experienced Senior Data Engineer – Databricks to design, build, and optimize robust data pipelines on the Databricks Lakehouse Platform. The ideal candidate has strong hands-on skills in Apache Spark, PySpark, and cloud data services, along with a good grasp of Python and Java. This role involves close collaboration with architects, analysts, and developers to deliver scalable, high-performing data solutions across AWS, Azure, and GCP.
ESSENTIAL JOB FUNCTIONS
1. Data Pipeline Development
• Build scalable and efficient ETL/ELT workflows using Databricks and Spark for both batch and streaming data (an illustrative sketch follows this list).
• Leverage Delta Lake and Unity Catalog for structured data management and governance.
• Optimize Spark jobs by tuning configurations, caching, partitioning, and serialization techniques.
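As an illustration of the pipeline work described above, the following is a minimal PySpark sketch of a batch ETL step that writes a partitioned Delta table; the paths, table name (curated.orders), and column names are hypothetical placeholders rather than details from this role.

```python
# Minimal batch ETL sketch: read raw files, clean them, and write a
# partitioned Delta table. All paths, tables, and columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Read raw JSON, derive a partition column, and deduplicate on the key.
raw = spark.read.format("json").load("/mnt/landing/raw_orders/")
curated = (
    raw.withColumn("order_date", F.to_date("order_ts"))
       .dropDuplicates(["order_id"])
       .repartition("order_date")  # align shuffle output with the write partitioning
)

# Delta provides ACID writes; partitioning by date keeps downstream scans cheap.
(curated.write.format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .saveAsTable("curated.orders"))
```

A streaming variant would follow the same shape with readStream/writeStream, and registering the table in Unity Catalog mainly changes the catalog-qualified table name.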
2. Cloud-Based Implementation
• Develop and deploy data workflows on AWS (S3, EMR, Glue), Azure (ADLS, ADF, Synapse), and/or GCP (GCS, Dataflow, BigQuery).
• Manage and optimize data storage, access control, and pipeline orchestration using native cloud tools.
• Use tools such as Databricks Auto Loader and SQL warehouses for efficient data ingestion and querying (see the sketch after this list).
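For the ingestion tooling mentioned above, here is a minimal sketch of Databricks Auto Loader (the cloudFiles source) streaming newly arriving files into a Delta table. It assumes a Databricks runtime; the bucket path, checkpoint locations, and table name (bronze.events) are illustrative assumptions.

```python
# Auto Loader sketch: incrementally ingest new JSON files into a Delta table.
# Paths and table names are placeholders, not values from this posting.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = (
    spark.readStream.format("cloudFiles")  # Auto Loader source
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")
         .load("s3://example-bucket/landing/events/")
)

# The checkpoint lets the stream resume where it left off;
# trigger(availableNow=True) processes the current backlog and then stops.
(events.writeStream.format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .trigger(availableNow=True)
       .toTable("bronze.events"))
```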
3. Programming & Automation
• Write clean, reusable, and production-grade code in Python and Java.
• Automate workflows using orchestration tools (e.g., Airflow, ADF, or Cloud Composer); see the sketch after this list.
• Implement robust testing, logging, and monitoring mechanisms for data pipelines.
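As a sketch of the orchestration expected here, the following minimal Airflow DAG triggers an existing Databricks job through the Databricks provider. It assumes Airflow 2.4+ with apache-airflow-providers-databricks installed; the DAG id, connection id, and job_id are placeholder assumptions.

```python
# Minimal Airflow DAG that triggers a pre-defined Databricks job once a day.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="daily_orders_pipeline",   # placeholder DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    DatabricksRunNowOperator(
        task_id="run_orders_job",
        databricks_conn_id="databricks_default",  # Airflow connection to the workspace
        job_id=12345,                             # placeholder Databricks job ID
    )
```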
4. Collaboration & Support
• Collaborate with data analysts, data scientists, and business users to meet evolving data needs.
• Support production workflows, troubleshoot failures, and resolve performance bottlenecks.
• Document solutions, maintain version control, and follow Agile/Scrum processes.
Required Skills
Technical Skills:
• Databricks: Hands-on experience with notebooks, cluster management, Delta Lake, Unity Catalog, and job orchestration.
• Spark: Expertise in Spark transformations, joins, window functions, and performance tuning (a window-function example follows this list).
• Programming: Strong in PySpark and Java, with experience in data validation and error handling.
• Cloud Services: Good understanding of AWS, Azure, or GCP data services and security models.
• DevOps/Tools: Familiarity with Git, CI/CD, Docker (preferred), and data monitoring tools.
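To make the Spark expectation above concrete, here is a small self-contained PySpark example of window functions (a per-group rank and a running total); the sales data and column names are hypothetical.

```python
# Window-function example: rank rows within each region and compute a
# running total ordered by date. The data is made up for illustration.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("north", "2024-01-01", 100),
     ("north", "2024-01-02", 150),
     ("south", "2024-01-01", 80)],
    ["region", "sale_date", "amount"],
)

rank_w = Window.partitionBy("region").orderBy(F.col("amount").desc())
running_w = (Window.partitionBy("region").orderBy("sale_date")
             .rowsBetween(Window.unboundedPreceding, Window.currentRow))

result = (sales
          .withColumn("rank_in_region", F.rank().over(rank_w))
          .withColumn("running_amount", F.sum("amount").over(running_w)))
result.show()
```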
Experience:
• 5–8 years of data engineering or backend development experience.
• Minimum 1–2 years of hands-on work in Databricks with Spark.
• Exposure to large-scale data migration, processing, or analytics projects.
Certifications (nice to have): Databricks Certified Data Engineer Associate
Working Conditions
Hours of work - Full-time hours; remote work is flexible, with availability expected during US working hours.
Overtime expectations - Overtime is not expected as long as commitments are met.
Work environment - Primarily remote; occasional on-site work may be required during client visits.
Travel requirements - No travel required.
On-call responsibilities - On-call duties during deployment phases.
Special conditions or requirements - Not Applicable.
Workplace Policies and Agreements
- Confidentiality Agreement: Required to safeguard sensitive client data.
- Non-Compete Agreement: Must be signed to protect the security of proprietary models.
- Non-Disclosure Agreement: Must be signed to ensure client confidentiality and security.