Hyderabad, Telangana, India
Healthcare & Life Sciences
Full-Time
UST
Overview
Role Description
Key Responsibilities
- Data Strategy & Architecture Development
- Define and implement data architecture and strategy that aligns with business goals.
- Design scalable, cost-effective, and high-performance data solutions using Databricks on AWS, Azure, or GCP.
- Establish best practices for Lakehouse Architecture and Delta Lake for optimized data storage, processing, and analytics.
- Data Engineering & Integration
- Architect and build ETL/ELT pipelines using Databricks Spark, Delta Live Tables, and Databricks Workflows (see the pipeline sketch after this list).
- Optimize data ingestion from systems like Oracle Fusion Middleware, WebMethods, MuleSoft, and Informatica into Databricks.
- Ensure real-time and batch data processing with Apache Spark and Delta Lake.
- Implement data integration strategies to ensure seamless connectivity with enterprise systems such as Salesforce, SAP, ERP, and CRM.
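To make the pipeline work above concrete, here is a minimal Delta Live Tables sketch in Python. The table names, landing path, columns, and data-quality rule are hypothetical, and the `spark` session is supplied by the Databricks runtime rather than created in the script.

```python
# Minimal Delta Live Tables sketch. Table names, the S3 path, and the
# columns are illustrative placeholders; `spark` is provided by Databricks.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader.")
def raw_orders():
    # Auto Loader ("cloudFiles") picks up new files as they arrive.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://example-bucket/orders/")  # hypothetical landing path
    )

@dlt.table(comment="Cleaned orders ready for analytics.")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # data-quality expectation
def clean_orders():
    return dlt.read_stream("raw_orders").select(
        "order_id", "amount", col("ts").cast("timestamp")
    )
```

The same two-table pattern handles both streaming and batch sources, which is why DLT is a natural fit for the real-time and batch processing responsibilities listed above.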
- Data Governance, Security & Compliance
- Implement data governance frameworks using Unity Catalog for data lineage, metadata management, and access control.
- Ensure compliance with industry regulations like HIPAA, GDPR, and others in the life sciences domain.
- Define and enforce Role-Based Access Control (RBAC) and data security best practices using Databricks SQL and access policies (see the grants sketch after this list).
- Enable data stewardship and ensure effective data cataloging for self-service data democratization.
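As an illustration of Unity Catalog access control, here is a minimal sketch that issues grants through a Databricks Spark session; the catalog, schema, table, and group names are hypothetical examples, not part of the role description.

```python
# Minimal Unity Catalog access-control sketch, issued through a Spark
# session on Databricks. Catalog, schema, and group names are hypothetical.
statements = [
    # Allow the analysts group to browse and query a curated schema.
    "GRANT USE CATALOG ON CATALOG clinical TO `analysts`",
    "GRANT USE SCHEMA ON SCHEMA clinical.curated TO `analysts`",
    "GRANT SELECT ON TABLE clinical.curated.trial_results TO `analysts`",
    # Engineers additionally get write access for pipeline jobs.
    "GRANT MODIFY ON TABLE clinical.curated.trial_results TO `data_engineers`",
]
for stmt in statements:
    spark.sql(stmt)
```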
- Performance Optimization & Cost Management
- Optimize Databricks compute clusters (DBU usage) for cost efficiency and performance.
- Implement query optimization techniques using Photon Engine, Adaptive Query Execution (AQE), and caching strategies (see the configuration sketch after this list).
- Monitor Databricks workspace health, job performance, and cost analytics.
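A minimal sketch of the optimization levers above, assuming an existing Databricks Spark session; note that the Photon Engine is enabled at the cluster level rather than in code, so the sketch covers AQE and caching only. The table name is hypothetical.

```python
# Adaptive Query Execution re-plans joins and partition counts at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Cache a hot dimension table so repeated BI queries avoid re-reading
# cloud storage. The table name is a hypothetical example.
dim = spark.table("clinical.curated.dim_site")
dim.cache()
dim.count()  # materialize the cache eagerly
```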
- AI/ML Enablement & Advanced Analytics
- Design and support ML pipelines leveraging Databricks MLflow for model tracking and deployment (see the tracking sketch after this list).
- Enable AI-driven analytics in genomics, drug discovery, and clinical data processing.
- Collaborate with data scientists to operationalize AI/ML models in Databricks.
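The following is a minimal MLflow tracking sketch of the kind referenced above; the experiment path, model, and metric are hypothetical stand-ins, and on Databricks the MLflow tracking server is preconfigured.

```python
# Minimal MLflow tracking sketch: train a toy model, log its parameters,
# metric, and artifact so it can later be registered and deployed.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

mlflow.set_experiment("/Shared/clinical-risk-demo")  # hypothetical path
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Log the fitted model for later registration and deployment.
    mlflow.sklearn.log_model(model, "model")
```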
- Collaboration & Stakeholder Alignment
- Work closely with business teams, data engineers, AI/ML teams, and IT leadership to align data strategy with enterprise goals.
- Collaborate with platform vendors (Databricks, AWS, Azure, GCP, Informatica, Oracle, MuleSoft) for solution architecture and support.
- Provide technical leadership, conduct Proofs of Concept (PoCs), and drive Databricks adoption across the organization.
- Data Democratization & Self-Service Enablement
- Implement data sharing frameworks for self-service analytics using Databricks SQL and BI tools (Power BI, Tableau).
- Promote data literacy and empower business users with self-service analytics.
- Establish data lineage and cataloging to improve data discoverability and governance.
- Migration & Modernization
- Lead the migration of legacy data platforms (e.g., Informatica, Oracle, Hadoop) to the Databricks Lakehouse.
- Design a roadmap for cloud modernization and ensure seamless data transition with minimal disruption.
Required Skills & Experience
- Databricks & Spark Expertise
- Strong knowledge of Databricks Lakehouse architecture (Delta Lake, Unity Catalog, Photon Engine).
- Expertise in Apache Spark (PySpark, Scala, SQL) for large-scale data processing.
- Experience with Databricks SQL and Delta Live Tables (DLT) for real-time and batch processing.
- Proficiency with Databricks Workflows, Job Clusters, and Task Orchestration.
- Cloud & Infrastructure Knowledge
- Hands-on experience with Databricks on AWS, Azure, or GCP (AWS Databricks preferred).
- Strong understanding of cloud storage (ADLS, S3, GCS) and cloud networking (VPC, IAM, Private Link).
- Experience with Infrastructure as Code (Terraform, ARM, CloudFormation) for Databricks setup.
- Data Modeling & Architecture
- Expertise in data modeling (Dimensional, Star Schema, Snowflake Schema, Data Vault).
- Experience with Lakehouse, Data Mesh, and Data Fabric architectures.
- Knowledge of data partitioning, indexing, caching, and query optimization techniques.
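A minimal sketch of the partitioning and query-optimization techniques just listed, applied to a Delta Lake table on Databricks; table and column names are hypothetical.

```python
# Partition a Delta table on a low-cardinality date key for coarse pruning,
# then cluster files by a high-cardinality filter column with Z-ordering.
# Table and column names are hypothetical.
(
    spark.table("raw.events")
    .write.format("delta")
    .partitionBy("event_date")
    .saveAsTable("curated.events")
)
# OPTIMIZE ... ZORDER BY co-locates related rows to speed selective queries.
spark.sql("OPTIMIZE curated.events ZORDER BY (patient_id)")
```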
- ETL/ELT & Data Integration
- Experience designing scalable ETL/ELT pipelines using Databricks, Informatica, MuleSoft, or Apache NiFi.
- Strong knowledge of batch and streaming ingestion (Kafka, Kinesis, Event Hubs, Auto Loader).
- Expertise in Delta Lake & Change Data Capture (CDC) for real-time updates.
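As a sketch of the CDC pattern above, the following applies a batch of change records to a Delta table with MERGE; the table, column, and operation-flag names are hypothetical.

```python
# Minimal change-data-capture sketch: upsert a batch of CDC records into a
# Delta table. Table and column names are hypothetical placeholders.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "curated.customers")
updates = spark.table("staging.customer_changes")  # latest CDC batch

(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'DELETE'")  # apply source deletes
    .whenMatchedUpdateAll()                          # apply updates
    .whenNotMatchedInsertAll()                       # apply inserts
    .execute()
)
```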
- Data Governance & Security
- Deep understanding of Unity Catalog, RBAC, and attribute-based access control (ABAC) for data access control.
- Experience with data lineage, metadata management, and compliance (HIPAA, GDPR, SOC 2).
- Strong skills in data encryption, data masking, and RBAC enforcement.
- Performance Optimization & Cost Management
- Ability to optimize Databricks clusters (DBU usage, Auto Scaling, Photon Engine) for cost efficiency.
- Knowledge of query tuning, caching, and performance profiling techniques.
- Experience in monitoring Databricks job performance using tools like Ganglia, CloudWatch, or Azure Monitor.
- AI/ML & Advanced Analytics (Preferred)
- Experience integrating Databricks MLflow for model tracking and deployment.
- Knowledge of AI-driven analytics, particularly in genomics, drug discovery, and life sciences data processing.
Skills
- Data Architecture
- Databricks
- Apache Spark
- AI/ML
- Cloud Platforms (AWS, Azure, GCP)
- Data Governance & Security
- ETL/ELT & Data Integration
- Performance Optimization
- Data Modeling