Bangalore, Karnataka, India
Information Technology
Other
SteerLean Consulting
Overview
Responsibilities
- Design and implement data pipelines, ETL processes, schemas, and data models to ingest, process, and prepare multi-petabyte-scale datasets for downstream analytics and machine learning.
- Build and optimize data processing systems on modern platforms like Spark, Delta Lake, Kafka, etc.
- Implement data quality, validation, and monitoring measures leveraging tools such as Great Expectations.
- Ensure compliance with security, access control, and regulatory requirements related to PHI and other sensitive data types.
- Support adoption of emerging standards like FHIR for healthcare data exchange.
- Collaborate with data scientists, analysts, and engineers to understand data needs and deliver performant, reliable data products.
- Keep track of emerging technologies & trends in the Data Engineering world, incorporating modern tooling and best practices.
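As an illustration of the pipeline and data-quality responsibilities above, the sketch below applies named row-level checks to a batch of records, in the spirit of tools like Great Expectations; all field names, rules, and sample rows are hypothetical.

```python
# Minimal row-level data-quality validation: each "expectation" is a named
# predicate applied to every record, and the result summarizes how many
# records fail each rule. Field names and rules are illustrative only.

from typing import Any, Callable

Record = dict[str, Any]

def validate(records: list[Record],
             expectations: dict[str, Callable[[Record], bool]]) -> dict[str, int]:
    """Return the number of records failing each named expectation."""
    failures = {name: 0 for name in expectations}
    for rec in records:
        for name, check in expectations.items():
            if not check(rec):
                failures[name] += 1
    return failures

# Hypothetical patient-encounter rows (no real PHI).
rows = [
    {"patient_id": "p1", "age": 42},
    {"patient_id": "",   "age": 42},
    {"patient_id": "p3", "age": -5},
]

report = validate(rows, {
    "patient_id_not_empty": lambda r: bool(r["patient_id"]),
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
})
# report counts one failure per rule for the sample rows above
```

In a production pipeline such checks would typically run as a validation stage before data is committed to the lake, with failures routed to monitoring rather than silently dropped.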
Requirements
- Experience in building and operating production big data platforms and pipelines.
- Strong experience with SQL, Spark, workflow orchestrators, distributed message buses, Python, Presto, Delta Lake, the Apache big data tool suite, Docker, Kubernetes, and MPP databases.
- Hands-on experience designing and implementing cloud-based data solutions on platforms like Azure, AWS, or GCP, optimizing for scalability, cost-efficiency, and performance.
- Implement and maintain data lakes, warehouses, and lakehouses, including data modeling, ETL processes, and data quality assurance to empower data-driven decision-making.
- Develop real-time data pipelines using streaming technologies like Apache Kafka or Azure Event Hubs, enabling timely insights and actions from incoming data streams.
- Manage and enhance distributed data systems (e.g., Hadoop, Spark) to efficiently process large-scale datasets, ensuring data availability and reliability.
- Prior experience working with health data and the Azure cloud is a strong plus.
- Experience with Databricks or MS Fabric
- Strong track record of designing and implementing scalable data models, schemas, ETL logic
- Experience with data governance, master data management, data pseudonymization and anonymization, and data catalog solutions.
- A strong interest in learning new things and a team-player ethic.
- Strong analytical skills and good understanding of data structures and algorithms.
- Some exposure to Nextflow and/or Nextflow Tower.
- Experience building data pipelines for machine learning.
- Knowledge of genomics, medical imaging, and/or EHR data domains
- Knowledge of HIPAA, HL7 and other healthcare data privacy requirements
- Hands-on experience with fully managed data warehousing solutions such as Azure Synapse, AWS Redshift, BigQuery, Snowflake, etc.
- Azure Batch & Blob Storage
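Several of the requirements above concern PHI handling and pseudonymization. One common approach, sketched below under the assumption of a secret key held in a vault, is to replace direct identifiers with keyed HMAC digests: the mapping is deterministic (so records can still be joined) but not reversible without the key. The key value and identifier format here are hypothetical.

```python
# Deterministic pseudonymization of direct identifiers via HMAC-SHA-256:
# the same input always yields the same pseudonym, but recovering the
# original identifier requires the secret key.

import hashlib
import hmac

SECRET_KEY = b"rotate-me-and-store-in-a-vault"  # hypothetical key

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable, non-reversible pseudonym."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

a = pseudonymize("patient-12345")
b = pseudonymize("patient-12345")
c = pseudonymize("patient-67890")
assert a == b   # deterministic: supports joins across datasets
assert a != c   # distinct inputs yield distinct pseudonyms
```

Note that keyed hashing alone is pseudonymization, not anonymization: under HIPAA and similar regimes the output is still considered identifiable data while the key exists.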