Lead Consultant - HPC DevOps Engineer | Antal Tech Jobs
16 Weeks ago

Lead Consultant - HPC DevOps Engineer

Information Technology
AstraZeneca

Overview

Job Title: Lead Consultant - HPC DevOps Engineer

Career level: E

Introduction to role

The Research Data & Analytics Team in R&D IT comprises skilled data and AI engineers and professionals who are dedicated to delivering innovative services and products. Our mission is to transform the way R&D discovers and develops medicine through data, analytics, and AI. We partner with scientific teams to deliver groundbreaking capabilities, products, and platforms that enable scientists to accelerate medicines that are safe and effective for patients.

The Scientific Computing platform (SCP) is a foundational capability for HPC and scaled computing solutions. Embedded within the Research D&A organization, it is central to analytics products focused on computational chemistry, imaging, multi-OMICs, structural biology, data science, and AI. The SCP team is accountable for the end-to-end delivery of high-performance analytics products, with an emphasis on augmenting the HPC experience. We combine modern HPC with a powerful DevOps stack and cloud-native technologies to power research and development at AstraZeneca.

Accountabilities

The Observability Engineer will be responsible for designing, implementing, and managing monitoring and logging systems that ensure high availability, performance, and visibility across the platform’s infrastructure and applications. The ideal candidate should have deep expertise in Prometheus, Grafana, ELK (Elastic Stack), or a similar stack, with a strong understanding of short-term and long-term storage solutions for metrics and logs. Equally important is experience in leading and coaching others to adopt best practices throughout the platform.

What you'll do:

Prometheus: Metrics Collection and Storage: Design and manage Prometheus architecture, including identifying high-cardinality metrics and troubleshooting performance issues. Configure short-term and long-term storage solutions using Prometheus-compatible systems (e.g., Thanos, Cortex, or VictoriaMetrics). Implement and optimize Prometheus exporters for collecting custom application metrics. Establish alerting rules in Prometheus and route the resulting alerts with Alertmanager.

Grafana: Visualization and Dashboarding: Develop and maintain Grafana dashboards for real-time observability. Integrate Grafana with other systems for unified visualization. Identify key metrics and insights through dashboards for both internal and external consumption.

Logging: Management and Insights: Set up and manage logging solutions, and develop relevant dashboards and queries to provide actionable insights. Integrate logging solutions with other observability tools for cohesive monitoring.

Cross-Tool Integration: Implement integrations between Prometheus, Grafana, and logging solutions to create a unified observability platform. Design solutions for correlation of metrics and logs to streamline root cause analysis.

Performance Tuning and Maintenance: Monitor the performance of observability tools and optimize resource utilization. Conduct regular upgrades and maintenance of all observability components.

Collaboration and Documentation: Work with SCP teams and users to define monitoring and logging requirements. Provide leadership and coaching on observability best practices while aiming for simplification. Focus on offering observability as an easy-to-consume service for the rest. Document observability architecture, workflows, and troubleshooting guides.

Essential Skills/Experience

Technical skills

  • Prometheus: Expertise in Prometheus setup, scaling, and federation. Knowledge of Thanos, Cortex, or VictoriaMetrics for long-term storage. Hands-on experience with PromQL for writing complex queries.

  • Grafana: Proficiency in creating dashboards and integrating with multiple data sources.

  • Logging: In-depth experience with ELK, Splunk, Loki or similar, both with query languages and dashboarding.

  • Infrastructure: Hands-on experience managing observability infrastructure in Kubernetes, Docker, or other container technologies.

  • Scripting and Automation: Proficiency in Python, Bash, or similar scripting languages. Experience with Infrastructure as Code tools like Terraform or Ansible.

Soft skills

  • Strong problem-solving and analytical abilities.

  • Excellent communication and collaboration skills to work across teams and end users.

  • Ability to streamline complex processes and requirements into simple and elegant solutions.

  • Ability to document complex systems clearly and concisely.

Desirable Skills/Experience

  • Familiarity with other observability tools (e.g., Loki, VictoriaMetrics).

  • Certifications: Prometheus Certified Associate.

When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.

At AstraZeneca, our work has a direct impact on patients by transforming our ability to develop life-changing medicines. We empower the business to perform at its peak by combining groundbreaking science with leading digital technology platforms and data. Here you can innovate, take ownership, explore new solutions, experiment with innovative technology, and tackle challenges in a modern technology environment.

Ready to make a difference? Apply now!

