Senior Cloud Infrastructure Engineer - Observability | Antal Tech Jobs
2 Weeks ago

Senior Cloud Infrastructure Engineer - Observability

Bangalore, Karnataka, India
Information Technology
Full-Time
Splunk

Overview

Join us as we pursue our ground-breaking vision to make machine data accessible, usable, and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we are committed to our work, customers, having fun, and most significantly to each other’s success.
The Splunk Observability Cloud provides full-fidelity monitoring and troubleshooting across infrastructure, applications, and user interfaces, in real time and at any scale, to help our customers keep their services reliable, innovate faster, and deliver great customer experiences. Infrastructure Software Engineers at Splunk are cloud-native systems engineers who use infrastructure-as-code, microservices, automation, and efficient design to build, operate, and scale our products.
Role
You will help us run one of the largest and most sophisticated cloud-scale, big data, and microservices platforms in the world. You will be responsible for enabling developers to operate highly available, scalable, and cost-efficient applications with low operational burden by managing and improving the reliability and resiliency of SRE-managed services and infrastructure. You thrive on automation, infrastructure-as-code, reliability engineering, and getting rid of tedious, manual tasks.
You will:
  • Design new services, tools, and monitoring to be implemented by the entire team.
  • Analyze the tradeoffs of the proposed design and make recommendations based on these tradeoffs.
  • Mentor new engineers to achieve more than they thought possible. You enjoy making other teams successful and are fulfilled through the success of others.
Work on reliability projects, including:
  • HA, Business Continuity Planning, disaster recovery, backup/restore, RTO, RPO
  • Chaos engineering
  • Application uptime and performance
  • Capacity management & planning
  • SLIs, SLOs, error budgets, and monitoring dashboards
  • Deployment and operations of large-scale distributed data stores and streaming services
  • Establishing design patterns for monitoring and benchmarking
  • Establishing and documenting production runbooks and guidelines for developers
  • Tooling, toil reduction, runbooks & automation to handle production environments
  • Incident management and improving MTTD/MTTR for services
  • Cloud cost optimization
Qualifications
Must-Have:
  • 9+ years of SRE experience in handling large-scale cloud-native microservices platforms.
  • 4+ years of strong hands-on experience deploying, handling, and monitoring large-scale Kubernetes clusters in the public cloud, specifically AWS or GCP.
  • Experience with infrastructure automation and scripting using Python and/or bash scripting.
  • Strong hands-on experience in monitoring tools such as Splunk, Prometheus, Grafana, ELK stack, etc. in order to build observability for large-scale microservices deployments.
  • Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems
Preferred:
  • AWS Solutions Architect certification.
  • Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certification.
  • Experience with Infrastructure-as-Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
  • Experience with deployment and operations of large-scale clusters for Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, Redis, etc.
  • Experience with CI/CD frameworks and Pipeline-as-Code such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
  • Proven skills to effectively work across teams and functions to influence the design, operations, and deployment of highly available software.
Bachelor's/Master's in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

We value diversity, equity, and inclusion at Splunk and are an equal employment opportunity employer. Qualified applicants receive consideration for employment without regard to race, religion, color, national origin, ancestry, sex, gender, gender identity, gender expression, sexual orientation, marital status, age, physical or mental disability or medical condition, genetic information, veteran status, or any other consideration made unlawful by federal, state, or local laws. We consider qualified applicants with criminal histories, consistent with legal requirements.
