Bangalore, Karnataka, India
Information Technology
Full-Time
Avaamo
Overview
We are seeking a versatile and highly skilled DevOps and Site Reliability Engineer (SRE) with expertise in compute, storage, network, and cloud technologies. The ideal candidate will design, implement, and manage robust infrastructure solutions, ensuring reliability, scalability, and performance.
Responsibilities
- Ensure the reliability, availability, and performance of the entire infrastructure stack, including compute, storage, network, and cloud components.
- Lead incident response efforts across the infrastructure stack, coordinating with Application Support, SRE, and Engineering teams to minimize MTTD and MTTR.
- Perform root cause analysis for infrastructure-related incidents and implement corrective actions.
- Develop and maintain automation tools for managing infrastructure resources.
- Collaborate with Engineering teams to plan and execute system upgrades and maintenance.
- Conduct capacity planning and resource management for all infrastructure components.
- Participate in on-call rotations to provide 24x7 support for all critical infrastructure issues.
- Design and implement disaster recovery plans and business continuity strategies.
- Implement best practices for monitoring, logging, and alerting across the infrastructure.
- Foster a culture of continuous improvement and operational excellence.
- Analyze complex infrastructure problems, design scalable and resilient solutions, and lead the implementation of these solutions.
- Collaborate with architects and other engineers to design and enhance the architecture of infrastructure systems, ensuring alignment with business needs and technology standards.
Requirements
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 3+ years of experience in site reliability engineering, software engineering, or systems engineering.
- Strong programming skills in one or more programming languages such as Python, Go, Java, or similar.
- Experience with cloud platforms such as AWS, GCP, or Azure.
- Proven experience managing and optimizing a diverse infrastructure stack.
- Extensive knowledge of cloud platforms (AWS, Azure, GCP) and infrastructure as code (Terraform, CloudFormation).
- Solid understanding of virtualization, containerization (Docker), and container orchestration (Kubernetes).
- Understanding of storage solutions (SAN, NAS, cloud storage) and backup systems.
- Strong understanding of network protocols, routing, switching, and firewalls.
- Experience with load balancers and network monitoring tools.
- Experience in DNS management and troubleshooting.
- Experience in network security best practices.
- Proficiency in monitoring and observability tools (Prometheus, Grafana, Datadog).
- Proficiency in at least one scripting language (Python, Bash) for automation.
- Experience with CI/CD pipeline management and DevOps practices.
- Strong understanding of disaster recovery and business continuity planning.
- Understanding of chaos engineering principles and practices.
- Skills in cost optimization for cloud infrastructure.
Talk to us
Feel free to call, email, or hit us up on our social media accounts.
Email
info@antaltechjobs.in