Chennai, Tamil Nadu, India
Information Technology
Full-Time
Wingify
Overview
About the Role
We are looking for a Senior DevOps Engineer to join our team and help design, build, and operate highly reliable and scalable systems. You enjoy solving complex infrastructure problems and take pride in building automation-first, production-grade platforms.
In this role, you will own critical parts of our infrastructure and work closely with engineering teams to ensure high availability, performance, security, and operational excellence. Our platform handles a large volume of HTTP requests, and the team continuously focuses on infrastructure improvements, automation, reliability, and proactive issue resolution.
Key Responsibilities
- Ensure high availability, reliability, and uptime of production infrastructure and services.
- Design, manage, and optimize cloud infrastructure, primarily on Google Cloud Platform (GCP); AWS exposure is a plus.
- Own the overall Docker and Kubernetes architecture, including cluster setup, upgrades, scaling, and security.
- Troubleshoot and resolve complex issues in Docker and Kubernetes environments.
- Automate infrastructure provisioning, deployments, and operational workflows using Infrastructure as Code.
- Design, maintain, and optimize CI/CD pipelines to enable reliable and repeatable deployments.
- Implement and own monitoring, alerting, and incident response systems to ensure operational visibility.
- Manage logging, log aggregation, and analysis for production systems.
- Participate in on-call rotations, lead incident response, and drive post-incident reviews and long-term fixes.
- Continuously improve infrastructure, processes, tooling, and automation with a reliability-first mindset.
- Collaborate closely with engineering teams to build scalable, resilient, and secure systems.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science or a related field (or equivalent practical experience).
- 4+ years of hands-on experience operating and supporting production systems at scale.
- Strong experience with Linux system administration, security best practices, and troubleshooting.
- Hands-on experience with Google Cloud Platform (GCP) or Amazon Web Services (AWS).
- Strong proficiency in shell scripting (Bash, Awk, etc.) and automation.
- Strong experience running Docker and Kubernetes in production environments.
- Experience with CI/CD systems such as Jenkins and configuration management tools like Ansible.
- Experience with monitoring and alerting tools such as Prometheus, Grafana, Site24x7, PagerDuty, or similar.
- Experience with logging and log management tools such as Fluentd, Logstash, Kibana, or similar.
- Strong problem-solving skills and the ability to handle production incidents calmly and effectively.
Nice to Have
- Experience working with MySQL and PostgreSQL in production environments.
- Knowledge of load balancers, DNS, and web architectures.
- Experience with Terraform or other Infrastructure as Code tools.
- Knowledge of Python for automation and tooling.
- Understanding of distributed systems and high-availability architectures.
- Passion for building reliable, scalable, and maintainable systems.
Why This Role
- Opportunity to work on high-traffic, production-grade systems.
- High ownership and impact on infrastructure and reliability decisions.
- A culture that values automation, reliability, and continuous improvement.