Gurugram, Haryana, India
Information Technology
Full-Time
Kodo
Overview
What You'll Be Doing
- Platform Reliability & Performance: Ensure our applications and environments are consistently stable, scalable, secure, and performing optimally.
- Collaborative Solutions: Proactively engage with cross-functional teams to understand their requirements, contributing to and delivering suitable supporting solutions.
- Infrastructure Growth & Automation: Develop and implement systems that facilitate rapid growth, including deployment policies, new procedures, configuration management, and planning for patches and capacity upgrades.
- Observability & Monitoring: Establish and maintain comprehensive monitoring and alerting systems to keep engineers continuously aware of potential issues.
- Incident Management & Automation: Create robust runbooks and procedures to minimize outages. Act quickly to resolve issues before users are impacted, then automate solutions to prevent future occurrences in production.
- Reliability & Security Engineering: Identify and mitigate reliability and security risks, ensuring our systems are prepared for peak loads, DDoS attacks, and operational mishaps.
- Full-Stack Troubleshooting: Efficiently troubleshoot issues across the entire stack, including software, applications, and network components.
- Project & Team Contribution: Manage individual project priorities, deadlines, and deliverables as an active member of a self-organizing team.
- Continuous Learning: Continuously learn and unlearn by exchanging knowledge, conducting constructive code reviews, and actively participating in retrospectives.
- 5+ years of extensive experience in Linux server administration, including patching, packaging (rpm), performance tuning, networking, user management, and security.
- 5+ years of experience implementing highly available, secure, scalable, and self-healing systems on the Azure cloud platform.
- Strong understanding of networking, especially in cloud environments, along with a deep understanding of CI/CD principles and practices.
- Proven experience implementing industry-standard security best practices, including those recommended by Azure.
- Proficiency with Bash and at least one high-level scripting language (e.g., Python, Ruby, Go).
- Solid working knowledge of observability stacks such as ELK, Prometheus, Grafana, and SigNoz.
- Expertise in Infrastructure as Code and Infrastructure Testing, preferably using Pulumi or Terraform.
- Hands-on experience in building and administering VMs and Containers using tools such as Docker and Kubernetes.
- Excellent communication skills, both spoken and written, with a demonstrated ability to articulate complex technical problems and projects clearly to all stakeholders.
- Experience with Pulumi with TypeScript or Golang.
- Familiarity with Node.js.
- Experience with Serverless infrastructure.
- Advanced Azure certifications or in-depth Azure experience.
- Experience with governance processes and compliance validation, especially for financial services standards such as ISO 27001, SOC 2, and PCI DSS.
- Prior experience working in product startups.
- Experience in administering and scaling PostgreSQL databases.