Kolkata, West Bengal, India
Information Technology
Full-Time
S M Software Solutions Inc
Overview
Work Location: PAN India
Duration: 12 Months (Extendable)
Shift: Rotational shifts including night shifts and weekend availability
Years of Experience: 8+ Years
🔧 Job Summary
We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment.
🎯 Key Responsibilities
Site Reliability Engineering
- Design, build, and maintain scalable, reliable infrastructure.
- Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet.
- Develop automation tools/scripts in Python, Go, Java, or Bash.
- Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers).
- Deploy and manage infrastructure on AWS or Kubernetes platforms.
- Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD).
- Monitor production systems with tools such as Prometheus, Grafana, Nagios, and Datadog.
- Conduct postmortems and define SLAs/SLOs to ensure high system reliability.
- Plan and implement capacity management, failover systems, and auto-scaling mechanisms.
Observability Engineering
- Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc.
- Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb).
- Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools.
- Build actionable alerts and dashboards to reduce alert fatigue and increase insight.
- Advocate for observability best practices with developers and define performance KPIs.
Required Skills & Qualifications
- Proven experience as an SRE or Observability Engineer in production environments.
- Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes).
- Proficient in scripting and automation (Python, Go, Bash, Java).
- Expertise in observability, monitoring, and alerting systems.
- Experience in Infrastructure as Code (IaC) and modern CI/CD practices.
- Strong troubleshooting skills and ability to respond to live production issues.
- Comfortable with rotational shifts, including nights and weekends.
Skills
- Ansible
- AWS Automation Services
- AWS CloudFormation
- AWS CodePipeline
- AWS CodeDeploy
- AWS DevOps Services