Pune, Maharashtra, India
Space Exploration & Research, Information Technology
Full-Time
Impronics Technologies
Overview
We are seeking a seasoned Site Reliability Engineer (SRE) with a solid background in payment systems and high-availability architectures. The ideal candidate will have hands-on experience managing large-scale, distributed systems in production, with a deep understanding of reliability, scalability, and performance tuning in the financial services or payments industry.
Key Responsibilities
- Design, build, and maintain scalable, resilient, and secure infrastructure for high-volume payment platforms.
- Ensure system uptime, reliability, and performance through effective monitoring, alerting, and incident response strategies.
- Collaborate with software engineering and DevOps teams to implement CI/CD pipelines and improve deployment efficiency.
- Automate infrastructure management tasks using Infrastructure-as-Code (IaC) tools (Terraform, Ansible, etc.).
- Proactively identify and mitigate system bottlenecks and potential points of failure.
- Manage disaster recovery strategies, failover planning, and performance testing for critical payment services.
- Work with development teams to ensure services are designed for reliability, scalability, and observability from the ground up.
- Participate in root cause analysis and post-incident reviews to prevent future outages.
Requirements
- 8+ years of overall experience in infrastructure engineering or SRE roles, including at least 3 years in the payments/fintech domain.
- Strong understanding of payment protocols (UPI, IMPS, RTGS, NEFT, SWIFT, etc.) and transaction processing systems.
- Proven expertise in Linux systems administration, cloud platforms (AWS, GCP, or Azure), and container orchestration (Kubernetes).
- Solid experience with monitoring/logging tools like Prometheus, Grafana, ELK Stack, Splunk, etc.
- Proficiency in one or more scripting languages (Python, Shell, Go, etc.) for automation.
- Experience with incident management, SLAs, and system troubleshooting in high-pressure environments.
- Familiarity with security and compliance practices in the financial sector (e.g., PCI-DSS, ISO 27001).
- Previous experience supporting mission-critical applications in banking or financial services.
- Exposure to Kafka, Redis, or other real-time streaming and caching technologies.
- Experience with Site Reliability Engineering principles and implementing SLOs/SLIs.
- Understanding of the error budget concept and how it ties into availability and release decisions.
- Experience with a performance testing tool such as k6, JMeter, or LoadRunner.
- Familiarity with mocking tools such as Mockito, WireMock, or Microcks.