Information Technology
Full-Time
UST
Overview
Role Description
UST Global is seeking a highly skilled Site Reliability Engineer (SRE) to work with one of the leading financial services organizations in the US. This role involves managing the end-to-end application and system stack, ensuring high reliability, scalability, and performance of distributed systems. As an SRE, you will combine software engineering and systems engineering to build and operate large-scale, fault-tolerant production environments.
Key Responsibilities
- Engage in and improve the software development lifecycle – from design and development to deployment, operations, and refinement.
- Design, develop, and maintain large-scale infrastructure, CI/CD automation pipelines, and build tools.
- Influence infrastructure architecture, standards, and methods for highly scalable systems.
- Support services prior to production through infrastructure design, platform development, load testing, capacity planning, and launch reviews.
- Maintain and monitor services in production by tracking key performance indicators (availability, latency, system health).
- Automate scalability, resiliency, and system performance improvements.
- Investigate and resolve performance and reliability issues across large-scale and high-throughput services.
- Collaborate with architects and engineers to ensure applications are scalable, maintainable, and follow DR/HA strategies.
- Create and maintain documentation, runbooks, and operational guides.
- Implement corrective action plans with a focus on sustainable, preventative, and automated solutions.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- 8+ years of experience as a Site Reliability Engineer or in a similar role.
- Strong hands-on expertise in Google Cloud Platform (GCP); experience with AWS is a plus.
- Proficiency in DevOps practices, CI/CD pipelines, and build tools (e.g., Jenkins).
- Solid understanding of containers and container orchestration (Docker, Kubernetes).
- Familiarity with configuration management and deployment tools (Chef, Octopus, Puppet, Ansible, SaltStack, etc.).
- Strong cross-functional knowledge of systems, storage, networking, security, and databases.
- Experience operating production environments at scale with a focus on availability and latency.
- Excellent communication, collaboration, and problem-solving skills.
- Strong system administration skills on Linux/Windows, with automation and orchestration experience.
- Hands-on experience with infrastructure as code (Terraform, CloudFormation).
- Proficiency in CI/CD tools and practices.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Passion for automation and eliminating manual toil.
- Experience working in highly secure, regulated, or compliant industries.
- Knowledge of security and compliance best practices.
- Experience working in a DevOps culture and thriving in collaborative, fast-paced environments.
GCP, AWS, Jenkins, Kubernetes