Overview
You're the engineer who maintains uptime for 50+ SaaS products when others are operating on intuition. We need DevOps professionals capable of entering unknown AWS environments, restoring stability, and driving availability beyond 99.9% through genuine monitoring, automation, and root-cause analysis. You'll break down complex initiatives into daily deliverables, deploy production-ready Python or JavaScript, and leverage AI as your assistant.
Most organizations boast about "cloud capabilities" while manually tending infrastructure. We're systematizing reliability across dozens of acquired offerings where the original developers have departed and documentation is incomplete. The challenge: you'll employ agents and contemporary tools to explore unfamiliar systems 5–10× faster, document your findings, and automate solutions so recurring failures are eliminated. Rather than judging you on certifications and vendor logos, we'll observe how you troubleshoot in real time, author a genuine 5-Whys analysis that identifies one preventable root cause, and construct automations that withstand production conditions.
This is not an L2 "execute the runbook" position. Here, you author the runbooks, architect the deployment path from dev through staged to 10% to full rollout with soak periods and rollback criteria, and create the monitoring that detects edge cases. You block risky changes before deployment. You distinguish infrastructure failures you control from application bugs Engineering controls, and you route permanent remediation to the appropriate team.
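To give a flavor of that deployment path, here is a minimal Python sketch of a gate that promotes a change from dev through staged to 10% to full rollout, with a soak period at each stage and a single rollback criterion. Everything named here is a hypothetical illustration rather than our actual tooling: `Stage`, `run_rollout`, the `error_rate_for` callback, the soak lengths, and the 0.1% error-rate threshold are all assumptions, and the soak period is modeled as a count of health samples instead of wall-clock time so the sketch runs instantly.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str             # label for the stage, e.g. "dev" or "10%"
    traffic_percent: int  # share of production traffic (0 for pre-prod)
    soak_checks: int      # health samples to pass before promoting


# Hypothetical stage plan mirroring dev -> staged -> 10% -> full.
DEFAULT_STAGES: List[Stage] = [
    Stage("dev", 0, 1),
    Stage("staged", 0, 2),
    Stage("10%", 10, 3),
    Stage("full", 100, 3),
]


def run_rollout(
    stages: List[Stage],
    error_rate_for: Callable[[Stage, int], float],
    max_error_rate: float = 0.001,  # assumed rollback criterion (0.1% errors)
) -> Tuple[bool, List[str]]:
    """Walk the stages in order; stop and report a rollback on the first
    health sample whose error rate exceeds max_error_rate."""
    log: List[str] = []
    for stage in stages:
        for check in range(stage.soak_checks):
            rate = error_rate_for(stage, check)
            if rate > max_error_rate:
                log.append(f"{stage.name}: rollback (error rate {rate:.4f})")
                return False, log
        log.append(f"{stage.name}: promoted")
    return True, log


if __name__ == "__main__":
    # Simulated metrics source: the change misbehaves once it takes 10% traffic.
    def fake_metrics(stage: Stage, check: int) -> float:
        return 0.05 if stage.name == "10%" else 0.0

    ok, log = run_rollout(DEFAULT_STAGES, fake_metrics)
    print(ok)
    print("\n".join(log))
```

In practice the callback would read from whatever monitoring the product already has, and the rollback branch would trigger the real revert step. The point of the sketch is only that the promotion rules and rollback criteria are written down before the change ships.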
You'll operate at the engineering center of reliability, managing infrastructure initiatives, incident triage and RCAs, and change tickets with copy-paste-ready runbooks. If you've already managed a significant SaaS offering and want to apply that expertise across a portfolio, join us. Bring advanced AWS knowledge, production-grade development skills, strict scope discipline, and a daily working habit of using AI tools. If you're prepared to maintain operational excellence, please apply.
What You Will Be Doing
- Executing complex infrastructure migrations, consolidations, production-grade automations, and monitoring adjustments
- Managing production incident triage, deploying immediate remediation, and authoring root cause analyses with permanent fixes routed to accountable teams
- Drafting, reviewing, and implementing production changes, including assessing whether a suggested change is safe for execution
What You Won't Be Doing
- Spending time in Jira and interminable status calls - we prioritize people who deliver solutions, not merely document issues
- Sustaining legacy systems perpetually - you'll be authorized to pursue substantive enhancements
- Waiting on bureaucratic approval processes - you'll possess the authority to deploy immediate fixes during incidents
Key Responsibilities
- Lead reliability and standardization of cloud infrastructure across our expanding product portfolio by deploying comprehensive monitoring, automation, and AWS best practices
Basic Requirements
- Deep AWS infrastructure expertise (this is our primary platform - other cloud experience alone won't cut it)
- Experience owning large production infrastructure and troubleshooting production outages independently (not just following a runbook)
- Experience scripting with Python and Bash for day-to-day administration operations
- Experience managing and migrating production databases across multiple engines (including MySQL, PostgreSQL, Oracle, and MS SQL Server)
- Experience with infrastructure automation (Terraform, Ansible, or CloudFormation)
- Linux systems administration expertise
About Trilogy
Hundreds of software businesses run on the Trilogy Business Platform. For three decades, Trilogy has been known for three things: relentlessly seeking top talent, innovating new technology, and incubating new businesses. Our technological innovation is spearheaded by a passion for simple customer-facing designs. Our incubation of new businesses ranges from entirely new moon-shot ideas to rearchitecting existing projects for today's modern cloud-based stack. Trilogy is a place where you can be surrounded by great people, be proud of doing great work, and grow your career by leaps and bounds.
There is so much to cover for this exciting role, and space here is limited. Hit the Apply button if you found this interesting and want to learn more. We look forward to meeting you!
Working with us
This is a full-time (40 hours per week), long-term position. The position is immediately available and requires entering into an independent contractor agreement with Crossover as a Contractor of Record. The compensation level for this role is $50 USD/hour, which equates to $100,000 USD/year assuming 40 hours per week and 50 weeks per year. The payment period is weekly. Consult www.crossover.com/help-and-faqs for more details on this topic.
Crossover Job Code: LJ-5236-IN-Hyderaba-AWSArchitect.010