Overview
You're the engineer who keeps 50+ SaaS products running while others are still figuring things out. We need DevOps engineers who can walk into unfamiliar AWS environments, bring order to instability, and drive availability above 99.9% through proven monitoring, automation, and root cause analysis. You'll break complex projects down into daily deliverables, ship production-ready Python or JavaScript, and use AI as a force multiplier.
Most organizations talk about "the cloud" while manually nursing individual systems. We're scaling reliability across a portfolio of acquired products whose original engineers have left and whose documentation is incomplete. The opportunity: you'll use AI agents and modern tooling to understand unfamiliar systems 5–10x faster, document what you find, and automate fixes so the same problem cannot recur. Instead of judging you on certifications and vendor badges, we'll watch how you troubleshoot in real time, write a genuine 5-Whys analysis that arrives at one actionable root cause, and build automations robust enough for production deployment.
This is not an L2 "execute the runbook" position. Here, you write the runbooks; you design the deployment path from development to staging to a 10% rollout to full rollout, with soak periods and rollback conditions; and you build the monitoring that catches edge cases. You block risky changes before they reach production. You distinguish infrastructure failures you own from application bugs owned by Engineering, and you route permanent fixes to the right team.
You'll work at the center of reliability engineering, handling infrastructure initiatives, incident triage and RCAs, and change requests backed by copy-paste-ready runbooks. If you've already owned a significant SaaS platform and want to apply that expertise across a fleet, come join us. Bring advanced AWS knowledge, production-quality coding ability, disciplined scope management, and daily, mission-critical use of AI tooling. If you're ready to maintain operational excellence, please apply.
What You Will Be Doing
- Executing complex infrastructure migrations, consolidations, production-quality automations, and monitoring enhancements
- Responding to production incidents, deploying immediate remediation, and authoring root cause analyses with permanent fixes routed to accountable teams
- Authoring, reviewing, and deploying production changes, including assessing whether proposed changes are safe for execution
What You Won't Be Doing
- Spending your time in Jira and endless status calls - we prioritize people who deliver solutions, not people who just document issues
- Supporting legacy systems forever - you'll be empowered to pursue substantive improvements
- Waiting on layers of bureaucratic approval - you'll have the authority to deploy immediate fixes during incidents
Key Responsibilities
- Lead the reliability and standardization of cloud infrastructure across our expanding product portfolio by deploying robust monitoring, automation, and AWS best practices.
Basic Requirements
- Deep AWS infrastructure expertise (this is our primary platform - other cloud experience alone won't cut it)
- Experience owning large production infrastructure and troubleshooting production outages independently (not just following a runbook)
- Experience scripting with Python and Bash for day-to-day administration
- Experience managing and migrating production databases across multiple engines (including MySQL, Postgres, Oracle, and MS-SQL)
- Experience with infrastructure automation (Terraform, Ansible, or CloudFormation)
- Linux systems administration expertise
About Trilogy
Hundreds of software businesses run on the Trilogy Business Platform. For three decades, Trilogy has been known for three things: relentlessly seeking top talent, innovating new technology, and incubating new businesses. Our technological innovation is driven by a passion for simple, customer-facing designs. Our incubation of new businesses ranges from entirely new moon-shot ideas to rearchitecting existing projects for today's cloud-based stack. Trilogy is a place where you can be surrounded by great people, be proud of doing great work, and grow your career by leaps and bounds.
There is so much to cover for this exciting role, and space here is limited. Hit the Apply button if you found this interesting and want to learn more. We look forward to meeting you!
Working with us
This is a full-time (40 hours per week), long-term position. The position is immediately available and requires entering into an independent contractor agreement with Crossover as a Contractor of Record. The compensation level for this role is $50 USD/hour, which equates to $100,000 USD/year assuming 40 hours per week and 50 weeks per year. The payment period is weekly. Consult www.crossover.com/help-and-faqs for more details on this topic.
Crossover Job Code: LJ-5236-IN-Ludhiana-SeniorDevOpsEn.006