Overview
About Simplismart
Simplismart is a GenAI inference platform for deploying, scaling, and monitoring any GenAI model (LLMs, speech, vision, or diffusion) across cloud or on-prem environments. It is built for strict SLAs, enterprise-grade security, and full observability. Its modular design lets you optimize for cost or latency, or auto-select the best topology per workload.
Role Overview
As a Cloud Engineer, you will contribute to building a highly available, global, multi-cloud PaaS platform using open-source technologies to support Simplismart’s rapid growth. This system encompasses diverse environments (Kubernetes, VMs, bare metal compute) and provides a cohesive and reliable abstraction for running AI workloads. You will be able to work with cutting-edge technologies and solve complex problems.
To be successful in this role, you need to be deeply technical, possess strong communication and collaboration skills, and have experience in infrastructure-as-code. Proficiency with tools like Terraform and Ansible and strong software development fundamentals are essential. Additionally, you should have solid systems knowledge and strong troubleshooting abilities.
What is expected from you-
- Defining and implementing the foundational architecture that will support large-scale, GPU-accelerated machine learning workloads.
- Creating standards, abstractions, and tooling to formalise and orchestrate diverse GPU-based pipelines and workflows.
- Developing a robust internal continuous deployment system capable of handling multiple services and modules across heterogeneous environments.
- Building frameworks and systems that ensure reliability, observability, and fault tolerance for mission-critical workloads.
- Applying deep technical expertise to solve complex problems, guide design decisions, and influence technical direction.
- Using infrastructure-as-code tools such as Terraform and Ansible to provision, manage, and automate infrastructure at scale.
- Delivering high-quality, maintainable, and performant code with strong software engineering fundamentals.
- Demonstrating strong understanding of distributed systems, networking, Linux internals, and troubleshooting complex system behaviour.
- Working effectively with cross-functional teams, clearly communicating technical ideas, and collaborating on architecture and implementation.
- Operating independently when needed, taking ownership of problems, and proactively driving improvements across the platform.
- Showing self-motivation, curiosity, and a forward-looking mindset to continuously evolve the platform and its tooling.
What We’re Looking For-
- 8+ years of experience in platform engineering and in writing high-performance, well-tested, production-quality code.
- Proficiency in at least one backend programming language (Python desired; C++ is a plus).
- Demonstrated experience with high-performance or distributed cloud microservices architectures.
- Ideally, experience building and operating infrastructure globally across multiple cloud providers such as AWS, Azure, or GCP. Experience with neoclouds such as Nebius or Yotta is a plus.
- A good understanding of low-level operating systems concepts, including multi-threading, memory management, networking and storage, performance, and scale.
- Pragmatic, methodical, well-organized, detail-oriented, and self-starting.
- Experience with Kubernetes, containerization, Terraform and Ansible.
- Knowledge of GPU programming, NCCL and CUDA is a plus.
Why Join Simplismart?
- Opportunity to define and lead the brand identity of a fast-growing GenAI company.
- Work closely with leadership on high-impact initiatives, from global event campaigns to overall storytelling.
- Be part of a team that values design as a strategic lever, not just execution.
- Competitive compensation and growth opportunities in a high-energy startup environment.