Overview
Location: Bengaluru (BLR), Onsite
Company: Neoflo.ai (Headquartered in Singapore)
About Neoflo
Neoflo.ai is an early-stage startup rethinking how enterprise workflows should work in an
AI-first world. Our mission is to fundamentally rewire how enterprises think about business
processes, whether it is outbound sales, seller onboarding, or document-heavy workflows
across finance, logistics, insurance, and healthcare. Think: billions of PDFs, emails, and
faxes that still require armies of people to interpret and act on. We believe AI can, and
should, do the heavy lifting.
We’re building a universal AI context engine that interprets these documents, learns
patterns, and either automates actions or drastically reduces human effort. Think: “GPT for
back-office ops” meets “Zapier for messy business workflows” meets “the human judgment and
accountability of BPOs.”
About the Role
We’re looking for a hands-on DevOps Engineer who has scaled production AI/ML systems
and treats reliability as a product. You will own the infrastructure that runs Neoflo: the
deployment pipelines, the GPU fleet, the observability layer, and everything between a
model checkpoint and a paying customer.
This is a founding-team role. You are not just keeping the lights on. You are designing the
platform our engineers ship on, and the cost structure our gross margins depend on.
In this role, you will:
• Own the infrastructure end-to-end across our cloud footprint (AWS / GCP), including VPC
design, IAM, secrets management, and multi-region setups for SOC 2 and customer data
residency requirements.
• Build and maintain CI/CD pipelines that take code from commit to production in minutes,
with automated tests, canary rollouts, and one-click rollback.
• Operate our LLM and VLM serving stack. This includes GPU capacity planning,
autoscaling inference workloads, optimizing cold-start latency, and managing spend
across self-hosted models and third-party APIs (OpenAI, Anthropic, etc.).
• Stand up observability that actually catches problems before customers do: structured
logs, traces, metrics, model-quality monitoring, and on-call alerting that doesn’t cry wolf.
• Drive infrastructure cost down as a workstream. Track unit economics per workflow,
identify the top spend drivers, and ship fixes.
• Harden the platform for enterprise customers: SOC 2, audit logging, data isolation,
encryption at rest and in transit, and the security reviews that come with selling to CFOs.
You Might Be a Fit If You...
• Have 5+ years of DevOps, SRE, or platform engineering experience, with at least 2 years
running production workloads that include ML or LLM inference.
• Are fluent with AWS, and have shipped infrastructure as code in Terraform or Pulumi.
• Have hands-on experience with Kubernetes in production: networking, autoscaling, GPU
node pools, and the failure modes that come with each.
• Know your way around the Python ecosystem well enough to read service code, debug
containers, and write tooling, even if you don’t ship features yourself.
• Have built and owned CI/CD pipelines (GitHub Actions, GitLab CI, ArgoCD, or similar)
that real engineering teams depend on.
• Can stand up an observability stack (Prometheus / Grafana / OpenTelemetry / Datadog)
and know what to monitor, not just how to install it.
• Operate with extreme ownership. When production breaks at 2 AM, you don’t wait to be
paged twice. When cloud spend spikes, you find out why before the finance team asks.
• Can break down vague, ambiguous infra requirements into concrete, shippable plans, and
push back when the right answer is “we don’t need this yet.”
Nice to Have
• Experience serving open-source LLMs or VLMs in production (vLLM, TGI, Triton, Ray
Serve), including quantization, batching, and GPU memory tuning.
• Background taking a company through SOC 2 Type II or ISO 27001, end to end.
• Experience with GPU cost optimization at scale: spot capacity, reserved instances,
multi-cloud arbitrage, or self-hosted clusters.
• Experience with data infrastructure (Postgres at scale, vector stores, queues like Kafka
or SQS, workflow orchestrators like Temporal or Airflow).
• Prior experience in a startup or early-stage environment where you’ve had to wear
multiple hats and ship without a playbook.