About this role

Zenskar is building the operational backbone for how B2B companies run their business. As a DevOps Engineer, you will own the infrastructure that everything else runs on — and at a scaling SaaS company, that matters a lot. When infra is broken, nothing ships. When it's well-built, the rest of the team barely thinks about it. That's the bar.
This is not a ticket-queue role. You will not be a service desk for developers. You will design, build, and evolve the platform that keeps Zenskar's systems reliable, fast, and secure — and you'll do it with a software engineer's mindset, not an IT admin's.
  • Design and own cloud infrastructure end-to-end — from architecture decisions to production operations
  • Build and maintain CI/CD pipelines that make shipping safe, fast, and boring (boring is good)
  • Own the observability stack — make sure we know when something breaks before a customer does
  • Drive infrastructure cost optimisation without compromising reliability or developer experience
  • Work closely with backend engineers to make deployments, rollbacks, and incident response feel effortless
  • Identify, document, and eliminate toil — if you're doing something manually more than twice, automate it
  • Embed security and compliance thinking into infrastructure by default — not as a retrofit
  • Be the person who asks "what happens when this fails?" before anyone else does

THE IMPACT YOU'LL MAKE
  • Your infrastructure decisions will determine how reliably Zenskar's enterprise clients can run their business on our platform; downtime or data issues at this layer directly disrupt their operations
  • You will build the foundation that lets the engineering team ship faster without breaking things
  • Your automation and tooling will compound over time — good work here multiplies everyone else's output
  • You will be the person who turns "the infra is always on fire" into "infra just works" — and that shift has a real, visible impact on the company's velocity


Key qualifications

Must have:
  • 3–5 years of hands-on DevOps, SRE, or Platform Engineering experience at a product company
  • Strong Kubernetes experience in production — if you've debugged a CrashLoopBackOff at 2am and lived to tell the tale, you're in the right place
  • Infrastructure-as-Code with Terraform — not just familiarity, but the ability to write, review, and refactor production-grade Terraform without hand-holding
  • Deep AWS experience — ECS/EKS, Lambda, CloudWatch, IAM, VPC, and enough Cost Explorer to know where money goes when bills spike
  • CI/CD ownership — you've built pipelines, not just used them; GitHub Actions, GitLab CI, or equivalent at real scale
  • Ability to describe the hard infra problems you've solved, why they were hard, and what changed as a result, rather than just listing tools on a resume
  • Hands-on AWS ECS experience in production — task definitions, service scaling, capacity providers, deployment strategies, and circuit breakers; not just EC2 or generic container orchestration
  • Lambda operations at scale — function lifecycle management, event source mapping, cold start tuning, and migrating Lambda-based workloads to more appropriate compute patterns as systems mature
  • End-to-end observability ownership — alerting pipelines, custom metrics, structured log ingestion, and actually diagnosing production issues with the stack; not just setting up dashboards
  • Secrets and credentials management in AWS — rotation policies, least-privilege access patterns, and the security hygiene that keeps them clean over time

Good to have:
  • Scripting ability in Python or Go for automation and internal tooling — the kind of thing that saves a team hours every week
  • Hands-on experience with an observability stack (Prometheus, Grafana, VictoriaMetrics, or Datadog) in production; comfortable diagnosing issues across services, not just building dashboards
  • Kustomize experience alongside Terraform for Kubernetes configuration management
  • Apache Airflow or similar data pipeline infrastructure
  • Security and compliance awareness — understands what SOC 2 means at the infra layer, not just on paper
  • Cost optimisation wins you can point to — concrete numbers, concrete impact
  • Experience building or maintaining an Internal Developer Portal (Backstage or similar)
  • B2B SaaS or fintech background — multi-tenant systems, external integrations, enterprise reliability expectations
  • Early-stage startup experience — comfortable when the runbook doesn't exist yet because you're writing it
  • Identity infrastructure, whether self-hosted (Keycloak) or managed (Okta, Auth0, or equivalent), with operational experience rather than just integration
  • Metrics-based autoscaling for worker fleets — scaling on queue depth or custom application metrics, not just CPU/memory
  • Not taking yourself too seriously :)

WHAT DRIVES YOU
  • You treat infrastructure like software — version controlled, tested, reviewable, improvable
  • You automate the thing that annoyed you last week — without being asked
  • You own problems end-to-end: an incident isn't closed when the alert clears, it's closed when the postmortem is done and the fix is in
  • You have opinions on the right way to build infra, but you're not precious about them — you change your mind when the tradeoffs change
  • You thrive in environments where the answer to "what's the runbook for this?" is sometimes "write one"


Location

  • Hybrid — 2 days per week in office
  • Office Location: Indiranagar, Bengaluru
  • Address: 3rd Floor, A Wing No 1, Carlton Towers, HAL Old Airport Rd, HAL 2nd Stage, Indiranagar, Bengaluru, Karnataka 560008


Interview Process

Our interview process is structured, transparent, and efficient:
  • R0 – Recruiter Screening: Quick conversation to assess basic fit, motivation, and role expectations
  • Round 1 – Introductory Chat: Focuses on your past experience, the infra problems you've owned, and how you think about reliability and developer experience. We recommend reviewing the job description and the CEO's recorded videos before this step
  • Round 2 – Technical Assessment & Discussion: Evaluates your system design instincts, infrastructure thinking, and how you approach real-world problems under constraints
  • Reference Checks: We request contact details of two former direct managers. The hiring manager will connect with them to better understand your working style and how you operate under pressure
  • Round 3 – Final Conversation: A wrap-up of all the previous conversations
The process may vary slightly if we think it would be useful for you to meet additional members of the team.