Ready to be a Titan?
The Staff Site Reliability Engineer (SRE) will play a critical role in building and scaling the infrastructure behind ServiceTitan’s new AI platform: an intelligent, always-on system that powers autonomous agents and real-time learning at scale.
You’ll own the reliability, performance, and deployment practices across multiple services and environments, driving innovation in automation, observability, and continuous delivery.
This role requires both technical depth and strategic thinking — someone who can architect solutions, mentor teams, and enable true operational excellence across engineering.
What you'll do:
Lead the design, implementation, and optimization of scalable, resilient infrastructure for cloud-native AI services on Azure.
Establish true continuous delivery (CD) pipelines supporting blue-green deployments, automatic rollbacks, and progressive delivery patterns.
Champion observability excellence: define best practices for metrics, tracing, and logging, and help product teams design meaningful SLIs, SLOs, and error budgets.
Drive automation across the entire lifecycle: infrastructure provisioning, testing, deployment, and recovery.
Partner with the engineering team to design reliable, fault-tolerant services and perform resilience and capacity reviews.
Establish best practices for observability that not only monitor service health but also track the end-to-end success/failure of complex, automated agent workflows and their business impact (SLIs/SLOs).
Leverage Infrastructure as Code (IaC) using Terraform, Kubernetes, and Docker to standardize environments and reduce manual intervention.
Contribute to and maintain CI/CD pipelines using GitHub Actions, Azure DevOps, or TeamCity.
Implement and improve service health dashboards with Mimir, Grafana, Prometheus, or the ELK stack to ensure system visibility and reliability.
Mentor engineers and foster a reliability culture across teams — enabling others to build self-healing, observable systems.
What you'll bring:
Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
Solid experience in SRE, DevOps, or infrastructure engineering, with strong hands-on expertise in Azure.
Proven experience designing and operating distributed systems at scale, with a strong understanding of reliability engineering principles (SLIs/SLOs/SLAs).
Deep proficiency with Terraform, Kubernetes, Docker, and modern IaC and container orchestration best practices.
Expertise in CI/CD automation and release engineering, including implementing blue-green, canary, and rollback mechanisms.
Knowledge of SQL Server and PostgreSQL performance tuning and management in cloud environments is a plus.
Advanced use of observability tools such as Mimir, Grafana, Prometheus, and the ELK stack.
Experience promoting GitOps workflows and tools such as Argo CD or Flux.
Excellent troubleshooting, systems thinking, and mentoring skills.
Be Human With Us:
Being human isn’t about checking every box on a list. It’s about the experiences we have, people we meet, and the perspectives we share. So, if you have the skills but are hesitant to apply because of your background, apply anyway. We need amazing people like you to help us challenge the conventional and think differently about the problems that we’re solving. We’re in this together. Come be human, with us.
What We Offer:
When you join our team, you’re not just accepting a job. You’re making a career move. Here’s how we’ll support you in doing some of the most impactful work of your career:
Flextime, recognition, and support for autonomous work: Flexible time off with ample learning and development opportunities to continue growing your career. We offer a comprehensive onboarding program, leadership training for Titans at all levels, and other programs and events. Great work is rewarded through Bonusly, peer-nominated awards, and more.
Holistic health and wellness benefits: Company-paid medical, dental, and vision (with 100% employer paid options and 90% coverage for dependents), FSA and HSA, 401k match, and telehealth options including memberships to One Medical.
Support for Titans at all stages of life: Parental leave and support, up to $20k in fertility services (e.g., IUI and IVF), surrogacy, and adoption reimbursement, on-demand maternity support through Maven Maternity, free breast milk shipping through Maven Milk, pet insurance, legal advisory services, financial planning tools, and more.
At ServiceTitan, we celebrate individuality and uniqueness. We believe that the convergence of fresh perspectives and experiences from all walks of life is what makes our product and culture so great. We strongly encourage people from underrepresented groups to apply. We do not discriminate against employees based on race, color, religion, sex, national origin, gender identity or expression, age, disability, pregnancy (including childbirth, breastfeeding, or related medical condition), genetic information, protected military or veteran status, sexual orientation, or any other characteristic protected by applicable federal, state or local laws.
ServiceTitan is committed to fair and equitable compensation for all of our employees. We thoughtfully consider a wide range of factors when determining individual compensation. The expected salary range for this role for candidates residing in the United States is between $183,400 and $245,400 USD. Compensation for candidates residing outside the United States will vary by location, and the specific salary range will be discussed during the hiring process. Actual compensation for an individual may vary depending on skills, performance over time, qualifications, experience, and location. In addition to the base salary, the total compensation package also includes an annual bonus, equity, and a holistic suite of benefits.