Senior Site Reliability Engineer

Zello

Reposted 23 Days Ago

Hybrid

Austin, TX, USA

Senior level

The Senior Site Reliability Engineer will manage the reliability of Zello's data tier, contribute to monitoring and incident response, and improve cloud infrastructure and database performance.

The summary above was generated by AI

IMPORTANT: Please be aware, scammers may try to impersonate Zello by reaching out regarding job opportunities. We will never ask you for bank account information, checks, or other sensitive information as part of our hiring process. All correspondence will come from the zello.com email domain. If you’re unsure, please email [email protected] with questions.

About Zello

Zello is a voice-first communication platform, powered by our industry-leading push-to-talk technology, that improves collaboration and productivity for deskless workers. With more than 175 million users, we’re the #1 rated push-to-talk app in the world, delivering 9 billion (yes, with a B) messages a month.

At Zello, our company values are at the heart of what we do every day. We’re proud to serve the frontline, we’re privileged to connect people in times of crisis across the globe, and we’re honored to support first responders.

And this is where you come in.

We're seeking a Senior Site Reliability Engineer who can own our data tier at high availability while also pulling weight across the broader platform. As Zello scales, the line between "database problem" and "platform problem" keeps blurring. We want someone who can sit on either side of it. This hire owns our data tier reliability (MySQL, MongoDB, ScyllaDB, Elasticsearch, Redis) and contributes to monitoring, on-call, and our ongoing cloud modernization efforts.

About Zello

Zello is the leading push-to-talk communication platform, enabling instant voice communication for frontline workers across hospitality, logistics, transportation, construction, and public safety. When a hotel manager radios housekeeping or a trucker calls dispatch, they're on Zello — and they need it to work every time. The Platform team builds and operates the infrastructure that makes that possible. Databases sit at the center of that promise: every channel, every message, every login depends on them.

The Role

You'll join the Platform team and report to the Director of Platform Engineering. You'll own the reliability of our MySQL and MongoDB footprint across Google Cloud, work alongside application engineers on performance and schema decisions, and contribute to the broader platform: observability with Prometheus, Loki, and Tempo; on-call; and incident response. This role suits someone who likes operating real production systems, doesn't get stage fright in incidents, and writes the runbook for the next person who hits the same problem.
We're investing in AI to compress incident response, build agents and tooling that speed up root-cause analysis, and lift developer productivity across engineering. We want someone curious about what that looks like for an SRE and excited to help shape it.

After a Successful First Year, You Will Have:

Operated Zello's MySQL and MongoDB clusters to documented availability targets, with automated backups, regularly tested restores, and failover the on-call team trusts under real incident pressure.
Cut latency or capacity cost on at least one critical database workload through measurable performance work — index strategy, query tuning, schema changes, or sharding.
Extended our observability coverage so incidents are diagnosed in minutes rather than hours, with dashboards and alerts the team actually uses.
Owned a slice of the Platform on-call rotation and led postmortems that turned recurring incidents into permanent fixes.

What You'll Do

Design, deploy, and operate highly available MySQL and MongoDB clusters across our cloud environments: replication, sharding, backups, point-in-time recovery, upgrades, and disaster recovery.
Tune query performance, schema, and index strategy in partnership with application engineers and push fixes upstream into the application when that's the right answer.
Extend our observability stack — Prometheus, Loki, and Tempo — so the data tier is as well instrumented as the application tier, and traces actually reach the root cause.
Participate in the Platform on-call rotation, lead incident response for data-tier issues, and write postmortems that drive durable change.
Improve disaster recovery, security posture, and compliance for our database footprint: encryption, access control, audit logging, and backup integrity.
Evaluate and operate ScyllaDB/Cassandra and Elasticsearch where they fit the workload, and bring an opinion on when they don't.
Write the automation, tooling, and operators that take repetitive work off the team's plate.
Use AI to compress incident response and root-cause analysis by building agents, automation, and developer-enablement tooling that scale the team's reliability work.

Who You Are

You've operated highly available MySQL and MongoDB in production at scale: replication, sharding, backups, point-in-time recovery, and failover drills you've actually run, not just designed on paper.
You diagnose database performance end to end (query plan, indexes, locking, OS, storage, network) and can point to specific incidents where you found and fixed a root cause that others had missed.
You've shipped meaningful work on at least two of the following: bare-metal Linux, containerized workloads (Docker, Kubernetes, or similar), and a major cloud (GCP preferred; AWS or Azure equivalent is fine).
You instrument what you build. You've used Prometheus, OpenTelemetry, or comparable systems to close real incidents, and you've written the dashboard the next on-call engineer will actually open.
You write code that runs in production: Python, Go, Bash, or similar for automation, tooling, or operators. You don't hand off scripting to someone else.
You communicate clearly under pressure and after the fact. Your postmortems are blameless, specific, and lead to changes that stick — and the people you've worked with describe collaborating with you as straightforward.
You bring an opinion on managed vs. self-managed databases, and can defend the trade-off based on availability, cost, and operational burden.
7+ years in SRE, DevOps, platform, infrastructure, or database reliability roles, with at least 3 years owning production databases.
BSc in Computer Science or equivalent practical experience.
ScyllaDB/Cassandra or Elasticsearch experience is a plus.
You've used AI tooling (copilots, agents, or custom automation) to expedite incident response, root-cause analysis, or developer workflows.

We hire for potential, passion for our mission, and a knack for solving difficult problems over checking every qualification box. We offer competitive pay, equity with significant upside, and benefits intentionally designed to support healthy, well-balanced employees, including flexible schedules and time off. We even offer a sabbatical after every five years of service so you’re able to pursue and enjoy what matters most to you. And of course, we wouldn’t be a technology company without a ping-pong table and free snacks in our break room. Join us!

Zello provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
All Zello personnel are required to comply with defined security, privacy, and compliance requirements applicable to their role along with requirements that are applicable to all Zello personnel.

We're in downtown Austin on West 6th, with quick highway access. We're directly across from Mean Eyed Cat and El Arroyo, so there’s always somewhere for lunch or drinks.

What you need to know about the Austin Tech Scene

Austin has a diverse and thriving tech ecosystem thanks to home-grown companies like Dell and major campuses for IBM, AMD and Apple. The state’s flagship university, the University of Texas at Austin, is known for its engineering school, and the city is known for its annual South by Southwest tech and media conference. Austin’s tech scene spans many verticals, but it’s particularly known for hardware, including semiconductors, as well as AI, biotechnology and cloud computing. And its food and music scene, low taxes and favorable climate have made the city a destination for tech workers from across the country.

Key Facts About Austin Tech

Number of Tech Workers: 180,500; 13.7% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Dell, IBM, AMD, Apple, Alphabet
Key Industries: Artificial intelligence, hardware, cloud computing, software, healthtech
Funding Landscape: $4.5 billion in VC funding in 2024 (Pitchbook)
Notable Investors: Live Oak Ventures, Austin Ventures, Hinge Capital, Gigafund, KdT Ventures, Next Coast Ventures, Silverton Partners
Research Centers and Universities: University of Texas, Southwestern University, Texas State University, Center for Complex Quantum Systems, Oden Institute for Computational Engineering and Sciences, Texas Advanced Computing Center
