Senior Site Reliability Engineer

PlayOn Sports

Reposted 20 Days Ago

Remote

Hiring Remotely in USA

Senior level

PlayOn is looking for an experienced Senior Site Reliability Engineer to help us strengthen the reliability, performance, and scalability of our systems. This role sits at the intersection of software engineering and operations, focused on building the tools, automation, and visibility that enable our teams to deliver resilient software at scale.

You’ll work closely with application engineers, DevOps, and QA teams to evolve our infrastructure, CI/CD pipelines, observability frameworks, and reliability practices. This is a hands-on engineering role with a strong emphasis on automation, performance analysis, and continuous improvement.

The Outcomes You’ll Deliver:

In the first few months, you'll focus on building a clear understanding of our systems and establishing the foundation for stronger observability across our platforms. As you settle in, your scope will grow to include broader reliability and performance initiatives.

• Assess and improve visibility: Work with engineering teams to review our current dashboards, metrics, and logs, identify the biggest gaps, and make targeted improvements that help us better understand system health.

• Tighten monitoring and alerting: Refine alerts and dashboards for the most critical services so we can catch issues earlier and respond faster.

• Build observability into delivery: Add instrumentation and telemetry into existing build and deploy processes to make reliability checks part of our normal release workflow.

• Clarify what "reliable" means: Help define initial SLIs and SLOs for a few core user flows, aligning the team on what good performance and availability look like.

• Streamline incident response: Partner with the Event Commander/on-call rotation to improve how we communicate, coordinate, and follow up during incidents.

• Reduce manual effort: Automate routine checks and monitoring tasks to free up engineers for more impactful work.

Over time, you'll take on a larger role shaping how we measure, monitor, and improve reliability across all services by setting standards, mentoring others, and helping engineering teams make data-driven decisions about performance and stability.

In this role, you can expect to

Contribute to system observability by implementing and improving metrics, alerting, and dashboards for better insight and faster recovery.

Develop automation, tooling, and monitoring solutions to support high service availability.

Partner with application and quality engineering teams to implement best practices in reliability, release automation, and testing.

Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.

Participate in on-call rotations to support critical services and ensure rapid response to incidents.

To thrive in this role, you have

Solid experience in Python, especially for automation, tooling, and data-driven operational tasks.

Proficiency in at least one of Java, C++, or Go.

Strong understanding of Linux systems, cloud infrastructure (AWS, GCP, or Azure), and modern deployment practices (Docker, Kubernetes, Terraform).

Experience with CI/CD pipelines, version control, and automated testing frameworks.

Experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog) and log/metric analysis for diagnosing issues.

Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs for automation.

Demonstrated ability to collaborate with cross-functional teams and communicate clearly in high-impact situations.

A problem-solving mindset that treats reliability as a shared responsibility across engineering.

Familiarity with AI-augmented development tools (Claude, Codex) as part of a modern engineering workflow.

Nice to Have

Experience writing or maintaining end-to-end or integration tests for distributed systems.

Background in performance testing, capacity planning, or chaos engineering.

Contributions to internal developer tooling or reliability-focused frameworks.

Exposure to security, compliance, or change management processes in production environments.

Relevant certifications.

PlayOn is where high school sports come to life. Through GoFan, NFHS Network, and MaxPreps, we give every fan a front-row seat to the moments that matter most: the buzzer-beaters, the comeback wins, the senior nights, the rivalries that define a town.

We built our technology for the people who live and breathe high school athletics — the parents who never miss a game, the alumni still cheering from across the country, the communities that show up week after week. From buying tickets to watching a live stream to reliving the highlights, we make it simple to stay close to the sports and the athletes you love most.

Backed by KKR, we build the technology that powers high school athletics from the inside out: Schools trust us to handle ticketing, streaming, fundraising, concessions, merchandise, and more so the people running programs can stay focused on the athletes and fans we all serve together.

We're a growth-stage company on a mission to make high school sports more accessible, more memorable, and more connected than ever before.

When being there means everything, we make sure you never miss a moment.

Why You'll Love Working at PlayOn

Product, potential, and people. We’re a leader in the high school event space, constantly evolving our product to meet the needs of administrators. We focus on solving real challenges, learning quickly, and creating impactful solutions.

This is a growth-stage company, meaning your contributions have real impact. You’ll have opportunities to grow your skills, tackle meaningful problems, and make a difference in the lives of schools and the students and fans they serve.

Our culture is built on accountability, collaboration, growth, and fairness. We don’t just show up—we show up for each other. Everyone wears the same jersey, and we play hard, make the extra pass, and cheer one another on. Losses teach us, challenges motivate us, and persistence drives us forward. We value integrity over shortcuts, choosing to do what’s right even when it’s hard. Together, we strive to be better every day—because we know that’s how we win as a team.

The Benefits We Offer

Multiple medical insurance plans to choose from

Dental, vision, life, and disability insurance

Employee Emergency Fund

Company equity (stock options)

Open PTO policy

401(k) plan with company match

Hybrid/flexible work environment

Note: Must be a full-time employee to participate in the company’s employee health benefit plan. Part-time employees and interns are not eligible to participate.
