Freenome

Senior Software Engineer - Reliability

Reposted 17 Hours Ago
Remote
Hiring Remotely in USA
131K-201K Annually
Senior level
As a Senior Software Engineer in Reliability, you'll design and implement observability and incident management for reliable cloud-based systems, collaborating on SLIs/SLOs and automating operational tasks.
The summary above was generated by AI

About this opportunity:

Our Site Reliability Engineering (SRE) team is a new and critical function at Freenome. As a founding member of the team, you’ll help define the culture and build the systems that keep our regulated, cloud-based production environments reliable as we transition from research to commercial operations. This is an opportunity to do meaningful engineering work that will directly save lives.

We value:

  • Reliability as a product feature.
  • Continual improvement and learning.
  • Automate all the things!
  • Technical simplicity and clarity.
  • Blameless postmortems and transparent communication.

As a Site Reliability Engineer, you will help design, implement, and operate observability, reliability, and incident management systems and practices across our clinical lab systems and regulated commercial workloads. You’ll partner with engineering teams to define service-level indicators (SLIs), objectives (SLOs), and error budgets; build runbooks and operational playbooks; and develop the monitoring and automation needed to ensure that our systems are reliable and compliant. The role also includes contributions to system code, infrastructure deployments, and automation.

This role is ideal for an engineer with experience running production workloads in the cloud, who is excited to build an SRE practice from the ground up in a regulated environment.  

The role reports to the Director, Cloud Infrastructure.

What you’ll do:

  • Define and implement observability practices (metrics, traces, dashboards, logs, alerts) for production systems.
  • Partner with product, engineering, and lab teams to develop and maintain incident response playbooks and escalation procedures.
  • Partner with engineering teams to define SLIs/SLOs and establish error budgets.
  • Participate in the on-call rotation for production systems and champion a focus on automation and self-healing.
  • Contribute to production deployment and change-management processes that meet FDA and compliance requirements.
  • Automate operational tasks, reducing manual intervention.
  • Contribute to production systems and designs with the goal of improving reliability.
  • Use Infrastructure as Code (IaC) to manage and deploy team-owned infrastructure and subsystems.
  • Help build out the SRE practice.

Communication and Collaboration:

  • Work closely with engineering, product, and lab teams to understand service reliability needs.
  • Partner with TPMs, RA/QA, and compliance stakeholders to align operational practices with regulatory requirements.
  • Participate in cross-functional incident reviews and postmortems.
  • Share knowledge and document operational standards for consistency and onboarding.
  • Design and run fire drills / tabletop exercises as well as disaster recovery exercises.

Culture:

  • Model Freenome’s values and principles in your work and interactions.
  • Promote a collaborative, reliable engineering culture across product, infra, and lab engineering teams.
  • Contribute to documentation, runbooks, and operational standards.
  • Foster a culture of accountability, learning, and psychological safety.

Technical Leadership: 

  • Independently drive reliability improvements in scoped systems or services.
  • Provide mentorship to peers on observability, incident management, and operational best practices.
  • Help build and evolve Freenome’s reliability practices and contribute to team strategy discussions.

Must haves:

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 5+ years in software engineering or Infra/DevOps/SRE roles (we currently use Python and Go).
  • Experience deploying cloud infrastructure via automation (e.g., Terraform, Pulumi, Bicep/ARM).
  • Incident management experience in cloud/software engineering, as well as familiarity with incident management platforms (e.g., Incident.io, ServiceNow, Opsgenie, PagerDuty).
  • Hands-on experience operating production workloads in cloud environments.
  • Familiarity with Kubernetes (AKS, GKE, or EKS).
  • Strong troubleshooting and root-cause analysis skills in distributed systems.
  • Experience with observability platforms (e.g., Datadog, Prometheus/Grafana, OpenTelemetry).
  • Ability to define and implement metrics, dashboards, and alerting.
  • Demonstrated ability to work autonomously and own technical outcomes.
  • Strong understanding of cloud infrastructure and networking architectures and automation.

Nice to haves:

  • Experience supporting regulated environments (healthcare, biotech, financial).
  • Familiarity with compliance-driven change management and release processes (FDA, HIPAA).
  • Knowledge of CI/CD deployment strategies and change automation.
  • Experience with both GCP and Azure cloud platforms.
  • Interest in mentorship and system reliability practices at scale.

Benefits and additional information:

The US target range of our base salary for new hires is $131,325 - $201,000. You will also be eligible to receive equity, cash bonuses, and a full range of medical, financial, and other benefits depending on the position offered. Please note that individual total compensation for this position will be determined at the Company’s sole discretion and may vary based on several factors, including, but not limited to, location, skill level, years and depth of relevant experience, and education. We invite you to check out our career page at freenome.com/job-openings/ for additional company information.

Freenome is proud to be an equal-opportunity employer, and we value diversity. Freenome does not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.

Applicants have rights under Federal Employment Laws.  

  • Family & Medical Leave Act (FMLA)
  • Equal Employment Opportunity (EEO)
  • Employee Polygraph Protection Act (EPPA)

Top Skills

Bicep
Datadog
Go
Grafana
Kubernetes
OpenTelemetry
Prometheus
Pulumi
Python
Terraform
