Intellum Jobs

Lead Site Reliability Engineer

Sorry, this job was removed at 03:33 p.m. (CST) on Thursday, Jun 18, 2026

Remote

Hiring Remotely in United States

About us

Intellum is the leader in corporate education technology and powers the largest, most successful customer, partner, and employee learning programs in the world. Large brands and fast-moving companies like Google, Meta, Amazon, Walmart, Xero, Atlassian, Mailchimp, Airbnb, Stripe, and TikTok rely on Intellum to engage and educate the audiences they touch.

We have always been a “remote first” company and are proud to have team members located all over the world. We value Curiosity, Creativity, Perseverance, and Kindness and strive to demonstrate these core values every day. Our culture is very important to us. We invest in our people in fun and exciting ways, including personal development budgets and an annual all-company retreat that is focused less on work and more on human connections. We are in growth mode, and our “smart growth” approach ensures that we will continue to scale our company effectively.

Summary

We are seeking a Lead Site Reliability Engineer to spearhead our SRE team. You are not just an operator; you are an experienced software engineer who excels at architecture, code optimization, and deep troubleshooting. In this role, you will drive operational maturity by defining our reliability standards (SLOs), hardening our security posture (WAF/InfraSec), and scaling the Intellum platform.

Our stack

Core: Applications written in Ruby on Rails and Node.js; PostgreSQL, MongoDB, Redis, Memcached, Sidekiq, ActiveJob, Elasticsearch, WebSockets
Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and services (ECS/EC2/Kubernetes, ElastiCache, Memorystore, RDS, Cloud SQL, BigQuery, etc.)
Infrastructure as Code (IaC): GitHub, Terragrunt, Terraform, Ansible
CI/CD: Spinnaker, Jenkins
Observability & Alerting: New Relic, AWS CloudWatch, Google Cloud Stackdriver, Squadcast
Agile/Scrum practices utilizing JIRA

Responsibilities

SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
Core Engineering & Performance: Take ownership of critical code components (e.g., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it" ownership.

Required Skills

Experience & Engineering

10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.

SRE & Operations

Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).

Leadership

Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.

Documentation & Training: Skilled in documenting solutions and training operational teams on how to effectively support and maintain systems.
Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.

Bonus Skills

Automation Tools: Experience in developing solutions using server automation tools such as Terraform and Ansible.
CI/CD Expertise: Experience in writing and maintaining CI/CD pipelines and services.
Kubernetes: Experience in building, deploying, and optimizing Kubernetes-based infrastructure.
Perimeter Defense: Experience configuring and managing Web Application Firewalls (WAFs) such as Cloudflare, AWS WAF, or Akamai, and DDoS protection mechanisms.

Education

Bachelor’s degree in Computer Science or related technical field

Benefits

Medical - 100% of employee premiums for selected individual plans
Dental - 100% of employee premiums covered
Vision - 100% of employee premiums covered
LinkedIn Learning
401(k) plus matching (US Based Only)
Flexible PTO
Calm subscription
Annual Company Retreat

Intellum is an equal-opportunity employer. We're committed to building an inclusive team that celebrates diversity in people, perspectives, and backgrounds regardless of race, color, national origin, gender, sexual orientation, age, religion, disability, citizenship, veteran status, or any other protected status. We encourage you to apply for an open position and if you have questions about whether or not your job experience and skill set meet the requirements for a specific role, reach out to us directly at [email protected].

If you are an individual applying from CA, NY, CO, CT, MD, NV, or RI, please reach out to [email protected] to inquire about specific pay ranges.
