Stord

Staff Site Reliability Engineer

Posted 2 Days Ago
Remote
Hiring Remotely in United States
Senior level
Seeking a Staff Site Reliability Engineer to enhance infrastructure reliability and performance using advanced engineering principles, primarily on Google Cloud Platform.
The summary above was generated by AI

Stord is The Consumer Experience Company, powering seamless checkout through delivery for today's leading brands. Stord is growing rapidly and is on track to double its revenue in the next 18 months. To meet and exceed this target, Stord is strategically scaling teams across the entire company and seeking energetic experts to help us achieve our mission.

By combining comprehensive commerce-enablement technology with high-volume fulfillment services, Stord provides brands a platform to compete with retail giants. Stord manages over $10 billion of commerce annually through its fulfillment, warehousing, transportation, and operator-built software suite including OMS, Pre- and Post-Purchase, and WMS platforms. Stord is leveling the playing field for all brands to deliver the best consumer experience at scale.

With Stord, brands can increase cart conversion, improve unit economics, and drive sustained customer loyalty. Stord’s end-to-end commerce solutions combine best-in-class omnichannel fulfillment and shipping with leading technology to ensure fast shipping, reliable delivery promises, easy access to more channels, and improved margins on every order.

Hundreds of leading DTC and B2B companies like AG1, True Classic, Native, Seed Health, quip, goodr, Sundays for Dogs, and more trust Stord to deliver industry-leading consumer experiences on every order. Stord is headquartered in Atlanta with facilities across the United States, Canada, and Europe. Stord is backed by top-tier investors including Kleiner Perkins, Franklin Templeton, Founders Fund, Strike Capital, Baillie Gifford, and Salesforce Ventures.

We are seeking a scrappy, high-ownership Staff Site Reliability Engineer (SRE) to join our small, fast-moving SRE team. This role requires someone who can hit the ground running and make an immediate impact on the reliability, scalability, and performance of our production systems. You'll be a key technical leader bridging development and operations, applying advanced software engineering principles to complex infrastructure challenges with minimal hand-holding. In this high-autonomy environment, you'll drive high-availability services, architect automation solutions, establish robust monitoring systems, and mentor team members while taking full ownership of critical infrastructure decisions.

What You’ll Do:

Infrastructure & Platform Management

  • Lead architecture decisions to deliver scalable and reliable infrastructure, primarily on Google Cloud Platform (GCP)

  • Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, Pulumi, or similar

  • Manage containerized environments with Docker and Kubernetes

  • Drive system performance tuning, capacity planning, and resource optimization

Reliability & Monitoring

  • Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)

  • Build robust monitoring, alerting, and observability solutions using Prometheus, Grafana, Datadog, or New Relic

  • Develop and maintain disaster recovery and business continuity strategies

Automation & DevOps

  • Design and maintain CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, etc.)

  • Automate operational workflows and infrastructure provisioning

  • Implement configuration management with Ansible, Chef, Puppet, or similar tools

  • Develop custom tooling and scripts to enhance operational efficiency

Collaboration & Support

  • Partner with engineering teams to improve deployment practices and application reliability

  • Provide escalation support for production incidents and lead post-incident reviews

  • Conduct technical design reviews and offer architectural guidance

  • Mentor junior engineers on SRE and infrastructure best practices

  • Participate in on-call rotations for critical systems

What You’ll Need:

Technical Skills

  • 8+ years of experience in site reliability, platform engineering, or infrastructure roles with leadership exposure

  • Proficiency in at least one programming language (Python, Go, Java, etc.)

  • Strong hands-on experience with GCP and its core services

  • Expertise in containerization (Docker) and orchestration (Kubernetes)

  • Deep knowledge of Infrastructure as Code (Terraform, CloudFormation, etc.)

  • Skilled in monitoring/observability (Prometheus, Grafana, ELK, etc.)

  • Solid understanding of networking, load balancing, and distributed systems

  • Experience with Git and collaborative development workflows

Core Competencies

  • Exceptional troubleshooting and problem-solving abilities

  • Strong grasp of system design principles and scalability patterns

  • Experience with incident management and post-mortem practices

  • Familiarity with security best practices and compliance standards

  • Excellent communication skills and ability to work cross-functionally

Preferred Qualifications:

  • Database administration experience (PostgreSQL, MySQL, Redis, etc.)

  • Familiarity with event-driven systems and platforms (Kafka, Pub/Sub, etc.)

  • Experience with log aggregation tools (ELK, Splunk, Fluentd)

  • Exposure to chaos engineering and resilience testing

  • Performance testing and optimization experience

  • Relevant GCP certifications (Cloud Architect, Cloud DevOps Engineer)

  • Knowledge of GCP-specific services (Cloud Run, GKE, Cloud Functions, BigQuery, etc.)

  • Experience with multi-cloud or hybrid architectures

  • Background in functional programming (Elixir, Haskell, F#, Clojure, etc.)

  • Strong DevOps background and mindset

Top Skills

Ansible
Chef
CloudFormation
Datadog
Docker
ELK
Fluentd
GitHub Actions
GitLab CI
Go
Google Cloud Platform
Grafana
Java
Jenkins
Kafka
Kubernetes
MySQL
New Relic
Postgres
Prometheus
Pub/Sub
Pulumi
Puppet
Python
Redis
Splunk
Terraform

Stord Austin, Texas, USA Office

Austin, TX, United States
