Sr. Site Reliability Engineer
Who We Are
Overhaul is a supply chain integrity solutions company that allows shippers to connect disparate sources of data into the first fully transparent situational analysis engine designed for the logistics industry. Data that is transformed into critical insights can instantly trigger corrective actions, impacting everything from temperature control to handling requirements or package-level tracking, ensuring cargo arrives at its destination safely, undamaged, and on time. We are a dynamic, innovative, and fun team that is highly committed to our customers’ experiences and our Mission and Vision.
The Role
At Overhaul, our site reliability engineers (SREs) combine systems expertise with software engineering patterns to help define, create, and support our cloud architecture, and to build the systems, orchestration, and operations behind services across the business. As we continue to grow, the team brings together talented engineers focused on evangelizing reliability-as-a-feature through monitoring, service-level objectives, automation, everything-as-code, and testing.
"SRE is what happens when you ask a software engineer to design an operations team." - Google SRE Book
Objectives of this Role:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large, distributed software applications
- Form and manage relationships with internal and external partners
Daily and Monthly Responsibilities:
- Participating in an on-call rotation to help resolve incidents
- Hosting blameless postmortems to share learnings, discover gaps, embrace transparency, and improve reliability across our services
- Building positive and collaborative relationships across the company
- Gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding
- Championing automation to reduce toil and increase development velocity
- Balancing feature development speed and reliability with well-defined service-level objectives
- Applying everything-as-code methodologies across configuration, infrastructure, orchestration, and elsewhere
- This includes writing the code these methodologies need to succeed, as well as patching and extending existing codebases
Required Skills and Qualifications:
- Excellent written and oral communication skills
- Empathic listener and persuasive speaker
- Ability to program in one or more general-purpose programming languages, such as Python, Go, Java, C/C++, Ruby, or JavaScript
- Modern public cloud experience (AWS, Azure, GCP)
- Experience with distributed storage technologies like NFS, HDFS, Ceph, or S3 as well as dynamic resource management frameworks (Kubernetes, Mesos, or Nomad)
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- A deep understanding of how distributed systems work under the hood (or a desire and willingness to learn)
Preferred Qualifications:
- Previous success in technical engineering
- Coding experience beyond simple scripts
- Hands-on experience designing, implementing, and maintaining infrastructure at scale using everything-as-code
Our Core Values and how they benefit you as an “Overhauler”
Authenticity, Receptivity and Trust
· Extremely competitive base salary package
· 401(k) with Overhaul match
· Flexible working schedules
· Remote, hybrid, and/or in-office*
Encouragement and Learning
· Progressive advancement opportunity & career mobility
· Paid personal development stipend
· Monthly lunch and learns
· Two unique learning systems with instructor-led content
Wellness and Integrity
· Rotating Overhaul “Perks @ work” (Discounts and Freebies)
· Healthcare plan fully provided by Overhaul
· Employee assistance & wellbeing programs
· New Parent/Family/Caregiver leave(s)
· Daily BAMM time (body and mind movement)
· Life by design vacation policy
Diversity and Inclusivity Statement:
Overhaul has always been, and always will be, committed to diversity and inclusion. Our Overhaul Culture Code’s top listed commitment is to “Diversity and Synergy.” All aspects of employment will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law. We strongly encourage people from underrepresented groups to apply!