Site Reliability Engineer (SRE)

Sorry, this job was removed at 11:42 a.m. (CST) on Wednesday, May 15, 2019

Our SRE team is responsible for the overall performance and reliability of Evernote’s service and products. That means serving over 200 million passionate and engaged users around the world, with billions of notes and files. We are looking for a Site Reliability Engineer to help us in our ongoing mission of delivering an outstanding service to our users.

We participate in all aspects of running our platform at scale, focusing both on the service as it runs today and on ensuring we can deliver new and exciting features rapidly to users. We have a real passion for automation and we continually seek to improve. We work hand-in-hand with product teams to help them ship production-ready services and get new features into our users' hands. We set Service Level Objectives (SLOs) based on Key Performance Indicators (KPIs) for each of our services, and they let us move quickly while maintaining the quality of service our users expect.

What you’ll do

  • Work closely with engineering teams to maintain and scale our existing production platform
  • Help us evolve what it means to be an SRE at Evernote
  • Evolve and implement production readiness standards for new services
  • Champion our SLOs and look to continuously improve them
  • Develop and maintain automation to reduce operations toil for the team
  • Participate in an on-call rotation for our production services

What we’re looking for

  • You possess a contagious sense of ownership and the tenacity to always find a way
  • You focus on quality to build manageable, scalable, and maintainable systems
  • You know that perfection is the enemy of done and when to make trade-offs
  • You emphasize the importance of making decisions based on data
  • You enjoy solving tough technical problems
  • You exercise judgement in a way that reduces risk
  • You share enthusiastically to reduce disconnects and communication breakdowns
  • You always want to understand the why in order to better see patterns and improve quality

What you’ve done

  • You know Linux systems like the back of your hand
  • You’ve managed production environments at scale in a public cloud environment (AWS or GCP)
  • You have a strong familiarity with web application stacks, including MySQL, Java, and Apache
  • You’ve attained a deep understanding of networking protocols (e.g., TCP/IP, HTTP, DNS)
  • You’ve implemented and used third-party metrics and monitoring platforms such as Datadog and PagerDuty
  • You possess the ability to wrangle problems quickly using the tools at your disposal
  • You’ve used configuration management and orchestration tools and you understand why they’re important
  • You’ve built extensible and maintainable automation (Shell, Python, or Go preferred)
  • You’ve run containerized microservices using Kubernetes

Skills that are particularly meaningful to us

  • Google Cloud Platform: GLB, Pub/Sub, Spanner, GCS, App Engine, and GKE
  • Monitoring: PagerDuty, Datadog, Splunk
  • Tools: Ansible, Puppet, Helm, Jenkins, Cloud Deployment Manager, Terraform
  • Infrastructure: HAProxy, Envoy, Elasticsearch, Consul
  • Languages/Libraries: Go, Python, Java, Thrift, gRPC

Location

6504 Bridge Point Parkway, Austin, TX 78730
