Principal Site Reliability Engineer

Optimizely

Sorry, this job was removed at 2:09 p.m. (CST) on Thursday, November 14, 2019

Company Description

Optimizely is the world's leader in customer experience optimization, allowing businesses to dramatically drive up the value of their digital products, commerce, and campaigns through its best-in-class experimentation software platform. By replacing digital guesswork with evidence-based results, Optimizely enables product and marketing professionals to accelerate innovation, lower the risk of new features, and drive up the return on investment from digital by up to 10X. More than 26 of the Fortune 100 companies choose Optimizely to power their global digital experiences. Optimizely’s impressive customer list includes eBay, FOX, IBM, The New York Times, and many more global enterprises.

Job Description

SREs at Optimizely are focused on making Optimizely the most reliable, performant and trustworthy Digital Experience Optimization platform ever. Our engineering teams have built data pipelines that process 10 billion events daily and applications that support powerful experimentation and collaboration workflows at scale. Our platforms are built on AWS and GCP. We use technologies such as Kafka, Samza, HBase, MySQL and Postgres. We build and manage our systems using TravisCI, Jenkins, Docker, Kubernetes, Terraform and Chef. We use a combination of managed and self-hosted approaches. This is a unique opportunity to lead the engineering organization in areas of standardized automated infrastructure and service provisioning and orchestration, service-oriented architectural excellence, and forward-looking planning and execution of large technical projects.

How you will make an impact

Define a roadmap for all engineering teams to utilize fully automated, self-service, highly scalable, cost-efficient, observable, auditable and reliable infrastructure services as standard practice
Drive the execution of this roadmap across the engineering organization, collaborating with SREs and senior engineers across engineering while also performing hands-on work on the most critical challenges
Provide expert technical guidance and ongoing engineering design review to teams planning and implementing large migrations, service-oriented architecture, broad architectural shifts, and capacity growth
Build a metrics-driven operational culture standardizing our practices for SLO definition and review as well as for logging, monitoring, alerting, and on-call practices
Make iterative improvements to blameless incident management processes, root cause analyses, outage prevention, and service recovery strategies across the engineering organization
Partner closely with Security, Quality, and Product teams to achieve high priority security, privacy, compliance, reliability and business-continuity objectives on our overall roadmap
Propose and drive large improvements to production systems to achieve significant impact to our business and engineering teams
Mentor and coach engineers to be curious and effective at discovering and solving technical challenges

Qualifications

You have proven experience (10+ years) demonstrating hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges
You have deep technical experience with various cloud providers, containerization technologies, automated deployment frameworks, orchestration frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture
You have the skills to implement load, stress, performance and reliability testing standards at scale to improve service, platform and infrastructure resiliency
You promote openness, diversity of opinions and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
You demonstrate clear decision making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
You communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
You exemplify high accountability, integrity, and resilience, maintaining focus on both big-picture goals and the milestones needed to reach them
You enable the engineering organization to innovate and deliver with greater speed and safety

Additional Information

At Optimizely, we embody inclusion and embrace diversity. We believe in work/life balance and bringing our true selves to work. To that end, we offer best-in-class perks and benefits that support our Optinauts along their career journey with us. Read more about our culture at optimizely.com/careers.

Optimizely is an equal opportunity employer and makes employment decisions on the basis of merit. Optimizely prohibits discrimination based on race, color, religion, sex, sexual identity, gender identity, marital status, veteran status, nationality, citizenship, age, disability, medical condition, pregnancy, or any other unlawful consideration. All your information will be kept confidential according to EEO guidelines.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
