Site Reliability Engineer
LogicMonitor is the leading SaaS-based performance monitoring platform for enterprise IT.
We love going to work and think you should too. We hold our company culture near and dear: it blends a passion for leadership with a passion for an active, healthy life centered on family and friends. LogicMonitor represents community, collaboration and camaraderie.
Located in the 500 West 2nd Street tower, our brand-new Austin office is best-in-class! Be inspired with panoramic downtown & Lady Bird Lake views, where snacks are plentiful and team outings are common. Our offices are sprinkled around the globe, too, with our headquarters in Santa Barbara, California and offices in London, Singapore, and Chengdu, China.
What You'll Do:
Interested in a leading role in the operational uptime and continued expansion of a company's production DevOps infrastructure? Then come join our amazing team of Site Reliability Engineers! We are hiring at all levels based on experience.
The Site Reliability Engineer plays a key role in designing and implementing new production deployments of SOA-based software across global physical and cloud data centers, while maintaining operational uptime of all mission-critical systems. You will help organize, secure and automate existing infrastructure and deployments. You will work closely with developers to provide feedback and drive operational performance improvements within our product platform and operations infrastructure.
- Maintain uptime of LogicMonitor's SaaS-based service and drive technical/process enhancements to improve uptime
- Deploy production applications and drive improvements to the deployment process
- Design and deploy new application components
- Design and deploy new infrastructures and integrations
- Ensure security of the production environment
- Write code to automate various aspects of infrastructure maintenance and deployments
- Support development and work closely with developers to drive operational and architecture/design changes
- Own, manage, and execute large and technically complex projects across teams
- Act as a strategic resource for the company with the ability to develop and deliver technical presentations for other departments, customers, and conferences
- Mentor more junior team members
- Lead by example in providing good documentation and thorough runbooks
What You'll Need:
- 3+ years of experience working at SaaS-based companies
- Solid understanding of Linux system administration in distributed environments
- Solid understanding of automated deployments
- Experience with AWS
- Knowledge of security as related to Linux systems, applications and networking
- Experience in various application scaling methodologies, including (but not limited to) load balancers
- Strong understanding of networking technologies (routing, switching, firewalls, iptables, etc.)
- An understanding of SOA
- Experience with configuration management tools such as Chef, Puppet or Ansible
- Experience with Java applications
- Experience with CI and build systems
- Experience with relational databases (MySQL) and NoSQL databases (e.g. MongoDB) in both administration and querying
- Programming experience (Java/Ruby/Python/shell)
- Experience with source code management tools (Git)
- Able to work without close supervision and under pressure
- A desire not just to resolve problems, but to fully understand them. We're looking for the tenacity and skill to quickly delve to the root of the problem, understand why it happened, and prevent it in the future.
- Excellent problem-solving skills
- A geek at heart - it's the only way to be good at this sort of job