Site Reliability Engineer
About Us:
LogicMonitor is the leading SaaS-based performance monitoring platform for enterprise IT.
We love going to work and think you should too. We are customer-obsessed, work as one team, and strive to be better every day. These are our core values. So it's no surprise that we work hard and genuinely have fun working with each other to achieve great things together.
Right now, we are temporarily working from home due to COVID-19. Normally, you'll be working in the heart of downtown Santa Barbara. We are looking for you to bring your expertise, drive, and passion as we expand our global presence and achieve record-breaking success.
LogicMonitor is an equal opportunity employer. We’re committed to creating an inclusive environment for all our employees, where different backgrounds and perspectives are valued and encouraged - regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. We encourage all people to come as they are.
We operate with integrity, esteem diversity and treat each other fairly and with respect. We strive to find our own versions of personal and professional harmony through community building and holistic growth. We hear time and time again that our awesome people are a huge part of why LMers chose LogicMonitor, love their teams, and choose to stay.
To learn more about life at LogicMonitor, check out our Careers Page.
What You'll Do:
You will maintain operational uptime of all mission-critical systems, and facilitate and automate operational tasks while looking for ways to streamline and improve them. You will work with developers, providing feedback that helps the product function better within the LM infrastructure, and develop TechOps skills to become a valuable member of the core LM Operations team.
Here's a closer look at this key role:
- Maintain uptime of LogicMonitor's SaaS-based service and drive technical/process enhancements to improve uptime
- Deploy production applications and drive improvements to the deployment process
- Design and deploy new application components
- Design and deploy new infrastructures and integrations
- Ensure security of the production environment
- Meet with prospective customers as needed
- Write code to automate various aspects of infrastructure maintenance and deployments
- Support development and work closely with developers to drive operational and architecture/design changes
- Own, manage, and execute large and technically complex projects across teams
- Act as a strategic resource for the company with the ability to develop and deliver technical presentations for other departments, customers, and conferences
- Mentor more junior team members
- Lead by example in providing good documentation and thorough runbooks
What You'll Need:
- 3+ years of experience working at SaaS-based companies in an engineering role
- Solid understanding of Linux system administration in distributed environments
- Solid understanding of automated deployments
- Experience with AWS
- Experience in various application scaling methodologies, including (but not limited to) load balancers
- Experience with configuration management tools such as Chef, Puppet or Ansible
- Experience with Java/Tomcat applications.
- Experience with CI and build systems
- Experience with virtualization and container technologies (Docker, Kubernetes, etc.)
- Experience with relational databases (MySQL) and NoSQL databases (e.g. MongoDB) in both administration and querying
- Significant programming/scripting experience (Java/Ruby/Python/shell/Go).
- Experience with source code management tools (git).
- Knowledge of security as related to Linux systems, applications and networking.
- High-level understanding of networking technologies (routing, switching, firewalls, iptables, etc.)
- High-level understanding of SOA and high-availability systems
- Excellent problem-solving skills.
- A desire not just to resolve problems, but to fully understand them. We're looking for the tenacity and skill to quickly delve to the root of the problem, understand why it happened, and prevent it in the future.
- Experience with Linux system administration and networking.
- Extensive experience with configuration management tools such as Chef or Puppet.
- Able to work without close supervision and self-direct projects.
- Experience with Bamboo or other continuous integration build environments.
- Experience with package management systems (RPM, RubyGems, etc.)
Pluses:
- Cisco routing/switching, routing protocols (OSPF/BGP).
- Netscaler or other load balancing technologies.
- Experience with Java programming for web applications.
- Experience with Atlassian products (Jira/Confluence/HipChat/etc.)
- CS degree
Residents of California, click Here to view our California Applicant Privacy Notice.