Senior CloudOps Engineer/Site Reliability Engineer at RapidDeploy (Remote)
Sorry, this job was removed at 11:23 a.m. (CST) on Monday, December 6, 2021
Join a purpose-driven, fast-growing enterprise software company that is working to transform 9-1-1.
The power to do remarkable things when it matters most is the heart of 9-1-1 and public safety. At RapidDeploy, we believe that regardless of size, geography or budget, everyone in public safety should have access to the data they need when it matters most to save more lives. That’s why, since 2016, our mission has been to reduce emergency response times and improve public safety. We are the industry’s only truly open and integrated emergency response platform with a portfolio of web-based cloud solutions that includes analytics, mapping, dispatch and first responder applications.
Over the past year, we have signed four states and our software has been deployed in more than 700 9-1-1 centers across the U.S. We have increased our annual recurring revenue by more than 20X. We’re hiring passionate team members to help propel us into the next stage of growth.
Site Reliability Engineering is an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in their systems, services, and products. The RapidDeploy SRE team plays a crucial role in our mission to reduce emergency response times and improve public safety.
RapidDeploy is looking for a Site Reliability Engineer to join the team responsible for monitoring our production systems 24/7. We are hiring for our US-based overnight shifts with weekend flexibility. Your primary responsibility will be to provide support during incidents and to manage communications and escalations around them. You will monitor our entire platform infrastructure and applications. You must be comfortable performing under pressure with tight deadlines and communicating with larger audiences. Your other responsibilities will be to build monitoring and alerting tools around the availability, performance, and overall health of our services, with scalability and automation in mind.
Responsibilities:
- Work with DevOps and DBA teams to support Cloud infrastructure.
- Work with Analytics team to support Eclipse Analytics.
- Work with Platform and other Development teams to support Nimbus/Radius front end applications and back end services.
- Work with IoT Team to support IoT Devices.
- Work with Customer Support team to provide technical support for customer reported issues.
- Work with QA and Implementation teams to provide insight on application and infrastructure performance with future releases.
- Be in a scheduled rotation for On Call duties which include receiving alerts from monitoring systems as well as internal escalations.
- Build and improve monitors and alerts to increase visibility of system health.
- Build tools or automation that can improve SRE role efficiencies or increase monitoring capabilities.
- Troubleshoot technical issues with infrastructure and applications.
- Act as Incident Commander when Incidents are created: escalate to other teams, serve as the central communication channel across teams, and make detailed timeline entries of actions taken during the Incident.
- Produce Root Cause Analysis reports for customers.
- Write post-mortems for Incidents and review with internal teams.
Skills/Experience
- Bachelor's degree in Computer Science, Management Information Systems, or equivalent field with 1-2 years’ experience as a Site Reliability Engineer
- Experience with Cloud services, preferably Azure (Application Insights, Logging, and Monitoring)
- Background as a Reliability engineer, DevOps engineer, or Software engineer
- Familiarity with distributed systems and microservices
- Understanding of front end and back end architecture
- Experience with SQL databases
- Experience with Datadog or other monitoring and logging tools
- Programming/Scripting skills in a major language or framework such as .NET or PowerShell
- Experience with deployment tools such as Terraform, Ansible, or Puppet
- Experience with Kubernetes
- Strong communication skills
Behavioural competencies required
- Work well under pressure
- Good communication skills (written and verbal)
- A good problem solver
- Have an inquisitive nature
About RapidDeploy Inc.
- Strong revenue growth (20x+ in last 12 months) & huge recent customer wins (e.g., State of California)
- Fast-growing, passionate, mission-driven team – we care about saving lives through technology!
- Offices in Austin, TX and Cape Town, South Africa
- Medical, dental, and vision insurance options, with benefits that kick in on your first day! Benefits are highly subsidized by RapidDeploy Inc.
- In addition to 8 observed holidays, RapidDeploy employees receive 20 days of PTO and unlimited sick leave.
- RapidDeploy is a well-funded, venture-backed growth company
RapidDeploy is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
RapidDeploy, Inc. and its affiliates understand that your privacy is important to you. When you apply for a position with RapidDeploy, we collect and process personal data for recruitment and other related Human Resources purposes. Review our HR Privacy Policy to learn how we collect, use, and protect your personal data in connection with our recruiting and HR efforts.
Notice: RapidDeploy’s hiring policy prohibits hiring from existing or prospective customers.