CyrusOne

Senior Reliability Engineer

Posted 2 Days Ago
Remote
Hiring Remotely in USA
140K-170K Annually
Expert/Leader
The Senior Reliability Engineer serves as a subject-matter expert and strategic technical authority for infrastructure reliability across a portfolio of mission-critical data center sites. This role leads the design, governance, and continuous improvement of reliability strategies for power, cooling, and control systems, applying advanced engineering judgment, analytics, and risk-based decision-making.
The Senior Reliability Engineer independently evaluates complex reliability risks, prioritizes initiatives under uncertainty, and influences operational, maintenance, and capital decisions that materially impact uptime, safety, and lifecycle cost. This role operates with minimal oversight and is expected to shape standards, mentor others, and elevate reliability capability across the organization.

Responsibilities:

Enterprise Reliability Strategy & Asset Care

  • Architect and govern portfolio-level, risk-based asset strategies for mission-critical power and cooling infrastructure.
  • Apply advanced RCM principles to define maintenance and inspection strategies aligned to failure risk, system criticality, and redundancy posture.
  • Evaluate and balance tradeoffs between maintenance investment, operational risk, spares coverage, redundancy, and capital replacement.
  • Establish and maintain enterprise PM quality standards, including audits, task effectiveness reviews, and elimination of low-value maintenance.

Operational Governance & Change Risk Management

  • Serve as a final technical authority for high-risk SOPs, MOPs, EOPs, and operational change packages.
  • Perform system-level risk assessments for planned work, incidents, and abnormal operating conditions.
  • Guide site teams in CMMS data integrity, work management maturity, and adherence to approved operating procedures.
  • Lead or oversee complex reliability investigations involving multiple systems, teams, or contributing factors.

Advanced Analytics & Condition Monitoring

  • Design and mature predictive condition-monitoring programs across the portfolio (oil analysis, thermography, vibration, battery monitoring, controls analytics).
  • Develop and interpret leading reliability indicators and degradation trends to anticipate failures before impact.
  • Apply statistical analysis, reliability modeling, and engineering judgment to evaluate failure likelihood and consequence.
  • Translate analytical insights into strategic maintenance, operational mitigations, or capital recommendations.

Critical Spares & Lifecycle Strategy

  • Define and govern enterprise critical spares strategies, accounting for supplier risk, lead times, and system exposure.
  • Identify systemic spares gaps and drive remediation plans in partnership with Supply Chain and Operations.
  • Lead lifecycle asset assessments to guide long-range capital planning and replacement prioritization.
  • Provide data-driven input to business cases supporting capital investments and infrastructure upgrades.

Incident Leadership, RCA & Continuous Improvement

  • Lead high-impact post-incident RCAs and FMEAs, ensuring depth of analysis beyond proximate causes.
  • Identify and address latent design, procedural, and organizational contributors to reliability events.
  • Ensure lessons learned result in durable changes to standards, procedures, maintenance strategies, or training.
  • Champion continuous improvement initiatives that measurably reduce risk and failure recurrence across sites.

Technical Leadership & Capability Development

  • Act as a mentor and technical escalation point for Reliability Engineers, site engineers, and CE leaders.
  • Coach teams on reliability methods, risk-based decision-making, and interpretation of condition-monitoring data.
  • Influence and evolve enterprise reliability standards, playbooks, and operating philosophies.
  • Partner with leadership to strengthen operator certification, training rigor, and operational discipline.

Qualifications:

  • 10+ years of experience in reliability engineering, maintenance engineering, or facilities engineering within mission-critical environments.
  • Demonstrated leadership of complex, multi-system reliability programs with measurable business impact.
  • Expert-level knowledge of RCM, FMEA, RCA, and maintenance optimization methodologies.
  • Deep technical understanding of mission-critical infrastructure, including UPS, generators, switchgear, chillers, cooling towers, CRAH/CRAC, and BMS/EPMS.
  • Proven experience governing SOP/MOP/EOP programs and assessing operational change risk in live environments.
  • Advanced ability to analyze condition-monitoring, CMMS, and operational datasets and convert insights into strategic actions.
  • Proficiency in data analysis and visualization tools (Excel, Power BI, or similar).
  • Ability to apply statistical techniques or reliability modeling to support risk-informed decision-making under uncertainty.
  • Strong executive-level communication skills; able to influence senior leaders and defend technical positions.

Preferred Experience:

  • Experience designing and scaling enterprise critical spares and lifecycle asset management programs.
  • Hands-on experience with predictive analytics, failure modeling, or reliability simulations.
  • Proficiency with Python, R, or similar tools for advanced reliability analytics.
  • Working knowledge of SQL or other data query languages.
  • Strong familiarity with NFPA, IEEE, ASHRAE, and other relevant codes and standards.
  • Experience presenting reliability risk, capital tradeoffs, and investment recommendations to executive audiences.

Education & Certifications:

  • Bachelor’s degree in Mechanical, Electrical, or Industrial Engineering (or equivalent experience).
  • Preferred: CMRP, CRE, or similar advanced reliability or maintenance certification.

Work Conditions:

  • Supports 24×7 mission-critical operations; participates in on-call rotation and may support after-hours events.
  • Ability to work safely in energized environments in compliance with LOTO and NFPA 70E.
  • Travel to supported sites approximately 25%.

Salary range: $140,000-$170,000

CyrusOne is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, or other legally protected status.

CyrusOne provides reasonable accommodation for qualified individuals with disabilities in accordance with the Americans with Disabilities Act (ADA) and any other applicable state or local laws. We will respond to requests for reasonable accommodation to assist you in applying for positions at CyrusOne or in submitting a resume.

Top Skills

BMS
Chillers
Cooling Towers
CRAC
CRAH
EPMS
Excel
FMEA
Generators
Power BI
Predictive Analytics
Python
R
RCA
RCM
SQL
Switchgear
UPS