Senior Site Reliability Engineer (C#, .NET)

Climavision

Posted Yesterday

Remote

Hiring Remotely in United States

135K-170K Annually

Senior level

Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.

The summary above was generated by AI

Senior Site Reliability Engineer

Remote | US

(EST Preferred)

About Climavision   

At Climavision, we’re rebuilding climate technology from the ground up and changing the way we see weather. We merge the power of a proprietary, high-resolution weather radar and satellite network with advanced weather prediction modelling and decades of industry expertise to reduce existing coverage gaps and drastically improve forecasting ability. Our revolutionary new approach to weather and climate technology is poised to help reduce the economic risks of climate change for companies, governments, and societies alike. We are backed by The Rise Fund, the world’s largest global impact platform committed to achieving measurable, positive social and environmental outcomes alongside competitive financial returns. Climavision is headquartered in Louisville, KY, with research and development operations in Raleigh, NC.

The Work   

Are you an experienced Site Reliability Engineer who thrives at the intersection of software engineering and production operations? Do you take pride in keeping mission-critical customer systems reliable under real-world operational pressure? Are you looking for an opportunity to own production reliability for a modern hybrid infrastructure platform spanning cloud, colocation, and edge environments?

If so, we have an exceptional opportunity for you. 

Climavision is seeking a Senior Site Reliability Engineer to contribute towards reliability, operational excellence, and production resilience for our customer-facing platform and weather data services. This role is focused on ensuring our systems consistently meet demanding customer SLAs, including a 99.5% availability commitment for radar-derived data services. A central focus of this role is establishing multi-replica and multi-cluster high availability across our .NET services, including hands-on refactoring of C# code to make services safe to run as multiple instances and across clusters. 
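To give a sense of scale for the 99.5% availability commitment, the short sketch below converts an availability target into an allowed-downtime budget. The posting does not state the measurement window, so the 30-day and 365-day windows here are assumptions chosen for illustration, and the function name is hypothetical.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    The window length is an assumption; the actual SLA window is not stated.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


monthly = allowed_downtime_minutes(99.5, window_days=30)
yearly = allowed_downtime_minutes(99.5, window_days=365)
print(f"30-day budget:  {monthly:.0f} minutes ({monthly / 60:.1f} hours)")
print(f"365-day budget: {yearly:.0f} minutes ({yearly / 60:.1f} hours)")
```

Under those assumed windows, 99.5% leaves roughly 3.6 hours of downtime per 30 days, or about 43.8 hours per year.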

This is a hands-on engineering role for someone who is equally comfortable debugging production .NET services, troubleshooting Kubernetes clusters, leading incident response, and improving operational maturity across the organization. The successful candidate will combine strong software engineering experience in C# / .NET with deep production operations expertise and a disciplined approach to reliability engineering. 

Climavision operates a hybrid infrastructure footprint spanning Microsoft Azure, colocation data centers, and edge Kubernetes clusters deployed alongside weather radar systems. This role will drive production reliability across all three environments.

35% Production Reliability Engineering  

30% Application Reliability & .NET Service Architecture 

20% Kubernetes Platform Reliability/Operations 

15% Observability, Automation, and Operational Excellence 

Primary Responsibilities:

Own production reliability for Climavision’s customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments. 
Contribute to the definition and improvement of SLIs, SLOs, alerting standards, and operational metrics used to measure platform reliability. 
Support and coordinate production incident response efforts, including troubleshooting, mitigation, communication, and postmortem analysis. 
Diagnose and resolve complex production issues across application services, Kubernetes infrastructure, storage, and distributed systems. 
Drive multi-replica and multi-cluster high availability across Climavision’s .NET services. This includes working directly in the C# codebase to refactor services that are not currently safe to run as multiple replicas, addressing in-process state, sticky scheduling assumptions, non-idempotent operations, race conditions, and other patterns that prevent safe horizontal scaling, so that services can be deployed with multiple replicas, across multiple clusters, for high availability. 
Contribute to the multi-cluster high-availability strategy across Climavision’s hybrid fleet, including active-active and active-passive failover behavior, traffic routing, data replication considerations, and graceful degradation when a cluster becomes unavailable. 
Operate and improve Climavision’s self-managed Kubernetes platform spanning cloud-hosted, colocation, and edge clusters, with a focus on availability, resiliency, recovery, and operational performance. 
Ensure that Kubernetes platform lifecycle activities, including upgrades, patching, cluster health, node management, and production change management, are executed in a manner that preserves service availability and minimizes customer-facing risk. 
Improve reliability and operational maturity of production platform services, including observability, autoscaling, ingress, and distributed storage. Partner with the teams responsible for the underlying networking and security primitives rather than owning those areas directly. 
Design and validate Kubernetes workloads for resiliency, scalability, and operational efficiency, including autoscaling behavior, workload placement, resource management, and graceful degradation strategies. 
Read, debug, and contribute production-quality C# / .NET code focused on reliability improvements, multi-replica safety, instrumentation, operational tooling, and performance optimization. 
Partner with software engineering teams to improve production readiness, resiliency patterns, deployment safety, and operational visibility before services reach production. Champion multi-replica-safe design patterns as new services are built. 
Maintain and improve deployment pipelines, Helm charts, Kubernetes manifests, and infrastructure automation supporting safe and repeatable production releases. 
Support and evolve Climavision’s observability platform, including metrics, logging, distributed tracing, dashboarding, and alerting. 
Conduct performance engineering and capacity-planning efforts for customer-facing services during peak weather-event demand. 
Help facilitate blameless postmortem reviews and drive operational follow-up items through completion. 
Improve disaster recovery, failover, and business continuity capabilities across cloud, colocation, and edge environments. 
Drive operational excellence initiatives, including automation, reduction of operational toil, game days, production readiness reviews, and reliability best practices. 
Contribute as a senior technical resource and mentor on reliability engineering and production operations practices.
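As a rough illustration of the multi-replica safety work described above, the sketch below shows one common pattern: replacing in-process "already handled" state with an atomic claim in a store shared by all replicas, so that a duplicate delivery becomes a no-op. It is written in Python for brevity rather than C#, and every name in it (`SharedStore`, `put_if_absent`, `handle_scan`, the scan IDs) is hypothetical; the in-memory class only stands in for an external store and is not a description of Climavision's services.

```python
import threading


class SharedStore:
    """In-memory stand-in for an external store shared by all replicas (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claims = {}

    def put_if_absent(self, key, value):
        """Atomically record value under key; return True only for the first writer."""
        with self._lock:
            if key in self._claims:
                return False
            self._claims[key] = value
            return True


def handle_scan(store, replica_name, scan_id):
    """Process a scan at most once, regardless of which replica receives it."""
    if store.put_if_absent(scan_id, replica_name):
        return "processed"
    return "skipped"


store = SharedStore()
print(handle_scan(store, "replica-a", "scan-1"))  # first delivery is processed
print(handle_scan(store, "replica-b", "scan-1"))  # duplicate delivery is skipped
```

This claim-before-process variant is deliberately simplified: if a replica crashes after claiming a key but before finishing the work, the work is lost, so a production design would typically add leases, timeouts, or a completion marker.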

On-Call Expectation:

Climavision operates customer-facing production systems under contractual SLAs that do not pause outside business hours. The Senior Site Reliability Engineer will participate in a primary on-call rotation, taking one full week of primary on-call duty at a time. During the on-call week, the engineer is expected to be reachable and able to actively respond to production incidents and pages 24 hours a day, 7 days a week, including nights, weekends, and holidays. This includes: 

Acknowledging pages and incidents posted to the DevOps Support channel within the established response-time SLO, regardless of the hour the page is received. 
Driving the incident to mitigation or resolution before stepping away, including engaging additional engineers when appropriate. 
Maintaining reliable connectivity (laptop, network, paging device) and personal availability for the full duration of the rotation week. 
Planning personal time, travel, and other commitments around the published on-call rotation, and arranging documented coverage swaps in advance when conflicts are unavoidable. 
Owning written incident handoffs at the end of the rotation and authoring postmortems for incidents that occurred during the week.

Candidates who are not able or willing to meet this on-call standard should not apply.  

Qualifications

A bachelor’s degree in computer science, software engineering, or a related field; equivalent professional experience considered. 

Minimum of 7 years of experience in Site Reliability Engineering, DevOps, Production Engineering, Platform Engineering, or a related infrastructure-focused role, with at least 4 years in a role formally titled Site Reliability Engineer or carrying explicit SLO / error-budget accountability. 
Strong, hands-on software engineering experience with a minimum of 3 years of experience supporting and modifying C# / .NET applications in production environments. Candidates without production C# / .NET development experience will not be considered for this role; this is a non-negotiable requirement driven by the technology stack of Climavision’s products. 
Demonstrated experience refactoring production application code (preferably C# / .NET) to make services horizontally scalable across multiple replicas: removing in-process state, ensuring idempotency, handling concurrent execution safely, and enabling safe deployment of multiple instances of a service. 
Experience designing or operating multi-cluster high-availability architectures, including failover behavior, traffic routing, and cross-cluster service deployment. 
Experience supporting customer-facing production systems with uptime, reliability, and incident-response responsibilities. 
Strong hands-on experience operating production workloads in self-managed or highly customized Kubernetes environments. 
Experience diagnosing and resolving production incidents across application, platform, and Kubernetes infrastructure layers, including workload scheduling, storage, ingress, and cluster-level failures. 
Experience operating Kubernetes outside of strictly managed cloud environments, including bare-metal, colocation, edge, or hybrid infrastructure. 
Experience with Kubernetes operational tooling and ecosystem technologies such as Rancher, Helm, autoscaling frameworks, observability stacks, or distributed storage systems. 
Strong understanding of infrastructure automation and Infrastructure as Code concepts using tools such as Terraform and Ansible. 
Experience supporting CI/CD and production deployment pipelines. Experience with Octopus Deploy is strongly preferred. 
Experience with monitoring, logging, and observability platforms such as DataDog, Prometheus, Grafana, Loki, OpenTelemetry, or comparable technologies. 
Experience operating distributed systems and microservice-based architectures in production environments. 
Working knowledge of Microsoft Azure infrastructure. 
Strong troubleshooting skills across infrastructure, application, and platform layers. 
Demonstrated experience participating in a structured production on-call rotation supporting business-critical systems. 
Strong written and verbal communication skills, including incident documentation and postmortem authoring. 
Experience working in start-up, scale-up, or other fast-moving engineering environments.

Preferred experience: 

Experience operating Kubernetes platforms using RKE2 and Rancher. 
Experience supporting hybrid cloud and colocation infrastructure environments. 
Experience with service mesh technologies such as Istio. 
Experience with Kubernetes-native storage platforms such as Longhorn. 
Experience operating PostgreSQL or PostGIS in Kubernetes environments. 
Experience with distributed messaging systems such as RabbitMQ or NATS. 
Experience supporting GPU-enabled workloads in Kubernetes. 
Familiarity with reliability engineering practices, including SLIs, SLOs, error budgets, and operational maturity metrics.

Physical Demands & Work Environment:   

This is a full-time, exempt position.   
Fully remote within the United States. Candidates in the Eastern time zone are preferred.
This job requires frequent use of a computer to complete tasks, attend meetings, and communicate via Microsoft Teams.

Once you land this position, you’ll get to enjoy:   

Benefits of a dynamic and growing organization   
A challenging, hands-on role that will have real impact on the business   
Competitive compensation  
Comprehensive benefits package 
401(k) Savings Plan  
Medical/Dental/Vision Benefits  
Health Savings Account (HSA) and Flexible Spending Account (FSA) 
Unlimited Paid Time-off  
11 Paid Holidays 
Paid Parental Leave  
Company Paid Short-term Disability (STD) 
Company Paid Long-term Disability (LTD) 
Company Paid Life Insurance

The salary range for this position is $135K-$170K annually; however, Climavision considers several factors when extending an offer of employment, including but not limited to the applicant’s education, experience, training, knowledge, skills, and abilities, the responsibilities of the role, and internal equity and alignment with market data. Any offer of employment is contingent on completion of a background check to company standard. Please note that this job description is not designed to cover or contain a comprehensive listing of the activities, duties, or responsibilities required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice. 

Climavision is an equal opportunity employer. All aspects of employment, including the decision to hire, promote, discipline, or discharge, will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.
