Galaxy Jobs

Vice President Site Reliability Engineering (Data Centers)

Galaxy

Posted Yesterday

Remote

Hiring Remotely in USA

Senior level

Lead and manage an SRE team focused on infrastructure automation, governance, configuration management, monitoring, and custom tooling while providing mentorship and technical guidance.

The summary above was generated by AI

Who We Are:
Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. We believe that blockchain and digital asset innovation will transform how value moves through the world – and we’re building the products and services to make that future a reality.

Our institutional digital assets platform spans trading, investment banking, asset management, staking, self-custody, and tokenization technology. We also invest in and operate cutting-edge data center infrastructure to power AI and high-performance computing, addressing the growing demand for scalable energy and compute in the U.S.

We work at the intersection of finance and technology, helping institutions, startups, and developers navigate a digitally native economy. Led by CEO and Founder Michael Novogratz, our team blends deep crypto expertise with institutional experience and a shared commitment to shaping the future of Web3 and AI.

Galaxy is headquartered in New York City, with offices across North America, Europe, the Middle East, and Asia.

To learn more about our businesses and products, visit www.galaxy.com.

What We Value:

We are a diverse team of free thinkers and fast movers, united to help investors and creators energize the global economy. We are looking for individuals who thrive in a culture of builders and overachievers and who embrace high performance, transparent feedback, and a mission-first approach. Our culture shapes our way of working and gets us where we want to be.

Seek Excellence.
Be Selective To Be Effective.
Be Highly Aligned, Loosely Coupled.
Disagree Transparently.
Encourage Independent Decision-Making.
Build Dream Teams.

Who You Are

A collaborative and strategic leader with deep hands-on experience in Site Reliability Engineering (SRE) and infrastructure automation. You are comfortable steering the vision for an enterprise automation roadmap while remaining technical enough to dive into the code. You treat infrastructure as a product, ensuring that your automation workflows are as reliable as the services they deploy. You have a proven track record of managing complex hybrid environments and are proactive in building self-service platforms that improve engineering velocity and system stability.

Responsibilities

Automation Platform Leadership: Oversee a specialized SRE team focused on the design, deployment, and maintenance of automation toolsets as well as the systems they interact with.
Infrastructure as Code (IaC) Governance: Establish and enforce IaC standards to ensure consistent, repeatable, and secure deployments across the entire infrastructure ecosystem. Strong proficiency in Terraform is required.
Configuration Management: Lead the strategy for automated configuration and state management, ensuring Ansible playbooks and Packer image pipelines are optimized for Windows, Linux, and ESXi platforms.
Monitoring & Observability: Manage the monitoring and health of the automation platforms themselves. Implement SLIs/SLOs to ensure the "tools that build the servers" are highly available and performant.
Lifecycle Management: Drive the automated lifecycle of both physical and virtual assets, from initial template creation/deployment to automated patching, scaling, and decommissioning.
Custom Tooling & Scripting: Lead the development of custom scripts and internal providers (Python, Go, PowerShell, Bash) to provide better insights and tooling for our systems.
Collaboration: Beyond the automation team, collaborate with the rest of the Data Center team to build shared workflows and support the needs of the team as a whole.
Capacity & Performance: Analyze system behavior and resource utilization in virtual environments to optimize the performance of automated deployments.
Mentorship & Growth: Provide technical guidance and career mentorship to SREs, fostering a culture of "automate-first" and continuous improvement.

Requirements

6-10 years of experience in infrastructure, SRE, or DevOps, specifically focused on infrastructure automation at scale.
Deep proficiency with Terraform (providers, modules, state management) and Ansible (roles, playbooks, Tower/AWX).
Hands-on experience with image creation tools (e.g., Packer, Ansible, SCCM) to build standardized, hardened images for both Windows and Linux in hybrid environments.
Strong experience managing and automating virtualization platforms such as VMware (vSphere/vCenter), as well as cloud providers such as Azure and AWS.
Advanced scripting skills in languages such as Python, Go, PowerShell, and Bash.
Experience with observability tools (Splunk, ELK, Prometheus, or Grafana) to monitor infrastructure health and automation telemetry.
Good understanding of network topology and design, as well as experience with platforms such as Juniper Networks or Palo Alto.
Strong mastery of Git (branching strategies, PR workflows) and CI/CD platforms (Jenkins, GitLab CI, or GitHub Actions).
Equal comfort managing, troubleshooting, and tuning performance for both Windows Server and Linux.

Nice to Have

Previous work experience that includes notable periods of team leadership and/or management.
Experience with IAM platforms such as Entra ID, Active Directory, and Okta.
Experience with block-based and object-based storage solutions, hosted either on-premises (HP Alletra, EMC, DDN) or in the cloud (S3, Azure Blob).
Backup/DR administration and management with Commvault or Veeam.

Galaxy respects diversity and seeks to provide equal employment opportunities to all employees and job applicants for employment without regard to actual or perceived age, race, color, creed, religion, sex or gender (including pregnancy, childbirth, lactation and related medical conditions), gender identity or gender expression (including transgender status), sexual orientation, marital or partnership or caregiver status, ancestry, national origin, citizenship status, disability, military or veteran status, protected medical condition as defined by applicable state or local law, genetic information or predisposing genetic characteristic, or other characteristic protected by applicable federal, state, or local laws and ordinances.

We will endeavor to make a reasonable accommodation to the known limitations of a qualified applicant with a disability unless the accommodation would impose an undue hardship on the operation of our business. If you believe you require such assistance to complete the application process or to participate in an interview, please contact [email protected].
