Arm

HPC Engineer

Reposted 21 Days Ago
Hybrid
Austin, TX, USA
130K-176K Annually
Mid level
The HPC Engineer will operate and enhance Arm's HPC platforms, focusing on reliability, automation, and user experience while collaborating with engineering and infrastructure teams.
The summary above was generated by AI
Job Overview
Engineering IT provides the high-performance compute platforms that enable Arm's engineering teams to design, verify, and deliver world-class products. The team operates a mix of on-premises and cloud-based HPC environments, EDA enablement services, job scheduling platforms, automation tooling, and custom workflows that are critical to engineering productivity across Arm.
We are looking for an HPC Operations Engineer to help run, improve, and modernize these services. This role combines production operations, site reliability engineering, automation, cloud integration, and close collaboration with engineering users and infrastructure teams.
Responsibilities
  • Operate, support, and continuously improve Arm's HPC platforms, with a strong focus on IBM Spectrum LSF and related job scheduling services.
  • Improve reliability, scalability, performance, and operational efficiency through automation, observability, standardization, and SRE practices.
  • Develop automation and self-service capabilities to reduce manual operational effort and improve the user experience.
  • Support production HPC environments, including incident response, troubleshooting, root cause analysis, service restoration, and continuous improvement.
  • Work directly with engineering users to improve job scheduling behavior, workload performance, resource utilization, and platform efficiency.
  • Develop and maintain scripts, tools, and automation frameworks using Python, Bash, and related technologies.
  • Support modernization initiatives involving containers, Kubernetes, Docker, cloud-native services, Infrastructure as Code, and alternative scheduling or orchestration technologies.
  • Contribute to cloud HPC integration across AWS, GCP, Azure, OpenStack, and hybrid environments.
  • Collaborate with platform, cloud, storage, infrastructure, networking, and security teams to deliver robust engineering services.
  • Contribute to project delivery by working with technical leads, architects, project managers, and operational team members.
  • Help define and promote standards for DevOps, SRE, platform engineering, CI/CD, monitoring, and infrastructure automation.

Required Skills and Experience
  • Experience operating HPC environments and job schedulers such as IBM Spectrum LSF, Slurm, PBS, Grid Engine, or similar.
  • Strong Linux system administration experience, preferably with RHEL or RHEL-based distributions.
  • Good scripting and automation skills using Python, Bash, Shell, or similar languages.
  • Experience supporting production infrastructure, including incident management, troubleshooting, operational recovery, and root cause analysis (RCA), or comparable experience.
  • Familiarity with monitoring, alerting, and observability platforms such as Dynatrace, Prometheus, Grafana, or similar.
  • Experience building, maintaining, or supporting CI/CD pipelines and automation frameworks.
  • Experience with public, private, or hybrid cloud platforms, including AWS, GCP, Azure, OpenStack, and Kubernetes-based services.
  • Understanding of DevOps, SRE, platform engineering, infrastructure automation, and operational excellence principles.
  • Familiarity with Agile delivery practices and collaboration tools such as Jira and Confluence.
  • Ability to work with engineering users, understand workload requirements, and translate operational issues into practical improvements.

Desirable Experience
  • Experience working in EDA or semiconductor engineering environments.
  • Familiarity with EDA tools, license-aware scheduling, large-scale batch workloads, and engineering compute workflows.
  • Exposure to container platforms and orchestration technologies such as Docker, Kubernetes, and Kubernetes-native scheduling.
  • Experience with Infrastructure as Code tools such as Terraform and Ansible.
  • Exposure to alternative schedulers such as Slurm or cloud-native workload orchestration systems.
  • Experience using AI-assisted tooling, MCP, agentic services, or automation agents to improve diagnostics, operations, optimization, or self-service support.
  • Experience operating large-scale distributed systems across both on-premises and cloud infrastructure.

Salary Range:
$130,100-$176,000 per year
We value people as individuals, and we are committed to rewarding people competitively and equitably for the work they do and the skills and experience they bring to Arm. Salary is only one component of Arm's offering. The total reward package will be shared with candidates during the recruitment and selection process.
Accommodations at Arm
At Arm, we want to build extraordinary teams. If you need an adjustment or an accommodation during the recruitment process, please email [email protected]. Please note that by sending us the requested information, you consent to its use by Arm to arrange for appropriate accommodations. All accommodation or adjustment requests will be treated with confidentiality, and information concerning these requests will only be disclosed as necessary to provide the accommodation. Although this is not an exhaustive list, examples of support include breaks between interviews, having documents read aloud, or office accessibility. Please email us about anything we can do to accommodate you during the recruitment process.
Hybrid Working at Arm
Arm's approach to hybrid working is designed to create a working environment that supports both high performance and personal wellbeing. We believe in bringing people together face to face to enable us to work at pace, whilst recognizing the value of flexibility. Within that framework, we empower groups/teams to determine their own hybrid working patterns, depending on the work and the team's needs. Details of what this means for each role will be shared upon application. In some cases, the flexibility we can offer is limited by local legal, regulatory, tax, or other considerations, and where this is the case, we will collaborate with you to find the best solution. Please talk to us to find out more about what this could look like for you.
Equal Opportunities at Arm
Arm is an equal opportunity employer, committed to providing an environment of mutual respect where equal opportunities are available to all applicants and colleagues. We are a diverse organization of dedicated and innovative individuals, and don't discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Arm Austin, Texas, USA Office

5707 Southwest Parkway, Austin, TX, United States, 78735
