NVIDIA

Senior HPC Scheduler Engineer

Posted 12 Days Ago
In-Office or Remote
Hiring Remotely in Santa Clara, CA
224K-357K Annually
Senior level
You will design and optimize scheduling and resource management for large cluster systems, working with various teams and improving performance through testing and evaluation.
The summary above was generated by AI

NVIDIA has been continually redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s an outstanding legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you will be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are looking for an outstanding HPC scheduler/resource manager engineer to lead the architecture, deployment, and performance optimization of large datacenter cluster systems and applications. Be a key player in the most exciting computing hardware and software driving the latest breakthroughs in artificial intelligence and GPU computing. Provide insights on at-scale system design for collecting, visualizing, and acting on a wide variety of data. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers, to craft state-of-the-art scheduling strategies that enable new systems and interconnects. You will interact with OS, GPU compute, and networking specialists to envision, develop, and bring up large-scale systems.

What you’ll be doing:

  • Provide engineering solutions and prototypes to enable efficient resource management and job scheduling for large-scale clusters.

  • Drive next-generation requirements and features for schedulers in at-scale clusters.

  • Build and maintain technical relationships with internal and external engineering teams.

  • Assist system architects and machine learning/deep learning engineers in building creative solutions based on NVIDIA technology.

  • Be an internal reference for scheduling and resource management concepts and methodologies within the NVIDIA technical community.

  • Test, evaluate, and benchmark new technologies and products and work with vendors, partners and peers to improve functionality and optimize performance.

What we need to see:

  • BS, MS, or PhD in Engineering, Mathematics, Physics, Computer Science, or equivalent experience

  • 12+ years of experience designing and running scheduling and resource management systems in large datacenter/AI/HPC solutions.

  • Knowledge of and experience with resource management/scheduling code bases: SLURM preferred; other implementations (LSF, SGE, Torque, etc.) are also relevant.

  • Proven understanding of high-performance clusters, infrastructure, and workload patterns.

  • Experience using and installing Linux-based server platforms.

  • C/Python/Bash/Lua programming/scripting experience.

  • Experience working with engineering or academic research community supporting HPC or deep learning.

  • Strong teamwork skills and strong verbal and written communication skills.

Ways to stand out from the crowd:

  • Experience with HPC cluster administration for AI.

  • Experience deploying containerized services.

  • Experience with orchestrators (e.g. Kubernetes).

  • Demonstrated work with open-source software: building, debugging, patching, and contributing code.

  • Experience tuning memory, storage, and networking settings for performance on Linux systems.

  • Exposure to monitoring and telemetry systems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 8, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

#deeplearning

Top Skills

Bash
C
Kubernetes
Linux
LSF
Lua
Python
SGE
Slurm
Torque

Similar Jobs

An Hour Ago
In-Office or Remote
3 Locations
175K-225K Annually
Expert/Leader
eCommerce • Mobile
The Associate Creative Director will lead creative strategy and execution for events at Whatnot, oversee design teams, manage projects, and develop marketing campaigns across various platforms.
An Hour Ago
In-Office or Remote
4 Locations
170K-230K Annually
Senior level
eCommerce • Mobile
Design and develop trust and risk systems for Whatnot's marketplace. Monitor user impact, respond to threats in real-time, and iterate defenses using machine learning and behavioral analysis.
Top Skills: Dagster, dbt, Python, Snowflake
An Hour Ago
Remote
United States
200K-250K Annually
Expert/Leader
Artificial Intelligence • Edtech • Machine Learning • Natural Language Processing • Social Impact
Lead end-to-end finance operations for a growth-stage SaaS company, overseeing accounting, FP&A, revenue recognition, cash and liquidity planning, compliance, and auditing. Partner with CEO and cross-functional leaders on strategy, pricing, HR/PeopleOps and RevOps, build financial models, KPIs, and dashboards, and scale finance processes and team members to support rapid growth.

What you need to know about the Austin Tech Scene

Austin has a diverse and thriving tech ecosystem thanks to home-grown companies like Dell and major campuses for IBM, AMD, and Apple. The state's flagship university, the University of Texas at Austin, is known for its engineering school, and the city is known for its annual South by Southwest tech and media conference. Austin's tech scene spans many verticals, but it's particularly known for hardware, including semiconductors, as well as AI, biotechnology, and cloud computing. Its food and music scene, low taxes, and favorable climate have made the city a destination for tech workers from across the country.

Key Facts About Austin Tech

  • Number of Tech Workers: 180,500; 13.7% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Dell, IBM, AMD, Apple, Alphabet
  • Key Industries: Artificial intelligence, hardware, cloud computing, software, healthtech
  • Funding Landscape: $4.5 billion in VC funding in 2024 (Pitchbook)
  • Notable Investors: Live Oak Ventures, Austin Ventures, Hinge Capital, Gigafund, KdT Ventures, Next Coast Ventures, Silverton Partners
  • Research Centers and Universities: University of Texas, Southwestern University, Texas State University, Center for Complex Quantum Systems, Oden Institute for Computational Engineering and Sciences, Texas Advanced Computing Center