NVIDIA

Senior Deep Learning Frameworks CUDA Software Engineer

Reposted 9 Days Ago
In-Office or Remote
2 Locations
184K-357K Annually
Expert/Leader
The role involves integrating CUDA features into AI frameworks, optimizing performance, and collaborating with teams on AI model development. Responsibilities include design, analysis, and enhancement of AI tools and frameworks.
The summary above was generated by AI

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

We are looking for a motivated Deep Learning engineer to bring advanced CUDA features and Distributed Runtime technologies into AI stacks, including PyTorch, TRT-LLM, vLLM, SGLang, and JAX. You will be working with the team that created core CUDA features and runtimes for scaling Deep Learning and HPC applications. Your customers will have diverse multi-GPU demands, ranging from training at scales of up to 100K GPUs to inference at microsecond latency. CUDA features improve both the productivity and the performance of AI applications, and your work in AI toolkits will speed up making those features available to the community. This is an outstanding opportunity for someone with an AI background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

What you will be doing:

  • Integrate new CUDA features and Runtime abstractions in AI frameworks: from PoC to performance analysis to production

  • Perform deep analysis of AI workloads and frameworks to identify requirements and opportunities to innovate in the lower layers of the stack. Collaborate hands-on with teams working on the latest AI models.

  • Own and drive improvements in the AI Compiler-Runtime interface to build speed-of-light multi-GPU multi-node solutions.

  • Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.

  • Influence the roadmap of core CUDA to facilitate building next-gen DL frameworks.

  • Collaborate with a very dynamic team across multiple time zones.

  • Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors and CUDA driver experts to co-design systems and frameworks that enhance performance and programmability.

  • Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.

  • Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.

What we need to see:

  • BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).

  • 8+ years of relevant industry experience, or equivalent academic experience, after completing the degree.

  • Development experience with Deep Learning Frameworks such as PyTorch and JAX, and Inference Engines such as TRT-LLM, vLLM, and SGLang

  • Rapid prototyping and development with Python, C++, CUDA or related DSLs 

  • Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile)

  • Experience conducting performance benchmarking on AI clusters. Familiarity with at least one performance profiler toolchain (e.g., PyTorch Profiler, NVIDIA Nsight Systems)

  • Understanding of HPC/AI communication concepts 

  • Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)

  • Adaptability and passion to learn new frameworks and tools

  • Flexibility to work and communicate effectively across different teams and time zones

Ways to stand out from the crowd:

  • Deep expertise in the performance internals and execution graphs of major deep learning autograd, training and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText).

  • Hands-on experience with CUDA, specific communication libraries (e.g., NCCL, MPI, UCX) and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).

  • Expertise in one or more of these areas: Training, Distributed inference, MoE, Reinforcement Learning, kernel authoring (in CUDA, Triton, CuTe, etc.).

  • Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile)

  • Experience programming for compute and communication overlap in distributed runtimes

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 20, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
