Avride

Software Engineer – ML Platform

Reposted 3 Days Ago
In-Office
Austin, TX, USA
Mid level
The ML Platform Engineer will build scalable architecture for ML training workloads, optimize system performance, and collaborate with teams for enhanced efficiency on Kubernetes.
The summary above was generated by AI
About the team

The ML Platform team at Avride builds the infrastructure that powers large-scale ML training and data processing for autonomous driving. We sit between Cloud Platform and ML engineers, turning low-level compute, storage, and networking primitives into an ML platform that teams actually use — scalable orchestration, distributed compute, and production-grade tooling for the full model lifecycle.

About the role

As an ML Platform Engineer at Avride, you'll own critical pieces of the ML stack: workflow orchestration, distributed execution, resource governance, and performance. You will shape how ML teams across the company run experiments and train models at scale. You will build the abstractions and services that make training workloads on Kubernetes reliable, cost-efficient, and fast, with an excellent developer experience.

What you will do
  • Build and scale our ML compute platform on Kubernetes, using Argo Workflows for training, evaluation, and data processing orchestration
  • Design and implement core platform capabilities, including a Ray-based internal SDK for distributed execution, and multi-tenant resource governance — scheduling, priorities, quotas, and policy enforcement across GPU, CPU, memory, and IO
  • Improve end-to-end training throughput and platform efficiency by optimizing data access patterns, caching, and removing bottlenecks in storage, network, and resource contention
  • Work directly with ML teams to debug complex workload issues, drive root-cause analysis, and turn recurring problems into platform-level fixes
  • Evaluate, integrate, and extend open-source tooling (Argo Workflows, Ray, Kubernetes ecosystem) to meet evolving platform needs
What you will need
  • Strong proficiency in Python or Go; C++ is a plus
  • Track record of designing and building scalable, maintainable systems and services
  • Experience operating production services end-to-end: APIs, reliability practices, observability
  • Deep knowledge of Kubernetes: how scheduling, resource management, controllers, and pod lifecycle actually behave under pressure
  • Solid Linux and systems debugging skills: performance investigation, networking, storage/IO
  • Ability to troubleshoot complex production issues across logs, metrics, and traces and drive them to resolution
Nice to have
  • Experience with Argo Workflows, Ray, MLflow, or comparable distributed ML tooling
  • Hands-on experience building or operating large-scale ML training systems: GPU scheduling, distributed training, training data pipelines
  • Track record of optimizing resource usage and performance in distributed environments

Candidates are required to be authorized to work in the U.S. The employer is not offering relocation sponsorship, and remote work options are not available.

Avride is an equal opportunity employer and is committed to providing reasonable accommodations to qualified applicants and employees with disabilities to ensure they have equal access to employment opportunities. Avride complies with the Americans with Disabilities Act (ADA). If you need a reasonable accommodation to assist with the application or hiring process, or to perform the essential functions of a job, please email [email protected].

HQ

Avride Austin, Texas, USA Office

8605 Cross Park Dr, Austin, TX, United States, 78754

What you need to know about the Austin Tech Scene

Austin has a diverse and thriving tech ecosystem thanks to home-grown companies like Dell and major campuses for IBM, AMD and Apple. The state’s flagship university, the University of Texas at Austin, is known for its engineering school, and the city is known for its annual South by Southwest tech and media conference. Austin’s tech scene spans many verticals, but it’s particularly known for hardware, including semiconductors, as well as AI, biotechnology and cloud computing. Its food and music scene, low taxes and favorable climate have also made the city a destination for tech workers from across the country.

Key Facts About Austin Tech

  • Number of Tech Workers: 180,500; 13.7% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Dell, IBM, AMD, Apple, Alphabet
  • Key Industries: Artificial intelligence, hardware, cloud computing, software, healthtech
  • Funding Landscape: $4.5 billion in VC funding in 2024 (Pitchbook)
  • Notable Investors: Live Oak Ventures, Austin Ventures, Hinge Capital, Gigafund, KdT Ventures, Next Coast Ventures, Silverton Partners
  • Research Centers and Universities: University of Texas, Southwestern University, Texas State University, Center for Complex Quantum Systems, Oden Institute for Computational Engineering and Sciences, Texas Advanced Computing Center
