Arganteal Corporation

Principal Architect - AI

Posted 6 Days Ago

In-Office

Austin, TX

Expert/Leader

The Principal Architect leads AI-focused engagements, designing and implementing GPU-accelerated HPC systems while optimizing and troubleshooting high-performance infrastructures. They serve as a technical authority and ensure successful integrations across Compute, Networking, and Storage.

The summary above was generated by AI

Overview
The Principal Architect leads HPC and AI-focused Professional Services delivery engagements and cross-functional technical teams on customer programs or projects. They are responsible for technical communications with engineers, architects, and the customer on AI-driven projects. The Principal Architect may participate in several customer projects concurrently, integrating AI solutions with enterprise IT systems.

Role Summary
The Principal Architect will be at the epicenter of the AI revolution, working with the most advanced hardware on the planet. Whether you're helping a research facility unlock new scientific breakthroughs or helping an enterprise build its first private AI cloud, your fingerprints will be on the infrastructure that defines the next decade of technology.
The right person for the job is a senior individual contributor responsible for designing, implementing, and optimizing large-scale High-Performance Computing and AI platforms centered on the NVIDIA data center ecosystem. This role operates in a hybrid capacity, combining hands-on technical architecture with selective customer-facing advisory responsibilities.
The architect serves as a technical authority across GPU-accelerated compute, high-performance networking, and modern parallel storage platforms, influencing architectural standards and delivery outcomes while ensuring successful, on-time, and on-budget customer deployments without escalations.
This is a remote, work-from-home position with an average travel expectation of approximately 10% and a willingness to travel more during peak project phases or critical customer engagements.

Key Responsibilities
Architecture and Design

Lead the end-to-end architecture of GPU-accelerated HPC and AI platforms, including greenfield AI factory designs and optimization of existing HPC environments.

Architect integrated solutions spanning Compute, Networking, and Storage using NVIDIA HGX and DGX platforms, Grace CPU architectures, Spectrum-X networking, and high-performance parallel storage systems.

Design storage architectures optimized for AI training, inference, and HPC workloads, balancing performance, scalability, resiliency, and cost.

Define reference architectures, design patterns, and best practices for repeatable and supportable customer deployments.

Platform Implementation and Optimization

Provide hands-on technical leadership during implementation phases, including cluster bring-up, performance tuning, and workload optimization.

Architect and integrate workload orchestration and scheduling platforms using NVIDIA Base Command Manager, Slurm, Kubernetes and Run:AI.

Optimize end-to-end data pipelines, including GPU utilization, storage throughput, metadata performance, and job scheduling efficiency.

Troubleshoot performance bottlenecks across Compute, Networking, and Storage.

Storage Architecture & Data Performance

Design and validate high-performance storage solutions using modern parallel and scale-out storage platforms.

Demonstrate hands-on experience with at least one of the following storage technologies:

VAST Data

WEKA

Lustre

NetApp

Architect storage solutions that support demanding AI and HPC workloads, including high-throughput training pipelines, checkpointing, and large-scale shared datasets.

Collaborate with compute and networking design to ensure balanced, bottleneck-free architectures.

Technical Authority and Advisory

Act as a senior technical authority for HPC and AI architecture across internal teams and customer engagements.

Participate selectively in customer-facing discussions to validate architecture and delivery plans, with a primary focus on design integrity and execution rather than pre-sales.

Influence platform standards, architectural direction, and technical decision-making through expertise and demonstrated execution.

Delivery Excellence

Identify technical risks early across Compute, Networking, Storage, and orchestration layers, and drive mitigation strategies.

Partner with the PMO counterpart to resolve Risks and Issues upon identification and to ensure production-ready, supportable platforms.

Ensure staff, contractors, and partners adhere to best practices and templates for AI solution delivery.

Review deployment documents, technical assessments, and other outputs to ensure consistency and accuracy, aligning with AI and "One Voice" standards.

Required Technical Expertise

Core Mastery Areas

Expert-level, deep architectural knowledge of NVIDIA data center platforms, including HGX and DGX.

GPU-accelerated compute architecture for AI and HPC workloads.

High-performance networking architectures, especially with Spectrum-X.

Large-scale AI factory and HPC platform design.

Storage Expertise

Hands-on architectural experience with high-performance parallel or scale-out storage systems.

Deep understanding of storage performance characteristics relevant to AI and HPC workloads, including bandwidth, IOPS, latency, and metadata scaling.

Proven experience integrating storage platforms such as VAST Data, NetApp, WEKA, DDN, or Lustre into GPU-accelerated environments.

Working Proficiency

NVIDIA Base Command Manager (BCM) for cluster lifecycle management and operations.

Slurm for HPC workload scheduling and resource management.

Run:AI for GPU orchestration and multi-tenant AI workload optimization.

Kubernetes administration including deploying and managing GPU-accelerated AI and HPC workloads.

Linux systems administration in large-scale, performance-sensitive environments.

Containerized AI workflows and their interaction with schedulers and storage systems.

Additional Experience

Experience optimizing existing HPC or AI platforms for performance, utilization, and cost efficiency.

Prior experience with multi-site, air-gapped, or regulated environments is beneficial but not required.

Experience with liquid cooling, power/cooling design, and data center integration is strongly preferred.

Leadership & Influence

Senior individual contributor role with influence through technical authority rather than people management.

Ability to mentor engineers and architects through design reviews, architectural guidance, and technical leadership.

Comfortable operating autonomously in complex, high-impact technical environments.

Documentation & Repeatability Expectations

Develop and maintain high-quality architectural documentation, including design blueprints, configuration guides, deployment validation reports, and operational runbooks.

Ensure all technical artifacts meet One Voice standards for clarity, completeness, and technical accuracy, enabling consistent delivery across teams.

Create reusable templates, reference architectures, and standardized design patterns that accelerate future projects and improve delivery quality.

Drive a culture of documentation discipline, ensuring that every deployment is reproducible, supportable, and aligned with architectural intent.

Educational/Experience Requirements

Bachelor’s degree in a technical field or equivalent hands-on experience architecting large-scale HPC or AI systems.

Advanced degree (MS/PhD) in relevant fields is a plus but not required.

Experience: 10+ years in HPC, Data Center Architecture, and/or Systems Engineering.

Bare Metal Focus: A fundamental preference for, and understanding of, on-premises hardware constraints (power, cooling, cabling).

Proven experience as a Senior or Lead Architect, or equivalent experience, on AI projects.

Top Skills

DDN

Kubernetes

Linux

Lustre

NetApp

NVIDIA DGX

NVIDIA HGX

Run:AI

Slurm

Spectrum-X

VAST Data

WEKA

Austin, TX, United States, 78759
