
Fluidstack

Product Manager, AI Platform

Posted 7 Days Ago
In-Office
3 Locations
180K-250K Annually
Senior level
About Fluidstack

At Fluidstack, we’re building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises - including Mistral, Poolside, Black Forest Labs, Meta, and more - to unlock compute at the speed of light.

We’re working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world-class infrastructure. We treat our customers’ outcomes as our own, taking pride in the systems we build and the trust we earn. If you’re motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.

About the role

We're hiring a Product Manager to own our AI platform roadmap, including managed inference and agent platforms. You'll define how Fluidstack enables customers to deploy, scale, and optimize LLM inference workloads—from model serving and routing to agent orchestration and compound AI systems. This role requires balancing customer needs for low latency and high throughput with the operational realities of GPU utilization, cost efficiency, and platform reliability. You'll work across engineering, ML research, and go-to-market teams to position Fluidstack against inference-first competitors like Together AI, Fireworks, Baseten, Modal, and Replicate.

What you'll do
  • Own the product strategy and roadmap for managed inference services, including model deployment, autoscaling, multi-LoRA serving, and inference optimization

  • Define requirements for agent platform capabilities: structured outputs, function calling, memory primitives, tool integration, and multi-step reasoning workflows

  • Drive decisions on which inference optimizations to prioritize: speculative decoding, continuous batching, KV cache management, quantization support, and custom kernel integration

  • Partner with ML infrastructure engineers to design APIs, SDKs, and deployment workflows that support model fine-tuning, version management, and A/B testing

  • Work with datacenter teams to optimize GPU allocation strategies—balancing dedicated vs. serverless deployments, cold start latency, and cost-per-token economics

  • Analyze competitive offerings from Together AI (inference optimization stack), Fireworks (custom inference engine), Baseten (training-to-inference integration), and Modal (serverless architecture)

  • Define pricing models that align with customer usage patterns (tokens, requests, GPU-hours) while maintaining healthy unit economics

  • Conduct customer research to understand inference workload requirements: latency SLAs, throughput targets, model size constraints, and integration needs

  • Translate customer feedback into feature specifications—including support for new model architectures, framework integrations (vLLM, TensorRT-LLM, TGI), and observability tooling

  • Build go-to-market materials: reference architectures, performance benchmarks, cost calculators, and migration guides for customers moving from self-hosted or competing platforms
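The "cost-per-token economics" referenced above can be illustrated with a back-of-envelope model. This is a minimal sketch; the GPU price, throughput, and utilization figures are illustrative assumptions, not Fluidstack numbers:

```python
def cost_per_million_tokens(gpu_hourly_usd: float,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Estimated cost to serve one million output tokens on a dedicated GPU.

    gpu_hourly_usd:    hourly rental cost of the GPU (assumed)
    tokens_per_second: sustained decode throughput per GPU (assumed)
    utilization:       fraction of each hour spent serving real traffic
    """
    effective_tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / effective_tokens_per_hour * 1_000_000

# Example: a $2.50/hr GPU decoding 1,000 tok/s at 40% utilization
print(round(cost_per_million_tokens(2.50, 1000.0, 0.4), 2))  # → 1.74
```

The model makes the dedicated-vs-serverless tradeoff concrete: dedicated capacity lowers latency but idles at low utilization, which directly inflates cost per token.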

About you
  • 5+ years product management experience with at least 3 years focused on AI/ML infrastructure, inference platforms, or developer tools

  • Strong technical understanding of transformer architectures, inference optimization techniques, and production ML systems

  • Experience building products for technical users deploying LLMs in production (ML engineers, research scientists, AI application developers)

  • Track record of shipping features that improved inference latency, throughput, or cost efficiency—backed by quantitative metrics

  • Deep familiarity with the inference ecosystem: serving frameworks (vLLM, TensorRT-LLM, TGI), model formats (GGUF, SafeTensors), and API standards (OpenAI-compatible endpoints)

  • Understanding of GPU memory constraints, batching strategies, and the tradeoffs between latency-optimized vs. throughput-optimized serving

  • Ability to translate complex technical concepts (speculative decoding, PagedAttention, Multi-LoRA) into clear customer value propositions

  • Experience conducting competitive analysis in the inference market, including pricing elasticity, feature differentiation, and customer acquisition patterns

  • Comfortable working with engineering teams to debug performance bottlenecks, analyze profiling data, and prioritize kernel-level optimizations

  • Bonus: Experience with agent frameworks (LangChain, LlamaIndex, AutoGPT), compound AI patterns, or model fine-tuning workflows
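For context on the "OpenAI-compatible endpoints" standard mentioned above: compatibility means clients send the stock `/v1/chat/completions` request shape and only swap the base URL. A minimal sketch of that payload follows; the endpoint URL and model id are hypothetical placeholders:

```python
import json

# Hypothetical base URL; an OpenAI-compatible platform exposes the same paths.
BASE_URL = "https://inference.example.com/v1"

# Standard chat-completions request body; any OpenAI SDK or plain HTTP client
# can send this unchanged via POST {BASE_URL}/chat/completions.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
    "stream": False,
}

body = json.dumps(payload)
print(f"POST {BASE_URL}/chat/completions ({len(body)} bytes)")
```

Because the request shape is unchanged, migrating from a competing platform is often a one-line base-URL swap, which is why this compatibility matters commercially.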

Compensation

To provide greater transparency to candidates, we share base pay ranges for all US-based job postings. Our compensation package includes base salary, equity, benefits, and for applicable roles, commission plans. Our cash compensation range for this role is $180,000-$250,000. Final offers vary based on geography, candidate experience, relevant credentials, and other factors. Outstanding candidates may be eligible for adjusted terms plus meaningful equity.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

You will receive a confirmation email once your application has successfully been accepted. If there is an error with your submission and you did not receive a confirmation email, please email [email protected] with your resume/CV, the role you've applied for, and the date you submitted your application, and someone from our recruiting team will be in touch.

Top Skills

AI
GPU
Inference Optimization Techniques
ML
Model Serving Frameworks
OpenAI-Compatible Endpoints
TensorRT-LLM
TGI
Transformer Architectures
vLLM


