Userpilot

Software Engineer - Agentic Platform

Posted 12 Days Ago
Hybrid
Austin, TX, USA
Mid level
Own and evolve an agent platform powering multi-turn, grounded AI experiences: runtime, orchestration, hybrid retrieval, tool grounding, evals, observability, cost accounting, and interoperability. Build reliable, scalable agentic workflows and infrastructure, write specs and evals, and raise platform quality and reliability.
The summary above was generated by AI

About Userpilot

Userpilot is a leading product analytics and user engagement platform used by product teams at hundreds of companies to understand, segment, and activate their users. The product spans a performant JavaScript SDK that runs inside customers' web apps, a Chrome Extension for building in-app UI without code, and a React dashboard that handles complex real-time data, all backed by a distributed Elixir/Phoenix backend that sustains hundreds of thousands of concurrent WebSocket connections, high-throughput Kafka event ingestion, and real-time content delivery at scale.

We move fast, we ship often, and we believe the best engineers care as much about the product they're enabling as the systems and interfaces they build.

The Role

This is an AI-deep role focused on Lia, Userpilot's agent platform, the system that turns a rich product-data model into reliable, grounded, multi-turn AI experiences. The AI is the product, not just a tool you use to build it.

You'll own and elevate the agent platform: a Python service built on Microsoft Agent Framework, with hybrid retrieval over multiple tool catalogs, complex multi-step orchestration utilizing skills and sub-agents, multi-turn state and grounding, and full trace-level observability and cost accounting, all built on framework-neutral domain contracts.

This is a platform you own and push further, not just keep running. You'll contribute to architecture, raise the reliability and eval bar, and help define where a frontier agentic system goes. We hire engineers who can follow a problem wherever it leads, who know when deterministic logic or statistics beat an LLM and vice versa, and who care about the customer experience as much as the system underneath.

What You'll Work On

  • Conversational AI experiences grounded in a rich product-data model, with the tool use, retrieval, streaming, and orchestrated multi-turn grounding required to do it reliably, not just plausibly.
  • The agent runtime and orchestration itself: complex, multi-step agentic workflows, behind framework-neutral domain contracts that keep business logic portable.
  • Hybrid retrieval and tool grounding: RAG (vector + lexical) over tool catalogs assembled from multiple sources (OpenAPI specs, MCP, …), so the agent calls the right operation with the right arguments against live customer data.
  • Packaged AI workflows that produce durable, editable, actionable outputs, not just chat answers that get lost in history.
  • The eval, observability, and cost infrastructure that makes all of this safe and economically viable in production: a multi-layer eval harness (deterministic checks plus live, judge-scored reasoning evals), end-to-end tracing, and per-call cost accounting.
  • Agent interoperability: an MCP server that exposes Userpilot's tools to external AI agents.

What You'll Do

  • Design, build, and operate the agent platform end to end, from the API surface through the runtime, tools, retrieval, persistence, and observability.
  • Build LLM/agent features that ground reliably in customer data, with the streaming, retries, evals, and graceful degradation required to hold them to a production reliability bar.
  • Pick the right tool for each signal (retrieval, deterministic logic, structured outputs, statistics, or an LLM), and combine them well.
  • Treat evals, cost-per-call, and latency as first-class. AI features that run continuously at scale have unit economics; the economics matter as much as the output.
  • Work in a spec-driven, agent-assisted flow, reading and contributing to PRDs that drive both human and AI implementation.
  • Contribute to the team's agentic infrastructure (AGENTS.md, CLAUDE.md, DESIGN.md, slash commands, architectural rules) so AI tooling understands our codebase as well as the humans do.
  • Review code for architectural consistency and reliability, including making sure agent-generated code respects the same boundaries and framework-neutral contracts that human-written code does.
  • Raise the bar around you: set the patterns, write the specs and evals others build on, and level up the engineers (and agents) working in the platform.

What We're Looking For

Required

  • 3+ years building and shipping production software, with a track record of owning systems (not just features) and raising the quality bar for the people around you.
  • Strong Python and CS fundamentals, including solid work with databases, queues, or real-time systems. The agent platform runs on Python (FastAPI, Pydantic, async), so you're fluent here or will be very quickly.
  • Production agentic / LLM systems, not just calling an API: tool use, retrieval grounding, structured outputs, multi-turn state and continuity, streaming, evals, and designing for non-deterministic behavior. Having owned an agent runtime or orchestration layer end to end is a strong signal.
  • Architectural judgment for AI systems: you keep domain logic decoupled from a fast-moving vendor framework, make build-vs-adopt calls deliberately, and know why that matters when the framework landscape shifts every quarter.
  • Judgment about when to use an LLM and when not to: you reach for deterministic logic, retrieval, or statistics when they're more reliable, cheaper, or more reproducible, and you can tell which is which.
  • AI-native workflow: you use AI coding agents (Claude Code, Cursor) as a real part of how you build, prompting for scaffolding, reviewing output critically, and knowing when to push back.
  • Strong product sense and judgment. You care about the user experience and about system correctness in equal measure.
  • Self-management and a continuous-improvement mindset. We don't over-prescribe how the work gets done.

Bonus Points

  • Experience with agent frameworks or orchestration: Microsoft Agent Framework, LangGraph, AutoGen, or a runtime you built yourself.
  • RAG and tool-use platforms (retrieval over tools and APIs, OpenAPI-driven tool generation, MCP).
  • LLM evals and observability: designing them, running them, and acting on the signal, with tracing and cost tooling like Langfuse or OpenTelemetry GenAI.
  • Cost engineering on LLM workloads (caching, batching, model routing, prompt compaction).
  • Embedding-based retrieval or clustering (vector DBs, hybrid search, HDBSCAN, UMAP, and similar).
  • Multi-tenant SaaS architecture: data isolation, per-tenant state, noisy-neighbor concerns.
  • Full-stack / core-services depth: production React/TypeScript, and/or our core stack (Elixir/Phoenix with OTP, ClickHouse, Kafka). You won't live here day to day, but it helps where the agent platform meets the rest of the product.
  • Time-series anomaly detection or drift monitoring; recommendation or ranking systems with user-feedback loops.
  • Spec-driven development, writing or working from specs that drive both human and AI implementation.
  • Contributing to developer experience or agentic infrastructure.
  • Technical leadership on an engineering team.
  • Open source contributions.

How We Build

AI is at the center of what we ship and how we ship it. A few things we believe about how the work gets done:

  • Statistics, heuristics, and LLMs each have a role. The mistake we don't want to make is asking an LLM to do anomaly detection or risk scoring directly: wrong economics, wrong reliability, wrong reproducibility. Use the LLM where it's strongest; use statistics where they're strongest; use heuristics where they're cheapest.
  • Features start with a written spec (a PRD that captures intent and constraints), not a two-line ticket, whether the implementer is a human or an agent.
  • Coding agents do the scaffolding; engineers own the architecture, the review, and the judgment calls.
  • Evals are how we ship safely. Every LLM-shaped feature gets an eval suite before it goes to production, and we look at the suite, not just whether it ran.
  • LLM calls cost money; they are not free. Caching, batching, model routing, and prompt compaction are first-class engineering concerns.
  • Feedback loops are how AI features get smarter. Instrument everything.
  • Our patterns are encoded explicitly. Every umbrella app and product domain has an AGENTS.md capturing what it does, the patterns it uses, and the mistakes to avoid, so an agent working on core doesn't violate a cache invariant or write directly to ClickHouse, and an agent on the dashboard doesn't break a design contract.
  • DX is a product: if a new engineer (or an AI agent) can't understand a domain from its documentation and rules, that's a bug we fix.

You don't need to have done all of this at your last job. But you should be genuinely curious about it, comfortable owning a system end to end, and excited to help shape how AI products get built here.

EEO Statement

Userpilot is an equal opportunity employer. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other characteristic protected by applicable law. All qualified applicants will receive consideration for employment.

Visa/Work Authorization

Applicants must be legally authorized to work in the United States. We are not able to sponsor or take over sponsorship of an employment visa at this time.

HQ

Userpilot Austin, Texas, USA Office

Austin, TX, United States
