Prime Intellect

Applied Research - Evals & Data

Reposted 12 Days Ago
Remote
Hiring Remotely in USA
Mid level
This role involves designing AI agents, building robust infrastructure, and translating customer insights into technical requirements while working with reinforcement learning and applied data.
The summary above was generated by AI

Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

We recently raised $15 million in funding ($20 million raised in total), led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI), and many others.


Role Impact

This is a customer-facing role at the intersection of cutting-edge RL/post-training methods, applied data, and agent systems. You’ll directly shape how advanced models are aligned, evaluated, deployed, and used in the real world by:

  • Advancing Agent Capabilities: Designing and iterating on next-generation AI agents that tackle real workloads—workflow automation, reasoning-intensive tasks, and decision-making at scale. Working with applied data from real deployments to continuously refine policies, improve reasoning, and enhance reliability and safety.

  • Building Robust Infrastructure: Developing the distributed systems, evaluation pipelines, and coordination frameworks that enable these agents to operate reliably, efficiently, and at massive scale. Building data capture, processing, and versioning workflows for feedback, model traces, and reward signals.

  • Bridging Customers & Research: Translating customer needs and insights from applied data into clear technical requirements that guide product and research priorities. Collaborating closely with RL and eval teams to ensure real-world signals inform model alignment and reward shaping.

  • Prototyping in the Field: Rapidly designing and deploying agents, evals, and harnesses alongside customers to validate solutions. Using applied evaluation data to iterate on model performance and discover new capabilities.


Customer-Facing Engineering

  • Work side-by-side with customers to deeply understand workflows, data sources, and bottlenecks.

  • Prototype agents, data pipelines, and eval harnesses tailored to real use cases, then hand off hardened systems to core teams.

  • Translate customer insights and evaluation results into roadmap and research direction.


Post-training & Reinforcement Learning

  • Design and implement novel RL and post-training methods (RLHF, RLVR, GRPO, etc.) to align large models with domain-specific tasks.

  • Build evaluation harnesses and verifiers to measure reasoning, robustness, and agentic behavior in real-world workflows.

  • Integrate applied data collection and analytics into the post-training process to surface regressions, emergent skills, and alignment opportunities.

  • Prototype multi-agent and memory-augmented systems to expand capabilities for customer-facing solutions.


Agent Development & Infrastructure

  • Rapidly prototype and iterate on AI agents for automation, workflow orchestration, and decision-making.

  • Extend and integrate with agent frameworks to support evolving feature requests and performance requirements.

  • Architect and maintain distributed training and inference pipelines, ensuring scalability and cost efficiency.

  • Develop observability and monitoring (Prometheus, Grafana, tracing) to ensure reliability and performance in production deployments.


Requirements

  • Strong background in machine learning engineering, with experience in post-training, RL, or large-scale model alignment.

  • Experience with applied data workflows and evaluation frameworks for large models or agents (e.g., SWE-Bench, HELM, EvalFlow, internal eval pipelines).

  • Deep expertise in distributed training/inference frameworks (e.g., vLLM, sglang, Ray, Accelerate).

  • Experience deploying containerized systems at scale (Docker, Kubernetes, Terraform).

  • Track record of research contributions (publications, open-source contributions, benchmarks) in ML/RL.

  • Passion for advancing the state-of-the-art in reasoning, measurement, and building practical, agentic AI systems.


What We Offer

  • Competitive Compensation + equity incentives

  • Flexible Work (remote or San Francisco)

  • Visa Sponsorship & relocation support

  • Professional Development budget

  • Team Off-sites & conference attendance


Growth Opportunity

You’ll join a mission-driven team working at the frontier of open superintelligence infrastructure. In this role, you’ll have the opportunity to:

  • Shape the evolution of agent-driven, data-informed solutions—from research breakthroughs to production systems used by real customers.

  • Collaborate with leading researchers, engineers, and partners pushing the boundaries of RL, evaluation, and post-training.

  • Grow with a fast-moving organization where your contributions directly influence both the technical direction and the broader AI ecosystem.

If you’re excited to move fast, build boldly, and help define how agentic AI is developed and deployed, we’d love to hear from you.

Ready to build the open superintelligence infrastructure of tomorrow?
Apply now to help us make powerful, open AGI accessible to everyone.

Top Skills

Accelerate
Docker
EvalFlow
Grafana
HELM
Kubernetes
Machine Learning
Post-Training Methods
Prometheus
Ray
Reinforcement Learning
sglang
SWE-Bench
Terraform
vLLM
