Epoch AI

Researcher, Benchmark Reviews

Posted Yesterday
Remote
Hiring Remotely in USA
100K-200K Annually
Mid level

Epoch AI is looking for a Researcher to develop and publish critiques and reviews of AI benchmarks. 

About the role

We are looking for a Researcher to produce a steady stream of benchmark reviews. You will closely analyze a wide variety of new benchmarks, evaluate their methodologies, and write up your findings in public-facing research. You should be comfortable using coding agents to help you, without delegating your judgment.

Examples of the kind of reports you would produce include our reviews of SWE-bench Verified, OSWorld, and economic value benchmarks.

This role is fully remote; we are able to hire in many countries. We invite anyone who is interested to apply, regardless of background, experience, or credentials. Please do not include a cover letter, photograph, or headshot of yourself, or any personal information that is not relevant to the role for which you're applying (including marital status, age, identity traits, etc.). 

If this role sounds interesting, we are also looking for researchers on multiple other teams.  

Applications are rolling.

Key Responsibilities

  • Review benchmarks: Assess a new benchmark at least every two weeks. You’ll evaluate each benchmark’s methodology and what a “good performance” would imply about AI capabilities.
  • Publish and maintain reports: Write up public-facing reports on new benchmarks and periodically update them as new versions release and models make progress.
  • Dig into data: Examine individual tasks within benchmarks in detail, using coding agents to assist while retaining your own critical oversight.

What we're looking for

  • You excel at written communication. Your first drafts rarely need much polishing; they’re clear and publication-quality with minimal revision.
  • You think critically about research methodologies. You have strong opinions about what benchmarks and evaluations should look like, and can recognize when something seems wrong. 
  • You’re already familiar with existing AI benchmarks, their methodologies, strengths, and weaknesses. You have ideas for new benchmarks that don’t exist yet, but should. 
  • You’re comfortable working with dense data. You’ll often be breaking down benchmarks into their component parts, which can be overwhelming if you are not used to this kind of data.

Nice to have

  • Experience writing about AI/ML for a public or research audience.
  • Hands-on experience running or building AI evaluations.
  • If you don't tick all these boxes but think you would be a great fit, please consider applying anyway!

Compensation & Benefits

  • Annual salary between $100,000 and $200,000 USD, depending on location, seniority, and experience.
  • Salaries are not restricted to USD, and contracts and payments are usually in local currencies. Conversions are based on one-year average exchange rates.
  • Fully remote environment, including flexible work hours.
  • Competitive global benefits program, including comprehensive health insurance (with supplemental benefits specific to your country, as available and mandated by local law), plus life insurance and a pension plan, if applicable in your country.
  • Generous paid time off (PTO) with no specific annual limit and 30 days per year protected, unlimited personal and sick leave, and 4 months of paid parental leave for permanent staff with at least 12 months of tenure (prorated parental leave if less than 12 months).
  • A flexible and generous expense policy for you to spend on equipment and a large range of productivity tools or learning/development opportunities, including unlimited spending on AI tools, subject to regulations and manager approval.
  • Paid work trips, including 3 staff retreats per year and relevant conferences.
  • Access to our very well-equipped offices in Berkeley, California, including paid meals, snacks, gym, and more. All staff, regardless of where they are based, have access to the office for at least 20 days each year.

Additional Information

While we welcome applicants from all time zones, we prefer candidates who can overlap with US and UK time zones.

Please submit all of your application materials in English. Note that we require professional-level English proficiency.

Epoch is committed to building an inclusive, equitable, and supportive community for you to thrive and do your best work. We're committed to finding the best people for our team, so please don't hesitate to apply for a role regardless of your age, gender identity/expression, political identity, personal preferences, physical abilities, veteran status, neurodiversity or any other background. Please email [email protected] if you have any questions about this role, accessibility requests, or if you want to request an extension to the application deadline. However, we will not review applications submitted to this email address; please submit your application through the link on this page.

About Epoch AI

Epoch AI is a research institute that investigates trends in machine learning and the economic consequences of AI. Our mission is to develop a comprehensive, publicly accessible knowledge base on AI that informs policymakers, industry leaders, and society at large.

We strive for both rigor and accessibility in our work, as exemplified by some of our most successful projects, including our database of AI models and our AI trends dashboard. Our body of research includes our work on compute trends (IJCNN 2022), data scarcity (ICML 2024), and algorithmic progress (NeurIPS 2024). You can read more about our work and mission on our website and in this Time profile.
