Epoch AI

Software Engineer, Benchmarking

Posted Yesterday
Remote
Hiring Remotely in USA
125K-200K Annually
Mid level
Run, maintain, and expand an AI benchmarking infrastructure: implement and integrate benchmarks (primarily using the Inspect library), develop new benchmarks and prototypes, collaborate with researchers and engineers, and ensure evaluation outputs are accurate and integrated into research products.
The summary above was generated by AI
Epoch AI is looking for a Software Engineer who will help us evaluate frontier AI models, enabling researchers, developers, and policymakers to better understand AI development. The role will involve running and maintaining our benchmarking infrastructure as well as contributing to the development of brand new benchmarks.
About the role

Please do not include a cover letter, photograph, or headshot of yourself, or any personal information that is not relevant to the role for which you're applying (including marital status, age, identity traits, etc.).

We are looking for a Software Engineer to help us expand and develop our AI Benchmarking Hub. You will work closely with the rest of the benchmarking team to run and maintain benchmarks, integrate with AI providers, set up existing benchmarks to run on our infrastructure, help design and develop brand new benchmarks, and facilitate internal experiments.

This role is fully remote, and we are able to hire in many countries. We invite anyone who is interested to apply, regardless of background, experience, or credentials.

Applications are rolling. 

Key Responsibilities

  • Implement benchmarks: Implement AI benchmarks within our evaluation infrastructure (primarily using the Inspect library) to expand the suite of capabilities we track. Develop our existing suite of benchmarks so we can quickly and painlessly evaluate new model releases.
  • Develop new benchmarks: Contribute to the development of brand new benchmarks. You will have the opportunity to pitch and prototype your own ideas in addition to helping out with existing projects.
  • Collaborate: Work closely with researchers, analysts, and other engineers at Epoch AI to ensure evaluation data and outputs are accurate, insightful, and effectively integrated into our research products and publications.

What we are looking for

  • Solid engineering skills: A strong software engineering background with more than two years of professional experience building and maintaining complex systems. You are expected to regularly contribute high-quality, robust, and maintainable code and be comfortable diving deep into existing codebases and infrastructure.
  • Ideas and creativity: Candidates should be able to generate their own ideas for new benchmarks, experiments, novel things to try, and other projects.
  • Mission-driven: You’re motivated by Epoch AI’s mission to provide rigorous, independent insight into key trends in AI. You want to deliver public, trustworthy evaluations of AI capabilities on challenging benchmarks, empowering researchers, policymakers, and the wider public to make well-informed decisions about AI.
  • AI domain expertise or cybersecurity experience are strong pluses but not required. (This includes hands-on experience running LLM evaluations, familiarity with evaluation frameworks like Inspect, as well as a solid grasp of current AI trends.) Solid engineering skills and an ability to learn quickly matter more than direct background in these areas.

Compensation & Benefits

  • Annual salary between $125,000 and $200,000 USD. 
  • Salaries are not restricted to USD, and contracts and payments are usually in local currencies. Conversions are based on one-year average exchange rates.
  • Fully remote environment, including flexible work hours and schedules for most roles.
  • Competitive global benefits program, including comprehensive health insurance (with supplemental benefits specific to your country, as available and mandated by local law), plus life insurance and a pension plan, if applicable in your country.
  • Generous paid time off (PTO), including no specific limit on PTO with 30 days per year protected, unlimited personal and sick leave, and up to 6 months (combination of paid + unpaid) parental leave for permanent staff. 
  • A flexible and generous expense policy for you to spend on equipment and a large range of productivity tools or learning/development opportunities you might find valuable, subject to regulations and manager approval.
  • Paid work trips, including 3 staff retreats per year and relevant conferences.
  • Access to our very well-equipped offices in Berkeley, California, including paid meals, snacks, gym, and more. All staff, independently of where they are based, have access to the office for at least 20 days each year.

Additional Information

While we welcome applicants from all time zones, we prefer candidates who can overlap with UTC–8 (Pacific Time) and UTC (Greenwich Mean Time), as most of our staff work in this range of time zones. We also prefer candidates who can travel: we hold three retreats per year, during which we record podcast episodes and work on other communication efforts.

Please submit all of your application materials in English and note that we require professional level English proficiency.

Epoch is committed to building an inclusive, equitable, and supportive community for you to thrive and do your best work. We’re committed to finding the best people for our team, so please don’t hesitate to apply for a role regardless of your age, gender identity/expression, political identity, personal preferences, physical abilities, veteran status, neurodiversity or any other background. Please email [email protected] if you have any questions about this role, accessibility requests, or if you want to request an extension to the application deadline. However, we will not review applications submitted to this email address; please submit your application through the link on this page. 

About Epoch AI

Epoch AI is a research institute that investigates trends in machine learning and the economic consequences of AI. Our mission is to develop a comprehensive, publicly accessible knowledge base on AI that informs policymakers, industry leaders, and society at large.
 
We strive for both rigor and accessibility in our work, as exemplified by some of our most successful projects, including our database of AI models and our AI trends dashboard. Our body of research includes our work on compute trends (IJCNN 2022), data scarcity (ICML 2024), and algorithmic progress (NeurIPS 2024). You can read more about our work and mission on our website and in this Time profile.


