Protege

Senior Machine Learning Researcher

Posted 4 Days Ago
Remote
Hiring Remotely in USA
Senior level
Lead evaluation and optimization of large-scale datasets for AI models. Design statistical methods, collaborate on data strategies, and enhance data quality.
The summary above was generated by AI

Company Overview:

We are building Protege to solve the biggest unmet need in AI — getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview:

Data is the foundation of AI performance, and we believe model quality starts with data quality. You’ll be at the heart of shaping how we curate, assess, and prepare the training data that powers real-world AI systems.

We’re seeking a Senior Member of the Core Data Team / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models. In this role, you’ll help define what "high-quality data" means in practice, using statistical, computational, and ML-driven methods to ensure our data is diverse, representative, and high-impact. You’ll work closely with research and engineering teams to improve model performance through better data. This is an ideal role for someone with a PhD in machine learning, CS, or a related applied field who is passionate about the role of data in AI training and excited to advance Protege’s mission to become the ubiquitous platform for AI training data.

Key Responsibilities:

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets

  • Develop frameworks to assess data diversity, duplication, and informativeness. Design statistical approaches to de-risk training datasets

  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance, with an emphasis on working with both large foundation model developers and smaller startups

  • Provide leadership on data quality strategy and shape internal best practices

  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance. Help build data scorecards

  • Contribute to research and development of tools that automate data preprocessing and validation

About You:

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field

  • Strong understanding of AI model training pipelines, including preprocessing and evaluation

  • Experience working with large, unstructured datasets, especially text

  • Background in statistical analysis, bias detection, and data validation

  • Able to identify high-impact problems and drive independent solutions

Bonus if you have these attributes:

  • Experience with synthetic data generation or augmentation strategies

  • Publications or open-source contributions in data-centric AI or related areas

  • Experience developing evaluation frameworks or performance metrics for training data

  • Cross-functional collaboration with product, infrastructure, or partnership teams

Top Skills

AI Model Training Pipelines
Data Preprocessing
Machine Learning
Python
Statistical Analysis

Similar Jobs

21 Days Ago
Remote
United States
175K-230K Annually
Senior level
Artificial Intelligence • Healthtech • Machine Learning • Natural Language Processing • Software • Generative AI
Lead research on large language models to address healthcare challenges, collaborating with teams to enhance model performance and ensure real-world applicability.
Top Skills: AWS, GCP, Hugging Face Transformers, Kubeflow, Kubernetes, Python, PyTorch, PyTorch Lightning, TensorFlow

41 Minutes Ago
Easy Apply
Remote or Hybrid
USA
116K-165K Annually
Mid level
Cloud • Information Technology • Security • Software • Cybersecurity
As a Senior Web Content Writer, you'll create and optimize high-quality web content to drive organic growth, collaborating with teams on SEO best practices and technical accuracy.
Top Skills: SEO, AEO, AI, Drupal, HTML

49 Minutes Ago
Easy Apply
Remote
United States
189K-236K Annually
Senior level
Artificial Intelligence • Fintech • Hardware • Information Technology • Sales • Software • Transportation
The Principal Data Platform Engineer will lead the development of scalable data management systems, focusing on ingestion, processing, and analytics platforms while collaborating with cross-functional teams.
Top Skills: Airflow, Apache Kafka, Spark, AWS, Docker, Kubernetes, Python

What you need to know about the Austin Tech Scene

Austin has a diverse and thriving tech ecosystem thanks to home-grown companies like Dell and major campuses for IBM, AMD and Apple. The state’s flagship university, the University of Texas at Austin, is known for its engineering school, and the city is known for its annual South by Southwest tech and media conference. Austin’s tech scene spans many verticals, but it’s particularly known for hardware, including semiconductors, as well as AI, biotechnology and cloud computing. Its food and music scene, low taxes and favorable climate have also made the city a destination for tech workers from across the country.

Key Facts About Austin Tech

  • Number of Tech Workers: 180,500; 13.7% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Dell, IBM, AMD, Apple, Alphabet
  • Key Industries: Artificial intelligence, hardware, cloud computing, software, healthtech
  • Funding Landscape: $4.5 billion in VC funding in 2024 (PitchBook)
  • Notable Investors: Live Oak Ventures, Austin Ventures, Hinge Capital, Gigafund, KdT Ventures, Next Coast Ventures, Silverton Partners
  • Research Centers and Universities: University of Texas, Southwestern University, Texas State University, Center for Complex Quantum Systems, Oden Institute for Computational Engineering and Sciences, Texas Advanced Computing Center
