The College Board

AI/ML Data Engineer

Posted 25 Days Ago
Remote
Hiring Remotely in USA
137K-148K Annually
Mid level
As an AI/ML Data Engineer, you will design and build data and ML infrastructure to create personalized student experiences, working on ETL processes, feature stores, and collaborating with product and data science teams.
The summary above was generated by AI

AI/ML Data Engineer 

College Board - Technology  

Location: This is a fully remote role that requires working EST hours. Candidates who live near CB offices have the option of being fully remote or hybrid (Tuesday and Wednesday in office).  

Type: This is a full-time position 

About the Team 

Aquifer is a small, highly collaborative team that implements data and analytics services powering higher‑education recruitment and student engagement for College Board’s BigFuture Division. We experiment thoughtfully and ship durable, secure data products that personalize outreach and help partners execute strategic enrollment plans. 

Our team is a mix of engineers and architects who blend expertise in data engineering, analytics, and product strategy to deliver scalable solutions that transform how students connect with colleges. We value curiosity, reliability, and clear communication, and we work closely across disciplines to ensure every product is impactful, maintainable, and user-focused. 

About the Opportunity 

As an AI/ML Data Engineer, you’ll design, build, and operate the data and ML plumbing that powers personalized student experiences at scale. You’ll create batch and streaming pipelines, ML‑ready datasets, feature/embedding stores, and the services that move models into production safely and compliantly. You’ll collaborate with Product, Data Science, and Analytics to turn raw events into reliable, privacy‑preserving features that drive real impact for students and higher‑ed partners. 

In this role, you will: 

ML Data Platform & Pipelines (40%) 

  • Design, build, and own batch and streaming ETL (e.g., Kinesis/Kafka → Spark/Glue → Step Functions/Airflow) for training, evaluation, and inference use cases. 

  • Stand up and maintain offline/online feature stores and embedding pipelines (e.g., S3/Parquet/Iceberg + vector index) with reproducible backfills. 

  • Implement data contracts & validation (e.g., Great Expectations/Deequ), schema evolution, and metadata/lineage capture (e.g., OpenLineage/DataHub/Amundsen). 

  • Optimize lakehouse/warehouse layouts and partitioning (e.g., Redshift/Athena/Iceberg) for scalable ML and analytics. 

Model Enablement & LLM DataOps (30%) 

  • Productionize training and evaluation datasets with versioning (e.g., DVC/LakeFS) and experiment tracking (e.g., MLflow). 

  • Build RAG foundations: document ingestion, chunking, embeddings, retrieval indexing, and quality evaluation (precision@k, faithfulness, latency, and cost). 

  • Collaborate with DS to ship models to serving (e.g., SageMaker/EKS/ECS), automate feature backfills, and capture inference data for continuous improvement. 

Reliability, Security & Compliance (15%) 

  • Define SLOs and instrument observability across data and model services (freshness, drift/skew, lineage, cost, and performance). 

  • Embed security & privacy by design (PII minimization/redaction, secrets management, access controls), aligning with College Board standards and FERPA. 

  • Build CI/CD for data and models with automated testing, quality gates, and safe rollouts (shadow/canary). 

Documentation & Enablement (15%) 

  • Maintain docs‑as‑code for pipelines, contracts, and runbooks; create internal guides and tech talks. 

  • Mentor peers through design reviews, pair/mob sessions, and post‑incident learning. 

About You 

You have: 

  • 4+ years in data engineering (or 3+ with substantial ML productionization), with strong Python and distributed compute (Spark/Glue/Dask) skills. 

  • Proven experience shipping ML data systems (training/eval datasets, feature or embedding pipelines, artifact/version management, experiment tracking). 

  • MLOps/LLMOps: orchestration (Airflow/Step Functions), containerization (Docker), and deployment (SageMaker/EKS/ECS); CI/CD for data & models. 

  • Expert SQL and data modeling for lakehouse/warehouse (Redshift/Athena/Iceberg), with performance tuning for large datasets. 

  • Data quality & contracts (Great Expectations/Deequ), lineage/metadata (OpenLineage/DataHub/Amundsen), and drift/skew monitoring. 

  • Cloud experience, preferably with AWS services such as S3, Glue, Lambda, Athena, Bedrock, OpenSearch, API Gateway, DynamoDB, SageMaker, Step Functions, Redshift, and Kinesis, plus BI tools like Tableau, QuickSight, or Looker for real-time analytics and dashboards. 

  • Security and privacy mindset; ability to design compliant pipelines handling sensitive student data. 

  • An ability to judiciously evaluate the feasibility, fairness, and effectiveness of AI solutions and to articulate considerations and concerns around implementing models in the context of specific business applications. 

  • Excellent communication, collaboration, and documentation habits. 

Preferred 

  • RAG & vector search experience (OpenSearch KNN/pgvector/FAISS) and prompt/eval frameworks. 

  • Real‑time feature engineering (Kinesis/Kafka) and low‑latency stores for online inference. 

  • Testing strategies for ML systems (unit/contract tests, data fuzzing, offline/online parity checks). 

  • Experience in higher‑ed/assessments data domains. 

All roles at College Board require:  

  • A passion for expanding educational and career opportunities and mission-driven work 

  • Authorization to work in the United States for any employer 

  • Curiosity and enthusiasm for emerging technologies, with a willingness to experiment with and adopt new AI-driven solutions and comfort learning and applying new digital tools independently and proactively. 

  • Clear and concise communication skills, written and verbal 

  • A learner's mindset and a commitment to growth: welcoming diverse perspectives, giving and receiving timely, respectful feedback, and continuously improving through iterative learning and user input. 

  • A drive for impact and excellence: solving complex problems, making data-informed decisions, prioritizing what matters most, and continuously improving through learning, user input, and external benchmarking. 

  • A collaborative and empathetic approach: working across differences, fostering trust, and contributing to a culture of shared success. 

 

About Our Process   

  • Application review will begin immediately and will continue until the position is filled. This role is expected to accept applications for a minimum of 5 business days. 

  • While the hiring process may vary, it generally includes: resume and application submission, a recruiter phone/video screen, a hiring manager interview, a performance exercise such as live coding, a panel interview, a conversation with leadership, and reference checks. 

What We Offer 

At College Board, we offer more than just a paycheck—we provide a meaningful career, a supportive team, and a comprehensive package designed to help you thrive. We’re a self-sustaining nonprofit that believes in fair and competitive compensation, grounded in your qualifications, experience, impact, and the market. 

A Thoughtful Approach to Compensation 

  • The hiring range for this role is $137K–$148K. 

  • Your exact salary will depend on your location, experience, and how your background compares to others in similar roles at the College Board. 

  • We aim to make our best offer upfront—rooted in fairness, transparency, and market data. 

  • We adjust salaries by location to ensure fairness, no matter where you live. 

You’ll have open, transparent conversations about compensation, benefits, and what it’s like to work at College Board throughout your hiring process. Check out our careers page for more. 

Top Skills

Airflow
Athena
Dask
Docker
ECS
EKS
Glue
Iceberg
Kinesis
Looker
Python
QuickSight
Redshift
S3
SageMaker
Spark
SQL
Step Functions
Tableau
