Senior Software Engineer, Data Processing

Protege

Posted 12 Days Ago

Remote

Hiring Remotely in USA

Senior level

The Senior Software Engineer will design and operate data ingestion and processing systems for multimodal data, ensuring quality, security, and compliance while optimizing for performance and reliability. Responsibilities include building pipelines, handling messy data, and partnering with teams to enhance the platform's capabilities.

The summary above was generated by AI

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

About the Role

Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion — the part of the platform that takes large-scale source data and turns it into clean, structured, enriched, validated, AI-ready datasets. This is a hands-on, backend- and data-heavy role with end-to-end ownership of the pipelines that move and process data at volume.

Protege connects organizations that hold high-value data with the AI builders who need it. The value of that exchange depends on what happens at ingestion: raw, varied, high-volume source data has to be processed reliably, securely, and at scale before it's useful to anyone.

You'll work across imaging, audio, video, and other data modalities, spanning healthcare, media, and other industries and data partners. You'll partner closely with product, Data Lab, and partner engineering teams to build robust ingestion and processing systems for structured and unstructured data at massive scale, from millions to billions of records, files, and other source objects. This role is ideal for engineers who are energized by messy data at scale, want deep ownership of critical infrastructure, and like turning ambiguity into reliable systems.

What You'll Do

Ingestion & Processing Systems

Design, build, and operate the ingestion systems that process large volumes of multimodal data into usable, well-structured datasets
Own the ingestion path end to end, from how data lands to how it is validated, processed, tracked, and made available downstream
Build modality-specific processing steps for real-world source data, such as medical imaging processing, audio and video metadata extraction, quality validation, and notes processing
Build parsers, validators, and normalization logic that can systematically handle messy, non-standard, and high-variance source formats
Turn repeated one-off data handling work into reusable processing patterns, internal tooling, and platform capabilities

Scale, Performance & Reliability

Build for high volume and high throughput, optimizing systems for reliability, cost, and speed
Work across distributed and parallel compute systems to process workloads that do not fit well on a single machine
Choose the right execution model for the workload, including batch processing, distributed execution, and modern compute patterns for unstructured data and inference-heavy processing
Diagnose and resolve bottlenecks across ingestion and processing systems, and keep performance from degrading as volume and modality complexity grow

Data Quality, Security & Compliance

Build validation and quality checks that catch bad, incomplete, or malformed data before it propagates downstream
Handle sensitive and regulated data, including PHI, with the security and care the domain demands, including de-identification where required
Track provenance, metadata, and usage constraints through the ingestion path so downstream use remains compliant and auditable
Raise the quality bar for observability, debuggability, and operational reliability across the ingestion layer

Cross-Functional Partnership

Partner with product and Data Lab to support new modalities, new partner requirements, and non-standard source data
Work directly with partner engineering teams when needed to translate source-system realities into robust ingestion and processing design
Surface recurring patterns that are worth standardizing into reusable transforms, validators, and internal tooling
Help shape how Protege handles new data types as the platform expands into more complex data environments

What Success Looks Like

30 days: Ramp

Get productive in the codebase and ship your first improvements to existing pipelines
Build a working map of the ingestion and processing stack, the major data flows, and how we handle each modality
Meet the engineering, product, and Data Lab teams to understand how the function operates across the company

60 days: Take Ownership

Own a processing pipeline or modality end to end, from ingestion through delivery of AI-ready output
Develop depth in how we handle one or two data types at scale
Start raising the bar on data quality, observability, and processing best practices

90 days: Operate Independently

Own a significant part of the ingestion and processing layer and lead design on new modalities or scaling challenges
Ship reliably with minimal hand-holding, and help unblock others working in the data layer
Identify at least one leverage opportunity — a reusable transform, tool, or architectural improvement — worth investing in, and drive it

What You Bring

Must Haves

5+ years building and operating production backend or data systems, with real experience in data processing at scale
Hands-on experience designing and running large-scale data pipelines
Strong programming skills in Python
Experience with distributed data processing
Strong proficiency with AWS
Comfort with messy, varied, high-volume data and high ambiguity, with a knack for finding patterns in complex environments
Attention to detail without losing speed, and a bias to action
Excitement about working on a product built around moving and processing large volumes of data
Curious, tenacious, and proactive

Nice to Haves

Experience processing one or more specific modalities at scale: medical imaging (e.g., DICOM), text, audio, or video
Background working with sensitive or regulated data environments (HIPAA, healthcare compliance, PHI handling)
Experience with streaming systems or workflow orchestration (e.g., Airflow, Dagster)
Experience with GCP and Azure
Prior startup experience as a founding or early engineer
Familiarity with ML, NLP, or LLM-based systems, including embeddings and fine-tuning

Protege Values

Pass the Loved Ones’ Test
We act with integrity and do the right thing — especially when it’s hard and no one is watching.
Always Find a Way
We are resourceful, resilient builders who solve hard problems and push through obstacles.
Go Fast and Grow Fast
Velocity matters. We move with urgency, learn quickly, and continuously improve as individuals and as a company.
Practice Kindness and Candor
We communicate directly and respectfully, building trust through honest feedback and genuine care for one another.
Deliver Together
We win as one team. Collaboration, accountability, and shared ownership drive our success.
Own the Outcome. Hone the Craft.
We take pride in our work, sweat the details, and continuously raise the bar for excellence.
