Photon

Data Engineer - Dallas, TX

Posted Yesterday
In-Office or Remote
Hiring Remotely in United States
38K-133K Annually
Mid level
Design and build RAG-focused data pipelines for LLMs: ETL/ELT for structured and unstructured data, vector database architecture, embedding and chunking strategies, data cleaning/PII removal, metadata engineering, low-latency streaming, and evaluation/versioning to support autonomous agents.

We are seeking a Data Engineer who will be responsible for the "Ingestion-to-Insight" pipeline that allows autonomous agents to access, search, and reason over vast amounts of proprietary and public data.

Your role is critical: you will design the RAG (Retrieval-Augmented Generation) architectures and data pipelines that ensure our agents have the right context at the right time to make accurate decisions.

Key Responsibilities

  • AI-Ready Data Pipelines: Design and implement scalable ETL/ELT pipelines that process both structured (SQL, logs) and unstructured (PDFs, emails, docs) data specifically for LLM consumption.
  • Vector Database Management: Architect and optimize Vector Databases (e.g., Pinecone, Weaviate, Milvus, or Qdrant) to ensure high-speed, relevant similarity searches for agentic retrieval.
  • Chunking & Embedding Strategies: Collaborate with AI Engineers to optimize data chunking strategies and embedding models to improve the "recall" and "precision" of the agent's knowledge retrieval.
  • Data Quality for AI: Develop automated "Data Cleaning" workflows to remove noise, PII (Personally Identifiable Information), and toxicity from training/context datasets.
  • Metadata Engineering: Enrich raw data with advanced metadata tagging to help agents filter and prioritize information during multi-step reasoning tasks.
  • Real-time Data Streaming: Build low-latency data streams (using Kafka or Flink) to provide agents with "fresh" data, enabling them to act on real-time market or operational changes.
  • Evaluation Frameworks: Construct "Gold Datasets" and versioned data snapshots to help the team benchmark agent performance over time.

Required Skills & Qualifications

  • Experience: 4+ years in Data Engineering, with at least 1 year focusing on data for LLMs or AI/ML applications.
  • Python Mastery: Deep expertise in Python (Pandas, Pydantic, FastAPI) for data manipulation and API integration.
  • Data Tooling: Strong experience with modern data stack tools (e.g., dbt, Airflow, Dagster, Snowflake, or Databricks).
  • Vector Expertise: Hands-on experience with at least one major Vector Database and knowledge of similarity search indexes and metrics (e.g., HNSW, cosine similarity).
  • Search Knowledge: Familiarity with hybrid search techniques (combining semantic search with traditional keyword search like Elasticsearch/BM25).
  • Cloud Infrastructure: Proficiency in managing data workloads on AWS, Azure, or GCP.

Preferred Qualifications

  • Experience with LlamaIndex or LangChain for data ingestion.
  • Knowledge of Graph Databases (e.g., Neo4j) to help agents understand complex relationships between data points.
  • Familiarity with "Data-Centric AI" principles—prioritizing data quality over model size.

Compensation, Benefits and Duration

Minimum Compensation: USD 38,000
Maximum Compensation: USD 133,000
Compensation is based on the candidate's actual experience and qualifications. The above is a reasonable, good-faith estimate for the role.
Medical, vision, and dental benefits, a 401(k) retirement plan, variable pay/incentives, paid time off, and paid holidays are available for full-time employees.
This position is not available for independent contractors.
No applications will be considered if received more than 120 days after the date of this post.
