We are an MIT-born, venture-backed Silicon Valley startup building a real-life 'Jarvis'—an AI Copilot for design and manufacturing. Our goal is to utilize advanced AI, physics simulation, and computer graphics to reduce costs and improve engineering productivity across all steps of the design and manufacturing process.
Responsibilities
- Architect, build, and operate end-to-end ML pipelines for training, validation and deployment on Google Cloud.
- Define, instrument, and maintain logging, monitoring, and alerting for model performance and data drift.
- Automate CI/CD for ML artifacts and infrastructure using GitHub Actions or equivalent.
- Collaborate with cross-functional teams, including frontend engineers, backend engineers, research engineers, and infrastructure engineers.
- Write clean, well-documented, fast, and maintainable code.
- Help ensure our systems have high availability and performance.
What we're looking for
- BS in Computer Science or a related field.
- 5+ years of experience as an AI/ML Ops, DevOps, or Infrastructure Engineer, or in an equivalent role.
- Expert-level Python and TypeScript skills.
- Experience with Docker, Kubernetes, Terraform, and Google Cloud.
- Deep understanding of large language models (LLMs) and prompt-engineering best practices.
- Experience designing and maintaining CI/CD pipelines for training or fine-tuning LLMs.
- Excellent written and verbal communication skills.
Bonus Points
- Experience in computer graphics or physics-based simulation.
- Background in setting up Prometheus/Grafana, ELK, or similar monitoring stacks.
- Experience with Vertex AI.
- Experience working with custom Domain-Specific Languages.
Our tech stack
- Google Cloud
- Python, TypeScript
- Protobuf, gRPC
- Next.js, React
- GitHub Actions
- Docker, Kubernetes, Spinnaker
- PostgreSQL