webAI

Staff DevOps Engineer

Reposted 23 Hours Ago
Hybrid
Austin, TX, USA
Senior level
Architect and build secure, scalable cloud and edge infrastructure for AI workloads. Implement IaC, Kubernetes with GPU support, CI/CD with security controls, observability, MLOps pipelines, disaster recovery, incident response, and mentor engineers while driving infrastructure standards and automation.
The summary above was generated by AI

About Us:

webAI is pioneering the future of artificial intelligence by establishing the first distributed AI infrastructure dedicated to personalized AI. We recognize the evolving demands of a data-driven society for scalability and flexibility, and we firmly believe that the future of AI lies in distributed processing at the edge, bringing computation closer to the source of data generation.

Our mission is to build a future where a company's valuable data and intellectual property remain entirely private, enabling the deployment of large-scale AI models directly on standard consumer hardware without compromising the information embedded within those models. We are developing an end-to-end platform that is secure, scalable, and fully under the control of our users, empowering enterprises with AI that understands their unique business.

We are a team driven by truth, ownership, tenacity, and humility, and we seek individuals who resonate with these core values and are passionate about shaping the next generation of AI.

About the Role:

We are seeking a Staff DevOps Engineer to architect, build, and scale secure infrastructure for deploying AI workloads across cloud and edge environments. This is a high-impact, staff-level individual contributor role where you will drive infrastructure strategy, lead technical initiatives, and serve as the subject matter expert on cloud architecture, security best practices, and platform reliability.

You will design scalable, automated infrastructure solutions that enable our AI platform to operate efficiently across diverse deployment scenarios—from public cloud to on-premises and edge computing environments. This role requires deep technical expertise, architectural thinking, and the ability to translate complex requirements into production-ready infrastructure automation.

Responsibilities:

  • Design and architect secure, scalable cloud and edge infrastructure for deploying AI workloads across multi-cloud (AWS, Azure, GCP) and hybrid environments

  • Build and maintain production-grade Infrastructure as Code (IaC) using Terraform, Ansible, or Pulumi, managing 100+ resources with GitOps workflows and automated validation

  • Design and operate production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing container security, multi-tenancy, and resource optimization

  • Implement secure CI/CD pipelines with integrated security controls (SAST, DAST, vulnerability scanning, secrets management) and automated deployment workflows for containerized AI models

  • Lead MLOps infrastructure initiatives including model deployment pipelines, versioning, feature stores, experiment tracking, and monitoring for model performance and drift

  • Design comprehensive observability and monitoring using Prometheus, Grafana, ELK, or Datadog with distributed tracing, APM, and real-time alerting aligned to SLIs/SLOs

  • Implement security best practices including least-privilege access, encryption at rest/in transit, network segmentation, and automated compliance validation

  • Lead incident response and reliability initiatives, participate in on-call rotation, conduct post-mortems, and drive continuous improvement for system reliability

  • Architect disaster recovery and business continuity strategies with automated backup, failover, and recovery processes

  • Develop reusable infrastructure modules and templates to accelerate environment provisioning and standardize deployment patterns across teams

  • Mentor mid-level and senior engineers on cloud architecture, DevOps best practices, and platform reliability through design reviews and technical guidance

  • Drive technical documentation and knowledge sharing including runbooks, architecture decision records (ADRs), and infrastructure standards


Qualifications:

  • 7+ years of hands-on experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering with a proven track record of architecting production systems

  • Expert-level proficiency with Docker, Kubernetes (CKA/CKAD preferred), and cloud-native technologies in production environments

  • 5+ years implementing Infrastructure as Code with Terraform, Ansible, or Pulumi, managing large-scale (50+) cloud resources

  • Deep experience with cloud platforms (AWS, Azure, or GCP) including compute, networking, storage, and managed services

  • Proven experience building and scaling CI/CD pipelines with integrated security controls (GitHub Actions, GitLab CI, Jenkins, ArgoCD)

  • Strong programming skills in Python (preferred for automation), Bash, or Go for infrastructure tooling and automation

  • Production experience with observability and monitoring tools: Prometheus, Grafana, ELK, CloudWatch, Datadog, or similar

  • Experience with MLOps workflows: model deployment automation, versioning, and lifecycle management

  • Demonstrated experience with GitOps methodologies and declarative infrastructure management

  • Strong understanding of security best practices: encryption, secrets management, identity and access management (IAM), network security

  • Excellent written and verbal communication skills for technical documentation and cross-functional collaboration

Preferred Skills:

  • Experience architecting multi-cloud or hybrid cloud environments with portability and interoperability considerations

  • Hands-on experience deploying large language models (LLMs) or transformer models at scale with model serving infrastructure

  • Expertise in Zero Trust architecture and modern security patterns for cloud-native applications

  • Experience with service mesh technologies (Istio, Linkerd) for microservices communication and observability

  • Strong understanding of AI/ML infrastructure: feature stores, model registries, A/B testing infrastructure, and model monitoring

  • Experience with edge computing deployments and distributed system architectures

  • Cost optimization expertise: FinOps practices, resource rightsizing, and cloud cost management

  • Experience mentoring or leading technical initiatives across engineering teams

  • Certifications: CKA, CKAD, Terraform Associate, AWS Solutions Architect, Azure Administrator, or GCP Professional Cloud Architect

Core Values:

We at webAI are committed to living out the core values we have put in place as the foundation on which we operate as a team. We seek individuals who exemplify the following:

  • Truth - Emphasizing transparency and honesty in every interaction and decision.

  • Ownership - Taking full responsibility for one’s actions and decisions, demonstrating commitment to the success of our clients.

  • Tenacity - Persisting in the face of challenges and setbacks, continually striving for excellence and improvement.

  • Humility - Maintaining a respectful and learning-oriented mindset, acknowledging the strengths and contributions of others.

Benefits:

  • Competitive salary and performance-based incentives.

  • Comprehensive health, dental, and vision benefits package.

  • 401(k) Match (US-based only)

  • $200/month Health and Wellness Stipend

  • $400/year Continuing Education Credit

  • $500/year Function Health subscription (US-based only)

  • Free parking for in-office employees

  • Unlimited Approved PTO

  • Parental Leave for Eligible Employees

  • Supplemental Life Insurance

webAI is an Equal Opportunity Employer and does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We adhere to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, it is the policy of webAI to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works.


Top Skills

Ansible
ArgoCD
AWS
Azure
Bash
CKA
CKAD
CloudWatch
Datadog
Docker
ELK
GCP
GitHub Actions
GitLab CI
GitOps
Go
Grafana
Istio
Jenkins
Kubernetes
Linkerd
Prometheus
Pulumi
Python
Terraform

HQ

webAI Austin, Texas, USA Office

515 Congress Ave, Austin, Texas, United States, 78701


