Flex

Staff Infrastructure Engineer

Reposted 2 Days Ago

Remote

Hiring Remotely in U.S.

204K-300K Annually

Senior level

The Staff Infrastructure Engineer will lead infrastructure design, build scalable cloud solutions, manage SRE practices, and drive automation efforts. Responsibilities include defining technical strategies, enhancing developer experience, and communicating with leadership.

The summary above was generated by AI

Flex is a growth-stage, NYC-headquartered FinTech company that is creating the best rent payment experience. It’s hard to believe that it’s 2026 and paying rent on time is expensive, inflexible, and difficult. We’re here to change that! Flex enables our users to pay rent throughout the month on a schedule that better fits their finances and budget. Our mission is to empower as many renters as possible with flexibility over their most significant recurring expense. After deliberately keeping a stealth profile as we built up unprecedented investor support and an enthusiastic user base, we are looking for motivated individuals to help us keep our mission growing. Will you be a part of the team?

About the role

Flex is looking for a Staff Infrastructure Engineer to lead technical direction across our infrastructure platform, setting the strategy for how we build, deploy, and operate reliable systems at scale.

In this role, you will lead projects within the Infrastructure Engineering team, partnering with engineering leaders across the org to align infrastructure investments with business priorities and drive the standards and practices that raise the bar for the wider engineering department. You will shape how we approach reliability, developer experience, and infrastructure as code, and you will be expected to identify when current approaches are not working and redirect effort toward higher-impact outcomes.

At Flex, we are an AI-first engineering organization. We use AI-assisted tools to move faster and improve quality, while maintaining strong human ownership of correctness, security, and reliability. We’re looking for engineers who combine strong technical judgment with practical, execution-focused delivery.

We are particularly interested in candidates with software engineering experience in languages like Java, Python, or TypeScript. This background helps you collaborate effectively with product and platform teams, build internal tooling, and improve developer experience.

This remote role requires a minimum of 8 years of cloud infrastructure experience.

What you’ll do

Lead infrastructure project teams across multiple domains (reliability, developer experience, cloud platform), providing technical direction, maintaining project plans, and keeping leadership and cross-functional stakeholders informed of progress, risks, and tradeoffs.
Partner with engineering leaders and peer Staff+ engineers across the org to align infrastructure strategy and technical investments with business goals, and provide authoritative technical scope for cross-functional initiatives.
Architect and deliver large, complex infrastructure systems, designing for scale, reliability, and operational simplicity. Drive decisions on build-vs-buy, technology selection, and migration strategy for the domains you lead.
Define and evolve Flex's infrastructure-as-code strategy, including Terraform module architecture, governance standards, and safe rollout patterns. Introduce new IaC tooling or frameworks when existing approaches no longer serve team needs, and drive adoption across engineering.
Lead strategic reliability improvements across services you work with, defining SLI/SLO frameworks with partner teams, delivering net-new ways to measure and communicate operational health and customer impact, and driving sustained reliability gains rather than one-off fixes.
Shape the developer platform strategy, identifying the highest-leverage investments in self-service tooling, CI/CD, and deployment automation. Set the quality bar for developer-facing infrastructure and ensure the team ships tooling that meaningfully accelerates engineering velocity.
Design cross-service observability architectures (metrics, logs, traces) with clear operational standards. Lead strategic alerting and runbook improvements that reduce mean-time-to-detect and mean-time-to-resolve across the org.
Drive systemic incident resilience: lead cross-team infrastructure incident response, identify recurring failure patterns, and own the follow-through that turns post-incident findings into durable infrastructure improvements. Proactively refocus team efforts when reliability projects are off-course or not delivering meaningful risk reduction.
Build engineering rigor into team processes, improving design review standards, deployment checklists, operational readiness criteria, and code quality practices. Set a high bar and coach the team to consistently meet it.
Design AI-assisted workflows for your team: identify high-leverage opportunities where AI tooling can remove bottlenecks or enable previously infeasible work. Set guardrails for responsible AI use in infrastructure operations, evaluate emerging AI capabilities, and coach engineers on developing strong AI judgment.

Key qualifications

8+ years of hands-on infrastructure engineering experience in production environments, with at least 2 years operating at a senior or staff level, including leading technical projects, setting direction for other engineers, and making architecture-level decisions.
Deep experience architecting, operating, and scaling infrastructure on AWS, with demonstrated depth across several of: EKS, S3, RDS, API Gateway, VPC, Load Balancers, Lambda, DocumentDB, DynamoDB. GCP experience is a plus.
Track record of defining infrastructure-as-code strategy at scale, including Terraform module architecture, governance patterns, and driving adoption of IaC standards across teams.
Strong Kubernetes and container platform experience, including designing cluster architectures, managing multi-tenant workloads, and operating production microservice deployments at scale.
Proven ability to design and improve CI/CD systems (GitHub Actions preferred) with a focus on deployment safety, velocity, and developer experience. Evidence of introducing new tooling or processes that measurably improved deployment outcomes.
Experience designing observability architectures for distributed systems (metrics, logs, traces) and using observability data to drive reliability improvements. Datadog experience is a plus.
Solid networking knowledge (DNS, load balancing, firewalls, VPNs, service mesh, service-to-service connectivity) and experience applying it to solve cross-service infrastructure problems.
Strong technical communication and influence skills: ability to write clear technical strategy documents, present architecture decisions to leadership, explain complex tradeoffs across teams, and align stakeholders on technical direction.
Proficient in at least one of Java, Python, or TypeScript, with demonstrated code review practice and a track record of raising code quality standards through review feedback and tooling.
Demonstrated leadership mindset: leads project teams end-to-end, proactively identifies and redirects off-course work, builds engineering rigor into team processes, and takes ownership of outcomes beyond individual deliverables.

Flex takes a market-based approach to pay, and compensation may vary depending on your primary work location. Work locations are categorized into one of three tiers based on a cost of labor index for that geographic area. The successful candidate’s starting pay will be commensurate with their experience, qualifications, and Flex’s internal leveling guidelines and benchmarks.

Tier A (NYC/SF/Seattle): $200,000-$250,000 USD
Tier B: $180,000-$225,000 USD
Tier C: $170,000-$212,000 USD

Life at Flex

We understand that it takes a diverse team of highly intelligent, curious, determined, empathetic, and self-aware people to grow a successful company. Our HQ is located in New York City, but we have employees located throughout the US, Australia, Canada and South America. We are growing quickly, but deliberately, with a focus on building an inclusive culture. Our dynamic team has incredible perspectives to share, just as we know you do, and we take great pride in being an equal opportunity workplace.

Offices

Roles posted in New York, San Francisco, and Salt Lake City are hybrid positions with on-site expectations of 2-3 days per week in our local offices. Candidates outside these areas may be eligible for our relocation assistance program.

Benefits

For full-time U.S. employees we offer:

Competitive medical, dental, and vision
Company equity
401(k) plan with company match
Unlimited paid time off + 13 company paid holidays
Parental leave
Flex Cares Program: Non-profit company match + pet adoption coverage
Free Flex subscription

For full-time non-U.S. employees, we offer:

Competitive compensation + company equity
Unlimited PTO
