Next-Generation Ops: A FutureTalk With Google Cloud Platform’s Kelsey Hightower

by New Relic
September 26, 2016

By Christian Sinai

When Kelsey Hightower first entered the ops world, the “coolest” thing you could do was deploy a server: configure it, harden it, get it ready for use, and write a bunch of scripts to monitor it in production. And that was about it.

As Hightower began thinking about the future of ops in a software world increasingly powered by cloud, containers, and other modern technologies, he realized that ops should no longer just be about managing servers. Maybe you love writing Nagios scripts, but Hightower would rather that not be his full-time job. He thinks he—and plenty of other sysadmins and ops folks—can provide a lot more value elsewhere.

“How” to do that is the foundation of Hightower’s recent FutureTalk presentation at New Relic’s Portland, Ore., engineering headquarters: “Kubernetes Abstractions: Building Next Generation Automation Tools.” Hightower is a developer advocate for Google Cloud Platform and an avid proponent of containers and distributed systems, including Kubernetes, Google’s open source container orchestration platform.

LISTEN: Interview with Kelsey Hightower on The New Stack @ Scale Podcast 

Of course, someone still needs to keep the servers up and running. But what if we replaced “someone” with something? Hightower explores how ops can use new platforms and abstractions—including Kubernetes—to build the tools it needs to evolve beyond the server maintenance game. Hightower shares examples of the kinds of tools ops can build with these new abstractions—and how those examples provide patterns for all kinds of other uses.

Building declarative, responsive systems

One of the greatest opportunities these new platforms and abstractions provide, according to Hightower, is reducing the inefficiency and manual effort that come with necessary but painful operational tasks. A case in point: implementing and managing security certificates for your HTTP endpoints. That was a particular headache prior to Let’s Encrypt, and it remains a labor-intensive chore today when done manually. Tracking and remediating expiring certificates alone, for example, can be a bear, especially at scale, and is not necessarily the best use of ops’ time.

Kelsey Hightower, Developer Advocate, Google

Managing TLS certificates is a great example: You can do it in a node-specific manner by writing shell scripts and so forth, but that doesn’t mean you should: “Too much work,” Hightower scoffs, especially once you move into environments running thousands of machines. “What we want to do is declare to the system that the certs must be there and anything that needs to use the certs should just declare that they want to use the certs. That way we don’t pin ourselves to an individual machine, and this is critical to building some of these next-generation tools. We have to decouple ourselves from the node. Right now, all of our tools are very node-centric. They assume we’re going to do a deployment to a node. We have to remove that.”

Hightower’s talk walks through building a tool, “kube-cert-manager,” for managing Let’s Encrypt certificates for a Kubernetes cluster. He also shares the code behind the tool via GitHub.
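The declarative idea can be sketched in a few lines of Python. This is not the code from kube-cert-manager: the names `CertSpec`, `IssuedCert`, `reconcile`, and `fake_issue` are hypothetical, and the 30-day renewal window and 90-day lifetime are assumptions made for illustration. The point is only that the operator declares which certificates must exist, and a reconcile step works out what to issue or renew without referring to any particular node.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical renewal window; not taken from the talk.
RENEW_BEFORE = timedelta(days=30)


@dataclass(frozen=True)
class CertSpec:
    """Desired state: a certificate that must exist for a domain."""
    domain: str


@dataclass
class IssuedCert:
    """Actual state: a certificate that currently exists."""
    domain: str
    expires_at: datetime


def reconcile(
    desired: List[CertSpec],
    actual: Dict[str, IssuedCert],
    now: datetime,
    issue: Callable[[str, datetime], IssuedCert],
) -> List[str]:
    """Issue missing certs and renew expiring ones; return the domains acted on."""
    changed = []
    for spec in desired:
        current = actual.get(spec.domain)
        # Act only when the declared cert is absent or inside the renewal window.
        if current is None or current.expires_at - now <= RENEW_BEFORE:
            actual[spec.domain] = issue(spec.domain, now)
            changed.append(spec.domain)
    return changed


def fake_issue(domain: str, now: datetime) -> IssuedCert:
    """Stand-in for a real certificate authority call (an assumption)."""
    return IssuedCert(domain, now + timedelta(days=90))
```

Running `reconcile` a second time with the same inputs does nothing, which is the property that makes this style safe to repeat on a schedule.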

Ops nirvana: optimal resource utilization

Hightower’s kube-cert-manager establishes a model for other tools that use similar abstractions, such as a watch pattern (so the tool is notified of changes as they happen and fetches data only when an event calls for it) or a control loop (for reconciling actual state with desired state throughout the cluster). He also demos a scheduling tool to help automate another pressing challenge for many ops teams: How do you ensure you’re using your resources efficiently by matching the right workloads with the right machines?
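As a rough illustration of how the two patterns fit together, the sketch that follows keeps everything in memory and talks to no real cluster. The `Event` class and both function names are hypothetical; only the `ADDED` and `DELETED` type names are borrowed from the Kubernetes watch API. Events update the desired set as they arrive, and a separate control loop pass makes the actual set match it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Event:
    """A change notification for one declared object."""
    type: str  # "ADDED" or "DELETED"
    name: str  # name of the declared object, e.g. a domain


def apply_events(events: Iterable[Event], desired: Set[str]) -> None:
    """Watch side: update the desired set as each event arrives."""
    for event in events:
        if event.type == "ADDED":
            desired.add(event.name)
        elif event.type == "DELETED":
            desired.discard(event.name)


def control_loop_pass(desired: Set[str], actual: Set[str]) -> Dict[str, List[str]]:
    """Reconcile side: make actual match desired and report the actions taken."""
    to_create = sorted(desired - actual)
    to_remove = sorted(actual - desired)
    actual.update(to_create)
    actual.difference_update(to_remove)
    return {"created": to_create, "removed": to_remove}
```

In a real tool the control loop pass would also run periodically, so that a missed event is corrected on the next pass rather than lost.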

“Just placing things on nodes based on memory and CPU is not going to be enough, especially because every company is different,” Hightower says. This becomes critical as more and more organizations move to the cloud. Given the varying costs per machine on most cloud platforms, resource optimization is crucial for managing the budget. You don’t want to run a small web server, for example, on an expensive GPU machine in the cluster. That simply wastes a resource that could be used for a “higher-order purpose,” Hightower notes.

“We need to build something a little bit smarter to handle this for us,” he says. And in the video of the talk, you can watch him walk through how to do exactly that with Kubernetes, ensuring that workloads are assigned to the cheapest available machines before moving up to more expensive resources.
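The cheapest-first placement rule is easy to state in code. What follows is a minimal sketch under stated assumptions, not the scheduler from the demo: `Node`, `Workload`, `pick_node`, and `schedule` are hypothetical names, the prices are invented, and a production scheduler would weigh far more than cost, CPU, and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    hourly_cost: float  # hypothetical price per hour
    free_cpu: float     # cores still available
    free_mem: int       # MiB still available


@dataclass(frozen=True)
class Workload:
    name: str
    cpu: float  # cores requested
    mem: int    # MiB requested


def pick_node(workload: Workload, nodes: List[Node]) -> Optional[Node]:
    """Return the cheapest node with enough free CPU and memory, or None."""
    fits = [
        n for n in nodes
        if n.free_cpu >= workload.cpu and n.free_mem >= workload.mem
    ]
    if not fits:
        return None
    return min(fits, key=lambda n: n.hourly_cost)


def schedule(workload: Workload, nodes: List[Node]) -> Optional[str]:
    """Place the workload, reserve its resources, and return the node name."""
    node = pick_node(workload, nodes)
    if node is None:
        return None
    node.free_cpu -= workload.cpu
    node.free_mem -= workload.mem
    return node.name
```

With one cheap node and one expensive GPU node, small web workloads fill the cheap node first and spill over to the expensive one only when nothing cheaper has room.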

 

Don’t miss our next FutureTalk

For more information about our FutureTalks series, make sure to join our Meetup group, New Relic FutureTalks PDX, and follow us on Twitter @newrelic for the latest developments and updates on upcoming events.

Note: Event dates, participants, and topics are subject to change without notice. 

About the Author

Christian Sinai manages the FutureTalks speaker series at New Relic. He is a former architect (emphasizing sustainable design) and tech startup entrepreneur.
