The DevOps Sages Weigh In: How to Build the Best Toolchains

Three Austin tech engineers share their picks

Written by Erik Fassnacht
Published on May 24, 2021

We’ve all been there: standing in a fluorescent-lit aisle or browsing endless rows of branded products online, stopped in our tracks by the sheer number of choices on display. 

The growth of options is not imagined. In the 1970s, for instance, the average grocery store had around 9,000 products available. By 2008, there were 47,000. In his book “Future Shock,” Alvin Toffler coined the term “overchoice” to describe the indecision that comes from such a cornucopia of possibilities. 

Now that feeling of overchoice has reached the DevOps industry, where the increasing number of tools can create a clear sense of uncertainty for engineers. Which works best for each project, team and company? Moreover, how will the tools integrate and evolve over time?

Rather than succumb to stasis, Built In Austin asked local DevOps pros to share their secrets for skipping the internal 504 timeout and zeroing in on the most helpful tech, reducing the overchoice for other engineers along the way.

 

 

At AffiniPay, a fintech platform for professional service markets, Kris Bushover and Justin Meacham work together to maintain the company's DevOps system and introduce new tools as needed. They believe that Terraform, along with a number of other tools, can unlock greater efficiency and success.

 

Give us a brief glimpse into your DevOps toolchain. What are a few of your favorite tools your team is currently using?

Kris Bushover, DevOps and infrastructure manager: We use a fairly standard set of tools in our environment. We rely heavily on infrastructure-as-code, and HashiCorp’s Terraform is our tool of choice for that role. Almost all of our cloud infrastructure is provisioned and configured via Terraform, and we have a rule that if something is done via the AWS Console or Command Line Interface, then that change gets ported into Terraform. We also use Terraform to manage our HashiCorp Vault infrastructure, as well as our Datadog observability platform. For configuration management and remote execution, we rely on Ansible. 

In addition to both of those, we have custom tooling written in Bash, Ruby, Python and Go for everything from Kubernetes deployments to problem remediation. Most of our applications publish code via our continuous integration pipeline in the form of a Docker image and a corresponding Helm chart, which is what gets deployed in our test, staging and production environments. Packaging application code and configuration in this manner ensures that we are shipping the exact same code in all environments to establish consistency and quality from development to production.
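The packaging approach Bushover describes hinges on one idea: CI produces a single immutable artifact, and every environment deploys that same artifact. A minimal Python sketch of that promotion pattern follows; the names (`ENVIRONMENTS`, `render_values`, the registry URL) are illustrative assumptions, not AffiniPay's actual tooling.

```python
# Illustrative sketch: pin one immutable image tag across every environment,
# so test, staging and production all ship the exact same code.
ENVIRONMENTS = ["test", "staging", "production"]

def render_values(image_tag: str) -> dict:
    """Build per-environment Helm-style values that all pin the same tag."""
    return {
        env: {
            "image": {"repository": "registry.example.com/app", "tag": image_tag},
            # Only sizing differs between environments, never the artifact.
            "replicas": 1 if env == "test" else 3,
        }
        for env in ENVIRONMENTS
    }

values = render_values("1.4.2-abc1234")  # tag produced once by CI
# Every environment references a single tag; drift between them is impossible.
assert len({v["image"]["tag"] for v in values.values()}) == 1
```

The point of the sketch is the invariant at the bottom: because the tag is computed once and fanned out, there is no code path that could deploy different builds to different environments.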


What were some of the key considerations when evaluating and choosing the tools your team would use? 

Bushover: When choosing a new tool, first we evaluate the degree to which the tool fits our use case. For example, in the Terraform case, we knew we needed infrastructure-as-code, and we knew we needed to manage more than just AWS. Terraform stood out as a strong solution because of how closely it fit that purpose.

Of course, cost is always a concern as well. An important factor here, though, is taking into account not just the upfront or ongoing dollar costs, but also the intangible costs associated with integrating a tool into the environment. This can be something like how well the tool can be integrated into Terraform for provisioning and configuration, or how much training will be required to get engineers comfortable with the tool.

Another priority is continuously refining our toolset so the development pipeline stays as lean and efficient as possible. With the breadth of technology available today, the challenge is maintaining that simplicity while continuously evaluating new technologies as the industry evolves.

 

When choosing a new tool, first we evaluate the degree to which the tool fits our use case.


How has your DevOps toolchain evolved over time, and why? 

Justin Meacham, staff software engineer and infrastructure lead: Like most software companies, we started with a very simple set of requirements and built that with minimal resources. In the beginning, we used EC2 components configured using the AWS console/CLI, along with corresponding dependencies like VPCs and S3 buckets. We shipped code to instances using things like SCP and SSH and adjusted other components in a one-off fashion to meet minimum requirements. This was less than ideal as test and production instances had a tendency to introduce drift and could not be easily reproduced.

AWS was evolving rapidly, as were tools like Terraform and Ansible, and as more members joined our team, we needed a more streamlined process to quickly get features into production. Additionally, more team members and more code being shipped led to more granular security needs. This led us to incorporate Terraform to configure our EC2 instances, S3 buckets, IAM roles and policies, load balancing, and other networking components. The more we adopted Terraform as the management tool for our upstream cloud resources, the more consistency and predictability we gained across development, test and, eventually, production.

Brad Hein
DevOps Manager • SailPoint

 

Brad Hein is a DevOps manager at SailPoint, a platform for security solutions in the cloud enterprise. He believes that a handful of important tools can increase efficiency and success in a variety of areas, and he let us in on a few of his top choices.

 

Give us a brief glimpse into your DevOps toolchain. What are a few of your favorite tools your team is currently using?

Our DevOps toolchain is continually improving, so I’ll divide this into a few different areas. For monitoring metrics, we retrieve service metrics using Prometheus and Cortex and AWS infrastructure metrics using CloudWatch metrics. We use Grafana to visualize those metrics and create dashboards. For logging, we use the EFK stack of Elasticsearch, Fluentd and Kibana. For deployments, we use Jenkins pipelines to deploy to production Elastic Container Service and ArgoCD to deploy to Elastic Kubernetes Service. We also use a tool called LaunchDarkly to enable or disable features for our SaaS tenants. For infrastructure creation, we use Terraform and Terragrunt. For alerting, we use PagerDuty. Finally, we have developed some custom tools. For example, we created a tool we named Harbinger to automatically collect heap dumps when memory usage in a container is near the limit.

Inside the DevOps toolchain at SailPoint

  • Monitoring: Prometheus, Cortex, CloudWatch, Grafana
  • Logging: Elasticsearch, Fluentd, Kibana
  • Deployments: Jenkins, Elastic Container Service, ArgoCD, LaunchDarkly
  • Infrastructure Creation: Terraform, Terragrunt
  • Alerting: PagerDuty
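Hein's Harbinger tool watches container memory and collects a heap dump before the limit is hit. A hypothetical Python sketch of that watch-and-dump loop is below; SailPoint has not published Harbinger, so the cgroup paths, the 90 percent threshold and the `jmap` dump command are all assumptions about how such a tool typically works.

```python
# Illustrative sketch (not SailPoint's actual Harbinger code): poll a
# container's cgroup memory usage and trigger a heap dump when usage
# approaches the limit.
import subprocess

USAGE_PATH = "/sys/fs/cgroup/memory.current"  # cgroup v2 current usage (bytes)
LIMIT_PATH = "/sys/fs/cgroup/memory.max"      # cgroup v2 memory limit (bytes)
THRESHOLD = 0.9                                # dump at 90% of the limit

def read_bytes(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

def should_dump(usage: int, limit: int, threshold: float = THRESHOLD) -> bool:
    """Return True when memory usage is near the container limit."""
    return limit > 0 and usage / limit >= threshold

def maybe_collect_heap_dump(pid: int) -> bool:
    usage, limit = read_bytes(USAGE_PATH), read_bytes(LIMIT_PATH)
    if should_dump(usage, limit):
        # jmap ships with the JDK; the output path here is illustrative.
        subprocess.run(["jmap", f"-dump:live,file=/tmp/heap-{pid}.hprof", str(pid)])
        return True
    return False
```

The value of collecting the dump proactively, rather than after an OOM kill, is that the process is still alive and the heap still reflects the state that caused the growth.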

What were some of the key considerations when evaluating and choosing the tools your team would use? 

Since multiple teams use these tools, we look for tools that will empower both our development and DevOps teams. When we have a candidate tool, we generally will do an initial proof of concept (PoC) to learn how it functions and how we expect it to work at SailPoint. We use the PoC to determine the benefits of the tool, how easy it is to operate and how much it costs to run. When the PoC has good results, we have an internal request for comments (RFC) process where we describe the tool, our motivation for using it, how it solves problems we have and how we plan to implement and use it. We post the RFC as a GitHub pull request and open it for comments; then we conduct an RFC review meeting. After an RFC is approved and we’ve decided on using a tool, we start in our development environment, fix any issues we discover there and promote it to production.

 

The work we’re doing allows us to release quickly, release often and release safely, so our customers continually have a better product.

 

What has been the key benefit your team has seen from building out and maintaining a healthy toolchain? 

Ultimately, the work we’re doing allows us to release quickly, release often and release safely, so our customers continually have a better product. It improves our customers’ experience by giving us the insight to improve our services and infrastructure. We have a weekly TechOps meeting where our engineering development and DevOps teams use our toolchain to report service health and metrics. We all use Prometheus, Grafana and our CloudWatch metrics to improve our product services and infrastructure. That collaboration allows us to work across teams to prevent, find and fix issues quickly. When we release an improvement or fix an issue, our deployment pipeline lets us send it to production with confidence that it passes our tests. We use feature flags heavily so we can enable features independently from releases. One way that plays out is to do a planned rollout of a feature and roll back if we notice issues in our metrics. That allows us to introduce features safely.
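The planned-rollout-with-rollback pattern Hein describes is typically implemented as a percentage rollout behind a flag. The sketch below is a generic illustration of that mechanism in plain Python, an assumption about how such gating usually works rather than LaunchDarkly's actual API.

```python
# Generic sketch of a percentage rollout behind a feature flag.
import hashlib

def in_rollout(flag_key: str, tenant_id: str, percent: int) -> bool:
    """Deterministically bucket a tenant into 0-99 and gate on `percent`.

    The same tenant always lands in the same bucket, so a rollout can be
    widened gradually (10% -> 50% -> 100%) or rolled back instantly by
    setting percent to 0, all without a redeploy.
    """
    digest = hashlib.sha256(f"{flag_key}:{tenant_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percent

# Widening the rollout only ever adds tenants; nobody flips back and forth.
assert not in_rollout("new-dashboard", "tenant-42", 0)
assert in_rollout("new-dashboard", "tenant-42", 100)
```

Because bucketing is a pure function of flag and tenant, raising the percentage is monotonic: any tenant included at 30 percent is still included at 60 percent, which is what makes a gradual rollout observable in metrics rather than noisy.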

Responses have been edited for length and clarity. Images via listed companies and Shutterstock.
