Read on for highlights from the conversation, and watch the full webinar on demand here.
In an effort to understand the markers of a productive engineering team, the DORA (DevOps Research and Assessment) group, founded by Dr. Nicole Forsgren, Jez Humble, and Gene Kim, designed their research to answer core questions about DevOps:
Does technology matter for organizations? If so, how can we improve software delivery and engineering performance?
Through rigorous academic and analytical research, the group was able to demonstrate that improving software delivery and engineering performance leads to increased profitability and customer satisfaction. They identified four key metrics, now known as the DORA metrics, which address the areas of engineering most closely associated with success, and established benchmarks, enabling engineering teams to improve performance and balance the speed and stability of their software delivery.
How can organizations get started with DORA metrics and turn those insights into action? Code Climate’s Director of GTM Strategy and Intelligence, Francesca Gottardo, sat down with DORA expert Nathen Harvey to discuss how leaders leverage DORA metrics to improve engineering team health and truly drive change in their organization.
Francesca Gottardo: How did the DORA team choose between metrics that measure the quality of software versus how quickly it was getting shipped?
Nathen Harvey: Delivering technology can enable and accelerate any business. We all want to accelerate the delivery of that technology to enable great customer experiences.
So we look at two metrics: Deployment Frequency, how frequently are you pushing changes out to your users? And Lead Time for Changes, how long does it take for code to go from committed to actually in the hands of your users? The challenge there is that moving fast is good, but not enough, so we also have two stability metrics that complement that. The stability metrics are your Change Failure Rate, what I like to call the ‘Oh, Expletive’ rate — when you push a change to production and someone shouts out an expletive – and Time to Restore, how do we as a team quickly respond and recover when there is an incident or an outage that impacts our users?
That traditional thinking leads us to believe that these two are trade-offs of one another: we can either be fast or we can be stable. But what the data shows us over almost a decade now, is that these two ideas move together in lockstep. There are teams that are good, are good at all four metrics and the teams that are performing poorly are performing poorly across all four metrics as well.
Francesca: Who are DORA metrics for? Are they best suited for a specific type or size of company?
Nathen: We've seen teams of all shapes and sizes using DORA and insights from it successfully. But there were also some challenges there. First, measuring those metrics at an organizational level doesn't really give you a whole lot of value. How frequently does Google deploy? A lot, but what are we going to learn from that?
We really want to look at an application or a service, a particular set of capabilities, if you will, that we deliver to our customers, so first we have to measure at that level. And let's also make sure that we're using or getting insights across the entirety of the team that's responsible for prioritizing, building, deploying, and operating that application or service — it often takes a cross-functional team focused on one application or service.
The technology really doesn't matter. You can use those four metrics to look at how you’re doing with the custom application that you’re building for customers, but you could also do that for the commercial off-the-shelf software that you're using to deliver to your customers, or a SaaS that you're using.
Francesca: Is there a specific type of view or a few specific metrics that a leader of a few teams should look at?
Nathen: From a leadership perspective, I think the best insights you can get from the DORA metrics are just really to understand how your teams are doing. But here's the pitfall there: you're not using this to weigh teams against one another. Instead, what DORA really tries to get at is embracing a practice and a mindset of continuous improvement.
You might want to look across your teams to understand how each team is doing, find those teams that are doing really well, and identify what lessons you can learn from that team. Of course, the context is so important here. If you're shipping a mobile application or if you're working on the mainframe, we can use those same four measurements, but we don't expect the values are going to be the same across those teams. As a leader, I think that there are really good ways to have insight into what sort of investments you need to make in the team, and what sort of support each of your teams need.
Francesca: What are some other common pitfalls you find when people start using DORA metrics?
Nathen: The biggest one is this idea that we have to reach peak performance. Really, the goal is improvement. Don't worry about how other teams are doing. It's nice to have a benchmark to understand where you sort of fit, but the more important thing is, how do you get better? In fact, looking at the four metrics, it's difficult to say, ‘How do I get better? My Deployment Frequency isn't what I want, so I need to get better.’ You don't get better just by mashing the deploy button more frequently. That's not the right approach. The research actually goes a little bit deeper beyond those four key metrics into some capabilities — practices or tools or things that your team does regularly.
The capabilities that the research investigates are technical capabilities like version control, continuous integration, and continuous testing. There are also process capabilities: How much work do you have in-flight at any given time? Maybe shrink down your amount of work-in-progress. What does that change approval look like? Focusing on that change approval process is maybe the thing that's going to unlock value.
Most important of the capabilities are the cultural capabilities. How do the people in your team show up? How do they communicate and collaborate with one another? How are they rewarded? What's incentivized? All of these things really matter, and DORA is really about taking that comprehensive view of what capabilities a team needs to improve in order to drive those four metrics.
Francesca: What is the starting point that you recommend leaders look at?
Nathen: One of the beautiful things about these four metrics is thinking about them holistically. You may want to improve Deployment Frequency, but do you know how you're going to get there? You're going to make your lead times shorter. You're going to make your Change Failure Rate go down and you're going to restore service faster. It doesn't matter which one you focus on; changes are likely going to have good impacts across all four, and we really encourage you to look at all four as a whole.
How do you get started from there? You really then need to go deeper into the capabilities. Start with the capabilities where your teams have a lot of opportunity for growth. It's really about finding your constraint and making improvements there.
Francesca: And you would say as you're measuring that opportunity for growth, it's really relative to the benchmark, correct?
Nathen: Oh, absolutely. Let's say that continuous integration popped up as the thing that you should focus on. Now we have to figure out how we get better at continuous integration. Let's go put some new things in place. Those new things might be new measures, so we can test how well we're doing with continuous integration. There's certainly going to be new practices, maybe even new technologies, but after you've made some of that investment, you have to go back to those four key metrics, back to the benchmarks. Did this investment actually move those metrics in the way that we expected it to?
Francesca: Sometimes leaders can have a hard time getting buy-in for new forms of measurement, or the individual developers on a team have seen a lot of flawed measurement and can be skeptical. How do you suggest that leaders get their teams on board to be measured like this?
Nathen: Yeah, I don't like to be measured either. I get it. I think honestly, the best way to help teams get on board with this is for leaders to share the idea of these metrics and then step out of the way and give the teams the autonomy that they need to make the right choices. If a leader comes to me and says, ‘I'm going to measure your team's performance based on these four metrics,’ that's fine, but what I don't want that leader to do is tell me exactly how to improve those four metrics, because the leader isn't attached to the daily work of our team. But if that same leader says, ‘These are the metrics by which you'll be measured and we want to improve these metrics, what can we do?’ Now, as a team, we’ve been given that trust and the autonomy to select where we should invest and what we should do. A leader's job really is to support that investment, support that learning of the team.
Francesca: How could you ensure that you're comparing kind of apples to apples as you're looking at DORA metrics for teams that may be looking or working in different platforms?
Nathen: You are in fact comparing apples to oranges, and so the thing that I encourage folks to do is celebrate the teams that make the most progress. Maybe you can get to a derivative: This team increased their Deployment Frequency by 10%, this team increased their Deployment Frequency by 50%. Maybe that 50% team went from annual deployments to twice a year, but that's still a 50% improvement, and that's worthy of celebration. I think really looking at the progress that you're making instead of the raw numbers or that sort of benchmark data is the best way to go.
Francesca: One thing I've heard is that it's really important for teams to improve to a higher performance bucket, rather than stay within that bucket.
Nathen: We put out an annual report and people are hungry for benchmarks, and they really want those benchmarks and want to understand how they measure up to peers, to others in the industry. And each year, we do a cluster analysis of those four key measures, and these clusters emerge from the data. We don't set in advance what it means to be a low performer or medium performer. We let the data answer that question for us, but then we have to put labels on those clusters to make them consumable by a leader and by teams, and unfortunately, we use labels like low or medium or high or elite.
Nobody wants to be a low performer. It's not very encouraging to show up to work as a low performer. But I try to encourage folks to recognize that this is not a judgment and maybe just discard the label; it's about that improvement. How are you making progress against that? As you're making changes, you're likely to have some setbacks as well.
In 2020, we did an investigation into reliability practices, and we saw that some teams, as they began their journey changing some of their reliability practices, the reliability of their systems dropped. But over time, as they stayed committed and got more of their teams involved and more of the practices honed within their team, they saw this J curve illustrating impact across the team. So I think the important takeaway there is that this requires commitment. We're asking people to change process and technology. It's going to take some commitment.
Francesca: DORA metrics have had a huge impact in this space and are a popular starting point for taking a data-driven approach. What are your thoughts on how popular they’ve become?
Nathen: It's really exciting for me and for my team, and of course for the researchers, to see that it's had such a lasting and big and expanding impact on the industry. I think that it is important, though, to remember that the research is focused on that process of software delivery and operations. Oftentimes people ask about developer productivity or developer experience. This isn't particularly measuring that, although I would say that a developer is going to have a much better experience knowing that the code that they wrote is actually in the hands of users.
So it's not a direct measure there, it is an outcome of that process. When it comes to any sort of metric that we are looking at, it is important to remember which of these measures are inputs, which of these measures are outcomes. Even something like software delivery as an outcome is an input to organizational performance. It's really important just to understand the full context of the system, which of course includes the people in the system.
Francesca: If you're looking at DORA metrics in a tool like ours, there's also the context available so that you can have those conversations upward and people aren't going to be using that data in the ways that it wasn't intended.
Nathen: Absolutely. And with tools like Code Climate Velocity, you can go beyond those four keys. What are the inputs that are driving that? As an example, what is the quality of the code that's been written? Is it following the practices that we've set within our team? How long does a peer review of this code take? All of these things are really, really important and drive those overall metrics.
Francesca: We've seen that Deployment Frequency really is closely related to PR Size. So that's a great place to look first.
Nathen: Yeah, I think that one in particular is interesting because those four measures really, I think what the researchers really wanted to measure were batch sizes.
But how do we ask you, ‘What's the size of your batch?’ Smaller for you might be a medium for me. So those four metrics can really be used as a proxy to get at batch size and you're going to improve if you make that batch size smaller.
What is the size of our PRs? We can actually look across teams and say, ‘This team that has large PRs, lots and lots of code changes, they tend to go out slower.’ We could also start to look at things like from the time the change is committed, that lead time, what does it do for our Change Failure Rate? We've worked with customers who can pull out data and show us on a dashboard that the longer this change takes to get to production, the higher chance it's going to fail when it reaches production.
Francesca: Can you talk a little bit more about the importance of having metrics be standardized or making sure that Deploy Frequency, for example, means the same thing to everybody in the organization?
Nathen: I think it can be a real challenge, and I think that one of the values of DORA is that it gives us a shared language that we're communicating with one another. Deployment Frequency is a really interesting one. Of course, it's just how frequently you’re pushing changes out to your customers, but then there can be a lot of nuance.
The most important thing there is that you have consistency over time within a team. And then the second most important thing is across an organization as you're looking across teams, even if you can't get to a consistent definition, at least you can publish or write down and probably store in version control, how are we measuring this thing so that it's clearly communicated across those teams?
Francesca: So how do people connect something like Deployment Frequency and Lead Time to higher levels of work? For example, a story, a project, or business feature where you're really delivering that end-user value.
Nathen: DORA metrics are really focused on this idea of software delivery. We are looking only at code commit through to code deploy, but of course there's a lot of stuff that happens before we even get to code commit. Teams want to know things like feature velocity. How fast am I able to ship a feature? That's a different question than ‘How fast am I able to ship a change?’ because a feature is likely multiple changes that get rolled up together. This is where other metrics frameworks, like the flow metrics, might start to come in, where we look at a broader view of that entire value chain. And I think that it can be very difficult. Is a feature brand new thing that we're launching from fresh, or is a feature changing the location of this particular button? They're both features, they both have very different scales. One of the reasons that the DORA research really focuses on that software delivery process is it gives us a little bit more sort of continuity. A change, is a change, is a change, is a change. If we're shipping a change, it should follow the same process. There should be less variability in the size of that or in the duration of how frequent or how long that takes.
Francesca: A lot of the questions that we've been getting from the audience are more about digging deeper into the context of each of these metrics because they're very big picture outcomes. So let's take Mean Time to Recovery, for example. How do you suggest digging into this one?
Nathen: I think the best thing to do is look at something that just happened. So let's say you just had an incident or an outage, something that you recovered from. First and foremost, make sure you're recovered, make sure your users are happy again. Now that we're there, let's take some time to learn from that incident or from that outage. And that's where the investment really starts to take place.
One of the first things we have to do is go talk to people. We have to understand your mindset during this incident or outage, and really try to unwind what led to this, not in a way that we're looking for what things we should blame, but instead just to get a better understanding of the system overall. Let's ask really good questions, and involve the right people in those conversations.
Francesca: Your research is in the abstract, but when you’re seeing DORA metrics practices in actuality, what were some surprises?
Nathen: What I so often see is a thing that you mentioned earlier: it really comes down to the process and the people. The people really matter.
The truth is that as we're trying to change culture and the way that teams and people show up at work, it is oftentimes the case that you have to change how they work to change how they think. Technology and culture are kind of stuck together. You can't just change out a tech and expect that the culture's going to change, nor can you just change the culture and expect that the technology's going to follow. These two amplify and reinforce one another. I think that we're reminded again and again and again that there is not a magic wand. There are no silver bullets. This takes consistent practice. It takes commitment and it takes looking at the entire system if you want to improve.
Francesca: If the need and desire for measuring DORA isn't coming from leadership, how do you suggest a team goes about implementing it?
Nathen: In my opinion, these measures matter for the teams that are building the software. And in fact, I don't mind if a leader isn't pushing me to measure DORA metrics. What I really want is for the team of engineers, the team of practitioners that come together to ship that software, to care about these metrics. Because at the end of the day, the other thing that we know is that these teams are more productive when they're able to hit those metrics. And a productive team is a happy team.
I often ask this question: is a happy team productive or is a productive team happy? I think the answer is yes, right? As an engineer, when I'm productive, when I'm able to be in the flow of my work and get fast feedback on the work that I'm doing, that makes me happier. I have better days at work. There's even research from GitHub that looks exactly at that. What does a good day look like? It's when that engineer, that developer is in the flow doing the work that they love to do, getting that fast feedback. So these metrics really matter for the team.
Francesca: It's what we see in how people use our tool as well, and that so often people or customers will come to us either wanting to fix some inefficiencies in their SDLC or wanting to improve team health, and they may be one and the same when you're really looking at the big picture.
Nathen: Absolutely. I don't know of any CEO who has come to a team and said, ‘Wow, you've deployed more frequently this year. Congratulations. Here's a big bonus.’ The CEO cares about the customer. These metrics can help reinforce that. As technologists, it's easy for us to get caught up in the latest, greatest new technology, this new microservices framework, this new stuff from AI. But at the end of the day, we're here to deliver value to our customers and really understand what they need out of this. That's what our CEO cares about. Frankly, that's what we should care about. We're using technology to further those goals and to keep our customers happy.
Francesca: In the polls, 26% of people said that they have philosophical or cultural barriers to implementing DORA. If leadership doesn't see it as a priority, how can managers still motivate the team?
Nathen: I think that one approach that is successful is to — and this pains me a little bit — stop using the word DORA. Stop using the word DevOps. Don't talk about those things, don't talk about those labels. Turn to a more curious mindset and a questioning mindset. What would it be like for our customers, for our organization, for our team, if we were able to deliver software faster? Imagine that world because we can do that. We can get there and do that in a safe way. Imagine or ask the question, ‘What happens if we don't improve? What happens if we stay stagnant and our practices aren't improving?’ And really start at digging into some of those questions to find that intrinsic motivation that we have. As an engineer, like I said, I want to ship more things. I want to get feedback from that faster. I want to do the right thing for my customers.
Francesca: What is the future of DORA? Are there additional metrics you think of exploring or avenues outside of stability, outside of speed, that you think are important to include in the future?
Nathen: Yeah, absolutely. I think this is really where we get back to those capabilities. Number one, we will continue this research and continue our ongoing commitment to that research being program and platform-agnostic, really trying to help teams understand what capabilities are required to drive those metrics forward. Number two, like you mentioned earlier, we're seeing more and more teams that are trying to use the DORA framework and the metrics and the research to drive those improvements. In fact, we've recently launched dora.dev as a place where the community can start to come together and learn from one another and collaborate with one another and answer these questions.
Don't ask Nathen, let's ask each other. Let's learn from each other and the lived experiences of everyone here. And then, of course, I mentioned that the research will continue. So stay tuned. In the next couple of months, we'll launch the 2023 state of DevOps survey. And the best thing about the survey is the questions that it asks. And I think that teams can really get a lot of value by carefully considering the questions that are posed in the survey. Just by looking at those questions, a team is going to start to identify places where they could make investments and improvements right now.
Francesca: Something that is going to stick with me from what you've said today is that the benchmarks that are standardized may not be what we should all be spending so much time on. It's really about how we compare to ourselves and where we can improve.
Nathen: It is that practice and mindset of continuous improvement. Now, look, the benchmarks are still important because we have to have dashboards, we have to have ways to report, we have to have ways to test that what we're doing is making things different. Whether that's good, better, or worse, at least these benchmarks give us a way to test that and prove out the theories that we have.
To learn how your engineering team can implement DORA metrics to drive improvement, speak to a Velocity product specialist.
Subscribe to our newsletter for more engineering intelligence.