
Baseline Metrics Before You Start

If you do nothing else before you launch your Agile Release Train (ART), baseline your metrics! At some point in the not-too-distant future, you are going to be asked: how do you know your Agile Release Train is making a difference? For you the answer might be obvious - it just feels better. It was very much that way for me with my first ART. Metrics weren’t the first indicator that things were getting better; the changes in behaviour were.

When I first took over the EDW delivery organisation, my days were spent dealing with escalations, trying to drum up work for my teams and trying to stem the tide of staff exits. I knew SAFe was making a difference when my phone stopped ringing off the hook with escalated complaints about not delivering, demand started to increase and people were queuing up to join the team, not exit it! The other really telling behaviour change was when our sponsors lost interest in holding monthly governance meetings. Apparently, if you are delivering on your commitments, governance meetings are less interesting to executives! That reminds me of perhaps the most obvious observable change that gave me confidence that we were getting better - the delivery of working software! In the context of my first train this was nothing short of a miracle!

While the changes in behaviour I observed were enough to convince me we were making a difference, management will always want metrics. As Jeffrey Liker said in his book The Toyota Way: “It is advisable to keep the number of metrics to a minimum. Remember that tracking metrics takes time away from people doing their work. It is also important at this stage to discuss the existing metrics and immediately eliminate ones that are superfluous or drive behaviours that are counter to the implementation of the lean future state vision.”

Below I have captured some of the metrics I like to use when launching Agile Release Trains, some of which you may recognise as also being recommended in the Scaled Agile Framework article on Metrics.

Employee Net Promoter Score (eNPS)

As I have written about previously, I first came across this metric when reading up on the Net Promoter System (NPS). NPS is a customer loyalty measurement identified by Fred Reichheld and some folks at Bain. While investigating the drivers of customer loyalty, they determined that: “Very few companies can achieve or sustain high customer loyalty without a cadre of loyal, engaged employees.” Employee NPS is measured by asking the question: “On a scale of 0 to 10, where 0 is not at all likely and 10 is extremely likely, how likely are you to recommend working on [insert ART name] to a friend or colleague?” Those who answer 9 or 10 are classified as promoters, those who respond 7 or 8 are classified as passives and those who respond with a 6 or below are classified as detractors. The NPS score is calculated by subtracting the percentage of detractors from the percentage of promoters. You should be expecting eNPS to increase as a result of launching your ART.
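
To make the arithmetic concrete, here is a rough sketch of the calculation in Python (the survey responses are made up for illustration):

    def enps(scores):
        # Net Promoter Score from a list of 0-10 survey responses:
        # % promoters (9-10) minus % detractors (0-6)
        promoters = sum(1 for s in scores if s >= 9)
        detractors = sum(1 for s in scores if s <= 6)
        return round(100 * (promoters - detractors) / len(scores))

    # Hypothetical responses from a 10-person team
    responses = [9, 10, 8, 7, 9, 6, 10, 5, 9, 8]
    print(enps(responses))  # 5 promoters (50%) - 2 detractors (20%) = 30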

Stakeholder Net Promoter Score (NPS) 

This is my take on NPS for the stakeholders of your ART(s). We use the same approach as outlined above for eNPS, but this time the question is “On a scale of 0 to 10, where 0 is not at all likely and 10 is extremely likely, how likely are you to recommend the delivery services of [insert ART name] to a friend or colleague?” You should also expect to see this go up over time.

Cycle Time 

Once you have your ART(s) up and running you should be able to capture cycle time for features, where cycle time is calculated as the total processing time from the beginning to the end of your process. When we launch ARTs, we usually create a Program Kanban system to visualise the flow of features through the ART. If you track the movement of features through the Kanban, this will give you cycle time data. You should expect to see this decrease once your ART has been up and running for a while.
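
As a simple illustration, here is a sketch in Python of deriving feature cycle time from the dates a feature entered and left your Program Kanban (the dates are hypothetical):

    from datetime import date
    from statistics import mean

    # Hypothetical features: (date work started, date delivered)
    features = [
        (date(2023, 1, 10), date(2023, 3, 4)),
        (date(2023, 1, 24), date(2023, 4, 12)),
        (date(2023, 2, 7), date(2023, 3, 28)),
    ]

    cycle_times = [(done - start).days for start, done in features]
    print(f"Average feature cycle time: {mean(cycle_times):.0f} days")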

Baselining cycle time might be challenging as you probably don’t have epics or features at the beginning of your SAFe journey. In this case, my advice is to measure the cycle time of projects or your current equivalent. Another way I have seen this done is by mapping the development value stream. This can be done very informally, by taking a pencil and a sheet of A3 paper and walking the process, noting the steps, the time each takes to execute and the wait times between each step. You can then revisit this map periodically, updating it and hopefully showing a reduction in cycle time. Alternatively, the SAFe DevOps class includes this exercise, and Karen Martin’s book Value Stream Mapping provides a detailed workshop guide.


Frequency of Release 

This one looks at how frequently your ART delivers outcomes to its customers. It is often articulated as a frequency within a period, e.g. once a year or twice a quarter. For many traditional organisations this is dictated by the Enterprise Release cycle. You should expect to see this increase.

Escaped Defects

This is a count of defects that make it to production, or “escape” your system. Two common approaches are to capture the number per release or the number per time period.  You should expect to see this decrease.

In a code base with a lot of technical debt, you may find that your identification of defects increases in your early PIs as teams become more disciplined about recording the defects they find while working on new features. Ideally these defects would be fixed as they are discovered, but our view is that if fixing a defect would require enough work to impact the specific feature or sprint objectives being worked on at the time, the team should record the defect for future prioritisation. This way the entire team can see the defect and the Product Owner can make a responsible decision about prioritising it while maintaining balance with the committed objectives.

Test Automation

If your Agile Release Train is software related then you will want to baseline your level of test automation. This is your total number of automated tests as a percentage of your total number of tests (manual and automated). Some organisations will start at zero and may take some time to get started with test automation, but keeping it visible on your list of metrics will help bring focus. Of course, we are looking to have the percentage of automated tests increase over time due to both the creation of automated tests and the removal of manual tests.
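
The calculation itself is trivial, but a quick sketch in Python (with hypothetical test counts) makes it explicit:

    def automation_coverage(automated, manual):
        # Automated tests as a percentage of all tests (manual + automated)
        total = automated + manual
        return 100 * automated / total if total else 0.0

    print(automation_coverage(automated=120, manual=480))  # 20.0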

Ratio of “Doers” vs “Non-Doers”

Another interesting metric to baseline and track is the number of people “doing the work” as a proportion of all the people working in the department. “Doers” tends to be defined as the people who define, build, test and deploy (e.g. agile teams), making everyone else a “non-doer”. You should expect to see the ratio of doers increase as your ART matures.

Market Performance

If your ART is aligned to a Product or Service monetised by your company, you might also find it interesting to baseline the current market performance of that product or service. Some examples include Volume of Sales, Services in Operation and Market Share. In a similar, but perhaps more daring, approach the folks at TomTom used Share Price to demonstrate the value of SAFe in the Agile2014 presentation Adopting Scaled Agile Framework (SAFe): The Good, the Bad, and the Ugly.


Some metrics you won't be able to baseline before you start, but you can start tracking them once you begin executing your first program increment.

Cost per Story Point

It is likely that you won't be using SAFe’s approach to normalised estimation prior to launching your ART, so you probably won't be able to baseline this one before you start. However, to be able to do this once you have started, you will need to be using normalised estimation, know the labour cost of your ART and determine your approach to capturing “actuals”. For a more detailed explanation of the Cost per Story Point calculation check out: Understanding Cost in a SAFe World.

Almost any cost-based metric will be difficult to prove and you will almost certainly be asked how you calculated it. One of the ways I have backed up my assertions with respect to reduced cost per story point is by triangulating the data to see if other approaches to calculating cost reduction yield similar results. One such “test” is to take a “project” or epic that was originally estimated using a traditional or waterfall method and look at the actual costs after delivering it using SAFe. While by no means perfect, it may help support your argument that costs are decreasing.

Program Predictability Measure

In addition to the self-assessments, SAFe offers the Program Predictability Measure as a way to measure agility. Personally, I see this more as a measure of predictability than agility, but then again I don’t believe in trying to measure agility.

This seems to be one of the most commonly missed parts of SAFe. To be able to calculate it you need to capture the Business Value of the PI Objectives at PI Planning. Sometimes this gets skipped due to time pressure and other times the organisation deliberately skips it because it is perceived as “too subjective”. Of course it is subjective, but I figure this is mitigated by ensuring the people who assign the Business Value at PI Planning are the same people who assess the actual value as part of the Inspect & Adapt.

The other trap I see organisations fall into is changing the objectives and the business value during the PI, because it is not the teams’ fault that “the business” changed their mind. Correct! It is also not the teams’ fault when the system is unpredictable! If you stick to using the objectives from PI Planning, the Program Predictability Measure will reflect the health of the entire system - both the teams’ delivery on their commitments and the business’s commitment to the process. If you change the objectives, you no longer have a measure of predictability!
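
To illustrate, here is a minimal sketch in Python, assuming the measure is calculated as the ratio of actual to planned Business Value for each team’s PI objectives and rolled up across the ART (the team names and numbers are hypothetical):

    teams = {
        # team: (planned Business Value at PI Planning, actual value at Inspect & Adapt)
        "Team A": (28, 24),
        "Team B": (35, 33),
        "Team C": (22, 15),
    }

    for team, (planned, actual) in teams.items():
        print(f"{team}: {100 * actual / planned:.0f}%")

    total_planned = sum(planned for planned, _ in teams.values())
    total_actual = sum(actual for _, actual in teams.values())
    print(f"ART predictability: {100 * total_actual / total_planned:.0f}%")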

Armed with all the above metrics, it is my hope that you will be able to avoid the dreaded Agile Maturity metric.

Agility (or Agile Maturity)

I have yet to see an approach to this that doesn’t require teams to be “assessed”. In my view the only valuable agile assessment of a team is a self-assessment. If the results of a self-assessment become a “measure” of agile maturity, the learning value of the exercise will likely be lost. After all, as the popular proverb says: “What gets measured, gets done.” So please, whatever you do, don’t try to measure Agile Maturity by assessing teams. Instead focus on NPS, cycle time, escaped defects and test automation. Moving these numbers is a sign you are headed in the right direction.

In a textbook SAFe implementation, Lean Portfolio Management allocates a budget to each Value Stream and consequently each Agile Release Train (ART). The ART’s Product Manager works with the ART’s stakeholders to prioritise the work that consumes that budget. The ART plans and executes against these priorities and no one worries about how much it costs to deliver any specific feature. However, there is often a difference between the ideal SAFe implementation and your current reality, and one of those differences can be an expectation that the ART can articulate the cost of delivering a given feature. This is especially likely to be true if your ART is inside an organisation that still uses project-based funding.

Organisations can be quick to respond to this challenge the same way they have in the past: by asking individuals to fill out timesheets with specific project and activity codes. In a world where we want delivery to be a team accountability and estimation to be in story points, this feels like a huge step backwards. So what might an alternative look like if we were to use information we already have readily available and minimise the overhead on the teams of collecting data solely for costing purposes?

An approach I have had a lot of success with is the cost-per-story-point model. The idea is that I know the cost of the people working on the ART and I know the historical velocity of the ART; therefore I know the cost per story point. If I then want to understand the “cost” of a feature, I can take the normalised point estimate from PI Planning and multiply it by the cost per story point, and I will have a pretty good approximation. If I want to understand the “actual” cost of a delivered feature, I can ask the teams to flag any significant surprises they had during delivery that meant more or less effort was required than was estimated at PI Planning.
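
In outline, the model is just a couple of divisions and multiplications. Here is a minimal sketch in Python; the labour cost, velocity and estimate figures are hypothetical:

    # Hypothetical inputs for one Program Increment (PI)
    art_labour_cost_per_pi = 1_200_000   # fully loaded cost of everyone on the ART
    art_velocity_per_pi = 1_500          # story points historically delivered per PI

    cost_per_point = art_labour_cost_per_pi / art_velocity_per_pi   # $800 per point

    # Approximate cost of a feature from its normalised PI Planning estimate
    feature_estimate_points = 60
    print(f"Approximate feature cost: ${feature_estimate_points * cost_per_point:,.0f}")  # $48,000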

As simple as all this sounds, there are some nuances that may or may not be material but will certainly make your numbers more defensible. If you are geeky like me, you might find this whitepaper useful in building a robust cost-per-story-point model for your Agile Release Train.

Understanding Cost in a SAFe World

Over the past few years much has been written and tweeted about the evils of agile estimation (#noestimates). There has also been much consternation amongst agilists with respect to SAFe’s normalised estimation approach. However, for most of my large enterprise clients the need to estimate for the purposes of planning is a practical necessity, and SAFe’s normalised estimation is a useful tool when used as intended. Given this, I have chosen to put the debates about the evils of estimation and normalised story points to one side and instead focus on how we might help teams and Agile Release Trains (ARTs) become more predictable by improving their approach to forecasting with velocity, where velocity is defined as the number of story points delivered by a team or train in a sprint or program increment.

When planning, agile teams generally use “yesterday’s weather” to predict their velocity/capacity for the next sprint (or sprints, in the case of SAFe’s PI Planning). The idea is that yesterday’s weather is the best predictor of today’s weather; applied to agile, this is taken to mean that last sprint’s velocity is the best predictor of next sprint’s velocity, where team velocity is the total story points delivered by a team (planned and unplanned work). Of course, this approach does not take into account that a team's capacity is not equal from sprint to sprint, as it is affected by both planned and unplanned leave. I have observed that ARTs tend to solve for this by re-setting their capacity every sprint using SAFe’s normalised estimation approach - which drives me batty!

Frustrated by the misuse of SAFe’s normalised estimation approach, I started to play with the concept of weighting velocity to improve the accuracy (not precision) of team and train planning and forecasting. I have come to call my approach weighted velocity. It is a way of adjusting velocity information so that it more accurately reflects capacity. This entails collecting attendance data for a team or ART and expressing it as a percentage of the team's normal capacity.

For example, if a team usually has 8 full-time team members and one was on leave for 5 days of the 10-day sprint, then the team’s percentage attendance would be 93.75% (75 days/80 days). I then take the velocity for the same team (or ART) for the same period and divide it by the percentage attendance. For example, if the velocity for the given sprint was 45 points and the percentage attendance was 93.75%, then the weighted velocity would be 48.

The same approach can be used to address the impact of overtime on velocity. (Before all you agilists out there go on the attack, we all know there should not be overtime on an agile team!! I tell my clients this all the time, but sometimes these things still happen…) For example, say the team has 8 full-time team members and they all came in for a half day on a Saturday. The team would have attended 84 of 80 days, making their percentage attendance 105%. If the velocity for this sprint was 50 points, the weighted velocity would be 48 (i.e. 50/105%).

By always weighting the team’s (or ART’s) velocity we remove the variation caused by planned and unplanned leave, providing a more realistic view of yesterday’s weather for planning and forecasting. Perhaps more simply, we are reverse engineering what the velocity would have been if the team had been at full capacity (100%) for the sprint. Of course, when using weighted velocity as yesterday’s weather for planning purposes, I suggest taking an average over 4 or 5 sprints, then adjusting the number down for any planned leave using the same percentage-of-attendance approach used above.
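
Pulling the examples above together, here is a minimal sketch in Python of the weighted velocity calculation and a forecast adjusted for planned leave (the recent velocities and leave figures are hypothetical):

    def weighted_velocity(raw_velocity, days_attended, days_available):
        # Reverse engineer what velocity would have been at 100% attendance
        return raw_velocity / (days_attended / days_available)

    # The leave example: 8 people, 10-day sprint, one person away for 5 days
    print(round(weighted_velocity(45, days_attended=75, days_available=80)))  # 48

    # The overtime example: everyone works an extra half day (84 of 80 days)
    print(round(weighted_velocity(50, days_attended=84, days_available=80)))  # 48

    # Forecast the next sprint: average recent weighted velocities,
    # then adjust down for known planned leave
    recent_weighted = [48, 46, 50, 47]
    planned_attendance = 70 / 80   # two people away for 5 days each next sprint
    print(round(sum(recent_weighted) / len(recent_weighted) * planned_attendance))  # 42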

I have also found weighted velocity to be exceedingly useful when building cost per story point models, but that is a topic for another blog!

As a side note, on the topic of improving the fidelity of estimation, something else I have always found useful is asking teams to reflect on their estimates from sprint planning when they reach the end of the sprint, perhaps as part of their sprint retrospective. What I ask teams to look for is where they feel there was significant variance between the initial estimate and the actual effort involved in completing the story. Where these variances occur, I suggest the team has a discussion about why they think the estimate varied (i.e. what did they learn through the delivery of the story) and how they might use this learning in future planning sessions.
