“Always show your working out!” was the mantra of my maths teacher in senior school. This series of blog posts “On the Nature of Lean Portfolios” is an exploration of Lean Portfolios. It is the thought processes running through my mind, exploring the possibilities so that I understand why things are happening rather than just doing those things blindly. It is not intended to be a fait-accompli presentation of the Solutions within Lean Portfolios but an exploration of the Problems to understand whether the Solutions make sense. There are no guarantees that these discussions are correct, but I am hopeful that the journey of exploration itself will prove educational as things are learnt on the way.
Portfolio Use of Weighted Shortest Job First
The standard advice within SAFe is that when it comes to prioritisation use Weighted Shortest Job First (WSJF). It is a useful simplification to stop Cost-Of-Delay discussions getting unnecessarily complex, but there is more depth to the mechanic than people would assume from reading the SAFe webpage; depth that I’ve previously explored in a blog on The Subtleties of Weighted Shortest Job First.
Weighted Shortest Job First works at the Release Train level because the timeboxes, the Program Increments, exist. The challenge is at the Portfolio level, where Epics can last for long periods of time; there I suspect that the “Shortest Job First” part of “Weighted Shortest Job First” is going to cause some problems. By its very nature it’s going to make it difficult to balance Short Term wins against Long Term investment, with the long term investment losing. Part of the reason for writing this blog series On The Nature Of Portfolios was to explore this very issue.
What follows over the next few postings is a series of “experiments” to explore the topic and look at the issues that arise. We’ll look at:
- WSJF using Total Epic Effort
- WSJF using Predicted Duration
- WSJF using Experimental Effort
- WSJF; what would Don do?
- WSJF considering Risk factors
- WSJF within the Investment Horizons
WSJF using Predicted Duration
The first experiment looked at what happens when WSJF uses Total Epic Effort; as expected, WSJF favoured the short term wins at the expense of long term investment. The challenge is that some of those long term investments are important and need to be worked upon; is there anything that can be done to get those investments prioritised?
A second experiment: to allow for swarming, instead of using Total Effort we will see what happens when we use Predicted Duration for the Effort parameter.
The numbers going into a WSJF can be very subjective, so some notes on the thinking that led to the above:
- For BV, TC and RR|OE the logic remains the same as the previous experiment.
- For the Regulatory Epic the duration is less because a number of Value Streams/Trains are running in Parallel (see footnote #1); the swarming tactic.
The duration estimates have to assume that the Epic has first call on the engineering effort within the Value Streams/Trains, so that it can be delivered in parallel across all of them. Swarming doesn’t come naturally.
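As a rough sketch of the mechanic, assuming the usual SAFe formulation where WSJF is Cost of Delay (BV + TC + RR|OE) divided by job size: the figures below are invented purely for illustration and are not the numbers from the experiment, and the `Epic` class, its field names and the "Quick Win Epic" are hypothetical. It shows how swapping Total Effort for Predicted Duration can change the score of a swarmed Epic.

```python
from dataclasses import dataclass


@dataclass
class Epic:
    name: str
    bv: int                  # Business Value (relative score)
    tc: int                  # Time Criticality (relative score)
    rr_oe: int               # Risk Reduction | Opportunity Enablement (relative score)
    total_effort: int        # summed effort across all Value Streams/Trains
    predicted_duration: int  # elapsed time, assuming the trains swarm in parallel

    def cost_of_delay(self) -> int:
        return self.bv + self.tc + self.rr_oe

    def wsjf_by_effort(self) -> float:
        return self.cost_of_delay() / self.total_effort

    def wsjf_by_duration(self) -> float:
        return self.cost_of_delay() / self.predicted_duration


# Hypothetical figures only; not taken from the experiment
epics = [
    Epic("Regulatory Epic", bv=8, tc=13, rr_oe=5, total_effort=20, predicted_duration=5),
    Epic("Quick Win Epic", bv=5, tc=3, rr_oe=2, total_effort=2, predicted_duration=2),
]

# Rank by the duration-based score, highest first
for epic in sorted(epics, key=Epic.wsjf_by_duration, reverse=True):
    print(f"{epic.name}: by effort {epic.wsjf_by_effort():.2f}, "
          f"by duration {epic.wsjf_by_duration():.2f}")
```

With these made-up numbers the Regulatory Epic ranks below the quick win when sized by Total Effort, and above it when sized by Predicted Duration, which is the effect the swarming assumption is meant to produce.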
Observation: Swarming requires the right conditions
Teams may need to be guided to swarm together on certain pieces of work. This can be easily communicated in the strategy cascading down into the Value Streams/Trains, and Product Management can add notes to the Features to remind teams to swarm. Notes are less strict than Acceptance Criteria; they can act as a reminder to steer the team in PI planning without actually telling them what to do, which would break empowerment and self-organisation. Notes are often used to communicate that the team planning the Feature needs to discuss it with certain people, e.g. the Compliance Representative; here the notes are being used to communicate that the Teams can swarm around the Features, and that the planning team can use their empowerment to negotiate with and organise the other teams.
What works against swarming is badly formed capacity allocations. It’s not uncommon for organisations to Ring-Fence capacity in order to ensure that people work on certain explicit pieces of work. Taken too far, the capacity allocations limit teams’ ability to self-organise because they are bound to deal with work in the capacity allocations rather than the most valuable work; they could end up working on nice-to-haves that are part of the capacity allocation while the must-haves outside the capacity allocation aren’t getting done. Limiting the ability to self-organise limits the ability to swarm around work.
This isn’t to say that capacity allocations are bad; they are useful tools, but they should be used to indicate the balance of work, e.g. Architecture vs Business split or the Investment Horizons. If they become too granular and are targeting individual pieces of work then they start to impact self-organisation because the central point is now dictating exactly what to work on rather than indicating what the balance of work should be.
Observation: Feature construction and Team Skills
Features need to be well constructed: write them as Business needs; don’t pre-slice them.
Teams need to have the right skills to swarm around Features.
This isn’t a Stream (Feature) orientated team vs Sub-System (Component) orientated team problem. In fact if a Feature is written as a Business need that cuts across the landscape of sub-systems, but the teams are sub-system orientated, then the teams must collaborate to get the work done and that will be mapped out in PI Planning. Swarming is harder to achieve in Stream orientated teams because the team could do the work on their own, but everyone collaborating and doing a bit will get it done quicker.
Where collaboration is occurring the challenge is to ensure that the teams are swarming and working in Parallel to get the work done quicker, rather than playing the relay race where one team hands on to the next, then the next; the sequential handoffs delay completion. The Program Board is a key tool in visualising the collaborations and identifying if any bad patterns are emerging.
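The difference between swarming and the relay race can be sketched with simple arithmetic. This assumes an idealised case where each team’s slice of a Feature is independent of the others; the team names and week counts are hypothetical.

```python
# Hypothetical weeks of work each team needs for its slice of one Feature
team_weeks = {"Team A": 3, "Team B": 4, "Team C": 2}

# Relay race: each team waits for the previous one, so elapsed time is the sum
relay_duration = sum(team_weeks.values())

# Swarming: teams work in parallel, so elapsed time is set by the slowest slice
swarm_duration = max(team_weeks.values())

print(f"Relay race: {relay_duration} weeks, swarming: {swarm_duration} weeks")
```

The total effort is identical in both cases; only the elapsed time changes. Real work rarely parallelises this cleanly, so treat the swarming figure as a best case.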
Observation: Duration ≠ Effort
Effort is not always a good approximation for duration. Conversely, duration is not always a good approximation for effort.
By way of example:
Imagine a business change that involves a mix of a few months of software engineering effort and 2 years of physical construction of a new data centre. The physical construction is all the concrete, steel, and cabling and will be done by external resources, i.e. a civil engineering firm. We don’t want to separate the two pieces because the engineering effort is worthless if it hasn’t got a Data Centre to run on and the Data Centre is worthless if there’s no software for it to run.
Due to the physical construction the duration is long compared to the engineering effort, and WSJF would never prioritise something with such a long duration.
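To put rough numbers on this, here is a minimal sketch that scores the same Epic twice, once sized by duration and once by effort. The Cost of Delay score and the reading of “a few months” as three months are assumptions made for illustration, and `wsjf` is a hypothetical helper.

```python
def wsjf(cost_of_delay: float, job_size: float) -> float:
    """Cost of Delay divided by job size, the usual WSJF ratio."""
    return cost_of_delay / job_size


# Hypothetical figures: a relative Cost of Delay score and sizes in months
cost_of_delay = 24
software_effort_months = 3   # "a few months" of engineering effort (assumed value)
total_duration_months = 24   # 2 years of physical construction

score_by_effort = wsjf(cost_of_delay, software_effort_months)
score_by_duration = wsjf(cost_of_delay, total_duration_months)

print(f"Sized by effort: {score_by_effort:.1f}, sized by duration: {score_by_duration:.1f}")
```

Under these assumptions the Epic scores eight times higher when sized by effort than when sized by duration, even though nothing about the Epic itself has changed.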
Using Effort would prioritise the work sooner but causes its own issues. We could do all the work on the software early and be waiting almost 2 years to see the results running on the Data Centre. Building the Data Centre and then the Software in sequential fashion doesn’t work either; because, what if the Software doesn’t work? If it doesn’t work then we don’t need to spend money building a Data Centre; the Epic should be cancelled as quickly as possible.
Lean Startup mechanics to the rescue again. Lean Startup should be prioritising de-risking some of the software activities ahead of Data Centre construction to prove that the software will work. After that, physical construction can begin, with the remaining software delivery ideally scheduled to coincide with completion of the physical construction.
Experimentation needs timeboxes to limit the experiments to ensure that some results are returned and to use those results to influence the work and priorities in the next timebox; which is the Do Not Ignore Insight Gained mentioned in the previous blog.
Observation: Cancellation & Approvals
Running alongside the WSJF needs to be a Cancellation & Approvals process because WSJF just prioritises; it doesn’t make decisions about starting work, and more importantly it doesn’t make decisions about stopping work.
Cancellation is deliberately listed first. An aphorism that is often used is “Stop starting; start finishing” (see footnote #2); the emphasis needs to be on avoiding running too many Epics in parallel. Cancel work that isn’t achieving to create space to approve something new.
Every Program Increment the metrics for each Epic need to be inspected; do they justify the Epic’s continued existence? Have the experiments for the Leading Indicators proved true? Are the Business Outcomes moving in the right direction? If the answer to any of these is no then stop wasting effort on this idea. It is the Epic Owner’s job to collect these metrics every Program Increment and present them at the relevant Portfolio Sync meeting.
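The per-Program-Increment check can be sketched as a simple gate over those three questions. The `EpicReview` structure and `should_continue` function are hypothetical names, and in practice the answers involve judgement rather than clean booleans.

```python
from dataclasses import dataclass


@dataclass
class EpicReview:
    """Answers the Epic Owner brings to the Portfolio Sync (hypothetical structure)."""
    metrics_justify_continuation: bool
    leading_indicators_proved: bool
    business_outcomes_improving: bool


def should_continue(review: EpicReview) -> bool:
    # A single "no" is enough to recommend stopping the Epic
    return all([
        review.metrics_justify_continuation,
        review.leading_indicators_proved,
        review.business_outcomes_improving,
    ])


# Example: the Business Outcomes are not moving in the right direction
review = EpicReview(
    metrics_justify_continuation=True,
    leading_indicators_proved=True,
    business_outcomes_improving=False,
)
print("continue" if should_continue(review) else "cancel")
```

The gate only recommends; the decision to cancel still belongs to the people in the Portfolio Sync.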
Cancelling an Epic should not be perceived as a failure, but as a success; the action has saved the organisation from wasting money on something that wouldn’t achieve its Business Outcomes and that money could be redirected to something that might achieve its Business Outcomes. Be careful of the human factors; an Epic Owner has invested personal time and effort into trying to make the Epic a success and having it cancelled could be perceived as a personal failure. Be wary of incentivisation around the success of an Epic; early cancellation and saving money should be classed as a success as much as achieving the stated Business Goals.
Observation: Risk Mitigation
The advantage of prioritising “The Next Bit” rather than “The Whole” is that it naturally leads towards a Lean Startup mentality. The Lean Startup mentality is good because it turns Risky Endeavours into Risk Mitigation exercises.
Requiring Epics to justify their continuation on a per-timebox basis is the Organisation mitigating the risk that it is wasting its money on something that isn’t going to work. It is also an embodiment of SAFe Principle #1: Take an Economic View, since the metrics are Leading Indicators or Business Outcomes.
In the next post we’ll run an experiment to look at what happens to Weighted Shortest Job First if the Effort is just the effort to run the next experiment.
#1 Many pieces of work being done in parallel: bad. Teams working in parallel on the same piece of work (swarming): good. By all working in parallel the teams will get the work done quicker, which is preferable to the relay-race approach where one team does their bit, hands to the next team, who do their bit, who hand to the next… and delivery of value takes a lot longer.
#2 “Stop starting, Start Finishing” is the aphorism used to describe SAFe Principle #1: Take an Economic View, on the SAFe Principle Cards.