[Disclaimer: I am CTO of Data Language, a data science consultancy that works in the evidence-based healthcare sector. These are my personal views only and do not reflect the views of the company]
The last few days have revealed more details about the UK government's “delay” phase of its coronavirus pandemic response. Firstly, we got an indication that the 590 positive cases confirmed by Wednesday more likely represented around 10,000 actual positive cases in the population. Then more information was shared about the goal of establishing “herd immunity” by allowing the disease to pass through the population.
Students of Cynefin and Wardley Mapping might have been alarmed by these points. Firstly, a critical success factor in managing risk in complex domains is the effectiveness of your sensor network, i.e. the ability to assess the changing environment in realtime: being out by a factor of roughly twenty in our understanding of the contagion represents an acknowledgement that we are basically running blind right now. The efforts made to prepare for the arrival of the disease in the window of time since the Wuhan outbreak have clearly been woefully inadequate: NHS 111 triage and our diagnostics testing capability should have been scaled to a massively greater extent than has actually happened.
So task #1 is: how can we rectify this situation and regain a handle on how many people actually have the virus?
Secondly regarding the “herd immunity” plan, this is the opposite of deploying safe-to-fail probes to manage risk in complex domains. What does “safe to fail” really mean? It means that hard containment boundaries exist which prevent failure, dysfunction or pathology escalating through the network. More information on that below.
This is why Nassim Nicholas Taleb is right in calling out the irrationality of the UK strategy. It is a naive, unhedged bet compounded by the absence of containment boundaries. He made the point many years ago that the best risk management plan is one which is most effective when the impacts are greatest: if you don’t survive those situations then nothing else matters. The UK strategy is currently the opposite of this.
So task #2 is: how can we reintroduce containment boundaries bottom-up in the absence of anything coming top-down from government?
The government says it is following scientific advice, but in my opinion it has made the error of misunderstanding the nature of the problem. This is not a science problem or an evidence-based healthcare problem or a predictive analytics problem, for the simple reason that we don’t have the data yet. This is a risk management problem. Risk management is a completely different skillset, and one that scientists have historically been pretty bad at: they understand how to act in the presence of evidence, whereas what we currently need to know is how to act in the absence of sufficient data. The people calling the shots now should be those with the most experience of successfully managing large-impact risk.
Wired reported that Downing Street is currently seeking help from tech companies to address the coronavirus situation. In my opinion, those companies should now do the following to address tasks #1 and #2 described above.
- Build a computational model that simulates non-virtual social connectivity in the UK, send a simulated virus through that network with a 1:20 detection factor, and then look at how many directly or indirectly known positives a node/person needs to connect to in order to have a >5% chance of connecting to an unidentified positive* (a minimal sketch of this kind of simulation follows this list)
- Build an NHS App that uses this data to tell people whether they should self-isolate not due to their own symptoms but due to the inferred risk in their localised social network model.
- Allow anyone using the app to report anonymously when someone in their household is showing any cold, flu or bug symptoms. Subtract seasonal averages, then feed back into the model above. Track location-based uptake of the app, then map this to demographic/census data to project from the known sample set to the whole population (i.e. following the same methodology as retail market analytics sampling)
- Massively publicise the app as the best way to receive timely information to stay safe.
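To make the first of those points concrete, here is a minimal sketch of the kind of simulation I have in mind. It is illustrative only: the small-world graph, the outbreak dynamics and the two-hop definition of “directly or indirectly known” are all placeholder assumptions, not calibrated against any real data.

```python
# Minimal sketch of the proposed simulation (illustrative only).
# Assumptions: a small-world graph stands in for UK social contact data,
# the outbreak is a few generations of simple neighbour-to-neighbour spread,
# and "directly or indirectly known" means within two hops. All parameters
# are hypothetical placeholders, not calibrated estimates.
import random
from collections import defaultdict

import networkx as nx

random.seed(42)

N = 20_000                # people in the toy population
SEEDS = 10                # initial imported cases
GENERATIONS = 4           # rounds of onward transmission
P_TRANSMIT = 0.25         # chance of passing the virus along a contact edge
DETECTION_FACTOR = 20     # roughly 1 confirmed case per 20 actual cases

g = nx.watts_strogatz_graph(N, k=10, p=0.1, seed=42)

# Spread the virus along edges so that cases cluster in the network.
infected = set(random.sample(list(g.nodes), SEEDS))
frontier = set(infected)
for _ in range(GENERATIONS):
    frontier = {c for node in frontier for c in g.neighbors(node)
                if c not in infected and random.random() < P_TRANSMIT}
    infected |= frontier

detected = set(random.sample(sorted(infected), len(infected) // DETECTION_FACTOR))
hidden = infected - detected

# For each person, count detected positives among friends and friends-of-friends,
# and record whether any undetected positive sits among their direct contacts.
risk_by_known_positives = defaultdict(lambda: [0, 0])  # count -> [exposed, total]
for node in g.nodes:
    contacts = set(g.neighbors(node))
    two_hop = contacts | {m for c in contacts for m in g.neighbors(c)}
    known = len(two_hop & detected)
    bucket = risk_by_known_positives[known]
    bucket[0] += bool(contacts & hidden)
    bucket[1] += 1

for k in sorted(risk_by_known_positives):
    exposed, total = risk_by_known_positives[k]
    print(f"{k} known positives within 2 hops -> "
          f"P(undetected positive among direct contacts) = {exposed / total:.1%} (n={total})")
```

The output we care about is the threshold: the number of known positives in someone’s local network above which the inferred chance of an undetected positive among their direct contacts crosses the 5% line, which is when the app would tell them to self-isolate.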
Yes, such an aggressive containment strategy would have significant social impact. However, implemented properly, it would avoid major systemic economic impacts because the virus would be forced to “fail fast” before such impacts hit. The super-aggressive phase would probably only need to last a few weeks, as we saw in China.
The overall goal here is to create a modelled sensor network in the absence of a real one, one that uses confirmed positive tests as a weak signal to infer the actual state of contagion in the network (i.e. to refactor from a position of near blindness, to inferred observability, and then to actual observability). Self-isolation should then be enforced based on the inferred localised state of the network rather than an individual’s own health status. That’s particularly important because asymptomatic people can still be infection vectors.
That’s the start. Then you augment confirmed positive tests with anonymous reporting of symptoms in order to move from an inferred sensor network to a real sensor network. Once you’ve got that plus effective containment boundaries, you can then start rebooting economic activity as quickly as possible.
Failure escalation in complex systems
Earlier in this post, we proposed that failure safety (i.e. the fact that a probe or network intervention might be “safe to fail”) depends on the existence of containment boundaries which prevent failure, dysfunction or pathology escalating through the network.
In the case of a free market, the containment boundary is the organisation: if a company has a product that never generates profit (i.e. it is a pathogen with respect to the cost/benefit values that determine selective fitness) and the company doesn’t proactively cull the product, then the company itself will get culled. Should that not happen, as in the 2008 financial crisis, the failure escalates out through the network.
In doing so, it also crosses context boundaries. However these are not context boundaries in the normal x or y dimensional sense of a Wardley map, but instead in the z dimension. They are the context boundaries of scale invariance, where a single node at one level of mapping equates to an entire map at the next level of detail.
This is how failure escalates: from the organisational up to the economic, from the economic into the political, then up again if not caught from the political to the social (i.e. basic societal function), and finally, if still not caught, from the social to the biological. And the biological is utterly merciless and ruthless – it doesn’t care about economics and it doesn’t care about politics. In older societies more exposed to existential risk, this was Shiva, Destroyer of Worlds, wiping out civilisations or species which violated the basic constraints of ecology. Our responsibility is to ensure we act quickly and in the interests of the entire system so that such mechanisms are never triggered.
* an interesting question is how accurately the network model would need to represent the true social graph in order to generate results similar to observed data patterns. Conjecture right now, but my suspicion is that by driving the design by the desired outputs, it may be possible to create a much simplified model that still generates results within acceptable error margins to guide effective action (and also delivers a much more performant model as a result). Another hypothesis to test early/fail fast should anyone decide to run with the implementation of something like this!
In the previous post we discussed control in complex adaptive systems. We examined the different categories of constraints that shape emergent behaviour, and highlighted the dangers created by failing to recognise the limits of human control. In this post I would now like to examine the extent to which it’s meaningful to think about lean product development as being ‘natural science’. In doing so, I intend to avoid getting drawn into philosophical debate about the scientific method. Rather, taking my cue once more from David Snowden’s LSSC11 keynote, I’d like to start by examining the narrative context of appeals to natural science (i.e. claims that field F ‘is science’). Historically, such appeals have commonly been made for one of two reasons:
A long tradition exists of appeals to science in order to assert the reliability and validity of new or poorly established fields of intellectual inquiry. That tradition has been based in academic circles, and can be traced back through management and education science; political science, economics and anthropology; through to linguistics and famously even history.
Whether this is something lean product development need concern itself with, I would question. As a discipline based on practical application rather than theoretical speculation, we can rely on natural selection to take care of that validation for us: if the methods we use aren’t effective then we will simply go out of business. Economic recession makes this process all the more reliable. Earlier this year I came across a really great risk management debate between Philippe Jorion and Nassim Taleb from 1997, where Taleb makes the following point:
“We are trained to look into reality’s garbage can, not into the elegant world of models. [We] are rational researchers who deal with the unobstructed Truth for a living and get (but only in the long term) [our] paycheck from the Truth without the judgment or agency of the more human and fallible scientific committee.”
For me this summarises lean practice. Yes, we are performing validated learning – but simply because doing so is rational behaviour, and more effective than taking a blind punt. Beyond that, whether the techniques employed constitute science seems to me an unnecessary diversion.
That would be of arguably little consequence were it not for one issue: namely, risk. Reminding ourselves that risk is a function of value rather than an entity in its own right, the value that science strives for is truth (however that might be defined). However the value that complex systems risk management strives for is survival. This difference creates a diametrically opposing attitude to outlier data points. Risk management in science is concerned with protecting the truth, and so all outlier data points are by default excluded unless they are repeatable and independently verified. On the other hand, risk management in complex systems is all about managing the cost of failure. It doesn’t matter if you are wrong most of the time, as long as in all those cases the cost of failure is marginal. What counts is being effective when the cost of failure is highest, as that creates the greatest chance of survival. As a result, outlier data points are by default included and are only ever discounted once we are highly confident they are not risk indicators.
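A toy example of the contrast (the numbers below are entirely made up): the scientific instinct trims the unreplicated extreme before estimating the typical value, whereas the risk management instinct keeps it, because survival is decided in the tail.

```python
# Toy contrast between the two attitudes to outliers described above
# (illustrative only; the data and the trimming threshold are made up).
import statistics

daily_losses = [0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3, 0.1, 9.5]  # one extreme day

# "Protect the truth": treat the unreplicated extreme as noise and drop it.
trimmed = [x for x in daily_losses if x < 1.0]
typical_loss = statistics.mean(trimmed)

# "Protect survival": keep the outlier, because the cost of failure lives in the tail.
worst_case_loss = max(daily_losses)

print(f"typical loss (outlier excluded): {typical_loss:.2f}")
print(f"worst-case loss (outlier included): {worst_case_loss:.2f}")
```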
The financial crisis has demonstrated the consequences of confusing these two different perspectives: of risk management techniques which are right most of the time but least effective when the cost of failure is greatest.
The other common motive for appeals to science has been power. Going back to Descartes and the birth of western science, its narrative has always been one of mastery over and possession of nature – in short, the language of dominion and control. This takes us back to the themes of the previous post.
Such a perspective has become so deeply embedded in our cultural consciousness that it now influences our comprehension of the world in all sorts of ways. A recent example has been the debate in software circles about avoiding local optimisation. As a technique for improving flow through systems, the principle is sound and highly valuable. However it is entirely dependent on the descriptive coverage of the system in question. Some systems, such as manufacturing or software delivery pipelines, are amenable to such complete mapping. Many others however, such as economic markets, are not. A technique commonly used to describe such systems in evolutionary biology is the fitness landscape:
The sectional view through such an environment might be as follows, leading the unwary to highlight the importance of avoiding local optimisations at points A and C and always evolving towards point B.
The problem here is that for landscapes such as economic markets, the above diagram represents the omniscient/God view. For mere mortals, we only have knowledge of where we have been and so the above diagram looks simply like this:
Whilst it is a valuable insight in its own right simply to understand that our product is a replicator competing in a fitness landscape, as much as we might like to avoid local optimisations, doing so is impossible because we never know where they are (even at a maximum, we can never tell whether it is global or local).
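Here is a simple sketch of what that mortal’s view means in practice, using a made-up one-dimensional landscape: a hill-climber only ever learns the fitness of positions it has actually visited, and when it stops at a maximum it has no way of knowing whether that maximum is local or global.

```python
# Sketch of the "mortal's view" of a fitness landscape (illustrative only).
# The fitness function is a made-up stand-in for a market we cannot see;
# the climber only ever learns the fitness of positions it has visited.
import math

def fitness(x: float) -> float:           # the omniscient/"God view" landscape
    return math.sin(x) + 0.6 * math.sin(3 * x)

STEP = 0.1
position = 1.0                             # arbitrary starting product position
visited = {position: fitness(position)}    # all the knowledge we will ever have

while True:
    candidates = [position - STEP, position + STEP]
    for c in candidates:
        visited[c] = fitness(c)            # we learn the landscape only by going there
    best = max(candidates, key=visited.get)
    if visited[best] <= visited[position]:
        break                              # a maximum: local or global? We cannot tell.
    position = best

print(f"stopped at x={position:.2f}, fitness={visited[position]:.3f}")
print(f"points ever observed: {len(visited)} (everything else is unknown)")
```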
It is for these reasons that I think it is unhelpful to think of lean as science. The narrative context of lean should be not one of arrogance but humility. Rather than succumbing to the illusions of mastery and control in the interests of appeasing our desire for certainty, we are performing validated learning simply because it is the most effective course of action once we recognise the actual limits of both our control and understanding.
It’s been a couple of weeks since my return from LSSC11 in Long Beach, California. My congratulations to David Anderson and the team for putting on a great conference. I was particularly impressed by the diversity of content and choice of keynote speakers. I’m sure the continued adoption of such an outward-looking perspective will keep the lean community a fertile breeding ground for new ideas. For me, personal highlights of the week included an enthusiastic and illuminating conversation about risk management with Robert Charette, meeting Masa Maeda who is obviously both very highly intelligent and a top bloke (thanks a lot for the blues recommendation btw!), a discussion about risk with Don Reinertsen after his excellent Cost of Delay talk (more on this in another post) and catching up with a number of people I haven’t seen for too long.
I think the talk I found most thought-provoking was given by David Snowden. I’d read a couple of articles about the Cynefin framework following a pointer from Steve Freeman and Suren Samarchyan a couple of years ago, but I’d never heard him speak before, and the prospect of complexity science and evolutionary theory making it into the keynote of a major IT conference had me excited to say the least. Overall he presented lots of great content, but by the end – and to my surprise – I was left with a slight niggling ‘code smell’ type feeling, something I’d also experienced in a couple of the Systems Engineering talks on the previous day. Reflecting on this during his presentation, I realised the cause was essentially two concerns:
1.) The lack of acknowledgment that often we have little or no control over the constraints in a complex adaptive system
2.) The extent to which it’s meaningful to think about lean product development as being ‘natural science’
The first of these will be the subject of the rest of this post. The second I will discuss in my next post.
Constraints in Complex Adaptive Systems
For any agent acting within a complex system – for example, an organisation developing software products and competing in a marketplace – the constraints of that system can be divided into the following core categories:
a.) Constraints they can control. These include software design, development and release practices, financial reward structures, staff development policies, amongst others.
b.) Constraints they can influence. These include how a customer perceives your product compared to the products of your competitors, and the trends and fashions which drive product consumption.
c.) Constraints they can do nothing about and must accept. These include competitor activities, legal constraints, economic climate, and more typical market risks such as exchange rates, interest rates and commodity prices.
Each type of constraint then requires a specific management approach, as follows.
a.) ‘Control’ constraints: this is the domain of organisational and capability maturity models.
b.) ‘Influence’ constraints: this is the domain of marketing and lobbying (both internal/political and external/advertising). For example, one of the most effective growth strategies is to promote ideas which cast the market-differentiating characteristics of your competitors in a negative rather than positive light. However such techniques are only reliable to the extent that no-one else is using them, and they carry risk because in such circumstances they foster an illusion of control. Once competitors adopt similar techniques that illusion is exposed, and influence is lost until more sophisticated strategies are devised.
c.) ‘Accept’ constraints: this is the domain of risk management and resilient program management practices.
If an organisation mis-categorises the constraints within which it operates, the consequences can be terminal. Almost always this happens because of the illusion of, or desire for, control, where constraints are mistakenly assigned to the first category. The illusion of control is commonly created when complex adaptive systems are in temporary equilibrium states, and so behave predictably for limited (and unpredictable) periods of time. Applying control management techniques in such situations is worse than doing nothing at all, as it creates the illusion of due diligence and hides the real levels of risk exposure. Thinking about the recent financial crisis in these terms, it can be seen as the misapplication of the mathematics of natural science (e.g. Brownian motion in pricing models) in an attempt to manage capital market constraints that were actually outside the domain of full human control.
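As a toy illustration of that misapplication (not a model of any real market): calibrate a Gaussian, Brownian-motion-style model during a calm period, then compare what it says about extreme losses with what a fat-tailed system actually produces.

```python
# Toy illustration of calibrating a Gaussian model during a temporary
# equilibrium and then trusting it about extremes (illustrative only).
import math
import random
import statistics

random.seed(7)

# A calm period: the system behaves predictably for a while.
calm_returns = [random.gauss(0.0, 1.0) for _ in range(1_000)]
mu, sigma = statistics.mean(calm_returns), statistics.stdev(calm_returns)

def gaussian_tail(z: float) -> float:
    """P(return below -z sigma) under the fitted normal model."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# The fitted model calls a 6-sigma loss essentially impossible...
print(f"model: P(6-sigma daily loss) = {gaussian_tail(6):.2e}")

# ...but the real system has fat tails the model never saw:
# roughly 1 day in 50 the volatility is eight times higher.
fat_tailed = [random.gauss(0.0, 1.0) * random.choice([1] * 49 + [8])
              for _ in range(100_000)]
observed = sum(r < mu - 6 * sigma for r in fat_tailed) / len(fat_tailed)
print(f"observed frequency of 6-sigma losses: {observed:.2e}")
```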
A key to this is something I have blogged about previously: failure handling. Snowden discussed a strategy employed by a number of large organisations where they preferentially select delivery teams from previous failed projects, as the lessons learnt they bring to the new project increase its chance of success. He likened this approach to treating exception handling as an integral part of software design. This idea was a recurring theme across the conference, to the point where a number of people were tweeting suggestions for a ‘Failure’ track at the next LSSC. However I don’t buy this argument, and in his closing remarks I was pleased to hear David Anderson announce that it won’t be happening. The reason I don’t buy it goes back to the software design metaphor. If your application was brought down by an exception, then clearly failure handling was not treated as a first order concern in the design process. Similarly, if resilient failure handling had been designed into your program management process, then rather than your project failing you should have conducted the necessary risk mitigation or performed a product pivot.
In terms of the categories described above, failure related to the first type of constraint generally indicates an immature delivery capability: it is failure that was avoidable. On the other hand, failure due to the third type of constraint indicates a lack of understanding of resilience and the principles that Eric Ries and others have been promoting for a number of years now. It is failure that was absolutely inevitable. Neither of these are characteristics I would want to bring to a new project. Failures due to the second type of constraint are arguably more interesting, but for me they quickly descend into subject areas that ethically I find questionable. Finally and of most interest, are failures due to constraints unique to the particular business context. The uniqueness of the constraint means that the failure is necessary, and not just an attempt to find a solution to an already solved problem. In this case, the failure becomes part of the organisation’s learning cycle and a source of highly valuable insight going forwards. However, even here it could be argued that such lessons should be possible via effective risk management processes rather than requiring full-blown project failure.
I had a brief chat to David Snowden after his talk, and regarding the extent to which systems constraints can be controlled he had the slightly disappointing answer that ‘C-level people get what I’m talking about, it’s only middle managers who don’t understand it.’ Whilst that may or may not be true, afterwards it put me in mind of Benjamin Mitchell’s excellent presentation about Kanban and management fads. I think a large part of the history of management fads reduces down to the exploitation of CxO denial regarding a.) the actual limitations of their control and b.) the difficulty and cost of high quality product development. I think a key task this community can perform to help prevent Kanban going the same way is to stay faithful to ‘brutal reality’, to borrow Chet Richards’ fantastic phrase, by remaining transparent and honest about the true limits of achievable control.
Although I have been making use of them over the last 18 months in various presentations and the real options tutorial, I recently realised I’d omitted to publish on this blog the graphs illustrating the optimal exercise point for real options. So here they are:
The dotted blue line represents risk entailed by the agilist’s Cone of Uncertainty, and covers technical, user experience, operational, market and project risk – in short anything that could jeopardise your return on investment. The way this curve is negotiated is detailed below, by driving out mitigation spikes to address each risk.
The dotted brown line represents risk from late delivery. At a certain point this will start rising at a rate greater than your mitigation spikes are reducing risk, creating a minimum in the Cumulative Risk curve denoted in red. Remembering that
Feature Value = ((1 - Risk) x Generated Value) - Cost
this minimum identifies the Optimal Exercise Point.
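For anyone who prefers code to graphs, here is a minimal sketch of that calculation. The two risk curves are hypothetical shapes chosen purely to illustrate the mechanics, as are the value and cost figures; in practice they would come from your own risk assessments.

```python
# Minimal sketch of finding the Optimal Exercise Point (illustrative only).
# Curve shapes, value and cost figures are hypothetical placeholders.
GENERATED_VALUE = 100_000   # value if the feature ships (hypothetical)
COST_PER_WEEK = 2_000       # delivery cost per week of work (hypothetical)

def uncertainty_risk(week: int) -> float:
    """Cone-of-uncertainty risk, reduced week by week by mitigation spikes."""
    return max(0.0, 0.6 - 0.05 * week)

def delay_risk(week: int) -> float:
    """Risk from late delivery, rising steeply the longer we wait."""
    return 0.003 * week ** 2

def feature_value(week: int) -> float:
    risk = uncertainty_risk(week) + delay_risk(week)
    return (1 - risk) * GENERATED_VALUE - COST_PER_WEEK * week

weeks = range(0, 26)
optimal = min(weeks, key=lambda w: uncertainty_risk(w) + delay_risk(w))
print(f"cumulative risk is minimised at week {optimal}")
print(f"feature value if exercised then: {feature_value(optimal):,.0f}")
```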
One point worth exploring further is why Delayed Delivery is represented as risk rather than cost. The reason is that Cost of Delay is harder to model. For example, let’s say we are developing a new market-differentiating feature for a product. Let’s also say that there are X potential new customers for that feature in our target market, and that it would take Y weeks of marketing to convert 80% of those sales. Provided that, whenever we do launch, there remain Y weeks until a competitor launches a similar feature, the cost of delay may be marginal. On the other hand, if we delay too long and a competitor launches their feature before us, then there will be a massive spike in cost due to the loss of first mover advantage. However the timing of that cost spike will be unknown (unless we are spying on the competition), and therefore very hard to capture. What we can model, though, is the increasing risk of that spike biting us the longer we delay.
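A toy sketch of that last point, modelling delay as risk rather than cost: if we believe (purely for illustration) that a competitor could launch at some unknown point over the next year, with no week more likely than another, then the probability of losing first mover advantage grows steadily with our delay even though the cost spike itself remains unmodelled.

```python
# Toy sketch of modelling delay as risk rather than cost (illustrative only).
# Assumption: a rival launch is equally likely in any week of a one-year horizon.
COMPETITOR_WINDOW_WEEKS = 52   # hypothetical horizon for a rival launch

def p_spoiled(delay_weeks: int) -> float:
    """Probability the competitor launches before us, given our delay."""
    return min(1.0, delay_weeks / COMPETITOR_WINDOW_WEEKS)

for delay in (4, 12, 26, 52):
    print(f"delay {delay:2d} weeks -> P(first mover advantage lost) = {p_spoiled(delay):.0%}")
```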
I have found it very helpful to think about this using the insightful analysis David Anderson presented during his excellent risk management presentation at Agile 2009 last year. He divides features into four categories:
- Differentiator: drive customer choice and profits
- Spoiler: spoil a competitor’s differentiators
- Cost Reducer: reduce cost to produce, maintain or service and increase margin
- Table Stakes: “must have” commodities
Differentiators (whether on feature or cost) are what drive revenue generation. Spoilers are the features we need to implement to prevent loss of existing customers to a competitor, and are therefore more about revenue protection. And Table Stakes are the commodities we need to have an acceptable product at all. We can see how this maps clearly onto the example above. The cost of delay spike is incurred at the point when the feature we intended to be a differentiator becomes in fact only a spoiler.
This also has a nice symmetry with the meme lifecycle in product S-curve terms.
We can see how features start out as differentiators, become spoilers, then table stakes, and are finally irrelevant – which ties closely to the Diversity, Dominance and Zombie phases. There are consequences here in terms of market maturity. If you are launching a product into a new market then everything you do is differentiating (as no-one else is doing anything). Over time, competitors join you, your differentiators become their spoilers (as they play catch-up), and then finally they end up as the table stakes features for anyone wishing to launch a rival product. In other words, the value of table stakes features in mature markets is that they represent a barrier to entry.
More recently however, I have come to realise that these categories are as much about your customers as your product. They are a function of how a particular segment of your target market views a given feature. Google Docs is just one example that demonstrates how one person’s table stakes can be another person’s bloatware. Similarly, despite the vast profusion of text editors these days, I still used PFE until fairly recently because it did all the things I needed and did them really well. Its differentiators were the way it implemented my functional table stakes. The same is true of any product that excels primarily in terms of its user experience, most obviously the iPod or iPhone. Marketing clearly also plays a large part in convincing (or coercing) a market into believing what constitutes the table stakes for any new product, as witnessed in the era of Office software prior to Google Docs. The variation in the 20% of features used by 80% of users actually turned out not to be so great after all.
So what does this mean for anyone looking to launch a product into a mature market? Firstly, segment your target audience as accurately as possible. Then select the segment which requires the smallest number of table stakes features. Add the differentiator they most value, get the thing out the door as quickly as possible, and bootstrap from there.
The phrase “Agile in the Large” is one I’ve heard used a number of times over the last year in discussions about scaling up agile delivery. I have to say that I’m not a fan, primarily because it entails some pretty significant ambiguity. That ambiguity arises from the implied question: Agile What in the Large? So far I have encountered two flavours of answer:
1.) Agile Practices in the Large
This is the common flavour. It involves the deployment of some kind of overarching programme container, e.g. RUP, which is basically used to facilitate the concurrent rollout of standard (or more often advanced) agile development practices.
2.) Agile Principles in the Large
This is the less common, but I believe much more valuable, flavour. It involves taking the principles for managing complexity that have been proven over the last ten years within the domain of software delivery and re-applying them to manage complexity in wider domains, in particular the generation of return from technology investment. That means:
- No more Big Upfront Design: putting an end to fixed five year plans and big-spend technology programmes, and instead adopting an incremental approach to both budgeting and investment (or even better, inspirationally recognising that budgeting is a form of waste and doing without it altogether – thanks to Dan North for the pointer)
- Incremental Delivery: in order to ensure investment liability (i.e. code that has yet to ship) is continually minimised
- Frequent, Rapid Feedback: treating analytics integration, A/B testing capabilities, instrumentation and alerting as a first order design concern (a minimal A/B testing sketch follows this list)
- Retrospectives and Adaptation: a test-and-learn product management culture aligned with an iterative, evolutionary approach to commercial and technical strategy
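As a small illustration of the kind of feedback loop I mean (the counts below are hypothetical), here is a minimal two-proportion z-test comparing conversion between the two arms of an A/B split:

```python
# Minimal sketch of an A/B feedback loop: a two-proportion z-test on
# conversion counts. The numbers are hypothetical; in practice they would
# come from your analytics pipeline.
import math

def ab_test(conversions_a: int, visitors_a: int,
            conversions_b: int, visitors_b: int) -> float:
    """Return the z-score for the difference in conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (p_b - p_a) / se

z = ab_test(conversions_a=120, visitors_a=2_400, conversions_b=156, visitors_b=2_400)
print(f"z = {z:.2f} ({'significant' if abs(z) > 1.96 else 'not significant'} at ~95%)")
```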
When it comes down to it, it seems to me that deploying cutting-edge agile development practices without addressing the associated complexities of the wider business context is really just showing off. It makes me think back to being ten years old and that kid at the swimming pool who was always getting told off by his parents “Yes Johnny I know you can do a double piked backflip, but forget all that for now – all I need you to do is enter the water without belly-flopping and emptying half the pool”.
Towards the end of last year I was lucky enough to meet up with Chris Matts for lunch on a fairly regular basis. To my great surprise, he did me the amazing honour of capturing some of what we discussed concerning memes in one of his legendary comics. They have just been published on InfoQ together with a discussion piece here: http://www.infoq.com/articles/meme-lifecycle
I am currently taking some time out on a mini mid-West roadtrip after a most enjoyable and thought-provoking Agile 2009 in Chicago. I will post more on that later, but in the meantime here are the slides and reading list from my conference session.