Systemic financial crises occur infrequently, giving relatively few crisis observations to feed into the models that try to warn when a crisis is on the horizon. So how certain are these models? And can policymakers trust them when making vital decisions related to financial stability? In this blog, I build a Bayesian neural network to predict financial crises. I show that such a framework can effectively quantify the uncertainty inherent in prediction.
Predicting financial crises is hard and uncertain
Systemic financial crises devastate nations across economic, social, and political dimensions. Therefore, it is important to try and predict when they will occur. Unsurprisingly, one avenue economists have explored to try and aid policymakers in doing so is to model the probability of a crisis occurring, given data about the economy. Traditionally, researchers working in this space have relied on models such as logistic regression to aid in prediction. More recently, exciting research by Bluwstein et al (2020) has shown that machine learning methods also have value in this space.
New or old, these methodologies are frequentist in application. By this, I mean that the model’s weights are estimated as single deterministic values. To understand this, suppose one has annual data on GDP and Debt for the UK between 1950 and 2000, as well as a list of whether a crisis occurred in those years. Given this data, a fair suggestion for modelling the probability of a crisis occurring in the future as a function of GDP and Debt today would be to estimate a linear model like that in equation (1). However, the predictions from fitting a straight line like this would be unbounded and we know, by definition, that probabilities must lie between 0 and 1. Therefore, (1) can be passed through a logistic function, as in equation (2), which essentially ‘squashes’ the straight line to fit within the bounds of probability.
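$$Y_{i,t} = \beta_0 + \beta_1 \mathrm{GDP}_{i,t-1} + \beta_2 \mathrm{Debt}_{i,t-1} + \varepsilon_{i,t} \qquad (1)$$
$$\mathrm{Prob}(\text{Crisis occurring}) = \mathrm{logistic}(Y_{i,t}) = \frac{1}{1 + e^{-Y_{i,t}}} \qquad (2)$$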
The weights (β0, β1 and β2) can then be estimated via maximum likelihood. Suppose the ‘best’ weights are estimated to be 0.3 for GDP and 0.7 for Debt. These would be the ‘best’ conditional on the information available, ie the data on GDP and Debt. And this data is finite. Theoretically, one could collect data on other variables, expand the data set over a longer time horizon, or improve the accuracy of the data already available. But in practice, obtaining a complete set of information is not possible: there will always be things that we do not know. Consequently, we are uncertain about which weights are truly ‘best’. And in the context of predicting financial crises, which are rare and complex, this is especially true.
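To make the frequentist approach concrete, the sketch below fits equations (1) and (2) by maximum likelihood using simple gradient ascent. It is a minimal illustration rather than the method used in the post: the data are simulated, the "true" weights only echo the example values of 0.3 and 0.7, and the names `fit_logistic`, `rows` and `labels` are hypothetical.

```python
import math
import random


def logistic(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    """Estimate the weights by gradient ascent on the average log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            inputs = [1.0] + list(x)
            p = logistic(sum(w * v for w, v in zip(weights, inputs)))
            for j, v in enumerate(inputs):
                grad[j] += (y - p) * v  # gradient of the log-likelihood
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Simulated (hypothetical) data: standardised GDP and Debt, plus a crisis flag.
rng = random.Random(0)
true_weights = [-1.0, 0.3, 0.7]  # illustrative values only
rows, labels = [], []
for _ in range(500):
    gdp, debt = rng.gauss(0, 1), rng.gauss(0, 1)
    p = logistic(true_weights[0] + true_weights[1] * gdp + true_weights[2] * debt)
    rows.append((gdp, debt))
    labels.append(1 if rng.random() < p else 0)

beta = fit_logistic(rows, labels)

# One value per weight, so one probability for a new observation.
crisis_prob = logistic(beta[0] + beta[1] * 0.5 + beta[2] * 1.5)
print(beta, crisis_prob)
```

Because each weight is a single number, the fitted model returns exactly one probability for any given pair of GDP and Debt values, with no indication of how much that probability depends on the limited data used to estimate it.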
It may be possible to quantify the uncertainty associated with this lack of information. To do so, one must step out of the frequentist world and into the Bayesian world. This provides a new perspective, one in which the weights in the model no longer take single ‘best’ values. Instead, they can take a range of values from a probability distribution. These distributions describe all of the values that the weights could take, as well as the probability of those values being chosen. The goal then is no longer to estimate the weights, but rather the parameters associated with the distributions to which the weights belong.
Once the weights of a frequentist model have been estimated, new data can be passed into the model to obtain a prediction. For example, suppose one is again working with the toy data discussed previously and numbers are available for GDP and Debt corresponding to the current year. Whether or not a crisis is going to occur next year is unknown, so the GDP and Debt data are passed into the estimated model. Given that there is one value for each weight, a single value for the probability of a crisis occurring will be returned. In the case of a Bayesian model, the GDP and Debt numbers for the current year can be passed through the model many times. On each pass, a random sample of weights can be drawn from the estimated distributions to make a prediction. By doing so, an ensemble of predictions can be acquired. These ensemble predictions can then be used to calculate a mean prediction, as well as measures of uncertainty such as the standard deviation and confidence intervals.
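The repeated-sampling idea can be sketched in a few lines for the toy GDP and Debt model. Everything specific here is an assumption made for illustration: each weight is taken to follow a Gaussian distribution, the means and standard deviations in `weight_dists` are invented, and the interval is taken from the 2.5th and 97.5th percentiles of the ensemble, which is one of several reasonable choices.

```python
import math
import random
import statistics


def logistic(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical (mean, standard deviation) of each weight's distribution.
weight_dists = {"intercept": (-1.0, 0.2), "gdp": (0.3, 0.1), "debt": (0.7, 0.15)}


def sample_prediction(gdp, debt, rng):
    """Draw one set of weights and return the implied crisis probability."""
    b0 = rng.gauss(*weight_dists["intercept"])
    b1 = rng.gauss(*weight_dists["gdp"])
    b2 = rng.gauss(*weight_dists["debt"])
    return logistic(b0 + b1 * gdp + b2 * debt)


def ensemble_summary(gdp, debt, n_passes=1000, seed=0):
    """Pass the same inputs through the model many times and summarise."""
    rng = random.Random(seed)
    preds = [sample_prediction(gdp, debt, rng) for _ in range(n_passes)]
    cuts = statistics.quantiles(preds, n=40)  # cut points every 2.5%
    return {
        "mean": statistics.mean(preds),
        "std": statistics.stdev(preds),
        "interval_95": (cuts[0], cuts[-1]),
    }


summary = ensemble_summary(gdp=0.5, debt=1.5)
print(summary)
```

The wider the weight distributions, the more the individual predictions disagree, and the larger the standard deviation and interval reported for the same inputs.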
A Bayesian neural network for predicting crises
To put these Bayesian methods to the test, I use the Jordà-Schularick-Taylor Macrohistory Database – in line with Bluwstein et al (2020) – to try and predict whether or not crises will occur. This brings together comparable macroeconomic data from a wide range of sources to create a panel data set that covers 18 advanced economies over the period 1870 to 2017. Armed with this data set, I then construct a Bayesian neural network that (a) predicts crises with a competitive accuracy and (b) quantifies the uncertainty around each prediction.
Chart 1 below shows stylised representations of a standard neural network and a Bayesian neural network, each of which is constructed as ‘layers’ of ‘nodes’. One starts with the ‘input’ layer, which is simply the initial data. In the case of the simple example of equation (1) there would be three nodes: one each for GDP and Debt, and another which takes the value 1 (this is analogous to including an intercept in linear regression). All of the nodes in the input layer are then connected to all of the nodes in the ‘hidden’ layer (some networks have many hidden layers), and a weight is associated with each connection. Chart 1 shows the inputs to one node in the hidden layer as an example. (The illustration shows a selection of connections in the network. In practice, the networks discussed are ‘fully connected’, ie all nodes in one layer are connected to all nodes in the next layer.) Next, at each node in the hidden layer, the inputs are aggregated and passed through an ‘activation function’. This part of the process is very similar to logistic regression, where the data and an intercept are aggregated via (1) and then passed through the logistic function to make the output non-linear.
The outputs of each node in the hidden layer are then passed to the single node in the output layer, where the connections are again weighted. At the output node, aggregation and activation again take place, resulting in a value between 0 and 1 which corresponds to the probability of there being a crisis. The goal with the standard network is to show the model data such that it can learn the ‘best’ weights for combining inputs, a process called ‘training’. In the case of the Bayesian neural network, each weight is treated as a random variable with a probability distribution. This means that the goal is now to show the model data such that it can learn the ‘best’ estimates of each distribution’s mean and standard deviation, as explained in detail in Jospin et al (2020).
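The difference between the two networks can be seen in a small forward pass. The sketch below is illustrative only and makes several assumptions not stated in the post: a single hidden layer with two nodes, a logistic activation at every node, Gaussian weight distributions, and invented means and standard deviations in `hidden_dists` and `output_dists`. For simplicity, the standard network here simply uses the means as its fixed weights.

```python
import math
import random


def logistic(z):
    """Activation function: squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, output_weights):
    """One pass through a fully connected network with one hidden layer."""
    x = [1.0] + list(inputs)  # constant node, analogous to an intercept
    hidden = [
        logistic(sum(w * v for w, v in zip(node, x)))  # aggregate, then activate
        for node in hidden_weights
    ]
    return logistic(sum(w * h for w, h in zip(output_weights, hidden)))


# Hypothetical (mean, std) for every connection: two hidden nodes, three inputs each.
hidden_dists = [
    [(-0.5, 0.2), (0.3, 0.1), (0.7, 0.1)],
    [(0.2, 0.2), (-0.4, 0.1), (0.9, 0.1)],
]
output_dists = [(1.2, 0.3), (0.8, 0.3)]


def sample_weights(rng):
    """Bayesian network: draw every weight from its own distribution."""
    hidden = [[rng.gauss(m, s) for m, s in node] for node in hidden_dists]
    output = [rng.gauss(m, s) for m, s in output_dists]
    return hidden, output


inputs = (0.5, 1.5)  # illustrative standardised GDP and Debt values

# Standard network: one fixed value per weight, so one output.
fixed_hidden = [[m for m, _ in node] for node in hidden_dists]
fixed_output = [m for m, _ in output_dists]
p_standard = forward(inputs, fixed_hidden, fixed_output)

# Bayesian network: each pass uses freshly drawn weights, so outputs differ.
rng = random.Random(0)
p_pass_1 = forward(inputs, *sample_weights(rng))
p_pass_2 = forward(inputs, *sample_weights(rng))
print(p_standard, p_pass_1, p_pass_2)
```

Training a real Bayesian neural network means learning the means and standard deviations from data, which this sketch does not attempt.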
Chart 1: Stylised representation of standard and Bayesian neural networks
To demonstrate the capabilities of the Bayesian neural network in quantifying uncertainty in prediction, I train the model using relevant variables from the Macrohistory Database over the full sample period (1870–2017). However, I hold back the sample corresponding to the UK in 2006 (two years prior to the 2008 financial crisis) to use as an out-of-sample test. The sample is fed through the network 200 times. On each pass, each weight is determined as a random draw from its estimated distribution, thus providing a unique output each time. These outputs can be used to calculate a mean prediction with a standard deviation and confidence intervals.
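In notation, the summary statistics from the 200 passes can be written as follows. The symbols are assumptions introduced for illustration: $f$ is the network's output, $x$ is the held-back UK sample, $w^{(t)}$ is the set of weights drawn on pass $t$, and $T = 200$ is the number of passes.

```latex
\hat{p} = \frac{1}{T} \sum_{t=1}^{T} f\left(x; w^{(t)}\right),
\qquad
s = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T} \left( f\left(x; w^{(t)}\right) - \hat{p} \right)^{2}},
\qquad T = 200
```

Here $\hat{p}$ is the mean prediction and $s$ the standard deviation across passes. The post does not state how its intervals are constructed; taking the 2.5th and 97.5th percentiles of the 200 outputs would be one possible choice.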
Predicting in practice
The blue diamonds in Chart 2 show the average predicted probability of a crisis occurring from the network’s ensemble predictions. On average, the network predicts that in 2006, the probability of the UK experiencing a financial crisis in either 2007 or 2008 was 0.83. Conversely, the network assigns a probability of 0.17 to there not being a crisis. The model also provides a measure of uncertainty by plotting the 95% confidence interval around the estimates (grey bars). In simple terms, these show the range of estimates that the model thinks the central probability could take with 95% certainty. Therefore, the model (a) correctly assigns a high probability to a financial crisis occurring and (b) does so with a high level of certainty (as indicated by the relatively small grey bars).
Chart 2: Probability of financial crisis estimates for the United Kingdom in 2006
Given the importance of decisions made by policymakers – especially those related to financial stability – it may be desirable to quantify model uncertainty when making predictions. I have argued that Bayesian neural networks may be a viable option for doing so. Therefore, moving forward, these models could provide useful techniques for regulators to consider when dealing with model uncertainty.
Jack Page works in the Bank’s International Surveillance Division.
Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.