Following yesterday’s statement, today I post on a computational finance topic. This field has long been one of the major users of computational developments, and nowadays every serious financial organization is, at least from an operational perspective, more or less an information technology and computer science business. Nevertheless, the fundamentals of finance still matter, especially the advanced mathematical, probabilistic and statistical concepts underpinning the models that structure trading strategies for investment funds, pension funds, asset management companies, private equity funds and hedge funds. A deep group of institutional businesses, indeed.
Machine learning and its techniques have found their way into trading at financial institutions ever since electronic trading established a prominent place for itself. Since the field dwells heavily on data analysis and statistical techniques, its increasing importance for electronic quantitative trading over the years is no surprise. But machine learning becomes particularly useful as the computational capacity of machines grows ever stronger. As we know, machine learning is the conceptual and computational apparatus of today’s artificial intelligence systems, which left behind their symbolic theoretical beginnings and embraced a data-driven, empirical awakening during the last decade and a half.
Anyone following The Information Age will certainly have noticed that I post a great deal on deep neural networks. Deep learning and deep neural networks are among the most important machine learning developments of recent years. They are revolutionizing seemingly disparate fields such as Computer Vision (as a side note, this is my favorite field in this disruption, as it has much to do with my scientific and technological background), Natural Language Processing, Speech Recognition and even artistic and musical applications, to mention only the most widely praised and hyped. But the topic of today’s post – computational finance – happens to be another burgeoning field that deep learning and artificial intelligence are entering and promising to upgrade, to say the least at this moment.
And so we come to the paper I’ve chosen to review today. I found it while doing research for another topic, and decided it was fit to be one of the last posts here on this blog for the year; I will be on vacation from The Information Age next week. It is appropriately titled Deep Portfolio Theory, and its abstract goes like this:
We construct a deep portfolio theory. By building on Markowitz’s classic risk-return trade-off, we develop a self-contained four-step routine of encode, calibrate, validate and verify to formulate an automated and general portfolio selection process. At the heart of our algorithm are deep hierarchical compositions of portfolios constructed in the encoding step. The calibration step then provides multivariate payouts in the form of deep hierarchical portfolios that are designed to target a variety of objective functions. The validate step trades-off the amount of regularization used in the encode and calibrate steps. The verification step uses a cross validation approach to trace out an ex post deep portfolio efficient frontier. We demonstrate all four steps of our portfolio theory numerically.
Financial portfolio theories are among the important achievements of twentieth-century financial economics. One such theory goes by the designation of Markowitz Portfolio Theory, or Modern Portfolio Theory, named after Harry Markowitz, winner of the Nobel Prize in Economic Sciences. Reading the Wikipedia entry for this theory, we can immediately confirm that it is mathematical and statistical at its core. And if it is mathematical and statistical at its core, it is well positioned to be improved and enhanced by an algorithmic, computational approach. That is the case with our paper’s proposal: yet another software approach to portfolio theory, one that turns the problem of finding the efficient frontier predicted by the theory into a mathematical optimization problem, but from the new machine learning/deep learning perspective. If the optimization is large enough, the authors claim, it may well be best approached as a deep learning problem. Let us check and judge.
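To ground the discussion, here is a minimal sketch of the classical Markowitz mean-variance problem. The expected returns and covariance matrix below are made-up illustrative numbers (not from the paper), and the closed-form solution assumes short selling is allowed:

```python
import numpy as np

# Hypothetical expected returns and covariance for three assets
# (illustrative numbers only).
mu = np.array([0.08, 0.12, 0.10])          # expected annual returns
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])     # covariance matrix

def min_variance_weights(mu, Sigma, target):
    """Closed-form Markowitz weights: minimize w' Sigma w subject to
    w' mu = target and w' 1 = 1 (short selling allowed)."""
    inv = np.linalg.inv(Sigma)
    ones = np.ones(len(mu))
    A = ones @ inv @ ones
    B = ones @ inv @ mu
    C = mu @ inv @ mu
    D = A * C - B * B
    lam = (C - B * target) / D             # Lagrange multipliers
    gam = (A * target - B) / D
    return inv @ (lam * ones + gam * mu)

w = min_variance_weights(mu, Sigma, target=0.10)
print("weights:", w, "sum:", w.sum())
print("portfolio return:", w @ mu)
print("portfolio volatility:", np.sqrt(w @ Sigma @ w))
```

Sweeping the target return and plotting volatility against return traces out the familiar efficient frontier.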
The goal of our paper is to provide a theory of deep portfolios. While we base our construction on Markowitz’s original idea that portfolio allocation is a trade-off between risk and return, our approach differs in a number of ways. The objective of deep portfolio theory is twofold. First, we reduce model dependence to a minimum through a data driven approach which establishes the risk-return balance as part of the validation phase of a supervised learning routine, a concept familiar from machine learning. Second, we construct an auto-encoder and multivariate portfolio payouts, denoted by F̂^m(X) and F̂^p(X) respectively, for a market m and portfolio objective p, from a set of base assets, denoted by X, via a hierarchical (or deep) set of layers of univariate nonlinear payouts of sub-portfolios. We provide a four-step procedure of encode, calibrate, validate and verify to formulate the portfolio selection process. Encoding finds the market-map, calibration finds the portfolio-map given a target based on a variety of portfolio objective functions. The validation step trades-off the amount of regularization and errors involved in the encode and calibrate steps. The verification step uses a cross validation approach to trace out an efficient deep frontier of portfolios.
From the paragraph above we see that the authors aim at a different approach from Markowitz’s, while still assuming the original risk-return trade-off. The first difference is treating the risk-return balance as a data-driven problem, amenable to supervised machine learning routines as part of a validation step; the second is the construction of a deep hierarchical autoencoder that computes a set of univariate nonlinear sub-portfolios, coupled with a set of multivariate portfolio payouts targeting some objective function. With the combination of encode, calibrate, validate and verify steps, the claimed result is an improved trace of the deep portfolios’ efficient frontier.
To further illuminate the motivation behind this effort:
Deep portfolio theory relies on deep factors, lower (or hidden) layer abstractions which, through training, correspond to the independent variable. Deep factors are a key feature distinguishing deep learning from conventional dimension reduction techniques. This is of particular importance in finance, where ex ante all abstraction levels might appear equally feasible.
Dominant deep factors, which frequently have a non-linear relationship to the input data, ensure applicability of the subspace reduction to the independent variable. The existence of such a representation follows from the Kolmogorov-Arnold theorem which states that there are no multivariate functions, only compositions of univariate semiaffine (i.e., portfolio) functions. This motivates the generality of deep architectures.
But the training data issue so common to deep architectures is addressed as follows:
The question is how to use training data to construct the deep factors. Specifically, for the univariate activation functions such as tanh or rectified linear units (ReLU), deep factors can be interpreted as compositions of financial put and call options on linear combinations of the assets represented by X. As such, deep factors become deep portfolios and are investible, which is a central observation.
The theoretical flexibility to approximate virtually any nonlinear payout function puts regularization in training and validation at the center of deep portfolio theory. In this framework, portfolio optimization and inefficiency detection become almost entirely data driven (and therefore model free) tasks. One of the primary strengths is that we avoid the specification of any statistical inputs such as expected returns or variance-covariance matrices. Specifically, we can often view statistical models as poor autoencoders in the sense that if we had allowed for richer non-linear structure in determining the market-map, we could capture lower pricing errors whilst still providing good out-of-sample portfolio efficiency.
Elegant, to say the least.
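The quoted observation, that ReLU factors can be read as investible option payouts, can be checked in a few lines. The weights, strike and price scenarios below are illustrative assumptions of mine, not from the paper:

```python
import numpy as np

# ReLU(w.x - k) is exactly the payoff of a call option with strike k
# on the sub-portfolio w.x -- weights, strike and scenarios illustrative.
relu = lambda z: np.maximum(z, 0.0)

w = np.array([0.5, 0.5])             # equal-weight sub-portfolio of two assets
k = 100.0                            # strike

prices = np.array([[90.0, 104.0],
                   [110.0, 120.0],
                   [95.0, 99.0]])    # three market scenarios

portfolio = prices @ w                         # sub-portfolio value per scenario
call_payoff = np.maximum(portfolio - k, 0.0)   # textbook call payoff
deep_factor = relu(portfolio - k)              # one ReLU "deep factor"

print(call_payoff)   # identical to deep_factor
print(deep_factor)
```

Stacking such units in layers gives compositions of puts and calls on linear combinations of the base assets, which is exactly the "investible deep factor" reading in the quote.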
Deep Portfolio Construction
The setup for the deep portfolio theory follows.
Assume that the available market data has been separated into two (or more, for an iterative process) disjoint sets for training and validation, respectively denoted by X and X̂.
Our goal is to provide a self-contained procedure that illustrates the trade-offs involved in constructing portfolios to achieve a given goal, e.g., to beat a given index by a prespecified level. The projected real-time success of such a goal will depend crucially on the market structure implied by our historical returns. We also allow for the case where conditioning variables, denoted by Z, are also available in our training phase. (These might include accounting information or further returns data in the form of derivative prices or volatilities in the market.)
Our four step deep portfolio construction can be summarized as follows.
- I. Auto-encoding. Find the market-map, denoted by F̂^m_W(X), that solves the regularization problem

min_W ||X − F^m_W(X)||^2_2 subject to ||W|| ≤ L^m. (1)
For appropriately chosen F^m_W, this auto-encodes X with itself and creates a more information-efficient representation of X (in a form of pre-processing).
- II. Calibrating. For a desired result (or target) Y, find the portfolio-map, denoted by F̂^p_W(X), that solves the regularization problem

min_W ||Y − F^p_W(X)||^2_2 subject to ||W|| ≤ L^p. (2)
- III. Validating. Find L̂^m and L̂^p to suitably balance the trade-off between the two errors

ε_m = ||X̂ − F^m_{W*_m}(X̂)||^2_2 and ε_p = ||Ŷ − F^p_{W*_p}(X̂)||^2_2,

where W*_m and W*_p are the solutions to (1) and (2), respectively.
- IV. Verifying. Choose the market-map F^m and portfolio-map F^p such that validation (step III) is satisfactory. To do so, inspecting the implied deep portfolio frontier for the goal of interest as a function of the amount of regularization provides such a metric.
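As a toy illustration of the encode and validate mechanics above, here is a sketch in which a rank-k PCA projection stands in for the (in the paper, nonlinear) market-map, and the bottleneck size k plays the role of the regularization knob. The synthetic factor-driven returns are my own assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy market X: T returns for N stocks driven by 3 latent factors
# (synthetic stand-in for the paper's historical data).
T, N, K_true = 240, 20, 3
F = rng.normal(size=(T, K_true))
B = rng.normal(size=(K_true, N))
X = F @ B + 0.5 * rng.normal(size=(T, N))

X_train, X_val = X[:180], X[180:]    # disjoint train/validation sets

def encode_decode(X_train, X_new, k):
    """Rank-k linear auto-encoding (a PCA stand-in for the market-map):
    project onto the top-k principal directions of the training data."""
    _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]            # encode-then-decode projection
    return X_new @ P

for k in [1, 3, 10]:
    err_train = np.mean((X_train - encode_decode(X_train, X_train, k)) ** 2)
    err_val = np.mean((X_val - encode_decode(X_train, X_val, k)) ** 2)
    print(f"k={k:2d}  train={err_train:.3f}  validation={err_val:.3f}")
```

The validation step in the paper does exactly this kind of comparison, only with the amount of regularization λ in place of k and deep nonlinear maps in place of the projection.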
The more detailed description:
Consider a large amount of input data X = (X_{it})_{i,t=1}^{N,T} ∈ R^{T×N}, a market of N stocks over T time periods. X is usually a skinny matrix with N ≪ T; for example, N = 500 for the S&P 500, and T can be very large, corresponding to the trading intervals. Now, specify a target (or output/goal) vector Y ∈ R^N.
An input-output map F(·) that reproduces or decodes the output vector can be seen as a data reduction scheme, as it reduces a large amount of input data to match the desired target.
This is where we use a hierarchical structure of univariate activation functions of portfolios. Within this hierarchical structure, there will be a latent hidden structure well detected by deep learning. Put differently, given empirical data, we can train a network to find a look-up table Y = F_W(X), where F_W(·) is a composition of semi-affine functions (see Heaton, Polson and Witte, 2016). We fit the parameters W using an objective function that incorporates a regularization penalty.
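A composition of semi-affine functions is easy to make concrete. The sketch below builds a two-layer ReLU map F_W(X); all shapes and the random weights are illustrative, and nothing is fitted:

```python
import numpy as np

rng = np.random.default_rng(7)

# A composition of semi-affine functions: F_W(X) = f(f(X W1 + b1) W2 + b2),
# with f = ReLU. Shapes and weights are illustrative, not fitted.
relu = lambda z: np.maximum(z, 0.0)

N, H = 5, 8                      # input assets, hidden units
W1, b1 = rng.normal(size=(N, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)), np.zeros(1)

def F_W(X):
    """Two-layer semi-affine composition mapping asset returns to a payout."""
    return relu(relu(X @ W1 + b1) @ W2 + b2)

X = rng.normal(size=(10, N))     # ten observations of five asset returns
print(F_W(X).shape)              # (10, 1)

# Training would minimize a regularized objective over W = (W1, b1, W2, b2),
# e.g. mean((Y - F_W(X))**2) + lam * penalty(W).
```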
Markowitz and Black-Litterman
We now show how to interpret the Markowitz (1952) and the Black-Litterman (1991) models in our framework. The first key question is how to auto-encode the information in the market. The second is how to decode and make a forecast for each asset in the market. Markowitz’s approach can also be viewed as an encoding step only, determined by the empirical mean and variance-covariance matrix.
One of the key insights of deep portfolio theory is that if we allow for a regularization penalty λ, then we can search (by varying λ) over architectures that fit the historical returns while providing good out-of-sample predictive frontiers of portfolios. In some sense, the traditional approach corresponds to non-regularization.
Perhaps with this goal in mind, Black-Litterman provides a better auto-encoding of the market by incorporating side information (or beliefs) in the form of an L²-norm representing the investors’ beliefs. In the deep learning framework, this is seen as a form of regularization. It introduces bias at the fitting stage with the possible benefit of providing a better out-of-sample portfolio frontier.
I must confess this paper is a rewarding read. And I think the authors are at a sweet spot for a theoretical and empirical breakthrough in mainstream financial economics. I do not know at present whether this paper’s results are in widespread use at the trading venues of important financial institutions, but it is not hard to believe they are at least known and being tried out. Further down the paper:
There is still the usual issue of how to choose the amount of regularization λ. The verification phase of our procedure says one should plot the efficient portfolio frontier in a predictive sense. The parameter λ is then chosen by its performance in an out-of-sample cross-validation procedure. This contrasts heavily with the traditional ex ante efficient frontiers obtainable from both the Markowitz and Black-Litterman approaches, which tend to be far from ex post efficient. Usually, portfolios that were thought to be of low volatility ex ante turn out to be highly volatile – perhaps due to time-varying volatility, Black (1976), which has not been auto-encoded in the simple empirical moments.
By combining the process into four steps that inter-relate, one can mitigate these types of effects. Our model selection is done on the ex post frontier – not the ex ante model fit. One other feature to note is that we never directly model variance-covariance matrices – if applicable, they are trained in the deep architecture fitting procedure. This can allow for nonlinearities in a time-varying implied variance-covariance structure which is trained to the objective function of interest, e.g. index tracking or index outperformance.
An Encode-Decode View of the Market
After recognizing the autoencoder as the most important deep learning application for finance, the authors sketch their conceptual framework and present the elegant results of this worthwhile effort:
for j = 1, …, N, where f(·) is a univariate activation function
The optimization problem corresponding to step two (see Section 1.1) of our deep portfolio construction is then given by
The first term is a reconstruction error (a.k.a. accuracy term), and the second a regularization penalty to gauge the variance-bias trade-off (step three) for good out-of-sample predictive performance. In sparse coding, the weights W_{nk} are mostly zeros; as we increase λ, the solution obtains more zeros. The following is an extremely fast, scalable algorithm (a form of policy iteration) that solves problem (3) in an iterative fashion:
- Given the factors F, solve for the weights using standard L¹-norm (lasso) optimization.
- Given the weights W, solve for the latent factors using quadratic programming, which can, for example, be done using the alternating direction method of multipliers (ADMM).
In the language of factors, the weights W in (3) are commonly denoted by β and referred to as betas.
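The alternating scheme described above can be sketched in numpy. Two assumptions of mine: iterative soft-thresholding (ISTA) stands in for the lasso solver, and the factor step is plain unconstrained least squares rather than the ADMM-based quadratic program the authors mention; the data sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: T x N returns explained by K latent factors, X ~ F @ W with
# sparse weights W (all sizes illustrative).
T, N, K = 120, 15, 3
F_true = rng.normal(size=(T, K))
W_true = np.where(rng.random((K, N)) < 0.3, rng.normal(size=(K, N)), 0.0)
X = F_true @ W_true + 0.05 * rng.normal(size=(T, N))

def soft(z, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_step(F, X, lam, iters=500):
    """Given factors F, solve min (1/2)||X - F W||^2 + lam ||W||_1
    by iterative soft-thresholding (a simple stand-in for a lasso solver)."""
    eta = 1.0 / np.linalg.eigvalsh(F.T @ F).max()    # step size
    W = np.zeros((F.shape[1], X.shape[1]))
    for _ in range(iters):
        W = soft(W - eta * F.T @ (F @ W - X), eta * lam)
    return W

def factor_step(W, X):
    """Given weights W, solve the (here unconstrained) quadratic problem
    min ||X - F W||^2 for the factors F by least squares."""
    return np.linalg.lstsq(W.T, X.T, rcond=None)[0].T

F = rng.normal(size=(T, K))          # random initialization
for _ in range(20):                  # alternate the two sub-problems
    W = lasso_step(F, X, lam=0.1)
    F = factor_step(W, X)

print("reconstruction error:", np.mean((X - F @ W) ** 2))
```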
In deep portfolio theory, we now wish to improve upon (3) by adding a multivariate payout function F(x1, . . . , xp) from a set of base assets (x1, . . . , xp) via a hierarchical (or deep) set of layers of univariate nonlinear payouts of portfolios. Specifically, this means that there are nonlinear transformations, and, rather than quadratic programming, we have to use stochastic gradient descent (SGD, which is a natural choice given the analytical nature of the introduced derivatives) in the iterative process.
The theoretical motivation for the deep portfolio structure is given by the Kolmogorov-Arnold (1957) representation theorem, which remarkably states that any continuous function F(x₁, …, xₙ) of n variables, where X = (x₁, …, xₙ), can be represented as

F(x₁, …, xₙ) = Σ_{j=1}^{N} f_j ( Σ_{i=1}^{K} f_{ij}(x_i) ).
Here, f_j and f_{ij} are univariate functions, and the f_{ij} form a universal basis that does not depend on the payout function F. Rather surprisingly, there are upper bounds on the number of terms, N ≤ 2n + 1 and K ≤ n. With a careful choice of activation functions f_j, f_{ij}, this is enough to recover any multivariate portfolio payout function.
Hence, a composition (or convolution) of max-layers is a one layer max-sum function. Such architectures are good at extracting option-like payout structure that exists in the market. Finding such nonlinearities sets deep portfolio theory apart from traditional linear factor model structures. Hence a common approach is to use a deep architecture of ReLU univariate activation functions which can be collapsed back to the multivariate option payout of a shallow architecture with a max-sum activation function.
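As a small check of the max-sum idea, a straddle payoff |x − K| decomposes exactly into one call ReLU plus one put ReLU, i.e. a one-layer architecture recovering an option-like payout; the strike and price grid are illustrative:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# A straddle payoff |x - K| is exactly a call ReLU(x - K) plus a put
# ReLU(K - x): a shallow max-sum architecture (strike K illustrative).
K = 100.0
x = np.linspace(80.0, 120.0, 9)      # grid of terminal prices

straddle = np.abs(x - K)
shallow = relu(x - K) + relu(K - x)  # sum of two univariate ReLU payouts

print(np.allclose(straddle, shallow))   # True
```

More general piecewise-linear payouts work the same way, with more ReLU units at different strikes.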
Discussion and Conclusion
I normally recommend that readers of this blog read the full reviewed paper for further details and a complete check of all the references and supporting material. But with this paper, even after having presented a very big chunk of it, I am of the opinion that the full read is mandatory and highly encouraged, especially for readers with an economics and finance taste or professional background.
In what I did not cover here, the authors detail the full experimental setup, the data set and further graphical expositions. Those expositions present interesting ways in which deep portfolio theory was implemented, and how it beat a benchmark biotechnology index, the IBB Index.
For the moment I conclude this rather long post with the discussion and main conclusions by the authors of this deeply rewarding paper:
Deep portfolio theory (DPT) provides a self-contained procedure for portfolio selection. We use training data to uncover deep feature policies (DFPs) in an auto-encoding step which fits the large data set of historical returns. In the decode step, we show how to find a portfolio-map to achieve a pre-specified goal. Both procedures involve an optimization problem with the need to choose the amount of regularization.
To do this, we use an out-of-sample validation step which we summarize in an efficient deep portfolio frontier. Specifically, we avoid the use of statistical models that can be subject to model risk, and, rather than an ex ante efficient frontier, we judge the amount of regularization – which quantifies the number of deep layers and depth of our hidden layers – via the ex post efficient deep frontier. Our approach builds on the original Markowitz insight that the portfolio selection problem can be viewed as a trade-off solved within an optimization framework (Markowitz, 1952, 2006, de Finetti, 1941). Simply put, our theory is based on first encoding the market information and then decoding it to form a portfolio that is designed to achieve our goal.
There are a number of directions for future research. The fundamental trade-off of how tightly we can fit the historical market information whilst still providing a portfolio-map that can achieve our out-of-sample goal needs further study, as does the testing of attainable goals on different types of data. Exploring the combination of non-homogeneous data sources, especially in problems such as credit and drawdown risk, also seems a promising area. Finally, the selection and comparison of (investible) activation functions, especially with regard to different frequencies of underlying market data, is a topic of investigation.
featured image: Modern Portfolio Theory Wikipedia page