No Explicit Target in the Data

The quality of an ML model is normally evaluated by comparing the model's outputs with so-called targets, which are present in the data. When developing a trading model, the desired outputs (the optimal position / allocation) are not explicitly present in the data, which makes it impossible to use a regression model directly.

METHODS WE APPLY TO SOLVE IT:


Redefinition of Targets & Introduction of Special Weights

In the standard supervised learning setting, the data set can be thought of as a set of feature-target pairs, which in turn are examples of an unknown mapping from feature vectors to the corresponding targets. The goal of machine learning is to find a function that mimics the observed instances of this mapping as closely as possible, so that, given a new feature vector, the ML model can predict the corresponding (not yet observed) target.
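As a minimal sketch of this setting, the snippet below fits a linear function to a handful of feature-target pairs by least squares and then predicts the target for a new feature vector. The data, the linear model and all variable names are illustrative assumptions, not part of our platform.

```python
import numpy as np

# Hypothetical feature vectors (rows) and their observed targets.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [2.0, 1.0]])
hidden_mapping = np.array([2.0, -1.0])   # unknown in practice, used only to make toy data
targets = features @ hidden_mapping

# Find the linear function that mimics the observed pairs as closely as possible.
coef, *_ = np.linalg.lstsq(features, targets, rcond=None)

# Predict the (not yet observed) target for a new feature vector.
new_features = np.array([3.0, 2.0])
prediction = new_features @ coef
```

The key point for what follows is that the fit is only possible because `targets` exists in the data.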

This setting resembles the problem we face when building a trading model. At a given moment in the past we have a feature vector that describes the price history of a given financial instrument and/or any other relevant information available at that moment. Naively, we might think that we want to use these features to predict the price movement over the upcoming period, so that we can decide what position / allocation to take. If that were the case, we could treat the subsequent price changes as targets and use them to train a predictive model. However, a closer look at the problem makes it clear that a prediction of the upcoming price change (the expected value of the price change) is not enough to determine the optimal position for the upcoming period. What our model really needs to generate is the optimal position itself, rather than an intermediate prediction of the price change. The optimal positions are not present in the data set, which means that the numbers generated by a candidate model cannot be compared explicitly with known examples. From the ML perspective, this is a very unusual situation.
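To make the difficulty concrete: a series of candidate positions cannot be scored against reference positions, only through the profit and loss it would have produced. The sketch below scores positions by a per-period (non-annualized) Sharpe ratio. The function name `sharpe_ratio` and the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def sharpe_ratio(positions, price_changes):
    """Per-period Sharpe ratio of the P&L obtained by holding
    positions[i] while the price moves by price_changes[i]."""
    pnl = np.asarray(positions) * np.asarray(price_changes)
    return pnl.mean() / pnl.std()

# Made-up price changes and one candidate series of positions.
price_changes = np.array([0.5, -1.0, 0.3, 0.8, -0.2])
positions = np.array([1.0, -0.5, 1.0, 1.0, 0.0])

score = sharpe_ratio(positions, price_changes)
# Scaling all positions by a positive constant leaves the score unchanged.
score_scaled = sharpe_ratio(3.0 * positions, price_changes)
```

Note that the score is a property of the whole series, not of any single position, which is why there is no per-example target to regress on.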

One possible way out would be to abandon ML regression as a tool and directly optimize a function that generates positions. However, we found a way to construct new special targets and special weights based on the price changes, such that minimizing the weighted squared deviations leads to positions that maximize the Sharpe ratio. This method, which at first glance looks like a mere technical curiosity, opens up the full power of machine learning for us. Once the special targets and weights are defined, we can use any available ML regressor (Decision Tree, Neural Network, Random Forest, XGBoost) to train a model that directly generates positions optimizing the Sharpe ratio.
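The exact construction is not spelled out here. As a minimal sketch of how such targets and weights can exist, assume targets equal to `1 / r` and weights equal to `r ** 2`, where `r` is the price change; the weighted squared deviation then equals the sum of `(1 - position * r) ** 2`. For a linear position model without intercept, minimizing this loss gives the same in-sample Sharpe ratio as maximizing the Sharpe ratio directly. This choice of targets and weights, the synthetic data and all names below are assumptions for illustration, not necessarily the method described above.

```python
import numpy as np

def sharpe(pnl):
    return pnl.mean() / pnl.std()

# Synthetic data (assumption): 3 features, weakly predictable price changes.
rng = np.random.default_rng(0)
n, k = 5000, 3
X = rng.normal(size=(n, k))
r = X @ np.array([0.05, -0.03, 0.02]) + rng.normal(size=n)

# Assumed special targets and weights built from the price changes.
# r is drawn from a continuous distribution, so r == 0 does not occur here.
targets = 1.0 / r
weights = r ** 2

# Weighted least squares: scale rows by sqrt(weight), then solve ordinary LS.
sw = np.sqrt(weights)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], targets * sw, rcond=None)

# Reference: direct Sharpe maximization for linear positions p = X @ beta.
# The P&L is (X * r) @ beta, so the optimum is proportional to Cov^-1 @ mean.
Z = X * r[:, None]
beta_direct = np.linalg.solve(np.cov(Z, rowvar=False), Z.mean(axis=0))

sharpe_wls = sharpe((X @ beta_wls) * r)
sharpe_direct = sharpe((X @ beta_direct) * r)
```

In this sketch the two coefficient vectors differ only by a positive scale factor, so `sharpe_wls` and `sharpe_direct` agree up to floating-point error. With a non-linear regressor the same targets and weights could be passed through a sample-weight argument, although the equivalence shown here is exact only for model classes that are closed under rescaling of the positions.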