High Noise-to-Signal Ratio

Financial markets are known to be highly efficient, which leads to a very high noise-to-signal ratio in financial data: the patterns and signals present in the data are buried in noise and are therefore hard to detect.

METHODS WE APPLY TO SOLVE IT:

Human Input & Customizable Inductive Biases

We believe in a synergy between machine and human intelligence and therefore combine a search for patterns in the data with prior expectations provided by domain experts. Patterns found in the data receive higher significance if they are expected or verified by financial experts. At the same time, hypotheses proposed by experts can be rejected if they are not supported empirically by the data. Even an automated, data-driven search for a trading model is usually directed by a proper inductive bias, embedded into the model architecture and constructed from human input. By searching for models in relevant and reasonable domains, we restrict the expressive power of the models in a principled way and thereby significantly reduce the problem of overfitting.
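As a toy illustration of an inductive bias embedded into a model architecture (a sketch for this page, not the production architecture), the snippet below hard-wires a hypothetical expert prior: the position must be a bounded, odd, monotone function of a single momentum feature. The hypothesis space collapses to one parameter, leaving almost no room to fit noise.

```python
# Minimal sketch: an expert prior baked into the architecture.
# Assumption (hypothetical): position is an odd, monotone, bounded
# function of a momentum feature, so the model cannot learn
# sign-flipping artifacts from noise.
import torch
import torch.nn as nn

class MonotoneMomentumPolicy(nn.Module):
    """Position = tanh(softplus(w) * momentum): odd, bounded, monotone by construction."""
    def __init__(self):
        super().__init__()
        self.raw_w = nn.Parameter(torch.zeros(1))   # single free parameter

    def forward(self, momentum: torch.Tensor) -> torch.Tensor:
        w = nn.functional.softplus(self.raw_w)      # w > 0 enforces monotonicity
        return torch.tanh(w * momentum)             # position bounded in (-1, 1)

policy = MonotoneMomentumPolicy()
print(policy(torch.tensor([-1.0, 0.0, 2.0])))      # sign always follows the signal
```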

Automated Feature Generation

We have developed a method for automated, data-driven and goal-driven feature generation directly from financial time series. An alternative approach would be to compress financial time series into vectors of manually chosen features and then build models on top of those features. However, such an approach carries a high risk of losing essential information about the time series if far-from-optimal features are used. Other alternatives, such as recurrent or convolutional models, are also non-optimal: they are general-purpose methods and, therefore, too flexible and not tuned for the very special time series that we work with.
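The sketch below illustrates the goal-driven idea under stated assumptions; it is an illustrative stand-in, not the actual generator. A bank of exponential moving averages with learnable decays is fitted jointly with a position head on a Sharpe-like objective, so the features are shaped by the trading goal rather than chosen by hand.

```python
# Minimal sketch of goal-driven feature generation (illustrative stand-in).
# Assumptions: toy return series, EMA feature bank, Sharpe-like objective.
import torch
import torch.nn as nn

class LearnableEMAFeatures(nn.Module):
    """A bank of EMAs whose decays are learned from the trading objective."""
    def __init__(self, n_features: int = 4):
        super().__init__()
        self.raw_decay = nn.Parameter(torch.linspace(-2.0, 2.0, n_features))

    def forward(self, returns: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.raw_decay)        # decays constrained to (0, 1)
        state = torch.zeros_like(alpha)
        feats = []
        for r in returns:                            # (T,) series -> (T, n_features)
            state = alpha * state + (1 - alpha) * r
            feats.append(state)
        return torch.stack(feats)

features = LearnableEMAFeatures()
head = nn.Linear(4, 1)                               # position from generated features

returns = torch.randn(500) * 0.01                    # toy daily returns
opt = torch.optim.Adam(list(features.parameters()) + list(head.parameters()), lr=0.01)
for _ in range(50):
    pos = torch.tanh(head(features(returns[:-1]))).squeeze(-1)
    pnl = pos * returns[1:]                          # next-step PnL
    loss = -pnl.mean() / (pnl.std() + 1e-8)          # goal-driven: Sharpe-like objective
    opt.zero_grad()
    loss.backward()                                  # gradients flow into the decays
    opt.step()
```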

Data-Efficient Modeling

We adopt data-efficient modeling, meaning that the models we develop usually generate exactly what we need (optimal positions / allocations) and nothing more. Training more expressive models, for example models generating a full probability density, requires more data and is therefore not data efficient. Even training two models, one for the expected return and another for the volatility, would not be data efficient, since in the end we do not need these two quantities but only a specific relation between them, which defines the optimal position.
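A minimal sketch of this contrast, assuming a mean-variance utility with a risk-aversion parameter gamma and toy features (both assumptions, not the firm's specification): a single model outputs the position directly and is trained on the utility itself, so neither the expected return nor the volatility is ever estimated as a separate quantity.

```python
# Minimal sketch of data-efficient modeling: learn the position directly
# instead of two intermediate models for mu and sigma combined afterwards.
# Assumptions: mean-variance utility, gamma = 5.0, toy features and returns.
import torch
import torch.nn as nn

gamma = 5.0                                          # assumed risk-aversion coefficient
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1), nn.Tanh())

X = torch.randn(1000, 8)                             # toy feature vectors
r = torch.randn(1000) * 0.01                         # toy next-period returns

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    pos = model(X).squeeze(-1)                       # the model outputs positions only
    pnl = pos * r
    utility = pnl.mean() - 0.5 * gamma * pnl.var()   # mean-variance objective
    loss = -utility
    opt.zero_grad()
    loss.backward()
    opt.step()
# Only the quantity actually needed (the position) is learned; mu and sigma
# never appear as separate estimation targets.
```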

Extensive Statistical Testing

In vanilla ML, positive out-of-sample results are sufficient to accept a model. In our case, the high level of noise in the data leads to noisy KPIs, and we therefore need to address the fact that good out-of-sample results could arise purely by chance. To exclude this possibility, standard out-of-sample model evaluation procedures have to be extended with extensive statistical tests. As a result, in our final evaluation all KPIs are supplied with estimates of statistical significance, confidence intervals, and even complete probability distributions.
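As a concrete example of such a test, the sketch below bootstraps the Sharpe ratio of a toy PnL series to obtain a confidence interval and an estimate of the probability that the observed Sharpe arose by chance. A plain i.i.d. bootstrap is used for brevity; serially dependent PnL would call for a block bootstrap.

```python
# Minimal sketch: bootstrap a Sharpe ratio to get a confidence interval
# and an estimate of P(Sharpe <= 0), i.e. the "pure luck" hypothesis.
# Assumptions: toy daily PnL, i.i.d. resampling, 252 trading days per year.
import numpy as np

rng = np.random.default_rng(0)
pnl = rng.normal(0.0004, 0.01, size=1250)            # toy daily PnL, ~5 years

def sharpe(x: np.ndarray) -> float:
    return np.sqrt(252) * x.mean() / x.std()         # annualized Sharpe ratio

boot = np.array([
    sharpe(rng.choice(pnl, size=pnl.size, replace=True))
    for _ in range(10_000)                           # resampled Sharpe distribution
])

point = sharpe(pnl)
lo, hi = np.percentile(boot, [2.5, 97.5])            # 95% confidence interval
p_luck = (boot <= 0).mean()                          # estimated prob. Sharpe <= 0
print(f"Sharpe {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], p(luck) ~ {p_luck:.3f}")
```

The full array of resampled Sharpe values is exactly the "complete probability distribution" of the KPI mentioned above; the interval and the tail probability are just summaries of it.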