Tutorial 3: How to Develop Strategies Using Machine Learning Models

Machine Learning in Trading

Machine learning has revolutionized various industries, including trading. The QuantiX platform empowers traders with machine learning solutions, enabling them to develop trading strategies without the complexities of the technical implementation. By providing a user-friendly interface, QuantiX simplifies the application of machine learning models to financial markets.
With this platform, traders can explore different machine learning models, fine-tune their parameters, and optimize strategies to maximize profitability.

Model Development Process

In this tutorial, we will walk through the process of developing a machine learning model for trading.
The first step in creating a machine learning model is training. Training is the process in which the model learns from historical data samples, extracting patterns that it then uses to make predictions on new data. This ability allows machine learning models to solve a variety of problems, including those in trading.
To train a model in QuantiX:

Navigate to Data Analysis.
Open the ML menu and select New Model.
Alternatively, you can go to My Models (accessible from the ML menu) and click the Add button.

How to Train a Model

To train a model, market data is required. This market data must include all the indicators that will be used for training. In this tutorial, we will use the following indicators to train a machine learning model for cryptocurrency trading:

Open, High, Low, Close, Volume (OHLCV)
Relative Strength Index (RSI)
Bollinger Bands
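
For readers who want to see what the two derived indicators measure, here is a minimal Python sketch using only the standard library. It is not how QuantiX computes them. The function names, the periods (14 and 20), the band width of 2 standard deviations, and the choice of a simple-average RSI (rather than Wilder's smoothed variant) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def rsi(closes, period=14):
    """Simple-average RSI (an assumed variant); None until enough candles exist."""
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        # Price changes over the last `period` candles ending at index i.
        changes = [closes[j] - closes[j - 1] for j in range(i - period + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if losses == 0:
            # No down moves in the window (this also covers a flat window).
            out[i] = 100.0
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out


def bollinger_bands(closes, period=20, num_std=2.0):
    """Return (lower, middle, upper) lists; None until enough candles exist."""
    lower, middle, upper = [], [], []
    for i in range(len(closes)):
        if i + 1 < period:
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        window = closes[i - period + 1 : i + 1]
        mid = mean(window)          # moving average of the window
        band = num_std * pstdev(window)  # population standard deviation
        lower.append(mid - band)
        middle.append(mid)
        upper.append(mid + band)
    return lower, middle, upper
```

Each function returns one value per candle, so the results can sit next to the OHLCV columns as additional features.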

Before training, we need to create a Market Data set that includes the selected indicators. For this tutorial, we will use historical 15-minute candlestick data for the following cryptocurrency pairs:

BTC/USDT
ETH/USDT
BNB/USDT

The dataset will cover the period from January 1, 2017, to December 30, 2023.
Once the market data is prepared, we can proceed with training the model. If you need guidance on creating market data and adding features to it, refer to Tutorial 1 in the beginner tutorial series.
Training a machine learning model involves configuring multiple parameters, which are set up in three key steps:

Training Data
Model Parameters
Feature Processor

1. Training Data

In this step, you will define the specifications of the training data. In other words, you will select the dataset that the model will use during the training process. To proceed to the next step, the following fields must be completed:

Name: Assign a meaningful name to your model to help identify it easily.
Market Data: Select the market data that you previously created for training.
Training Pair(s): Select which pairs you want to use for training. In this tutorial, we will use all the pairs included in the selected market data.
Timeframe: If your market data contains multiple timeframes, select the specific timeframe you want the model to be trained on.
Numeric Features: Choose the features that the model will use for training. You can select multiple features, but keep in mind that including too many may increase the training time. In this tutorial, we will use OHLCV, RSI, and Bollinger Bands.
Target: Define the target variable that the model will learn to predict. The model will be trained to mimic the behavior of the target as closely as possible. In other words, the goal of the training phase is to predict the target labels from the selected features. In this tutorial, we will use Rally-Based Classifying with a window size of 100 and buy and sell thresholds of 0.2.
Start Date: Specify the starting date for the training data.
End Date: Define the last date of the dataset to be used for training.
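
This tutorial does not describe how Rally-Based Classifying computes its labels, so the following Python sketch is only a hypothetical illustration of what a three-class, look-ahead target can look like. It is not the platform's algorithm. The function name, the label names, the interpretation of the thresholds as fractional price moves, and the tie-breaking rule are all assumptions.

```python
def label_candles(closes, window=100, buy_threshold=0.2, sell_threshold=0.2):
    """Hypothetical look-ahead labeling: 'buy', 'sell', 'hold', or None."""
    labels = []
    for i, price in enumerate(closes):
        future = closes[i + 1 : i + 1 + window]
        if not future:
            labels.append(None)  # no future candles to look at
            continue
        best_gain = max(future) / price - 1.0    # largest rise within the window
        worst_loss = 1.0 - min(future) / price   # largest drop within the window
        is_buy = best_gain >= buy_threshold
        is_sell = worst_loss >= sell_threshold
        # Assumed tie-break: if both thresholds are hit, the larger move wins.
        if is_buy and (not is_sell or best_gain >= worst_loss):
            labels.append("buy")
        elif is_sell:
            labels.append("sell")
        else:
            labels.append("hold")
    return labels
```

Whatever the exact rule, the point is the same: the target is derived from future prices, and the model learns to predict it from features available at the current candle.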

Note that a fair and reliable evaluation of a model is only possible when the training period and test period are kept separate. This means that the model must be tested on a time period that is not included in the training phase.
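
As a rough illustration of keeping the two periods separate, the sketch below splits candles chronologically at a cutoff date. The `split_by_date` helper and the candle dictionary layout are hypothetical and are not part of QuantiX. The cutoff matches this tutorial's End Date, while the test candle is a made-up example.

```python
from datetime import datetime


def split_by_date(candles, train_end):
    """Split time-stamped candles into a training part and a later test part."""
    train = [c for c in candles if c["time"] <= train_end]
    test = [c for c in candles if c["time"] > train_end]
    return train, test


# Made-up candles for illustration only.
candles = [
    {"time": datetime(2023, 12, 30, 23, 30), "close": 1.0},
    {"time": datetime(2023, 12, 30, 23, 45), "close": 2.0},
    {"time": datetime(2024, 1, 1, 0, 0), "close": 3.0},
]
train, test = split_by_date(candles, train_end=datetime(2023, 12, 30, 23, 59, 59))

# Every test candle is strictly later than every training candle.
assert max(c["time"] for c in train) < min(c["time"] for c in test)
```

A random shuffle would mix future candles into the training set, which is why the split is made on time rather than on row position.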

2. Model Parameters

As the name suggests, this step involves configuring the parameters of the model, which serves as the core of the machine learning process. Our platform currently offers five model options:

XGBoost Classifier: A tree-based model that leverages the power of eXtreme Gradient Boosting (XGBoost) to enhance predictive performance.
AdaBoost Classifier: A tree-based model that employs Adaptive Boosting (AdaBoost).
K-Nearest Neighbors (KNN): A non-parametric classifier that makes predictions based on the training samples closest to the current sample.
Naïve Bayes: A probabilistic classifier that applies a simplified version of Bayes' Theorem.
CatBoost: A tree-based model similar to XGBoost, but optimized for lower computational cost and featuring structural differences that enhance efficiency.
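
Of the models listed above, KNN is the easiest to illustrate from scratch. The following Python sketch shows the general idea of voting among the closest training samples. It is not QuantiX's implementation. The function name, the Euclidean distance metric, and the default `k=3` are assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, sample, k=3):
    """Predict the majority label among the k training samples nearest to `sample`."""
    # Rank training samples by Euclidean distance to the new sample.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], sample))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with training points clustered near (0, 0) labeled "sell" and points near (5, 5) labeled "buy", a new sample at (0.2, 0.2) is classified as "sell" because its three nearest neighbors all carry that label.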

Each model has its own advantages and disadvantages, and no single model always delivers the best performance. It is up to you to experiment and determine which model works best for your dataset.
Additionally, each model comes with various parameters that influence its behavior. You can access these configurations by clicking on Advanced.
To keep things simple, we will use the XGBoost Classifier with its default settings in this tutorial. You are encouraged to modify the parameters and observe how they impact the results. However, fine-tuning models is beyond the scope of this tutorial. For a deeper understanding of the available models and their parameters, refer to the documentation.

3. Feature Processor

The Feature Processor plays a crucial role in preparing data for the model. It serves two main purposes:

Shaping the input of the model
Pre-processing the inputs of the model

Model Memory defines the number of historical candles the model considers for inference.

When memory is set to 1, the model uses only the features of the most recent candle to predict the label for that candle.
As the memory size increases, the model incorporates data from more previous candles, potentially improving performance by recognizing longer-term patterns.

Note that increasing memory also leads to longer training times and does not always guarantee better results. As an analyst, it is your responsibility to experiment and determine the optimal configuration for your model.
In this tutorial, we will use a memory size of 1 for simplicity.
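
To make the effect of memory concrete, the sketch below builds one input row per candle by concatenating the features of the last `memory` candles. How QuantiX actually arranges the input is not described in this tutorial, so the flattening shown here and the `build_inputs` name are assumptions.

```python
def build_inputs(features, memory=1):
    """features: per-candle feature lists, oldest first. Returns one row per usable candle."""
    inputs = []
    for i in range(memory - 1, len(features)):
        window = features[i - memory + 1 : i + 1]  # the last `memory` candles
        # Flatten the window into a single row, oldest candle first.
        inputs.append([value for candle in window for value in candle])
    return inputs
```

With two features per candle, a memory of 1 gives rows of length 2, while a memory of 2 gives rows of length 4 and one fewer usable row. The growing row length is one reason larger memory values lengthen training.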
If you open the Advanced dropdown menu, you will find additional options for sampling and input data processing. For this tutorial, we will keep the default settings, but if you’re interested in learning more, refer to the documentation.

Price-Like Feature Normalization is a process that normalizes features measured in dollar values ($). This transformation helps the model learn patterns in price data more effectively, though its effectiveness may vary depending on the dataset.
One consequence of this normalization is that the values of one feature may become zero for all candles. However, this adjustment generally contributes to a more efficient learning process.
For this tutorial, we will not modify these settings. If you want to explore this topic further, refer to the documentation.
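
The exact formula behind Price-Like Feature Normalization is not given here. One scheme that would produce the all-zero feature mentioned above is to express each price-like feature relative to a reference price from the same candle. The sketch below assumes that scheme, with the close price as the reference. The function name, the feature names, and the formula are hypothetical.

```python
PRICE_LIKE = ("open", "high", "low", "close")  # assumed dollar-valued features


def normalize_price_like(candle, price_like=PRICE_LIKE, reference="close"):
    """Express price-like features as relative offsets from the reference price."""
    ref = candle[reference]
    out = dict(candle)  # features that are not price-like (e.g. volume) stay unchanged
    for name in price_like:
        out[name] = candle[name] / ref - 1.0
    return out
```

Under this assumption the reference feature becomes 0 for every candle, and the remaining price features no longer depend on the absolute price level.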

Once you have configured the Feature Processor, the model is ready for training.
To begin, click on the Train button. After doing so, you will be redirected to the My Models page. Wait for the status of the model to become Successful.
Note: The training process may take several minutes to complete, especially when using large memory sizes or a high number of features.
Once the model is successfully trained, you can use the Actions menu to:

Delete the model
Clone the model
Statistically evaluate its performance

We will explore these functionalities in the next tutorials.

How to Extract Trading Signals From a Model

Once a model is trained, it can predict probabilities for the labels it has been trained on. However, since our backtest engine operates with trading signals, these probabilities must be converted into actionable signals.
To generate trading signals from the model’s predictions, follow these steps:

Navigate to the ML menu.
Select Signal Generator.

The Signal Generator utilizes the predicted labels from the training period to create signals that can be used in backtests. To use this tool, configure the following settings:

Market Data: Select the market data that was used to train the model.
Model Name: The name of the model that you want to extract signals from.
Reference Pair(s): Select the pairs to use for the signal generation process. It is recommended to choose the pairs you intend the model to trade.
Certainty: Define the minimum certainty threshold required for the model to generate a buy or sell signal. This is based on the model’s output probability distribution.
Not Matched Value: Specify the value assigned to a candle when it does not meet the conditions for either a buy or sell signal.

To generate trading signals, click on Convert to Signal. This process will create signals associated with the trained model.

If your model predicts n labels, then n signals will be generated, each corresponding to one label.
In this tutorial, we are using rally-based classifying, which has three labels, so the Signal Generator will produce three corresponding signals.
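
The conversion from probabilities to signals can be pictured with the following Python sketch. It is not the Signal Generator's actual code. The `to_signals` name, the label names, the use of 1 as the "matched" value, and the default certainty of 0.6 are assumptions made for illustration.

```python
def to_signals(probabilities, labels, certainty=0.6, not_matched_value=0):
    """Build one signal series per label from per-candle probability dicts."""
    signals = {label: [] for label in labels}
    for probs in probabilities:
        for label in labels:
            # The signal fires only when the model is certain enough about this label.
            matched = probs[label] >= certainty
            signals[label].append(1 if matched else not_matched_value)
    return signals
```

With three labels, the function returns three series of equal length. A candle whose highest probability stays below the certainty threshold receives the Not Matched Value in all three.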

Once the signals are ready, you can use them to run a backtest (and afterward turn the backtest into a trading bot if you want).
You can access the generated signals in the Signal section of the New Backtest page. Alternatively, use the Go to Backtest button to navigate to the backtest page directly.

Last modified: 10 March 2025