XGBoost typically combines a large number of regression trees with a small learning rate. In this situation, trees added early are significant and trees added late are unimportant.
Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.
This is an introduction to the new tree booster, dart.
Original paper: Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. “DART: Dropouts meet Multiple Additive Regression Trees.” JMLR.
It drops trees during training in order to address over-fitting: trivial trees (added only to correct trivial errors) may be prevented.
Because of the randomness introduced during training, expect the following differences:
Training can be slower than gbtree, because the random dropout prevents use of the prediction buffer.
Early stopping might not be stable, due to the randomness; a sketch of combining dart with early stopping follows below.
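For example, when combining dart with early stopping, it can help to fix the random seed so that the dropout pattern, and therefore the selected best iteration, is reproducible. This is only a minimal sketch; the data files are the same ones used in the sample script later on this page, and the parameter values are placeholders:

import xgboost as xgb

dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dvalid = xgb.DMatrix('demo/data/agaricus.txt.test')

param = {'booster': 'dart',
         'objective': 'binary:logistic',
         'rate_drop': 0.1,
         'seed': 0}   # fixing the seed makes the dropout pattern reproducible

# early_stopping_rounds watches the validation metric; with dart the selected
# best iteration can vary between runs because of the random dropout
bst = xgb.train(param, dtrain, num_boost_round=200,
                evals=[(dvalid, 'valid')],
                early_stopping_rounds=10)
print(bst.best_iteration)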
In the \(m\)-th training round, suppose \(k\) trees are selected to be dropped.
Let \(D = \sum_{i \in \mathbf{K}} F_i\) be the leaf scores of the dropped trees and \(F_m = \eta \tilde{F}_m\) be the leaf scores of the new tree.
The objective function is as follows:
\[\mathrm{Obj} = \sum_{j=1}^{n} L\left(y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m\right) + \Omega\left(\tilde{F}_m\right)\]
\(D\) and \(F_m\) are overshooting, so a scale factor is applied:
\[\hat{y}_j^m = \sum_{i \notin \mathbf{K}} F_i + a\left(\sum_{i \in \mathbf{K}} F_i + b F_m\right)\]
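To make the update concrete, here is a toy sketch of a single dropout round. It is not XGBoost's internal code: the "trees" are just per-row constant predictions, the new tree is taken to be the exact residual, and the coefficients \(a = k/(k+\eta)\), \(b = \eta/k\) are assumed to correspond to the tree-style normalization described under normalize_type below.

import numpy as np

rng = np.random.default_rng(0)
eta = 0.1                                      # learning rate

# toy ensemble: 5 "trees", each stored as its per-row predictions (leaf scores)
tree_preds = rng.normal(size=(5, 4))
y = tree_preds.sum(axis=0) + rng.normal(scale=0.1, size=4)   # toy targets

# 1. dropout: select k trees uniformly at random (sample_type='uniform')
k = 2
dropped = rng.choice(len(tree_preds), size=k, replace=False)
kept = np.setdiff1d(np.arange(len(tree_preds)), dropped)
D = tree_preds[dropped].sum(axis=0)            # leaf scores of the dropped trees

# 2. fit the new "tree" to the residual of the ensemble without the dropped trees
#    (here it is simply set to the residual itself, standing in for a real tree)
F_new = y - tree_preds[kept].sum(axis=0)

# 3. normalization: a = k/(k + eta) and b = eta/k scale D and the new tree so
#    that a * (D + b * F_new) stays roughly the size of D (no overshooting)
a = k / (k + eta)
b = eta / k
y_hat = tree_preds[kept].sum(axis=0) + a * (D + b * F_new)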
The booster dart inherits the gbtree booster, so it supports all parameters that gbtree does, such as eta, gamma, max_depth, etc.
Additional parameters are noted below:
sample_type: type of sampling algorithm.
    uniform: (default) dropped trees are selected uniformly.
    weighted: dropped trees are selected in proportion to their weight.
normalize_type: type of normalization algorithm.
    tree: (default) new trees have the same weight as each of the dropped trees.
    forest: new trees have the same weight as the sum of the dropped trees (forest).
rate_drop: dropout rate. Range: [0.0, 1.0].
skip_drop: probability of skipping dropout. If a dropout is skipped, new trees are added in the same manner as gbtree. Range: [0.0, 1.0].
import xgboost as xgb
# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'booster': 'dart',
         'max_depth': 5, 'learning_rate': 0.1,
         'objective': 'binary:logistic',
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
num_round = 50
bst = xgb.train(param, dtrain, num_round)
# make prediction; note that predicting with a dart booster may itself perform
# dropout unless the number of trees is pinned (ntree_limit or iteration_range,
# depending on the XGBoost version)
preds = bst.predict(dtest)
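The same parameters can also be passed through XGBoost's scikit-learn interface. A minimal sketch, assuming the dart-specific keyword arguments are forwarded to the underlying booster; the toy data is made up only to keep the snippet self-contained:

import numpy as np
from xgboost import XGBClassifier

# toy binary classification data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = XGBClassifier(booster='dart',
                    max_depth=5, learning_rate=0.1,
                    n_estimators=50,
                    sample_type='weighted',    # drop trees in proportion to their weight
                    normalize_type='forest',
                    rate_drop=0.1,
                    skip_drop=0.5)
clf.fit(X, y)
preds = clf.predict(X)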