
DART booster

XGBoost mostly combines a huge number of regression trees with a small learning rate. In this situation, trees added early are significant and trees added late are unimportant.

Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.

This is an introduction to the new tree booster dart.

Original paper

Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. “DART: Dropouts meet Multiple Additive Regression Trees.” JMLR.

Features

  • Drop trees in order to address over-fitting.

    • Trivial trees (added to correct trivial errors) may be prevented.

Because of the randomness introduced during training, expect the following differences:

  • Training can be slower than gbtree because the random dropout prevents usage of the prediction buffer.

  • Early stopping might not be stable, due to the randomness.

How it works

  • In the m-th training round, suppose that k trees are selected to be dropped.

  • Let $D = \sum_{i \in \mathbf{K}} F_i$ be the leaf scores of dropped trees and $F_m = \eta \tilde{F}_m$ be the leaf scores of a new tree.

  • The objective function is as follows:

$$\mathrm{Obj} = \sum_{j=1}^{n} L\left(y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m\right) + \Omega\left(\tilde{F}_m\right).$$
  • $D$ and $F_m$ are overshooting, so a scale factor is used:

$$\hat{y}_j^m = \sum_{i \notin \mathbf{K}} F_i + a\left(\sum_{i \in \mathbf{K}} F_i + b F_m\right).$$
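The round above can be sketched in plain Python on toy per-sample score vectors. This is only an illustration of the bookkeeping ("tree" normalization, with an assumed drop-half rule), not XGBoost's implementation:

```python
import random

def dart_round(tree_preds, new_tree_pred, eta, rng):
    """One illustrative DART round over per-sample score vectors.

    tree_preds:    existing trees, each a list of leaf scores per sample
    new_tree_pred: leaf scores of the freshly fitted tree (F~_m), assumed
                   trained against y_hat^{m-1} - D
    """
    n = len(tree_preds)
    n_samples = len(tree_preds[0])
    k = max(1, n // 2)                      # assumed rule: drop half the trees
    dropped = set(rng.sample(range(n), k))  # the set K, chosen uniformly
    # D: combined leaf scores of the dropped trees.
    D = [sum(tree_preds[i][j] for i in dropped) for j in range(n_samples)]
    # Leaf scores of the trees that were kept (i not in K).
    kept = [sum(tree_preds[i][j] for i in range(n) if i not in dropped)
            for j in range(n_samples)]
    # "tree" normalization: a = k/(k+eta), b = 1/k, and F_m = eta * F~_m,
    # so the new tree contributes a * (eta/k) * F~_m.
    a = k / (k + eta)
    return [kept[j] + a * (D[j] + (eta / k) * new_tree_pred[j])
            for j in range(n_samples)]
```

The rescaling keeps the combined contribution of the dropped trees plus the new tree on the same scale as $D$, so the ensemble's predictions do not overshoot.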

Parameters

The booster dart inherits from the gbtree booster, so it supports all of gbtree's parameters, such as eta, gamma, and max_depth.

Additional parameters are noted below:

  • sample_type: type of sampling algorithm.

    • uniform: (default) dropped trees are selected uniformly.

    • weighted: dropped trees are selected in proportion to weight.

  • normalize_type: type of normalization algorithm.

    • tree: (default) New trees have the same weight as each of the dropped trees.

    $$a\left(\sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m\right) = a\left(\sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m\right) \simeq a\left(1 + \frac{\eta}{k}\right) D = a\,\frac{k + \eta}{k} D = D, \qquad a = \frac{k}{k + \eta}$$
    • forest: New trees have the same weight as the sum of the dropped trees (the forest).

    $$a\left(\sum_{i \in \mathbf{K}} F_i + F_m\right) = a\left(\sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m\right) \simeq a\left(1 + \eta\right) D = D, \qquad a = \frac{1}{1 + \eta}.$$
  • rate_drop: dropout rate.

    • range: [0.0, 1.0]

  • skip_drop: probability of skipping dropout.

    • If a dropout is skipped, new trees are added in the same manner as gbtree.

    • range: [0.0, 1.0]
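The two normalization factors, together with rate_drop and skip_drop, can be checked with a short plain-Python sketch. The factors follow the derivations above; the selection rule is an assumption for illustration, not XGBoost's internal code:

```python
import random

def norm_factor(normalize_type, k, eta):
    """Scale factor a: "tree" gives k/(k+eta), "forest" gives 1/(1+eta)."""
    if normalize_type == "tree":
        return k / (k + eta)
    return 1.0 / (1.0 + eta)

# Sanity check: if the new tree exactly reproduced the dropped trees'
# contribution D, the rescaled sum would give D back in both modes.
k, eta, D = 3, 0.1, 5.0
assert abs(norm_factor("tree", k, eta) * (D + (eta / k) * D) - D) < 1e-12
assert abs(norm_factor("forest", k, eta) * (D + eta * D) - D) < 1e-12

def select_dropped(n_trees, rate_drop, skip_drop, rng):
    """Pick the trees to drop for one round (uniform sample_type).

    An assumed rule for illustration: with probability skip_drop the
    round is skipped (behaving like gbtree); otherwise roughly
    rate_drop * n_trees trees are dropped, at least one.
    """
    if rng.random() < skip_drop:
        return []
    n_drop = max(1, int(rate_drop * n_trees))
    return rng.sample(range(n_trees), n_drop)
```

For example, with rate_drop=0.1 and skip_drop=0.5, roughly half of the boosting rounds add trees exactly as gbtree does, and the other half drop about 10% of the existing trees before fitting.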

Sample Script

import xgboost as xgb
# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'booster': 'dart',
         'max_depth': 5, 'learning_rate': 0.1,
         'objective': 'binary:logistic',
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
num_round = 50
bst = xgb.train(param, dtrain, num_round)
# make prediction: specify ntree_limit, because a DART booster
# performs dropout during predict() when ntree_limit is not set
preds = bst.predict(dtest, ntree_limit=num_round)