Racket Machine Learning — Decision Trees
This package is part of a set of packages implementing machine learning capabilities for Racket. This particular package implements support for classifying individuals using decision trees.
From Wikipedia (note that as this implementation is focused on classification we do not include chance nodes):
A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.
The intention is also to provide support for random forests and other supervised learning techniques that generate the trees used by the classifier described below.
1 Module rml-decisiontrees
(require rml-decisiontrees)
package: rml-decisiontrees
The notion of a decision tree is to encode a set of tests and outcomes that should result in a classification value. This module builds a decision-tree from a root decision-node that represents a set of tests against the value of a specified feature on an individual. The outcome of a particular test is either the next decision-node, or a terminal-node that holds a classification value.
> (define test-tree
    (make-decision-tree
     (make-decision "size"
                    (list [cons (curryr < 5) (make-terminal 'small)]
                          [cons (curryr > 10) (make-terminal 'large)])
                    #:else (make-decision "color"
                                          (list [cons (curry string=? "black") (make-terminal 'cocktail)])
                                          #:else (make-terminal 'medium)))))
> (define test-individual
    (make-hash (list (cons "size" 6) (cons "color" "black"))))
> (tree-classify test-tree test-individual)
'cocktail
> (for-each displayln (reverse (tree-trace test-tree)))
(root (decision "size" [2]))
(size 6) -> (decision "color" [1])
(color "black") -> (terminal cocktail)
The example above classifies individuals that represent dresses, primarily using size to identify them as small, medium, or large. However, it further classifies medium-sized dresses as cocktail dresses if they are black. Clearly, this is not a terribly good, or realistic, classifier!
1.1 Construction
procedure
(make-decision-tree root-decision) → decision-tree?
root-decision : decision-node?
Every path through the tree should result in a terminal-node.
No loops are allowed.
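As an illustrative sketch, the following constructs a minimal, well-formed tree that satisfies both rules: a single decision node whose outcomes, including the #:else branch, are all terminal nodes (the yes/no-tree name and the 'accepted and 'rejected labels are made up for this example).

> (define yes/no-tree
    (make-decision-tree
     (make-decision "answer"
                    (list [cons (curry string=? "yes") (make-terminal 'accepted)])
                    #:else (make-terminal 'rejected))))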
procedure
(make-decision feature cond-pairs [#:else else-node]) → decision-node?
feature : non-empty-string?
cond-pairs : (non-empty-listof (cons/c (-> serializable? boolean?) tree-node?))
else-node : tree-node? = null
The specific feature value will be passed into each condition predicate in order. The first condition that returns #t determines the outcome of the decision: its corresponding tree-node is selected and evaluated in turn. If no condition predicate returns #t and a value exists for else-node, that node is treated as the outcome of the decision.
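A short sketch of this in-order behaviour (the "age" feature and the 'minor, 'adult, and 'senior labels are purely illustrative): an "age" value of 12 matches the first predicate and the second is never consulted, 40 falls through to the second, and 70 matches neither and so takes the #:else node.

> (define age-decision
    (make-decision "age"
                   (list [cons (curryr < 18) (make-terminal 'minor)]
                         [cons (curryr < 65) (make-terminal 'adult)])
                   #:else (make-terminal 'senior)))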
procedure
(make-terminal value [#:error is-error]) → terminal-node?
value : serializable?
is-error : boolean? = #t
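For example (both labels here are hypothetical), a plain terminal simply holds the classification value to return, while one constructed with #:error marks that outcome as an error condition:

> (define ok-node (make-terminal 'approved))
> (define error-node (make-terminal 'missing-data #:error #t))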
1.2 Classification
procedure
(tree-classify tree individual) → serializable?
tree : decision-tree?
individual : individual?
Note that the value drop-through-error may be returned when none of a decision's condition predicates match the given feature value and no else-node was specified.
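A sketch of that drop-through case, using the same make-hash style of individual as the earlier example (the strict-tree name is illustrative): a "size" of 7 matches neither predicate and there is no else-node, so the comparison below should yield #t.

> (define strict-tree
    (make-decision-tree
     (make-decision "size"
                    (list [cons (curryr < 5) (make-terminal 'small)]
                          [cons (curryr > 10) (make-terminal 'large)]))))
> (eq? (tree-classify strict-tree (make-hash (list (cons "size" 7))))
       drop-through-error)
#t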
procedure
(tree-trace tree) → (listof string?)
tree : decision-tree?
value
drop-through-error : symbol?
parameter
(sandbox-predicates) → boolean?
(sandbox-predicates sandbox-enable) → void?
sandbox-enable : boolean?
= #t
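A usage sketch, assuming this parameter controls whether condition predicates are evaluated inside a sandbox during classification (and that the predicates in test-tree above are trusted, so disabling the sandbox leaves the result unchanged):

> (parameterize ([sandbox-predicates #f])
    (tree-classify test-tree test-individual))
'cocktail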