Racket Machine Learning — Decision Trees
This package is part of a set of packages implementing machine learning capabilities for Racket. This particular package implements support for classifying individuals using decision trees.
From Wikipedia (note that as this implementation is focused on classification we do not include chance nodes):
A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.
The intention is also to provide support for random forests and other supervised learning techniques that generate the trees used by the classifier described below.
1 Module rml-decisiontrees
(require rml-decisiontrees)
package: rml-decisiontrees
The notion of a decision tree is to encode a set of tests and outcomes that should result in a classification value. This module builds a decision-tree from a root decision-node that represents a set of tests against the value of a specified feature on an individual. The outcome of a particular test is either the next decision-node, or a terminal-node that holds a classification value.
> (define test-tree
    (make-decision-tree
     (make-decision "size"
                    (list [cons (curryr < 5) (make-terminal 'small)]
                          [cons (curryr > 10) (make-terminal 'large)])
                    #:else (make-decision "color"
                                          (list [cons (curry string=? "black") (make-terminal 'cocktail)])
                                          #:else (make-terminal 'medium)))))
> (define test-individual
    (make-hash (list (cons "size" 6) (cons "color" "black"))))
> (tree-classify test-tree test-individual)
'cocktail
> (for-each displayln (reverse (tree-trace test-tree)))
(root (decision "size" [2]))
(size 6) -> (decision "color" [1])
(color "black") -> (terminal cocktail)
The example above classifies individuals that represent dresses, primarily using size to identify them as small, medium, or large. However, it further classifies medium-sized dresses as cocktail dresses if they are black. Clearly, this is not a terribly good, or realistic, classifier!
1.1 Construction
procedure
(make-decision-tree root-decision) → decision-tree?
root-decision : decision-node?
Every path through the tree should result in a terminal-node.
No loops are allowed.
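As an illustrative sketch, the following constructs a minimal, well-formed tree that satisfies both rules: a single decision node whose outcomes, including the #:else branch, are all terminal nodes (the yes/no-tree name and the 'accepted and 'rejected labels are made up for this example).

> (define yes/no-tree
    (make-decision-tree
     (make-decision "answer"
                    (list [cons (curry string=? "yes") (make-terminal 'accepted)])
                    #:else (make-terminal 'rejected))))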
procedure
(make-decision feature cond-pairs [#:else else-node]) → decision-node?
feature : non-empty-string?
cond-pairs : (non-empty-listof (cons/c (-> serializable? boolean?) tree-node?))
else-node : tree-node? = null
The specific feature value will be passed into each condition predicate in order. The first condition that returns #t determines the outcome of the decision: its corresponding tree-node is selected and evaluated in turn. If no condition predicate returns #t and a value exists for else-node, that node is treated as the outcome of the decision.
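A short sketch of this in-order behaviour (the "age" feature and the 'minor, 'adult, and 'senior labels are purely illustrative): an "age" value of 12 matches the first predicate and the second is never consulted, 40 falls through to the second, and 70 matches neither and so takes the #:else node.

> (define age-decision
    (make-decision "age"
                   (list [cons (curryr < 18) (make-terminal 'minor)]
                         [cons (curryr < 65) (make-terminal 'adult)])
                   #:else (make-terminal 'senior)))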
procedure
(make-terminal value [#:error is-error]) → terminal-node?
value : serializable?
is-error : boolean? = #t
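For example (both labels here are hypothetical), a plain terminal simply holds the classification value to return, while one constructed with #:error marks that outcome as an error condition:

> (define ok-node (make-terminal 'approved))
> (define error-node (make-terminal 'missing-data #:error #t))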
1.2 Classification
procedure
(tree-classify tree individual) → serializable?
tree : decision-tree?
individual : individual?
Note that the value drop-through-error may be returned when none of a decision's condition predicates match the given feature value and no else-node was specified.
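A sketch of that drop-through case, using the same make-hash style of individual as the earlier example (the strict-tree name is illustrative): a "size" of 7 matches neither predicate and there is no else-node, so the comparison below should yield #t.

> (define strict-tree
    (make-decision-tree
     (make-decision "size"
                    (list [cons (curryr < 5) (make-terminal 'small)]
                          [cons (curryr > 10) (make-terminal 'large)]))))
> (eq? (tree-classify strict-tree (make-hash (list (cons "size" 7))))
       drop-through-error)
#t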
procedure
(tree-trace tree) → (listof string?)
tree : decision-tree?
value
drop-through-error : symbol?
parameter
(sandbox-predicates) → boolean?
(sandbox-predicates sandbox-enable) → void?
sandbox-enable : boolean?
= #t
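A usage sketch, assuming this parameter controls whether condition predicates are evaluated inside a sandbox during classification (and that the predicates in test-tree above are trusted, so disabling the sandbox leaves the result unchanged):

> (parameterize ([sandbox-predicates #f])
    (tree-classify test-tree test-individual))
'cocktail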