public abstract class DecisionTreeInfoGain extends DecisionTree implements Serializable, Cloneable
Abstract class that extends DecisionTree for classes that use an information gain criterion.
Modifier and Type | Class and Description |
---|---|
static class | DecisionTreeInfoGain.GainCriteria: Specifies which information gain criterion to use in determining the best split at each node. |
Nested classes inherited from class DecisionTree: DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes inherited from class PredictiveModel: PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor and Description |
---|
DecisionTreeInfoGain(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Constructs a DecisionTree object for a single response variable and multiple predictor variables. |
Modifier and Type | Method and Description |
---|---|
protected double | information(int[] x, int[] y, double[] classCounts, double[] weights, boolean xInfo): Returns the expected information of a variable y over a partition determined by the variable x. |
protected abstract int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition): Abstract method for selecting the next split variable and split definition for the node. |
void | setGainCriteria(DecisionTreeInfoGain.GainCriteria gainCriteria): Specifies which criterion to use in gain calculations in order to determine the best split at each node. |
void | setUseRatio(boolean ratio): Sets the flag to use or not use the gain ratio instead of the gain to determine the best split. |
boolean | useGainRatio(): Returns whether or not the gain ratio is to be used instead of the gain to determine the best split. |
Methods inherited from class DecisionTree: fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setConfiguration, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection
Methods inherited from class PredictiveModel: getClassCounts, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
public DecisionTreeInfoGain(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a DecisionTree object for a single response variable and multiple predictor variables.
Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable
responseColumnIndex - an int specifying the column index of the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable
protected double information(int[] x, int[] y, double[] classCounts, double[] weights, boolean xInfo)
Returns the expected information of a variable y over a partition determined by the variable x.
Given a data subset $S$ containing both variables $X$ and $Y$, let $\{S_1, \dots, S_k\}$ be a partition of $S$ determined by the values in $X$. Then the expected information is
$$\sum_{i=1}^{k} \frac{|S_i|}{|S|} \, I(S_i),$$
where $I$ is either the Shannon entropy or the Gini index, according to DecisionTreeInfoGain.GainCriteria.
Note: if $X$ is constant, the return value is the Shannon entropy (or Gini index) of $Y$.
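The expected information described above can be illustrated with a small standalone sketch. It is not the library's implementation: the class and method names (`ExpectedInformationSketch`, `expectedInformation`) are hypothetical, the logarithm base (2) is an assumption, only the Shannon entropy is shown, and the `weights`, `classCounts`, and `xInfo` arguments of the real method are ignored (every observation counts once).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the documented API.
final class ExpectedInformationSketch {

    // Shannon entropy (base 2, an assumption) of a vector of class counts.
    static double entropy(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // Size-weighted average of the entropy of y within each cell of the
    // partition induced by the distinct values of x.
    // Assumes y[i] takes values in 0..nClasses-1.
    static double expectedInformation(int[] x, int[] y, int nClasses) {
        Map<Integer, double[]> cells = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            cells.computeIfAbsent(x[i], k -> new double[nClasses])[y[i]] += 1.0;
        }
        double info = 0.0;
        for (double[] counts : cells.values()) {
            double size = 0.0;
            for (double c : counts) {
                size += c;
            }
            info += (size / x.length) * entropy(counts);
        }
        return info;
    }

    public static void main(String[] args) {
        int[] y = {0, 0, 1, 1};
        int[] xConstant = {0, 0, 0, 0}; // one cell: result is the entropy of y
        int[] xPerfect = {0, 0, 1, 1};  // each cell is pure
        System.out.println(expectedInformation(xConstant, y, 2));
        System.out.println(expectedInformation(xPerfect, y, 2));
    }
}
```

With a constant `x` the partition has a single cell, so the result reduces to the entropy of `y`, matching the note above.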
Parameters:
x - an int array of length xy.length containing values of a predictor or an indicator vector defining the partition of the observations.
y - an int array of length xy.length containing the values of the response variable.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
weights - a double array used to indicate which subset of the observations belongs in the current node.
xInfo - a boolean indicating whether information about x is computed using a simple frequency estimate.
Value | Method |
---|---|
true | simple frequency estimate |
false | prior probabilities |
Returns:
a double indicating the information uncertainty.
protected abstract int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
selectSplitVariable in class DecisionTree
Parameters:
xy - a double matrix containing the data
classCounts - a double array containing the counts for each class of the response variable, when it is categorical
parentFreq - a double array used to indicate which subset of the observations belongs in the current node
splitValue - a double array representing the resulting split point if the selected variable is quantitative
splitCriterionValue - a double array containing the value of the criterion used to determine the splitting variable
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
Returns:
an int specifying the column index of the split variable in this.getPredictorIndexes
public void setGainCriteria(DecisionTreeInfoGain.GainCriteria gainCriteria)
Parameters:
gainCriteria - a DecisionTreeInfoGain.GainCriteria specifying which criterion to use in gain calculations in order to determine the best split at each node
Default: gainCriteria = DecisionTreeInfoGain.GainCriteria.SHANNON_ENTROPY
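To make the choice between the two criteria concrete, the following standalone sketch computes both impurity measures from a vector of class counts. It uses the common textbook definitions (entropy in base 2, Gini index as one minus the sum of squared class proportions); whether the library uses exactly these conventions is an assumption, and the names `ImpuritySketch`, `shannonEntropy`, and `giniIndex` are hypothetical.

```java
// Hypothetical illustration; not part of the documented API.
final class ImpuritySketch {

    // Shannon entropy in bits: -sum p_k * log2(p_k) over classes with p_k > 0.
    static double shannonEntropy(double[] classCounts) {
        double total = 0.0;
        for (double c : classCounts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double h = 0.0;
        for (double c : classCounts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // Gini index: 1 - sum p_k^2.
    static double giniIndex(double[] classCounts) {
        double total = 0.0;
        for (double c : classCounts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (double c : classCounts) {
            double p = c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    public static void main(String[] args) {
        double[] balanced = {5, 5};
        double[] skewed = {9, 1};
        System.out.println("balanced: entropy=" + shannonEntropy(balanced)
                + " gini=" + giniIndex(balanced));
        System.out.println("skewed:   entropy=" + shannonEntropy(skewed)
                + " gini=" + giniIndex(skewed));
    }
}
```

Both measures are zero for a pure node and largest for evenly mixed classes, so they tend to rank candidate splits similarly while differing in scale.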
public void setUseRatio(boolean ratio)
Parameters:
ratio - a boolean indicating if the gain ratio is to be used; true uses the gain ratio, false uses the gain.
Default: useRatio=false
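The difference between the gain and the gain ratio can be sketched as follows. This assumes the usual C4.5-style definitions (gain = entropy of y minus the expected entropy of y within the cells of x; gain ratio = gain divided by the entropy of x itself); the page does not state the library's exact formula, and the names `GainRatioSketch`, `gain`, and `gainRatio` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the documented API.
final class GainRatioSketch {

    // Shannon entropy (base 2) of the empirical distribution of the labels.
    static double entropy(int[] labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            h -= p * Math.log(p) / Math.log(2.0);
        }
        return h;
    }

    // Expected entropy of y within the cells of the partition induced by x.
    static double conditionalEntropy(int[] x, int[] y) {
        Map<Integer, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            cells.computeIfAbsent(x[i], k -> new ArrayList<>()).add(y[i]);
        }
        double h = 0.0;
        for (List<Integer> cell : cells.values()) {
            int[] ys = cell.stream().mapToInt(Integer::intValue).toArray();
            h += ((double) ys.length / x.length) * entropy(ys);
        }
        return h;
    }

    static double gain(int[] x, int[] y) {
        return entropy(y) - conditionalEntropy(x, y);
    }

    // Gain normalized by the split information (the entropy of x).
    // Returns 0 when x is constant, since no split is possible.
    static double gainRatio(int[] x, int[] y) {
        double splitInfo = entropy(x);
        return splitInfo == 0.0 ? 0.0 : gain(x, y) / splitInfo;
    }

    public static void main(String[] args) {
        int[] y = {0, 0, 1, 1};
        int[] xBinary = {0, 0, 1, 1}; // two-valued predictor
        int[] xUnique = {0, 1, 2, 3}; // distinct value per observation
        System.out.println("gain:  " + gain(xBinary, y) + " vs " + gain(xUnique, y));
        System.out.println("ratio: " + gainRatio(xBinary, y) + " vs " + gainRatio(xUnique, y));
    }
}
```

In this toy data both predictors achieve the same gain, but the ratio penalizes the predictor with a distinct value per observation, which is the usual motivation for preferring the ratio when predictors have many categories.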
public boolean useGainRatio()
Returns:
a boolean indicating if the gain ratio is to be used; true uses the gain ratio, false uses the gain.
Copyright © 1970-2016 Rogue Wave Software
Built May 19 2016.