public abstract class DecisionTreeInfoGain extends DecisionTree implements Serializable, Cloneable
Abstract class that extends DecisionTree for classes that use an information gain criterion.
Modifier and Type | Class and Description |
---|---|
static class | DecisionTreeInfoGain.GainCriteria: Specifies which information gain criterion to use in determining the best split at each node. |
Nested classes/interfaces inherited from class DecisionTree: DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class PredictiveModel: PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor and Description |
---|
DecisionTreeInfoGain(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Constructs a DecisionTree object for a single response variable and multiple predictor variables. |
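The constructor takes all data in one matrix, xy, with one observation per row and the response stored in column responseColumnIndex. The plain-Java sketch below only illustrates that layout; the class XyLayout and its methods are hypothetical helpers, not part of the library, and the sketch assumes that every column other than the response is a predictor.

```java
import java.util.Arrays;

// Hypothetical helpers illustrating the xy layout; not part of the JMSL API.
final class XyLayout {

    // Copies column responseColumnIndex (the response variable) out of xy.
    static double[] response(double[][] xy, int responseColumnIndex) {
        double[] y = new double[xy.length];
        for (int i = 0; i < xy.length; i++) {
            y[i] = xy[i][responseColumnIndex];
        }
        return y;
    }

    // Returns the indexes of all columns except the response column
    // (assumption: every other column is a predictor).
    static int[] predictorColumns(double[][] xy, int responseColumnIndex) {
        int nCols = xy[0].length;
        int[] cols = new int[nCols - 1];
        int k = 0;
        for (int j = 0; j < nCols; j++) {
            if (j != responseColumnIndex) {
                cols[k++] = j;
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        // Two predictors (columns 0 and 2) and a response in column 1,
        // coded 0/1 here purely for illustration.
        double[][] xy = {
            {2.5, 0, 1.0},
            {3.1, 1, 0.0},
            {1.7, 0, 1.0},
        };
        System.out.println(Arrays.toString(response(xy, 1)));         // [0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(predictorColumns(xy, 1))); // [0, 2]
    }
}
```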
Modifier and Type | Method and Description |
---|---|
protected double | information(int[] x, int[] y, double[] classCounts, double[] weights, boolean xInfo): Returns the expected information of a variable y over a partition determined by the variable x. |
protected abstract int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition): Abstract method for selecting the next split variable and split definition for the node. |
void | setGainCriteria(DecisionTreeInfoGain.GainCriteria gainCriteria): Specifies which criterion to use in gain calculations in order to determine the best split at each node. |
void | setUseRatio(boolean ratio): Sets whether the gain ratio, rather than the gain, is used to determine the best split. |
boolean | useGainRatio(): Returns whether the gain ratio, rather than the gain, is used to determine the best split. |
Methods inherited from class DecisionTree: fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfSets, isAutoPruningFlag, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setConfiguration, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode
Methods inherited from class PredictiveModel: getClassCounts, getCostMatrix, getMaxNumberOfCategories, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isMustFitModelFlag, isUserFixedNClasses, setClassCounts, setCostMatrix, setFitModelFlag, setMaxNumberOfCategories, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setWeights
public DecisionTreeInfoGain(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a DecisionTree object for a single response variable and multiple predictor variables.
Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable.
responseColumnIndex - an int specifying the column index of the response variable.
varType - a PredictiveModel.VariableType array containing the type of each variable.

protected double information(int[] x, int[] y, double[] classCounts, double[] weights, boolean xInfo)
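Returns the expected information of a variable y over a partition determined by the variable x.
Given a data subset $S$ containing both variables $X$ and $Y$, let $S_1, S_2, \ldots, S_k$ be the partition of $S$ determined by the values in $X$. Then the expected information is

$$ I(S, Y \mid X) = \sum_{i=1}^{k} p(S_i)\, H(S_i, Y) $$

where $p(S_i)$ is the proportion of the observations of $S$ that fall in $S_i$, and $H$ is either the Shannon entropy or the Gini index, according to DecisionTreeInfoGain.GainCriteria.
Note: if $X$ is constant, the return value is the Shannon entropy (or Gini index) of $Y$.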
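A minimal plain-Java sketch of the expected information $I(S, Y \mid X) = \sum_{i} p(S_i)\, H(S_i, Y)$, assuming $H$ is the Shannon entropy measured in bits and that an observation with weight 0 is outside the current node. The class ExpectedInformation and its parameters (block labels in x, nBlocks, nClasses) are hypothetical; this is not the library's implementation, which also takes classCounts and supports the Gini index.

```java
// Hypothetical sketch of expected information over a partition; not the JMSL code.
final class ExpectedInformation {

    // Shannon entropy, in bits, of nonnegative class weights.
    static double entropy(double[] classWeights) {
        double total = 0.0;
        for (double w : classWeights) {
            total += w;
        }
        if (total <= 0.0) {
            return 0.0;
        }
        double h = 0.0;
        for (double w : classWeights) {
            if (w > 0.0) {
                double p = w / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // x[i]: block of the partition containing observation i (0..nBlocks-1)
    // y[i]: class of observation i (0..nClasses-1)
    // weights[i]: weight of observation i in the current node (0 = not in the node)
    static double expectedInformation(int[] x, int[] y, double[] weights,
                                      int nBlocks, int nClasses) {
        double[][] counts = new double[nBlocks][nClasses];
        double[] blockWeight = new double[nBlocks];
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            counts[x[i]][y[i]] += weights[i];
            blockWeight[x[i]] += weights[i];
            total += weights[i];
        }
        double info = 0.0;
        for (int b = 0; b < nBlocks; b++) {
            if (blockWeight[b] > 0.0) {
                // p(S_b) * H(S_b, Y)
                info += (blockWeight[b] / total) * entropy(counts[b]);
            }
        }
        return info;
    }

    public static void main(String[] args) {
        int[] y = {0, 0, 1, 1};
        double[] w = {1, 1, 1, 1};
        // Constant x: one block, so the result is the entropy of y.
        System.out.println(expectedInformation(new int[] {0, 0, 0, 0}, y, w, 1, 2)); // 1.0
        // x separates the classes perfectly: no uncertainty remains.
        System.out.println(expectedInformation(new int[] {0, 0, 1, 1}, y, w, 2, 2)); // 0.0
    }
}
```

The first call reproduces the note above: with a constant $X$ the partition has a single block with $p(S_1) = 1$, so the result is just the entropy of $Y$.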
Parameters:
x - an int array of length xy.length containing values of a predictor or an indicator vector defining the partition of the observations.
y - an int array of length xy.length containing the values of the response variable.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
weights - a double array used to indicate which subset of the observations belongs in the current node.
xInfo - a boolean indicating that information about x is being computed using a simple frequency estimate. The value selects the estimation method:

Value | Method |
---|---|
true | simple frequency estimate |
false | prior probabilities |
Returns:
a double indicating the information uncertainty.

protected abstract int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition)
Specified by:
selectSplitVariable in class DecisionTree
Parameters:
xy - a double matrix containing the data.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
parentFreq - a double array used to indicate which subset of the observations belongs in the current node.
splitValue - a double array representing the resulting split point if the selected variable is quantitative.
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical.
Returns:
an int specifying the column index of the split variable in xy.

public void setGainCriteria(DecisionTreeInfoGain.GainCriteria gainCriteria)
Parameters:
gainCriteria - a DecisionTreeInfoGain.GainCriteria specifying which criterion to use in gain calculations in order to determine the best split at each node.
Default: gainCriteria = DecisionTreeInfoGain.GainCriteria.SHANNON_ENTROPY
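The two impurity measures named for information (Shannon entropy and Gini index) can be compared on the class counts of a single node. The sketch below uses the textbook definitions, entropy $-\sum_j p_j \log_2 p_j$ and Gini index $1 - \sum_j p_j^2$; the enum NodeImpurity.Criterion is a hypothetical stand-in and its constant names are not those of DecisionTreeInfoGain.GainCriteria. The base-2 logarithm is also an assumption.

```java
// Hypothetical comparison of the two impurity measures; not the JMSL API.
final class NodeImpurity {

    // Stand-in for the two criteria named in the documentation.
    enum Criterion { ENTROPY, GINI }

    // Impurity of a node with the given class counts under the chosen criterion.
    static double impurity(double[] classCounts, Criterion criterion) {
        double total = 0.0;
        for (double c : classCounts) {
            total += c;
        }
        if (total <= 0.0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double c : classCounts) {
            if (c > 0.0) {
                double p = c / total;
                sum += (criterion == Criterion.GINI)
                        ? p * p                              // accumulate sum of p_j^2
                        : -p * Math.log(p) / Math.log(2.0);  // accumulate -p_j log2 p_j
            }
        }
        // Gini index = 1 - sum p_j^2; Shannon entropy = -sum p_j log2 p_j.
        return (criterion == Criterion.GINI) ? 1.0 - sum : sum;
    }

    public static void main(String[] args) {
        double[] balanced = {5, 5};
        double[] skewed = {9, 1};
        for (Criterion c : Criterion.values()) {
            System.out.printf("%s: balanced=%.4f skewed=%.4f%n",
                    c, impurity(balanced, c), impurity(skewed, c));
        }
    }
}
```

Both measures are 0 for a pure node and largest when the classes are evenly mixed, so they usually rank splits similarly; the Gini index avoids the logarithm.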
public void setUseRatio(boolean ratio)
Parameters:
ratio - a boolean indicating whether the gain ratio is to be used: true results in the gain ratio being used, and false indicates the gain is to be used.
Default: useRatio = false
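To see why the gain ratio can be preferable, the sketch below scores candidate splits with the C4.5-style definitions (Quinlan): gain is the entropy of the parent minus the expected information of the split, and the gain ratio divides the gain by the split information, the entropy of the block sizes themselves. Whether DecisionTreeInfoGain computes the ratio exactly this way is an assumption; SplitScore and bestCandidate are hypothetical names, and bestCandidate only loosely mirrors what a selectSplitVariable implementation might do. In each table, counts[b][j] is the weight of class j in block b of a candidate split.

```java
// Hypothetical sketch of gain versus gain ratio (C4.5-style); not the JMSL code.
final class SplitScore {

    // Shannon entropy (bits) of nonnegative weights.
    static double entropy(double[] w) {
        double total = 0.0;
        for (double v : w) {
            total += v;
        }
        double h = 0.0;
        for (double v : w) {
            if (v > 0.0) {
                double p = v / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // Weight falling in each block of the split (row sums of the count table).
    static double[] blockTotals(double[][] counts) {
        double[] totals = new double[counts.length];
        for (int b = 0; b < counts.length; b++) {
            for (double c : counts[b]) {
                totals[b] += c;
            }
        }
        return totals;
    }

    // gain = H(Y) - sum_b p(S_b) H(S_b, Y)
    static double gain(double[][] counts) {
        double[] parent = new double[counts[0].length];
        for (double[] row : counts) {
            for (int j = 0; j < row.length; j++) {
                parent[j] += row[j];
            }
        }
        double[] totals = blockTotals(counts);
        double n = 0.0;
        for (double t : totals) {
            n += t;
        }
        double expected = 0.0;
        for (int b = 0; b < counts.length; b++) {
            if (totals[b] > 0.0) {
                expected += totals[b] / n * entropy(counts[b]);
            }
        }
        return entropy(parent) - expected;
    }

    // The gain, or the gain divided by the split information when useRatio is true.
    static double score(double[][] counts, boolean useRatio) {
        double g = gain(counts);
        if (!useRatio) {
            return g;
        }
        double splitInfo = entropy(blockTotals(counts));
        return splitInfo > 0.0 ? g / splitInfo : 0.0;
    }

    // Index of the highest-scoring candidate split (the first one wins ties).
    static int bestCandidate(double[][][] candidates, boolean useRatio) {
        int best = 0;
        double bestScore = score(candidates[0], useRatio);
        for (int k = 1; k < candidates.length; k++) {
            double s = score(candidates[k], useRatio);
            if (s > bestScore) {
                best = k;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] twoWay = {{4, 1}, {0, 3}};
        double[][] eightWay = {{1, 0}, {1, 0}, {1, 0}, {1, 0}, {0, 1}, {0, 1}, {0, 1}, {0, 1}};
        double[][][] candidates = {twoWay, eightWay};
        System.out.println(bestCandidate(candidates, false)); // 1: eightWay has the larger gain
        System.out.println(bestCandidate(candidates, true));  // 0: the ratio penalizes the 8-way split
    }
}
```

The eight-way split isolates every observation, so its gain is maximal (1 bit), but its split information is 3 bits and its ratio drops to about 0.33, below the two-way split's ratio of about 0.57. This is the usual motivation for preferring the ratio when predictors have many distinct values.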
public boolean useGainRatio()
Returns:
a boolean indicating whether the gain ratio is to be used: true results in the gain ratio being used, and false indicates the gain is to be used.
Copyright © 1970-2015 Rogue Wave Software
Built October 13 2015.