public class CHAID extends DecisionTree implements Serializable, Cloneable
Generates a decision tree using CHAID for categorical or discrete ordered
predictor variables. CHAID, due to Kass (1980), is an acronym for chi-square
automatic interaction detection. At each node, CHAID looks for the best
splitting variable using the following steps: given a predictor variable X,
perform a 2-way chi-squared test of association between each possible pair of
categories of X and the categories of Y.
The least significant result is noted and, if a threshold is met, the two
categories of X are merged. Treating this merged category as a single
category, repeat the series of tests and determine if there is further
merging possible. If a merged category consists of three or more of the
original categories of X, CHAID calls for a step to test
whether the merged categories should be split. This is done by forming all
binary partitions of the merged category and testing each one against
Y in a 2-way test of association. If the most significant result meets
a threshold, then the merged category is split accordingly. As long as the
threshold in this step is smaller than the threshold in the merge step, the
splitting step and the merge step will not cycle back and forth. Once each
predictor is processed in this manner, the predictor with the most
significant qualifying 2-way test with Y is selected as the splitting
variable, and its last state of merged categories defines the split at the
given node. If none of the tests qualify (by having an adjusted p-value
smaller than a threshold), then the node is not split. This growing procedure
continues until one or more stopping conditions are met.
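The merge step described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class and method names (ChaidMergeSketch, pairChiSquare, leastSignificantPair) are hypothetical, the input is assumed to be a table of counts with one row per category of X and one column per category of Y, and the p-value computation and threshold comparison are left out. The sketch ranks pairs by the Pearson chi-squared statistic, which matches ranking by p-value only when every pair has the same degrees of freedom.

```java
// Hypothetical illustration of the CHAID merge step; not part of the library API.
class ChaidMergeSketch {

    /** Pearson chi-squared statistic for the 2 x J table formed by two rows of counts. */
    static double pairChiSquare(double[] rowA, double[] rowB) {
        double totalA = 0.0, totalB = 0.0;
        for (int k = 0; k < rowA.length; k++) {
            totalA += rowA[k];
            totalB += rowB[k];
        }
        // An empty category carries no evidence of association.
        if (totalA == 0.0 || totalB == 0.0) {
            return 0.0;
        }
        double n = totalA + totalB;
        double stat = 0.0;
        for (int k = 0; k < rowA.length; k++) {
            double colTotal = rowA[k] + rowB[k];
            if (colTotal == 0.0) {
                continue; // response class absent from both categories
            }
            double expA = totalA * colTotal / n; // expected count under independence
            double expB = totalB * colTotal / n;
            stat += (rowA[k] - expA) * (rowA[k] - expA) / expA;
            stat += (rowB[k] - expB) * (rowB[k] - expB) / expB;
        }
        return stat;
    }

    /** Returns the indexes of the pair of X categories with the smallest statistic. */
    static int[] leastSignificantPair(double[][] counts) {
        int[] best = {-1, -1};
        double bestStat = Double.POSITIVE_INFINITY;
        for (int a = 0; a < counts.length; a++) {
            for (int b = a + 1; b < counts.length; b++) {
                double s = pairChiSquare(counts[a], counts[b]);
                if (s < bestStat) {
                    bestStat = s;
                    best[0] = a;
                    best[1] = b;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Rows: three categories of X; columns: two classes of Y (made-up counts).
        double[][] counts = {{10, 10}, {20, 20}, {30, 5}};
        int[] pair = leastSignificantPair(counts);
        System.out.println("merge candidates: " + pair[0] + " and " + pair[1]);
    }
}
```

In the full procedure the selected pair would be merged only if its p-value meets the merge threshold, and the search would then repeat with the merged category treated as a single category.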
Nested classes/interfaces inherited from class DecisionTree: DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class PredictiveModel: PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor and Description |
---|
CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) Constructs a CHAID object for a single response variable and multiple predictor variables. |
Modifier and Type | Method and Description |
---|---|
double | getMergeCategoriesSigLevel() Returns the significance level for merging categories. |
double | getSplitMergedCategoriesSigLevel() Returns the significance level for splitting previously merged categories. |
double | getSplitVariableSignificanceLevel() Returns the significance level for split variable selection. |
protected int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition) Selects the split variable for the current node using CHAID (chi-square automatic interaction detection). |
protected void | setConfiguration(PredictiveModel pm) Sets the configuration of PredictiveModel to that of the input model. |
void | setMergeCategoriesSignificanceLevel(double mergeAlpha) Sets the significance level for merging categories. |
void | setSplitMergedCategoriesSigLevel(double splitMergedAlpha) Sets the significance level for splitting previously merged categories. |
void | setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha) Sets the significance level for split variable selection. |
Methods inherited from class DecisionTree:
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfSets, isAutoPruningFlag, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode
Methods inherited from class PredictiveModel:
getClassCounts, getCostMatrix, getMaxNumberOfCategories, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isMustFitModelFlag, isUserFixedNClasses, setClassCounts, setCostMatrix, setFitModelFlag, setMaxNumberOfCategories, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setWeights
public CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a CHAID object for a single response variable and multiple predictor variables.
Parameters:
xy - a double matrix with one row per observation and one column per variable, where the variables are the predictor variables plus one response variable.
responseColumnIndex - an int specifying the column index of the response variable.
varType - a PredictiveModel.VariableType array containing the type of each variable.

public double getMergeCategoriesSigLevel()
Returns the significance level for merging categories.
Returns:
a double that specifies the significance level for merging categories.

public double getSplitMergedCategoriesSigLevel()
Returns the significance level for splitting previously merged categories.
Returns:
a double equal to the significance level for splitting merged categories.

public double getSplitVariableSignificanceLevel()
Returns the significance level for split variable selection.
Returns:
a double that specifies the significance level for split variable selection.

protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition)
Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
selectSplitVariable in class DecisionTree
Parameters:
xy - a double matrix containing the data.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
parentFreq - a double array used to determine which subset of the observations belong in the current node.
splitValue - a double array representing the resulting split point if the selected variable is quantitative.
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical.
Returns:
an int specifying the column index of the split variable in xy.

protected void setConfiguration(PredictiveModel pm) throws DecisionTree.PruningFailedToConvergeException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException
Sets the configuration of PredictiveModel to that of the input model.
setConfiguration in class DecisionTree
Parameters:
pm - a PredictiveModel object which is to have its attributes duplicated in this instance.
Throws:
DecisionTree.PruningFailedToConvergeException - pruning has failed to converge.
PredictiveModel.StateChangeException - an input parameter has changed that might affect the model estimates or predictions.
PredictiveModel.SumOfProbabilitiesNotOneException - the sum of the probabilities does not equal 1.

public void setMergeCategoriesSignificanceLevel(double mergeAlpha)
Sets the significance level for merging categories.
Parameters:
mergeAlpha - a double that specifies the significance level for merging categories. mergeAlpha must be between 0.0 and 1.0 and less than or equal to splitMergedAlpha (the level set by setSplitMergedCategoriesSigLevel), unless splitting of previously merged categories is disabled (the default).
Default: mergeAlpha = 0.05.
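
The constraints stated for this setter and for setSplitMergedCategoriesSigLevel can be summarized in a short check. This is only a sketch of the documented rules, not the library's validation code: the class AlphaCheckSketch and the method isValidAlphaPair are hypothetical, the bounds are assumed to be inclusive, and the upper bound of 1.0 on the split level is an assumption that the documentation does not state.

```java
// Hypothetical helper that mirrors the documented parameter rules; not part of the library API.
class AlphaCheckSketch {

    /** Documented value that disables splitting of previously merged categories. */
    static final double SPLIT_DISABLED = -1.0;

    /**
     * Returns true if the two significance levels satisfy the documented rules:
     * mergeAlpha lies in [0, 1] (inclusive bounds assumed), and the split level is
     * either disabled or at least as large as mergeAlpha.
     */
    static boolean isValidAlphaPair(double mergeAlpha, double splitMergedAlpha) {
        if (Double.isNaN(mergeAlpha) || mergeAlpha < 0.0 || mergeAlpha > 1.0) {
            return false;
        }
        if (splitMergedAlpha == SPLIT_DISABLED) {
            return true; // splitting of merged categories is turned off
        }
        // Assumption: the split level is also a probability, so it is capped at 1.0.
        return splitMergedAlpha >= mergeAlpha && splitMergedAlpha <= 1.0;
    }
}
```

With the documented defaults (mergeAlpha = 0.05 and a split level of -1.0) the check passes, because splitting of merged categories is disabled.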

public void setSplitMergedCategoriesSigLevel(double splitMergedAlpha)
Sets the significance level for splitting previously merged categories.
Parameters:
splitMergedAlpha - a double that specifies the significance level for splitting merged categories. splitMergedAlpha must be greater than or equal to getMergeCategoriesSigLevel() unless splitting is disabled by setting splitMergedAlpha = -1.
Default: splitMergedAlpha = -1.0, which disables splitting of merged categories.
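
When splitting of merged categories is enabled, the class description says that all binary partitions of a merged category are formed and tested against Y. The enumeration part of that step can be sketched as follows. The names BinaryPartitionSketch and binaryPartitions are hypothetical, the categories are assumed to be indexed 0 to k - 1, and k is assumed to be small (well under 31) so that a bitmask fits in an int. The chi-squared test applied to each partition is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of enumerating binary partitions; not part of the library API.
class BinaryPartitionSketch {

    /**
     * Enumerates the binary partitions of k categories. Each partition is a boolean
     * mask in which true places the category in the first group and false in the
     * second. Category 0 is always kept in the first group so that mirrored
     * duplicates are not produced, which yields 2^(k-1) - 1 partitions.
     */
    static List<boolean[]> binaryPartitions(int k) {
        List<boolean[]> result = new ArrayList<>();
        if (k < 2) {
            return result; // nothing to split
        }
        int limit = 1 << (k - 1);
        // The mask with all bits set would leave the second group empty, so stop before it.
        for (int mask = 0; mask < limit - 1; mask++) {
            boolean[] first = new boolean[k];
            first[0] = true;
            for (int c = 1; c < k; c++) {
                first[c] = ((mask >> (c - 1)) & 1) == 1;
            }
            result.add(first);
        }
        return result;
    }
}
```

A merged category built from three original categories gives three candidate partitions, and one built from four gives seven, so the number of tests grows quickly with the size of the merged category.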

public void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)
Sets the significance level for split variable selection.
Parameters:
splitVariableSelectionAlpha - a double that specifies the significance level for split variable selection. splitVariableSelectionAlpha must be between 0.0 and 1.0.
Default: splitVariableSelectionAlpha = 0.05.

Copyright © 1970-2015 Rogue Wave Software
Built October 13 2015.