Wednesday, March 21, 2012

Microsoft Decision Tree Algorithm

I have read several sources about the Microsoft Decision Tree algorithm, such as Claude Seidman's book, the paper on scalable classification over SQL databases, and the paper on learning Bayesian networks. But I still don't understand exactly how the Microsoft Decision Tree algorithm works when splitting an attribute. I have read that Microsoft Decision Trees uses a Bayesian score as its split criterion. Is that true?

Could anyone help me understand the Microsoft Decision Tree algorithm? Please give me a detailed explanation with some examples (cases).

Thanks for any help.

There are links to the appropriate research papers at http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/986.aspx

Thanks

-Jamie

Hi there, the first link fails. I am also interested in how Microsoft Decision Trees splits a continuous attribute for the derivation of downstream regression models: using information gain or a regression approach? Thanks.

I'd like to further clarify my previous question about regression tree building: which method is used for splitting a continuous variable, information gain or the regression approach? The first splitting method (information gain) first bins a continuous attribute, turning it into a categorical variable, and then selects the value that gives the highest information gain. With the second method, according to http://msdn2.microsoft.com/en-us/library/ms175312.aspx, the split is performed at the point of non-linearity (a statistical method?). I just wonder which method Microsoft Decision Trees uses for splitting in the case of a continuous variable.

If the target is a discrete attribute, it uses information gain. For continuous targets, it uses the regression approach.

Thank you for your answer, Jamie. But my question is how the continuous INPUT attribute is split (the target variable is automatically assumed to be continuous in the case of a regression tree).

I was describing how the INPUT is handled for various OUTPUTs. If you are only considering the case of regression trees, then it is always the regression approach.
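To make the information-gain side of this exchange concrete, here is a minimal Python sketch of the first approach described above: a continuous input is binned by trying candidate thresholds, and the threshold with the highest information gain against a discrete target is chosen. The candidate-threshold strategy and the function names are assumptions for illustration only, not the actual internals of Microsoft Decision Trees.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of discrete class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_split_by_information_gain(values, labels):
    # Try each midpoint between consecutive distinct input values as a
    # candidate threshold and return the one with the highest gain.
    parent = entropy(labels)
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(values))
    best_gain, best_threshold = 0.0, None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [label for v, label in pairs if v <= threshold]
        right = [label for v, label in pairs if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = parent - weighted
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain

# Hypothetical example: income (continuous input) vs. buyer (discrete target).
income = [20, 25, 30, 45, 50, 60, 80, 90]
buyer = ['no', 'no', 'no', 'yes', 'no', 'yes', 'yes', 'yes']
print(best_split_by_information_gain(income, buyer))

For a continuous target (the regression-tree case asked about here), the same kind of candidate search would instead be scored with a regression-style criterion rather than entropy, as in the answers further down.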

Hi, there,

when I use the Microsoft Decision Trees algorithm to build a regression tree and choose SCORE_METHOD = (1) Entropy:

does the algorithm select the categorical input attribute to split on by using the entropy function? Is the continuous input attribute binned into a categorical attribute for the entropy calculation?

Or does this SCORE_METHOD not apply to the regression tree?

Any help is very much appreciated,

Hongqin

Continuous variables are split by using an internal binning approach that searches the continuous space for the best split point. This is independent of the score method.

Thanks, Jamie.
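As a rough illustration of what "searching the continuous space for the best split point" can look like when the target is also continuous, here is a small Python sketch that scores candidate split points by the reduction in the target's variance. The exhaustive midpoint search and the variance criterion are assumptions chosen for clarity; the actual internal binning used by Microsoft Decision Trees is not documented at this level of detail.

def variance(xs):
    # Population variance; 0.0 when all values are identical.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def best_regression_split(values, targets):
    # Search candidate split points on a continuous input and return the
    # point that most reduces the weighted variance of a continuous target.
    pairs = sorted(zip(values, targets))
    parent = variance([t for _, t in pairs])
    distinct = sorted(set(values))
    best_reduction, best_point = 0.0, None
    for lo, hi in zip(distinct, distinct[1:]):
        point = (lo + hi) / 2.0
        left = [t for v, t in pairs if v <= point]
        right = [t for v, t in pairs if v > point]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        reduction = parent - weighted
        if reduction > best_reduction:
            best_reduction, best_point = reduction, point
    return best_point, best_reduction

# Hypothetical example: age (continuous input) vs. yearly spend (continuous target).
age = [18, 22, 25, 31, 38, 44, 52, 60]
spend = [120, 150, 160, 400, 430, 460, 900, 950]
print(best_regression_split(age, spend))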

Dear all,

I've tried changing SPLIT_METHOD to (2) Complete. It doesn't generate any leaves or subtrees; it shows only the top of the tree.

But when I changed it back to (3) Both, the result is OK, although it includes both binary splits and complete splits (the sketch below illustrates the difference between the two).

By the way, what can I do if I want it to generate only complete splits?

Thank you for your answer in advance.

Nop Vorrasanpisut.
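For anyone unsure what the two split forms mentioned above look like at the node level, here is a minimal Python sketch contrasting a complete split (one branch per distinct value of a categorical input) with a binary split (one value versus everything else). The data and grouping rules are hypothetical and simplified; they are not the algorithm's actual candidate generation. As far as the documented parameter values go, SPLIT_METHOD = 1 allows only binary splits, 2 only complete splits, and 3 (the default) lets the algorithm choose either form at each node.

from collections import defaultdict

def complete_split(rows, attribute):
    # Complete (multi-way) split: one child node per distinct attribute value.
    children = defaultdict(list)
    for row in rows:
        children[row[attribute]].append(row)
    return dict(children)

def binary_split(rows, attribute, value):
    # Binary split: one child for rows equal to `value`, one for all the rest.
    left = [r for r in rows if r[attribute] == value]
    right = [r for r in rows if r[attribute] != value]
    return {value: left, "not " + str(value): right}

# Hypothetical training rows with a categorical input and a discrete target.
rows = [
    {"education": "high school", "buyer": "no"},
    {"education": "college", "buyer": "yes"},
    {"education": "graduate", "buyer": "yes"},
    {"education": "college", "buyer": "no"},
]

# The complete split produces three branches, the binary split produces two.
print(list(complete_split(rows, "education").keys()))
print(list(binary_split(rows, "education", "college").keys()))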
