Data mining with decision trees : theory and applications /
Rokach, Lior.
Data mining with decision trees : theory and applications / by Lior Rokach, Oded Maimon - 2nd ed. - [Hackensack] New Jersey : World Scientific, c2015. - 305 p.: ill.; 24 cm.
computer bookfair2015
Includes bibliographical references and index.
About the Authors; Preface for the Second Edition; Preface for the First Edition; Contents; 1. Introduction to Decision Trees; 1.1 Data Science; 1.2 Data Mining; 1.3 The Four-Layer Model; 1.4 Knowledge Discovery in Databases (KDD); 1.5 Taxonomy of Data Mining Methods; 1.6 Supervised Methods; 1.6.1 Overview; 1.7 Classification Trees; 1.8 Characteristics of Classification Trees; 1.8.1 Tree Size; 1.8.2 The Hierarchical Nature of Decision Trees; 1.9 Relation to Rule Induction; 2. Training Decision Trees; 2.1 What is Learning?; 2.2 Preparing the Training Set; 2.3 Training the Decision Tree; 3. A Generic Algorithm for Top-Down Induction of Decision Trees; 3.1 Training Set; 3.2 Definition of the Classification Problem; 3.3 Induction Algorithms; 3.4 Probability Estimation in Decision Trees; 3.4.1 Laplace Correction; 3.4.2 No Match; 3.5 Algorithmic Framework for Decision Trees; 3.6 Stopping Criteria; 4. Evaluation of Classification Trees; 4.1 Overview; 4.2 Generalization Error; 4.2.1 Theoretical Estimation of Generalization Error; 4.2.2 Empirical Estimation of Generalization Error; 4.2.3 Alternatives to the Accuracy Measure; 4.2.4 The F-Measure; 4.2.5 Confusion Matrix; 4.2.6 Classifier Evaluation under Limited Resources; 4.2.6.1 ROC Curves; 4.2.6.2 Hit-Rate Curve; 4.2.6.3 Qrecall (Quota Recall); 4.2.6.4 Lift Curve; 4.2.6.5 Pearson Correlation Coefficient; 4.2.6.6 Area Under Curve (AUC); 4.2.6.7 Average Hit-Rate; 4.2.6.8 Average Qrecall; 4.2.6.9 Potential Extract Measure (PEM); 4.2.7 Which Decision Tree Classifier is Better?; 4.2.7.1 McNemar's Test; 4.2.7.2 A Test for the Difference of Two Proportions; 4.2.7.3 The Resampled Paired t Test; 4.2.7.4 The k-fold Cross-validated Paired t Test; 4.3 Computational Complexity; 4.4 Comprehensibility; 4.5 Scalability to Large Datasets; 4.6 Robustness; 4.7 Stability; 4.8 Interestingness Measures; 4.9 Overfitting and Underfitting; 4.10 "No Free Lunch" Theorem; 5. Splitting Criteria; 5.1 Univariate Splitting Criteria; 5.1.1 Overview; 5.1.2 Impurity-based Criteria; 5.1.3 Information Gain; 5.1.4 Gini Index; 5.1.5 Likelihood Ratio Chi-squared Statistics; 5.1.6 DKM Criterion; 5.1.7 Normalized Impurity-based Criteria; 5.1.8 Gain Ratio; 5.1.9 Distance Measure; 5.1.10 Binary Criteria; 5.1.11 Twoing Criterion; 5.1.12 Orthogonal Criterion; 5.1.13 Kolmogorov-Smirnov Criterion; 5.1.14 AUC Splitting Criteria; 5.1.15 Other Univariate Splitting Criteria; 5.1.16 Comparison of Univariate Splitting Criteria; 5.2 Handling Missing Values; 6. Pruning Trees; 6.1 Stopping Criteria; 6.2 Heuristic Pruning; 6.2.1 Overview; 6.2.2 Cost Complexity Pruning; 6.2.3 Reduced Error Pruning; 6.2.4 Minimum Error Pruning (MEP); 6.2.5 Pessimistic Pruning; 6.2.6 Error-Based Pruning (EBP); 6.2.7 Minimum Description Length (MDL) Pruning; 6.2.8 Other Pruning Methods; 6.2.9 Comparison of Pruning Methods; 6.3 Optimal Pruning; 7. Popular Decision Trees Induction Algorithms; 7.1 Overview; 7.2 ID3
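The contents above list several univariate splitting criteria, including information gain (5.1.3) and the Gini index (5.1.4). As a minimal illustrative sketch (not taken from the book), both impurity measures and the gain from a candidate split can be computed as:

```python
import math

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_c^2)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Shannon entropy in bits: -sum(p_c * log2(p_c))."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, splits):
    """Entropy reduction when `parent` is partitioned into the subsets `splits`."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

# A balanced binary node has Gini impurity 0.5 and entropy 1 bit;
# a perfectly pure split recovers the full 1 bit of information gain.
parent = ["yes", "yes", "no", "no"]
print(gini(parent))                                              # 0.5
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

A top-down induction algorithm of the kind covered in Chapter 3 would evaluate such a criterion for every candidate attribute and split on the one maximizing the gain (or minimizing the weighted impurity).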
Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve: existing methods are constantly being improved and new methods introduced. This second edition is dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique, as well as improved and new methods developed after the publication of the first edition. In this new ed.
9789814590075 (hb)
Data mining.
Decision trees.
Machine learning.
Decision support systems.
QA76.9.D343 / R654 2014
006.312 / R.L.D