Contents
Foreword
Preface
Symbols and Notations
Chapter 1 Introduction 1
References 13
Chapter 2 Linear Regression 15
2.1 Least Squares Linear Regression 15
2.2 Principal Component Analysis and Principal Component Regression 26
2.3 Least Absolute Shrinkage and Selection Operator (L1) 37
2.4 Ridge Regression (L2) 40
2.5 Elastic Net Regression 44
2.6 Multi-Task LASSO (MultiTaskLASSO) 49
Homework 52
References 53
Chapter 3 Linear Classification 55
3.1 Perceptron 57
3.2 Logistic Regression 60
3.3 Linear Discriminant Analysis 73
Homework 80
References 82
Chapter 4 Support Vector Machine 83
4.1 SVC 83
4.2 Kernel Functions 88
4.3 Soft Margin 96
4.4 SVR 102
Homework 108
References 110
Chapter 5 Decision Trees and K-Nearest-Neighbors (KNN) 112
5.1 Classification Trees 112
5.2 Regression Trees 121
5.3 K-Nearest-Neighbors (KNN) Methods 129
Homework 133
References 134
Chapter 6 Ensemble Learning 136
6.1 Boosting 137
6.1.1 AdaBoost 137
6.1.2 Gradient Boosting Machine (GBM) 145
6.1.3 eXtreme Gradient Boosting (XGBoost) 151
6.2 Bagging 153
Homework 158
References 159
Chapter 7 Bayes' Theorem and Expectation-Maximization (EM) Algorithm 160
7.1 Bayes' Theorem 160
7.2 Naive Bayes Classifier 161
7.3 Maximum Likelihood Estimation 168
7.3.1 Gaussian distribution 168
7.3.2 Weibull distribution 170
7.4 Bayesian Linear Regression 175
7.5 Expectation-Maximization (EM) Algorithm 184
7.5.1 Gaussian mixture model (GMM) 185
7.5.2 The mixture of Lorentz and Gaussian distributions 197
7.6 Gaussian Process (GP) Regression 209
Homework 219
References 219
Chapter 8 Symbolic Regression 221
8.1 Overview of Evolutionary Computation 221
8.2 Genetic Programming 223
8.3 Grammar-Guided Genetic Programming and Grammatical Evolution 225
8.4 The Application of LASSO in Symbolic Regression 234
Homework 235
References 235
Chapter 9 Neural Networks 238
9.1 Neural Networks and Perceptron 238
9.2 Back Propagation Algorithm 241
9.3 Regularization in NNs 250
9.3.1 L1 regularization 250
9.3.2 L2 regularization 257
9.4 Classification NNs 261
9.4.1 Binary classification 261
9.4.2 Multiclass classification of multiple grades in a category 267
9.5 Autoencoders 272
9.5.1 Introduction 272
9.5.2 Denoising autoencoder 273
9.5.3 Sparse autoencoder 280
9.5.4 Variational autoencoder 288
Homework 311
References 312
Chapter 10 Hidden Markov Chains 313
10.1 Markov Chain 313
10.2 Stationary Markov Chain 317
10.3 Markov Chain Monte Carlo Methods 318
10.3.1 Metropolis-Hastings (M-H) algorithm 320
10.3.2 Gibbs sampling algorithm 321
10.4 Calculation Methods for the Probability of Observation Sequence 325
10.4.1 Direct method 325
10.4.2 Forward method 328
10.4.3 Backward method 330
10.5 Estimation of Optimal State Sequence 332
10.5.1 Direct method 332
10.5.2 Viterbi algorithm 333
10.6 Estimation of Intrinsic Parameters—The Baum-Welch Algorithm 334
Homework 344
References 345
Chapter 11 Data Preprocessing and Feature Selection 347
11.1 Reliable Data, Normal Points and Anomalies 348
11.1.1 Local outlier factor 348
11.1.2 Isolation forest 352
11.1.3 One-class support vector machine 355
11.1.4 Support vector data description 361
11.2 Feature Selection 365
11.2.1 Filter approach 366
11.2.2 Wrapper approach 394
11.2.3 Embedded approach 402
Homework 408
References 408
Chapter 12 Interpretative SHAP Value and Partial Dependence Plot 410
12.1 SHapley Additive exPlanation value 410
12.2 The Joint SHAP Value of Two Features 426
12.3 Partial Dependence Plot 427
Homework 440
References 440
Appendix 1 Vector and Matrix 442
A1.1 Definition 442
A1.1.1 Vector 442
A1.1.2 Matrix 442
A1.2 Matrix Algebra 442
A1.2.1 Inverse and transpose 442
A1.2.2 Trace 443
A1.2.3 Determinant 443
A1.2.4 Eigenvalues and eigenvectors 444
A1.2.5 Singular value decomposition (SVD) 444
A1.2.6 Pseudoinverse 445
A1.2.7 Some useful identities 445
A1.3 Matrix Analysis 446
A1.3.1 Derivative of a matrix 446
A1.3.2 Derivative of the determinant of a matrix 446
A1.3.3 Derivative of an inverse matrix 447
A1.3.4 Jacobian matrix and Hessian matrix 447
A1.3.5 The chain rule 447
References 447
Appendix 2 Basic Statistics 448
A2.1 Probability 448
A2.1.1 Joint probability 448
A2.1.2 Bayes' theorem and conjugation 448
A2.1.3 Probability density of continuous variables 449
A2.1.4 Quantile function 449
A2.1.5 Expectation, variance and covariance of random variables 449
A2.2 Distributions 449
A2.2.1 Bernoulli distribution 450
A2.2.2 Binomial distribution 450
A2.2.3 Poisson distribution 450
A2.2.4 Gaussian distribution 450
A2.2.5 Weibull distribution 451
A2.2.6 The chi-square (χ²) distribution and χ²-test 451
A2.2.7 Th