Multi-Layer Perceptron (MLP) as Regressor:
Neural network parameters:
- Number of hidden layers and number of neurons per layer (hidden_layer_sizes)
- L2 regularization penalty (alpha)
- Initial learning rate (learning_rate_init), used by the 'sgd' and 'adam' solvers
- Activation function for the hidden layers ('identity', 'logistic', 'tanh', 'relu')
- Learning rate schedule ('constant', 'invscaling', 'adaptive'), used only by the 'sgd' solver
- Solver ('lbfgs', 'sgd', 'adam'): 'lbfgs' is an optimizer in the family of quasi-Newton methods, 'sgd' refers to stochastic gradient descent, and 'adam' refers to a stochastic gradient-based optimizer proposed by Kingma and Ba (2014)
- Momentum of the gradient descent update (momentum), used only by the 'sgd' solver

Note: there are many more parameters; these are considered the most important.
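The parameters listed above map onto keyword arguments of scikit-learn's MLPRegressor. The following is a minimal sketch, not the project's actual configuration: the synthetic data set and every chosen value (layer sizes, alpha, learning rate, max_iter) are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (assumption: stands in for the project's real data set)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) - 0.5 * X[:, 2]

model = MLPRegressor(
    hidden_layer_sizes=(50, 25),  # two hidden layers with 50 and 25 neurons
    alpha=1e-4,                   # L2 penalty
    learning_rate_init=1e-3,      # initial learning rate ('sgd' / 'adam' only)
    activation="relu",            # 'identity', 'logistic', 'tanh' or 'relu'
    learning_rate="constant",     # schedule, only used when solver='sgd'
    solver="adam",                # 'lbfgs', 'sgd' or 'adam'
    momentum=0.9,                 # only used when solver='sgd', ignored here
    max_iter=2000,
    random_state=0,
)
model.fit(X, y)

y_pred = model.predict(X)
print(f"R^2 on the training data: {model.score(X, y):.3f}")
```

Because the solver here is 'adam', the learning_rate schedule and momentum values are accepted but have no effect; switching to solver="sgd" makes them active.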
More information:
The optimization methods tested to search the hyperparameter space are:
- Exhaustive Grid Search
- Randomized Parameter Optimization
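Both search strategies are available in scikit-learn as GridSearchCV and RandomizedSearchCV. The sketch below contrasts them on synthetic data; the search spaces, the 3-fold cross-validation and the R^2 scoring are assumptions chosen for illustration, not the settings used in this project.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (assumption)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(150, 3))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) - 0.5 * X[:, 2]

base = MLPRegressor(max_iter=500, random_state=0)

# Exhaustive Grid Search: every combination is evaluated (2 * 2 * 2 = 8 candidates)
param_grid = {
    "hidden_layer_sizes": [(20,), (20, 10)],
    "alpha": [1e-4, 1e-2],
    "activation": ["relu", "tanh"],
}
grid = GridSearchCV(base, param_grid, cv=3, scoring="r2")
grid.fit(X, y)

# Randomized Parameter Optimization: n_iter candidates are sampled;
# lists are sampled uniformly, continuous values from log-uniform distributions
param_dist = {
    "hidden_layer_sizes": [(20,), (50,), (20, 10)],
    "alpha": loguniform(1e-5, 1e-1),
    "learning_rate_init": loguniform(1e-4, 1e-2),
    "activation": ["relu", "tanh"],
}
rand = RandomizedSearchCV(base, param_dist, n_iter=8, cv=3, scoring="r2", random_state=0)
rand.fit(X, y)

print("Grid search best:", grid.best_params_, round(grid.best_score_, 3))
print("Random search best:", rand.best_params_, round(rand.best_score_, 3))
```

The grid cost grows multiplicatively with every added parameter value, while the randomized search cost is fixed by n_iter, which is why it can also explore continuous ranges such as alpha.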
More information:
Data splitting scheme:
The data set was divided into three subsets: calibration and validation data (together used for training) and data for an a posteriori test. The data division is shown in the accompanying figure.
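A three-way split of this kind can be produced with two successive calls to scikit-learn's train_test_split. This is a sketch only: the synthetic data and the 60 % / 20 % / 20 % calibration/validation/test proportions are assumptions, since the actual proportions are given in the figure.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 1000 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=1000)

# Hold out the a posteriori test set first (assumed 20 % of the data)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Split the remaining training data into calibration and validation (75 % / 25 %)
X_cal, X_val, y_cal, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

print(len(X_cal), len(X_val), len(X_test))  # 600 200 200
```

Holding out the test set first guarantees that it is never seen during calibration or hyperparameter selection.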
python - Scikit-learn
python - Pandas
python - NumPy
python - Matplotlib
python - Statsmodels