Varying regularization in Multi-layer Perceptron




A comparison of different values of the regularization parameter 'alpha' on synthetic datasets. The plot shows that different alphas yield different decision functions.

Alpha is the parameter of the regularization term, also known as the penalty term, which combats overfitting by constraining the size of the weights. Increasing alpha may fix high variance (a sign of overfitting) by encouraging smaller weights, resulting in a decision boundary with less curvature. Similarly, decreasing alpha may fix high bias (a sign of underfitting) by encouraging larger weights, potentially resulting in a more complicated decision boundary.
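As a minimal sketch of this effect, the snippet below fits scikit-learn's MLPClassifier on a synthetic two-moons dataset for several alpha values and prints the L2 norm of the learned weights alongside the training accuracy. The dataset, the hidden layer size, the solver, and the helper name weight_norm are illustrative assumptions, not the settings used for the plot described above.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Synthetic, noisy two-class data (assumed here for illustration).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)


def weight_norm(model):
    """Return the L2 norm of all connection weights (biases excluded)."""
    return float(np.sqrt(sum(np.sum(w ** 2) for w in model.coefs_)))


norms = {}
for alpha in (1e-5, 1e-1, 10.0):
    clf = MLPClassifier(
        hidden_layer_sizes=(50,),
        alpha=alpha,          # strength of the L2 penalty term
        solver="lbfgs",
        max_iter=1000,
        random_state=1,
    )
    clf.fit(X, y)
    norms[alpha] = weight_norm(clf)
    print(f"alpha={alpha:g}  weight norm={norms[alpha]:.2f}  "
          f"train accuracy={clf.score(X, y):.2f}")
```

With a setup like this, the largest alpha would be expected to produce the smallest weight norm, which corresponds to the smoother decision boundary mentioned above. The exact numbers depend on the data and the random seed.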

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation.[1][2] Its multiple layers and nonlinear activation distinguish an MLP from a linear perceptron: it can distinguish data that is not linearly separable. Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially when they have a single hidden layer.

Activation function

If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then linear algebra shows that any number of layers can be reduced to a two-layer input-output model. In MLPs some neurons use a nonlinear activation function that was developed to model the frequency of action potentials, or firing, of biological neurons.
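The collapse of purely linear layers can be checked numerically. The sketch below uses NumPy with randomly drawn weights; the layer sizes and variable names are hypothetical and chosen only for illustration. It compares a three-layer forward pass with identity activations against a single equivalent weight matrix and bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs.
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
W3, b3 = rng.normal(size=(2, 3)), rng.normal(size=2)

x = rng.normal(size=4)

# Forward pass through three layers with identity (linear) activations.
deep_output = W3 @ (W2 @ (W1 @ x + b1) + b2) + b3

# The same mapping collapsed into a single weight matrix and bias.
W = W3 @ W2 @ W1
b = W3 @ (W2 @ b1 + b2) + b3
collapsed_output = W @ x + b

print(np.allclose(deep_output, collapsed_output))
```

Inserting a nonlinear function such as tanh between the layers breaks this equivalence, which is why the nonlinearity is what gives additional layers their expressive power.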


The MLP consists of three or more layers (an input and an output layer with one or more hidden layers) of nonlinearly-activating nodes, making it a deep neural network. Since MLPs are fully connected, each node in one layer connects with a certain weight to every node in the following layer.
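Full connectivity means that each layer can be written as a weight matrix with one entry per pair of nodes in adjacent layers. The following sketch of a forward pass assumes tanh hidden units and a linear output layer; the function name forward and the layer sizes are hypothetical.

```python
import numpy as np


def forward(x, weights, biases):
    """Propagate x through fully connected layers with tanh hidden units."""
    activation = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ activation + b          # every node sees every previous node
        is_output = i == len(weights) - 1
        activation = z if is_output else np.tanh(z)
    return activation


rng = np.random.default_rng(0)
layer_sizes = [3, 4, 2]  # hypothetical: 3 inputs, 4 hidden nodes, 2 outputs
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

output = forward(np.array([0.5, -1.0, 2.0]), weights, biases)
print(output.shape)
```

Here the first weight matrix has shape (4, 3), that is, one weight for each connection between the 3 input nodes and the 4 hidden nodes.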


Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. This is an example of supervised learning, and is carried out through backpropagation, a generalization of the least mean squares algorithm in the linear perceptron.
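The least mean squares rule that backpropagation generalizes can be sketched for a single linear unit. The toy data, the learning rate, and the number of passes below are assumptions made for illustration. The weights are adjusted after each data point in proportion to the error between the expected result and the actual output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a known linear rule (an assumption for this sketch).
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
targets = X @ true_w

w = np.zeros(2)
learning_rate = 0.05  # hypothetical step size

for epoch in range(20):
    for x, target in zip(X, targets):
        error = target - w @ x           # expected result minus actual output
        w += learning_rate * error * x   # update after each piece of data

print(np.round(w, 3))
```

In a multilayer network the same idea applies, except that the error signal for hidden nodes has to be propagated backwards through the layers using the chain rule.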

obtained from-



Software Used



Scikit-Learn is a set of simple and efficient tools for data mining and data analysis. It is accessible to everybody and reusable in various contexts. Scikit-Learn is built on NumPy, SciPy, and matplotlib.

Typical applications of Scikit-Learn include:

For more details, see Scikit-Learn.