Thursday, 4 September 2008

The Single Layer Perceptron

This is the first in a series of posts in which I will write about the evolution of neural networks. I will start with the simplest network of all, the Single Layer Perceptron, and work through different architectures and learning methods, eventually arriving at networks that can predict stock market price movements, perform face recognition, handle natural language processing and much more.

I will provide full code samples, written in C#, which can be compiled and run by downloading Visual C# 2008 Express Edition from Microsoft at http://www.microsoft.com/express/vcsharp/

The Perceptron is the granddaddy of all neural networks. It was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. The Perceptron is the simplest type of feedforward neural network, known as a linear classifier, which means that the problems it can solve must be linearly separable. The graphs below make it easy to visualise what this means in two dimensions.

[Graph 1: linearly separable data]

[Graph 2: linearly non-separable data]


The green line represents the separation between the two classes of data the network is attempting to classify. In three dimensions this would be represented by a plane, and in four dimensions or more by a hyperplane. There are other types of network that can solve linearly non-separable problems, which I will cover in later posts.
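
In code, the classification boils down to a weighted sum followed by a threshold. The sketch below is illustrative only (the full program later in this post uses an equivalent method called Output, and the name Classify here is just for this example). The green line is the set of points where the sum is exactly zero, and because this example uses no bias term, that line passes through the origin.

// Illustrative sketch of the two-input decision rule.
// The class boundary is the set of (x, y) where sum == 0.
static int Classify(double[] weights, double x, double y)
{
    double sum = x * weights[0] + y * weights[1];
    return (sum >= 0) ? 1 : -1;
}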

To begin to solve this problem we need a network as represented below.

[Diagram: single layer perceptron]


As you can see, each input node is connected directly to the output node. By adjusting the values of the connecting weights, the network is able to learn.

In the demonstration below, each of the points plotted in the first graph above will be used as training data. The training data will be repeatedly applied to the input nodes of the network and the weights adjusted until the network performs as desired.
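
The weight adjustment itself is the classic perceptron learning rule: each weight is moved in proportion to the error and the input that produced it, so the weights only change when a pattern is misclassified. Distilled from the full listing below, the update for a single training pattern is simply:

// Perceptron learning rule for one training pattern.
// localError = desired output - actual output (0 when the pattern is already correct).
for (int i = 0; i < 2; i++)
{
    weights[i] += learningRate * localError * inputs[p, i];
}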

Once the network has been trained, the application steps through the full range of input values to show that the network has learnt correctly and generalises well. The following code shows how this is done.

using System;

namespace Perceptron
{
    public class Perceptron
    {
        [STAThread]
        static void Main()
        {
            // Load sample input patterns.
            double[,] inputs = new double[,] {
                { 0.72, 0.82 }, { 0.91, -0.69 }, { 0.46, 0.80 },
                { 0.03, 0.93 }, { 0.12, 0.25 }, { 0.96, 0.47 },
                { 0.79, -0.75 }, { 0.46, 0.98 }, { 0.66, 0.24 },
                { 0.72, -0.15 }, { 0.35, 0.01 }, { -0.16, 0.84 },
                { -0.04, 0.68 }, { -0.11, 0.10 }, { 0.31, -0.96 },
                { 0.00, -0.26 }, { -0.43, -0.65 }, { 0.57, -0.97 },
                { -0.47, -0.03 }, { -0.72, -0.64 }, { -0.57, 0.15 },
                { -0.25, -0.43 }, { 0.47, -0.88 }, { -0.12, -0.90 },
                { -0.58, 0.62 }, { -0.48, 0.05 }, { -0.79, -0.92 },
                { -0.42, -0.09 }, { -0.76, 0.65 }, { -0.77, -0.76 } };

            // Load sample output patterns.
            int[] outputs = new int[] {
                -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
            int patternCount = inputs.GetUpperBound(0) + 1;

            // Randomise weights.
            Random r = new Random();
            double[] weights = { r.NextDouble(), r.NextDouble() };

            // Set learning rate.
            double learningRate = 0.1;

            int iteration = 0;
            double globalError;

            do
            {
                globalError = 0;
                for (int p = 0; p < patternCount; p++)
                {
                    // Calculate output.
                    int output = Output(weights, inputs[p, 0], inputs[p, 1]);

                    // Calculate error.
                    double localError = outputs[p] - output;

                    if (localError != 0)
                    {
                        // Update weights.
                        for (int i = 0; i < 2; i++)
                        {
                            weights[i] += learningRate * localError * inputs[p, i];
                        }
                    }

                    // Convert error to absolute value.
                    globalError += Math.Abs(localError);
                }

                Console.WriteLine("Iteration {0}\tError {1}", iteration, globalError);
                iteration++;

            } while (globalError != 0);

            // Display network generalisation.
            Console.WriteLine();
            Console.WriteLine("X, Y, Output");
            for (double x = -1; x <= 1; x += .5)
            {
                for (double y = -1; y <= 1; y += .5)
                {
                    // Calculate output.
                    int output = Output(weights, x, y);
                    Console.WriteLine("{0}, {1}, {2}", x, y, (output == 1) ? "Blue" : "Red");
                }
            }
            Console.ReadKey();
        }

        private static int Output(double[] weights, double x, double y)
        {
            double sum = x * weights[0] + y * weights[1];
            return (sum >= 0) ? 1 : -1;
        }
    }
}


The output from the network after training looks like this, showing that generalisation has been achieved. Each pair of x and y coordinates presented to the network returns the expected classification, i.e. blue or red.

X, Y, Output
-1.0, -1.0, Blue
-1.0, -0.5, Blue
-1.0, 0.0, Blue
-1.0, 0.5, Blue
-1.0, 1.0, Blue
-0.5, -1.0, Blue
-0.5, -0.5, Blue
-0.5, 0.0, Blue
-0.5, 0.5, Blue
-0.5, 1.0, Red
0.0, -1.0, Blue
0.0, -0.5, Blue
0.0, 0.0, Blue
0.0, 0.5, Red
0.0, 1.0, Red
0.5, -1.0, Blue
0.5, -0.5, Red
0.5, 0.0, Red
0.5, 0.5, Red
0.5, 1.0, Red
1.0, -1.0, Red
1.0, -0.5, Red
1.0, 0.0, Red
1.0, 0.5, Red
1.0, 1.0, Red

Whilst this may seem like a trivial problem to solve in two dimensions, the Perceptron can perform very powerful analysis in higher dimensions. The same general principles shown here will apply.
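
As an example, the two-input Output method from the listing could be generalised to any number of inputs along these lines (a sketch only, not part of the program above); the decision boundary then becomes a plane in three dimensions or a hyperplane beyond that.

// Sketch: an n-dimensional version of the Output method.
// The weighted sum runs over however many inputs the problem has.
private static int Output(double[] weights, double[] inputs)
{
    double sum = 0;
    for (int i = 0; i < weights.Length; i++)
    {
        sum += weights[i] * inputs[i];
    }
    return (sum >= 0) ? 1 : -1;
}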

In my next post, I will start to write about the multi-layer perceptron, which can be used to solve linearly non-separable problems, like the one presented in the second graph above.

I welcome feedback on this post, and would be interested in receiving suggestions for future posts.