Thursday, 4 September 2008

The Single Layer Perceptron

This is the first in a series of posts in which I will write about the evolution of neural networks. I will start with the most simple network of all, the Single Layer Perceptron, and work through different architectures and learning methods which will eventually lead to networks that can predict stock market price movements, perform face recognition, handle natural language processing and much more.

I will provide full code samples, written in C#, which can be compiled and run by downloading Visual C# 2008 Express Edition from Microsoft at http://www.microsoft.com/express/vcsharp/

The Perceptron is the grand daddy of all neural networks. It was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. The Perceptron is the simplest type of feed forward neural network, known as a linear classifier. This means that the type of problems the network can solve must be linearly separable. In the graphs shown below it is easy to visualise what this means in two dimensions.

linearly separablelinearly non-separable

This is linearly separable.

This is linearly non-separable.


The green line represents the separation between the two classes of data the network is attempting to classify. In three dimensions this would be represented by a plane, and in four dimensions or more by a hyperplane. There are other types of network that can solve linearly non-separable problems, which I will cover in later posts.

To begin to solve this problem we need a network as represented below.

single layer perceptron


As you can see, each input node is connected directly to the output node. By adjusting the value of the connecting weights, the network is able to learn.

In the demonstration below each of the points plotted in the first graph above will be used as training data. The training data will be repeatedly applied to the input nodes of the network and the weights adjusted until the network is performs as desired.

Once the network has been trained the application will step through the range of input data to prove that the network has learnt correctly and is able to perform good generalisation. The following code shows how this is done.

using System;

 

namespace Perceptron

{

    public class Perceptron

    {

        [STAThread]

        static void Main()

        {

            // Load sample input patterns.

            double[,] inputs = new double[,] {

                { 0.72, 0.82 }, { 0.91, -0.69 }, { 0.46, 0.80 },

                { 0.03, 0.93 }, { 0.12, 0.25 }, { 0.96, 0.47 },

                { 0.79, -0.75 }, { 0.46, 0.98 }, { 0.66, 0.24 },

                { 0.72, -0.15 }, { 0.35, 0.01 }, { -0.16, 0.84 },

                { -0.04, 0.68 }, { -0.11, 0.10 }, { 0.31, -0.96 },

                { 0.00, -0.26 }, { -0.43, -0.65 }, { 0.57, -0.97 },

                { -0.47, -0.03 }, { -0.72, -0.64 }, { -0.57, 0.15 },

                { -0.25, -0.43 }, { 0.47, -0.88 }, { -0.12, -0.90 },

                { -0.58, 0.62 }, { -0.48, 0.05 }, { -0.79, -0.92 },

                { -0.42, -0.09 }, { -0.76, 0.65 }, { -0.77, -0.76 } };

 

            // Load sample output patterns.

            int[] outputs = new int[] {

                -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,

                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };

            int patternCount = inputs.GetUpperBound(0) + 1;

 

            // Randomise weights.

            Random r = new Random();

            double[] weights = { r.NextDouble(), r.NextDouble() };

 

            // Set learning rate.

            double learningRate = 0.1;

 

            int iteration = 0;

            double globalError;

 

            do

            {

                globalError = 0;

                for (int p = 0; p < patternCount; p++)

                {

                    // Calculate output.

                    int output = Output(weights, inputs[p, 0], inputs[p, 1]);

 

                    // Calculate error.

                    double localError = outputs[p] - output;

 

                    if (localError != 0)

                    {

                        // Update weights.

                        for (int i = 0; i < 2; i++)

                        {

                            weights[i] += learningRate * localError * inputs[p, i];

                        }

                    }

 

                    // Convert error to absolute value.

                    globalError += Math.Abs(localError);

                }

 

                Console.WriteLine("Iteration {0}\tError {1}", iteration, globalError);

                iteration++;

 

            } while (globalError != 0);

 

            // Display network generalisation.

            Console.WriteLine();

            Console.WriteLine("X, Y, Output");

            for (double x = -1; x <= 1; x += .5)

            {

                for (double y = -1; y <= 1; y += .5)

                {

                    // Calculate output.

                    int output = Output(weights, x, y);

                    Console.WriteLine("{0}, {1}, {2}", x, y, (output == 1) ? "Blue" : "Red");

                }

            }

            Console.ReadKey();

        }

 

        private static int Output(double[] weights, double x, double y)

        {

            double sum = x * weights[0] + y * weights[1];

            return (sum >= 0) ? 1 : -1;

        }

    }

}


The output from the network after training looks like this, which proves generalisation has been correctly achieved. Each pair of x and y coordinates when presented to the network returns the classification expected, i.e. blue or red.

X, Y, Output
-1.0, -1.0, Blue
-1.0, -0.5, Blue
-1.0, 0.0, Blue
-1.0, 0.5, Blue
-1.0, 1.0, Blue
-0.5, -1.0, Blue
-0.5, -0.5, Blue
-0.5, 0.0, Blue
-0.5, 0.5, Blue
-0.5, 1.0, Red
0.0, -1.0, Blue
0.0, -0.5, Blue
0.0, 0.0, Blue
0.0, 0.5, Red
0.0, 1.0, Red
0.5, -1.0, Blue
0.5, -0.5, Red
0.5, 0.0, Red
0.5, 0.5, Red
0.5, 1.0, Red
1.0, -1.0, Red
1.0, -0.5, Red
1.0, 0.0, Red
1.0, 0.5, Red
1.0, 1.0, Red

Whilst this may seem like a trivial problem to solve in two dimensions, the Perceptron can perform very powerful analysis in higher dimensions. The same general principals shown here will apply.

In my next post, I will start to write about the multi-layer perceptron, which can be used to solve linearly non-separable problems, like the one presented in the second graph above.

I welcome feedback on this post, and would be interested in receiving suggestions for future posts.

33 comments:

Arvind said...

Nice informative article. But, it requires previous knowledge of neural networks and learning algorithms and is not for novices. Thanks, for the code provided and waiting for the next post in series. But, I have a question that why single layer perceptron works only on linear saperable problems and finds solution on boundaries, although MLP (Multi-Layer Perceptron) can be used to solve insaperable classes?

John Wakefield said...

Hi arvind,

That's a very good question, and one that I will be covering in detail in my next post. Briefly, any trained neuron can only draw a single straight line on a graph, or a hyperplane in higher dimensions. These lines are used to partition input space. As you can see from my linearly non-separable example, you would need two straight lines. Therefore, we are going to need another layer containing two neurons. These will be used to map input space into feature space, shifting the locations of the points. The resulting feature space will contain two clusters of points which can easily be separated with a single straight line, and hence by the output neuron.

Many thanks for your question and I hope this answer helps. Please look out for my next post, all will be revealed...

John

Arvind said...

Hi John,
Thanks for the reply and clearing my doubt. Still, I have a question :D.

The Network model, you have implemented takes output activation function as Threshold function at 1: {(output == 1) ? "Blue" : "Red"}.

Now, you said :
Any trained neuron can only draw a single straight line on a graph, or a hyperplane in higher dimensions.

My question is that, by changing the activation function can we make a neuron partition the input space in more than 2 parts?? i.e. is there any possibility that the same model can be used to implement a simple non-saperable class like XOR gate by changing the output activation from current threshold to another, so that it partitions the space in more than 2 parts??

John Wakefield said...

Hi arvind,

I'm not sure I fully understand your question. Do you mean changing the activation function to have more than one threshold, i.e. 0.00 to 0.33 = red, 0.34 to 0.66 = green, 0.67 to 1.00 = blue?

John

Arvind said...

Hi John,

Actually I was thinking that the same perceptron model could be used for implementing a non-saperable class by just changing it's activation function. So that, instead of drawing a single straight line on a graph, the trained neuron creates a set of lines on it. I was just trying to confirm it from you, that is this possible ??

John Wakefield said...

Hi arvind,

I like your thinking. In most neural networks, neurons are trained to represent a linear equation. In our case, this is o = (x * xo) + (y * yo). Even though something like a sigmoid function could be used to flatten the activation curve, this will only ever be able to draw a single straight line through input space.

People have experimented with polynomial activation functions, which are able to draw a curved line through input space. However, these are more difficult to train, and are beyond the scope of the training algorithm found in a single layer perceptron. Polynomial activation functions are frequently found in Support Vector Machines, which I shall be covering in a later post.

John

Arvind said...

Hi John,
Thanks, for the informative reply. Indeed training such neurons would be a difficult task. Now, waiting for ur next post.. Keep writing .........:D

Steven Tu said...

Hi, John Wakefield. You're the first person to comment on my blog:http://popularity-peculiarity.blogspot.com/. Thank you!

Lebon Bon Lebon said...

Thanks, glad you liked my blog.

I submitted this awesome article on digg, check:

http://digg.com/programming/Training_Neural_Networks_Using_Back_Propagation_in_C

Actually I spend like 3 hours a day surfing, I'm trying to cut it down though :).

Keep in touch and keep your content with this great level.

Johnny Idol said...

Hi John - your blog is a great resource and you can be damn sure I'll be following.

I have a question: You say in this article the Perceptron is the simplest neural network, but isn't the ADALINE the simplest and earliest neural network ever developed on McCullough-Pitt?

You might wanna check-out my AI blog (even if there's not much in it yet!): http://roadtriptostrongai.blogspot.com/

illuminus said...

Great article, I look forward to reading more posts!

John Wakefield said...

Thanks Johnny Idol and Illuminus for your comments. Glad you like the blog.

Johnny Idol, you are right, the ADALINE did come before the Perceptron. I should correct my blog and give it the recognition it deserves. p.s. like your blog too - you should write some more...

Johnny Idol said...

Hi John - I was doing some reading and realized perceptron was invented in 1957 while ADALINE is dated 1960, so my previous comment is wrong.

Looks like you got it right!

T said...

This is an awesome blog. I'm applying to do Computer science at university and this is an area I find very interesting. I have already tried to make my own neural net (didn't work), but now armed with the knowledge you have provided, I am going to give it another shot. Thanks!

T said...

Hurray. I did it! I made a single layered perceptron. I couldn't have done it without your walkthrough/tutorial

- My writeup: http://timsprojects.com/perceptron.doc

- My perceptron: http://timsprojects.com/neural.exe

John Wakefield said...

Hi Tim,

Glad you've found it useful. I've read your Word doc - looks good!

Hope it goes well at university.

John

Jean Azzopardi said...

Many thanks for an informative blog.

However, at University, our lecturer is telling us to avoid going down an OO-oriented route, such as modelling each neuron in an object of its own, and to do them with matrix multiplication...

John Wakefield said...

Hi Jean,

Why is your lecturer advising you to do that? Is it for speed? I work in a commercial environment, and wouldn't dream of doing anything in a non-OO way these days. It just makes for much more understandable code...

John

Jean Azzopardi said...

Yes, the main idea is to speed things up. It does, however make understanding a little more difficult, though in the end, I'd like to see just how fast the two end up becoming. What we're doing, btw, is to generate 8x8 bitmaps of digits and recognise them, kinda like OCR.

pinchitter said...

Dear John

I tried your code (converted in Java) It is working find, but in training it is taking lot of time. i.e error becoming zero.
So I tried with Bias Input then it learn very fast. and getting the same result as you are getting. In your code there is no provision for Bias input .. can you tell me why??

John Wakefield said...

Hi Pinchitter,

The reason I didn't include a bias is because this was my first article on this blog, and I was trying to keep the network as simple as possible.

As you progress though my blog, you will see that I intoduce a bias in the post about Hidden Neurons and Feature Space, and explain why it is necessary.

Enjoy your Java porting!

Cheers
John

pinchitter said...

Thanks John for your quick reply.. one more question. how much time your code is taking learn without bias..?? (I don't have C#)

You are doing a wonderful job by sharing your ANN experience with new programmer like me.

keep the good work going...

John Wakefield said...

Hi Pinchitter,

For the simple example shown in the code above, this normally completes in less than 5 iterations, or less than a second.

John

Beverly Matiman said...

Hi.. my thesis is all about comparing single layer Perceptron and Adaline (Adaptive linear element)in ANN im using JAVA and i have ahrd time doing my work. I hope you can post a code using JAVA, these could help me a lot. oh,the problem im rying to solve is that the Perceptron and Adaline should learn to respond with true or false. looking forward for your response.

Beverly said...

Hi.. my thesis is all about comparing single layer Perceptron and Adaline (Adaptive linear element)in ANN im using JAVA and i have ahrd time doing my work. I hope you can post a code using JAVA, these could help me a lot. oh,the problem im rying to solve is that the Perceptron and Adaline should learn to respond with true or false. looking forward for your response.

pinchitter said...

Hi Beverly,

I send you a mail with attachment(Perceptron code in java)

Dean Miller said...

I find your articles very useful. I am a Software Engineer student and for a project I am building a simple character recognition application in C#. I always wondered why the weights do not get overriden during training. For example, if the net's weights are modified to learn input pattern 0 will the modifications made not be undone when the net attempts to learn input pattern 1?

Thank you

Dean Miller

Anonymous said...

hy, i'm looking for a mathlab or C code; i have a project for school about MLP for clasification of gray scale image; can anyone help me?

cristi
contact me at cyx.content@gmail.com

Beverly said...

hi
I hope you can also give me adaline java codes. Looking forward for your email. THNX

Leslie said...

Anyway you could post the java code?

mustafa said...

Hello every one, I'm new in this field, neural network, I'm not able to understand how the data from he table can be an input to the neuron.

If you have any information or link to explain every thing from the beginning, please provide me.
Thank you in advance.

mustafa said...

Hello every one, I'm new in this field, neural network, I'm not able to understand how the data from he table can be an input to the neuron.

If you have any information or link to explain every thing from the beginning, please provide me.
Thank you in advance.

pataswing said...

Very nice tutorial, straightforward.

Thanks.