Sunday, 21 September 2008

Training Neural Networks Using Back Propagation in C#

In my previous post, I showed how multi layer Perceptrons could be used to solve linearly non-separable problems. In that example, I calculated all the network weights by hand. In this post, I shall introduce the back propagation training algorithm, which will do all the hard work for us.

The back propagation algorithm has become the de facto training algorithm for artificial neural networks, and has been studied by the artificial intelligence community since the 1970s. It is used in commercial neural network software packages, such as MATLAB's Neural Network Toolbox.

The principle of back propagation is actually quite easy to understand, even though the maths behind it can look rather daunting. The basic steps are:
  1. Initialise the network with small random weights.
  2. Present an input pattern to the input layer of the network.
  3. Feed the input pattern forward through the network to calculate its activation value.
  4. Take the difference between the desired output and the activation value to calculate the network’s activation error.
  5. Adjust the weights feeding the output neuron to reduce its activation error for this input pattern.
  6. Propagate an error value back to each hidden neuron that is proportional to its contribution to the network’s activation error.
  7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for this input pattern.
  8. Repeat steps 2 to 7 for each input pattern in the input collection.
  9. Repeat step 8 until the network is suitably trained.
It is important to note that each pattern is presented in turn, and the network adjusted slightly, before moving on to the next pattern. If we simply let the network perfectly correct the error for one pattern before moving on to the next, it would never learn a generalised solution for the entire input collection.
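As a rough sketch of steps 2 to 7 for a single pattern, the class below hard-codes a tiny network with two inputs, two hidden neurons and one output. The class name TinyNetwork, the starting weights, and the use of a plain sigmoid with a steepness of 1 are all assumptions made for illustration; this is not the implementation given later in this post.

```csharp
using System;

// Illustrative only: a fixed 2-2-1 network with hypothetical starting weights.
public class TinyNetwork
{
    // Step 1 would normally randomise these; fixed values keep the sketch repeatable.
    public double[][] HiddenWeights = { new double[] { 0.5, -0.4 }, new double[] { -0.3, 0.8 } };
    public double[] HiddenBias = { 0.1, -0.2 };
    public double[] OutputWeights = { 0.7, -0.6 };
    public double OutputBias = 0.05;
    public double LearnRate = 0.5;

    private double[] _hidden = new double[2];   // Hidden activations from the last forward pass.

    private static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // Steps 2 and 3: present a pattern and feed it forward.
    public double FeedForward(double[] inputs)
    {
        for (int h = 0; h < _hidden.Length; h++)
        {
            double sum = HiddenBias[h];
            for (int i = 0; i < inputs.Length; i++)
                sum += HiddenWeights[h][i] * inputs[i];
            _hidden[h] = Sigmoid(sum);
        }
        double net = OutputBias;
        for (int h = 0; h < _hidden.Length; h++)
            net += OutputWeights[h] * _hidden[h];
        return Sigmoid(net);
    }

    // Steps 4 to 7 for one pattern. Returns the error measured before adjusting.
    public double TrainOnce(double[] inputs, double target)
    {
        double output = FeedForward(inputs);
        double error = target - output;                     // Step 4.
        double deltaO = error * output * (1 - output);      // Error scaled by the sigmoid derivative.

        for (int h = 0; h < _hidden.Length; h++)
        {
            // Step 6: feed the error back through the weight as it was before adjustment.
            double errorH = deltaO * OutputWeights[h];
            double deltaH = errorH * _hidden[h] * (1 - _hidden[h]);

            OutputWeights[h] += LearnRate * deltaO * _hidden[h];        // Step 5.
            for (int i = 0; i < inputs.Length; i++)                     // Step 7.
                HiddenWeights[h][i] += LearnRate * deltaH * inputs[i];
            HiddenBias[h] += LearnRate * deltaH;
        }
        OutputBias += LearnRate * deltaO;
        return error;
    }
}
```

Calling TrainOnce(new double[] { 0.21, 0.57 }, 1) once and then FeedForward on the same inputs should give an output slightly closer to 1 than before. The adjustment is deliberately small, which is the "adjusted slightly" behaviour described above.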

The magic really happens in step 6, which determines how much error to feed back to each hidden neuron. Once the error value has been established, training can continue as described in my post about the single layer perceptron.

To illustrate how the error value is calculated, I will use this network diagram.

[Figure: Multi Layer Perceptron]

Then, if we use these variables:

output_o = Activation value of the output neuron
error_o = Error at the output neuron
error_h = Error at a hidden neuron
weight_ho = A weight connecting a hidden neuron to the output neuron

The error fed back to a hidden neuron is calculated as:

error_h = error_o * Derivative(output_o) * weight_ho

For an explanation of how to calculate the derivative value, see my post on the sigmoid function.
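To make the formula concrete, here is a minimal sketch of the sigmoid, its derivative, and the error fed back to a hidden neuron. The class and method names are hypothetical. One assumption worth flagging: the derivative here includes the steepness factor lambda, whereas the listing later in this post uses output * (1 - output) on its own, which amounts to folding that constant into the learning rate.

```csharp
using System;

// Illustrative helper; names are made up for this sketch.
public static class SigmoidSketch
{
    // Logistic sigmoid with steepness lambda.
    public static double Sigmoid(double x, double lambda)
    {
        return 1.0 / (1.0 + Math.Exp(-lambda * x));
    }

    // Derivative of the sigmoid with respect to x, expressed via the output value.
    public static double Derivative(double output, double lambda)
    {
        return lambda * output * (1 - output);
    }

    // error_h = error_o * Derivative(output_o) * weight_ho
    public static double HiddenError(double errorO, double outputO, double weightHo, double lambda)
    {
        return errorO * Derivative(outputO, lambda) * weightHo;
    }
}
```

For example, with error_o = 0.5, output_o = 0.5, weight_ho = 2 and a steepness of 1, the value fed back is 0.5 * 0.25 * 2 = 0.25.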

Unlike training a single layer Perceptron, training a multi layer Perceptron with back propagation is not guaranteed to find a solution, even if one exists. This is because training can become stuck in a local error minimum. There are a number of strategies to overcome this, which I shall cover another time. For now, restarting training with fresh random weights is normally sufficient for small networks.

I will continue to use the same classification problem from my last post.

[Figure: Linearly Non-Separable]

The problem is represented by the input patterns below, one per line, in the form x, y, class. Copy and paste them into a Patterns.csv file to use with the code sample below:

0.10, 0.03, 0
0.11, 0.11, 0
0.11, 0.82, 0
0.13, 0.17, 0
0.20, 0.81, 0
0.21, 0.57, 1
0.25, 0.52, 1
0.26, 0.48, 1
0.28, 0.17, 1
0.28, 0.45, 1
0.37, 0.28, 1
0.41, 0.92, 0
0.43, 0.04, 1
0.44, 0.55, 1
0.47, 0.84, 0
0.50, 0.36, 1
0.51, 0.96, 0
0.56, 0.62, 1
0.65, 0.01, 1
0.67, 0.50, 1
0.73, 0.05, 1
0.73, 0.90, 0
0.73, 0.99, 0
0.78, 0.01, 1
0.83, 0.62, 0
0.86, 0.42, 1
0.86, 0.91, 0
0.89, 0.12, 1
0.95, 0.15, 1
0.98, 0.73, 0
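
Each line holds two inputs followed by the desired output. As a hedged sketch of how such a line can be read, the helper below parses the fields with the invariant culture, so that "0.21" is read the same way regardless of the machine's regional settings (a reader in the comments reported a FormatException under Finnish settings). The class name PatternLineSketch is hypothetical.

```csharp
using System;
using System.Globalization;

// Illustrative helper; not part of the main listing.
public static class PatternLineSketch
{
    // Splits "x, y, class" on commas and parses each field using '.' as the decimal separator.
    public static double[] Parse(string line)
    {
        string[] parts = line.Split(',');
        double[] values = new double[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            values[i] = double.Parse(parts[i].Trim(), CultureInfo.InvariantCulture);
        }
        return values;
    }
}
```

Parse("0.21, 0.57, 1") returns three values: the two inputs followed by the class.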

The code below is my implementation of back propagation in C#. In my opinion C# Generics are a beautiful thing, and you will see that I use them extensively. To run this code, create a new C# Console application and paste the code into the Program.cs file. Paste the input patterns into a file named Patterns.csv located in your project directory. Include Patterns.csv in your project, and set its “Copy to Output Directory” property to “Copy always”.

using System;
using System.Collections.Generic;
using System.IO;

public class Network
{
    private int _hiddenDims = 2;        // Number of hidden neurons.
    private int _inputDims = 2;         // Number of input neurons.
    private int _iteration;             // Current training iteration.
    private int _restartAfter = 2000;   // Restart training if iterations exceed this.
    private Layer _hidden;              // Collection of hidden neurons.
    private Layer _inputs;              // Collection of input neurons.
    private List<Pattern> _patterns;    // Collection of training patterns.
    private Neuron _output;             // Output neuron.
    private Random _rnd = new Random(); // Global random number generator.

    [STAThread]
    static void Main()
    {
        new Network();
    }

    public Network()
    {
        LoadPatterns();
        Initialise();
        Train();
        Test();
    }

    // Presents every pattern in turn, adjusting the weights after each one,
    // until the summed squared error for a full pass drops to 0.1 or below.
    private void Train()
    {
        double error;
        do
        {
            error = 0;
            foreach (Pattern pattern in _patterns)
            {
                double delta = pattern.Output - Activate(pattern);
                AdjustWeights(delta);
                error += Math.Pow(delta, 2);
            }
            Console.WriteLine("Iteration {0}\tError {1:0.000}", _iteration, error);
            _iteration++;
            // Start again with fresh random weights if training appears to be stuck.
            if (_iteration > _restartAfter) Initialise();
        } while (error > 0.1);
    }

    private void Test()
    {
        Console.WriteLine("\nBegin network testing\nPress Ctrl C to exit\n");
        while (true)
        {
            try
            {
                Console.Write("Input x, y: ");
                // Append a dummy output column so the line parses as a Pattern.
                string values = Console.ReadLine() + ",0";
                Console.WriteLine("{0:0}\n", Activate(new Pattern(values, _inputDims)));
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    // Feeds a pattern forward through the network and returns the output activation.
    private double Activate(Pattern pattern)
    {
        for (int i = 0; i < pattern.Inputs.Length; i++)
        {
            _inputs[i].Output = pattern.Inputs[i];
        }
        foreach (Neuron neuron in _hidden)
        {
            neuron.Activate();
        }
        _output.Activate();
        return _output.Output;
    }

    // Adjusts the output neuron first, then feeds its error back to each hidden neuron.
    // Note that the feedback is therefore computed from the already adjusted output weights.
    private void AdjustWeights(double delta)
    {
        _output.AdjustWeights(delta);
        foreach (Neuron neuron in _hidden)
        {
            neuron.AdjustWeights(_output.ErrorFeedback(neuron));
        }
    }

    // Builds a new network with random weights and resets the iteration counter.
    private void Initialise()
    {
        _inputs = new Layer(_inputDims);
        _hidden = new Layer(_hiddenDims, _inputs, _rnd);
        _output = new Neuron(_hidden, _rnd);
        _iteration = 0;
        Console.WriteLine("Network Initialised");
    }

    private void LoadPatterns()
    {
        _patterns = new List<Pattern>();
        // The using block closes the file even if a line fails to parse.
        using (StreamReader file = File.OpenText("Patterns.csv"))
        {
            while (!file.EndOfStream)
            {
                string line = file.ReadLine();
                if (line.Trim().Length == 0) continue;   // Skip blank lines.
                _patterns.Add(new Pattern(line, _inputDims));
            }
        }
    }
}


public class Layer : List<Neuron>
{
    public Layer(int size)
    {
        for (int i = 0; i < size; i++)
            base.Add(new Neuron());
    }

    public Layer(int size, Layer layer, Random rnd)
    {
        for (int i = 0; i < size; i++)
            base.Add(new Neuron(layer, rnd));
    }
}


public class Neuron
{
    private double _bias;                       // Bias value.
    private double _error;                      // Error value from the last adjustment.
    private double _input;                      // Sum of inputs.
    private double _lambda = 6;                 // Steepness of sigmoid curve.
    private double _learnRate = 0.5;            // Learning rate.
    private double _output = double.MinValue;   // Preset value of neuron.
    private List<Weight> _weights;              // Collection of weights to inputs.

    // Creates an input neuron, whose Output is set directly from a pattern.
    public Neuron() { }

    // Creates a neuron fed by every neuron in the given layer, with random weights in [-1, 1).
    public Neuron(Layer inputs, Random rnd)
    {
        _weights = new List<Weight>();
        foreach (Neuron input in inputs)
        {
            Weight w = new Weight();
            w.Input = input;
            w.Value = rnd.NextDouble() * 2 - 1;
            _weights.Add(w);
        }
    }

    // Sums the weighted inputs; the sigmoid is applied when Output is read.
    public void Activate()
    {
        _input = 0;
        foreach (Weight w in _weights)
        {
            _input += w.Value * w.Input.Output;
        }
    }

    // Returns the share of this neuron's error to feed back to the given input neuron.
    public double ErrorFeedback(Neuron input)
    {
        Weight w = _weights.Find(delegate(Weight t) { return t.Input == input; });
        return _error * Derivative * w.Value;
    }

    public void AdjustWeights(double value)
    {
        _error = value;
        for (int i = 0; i < _weights.Count; i++)
        {
            _weights[i].Value += _error * Derivative * _learnRate * _weights[i].Input.Output;
        }
        _bias += _error * Derivative * _learnRate;
    }

    // Sigmoid derivative expressed via the output. The constant _lambda factor
    // is omitted here, which has the same effect as scaling the learning rate.
    private double Derivative
    {
        get
        {
            double activation = Output;
            return activation * (1 - activation);
        }
    }

    public double Output
    {
        get
        {
            // Input neurons return the value preset from the pattern.
            if (_output != double.MinValue)
            {
                return _output;
            }
            return 1 / (1 + Math.Exp(-_lambda * (_input + _bias)));
        }
        set
        {
            _output = value;
        }
    }
}


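public class Pattern
{
    private double[] _inputs;
    private double _output;

    // Parses a comma separated line of inputSize inputs followed by one output value.
    public Pattern(string value, int inputSize)
    {
        string[] line = value.Split(',');
        if (line.Length - 1 != inputSize)
            throw new ArgumentException("Input does not match network configuration");
        _inputs = new double[inputSize];
        // Parse with the invariant culture so that '.' is always the decimal
        // separator, whatever the regional settings of the machine.
        System.Globalization.CultureInfo culture = System.Globalization.CultureInfo.InvariantCulture;
        for (int i = 0; i < inputSize; i++)
        {
            _inputs[i] = double.Parse(line[i], culture);
        }
        _output = double.Parse(line[inputSize], culture);
    }

    public double[] Inputs
    {
        get { return _inputs; }
    }

    public double Output
    {
        get { return _output; }
    }
}
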

public class Weight
{
    public Neuron Input;
    public double Value;
}


When the application runs, it will use the back propagation algorithm to train the network, restarting if it gets trapped in a local error minimum. Once fully trained, you will be able to test the network against untrained points on the graph and observe that it generalises correctly.

It is simple to modify the code to accept more input dimensions and use more hidden neurons, by changing the variables _inputDims and _hiddenDims. If you change _inputDims, make sure to add matching extra columns to the Patterns.csv file. With these changes, the same network can be applied to more complex problems.

In future posts, I will be using networks like this to solve real world problems - I would welcome suggestions.

John

35 comments:

Dman003 said...

This is going to be one of the coolest blogs on the planet. It shows theory plus code implementation.

John Wakefield said...

Thanks for your kind comments Dman003.
John

Anonymous said...

Hi! I'm new to MLP and have doubts. May i ask how do we determine how many hidden layers are there? is it based on the number of pattern input?

John Wakefield said...

Hi Anonymous,

Great question! Currently no theory exists that prescribes exactly how many hidden neurons or hidden layers are required to approximate a given function.

However, you will always require at least one hidden layer for linearly non-separable patterns, and generally the more complicated the pattern the more neurons and layers are required. Unfortunately, it's trial and error, but you will get a feel for it.

Cascade correlation neural networks go some way to providing a solution. As part of their training, they literally add hidden layers and neurons as required. This comes at a price, as they do have a tendency to over fit whilst training. This means that whilst the error rate drops through the floor, they can provide poor generalisation.

Hope that helps.

John

Sebastián said...

Hi!

I found your post while I was searching for some backpropagation C# libraries.

I need to build an ANN which will have as an input an image of 320*240pixels, giving 76800 input neurons.

The main issue I'm facing is to determine how many neurons do I need in the hidden layers.

I read somewhere in the web, that the 1st hidden layer should have as many neurons as the input plus one, giving 76801 neurons, then another layer of 34800 and finally one with about 9600 neurons.

While adapting your post to my problem I realized that the amount of memory I'll need just to rise the NN is huge (because of all the synapses the network will have)

I hope I made myself clear enough...

Are there any suggestions you can think about in order to change the structure of the hidden layers?

I'm sorry for my english, as you'll notice, I'm not a native speaker.

Anonymous said...

Hi! I'm the previous anonymous. I'm really noob to this now actually. For instance a simple XOR function with a hidden layer and a output layer, how do we determine the number of neurons required in the hidden layer?

John Wakefield said...

Hi Sebastián & Anonymous,

Both of your questions are related, and are really important. I've decided to dedicate my next post to the question "How Many Hidden Neurons are Required". Hopefully, I will be able to answer both your questions. Please check back this weekend.

John

Gamer Pro. said...

amazing post man
But object-oriented programming is not a good habit in scientific programming. It can cost up to 8% more computational power in some extreme cases
C# is really elegant, but pity that it is so poorly supported by the compiler gurus

Gamer Pro. said...

You really have complicated the whole thing. The code should be shortened by at least half...

John Wakefield said...

Hi Gamer Pro,

I agree with you, OO code can be a lot more verbose, however, in my day job I use it all the time. I wrestled between keeping the code short and performant versus readable and extensible - the latter won. I could have written this in C++ with plenty of nested loops, but I decided against it… Preference I guess…

Thanks for the feedback.

John

Gamer Pro. said...

Thanks a lot!!
I finally know how to make a simple ANN!

When will you give us the post on ANN that can predict stock market? Can't wait to see!

I simplified the code a little and it's now about ten times faster than the original code

using System;
using System.Collections;

namespace csharp_ann_perceptron_case1
{
    class Program
    {
        static Random random = new Random();
        static void Main(string[] args)
        {
            Console.WriteLine("Yo");
            Program entity = new Program();
            entity.Run();
        }

        #region DATA
        double[][] Patterns = new double[][] {
            new double[]{0.10, 0.03, 0}, new double[]{0.11, 0.11, 0}, new double[]{0.11, 0.82, 0},
            new double[]{0.13, 0.17, 0}, new double[]{0.20, 0.81, 0}, new double[]{0.21, 0.57, 1},
            new double[]{0.25, 0.52, 1}, new double[]{0.26, 0.48, 1}, new double[]{0.28, 0.17, 1},
            new double[]{0.28, 0.45, 1}, new double[]{0.37, 0.28, 1}, new double[]{0.41, 0.92, 0},
            new double[]{0.43, 0.04, 1}, new double[]{0.44, 0.55, 1}, new double[]{0.47, 0.84, 0},
            new double[]{0.50, 0.36, 1}, new double[]{0.51, 0.96, 0}, new double[]{0.56, 0.62, 1},
            new double[]{0.65, 0.01, 1}, new double[]{0.67, 0.50, 1}, new double[]{0.73, 0.05, 1},
            new double[]{0.73, 0.90, 0}, new double[]{0.73, 0.99, 0}, new double[]{0.78, 0.01, 1},
            new double[]{0.83, 0.62, 0}, new double[]{0.86, 0.42, 1}, new double[]{0.86, 0.91, 0},
            new double[]{0.89, 0.12, 1}, new double[]{0.95, 0.15, 1}, new double[]{0.98, 0.73, 0}
        };
        #endregion

        #region Definition of global variables
        Neuron[] Inputs;
        Neuron[] Hiddens;
        Neuron Last;
        double learnR = 1; //Learning rate
        double lambda = 6; //Steepness of sigmoid curve
        int iteration = 0;
        int restartafter = 10000000;
        int maximumi = 1000000000;
        #endregion

        void Run()
        {
            Initialize();
            //==================TRAINING===================================
            double totalError = 0;//Total error
            for (iteration = 0; iteration < maximumi; iteration++)
            {
                if (iteration >= restartafter)
                    Initialize();
                totalError = 0;
                for (int i = 0; i < Patterns.Length; i++)
                {
                    // Load the patterns, Activate input neurons
                    //Produce output for input neurons
                    Inputs[0].Output = Patterns[i][0];
                    Inputs[1].Output = Patterns[i][1];
                    //Activate and calculate output of hidden neurons
                    ActivateNeuron(Hiddens[0]);
                    ActivateNeuron(Hiddens[1]);
                    //Activate and calculate output of last neuron
                    ActivateNeuron(Last);
                    //Calculate Errors
                    Last.Error = Patterns[i][2] - Last.Output;
                    totalError += Last.Error * Last.Error;
                    //Adjust input weights and set input errors
                    AdjustInputWeightsAndSetInputError(Last);
                    AdjustInputWeightsAndSetInputError(Hiddens[0]);
                    AdjustInputWeightsAndSetInputError(Hiddens[1]);
                }
                if (iteration % 100000 == 0)
                    Console.WriteLine("Iteration{0}, Error:{1:E}", iteration, totalError);
                if (totalError < 0.000001)
                    break;
            }
            Console.WriteLine("Training complete,iteration:{1} final Error:{0}", totalError, iteration);
            //========================TESTING===============================
            #region Testing
            Console.WriteLine("Press Ctrl+C to terminate; press any key to test");
            Console.ReadKey(true);
            while (true)
            {
                try
                {
                    Console.Write("Input x: ");
                    double X = double.Parse(Console.ReadLine());
                    Console.Write("Input y: ");
                    double Y = double.Parse(Console.ReadLine());
                    Inputs[0].Output = X;
                    Inputs[1].Output = Y;
                    ActivateNeuron(Hiddens[0]);
                    ActivateNeuron(Hiddens[1]);
                    ActivateNeuron(Last);
                    Console.WriteLine("Output: {0}", Last.Output);
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.Message);
                }
            }
            #endregion
        }

        void ActivateNeuron(Neuron n)
        {
            n.Excitement = n.Preons[0].Pre.Output * n.Preons[0].Weight + n.Preons[1].Pre.Output * n.Preons[1].Weight;
            n.Output = 1 / (1 + (Math.Exp(-lambda * (n.Excitement + n.Bias))));
        }

        void AdjustInputWeightsAndSetInputError(Neuron n) //This name is pretty long ^_^
        {
            double Gradient = n.Output * (1 - n.Output);
            double commonWeightFactor = n.Error * Gradient * learnR;
            n.Bias += commonWeightFactor;
            double commonErrorFactor = n.Error * Gradient;
            foreach (Synapse synapse in n.Preons)
            {
                //The error is fed back using the weight as it was before this adjustment
                synapse.Pre.Error = commonErrorFactor * synapse.Weight;
                synapse.Weight += commonWeightFactor * synapse.Pre.Output;
            }
        }

        void Initialize()
        {
            iteration = 0;
            Inputs = new Neuron[] { new Neuron(), new Neuron() };
            Hiddens = new Neuron[] { new Neuron(), new Neuron() };
            Last = new Neuron();
            Hiddens[0].Preons = new Synapse[] { new Synapse(Inputs[0]), new Synapse(Inputs[1]) };
            Hiddens[1].Preons = new Synapse[] { new Synapse(Inputs[0]), new Synapse(Inputs[1]) };
            Last.Preons = new Synapse[] { new Synapse(Hiddens[0]), new Synapse(Hiddens[1]) };
        }

        #region Definition of subclasses
        class Neuron
        {
            public double Excitement = 0;
            public double Output = 0;
            public double Bias = 0;
            public double Error = 0;
            public Synapse[] Preons;
        }

        class Synapse
        {
            public Neuron Pre;
            public double Weight;
            public Synapse(Neuron pre)
            {
                Pre = pre;
                Weight = (double)random.NextDouble() * 2 - 1;
            }
        }
        #endregion
    }
}

John Wakefield said...

Hi Gamer Pro,

Like what you’ve done with the code. It’s certainly much shorter! I will definitely be covering the stock market in future posts – lots about automatically establishing technical indicators and discovering correlations in the market. What plans do you have for Neural Networks?

John

Gamer Pro. said...

Actually I've been thinking on a model of ANN for almost three years... haha, three years ago I hadn't even heard the word "neuron" yet. At that time I called my model 'point-line system'. It was just a thing I thought up that might model our thoughts... I got to know what ANN was all because I was trying to find some established theory to confirm my ideas about the way the MIND works. Though I never found such a theory, I noticed that ANN was something similar to what I had been doing... long story, anyway, now I believe that ANN has the potential to bring great new things to the world, and I'm 'trying' to give a try to make it come true^_^

so, all these twenty years what have you been doing with ANN?

Gamer Pro. said...

Hey, why don't you add a few words like

Artificial Neural Networks(ANN) Tutorial,
Machine Learning,
Artificial Intelligence,
Stock Market Prediction

into your page so that more people googling such key words can get to your blog?

John Wakefield said...

Thanks for the ideas about the new keywords. I will add them to my profile.

It’s interesting to hear about your motivation – modelling the human mind is an ambitious goal. Personally, I’ve done quite a few things with neural networks over the past twenty years – it’s one of the things that got me into IT for a living. My first real success, ages ago, was creating a net that learnt how to play perfect tic-tac-toe. Since then, I’ve done image recognition, some games, and plenty of market prediction (I used to be a commodity trader). The big thing for me now is natural language processing. The future of the computer does not involve a keyboard or mouse, and it will take some genuine advances in machine learning to get us there – something I want to be part of…

Anonymous said...

hi...
i'm new to NN and c#...is it possible to load the pattern from MySQL database?or just change it from the select statement in MySql and convert to .csv file??can you teach me how to do that if it is possible??huhuhu~ email me..nfatima84@yahoo.com

John Wakefield said...

Hi Anonymous,

If you already have your data stored in a database then that's great. If you are working with .NET then I would recommend SQL Server Express, which is freely downloadable from Microsoft. If you want to use MySQL with .NET, then there is a good introduction on the MySQL site.

To add this to my network, you would just need to modify the LoadPatterns method in the Network class to use a DataReader instead of a StreamReader.

Good luck
John

Antoine said...

Complex, yet simple to understand ! Great blog.

Vinay Shekhar said...

HI GAMERPRO

URGENTLY NEED UR HELP...UR CONDENSED PROGRAM ON NN IS AWESOME...IM TRYING TO IMPLEMENT IT IN ASP.NET USING VSS...CAN U EXPLAIN ME WHAT U MEAN BY CONSOLE.READLINE() ETC..BASICALLY WAT IS "CONSOLE"?? HOW IS IT DEFINED OR USED??

Will Dwinnell said...

"It [the back propagation algorithm] is used in commercial neural network software packages, such a MATLAB."

For clarity's sake, neural network models are not directly supported in the base MATLAB product, only in the Neural Network Toolbox.

Of course, it would be much easier to implement such an algorithm in base MATLAB than in general-purpose languages.

pinchitter said...

Dear John,

Please explain the statement below
[code]
Weight w = _weights.Find(delegate(Weight t) { return t.Input == input; })
[/code]
I have not used C# or .NET

John Wakefield said...

Hi Pinchitter,

You could think of an equivalent function reading like this:

public Weight Find(Weight[] weights, Neuron input)
{
    foreach (Weight w in weights)
    {
        if (w.Input == input)
        {
            return w;
        }
    }
    return null;
}

John

pinchitter said...

Hi John,

Thanks for your quick reply. Oh! I forgot to wish you.. Happy New Year John :)

Anonymous said...

@Gamer Pro.

I love your enthusiasm, but haven't advancements in Support Vector Machines shown greater abilities than biological models, like an ANN?

Anonymous said...

Hi

I have a problem.
When I run this multi layer perceptron C# script, I got error to this line
_inputs[i] = double.Parse(line[i]);

FormatException was unhandled
(Input string was not in a correct format.)

but I think my Patterns.csv file is correct

Anonymous said...

Hi

I have exactly the same problem & error message that "FormatException was unhandled
(Input string was not in a correct format.)
". If there were source files to download, it would be possible to compare, why this occurs.
Anyway great blog and for example no problems with Gamer Pro´s version.

Best Rgrds
Ike

John Wakefield said...

Hi Ike,

Please post a couple of lines from your patterns.csv file and I'll try and work out what the problem is.

John

Anonymous said...

What is the purpose of the ErrorFeedback method?

Artiom said...
This post has been removed by the author.
Anonymous said...

"FormatException was unhandled
(Input string was not in a correct format.)" was caused by the fact that my settings for excel were Scandinavian (Finnish) and after small fixing & adjustments both in code & excel (; instead of , etc.) there have been no problems.

Thanks & regards
Ike

John Wakefield said...

Hi Artiom,

Yes, you could use a neural network to do this. It would be best to do some preprocessing on your inputs to get them in the range of zero to one. Depending on the inputs, a simple linear scaling may work, or you may need to use a non-linear method.

It may be worth using a network with multiple outputs, with each output neuron representing a day range, i.e. 0-10 days, 10-50 days, 50-100 days, etc.
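
As a rough sketch of both ideas, the helper below scales a raw value linearly into the range zero to one, and turns a number of days into one target value per output neuron. The class name, method names and the range boundaries used in the example are hypothetical, chosen only for illustration.

```csharp
using System;

// Illustrative preprocessing helpers; names and boundaries are assumptions.
public static class PreprocessSketch
{
    // Linear (min-max) scaling of a raw value into the range 0 to 1.
    public static double Scale(double value, double min, double max)
    {
        if (max <= min) throw new ArgumentException("max must be greater than min");
        double scaled = (value - min) / (max - min);
        return Math.Max(0.0, Math.Min(1.0, scaled));   // Clamp values that fall outside the range.
    }

    // One target per day range: 1 for the range containing 'days', 0 elsewhere.
    // upperBounds holds the exclusive upper limit of each range in ascending order;
    // anything at or beyond the final limit is placed in the last range.
    public static double[] EncodeRange(double days, double[] upperBounds)
    {
        double[] targets = new double[upperBounds.Length];
        int index = upperBounds.Length - 1;
        for (int i = 0; i < upperBounds.Length; i++)
        {
            if (days < upperBounds[i])
            {
                index = i;
                break;
            }
        }
        targets[index] = 1;
        return targets;
    }
}
```

With upper limits of 10, 50 and 100 days, EncodeRange(30, ...) marks the second range, so the three output neurons would be trained towards 0, 1 and 0.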

Would be really interested in how it works out for you.

Cheers
John

Daniel said...

Hi,

first, I really enjoy reading your topics. Honestly, good job.

One question to the algorithm, I was running the provided code but the algorithm never gave me good results. The worst error was around 9 and declines over time to 6. But I never got better results than that. Is that intended? I thought the error would go down close to 0. The program was running approximately 5 minutes. Is that too short?

John Wakefield said...

Hi Daniel,

The way to think about the error is as a measure of how uncomfortable the network is with the output. The only way it would ever drop to zero is if all points lay exactly on the hyperplane.

As long as the network can correctly classify the different inputs, this is all that really matters.

Cheers
John

Vineet said...

Hi,
I have developed a very simple ANN code in Labview but i am stuck with the training part of it. I am using backpropagation with delta rule. The network does not seem to converge... Please suggest what to do

Dilemma Personified said...

dear Mr. Wakefield,

I used the algorithm suggested by you for modeling my network using backpropogation... the problem which i am facing is that the network when trained gets stuck to a constant error which generally is very large ... i have tried varying the no of hidden layes and nodes in the same but the problem persists.... can u suggest a solution .. it will be helpful..
Thanx in advance