Sunday, 21 September 2008

Training Neural Networks Using Back Propagation in C#

In my previous post, I showed how multi-layer Perceptrons can be used to solve linearly non-separable problems. In that example, I calculated all the network weights by hand. In this post, I shall introduce the back propagation training algorithm, which will do all the hard work for us.

The back propagation algorithm has become the de facto training algorithm for artificial neural networks, and has been studied by the artificial intelligence community since the 1970s. It is used in commercial neural network software packages, such as MATLAB.

The principle of back propagation is actually quite easy to understand, even though the maths behind it can look rather daunting. The basic steps are:
  1. Initialise the network with small random weights.
  2. Present an input pattern to the input layer of the network.
  3. Feed the input pattern forward through the network to calculate its activation value.
  4. Take the difference between the desired output and the activation value to calculate the network’s activation error.
  5. Adjust the weights feeding the output neuron to reduce its activation error for this input pattern.
  6. Propagate an error value back to each hidden neuron that is proportional to its contribution to the network’s activation error.
  7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for this input pattern.
  8. Repeat steps 2 to 7 for each input pattern in the input collection.
  9. Repeat step 8 until the network is suitably trained.
It is important to note that each pattern is presented in turn, and the network adjusted only slightly, before moving on to the next pattern. If we instead let the network perfectly correct the error for one pattern before moving on to the next, it would never learn a generalised solution for the entire input collection.
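The per-pattern (online) update described above can be sketched as follows. This is only an illustration of the training loop's shape; the names `Epoch`, `activate` and `adjust` are hypothetical stand-ins for the full implementation later in this post:

```csharp
using System;

// Sketch of the online (per-pattern) update described above.
public static class OnlineTrainingSketch
{
    // One pass over the training set: present each pattern in turn
    // and apply one small weight adjustment before moving to the next.
    // 'activate' runs the forward pass; 'adjust' nudges the weights.
    public static double Epoch(double[][] inputs, double[] targets,
        Func<double[], double> activate, Action<double> adjust)
    {
        double error = 0;
        for (int i = 0; i < inputs.Length; i++)
        {
            double delta = targets[i] - activate(inputs[i]);
            adjust(delta);            // a small correction, not a full fix
            error += delta * delta;   // accumulate squared error
        }
        return error;                 // repeat epochs until this is small
    }
}
```

The caller repeats `Epoch` until the accumulated squared error falls below a threshold, which is exactly the shape of the `Train` method further down.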

The magic really happens in step 6, which determines how much error to feed back to each hidden neuron. Once the error value has been established, training can continue as described in my post about the single layer perceptron.

To illustrate how the error value is calculated, I will use this network diagram.

Multi Layer Perceptron

Then, if we use these variables:

output_o = Activation value of the output neuron
error_o = Error at the output neuron
error_h = Error at a hidden neuron
weight_ho = A weight connecting a hidden neuron to the output neuron

The error fed back to a hidden neuron is calculated as:

error_h = error_o * Derivative(output_o) * weight_ho

For an explanation of how to calculate the derivative value, see my post on the sigmoid function.
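To make the formula concrete, here is a small, self-contained sketch of the calculation. The class and method names, and the numeric values in the usage note, are purely illustrative:

```csharp
using System;

public static class HiddenErrorExample
{
    // Derivative of the logistic sigmoid, expressed in terms of the
    // neuron's activation value: f'(x) = f(x) * (1 - f(x)).
    public static double Derivative(double activation)
    {
        return activation * (1 - activation);
    }

    // error_h = error_o * Derivative(output_o) * weight_ho
    public static double HiddenError(double errorO, double outputO, double weightHo)
    {
        return errorO * Derivative(outputO) * weightHo;
    }
}
```

For example, with error_o = 0.25, output_o = 0.731 and weight_ho = 0.4, the error fed back is 0.25 × 0.731 × (1 − 0.731) × 0.4 ≈ 0.0197.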

Unlike the single layer Perceptron, training a multi-layer Perceptron with back propagation does not guarantee a solution, even if one is available. This is because training can become stuck in a local error minimum. There are a number of strategies to overcome this, which I shall cover another time. For now, restarting training is normally sufficient for small networks.

I will continue to use the same classification problem from my last post.

Linearly Non-Separable

The patterns below represent this problem. Copy and paste them into a Patterns.csv file to use with the code sample below:

0.10, 0.03, 0
0.11, 0.11, 0
0.11, 0.82, 0
0.13, 0.17, 0
0.20, 0.81, 0
0.21, 0.57, 1
0.25, 0.52, 1
0.26, 0.48, 1
0.28, 0.17, 1
0.28, 0.45, 1
0.37, 0.28, 1
0.41, 0.92, 0
0.43, 0.04, 1
0.44, 0.55, 1
0.47, 0.84, 0
0.50, 0.36, 1
0.51, 0.96, 0
0.56, 0.62, 1
0.65, 0.01, 1
0.67, 0.50, 1
0.73, 0.05, 1
0.73, 0.90, 0
0.73, 0.99, 0
0.78, 0.01, 1
0.83, 0.62, 0
0.86, 0.42, 1
0.86, 0.91, 0
0.89, 0.12, 1
0.95, 0.15, 1
0.98, 0.73, 0

The code below is my implementation of back propagation in C#. In my opinion C# Generics are a beautiful thing, and you will see that I use them extensively. To run this code, create a new C# Console application and paste the code into the Program.cs file. Paste the input patterns into a file named Patterns.csv located in your project directory. Include Patterns.csv in your project, and make sure you set its “Copy to Output Directory” property to “Copy always”.

using System;
using System.Collections.Generic;
using System.IO;

public class Network
{
    private int _hiddenDims = 2;        // Number of hidden neurons.
    private int _inputDims = 2;         // Number of input neurons.
    private int _iteration;             // Current training iteration.
    private int _restartAfter = 2000;   // Restart training if iterations exceed this.
    private Layer _hidden;              // Collection of hidden neurons.
    private Layer _inputs;              // Collection of input neurons.
    private List<Pattern> _patterns;    // Collection of training patterns.
    private Neuron _output;             // Output neuron.
    private Random _rnd = new Random(); // Global random number generator.

    [STAThread]
    static void Main()
    {
        new Network();
    }

    public Network()
    {
        LoadPatterns();
        Initialise();
        Train();
        Test();
    }

    private void Train()
    {
        double error;
        do
        {
            error = 0;
            foreach (Pattern pattern in _patterns)
            {
                double delta = pattern.Output - Activate(pattern);
                AdjustWeights(delta);
                error += Math.Pow(delta, 2);
            }
            Console.WriteLine("Iteration {0}\tError {1:0.000}", _iteration, error);
            _iteration++;
            if (_iteration > _restartAfter) Initialise();
        } while (error > 0.1);
    }

    private void Test()
    {
        Console.WriteLine("\nBegin network testing\nPress Ctrl C to exit\n");
        while (true)
        {
            try
            {
                Console.Write("Input x, y: ");
                string values = Console.ReadLine() + ",0";
                Console.WriteLine("{0:0}\n", Activate(new Pattern(values, _inputDims)));
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    private double Activate(Pattern pattern)
    {
        for (int i = 0; i < pattern.Inputs.Length; i++)
        {
            _inputs[i].Output = pattern.Inputs[i];
        }
        foreach (Neuron neuron in _hidden)
        {
            neuron.Activate();
        }
        _output.Activate();
        return _output.Output;
    }

    private void AdjustWeights(double delta)
    {
        _output.AdjustWeights(delta);
        foreach (Neuron neuron in _hidden)
        {
            neuron.AdjustWeights(_output.ErrorFeedback(neuron));
        }
    }

    private void Initialise()
    {
        _inputs = new Layer(_inputDims);
        _hidden = new Layer(_hiddenDims, _inputs, _rnd);
        _output = new Neuron(_hidden, _rnd);
        _iteration = 0;
        Console.WriteLine("Network Initialised");
    }

    private void LoadPatterns()
    {
        _patterns = new List<Pattern>();
        StreamReader file = File.OpenText("Patterns.csv");
        while (!file.EndOfStream)
        {
            string line = file.ReadLine();
            _patterns.Add(new Pattern(line, _inputDims));
        }
        file.Close();
    }
}

public class Layer : List<Neuron>
{
    public Layer(int size)
    {
        for (int i = 0; i < size; i++)
            base.Add(new Neuron());
    }

    public Layer(int size, Layer layer, Random rnd)
    {
        for (int i = 0; i < size; i++)
            base.Add(new Neuron(layer, rnd));
    }
}

public class Neuron
{
    private double _bias;                       // Bias value.
    private double _error;                      // Sum of error.
    private double _input;                      // Sum of inputs.
    private double _lambda = 6;                 // Steepness of sigmoid curve.
    private double _learnRate = 0.5;            // Learning rate.
    private double _output = double.MinValue;   // Preset value of neuron.
    private List<Weight> _weights;              // Collection of weights to inputs.

    public Neuron() { }

    public Neuron(Layer inputs, Random rnd)
    {
        _weights = new List<Weight>();
        foreach (Neuron input in inputs)
        {
            Weight w = new Weight();
            w.Input = input;
            w.Value = rnd.NextDouble() * 2 - 1;
            _weights.Add(w);
        }
    }

    public void Activate()
    {
        _input = 0;
        foreach (Weight w in _weights)
        {
            _input += w.Value * w.Input.Output;
        }
    }

    public double ErrorFeedback(Neuron input)
    {
        Weight w = _weights.Find(delegate(Weight t) { return t.Input == input; });
        return _error * Derivative * w.Value;
    }

    public void AdjustWeights(double value)
    {
        _error = value;
        for (int i = 0; i < _weights.Count; i++)
        {
            _weights[i].Value += _error * Derivative * _learnRate * _weights[i].Input.Output;
        }
        _bias += _error * Derivative * _learnRate;
    }

    private double Derivative
    {
        get
        {
            double activation = Output;
            return activation * (1 - activation);
        }
    }

    public double Output
    {
        get
        {
            if (_output != double.MinValue)
            {
                return _output;
            }
            return 1 / (1 + Math.Exp(-_lambda * (_input + _bias)));
        }
        set
        {
            _output = value;
        }
    }
}

public class Pattern
{
    private double[] _inputs;
    private double _output;

    public Pattern(string value, int inputSize)
    {
        string[] line = value.Split(',');
        if (line.Length - 1 != inputSize)
            throw new Exception("Input does not match network configuration");
        _inputs = new double[inputSize];
        for (int i = 0; i < inputSize; i++)
        {
            _inputs[i] = double.Parse(line[i]);
        }
        _output = double.Parse(line[inputSize]);
    }

    public double[] Inputs
    {
        get { return _inputs; }
    }

    public double Output
    {
        get { return _output; }
    }
}

public class Weight
{
    public Neuron Input;
    public double Value;
}


When the application runs, it will use the back propagation algorithm to train the network, restarting if it gets trapped in a local error minimum. Once fully trained, you will be able to test the network against untrained points on the graph and observe that it generalises correctly.

It is simple to modify the code to accept more input dimensions and use more hidden neurons, by changing the variables _inputDims and _hiddenDims. If you try this, make sure to add extra columns to the Patterns.csv file. With these changes, the network can be used to solve much more complex problems.
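For instance, a three-input network needs just two field changes in the Network class; the particular values here are only a sketch:

```csharp
// In the Network class:
private int _inputDims = 3;    // was 2: one extra input neuron
private int _hiddenDims = 4;   // more hidden neurons for a harder problem

// Each line of Patterns.csv then needs three inputs plus the output, e.g.:
// 0.10, 0.03, 0.50, 1
```

The Pattern constructor already validates that each CSV row has exactly _inputDims input columns plus one output column, so a mismatched file will fail with a clear error.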

In future posts, I will be using networks like this to solve real world problems - I would welcome suggestions.

John