There is currently no mathematical theory that provides a definitive answer to this question. The reality is that structured trial and error must be used instead.
Let us frame the problem. If too few hidden neurons are used, the network will be unable to model complex data, resulting in a poor fit. If too many hidden neurons are used, then training will become excessively long and the network may overfit. Overfitting occurs when the network begins to model random noise contained within the data, resulting in a failure to converge on a generalised solution.
How about the number of hidden layers? For most problems, one hidden layer is normally sufficient. However, should your data contain discontinuities, such as a sawtooth waveform, another hidden layer can help. It is worth noting that neural networks with two hidden layers can approximate functions of any shape, so there is no theoretical reason to use any more.
So what do I mean by “structured trial and error”? The process I generally use to determine the number of hidden neurons relies on the premise that networks with too many hidden neurons will converge. Therefore, the goal is to quickly find the smallest network that converges and then refine the answer by working back from there.
1. Start with one hidden neuron, the equivalent of a single-layer Perceptron.
2. Begin training the network.
3. If the network fails to converge after a reasonable period, restart training up to ten times, to reduce the chance that it has simply fallen into a local minimum.
4. If the network still fails to converge, add another hidden neuron and return to step two.
5. If you get this far, then the network has converged. Note the number of hidden neurons, as this is your guaranteed maximum required.
6. Remove a hidden neuron and restart training.
7. Repeat the cycle of training and restarting, until either the network converges on a new solution or you choose to stop.
8. If the network converges, then you have just lowered your guaranteed maximum. Return to step five.
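
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: `converges` is a hypothetical callback that trains a freshly initialised network with the given number of hidden neurons and reports whether it converged within a reasonable period. The `max_hidden` cap and the reuse of the ten-restart limit in the refinement phase are also my assumptions, standing in for "until you choose to stop".

```python
from typing import Callable


def find_hidden_count(converges: Callable[[int], bool],
                      max_hidden: int = 50,
                      restarts: int = 10) -> int:
    """Return the smallest hidden-neuron count that was seen to converge.

    `converges(n)` is a hypothetical callback (not from any library):
    it trains a new, randomly initialised network with n hidden neurons
    and returns True if training converged.
    """
    def any_run_converges(n: int) -> bool:
        # Restart with fresh weights up to `restarts` times, so a single
        # unlucky local minimum does not count as a failure.
        return any(converges(n) for _ in range(restarts))

    # Growth phase (steps 1-4): add neurons until some run converges.
    n = 1
    while not any_run_converges(n):
        n += 1
        if n > max_hidden:  # assumed safety cap, not part of the original steps
            raise RuntimeError("no convergence up to max_hidden")
    best = n  # step 5: the guaranteed maximum required

    # Refinement phase (steps 6-8): try to lower the maximum by one neuron.
    while best > 1 and any_run_converges(best - 1):
        best -= 1
    return best
```

Because training is stochastic, a size that failed during the growth phase can still succeed on a later attempt, which is why the refinement phase retries one neuron below the current maximum.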
Binary Classification
With binary classification, we are only interested in the corners of a hypercube, and not the space contained within. The XOR (or parity) problem falls into this category. The table below represents the complete set of data for a three-dimensional XOR problem. These points can be mapped onto a cube, as shown.
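[Figure: data table for the three-dimensional XOR problem, with its points mapped onto the corners of a cube]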
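The complete data set for this problem is small enough to enumerate directly. Here is a minimal Python sketch, assuming the three-input XOR is the parity of the inputs (output 1 when an odd number of inputs are 1). The function name `parity_table` is mine, chosen for illustration.

```python
from itertools import product


def parity_table(n_inputs: int = 3):
    """List every corner of the n-dimensional hypercube with its parity."""
    rows = []
    for bits in product((0, 1), repeat=n_inputs):
        # XOR of all inputs: 1 when an odd number of them are 1.
        rows.append((bits, sum(bits) % 2))
    return rows


# Print the eight corners of the cube and their target outputs.
for bits, target in parity_table(3):
    print(*bits, "->", target)
```

Each tuple of bits is one corner of the cube, and the target is the class the network has to assign to that corner.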
Function Approximation
Moving on from binary classification, I fed a selection of continuous functions into the network, and again obtained the number of hidden neurons required for a respectable approximation. These are shown in the graphs below, along with an approximation of the function produced by the network after training. I have indicated the structure of the network using the convention input:hidden:output. As you can see in these examples, the number of hidden neurons ranges from zero to five depending on the complexity of the function.
| Function | Network (input:hidden:output) |
| --- | --- |
| z = (x + y) / 2 | 2:0:1 network |
| z = x * y | 2:2:1 network |
| z = Sin(x * 3.142) * Sin(y * 3.142) | 2:3:1 network |
| z = Abs(Cos(x * 3.142) * Sin(y * 3.142)) | 2:5:1 network |
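
To try these functions yourself, training samples can be generated along the following lines. This is a rough Python sketch rather than the code behind the graphs: the input range of 0 to 1, the grid size, and the names `TARGETS` and `sample_grid` are all assumptions of mine.

```python
import math

# The four target functions from the table above.
# 3.142 is kept as written in the table (a rounded value of pi).
TARGETS = {
    "average": lambda x, y: (x + y) / 2,
    "product": lambda x, y: x * y,
    "sin_sin": lambda x, y: math.sin(x * 3.142) * math.sin(y * 3.142),
    "abs_cos_sin": lambda x, y: abs(math.cos(x * 3.142) * math.sin(y * 3.142)),
}


def sample_grid(f, steps=11, lo=0.0, hi=1.0):
    """Return (x, y, z) samples of f on an evenly spaced steps x steps grid.

    The [lo, hi] input range is an assumption; the table does not state it.
    """
    pts = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [(x, y, f(x, y)) for x in pts for y in pts]


samples = sample_grid(TARGETS["product"])
print(len(samples), "training samples")
```

Each sample supplies the two network inputs (x, y) and the single target output z, matching the two-input, one-output structures listed in the table.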
Pattern Recognition
This is a more complex problem, and really depends on the function you expect the network to perform. Let us say you want to scan a series of infrared satellite images for the existence of military vehicles. The images may have a resolution of 10 megapixels. Do you create a network containing ten million input nodes and expect it to learn? Probably not. In this case, it would make more sense to slide a square window, consisting of a few hundred pixels, across the image, and feed the output of that into a network.

A hidden layer, containing maybe tens of neurons, would then act as a feature detector, trained to detect the various military vehicles. The network may also contain several output neurons, each representing a different type of vehicle. Again, with this type of problem, structured trial and error is required to lead you to the correct outcome.
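The sliding window itself is simple to express. Below is a minimal Python sketch, assuming the image is held as a list of rows of pixel values. The 16 x 16 window (256 inputs) and the stride of 8 are illustrative choices of mine, not values from a real system.

```python
def sliding_windows(image, size=16, stride=8):
    """Yield (row, col, pixels) for each size x size window of a 2-D image.

    `image` is a list of equal-length rows of pixel values. `pixels` is
    the window flattened row by row, ready to feed to the input layer.
    """
    height, width = len(image), len(image[0])
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            pixels = [image[r + i][c + j]
                      for i in range(size)
                      for j in range(size)]
            yield r, c, pixels


# A small synthetic 32 x 32 "image" stands in for a satellite photograph.
image = [[row * 32 + col for col in range(32)] for row in range(32)]
for r, c, pixels in sliding_windows(image):
    print(r, c, len(pixels))
```

Each window produces the same number of inputs regardless of the image size, so the network stays small even when the image does not.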
I hope that this brief explanation provides you with the means to determine the appropriate structure for your networks. It really is an art rather than a science.
In my next post, I will introduce Formula AI. The purpose of this project is to show how an artificial neural network is not only able to learn how to navigate a virtual racetrack, but will also learn optimum racing lines and braking points. Stay tuned…


