Artificial intelligence is fascinating. This project of mine is a scalable artificial neural network whose structure resembles a brain. It consists of many layers of neurons connected by synapses: every neuron in one layer is connected to every neuron in the next layer. The artificial synapses in the model are weighting coefficients, adjustable numbers which completely define the network.
The neurons in the first layer are called input neurons. Their activity is set to whatever input the neural network is fed. The information is then processed in multiple so-called hidden layers before it reaches the last layer, the output layer.
The network can be trained on lots of data for which the target output is known. In the process, the neural net improves itself until it can perform the task with very high precision, even for problems it has never encountered in training. The way the neural network solves the task is not pre-programmed; it finds the solution all by itself. This is the magic of AI.
The neuron is the basic component of the network. As its input it receives the sum of the outputs o of all neurons in the previous layer, each weighted with the corresponding coefficient w; on top of that, a bias coefficient is added. The sigmoid f of this number is then calculated, giving a result between 0 and 1. The sigmoid acts as a model of the firing threshold found in real neurons, and the bias coefficient shifts that threshold left or right.
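In code, a single neuron might look like this minimal Python sketch (the function and variable names are my own, for illustration):

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1); models the
    # firing threshold of a real neuron.
    return 1.0 / (1.0 + math.exp(-x))

def neuron(prev_outputs, weights, bias):
    # Weighted sum of the previous layer's outputs plus the bias,
    # passed through the sigmoid.
    z = sum(w * o for w, o in zip(weights, prev_outputs)) + bias
    return sigmoid(z)
```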
When the network is initialized with random weighting coefficients, its output for a given input is completely random. For the network to improve itself, a learning method has to be implemented. The basic idea is as follows: the neural network is fed input data for which the target output is known. The network calculates its own output without knowing the target. Then the output and the target are compared, which gives the error of the network. This error must be minimized for the output to match the target.
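One common way to measure this error (one choice among several; the text does not fix a particular one) is the squared difference between output and target, summed over all output neurons:

```python
def error(output, target):
    # Squared difference between network output and target,
    # summed over all output neurons. Zero means a perfect match.
    return sum((o - t) ** 2 for o, t in zip(output, target))
```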
There are two different approaches to this task: evolution and gradient descent (backpropagation).

**Evolution**
The most obvious method is to mimic evolution in nature. You generate 1000 or more neural networks randomly, give them a task and then sort them by error. The best ones can be combined or randomly mutated, while the worst ones get replaced with new randomly generated networks. This way, a new generation of 1000 networks is created. After thousands of generations, the best network can perform the task with high accuracy.
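A toy sketch of this loop in Python (mutation only; combining networks and injecting fresh random ones would slot in the same way):

```python
import random

def mutate(weights, strength=0.1):
    # Randomly nudge every weighting coefficient a little.
    return [w + random.gauss(0.0, strength) for w in weights]

def evolve(population, error_of, generations=1000):
    # population: list of weight vectors; error_of: lower is better.
    for _ in range(generations):
        population.sort(key=error_of)
        survivors = population[: len(population) // 2]
        # Refill the population with mutated copies of the survivors.
        offspring = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + offspring
    return min(population, key=error_of)
```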
**Gradient Descent / Backpropagation**

Gradient descent, on the other hand, only needs one neural network. Look at the network as a complicated mathematical function net(w, input) = output. The output of the function net depends only on the weighting coefficients w and on the input. For the training data set, you know the target output for each input, so you can calculate how far off the output of the network is from the target. Now imagine increasing or decreasing one of the weighting coefficients a little bit: the error will change. If you change the weighting in the right direction, the error will decrease. So by changing all the weightings in the right direction, the error will descend. The problem is that there are millions of weighting coefficients, and you cannot try every single combination of increasing or decreasing all of them. Instead, the gradient of the error with respect to all the weightings is calculated. This way, you know in which direction you have to change every weighting for the error to decrease. So you change every weighting a little bit in the right direction and then repeat the process.
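The "nudge one coefficient and see how the error changes" picture corresponds to a numerical gradient; backpropagation computes the same gradient analytically and far more efficiently. A minimal sketch of one descent step:

```python
def numerical_gradient(error_of, weights, eps=1e-5):
    # Estimate d(error)/d(w_i) by nudging each coefficient a tiny bit,
    # exactly the thought experiment described above.
    base = error_of(weights)
    grad = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += eps
        grad.append((error_of(nudged) - base) / eps)
    return grad

def descent_step(error_of, weights, learning_rate=0.01):
    # Move every weighting a little bit in the direction
    # that lowers the error.
    grad = numerical_gradient(error_of, weights)
    return [w - learning_rate * g for w, g in zip(weights, grad)]
```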
The neural network in the background was given the task of adding 1 to a number from 0 to 255 in binary. The 8 input neurons are on the left, the output neurons are on the right. White circles correspond to 1, black circles to 0. The lines represent the synapses: bright green means that the previous neuron wants the next one to fire, red means that the previous neuron prohibits firing. In the middle, I have built in a narrow pass of just two neurons, through which all the information has to flow. The left-hand part of the network learned to compress 8 bits of information into two floating-point numbers, and the right-hand part learned to extract the information again. Somewhere in the middle, the addition of 1 is performed; exactly where is impossible to tell.
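For reference, the training pairs for such a demo could be generated like this (my guess at the setup; I assume the result wraps around to 0 at 255 so it still fits in 8 bits):

```python
def to_bits(n):
    # 8-bit binary representation, most significant bit first.
    return [(n >> i) & 1 for i in range(7, -1, -1)]

# Input: n in binary; target: (n + 1) mod 256 in binary.
# The wrap-around at 255 is an assumption, not stated above.
dataset = [(to_bits(n), to_bits((n + 1) % 256)) for n in range(256)]
```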