Artificial intelligence is fascinating. This project of mine is a scalable artificial neural network whose structure resembles a brain. It consists of many layers of neurons connected by synapses: every neuron in one layer is connected to every neuron in the next layer. The artificial synapses in the model are weighting coefficients, adjustable numbers which completely define the network.
The neurons in the first layer are called input neurons. Their activity is set to whatever input the neural network is fed. The information is then processed in multiple so-called hidden layers before it reaches the last layer, the output layer.
The network can be trained on lots of data for which the target output is known. In the process, the neural net improves itself until it can perform the task with very high precision, even for problems it has never encountered in training. The way the neural network solves the task is not pre-programmed; it finds the solution all by itself. This is the magic of AI.
The neuron is the basic component of the network. As its input it receives the sum of the outputs o of all neurons in the previous layer, each weighted with the corresponding coefficient w; on top of that, a bias coefficient is added. The sigmoid f of this number is then calculated, giving a result between 0 and 1. The sigmoid acts as a model of the firing threshold found in real neurons, and the bias coefficient shifts that threshold left or right.
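In code, a single neuron might look like this minimal Python sketch (the function and variable names are my own, for illustration):

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1); models the
    # firing threshold of a real neuron.
    return 1.0 / (1.0 + math.exp(-x))

def neuron(prev_outputs, weights, bias):
    # Weighted sum of the previous layer's outputs plus the bias,
    # passed through the sigmoid.
    z = sum(w * o for w, o in zip(weights, prev_outputs)) + bias
    return sigmoid(z)
```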
When the network is initialized with random weighting coefficients, its output for a given input is completely random. For the network to improve itself, a learning method has to be implemented. The basic idea is as follows: the neural network is fed input data for which the target output is known. The network calculates its own output without knowing the target. Then the output and the target are compared, which gives the error of the network. This error must be minimized for the output to match the target.
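One common way to measure this error (one choice among several; the text does not fix a particular one) is the squared difference between output and target, summed over all output neurons:

```python
def error(output, target):
    # Squared difference between network output and target,
    # summed over all output neurons. Zero means a perfect match.
    return sum((o - t) ** 2 for o, t in zip(output, target))
```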
There are two different approaches to this task: evolution and gradient descent (backpropagation).

**Evolution**
The most obvious method is to mimic evolution in nature. You generate 1000 or more neural networks randomly, give them a task and then sort them by error. The best ones can be combined or randomly mutated, while the worst ones get replaced with new randomly generated networks. This way, a new generation of 1000 networks is created. After thousands of generations, the best network can perform the task with high accuracy.
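A toy sketch of this loop in Python (mutation only; combining networks and injecting fresh random ones would slot in the same way):

```python
import random

def mutate(weights, strength=0.1):
    # Randomly nudge every weighting coefficient a little.
    return [w + random.gauss(0.0, strength) for w in weights]

def evolve(population, error_of, generations=1000):
    # population: list of weight vectors; error_of: lower is better.
    for _ in range(generations):
        population.sort(key=error_of)
        survivors = population[: len(population) // 2]
        # Refill the population with mutated copies of the survivors.
        offspring = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + offspring
    return min(population, key=error_of)
```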
**Gradient Descent / Backpropagation**

Gradient descent, on the other hand, only needs one neural network. Look at the network as a complicated mathematical function net(w, input) = output. The output of the function net depends only on the weighting coefficients w and on the input. For the training data set, you know the target output for each input, so you can calculate how far off the output of the network is from the target. Now imagine increasing or decreasing one of the weighting coefficients a little bit: the error will change. If you change the weighting in the right direction, the error will decrease. So by changing all the weightings in the right direction, the error will descend. The problem is that there are millions of weighting coefficients, and you cannot try every single combination of increasing or decreasing all of them. Instead, the gradient of the error with respect to all the weightings is calculated. This way, you know in which direction you have to change every weighting for the error to decrease. So you change every weighting a little bit in the right direction and then repeat the process.
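The "nudge one coefficient and see how the error changes" picture corresponds to a numerical gradient; backpropagation computes the same gradient analytically and far more efficiently. A minimal sketch of one descent step:

```python
def numerical_gradient(error_of, weights, eps=1e-5):
    # Estimate d(error)/d(w_i) by nudging each coefficient a tiny bit,
    # exactly the thought experiment described above.
    base = error_of(weights)
    grad = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += eps
        grad.append((error_of(nudged) - base) / eps)
    return grad

def descent_step(error_of, weights, learning_rate=0.01):
    # Move every weighting a little bit in the direction
    # that lowers the error.
    grad = numerical_gradient(error_of, weights)
    return [w - learning_rate * g for w, g in zip(weights, grad)]
```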
The neural network in the background was given the task of adding 1 to a number from 0 to 255 in binary. The 8 input neurons are on the left, the output neurons are on the right. White circles correspond to 1, black circles to 0. The lines represent the synapses: bright green means that the previous neuron wants the next one to fire, red means that the previous neuron prohibits firing. In the middle, I have built in a narrow pass of just two neurons, through which all the information has to flow. The left-hand part of the network learned to compress 8 bits of information into two floating-point numbers, and the right-hand part learned to extract the information again. Somewhere in the middle, the addition of 1 is performed; exactly where is impossible to tell.
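For reference, the training pairs for such a demo could be generated like this (my guess at the setup; I assume the result wraps around to 0 at 255 so it still fits in 8 bits):

```python
def to_bits(n):
    # 8-bit binary representation, most significant bit first.
    return [(n >> i) & 1 for i in range(7, -1, -1)]

# Input: n in binary; target: (n + 1) mod 256 in binary.
# The wrap-around at 255 is an assumption, not stated above.
dataset = [(to_bits(n), to_bits((n + 1) % 256)) for n in range(256)]
```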