How do we train a single neuron? What does the learning process look like? What is the DELTA rule? How do we check training efficiency? What is the ADALINE model?
Philosophy of machine learning
The British philosopher John Locke (not to be confused with the character from the popular TV series “Lost”) said that every man is born as a “tabula rasa” (blank slate), without any built-in mental content, which means that every mind needs life experience to gain knowledge. If we consider his thesis, we will find truth in it. What is more, 300 years later we can easily draw an analogy to artificial neural networks and say that they operate under the same rules. Artificial neurons are not adapted to any task when they are created; they, too, have to be trained.
The main idea is to show them data sets repeatedly and make them more “sensitive” to the data we would classify as true. Interestingly, after training, a neural network will also recognize data sets it has never seen before, on one condition: these data sets need to look similar to the training data. Sometimes that is enough, but sometimes it is not, so we continue the learning process on a single batch for many iterations until we reach a satisfying efficiency level.
Learning in progress...
We know the main idea, but how does this “mechanism of learning” work? To be honest, it all comes down to tuning the synapse weights w, exactly as in the biological process. Every synapse is treated as an individual tuner for its input, which allows us to modulate the signals flowing through a single neuron.
What does this tuning look like? We are going to describe it, but first let's talk about the types of machine learning.
We distinguish the following types of training:
- supervised learning - learning with a teacher (a human) - it lets the neuron classify very precisely. For this type of training we must have two pieces of information to show to our network: we need to know what we would like to set as the inputs, and what result we would like to receive as the output. This expected result is called the “prediction” (also known as the “desired output”). Given these two pieces of information, the neuron will find the dependencies between them and remember them for future classification.
- unsupervised learning - learning without a teacher - this type of training uses so-called “cluster analysis”. It is slower than classical supervised learning (it needs more iterations), but its undeniable advantage is that there is no need to know what we expect as the output. After some time of training, without human interference, the neurons will “automatically” learn that some data are similar to each other and some data are much different - and that becomes the basis for future classification.
In this article we consider only the supervised learning case. Of course, the most trivial method is setting the synapse weights manually and watching what we receive as output, but that is not exactly what we would like to deal with (because it is boring!). There is a method which is much better but still simple enough. Ladies and gentlemen, I present to you the DELTA rule.
Some maths
The DELTA rule for a single input and its synapse can be written as a short equation:

w_{j, i+1} = w_{j, i} + η(z - y)x_j

where:
- w_{j, i} - weight of synapse j at iteration i
- w_{j, i+1} - weight of the same synapse in the next iteration
- x_j - input j
- z - desired output (prediction)
- y - actual output
- η - learning rate
- j - input (and weight) index
- i - iteration number

Of course, we apply the rule to every input j, as the sketch below shows.
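To make the update rule concrete, here is a minimal sketch in Python; the function names and the linear output assumption are my own illustrative choices, not part of any library:

```python
# A minimal sketch of one DELTA-rule update for a single neuron.
# Assumes a linear (ADALINE-style) output: y = sum over j of w_j * x_j.

def neuron_output(weights, inputs):
    # Weighted sum of the inputs - the neuron's actual output y.
    return sum(w * x for w, x in zip(weights, inputs))

def delta_update(weights, inputs, desired, eta):
    # w_{j, i+1} = w_{j, i} + eta * (z - y) * x_j, applied to every input j.
    y = neuron_output(weights, inputs)
    return [w + eta * (desired - y) * x for w, x in zip(weights, inputs)]
```

Calling delta_update once performs a single iteration for one data set; training simply repeats this over all data sets, as we will see below.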
For the first iteration we should set the weights to very small values, different from each other. This asymmetry does not matter for a single neuron, but it will matter once we create a more complex, multilayer network.
The equation contains the learning rate η, a value chosen by a human which tells the learning function how big a step to take on every weight update. The learning process can be compared to tuning a radio from the early 90s. This old type of radio has a knob we can turn to find the frequency used by our favourite radio station. Now imagine that every turn of the knob is an iteration, and the angle of the turn is the learning rate (the learning step). If we change the knob position too abruptly, the result of our search can end up really far from our expectations. The opposite situation is when we make steps that are too small and “shy” - then our tuning will take a long time and may never finish.

We are in exactly the same situation if we set the learning rate too high or too low. That is why an optimal learning rate is so important: it provides satisfying convergence of the algorithm in a reasonably short time.

The learning rate can be constant for the whole training process, but there are also more complex algorithms where it is changed gradually (so-called adaptive learning rate methods). In our case we will treat the learning rate as a small number close to 0 (e.g. 0.07, 0.2, 0.4, etc.), as the toy example below illustrates.
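A toy numeric illustration of the knob analogy (entirely my own, assuming the single-input case with x = 1 and z = 1, so the DELTA update simplifies to w += η(1 - w)):

```python
# One input x = 1 and desired output z = 1, so the actual output y equals w.
# A modest eta creeps toward the solution; a too-large eta overshoots it.
for eta in (0.2, 2.5):
    w = 0.0
    for _ in range(10):
        w += eta * (1.0 - w)  # DELTA rule with x = 1
    print(f"eta={eta}: w after 10 iterations = {w:.3f}")
# eta=0.2 ends near 0.893 (steadily approaching 1);
# eta=2.5 ends near -56.665 (diverging further with every step).
```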
The DELTA rule is an iterative algorithm, which means that every step (iteration) updates the synapse weights, bringing the actual output y closer to the desired output z.

In theory we would like the actual output y to equal the desired output z. In mathematical terms it means we would like to minimize the error function.
E = ½ Σ_j (z_j - y_j)² → min

where:
- E - error function
- z_j - desired output for data set j
- y_j - actual output for data set j

(note that here j runs over all the data sets)
The error function is a kind of quality indicator: it measures how large the summed squared error between the desired and actual outputs is, across all data sets, after training. Of course, we want it to be as small as possible, which is exactly what the full training sketch below aims for.
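Putting the pieces together, here is a hedged sketch of a complete training loop with error monitoring; the toy data set, learning rate, iteration cap, and stopping threshold are all arbitrary assumptions of mine:

```python
import random

def neuron_output(weights, inputs):
    # Linear (ADALINE-style) output: weighted sum of the inputs.
    return sum(w * x for w, x in zip(weights, inputs))

def delta_update(weights, inputs, desired, eta):
    # w_{j, i+1} = w_{j, i} + eta * (z - y) * x_j for every input j.
    y = neuron_output(weights, inputs)
    return [w + eta * (desired - y) * x for w, x in zip(weights, inputs)]

def error(weights, data):
    # E = 1/2 * sum over all data sets of (z - y)^2.
    return 0.5 * sum((z - neuron_output(weights, x)) ** 2 for x, z in data)

# Hypothetical data a linear neuron can fit exactly: z = 0.5*x1 + 0.3*x2.
data = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.3), ([1.0, 1.0], 0.8)]

# Small starting weights, different from each other, as recommended above.
weights = [random.uniform(0.01, 0.1) for _ in range(2)]
eta = 0.2  # learning rate

for iteration in range(1000):
    for x, z in data:
        weights = delta_update(weights, x, z, eta)
    if error(weights, data) < 1e-6:  # our "satisfying efficiency" threshold
        break

print(weights, error(weights, data))  # weights should approach [0.5, 0.3]
```

With this data the loop converges well within the iteration cap; tightening the threshold or shrinking η trades training time for precision, exactly as the radio-knob analogy suggests.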
Summary
The DELTA rule is a rather simple method, but it is the basis and starting point for more complex learning algorithms. It is worth noting that after applying this learning rule, our neuron evolves from a perceptron into the ADALINE model (ADAptive LINear Element). In the following articles I am going to show how this simple model can be implemented for hobby purposes (e.g. character recognition).
Thanks for your attention!
Graphics
I confirm that I have used Google advanced image search with usage rights: "free to use, share or modify, even commercially"
Bibliography
- R. Tadeusiewicz "Sieci neuronowe" Akademicka Oficyna Wydaw. RM, Warszawa 1993. Seria: Problemy Współczesnej Nauki i Techniki. Informatyka.