
The role of back propagation in a machine learning algorithm

By: Ivy Dahl

Photo by Google DeepMind on Pexels.com

An algorithm is a series of steps to complete a task.

A machine learning algorithm is a computer program made of many individual yet interconnected parts that transform numbers, step by step, toward a specific goal without being given explicit directions. Back propagation is one of those parts. It works backward from a guess to measure how wrong that guess was, calculating the error so the guess can be improved.

Imagine a set of data points on a graph, handed to the back propagation step by an earlier part of the machine learning algorithm, spaced apart so that a curve would have to be drawn through them to connect them. Approximating a curve that fits all those points evenly, finding the middle ground, is a difficult task. An objective, numerical way to judge a candidate curve is the sum of the squared distances between the curve and the data points. This number is called the loss. A high loss means the data points are far from the currently generated curve, indicating a bad approximation. A low loss indicates a better fit, since the currently generated curve closely aligns with the data points, making it a more accurate approximation.
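The squared-distance loss described above can be sketched in a few lines of Python. The data points and the two candidate curves below are made-up examples, not values from the article:

```python
# Squared-distance loss: sum of squared vertical gaps between a curve and the data.

def squared_loss(curve, xs, ys):
    """Sum of squared distances between curve(x) and each data point (x, y)."""
    return sum((curve(x) - y) ** 2 for x, y in zip(xs, ys))

# Illustrative data points scattered around the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

good_fit = lambda x: 2 * x + 1   # close to the data -> low loss
bad_fit = lambda x: -x + 4       # far from the data -> high loss

print(squared_loss(good_fit, xs, ys))  # small number
print(squared_loss(bad_fit, xs, ys))   # much larger number
```

A curve that hugs the points produces a small loss; one that cuts across them produces a large one, which is exactly the signal the algorithm uses.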

A single function can represent such a curve.

But a constant function wouldn't fit the data by itself. Neither would a single power of x. But what if you put them together? And added another? You can think of this collection of functions like a toolbox that already comes equipped with the building blocks, each still in need of the perfect coefficient to bend the combined curve as close to the data as possible. The goal is to find the combination of coefficients that outputs the best-fitting curve, defined as the curve whose combination of k's yields the lowest loss. Each k starts out as a random number.

y(x) = k₀ + k₁x + k₂x² + k₃x³ + k₄x⁴
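A polynomial like this, a constant plus increasing powers of x weighted by coefficients, is straightforward to evaluate in code. The coefficient values here are illustrative placeholders, not values from the article:

```python
# Evaluate a polynomial y(x) = k0 + k1*x + k2*x^2 + ... at a given x.

def poly(ks, x):
    """ks = [k0, k1, k2, ...]; each coefficient multiplies the matching power of x."""
    return sum(k * x ** i for i, k in enumerate(ks))

ks = [1.0, 2.0, 0.5, 0.0, 0.0]  # k0=1, k1=2, k2=0.5, higher terms zero
print(poly(ks, 2.0))            # 1 + 2*2 + 0.5*4 = 7.0
```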

Plugging x values into this equation constructs a curve; comparing that curve to the data points yields one number, the loss. That loss is then used to adjust the coefficients, which are plugged back into the general equation for the curve. Now the algorithm can start working through combinations of numbers in search of the minimum loss, which means finding the distance between the data points and the currently generated curve over and over again.

But can we find good coefficients without countless rounds of trial and error? This is where differentiability comes in: it allows fast discovery of promising number combinations. We do this by focusing on one coefficient k at a time, plotted against the loss on a coordinate graph. We are trying to find which value of that k results in the lowest loss on its own.

Knowing only the local behavior of a function, we are blind to all other points. So adjust the input slightly. This adjusted input produces a new output, and comparing how much the output changed to how much the input changed gives a slope. Draw the line through those two points: as you take smaller and smaller steps, that line aligns ever more closely with the curve at that point, becoming the tangent line. The steepness of this tangent line is the rate of change, and it tells you which way the curve is heading at that exact spot.
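The smaller-and-smaller-steps idea above can be demonstrated with a finite-difference estimate. The curve and step sizes below are illustrative choices:

```python
# Estimate the local slope (the tangent line's steepness) by nudging the input
# and comparing the change in output to the change in input.

def slope(f, x, h):
    """Finite-difference estimate of f's rate of change at x, using step size h."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2  # illustrative curve; its true slope at x = 3 is 6

for h in [1.0, 0.1, 0.001]:
    print(h, slope(f, 3.0, h))  # estimates approach 6 as the step shrinks
```

Shrinking the step from 1.0 down to 0.001 walks the estimate toward the true tangent slope, which is exactly the "smaller steps align more accurately" behavior described above.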

Now let's say you do this for every one of the coefficients individually. This gives a set of slopes, one per coefficient, that together describe how the full, complex curve responds to each of its knobs. You might not be able to shape a complex curve like this in one move, but if you break the problem into these individual slopes, it's easier to nudge each coefficient along its own slope, and once they are all adjusted together, they recreate the curve with the lowest loss.
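Putting the pieces together gives a minimal gradient-descent sketch: estimate the slope of the loss with respect to each coefficient, then nudge every coefficient downhill and repeat. The data, step size, and iteration count here are illustrative choices, not a definitive implementation:

```python
# Minimal gradient-descent sketch: fit a line y = k0 + k1*x to data points
# by repeatedly nudging each coefficient in the direction that lowers the loss.

def poly(ks, x):
    return sum(k * x ** i for i, k in enumerate(ks))

def loss(ks, xs, ys):
    return sum((poly(ks, x) - y) ** 2 for x, y in zip(xs, ys))

def grad(ks, xs, ys, h=1e-6):
    """Finite-difference slope of the loss with respect to each coefficient."""
    g = []
    for i in range(len(ks)):
        bumped = ks[:]
        bumped[i] += h
        g.append((loss(bumped, xs, ys) - loss(ks, xs, ys)) / h)
    return g

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # points lying on y = 2x + 1

ks = [0.0, 0.0]            # initial guesses for k0 and k1
for _ in range(2000):
    g = grad(ks, xs, ys)
    ks = [k - 0.01 * gi for k, gi in zip(ks, g)]

print(ks)  # converges close to [1.0, 2.0]
```

Real back propagation computes these slopes exactly via the chain rule rather than by finite differences, but the downhill-nudging loop is the same idea.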

This is a brief description of how back propagation serves as an integral part of a machine learning algorithm: it takes a specific set of inputs from elsewhere in the algorithm, finds the best-fitting curve for them, and passes that curve forward to the next part of the machine learning algorithm.