The Acceleration of AI
- mjfahrion
- Oct 10, 2023
- 6 min read
I wrote this back in January of 2021, after getting into some machine learning and computer vision applications, not having any idea what was going to unfold less than two years later in generative AI.
If neural networks and GPUs are totally foreign to you, it's a good quick read to better understand the technology underpinning AI and how it differs from traditional algorithm-based computing.

An introduction to Neural Networks and GPUs
We have spent most of our lives inside the age of computing. And, as part of the tech community, we understand the fundamentals of hardware and software, CPUs and algorithms. We understand that higher clock speeds, wider data buses and an increased number of processor cores have enabled today’s CPUs to run increasingly complex software. Among other things, this allows us to watch YouTube videos, write documents and handle large spreadsheets all at the same time. And CPUs will continue to get more powerful and be able to execute even more complex software tasks.
But, about two decades ago, a new technology began bending the curve and added rocket fuel to the acceleration of Artificial Intelligence. That new technology was the “Graphics Processing Unit,” or “GPU.” In just the last eight years, advances in GPU power have begun to deliver on the theoretical promises of Artificial Intelligence. This article will explain what is different and innovative about the GPU as it applies to Artificial Intelligence and Deep Learning. But, before we can appreciate the value of the GPU and understand why this specialized class of processor has been so influential, we have to take a brief look at a different kind of computing, called “Artificial Neural Networks.”
Artificial Neural Networks
Artificial Neural Networks are inspired by how we understand the human mind to work. So, let’s begin with a quick visit inside our own brain. Among other things, our brain consists of billions of fundamental units called neurons. Neurons are responsible for receiving input from the external world, transforming and relaying the signals, and sending motor commands to our muscles. When triggered by its own input signals, a neuron “fires” to its connected neurons. The creation and tuning of these connections, as well as the strength of the connections, is what enables our ability to learn.
Looking deeper, those billions of neurons in our brain are organized into parallel columns that process information. A mini-column may contain around 100 neurons, and those mini-columns are further organized into larger hyper-columns; the cortex contains on the order of 100 million mini-columns in total. This structure enables our own ability to process and act on many inputs, at least seemingly, in parallel. The net result of this large parallel processing structure is the ability to learn by observation and examples rather than by being “programmed.”
The concept of artificial neural networks in computer science dates back to the 1940s. But, it is only recently that we have had the computing power to begin to unleash the potential of artificial neural networks. Let’s look at how artificial neural networks are constructed.
Artificial Neural Networks start with the concept of three layers:
- The first layer is the input layer. These nodes perform no calculations; they simply represent the input variables, firing their output if the incoming data set matches their conditions.
- The middle consists of the “hidden layers,” which could be one or many layers where the computational work is done. Each node applies an algorithm to the signals from its input connections to decide whether to fire to its output.
- The third layer is the output layer. If the task of our artificial neural network were to categorize whether an image is a cat or a dog, we would have two output neurons, cat and dog.
That is the basic structure of an artificial neural network. The data set itself, and the number of potential outcomes, will factor into determining the width and depth of this matrix.
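To make that structure a bit more concrete, here is a minimal sketch in Python/NumPy of how such layers and their connections might be laid out. The layer sizes (four inputs, five hidden nodes, two outputs) and the random initialization are assumptions chosen purely for illustration, not a prescription.

```python
import numpy as np

# A toy three-layer network, sized arbitrarily for illustration:
# 4 input nodes -> one hidden layer of 5 nodes -> 2 output nodes (e.g. "cat" and "dog").
n_input, n_hidden, n_output = 4, 5, 2

rng = np.random.default_rng(42)

# Every node in one layer connects to every node in the next layer.
# Each connection carries a weight, initialized randomly here because the
# network has not been trained yet.
weights_input_hidden = rng.standard_normal((n_input, n_hidden))
weights_hidden_output = rng.standard_normal((n_hidden, n_output))

# The "width" of the network is the number of nodes per layer; the "depth"
# is the number of layers stacked between input and output.
print(weights_input_hidden.shape, weights_hidden_output.shape)  # (4, 5) (5, 2)
```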
Weighting & Propagation
Before we look at how we train this structure to perform useful tasks, there are three concepts we need to examine.
First is the concept of “weighted connections.” Each node in each layer is connected to every node in the next layer, but the strength, or relevance, of each connection is given a weighting factor. This factor determines the relative impact of that node as an input to the next node’s calculation. Note that, in the figure above, the line thickness represents the weighting in the resultant trained network. Prior to training, all connections were equal in weight (equal line thicknesses).
Next is the concept of “forward propagation”. This is the process of getting an output from the network for any given input by traveling through those weighted connections.
And finally, we have “reverse propagation,” more commonly known as backpropagation. This is the function that enables the artificial neural network to “learn.” In reverse propagation, the network compares the output it produced to the correct answer in the labeled training data. By propagating the “error” backwards into the network, it calculates how much each connection contributed to the error and applies a correction factor to the weighting of that connection. By iterating this process through a set of labeled training data, the network optimizes itself so that it produces the most correct result possible for any set of inputs.
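Here is a hedged, minimal sketch of those three concepts on a single toy layer, again in Python/NumPy. The input values, target, squared-error loss, and learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, 0.5, -0.5])        # input signals from the previous layer
w = rng.standard_normal((3, 2))       # weighted connections: 3 inputs -> 2 outputs
target = np.array([1.0, 0.0])         # known correct answer from the labeled data
learning_rate = 0.1                   # assumed value; how strong a correction to apply

# Forward propagation: travel through the weighted connections to get an output.
output = x @ w

# Compare the output to the correct answer to get the error.
error = output - target

# Reverse propagation: work out how much each connection contributed to the
# error (for a squared-error loss, that is input x error) and nudge its weight.
gradient = np.outer(x, error)
w -= learning_rate * gradient

print(np.round(error, 3))             # the error the next iteration will try to shrink
```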
With that fundamental knowledge, we can now examine how an artificial neural network is “trained.”
Learn, Repeat, Evolve
For illustration purposes, let’s assume we are training our network to recognize if an image is an X or an O.
Our first task is to configure our artificial neural network. We must decide how many input nodes, how many hidden layers to use, how many nodes in each of those hidden layers, and how many output nodes. These neural network configuration decisions are based on math, intuition and experimentation to see which configuration provides the best results.
We then take a set of training data. This is known data, already labeled as a known “X” or “O.” The network examines its first training example. At this stage all connections are weighted equally. The data forward propagates through the network, initially producing a random result. The network then compares that output with the known correct answer provided by the labeled data set. Errors are then reverse propagated back through the network. Connections that contributed to the error are weakened. Connections that contributed to correct outcomes are strengthened. Using this iterative process, the network trains itself, creating the artificial neural network model required to correctly classify the data set. That model can then be deployed and used to “infer” correct outcomes from raw, unlabeled data.
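A rough sketch of that training loop in plain NumPy follows. The 5x5 “images” of an X and an O, the layer sizes, the learning rate, and the number of passes are all invented for illustration; a real project would use a proper dataset and a framework such as PyTorch or TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two tiny 5x5 "images" of an X and an O, flattened into 25 inputs each.
X_IMG = np.array([[1,0,0,0,1],
                  [0,1,0,1,0],
                  [0,0,1,0,0],
                  [0,1,0,1,0],
                  [1,0,0,0,1]], dtype=float).ravel()
O_IMG = np.array([[0,1,1,1,0],
                  [1,0,0,0,1],
                  [1,0,0,0,1],
                  [1,0,0,0,1],
                  [0,1,1,1,0]], dtype=float).ravel()

inputs  = [X_IMG, O_IMG]
targets = [np.array([1.0, 0.0]),   # label for "X"
           np.array([0.0, 1.0])]   # label for "O"

# Configuration choices (layer sizes, learning rate, epochs) are guesses for illustration.
W1 = rng.standard_normal((25, 8)) * 0.1   # input -> hidden connections
W2 = rng.standard_normal((8, 2)) * 0.1    # hidden -> output connections
lr = 0.1

for epoch in range(500):
    for x, t in zip(inputs, targets):
        # Forward propagation
        h = np.tanh(x @ W1)
        y = h @ W2
        # Reverse propagation: push the error back and adjust the weights
        d_y = y - t
        d_W2 = np.outer(h, d_y)
        d_h = (W2 @ d_y) * (1 - h**2)
        d_W1 = np.outer(x, d_h)
        W2 -= lr * d_W2
        W1 -= lr * d_W1

# After training, forward-propagate a raw input and read off the more likely class.
scores = np.tanh(X_IMG @ W1) @ W2
print("X" if scores[0] > scores[1] else "O")   # expected: X
```

After the loop, the weight matrices are the trained “model”; pointing the same forward pass at new, unlabeled images is the inference step described above.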
So, armed with that knowledge of real and artificial neural networks, let’s now turn back to our original topic, the Graphics Processing Unit, or GPU.
Brain Power for AI
Neural networks are clearly a parallel processing paradigm. The more simultaneous calculations that can be done, the faster a network can be “trained” or, the faster a trained network can process new data. The calculations themselves are fairly simple, so we’d like a processor architecture capable of performing as many simple mathematical calculations as quickly as possible. Traditional CPUs are built to be general purpose computing tools, able to be programmed to perform a wide variety of functions, generally in a sequential fashion. In comparison, a GPU is a purpose-built type of processing unit built to perform a narrower set of functions in a massively parallel, not sequential, environment.
Today’s CPU may have a small number of highly complex cores, each capable of a wide variety of complex operations. In comparison, a GPU will have hundreds or thousands of much smaller cores optimized for a limited set of mathematical operations. This structure is ideal for the intensive, high-throughput parallel processing needed to train an artificial neural network, and it is also faster at executing an existing, trained model against new data sets.
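As a hedged illustration of that contrast, the snippet below runs the same large matrix multiplication, the core operation in neural-network training and inference, first on the CPU and then, if one is available, on a CUDA GPU using PyTorch. The matrix size and the choice of PyTorch are assumptions made for the sake of the example.

```python
import torch

# The same large matrix multiplication (the workhorse of neural-network math),
# first on the CPU, then on the GPU if one is present.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

c_cpu = a @ b                      # executed on a handful of complex CPU cores

if torch.cuda.is_available():
    a_gpu = a.to("cuda")           # copy the data into GPU memory
    b_gpu = b.to("cuda")
    c_gpu = a_gpu @ b_gpu          # the same math, spread across thousands of small GPU cores
    print(torch.allclose(c_cpu, c_gpu.cpu(), atol=1e-2))  # same result, computed in parallel
else:
    print("No CUDA GPU available; ran on CPU only.")
```

On most hardware the GPU version of the multiplication finishes far faster, and that is precisely the advantage neural-network training and inference exploit.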
High-definition graphic images are inherently a large data set. When GPUs first entered the tech scene, their domain was gaming and graphics. They provided the ability to render lifelike graphic images, and to do it quickly. GPUs rendered Dory’s wake as she crossed the ocean and the lighting and shadows in Tiger Woods PGA Tour on Xbox. That same GPU architecture has proven to be very effective when applied to various forms of artificial neural networks. Applying GPUs to artificial neural networks has accelerated the development and deployment of AI deep learning applications in which complex algorithms aren’t programmed; instead, the network is “trained.”