© 2022 Uma Chandrasekhar. All rights reserved.

Imagination-based AIs — Part 4

Uma Chandrasekhar

--

Reinforcement Learning Explained

Unlike classical machine learning techniques, where the AI algorithm is provided with the correct set of inputs and outputs, in deep reinforcement learning the algorithm must choose the optimal strategy through trial-and-error decisions based on random inputs. A good analogy is a student given a structured syllabus and teaching methods, as in a school or university, versus a student given all the available resources but no prescribed method of learning, who is left to explore and learn.

Reinforcement learning agents are not provided with any basic domain heuristics, unlike their other machine learning counterparts; instead they have to construct domain heuristics from their own learning and decision making. Every time the algorithm makes a good decision, it receives a bigger reward, which helps it differentiate between good and bad strategies. This, in turn, helps it build the domain heuristics required to solve the task. For example, suppose a game character (the learning agent) has to decide which path to follow through a set of events to reach a destination. As it moves along the path, it learns it has taken the right one if it finds an energy drink along the way. If it does not, the path leads it back to the starting point or some mid-way point, where it must make another decision to find the right path, which might lead to another reward, perhaps a weapon.
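To make the reward-driven trial-and-error loop concrete, here is a minimal sketch in Python, assuming a toy version of the path-finding example; the environment, reward values, and the tabular Q-learning update are my own illustrative choices, not details from the article.

```python
import numpy as np

# Minimal tabular Q-learning sketch for the path-finding analogy above.
# The environment, states, and reward values are illustrative assumptions.
n_states, n_actions = 5, 2          # 5 positions on the path, 2 choices at each
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def step(state, action):
    """Toy transition: action 1 moves forward and earns a small reward
    (the 'energy drink'); action 0 sends the agent back to the start."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
        reward = 10.0 if next_state == n_states - 1 else 1.0
    else:
        next_state, reward = 0, 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Explore occasionally, otherwise exploit the best known strategy.
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward = step(state, action)
        # Reward-driven update: good decisions raise the Q-value for that choice.
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
```

After enough episodes the agent's own Q-table, built purely from rewards, encodes the "domain heuristic" of always moving toward the goal.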

This might seem confusing to readers who have watched robots perform precision tasks without making mistakes, especially those with experience of the industrial robots used to build machines, furniture, and office premises. Understanding the basic differences between the two types of robots explains why they behave so differently. Industrial robots are explicitly programmed to perform such difficult tasks, while AI robots are not given any program inputs telling them how to do the task precisely. AI robotic agents are learning agents that figure out the right way on their own, resulting in more capable AI-based robotic systems that solve complex problems with excellent strategies. In other words, unlike industrial automation robots, imaginative AI-based robots might be capable of educating themselves to enhance their knowledge, instead of merely performing assigned tasks. Put simply, they do not follow commands from a master; they develop strategies of their own and take the initiative in solving problems. Unlike programmed industrial robots, which are given step-by-step instructions to perform a task, AI-based robotic systems find their own way to build strategies for performing it.


Artificial Neural Networks

Now that we know how an imagination-based AI functions and strategizes, let's analyze what it is made of. The most basic building block of any reinforcement learning algorithm is the neural network. As the name suggests, it is influenced by neuroscience, and the word "neural" derives from the neurons of the human brain. The function of a neuron is to take many inputs and apply an activation function to them.
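As a rough illustration of that description, here is a single artificial neuron sketched in Python; the weights, bias, and the choice of a sigmoid activation are assumptions made purely for the example.

```python
import numpy as np

# A single artificial neuron: many inputs, a weighted sum, and an activation.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, passed through an activation function."""
    return sigmoid(np.dot(weights, inputs) + bias)

x = np.array([0.5, -1.2, 3.0])      # many inputs
w = np.array([0.8, 0.1, -0.4])      # one weight per input
print(neuron(x, w, bias=0.2))       # one activated output
```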


Activation Function

Activation functions define the relationship between the input and the task performed. There are five basic activation functions used in neural networks: the step function, the sigmoid function, the tanh function, the Rectified Linear Unit (ReLU), and Leaky ReLU. The most favored activation function in many deep learning models is Leaky ReLU, for one key advantage: its output is influenced by the input value for all values of x, from -∞ to +∞. Reinforcement learning and machine learning algorithms use stochastic gradient descent to converge to an output, so the more input values that contribute to learning, the better the convergence, and the closer the trained model's output will be to the expected output. For example, if a Tesla Autopilot has to avoid a curb, the action will be performed better when a large number of input images of the curb are provided.
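The five functions listed above can be sketched in a few lines of NumPy; the 0.01 slope used for Leaky ReLU is a common default assumed here for illustration.

```python
import numpy as np

# The five basic activation functions mentioned above.
def step(x):
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Unlike ReLU, negative inputs still produce a small non-zero output,
    # so every value of x keeps contributing during training.
    return np.where(x >= 0, x, slope * x)

x = np.linspace(-3, 3, 7)
print(leaky_relu(x))
```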


Layers

Another important term in a neural network is the layer. A layer is a collection of neurons that takes an input and produces an output. The left-most layer in any neural network is the input layer, the right-most is the output layer, and the layers in between are called hidden layers. There may be one, two, three, or sometimes more than three hidden layers; many current training models can be built with three. The important point is that different hidden layers can be trained with different activation functions: layer 1 might use the sigmoid function, layer 2 Leaky ReLU, and layer 3 tanh. This choice is left to the data scientists creating the model. A model can be built with a single activation function or a combination of several, as required by the output. The best approach is to code the layers with different activation functions and check the output; in other words, test the model with different combinations of activation functions, choose the combination whose output is closest to the expected output, and train with that combination to perform the task.
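As a small illustration of mixing activation functions across layers, here is a sketch of a forward pass through three hidden layers, each with its own activation; the layer sizes and random weights are assumptions for the example, not a trained model.

```python
import numpy as np

# Forward pass through three hidden layers, each with its own activation
# (sigmoid, Leaky ReLU, tanh), plus an identity output layer.
rng = np.random.default_rng(0)

def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))
def leaky_relu(x): return np.where(x >= 0, x, 0.01 * x)

layer_sizes = [4, 8, 8, 8, 2]                      # input, 3 hidden, output
activations = [sigmoid, leaky_relu, np.tanh, lambda x: x]
weights = [rng.normal(size=(m, n))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    for w, act in zip(weights, activations):
        x = act(w @ x)                             # weighted sum, then activation
    return x

print(forward(rng.normal(size=4)))
```

Swapping the entries of `activations` and re-testing is exactly the trial-and-compare process described above.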


A standard or Shallow Neural Network (SNN) contains one input layer, one hidden layer, and one output layer, whereas a Deep Neural Network, such as the Deep Q-Network (DQN) used in deep learning and deep reinforcement learning, contains more than one hidden layer along with an input layer and an output layer.


The output of any neural network using deep reinforcement learning is tabulated as a Q-table. While an SNN takes the current state and an action as input and outputs the next state as a single Q-value representing one strategy, a DQN takes the current state alone as input and combines it with many possible actions to output many Q-values, leaving the Deep Reinforcement Learning (DRL) agent to choose the best strategy based on its training. This is the secret behind deep reinforcement learning: the agent chooses a strategy out of the multiple task strategies learnt before, so as to earn the maximum reward. Remember, a reward is given whenever a step towards the goal is taken. Deep reinforcement learning thus lays the foundational concepts of Artificial General Intelligence, which I mentioned in the first article of this series. It is also the secret behind DRL agents beating the world champion at Go, a game far tougher than chess, which was previously considered to require the most strategy.
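A minimal sketch of that idea, assuming a tiny two-layer network with random weights: the current state goes in, one Q-value per action comes out, and the agent picks the action with the highest value.

```python
import numpy as np

# DQN-style output: map the current state to one Q-value per action,
# then choose the action with the biggest expected reward.
rng = np.random.default_rng(1)
state_dim, n_actions = 4, 3

w1 = rng.normal(size=(16, state_dim))
w2 = rng.normal(size=(n_actions, 16))

def q_values(state):
    """Current state in, one Q-value per possible action out."""
    hidden = np.maximum(0.0, w1 @ state)      # ReLU hidden layer
    return w2 @ hidden

state = rng.normal(size=state_dim)
q = q_values(state)
best_action = int(np.argmax(q))               # the strategy with the highest Q-value
print(q, best_action)
```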


The above image shows a feed-forward ANN. It has no feedback, so it cannot hold any information, which makes it less efficient: such a network has no memory, yet memory plays a vital role in learning.

Types of ANN

There are many types of artificial neural networks (ANN), and the type is chosen based on the specific application's requirements. Remember the task-agnostic approach used in E2E learning for deep reinforcement learning, which I mentioned in part 3 of this series? It has a disadvantage, namely memory: a neural network dealing with long, enormous tasks may forget what it learned at the beginning. Hence, different types of ANN came into existence that can remember what they have learned.

The four most widely used types are as follows: CNN, RNN, LSTM and GRU.

CNN — the Convolutional Neural Network is the type of ANN (usually deep, as it contains more than one hidden layer) used in computer vision applications. Its three hidden layers are the convolutional layer, the pooling layer, and the fully connected layer. The layers are arranged in three dimensions because the input images usually carry three color channels: R, G and B.
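Here is a rough single-channel sketch of those three layers; the 8x8 image, 3x3 kernel, and random weights are assumptions for illustration rather than a full RGB pipeline.

```python
import numpy as np

# Tiny sketch of the three CNN layers: convolution, pooling, fully connected.
rng = np.random.default_rng(2)
image  = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

def convolve(img, k):
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)    # convolutional layer
    return np.maximum(0.0, out)                           # ReLU

def max_pool(x):
    return x.reshape(3, 2, 3, 2).max(axis=(1, 3))         # 2x2 pooling layer

features = max_pool(convolve(image, kernel)).ravel()       # flatten
w_fc = rng.normal(size=(2, features.size))                 # fully connected layer
print(w_fc @ features)                                     # class scores
```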

RNN — Recurrent Neural Networks are built to hold information using a feedback loop, unlike the other standard types of ANN, which cannot remember information from previous states. But RNNs suffer from a problem called the 'vanishing gradient': during back-propagation the gradient shrinks to a tiny value and stops contributing to the learning process and to convergence towards a valued output. Hence an RNN has a forgetful short-term memory, despite having a hidden-state memory.
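A minimal sketch of the feedback loop, assuming illustrative sizes and random weights: the hidden state computed at one step is fed back in alongside the next input.

```python
import numpy as np

# One recurrent step: new input plus the fed-back hidden state from last time.
rng = np.random.default_rng(3)
input_dim, hidden_dim = 4, 8

w_x = rng.normal(size=(hidden_dim, input_dim))
w_h = rng.normal(size=(hidden_dim, hidden_dim))
b   = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    return np.tanh(w_x @ x_t + w_h @ h_prev + b)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):   # a short input sequence
    h = rnn_step(x_t, h)                      # the feedback loop in action
print(h)
```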


LSTM — the Long Short-Term Memory network is a special kind of RNN, made of a number of RNN units connected through a transport medium known as the cell state, which transmits information along the sequence chain. As the cell state travels from one state to the next, information is added or removed via gated units. This medium also serves as the memory of the LSTM, reducing the effects of short-term memory. The gates learn, over several states, which data is important to hold on to and which is useless and should be discarded. The figure illustrates the four gates of the LSTM: forget, input, cell, and output.


The forget gate takes the current input (xt) and the previous hidden state (ht-1) and passes them through the sigmoid function. The output of the sigmoid ranges from 0 to 1; values close to zero are deleted and values close to one are kept.

To update the cell signal (Ct) in the cell state, the LSTM uses the input gate, which contains two activation functions: a sigmoid, as in the forget gate, which squashes its output into the range 0 to 1, and a tanh, which receives the same signals as the forget gate but squashes its output between -1 and +1. The two outputs are then passed through a pointwise multiplication unit, where the sigmoid output decides which information to keep (values close to 1) and which to forget (values close to 0).

The cell state is the horizontal line carrying the Ct and Ct-1 signals. First, Ct-1 is pointwise multiplied by the forget vector (the output of the forget gate), so values from the previous cell state close to zero are dropped and only values close to one are carried forward. The result is then pointwise added to the output of the input gate, which yields the current cell state Ct.

The output gate decides the next hidden state ht. Because the hidden state accumulates feedback from previous inputs, it can be used to make predictions. First, ht-1 and xt are sent through a sigmoid function and the updated cell state Ct through a tanh function; the two results are passed through a pointwise multiplier to produce ht. The new cell state Ct and new hidden state ht are both carried over to the next iteration or state.
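Putting the gate walkthrough together, here is a sketch of one LSTM cell step; the sizes, random weights, and the grouping of the gate weights are assumptions for illustration.

```python
import numpy as np

# One LSTM cell step: forget gate, input gate (sigmoid and tanh branches),
# cell-state update, and output gate, as walked through above.
rng = np.random.default_rng(4)
input_dim, hidden_dim = 4, 8

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on [h_prev, x_t] concatenated.
w_f, w_i, w_c, w_o = (rng.normal(size=(hidden_dim, hidden_dim + input_dim))
                      for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(w_f @ z)              # forget gate: what to drop from the cell state
    i = sigmoid(w_i @ z)              # input gate: what new information to admit
    c_hat = np.tanh(w_c @ z)          # candidate values, squashed to -1..+1
    c = f * c_prev + i * c_hat        # pointwise multiply, then pointwise add
    o = sigmoid(w_o @ z)              # output gate
    h = o * np.tanh(c)                # next hidden state
    return h, c

h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c)
print(h)
```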

Thus, the LSTM can hold long-term dependencies and plenty of information, enabling it to solve extremely complex tasks. LSTMs are used for time-series data, such as stock markets and election polls, which change continuously over time, and are mostly applied in NLP and route planning.


GRU — Gated Recurrent Units are a streamlined form of RNN, requiring fewer operations to train a model and hence training faster. The speed is achieved by using fewer gating units than the LSTM. The gates help to keep or forget information: the update gate is a combination of the LSTM's forget and input gates and handles the input, keep, and forget functions, while the reset gate decides which information may be forgotten, which makes for efficient route-planning applications. In other words, at the end of every iteration the reset gate tells the update gate what information can be forgotten and what can be kept and used as input for the next iteration. Finally, a tanh activation combines the gated information and decides the next hidden state ht.
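A matching sketch of one GRU step, assuming illustrative sizes and random weights: an update gate, a reset gate, and a tanh candidate that yields the next hidden state.

```python
import numpy as np

# One GRU cell step: update gate, reset gate, and a tanh candidate hidden state.
rng = np.random.default_rng(5)
input_dim, hidden_dim = 4, 8

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

w_z, w_r, w_h = (rng.normal(size=(hidden_dim, hidden_dim + input_dim))
                 for _ in range(3))

def gru_step(x_t, h_prev):
    zx = np.concatenate([h_prev, x_t])
    z = sigmoid(w_z @ zx)                                      # update gate: keep vs. replace
    r = sigmoid(w_r @ zx)                                      # reset gate: what to forget
    h_hat = np.tanh(w_h @ np.concatenate([r * h_prev, x_t]))   # tanh candidate
    return (1 - z) * h_prev + z * h_hat                        # next hidden state ht

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(x_t, h)
print(h)
```

With fewer weight matrices and no separate cell state, each step does less work than the LSTM step above, which is where the training speed-up comes from.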

This brings us to the end of part 4 of this series.

Watch this space for the grand finale of this series.

--

--

Uma Chandrasekhar

I live and work as an executive technical innovator in Silicon Valley, California. I love working on autonomous systems, including AVs.