Reinforcement learning is one of the most active research topics in AI and ML today, in part because of its potential to transform business. It mimics the way learning happens in the real world, through trial, error and feedback, so researchers are optimistic that it will help AI and ML systems achieve complex goals.

What is reinforcement learning?

Reinforcement learning (RL) is one of the three basic machine learning paradigms, the others being supervised learning and unsupervised learning. It is based on the idea that a software agent takes actions in an environment and receives rewards in return. The agent uses a reinforcement learning algorithm to decide which action to take next, based on feedback from the environment.
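The act-and-receive-feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environment is a made-up two-option slot machine, the payout probabilities are invented, and the name run_bandit and the epsilon-greedy rule are choices made for this sketch rather than anything prescribed by a particular library.

```python
import random

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical two-armed bandit."""
    rng = random.Random(seed)
    payout_prob = [0.2, 0.8]   # assumed environment: chance of reward per action
    estimates = [0.0, 0.0]     # agent's running estimate of each action's value
    counts = [0, 0]            # how often each action has been tried

    for _ in range(steps):
        # Agent: explore occasionally, otherwise pick the best-looking action.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if estimates[0] >= estimates[1] else 1

        # Environment: return a reward (the feedback signal).
        reward = 1.0 if rng.random() < payout_prob[action] else 0.0

        # Agent: update its estimate with an incremental average.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

estimates, counts = run_bandit()
print("estimated values:", estimates)
print("times each action was chosen:", counts)
```

Over time the agent should come to favor the action that pays out more often, purely from the rewards it receives.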

RL addresses sequential decision problems: an agent interacts with an environment by taking actions, and its aim is to maximize its expected cumulative reward. To act near-optimally, the agent must reason about the long-term consequences of its actions, even when the immediate reward for an action is negative. Each action affects the situations that follow, so the agent must plan a series of actions that maximizes reward overall.
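A common way to make "long-term consequences" concrete is the discounted return, in which later rewards count slightly less than earlier ones. The sketch below assumes a discount factor of 0.9 and two invented reward sequences; both the numbers and the function name discounted_return are illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical choices: a small reward now, or a cost now and a payoff later.
greedy_path = [1.0, 0.0, 0.0, 0.0]
patient_path = [-1.0, 0.0, 0.0, 10.0]

greedy_value = discounted_return(greedy_path)
patient_value = discounted_return(patient_path)
print(greedy_value, patient_value)
```

Under these assumptions the second sequence has the higher return even though its first reward is negative, which is the kind of trade-off an RL agent has to learn to make.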

The RL problem is usually modeled as a Markov Decision Process (MDP), and many RL algorithms build on dynamic programming techniques. Some of the more popular reinforcement learning algorithms are Q-learning, SARSA (State-Action-Reward-State-Action) and DQN (Deep Q-Network).
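To give a feel for one of these algorithms, here is a minimal sketch of tabular Q-learning on a toy MDP. Everything about the environment is an assumption made for illustration: a chain of five states, a goal at the right end, a small cost per move, and the hyperparameter values. The names step and train are hypothetical helpers, not part of any library.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the terminal goal (assumed toy MDP)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state, action):
    """Toy deterministic dynamics: small cost per move, reward at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                best = max(q[state])                    # exploit, random tie-break
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-terminal states (1 means "move right").
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

SARSA differs mainly in the target: it uses the value of the action the agent actually takes next rather than the maximum over actions. DQN replaces the table with a neural network.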

Business applications

Reinforcement learning requires tremendous amounts of data. As a result, its applications are largely found in fields like robotics and game playing, where experience can be generated in large quantities, often through simulation.

Organizations and institutions are now beginning to implement reinforcement learning for problems where sequential decision-making is required and where reinforcement learning can support human experts or automate the decision-making process. Here are a few:


Robotics

Reinforcement learning provides a principled framework in which agents learn good behaviors by interacting with their environment. Rather than being programmed step by step, a robot controller is given a specification of what to achieve. The agent then interacts with the environment, is rewarded for the outcomes and modifies its behavior accordingly.

Game playing

The most famous use of reinforcement learning is AlphaGo, which defeated top human players at the game of Go. Using AI and ML in game playing is not merely fun and games: training software to optimize game scores and outperform human players gives researchers opportunities to understand how processes in many other fields can be improved and even optimized.


Healthcare

RL has been used in medical applications ranging from chatbots to pathology and oncology. Machine vision and other machine learning technologies have helped pathologists and radiologists get results faster. More recently, researchers have used deep learning and RL to train algorithms to recognize cancerous tissue.

Autonomous Vehicles

Many autonomous vehicles and drones have reinforcement learning algorithms at their core. They are data intensive, using lidar, HD maps and other technologies to build a 3-D world in which to train the agent. Now, newer startups are taking RL in a new direction for autonomous vehicles. For example, a company called Wayve aims to build robotic intelligence without requiring big models, many sensors and endless data. Instead, it aims to find an efficient training process that enables rapid learning through trial and error. Its autonomous car has so far successfully used a model-free RL algorithm called Deep Deterministic Policy Gradient (DDPG) to solve the task of lane following. This approach allows the car to adapt to new environments more easily and rapidly, building upon its past experience.


Chatbots

RL is used to train goal-oriented chatbots, which businesses use to help customers with a specific goal such as booking tickets or making hotel reservations. Typically, the dialogue system is an ensemble of natural language generation (NLG) models, including neural network and template-based models. RL is applied to recorded data and real-world user interactions to train the system agent to "converse" with humans via text.

Though RL has long been studied in academia, it is now gaining traction in industry, largely due to its ability to combine the latest deep learning algorithms with decision-making systems to give optimized results. Researchers are hopeful that RL and deep learning together will rapidly bring them closer to achieving game-changing artificial general intelligence.