Companies are increasingly exploring opportunities to apply reinforcement learning to some of their most challenging problems. Reinforcement learning is a class of machine learning algorithms that is behind groundbreaking results in robotics and machines beating humans in games like Go and StarCraft II. With reinforcement learning, models are trained to take actions in an environment so as to maximize a reward; not every action results in a positive reward, but the goal is that over time the sequence of actions will maximize the possible total reward. For this reason, reinforcement learning is best when applied to sequential decision-making problems.
 
One of the most promising enterprise applications of reinforcement learning is customer engagement and determining the next best action. That next best action could be a product recommendation, an offer of a discount, or some messaging on relevant product or brand information. The action could be delivered on one of many different channels (phone call, email, text, posted mail) and at specific times during the day. The critical thing is that the action is personalized based on the customer’s interests, needs and preferences. This is very different from traditional approaches of mass communication and product-centric campaigns.
 
Of course, the first step in personalizing customer engagements is being able to collect and integrate data about individuals across the enterprise and its data silos. Once the data has been collected and integrated, machine learning models can predict what products a customer is most likely to buy, what coupons they may redeem or what channel a customer is most likely to respond to. Many companies struggle even at this stage. These predictions, however, still require some kind of intervention. How do we make use of the fact that we know a customer is likely to want to buy a certain product? Reinforcement learning is the next step in next best action maturity. With reinforcement learning, the sequence of decisions regarding what product, what offer, and what channel can be automated to maximize the lifetime value of the customer while improving their experience with the brand.
 
If automating next best action decisions using reinforcement learning is the goal, how do you get started? It can be quite complex to fully automate a typical company’s engagement with customers. To appreciate the complexity, it is worth describing how reinforcement learning works in a little more detail. As mentioned above, a reinforcement learning agent learns to take actions in an environment to maximize a reward. To do that we need to define the state of the agent (for now we can assume there is an agent for each customer); this may include what the customer has purchased in the past, what offers they have seen before, what communications they have engaged with, demographics, and even other model outputs that might capture their needs and preferences. The actions can then include all the combinations of products, offers and messaging across all the different channels. The communications themselves can be personalized to include wording, images, colors and even font sizes. As you can see, the number of possible state and action pairs is enormous.
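
To make the scale concrete, here is a minimal sketch that counts the actions for one customer. The catalog sizes and names below are hypothetical, chosen only for illustration, and the count ignores wording, images, colors and font sizes, which would multiply it further.

```python
from math import prod

# Hypothetical catalog sizes, chosen only for illustration.
products = [f"product_{i}" for i in range(50)]
offers = ["none", "5_percent_off", "10_percent_off", "free_shipping"]
messages = ["brand_story", "product_info", "reminder"]
channels = ["phone", "email", "text", "mail"]
send_hours = list(range(8, 20))  # 12 hourly delivery slots


def action_space_size(*dimensions):
    """Number of distinct actions when one value is picked per dimension."""
    return prod(len(d) for d in dimensions)


# Every combination of product, offer, message, channel and delivery time.
full_size = action_space_size(products, offers, messages, channels, send_hours)

# A restricted starting point: offers only, on a single channel.
restricted_size = action_space_size(offers, ["email"])

print(full_size, restricted_size)
```

Even with these modest made-up numbers, the full space has tens of thousands of actions per decision, while the restricted one has only a handful.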
 
To begin, companies should focus on a smaller initial state and action space; for example, only consider offer recommendations for one channel and only allow the agent to select offers from a manually prepared bank of offers. The agent will then be responsible for making decisions only for offers on that channel. All the other factors that go into personalizing the next best action across all the other channels will need to be executed by campaigns and rules, as they normally would be. The agent still needs to know what those other actions are and update the state accordingly. In addition to other actions that may be outside of the control of the agent, other rules could include ensuring that offers are consistent with the organization's policy (e.g. not promoting alcohol to non-drinkers), using specific layouts for segments of the population, ensuring there is an appropriate amount of diversity in the products and offers, and rules for eligibility like whether a customer has already received an offer.
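
One way to picture this division of labor is a sketch in which business rules first filter the offer bank and the agent then chooses only among what remains. Everything here is an assumption made for illustration: the offer names, the customer fields, and the use of a simple epsilon-greedy choice in place of a full reinforcement learning agent.

```python
import random

# Hypothetical, manually prepared bank of offers for a single channel.
OFFER_BANK = {
    "wine_discount": {"category": "alcohol"},
    "coffee_discount": {"category": "beverage"},
    "free_shipping": {"category": "service"},
}


def eligible_offers(customer, offer_bank):
    """Apply business rules before the agent sees any candidate offer."""
    allowed = []
    for name, offer in offer_bank.items():
        # Policy rule: do not promote alcohol to non-drinkers.
        if offer["category"] == "alcohol" and not customer["drinks_alcohol"]:
            continue
        # Eligibility rule: skip offers the customer has already received.
        if name in customer["offers_received"]:
            continue
        allowed.append(name)
    return allowed


def choose_offer(customer, value_estimates, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the rule-approved offers."""
    candidates = eligible_offers(customer, OFFER_BANK)
    if not candidates:
        return None  # nothing allowed; fall back to existing campaigns
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore among allowed offers
    # Exploit: pick the allowed offer with the highest estimated value.
    return max(candidates, key=lambda name: value_estimates.get(name, 0.0))


customer = {"drinks_alcohol": False, "offers_received": {"free_shipping"}}
estimates = {"wine_discount": 0.9, "coffee_discount": 0.4, "free_shipping": 0.6}
print(choose_offer(customer, estimates, epsilon=0.0))
```

Because the rules run before the agent chooses, the agent can never select an offer that violates policy, no matter what its value estimates say.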
 
Systems that manage customer interactions can assemble a complete view of customers and execute personalized customer journeys across multiple channels. These systems enable marketers to integrate advanced analytic models, like reinforcement learning models, with existing rules. In the case of reinforcement learning, this provides marketers a platform to progressively expand the state and action space as the agent learns more and demonstrates value. Over time, the agent will take on more responsibility and gradually replace the old rules. It is worth noting that some of those old rules may be the product of decisions that were based not on data but on marketers’ biases, resulting in less than optimal outcomes.
 
One of the key things to consider when defining that initial state and action space is whether there is a good amount of historical data and whether the predictive models have been shown to perform well. Some of the best reinforcement learning models begin learning without any previous demonstrations; that is, they experiment and learn the best actions on their own. This was the case with AlphaGo Zero, which beat the previous best machine Go player, AlphaGo Master, which in turn had beaten the best human players. For enterprises, having a model that experiments with real customers’ experiences is not a good idea, so being able to learn as much as possible from historical data is important. Other things to consider include: whether actions can be constrained so that they are “safe”, meaning the negative impact of trying different actions is not too great; whether there is sufficient data for a good state representation; and, of course, whether automating the decisions offers a significant improvement over the current approach.
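
As a small illustration of learning from historical data before any live experimentation, the sketch below averages logged rewards per offer to produce starting value estimates. The log format and offer names are hypothetical, and this naive average ignores how offers were assigned in the past, so it is only a starting point and not a substitute for proper off-policy evaluation.

```python
from collections import defaultdict


def warm_start_values(logged_interactions):
    """Average historical reward per offer, used as initial value estimates."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for offer, reward in logged_interactions:
        totals[offer] += reward
        counts[offer] += 1
    return {offer: totals[offer] / counts[offer] for offer in totals}


# Hypothetical log of (offer shown, reward), where 1.0 means redeemed.
history = [
    ("coffee_discount", 1.0),
    ("coffee_discount", 0.0),
    ("free_shipping", 1.0),
]
initial_values = warm_start_values(history)
print(initial_values)
```

Offers that never appear in the log get no estimate at all, which is one practical reason to check for sufficient historical coverage before choosing the initial action space.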
 
As more companies look to make better and faster decisions that are critical to their business, reinforcement learning will increasingly be used to automate a wider range of business decisions.
Companies need to consider carefully where they start and how they allow models to progressively increase the decisions they are responsible for. Tools for executing business rules that are flexible and can easily integrate analytic models will facilitate a controlled deployment of automated business decisioning.
Peter Mackenzie
Peter is the AI Team Lead for Americas. He is responsible for the successful delivery of AI projects in America, supporting the sales field and managing the team’s program of research and IP development. Previously, Peter was the director of services at Think Big Analytics for 5 years, responsible for the delivery of big data and advanced analytics projects. Peter has a strong background in program management and has successfully delivered large programs of work in a range of different industries. Peter holds a bachelor of commerce in management science and a master’s in computer science from McGill University.
 