Posted by R0bin_L0rd
You’re busy and (depending on effective keyword targeting) you’ve come here looking for something to shave months off the process of learning to produce your own chat bot. If you’re convinced you need this and just want the how-to, skip to "What my bot does." If you want the background on why you should be building for platforms like Google Home, Alexa, and Facebook Messenger, read on.
Why should I read this?
Do you remember when it wasn't necessary to have a website? When most boards would scoff at the value of running a Facebook page? Now Gartner is telling us that customers will manage 85% of their relationship with brands without interacting with a human by 2020 and publications like Forbes are saying that chat bots are the cause.
The situation now is the same as every time a new platform develops: if you don’t have something your customers can access, you're giving that medium to your competition. At the moment, an automated presence on Google Home or Slack may not be central to your strategy, but those who claim ground now could dominate it in the future.
The problem is time. Sure, it'd be ideal to be everywhere all the time, to have your brand active on every platform. But it would also be ideal to catch at least four hours' sleep a night or stop covering our keyboards with three-day-old chili con carne as we eat a hasty lunch in between building two of the Next Big Things. This is where you’re fortunate in two ways:
- When we develop chat applications, we don’t have to worry about things like a beautiful user interface because it’s all speech or text. That's not to say you don't need to worry about user experience, as there are rules (and an art) to designing a good conversational back-and-forth. Amazon is actually offering some hefty prizes for outstanding examples.
- I’ve spent the last six months working through the steps from complete ignorance to creating a distributable chat bot and I’m giving you all my workings. In this post I break down each of the levels of complexity, from no-code back-and-forth to managing user credentials and sessions that stretch over days or months. I’m also including full code that you can adapt and pull apart as needed. I’ve commented each portion of the code explaining what it does and linking to resources where necessary.
I've written more about the value of Interactive Personal Assistants on the Distilled blog, so this post won't spend any longer focusing on why you should develop chat bots. Instead, I'll share everything I've learned.
What my built-from-scratch bot does
Ever since I started investigating chat bots, I was particularly interested in finding out the answer to one question: What does it take for someone with little-to-no programming experience to create one of these chat applications from scratch? Fortunately, I have direct access to someone with little-to-no experience (before February, I had no idea what Python was). And so I set about designing my own bot with the following hard conditions:
- It had to have some kind of real-world application. It didn't have to be critical to a business, but it did have to bear basic user needs in mind.
- It had to be easily distributable across the immediate intended users, and to have reasonable scope to distribute further (modifications at most, rather than a complete rewrite).
- It had to be flexible enough that you, the reader, can take some free code and make your own chat bot.
- It had to be possible to adapt the skeleton of the process for much more complex business cases.
- It had to be free to run, but could have the option of paying to scale up or make life easier.
- It had to send messages confirming when important steps had been completed.
The resulting program is "Vietnambot," a program that communicates with Slack, the API.AI linguistic processing platform, and Google Sheets, using real-time and asynchronous processing and its own database for storing user credentials.
If that meant nothing to you, don't worry — I'll define those things in a bit, and the code I'm providing is obsessively commented with explanation. The thing to remember is it does all of this to write down food orders for our favorite Vietnamese restaurant in a shared Google Sheet, probably saving tens of seconds of Distilled company time every year.
It's deliberately mundane, but it's designed to be a template for far more complex interactions. The idea is that whether you want to write a no-code-needed back-and-forth just through API.AI; a simple Python program that receives information, does a thing, and sends a response; or something that breaks out of the limitations of linguistic processing platforms to perform complex interactions in user sessions that can last days, this post should give you some of the puzzle pieces and point you to others.
What is API.AI and what's it used for?
API.AI is a linguistic processing interface. It can receive text, or speech converted to text, and perform much of the comprehension for you. You can see my Distilled post for more details, but essentially, it takes the phrase “My name is Robin and I want noodles today” and splits it up into components like:
- Intent: food_request
- Action: process_food
- Name: Robin
- Food: noodles
- Time: today
This setup means you have some hope of responding to the hundreds of thousands of ways your users could find to say the same thing. It’s your choice whether API.AI receives a message and responds to the user right away, or whether it receives a message from a user, categorizes it and sends it to your application, then waits for your application to respond before sending your application’s response back to the user who made the original request. In its simplest form, the platform has a bunch of one-click integrations and requires absolutely no code.
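To make that split concrete, here is a minimal sketch of how your own code might hold those components once the platform has done the comprehension. The dictionary layout and the `summarise_order` helper are assumptions made up for this illustration, not API.AI's actual response format.

```python
# Hypothetical structure mirroring the components listed above.
# The key names are assumptions for this sketch, not API.AI's exact format.
parsed_message = {
    "intent": "food_request",
    "action": "process_food",
    "parameters": {"name": "Robin", "food": "noodles", "time": "today"},
}


def summarise_order(parsed):
    """Turn the structured components into a short confirmation sentence."""
    params = parsed["parameters"]
    return "{name} wants {food} {time}.".format(**params)


print(summarise_order(parsed_message))
```

The point is that your code never has to untangle the original sentence; it only reads named values.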
I’ve listed the possible levels of complexity below, but it’s worth bearing in mind some hard limitations that apply to most of these services. They cannot remember anything outside of a user session, which automatically ends after about 30 minutes. They have to do everything through what are called POST and GET requests (something you can ignore unless you’re using code). And if you choose to have the service ask your application for information before it responds to the user, your application has to do everything and respond within five seconds.
What are the other things?
Slack: A text-based messaging platform designed for work (or for distracting people from work).
Google Sheets: We all know this, but just in case, it’s Excel online.
Asynchronous processing: Most of the time, one program can do one thing at a time. Even if it asks another program to do something, it normally just stops and waits for the response. Asynchronous processing is how we ask a question and continue without waiting for the answer, possibly retrieving that answer at a later time.
Database: Again, it’s likely you know this, but if not: it’s Excel that our code will use (different from the Google Sheet).
Heroku: A platform for running code online. (Important to note: I don’t work for Heroku and haven’t been paid by them. I couldn’t say that it's the best platform, but it can be free and, as of now, it’s the one I’m most familiar with).
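The asynchronous processing described above can be sketched with Python's standard library. This is only an illustration of the idea, not the approach Vietnambot necessarily takes; `slow_lookup` is a hypothetical stand-in for any slow job, such as writing to a spreadsheet.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(order):
    """Hypothetical stand-in for a slow job, such as updating a spreadsheet."""
    time.sleep(0.2)
    return "saved: " + order


executor = ThreadPoolExecutor(max_workers=1)

# Ask the question...
future = executor.submit(slow_lookup, "noodles")

# ...and carry on without waiting for the answer.
immediate_reply = "Got it, I'm writing that down now."

# Later, retrieve the answer (this waits only if the job is still running).
result = future.result()
executor.shutdown()
```

The user gets `immediate_reply` straight away, while the slow work finishes in the background.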
How easy is it?
This graph isn't terribly scientific and it's from the perspective of someone who's learning much of this for the first time, so here’s an approximate breakdown:
Label | Functionality | Time it took me |
---|---|---|
1 | You set up the conversation purely through API.AI or similar, no external code needed. For instance, answering set questions about contact details or opening times | Half an hour to distributable prototype |
2 | A program that receives information from API.AI and uses that information to update the correct cells in a Google Sheet (but can’t remember user names and can’t use the slower Google Sheets integrations) | A few weeks to distributable prototype |
3 | A program that remembers user names once they've been set and writes them to Google Sheets. Is limited to five seconds processing time by API.AI, so can’t use the slower Google Sheets integrations and may not work reliably when the app has to boot up from sleep because that takes a few seconds of your allocation* | A few weeks on top of the last prototype |
4 | A program that remembers user details and manages the connection between API.AI and our chosen platform (in this case, Slack) so it can break out of the five-second processing window. | A few weeks more on top of the last prototype (not including the time needed to rewrite existing structures to work with this) |
*On the Heroku free plan, when your app hasn’t been used for 30 minutes it goes to sleep. This means that the first time it’s activated it takes a little while to start your process, which can be a problem if you have a short window in which to act. You could get around this by (mis)using a free “uptime monitoring service” which sends a request every so often to keep your app awake. If you choose this method, in order to avoid using all of the Heroku free hours allocation by the end of the month, you’ll need to register your card (no charge, it just gets you extra hours) and only run this application on the account. Alternatively, there are any number of companies happy to take your money to keep your app alive.
For the rest of this post, I’m going to break down each of those key steps and either give an overview of how you could achieve it, or point you in the direction of where you can find that. The code I’m giving you is Python, but as long as you can receive and respond to GET and POST requests, you can do it in pretty much whatever format you wish.
1. Design your conversation
Conversational flow is an art form in itself. Jonathan Seal, strategy director at Mando and member of British Interactive Media Association's AI thinktank, has given some great talks on the topic. Paul Pangaro has also spoken about conversation as more than interface in multiple mediums.
Your first step is to create a flow chart of the conversation. Write out your ideal conversation, then write out the most likely ways a person might go off track and how you’d deal with them. Then go online, find existing chat bots and do everything you can to break them. Write out the most difficult, obtuse, and nonsensical responses you can. Interact with them like you’re six glasses of wine in and trying to order a lemon engraving kit, interact with them as though you’ve found charges on your card for a lemon engraver you definitely didn’t buy and you are livid, interact with them like you’re a bored teenager. At every point, write down what you tried to do to break them and what the response was, then apply that to your flow. Then get someone else to try to break your flow. Give them no information whatsoever apart from the responses you’ve written down (not even what the bot is designed for), refuse to answer any input you don’t have written down, and see how it goes. David Low, principal evangelist for Amazon Alexa, often describes the value of printing out a script and testing the back-and-forth for a conversation. As well as helping to avoid gaps, it’ll also show you where you’re dumping a huge amount of information on the user.
While “best practices” are still developing for chat bots, a common theme is that it’s not a good idea to pretend your bot is a person. Be upfront that it’s a bot — users will find out anyway. Likewise, it’s incredibly frustrating to open a chat and have no idea what to say. On text platforms, start with a welcome message making it clear you’re a bot and giving examples of things you can do. On platforms like Google Home and Amazon Alexa users will expect a program, but the “things I can do” bit is still important enough that your bot won’t be approved without this opening phase.
I've included a sample conversational flow for Vietnambot at the end of this post as one way to approach it, although if you have ideas for alternative conversational structures I’d be interested in reading them in the comments.
A final piece of advice on conversations: The trick here is to find organic ways of controlling the possible inputs and preparing for unexpected inputs. That being said, the Alexa evangelist team provide an example of terrible user experience in which a bank’s app said: “If you want to continue, say nine.” Quite often questions, rather than instructions, are the key.
2. Create a conversation in API.AI
API.AI has quite a lot of documentation explaining how to create programs, so I won’t go over individual steps.
Key things to understand:
You create agents; each is basically a different program. Agents recognize intents, which are simply ways of triggering a specific response. If someone says the right things at the right time, they meet criteria you have set, fall into an intent, and get a pre-set response.
The right things to say are included in the “User says” section (screenshot below). You set either exact phrases or lists of options as the necessary input. For instance, a user could write “Of course, I’m [any name]” or “Of course, I’m [any temperature].” You could set up one intent for name-is which matches “Of course, I’m [given-name]” and another intent for temperature which matches “Of course, I’m [temperature],” and depending on whether your user writes a name or temperature in that final block you could activate either the “name-is” or “temperature-is” intent.
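As a rough illustration of how the same phrase can land in different intents depending on what fills the final block, here is a toy classifier. The regular expressions and the `classify` function are assumptions invented for this sketch; API.AI does this matching for you and far more flexibly.

```python
import re

# Hypothetical patterns standing in for API.AI's built-in entity types.
TEMPERATURE = re.compile(r"^-?\d+(\.\d+)?\s*(degrees|°[CF]?|[CF])?$", re.IGNORECASE)
GIVEN_NAME = re.compile(r"^[A-Za-z][A-Za-z'-]*$")

PREFIX = "of course, i'm "


def classify(message):
    """Return (intent, value) for 'Of course, I'm ...' messages, else None."""
    text = message.strip().replace("\u2019", "'")  # treat curly apostrophes as straight
    if not text.lower().startswith(PREFIX):
        return None
    slot = text[len(PREFIX):].strip()
    if TEMPERATURE.match(slot):
        return ("temperature-is", slot)
    if GIVEN_NAME.match(slot):
        return ("name-is", slot)
    return None


print(classify("Of course, I'm Robin"))
print(classify("Of course, I'm 21 degrees"))
```

The wording before the final block is identical in both cases; only the type of value in that block decides which intent fires.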
The “right time” is defined by contexts. Contexts help define whether an intent will be activated, but are also created by certain intents. I’ve included a screenshot below of an example interaction. In this example, the user says that they would like to go on holiday. This activates a holiday intent and sets the holiday context you can see in input contexts below. After that, our service will have automatically responded with the question “where would you like to go?” When our user says “The” and then any location, it activates our holiday location intent because it matches both the context and what the user says. If, on the other hand, the user had initially said “I want to go to the theater,” that might have activated the theater intent, which would set a theater context. So when we ask “what area of theaters are you interested in?” and the user says “The [location]” or even just “[location],” we will take them down a completely different path of suggesting theaters rather than hotels in Rome.
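The way contexts gate intents can be modelled in a few lines. This is a toy model under my own assumptions (the `INTENTS` list, its field names, and `match_intent` are all hypothetical); it is not how API.AI is implemented, but it shows why the same words lead down different paths.

```python
# Toy model: each intent may require an input context and may set an output one.
# All names and fields here are hypothetical, for illustration only.
INTENTS = [
    {"name": "holiday", "needs": None,
     "starts_with": "i want to go on holiday", "sets": "holiday"},
    {"name": "theater", "needs": None,
     "starts_with": "i want to go to the theater", "sets": "theater"},
    {"name": "holiday-location", "needs": "holiday",
     "starts_with": "the ", "sets": None},
    {"name": "theater-location", "needs": "theater",
     "starts_with": "the ", "sets": None},
]


def match_intent(message, active_context):
    """Return (intent name, new context) for the first intent that fits."""
    text = message.lower()
    for intent in INTENTS:
        # Skip intents whose required context is not the active one.
        if intent["needs"] not in (None, active_context):
            continue
        if text.startswith(intent["starts_with"]):
            return intent["name"], intent["sets"] or active_context
    return None, active_context


intent, context = match_intent("I want to go on holiday", None)
print(intent, context)
print(match_intent("The Bahamas", context))
print(match_intent("The Bahamas", "theater"))
```

The same message, “The Bahamas,” matches the holiday location intent in one context and the theater location intent in the other, and matches nothing at all when no context is active.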
The way you can create conversations without ever using external code is by using these contexts. A user might say “What times are you open?”; you could set an open-time-inquiry context. In your response, you could give the times and ask if they want the phone number to contact you. You would then make a yes/no intent which matches the context you have set, so if your user says “Yes” you respond with the number. This could be set up within an hour but gets exponentially more complex when you need to respond to specific parts of the message. For instance, if you have different shop locations and want to give the right phone number without having to write out every possible location they could say in API.AI, you’ll need to integrate with external code (see section three).
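For the shop-locations case, the external code can be as small as a lookup. The sketch below assumes API.AI has already extracted a location and passed it to your application; the `SHOP_PHONE_NUMBERS` data and the `phone_reply` function are placeholders I've invented for illustration.

```python
# Hypothetical shop data; in practice this might live in a database or a sheet.
SHOP_PHONE_NUMBERS = {
    "seattle": "555-0100",
    "london": "555-0101",
    "new york": "555-0102",
}


def phone_reply(location):
    """Build the response text for whichever location the user mentioned."""
    key = location.strip().lower()
    number = SHOP_PHONE_NUMBERS.get(key)
    if number is None:
        known = ", ".join(sorted(name.title() for name in SHOP_PHONE_NUMBERS))
        return "I don't have a shop there yet. I can give you numbers for: " + known + "."
    return "You can reach our {} shop on {}.".format(key.title(), number)


print(phone_reply("Seattle"))
print(phone_reply("Paris"))
```

Adding a new shop then means adding one entry to the data rather than writing another intent.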
Now, there will be times when your users don’t say what you're expecting. Excluding contexts, there are three very important ways to deal with that:
- Almost like keyword research — plan out as many possible variations of saying the same thing as possible, and put them all into the intent
- Test, test, test, test, test, test, test, test, test, test, test, test, test, test, test (when launched, every chat bot will have problems. Keep testing, keep updating, keep improving.)
- Fallback intents
Fallback intents don’t have a user says section, but can be boxed in by contexts. They match anything that has the right context but doesn’t match any of your user says. It could be tempting to use fallback intents as a catch-all. Reasoning along the lines of “This is the only thing they’ll say, so we’ll just treat it the same” is understandable, but it opens up a massive hole in the process. Fallback intents are designed to be a conversational safety net. They operate exactly the same as in a normal conversation. If a person asked what you want in your tea and you responded “I don’t want tea” and that person made a cup of tea, wrote the words “I don’t want tea” on a piece of paper, and put it in, that is not a person you’d want to interact with again. If we are using fallback intents to do anything, we need to preface it with a check. If we had to resort to it in the example above, saying “I think you asked me to add I don’t want tea to your tea. Is that right?” is clunky and robotic, but it’s a big step forward, and you can travel the rest of the way by perfecting other parts of your conversation.
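The “preface it with a check” idea could look something like this in code. Both functions are hypothetical and only sketch the pattern: repeat the guess back, and act on it only after a clear yes.

```python
def fallback_reply(unmatched_text):
    """Ask for confirmation instead of acting on input we did not understand."""
    return 'I think you asked me to add "{}" to your order. Is that right?'.format(
        unmatched_text.strip()
    )


def handle_confirmation(answer, pending_item, order):
    """Only act on the fallback guess once the user has confirmed it."""
    if answer.strip().lower() in {"yes", "y", "yeah", "that's right"}:
        order.append(pending_item)
        return "Added {}.".format(pending_item)
    return "No problem, I've left your order as it was. What would you like?"


order = []
print(fallback_reply("I don't want tea"))
print(handle_confirmation("No", "I don't want tea", order))
print(order)
```

Here the order stays empty, because the user said no to the bot's guess.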
3. Integrating with external code
I used Heroku to build my app. Using this excellent weather webhook example you can actually deploy a bot to Heroku within minutes. I found this example particularly useful as something I could pick apart to make my own call and response program. The weather webhook takes the information and calls a Yahoo app, but ignoring that specific functionality you essentially need the following if you’re working in Python:
# assumes a Flask app, as in the weather webhook example
import json
from flask import request

#start
req = request.get_json()
print("Request:")
print(json.dumps(req, indent=4))
#process to do your thing and decide what response should be
res = processRequest(req)
# Response we should receive from processRequest (you’ll need to write some code called processRequest and make it return the below, the weather webhook exa
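As a hedged sketch of what a `processRequest` function could look like: the version below assumes the request nests the action and parameters under a `result` key and that the reply is a dictionary with `speech`, `displayText`, and `source` fields, which is how I understand the weather webhook example of the time to work. Treat those field names, and the `vietnambot-sketch` source label, as assumptions to check against the current documentation.

```python
def processRequest(req):
    """Read the action and parameters from the request and build a reply.

    The request and response shapes here are assumptions based on the
    weather webhook example, not a definitive description of the API.
    """
    result = req.get("result", {})
    if result.get("action") != "process_food":
        return {}  # not an action this sketch knows how to handle

    params = result.get("parameters", {})
    name = params.get("name", "Someone")
    food = params.get("food", "something")

    speech = "Thanks {}, I've written down {}.".format(name, food)
    return {
        "speech": speech,        # what a voice platform would say
        "displayText": speech,   # what a text platform would show
        "source": "vietnambot-sketch",  # hypothetical label for this app
    }


sample_request = {
    "result": {
        "action": "process_food",
        "parameters": {"name": "Robin", "food": "noodles", "time": "today"},
    }
}
print(processRequest(sample_request)["speech"])
```

In a real app you would do the actual work (such as updating the sheet) between reading the parameters and building the reply.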
source https://moz.com/blog/chat-bot