A generation of voice assistants such as Siri, Cortana, and Google Now has made spoken dialogue systems popular. More recently, we have seen a rise in text-based conversational agents (also known as chatbots). Many users prefer text to voice for privacy reasons and to avoid poor speech recognition in noisy environments. These agents are also welcome as an alternative to downloading and installing applications. This makes a lot of sense for simple tasks such as ordering a cab or asking for the weather.
In most cases, much like voice assistants, these chatbots only support very simple, sequential interactions. The reason is that the user's goal is well defined and the dialogue flow can easily be hand-crafted. However, other use cases, such as customer service or travel booking, involve a decision-making process.
Frames is meant precisely to encourage research towards conversational agents that can support decision-making in complex settings, in this case booking a vacation that includes flights and a hotel. More than just searching a database, we believe the next generation of conversational agents will need to help users explore a database, compare items, and reach a decision.
The dialogues in Frames were collected in a Wizard-of-Oz fashion. Two humans talked to each other via a chat interface. One was playing the role of the user and the other one was playing the role of the conversational agent. We call the latter a wizard as a reference to the Wizard of Oz, the man behind the curtain. The wizards had access to a database of 250+ packages, each composed of a hotel and round-trip flights. We gave users a few constraints for each dialogue and we asked them to find the best deal. This resulted in complex dialogues where a user would often consider different options, compare packages, and progressively build the description of her ideal trip.
With this dataset, we also present a new task: frame tracking. Our main observation is that decision-making is tightly linked to memory. Indeed, to choose a trip, users and wizards talked about different possibilities, compared them, and went back and forth between cities, dates, or vacation packages.
Current systems are memory-less. They implement slot-filling for search as a sequential process in which the user is asked for constraints one after the other until a database query can be formulated. Only one set of constraints is kept in memory. For instance, in the illustration below, on the left, when the user mentions Montreal, it overwrites Toronto as the destination city. However, the behaviours observed in Frames imply that slot values should not be overwritten. One use case is comparison: users commonly ask to compare different items, and in this case different sets of constraints are involved (for instance, different destinations). Frame tracking consists of keeping in memory all the different sets of constraints mentioned by the user. It generalizes the state tracking task to a setting where not only the current frame but also the earlier ones are kept in memory.
Adding this kind of conversational memory is key to building agents which do not simply serve as a natural language interface for searching a database but instead accompany users in their exploration and help them find the best item.
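The contrast between a single overwritten state and frame tracking can be sketched in a few lines of Python. This is only an illustration, not the annotation scheme or any model used for Frames: the class names, the slot names (`origin`, `dst_city`), the Atlanta origin, and the rule "open a new frame whenever a slot receives a conflicting value" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SlotFillingState:
    """Memory-less baseline: one set of constraints, new values overwrite old ones."""
    slots: dict = field(default_factory=dict)

    def update(self, slot, value):
        self.slots[slot] = value  # any earlier value for this slot is lost


@dataclass
class FrameTracker:
    """Keeps every set of constraints (frame) mentioned so far (hypothetical rule)."""
    frames: list = field(default_factory=lambda: [{}])
    active: int = 0  # index of the frame currently under discussion

    def update(self, slot, value):
        current = self.frames[self.active]
        if slot in current and current[slot] != value:
            # Assumed rule: a conflicting value opens a new frame that
            # inherits the other slots instead of overwriting the old one.
            self.frames.append({**current, slot: value})
            self.active = len(self.frames) - 1
        else:
            current[slot] = value

    def switch_to(self, index):
        """Return to a previously discussed frame, e.g. for a comparison."""
        self.active = index


# The user first asks about Toronto, then mentions Montreal.
turns = [("origin", "Atlanta"), ("dst_city", "Toronto"), ("dst_city", "Montreal")]

baseline = SlotFillingState()
tracker = FrameTracker()
for slot, value in turns:
    baseline.update(slot, value)
    tracker.update(slot, value)

print(baseline.slots)   # only the Montreal constraints remain
print(tracker.frames)   # both the Toronto and the Montreal frames are kept
```

With both frames still in memory, an agent could answer a request such as "how does that compare to the Toronto trip?", which the baseline state can no longer support.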
Most dialogue systems implement goal-oriented conversations as a sequential, slot-filling process. Each dialogue state is either augmented with new information (left) or overwritten (right).
Solving frame tracking would enable dialogue systems to memorize all the information provided by the user and allow comparisons between items.
The dialogues were carried out by 12 participants over a period of 20 days.
We deployed a Slack bot named wozbot enabling participants to pair up. Wizards were given a link to a search interface at the beginning of each dialogue. The search interface was a simple graphical interface with all the searchable fields in the database (destination, origin, budget, dates, etc.).
For each dialogue, a user was paired up with an available wizard and received a new task.
Find a vacation between September 1st and September 8th to Havana from Stuttgart for under $700. Dates are not flexible. If not available, then end the conversation.
Why this task?
The setting is simple and the user has a good idea of what she wants. Therefore, the agent only needs to help the user find suitable packages and book one. This situation is the first one that a conversational agent for travel booking should handle.
Find a vacation between August 15th and August 25th for 2 adults and 1 kid. You leave from Atlanta. You’re travelling on a budget and you’d like to spend at most $2000. If you can't find anything, then try ending before August 22nd and increasing your budget by $200.
Why this task?
In this case, the user does not know the destination in advance. In addition, the last part encourages the user to change her request if she cannot find a trip matching her dates and budget. Here, therefore, we model both the exploration behaviour of the user and the possibility of changing the initial constraints. This task is a bit more complicated than the previous one but represents a situation that is likely to occur.
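To make the "change the initial constraints" behaviour concrete, here is a minimal sketch of a package search with one fallback step, loosely modelled on the task above. Everything in it is assumed for illustration: the `Package` fields, the two sample packages, their prices, and the year 2016 are invented, and the real wizard search interface is not described at this level of detail.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Package:
    """Hypothetical package record: a hotel stay plus round-trip flights."""
    origin: str
    destination: str
    start: date
    end: date
    price: int  # total price in dollars


def search(packages, origin, earliest, latest, budget):
    """Return packages that leave from `origin`, fit in the date window, and fit the budget."""
    return [
        p for p in packages
        if p.origin == origin
        and p.start >= earliest
        and p.end <= latest
        and p.price <= budget
    ]


def search_with_fallback(packages, origin, earliest, latest, budget,
                         fallback_latest, extra_budget):
    """Try the initial constraints first; if nothing matches, apply the fallback once."""
    results = search(packages, origin, earliest, latest, budget)
    if results:
        return results, False
    relaxed = search(packages, origin, earliest, fallback_latest, budget + extra_budget)
    return relaxed, True


# Invented sample data; the year is arbitrary.
packages = [
    Package("Atlanta", "Cancun", date(2016, 8, 16), date(2016, 8, 21), 2150),
    Package("Atlanta", "Lima", date(2016, 8, 15), date(2016, 8, 25), 2600),
]

results, used_fallback = search_with_fallback(
    packages,
    origin="Atlanta",
    earliest=date(2016, 8, 15),
    latest=date(2016, 8, 25),
    budget=2000,
    fallback_latest=date(2016, 8, 21),  # "ending before August 22nd"
    extra_budget=200,
)
print(used_fallback, [p.destination for p in results])
```

Note that the destination is deliberately not a search argument here, since in this task the user explores whatever destinations the database offers.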
You either want to go to New York, Tokyo, Berlin, or Paris from Montreal. You want to travel sometime between August 23rd and September 1st. Ask for information about each package. Compare the packages and pick the one you like best.
Why this task?
For this task, the user has to compare options for different cities. Here, we model the case where a user looks at specific destinations and tries to find the best trip. This requires extensive exploration of the database and a memory of the different options which have been discussed.