Sophisticated, human-generated datasets for natural language understanding research

About

Maluuba's datasets for natural language understanding cover:

  • Machine Reading Comprehension
  • Goal-Oriented Dialogue Systems
  • Conversational Interfaces and Reinforcement Learning

Maluuba makes these datasets available to the artificial intelligence research community.

Machine Reading Comprehension

NewsQA Dataset

Maluuba's NewsQA is a new machine reading comprehension dataset for developing algorithms that can answer questions requiring human-level comprehension and reasoning. It contains 120K question-answer pairs based on CNN news articles. The questions are written by humans in natural language. Some questions have no answer in the article, and answers may be multiword spans of text.

Goal-Oriented Dialogue

Frames Dataset

Maluuba's Frames dataset is designed to drive research on truly conversational agents that can support decision-making in complex settings. The dataset was collected from human-to-human conversations held through a chat interface, with one person playing the customer and the other playing the travel agent. The resulting dialogues are natural and complex. Users consider different options, compare packages, and progressively build rich descriptions over the course of the conversation.