About the Dataset

Answering questions about a given image is a difficult task, requiring an understanding of both the image and the accompanying query. Maluuba's FigureQA dataset introduces a new visual reasoning task for research, specific to graphical plots and figures. The task comes with an additional twist: all of the questions are relational, requiring the comparison of several or all elements of the underlying plot.

The images comprise five types of figures commonly found in analytical documents. Fifteen question types were selected for the dataset; they concern quantitative attributes in relational global and one-vs-one contexts. These include properties like minimum and maximum, greater and less than, medians, curve roughness, and area under the curve (AUC). All questions in the training and validation sets have either a yes or no answer.

The five figure types are:

  • Vertical Bar Graph
  • Horizontal Bar Graph
  • Line Graph
  • Dot Line Graph
  • Pie Chart

Highlights

  • 100,000 figure images in the training set
  • 1,327,368 question-answer pairs in the training set
  • 100 unique colors and possible names for figure plot elements
  • 15 question types for quantitative attributes

Details

Dataset Split   # Images   # Questions   Has Answers & Annotations?   Color Scheme
Train           100,000    1,327,368     Yes                          Scheme 1
Validation 1    20,000     265,106       Yes                          Scheme 1
Validation 2    20,000     265,798       Yes                          Scheme 2
Test 1          20,000     265,024       No                           Scheme 1
Test 2          20,000     265,402       No                           Scheme 2
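
As a quick sanity check on the split sizes, the following sketch computes the average number of questions per image in each split and the overall totals. The counts are copied from the table above; the names `SPLITS` and `questions_per_image` are illustrative and not part of any FigureQA tooling.

```python
# Split sizes copied from the table above: (number of images, number of questions).
SPLITS = {
    "Train": (100_000, 1_327_368),
    "Validation 1": (20_000, 265_106),
    "Validation 2": (20_000, 265_798),
    "Test 1": (20_000, 265_024),
    "Test 2": (20_000, 265_402),
}


def questions_per_image(splits):
    """Return the mean number of questions per image for each split."""
    return {name: questions / images for name, (images, questions) in splits.items()}


total_images = sum(images for images, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

for name, ratio in questions_per_image(SPLITS).items():
    print(f"{name}: {ratio:.2f} questions per image")
print(f"Total: {total_images:,} images, {total_questions:,} questions")
```

Each split averages a little over 13 questions per figure, so the splits appear to be balanced in question density.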

Unique Features

Additionally, the following features make FigureQA a distinct visual question-answering (VQA) and reasoning dataset:

  • It is entirely synthetically generated. Any number of samples can be generated in a configurable and extensible manner.
  • Each figure image is accompanied by the source data used to create it. This data can be used as input features or a learning target, and can be used to formulate questions and answers.
  • Rich bounding box annotations for all plot elements are extracted automatically and included with each generated figure image.
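
To illustrate how source data could be used to formulate questions and answers, here is a minimal sketch. It assumes the source data for a bar graph can be represented as a mapping from element name to plotted value. The values, the question wording, and the function names are hypothetical and are not taken from the FigureQA generator.

```python
# Hypothetical source data for one bar graph: element name -> plotted value.
# The values and the question templates below are made up for illustration.
source_data = {"crimson": 42.0, "sienna": 17.5, "royal blue": 63.2}


def yes_no(condition):
    """Map a boolean to the yes/no answer format used by the dataset."""
    return "yes" if condition else "no"


def make_qa(data):
    """Build (question, answer) pairs for two relational question kinds."""
    pairs = []
    lowest = min(data.values())
    # Global question: compares one element against all others.
    for name, value in data.items():
        pairs.append((f"Is {name} the minimum?", yes_no(value == lowest)))
    # One-vs-one question: compares each element with the next one.
    names = list(data)
    for a, b in zip(names, names[1:]):
        pairs.append((f"Is {a} greater than {b}?", yes_no(data[a] > data[b])))
    return pairs


for question, answer in make_qa(source_data):
    print(question, answer)
```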

Figure Color Schemes

To color and identify plot elements, 100 colors were selected from the X11 named color set. Colors were chosen to have a large color distance from white, the background color, and some names were modified to enhance readability.

To evaluate models on unseen color combinations, we provide validation and test sets in two color schemes built from two disjoint color sets that alternate between figure types. Each figure type is colored with one set under the training color scheme (Scheme 1) and with the other set under the alternate scheme (Scheme 2). This ensures that all colors are seen during training, and the approach is consistent with the one used in the CLEVR dataset.

For example:

Scheme 1
  • Vertical bar graphs, line charts, and pie charts are colored using 50 unique colors in set A, including crimson, seafoam, and royal blue.
  • Horizontal bar graphs and dot line charts are colored using 50 unique colors in set B, including light coral, sienna, and web purple.
Scheme 2
  • Vertical bar graphs, line charts, and pie charts are colored using 50 unique colors in set B, including light coral, sienna, and web purple.
  • Horizontal bar graphs and dot line charts are colored using 50 unique colors in set A, including crimson, seafoam, and royal blue.
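
The two schemes amount to swapping the color sets between the two groups of figure types. A minimal sketch of that mapping follows. The figure-type identifiers and the function name `color_set_for` are assumptions made for illustration, not names used by the dataset.

```python
# Figure types grouped as in the example above (identifiers are hypothetical).
GROUP_1 = ("vertical_bar", "line", "pie")
GROUP_2 = ("horizontal_bar", "dot_line")


def color_set_for(figure_type, scheme):
    """Return "A" or "B": the color set used for a figure type under a scheme."""
    if scheme not in (1, 2):
        raise ValueError(f"unknown scheme: {scheme}")
    if figure_type in GROUP_1:
        in_group_1 = True
    elif figure_type in GROUP_2:
        in_group_1 = False
    else:
        raise ValueError(f"unknown figure type: {figure_type}")
    # Scheme 1 gives set A to group 1 and set B to group 2; scheme 2 swaps them.
    if scheme == 1:
        return "A" if in_group_1 else "B"
    return "B" if in_group_1 else "A"


for figure_type in GROUP_1 + GROUP_2:
    print(figure_type, color_set_for(figure_type, 1), color_set_for(figure_type, 2))
```

Because every figure type switches sets between the schemes, a model trained on Scheme 1 sees all 100 colors, but never in the figure-type pairing used by Scheme 2.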