Meet TABFACT!

A large-scale dataset that pairs 16k Wikipedia tables, used as evidence, with 118k human-annotated statements, built to study fact verification over semi-structured evidence.

Why TABFACT?

HIGH-QUALITY

Mechanical Turk annotation + post-filtering

LARGE-SCALE

16k Wikipedia tables as evidence for verifying 118k human-annotated statements.

LOGIC-BASED

Natural language inference based on logical reasoning.

OPEN-DOMAIN

Reasoning over open-domain Wikitables.

Explore

We have designed an interface for viewing the data. Please click here to explore the dataset and have fun!

Example

In this task, you are given a Wikipedia table with its caption. The goal is to distinguish which statements are entailed by the table and which are refuted by it. An example is shown below:
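As a rough illustration of the task format, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the table, caption, and statements are invented rather than taken from the dataset, the class names `Statement` and `TableExample` are hypothetical, and the label convention (1 = entailed, 0 = refuted) is assumed. Each statement is paired with a small hand-written check to show the kind of logical operation (superlative, counting) that verification may require.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed label convention for this sketch only.
ENTAILED, REFUTED = 1, 0

Row = Dict[str, object]


@dataclass
class Statement:
    text: str                            # natural-language claim about the table
    label: int                           # gold label: ENTAILED or REFUTED
    check: Callable[[List[Row]], bool]   # hand-written logical check (illustrative)


@dataclass
class TableExample:
    caption: str
    rows: List[Row]
    statements: List[Statement]


# Invented toy table; not a real TabFact instance.
example = TableExample(
    caption="toy medal table (invented)",
    rows=[
        {"team": "A", "gold": 3, "silver": 1},
        {"team": "B", "gold": 1, "silver": 4},
        {"team": "C", "gold": 2, "silver": 2},
    ],
    statements=[
        Statement(
            "team A won the most gold medals",
            ENTAILED,
            # superlative: the row with the maximum gold count must be team A
            lambda rows: max(rows, key=lambda r: r["gold"])["team"] == "A",
        ),
        Statement(
            "two teams won more than 3 silver medals",
            REFUTED,
            # counting: number of rows with silver > 3 must equal 2
            lambda rows: sum(r["silver"] > 3 for r in rows) == 2,
        ),
    ],
)


def verify(ex: TableExample) -> List[int]:
    """Return a predicted label for every statement by running its check on the table."""
    return [ENTAILED if s.check(ex.rows) else REFUTED for s in ex.statements]


predictions = verify(example)
gold = [s.label for s in example.statements]
```

A real system would have to derive the check from the statement text itself. The hand-written lambdas stand in for that step only to make the entailed/refuted distinction concrete.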

Download (Train/Test Data, Code)

All the code and data are provided on GitHub. The leaderboard is hosted on CodaLab.

Statistics

The statements are collected through two channels, a simple one and a complex one, which involve reasoning at different difficulty levels. The proportion of higher-order semantics in the annotated statements for the two channels is shown below:

Paper

Please cite our paper as below if you use the TabFact dataset.


@inproceedings{2019TabFactA,
  title={TabFact: A Large-scale Dataset for Table-based Fact Verification},
  author={Wenhu Chen and Hongmin Wang and Jianshu Chen and Yunkai Zhang and Hong Wang and Shiyang Li and Xiyou Zhou and William Yang Wang},
  booktitle = {International Conference on Learning Representations (ICLR)},
  address = {Addis Ababa, Ethiopia},
  month = {April},
  year = {2020}
}

Acknowledgement

We sincerely thank Ice Pasupat for releasing his complex table-QA dataset and Victor Zhong for his WikiSQL dataset. Our work is deeply inspired by these brilliant papers. We also thank Jiawei Wu and Xin Wang for sharing their website template.