CausalBank is the dataset released with the paper Guided Generation of Cause and Effect. It contains 314 million cause-effect statement pairs scraped from the Common Crawl corpus using causal lexical patterns. The resources associated with the paper are available here.
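
To illustrate what matching a causal lexical pattern can look like, here is a minimal sketch in Python. The two regular expressions and the `extract_pair` helper are hypothetical and chosen only for illustration; they are not the pattern list used to build CausalBank, which is described in the paper.

```python
import re

# Illustrative patterns only; NOT the patterns used to build CausalBank.
# Each pattern exposes named groups "cause" and "effect".
PATTERNS = [
    # "<effect> because <cause>"
    re.compile(r"^(?P<effect>.+?),? because (?P<cause>.+?)\.?$", re.IGNORECASE),
    # "<cause>, therefore <effect>"
    re.compile(r"^(?P<cause>.+?),? therefore (?P<effect>.+?)\.?$", re.IGNORECASE),
]


def extract_pair(sentence):
    """Return (cause, effect) for the first matching pattern, else None."""
    text = sentence.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None


if __name__ == "__main__":
    print(extract_pair("The game was cancelled because it rained."))
    print(extract_pair("It rained, therefore the game was cancelled."))
    print(extract_pair("Hello world."))
```

Simple patterns like these over-match (for example, "because of" constructions), so a real pipeline would need a larger pattern set plus filtering.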

Download CausalBank Resources [(Google Drive link)]

To cite the dataset:

  title     = {Guided Generation of Cause and Effect},
  author    = {Li, Zhongyang and Ding, Xiao and Liu, Ting and Hu, J. Edward and Van Durme, Benjamin},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
               Artificial Intelligence, {IJCAI-20}},
  year      = {2020},


COD3S: Diverse Generation with Discrete Semantic Signatures is an approach for diverse candidate generation using models trained on CausalBank. The method applies locality-sensitive hashing (LSH) to sentence embeddings to construct sentence bit codes that encode semantic textual similarity. Training a model to condition on these codes provides a controllable knob for diversity. Code, training data and pretrained models are available here.
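
The idea of turning sentence embeddings into bit codes can be sketched as follows. This is a minimal illustration assuming a random-hyperplane (sign) variant of LSH, which the description above does not specify. The embedding dimension, the number of bits, and the synthetic vectors standing in for real sentence embeddings are all assumptions, as are the helper names `lsh_signature` and `hamming`.

```python
import numpy as np


def lsh_signature(embeddings, hyperplanes):
    """Map each embedding to a bit code.

    Bit i is 1 when the embedding lies on the positive side of
    random hyperplane i, and 0 otherwise.
    """
    projections = embeddings @ hyperplanes.T  # shape: (n_sentences, n_bits)
    return (projections > 0).astype(np.uint8)


def hamming(code_a, code_b):
    """Number of bit positions in which two codes differ."""
    return int(np.count_nonzero(code_a != code_b))


rng = np.random.default_rng(0)
dim, n_bits = 64, 16  # assumed sizes, chosen for illustration
hyperplanes = rng.standard_normal((n_bits, dim))

# Synthetic stand-ins for sentence embeddings (no real encoder is used here).
base = rng.standard_normal(dim)
near = base + 0.05 * rng.standard_normal(dim)  # small perturbation of base
opposite = -base                               # points the other way

codes = lsh_signature(np.stack([base, near, opposite]), hyperplanes)

if __name__ == "__main__":
    print("base vs near:    ", hamming(codes[0], codes[1]))
    print("base vs opposite:", hamming(codes[0], codes[2]))
```

Vectors separated by a small angle tend to agree on most bits, while a vector and its negation disagree on every bit, so Hamming distance between codes acts as a coarse proxy for embedding similarity.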

COD3S Overview

To cite COD3S:

title     = "{COD3S}: Diverse Generation with Discrete Semantic Signatures",
author    = {Weir, Nathaniel and Sedoc, Jo{\~a}o and Van {D}urme, Benjamin},
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
year      = "2020"