CausalBank is the dataset released with the paper "Guided Generation of Cause and Effect". It contains 314 million pairs of cause-effect statements scraped from the Common Crawl corpus using causal lexical patterns. The resources associated with the paper are available via the download link below.
Download CausalBank Resources (Google Drive link)
@inproceedings{ijcai2020-guided,
title = {Guided Generation of Cause and Effect},
author = {Li, Zhongyang and Ding, Xiao and Liu, Ting and Hu, J. Edward and Van Durme, Benjamin},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
year = {2020},
}
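The pattern-based extraction described for CausalBank can be illustrated with a minimal sketch. The two regular expressions below ("because" and "so") are hypothetical examples chosen for illustration, not the pattern list used in the paper, and `extract_pair` is a made-up helper name.

```python
import re

# Hypothetical causal patterns, for illustration only (assumption: these are
# NOT the paper's actual patterns). Named groups mark which span is the cause
# and which is the effect, so the order within the sentence does not matter.
PATTERNS = [
    # "<effect> because <cause>"
    re.compile(r"^(?P<effect>.+?),?\s+because\s+(?P<cause>.+?)[.!?]?$", re.IGNORECASE),
    # "<cause>, so <effect>"
    re.compile(r"^(?P<cause>.+?),?\s+so\s+(?P<effect>.+?)[.!?]?$", re.IGNORECASE),
]


def extract_pair(sentence):
    """Return (cause, effect) if a pattern matches the sentence, else None."""
    text = sentence.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None


if __name__ == "__main__":
    for s in [
        "The match was cancelled because it rained.",
        "It rained, so the match was cancelled.",
        "Nothing causal here.",
    ]:
        print(s, "->", extract_pair(s))
```

Both causal sentences above yield the pair `("it rained", ...)` with the effect text taken from the other clause, while the third sentence returns `None`. A real pipeline over web-scale text would need many more patterns plus filtering for false matches.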
"COD3S: Diverse Generation with Discrete Semantic Signatures" is an approach for diverse candidate generation using models trained on CausalBank. The method applies locality-sensitive hashing to sentence embeddings to construct sentence bit codes that encode semantic textual similarity. Training a model to condition on these codes provides a controllable knob for output diversity. Code, training data and pretrained models are available here.
@inproceedings{weir-etal-20-cod3s,
title = "{COD3S}: Diverse Generation with Discrete Semantic Signatures",
author = {Weir, Nathaniel and Sedoc, Jo{\~a}o and Van Durme, Benjamin},
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
year = "2020"
}
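The idea of hashing sentence embeddings into bit codes, as described for COD3S, can be sketched with random-hyperplane locality-sensitive hashing. This is one common LSH variant and is used here as an assumption; the embedding source, the number of bits, and the helper names (`make_hyperplanes`, `bit_code`, `hamming`) are illustrative and not taken from the paper or its code release.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def bit_code(embedding, hyperplanes):
    """Map an embedding to a bit string: one bit per hyperplane,
    set to 1 when the embedding lies on the non-negative side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, embedding))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


if __name__ == "__main__":
    # Toy 4-dimensional "embeddings" (assumption: real sentence embeddings
    # would come from a pretrained encoder and have hundreds of dimensions).
    planes = make_hyperplanes(dim=4, n_bits=8)
    v = [0.3, -1.2, 0.5, 2.0]
    near = [0.31, -1.19, 0.52, 1.98]   # small perturbation of v
    opposite = [-x for x in v]          # points in the opposite direction
    print(bit_code(v, planes), bit_code(near, planes), bit_code(opposite, planes))
    print(hamming(bit_code(v, planes), bit_code(opposite, planes)))
```

Under this scheme, embeddings separated by a small angle tend to share most bits, so the Hamming distance between codes acts as a rough proxy for semantic dissimilarity. A generator conditioned on such a code could then be steered toward different regions of the embedding space by varying the code.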