CausalBank is the dataset associated with the paper Guided Generation of Cause and Effect. It contains 314 million pairs of cause-effect statements scraped from the Common Crawl corpus using causal lexical patterns. The resources associated with the paper are available here.
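The pairs were mined with causal lexical patterns. As a rough illustration only, the Python sketch below splits a sentence around a few hypothetical cue patterns ("because", "so", "therefore") into a cause and an effect. The pattern list and the `extract_pair` helper are assumptions made for this example. They are not CausalBank's actual pattern inventory or extraction pipeline.

```python
import re
from typing import Optional, Tuple

# Hypothetical cue patterns for illustration only; NOT CausalBank's real pattern set.
# Each regex names its spans "cause" and "effect"; an optional final period is dropped.
CAUSAL_PATTERNS = [
    # "Because <cause>, <effect>."
    re.compile(r"^because\s+(?P<cause>.+?),\s+(?P<effect>.+?)\.?$", re.IGNORECASE),
    # "<effect> because <cause>."
    re.compile(r"^(?P<effect>.+?),?\s+because\s+(?P<cause>.+?)\.?$", re.IGNORECASE),
    # "<cause>, so <effect>."
    re.compile(r"^(?P<cause>.+?),?\s+so\s+(?P<effect>.+?)\.?$", re.IGNORECASE),
    # "<cause>, therefore <effect>."
    re.compile(r"^(?P<cause>.+?),?\s+therefore\s+(?P<effect>.+?)\.?$", re.IGNORECASE),
]


def extract_pair(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (cause, effect) for the first matching pattern, or None if no cue matches."""
    text = sentence.strip()
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None


if __name__ == "__main__":
    for s in ["The road was closed because it flooded.",
              "It rained, so the game was cancelled.",
              "The sky is blue."]:
        print(s, "->", extract_pair(s))
```

Surface patterns like these are noisy. For example, "so" also works as an intensifier ("I was so tired"), so a large-scale pipeline would likely need more constrained patterns or additional filtering.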

Download CausalBank Resources (Google Drive link)

COD3S (from the paper COD3S: Diverse Generation with Discrete Semantic Signatures) is an approach for generating diverse candidates with models trained on CausalBank. The method applies locality-sensitive hashing (LSH) to sentence embeddings to construct sentence bit codes that encode semantic textual similarity. Training a model to condition on these codes provides a controllable knob for diversity. Code, training data, and pretrained models are available here.
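To make the idea of LSH bit codes concrete, here is a minimal Python sketch that assumes random-hyperplane LSH (one standard LSH family for cosine similarity). Each bit records which side of a random hyperplane an embedding falls on, so two vectors at angle θ disagree on any given bit with probability θ/π. Nearby embeddings therefore tend to share most bits. The helper names, the vector dimension, the number of bits, and the random stand-in "embeddings" are all hypothetical and are not taken from the COD3S code.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Sample n_bits random hyperplane normals with i.i.d. standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(embedding: Sequence[float], hyperplanes: Sequence[Sequence[float]]) -> str:
    """Return one bit per hyperplane: '1' if the embedding is on its non-negative side."""
    bits = []
    for normal in hyperplanes:
        dot = sum(e * w for e, w in zip(embedding, normal))
        bits.append("1" if dot >= 0.0 else "0")
    return "".join(bits)


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    dim, n_bits = 64, 16  # illustrative sizes, not COD3S settings
    planes = make_hyperplanes(dim, n_bits, seed=42)
    rng = random.Random(7)
    # Stand-in "sentence embeddings": random vectors, not outputs of a real encoder.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    a_paraphrase = [x + 0.05 * rng.gauss(0.0, 1.0) for x in a]
    unrelated = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    sig = lsh_signature(a, planes)
    print("signature:", sig)
    print("distance to paraphrase:", hamming(sig, lsh_signature(a_paraphrase, planes)))
    print("distance to unrelated:", hamming(sig, lsh_signature(unrelated, planes)))
```

Because Hamming distance between codes tracks angular distance between embeddings, asking a model for outputs under different codes is one way to push it toward semantically different candidates. This is the intuition behind using the codes as a diversity knob.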

COD3S Overview

