SARA v2


Dataset

A GitHub repository accompanying the dataset contains instructions and scripts to convert the dataset into a variety of useful formats and to run a number of baseline experiments.

Argument identification and coreference

The annotations for argument identification and coreference are located under argument_placeholders/. Each line in these files is either a header of the form # subsection_id or a tuple span_start, span_end, argument_id. The integers span_start and span_end are 0-based, character-based indices into the statutes, and both ends are inclusive: the character at index span_end is part of the span.
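A minimal Python sketch of reading these annotations and recovering the annotated text. The exact delimiter inside a tuple is not specified above, so the parser assumes comma- and/or whitespace-separated fields, optionally wrapped in parentheses and quotes; parse_placeholders and span_text are hypothetical helper names, not part of the dataset's scripts.

```python
import re
from typing import Dict, List, Optional, Tuple

# (span_start, span_end, argument_id); span_end is inclusive.
Span = Tuple[int, int, str]


def parse_placeholders(lines: List[str]) -> Dict[Optional[str], List[Span]]:
    """Group span tuples under the most recent '# subsection_id' line."""
    spans: Dict[Optional[str], List[Span]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            spans.setdefault(current, [])
            continue
        # Assumption: fields are separated by commas and/or whitespace,
        # optionally wrapped in parentheses; quotes around the id are dropped.
        start, end, arg_id = re.split(r"[,\s]+", line.strip("()"), maxsplit=2)
        spans.setdefault(current, []).append(
            (int(start), int(end), arg_id.strip("'\""))
        )
    return spans


def span_text(statute: str, start: int, end: int) -> str:
    """Return the annotated substring; end is inclusive, so slice to end + 1."""
    return statute[start:end + 1]
```

The only subtle point is the inclusive end index: Python slices exclude their upper bound, so span_text adds 1 to span_end.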

Structure extraction

Annotations for structure extraction are under structure/ and follow a Prolog-like syntax with keyword arguments. For brevity, a keyword argument of the form ARG: ARG, where the argument name and its value are identical, is written simply as ARG. Subsection boundaries are in boundaries and, as for argument identification, are given as 0-based, character-based, inclusive indices.
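To make the ARG shorthand concrete, here is a rough Python sketch that expands a keyword-argument list back to explicit ARG: ARG pairs. It assumes, hypothetically, that the arguments of a call are comma-separated and that nested calls use parentheses or brackets; split_top_level and expand_shorthand are illustrative names, and the sample argument strings are invented.

```python
from typing import List


def split_top_level(args: str) -> List[str]:
    """Split on commas that are not nested inside parentheses or brackets."""
    parts: List[str] = []
    buf: List[str] = []
    depth = 0
    for ch in args:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    if buf:
        parts.append("".join(buf).strip())
    return [p for p in parts if p]


def expand_shorthand(args: str) -> List[str]:
    """Rewrite each bare ARG as ARG: ARG; explicit KEY: VALUE pairs are kept."""
    expanded = []
    for part in split_top_level(args):
        # Only a colon before any nested call marks an explicit KEY: VALUE pair.
        if ":" in part.split("(", 1)[0]:
            expanded.append(part)
        else:
            expanded.append(f"{part}: {part}")
    return expanded
```

For example, under these assumptions the list Taxpayer, Year: Y expands to Taxpayer: Taxpayer and Year: Y.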

Argument instantiation

Annotations for argument instantiation are in the file argument_instantiation. Each line contains four tab-separated fields: 1. the name of the case, 2. the name of the subsection to apply, 3. the input argument-value pairs, 4. the output argument-value pairs. Input argument-value pairs are a Python dictionary serialized as JSON, i.e. {argument: value}. Output argument-value pairs are represented in one of two ways, depending on whether the subsection applies. If it does not apply, the fourth field is simply the string false, which corresponds to the output argument-value pair {"@TRUTH": False}. If it applies, the output argument-value pairs have the same format as the input argument-value pairs, and {"@TRUTH": True} must be added to them.
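The two output conventions above can be handled with a short Python sketch that parses one line of argument_instantiation. Instantiation, parse_dict and parse_instantiation_line are hypothetical names; the fallback to ast.literal_eval is a defensive assumption in case a dictionary is written in Python literal syntax rather than strict JSON.

```python
import ast
import json
from typing import Any, Dict, NamedTuple


class Instantiation(NamedTuple):
    case: str
    subsection: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


def parse_dict(text: str) -> Dict[str, Any]:
    """Parse a serialized dictionary; fall back to Python literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return ast.literal_eval(text)


def parse_instantiation_line(line: str) -> Instantiation:
    """Split one line into its four tab-separated fields and decode them."""
    case, subsection, raw_in, raw_out = line.rstrip("\n").split("\t", 3)
    inputs = parse_dict(raw_in)
    if raw_out.strip() == "false":
        # Subsection does not apply: the only output is @TRUTH = False.
        outputs: Dict[str, Any] = {"@TRUTH": False}
    else:
        # Subsection applies: same format as the inputs, plus @TRUTH = True.
        outputs = parse_dict(raw_out)
        outputs["@TRUTH"] = True
    return Instantiation(case, subsection, inputs, outputs)
```

Normalizing both cases to a dictionary that always contains @TRUTH means downstream code can treat every output uniformly.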

Citation

Factoring Statutory Reasoning as Language Understanding Challenges
Nils Holzenberger and Benjamin Van Durme
ACL, 2021
