TempCourt: evaluation of temporal taggers on a new corpus of court decisions

María Navas-Loro; Erwin Filtz; Víctor Rodríguez-Doncel; Axel Polleres; Sabrina Kirrane

doi:10.1017/S0269888919000195

Abstract: The extraction and processing of temporal expressions (TEs) in textual documents have been extensively studied in several domains; however, for the legal domain it remains an open challenge. This is possibly due to the scarcity of corpora in the domain and to the particularities of legal documents that are highlighted in this paper. Considering the pivotal role played by temporal information when it comes to analyzing legal cases, this paper presents TempCourt, a corpus of 30 legal documents from the European Court of Human Rights, the European Court of Justice, and the United States Supreme Court with manually annotated TEs. The corpus contains two different temporal annotation sets that adhere to the TimeML standard: the first captures all TEs, and the second is dedicated to TEs that are relevant for the case under judgment (thus excluding dates of previous court decisions). The proposed gold standards are subsequently used to compare ten state-of-the-art cross-domain temporal taggers, and to identify not only the limitations of cross-domain temporal taggers but also limitations of the TimeML standard when applied to legal documents. Finally, the paper identifies the need for dedicated resources, for the adaptation of existing tools, and for specific annotation guidelines that can be adapted to different types of legal documents.


¹D3206 – Ontology Engineering Group, Universidad Politécnica de Madrid, Montegancedo Campus, Madrid, Spain e-mails: mnavas@fi.upm.es, vrodriguez@fi.upm.es

²Institute for Information Business, Vienna University of Economics and Business, Vienna, Austria e-mails: Erwin.Filtz@wu.ac.at, Axel.Polleres@wu.ac.at, Sabrina.Kirrane@wu.ac.at

Received: 25 February 2019

Revised: 06 November 2019

Accepted: 09 November 2019

Published online: 17 December 2019
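The abstract describes annotation sets that follow the TimeML standard, in which each TE is wrapped in a TIMEX3 tag carrying attributes such as `tid`, `type`, and a normalized `value`. The Python sketch below shows one way to pull those annotations, together with their character offsets in the tag-free text, out of a small TimeML-style fragment. It is a minimal illustration under stated assumptions, not the TempCourt tooling: the helper name `extract_timex3`, the example sentence, and the assumption that TIMEX3 tags are direct children of a single root element are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (start, end, text, type, value); offsets index the tag-free text, end exclusive
Timex = Tuple[int, int, str, Optional[str], Optional[str]]


def extract_timex3(fragment: str) -> List[Timex]:
    """Collect TIMEX3 annotations with character offsets into the plain text.

    Hypothetical helper: assumes a single root element whose TIMEX3 tags
    are direct children of it, as in simple inline TimeML text.
    """
    root = ET.fromstring(fragment)
    offset = len(root.text or "")  # text before the first child tag
    spans: List[Timex] = []
    for child in root:
        text = "".join(child.itertext())  # surface string inside the tag
        if child.tag == "TIMEX3":
            spans.append((offset, offset + len(text), text,
                          child.get("type"), child.get("value")))
        # advance past the tag's content and the text that follows it
        offset += len(text) + len(child.tail or "")
    return spans


if __name__ == "__main__":
    # Hypothetical sentence, not taken from the TempCourt corpus
    doc = ('<TEXT>The application was lodged on '
           '<TIMEX3 tid="t1" type="DATE" value="1999-05-12">12 May 1999</TIMEX3>'
           ' and the proceedings lasted '
           '<TIMEX3 tid="t2" type="DURATION" value="P2Y">two years</TIMEX3>'
           '.</TEXT>')
    print(extract_timex3(doc))
    # [(30, 41, '12 May 1999', 'DATE', '1999-05-12'), (69, 78, 'two years', 'DURATION', 'P2Y')]
```

Keeping offsets relative to the tag-free text makes it possible to compare annotations produced by different taggers over the same underlying document, independently of how each tool serializes its markup.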


Acknowledgments

This work was partially funded by a predoctoral grant from the Programa Propio de la Universidad Politécnica de Madrid and by a grant from the Consejo Social de la Universidad Politécnica de Madrid. This work was also supported by the Republic of Austria’s Federal Ministry for Digital and Economic Affairs and the Jubiläumsfonds der Stadt Wien.

https://www.iso.org/iso-8601-date-and-time-format.html.

http://www.timeml.org.

http://www.akomantoso.org/.

https://tempcourt.github.io/TempCourt/.

ISO 24617-1 Language Resource Management—Semantic Annotation Framework (SemAF)—Time and Events (SemAF Time and ISO-TimeML).

https://www.w3.org/TR/annotation-model/.

http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#.

For instance, diagnoses such as tumors, or medical tests, are relevant events that should appear in a medical doctor's timeline, as stated by Styler IV et al. (2014), but not in other types of text. Similarly, specific legal events such as preliminary rulings (explained in Section 3) in European judgments are always relevant to lawyers, although they never appear in other kinds of text.

Note that the same sentence contains two TEs that are attributed to two different temporal dimensions.

https://catalog.ldc.upenn.edu/docs/LDC2006T08/timeml_annguide_1.2.1.pdf.

http://eur-lex.europa.eu/.

https://hudoc.echr.coe.int

https://www.supremecourt.gov/.

https://echr.coe.int/Pages/home.aspx?p=ddisclaimerc=.

https://eur-lex.europa.eu/legal-content/EN/TXT/?uri%3dCELEX:32011D0833.

https://eur-lex.europa.eu/content/legal-notice/legal-notice.html#droits.

https://www.copyright.gov/title17/92chap1.html#105.

Regulation (EU) 2016/679.

http://www.timeml.org/timebank/documentation-1.2.html.

http://www.timeml.org/timebank/aquaint-timeml/aquaint_timeml_1.0.tar.gz.

https://catalog.ldc.upenn.edu/docs/LDC2006T08/timeml_annguide_1.2.1.pdf.

https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-timex2-guidelines-v0.1.pdf.

The functionality and the rules were not modified.

The final corpus can be downloaded at https://tempcourt.github.io/TempCourt/.

http://www.cs.york.ac.uk/semeval-2013/task1/data/uploads/timeml-validator-1.1a.tar.gz.

https://www.cs.york.ac.uk/semeval-2013/task1/.

https://www.cs.york.ac.uk/semeval-2013/task1/.

https://github.com/HeidelTime/heideltime/.

https://uima.apache.org.

https://github.com/stanfordnlp/CoreNLP/tree/master/src/edu/stanford/nlp/time.

https://nlp.stanford.edu/software/sutime.shtml#Extensions.

https://github.com/xszhong/syntime.

https://github.com/cnorthwood/ternip.

Precision: the fraction of the identified results that were correct.

Recall: the fraction of the results that should have been found that were correctly identified.

BIO: Beginning of, Inside of, or Outside of a time expression.

https://cleartk.github.io/cleartk/docs/module/cleartk_timeml.html.

https://github.com/tarsqi/ttk.

http://timeml.org/tarsqi/index.html.

https://github.com/nchambers/caevo.

https://github.com/hllorens/otip.

https://github.com/leondz/usfd2, https://code.google.com/archive/p/usfd2/.

https://bitbucket.org/kentonl/uwtime-standalone.

Except USFD2.

https://github.com/HeidelTime/heideltime/wiki/Evaluation-Results.

http://www.echr.coe.int/Documents/Note_citation_ENG.pdf.

Some cases, such as the distinctions between EQUAL_OR_LESS and LESS_THAN (for UWTime) and between LATE/END and EARLY/START (for TERNIP), were counted as errors.

The first two authors contributed equally to this work.
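The notes above define precision and recall and point to the SemEval-2013 Task 1 (TempEval-3) resources. As a rough illustration of how such scores can be computed for TE extraction, the Python sketch below compares system and gold character spans under a strict criterion (exact boundaries) and a relaxed one (any overlap), in the spirit of TempEval-3. It is a simplified re-implementation written for this explanation, not the official TempEval-3 scorer or the authors' evaluation code; the function names and the example spans are hypothetical.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall_f1(system: Iterable[Span], gold: Iterable[Span],
                        strict: bool = True) -> Tuple[float, float, float]:
    """Score TE spans; strict = exact match, relaxed = any overlap (simplified)."""
    sys_set: Set[Span] = set(system)
    gold_set: Set[Span] = set(gold)
    if strict:
        correct_sys = correct_gold = len(sys_set & gold_set)
    else:
        # precision counts system spans hitting some gold span; recall counts
        # gold spans hit by some system span (no one-to-one alignment enforced)
        correct_sys = sum(any(overlaps(s, g) for g in gold_set) for s in sys_set)
        correct_gold = sum(any(overlaps(g, s) for s in sys_set) for g in gold_set)
    precision = correct_sys / len(sys_set) if sys_set else 0.0
    recall = correct_gold / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(30, 41), (69, 78)]              # hypothetical gold TEs
    system = [(30, 41), (69, 72), (90, 95)]  # exact hit, partial hit, false positive
    for mode in (True, False):
        p, r, f = precision_recall_f1(system, gold, strict=mode)
        print("strict" if mode else "relaxed", f"P={p:.2f} R={r:.2f} F1={f:.2f}")
    # strict P=0.33 R=0.50 F1=0.40
    # relaxed P=0.67 R=1.00 F1=0.80
```

Relaxed matching in this sketch does not enforce a one-to-one alignment, so a single long system span covering two gold TEs counts towards both gold TEs in recall; an official scorer may handle such cases differently.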


Cite this article

María Navas-Loro, Erwin Filtz, Víctor Rodríguez-Doncel, Axel Polleres, Sabrina Kirrane. 2019. TempCourt: evaluation of temporal taggers on a new corpus of court decisions. The Knowledge Engineering Review. 34:5 doi: 10.1017/S0269888919000195
