Domain-Sensitive Temporal Tagging - Jannik Strötgen
Preface
Time matters! Whatever document we read, be it a news article, biography, some microblog, or a patient’s record, to name but a few examples, temporal information embedded in the documents typically helps us determine the course of events and actions, to correlate events, and eventually to get an overview of the documents’ content. Driven by the continuously increasing amount of textual data that is available on the Web, in electronic archives, and Intranet document repositories, the computer-supported analysis and exploration of textual data has become a necessity and also a challenge in numerous application domains. Named Entity Recognition (NER), that is, the task of information extraction that aims at detecting and classifying elements in some text into predefined classes, such as locations, persons, organizations, and temporal expressions, has become a cornerstone of tools and techniques that help to address this challenge.
Only in the past two decades has the topic of temporal tagging as a specific type of NER task become a major focus in research and development. Temporal tagging addresses the extraction, classification, and normalization of temporal expressions that occur in text documents, and it is the prerequisite for temporal information extraction. By now, the important role of temporal tagging has been well recognized in application domains such as text summarization, question answering, information retrieval, and topic detection and tracking. In these applications of temporal tagging, results can be as simple as the fully automated construction of a timeline of events detected in a document’s content or as complex as revealing the temporal discourse structure in documents.
To date, there is no book that provides a comprehensive overview of the various methods, tools, evaluation competitions, and challenges the tasks of temporal tagging are faced with in the presence of diverse types of textual data and application domains. This book aims at closing this gap. Starting from the very fundamental role and concepts of time in documents, it provides an up-to-date overview of annotation standards, techniques, and competitions for evaluating the quality of temporal taggers, annotated corpora (including non-English texts) used for evaluations and developments, as well as a detailed overview of temporal taggers.
As the title indicates, this book focuses particularly on temporal tagging of documents from different domains, including text data different from the well-studied domain of news articles. For this, we discuss the challenges and approaches temporal taggers have to consider when processing news-style, narrative-style, colloquial-style, and so-called autonomic-style documents, the latter covering documents that contain many temporal expressions that cannot be normalized to real points in time, but only according to some local or autonomic time frame. Examples of autonomic-style documents are specific types of scientific texts and literary works.
We believe that this book provides researchers, practitioners, and developers with a valuable resource for designing and improving temporal tagging techniques and tools, or just for applying them in a useful manner as part of more complex text analysis and exploration pipelines. While publicly available temporal taggers already provide sophisticated output for several application scenarios, there is still a lot of work in this area ahead of us. This book aims at providing a solid foundation on which such work can be built.
Jannik Strötgen and Michael Gertz
Saarbrücken, Germany and Heidelberg, Germany
July 2016