AU2021106681A4 - A model for event time extraction and its application to temporal question answering system - Google Patents


Info

Publication number
AU2021106681A4
Authority
AU
Australia
Prior art keywords
events
temporal
answer
event
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021106681A
Inventor
Vanitha Guda
Suresh Kumar Sanampudi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guda Vanitha Dr
Sanampudi Suresh Kumar Dr
Original Assignee
Guda Vanitha Dr
Sanampudi Suresh Kumar Dr
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guda Vanitha Dr, Sanampudi Suresh Kumar Dr
Priority to AU2021106681A
Application granted
Publication of AU2021106681A4
Legal status: Ceased
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

TITLE: "A MODEL FOR EVENT TIME EXTRACTION AND ITS APPLICATION TO TEMPORAL QUESTION ANSWERING SYSTEM" ABSTRACT A temporal Question Answering (QA) system that uses an improved domain-independent model/method for extracting and representing events, times, and event-time relations to increase performance, wherein the system comprises means for question processing, document processing, answer fusion, and answer extraction, with a temporal layer that follows a divide-and-conquer strategy. The question processing means scans an input query for temporal expressions (TE) based on the features of events and times, then categorizes the query as a simple or a complex question; a complex question is decomposed (split) by identifying the temporal signals that link simple events to form it. The document processing means preprocesses the collection of documents used for model evaluation; the datasets are described in a results section. The answer fusion means divides complex questions into smaller units, a Q-Focus and a Q-Restriction, obtained by an information processing engine. The answer extraction means supplies a list of candidate answers to the Q-Focus and to the Q-Restriction, which forms the input to the individual answer filtering task. Figure associated with Abstract is Fig. 1c.

Description

[Fig. 1c: Question split processing. Flowchart nodes: "Question"; "Verb in second part"; "When Verb is did + Q2 + occur?"; "Obtaining subject of Q1"; "Obtaining Q2 infinitive verb"; "When did + Q2 + with infinitive verb?"; "When did + Subject Q1 + Q2 with infinitive verb?"]
TITLE OF THE INVENTION "A MODEL FOR EVENT TIME EXTRACTION AND ITS APPLICATION TO TEMPORAL QUESTION ANSWERING SYSTEM"
Technical Field of the Invention
[001] The present invention relates to a domain-independent model for extracting and representing events, times, and event-time relations in natural language text. Further, the invention relates to a system that uses the improved domain-independent model to increase the performance of a temporal question answering system.
Background of the Invention
[002] In the present information era, natural language is one of the most significant sources describing the events that happen around the world. Events are mainly referred to by their location, participants, or the things involved. In real-world scenarios, however, an event is usually associated with time, and the time at which an activity occurs is a more essential dimension than its location or participants.
[003] Temporal information in natural language text defines when a situation occurs. Events that fall within a time period, or activities that change over time, are essential for many NLP applications. Extracting and representing event-time relations in natural language text helps to solve AI applications such as temporal question answering systems, text summarization, and information retrieval systems.
[004] For example, temporal question answering requires extracting the temporal information encoded in the original text. When answering questions about a person's current occupation, or about a specific incident at a particular time, the system may need to selectively determine which of several recorded occupations (states) or reported incidents (occurrences) fall within a given time stamp.
Brief Summary of the Invention
[005] The present invention relates to a domain-independent model for extracting and representing events, times, and event-time relations in natural language text. Further, the model is applied to a temporal question answering system in order to answer time-specific queries.
[006] Event extraction plays an important role in many natural language applications, such as text summarization, question answering systems, and named entity recognition.
[007] It is therefore a primary objective of the present invention to construct a domain independent model to extract events.
[008] It is a further objective of the present invention to build a model to extract lexical, syntactic, and semantic temporal expressions.
[009] It is a further objective of the present invention to build the event-time relationships in a structured form.
[010] It is a further objective of the present invention to apply the event-time relationships in temporal question answering (QA) systems.
[011] The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
Brief description of the Drawings
[012] Fig. 1a illustrates an architecture diagram of a temporal QA system according to an exemplary embodiment of the present invention;
[013] Fig. 1b illustrates a block diagram of question processing according to an exemplary embodiment of the present invention;
[014] Fig. 1c illustrates a block diagram of question split processing according to an exemplary embodiment of the present invention;
[015] Fig. 2a illustrates a block diagram of a hybrid (composite) model for event extraction according to an exemplary embodiment of the present invention;
[016] Fig. 2b illustrates a model for time expression extraction according to an exemplary embodiment of the present invention;
[017] Fig. 2c illustrates a model for event-time relations with an event-time graph according to an exemplary embodiment of the present invention.
Detailed Description of the Invention
[018] According to an exemplary embodiment of the present invention, a temporal Question Answering (QA) system is disclosed that uses an improved domain-independent model/method for extracting and representing events, times, and event-time relations to increase performance. A question posed to the QA system is processed by a temporal layer, which splits a complex question into several simple questions.
[019] In accordance with the exemplary embodiment of the present invention, the system identifies the components of the query for which answers can be extracted, based on the question type and the event and time features to which they belong. The answers obtained for these simple questions are then integrated to construct the final answer to the given complex temporal query.
[020] In accordance with the exemplary embodiment of the present invention, the tasks performed in the architecture are independent of each other but collectively provide an answer to a given temporal query.
[021] In accordance with the exemplary embodiment of the present invention, the system comprises a question processing means, a document processing means, an answer fusion means, and an answer extraction means, together with a temporal layer that follows a divide-and-conquer strategy.
[022] In accordance with the exemplary embodiment of the present invention, the question processing means scans an input query for temporal expressions (TE) based on the features of events and times, then categorizes the query as a simple or a complex question; a complex question is decomposed (split) by identifying the temporal signals that link simple events to form it.
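The split in [022] can be sketched as follows. This is a minimal illustration under stated assumptions: the signal list `TEMPORAL_SIGNALS` and the names `SplitQuestion` and `split_question` are hypothetical, the patent does not enumerate its temporal signals, and the verb rewriting shown in Fig. 1c ("When did + Q2 + occur?") is not reproduced.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical signal list; the patent does not enumerate its temporal signals.
TEMPORAL_SIGNALS = ("before", "after", "during", "while", "when", "since", "until")
SIGNAL_RE = re.compile(r"\b(" + "|".join(TEMPORAL_SIGNALS) + r")\b", re.IGNORECASE)


@dataclass
class SplitQuestion:
    q_focus: str                  # the part of the question that asks for the answer
    signal: Optional[str]         # temporal signal linking the two events, if any
    q_restriction: Optional[str]  # the event that constrains the answer in time


def split_question(question: str) -> SplitQuestion:
    """Split a complex temporal question at the first inner temporal signal.

    A leading wh-word such as "When" is not treated as a linking signal,
    so a question without an inner signal is returned as a simple question.
    """
    text = question.strip().rstrip("?").strip()
    for match in SIGNAL_RE.finditer(text):
        if match.start() == 0:
            continue  # "When did ...?" asks for a time; it does not link two events
        focus = text[: match.start()].strip()
        restriction = text[match.end():].strip()
        if focus and restriction:
            return SplitQuestion(focus + "?", match.group(1).lower(), restriction)
    return SplitQuestion(text + "?", None, None)
```

For example, "Who was the president of the club before the stadium was built?" yields the Q-Focus "Who was the president of the club?", the signal "before", and the Q-Restriction "the stadium was built".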
[023] In accordance with the exemplary embodiment of the present invention, the document processing means preprocesses the collection of documents used for model evaluation; the datasets are described in a results section.
[024] In accordance with the exemplary embodiment of the present invention, the answer fusion means divides complex questions into smaller units, a Q-Focus and a Q-Restriction, obtained by an information processing engine.
[025] In accordance with the exemplary embodiment of the present invention, a recomposition unit carries out individual answer filtering and comparison-based answer recomposition to obtain the final answer to the given query.
[026] In accordance with the exemplary embodiment of the present invention, the answer extraction means supplies a list of candidate answers to the Q-Focus and to the Q-Restriction, which forms the input to the individual answer filtering task.
[027] In accordance with the exemplary embodiment of the present invention, the said system selects only those answers that satisfy the temporal constraints obtained by the TE Identification and Normalization Unit.
[028] In accordance with the exemplary embodiment of the present invention, the system checks that the date of each answer is temporally compatible with the temporal tag (i.e., the date of the answer must lie within the date values of the tag; otherwise the answer is rejected).
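The check in [028] amounts to an interval-membership test. A minimal sketch, assuming each candidate answer already carries a single normalized date and that a year-level tag such as "1950" expands to 1 January through 31 December; the helpers `year_tag` and `filter_by_tag` are hypothetical names.

```python
from datetime import date
from typing import List, Tuple

Answer = Tuple[str, date]  # (candidate answer text, date attached to that answer)


def year_tag(year: int) -> Tuple[date, date]:
    """Expand a year-granularity temporal tag such as "1950" into its date interval."""
    return date(year, 1, 1), date(year, 12, 31)


def filter_by_tag(answers: List[Answer], tag: Tuple[date, date]) -> List[Answer]:
    """Keep answers whose date lies within the tag's dates (inclusive); reject the rest."""
    start, end = tag
    return [(text, when) for text, when in answers if start <= when <= end]
```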
[029] In accordance with the exemplary embodiment of the present invention, the answers that fulfil the constraints are passed to the Answer Re-Composition module.
[030] In accordance with the exemplary embodiment of the present invention, once the answers have been filtered using the signals and the ordering key, the results for the Q-Focus are compared with the answer to the Q-Restriction to determine whether they are temporally compatible.
[031] In accordance with the exemplary embodiment of the present invention, the temporal signal establishes the appropriate order between the answers to the Q-Focus and the Q-Restriction. By analysing the temporal compatibility between the list of candidate answers to the Q-Focus and the answer to the Q-Restriction, the system constructs the appropriate answer to the complex question.
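The recomposition in [030]-[031] can be sketched as an interval comparison driven by the signal. This is an assumption-laden illustration, not the patented procedure: every answer is assumed to hold over a (start, end) interval, and only three simplified relations, loosely modelled on Allen's interval relations, are supported; `compatible` and `recompose` are hypothetical names.

```python
from datetime import date
from typing import List, Tuple

Interval = Tuple[date, date]  # (start, end) of the period over which an answer holds


def compatible(focus: Interval, restriction: Interval, signal: str) -> bool:
    """Check a Q-Focus interval against the Q-Restriction interval for one signal."""
    f_start, f_end = focus
    r_start, r_end = restriction
    if signal == "before":
        return f_end <= r_start  # focus ends no later than the restriction starts
    if signal == "after":
        return f_start >= r_end  # focus starts no earlier than the restriction ends
    if signal in ("during", "while", "when"):
        return f_start <= r_end and r_start <= f_end  # the two intervals overlap
    raise ValueError(f"unsupported temporal signal: {signal!r}")


def recompose(focus_answers: List[Tuple[str, Interval]],
              restriction: Interval, signal: str) -> List[str]:
    """Return the Q-Focus answers that are temporally compatible with the Q-Restriction."""
    return [text for text, span in focus_answers if compatible(span, restriction, signal)]
```

Treating "when" and "while" as plain overlap is a design choice for the sketch; a fuller system would distinguish containment, meeting, and overlap.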
[032] In accordance with the exemplary embodiment of the present invention, the QA system addresses temporal queries, which are mostly complex questions; these are split into simple questions based on the number of events present in the question.
[033] In accordance with the exemplary embodiment of the present invention, the QA system obtains keywords from the question based on their parts of speech, filters the relevant sentences from the documents, ranks and sorts those sentences, and finally extracts the answer from the highest-ranked sentences based on the expected answer type analysed from the question.
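The keyword, filtering, and ranking steps in [033] can be sketched with keyword overlap as the ranking score. Both the scoring and the stop-word list `STOP_WORDS` are assumptions: the list is a crude stand-in for the part-of-speech-based keyword selection described above, and answer extraction by expected answer type is not shown.

```python
import re
from typing import List, Set, Tuple

# Hypothetical stop-word list standing in for POS-based keyword selection
# (a real pipeline would keep the nouns, verbs, and numbers found by a tagger).
STOP_WORDS = {"who", "what", "when", "where", "which", "was", "is", "were", "the",
              "of", "in", "on", "a", "an", "did", "do", "does", "to"}


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def keywords(question: str) -> Set[str]:
    return {w for w in tokens(question) if w not in STOP_WORDS}


def rank_sentences(question: str, sentences: List[str]) -> List[Tuple[int, str]]:
    """Filter sentences sharing at least one keyword, then rank them by overlap."""
    keys = keywords(question)
    scored = [(len(keys & set(tokens(s))), s) for s in sentences]
    relevant = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so sentences with equal scores keep their document order
    return sorted(relevant, key=lambda pair: pair[0], reverse=True)
```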
[034] In accordance with the exemplary embodiment of the present invention, a domain-independent model/method of extracting and representing events, times, and event-time relations to increase the performance of a temporal question answering system comprises a step of extracting events with hand-coded rules and machine learning techniques, combined as a composite/hybrid way of extracting events.
[035] In accordance with the exemplary embodiment of the present invention, wherein the hybrid way of extracting events takes plain text and preprocesses it to obtain valid tokens (preprocessing involves stop-word elimination and the extraction of lexical, morphological, and syntactic features).
[036] In accordance with the exemplary embodiment of the present invention, the hybrid way of extracting events involves the steps of: preprocessing the text with a POS tagger, which tags verbal entities as VB (verbs); running the rules algorithm to obtain the lexical features; performing syntactic and morphological analysis with WordNet to identify the different senses of a word, so that a token appearing as both a noun and a verb can be defined as an event using the composite rules; running the CRF-based Stanford Named Entity (NE) tagger to tag the remaining unidentified events; and running the composite rules to identify non-verbal events.
[037] In accordance with the exemplary embodiment of the present invention, the method further comprises the step of detecting events in non-verbal form by combining hand-coded rules with machine learning techniques, since machine learning techniques alone capture the semantics but not the non-verbal events.
[038] In accordance with the exemplary embodiment of the present invention, wherein event extraction with hand-coded rules is based on a rule algorithm formed by considering events that take the form of actions, activities, occurrences, or states.
[039] In accordance with the exemplary embodiment of the present invention, wherein event extraction with hand-coded rules is based on properties such as reporting, perception, state, and occurrence, and on the lexical features of 34 classes of POS tags.
[040] In accordance with the exemplary embodiment of the present invention, wherein, to extract events from a given text, it is important for the method to know the event features.
[041] In accordance with the exemplary embodiment of the present invention, wherein the system and method use custom features: to detect specific knowledge and features of an event, the associated time, places, and participants of the event are checked.
[042] In accordance with the exemplary embodiment of the present invention, event extraction uses hand-coded rules, which are good at identifying events in verb form but fail to identify non-verbal events. Step 2 of the method detects events in non-verbal form. Here, the inventors combined hand-coded rules with machine learning techniques, which capture the semantics but still miss the non-verbal events.
[043] In accordance with the exemplary embodiment of the present invention, the improved method, also called the hybrid/composite method, uses hand-coded rules together with machine learning techniques, which differs from prior-art event extraction strategies.
[044] Events Extraction with Machine Learning Techniques and Hybrid/Composite Rules
This section explains step 2, in which machine learning techniques such as Conditional Random Fields (CRF), Semantic Role Labeling (SRL), and WordNet are used to extract events. Here the semantics, i.e., all senses of a token, are identified, but non-verbal events are not recognized. In step 3, hand-coded rules and machine learning techniques are combined as a composite or hybrid way of extracting events.
i) Text Pre-processing: normalization of the text, including tokenization and part-of-speech tagging.
ii) Feature Identification: the rules approach is combined with machine learning techniques. The rules approach derives the lexical features, while the syntactic and semantic features are identified using machine learning techniques such as Conditional Random Fields (CRF) and Semantic Role Labeling (SRL).
a) Using the CRF technique: CRF is implemented with the C++ CRF++ package, an open-source implementation for segmenting (labeling) sequential data. The CRF model is trained with the following features: the POS (part of speech) of the token; the tense of the token; the aspect of the token, such as progressive, perfective, or none (e.g., "Still, I am doing the same job"); the event state, which considers occurrence; and the event stem.
b) Semantic Role Labeling (SRL): a method based on event nominalization, which focuses on the events or target class. If a word can be treated as both a noun and a verb, two tasks need to be performed: identifying the arguments related to the word, and argument labeling. The major feature of SRL is that it extracts all constituents of a word by determining its arguments and adjuncts. The two important tasks of SRL are: (1) event and result nominalization, the process of obtaining the constituents and nominals of a word from the bulk of non-verbal nouns; and (2) identification of nominals with the help of their suffixes (-ed, -or, -ee, -er).
iii) Hybrid/Composite Rules Method: consider plain text T and preprocess it to obtain valid tokens (preprocessing involves stop-word elimination and the extraction of lexical, morphological, and syntactic features). Events are extracted with the following steps:
1. Perform preprocessing with a POS tagger, which tags verbal entities as VB (verbs).
2. Run the rules algorithm to obtain the lexical features.
3. Perform syntactic and morphological analysis using machine learning with WordNet, to identify the different senses of a word so that a token appearing as both a noun and a verb can be defined as an event using the composite rules.
4. Run the CRF-based Stanford Named Entity (NE) tagger, which tags the remaining unidentified events.
5. Run the composite rules to identify non-verbal events.
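The CRF features listed under a) can be written out as a CRF++ training file, which takes one token per line with whitespace-separated feature columns, the gold label in the last column, and a blank line after each sentence. The sketch below assumes Penn Treebank tags are already available and uses a hypothetical tense/aspect mapping (VBD as past; be + VBG as progressive; have + VBN as perfective); the event-state and event-stem features, and the CRF++ feature template, are omitted.

```python
from typing import List, Tuple

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
HAVE_FORMS = {"have", "has", "had", "having"}


def tense(pos: str) -> str:
    """Coarse tense from a Penn Treebank verb tag (an assumed mapping)."""
    if pos == "VBD":
        return "PAST"
    if pos in ("VBZ", "VBP"):
        return "PRESENT"
    return "NONE"


def aspect(tagged: List[Tuple[str, str]], i: int) -> str:
    """PROGRESSIVE for be + VBG, PERFECTIVE for have + VBN, else NONE."""
    pos = tagged[i][1]
    j = i - 1
    while j >= 0 and tagged[j][1].startswith("RB"):
        j -= 1  # skip adverbs such as "still" between auxiliary and verb
    aux = tagged[j][0].lower() if j >= 0 else ""
    if pos == "VBG" and aux in BE_FORMS:
        return "PROGRESSIVE"
    if pos == "VBN" and aux in HAVE_FORMS:
        return "PERFECTIVE"
    return "NONE"


def crf_rows(tagged: List[Tuple[str, str]], labels: List[str]) -> str:
    """One tab-separated row per token (features, then the gold label last),
    with a blank line closing the sentence, as CRF++ training files expect."""
    rows = [
        f"{word}\t{pos}\t{tense(pos)}\t{aspect(tagged, i)}\t{label}"
        for i, ((word, pos), label) in enumerate(zip(tagged, labels))
    ]
    return "\n".join(rows) + "\n\n"
```

For the example sentence "Still, I am doing the same job", the row for "doing" becomes `doing VBG NONE PROGRESSIVE EVENT` (tab-separated) when "doing" is labeled as the event.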
The composite rules are:
Rule 1 (nominalizations): nouns morphologically derived from verbs are distinguished as nominalizations (non-verbal nouns). Non-verbal nouns are identified by their suffixes (-tion, -ion, -ing, -ed); tokens that are not NEs but end with these suffixes are considered event words.
Rule 2 (verb and noun): if a token is both a verb and a noun, its combinations are searched in the sentences of the test set, and non-NE noun words are considered events.
Rule 3 (nominals and non-verbal nouns): nominal and non-verbal event nouns can be identified as complements of aspectual PPs headed by prepositions such as during, after, before, at the end of, and at the beginning of; these prepositions are clues, and the words following them are also events.
Rule 4 (event nouns): event nouns can also appear as objects of aspectual and time-related verbs (e.g., "began a hotel" or "carried out the test"); a non-NE that appears after these expressions is also an event.
Rule 5 (token failure): any token that fails all of the above rules is treated as a non-event.
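Rules 1 to 5 can be sketched over tokens that already carry a POS tag and an NE label. This is a simplified illustration, not the inventors' implementation: the set `VERB_AND_NOUN` is a hypothetical stand-in for the WordNet lookup behind Rule 2, the clue lists contain only the examples given above, and determiners between a clue and the noun are skipped (an added assumption so that "during the war" matches).

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, Penn Treebank POS tag, NE label or "O")

NOMINAL_SUFFIXES = ("tion", "ion", "ing", "ed")                              # Rule 1
VERB_AND_NOUN = {"attack", "visit", "release", "test", "launch"}             # Rule 2 (hypothetical WordNet stand-in)
PREP_CLUES = ("during", "after", "before", "at the end of", "at the beginning of")  # Rule 3
ASPECTUAL_VERBS = ("began", "begin", "started", "finished", "carried out")          # Rule 4


def ends_with_clue(words: List[str], end: int, clues) -> bool:
    """True if words[:end] ends with one of the (possibly multi-word) clues."""
    prefix = [w.lower() for w in words[:end]]
    return any(prefix[-len(c.split()):] == c.split() for c in clues)


def extract_events(tokens: List[Token]) -> List[str]:
    words = [w for w, _, _ in tokens]
    events = []
    for i, (word, pos, ne) in enumerate(tokens):
        if pos.startswith("VB"):
            events.append(word)  # verbal event, as tagged VB by the POS tagger
            continue
        if not pos.startswith("NN") or ne != "O":
            continue  # Rules 1-4 apply only to non-NE nouns; others fall to Rule 5
        j = i
        while j > 0 and tokens[j - 1][1] == "DT":
            j -= 1  # skip determiners, e.g. "during the war"
        lower = word.lower()
        if (lower.endswith(NOMINAL_SUFFIXES)                    # Rule 1
                or lower in VERB_AND_NOUN                        # Rule 2
                or ends_with_clue(words, j, PREP_CLUES)          # Rule 3
                or ends_with_clue(words, j, ASPECTUAL_VERBS)):   # Rule 4
            events.append(word)
        # Rule 5: any token failing all rules is a non-event
    return events
```

Applied to "The company began a hotel after the war in Paris .", this returns "began" (verbal), "hotel" (Rule 4), and "war" (Rule 3), while "Paris" is skipped as a named entity.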
[045] Time Extraction from Natural Language Text. Information associated with time is called temporal information; identifying and extracting it from raw text is fundamental to language understanding. Extraction of temporal information is key to many applications, such as textual entailment, question answering, dialog systems, and document summarization.
[046] The identification and extraction of time expressions, together with existing methods and systems, are explained herein. The inventors present a system for extracting temporal expressions and temporal events, i.e., events associated with a time expression. Time information is mainly available in two forms: quantitative and qualitative.
[047] The Web is the main source for language processing applications, so temporal information must be extracted from the given text to address time-related information needs. For example, in question answering over news data, a time-based query such as "Who was the President of India in December 1950?" requires the system to search a document set that talks about the president from 1947 to 1952. In this context, a temporally aware system helps the QA system identify the president during December 1950.
[048] Similarly, in the medical domain, doctors maintain records of each patient's medical observations. The information in a record need not be in chronological order, but ordering it within a timeline structure helps the practitioner understand the patient's medical history. Extracting temporal information also benefits other NLP applications, such as the legal domain and other time-based searches. However, most earlier research on temporal information processing was carried out on news data because of the availability of large corpora and the presence of temporal expressions.
[049] Implementation Model for Extraction of Time Expressions. In text, time expressions appear in two forms: quantitative time, in which tokens express time directly, and qualitative time, which represents time semantically.
[050] We implemented two methods. The first writes several pattern-based rules using regular expressions (Regex) to identify quantitative and qualitative times. To capture the semantics of qualitative time expressions, the pattern-based rules are combined with a specific holiday package added to the existing one. Fig. 2b shows the model for pattern-based rules; its major components are input text, normalization, event extraction, and time extraction with rules.
i) Input text: the text is taken as input in the form of a document or a pool of documents.
ii) Text Normalization: basic preprocessing of the text. It involves tokenization, which includes lexical, morphological, syntactic, and formatting features.
• Lexical features: removal of unwanted symbols and stop-word elimination to identify valid tokens.
• Syntactic features: the basic syntax of a token is constructed from the syntactic chunk to which the token belongs, with the help of Parts of Speech (POS) tagging, which can be implemented using the Stanford OpenNLP Parser [] (e.g., IPP for inside proper pronoun, VBG for a verb in -ing form).
• Formatting features: a set of flags indicating how the text is formatted (for example, whether the token is all caps like "FRI", all dots like "F.C.I", all digits like "2012", initial caps like "March", or any combination of these).
iii) Times Extraction with Rules: the major component of our implementation, in which we built patterns for identifying and extracting time expressions. For extraction, the various types of time expressions in the available text must first be identified, since time may appear in different variations, such as a timestamp or a duration. The variant time expressions are:
(a) Types of Time Expressions: time expressions can be a date, a time, a duration, or a set (frequency) time:
• Date Expressions: a date expression refers to a point in time at the granularity of a day (e.g., "March 30, 1980") or any coarser granularity, such as a month (e.g., "March 1980") or a year ("1980"). Most date expressions are calendar dates (e.g., "January 4") or verbal expressions that can be mapped to calendar dates (e.g., this week, last month, next Friday, or this time).
• Time Expressions: a time expression refers to a point in time at a granularity smaller than a day, such as a part of a day (e.g., "Friday morning") or a time of day (e.g., "5:50pm"). In other words, TIME is used to represent specific time points within a day (e.g., 4.05am) or relative times (e.g., 20 min ago).
• Duration Expressions: a duration expression provides information about the length of an interval, i.e., the amount of intervening time between the two end points of a time interval. Examples include "last two months onwards" and "two hours".
• Set Frequency or Range Expressions: a set expression refers to the periodic aspect of an event (e.g., "every Friday" or "thrice a day"). Medical documents such as discharge summaries contain many frequency terms, most of them represented by Latin abbreviations such as "tid" (thrice a day) and "q4h" (every four hours).
• Implicit and Explicit Times: time classifications are mostly described as explicit or implicit representations; the Date and Time expressions above are explicit forms, while durations, points, and intervals are implicit forms.
• Time can occur in text as any of the above types, so for extraction it is necessary to identify the type and its context. Our pattern-based rules for extracting times from text are formed using the above classification of time expressions, as explained in Sect. 4.
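The formatting features listed under ii) can be computed as independent boolean flags, so combinations may co-occur. A minimal sketch; the flag names and the `formatting_features` function are hypothetical, and "all dots" is interpreted as single letters separated by periods, as in "F.C.I".

```python
import re

DOTTED_RE = re.compile(r"[A-Za-z](?:\.[A-Za-z])+\.?")  # e.g. "F.C.I" or "U.S."


def formatting_features(token: str) -> dict:
    """Flags for the formatting features listed above; several may be true at once."""
    return {
        "all_caps": token.isalpha() and token.isupper(),           # "FRI"
        "all_dots": DOTTED_RE.fullmatch(token) is not None,        # "F.C.I"
        "all_digits": token.isdigit(),                             # "2012"
        "init_caps": token[:1].isupper() and token[1:].islower(),  # "March"
    }
```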
[051] Work Model for Time Extraction. Two popular methods are used to extract times: the TIMEX3 tag from the Time Markup Language (TimeML), and the Java library SuTime. TimeML TIMEX3 is annotation based; its limitation is that time is represented purely through annotation, and only quantitative time expressions are represented.
SuTime is another method; it extracts both quantitative and qualitative time expressions, but the semantics of some time expressions, such as specific holiday events (like Independence Day and Mother's Day) and durations, are not recognized by SuTime.
In our workflow, the limitations of SuTime and TimeML are addressed in two ways: one using SuTime and the other without SuTime. With SuTime, time expressions are recognized, but semantic or specific times are not handled. Without SuTime, our pattern rules are used for quantitative and qualitative times, and semantic times are also recognized by adding holiday package rules.
Example:
Temporal expression | Type | Normalized
December 3, 1884 | Explicit time, DATE | 1884-12-03
from 1952 to 1962 | RANGE | 1952/1962
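The two normalizations in the example can be reproduced with a few explicit patterns. A minimal sketch, assuming English month names (Python's `%B` directive is locale-dependent) and covering only the day, month, and year granularities mentioned under Date Expressions plus the "from X to Y" range; `normalize` is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Optional, Tuple


def normalize(expr: str) -> Optional[Tuple[str, str]]:
    """Return (type, normalized value) for a few explicit patterns, else None."""
    text = expr.strip()
    m = re.fullmatch(r"from\s+(\d{4})\s+to\s+(\d{4})", text, flags=re.IGNORECASE)
    if m:
        return "RANGE", f"{m.group(1)}/{m.group(2)}"
    if re.fullmatch(r"\d{4}", text):
        return "DATE", text  # year granularity: "1980"
    for fmt, granularity in (("%B %d, %Y", "day"), ("%B %Y", "month")):
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        if granularity == "day":
            return "DATE", parsed.isoformat()  # "1884-12-03"
        return "DATE", f"{parsed.year:04d}-{parsed.month:02d}"  # "1980-03"
    return None  # relative or qualitative expressions need other rules
```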
Using SuTime: after the normalization task, the Rules component uses SuTime to extract times in all formats; to overcome the limitation of SuTime, we add a specific holiday package to SuTime (sample rules of the holiday package are given with the algorithm).
Input: set of normalized tokens
Output: time expressions (Td) identified in the text
Terms: Te: explicit time; Ti: implicit time; Th: holiday time
1. Initialize RESULT = {}.
2. For each normalized token t in the corpus:
   a. if t.timestamp ∈ Te, map the Te rules for quantitative times;
   b. else if t.timestamp ∈ Ti, map the Ti rules for qualitative times;
   c. else if t.timestamp ∈ Th, map the Th rules for specific holiday times;
   d. if the timestamp of t matches Te, Ti, or Th, the token is called td, with td ⊆ (Tt, Da, Du, Co) (Tt: times; Da: dates; Du: durations/intervals; Co: composite time); the derived td can be mapped to any one of the rules (sample rules for Te, Ti, and Th are given below); add td to RESULT;
   e. otherwise, t is a non-temporal word (t ∉ Td).
3. Return RESULT.
Without using SuTime: quantitative and qualitative times are handled with pattern-based rules that recognize all types of times, including the holiday package. In the algorithm above, each token t is matched against the Te, Ti, and Th pattern rules, which return the time expressions. Dates, times, durations, and specific holiday times must all be written as regular patterns of the defined categories.
Sample Rules
TeRules: map a token that matches the regular expression for quantitative times ("years"):
{ruleType: "token te ∈ Te", pattern: (/years?/), result: YEAR} (Years: ([]+?[0-9]+))
{ruleType: "token te ∈ Te", pattern explicit: (/times?/), result: times} (Years: [1]+?[0-9]+)
TiRules: map a token that matches the regular expression for qualitative times:
{ruleType: "token te ∈ Ti", pattern: (($num) /to|-/ ($num)), result: Duration($1, $2)} // can be extended with a third group ($3) using Allen's interval algebra
ThRules: map a token to specific holiday times:
{ruleType: "token te ∈ Ti ∧ te ∈ Ti", "filter", pattern: ([{word: /Independence day|spring|Good Friday|march|may/} & !{tag: /NN.*/}])}
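The Te/Ti/Th dispatch in the algorithm above can be approximated with plain regular expressions. This sketch is not SuTime and not the authors' rule set: the rule tables are hypothetical, only loosely mirror the sample rules, and run over the raw text rather than token by token so that multi-word expressions match. Rules are tried in the order Te, Ti, Th, and a span claimed by an earlier rule is not matched again.

```python
import re
from typing import List, Tuple

# Hypothetical rule tables loosely mirroring the Te / Ti / Th families above.
TE_RULES = [  # explicit (quantitative) times
    (re.compile(r"\b\d{1,2}[:.]\d{2}\s?(?:am|pm)\b", re.I), "TIME"),
    (re.compile(r"\b(?:1[0-9]|20)\d{2}\b"), "YEAR"),
]
TI_RULES = [  # implicit (qualitative) times
    (re.compile(r"\b\d+\s*(?:to|-)\s*\d+\s+(?:years?|months?|days?|hours?)\b", re.I), "DURATION"),
    (re.compile(r"\b(?:\d+|one|two|three|four|five)\s+(?:years?|months?|weeks?|days?|hours?|minutes?)\b", re.I), "DURATION"),
    (re.compile(r"\b(?:last|next|this)\s+(?:week|month|year|monday|friday)\b", re.I), "DATE"),
]
TH_RULES = [  # holiday package
    (re.compile(r"\b(?:independence day|good friday|mother's day|christmas)\b", re.I), "HOLIDAY"),
]


def extract_times(text: str) -> List[Tuple[str, str, str]]:
    """Apply Te, then Ti, then Th rules; return (expression, family, type) in text order."""
    found = []
    for family, rules in (("Te", TE_RULES), ("Ti", TI_RULES), ("Th", TH_RULES)):
        for pattern, label in rules:
            for m in pattern.finditer(text):
                if any(m.start() < end and start < m.end() for start, end, *_ in found):
                    continue  # span already claimed by an earlier rule
                found.append((m.start(), m.end(), m.group(0), family, label))
    return [(expr, family, label) for _, _, expr, family, label in sorted(found)]
```

For "The meeting on Good Friday at 5:50pm lasted two hours in 2012.", this yields the holiday "Good Friday" (Th), the time "5:50pm" (Te), the duration "two hours" (Ti), and the year "2012" (Te).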
Rule-based Normalization: after the rules have been applied, basic normalizations are required to extract the time values. Rule normalization includes:
• Lexical Map: basic mappings, e.g. from names to numbers or from units to ISO values.
• Context Categorization: classifying a time expression, i.e. determining whether it is a basic time point or a duration, a forward or backward reference, and a specific or generic reference.
• Reference Time / Anaphoric Time: normalizing times whose values must be computed with respect to a reference time.
• Normalized Computation: combining the results of all of these steps to produce a final normalized value.
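The four normalization steps can be illustrated with a small sketch. The tables `MONTHS`, `NUMBER_WORDS`, `UNIT_CODES`, `DAY_OFFSETS` and the function `normalize` are hypothetical; the output values follow ISO 8601 conventions (e.g. `P3Y` for a three-year duration), and only a handful of patterns are covered.

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple

# Lexical map (hypothetical, minimal): names -> numbers, units -> ISO 8601 codes
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july", "august",
     "september", "october", "november", "december"], start=1)}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}
# Anaphoric expressions resolved as day offsets from the reference time
DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalize(expr: str, ref: date) -> Tuple[str, Optional[str]]:
    """Return (context category, ISO 8601 value) for a few illustrative patterns."""
    text = expr.lower().strip()
    # Reference/anaphoric time: value computed relative to the reference date
    if text in DAY_OFFSETS:
        return ("point", (ref + timedelta(days=DAY_OFFSETS[text])).isoformat())
    m = re.fullmatch(r"(next|last) year", text)
    if m:
        step = 1 if m.group(1) == "next" else -1   # forward vs. backward reference
        return ("point", str(ref.year + step))
    # Durations such as "three years" -> P3Y (lexical map for the number word)
    m = re.fullmatch(r"(\w+) (day|week|month|year)s?", text)
    if m and (m.group(1).isdigit() or m.group(1) in NUMBER_WORDS):
        n = int(m.group(1)) if m.group(1).isdigit() else NUMBER_WORDS[m.group(1)]
        return ("duration", f"P{n}{UNIT_CODES[m.group(2)]}")
    # Explicit dates such as "10th December 2018" (lexical map for the month name)
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)? ([a-z]+) (\d{4})", text)
    if m and m.group(2) in MONTHS:
        try:
            return ("point", date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1))).isoformat())
        except ValueError:
            pass  # impossible calendar date, e.g. "31st February 2018"
    return ("unknown", None)
```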
[052] Event-Time Relation Extraction
With the exponential growth of information on the World Wide Web, the events described in text are often unclear in nature. Much of the available information contains many events; some of them fall within a time period and some do not. Events with time constraints are termed temporal events, and the automatic recognition and extraction of events related to time requires identifying the event-time relation. Temporal events have become an active research topic in Natural Language Processing (NLP). Extracting event-time relations is a basic component of temporal knowledge and is useful for many applications, such as question answering systems and temporal summarization.
Temporal events are defined as actions that happen or occur at a particular time and place. The extracted temporal events can be applied to various domains, such as news data, manuscripts, blogs, and biological and legal texts. Currently available information extraction systems focus on extracting static facts and encode the extracted facts as binary relations.
In a real-world context, dynamic facts need to be extracted. A dynamic relation that holds under a time constraint is called a fluent, e.g. President(Trump, USA) with the temporal scope (20/01/2017-20/01/2021). When a user wants information about a particular period of time, it is difficult to extract that information from natural language text.
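A fluent can be represented as a relation plus its temporal scope, which turns a period-restricted query into a simple interval-overlap test. The class `Fluent`, the function `facts_during` and the example facts are hypothetical illustrations, assuming inclusive, day-level start and end dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Fluent:
    """A relation together with the temporal scope in which it holds."""
    relation: str
    args: tuple
    start: date   # first day the relation holds (inclusive)
    end: date     # last day the relation holds (inclusive)

    def holds_at(self, t: date) -> bool:
        return self.start <= t <= self.end

    def overlaps(self, start: date, end: date) -> bool:
        return self.start <= end and start <= self.end


def facts_during(fluents: List[Fluent], start: date, end: date) -> List[Fluent]:
    """Return the fluents whose temporal scope overlaps the queried period."""
    return [f for f in fluents if f.overlaps(start, end)]
```

For example, with the hypothetical facts WorksAt(Alice, AcmeCorp) for 2015-2018 and WorksAt(Alice, Globex) from mid-2018, a query for the year 2018 returns both facts, while a query for early 2020 returns only the second.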
In information systems, data is available in various representations, such as semi-structured or unstructured form; most available Information Extraction (IE) systems extract events from semi-structured data or from a closed domain. Temporal event identification and extraction for unstructured data could benefit IE systems in various ways: such events could enhance the performance of personalized news systems based on user preferences or on identified topics or concepts. Further, they can be useful in question answering, risk analysis applications, monitoring systems and decision-making support tools.
For event-time relations, the following assumptions are made:
• A timestamp may not exist for every event or relation in a document: events and relations typically have a temporal context, but it may not be stated explicitly.
• Each event or relation has at most one time expression associated with it; otherwise it is treated as a non-temporal event.
• Each temporal expression can be linked to one or more events or relations, since multiple events or relations may happen at a given time.
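The three assumptions above can be encoded in a small index that links each event to at most one time expression while letting one time expression collect several events. `EventTimeIndex` and its methods are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class EventTimeIndex:
    """Event -> at most one time; time -> any number of events."""

    def __init__(self) -> None:
        self.time_of: Dict[str, Optional[str]] = {}
        self.events_at: Dict[str, List[str]] = defaultdict(list)

    def add_event(self, event: str, time: Optional[str] = None) -> None:
        current = self.time_of.get(event)
        if current is not None:
            if time is not None and time != current:
                # an event may carry at most one time expression
                raise ValueError(f"{event!r} is already linked to {current!r}")
            return  # already linked; nothing to change
        self.time_of[event] = time          # None: no explicit timestamp
        if time is not None:
            self.events_at[time].append(event)

    def is_temporal(self, event: str) -> bool:
        return self.time_of.get(event) is not None
```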
[053] Methods to Extract Event-Time Relations
As an initial step, traditional rule-based systems with hand-coded rules were attempted for event-time relations, together with low-level NLP techniques. The major difficulty with rules is writing them: it is hard to write rules that interpret all aspects of natural language text. Rule-based systems focus only on the syntactic features of the language, and a large amount of human interpretation is involved in analysing the text.
The next approach, data-driven methods, uses the syntactic and semantic features of natural language text; however, data-driven methods need statistical measures to recognize and classify events and times, and they need a large training corpus to build models that extract event-time relations.
The hybrid method is among the more efficient methods compared with the rule-based and data-driven methods. It addresses the limitations of both and combines their features to improve the performance of the system: to improve event-time relation extraction, it uses the syntactic nature of rule-based methods and the semantic features of data-driven methods. Linguistic knowledge and human interpretation are involved in all three methods.
[054] Model for Event-Time Relation Extraction
Finding the event-time relationship is essential for addressing the events that happen in time and for knowing whether they are related. The event-time relation exists in two forms, Event-Time and Event-Time-Event:
i. Event-Time: identified events with their time expression. E.g.: "I am working on 10th December 2018." Event: working; Time: 10th December 2018.
ii. Event-Time-Event: event 1 and event 2 are related through a time expression. E.g.: "Bomb blast happened after the meeting." Events: blast, meeting; Time relationship: after.
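The two forms can be written down as plain records, shown here with the two examples above. `EventTime`, `EventTimeEvent`, the signal list and the nearest-event heuristic in `find_event_time_event` are illustrative assumptions, not the model's actual linking procedure.

```python
from typing import Iterable, NamedTuple, Optional, Sequence


class EventTime(NamedTuple):
    event: str   # e.g. "working"
    time: str    # e.g. "10th December 2018"


class EventTimeEvent(NamedTuple):
    event1: str
    signal: str  # temporal relationship, e.g. "after"
    event2: str


# Illustrative (not exhaustive) list of temporal signals
TEMPORAL_SIGNALS = {"before", "after", "during", "while", "until", "since"}


def find_event_time_event(tokens: Sequence[str], events: Iterable[str]) -> Optional[EventTimeEvent]:
    """Toy heuristic: link the nearest event left of a signal to the nearest event right of it."""
    event_set = set(events)
    for i, token in enumerate(tokens):
        if token.lower() in TEMPORAL_SIGNALS:
            left = [t for t in tokens[:i] if t in event_set]
            right = [t for t in tokens[i + 1:] if t in event_set]
            if left and right:
                return EventTimeEvent(left[-1], token.lower(), right[0])
    return None
```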
[055] Fig. 2c shows the model for event-time relation extraction. The major steps involved are event extraction, the time module and, finally, the event-time relation module with the graph.
Step 1: the text or documents are the input, and the event-time graph is the output of the system. Basic NLP pre-processing steps, such as tokenization, lemmatization, stemming, part-of-speech tagging, parsing and named entity recognition, are applied to the input.
The processed document is then passed on for the extraction of events and times. After events and times are extracted, the relations between them must be identified to construct the event-time graph.
Step 2: Event Extraction takes place after the basic NLP pre-processing; the POS-tagged tokens from the text are the input to the event and time extraction components. In English text, events are derived using syntactic and semantic features. An approach is built in which nouns and verbs are extracted. At the initial stage, all verbs are treated as events. In the next stage, words that can carry both noun and verb tags are disambiguated to identify nonverbal events, using our Events Extraction algorithm explained in chapter 3, which outputs the events from the given text.
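The two stages of Step 2 can be sketched over POS-tagged input. This assumes Penn Treebank tags and replaces the noun/verb disambiguation of the Events Extraction algorithm (chapter 3) with a hypothetical fixed set `EVENT_NOUNS`; it only shows the control flow.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for the noun/verb disambiguation of the Events
# Extraction algorithm: nouns whose sense in context denotes an event.
EVENT_NOUNS = {"attack", "blast", "election", "meeting", "visit"}


def extract_events(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Two-stage toy extractor over Penn Treebank-style (word, tag) pairs."""
    events = []
    for word, tag in tagged:
        if tag.startswith("VB"):
            events.append(word)              # stage 1: every verb is an event
        elif tag.startswith("NN") and word.lower() in EVENT_NOUNS:
            events.append(word)              # stage 2: nonverbal (noun) event
    return events
```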
Step 3: Time Extraction: our model handles time in quantitative and qualitative forms in order to extract times of various kinds. In the time model explained in chapter 4, quantitative time expressions, i.e. calendric times mentioned with a specific date and time in standard ISO format, are recognized directly by SuTime. Time expressions like "Independence Day" and "Mother's Day" are not captured by SuTime. To extract such temporal expressions, pattern-based rules are developed and integrated into the existing framework. Rules are developed for the calendric holidays of the Indian scenario, as explained in section 4.3.
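Pattern-based holiday rules also need a way to turn a recognized holiday name into a calendar date. The sketch covers three fixed-date Indian national holidays and Mother's Day (observed in India on the second Sunday of May); the tables and function names are hypothetical, and lunar-calendar festivals such as Diwali would need a per-year lookup table, so they are left unresolved here.

```python
from datetime import date, timedelta
from typing import Optional

# Fixed-date Indian national holidays as (month, day)
FIXED_HOLIDAYS = {
    "republic day": (1, 26),
    "independence day": (8, 15),
    "gandhi jayanti": (10, 2),
}


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Date of the n-th given weekday (Monday=0 ... Sunday=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def resolve_holiday(name: str, year: int) -> Optional[date]:
    """Map a holiday name to its date in the given year, if a rule exists."""
    key = name.lower().strip()
    if key in FIXED_HOLIDAYS:
        month, day = FIXED_HOLIDAYS[key]
        return date(year, month, day)
    if key == "mother's day":          # second Sunday of May
        return nth_weekday(year, 5, 6, 2)
    return None                        # no rule for this holiday
```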
Step 4: Time-related Events: after the events are extracted, the time relation between each event and a time expression needs to be identified. The event reference relation and the time relation declare the final events and time expressions from the extraction modules, together with their lexical, syntactic and semantic features, which are used to construct the event-time graph.
Events and times are declared from the given text by the following rules:
i) Suffix rule: in text, deverbal nouns are usually identified by suffixes such as '-tion', '-ion', '-ing' and '-ed'.
ii) Noun and Verb rule: noun and verb combinations are searched for in the sentences of the test set; the non-NE (non-named-entity) noun tokens are considered events.
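The suffix rule and the non-NE noun restriction can be combined in a few lines. `rule_based_events` is a hypothetical helper; it assumes Penn Treebank tags and a set of named-entity strings supplied by an NER step.

```python
from typing import List, Set, Tuple

# Suffixes named in the suffix rule ('-tion' is already covered by '-ion',
# but it is kept to mirror the rule as stated)
DEVERBAL_SUFFIXES = ("tion", "ion", "ing", "ed")


def rule_based_events(tagged: List[Tuple[str, str]], named_entities: Set[str]) -> List[str]:
    """Nouns that are not named entities and end in a deverbal suffix."""
    events = []
    for word, tag in tagged:
        if not tag.startswith("NN") or word in named_entities:
            continue                                  # only non-NE nouns are candidates
        if word.lower().endswith(DEVERBAL_SUFFIXES):
            events.append(word)                       # suffix rule: deverbal noun
    return events
```

The suffix test over-generates (for example, "nation" or "thing" would pass), which is one reason the model combines it with the other rules and the disambiguation steps.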
Step 5: Event-Time Graph Model: in our work, the event graph is defined as a mixed graph with labeled edges over events and times: the vertices represent events, and the edges represent the relations between events and time expressions. The graph is represented as the tuple G = (V, E, S, m, r), where V is the set of vertices, E the set of edges, S the set of directed edges, m a mapping and r the relation. An event e can be assigned a type (e.g., Occurrence, Happening or Reporting) and consists of the features of the event and its arguments.
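One way to hold G = (V, E, S, m, r) in code is sketched below. The interpretation is an assumption: E holds undirected edges, S holds directed edges (making the graph mixed), m maps each vertex to its type, and r maps each edge to its relation label. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set, Tuple, Union

Edge = Union[Tuple[str, str], FrozenSet[str]]


@dataclass
class EventTimeGraph:
    """G = (V, E, S, m, r) as a mixed graph (one interpretation of the tuple)."""
    V: Set[str] = field(default_factory=set)               # vertices: events and times
    E: Set[FrozenSet[str]] = field(default_factory=set)    # undirected edges
    S: Set[Tuple[str, str]] = field(default_factory=set)   # directed edges
    m: Dict[str, str] = field(default_factory=dict)        # vertex -> type
    r: Dict[Edge, str] = field(default_factory=dict)       # edge -> relation label

    def add_vertex(self, v: str, vtype: str) -> None:
        self.V.add(v)
        self.m[v] = vtype

    def _check(self, u: str, v: str) -> None:
        if u not in self.V or v not in self.V:
            raise KeyError(f"add both vertices first: {u!r}, {v!r}")

    def add_directed(self, u: str, v: str, label: str) -> None:
        self._check(u, v)
        self.S.add((u, v))
        self.r[(u, v)] = label

    def add_undirected(self, u: str, v: str, label: str) -> None:
        self._check(u, v)
        edge = frozenset((u, v))
        self.E.add(edge)
        self.r[edge] = label

    def relation(self, u: str, v: str) -> Optional[str]:
        """Label of the directed edge u -> v, else of the undirected edge {u, v}."""
        if (u, v) in self.r:
            return self.r[(u, v)]
        return self.r.get(frozenset((u, v)))
```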
In this work, events are treated as relations, denoted by e, and r denotes the relations, based on the following features:
i. Factuality feature of events: real-world events represent facts. In question answering, a factual event appears in, e.g., 'Who discovered the sea route to India?', and a non-factual event in, e.g., 'Who did not win a prize in hackathon 2016?' or 'When might Clinton resign?'. Our work does not address non-factual events such as hypothetical ('he can win'), future ('He will win'), negative ('They did not win') and conditional ('If he had won') event mentions, since we aim to represent factual events, i.e. real-world events that actually occurred; these are important for ordering because they can be placed on a timeline.
ii. Token features: the word, its stem, lemma and POS tag.
iii. Context features: the grammatical syntax and syntactic relations of a token, the type of chunk containing it (e.g., NP, VP, PP) and the adjacent chunks.
iv. Modifier features: these differentiate factual from non-factual events.
> Event argument types: for robustness, event arguments are grouped into four categories: agent, location, time and target. These argument types are suitable for answering the four main wh-questions: 'who did what to whom, where and when'.
> Relation types: the relations between events may be temporal, causal or other semantic relations.
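The argument types and the factual-only restriction can be illustrated with a small record. `Event`, `answer`, the marker list and the wh-word mapping are hypothetical; detecting factuality from a fixed list of modifier words is a deliberate simplification of the modifier features described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative modifier words that mark an event mention as non-factual
NONFACTUAL_MARKERS = {"not", "never", "might", "may", "can", "could", "will", "would", "if"}

# Wh-word -> event argument (the four argument types, plus the trigger for "what")
WH_TO_ARGUMENT = {"who": "agent", "whom": "target", "where": "location",
                  "when": "time", "what": "trigger"}


@dataclass
class Event:
    trigger: str
    agent: Optional[str] = None
    target: Optional[str] = None
    location: Optional[str] = None
    time: Optional[str] = None
    modifiers: Tuple[str, ...] = ()

    @property
    def factual(self) -> bool:
        return not any(w.lower() in NONFACTUAL_MARKERS for w in self.modifiers)


def answer(event: Event, wh_word: str) -> Optional[str]:
    """Read the answer to a wh-question off a factual event's arguments."""
    if not event.factual:
        return None   # non-factual events are not represented
    return getattr(event, WH_TO_ARGUMENT[wh_word.lower()])
```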
Step 6: Building the Graph for Event-Time Relations: in the event-time graph, nodes are events, whereas edges represent the temporal and co-reference relations between events. Here we present the extraction and representation of event-time relationships through the construction of the event-time graph.
Graphs are used to model and represent event-time relations by building the event-time graph. Events are represented as nodes; each node can be an isolated point or connected to other nodes through relations. Nodes are real-world events annotated with event attributes. The event-time graph is an edge-labelled graph whose edges represent the temporal relations between event attributes.
Rules are formed to extract relations and also to avoid a separate disambiguation step, because the argument types of an event cannot be identified from syntax alone. For example, arguments of different types, such as a concept, a location, a time or a cause, are all syntactically prepositional objects and event attributes. In this work, the major goal is to identify whether a concept is an event, a time, or a relation between an event and a time.
Disambiguation rules are used to avoid ambiguous extraction patterns. These rules are based on an analysis of the development corpus and prefer general rules over specific ones, so that they can easily be adapted to other domains. Here we discuss the rules for the semantic type of a prepositional object. The rules are applied in the order listed below:
• If a given token is in any verb form (VBD, VBN, VB, etc.), it is assigned the event type; if a given word or token can be both a verb and a noun, it is verified by the events extraction module and treated as an event. The complete events extraction algorithm, which is used here to extract events from natural language text, is explained in our previous work (V. Guda, Suresh 2018).
• If the POS tagger identifies the given token as a cardinal number, the time expression module verifies whether it is a number or a time. The detailed time expression extraction is explained with the pattern-based rules of our earlier work.
• If the prepositional object is a time expression introduced by a preposition such as "before", "after" or "during", the token is considered to be of temporal type.
• Temporal signals such as "before" and "near" are ambiguous and can denote both a temporal point and a spatial (location) point.
• If the argument is introduced by a temporal preposition such as "before", "after" or "during", the argument is of the temporal event type.
• The graph G represents the relations between events and times; the relations are based on the 13 relations of Allen's interval algebra.
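The 13 relations of Allen's interval algebra can be computed directly from interval end points. This is a standard formulation of the algebra, sketched with intervals as `(start, end)` pairs where `start < end`; the function name `allen_relation` is illustrative.

```python
from typing import Tuple

Interval = Tuple[float, float]   # (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds between interval a and interval b."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must have start < end")
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # remaining cases are proper partial overlaps
    return "overlaps" if a1 < b1 else "overlapped-by"
```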
[056] Data Sets and Evaluation of Event-Time Relations
For events extraction, the SemEval, TempEval and MUC data sets are considered. The SemEval and MUC data sets consist of text documents that are not tied to any particular field or domain. Accuracy is measured with the F-measure, computed from precision and recall; the results are presented in the results section.
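The F-measure combines precision P and recall R as their weighted harmonic mean, F = (1 + β²)·P·R / (β²·P + R), which for β = 1 is 2PR / (P + R). The helper below is a generic sketch; counting exact matches between sets of extracted and gold items (e.g. event spans) is an assumption about how matches are scored.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f(predicted: Iterable[Hashable], gold: Iterable[Hashable],
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F-beta of predicted items against gold items."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)                    # true positives
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    if p + r == 0:
        return p, r, 0.0
    b2 = beta * beta
    return p, r, (1 + b2) * p * r / (b2 * p + r)
```

With four predicted events of which three are among five gold events, P = 0.75, R = 0.6 and F1 = 2/3.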
For time expressions, articles from wikis were considered. Three main categories were taken (news data, warfare and celebrities) to build the training data; together, the articles yield a total of 1800 sentences. For training, 40 documents with 1600 sentences were sampled. Within this set, Evita identified a total of 268 events, and a total of 148 times were hand-coded manually; the times extracted with the methods are presented in the results section.
In our work, we used the corpus of the Forum for Information Retrieval Evaluation (FIRE-2018). The dataset contains various semi-structured (XML) documents consisting of events related to 11 categories of files. The categories emphasized include Accidents, Crime, Cyclone, Earthquake, Fire, Floods, Shootout, Storm, Suicide Attack and Volcano. The accuracy of the model is measured using 5-fold cross-validation. The results are presented in the results section.
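5-fold cross-validation splits the data into five folds, uses each fold once as the test set while training on the other four, and averages the scores. The sketch below is generic; `k_fold_splits`, `cross_validate` and the `train_and_score` callback are hypothetical names, and the random, unstratified shuffling is an assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train indices, test indices) pairs covering every item once as test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def cross_validate(items: Sequence[T],
                   train_and_score: Callable[[List[T], List[T]], float],
                   k: int = 5, seed: int = 0) -> float:
    """Average the score returned by train_and_score over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(items), k, seed):
        train = [items[i] for i in train_idx]
        test = [items[i] for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)
```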

Claims (10)

  I/We claim:
  1. A temporal Question Answering (QA) system that uses an improved domain-independent model/method of extraction and representation of events, times, and event-time relations for increasing performance, wherein the said system involves:
     a. a question posed to the said QA system, which is processed with a temporal layer that splits complex questions into several simple questions;
     b. the said system processes the task by identifying the various components in the query for which answers can be extracted, based on the question type and the event and time features to which it belongs;
     c. answers obtained for these simple questions are integrated to build a final answer for the given complex temporal query;
     d. the tasks performed in the architecture are independent of each other but collectively provide the answer for a given temporal query;
     e. the said system collectively comprises a means of question processing, a means of document processing, a means of answer fusion, and a means of answer extraction with a temporal layer, following a divide-and-conquer strategy;
     f. the question processing means traces a query input for temporal expressions (TE) depending on the features of events and times, then categorizes it as a simple or a complex question; the decomposition (split) of a complex question is based on the identification of temporal signals, which link simple events to form complex questions;
     g. the document processing means pre-processes a collection of documents that are fed to the model, with the model evaluations and dataset descriptions presented in a results section;
     h. the answer fusion means divides complex questions into smaller units, Q-Focus and Q-Restriction, obtained by an information processing engine;
     i. a re-composition unit carries out individual answer filtering and answer-comparison-based re-composition to obtain the final answer for the given query;
     j. the answer extraction means supplies a list of possible answers to the Q-Focus and to the Q-Restriction, which forms the input of the individual answer filtering task;
     k. the said system selects only those answers that satisfy the temporal constraints obtained by the TE identification and normalization unit;
     l. the said system checks that the date of the answer is temporally compatible with the temporal tag (i.e. the date of the answer must lie within the date values of the tag, otherwise it is rejected); and
     m. the answers that fulfil the constraints are passed to the answer Re-Composition module.
  2. The system according to claim 1, wherein, once the answers have been filtered using the signals and the ordering key, the results for the Q-Focus are compared with the answer to the Q-Restriction to determine whether they are temporally compatible.
  3. The system according to claim 1, wherein the temporal signal establishes the appropriate order between the answers of the Q-Focus and the Q-Restriction; by analysing the temporal compatibility between the list of possible Q-Focus answers and the Q-Restriction answer, the system constructs the appropriate answer to the complex question.
  4. The system according to claim 1, wherein the QA system addresses temporal queries, which are mostly complex questions, by splitting them into simple questions based on the number of events present in the questions.
  5. The system according to claim 1, wherein the QA system obtains keywords from the question based on parts of speech, filters sentences from the documents, ranks and sorts the relevant sentences, and finally extracts the answer from the highest-ranked sentences based on the expected answer type analysed from the question.
  6. A domain-independent model/method of extraction and representation of events, times, and event-time relations for increasing the performance of a temporal question answering system, comprising the steps of:
     a. extracting events with hand-coded rules and machine learning techniques, as a composite/hybrid way of extracting events;
     b. the hybrid way of extracting events considers plain text and pre-processes it to get valid tokens (pre-processing involves the elimination of stop words and the use of lexical, morphological and syntactical features);
     c. the hybrid way of extracting events involves the steps of:
        i. a pre-processing task by a POS tagger that tags verbal entities as VB (verbs);
        ii. running algorithm rules to get the lexical features;
        iii. performing syntactic and morphological analysis with WordNet to identify the different senses of a word, so as to extract tokens that appear as both nouns and verbs and can be defined as events using composite rules;
        iv. running the CRF-based Stanford Named Entity (NE) tagger, which tags the remaining unidentified events;
        v. running the composite rules to identify nonverbal events; and
     d. detecting events which are in non-verbal form by combining hand-coded rules with machine learning techniques to get the semantics but not the nonverbal events.
  7. The method according to claim 6, wherein the event extraction with hand-coded rules is based on an algorithm whose rules are formed by considering events in the form of actions, activities, occurrences or states.
  8. The method according to claim 6, wherein the event extraction with hand-coded rules is based on properties such as reporting, perception, state and occurrence, and on the lexical features of 34 classes of POS tags.
  9. The method according to claim 6, wherein the method takes the event features into account when extracting events from a given text.
  10. The method according to claim 6, wherein the system and method use custom features to detect specific knowledge and features of an event, which requires checking the time, places and participants associated with the event.
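The divide-and-conquer flow of claims 1 to 3 can be sketched for a single temporal signal. This is a hypothetical illustration, not the claimed system: `split_question`, `recompose`, the signal list and the sample answers are assumptions, and each Q-Focus answer is assumed to carry one date while the Q-Restriction answer is an interval.

```python
from datetime import date
from typing import List, Optional, Tuple

SIGNALS = ("before", "after", "during", "when")   # illustrative temporal signals
Answer = Tuple[str, date]                        # (answer text, associated date)


def split_question(question: str) -> Optional[Tuple[str, str, str]]:
    """Split 'Q-Focus <signal> Q-Restriction' at the first inner temporal signal."""
    words = question.rstrip("?").split()
    for i, w in enumerate(words):
        if w.lower() in SIGNALS and 0 < i < len(words) - 1:
            return " ".join(words[:i]), w.lower(), " ".join(words[i + 1:])
    return None   # a simple question: nothing to decompose


def recompose(focus_answers: List[Answer], signal: str,
              restriction: Tuple[date, date]) -> List[str]:
    """Keep Q-Focus answers temporally compatible with the Q-Restriction interval."""
    start, end = restriction
    keep = []
    for text, d in focus_answers:
        if ((signal == "before" and d < start)
                or (signal == "after" and d > end)
                or (signal in ("during", "when") and start <= d <= end)):
            keep.append(text)
    return keep
```

For "Who led the project after the merger?", the question splits into the Q-Focus "Who led the project", the signal "after" and the Q-Restriction "the merger"; only Q-Focus answers dated after the merger interval survive re-composition.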
AU2021106681A 2021-08-24 2021-08-24 A model for event time extraction and its application to temporal question answering system Ceased AU2021106681A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021106681A AU2021106681A4 (en) 2021-08-24 2021-08-24 A model for event time extraction and its application to temporal question answering system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021106681A AU2021106681A4 (en) 2021-08-24 2021-08-24 A model for event time extraction and its application to temporal question answering system

Publications (1)

Publication Number Publication Date
AU2021106681A4 (en) 2022-01-06

Family

ID=78958455

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021106681A Ceased AU2021106681A4 (en) 2021-08-24 2021-08-24 A model for event time extraction and its application to temporal question answering system

Country Status (1)

Country Link
AU (1) AU2021106681A4 (en)


Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry