CN103455638A - Behavior knowledge extracting method and device combining reasoning and semi-automatic learning - Google Patents

Behavior knowledge extracting method and device combining reasoning and semi-automatic learning

Info

Publication number
CN103455638A
Authority
CN
China
Prior art keywords
behavior
knowledge
template
candidate
behavior knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013104522928A
Other languages
Chinese (zh)
Inventor
毛文吉
曾大军
葛安生
孔庆超
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN2013104522928A priority Critical patent/CN103455638A/en
Publication of CN103455638A publication Critical patent/CN103455638A/en
Pending legal-status Critical Current

Abstract

The invention provides a behavior knowledge extraction method and device that combine reasoning with semi-automatic learning. Targeting massive open-source text, the method uses a small number of behavior knowledge extraction templates, together with the semantic associations among behavior knowledge, to incrementally acquire behavior preconditions, behavior effects, and temporal relations between behaviors from text. Each of these three kinds of knowledge is acquired by bootstrapping, and knowledge inference based on the semantic associations among them is applied within the bootstrapping process. The method improves the efficiency and quality of behavior knowledge extraction and enables automated behavior modeling and analysis over massive text in different domains.

Description

A behavior knowledge extraction method and device combining reasoning and semi-automatic learning
Technical field
The invention belongs to the field of computer science and technology. It specifically relates to a behavior knowledge extraction method and device that start from a small number of initial behavior knowledge extraction templates and combine reasoning with semi-automatic learning to incrementally acquire behavior knowledge from massive text.
Background
Behavior knowledge is an important type of knowledge with significant applications in many fields that involve behavior modeling, analysis, and prediction. With the development and spread of Internet technology, the massive text accumulated online provides data support for behavior knowledge acquisition, but it also poses serious technical challenges.
Previous work on behavior knowledge extraction has generally relied on supervised learning or on manually written rules. Representative work includes: Sil et al. ("Extracting action and event semantics from web text," in AAAI Fall Symposium on Common-Sense Knowledge (AAAI-CSK), 2010), who use support vector machines to extract behavior preconditions and effects; and Li et al. ("Automatic construction of domain theory for attack planning," in 2010 IEEE International Conference on Intelligence and Security Informatics (IEEE-ISI), 2010, pp. 65-70), who use manually built templates to extract behavior preconditions and effects. These methods have three main shortcomings: (1) they need a large manually annotated corpus or depend entirely on manually constructed extraction templates, which makes them inefficient; (2) they extract only behavior preconditions and effects and ignore relations between behaviors, in particular the important temporal relations between behaviors; (3) they extract each kind of behavior knowledge separately and cannot use the semantic associations among behavior knowledge to let the different kinds expand one another.
Summary of the invention
The technical problem to be solved by the invention is: for massive open-source text, using a small number of behavior knowledge extraction templates and the semantic associations among behavior knowledge, incrementally acquire from text the three main kinds of behavior knowledge, namely behavior preconditions, behavior effects, and temporal relations between behaviors.
To solve this problem, the invention proposes a behavior knowledge extraction method comprising the following steps:
S1: Using the co-occurrence relations and semantic information between templates and behavior knowledge, compute the statistical association between each candidate template and the behavior knowledge set and between each candidate behavior knowledge item and the template set, as well as the semantic similarity between each candidate behavior knowledge item and the behavior knowledge set and between each candidate template and the template set; from these, compute the confidence of the candidate behavior knowledge and candidate templates, and obtain a new behavior knowledge set and template set according to the confidence;
S2: Using the semantic associations among the different kinds of behavior knowledge, expand the behavior knowledge set by knowledge inference;
S3: Refine the behavior knowledge, mainly by merging similar items and removing contradictory items, to improve the quality of the extracted behavior knowledge.
According to one embodiment of the invention, step S1 comprises multiple iterations, and each iteration comprises two sub-steps: incrementally acquiring templates and incrementally acquiring behavior knowledge. "Incremental" means that, as the iterations proceed, each round obtains more templates and more behavior knowledge than the previous round.
According to one embodiment of the invention, the sub-steps for incrementally acquiring templates are as follows:
S1.1: Based on the behavior knowledge obtained in the previous iteration, obtain a candidate template set from the input text. Use the co-occurrence relations between the current behavior knowledge set and each candidate template to compute its statistical association, compute the semantic similarity between the candidate template and the template set obtained in the previous iteration, and from these obtain the confidence of the candidate template.
S1.2: Sort the candidate templates by confidence in descending order and keep the top k as the templates obtained in this iteration. Here k is the sum of the number of templates in the previous iteration and n_t, where n_t is the number of templates newly added in each iteration; its value is determined by the embodiment.
According to one embodiment of the invention, the sub-steps for incrementally acquiring behavior knowledge are as follows:
S1.3: Based on the templates obtained in this iteration, obtain a candidate behavior knowledge set from the input text. Use the co-occurrence relations between the current template set and each candidate behavior knowledge item to compute its statistical association, compute the semantic similarity between the candidate item and the behavior knowledge set obtained in the previous iteration, and from these obtain the confidence of the candidate behavior knowledge.
S1.4: Sort each of the three kinds of behavior knowledge by confidence in descending order and keep the top k of each kind as the behavior knowledge obtained in this iteration. Here k is the sum of the number of items of each kind in the previous iteration and n_k, where n_k is the number of items of each kind newly added in each iteration; its value is determined by the embodiment.
According to one embodiment of the invention, the confidence of templates and behavior knowledge is defined as follows:
$$C_i(t) = \frac{1}{\max_{t'} C_i(t')}\bigl((1-\delta)\,SA_i(t) + \delta\,SS_i(t)\bigr)$$
$$C_i(k) = \frac{1}{\max_{k'} C_i(k')}\bigl((1-\delta)\,SA_i(k) + \delta\,SS_i(k)\bigr)$$
Here, C_i(t) and C_i(k) denote the confidence of candidate template t and candidate behavior knowledge k in the i-th iteration; SA_i(·) and SS_i(·) denote the statistical association and the semantic similarity of a candidate template or behavior knowledge item in the i-th iteration; max_{t'} C_i(t') and max_{k'} C_i(k') are the maximum confidence over all templates and over all behavior knowledge in the i-th iteration; δ is a weighting factor with range [0, 1). When δ is 0, the confidence calculation uses only the statistical association to assess the reliability of candidate behavior knowledge and templates.
According to one embodiment of the invention, the statistical association in the i-th iteration between a candidate template and the behavior knowledge set, and between a candidate behavior knowledge item and the template set, is computed as follows:
$$SA_i(t) = \frac{1}{\max_{t'} SA_i(t')} \sum_{k \in K_{i-1}} PMI^{+}(k, t) \times C_{i-1}(k)$$
$$SA_i(k) = \frac{1}{\max_{k'} SA_i(k')} \sum_{t \in T_i} PMI^{+}(k, t) \times C_i(t)$$
In the first formula, t denotes a candidate template, K_{i-1} denotes the behavior knowledge set obtained in iteration i-1, and C_{i-1}(k) denotes the confidence of behavior knowledge k in iteration i-1. In the second formula, k denotes a candidate behavior knowledge item, T_i is the template set obtained in the current iteration, and C_i(t) is the confidence of template t in the current iteration.
According to one embodiment of the invention, the semantic similarity between a candidate template t and the template set T_{i-1} obtained in the previous iteration is computed as follows:
$$SS_i(t) = \frac{1}{\max_{t'} SS_i(t')} \sum_{e \in T_{i-1}} Sim(t, e) \times C_{i-1}(e)$$
where Sim(t, e) denotes the degree of semantic similarity between templates t and e.
The semantic similarity between a candidate behavior knowledge item k and the behavior knowledge set K_{i-1} obtained in the previous iteration is computed as follows:
$$SS_i(k) = \frac{1}{\max_{k'} SS_i(k')} \sum_{e \in K_{i-1}} Sim(k, e) \times C_{i-1}(e)$$
where Sim(k, e) denotes the degree of semantic similarity between behavior knowledge items k and e.
According to one embodiment of the invention, the behavior knowledge in step S2 is of three kinds: behavior preconditions, behavior effects, and temporal relations between behaviors.
The mutual inference rules among these kinds of behavior knowledge are:
Figure BDA0000387677830000041
Here, a_1 and a_2 denote behaviors and s denotes a state; Effect(a_1, s) means that s is an effect of a_1, Precondition(a_2, s) means that s is a precondition of a_2, and Temporal-relation(a_1, a_2) means that a_1 is a behavior that occurs before a_2.
According to one embodiment of the invention, after each iteration ends, the three behavior knowledge sets are expanded on the basis of the precondition, effect, and temporal relation sets obtained in that round, as follows. First, for each precondition item (a_2, s), check whether state s appears in the effect set; if so, for every behavior a_1 that has s as an effect, add the behavior pair (a_1, a_2) to the candidate temporal relation set. Second, for each behavior pair (a_1, a_2) in the temporal relation set, if (a_1, s) is in the effect set, add (a_2, s) to the candidate precondition set; likewise, if (a_2, s) is in the precondition set, add (a_1, s) to the candidate effect set. Finally, for each behavior knowledge item k in the candidate precondition, effect, and temporal relation sets, if k is also among the candidate behavior knowledge obtained on the basis of statistical association, set the confidence of k to 1 and add k to the corresponding behavior knowledge set.
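Read as inference rules, the first two expansion steps above can be written as follows. This is a reconstruction from the prose description using the predicate names defined above; it is offered as a sketch and may not match the layout of the original figure.

```latex
\begin{align*}
\mathit{Effect}(a_1, s) \wedge \mathit{Precondition}(a_2, s)
  &\Rightarrow \mathit{Temporal\text{-}relation}(a_1, a_2) \\
\mathit{Temporal\text{-}relation}(a_1, a_2) \wedge \mathit{Effect}(a_1, s)
  &\Rightarrow \mathit{Precondition}(a_2, s) \\
\mathit{Temporal\text{-}relation}(a_1, a_2) \wedge \mathit{Precondition}(a_2, s)
  &\Rightarrow \mathit{Effect}(a_1, s)
\end{align*}
```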
According to one embodiment of the invention, in step S3, merging similar items applies to the redundant behaviors, behavior preconditions, and effects obtained in preprocessing; removing contradictory items applies to the temporal relations between behaviors obtained in each iteration of the bootstrapping step, from which mutually contradictory behavior pairs are removed.
In addition, the invention provides a behavior knowledge extraction device comprising the following modules:
A first module, which uses the co-occurrence relations and semantic information between templates and behavior knowledge to compute the statistical association of candidate behavior knowledge and templates, as well as the semantic similarity between candidate behavior knowledge and the behavior knowledge set and between candidate templates and the template set; from these it computes the confidence of the candidate behavior knowledge and templates and obtains new behavior knowledge and templates according to the confidence;
A second module, which uses the semantic associations among the different kinds of behavior knowledge to expand the behavior knowledge by knowledge inference;
A third module, which merges similar items and removes contradictory items in the behavior knowledge obtained by the first module, improving the quality of the extracted behavior knowledge.
Compared with the prior art, the proposed method and device use both statistical and semantic information, and they combine implicit behavior knowledge acquisition based on knowledge inference with explicit behavior knowledge acquisition based on text information extraction. They therefore have clear advantages over existing methods in the effectiveness and reliability of behavior knowledge extraction and in suitability for large-scale text:
A large amount of behavior knowledge is acquired incrementally from a small number of initial extraction templates, which suits behavior knowledge extraction over massive text;
Knowledge inference and bootstrapping are combined, which clearly improves the performance of behavior knowledge extraction;
The bootstrapping step estimates the confidence of knowledge from both statistical association and semantic similarity, which effectively improves the reliability of behavior knowledge extraction.
Brief description of the drawings
Fig. 1 is a flowchart of the behavior knowledge extraction method proposed by the invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawing.
Fig. 1 shows the flowchart of the behavior knowledge extraction method combining reasoning and semi-automatic learning. As shown in Fig. 1, the method comprises the following steps:
S1: Bootstrapping step based on statistical association and semantic similarity.
In this step, the co-occurrence relations and semantic information between templates and behavior knowledge are used to compute the statistical association between each candidate template and the behavior knowledge set and between each candidate behavior knowledge item and the template set, as well as the semantic similarity between each candidate behavior knowledge item and the behavior knowledge set and between each candidate template and the template set. From these, the confidence of the candidate behavior knowledge and candidate templates is computed, and a new behavior knowledge set and template set are obtained according to the confidence.
Bootstrapping here refers to the statistical learning process that starts from a small number of given initial behavior templates and refines the result step by step through iteration. A template is a syntactic pattern used to extract behavior knowledge. For example, the sentence "The terrorists use fertilizer to make explosives." matches the precondition template "need|use <Precondition> to <Verb> <Object>", which yields the precondition knowledge that "fertilizer" is a precondition of "make explosives". The co-occurrence relation refers to a single template and a single behavior knowledge item occurring together, and is measured with non-negative pointwise mutual information (described in detail below).
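The template example above can be illustrated with a small sketch. It approximates the syntactic pattern with a regular expression over raw text, which is a simplification: the patent matches templates against syntactic analysis results, and the names `PRECONDITION_TEMPLATE` and `extract_precondition` are hypothetical.

```python
import re

# Hypothetical regex rendering of "need|use <Precondition> to <Verb> <Object>".
# Assumption: single-word slots; the patent works on parsed sentences instead.
PRECONDITION_TEMPLATE = re.compile(
    r"\b(?:needs?|uses?)\s+(?P<precondition>\w+)\s+to\s+(?P<verb>\w+)\s+(?P<object>\w+)",
    re.IGNORECASE,
)


def extract_precondition(sentence):
    """Return (behavior, state) if the sentence matches the template, else None."""
    match = PRECONDITION_TEMPLATE.search(sentence)
    if match is None:
        return None
    behavior = f"{match.group('verb')} {match.group('object')}"
    return behavior, match.group("precondition")


result = extract_precondition("The terrorists use fertilizer to make explosives.")
print(result)
```

The returned pair reads as "the state is a precondition of the behavior", mirroring the "behavior a, state s" form used for precondition knowledge.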
The semantic information refers to the semantic hierarchy relations between two words in a semantic dictionary (such as WordNet or a synonym thesaurus). The semantic similarity of the two words is computed from these relations, and from it the semantic similarity between a template and a template (set) and between a behavior knowledge item and a behavior knowledge (set) is obtained. The statistical association of candidate behavior knowledge and templates is measured by the non-negative pointwise mutual information between individual behavior knowledge items and individual templates together with the corresponding confidence. The semantic similarity between candidate behavior knowledge and the behavior knowledge set, and between candidate templates and the template set, is measured by the semantic similarity between individual behavior knowledge items (or individual templates) together with the corresponding confidence.
S2: Behavior knowledge inference step.
This step uses the semantic associations among the different kinds of behavior knowledge to expand the behavior knowledge set by knowledge inference. Knowledge inference is the process of deriving new behavior knowledge from existing behavior knowledge by means of the semantic associations among behavior knowledge.
S3: Behavior knowledge refinement step.
"Refinement" means merging similar behavior knowledge and removing contradictory behavior knowledge. This step merges similar items and removes contradictory items in the behavior knowledge obtained in the text preprocessing stage and in the bootstrapping step, improving the quality of the extracted behavior knowledge. Text preprocessing takes place before the bootstrapping step: natural language processing tools perform word segmentation, part-of-speech tagging, and syntactic analysis on the massive open-source text, and the syntactic analysis results are used to identify states, expressed as noun phrases, and behaviors, expressed in verb + object form.
Each of the above steps is described in detail below.
S1: Bootstrapping step based on statistical association and semantic similarity.
This step comprises multiple iterations; the number of iterations can be chosen according to the specific implementation. Each iteration of the bootstrapping step consists mainly of two sub-steps: incrementally acquiring templates and incrementally acquiring behavior knowledge. "Incremental" means that, as the iterations proceed, each round obtains more templates and more behavior knowledge than the previous round.
The sub-steps for incrementally acquiring templates are as follows:
S1.1: Based on the behavior knowledge obtained in the previous iteration, obtain a candidate template set from the input text. Use the co-occurrence relations between the current behavior knowledge set and each candidate template to compute its statistical association, compute the semantic similarity between the candidate template and the template set obtained in the previous iteration, and from these obtain the confidence of the candidate template.
S1.2: Sort the candidate templates by confidence in descending order and keep the top k as the templates obtained in this iteration. Here k is the sum of the number of templates in the previous iteration and n_t, where n_t is the number of templates newly added in each iteration; its value is determined by the embodiment.
Incremental acquisition of behavior knowledge is similar to the incremental acquisition of templates above and comprises:
S1.3: Based on the templates obtained in this iteration, obtain a candidate behavior knowledge set from the input text. Use the co-occurrence relations between the current template set and each candidate behavior knowledge item to compute its statistical association, compute the semantic similarity between the candidate item and the behavior knowledge set obtained in the previous iteration, and from these obtain the confidence of the candidate behavior knowledge.
S1.4: Sort each of the three kinds of behavior knowledge by confidence in descending order and keep the top k of each kind as the behavior knowledge obtained in this iteration. Here k is the sum of the number of items of each kind in the previous iteration and n_k, where n_k is the number of items of each kind newly added in each iteration; its value is determined by the embodiment.
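The top-k selection used in S1.2 and S1.4 can be sketched as follows. The function name `select_top_k` and the toy confidence values are hypothetical; the point is only that k grows by a fixed increment each round.

```python
def select_top_k(confidences, previous_count, n_new):
    """Keep the k = previous_count + n_new candidates with the highest confidence.

    confidences: dict mapping a candidate (template or knowledge item) to its score.
    previous_count: number of items kept in the previous iteration.
    n_new: number of items added per iteration (n_t or n_k in the text).
    """
    k = previous_count + n_new
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Toy example: one template was kept last round, two are added this round.
candidate_templates = {"t1": 0.9, "t2": 0.4, "t3": 0.7, "t4": 0.1}
selected = select_top_k(candidate_templates, previous_count=1, n_new=2)
print(list(selected))
```

For behavior knowledge, the same function would be called once per kind (preconditions, effects, temporal relations), since S1.4 ranks the three kinds separately.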
The behavior knowledge is of three kinds: behavior precondition knowledge, behavior effect knowledge, and knowledge of temporal relations between behaviors.
The confidence of candidate templates and behavior knowledge is computed from two kinds of information: statistical association (SA) and semantic similarity (SS). Specifically, the confidence of templates and behavior knowledge is defined as follows:
$$C_i(t) = \frac{1}{\max_{t'} C_i(t')}\bigl((1-\delta)\,SA_i(t) + \delta\,SS_i(t)\bigr) \tag{1}$$
$$C_i(k) = \frac{1}{\max_{k'} C_i(k')}\bigl((1-\delta)\,SA_i(k) + \delta\,SS_i(k)\bigr) \tag{2}$$
Here, C_i(t) and C_i(k) denote the confidence of candidate template t and candidate behavior knowledge k in the i-th iteration; SA_i(·) and SS_i(·) denote the statistical association and the semantic similarity of a candidate template or behavior knowledge item in the i-th iteration; max_{t'} C_i(t') and max_{k'} C_i(k') are the maximum confidence over all templates and over all behavior knowledge in the i-th iteration and are used for normalization. δ is a weighting factor with range [0, 1). When δ is 0, the confidence calculation uses only the statistical association to assess the reliability of candidate behavior knowledge and templates. Initially, the confidence of each template is set to 1.
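A minimal sketch of formulas (1) and (2) follows. It assumes one particular reading of the normalization: the weighted sum is computed for every candidate first and then divided by the largest such value in the round. The function name `confidence` and the example scores are hypothetical.

```python
def confidence(sa, ss, delta):
    """Combine statistical association and semantic similarity per candidate.

    sa, ss: dicts mapping each candidate to SA_i and SS_i (already in [0, 1]).
    delta: weighting factor in [0, 1); delta = 0 uses SA only.
    Assumption: the max in the denominator is taken over the unnormalised scores.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    raw = {c: (1.0 - delta) * sa[c] + delta * ss.get(c, 0.0) for c in sa}
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {c: 0.0 for c in raw}
    return {c: score / top for c, score in raw.items()}


sa_scores = {"t1": 1.0, "t2": 0.5}
ss_scores = {"t1": 0.2, "t2": 1.0}
conf = confidence(sa_scores, ss_scores, delta=0.5)
print(conf)
```

With δ = 0.5 the semantically closer template t2 overtakes t1 even though t1 has the stronger statistical association, which is the trade-off the weighting factor controls.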
The computation of the statistical association and of the semantic similarity of behavior knowledge and templates is described below.
(1) Statistical association
The statistical association is computed from the co-occurrence relations between templates and behavior knowledge; it measures the association between a candidate template and the behavior knowledge set, and between a candidate behavior knowledge item and the template set. To compute the statistical association between a single behavior knowledge item and a single template, the invention defines non-negative pointwise mutual information (PMI^+):
$$PMI^{+}(k, t) = \log\left(\frac{P(k, t)}{P(k) \times P(t)} + 1\right) \tag{3}$$
where k denotes a single behavior knowledge item and t denotes a single template. P(k), P(t), and P(k, t) denote the probability that knowledge k occurs, the probability that template t occurs, and the probability that k and t occur together. PMI^+ is always non-negative, which avoids the large-magnitude negative values that conventional pointwise mutual information (PMI) can produce and that would distort the confidence calculation.
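Formula (3) can be sketched as below. The estimation of probabilities from raw counts and the use of the natural logarithm are assumptions; the text specifies neither the estimator nor the log base. The name `pmi_plus` is hypothetical.

```python
import math


def pmi_plus(count_kt, count_k, count_t, total):
    """Non-negative PMI: log(P(k,t) / (P(k) * P(t)) + 1).

    count_kt: number of sentences where knowledge k and template t co-occur.
    count_k, count_t: occurrence counts of k and t on their own.
    total: number of sentences (assumed normaliser for all three probabilities).
    """
    if count_k == 0 or count_t == 0 or total == 0:
        return 0.0
    p_kt = count_kt / total
    p_k = count_k / total
    p_t = count_t / total
    return math.log(p_kt / (p_k * p_t) + 1.0)


# A pair that never co-occurs scores exactly 0 instead of negative infinity.
print(pmi_plus(count_kt=0, count_k=5, count_t=5, total=100))
print(pmi_plus(count_kt=10, count_k=10, count_t=10, total=100))
```

The "+ 1" inside the logarithm is what keeps the value at or above zero: the ratio is never negative, so the argument of the logarithm is never below 1.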
In each iteration, templates are selected first, and the selected templates are then used to select behavior knowledge. Therefore, the statistical association of candidate behavior knowledge can use the templates, and their confidence, in the template set obtained in the current iteration, whereas the statistical association of candidate templates uses the knowledge, and its confidence, in the behavior knowledge set obtained in the previous iteration.
The statistical association in the i-th iteration between a candidate template and the behavior knowledge set, and between a candidate behavior knowledge item and the template set, is computed as follows:
$$SA_i(t) = \frac{1}{\max_{t'} SA_i(t')} \sum_{k \in K_{i-1}} PMI^{+}(k, t) \times C_{i-1}(k) \tag{4}$$
$$SA_i(k) = \frac{1}{\max_{k'} SA_i(k')} \sum_{t \in T_i} PMI^{+}(k, t) \times C_i(t) \tag{5}$$
In formula (4), t denotes a candidate template, K_{i-1} denotes the behavior knowledge set obtained in iteration i-1, and C_{i-1}(k) denotes the confidence of behavior knowledge k in iteration i-1. In formula (5), k denotes a candidate behavior knowledge item, T_i is the template set obtained in the current iteration, and C_i(t) is the confidence of template t in the current iteration.
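Formulas (4) and (5) share one shape: a confidence-weighted sum of PMI^+ values over a reference set, divided by the largest such sum. The sketch below captures that shape in a single function. The name `statistical_association`, the `pmi` callback, and the toy PMI table are hypothetical, and the normalization by the maximum unnormalized sum is an assumed reading of the formulas.

```python
def statistical_association(candidates, anchors, anchor_confidence, pmi):
    """Confidence-weighted PMI+ sum over the anchor set, normalised by the maximum.

    For formula (4): candidates are templates, anchors are K_{i-1}.
    For formula (5): candidates are knowledge items, anchors are T_i.
    pmi(candidate, anchor) returns the PMI+ value of the pair.
    """
    raw = {
        c: sum(pmi(c, a) * anchor_confidence[a] for a in anchors)
        for c in candidates
    }
    top = max(raw.values(), default=0.0)
    return {c: (score / top if top > 0 else 0.0) for c, score in raw.items()}


# Toy PMI+ values between candidate templates and previously accepted knowledge.
pmi_table = {("t1", "k1"): 2.0, ("t1", "k2"): 1.0, ("t2", "k1"): 0.5}
knowledge_confidence = {"k1": 1.0, "k2": 0.5}
sa_templates = statistical_association(
    candidates=["t1", "t2"],
    anchors=["k1", "k2"],
    anchor_confidence=knowledge_confidence,
    pmi=lambda t, k: pmi_table.get((t, k), 0.0),
)
print(sa_templates)
```

Passing the PMI^+ computation in as a callback keeps the sketch independent of how co-occurrence counts are stored.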
(2) Semantic similarity
The semantic similarity between behavior knowledge items and the semantic similarity between templates are computed along the same lines: first compute the semantic similarity between words, then the semantic similarity between behaviors and between states (states cover behavior preconditions and behavior effects), and finally the semantic similarity between a template and a template (set) and between a behavior knowledge item and a behavior knowledge (set).
The invention uses the semantic hierarchy relations in a general-purpose semantic dictionary (for example WordNet or a synonym thesaurus) to compute the semantic similarity between two words w_1 and w_2, as follows:
$$Sim(w_1, w_2) = \frac{1}{D(w_1, w_2) + 1} \tag{6}$$
In the formula above, D(w_1, w_2) is the semantic distance between words w_1 and w_2 in the semantic dictionary: if w_1 and w_2 are synonyms, D(w_1, w_2) = 0; if one is the direct parent of the other, D(w_1, w_2) = 1, and so on; if w_1 and w_2 are not related by hypernymy or hyponymy, D(w_1, w_2) = ∞.
The semantic similarity between states s_1 and s_2 is defined as the semantic similarity Sim(n_1, n_2) between their head nouns n_1 and n_2. The semantic similarity between behavior a_1 (verb v_1 + object o_1) and behavior a_2 (verb v_2 + object o_2) is the product of Sim(v_1, v_2) and Sim(o_1, o_2).
The semantic similarity between two single behavior knowledge items k_1 and k_2 is computed in one of two ways. If they are precondition or effect knowledge (of the form "behavior a, state s"), the similarity between k_1 and k_2 is the product of Sim(s_1, s_2) and Sim(a_1, a_2). If k_1 and k_2 are temporal relations between behaviors (of the form "behavior a_1, behavior a_2"), the semantic similarity between k_1 and k_2 is Sim(a_1, a_2). When computing the semantic similarity between two templates, first check whether the syntactic structures they represent are the same. If they are, the semantic similarity of the two templates is the product of the semantic similarities between the words at the same positions in the syntax tree; if they are not, the semantic similarity is 0.
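The word-level and behavior-level similarity defined above can be sketched with a toy hierarchy. The `CONCEPT` and `PARENT` tables are hypothetical stand-ins for a real semantic dictionary such as WordNet, and the function names are illustrative only.

```python
import math

# Hypothetical toy dictionary: each word maps to a concept (synonyms share one),
# and each concept may have a single parent concept (its hypernym).
CONCEPT = {"bomb": "explosive", "explosive": "explosive", "weapon": "weapon",
           "make": "make", "build": "make", "buy": "buy"}
PARENT = {"explosive": "weapon"}


def distance(w1, w2):
    """D(w1, w2): 0 for synonyms, number of parent steps otherwise, inf if unrelated."""
    c1, c2 = CONCEPT.get(w1), CONCEPT.get(w2)
    if c1 is None or c2 is None:
        return math.inf
    for start, goal in ((c1, c2), (c2, c1)):
        steps, node = 0, start
        while node is not None:  # walk up the hypernym chain
            if node == goal:
                return steps
            node, steps = PARENT.get(node), steps + 1
    return math.inf


def sim_word(w1, w2):
    """Formula (6): 1 / (D + 1); an infinite distance gives similarity 0."""
    return 1.0 / (distance(w1, w2) + 1.0)


def sim_behavior(a1, a2):
    """Behaviors are (verb, object) pairs; similarity is the product of both parts."""
    (v1, o1), (v2, o2) = a1, a2
    return sim_word(v1, v2) * sim_word(o1, o2)


print(sim_behavior(("make", "bomb"), ("build", "weapon")))
print(sim_word("make", "buy"))
```

Because the parts are multiplied, a single unrelated verb or object drives the whole behavior similarity to 0, which makes the measure conservative.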
Building on the semantic similarity between single behavior knowledge items and between single templates, and following an approach similar to the statistical association calculation, the semantic similarity between a candidate template t and the template set T_{i-1} obtained in the previous iteration is computed as follows:
$$SS_i(t) = \frac{1}{\max_{t'} SS_i(t')} \sum_{e \in T_{i-1}} Sim(t, e) \times C_{i-1}(e) \tag{7}$$
where Sim(t, e) denotes the degree of semantic similarity between templates t and e. Similarly, the semantic similarity between a candidate behavior knowledge item k and the behavior knowledge set K_{i-1} obtained in the previous iteration is computed as follows:
$$SS_i(k) = \frac{1}{\max_{k'} SS_i(k')} \sum_{e \in K_{i-1}} Sim(k, e) \times C_{i-1}(e) \tag{8}$$
where Sim(k, e) denotes the degree of semantic similarity between behavior knowledge items k and e. Unlike the statistical association, the semantic similarity of both candidate behavior knowledge and candidate templates is computed against the behavior knowledge and template sets obtained in the previous iteration.
S2: Behavior knowledge inference step.
The invention uses the semantic associations among behavior knowledge to acquire implicit behavior knowledge, automatically expanding the behavior knowledge sets obtained in each iteration of the bootstrapping step.
Specifically, behavior preconditions and effects can be used to expand the temporal relation set, preconditions and temporal relations to expand the effect set, and effects and temporal relations to expand the precondition set. The mutual inference rules among precondition, effect, and temporal relation knowledge are:
Figure BDA0000387677830000103
Here, a_1 and a_2 denote behaviors and s denotes a state; Effect(a_1, s) means that s is an effect of a_1, Precondition(a_2, s) means that s is a precondition of a_2, and Temporal-relation(a_1, a_2) means that a_1 is a behavior that occurs before a_2. After each iteration ends, the three behavior knowledge sets are expanded on the basis of the precondition, effect, and temporal relation sets obtained in that round, as follows. First, for each precondition item (a_2, s), check whether state s appears in the effect set; if so, for every behavior a_1 that has s as an effect, add the behavior pair (a_1, a_2) to the candidate temporal relation set. Second, for each behavior pair (a_1, a_2) in the temporal relation set, if (a_1, s) is in the effect set, add (a_2, s) to the candidate precondition set; likewise, if (a_2, s) is in the precondition set, add (a_1, s) to the candidate effect set. Finally, for each behavior knowledge item k in the candidate precondition, effect, and temporal relation sets, if k is also among the candidate behavior knowledge obtained on the basis of statistical association, set the confidence of k to 1 and add k to the corresponding behavior knowledge set.
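The expansion procedure above can be sketched over plain Python sets. The function names `infer_candidates` and `accept` and the example knowledge are hypothetical; dropping inferred items that are already known is an added convenience and not something the text prescribes.

```python
def infer_candidates(preconditions, effects, temporal):
    """Derive candidate knowledge from the three sets of one iteration.

    preconditions, effects: sets of (behavior, state) pairs.
    temporal: set of (a1, a2) pairs meaning a1 occurs before a2.
    Returns (new preconditions, new effects, new temporal relations).
    """
    new_temporal, new_pre, new_eff = set(), set(), set()
    # Step 1: Effect(a1, s) and Precondition(a2, s) suggest a1 before a2.
    for a2, s in preconditions:
        for a1, s_eff in effects:
            if s_eff == s:
                new_temporal.add((a1, a2))
    # Step 2: a temporal pair carries effects forward and preconditions backward.
    for a1, a2 in temporal:
        new_pre.update((a2, s) for a, s in effects if a == a1)
        new_eff.update((a1, s) for a, s in preconditions if a == a2)
    return new_pre - preconditions, new_eff - effects, new_temporal - temporal


def accept(inferred, statistical_candidates, confidence):
    """Step 3: keep inferred items that are also statistical candidates; set confidence to 1."""
    accepted = inferred & statistical_candidates
    for item in accepted:
        confidence[item] = 1.0
    return accepted


pre = {("make explosives", "fertilizer"), ("buy fertilizer", "money")}
eff = {("buy fertilizer", "fertilizer")}
tmp = {("acquire funds", "buy fertilizer")}
cand_pre, cand_eff, cand_tmp = infer_candidates(pre, eff, tmp)
print(cand_pre, cand_eff, cand_tmp)
```

Requiring an inferred item to also appear among the statistical candidates, as `accept` does, means inference only promotes knowledge that has some textual support.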
S3, behavior knowledge refinement step.
Behavior knowledge refinement comprises merging similar items and removing contradictory items.
Merging similar items takes place in the preprocessing stage of the input text and mainly targets behaviors and states (including behavior prerequisites and results);
removing contradictory items is applied to the behavior knowledge obtained in each iteration and mainly targets the sequential relationships between behaviors.
Merging similar items relies on a general-purpose semantic dictionary: for two behaviors a_1 and a_2 in the behavior set (each in verb + object form), check whether their verb parts or object parts are synonyms; if they are synonymous, the two behaviors are merged. The states in the state set are merged in the same way. Removing contradictory items checks whether the sequential relationship set contains both behavior pairs (a_1, a_2) and (a_2, a_1); if so, both (a_1, a_2) and (a_2, a_1) are removed.
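A minimal Python sketch of the two refinement operations follows. The `SYNONYMS` table is a hypothetical stand-in for the general-purpose semantic dictionary, and the rule that two behaviors merge when both their verb and object map to the same canonical words is one possible reading of the text, stated here as an assumption.

```python
from typing import Dict, List, Set, Tuple

Behavior = Tuple[str, str]  # (verb, object)

# Hypothetical synonym table: maps a word to its canonical representative.
SYNONYMS: Dict[str, str] = {"purchase": "buy", "automobile": "car"}


def canonical(word: str) -> str:
    """Map a word to its canonical synonym, or to itself if it has none."""
    return SYNONYMS.get(word, word)


def merge_similar(behaviors: List[Behavior]) -> List[Behavior]:
    """Merge behaviors whose verb and object are synonymous, keeping first-seen order."""
    merged: List[Behavior] = []
    seen: Set[Behavior] = set()
    for verb, obj in behaviors:
        key = (canonical(verb), canonical(obj))
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


def remove_contradictions(temporal: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Drop every pair (a1, a2) whose reverse (a2, a1) is also present."""
    return {(a1, a2) for (a1, a2) in temporal if (a2, a1) not in temporal}
```

Note that both directions of a contradictory pair are discarded, as the text specifies, rather than keeping the one with the higher confidence level.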
The technical solution proposed by the present invention is further illustrated below with a specific embodiment.
In this embodiment, online news reports related to the Al-Qaeda terrorist organization are used as input. The input text consists of 26699 news web pages from the epoch online, BBC, USA Today, the New York Times, Guardian, Washington Post and Los Angeles Times. To ensure the quality of the input text, only sentences with a character length between 4 and 80 are retained, which finally yields 801570 sentences from the input text.
These input texts are first preprocessed: initial behavior and state sets are generated from the syntactic analysis results, and knowledge refinement is applied to the resulting behavior set and state set to remove redundant behaviors and states. Then a small number of initial behavior prerequisite and result extraction templates are defined, and the confidence level of these initial templates is set to 1. The initial prerequisite and result templates used in this embodiment are as follows:
Prerequisite templates:
1. need|use <Precondition> to <Verb> <Object>
2. have|possess <Precondition> need to <Verb> <Object>
3. <Precondition> [that could|could] be used to|for|in <Verb> <Object>
4. use <Precondition> to <Verb> <Object>
5. can <Verb> <Object>, use <Precondition>
6. be|to <Verb> <Object> use <Precondition>
Result templates:
1. <Verb> <Object> [in order] to have <Effect>
2. cause|obtain <Effect> by <Verb> <Object>
3. <Verb> <Object> [,] cause|obtain <Effect>
4. <Effect> be caused|obtained by <Verb> <Object>
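To illustrate how one such template can be applied to a sentence, the sketch below hand-translates the first prerequisite template into a regular expression. It is a simplification under stated assumptions: each slot is taken to be a single word, the sentence is assumed to be lowercased and lemmatized already, and the names `PREREQ_TEMPLATE_1` and `extract_prerequisite` are hypothetical. The embodiment itself works on syntactic analysis results, which this sketch does not reproduce.

```python
import re
from typing import Optional, Tuple

# Prerequisite template 1, with single-word slots (a simplifying assumption):
#   need|use <Precondition> to <Verb> <Object>
PREREQ_TEMPLATE_1 = re.compile(
    r"\b(?:need|use)\s+(?P<precondition>\w+)\s+to\s+(?P<verb>\w+)\s+(?P<object>\w+)"
)


def extract_prerequisite(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (behavior, prerequisite state) if the template matches, else None."""
    match = PREREQ_TEMPLATE_1.search(sentence.lower())
    if match is None:
        return None
    behavior = f"{match.group('verb')} {match.group('object')}"
    return behavior, match.group("precondition")
```

For example, `extract_prerequisite("They use keys to open doors")` yields the behavior "open doors" with the prerequisite "keys".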
When the first iteration starts, no behavior knowledge is available yet, so the initial templates are first used to extract each kind of behavior knowledge from the text and to calculate its confidence level. In this embodiment, δ = 0.5, the number n_k of behavior knowledge items newly added in each iteration is set to 5, and the number n_t of templates newly added in each iteration is set to 1. When the first iteration ends, the behavior knowledge obtained and its confidence levels are as follows:
Prerequisite knowledge:
Figure BDA0000387677830000121
Result knowledge:
Figure BDA0000387677830000122
From the behavior prerequisite and result knowledge obtained in the first iteration, knowledge reasoning yields the following sequential relationships between behaviors, with their confidence levels:
1. launch attack  find haven  1.0
2. launch attack  create haven  1.0
The second iteration first uses the behavior knowledge obtained in the first iteration to obtain new templates from the input text, calculates the confidence level of all candidate templates (including the templates from the first iteration), and ranks the templates by confidence level. According to the preset n_t, this iteration adds one more template than the previous iteration. The templates of each class newly added in the second iteration, with their confidence levels, are as follows:
Result template: <Verb> <Object>, put <Effect> and  1.0
Prerequisite template: be <Precondition> to <Verb> <Object>  1.0
Sequential template: <Verb2> <Object2> to <Verb1> <Object1>  1.0
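The ranking and incremental selection used in each iteration can be sketched as follows. The function name `select_top` and the dictionary of confidence levels are illustrative assumptions; the rule itself (keep the top k items, where k is the previous count plus the per-iteration increment) follows the description.

```python
from typing import Dict, List


def select_top(confidence: Dict[str, float], previous_count: int, increment: int) -> List[str]:
    """Keep the k most confident candidates, with k = previous_count + increment."""
    k = previous_count + increment
    # sorted() is stable, so candidates with equal confidence keep their input order.
    ranked = sorted(confidence, key=lambda item: confidence[item], reverse=True)
    return ranked[:k]
```

With the settings of this embodiment, the increment would be n_t = 1 for templates and n_k = 5 for each kind of behavior knowledge.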
Then, using the templates obtained in this iteration, new behavior knowledge is obtained from the input text by the same behavior knowledge extraction steps as in the first iteration. This cycle repeats until the preset number of iterations is reached. After the iterations end, the behavior knowledge finally obtained and its confidence levels are as follows:
Prerequisite knowledge:
Figure BDA0000387677830000131
Result knowledge:
Sequential relationship:
Figure BDA0000387677830000133
Figure BDA0000387677830000141
Based on the input text described in this embodiment, the experimental results of the behavior knowledge extraction method proposed by the present invention are as follows (the number of iterations is set to 24, δ is varied in steps of 0.25, and both the case without knowledge reasoning and the case with knowledge reasoning are included):
Weight factor             Result knowledge accuracy   Prerequisite knowledge accuracy   Sequential relationship accuracy
δ=0 (without reasoning)   0.533                       0.817                             /
δ=0                       0.55                        0.842                             0.788
δ=0.25                    0.575                       0.842                             0.805
δ=0.5                     0.558                       0.875                             0.813
δ=0.75                    0.542                       0.808                             0.743
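The weight factor δ varied in the results above controls how the confidence level blends the statistical correlation degree (SA) with the semantic similarity (SS), as defined in claim 5. The sketch below shows that blend. As written, the formula normalizes by the largest confidence level in the iteration; this sketch interprets that as dividing each blended score by the largest blended score, which is an assumption, and the function and parameter names are hypothetical.

```python
from typing import Dict


def confidence(sa: Dict[str, float], ss: Dict[str, float], delta: float) -> Dict[str, float]:
    """Blend SA and SS with weight delta, then normalize by the largest blended score."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    # Candidates missing from ss are treated as having zero semantic similarity.
    raw = {x: (1.0 - delta) * sa[x] + delta * ss.get(x, 0.0) for x in sa}
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {x: 0.0 for x in raw}
    return {x: score / top for x, score in raw.items()}
```

With δ = 0 the semantic term vanishes and the ranking depends on the statistical correlation degree alone, which corresponds to the δ=0 rows of the results.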
The advantages of the method proposed by the present invention are as follows:
Starting from only a small number of initial extraction templates, the present invention incrementally obtains a large amount of behavior knowledge, saving time and effort, and is suitable for behavior knowledge extraction from massive text;
The behavior knowledge extraction method designed in the present invention combines knowledge reasoning with the Bootstrapping technique, which significantly improves the performance of behavior knowledge extraction;
The Bootstrapping step adopted by the present invention uses statistical correlation and semantic similarity information to estimate the confidence level of knowledge, which effectively improves the reliability of behavior knowledge extraction.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. A behavior knowledge extraction method, comprising the following steps:
S1, using the co-occurrence relations and semantic relatedness information between templates and behavior knowledge, calculating the statistical correlation degree between candidate templates and the behavior knowledge set and between candidate behavior knowledge and the template set, and the semantic similarity between candidate behavior knowledge and the behavior knowledge set and between candidate templates and the template set; then calculating the confidence level of the candidate behavior knowledge and templates, and obtaining a new behavior knowledge set and template set according to said confidence level;
S2, using the semantic associations between different types of behavior knowledge, expanding the behavior knowledge set by a knowledge reasoning method;
S3, performing knowledge refinement on the behavior knowledge, mainly comprising merging similar items and removing contradictory items, so as to improve the quality of behavior knowledge extraction.
2. The behavior knowledge extraction method according to claim 1, characterized in that: said step S1 comprises multiple iterations, and each iteration comprises two sub-steps, namely incrementally obtaining templates and incrementally obtaining behavior knowledge. "Incrementally" means that, as the iterations proceed, each iteration obtains more templates and behavior knowledge than the previous one.
3. The behavior knowledge extraction method according to claim 2, characterized in that the sub-step of incrementally obtaining templates is as follows:
S1.1, obtaining a candidate template set from the input text based on the behavior knowledge obtained in the previous iteration; calculating the statistical correlation degree of each candidate template from the co-occurrence relations between the current behavior knowledge set and the candidate template, calculating the semantic similarity between the candidate template and the template set obtained in the previous iteration, and thereby obtaining the confidence level of the candidate template.
S1.2, sorting the candidate templates by confidence level from high to low and selecting the top k templates as the templates obtained in the current iteration, where k is the sum of the number of templates in the previous iteration and n_t, and n_t is the number of templates newly added in each iteration, its value being determined by the embodiment.
4. The behavior knowledge extraction method according to claim 2, characterized in that the sub-step of incrementally obtaining behavior knowledge is as follows:
S1.3, obtaining a candidate behavior knowledge set from the input text based on the templates obtained in the current iteration; calculating the statistical correlation degree of each candidate behavior knowledge item from the co-occurrence relations between the current template set and the candidate behavior knowledge, calculating the semantic similarity between the candidate behavior knowledge and the behavior knowledge set obtained in the previous iteration, and thereby obtaining the confidence level of the candidate behavior knowledge.
S1.4, sorting each of the three kinds of behavior knowledge by confidence level from high to low and selecting the top k items as the behavior knowledge obtained in the current iteration, where k is the sum of the number of items of each kind of behavior knowledge in the previous iteration and n_k, and n_k is the number of items of each kind of behavior knowledge newly added in each iteration, its value being determined by the embodiment.
5. The behavior knowledge extraction method according to claim 3 or 4, characterized in that the confidence levels of said templates and behavior knowledge are defined as follows:
$$C_i(t) = \frac{1}{\max_{t'} C_i(t')}\bigl((1-\delta)\,SA_i(t) + \delta\,SS_i(t)\bigr)$$
$$C_i(k) = \frac{1}{\max_{k'} C_i(k')}\bigl((1-\delta)\,SA_i(k) + \delta\,SS_i(k)\bigr)$$
where C_i(t) and C_i(k) denote the confidence levels of candidate template t and candidate knowledge k in the i-th iteration; SA_i(·) and SS_i(·) denote the statistical correlation degree and the semantic similarity of a candidate template or knowledge item in the i-th iteration; max_{t'} C_i(t') and max_{k'} C_i(k') are the maximum confidence levels over all templates and over all knowledge items in the i-th iteration, respectively; and δ is a weight factor whose range is set to [0, 1). When δ is 0, the confidence level evaluates the reliability of candidate behavior knowledge and templates using only the statistical correlation degree.
6. The behavior knowledge extraction method according to claim 5, characterized in that, in the i-th iteration, the statistical correlation degree between a candidate template and the knowledge set, and between a candidate behavior knowledge item and the template set, is calculated as follows:
$$SA_i(t) = \frac{1}{\max_{t'} SA_i(t')} \sum_{k \in K_{i-1}} \mathrm{PMI}^{+}(k,t) \times C_{i-1}(k)$$
$$SA_i(k) = \frac{1}{\max_{k'} SA_i(k')} \sum_{t \in T_i} \mathrm{PMI}^{+}(k,t) \times C_i(t)$$
In the first formula, t denotes a candidate template, K_{i-1} denotes the behavior knowledge set obtained in iteration i-1, and C_{i-1}(k) denotes the confidence level of behavior knowledge k in iteration i-1. In the second formula, k denotes a candidate behavior knowledge item, T_i is the candidate template set of the current iteration, and C_i(t) is the confidence level of template t in the current iteration.
7. The behavior knowledge extraction method according to claim 5, characterized in that:
the semantic similarity between a candidate template t and the template set T_{i-1} obtained in the previous iteration is calculated as follows:
$$SS_i(t) = \frac{1}{\max_{t'} SS_i(t')} \sum_{e \in T_{i-1}} \mathrm{Sim}(t,e) \times C_{i-1}(e)$$
where Sim(t, e) denotes the degree of semantic similarity between templates t and e;
the semantic similarity between a candidate behavior knowledge item k and the behavior knowledge set K_{i-1} obtained in the previous iteration is calculated as follows:
$$SS_i(k) = \frac{1}{\max_{k'} SS_i(k')} \sum_{e \in K_{i-1}} \mathrm{Sim}(k,e) \times C_{i-1}(e)$$
where Sim(k, e) denotes the degree of semantic similarity between behavior knowledge items k and e.
8. The behavior knowledge extraction method according to claim 1, characterized in that: in said step S2, the behavior knowledge comprises three kinds, namely behavior prerequisites, behavior results, and sequential relationships between behaviors.
The mutual inference method among said behavior knowledge is:
Figure FDA0000387677820000033
Here, a_1 and a_2 denote behaviors and s denotes a state; Effect(a_1, s) means that s is a result of a_1; Precondition(a_2, s) means that s is a prerequisite of a_2; and Temporal-relation(a_1, a_2) means that a_1 is a behavior that occurs before a_2.
9. The behavior knowledge extraction method according to claim 8, characterized in that: after each iteration ends, the three behavior knowledge sets are expanded on the basis of the behavior prerequisite, result, and sequential relationship sets obtained in that iteration, in the following steps: first, for each behavior prerequisite (a_2, s), checking whether state s is present in the result set and, if it is, for every behavior a_1 that has s as its result, adding the behavior pair (a_1, a_2) to the candidate sequential relationship set; second, for each behavior pair (a_1, a_2) in the sequential relationship set, if (a_1, s) is present in the result set (or (a_2, s) is present in the prerequisite set), adding (a_2, s) to the candidate behavior prerequisite set (or adding (a_1, s) to the candidate behavior result set); finally, for each behavior knowledge item k in the candidate prerequisite, result, and sequential relationship sets, if k is also a candidate behavior knowledge item obtained from the statistical correlation degree, setting the confidence level of k to 1 and adding k to the corresponding behavior knowledge set.
10. The behavior knowledge extraction method according to claim 1, characterized in that: in said step S3, merging similar items merges the redundant behaviors, behavior prerequisites, and results obtained by preprocessing; removing contradictory items targets the sequential relationships between behaviors obtained in each iteration of the Bootstrapping step and removes behavior pairs that contradict each other.
11. A behavior knowledge extraction device, comprising the following modules:
a first module, configured to use the co-occurrence relations and semantic relatedness information between templates and behavior knowledge to calculate the statistical correlation degree of candidate behavior knowledge and templates, and the semantic similarity between candidate behavior knowledge and the behavior knowledge set and between candidate templates and the template set, then to calculate the confidence level of the candidate behavior knowledge and templates, and to obtain new behavior knowledge and templates according to said confidence level;
a second module, configured to use the semantic associations between different types of behavior knowledge to expand the behavior knowledge by a knowledge reasoning method;
a third module, configured to merge similar items and remove contradictory items in the behavior knowledge obtained by said first module, so as to improve the quality of behavior knowledge extraction.
CN2013104522928A 2013-09-26 2013-09-26 Behavior knowledge extracting method and device combining reasoning and semi-automatic learning Pending CN103455638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013104522928A CN103455638A (en) 2013-09-26 2013-09-26 Behavior knowledge extracting method and device combining reasoning and semi-automatic learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013104522928A CN103455638A (en) 2013-09-26 2013-09-26 Behavior knowledge extracting method and device combining reasoning and semi-automatic learning

Publications (1)

Publication Number Publication Date
CN103455638A true CN103455638A (en) 2013-12-18

Family

ID=49738001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013104522928A Pending CN103455638A (en) 2013-09-26 2013-09-26 Behavior knowledge extracting method and device combining reasoning and semi-automatic learning

Country Status (1)

Country Link
CN (1) CN103455638A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101000626A (en) * 2007-01-12 2007-07-18 宋晓伟 Information storing method and method for converting search inquiry into inquiry statement
US20090119649A1 (en) * 2007-11-02 2009-05-07 Klocwork Corp. Static analysis defect detection in the presence of virtual function calls

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANSHENG GE ET AL: "Action Knowledge Extraction from Web Text", 《2013 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI)》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572982A (en) * 2014-12-31 2015-04-29 东软集团股份有限公司 Personalized recommendation method and system based on question guide
CN104572982B (en) * 2014-12-31 2017-10-31 东软集团股份有限公司 Personalized recommendation method and system based on problem guiding
CN109615006A (en) * 2018-12-10 2019-04-12 北京市商汤科技开发有限公司 Character recognition method and device, electronic equipment and storage medium
CN111401671A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Method and device for calculating derivative features in accurate marketing and readable storage medium
CN111401671B (en) * 2019-01-02 2023-11-21 中国移动通信有限公司研究院 Derived feature calculation method and device in accurate marketing and readable storage medium
CN114492387A (en) * 2022-04-18 2022-05-13 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Domain self-adaptive aspect term extraction method and system based on syntactic structure
CN114492387B (en) * 2022-04-18 2022-07-19 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Domain self-adaptive aspect term extraction method and system based on syntactic structure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131218