CN103455638A

CN103455638A - Behavior knowledge extracting method and device combining reasoning and semi-automatic learning

Info

Publication number: CN103455638A
Application number: CN2013104522928A
Authority: CN
Inventors: 毛文吉; 曾大军; 葛安生; 孔庆超; 王磊
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2013-09-26
Filing date: 2013-09-26
Publication date: 2013-12-18

Abstract

The invention provides a behavior knowledge extraction method and device combining reasoning and semi-automatic learning. Targeting massive open-source texts, the method uses a small number of behavior knowledge extraction templates and the semantic associations among behavior knowledge to incrementally obtain behavior preconditions, behavior effects, and temporal relations between behaviors from text. The three kinds of knowledge are each obtained by Bootstrapping, and knowledge reasoning based on the semantic associations among behavior knowledge is used within Bootstrapping to extract further knowledge. The method improves the efficiency and quality of behavior knowledge extraction and enables automatic behavior modeling and analysis over massive texts in different fields.

Description

A behavior knowledge extraction method and device combining reasoning and semi-automatic learning

Technical field

The invention belongs to the field of computer science and technology, and specifically relates to a behavior knowledge extraction method and device that start from a small number of initial behavior knowledge extraction templates and combine reasoning with semi-automatic learning to incrementally obtain behavior knowledge from massive text.

Background art

Behavior knowledge is an important type of knowledge with significant applications in many fields that involve behavior modeling, analysis, and prediction. With the development and spread of Internet technology, the massive text accumulated online provides data support for behavior knowledge acquisition, but also poses severe technical challenges.

Previous work on behavior knowledge extraction has generally adopted supervised learning or hand-written rules. Representative work includes: Sil et al. ("Extracting action and event semantics from web text," in AAAI Fall Symposium on Common-Sense Knowledge (AAAI-CSK), 2010), who use support vector machines to extract behavior precondition and effect knowledge; and Li et al. ("Automatic construction of domain theory for attack planning," in 2010 IEEE International Conference on Intelligence and Security Informatics (IEEE-ISI), 2010, pp. 65-70), who use manually written templates to extract behavior preconditions and effects. These earlier methods have the following main shortcomings: (1) they require a large manually annotated corpus or rely entirely on manually constructed extraction templates, which makes them inefficient; (2) they extract only behavior precondition and effect knowledge and ignore relations between behaviors, in particular the important temporal relations between behaviors; (3) they extract each kind of behavior knowledge separately and cannot use the semantic associations among behavior knowledge to let different kinds of knowledge expand one another.

Summary of the invention

The technical problem to be solved by the present invention is: for massive open-source text, to use a small number of behavior knowledge extraction templates together with the semantic associations among behavior knowledge to incrementally obtain from the text three main kinds of behavior knowledge: behavior preconditions, behavior effects, and temporal relations between behaviors.

To solve the above technical problem, the present invention proposes a behavior knowledge extraction method comprising the following steps:

S1: using the co-occurrence relations and semantic relatedness information between templates and behavior knowledge, compute the statistical association between each candidate template and the behavior knowledge set and between each candidate behavior knowledge item and the template set, as well as the semantic similarity between each candidate behavior knowledge item and the behavior knowledge set and between each candidate template and the template set; from these compute the confidence of the candidate behavior knowledge and templates, and obtain a new behavior knowledge set and template set according to the confidence;

S2: using the semantic associations among different kinds of behavior knowledge, expand the behavior knowledge set by knowledge reasoning;

S3: refine the behavior knowledge, mainly by merging similar items and removing contradictory items, to improve the quality of behavior knowledge extraction.

According to one embodiment of the present invention, step S1 comprises multiple iterations, and each iteration comprises two sub-steps: incrementally obtaining templates and incrementally obtaining behavior knowledge. "Incremental" means that, as the iterations proceed, each round obtains more templates and behavior knowledge than the previous round.

According to one embodiment of the present invention, the sub-steps of incrementally obtaining templates are as follows:

S1.1: based on the behavior knowledge obtained in the previous iteration, obtain a candidate template set from the input text; compute the statistical association of each candidate template from its co-occurrence with the current behavior knowledge set, compute the semantic similarity between the candidate template and the template set obtained in the previous iteration, and from these obtain the confidence of the candidate template.

S1.2: sort the candidate templates by confidence in descending order and select the top k as the templates obtained in the current iteration, where k is the sum of the number of templates in the previous iteration and n_t, and n_t is the number of templates newly added per iteration; its value is determined by the specific implementation.

According to one embodiment of the present invention, the sub-steps of incrementally obtaining behavior knowledge are as follows:

S1.3: based on the templates obtained in the current iteration, obtain a candidate behavior knowledge set from the input text; compute the statistical association of each candidate behavior knowledge item from its co-occurrence with the current template set, compute the semantic similarity between the candidate and the behavior knowledge set obtained in the previous iteration, and from these obtain the confidence of the candidate behavior knowledge.

S1.4: sort each of the three kinds of behavior knowledge by confidence in descending order and select the top k of each kind as the behavior knowledge obtained in the current iteration, where k is the sum of the quantity of that kind of behavior knowledge in the previous iteration and n_k, and n_k is the number of items of each kind newly added per iteration; its value is determined by the specific implementation.
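The selection rule in S1.2 and S1.4 can be sketched as follows. This is a minimal illustration only: the function name `select_top_k` and the dictionary representation of candidates are assumptions, not part of the described method.

```python
def select_top_k(candidates, previous_count, n_new):
    """Keep the k = previous_count + n_new highest-confidence candidates.

    `candidates` maps each candidate (a template or a behavior knowledge item)
    to its confidence in the current iteration.
    """
    k = previous_count + n_new
    # Sort by confidence, highest first, and keep the first k entries.
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Hypothetical scores: one template was kept last round and n_t = 1.
scores = {"t1": 1.0, "t2": 0.4, "t3": 0.9, "t4": 0.1}
print(select_top_k(scores, previous_count=1, n_new=1))
```

Because k grows by a fixed amount each round, the retained set grows linearly with the number of iterations.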

According to one embodiment of the present invention, the confidence of templates and behavior knowledge is defined as follows:

C_{i}(t) = \frac{1}{\max_{t'} C_{i}(t')} \left( (1 - \delta)\, \mathrm{SA}_{i}(t) + \delta\, \mathrm{SS}_{i}(t) \right)

C_{i}(k) = \frac{1}{\max_{k'} C_{i}(k')} \left( (1 - \delta)\, \mathrm{SA}_{i}(k) + \delta\, \mathrm{SS}_{i}(k) \right)

where C_i(t) and C_i(k) denote the confidence of candidate template t and candidate behavior knowledge k in the i-th iteration; SA_i(·) and SS_i(·) denote the statistical association and the semantic similarity of a candidate template or behavior knowledge item in the i-th iteration; \max_{t'} C_i(t') and \max_{k'} C_i(k') are the maximum confidence over all templates and over all behavior knowledge in the i-th iteration; and δ is a weight factor with range [0, 1). When δ is 0, the confidence computation evaluates the reliability of candidate behavior knowledge and templates from the statistical association alone.

According to one embodiment of the present invention, in the i-th iteration the statistical association between a candidate template and the behavior knowledge set, and between a candidate behavior knowledge item and the template set, is computed as follows:

\mathrm{SA}_{i}(t) = \frac{1}{\max_{t'} \mathrm{SA}_{i}(t')} \sum_{k \in K_{i-1}} \mathrm{PMI}_{+}(k, t) \times C_{i-1}(k)

\mathrm{SA}_{i}(k) = \frac{1}{\max_{k'} \mathrm{SA}_{i}(k')} \sum_{t \in T_{i}} \mathrm{PMI}_{+}(k, t) \times C_{i}(t)

In the first formula, t denotes a candidate template, K_{i-1} denotes the behavior knowledge set obtained in iteration i-1, and C_{i-1}(k) denotes the confidence of behavior knowledge k in iteration i-1. In the second formula, k denotes a candidate behavior knowledge item, T_i is the candidate template set of the current iteration, and C_i(t) is the confidence of template t in the current iteration.

According to one embodiment of the present invention, the semantic similarity between candidate template t and the template set T_{i-1} obtained in the previous iteration is computed as follows:

\mathrm{SS}_{i}(t) = \frac{1}{\max_{t'} \mathrm{SS}_{i}(t')} \sum_{e \in T_{i-1}} \mathrm{Sim}(t, e) \times C_{i-1}(e)

where Sim(t, e) denotes the degree of semantic similarity between templates t and e;

The semantic similarity between candidate behavior knowledge k and the behavior knowledge set K_{i-1} obtained in the previous iteration is computed as follows:

\mathrm{SS}_{i}(k) = \frac{1}{\max_{k'} \mathrm{SS}_{i}(k')} \sum_{e \in K_{i-1}} \mathrm{Sim}(k, e) \times C_{i-1}(e)

where Sim(k, e) denotes the degree of semantic similarity between behavior knowledge items k and e.

According to one embodiment of the present invention, in step S2 the behavior knowledge comprises three kinds: behavior preconditions, behavior effects, and temporal relations between behaviors.

Mutual inference among the kinds of behavior knowledge uses the following notation: a₁ and a₂ denote behaviors, s denotes a state, Effect(a₁, s) means that s is an effect of a₁, Precondition(a₂, s) means that s is a precondition of a₂, and Temporal-relation(a₁, a₂) means that a₁ is a behavior that occurs before a₂.

According to one embodiment of the present invention, after each iteration ends, the three behavior knowledge sets are expanded on the basis of the precondition, effect, and temporal relation sets obtained in that iteration, as follows. First, for each behavior precondition (a₂, s), check whether state s appears in the effect set; if so, for each behavior a₁ that has s as an effect, add the behavior pair (a₁, a₂) to the candidate temporal relation set. Second, for each behavior pair (a₁, a₂) in the temporal relation set, if (a₁, s) is in the effect set, add (a₂, s) to the candidate precondition set; likewise, if (a₂, s) is in the precondition set, add (a₁, s) to the candidate effect set. Finally, for each behavior knowledge item k in the candidate precondition, effect, and temporal relation sets, if k is also a candidate obtained on the basis of statistical association, set the confidence of k to 1 and add k to the corresponding behavior knowledge set.

According to one embodiment of the present invention, in step S3, merging similar items combines the redundant behaviors, behavior preconditions, and effects obtained during preprocessing; removing contradictory items applies to the temporal relations between behaviors obtained in each iteration of the Bootstrapping step, and removes behavior pairs that contradict each other.

In addition, the present invention provides a behavior knowledge extraction device comprising the following modules:

A first module, configured to use the co-occurrence relations and semantic relatedness information between templates and behavior knowledge to compute the statistical association of candidate behavior knowledge and templates, and the semantic similarity between candidate behavior knowledge and the behavior knowledge set and between candidate templates and the template set; to compute from these the confidence of the candidate behavior knowledge and templates; and to obtain new behavior knowledge and templates according to the confidence;

A second module, configured to use the semantic associations among different kinds of behavior knowledge to expand the behavior knowledge by knowledge reasoning;

A third module, configured to merge similar items and remove contradictory items in the behavior knowledge obtained by the first module, improving the quality of behavior knowledge extraction.

Compared with the prior art, the method and device proposed by the present invention use both statistical and semantic information, and combine implicit behavior knowledge acquisition based on knowledge reasoning with explicit behavior knowledge acquisition based on text information extraction. They therefore have clear advantages over existing methods in the effectiveness and reliability of behavior knowledge extraction and in suitability for large-scale text:

A large amount of behavior knowledge is obtained incrementally from a small number of initial extraction templates, which suits behavior knowledge extraction over massive text;

Knowledge reasoning and Bootstrapping are combined, which markedly improves the performance of behavior knowledge extraction;

The designed Bootstrapping step uses statistical association information and semantic similarity information to estimate the confidence of knowledge, which effectively improves the reliability of behavior knowledge extraction.

Brief description of the drawings

Fig. 1 is a flowchart of the behavior knowledge extraction method proposed by the present invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawing.

Fig. 1 shows the flowchart of the behavior knowledge extraction method combining reasoning and semi-automatic learning. As shown in Fig. 1, the method comprises the following steps:

S1: the Bootstrapping step based on statistical association and semantic similarity.

This step uses the co-occurrence relations and semantic relatedness information between templates and behavior knowledge to compute the statistical association between candidate templates and the behavior knowledge set and between candidate behavior knowledge and the template set, as well as the semantic similarity between candidate behavior knowledge and the behavior knowledge set and between candidate templates and the template set. From these it computes the confidence of the candidate behavior knowledge and candidate templates, and finally obtains a new behavior knowledge set and template set according to the confidence.

The Bootstrapping step is a statistical learning process that starts from a small number of initially given behavior templates and refines the result step by step through iteration. A template is a syntactic pattern for extracting behavior knowledge. For example, the sentence "The terrorists use fertilizer to make explosives." matches the precondition template "need|use <Precondition> to <Verb> <Object>", which yields the precondition knowledge that "fertilizer" is a precondition of "make explosives". The co-occurrence relation describes how a single template and a behavior knowledge item occur together, and is measured with nonnegative pointwise mutual information (described in detail below).
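The template example above can be approximated with a regular expression. This is only a rough sketch: the method describes templates as syntactic patterns applied after parsing, whereas the regex, the slot approximation (one word per slot), and the name `extract_precondition` are assumptions made for illustration.

```python
import re

# Hypothetical regex rendering of the template
#   need|use <Precondition> to <Verb> <Object>
# Each slot is approximated by a single word; a real system would match
# against a syntactic parse instead of raw text.
PRECONDITION_TEMPLATE = re.compile(
    r"\b(?:need|use)\s+(?P<precondition>\w+)\s+to\s+(?P<verb>\w+)\s+(?P<object>\w+)"
)


def extract_precondition(sentence):
    """Return (behavior, state) if the template matches, else None."""
    match = PRECONDITION_TEMPLATE.search(sentence)
    if match is None:
        return None
    behavior = f"{match['verb']} {match['object']}"
    return behavior, match["precondition"]


print(extract_precondition("The terrorists use fertilizer to make explosives."))
```

The returned pair corresponds to the precondition knowledge form (behavior, state) used in the rest of the description.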

The semantic relatedness information is derived from the semantic hierarchy relations between two words in a semantic dictionary (such as WordNet or Tongyici Cilin): the semantic similarity of the two words is computed first, and from it the semantic similarity between a template and a template (set) and between a behavior knowledge item and a behavior knowledge (set). The statistical association of candidate behavior knowledge and templates is measured by the nonnegative pointwise mutual information between a single behavior knowledge item and a single template together with the corresponding confidence. The semantic similarity between candidate behavior knowledge and the behavior knowledge set, and between a candidate template and the template set, is measured by the semantic similarity between single behavior knowledge items or between single templates together with the corresponding confidence.

S2: the behavior knowledge reasoning step.

This step uses the semantic associations among different kinds of behavior knowledge to expand the behavior knowledge set by knowledge reasoning. Knowledge reasoning is the process of deducing new behavior knowledge from existing behavior knowledge by means of the semantic associations among behavior knowledge.

S3: the behavior knowledge refinement step.

"Refinement" means merging similar behavior knowledge and removing contradictory behavior knowledge. This step merges similar items and removes contradictory items in the behavior knowledge obtained in the text preprocessing stage and in the Bootstrapping step, improving the quality of behavior knowledge extraction. Text preprocessing takes place before the Bootstrapping step: natural language processing tools perform word segmentation, part-of-speech tagging, and syntactic parsing on the massive open-source text, and the states expressed by noun phrases and the behaviors expressed in verb+object form are identified from the parsing results.

Each of the above steps is described in detail below.

The Bootstrapping step (S1) comprises multiple iterations; the number of iterations can be determined by the specific implementation. Each iteration of the Bootstrapping step comprises two main sub-steps: incrementally obtaining templates and incrementally obtaining behavior knowledge. "Incremental" means that, as the iterations proceed, each round obtains more templates and behavior knowledge than the previous round.

The sub-steps of incrementally obtaining templates are steps S1.1 and S1.2 described above.

The process of incrementally obtaining behavior knowledge is similar to the template-acquisition steps and comprises steps S1.3 and S1.4 described above.

The behavior knowledge comprises three kinds: behavior precondition knowledge, behavior effect knowledge, and temporal relation knowledge between behaviors.

The confidence of candidate templates and behavior knowledge is computed from two kinds of information: statistical association (SA) and semantic similarity (SS). Specifically, the confidence of templates and behavior knowledge is defined as follows:

C_{i}(t) = \frac{1}{\max_{t'} C_{i}(t')} \left( (1 - \delta)\, \mathrm{SA}_{i}(t) + \delta\, \mathrm{SS}_{i}(t) \right) \qquad (1)

C_{i}(k) = \frac{1}{\max_{k'} C_{i}(k')} \left( (1 - \delta)\, \mathrm{SA}_{i}(k) + \delta\, \mathrm{SS}_{i}(k) \right) \qquad (2)

Here, C_i(t) and C_i(k) denote the confidence of candidate template t and candidate behavior knowledge k in the i-th iteration; SA_i(·) and SS_i(·) denote the statistical association and the semantic similarity of a candidate template or behavior knowledge item in the i-th iteration; \max_{t'} C_i(t') and \max_{k'} C_i(k') are the maximum confidence over all templates and over all behavior knowledge in the i-th iteration, and are used for normalization. δ is a weight factor with range [0, 1); when δ is 0, the confidence computation evaluates the reliability of candidate behavior knowledge and templates from the statistical association alone. Initially, the confidence of each template is set to 1.
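A minimal sketch of formulas (1) and (2) follows. Since the formulas as written normalize by the maximum of the confidence itself, the sketch assumes the normalizer is the maximum of the unnormalized weighted sums; that reading, the function name `confidence`, and the dictionary inputs are assumptions.

```python
def confidence(sa, ss, delta=0.5):
    """Combine statistical association (SA) and semantic similarity (SS).

    `sa` and `ss` map each candidate to its score in the current iteration.
    Assumption: the normalizer is the maximum of the unnormalized sums.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    # Weighted sum (1 - delta) * SA + delta * SS for every candidate.
    raw = {c: (1.0 - delta) * sa[c] + delta * ss.get(c, 0.0) for c in sa}
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {c: 0.0 for c in raw}
    # Divide by the maximum so the best candidate has confidence 1.
    return {c: score / top for c, score in raw.items()}


# Hypothetical scores for two candidate templates.
print(confidence({"a": 1.0, "b": 0.2}, {"a": 1.0, "b": 0.6}, delta=0.5))
```

With delta set to 0 the result depends on the SA scores only, which matches the special case stated above.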

The methods for computing the statistical association and the semantic similarity of behavior knowledge and templates are described below.

(1) Statistical association

The statistical association is computed from the co-occurrence relations between templates and behavior knowledge, and measures the association between a candidate template and the behavior knowledge set and between a candidate behavior knowledge item and the template set. To compute the statistical association between a single behavior knowledge item and a single template, the present invention designs nonnegative pointwise mutual information (PMI₊):

\mathrm{PMI}_{+}(k, t) = \log \left( \frac{P(k, t)}{P(k) \times P(t)} + 1 \right) \qquad (3)

where k denotes a single behavior knowledge item and t denotes a single template. P(k), P(t), and P(k, t) denote the probability that behavior knowledge k occurs, the probability that template t occurs, and the probability that k and t occur together. The value of PMI₊ is always nonnegative, which prevents the negative values of large magnitude that conventional pointwise mutual information (PMI) can produce from affecting the confidence computation.
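Formula (3) can be illustrated with a short sketch. The probabilities are estimated here as relative frequencies over a count of sentences and the logarithm is taken as natural; both choices, and the name `pmi_plus`, are assumptions, since the text does not specify them.

```python
import math


def pmi_plus(count_kt, count_k, count_t, total):
    """Nonnegative PMI: log(P(k, t) / (P(k) * P(t)) + 1).

    Probabilities are relative frequencies over `total` observations
    (an assumed estimator; the source does not state one).
    """
    if count_k == 0 or count_t == 0:
        return 0.0  # no evidence for the pair
    p_kt = count_kt / total
    p_k = count_k / total
    p_t = count_t / total
    return math.log(p_kt / (p_k * p_t) + 1.0)


# A pair that never co-occurs scores 0 instead of a large negative value.
print(pmi_plus(count_kt=0, count_k=5, count_t=5, total=100))
print(pmi_plus(count_kt=5, count_k=5, count_t=5, total=100))
```

Adding 1 inside the logarithm is what keeps the value at or above zero for every nonnegative ratio.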

In each iteration, templates are selected first and the selected templates are then used to select behavior knowledge. Therefore, the statistical association of candidate behavior knowledge is computed with the templates, and their confidence, in the template set obtained in the current iteration, whereas the statistical association of candidate templates is computed with the knowledge, and its confidence, in the behavior knowledge set obtained in the previous iteration.

In the i-th iteration, the statistical association between a candidate template and the behavior knowledge set, and between a candidate behavior knowledge item and the template set, is computed as follows:

\mathrm{SA}_{i}(t) = \frac{1}{\max_{t'} \mathrm{SA}_{i}(t')} \sum_{k \in K_{i-1}} \mathrm{PMI}_{+}(k, t) \times C_{i-1}(k) \qquad (4)

\mathrm{SA}_{i}(k) = \frac{1}{\max_{k'} \mathrm{SA}_{i}(k')} \sum_{t \in T_{i}} \mathrm{PMI}_{+}(k, t) \times C_{i}(t) \qquad (5)

In formula (4), t denotes a candidate template, K_{i-1} denotes the behavior knowledge set obtained in iteration i-1, and C_{i-1}(k) denotes the confidence of behavior knowledge k in iteration i-1. In formula (5), k denotes a candidate behavior knowledge item, T_i is the candidate template set of the current iteration, and C_i(t) is the confidence of template t in the current iteration.
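Formulas (4) and (5) share one shape: a confidence-weighted sum of PMI₊ values over an anchor set, divided by the maximum over candidates. The sketch below assumes the maximum is taken over the unnormalized sums; the name `statistical_association` and the callable `pmi` argument are hypothetical.

```python
def statistical_association(candidates, anchors, pmi, anchor_conf):
    """SA score of each candidate against an anchor set, normalized by the max.

    For candidate templates the anchors are K_{i-1} with confidences C_{i-1}(k);
    for candidate behavior knowledge they are T_i with confidences C_i(t).
    `pmi(candidate, anchor)` returns the PMI+ value of the pair.
    """
    raw = {
        c: sum(pmi(c, a) * anchor_conf[a] for a in anchors)
        for c in candidates
    }
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {c: 0.0 for c in raw}
    return {c: score / top for c, score in raw.items()}


# Hypothetical PMI+ values between two candidate templates and two known items.
table = {("t1", "k1"): 2.0, ("t1", "k2"): 1.0, ("t2", "k1"): 1.0, ("t2", "k2"): 0.0}
conf = {"k1": 1.0, "k2": 0.5}
print(statistical_association(["t1", "t2"], ["k1", "k2"], lambda c, a: table[(c, a)], conf))
```

Passing the anchor set and its confidences as arguments lets the same function serve both directions of the computation.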

(2) Semantic similarity

The semantic similarity between behavior knowledge items and between templates is computed along similar lines: first compute the semantic similarity between words; then compute the semantic similarity between behaviors and between states (including behavior preconditions and behavior effects); finally compute the semantic similarity between a template and a template (set) and between a behavior knowledge item and a behavior knowledge (set).

The present invention uses the semantic hierarchy relations in a general-purpose semantic dictionary (such as WordNet or Tongyici Cilin) to compute the semantic similarity between two words w₁ and w₂, as follows:

\mathrm{Sim}(w_{1}, w_{2}) = \frac{1}{D(w_{1}, w_{2}) + 1} \qquad (6)

In the above formula, D(w₁, w₂) is the semantic distance between words w₁ and w₂ in the general-purpose semantic dictionary: if w₁ and w₂ are synonyms, D(w₁, w₂) = 0; if one is the direct parent of the other, D(w₁, w₂) = 1, and so on; if there is no hypernym-hyponym relation between w₁ and w₂, D(w₁, w₂) = ∞.
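Formula (6) can be tried out on a toy hierarchy. Everything in the sketch below is illustrative: the `PARENT` and `SYNONYMS` data are invented stand-ins for a real semantic dictionary, and the helper names are hypothetical.

```python
import math

# Toy hierarchy (hypothetical data): child -> parent.
PARENT = {"bomb": "explosive", "explosive": "weapon", "rifle": "weapon"}
# Toy synonym groups (hypothetical data).
SYNONYMS = [{"explosive", "explosives"}]


def canonical(word):
    """Map a word to a fixed representative of its synonym group."""
    for group in SYNONYMS:
        if word in group:
            return min(group)
    return word


def depth_below(ancestor, word):
    """Number of parent links from `word` up to `ancestor`, or None."""
    steps = 0
    while word is not None:
        if word == ancestor:
            return steps
        word = PARENT.get(word)
        steps += 1
    return None


def semantic_distance(w1, w2):
    """D(w1, w2): 0 for synonyms, 1 for parent-child, ..., inf if unrelated."""
    w1, w2 = canonical(w1), canonical(w2)
    for ancestor, word in ((w1, w2), (w2, w1)):
        steps = depth_below(ancestor, word)
        if steps is not None:
            return steps
    return math.inf


def word_similarity(w1, w2):
    """Sim(w1, w2) = 1 / (D(w1, w2) + 1); this is 0.0 when D is infinite."""
    return 1.0 / (semantic_distance(w1, w2) + 1.0)


print(word_similarity("bomb", "explosives"), word_similarity("bomb", "rifle"))
```

Siblings such as "bomb" and "rifle" have no hypernym-hyponym relation, so their distance is infinite and their similarity is zero under this definition.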

The semantic similarity between states s₁ and s₂ is defined as the semantic similarity Sim(n₁, n₂) between their head nouns n₁ and n₂. The semantic similarity between behaviors a₁ (verb v₁ + object o₁) and a₂ (verb v₂ + object o₂) is the product of Sim(v₁, v₂) and Sim(o₁, o₂).

The semantic similarity between single behavior knowledge items k₁ and k₂ is computed in two cases. If k₁ and k₂ are behavior precondition or effect knowledge (of the form "behavior a - state s"), their similarity is the product of Sim(s₁, s₂) and Sim(a₁, a₂). If k₁ and k₂ are temporal relations between behaviors (of the form "behavior a₁ - behavior a₂"), their semantic similarity is Sim(a₁, a₂). To compute the semantic similarity between templates, first check whether the syntactic structures represented by the two templates are consistent: if they are, the semantic similarity of the two templates is defined as the product of the semantic similarities between the words at the same positions in the syntax trees; if they are not, the semantic similarity is 0.
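The product rules for behaviors and for precondition or effect knowledge can be sketched as below. The tuple representations and function names are assumptions, and the word-level similarity is passed in as a parameter so the sketch does not depend on any particular dictionary.

```python
def behavior_similarity(a1, a2, word_sim):
    """Behaviors are (verb, object) pairs; similarity is Sim(v1, v2) * Sim(o1, o2)."""
    (v1, o1), (v2, o2) = a1, a2
    return word_sim(v1, v2) * word_sim(o1, o2)


def state_knowledge_similarity(k1, k2, word_sim):
    """Precondition or effect knowledge is (behavior, state_head_noun).

    Similarity is Sim(s1, s2) * Sim(a1, a2), with the state similarity taken
    as the similarity of the states' head nouns.
    """
    (a1, s1), (a2, s2) = k1, k2
    return word_sim(s1, s2) * behavior_similarity(a1, a2, word_sim)


def toy_sim(x, y):
    """Hypothetical word similarity: identical words score 1, others 0.5."""
    return 1.0 if x == y else 0.5


print(behavior_similarity(("make", "bomb"), ("build", "bomb"), toy_sim))
print(state_knowledge_similarity(
    (("make", "bomb"), "fertilizer"), (("build", "bomb"), "fuel"), toy_sim))
```

Because the scores multiply, a single dissimilar component (a word similarity of 0) drives the whole similarity to 0.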

Based on the semantic similarity between single behavior knowledge items and between single templates, and following a method similar to the statistical association computation, the semantic similarity between candidate template t and the template set T_{i-1} obtained in the previous iteration is computed as follows:

\mathrm{SS}_{i}(t) = \frac{1}{\max_{t'} \mathrm{SS}_{i}(t')} \sum_{e \in T_{i-1}} \mathrm{Sim}(t, e) \times C_{i-1}(e) \qquad (7)

where Sim(t, e) denotes the degree of semantic similarity between templates t and e. Similarly, the semantic similarity between candidate behavior knowledge k and the behavior knowledge set K_{i-1} obtained in the previous iteration is computed as follows:

\mathrm{SS}_{i}(k) = \frac{1}{\max_{k'} \mathrm{SS}_{i}(k')} \sum_{e \in K_{i-1}} \mathrm{Sim}(k, e) \times C_{i-1}(e) \qquad (8)

where Sim(k, e) denotes the degree of semantic similarity between behavior knowledge items k and e. Unlike the statistical association, the semantic similarity of candidate behavior knowledge and of candidate templates is in both cases computed against the behavior knowledge set and template set obtained in the previous iteration.

S2: the behavior knowledge reasoning step.

The present invention uses the semantic associations among behavior knowledge to obtain implicit behavior knowledge, automatically expanding the behavior knowledge set obtained in each iteration of the Bootstrapping step.

Specifically, behavior precondition and effect knowledge can be used to expand the temporal relation set, behavior precondition and temporal relation knowledge to expand the effect set, and behavior effect and temporal relation knowledge to expand the precondition set. The mutual inference among precondition, effect, and temporal relation knowledge uses the following notation:

a₁ and a₂ denote behaviors, s denotes a state, Effect(a₁, s) means that s is an effect of a₁, Precondition(a₂, s) means that s is a precondition of a₂, and Temporal-relation(a₁, a₂) means that a₁ is a behavior that occurs before a₂. After each iteration ends, the three behavior knowledge sets are expanded on the basis of the precondition, effect, and temporal relation sets obtained in that iteration, as follows. First, for each behavior precondition (a₂, s), check whether state s appears in the effect set; if so, for each behavior a₁ that has s as an effect, add the behavior pair (a₁, a₂) to the candidate temporal relation set. Second, for each behavior pair (a₁, a₂) in the temporal relation set, if (a₁, s) is in the effect set, add (a₂, s) to the candidate precondition set; likewise, if (a₂, s) is in the precondition set, add (a₁, s) to the candidate effect set. Finally, for each behavior knowledge item k in the candidate precondition, effect, and temporal relation sets, if k is also a candidate obtained on the basis of statistical association, set the confidence of k to 1 and add k to the corresponding behavior knowledge set.
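The expansion procedure just described can be sketched as two small functions. The set-of-tuples representation and the names `infer_candidates` and `accept_inferred` are assumptions; the three inference directions in the comments restate the procedure in the text.

```python
def infer_candidates(preconditions, effects, temporal):
    """Derive candidate knowledge from the three sets of one iteration.

    preconditions, effects: sets of (behavior, state);
    temporal: set of (earlier_behavior, later_behavior).
    """
    new_temporal, new_pre, new_eff = set(), set(), set()
    # Effect(a1, s) and Precondition(a2, s) give Temporal-relation(a1, a2).
    for a2, s in preconditions:
        for a1, s_eff in effects:
            if s_eff == s:
                new_temporal.add((a1, a2))
    for a1, a2 in temporal:
        # Effect(a1, s) and Temporal-relation(a1, a2) give Precondition(a2, s).
        for a, s in effects:
            if a == a1:
                new_pre.add((a2, s))
        # Precondition(a2, s) and Temporal-relation(a1, a2) give Effect(a1, s).
        for a, s in preconditions:
            if a == a2:
                new_eff.add((a1, s))
    return new_pre, new_eff, new_temporal


def accept_inferred(inferred, text_candidates, confidences, knowledge_set):
    """Keep inferred items that were also extracted from text, with confidence 1."""
    for k in inferred & text_candidates:
        confidences[k] = 1.0
        knowledge_set.add(k)


# Hypothetical knowledge sets.
pre = {("make bomb", "explosives")}
eff = {("mix chemicals", "explosives"), ("recruit member", "manpower")}
tmp = {("recruit member", "launch attack")}
print(infer_candidates(pre, eff, tmp))
```

Requiring an inferred item to also appear among the text-extracted candidates means reasoning promotes existing candidates rather than introducing knowledge with no textual support.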

S3: the behavior knowledge refinement step.

Refining behavior knowledge comprises merging similar items and removing contradictory items.

Merging similar items takes place in the preprocessing stage of the input text and mainly concerns behaviors and states (including behavior preconditions and effects);

Removing contradictory items applies to the behavior knowledge obtained in each iteration and mainly concerns the temporal relations between behaviors.

Merging similar items relies on the general-purpose semantic dictionary: for two behaviors a₁ and a₂ (in verb+object form) in the behavior set, check whether their verb parts or object parts are synonyms, and if so, merge the two behaviors; states in the state set are merged in the same way. Removing contradictory items checks whether the temporal relation set contains both of the behavior pairs (a₁, a₂) and (a₂, a₁), and if so, removes both (a₁, a₂) and (a₂, a₁).
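Both refinement operations can be sketched in a few lines. The contradiction check follows the text directly; the merge is one possible reading in which a behavior is rewritten through a hypothetical `canonical` word mapping, so behaviors whose verbs and objects map to the same representatives collapse into one.

```python
def remove_contradictions(temporal):
    """Drop every pair (a1, a2) whose reverse (a2, a1) is also present."""
    return {(a1, a2) for a1, a2 in temporal if (a2, a1) not in temporal}


def merge_synonymous_behaviors(behaviors, canonical):
    """Collapse behaviors whose verb and object map to the same canonical words.

    `canonical` is a hypothetical word -> representative mapping built from a
    semantic dictionary; words missing from it map to themselves.
    """
    return {(canonical.get(v, v), canonical.get(o, o)) for v, o in behaviors}


# Hypothetical data.
print(remove_contradictions({("a", "b"), ("b", "a"), ("a", "c")}))
print(merge_synonymous_behaviors({("make", "bomb"), ("build", "bomb")}, {"build": "make"}))
```

Removing both directions of a contradictory pair, rather than picking one, avoids having to decide which ordering is correct.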

The technical solution proposed by the present invention is further illustrated below with a specific embodiment.

In this embodiment, online news reports related to the Al-Qaeda terrorist organization are used as input. The input text consists of 26,699 news web pages from Epoch Online, BBC, USA Today, The New York Times, The Guardian, The Washington Post, and the Los Angeles Times. To ensure the quality of the input text, only sentences with a character length between 4 and 80 are retained, which finally yields 801,570 sentences from the input text.

The input text is first preprocessed: initial behavior and state sets are generated from the syntactic parsing results, and knowledge refinement is applied to the behavior set and the state set to remove redundant behaviors and states. Then a small number of initial behavior precondition and effect extraction templates are set, with the confidence of each initial template set to 1. The initial precondition and effect templates used in this embodiment are as follows:

Precondition templates:

1. need|use <Precondition> to <Verb> <Object>

2. have|possess <Precondition> need to <Verb> <Object>

3. <Precondition> [that could|could] be used to|for|in <Verb> <Object>

4. use <Precondition> to <Verb> <Object>

5. can <Verb> <Object>, use <Precondition>

6. be|to <Verb> <Object> use <Precondition>

Effect templates:

1. <Verb> <Object> [in order] to have <Effect>

2. cause|obtain <Effect> by <Verb> <Object>

3. <Verb> <Object> [,] cause|obtain <Effect>

4. <Effect> be caused|obtained by <Verb> <Object>

When the first round, iteration started, due to also, without any behavior knowledge, first utilize original template extract every kind of behavior knowledge and calculate its confidence level from text.Set δ=0.5 in the present embodiment, often take turns the behavior knowledge quantity n that iteration newly increases _kbe made as 5, the template number n newly increased _tbe made as 1.When first round iteration finishes, the behavior knowledge and the confidence level thereof that get are as follows:

Prerequisite knowledge:

Result knowledge:

From the behavior prerequisite and result knowledge obtained in the first iteration, knowledge reasoning yields the following sequential relationships between behaviors, with their confidence:

1. launch attack, find haven: 1.0

2. launch attack, create haven: 1.0

The second iteration first uses the behavior knowledge obtained in the first iteration to acquire new templates from the input text, calculates the confidence of all candidate templates (including the templates from the first iteration), and ranks the templates by confidence. According to the preset n_t, this iteration adds one more template than the previous one. The templates of each type newly added in the second iteration, and their confidence, are as follows:

Result template: <Verb> <Object>, put <Effect> and (confidence 1.0)

Prerequisite template: be <Precondition> to <Verb> <Object> (confidence 1.0)

Sequential template: <Verb2> <Object2> to <Verb1> <Object1> (confidence 1.0)

Then, using the templates obtained in this iteration, new behavior knowledge is extracted from the input text by the same steps as in the first iteration. This cycle repeats until the preset number of iterations is reached. After the iterations finish, the behavior knowledge finally obtained and its confidence are as follows:

Prerequisite knowledge:

Result knowledge:

Sequential relationship:

On the input text of this embodiment, the experimental results of the behavior knowledge extraction method proposed by the present invention are as follows (the number of iterations is set to 24, δ is varied in steps of 0.25, and both the case without knowledge reasoning and the case with knowledge reasoning are included):

Weight factor	Result knowledge accuracy	Prerequisite knowledge accuracy	Sequential relationship accuracy
δ=0 (without reasoning)	0.533	0.817	/
δ=0	0.55	0.842	0.788
δ=0.25	0.575	0.842	0.805
δ=0.5	0.558	0.875	0.813
δ=0.75	0.542	0.808	0.743

The advantages of the method proposed by the present invention are as follows:

Starting from only a small number of initial extraction templates, the present invention incrementally obtains a large amount of behavior knowledge, which saves time and labor and makes it suitable for behavior knowledge extraction from massive text;

The behavior knowledge extraction method designed in the present invention combines knowledge reasoning with Bootstrapping, which clearly improves the performance of behavior knowledge extraction;

The Bootstrapping step adopted by the present invention uses statistical correlation and semantic similarity information to estimate the confidence of knowledge, which effectively improves the reliability of behavior knowledge extraction.

The specific embodiment described above further explains the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and does not limit the present invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A behavior knowledge extraction method, comprising the following steps:

2. The behavior knowledge extraction method as claimed in claim 1, characterized in that: said step S1 comprises multiple iterations, and each iteration comprises two sub-steps: incrementally obtaining templates and incrementally obtaining behavior knowledge. "Incremental" means that, as the iterations proceed, each iteration obtains more templates and behavior knowledge than the previous one.

3. The behavior knowledge extraction method as claimed in claim 2, characterized in that: the sub-steps of incrementally obtaining templates are as follows:

4. The behavior knowledge extraction method as claimed in claim 2, characterized in that: the sub-steps of incrementally obtaining behavior knowledge are as follows:

5. The behavior knowledge extraction method as claimed in claim 3 or 4, characterized in that: the confidence of said templates and behavior knowledge is defined as follows:

C_{i}(t) = \frac{1}{\max_{t'} C_{i}(t')} \left( (1 - \delta) \, \mathrm{SA}_{i}(t) + \delta \, \mathrm{SS}_{i}(t) \right)

C_{i}(k) = \frac{1}{\max_{k'} C_{i}(k')} \left( (1 - \delta) \, \mathrm{SA}_{i}(k) + \delta \, \mathrm{SS}_{i}(k) \right)

Here, C_i(t) and C_i(k) denote the confidence of candidate template t and candidate knowledge k in the i-th iteration; SA_i(·) and SS_i(·) denote the statistical correlation degree and the semantic similarity of a candidate template or candidate knowledge in the i-th iteration; max_{t'} C_i(t') and max_{k'} C_i(k') are the maximum confidence over all templates and over all knowledge in the i-th iteration; and δ is a weight factor whose range is set to [0, 1). When δ is 0, the confidence calculation evaluates the reliability of candidate behavior knowledge and templates using the statistical correlation degree alone.
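The weighted combination can be sketched in Python as follows. This is a minimal illustration under one assumption: the factor 1/max C_i is read as dividing each raw weighted score by the largest raw score in the same iteration, so the best candidate receives confidence 1. The function name `combine_confidence` and the dictionary-based interface are hypothetical.

```python
def combine_confidence(sa: dict, ss: dict, delta: float) -> dict:
    """Combine statistical correlation (sa) and semantic similarity (ss).

    sa and ss map each candidate (template or knowledge item) to a score.
    Assumption: normalization divides by the maximum raw combined score.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    # Raw weighted score; a candidate missing from ss counts as similarity 0.
    raw = {c: (1.0 - delta) * sa[c] + delta * ss.get(c, 0.0) for c in sa}
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {c: 0.0 for c in raw}
    return {c: score / top for c, score in raw.items()}


if __name__ == "__main__":
    sa = {"t1": 1.0, "t2": 0.2}
    ss = {"t1": 0.4, "t2": 1.0}
    print(combine_confidence(sa, ss, delta=0.5))
```

With δ = 0 the result depends on `sa` only, which corresponds to the case described above where only the statistical correlation degree is used.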

6. The behavior knowledge extraction method as claimed in claim 5, characterized in that: in the i-th iteration, the statistical correlation degree between a candidate template and the knowledge set, and between a candidate behavior knowledge and the template set, is calculated as follows:

\mathrm{SA}_{i}(t) = \frac{1}{\max_{t'} \mathrm{SA}_{i}(t')} \sum_{k \in K_{i-1}} \mathrm{PMI}_{+}(k, t) \times C_{i-1}(k)

\mathrm{SA}_{i}(k) = \frac{1}{\max_{k'} \mathrm{SA}_{i}(k')} \sum_{t \in T_{i}} \mathrm{PMI}_{+}(k, t) \times C_{i}(t)
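A minimal Python sketch of the template-side formula is given below. It assumes that PMI_+ is the pointwise mutual information clipped at zero and that it is estimated from counts of how often a knowledge item and a template occur together; neither detail is stated in this section, and the names `pmi_plus` and `template_association` are hypothetical.

```python
import math
from collections import Counter


def pmi_plus(pair_counts: Counter, k, t) -> float:
    """Positive PMI of knowledge item k and template t.

    Assumption: pair_counts[(k, t)] is the number of sentences in which
    template t extracted knowledge item k.
    """
    joint = pair_counts[(k, t)]
    if joint == 0:
        return 0.0
    total = sum(pair_counts.values())
    k_count = sum(c for (kk, _), c in pair_counts.items() if kk == k)
    t_count = sum(c for (_, tt), c in pair_counts.items() if tt == t)
    return max(0.0, math.log(joint * total / (k_count * t_count)))


def template_association(pair_counts: Counter, candidates, knowledge_conf: dict) -> dict:
    """SA_i(t): confidence-weighted PMI+ sum over K_{i-1}, divided by the maximum."""
    raw = {
        t: sum(pmi_plus(pair_counts, k, t) * conf for k, conf in knowledge_conf.items())
        for t in candidates
    }
    top = max(raw.values(), default=0.0)
    return {t: (score / top if top > 0 else 0.0) for t, score in raw.items()}


if __name__ == "__main__":
    counts = Counter({("k1", "t1"): 4, ("k2", "t2"): 4})
    print(template_association(counts, ["t1", "t2"], {"k1": 1.0, "k2": 0.5}))
```

The knowledge-side score SA_i(k) would follow the same pattern with the roles of templates and knowledge items swapped and the template confidences C_i(t) as weights.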

7. The behavior knowledge extraction method as claimed in claim 5, characterized in that:

The semantic similarity between candidate template t and the template set T_{i-1} obtained in the previous iteration, and likewise between candidate knowledge k and the knowledge set K_{i-1}, is calculated as follows:

\mathrm{SS}_{i}(t) = \frac{1}{\max_{t'} \mathrm{SS}_{i}(t')} \sum_{e \in T_{i-1}} \mathrm{Sim}(t, e) \times C_{i-1}(e)

\mathrm{SS}_{i}(k) = \frac{1}{\max_{k'} \mathrm{SS}_{i}(k')} \sum_{e \in K_{i-1}} \mathrm{Sim}(k, e) \times C_{i-1}(e)
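The same weighted-sum pattern can be sketched in Python. The function Sim is not defined in this section, so the sketch substitutes a simple Jaccard overlap of whitespace tokens purely as a stand-in; `jaccard` and `semantic_similarity_score` are hypothetical names.

```python
def jaccard(a: str, b: str) -> float:
    """Stand-in for Sim: token overlap of two strings (an assumption)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def semantic_similarity_score(candidates, accepted_conf: dict, sim=jaccard) -> dict:
    """SS_i: confidence-weighted similarity to previously accepted elements.

    accepted_conf maps each element e of T_{i-1} (or K_{i-1}) to C_{i-1}(e).
    The raw sums are divided by their maximum, as in the formulas above.
    """
    raw = {
        c: sum(sim(c, e) * conf for e, conf in accepted_conf.items())
        for c in candidates
    }
    top = max(raw.values(), default=0.0)
    return {c: (score / top if top > 0 else 0.0) for c, score in raw.items()}


if __name__ == "__main__":
    accepted = {"need <P> to": 1.0}
    print(semantic_similarity_score(["use <P> to", "be <P> for"], accepted))
```

Any other similarity measure with values in [0, 1] can be passed through the `sim` parameter without changing the rest of the sketch.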

8. The behavior knowledge extraction method as claimed in claim 1, characterized in that: in said step S2, the behavior knowledge comprises three kinds, namely behavior prerequisites, behavior results, and sequential relationships between behaviors.

The mutual reasoning method among said behavior knowledge is as follows:

9. The behavior knowledge extraction method as claimed in claim 8, characterized in that: after each iteration finishes, the three behavior knowledge sets are expanded on the basis of the behavior prerequisite, result, and sequential relationship sets obtained in that iteration, according to the following steps: first, for each behavior prerequisite knowledge (a₂, s), check whether state s is present in the result set; if so, for each behavior a₁ that has s as its result, add the behavior pair (a₁, a₂) formed with a₂ to the candidate sequential relationship set; second, check each behavior pair (a₁, a₂) in the sequential knowledge set: if (a₁, s) is present in the result set (or (a₂, s) is present in the prerequisite set), add (a₂, s) to the candidate behavior prerequisite set (or add (a₁, s) to the candidate behavior result set); finally, for each behavior knowledge k in the candidate behavior prerequisite, result, and sequential relationship sets, if k is also a candidate behavior knowledge obtained on the basis of the statistical correlation degree, set the confidence of k to 1 and add k to the corresponding behavior knowledge set.
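The three reasoning steps of this claim can be sketched in Python with plain sets. Prerequisites and results are represented as (behavior, state) pairs and sequential relationships as (a1, a2) pairs; the function names `infer_candidates` and `confirm` and the example data are hypothetical.

```python
def infer_candidates(prerequisites: set, results: set, orderings: set):
    """Derive candidate knowledge by mutual reasoning.

    prerequisites, results: sets of (behavior, state) pairs
    orderings: set of (a1, a2) pairs, read as 'a1 precedes a2'
    """
    # Step 1: a1 produces state s and a2 requires s, so (a1, a2) is a candidate.
    cand_orderings = {
        (a1, a2)
        for (a2, s) in prerequisites
        for (a1, s_result) in results
        if s == s_result
    }
    # Step 2: propagate states along known orderings.
    cand_prerequisites, cand_results = set(), set()
    for a1, a2 in orderings:
        cand_prerequisites |= {(a2, s) for (a, s) in results if a == a1}
        cand_results |= {(a1, s) for (a, s) in prerequisites if a == a2}
    return cand_prerequisites, cand_results, cand_orderings


def confirm(candidates: set, statistically_supported: set) -> dict:
    """Step 3: keep inferred candidates that extraction also proposed, with confidence 1."""
    return {k: 1.0 for k in candidates & statistically_supported}


if __name__ == "__main__":
    pre = {("launch attack", "haven")}
    res = {("find haven", "haven"), ("acquire weapon", "weapon")}
    order = {("acquire weapon", "launch attack")}
    print(infer_candidates(pre, res, order))
```

Set intersection in `confirm` mirrors the condition that a candidate must also have been obtained on the basis of the statistical correlation degree before its confidence is set to 1.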

10. The behavior knowledge extraction method as claimed in claim 1, characterized in that: in said step S3, merging similar cases means merging the redundant behaviors, behavior prerequisites, and results obtained from preprocessing; removing contradictory cases applies to the sequential relationships between behaviors obtained in each iteration of the Bootstrapping step, and removes behavior pairs that contradict each other.
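The removal of contradictory behavior pairs can be sketched in Python as follows. The sketch assumes that two sequential relationships contradict each other when both (a1, a2) and (a2, a1) are present, and that both are then dropped; the claim does not spell out either detail, and the name `drop_contradictions` is hypothetical.

```python
def drop_contradictions(orderings: dict) -> dict:
    """Remove every behavior pair whose reversed pair is also present.

    orderings maps (a1, a2) to a confidence value.
    Assumption: (a1, a2) and (a2, a1) contradict each other, and both are removed.
    """
    return {
        pair: conf
        for pair, conf in orderings.items()
        if (pair[1], pair[0]) not in orderings
    }


if __name__ == "__main__":
    relations = {
        ("find haven", "launch attack"): 1.0,
        ("launch attack", "find haven"): 0.6,
        ("create haven", "launch attack"): 1.0,
    }
    print(drop_contradictions(relations))
```

An alternative design would keep the pair with the higher confidence instead of discarding both.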

11. A behavior knowledge extraction device, comprising the following modules: