CN106156083A

CN106156083A - Domain knowledge processing method and device

Info

Publication number: CN106156083A
Application number: CN201510150067.8A
Authority: CN
Inventors: 贾炜
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2015-03-31
Filing date: 2015-03-31
Publication date: 2016-11-23
Anticipated expiration: 2035-03-31
Also published as: CN106156083B

Abstract

The invention discloses a domain knowledge processing method and device. The method includes: obtaining target text data, the target text data including at least one item of domain knowledge; parsing the target text data based on preset semantic description rules to obtain, for each item of domain knowledge in the target text data, two knowledge entities and the entity relationship between them; combining the two knowledge entities and the entity relationship of each item of domain knowledge to generate a knowledge triple for that item; and building, based on the knowledge triples, a structured domain knowledge ontology corresponding to the target text data. Unlike the prior art, in which domain knowledge is structured by manual sorting and processing is therefore inefficient, the invention uses semantic parsing to obtain the knowledge entities and entity relationships in the domain knowledge, composes them into structured knowledge triples, and then constructs the domain knowledge ontology, thereby improving the efficiency of structuring domain knowledge.

Description

Domain knowledge processing method and device

Technical field

The present invention relates to the technical field of data mining, and in particular to a domain knowledge processing method and device.

Background art

Domain knowledge typically exists in the form of text. In the prior art, text is generally sorted manually into data tables in order to represent structured domain knowledge, which makes the structuring of domain knowledge inefficient. A scheme that can structure domain knowledge efficiently is therefore urgently needed.

Summary of the invention

An object of the present invention is to provide a domain knowledge processing method and device that solve the prior-art problem of low efficiency when domain knowledge is structured by manual sorting.

The invention provides a domain knowledge processing method, including:

Obtaining target text data, the target text data including at least one item of domain knowledge;

Parsing the target text data based on preset semantic description rules to obtain, for each item of domain knowledge in the target text data, two knowledge entities and the entity relationship between them;

Combining the two knowledge entities and the entity relationship of each item of domain knowledge to generate a knowledge triple for that item of domain knowledge;

Building, based on the knowledge triples, a structured domain knowledge ontology corresponding to the target text data.

In the above method, preferably, parsing the target text data based on the preset semantic description rules to obtain two knowledge entities and the entity relationship between them for each item of domain knowledge in the target text data includes:

Determining the target predicate of each item of domain knowledge in the target text data;

Classifying each target predicate to obtain a classification result;

Determining, based on the classification result of each target predicate, the knowledge entities of the corresponding item of domain knowledge and the entity relationship between them.

In the above method, preferably, determining the knowledge entities of the corresponding item of domain knowledge and the entity relationship between them based on the classification result of each target predicate includes:

Determining, according to the classification result of each target predicate, the text recognition template corresponding to that classification result, the template category of the text recognition template corresponding to the classification result of the target predicate;

Performing entity analysis, based on the text recognition template, on the item of domain knowledge containing each target predicate, to obtain the two knowledge entities of each item of domain knowledge and the entity relationship between them.

In the above method, preferably, after the two knowledge entities and the entity relationship between them are obtained for each item of domain knowledge in the target text data, the method further includes:

Obtaining the remaining domain knowledge in the target text data that has not been analyzed by the text recognition templates;

Performing word and sentence parsing on the remaining domain knowledge to obtain target domain knowledge that satisfies a template generation rule;

Generating, based on the target domain knowledge, new text recognition templates that belong to template categories different from those of the currently existing text recognition templates.

In the above method, preferably, after the new text recognition templates belonging to template categories different from those of the currently existing text recognition templates are generated based on the target domain knowledge, the method further includes:

Evaluating the accuracy of the new text recognition templates using domain knowledge text different from the target text data, so as to reject text recognition templates whose accuracy is below a preset threshold.

In the above method, preferably, building the structured domain knowledge ontology corresponding to the target text data based on the knowledge triples includes:

Performing a normalization operation on all the knowledge triples to obtain a structured domain knowledge base corresponding to the target text data;

Establishing, based on the entity relationship in each knowledge triple in the domain knowledge base, a domain knowledge graph corresponding to the domain knowledge base;

Performing logical judgment of attributes on the domain knowledge graph to construct the domain knowledge ontology corresponding to the domain knowledge graph.

In the above method, preferably, before the structured domain knowledge ontology corresponding to the target text data is built based on the knowledge triples, the method further includes:

Obtaining a text context attribute value and text corpus attribute values of each knowledge triple;

Obtaining the accuracy of each knowledge triple based on the text context attribute value and the text corpus attribute values;

When the accuracy of every knowledge triple is within a preset accuracy range, performing the step of building, based on the knowledge triples, the structured domain knowledge ontology corresponding to the target text data; otherwise, deleting the triples whose accuracy values are not within their respective accuracy ranges, and then performing the step of building, based on the knowledge triples, the structured domain knowledge ontology corresponding to the target text data.

The present invention also provides a domain knowledge processing device, the device including:

A data acquisition unit, configured to obtain target text data, the target text data including at least one item of domain knowledge;

A data parsing unit, configured to parse the target text data based on preset semantic description rules to obtain, for each item of domain knowledge in the target text data, two knowledge entities and the entity relationship between them;

A triple generation unit, configured to combine the two knowledge entities and the entity relationship of each item of domain knowledge to generate a knowledge triple for that item of domain knowledge;

An ontology construction unit, configured to build, based on the knowledge triples, a structured domain knowledge ontology corresponding to the target text data.

In the above device, preferably, the data parsing unit includes:

A predicate determination subunit, configured to determine the target predicate of each item of domain knowledge in the target text data;

A predicate classification subunit, configured to classify each target predicate to obtain a classification result;

An entity determination subunit, configured to determine, based on the classification result of each target predicate, the knowledge entities of the corresponding item of domain knowledge and the entity relationship between them.

In the above device, preferably, the entity determination subunit includes:

A template determination module, configured to determine, based on the classification result of each target predicate, the text recognition template corresponding to that classification result, the template category of the text recognition template corresponding to the classification result of the target predicate;

A knowledge analysis module, configured to perform entity analysis, based on the text recognition template, on the item of domain knowledge containing each target predicate, to obtain the two knowledge entities of each item of domain knowledge and the entity relationship between them.

The above device, preferably, further includes:

A knowledge acquisition unit, configured to obtain, after the data parsing unit has obtained the two knowledge entities of each item of domain knowledge and the entity relationship between them, the remaining domain knowledge in the target text data that has not been analyzed by the text recognition templates;

A word and sentence parsing unit, configured to perform word and sentence parsing on the remaining domain knowledge to obtain target domain knowledge that satisfies a template generation rule;

A template generation unit, configured to generate, based on the target domain knowledge, text recognition templates that belong to template categories different from those of the existing text recognition templates.

The above device, preferably, further includes:

A template verification unit, configured to evaluate, after the template generation unit has generated the new text recognition templates belonging to template categories different from those of the currently existing text recognition templates, the accuracy of the new text recognition templates using domain knowledge text different from the target text data, so as to reject text recognition templates whose accuracy is below a preset threshold.

In the above device, preferably, the ontology construction unit includes:

A normalization operation subunit, configured to perform a normalization operation on all the knowledge triples to obtain a structured domain knowledge base corresponding to the target text data;

A knowledge graph establishment subunit, configured to establish, based on the entity relationship in each knowledge triple in the domain knowledge base, a domain knowledge graph corresponding to the domain knowledge base;

A graph logic judgment subunit, configured to perform logical judgment of attributes on the domain knowledge graph to construct the domain knowledge ontology corresponding to the domain knowledge graph.

The above device, preferably, further includes:

A text attribute acquisition unit, configured to obtain, before the ontology construction unit builds the structured domain knowledge ontology corresponding to the target text data, a text context attribute value and text corpus attribute values of each knowledge triple;

An accuracy acquisition unit, configured to obtain the accuracy of each knowledge triple based on the text context attribute value and the text corpus attribute values, to trigger the ontology construction unit when the accuracy of every knowledge triple is within a preset accuracy range, and otherwise to trigger a triple deletion unit;

The triple deletion unit, configured to delete the triples whose accuracy values are not within their respective accuracy ranges and then trigger the ontology construction unit.

As can be seen from the above scheme, the domain knowledge processing method and device provided by the present invention obtain target text data containing domain knowledge, parse the target text based on semantic description rules to obtain the knowledge entities and entity relationship in each item of domain knowledge, combine them into a knowledge triple for each item of domain knowledge, and then construct, based on these triples, the domain knowledge ontology corresponding to the target text data. Unlike the prior art, in which domain knowledge is structured by manual sorting and processing is therefore inefficient, the present invention uses semantic parsing to obtain the knowledge entities and entity relationships in the domain knowledge, composes them into structured knowledge triples, and then constructs the domain knowledge ontology, thereby improving the efficiency of structuring domain knowledge.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from the provided drawings without creative effort.

Fig. 1 is a flowchart of Embodiment 1 of a domain knowledge processing method provided by the present invention;

Figs. 2 to 5 are partial flowcharts of Embodiment 2 of a domain knowledge processing method provided by the present invention;

Fig. 6 is a partial flowchart of Embodiment 3 of a domain knowledge processing method provided by the present invention;

Fig. 7 is a flowchart of Embodiment 4 of a domain knowledge processing method provided by the present invention;

Fig. 8 is a schematic structural diagram of Embodiment 5 of a domain knowledge processing device provided by the present invention;

Figs. 9 to 12 are partial schematic structural diagrams of Embodiment 6 of a domain knowledge processing device provided by the present invention;

Fig. 13 is a partial schematic structural diagram of Embodiment 7 of a domain knowledge processing method provided by the present invention;

Fig. 14 is a schematic structural diagram of Embodiment 8 of a domain knowledge processing device provided by the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, which is a flowchart of Embodiment 1 of a domain knowledge processing method provided by the present invention, the method may include the following steps to structure domain knowledge:

Step 101: obtain target text data, the target text data including at least one item of domain knowledge.

Here, domain knowledge is data that exists in the form of text. For example, the target text data may include three items of domain knowledge, each of which includes three parts, namely a subject, a predicate (such as a linking verb), and an object, to express one meaning.

Step 102: parse the target text data based on preset semantic description rules to obtain, for each item of domain knowledge in the target text data, two knowledge entities and the entity relationship between them.

Here, the semantic description rules can be understood as the sentence patterns commonly used to express meaning in everyday use, such as the rules underlying people's everyday speech habits. In the present embodiment, the target text data is parsed based on these semantic description rules to obtain the two knowledge entities in each item of domain knowledge and the entity relationship between them, where a knowledge entity corresponds to the subject or object of the item of domain knowledge and the entity relationship corresponds to the predicate. For example, in the item of domain knowledge "a mobile phone is a kind of smart device", "mobile phone" and "smart device" are both knowledge entities, and "is" is the entity relationship between the two knowledge entities "mobile phone" and "smart device".

It should be noted that a knowledge entity here may be a conceptual entity or an instance entity. In the above item of domain knowledge, "mobile phone" is an instance entity and "smart device" is a conceptual entity, and the entity relationship "is" indicates the knowledge aspect to which this item of domain knowledge belongs, here the hypernym-hyponym aspect. Accordingly, "mobile phone" is the hyponym (subordinate) in this aspect and "smart device" is the hypernym (superordinate). As another example, in the item of domain knowledge "the vice general manager belongs to the company leadership group", the entity relationship "belongs to" indicates that the item belongs to the part-whole knowledge aspect; the entity "vice general manager" is the part in this aspect and is an instance entity, while the entity "company leadership group" is the whole and is a conceptual entity.

Step 103: combine the two knowledge entities and the entity relationship of each item of domain knowledge to generate a knowledge triple for that item of domain knowledge.

For example, the two knowledge entities "mobile phone" and "smart device" in the item of domain knowledge "a mobile phone is a kind of smart device" are combined with the entity relationship "is" to obtain the structured knowledge triple "is (mobile phone, smart device)". In this knowledge triple, "is" expresses the relationship between the knowledge entities "mobile phone" and "smart device".
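The triple described in step 103 can be pictured as a small immutable record. The following is a minimal sketch, not the patented implementation; the class name `KnowledgeTriple`, its field names, and the helper `make_triple` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeTriple:
    """Hypothetical container for one knowledge triple."""
    relation: str  # entity relationship, e.g. "is"
    head: str      # first knowledge entity (subject side)
    tail: str      # second knowledge entity (object side)

    def __str__(self) -> str:
        # Render in the "relation(head, tail)" form used in the example above.
        return f"{self.relation}({self.head}, {self.tail})"


def make_triple(head: str, relation: str, tail: str) -> KnowledgeTriple:
    """Combine two knowledge entities and their relationship into a triple."""
    return KnowledgeTriple(relation=relation, head=head, tail=tail)


triple = make_triple("mobile phone", "is", "smart device")
print(triple)
```

Making the record frozen keeps triples hashable, which is convenient if they are later merged or deduplicated.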

Step 104: build, based on the knowledge triples, a structured domain knowledge ontology corresponding to the target text data.

As described above, a knowledge triple is a structured triple with three components (two knowledge entities and their entity relationship). In the present embodiment, the domain knowledge ontology corresponding to the target text data is therefore constructed from these knowledge triples. This domain knowledge ontology is a structured ontology and corresponds to the domain knowledge in the target text data.

As can be seen from the above scheme, Embodiment 1 of the domain knowledge processing method provided by the present invention obtains target text data containing domain knowledge, parses the target text based on semantic description rules to obtain the knowledge entities and entity relationship in each item of domain knowledge, combines them into a knowledge triple for each item of domain knowledge, and then constructs, based on these triples, the domain knowledge ontology corresponding to the target text data. Unlike the prior art, in which domain knowledge is structured by manual sorting and processing is therefore inefficient, the present embodiment uses semantic parsing to obtain the knowledge entities and entity relationships in the domain knowledge, composes them into structured knowledge triples, and then constructs the domain knowledge ontology, thereby improving the efficiency of structuring domain knowledge.

Referring to Fig. 2, which is a flowchart of an implementation of step 102 in Embodiment 2 of a domain knowledge processing method provided by the present invention, step 102 may be implemented through the following steps:

Step 121: determine the target predicate of each item of domain knowledge in the target text data.

Here, the target predicate may be a verb, an adjective, or the like, such as the linking verb "is". In the present embodiment, the predicate in each item of domain knowledge is determined.

Step 122: classify each target predicate to obtain a classification result.

That is, in the present embodiment the target predicate is classified to obtain a classification result indicating the knowledge aspect to which the target predicate belongs, for example: hypernym-hyponym, part-whole, attribute, reference relation, attitude relation, temporal relation, positional relation, start event, change event, or end event. In the present embodiment, the meaning expressed by the target predicate is identified in order to determine the knowledge aspect of the item of domain knowledge in which it appears, such as the hypernym-hyponym aspect or the part-whole aspect.

Step 123: determine, based on the classification result of each target predicate, the knowledge entities of the corresponding item of domain knowledge and the entity relationship between them.

Specifically, as shown in Fig. 3, which is a flowchart of an implementation of step 123, step 123 may be implemented through the following steps:

Step 301: determine, according to the classification result of each target predicate, the text recognition template corresponding to that classification result.

Here, the template category of the text recognition template corresponds to the classification result of the target predicate.

Step 302: perform entity analysis, based on the text recognition template, on the item of domain knowledge containing each target predicate, to obtain the two knowledge entities of each item of domain knowledge and the entity relationship between them.

That is, in the present embodiment, the text recognition template capable of recognizing the entity structure of an item of domain knowledge in the text is determined from the category of the knowledge aspect to which the target predicate corresponds. For example, if the target predicate is "is", which corresponds to the hypernym-hyponym knowledge aspect, the hypernym-hyponym text recognition template corresponding to that aspect is determined. This text recognition template is then used to recognize the entity structure of the item of domain knowledge containing the target predicate, so as to obtain the two knowledge entities in each item of domain knowledge and the entity relationship between them.
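Steps 121 to 123 and 301 to 302 can be sketched as a predicate lookup followed by a per-category pattern match. This is a simplified illustration under stated assumptions: the predicate table `PREDICATE_ASPECT`, the regular expressions in `TEMPLATES`, and the English example sentences are all hypothetical, and the patent does not specify that templates are regular expressions.

```python
import re

# Hypothetical mapping from target predicate to knowledge aspect (step 122).
PREDICATE_ASPECT = {
    "is": "hypernym-hyponym",
    "belongs to": "part-whole",
}

# Hypothetical text recognition templates, one per knowledge aspect (step 301).
TEMPLATES = {
    "hypernym-hyponym": re.compile(
        r"^(?:an?\s+|the\s+)?(?P<head>.+?)\s+is\s+"
        r"(?:a kind of\s+|an?\s+)?(?P<tail>.+?)\.?$",
        re.IGNORECASE,
    ),
    "part-whole": re.compile(
        r"^(?:the\s+)?(?P<head>.+?)\s+belongs to\s+(?:the\s+)?(?P<tail>.+?)\.?$",
        re.IGNORECASE,
    ),
}


def find_target_predicate(sentence):
    """Step 121: return the first known predicate found, longest first."""
    lowered = sentence.lower()
    for predicate in sorted(PREDICATE_ASPECT, key=len, reverse=True):
        if re.search(rf"\b{re.escape(predicate)}\b", lowered):
            return predicate
    return None


def parse(sentence):
    """Steps 122, 301, 302: classify the predicate, pick a template, extract."""
    predicate = find_target_predicate(sentence)
    if predicate is None:
        return None  # no known predicate: left as remaining domain knowledge
    aspect = PREDICATE_ASPECT[predicate]
    match = TEMPLATES[aspect].match(sentence.strip())
    if match is None:
        return None
    return (predicate, match.group("head"), match.group("tail"))


print(parse("A mobile phone is a kind of smart device."))
print(parse("The vice general manager belongs to the company leadership group."))
```

Sentences for which `parse` returns `None` are the ones no existing template can handle, which is the situation the following steps address.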

In a specific application, the number of text recognition templates is finite. The target text data may therefore contain domain knowledge whose text entity structure cannot be recognized using the currently existing text recognition templates. To solve this problem, the present embodiment obtains new text recognition templates through the following steps. Referring to Fig. 4, which is another partial flowchart of the embodiment of the present invention, after step 102 the method may further include the following steps:

Step 105: obtain the remaining domain knowledge in the target text data that has not been analyzed by the text recognition templates.

That is, in the present embodiment, each time the domain knowledge in the text data has been processed based on the currently existing text recognition templates, the remaining domain knowledge that could not be processed with those templates is collected and used as the basic data for new text recognition templates.

Step 106: perform word and sentence parsing on the remaining domain knowledge to obtain target domain knowledge that satisfies a template generation rule.

Specifically, in the present embodiment, machine learning processing such as word frequency analysis, clustering, and frequent sequence analysis is applied to the sentences in the remaining domain knowledge to obtain the target domain knowledge from which new text recognition templates can be generated.
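Of the techniques named in step 106, only word frequency analysis is easy to show in a few lines. The sketch below covers that part alone and makes several assumptions: sentences are whitespace-tokenized English, the hypothetical "template generation rule" is that an interior word appears in at least `min_support` remaining sentences, and the function name `select_target_knowledge` is illustrative. Clustering and frequent sequence analysis are not shown.

```python
from collections import Counter


def select_target_knowledge(residual_sentences, min_support=2):
    """Pick remaining sentences that share a frequent interior word.

    Returns (frequent_words, selected_sentences). An interior word is any
    token that is neither first nor last, a rough stand-in for a predicate.
    """
    tokenized = [s.lower().rstrip(".").split() for s in residual_sentences]
    counts = Counter()
    for tokens in tokenized:
        # Count each interior word at most once per sentence.
        counts.update(set(tokens[1:-1]))
    frequent = {word for word, n in counts.items() if n >= min_support}
    selected = [
        sentence
        for sentence, tokens in zip(residual_sentences, tokenized)
        if frequent & set(tokens[1:-1])
    ]
    return frequent, selected


residual = [
    "Phones support video calls.",
    "Tablets support stylus input.",
    "Routers forward packets.",
]
frequent, selected = select_target_knowledge(residual)
print(frequent, selected)
```

Here the word "support" recurs across the remaining sentences, so the two sentences containing it become candidates for a new template category.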

Step 107: generate, based on the target domain knowledge, new text recognition templates that belong to template categories different from those of the currently existing text recognition templates.

Specifically, in the present embodiment, automatic machine learning operations such as central predicate learning, semantic concept annotation, and template comparison and merging are applied to the sentences in the target domain knowledge to obtain new text recognition templates. These new text recognition templates can process domain knowledge of knowledge aspects that the existing text recognition templates cannot recognize.

In addition, referring to Fig. 5, which is another partial flowchart of the embodiment of the present invention, after step 107 the method may further include the following step:

Step 108: evaluate the accuracy of the new text recognition templates using domain knowledge text different from the target text data, so as to reject text recognition templates whose accuracy is below a preset threshold.

Specifically, the present embodiment may obtain other text data different from the target text data. This newly obtained text data likewise contains multiple items of domain knowledge. The new text recognition templates are used to parse this newly obtained domain knowledge as a domain corpus, so as to identify the new text recognition templates whose corpus parsing accuracy is high, for example higher than the preset threshold. In other words, text recognition templates whose accuracy is below the threshold are rejected, and the text recognition templates whose accuracy is above the preset threshold are placed in the template library.
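The accuracy check of step 108 can be sketched as scoring each candidate template against held-out sentences with known answers. This is an illustration under assumptions: templates are modeled as regular expressions with `head` and `tail` groups, the held-out data is hand-labeled, and the names `template_accuracy` and `filter_templates` as well as the threshold value are hypothetical.

```python
import re


def template_accuracy(pattern, labeled):
    """Fraction of held-out sentences where the template output is correct.

    `labeled` is a list of (sentence, expected) pairs, where expected is a
    (head, tail) tuple, or None if the template should not match.
    """
    if not labeled:
        return 0.0
    correct = 0
    for sentence, expected in labeled:
        match = pattern.match(sentence)
        extracted = (match.group("head"), match.group("tail")) if match else None
        if extracted == expected:
            correct += 1
    return correct / len(labeled)


def filter_templates(candidates, labeled, threshold=0.8):
    """Keep templates at or above the threshold; reject the rest."""
    return {
        name: pattern
        for name, pattern in candidates.items()
        if template_accuracy(pattern, labeled) >= threshold
    }


candidates = {
    "support_v1": re.compile(r"^(?P<head>\w+) support (?P<tail>.+?)\.?$"),
    "support_v2": re.compile(r"^(?P<head>\w+) support (?P<tail>\w+)"),  # truncates tail
}
held_out = [
    ("Phones support video calls.", ("Phones", "video calls")),
    ("Tablets support stylus input.", ("Tablets", "stylus input")),
    ("Routers forward packets.", None),
]
kept = filter_templates(candidates, held_out)
print(sorted(kept))
```

The second candidate captures only the first word of the object, so it scores poorly on the held-out text and is rejected.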

Referring to Fig. 6, which is a flowchart of an implementation of step 104 in Embodiment 3 of a domain knowledge processing method provided by the present invention, step 104 may be implemented through the following steps:

Step 141: perform a normalization operation on all the knowledge triples to obtain a structured domain knowledge base corresponding to the target text data.

Here, in the present embodiment, operations such as triple merging and concept normalization are performed on the knowledge triples to obtain the domain knowledge base. Specifically, the normalization operation in the present embodiment may use statistical verification and obtain the final structured domain knowledge base through iterative computation.
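Triple merging and concept normalization can be illustrated with a synonym table and deduplication. This is a minimal sketch, assuming triples are `(relation, head, tail)` tuples and that a hypothetical `synonyms` dictionary maps variant names to a canonical concept; the statistical verification and iterative computation mentioned above are not modeled.

```python
def normalize_triples(triples, synonyms):
    """Canonicalize entity names and merge duplicate triples, keeping order."""

    def canonical(entity):
        # Lowercase and collapse whitespace, then map known synonyms.
        key = " ".join(entity.lower().split())
        return synonyms.get(key, key)

    seen = set()
    knowledge_base = []
    for relation, head, tail in triples:
        normalized = (relation, canonical(head), canonical(tail))
        if normalized not in seen:
            seen.add(normalized)
            knowledge_base.append(normalized)
    return knowledge_base


raw = [
    ("is", "Mobile  Phone", "smart device"),
    ("is", "cellphone", "smart device"),
    ("belongs to", "vice general manager", "company leadership group"),
]
knowledge_base = normalize_triples(raw, {"cellphone": "mobile phone"})
print(knowledge_base)
```

The first two raw triples collapse into one entry because both heads normalize to the same concept name.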

Step 142: establish, based on the entity relationship in each knowledge triple in the domain knowledge base, a domain knowledge graph corresponding to the domain knowledge base.

As described above, the entity relationship in a knowledge triple reflects the knowledge aspect to which the corresponding item of domain knowledge belongs, such as the hypernym-hyponym aspect or the part-whole aspect. In the present embodiment, the domain knowledge graph corresponding to the domain knowledge base is therefore established from the entity relationship of each knowledge triple in the domain knowledge base, that is, from the knowledge aspect to which each item of domain knowledge belongs. In the domain knowledge graph, triples can be represented structurally as connected nodes: the nodes of the graph are knowledge entities, and the connections between nodes correspond to entity relationships.
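The node-and-connection structure of step 142 can be sketched as an adjacency list with labeled edges. This is one possible representation, not the one the patent prescribes; triples are assumed to be `(relation, head, tail)` tuples and `build_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_graph(triples):
    """Build a directed, edge-labeled graph from (relation, head, tail) triples.

    Each node is a knowledge entity; each edge carries the entity relationship.
    """
    graph = defaultdict(list)
    for relation, head, tail in triples:
        graph[head].append((relation, tail))
        graph.setdefault(tail, [])  # make sure leaf entities appear as nodes
    return dict(graph)


graph = build_graph([
    ("is", "mobile phone", "smart device"),
    ("belongs to", "vice general manager", "company leadership group"),
])
for node, edges in graph.items():
    print(node, "->", edges)
```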

Step 143: perform logical judgment of attributes on the domain knowledge graph to construct the domain knowledge ontology corresponding to the domain knowledge graph.

Specifically, as follows from the above, the domain knowledge graph includes the knowledge aspect corresponding to each item of domain knowledge in the domain knowledge base. That is, a connection between nodes in the graph corresponds to an entity relationship and therefore to the knowledge aspect to which the item of domain knowledge belongs, such as the hypernym-hyponym aspect or the part-whole aspect. In the present embodiment, the logical relationships corresponding to these knowledge aspects in the domain knowledge graph are therefore judged. For example, the knowledge entities and entity relationship in the triple corresponding to each item of domain knowledge are combined to distinguish the attribute of each knowledge entity, such as concept or instance, and the domain knowledge ontology corresponding to the domain knowledge graph is then constructed. It should be noted that conceptual entities and instance entities are distinguished from each other in the domain knowledge ontology.
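One way to picture the concept/instance distinction is a simple rule over the triples. The rule below is an assumption made for illustration, not the patent's logic: an entity that appears on the hypernym or whole side of a relationship is treated as a concept, and an entity that only ever appears on the hyponym or part side is treated as an instance. The function name and the `concept_relations` parameter are hypothetical.

```python
def classify_entities(triples, concept_relations=("is", "belongs to")):
    """Label each entity as "concept" or "instance" with a simple heuristic."""
    kinds = {}
    for relation, head, tail in triples:
        if relation in concept_relations:
            kinds[tail] = "concept"             # upper or whole side
            kinds.setdefault(head, "instance")  # lower side, unless already a concept
    return kinds


kinds = classify_entities([
    ("is", "mobile phone", "smart device"),
    ("is", "smart device", "electronic device"),
])
print(kinds)
```

With this rule "smart device" stays a concept even though it also appears as the lower side of the second triple, because a concept label is never downgraded.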

In implementing, in order to improve the accuracy of the domain knowledge body finally given, need often The knowledge tlv triple of bar domain knowledge carries out the statistical testing of business cycles of accuracy rate, thus, with reference to Fig. 7, for the present invention The flow chart of a kind of domain knowledge processing method embodiment four provided, wherein, described step 104 it Before, described method can also comprise the following steps:

Step 109: obtain the text context attribute value and the text corpus attribute values of each knowledge triple.

The text context attribute value may be the context integrity attribute value of the knowledge triple. The text corpus attribute values may include the corpus support attribute value, the corpus consistency attribute value, and the corpus isolation attribute value of the knowledge triple, among others.

Step 110: based on the text context attribute value and the text corpus attribute values, obtain the accuracy of each knowledge triple; when the accuracy of every knowledge triple is within the preset accuracy value range, perform step 104; otherwise, perform step 111.

In this embodiment, the accuracy of each knowledge triple is derived by judging whether its text context attribute value and its text corpus attribute values fall within their respective preset threshold ranges. For example, this embodiment statistically verifies each statistical indicator of each knowledge triple, namely the text context attribute value (such as the context integrity attribute value) and the text corpus attribute values (such as the corpus support, corpus consistency, and corpus isolation attribute values), to obtain the accuracy of each knowledge triple. When the accuracy of every knowledge triple is within its corresponding accuracy value range, step 104 is performed to build, based on the knowledge triples, the structured domain knowledge ontology corresponding to the target text data; otherwise, step 111 is performed.

Step 111: delete the triples whose text context attribute value or text corpus attribute values fall outside their respective threshold ranges, then perform step 104.

That is, this embodiment may delete the knowledge triples whose statistical indicators do not meet the requirements (corresponding to the accuracy value range), so that the structured domain knowledge ontology corresponding to the target text data is built from the remaining knowledge triples.
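
Steps 109 to 111 can be sketched roughly as below. This is a minimal illustration under stated assumptions: the field names, the `THRESHOLDS` table, and its numeric ranges are hypothetical, since the specification does not give concrete threshold values or say how the attribute values are computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TripleStats:
    """Statistical indicators of one knowledge triple (step 109)."""
    triple: Tuple[str, str, str]
    context_integrity: float   # text context attribute value
    corpus_support: float      # text corpus attribute values
    corpus_consistency: float
    corpus_isolation: float


# Hypothetical (low, high) preset threshold ranges; not taken from the source.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "context_integrity": (0.6, 1.0),
    "corpus_support": (0.3, 1.0),
    "corpus_consistency": (0.5, 1.0),
    "corpus_isolation": (0.0, 0.4),
}


def within_thresholds(stats: TripleStats) -> bool:
    """Step 110: every indicator must lie inside its own preset range."""
    return all(
        low <= getattr(stats, name) <= high
        for name, (low, high) in THRESHOLDS.items()
    )


def filter_triples(candidates: List[TripleStats]) -> List[TripleStats]:
    """Step 111: drop the triples that fail, keep the rest for step 104."""
    return [stats for stats in candidates if within_thresholds(stats)]
```

Keeping the ranges in a single table makes it easy to tune one indicator without touching the filtering logic.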

It should be noted that this embodiment may first insert the knowledge triples whose statistical indicators meet the requirements into a formal knowledge base, to serve as the data basis for subsequent operations such as generating the domain knowledge base, establishing the domain knowledge graph, and building the domain knowledge ontology.

In addition, this embodiment may further judge the accuracy of the deleted knowledge triples: those whose accuracy is below a preset elimination threshold are inserted into an eliminated knowledge base, and the remaining ones are inserted into an alternative knowledge base.
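
The routing of triples into the three knowledge bases can be sketched as follows. The function name `route_triples`, the default accuracy range, and the default elimination threshold are assumptions for illustration; the specification only states that such a range and threshold are preset.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def route_triples(
    scored: List[Tuple[Triple, float]],
    accept_range: Tuple[float, float] = (0.8, 1.0),  # hypothetical accuracy range
    eliminate_below: float = 0.3,                    # hypothetical elimination threshold
) -> Dict[str, List[Triple]]:
    """Send each (triple, accuracy) pair to the formal, alternative, or eliminated base."""
    bases: Dict[str, List[Triple]] = {"formal": [], "alternative": [], "eliminated": []}
    low, high = accept_range
    for triple, accuracy in scored:
        if low <= accuracy <= high:
            bases["formal"].append(triple)        # meets the requirements
        elif accuracy < eliminate_below:
            bases["eliminated"].append(triple)    # below the elimination threshold
        else:
            bases["alternative"].append(triple)   # deleted, but kept as a candidate
    return bases
```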

It should be noted that the statistical verification of triples, i.e. the method described above for obtaining the accuracy value of a triple, can also be used to verify the correctness of the new text identification templates obtained above. For example, after a new text identification template has been used to structure the domain knowledge in new text data, the triple corresponding to each piece of domain knowledge is obtained. These newly obtained triples are then statistically verified with the accuracy-judgment method described above to obtain their accuracy, and the accuracy of the new text identification template is judged from the accuracy of the triples it produced. Templates with higher accuracy, for example above the preset threshold, are then placed in the template library.

In addition, when statistically verifying the triples corresponding to domain knowledge, this embodiment may use an iterative, incremental scheme. For example, after new text identification templates have been verified, this embodiment may use the high-accuracy templates to obtain a batch of triples from new text data, statistically verify that batch, and merge the verification results with those of the previous iteration in order to calculate the accuracy of the new triples. The new triples and the triples whose results have changed are then written to the formal knowledge base, the alternative knowledge base, or the eliminated knowledge base, as appropriate, for subsequent operations.
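
One possible shape of the incremental merge is sketched below. It assumes, purely for illustration, that the per-triple statistic carried between iterations is an observation count; the specification does not say which statistics are merged or how. The function name `merge_iteration` is likewise hypothetical.

```python
from collections import Counter
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def merge_iteration(previous: Counter, batch: Counter) -> Tuple[Counter, Set[Triple]]:
    """Merge this iteration's counts into the running totals.

    Returns the merged counts and the set of triples that are new or whose
    count changed, i.e. the ones whose accuracy must be recomputed and which
    may need to move between knowledge bases.
    """
    merged = previous + batch
    touched = {triple for triple, count in batch.items() if count > 0}
    return merged, touched
```

Only the triples in the returned set need to be re-scored, which keeps each iteration proportional to the size of the new batch rather than the whole knowledge base.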

Fig. 8 is a schematic structural diagram of Embodiment 5 of the domain knowledge processing device provided by the present invention. The device may implement the structured processing of domain knowledge through the following structure:

Data acquisition unit 801, configured to obtain target text data, the target text data including at least one piece of domain knowledge.

Data parsing unit 802, configured to parse the target text data based on a preset semantic description rule, to obtain the two knowledge entities of each piece of domain knowledge in the target text data and the entity relationship between them.

Triple generation unit 803, configured to combine the two knowledge entities and the entity relationship in each piece of domain knowledge, to generate the knowledge triple of each piece of domain knowledge.

Ontology construction unit 804, configured to build, based on the knowledge triples, the structured domain knowledge ontology corresponding to the target text data.

As can be seen from the above scheme, in Embodiment 5 of the domain knowledge processing device provided by the present invention, after target text data containing domain knowledge is obtained, the target text is parsed based on the semantic description rule to obtain the knowledge entities and entity relationship in each piece of domain knowledge. These are combined to generate the knowledge triple of each piece of domain knowledge, and the domain knowledge ontology corresponding to the target text data is then constructed from these triples. In the prior art, domain knowledge is structured by manual sorting, which is inefficient. By contrast, this embodiment uses semantic parsing to obtain the knowledge entities and entity relationships in the domain knowledge, combines them into structured knowledge triples, and builds the domain knowledge ontology from them, thereby improving the efficiency of structuring domain knowledge.

Fig. 9 is a schematic structural diagram of the data parsing unit 802 in Embodiment 6 of the domain knowledge processing device provided by the present invention, in which the data parsing unit 802 may include the following structure:

Predicate determination subunit 821, configured to determine the target predicate of each piece of domain knowledge in the target text data.

Predicate classification subunit 822, configured to classify each target predicate to obtain a classification result.

Entity determination subunit 823, configured to determine, based on the classification result of each target predicate, the knowledge entities of the corresponding domain knowledge and the entity relationship between them.

Specifically, Fig. 10 is a schematic structural diagram of the entity determination subunit 823, which may be implemented by the following structure:

Template determination module 1001, configured to determine, based on the classification result of each target predicate, the text identification template corresponding to each classification result.

Knowledge analysis module 1002, configured to perform entity analysis, based on the text identification template, on the domain knowledge in which each target predicate is located, to obtain the two knowledge entities of each piece of domain knowledge and the entity relationship between them.
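
The chain formed by subunits 821 to 823 and modules 1001 and 1002 can be sketched as below. Everything concrete in this sketch is an assumption: the two English predicate phrases, their class names, and the use of regular expressions as text identification templates are stand-ins, since the specification does not describe the template format.

```python
import re
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from target predicate to predicate class (subunit 822).
PREDICATE_CLASSES: Dict[str, str] = {
    "is a kind of": "hypernym",
    "is part of": "part_whole",
}

# Hypothetical text identification templates, one per class (module 1001).
TEMPLATES: Dict[str, "re.Pattern[str]"] = {
    "hypernym": re.compile(r"^(?P<head>.+?) is a kind of (?P<tail>.+?)\.?$"),
    "part_whole": re.compile(r"^(?P<head>.+?) is part of (?P<tail>.+?)\.?$"),
}


def find_predicate(sentence: str) -> Optional[str]:
    """Subunit 821: locate a known target predicate in the sentence, if any."""
    for predicate in PREDICATE_CLASSES:
        if predicate in sentence:
            return predicate
    return None


def parse_sentence(sentence: str) -> Optional[Triple]:
    """Module 1002: apply the class's template to extract one knowledge triple."""
    predicate = find_predicate(sentence)
    if predicate is None:
        return None  # no template applies; the sentence stays unanalyzed
    relation = PREDICATE_CLASSES[predicate]
    match = TEMPLATES[relation].match(sentence.strip())
    if match is None:
        return None
    return (match.group("head"), relation, match.group("tail"))
```

Sentences for which `parse_sentence` returns `None` correspond to domain knowledge that the existing templates could not analyze.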

In a specific application, the number of text identification templates is finite. The target text data may therefore contain text whose domain knowledge entity structure cannot be identified with the text identification templates that currently exist. To solve this problem, this embodiment obtains new text identification templates as follows. Fig. 11 is a schematic diagram of another part of the structure in an embodiment of the present invention, in which the device may further include the following structure:

Knowledge acquisition unit 805, configured to obtain, after the data parsing unit 802 has obtained the two knowledge entities of each piece of domain knowledge and the entity relationship between them, the remaining domain knowledge in the target text data that has not been analyzed by the text identification templates.

Word and sentence parsing unit 806, configured to perform word and sentence parsing on the remaining domain knowledge, to obtain target domain knowledge that satisfies the template generation rule.

Specifically, this embodiment applies word frequency analysis, clustering, frequent sequence analysis, and similar analysis and learning processes to the statements in the remaining domain knowledge, to obtain target domain knowledge from which new text identification templates can be produced.

Template generation unit 807, configured to generate, based on the target domain knowledge, a text identification template that belongs to a different template category from the existing text identification templates.
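
The word frequency analysis mentioned for unit 806 can be illustrated with a simple n-gram count over the remaining sentences. This is only one of the analyses the text lists, and the function name `frequent_phrases`, the whitespace tokenization, and the `min_count` cutoff are assumptions of this sketch.

```python
from collections import Counter
from typing import List


def frequent_phrases(sentences: List[str], n: int = 2, min_count: int = 2) -> List[str]:
    """Return word n-grams that occur in at least `min_count` sentences.

    Phrases shared by many unparsed sentences are candidate predicates
    around which a new text identification template could be built.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().rstrip(".").split()
        # Count each n-gram once per sentence so that a single long
        # sentence cannot dominate the statistics.
        grams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        counts.update(grams)
    return sorted(phrase for phrase, count in counts.items() if count >= min_count)
```

In practice the candidate phrases would still need filtering (for example, removing pure function-word sequences) before a template is generated and verified.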

In addition, Fig. 12 is a schematic diagram of another part of the structure in an embodiment of the present invention, in which the device may further include the following structure:

Template verification unit 808, configured to judge, after the template generation unit 807 has generated a new text identification template that belongs to a different template category from the existing text identification templates, the accuracy of the new text identification template using domain knowledge text different from the target text data, so as to reject text identification templates whose accuracy is below a preset threshold.

Fig. 13 is a schematic structural diagram of the ontology construction unit 804 in Embodiment 7 of the domain knowledge processing device provided by the present invention, in which the ontology construction unit 804 may be implemented with the following structure:

Normalization subunit 841, configured to normalize all the knowledge triples, to obtain the structured domain knowledge base corresponding to the target text data.

Knowledge graph establishment subunit 842, configured to establish, based on the entity relationship in each knowledge triple in the domain knowledge base, the domain knowledge graph corresponding to the domain knowledge base.

Graph logic judgment subunit 843, configured to perform logical judgment of attributes on the domain knowledge graph, to construct the domain knowledge ontology corresponding to the domain knowledge graph.
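
The first two subunits can be sketched as follows. The normalization rule used here (trimming whitespace, lower-casing, and removing duplicates) is an assumption chosen for illustration, as the specification does not define what normalization involves; the graph is represented as a plain adjacency mapping.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def normalize(triples: List[Triple]) -> List[Triple]:
    """Subunit 841 (sketch): canonicalize and deduplicate triples, keeping order."""
    seen = set()
    knowledge_base: List[Triple] = []
    for head, relation, tail in triples:
        item = (head.strip().lower(), relation.strip().lower(), tail.strip().lower())
        if item not in seen:
            seen.add(item)
            knowledge_base.append(item)
    return knowledge_base


def build_graph(knowledge_base: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Subunit 842 (sketch): one node per entity, one labeled edge per triple."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in knowledge_base:
        graph[head].append((relation, tail))
    return dict(graph)
```

The edge labels stored in this graph are what the logical judgment of subunit 843 would inspect when separating concepts from instances.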

In a specific implementation, to improve the accuracy of the domain knowledge ontology finally obtained, the knowledge triple of each piece of domain knowledge needs to undergo statistical verification of its accuracy. Accordingly, Fig. 14 is a schematic structural diagram of Embodiment 8 of the domain knowledge processing device provided by the present invention, in which the device may further include the following structure:

Text attribute acquisition unit 809, configured to obtain the text context attribute value and the text corpus attribute values of each knowledge triple before the ontology construction unit 804 builds the structured domain knowledge ontology corresponding to the target text data.

Accuracy acquisition unit 810, configured to obtain the accuracy of each knowledge triple based on the text context attribute value and the text corpus attribute values, to trigger the ontology construction unit 804 when the accuracy of every knowledge triple is within the preset accuracy value range, and otherwise to trigger the triple deletion unit 811.

In this embodiment, the accuracy of each knowledge triple is derived by judging whether its text context attribute value and its text corpus attribute values fall within their respective preset threshold ranges. For example, this embodiment statistically verifies each statistical indicator of each knowledge triple, namely the text context attribute value (such as the context integrity attribute value) and the text corpus attribute values (such as the corpus support, corpus consistency, and corpus isolation attribute values), to obtain the accuracy of each knowledge triple. When the accuracy of every knowledge triple is within its corresponding accuracy value range, the ontology construction unit 804 is triggered to build, based on the knowledge triples, the structured domain knowledge ontology corresponding to the target text data; otherwise, the triple deletion unit 811 is triggered.

Triple deletion unit 811, configured to delete the triples whose accuracy value falls outside their respective accuracy value range, and then to trigger the ontology construction unit 804.

It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another.

Finally, it should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

The domain knowledge processing method and device provided by this application have been described in detail above. Specific examples are used herein to explain the principle and implementations of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the application.

Claims

1. A domain knowledge processing method, comprising:

2. The method according to claim 1, characterized in that parsing the target text data based on the preset semantic description rule, to obtain the two knowledge entities of each piece of domain knowledge in the target text data and the entity relationship between them, comprises:

3. The method according to claim 2, characterized in that determining, based on the classification result of each target predicate, the knowledge entities of the corresponding domain knowledge and the entity relationship between them comprises:

4. The method according to claim 3, characterized in that, after the two knowledge entities of each piece of domain knowledge in the target text data and the entity relationship between them are obtained, the method further comprises:

5. The method according to claim 4, characterized in that, after a new text identification template belonging to a different template category from the existing text identification templates is generated based on the target domain knowledge, the method further comprises:

6. The method according to claim 1, characterized in that building, based on the knowledge triples, the structured domain knowledge ontology corresponding to the target text data comprises:

7. The method according to claim 1, characterized in that, before the structured domain knowledge ontology corresponding to the target text data is built based on the knowledge triples, the method further comprises:

8. A domain knowledge processing device, the device comprising:

9. The device according to claim 8, characterized in that the data parsing unit comprises:

10. The device according to claim 9, characterized in that the entity determination subunit comprises:

11. The device according to claim 10, characterized by further comprising:

12. The device according to claim 11, characterized by further comprising:

13. The device according to claim 8, characterized in that the ontology construction unit comprises:

14. The device according to claim 8, characterized by further comprising: