CN106650940A - Field knowledge base establishment method and device - Google Patents
- Publication number
- CN106650940A (CN201611220184.8A)
- Authority
- CN
- China
- Prior art keywords
- concept
- core
- similarity
- key
- core concept
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Abstract
The invention provides a field knowledge base establishment method and device. After a core concept in a current to-be-established field and a target text in which the core concept is located are obtained, at least one non-core concept is obtained from the target text, and the similarity between the core concept and the non-core concept is obtained. When the similarity satisfies a preset condition, it is judged whether the non-core concept is the same as a concept already existing in the field knowledge base of the to-be-established field. If the non-core concept differs from the concepts existing in the field knowledge base, the non-core concept is retained in the field knowledge base and taken as a new core concept, a target text in which the new core concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is carried out again. After all concepts in the knowledge base of the to-be-established field are obtained, a relation between any two concepts is obtained, so that the field knowledge base of the to-be-established field is obtained and is established automatically.
Description
Technical field
The invention belongs to the technical field of information processing and, more particularly, relates to a domain knowledge base construction method and device.
Background art
A domain knowledge base is the set of concepts contained in a domain together with the relations between those concepts. A concept is a piece of knowledge belonging to its domain and can be indicated by an entry in that domain; a relation between concepts is the degree of similarity between them and can be represented by a numerical value. In the domain knowledge base of the financial domain, for example, entries such as finance, economics and circulation can serve as concepts. With the development of informatization, domain knowledge bases make knowledge more organized and ordered and facilitate the sharing and exchange of knowledge.
At present, a domain knowledge base is generally built by experts in the domain or by personnel engaged in editing work; that is, the experts or editors convert the knowledge in their minds into a form that a computer can understand. The domain knowledge base of the financial domain, for example, can be built by economists, who provide the concepts of the financial domain and the relations between those concepts according to their own professional knowledge. Building a domain knowledge base manually in this way, however, takes considerable time, effort and cost. When the content of the domain knowledge base is later updated, the personnel doing the update must fully understand what the domain knowledge base already contains before they can update it. The existing approach of building a domain knowledge base manually is therefore unfavorable to maintaining it.
Summary of the invention
In view of this, an object of the present invention is to provide a domain knowledge base construction method and device for automatically building the knowledge base of any domain, thereby solving the problems caused by manual construction. The specific technical solution is as follows:
The present invention provides a domain knowledge base construction method, the method including:
Obtaining the core concept of the current domain to be built and the target text in which the core concept is located;
Obtaining at least one non-core concept from the target text, where the non-core concept is a concept, extracted from the target text, that is in the full concept set, and the full concept set is the set of the core concepts and non-core concepts of the domain to be built and of other domains;
Obtaining the similarity between the core concept and the non-core concept;
When the similarity between the core concept and the non-core concept satisfies a preset condition, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built; if not, retaining the non-core concept that satisfies the preset condition in the domain knowledge base of the domain to be built, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text; if so, discarding the non-core concept that satisfies the preset condition;
After all concepts in the domain knowledge base of the domain to be built are obtained, obtaining the relation between any two concepts, thereby obtaining the domain knowledge base of the domain to be built, where all concepts include all core concepts and all non-core concepts of the domain to be built.
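The iterative steps above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `linked` dictionary is a hypothetical stand-in for "fetch a concept's target text and extract the full-set concepts it contains", the `accept` callback stands in for the preset condition, and the similarity is the intersection-over-union ratio described in this document.

```python
from collections import deque
from typing import Callable, Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared concepts divided by all distinct concepts in both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_concepts(seed: str,
                   linked: Dict[str, Set[str]],
                   accept: Callable[[float], bool]) -> Set[str]:
    """Expand a knowledge base outward from one seed core concept.

    `linked` (hypothetical) maps a concept to the full-set concepts found
    in its target text; `accept` is an assumed form of the preset condition.
    """
    kb = {seed}
    queue = deque([seed])
    while queue:
        core = queue.popleft()
        core_links = linked.get(core, set())
        for candidate in sorted(core_links):
            sim = jaccard(core_links, linked.get(candidate, set()))
            # Keep the candidate only if it passes the condition and is new.
            if accept(sim) and candidate not in kb:
                kb.add(candidate)
                queue.append(candidate)  # it becomes a new core concept
    return kb
```

The queue makes each retained non-core concept a new core concept in turn, and the loop ends when no new concept is retained. The fixed threshold a caller might pass as `accept` (for example `lambda s: s >= 0.2`) is only a placeholder for whatever preset condition is chosen.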
Preferably, obtaining the similarity between the core concept and the non-core concept includes:
When the core concept is the concept obtained the 1st time, obtaining the target text in which the non-core concept is located, obtaining from that target text at least one first concept located in the full concept set, and obtaining the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept;
When the core concept is a non-core concept obtained the i-th time and taken as a new core concept, obtaining, from the target text in which a non-core concept corresponding to the new core concept is located, at least one second concept located in the full concept set, and obtaining the similarity between the new core concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time, where the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and its corresponding core concept, 1 ≤ i ≤ N, N = M - 1, and M is the total number of times non-core concepts have been obtained by the time all concepts in the knowledge base of the domain to be built have been obtained.
Preferably, obtaining the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept includes:
Obtaining the number of identical first concepts, that is, concepts shared by the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, where the total number of concepts is the sum of the number of identical first concepts and the number of differing concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept;
Obtaining the similarity between the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
Obtaining the similarity between the new core concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time includes:
Obtaining the number of identical second concepts, that is, concepts shared by the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, where this total number of concepts is the sum of the number of identical second concepts and the number of differing concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
Obtaining a first similarity between the new core concept and its corresponding non-core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
Obtaining the similarity between the new core concept and its corresponding non-core concept according to the first similarity and the similarity obtained the i-th time.
Preferably, the step of judging, when the similarity between the core concept and the non-core concept satisfies the preset condition, whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built, retaining the non-core concept and taking it as a new core concept if not, and discarding it if so, includes:
Obtaining the similarity between the non-core concept and each concept in the full concept set;
Obtaining the average similarity of the non-core concept to the full concept set according to the similarity between the non-core concept and each concept in the full concept set;
When the similarity between the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built;
If not, retaining in the domain knowledge base of the domain to be built the non-core concept whose similarity is greater than its average similarity to the full concept set, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text;
If so, discarding the non-core concept whose similarity is greater than its average similarity to the full concept set.
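The preferred preset condition, comparing a pairwise similarity with the candidate's average similarity to the full concept set, might be sketched as follows. This is a hedged illustration: `linked` is a hypothetical mapping from each concept in the full concept set to the concepts found in its target text, the function names are invented for this sketch, and excluding the candidate itself from the average is an assumption that the text does not settle.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared concepts divided by all distinct concepts in both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_similarity(concept: str, linked: Dict[str, Set[str]]) -> float:
    """Mean similarity of `concept` to every other concept in the full set.

    Assumption: the concept is not compared with itself.
    """
    others = [c for c in linked if c != concept]
    if not others:
        return 0.0
    total = sum(jaccard(linked[concept], linked[c]) for c in others)
    return total / len(others)


def passes_preset_condition(core: str, candidate: str,
                            linked: Dict[str, Set[str]]) -> bool:
    """True when sim(core, candidate) exceeds the candidate's average."""
    pair_sim = jaccard(linked[core], linked[candidate])
    return pair_sim > average_similarity(candidate, linked)
```

A candidate that passes this check would still need to be compared against the concepts already in the domain knowledge base before being retained.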
Preferably, obtaining the relation between any two concepts after all concepts in the knowledge base of the domain to be built are obtained includes:
Obtaining the non-core concepts corresponding to each of the two concepts;
Obtaining the number of identical concepts and the number of differing concepts among the non-core concepts corresponding to each of the two concepts;
Obtaining the similarity between the two concepts according to the number of identical concepts and the number of differing concepts, where the similarity between the two concepts indicates the degree of similarity between them.
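The final relation step can be sketched as a pass over all unordered pairs of retained concepts. The sketch below is illustrative only: `pairwise_relations` and the `linked` mapping are hypothetical names, and the similarity is computed as identical concepts divided by identical plus differing concepts, as the step describes.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Identical concepts divided by identical plus differing concepts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_relations(concepts: Iterable[str],
                       linked: Dict[str, Set[str]]
                       ) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of concepts in the knowledge base.

    `linked` (hypothetical) maps each concept to its non-core concepts.
    Keys are (a, b) tuples with a < b in alphabetical order.
    """
    return {
        (a, b): jaccard(linked.get(a, set()), linked.get(b, set()))
        for a, b in combinations(sorted(concepts), 2)
    }
```

For n retained concepts this produces n(n-1)/2 relation values, which may matter for large knowledge bases.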
The present invention also provides a domain knowledge base construction device, the device including:
A first acquisition unit, configured to obtain the core concept of the current domain to be built and the target text in which the core concept is located;
A second acquisition unit, configured to obtain at least one non-core concept from the target text, where the non-core concept is a concept, extracted from the target text, that is in the full concept set, and the full concept set is the set of the core concepts and non-core concepts of the domain to be built and of other domains;
A first computing unit, configured to obtain the similarity between the core concept and the non-core concept;
A processing unit, configured to judge, when the similarity between the core concept and the non-core concept satisfies a preset condition, whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built; if not, to retain the non-core concept that satisfies the preset condition in the domain knowledge base of the domain to be built, take the non-core concept as a new core concept and trigger the first acquisition unit; and if so, to discard the non-core concept that satisfies the preset condition;
A second computing unit, configured to obtain, after all concepts in the domain knowledge base of the domain to be built are obtained, the relation between any two concepts, thereby obtaining the domain knowledge base of the domain to be built, where all concepts include all core concepts and all non-core concepts of the domain to be built.
Preferably, the first computing unit is configured to, when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located, obtain from that target text at least one first concept located in the full concept set, and obtain the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept; and, when the core concept is a non-core concept obtained the i-th time and taken as a new core concept, to obtain, from the target text in which a non-core concept corresponding to the new core concept is located, at least one second concept located in the full concept set, and obtain the similarity between the new core concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time, where the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and its corresponding core concept, 1 ≤ i ≤ N, N = M - 1, and M is the total number of times non-core concepts have been obtained by the time all concepts in the knowledge base of the domain to be built have been obtained.
Preferably, the first computing unit includes:
A first obtaining subunit, configured to, when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located and obtain from that target text at least one first concept located in the full concept set; and, when the core concept is a non-core concept obtained the i-th time and taken as a new core concept, to obtain, from the target text in which a non-core concept corresponding to the new core concept is located, at least one second concept located in the full concept set;
A second obtaining subunit, configured to obtain the number of identical first concepts shared by the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, where the total number of concepts is the sum of the number of identical first concepts and the number of differing concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept;
A first computation subunit, configured to obtain the similarity between the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
A third obtaining subunit, configured to obtain the number of identical second concepts shared by the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, where this total number of concepts is the sum of the number of identical second concepts and the number of differing concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
A second computation subunit, configured to obtain a first similarity between the new core concept and its corresponding non-core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
A third computation subunit, configured to obtain the similarity between the new core concept and its corresponding non-core concept according to the first similarity and the similarity obtained the i-th time.
Preferably, the processing unit includes:
A fourth computation subunit, configured to obtain the similarity between the non-core concept and each concept in the full concept set;
A fifth computation subunit, configured to obtain the average similarity of the non-core concept to the full concept set according to the similarity between the non-core concept and each concept in the full concept set;
A judgment subunit, configured to judge, when the similarity between the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built;
A processing subunit, configured to, when the non-core concept differs from the concepts already present in the domain knowledge base of the domain to be built, retain in that domain knowledge base the non-core concept whose similarity is greater than its average similarity to the full concept set, take the non-core concept as a new core concept and trigger the first acquisition unit; and, when the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built, to discard the non-core concept whose similarity is greater than its average similarity to the full concept set.
Preferably, the second computing unit is configured to obtain the non-core concepts corresponding to each of any two concepts, obtain the number of identical concepts and the number of differing concepts among the non-core concepts corresponding to each of the two concepts, and obtain the similarity between the two concepts according to the number of identical concepts and the number of differing concepts, where the similarity between the two concepts indicates the degree of similarity between them.
Compared with the prior art, the technical solution provided by the present invention has the following advantages:
With the above technical solution, after the core concept of the current domain to be built and the target text in which the core concept is located are obtained, at least one non-core concept can be obtained from the target text, and the similarity between the core concept and the non-core concept can be obtained. When the similarity between the core concept and the non-core concept satisfies a preset condition, it is judged whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built. If not, the non-core concept that satisfies the preset condition is retained in the domain knowledge base of the domain to be built and taken as a new core concept, the target text in which the new core concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is performed again. After all concepts in the knowledge base of the domain to be built are obtained, the relation between any two concepts is obtained, so that the domain knowledge base of the domain to be built is obtained and built automatically, and experts in the domain or personnel engaged in editing work no longer need to build the knowledge base manually. After the domain knowledge base of any domain is built, the knowledge base can also be updated automatically through the same construction steps, so that personnel do not need to understand the existing content of the domain knowledge base, which reduces the difficulty of maintaining it.
Description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is the flow chart of domain knowledge base construction method provided in an embodiment of the present invention;
Fig. 2 is a kind of flow chart that similarity provided in an embodiment of the present invention is obtained;
Fig. 3 is another kind of flow chart that similarity provided in an embodiment of the present invention is obtained;
Fig. 4 is the structural representation of domain knowledge base construction device provided in an embodiment of the present invention;
Fig. 5 is the structural representation of the first computing unit in domain knowledge base construction device provided in an embodiment of the present invention;
Fig. 6 is the structural representation of processing unit in domain knowledge base construction device provided in an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which shows the flow chart of the domain knowledge base construction method provided by an embodiment of the present invention, the method is used to build any domain knowledge base automatically, so as to solve the problems caused by building a domain knowledge base manually. Specifically, the domain knowledge base construction method shown in Fig. 1 may include the following steps:
101: Obtain the core concept of the current domain to be built and the target text in which the core concept is located. The domain to be built is a specific domain extracted from the full knowledge base, and the core concept is a representative concept in the domain to be built. For example, when the financial domain extracted from the full knowledge base is the domain to be built, "finance", an entry well known to users, can serve as the core concept of the financial domain. The target text in which the core concept is located can be a text on some website that explains the core concept. For example, when the core concept is "finance", its target text can be the text in Baidupedia or Wikipedia that explains finance.
In this embodiment of the present invention, the core concept can be determined as follows: obtain the number of times each concept occurs in the texts of the financial domain, and choose the concepts whose number of occurrences falls within a preset range as core concepts. The concepts can be obtained from the texts of the financial domain by data crawling, or can be specified manually by experts in the domain or by personnel engaged in editing work. After the concepts of the financial domain are obtained, the number of times each concept occurs in the texts of the financial domain can be marked when the full knowledge base is built, and the preset range can be determined according to the actual application. Alternatively, the core concept can be determined by having experts mark the core concepts in their research domain.
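The frequency-based way of choosing core concepts can be sketched as below. This is an assumption-laden illustration: the function name and parameters are hypothetical, occurrences are summed over all texts (the text does not say whether counts are per text or aggregated), and plain substring counting stands in for whatever matching a real system would use.

```python
from collections import Counter
from typing import Iterable, List, Set


def select_core_concepts(texts: List[str],
                         concepts: Iterable[str],
                         min_count: int,
                         max_count: int) -> Set[str]:
    """Return concepts whose total occurrence count lies in the preset range."""
    concepts = list(concepts)
    counts: Counter = Counter()
    for text in texts:
        for concept in concepts:
            # Naive substring counting; an assumption made for brevity.
            counts[concept] += text.count(concept)
    return {c for c in concepts if min_count <= counts[c] <= max_count}
```

The bounds `min_count` and `max_count` correspond to the preset range, which the text leaves to the actual application.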
The full knowledge base mentioned above is the set of knowledge in the domain to be built and in other domains, and includes all concepts in these domains and the relations between the concepts. The domain knowledge base construction method of this embodiment obtains, on the basis of the full knowledge base, the concepts belonging to the domain to be built and the relations between them. The full knowledge base can be obtained in ways that include, but are not limited to, the following:
One way is to obtain the full knowledge base by data crawling. Specifically, a web crawler captures web page information from the Internet, the captured web page information is compared with the information provided by portal websites, and the resulting text information is stored in the full knowledge base. Each entry in such text information can be regarded as a concept, and entries in the same text information can be regarded as related concepts. A portal website is an application system that leads to a certain class of comprehensive Internet information resources and provides related information services. Alternatively, the information can be captured from existing full knowledge base websites, such as Baidupedia or Wikipedia.
Another way is to obtain the full knowledge base by manual organization. Specifically, experts or personnel engaged in editing work edit entries according to existing knowledge bases and the knowledge they themselves have mastered, and a full knowledge base is completed through the cooperation of many people. This work usually cannot be completed by an individual or a single organization, so manual organization is generally carried out through online cooperation; a full knowledge base website such as Baidupedia, for example, is completed manually through online cooperation.
102: Obtain at least one non-core concept from the target text. A non-core concept is a concept, extracted from the target text, that is in the full concept set, and the full concept set is the set of the core concepts and non-core concepts of the domain to be built and of other domains; the full knowledge base described above can therefore be regarded as the full concept set together with the relations between concepts.
After the target text is obtained, the entries that carry a hyperlink are obtained from the target text, and the concept indicated by each entry is compared with the concepts in the full concept set. If the concept indicated by an entry is identical to some concept in the full concept set, the concept indicated by the entry is taken as a non-core concept. An entry that carries a hyperlink is one that, when triggered, leads to a text explaining the entry.
Besides the above way of obtaining non-core concepts, the target text can also be segmented with Chinese word segmentation, and the concept indicated by each resulting entry is compared with the concepts in the full concept set. If the concept indicated by an entry is identical to some concept in the full concept set, the concept indicated by the entry is taken as a non-core concept.
For example, when the core concept is "finance", the entries obtained from its target text are: "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency" and "commodity". If the concepts indicated by these entries are identical to concepts in the full concept set, they are each taken as non-core concepts. If the concept indicated by an entry, for instance "intermediary institution", differs from every concept in the full concept set, it cannot be taken as a non-core concept.
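The filtering in step 102 amounts to keeping only those extracted entries that appear in the full concept set. A minimal sketch follows; the function name is hypothetical, the entries are assumed to have been extracted already (by hyperlinks or word segmentation), and skipping the core concept itself and duplicate entries is an assumption of this sketch.

```python
from typing import Iterable, List, Set


def extract_non_core(entries: Iterable[str],
                     full_concept_set: Set[str],
                     core: str) -> List[str]:
    """Keep entries that are in the full concept set, in first-seen order."""
    kept: List[str] = []
    for entry in entries:
        # Assumption: the core concept and repeated entries are skipped.
        if entry in full_concept_set and entry != core and entry not in kept:
            kept.append(entry)
    return kept
```

With a full concept set that lacks "intermediary institution", that entry would be dropped while "circulation" and "currency" would be kept.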
103: Obtain the similarity between the core concept and the non-core concept. The similarity indicates the degree of similarity between the non-core concept and the core concept and is used to determine whether the non-core concept can serve as a concept in the domain knowledge base of the domain to be built. The similarity between the core concept and the non-core concept can be obtained with cosine similarity, the Pearson correlation coefficient or Jaccard similarity. Because Jaccard similarity is better than cosine similarity and the Pearson correlation coefficient in computational complexity and efficiency, this embodiment uses Jaccard similarity to illustrate how the similarity between the core concept and the non-core concept is obtained.
Jaccard similarity is used to calculate the similarity between individuals measured by symbols or Boolean values. The corresponding formula is:
$$\mathrm{Sim}_{a,o} = \frac{|A \cap O|}{|A \cup O|}$$
where $A, O \subseteq X$ denote the concept sets of a and o in X, a and o are the two concepts being compared (one the core concept, the other the non-core concept; the measure is symmetric, so the assignment does not affect the result), and X is the full concept set. In this embodiment, the similarity between the core concept and the non-core concept can therefore be calculated as follows: obtain the concept sets of the core concept and of the non-core concept, and divide the number of concepts in the intersection of the two sets by the number of concepts in their union; the result is the similarity between the core concept and the non-core concept.
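The ratio can be checked on small sets with Python's built-in set operations. The sketch below uses the "finance" and "economist" link sets given as the example in this section; the variable names are chosen for illustration only.

```python
# Concepts linked from "finance" (set O) and from "economist" (set A).
O = {"circulation", "evolution finance", "evolution security", "draft bank",
     "draft", "silver", "intermediary", "economist", "currency", "commodity"}
A = {"currency", "means of production", "distribution", "economics", "commodity"}

shared = A & O        # concepts present in both sets
combined = A | O      # all distinct concepts across both sets

# Intersection size divided by union size.
similarity = len(shared) / len(combined)
print(sorted(shared), len(combined), round(similarity, 3))
```

Because intersection and union are both symmetric, swapping `A` and `O` gives the same value.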
For example, o denotes the core concept "finance", and O denotes the set of non-core concepts that "finance" links to in the full concept set X, namely the aforementioned "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency" and "commodity".
a denotes the non-core concept "economist", and A denotes the set of other concepts that "economist" links to in the full concept set X, such as "currency", "means of production", "distribution", "economics" and "commodity". Since "currency" and "commodity" form the intersection of the two sets, the numerator of $\mathrm{Sim}_{a,o}$ is 2. The union is "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency", "commodity", "means of production", "distribution" and "economics", so the denominator of $\mathrm{Sim}_{a,o}$ is 13, and the similarity between the two concepts is 2/13 ≈ 0.154.
As the formula for $\mathrm{Sim}_{a,o}$ shows, the process of obtaining the similarity between the core concept and the non-core concept is shown in Fig. 2 and may include the following steps:
201:Non-core concept place target text is obtained, is obtained from the target text of non-core concept place and is located at full dose
The concept of at least one of concept set first.
In embodiments of the present invention, the acquisition modes of at least one first concepts non-core concept corresponding with key concept
Acquisition modes it is identical, this is no longer described in detail, by taking " economist " this non-core concept as an example, at least one first of acquisition
Concept has respectively:" currency ", " means of production ", " distribution ", " economics ", " commodity ".
202: Obtain the number of identical first concepts between the at least one first concept and the at least one non-core concept corresponding to the key concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the key concept. The total number of concepts is the sum of the number of identical first concepts and the number of different concepts in the at least one first concept and the at least one non-core concept corresponding to the key concept.
Taking "finance" above as the key concept, the first concepts are the concepts obtained through "economist", a non-core concept of the key concept "finance". Accordingly, the non-core concepts corresponding to the key concept "finance" are "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency" and "commodity", and the first concepts obtained through the non-core concept "economist" are "currency", "means of production", "distribution", "economics" and "commodity". The identical first concepts in the two concept sets are "currency" and "commodity", so the number of identical first concepts is 2, the number of different concepts is 11, and the total number of concepts is 13.
203: Obtain the similarity between the key concept and the non-core concept according to the number of identical first concepts and the total number of concepts. Steps 202 and 203 thus obtain the similarity between the key concept and the non-core concept from the first concepts and the at least one non-core concept corresponding to the key concept.

It should be noted that, when the similarity between the key concept and a particular non-core concept is to be obtained, the number of identical first concepts and the total number of concepts used are those corresponding to that non-core concept, not those corresponding to other non-core concepts. For example, when the similarity between the key concept "finance" and the non-core concept "economist" is to be obtained, the number of identical first concepts and the total number of concepts are those corresponding to the non-core concept "economist".
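Steps 201 to 203 can be sketched as three small functions. This is only an illustration under stated assumptions: the function names, the plain substring matching used to find concepts in a text, the `exclude` parameter and the sample sentence standing in for the target text of "economist" are all hypothetical; only the concept sets and the resulting counts (2 identical, 11 different, 13 in total) come from the example above.

```python
def extract_concepts(text, full_concepts, exclude=()):
    """Step 201 (sketch): concepts of the full set that occur in the text.

    Plain substring matching is an assumption made for illustration.
    """
    lowered = text.lower()
    return {c for c in full_concepts
            if c not in exclude and c.lower() in lowered}


def count_same_and_total(first_concepts, key_non_core):
    """Step 202: number of identical concepts and total number of concepts."""
    same = len(first_concepts & key_non_core)
    different = len(first_concepts ^ key_non_core)  # concepts in only one set
    return same, same + different


def similarity(same, total):
    """Step 203: identical concepts divided by the concept total."""
    return same / total if total else 0.0


# Non-core concepts of the key concept "finance" (from the example above).
finance_non_core = {
    "circulation", "evolution finance", "evolution security", "draft bank",
    "draft", "silver", "intermediary", "economist", "currency", "commodity",
}
full_concepts = finance_non_core | {
    "finance", "means of production", "distribution", "economics"}

# Hypothetical target text for the non-core concept "economist".
text = ("An economist studies economics, including currency, commodity, "
        "distribution and the means of production.")

first = extract_concepts(text, full_concepts, exclude={"economist"})
same, total = count_same_and_total(first, finance_non_core)
print(same, total, round(similarity(same, total), 3))  # 2 13 0.154
```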
104: Judge whether the similarity between the key concept and the non-core concept satisfies the preset condition; if yes, perform step 105; if no, perform step 108. When the similarity between the key concept and the non-core concept satisfies the preset condition, the non-core concept is a concept in the domain knowledge base of the field to be built; when the similarity does not satisfy the preset condition, the non-core concept is not a concept in the domain knowledge base of the field to be built.
In the embodiments of the present invention, one feasible form of the preset condition is based on the average similarity of the non-core concept to the full concept set. It is obtained as follows: obtain the similarity between the non-core concept and each concept in the full concept set, and from these similarities obtain the average similarity of the non-core concept to the full concept set. The specific formula is as follows:

Let the full concept set be $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ is the i-th concept in the full concept set X. The average similarity of the non-core concept a to the full concept set is then:

$$\overline{Sim}_a = \frac{1}{n}\sum_{i=1}^{n} Sim(a, x_i)$$

$Sim(a, x_i)$ is the similarity between the non-core concept a and $x_i$, and can be calculated with reference to the formula $Sim_{a,o}$. When the similarity between the key concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, the similarity is judged to satisfy the preset condition; when the similarity between the key concept and the non-core concept is less than or equal to that average similarity, the similarity is judged not to satisfy the preset condition.
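The threshold test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `links`, `average_similarity` and `meets_preset_condition` are assumptions, the tiny concept sets are hypothetical toy data, and the average is taken over every concept in the full set, including the non-core concept itself.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_similarity(concept, links):
    """Mean of Sim(concept, x_i) over every x_i in the full concept set."""
    return sum(jaccard(links[concept], links[x]) for x in links) / len(links)


def meets_preset_condition(key, non_core, links):
    """True when Sim(key, non_core) exceeds the non-core concept's average."""
    return (jaccard(links[non_core], links[key])
            > average_similarity(non_core, links))


# Hypothetical toy data: each concept mapped to its concept set.
links = {
    "finance": {"currency", "commodity", "economist"},
    "economist": {"currency", "commodity", "economics"},
    "weather": {"rain", "cloud"},
    "sports": {"ball"},
}

print(average_similarity("economist", links))                 # 0.375
print(meets_preset_condition("finance", "economist", links))  # True
print(meets_preset_condition("finance", "weather", links))    # False
```

In this toy data "economist" shares two of four concepts with "finance" (0.5), which is above its average of 0.375, whereas "weather" shares nothing with "finance" and falls below its own average of 0.25.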
105: When the similarity between the key concept and the non-core concept satisfies the preset condition, judge whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built; if no, perform step 106; if yes, perform step 107.

106: Retain the non-core concept that satisfies the preset condition in the domain knowledge base of the field to be built, take the non-core concept as a new key concept, obtain the target text in which the new key concept is located, and continue with step 102.

107: Discard the non-core concept that satisfies the preset condition, and perform step 109.
When the similarity between the key concept and the non-core concept satisfies the preset condition, the non-core concept is a concept in the domain knowledge base of the field to be built, but it still needs to be determined whether the same concept already exists in that domain knowledge base. If it does, the non-core concept has already been written into the domain knowledge base, and step 107 can be performed to discard it so that concepts are not repeated in the domain knowledge base. If it does not, the non-core concept has not yet been written into the domain knowledge base, and step 106 is performed to retain it in the domain knowledge base and take it as a new key concept. The target text in which the new key concept is located is then obtained, and at least one non-core concept is obtained from that target text; that is, other concepts of the domain knowledge base of the field to be built continue to be obtained in order to complete the domain knowledge base.
When the preset condition is based on the average similarity of the non-core concept to the full concept set, what step 107 discards is a non-core concept whose similarity is greater than its average similarity to the full concept set but which already exists in the domain knowledge base. Correspondingly, what step 106 retains is a non-core concept whose similarity is greater than its average similarity to the full concept set and which is not yet in the domain knowledge base, and such a non-core concept can be taken as a new key concept.
108: Discard the non-core concept that does not satisfy the preset condition, and perform step 109. When the similarity between the key concept and the non-core concept does not satisfy the preset condition, the non-core concept is not a concept in the domain knowledge base of the field to be built, and it can be discarded directly, for example by directly discarding a non-core concept whose similarity is less than or equal to its average similarity to the full concept set.
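The branching in steps 104 to 108 can be summarised in a small decision function. This is a sketch under stated assumptions: the function name `decide`, its string return values, and the numeric similarity and average values in the usage lines are hypothetical; only the order of the checks follows the steps above.

```python
def decide(non_core, similarity, average, knowledge_base):
    """Apply steps 104-108 to one non-core concept.

    similarity: Sim(key concept, non-core concept)
    average: average similarity of the non-core concept to the full set
    knowledge_base: set of concepts already retained (updated in place)
    """
    if similarity <= average:          # step 104 fails
        return "discard (step 108)"
    if non_core in knowledge_base:     # step 105: already present
        return "discard (step 107)"
    knowledge_base.add(non_core)       # step 106: retain, becomes a new key concept
    return "keep (step 106)"


kb = {"finance", "currency"}
print(decide("economist", 0.154, 0.1, kb))  # keep (step 106)
print(decide("currency", 0.3, 0.1, kb))     # discard (step 107)
print(decide("weather", 0.0, 0.1, kb))      # discard (step 108)
```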
109: After all concepts in the domain knowledge base of the field to be built have been obtained, obtain the relation between any two concepts, thereby obtaining the domain knowledge base of the field to be built, where all concepts include all key concepts and all non-core concepts of the field to be built.
In the embodiments of the present invention, if all the non-core concepts obtained in step 102 have been discarded through step 107 and step 108, this means that all remaining non-core concepts have already been written into the domain knowledge base, and hence that all concepts of the domain knowledge base have been obtained. The relation between any two concepts can then be obtained to complete the construction of the domain knowledge base. If step 106 still yields a non-core concept to be taken as a new key concept, there are still non-core concepts that have not been written into the domain knowledge base, so the non-core concept continues to be taken as a new key concept and step 102 is performed again to complete the domain knowledge base.
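The overall iteration, from the first key concept until no new key concept remains, can be sketched as a worklist loop. This is a minimal illustration, not the patented implementation: the callbacks `get_non_core` and `is_related`, the first-in-first-out processing order, and the toy graph are all assumptions; the source does not prescribe the order in which new key concepts are processed.

```python
from collections import deque


def build_concepts(seed, get_non_core, is_related):
    """Collect all concepts of the domain knowledge base starting from seed.

    get_non_core(key) -> iterable of non-core concepts found in key's text
    is_related(key, candidate) -> True when the preset condition is satisfied
    """
    knowledge_base = {seed}
    pending = deque([seed])                      # key concepts still to expand
    while pending:
        key = pending.popleft()
        for candidate in get_non_core(key):      # step 102
            if not is_related(key, candidate):   # steps 103, 104 and 108
                continue
            if candidate in knowledge_base:      # steps 105 and 107
                continue
            knowledge_base.add(candidate)        # step 106
            pending.append(candidate)            # becomes a new key concept
    return knowledge_base                        # input to step 109


# Hypothetical toy data standing in for target texts and similarity checks.
graph = {
    "finance": ["economist", "currency", "weather"],
    "economist": ["currency", "economics"],
    "currency": ["finance"],
    "economics": [],
}
unrelated = {"weather"}

concepts = build_concepts(
    "finance",
    get_non_core=lambda key: graph.get(key, []),
    is_related=lambda key, cand: cand not in unrelated,
)
print(sorted(concepts))  # ['currency', 'economics', 'economist', 'finance']
```

The loop terminates because a concept is queued only the first time it is added to the knowledge base, which mirrors the duplicate check of steps 105 and 107.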
In the embodiments of the present invention, the relation between any two concepts may be a subordinate relation or a same-level relation. For example, the relation between a key concept and the non-core concepts under it may be a subordinate relation, and the relation among multiple non-core concepts of the same key concept may be a same-level relation.

Of course, the relation between any two concepts may also be indicated by the similarity between them, which can be obtained by cosine similarity, the Pearson correlation coefficient or Jaccard similarity. Because Jaccard similarity is better than cosine similarity and the Pearson correlation coefficient in computational complexity and computational efficiency, the embodiments of the present invention use Jaccard similarity to illustrate how the similarity between any two concepts is obtained.
Let the concept set of the domain knowledge base of the field to be built be S, and let one of any two concepts be a and the other be b. The similarity between concept a and concept b is calculated as follows:

$$Sim_{a,b} = \frac{|A \cap B|}{|A \cup B|}$$

where $A, B \subseteq S$ are the concept sets of a and b in S.
For example, the concepts in the domain knowledge base of the field to be built are "finance", "economist", "economics", "currency", "commodity", "stock" and "market". Here a is "economist", and A is the set of non-core concepts that correspond to "economist", taken as a key concept, within the concept set S of the domain knowledge base. Originally, the non-core concepts obtained when "economist" was a key concept were "currency", "means of production", "distribution", "economics" and "commodity", but those retained in the domain knowledge base after processing are "currency", "economics" and "commodity", so the concept set A contains these three concepts.

b is "market", and B is the set of non-core concepts that correspond to "market", taken as a key concept, within the concept set S of the domain knowledge base. Originally, the non-core concepts obtained when "market" was a key concept were "stock", "transaction", "value" and "commodity", but those retained in the domain knowledge base after processing are "stock" and "commodity", so the concept set B contains these two concepts.

Since "commodity" is the intersection of set A and set B, the numerator of the formula $Sim_{a,b}$ is 1. The union of set A and set B is "currency", "economics", "commodity" and "stock", so the denominator of the formula $Sim_{a,b}$ is 4, and the similarity between concept a and concept b is 1/4 = 0.25. The similarity relation between "economist" and "market" is therefore 0.25.
It follows from the above formula that, when the relation between any two concepts is indicated by the similarity between them, the relation can be obtained as follows: obtain the non-core concepts corresponding to each of the two concepts; obtain the number of identical concepts and the number of different concepts in the non-core concepts corresponding to the two concepts; and obtain the similarity between the two concepts according to the number of identical concepts and the number of different concepts. The similarity between any two concepts indicates the degree of similarity between them.
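The pairwise relation computation can be sketched as follows. This is an illustration only: the function name `relation_similarities` and the dictionary layout are assumptions, and the two retained concept sets are the "economist" and "market" sets from the example above.

```python
from itertools import combinations


def relation_similarities(non_core_of):
    """Similarity between every pair of concepts in the knowledge base.

    non_core_of maps each concept to the set of its retained non-core concepts.
    """
    result = {}
    for a, b in combinations(sorted(non_core_of), 2):
        set_a, set_b = non_core_of[a], non_core_of[b]
        same = len(set_a & set_b)        # number of identical concepts
        different = len(set_a ^ set_b)   # number of different concepts
        total = same + different
        result[(a, b)] = same / total if total else 0.0
    return result


retained = {
    "economist": {"currency", "economics", "commodity"},
    "market": {"stock", "commodity"},
}
relations = relation_similarities(retained)
print(relations[("economist", "market")])  # 0.25
```

Keys are ordered alphabetically so that each unordered pair appears exactly once.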
With the above technical solution, after the key concept in the current field to be built and the target text in which the key concept is located are obtained, at least one non-core concept can be obtained from the target text, and the similarity between the key concept and the non-core concept can be obtained. When the similarity between the key concept and the non-core concept satisfies the preset condition, it is judged whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built. If not, the non-core concept that satisfies the preset condition is retained in the domain knowledge base of the field to be built and taken as a new key concept, the target text in which the new key concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is performed again. After all concepts in the knowledge base of the field to be built have been obtained, the relation between any two concepts is obtained, thereby obtaining the domain knowledge base of the field to be built. The domain knowledge base of the field to be built is thus constructed automatically, so that experts in the field to be built or editorial staff do not need to build the knowledge base manually. After the domain knowledge base of any field has been built, it can also be updated automatically through the steps used to build it, so that maintenance personnel do not need to understand the related content of the domain knowledge base, which reduces the difficulty of maintaining it.
It should be noted here that, when the similarity between the key concept and the non-core concept is obtained, if the key concept is the concept obtained the 1st time, i.e. it is not a previously obtained non-core concept taken as a new key concept, the similarity can be obtained by the formula $Sim_{a,o}$ above. However, when the key concept is a non-core concept obtained the i-th time and taken as a new key concept, similarity transfer needs to be considered. For example, when the similarity between "economist" and "economics" above is calculated, the similarity between "economist" and "finance" needs to be taken into account in the similarity formula. Here 1 ≤ i ≤ N, N = M - 1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
Similarity transfer is considered because, as the number of acquisition rounds increases, an obtained non-core concept may be unrelated to the key concept obtained the 1st time. Such a non-core concept should not be written into the domain knowledge base, but if similarity transfer is not considered it may still satisfy the preset condition set in the embodiments of the present invention and be retained, so that the domain knowledge base contains concepts that do not belong to the field. For this reason the embodiments of the present invention consider similarity transfer, so that a non-core concept is associated, through its own key concept, with the key concepts obtained earlier, which reduces the probability that erroneous concepts exist in the domain knowledge base. Accordingly, when a non-core concept obtained the i-th time is taken as a new key concept, the process of calculating the similarity between the new key concept and a non-core concept is shown in Fig. 3 and may include the following steps:
301: Obtain, from the target text in which each non-core concept corresponding to the new key concept is located, at least one second concept that is in the full concept set. In the embodiments of the present invention, the at least one second concept is obtained in the same way as the non-core concepts corresponding to the key concept, which is not described in detail again. Still taking "finance" and "economist" above as an example, "finance" is the key concept obtained the 1st time, and "economist" is a non-core concept obtained the 1st time, which can be taken as a new key concept. When "economist" is taken as the new key concept, the non-core concepts obtained are "currency", "means of production", "distribution", "economics" and "commodity". At least one second concept in the full concept set is then obtained from the target text in which each of these non-core concepts is located, i.e. the set of second concepts of each non-core concept is obtained.
302: Obtain the number of identical second concepts between the at least one second concept and the at least one non-core concept corresponding to the new key concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept. That total is the sum of the number of identical second concepts and the number of different concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept.

It can be understood that these two quantities are obtained per set of second concepts: for each set of second concepts, obtain the number of second concepts it shares with the at least one non-core concept corresponding to the new key concept and the number of second concepts that differ, and from these two numbers obtain the total number of concepts for that set of second concepts.
303: Obtain the first similarity between the new key concept and its corresponding non-core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept. The corresponding formula is:

$$Sim_{a,b_n} = \frac{|A \cap C|}{|A \cup C|}$$

where $b_n$ is the new key concept obtained the n-th time, a is its corresponding non-core concept, $A, C \subseteq S$ are the concept sets of a and $b_n$ in S, $|A \cap C|$ is the number of identical second concepts, and $|A \cup C|$ is the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept.
304: Obtain the similarity between the new key concept and its corresponding non-core concept according to the first similarity and the similarity obtained the i-th time. Steps 302 to 304 thus obtain the similarity between the new key concept and its corresponding non-core concept from the at least one second concept, the at least one non-core concept corresponding to the new key concept, and the similarity obtained the i-th time, where the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the key concept corresponding to that non-core concept.
In the following, the full concept set is X, the key concept obtained the 1st time is o, its non-core concept is a, and the set of new key concepts is $B = \{b_1, b_2, \ldots, b_n\}$, where $b_i$ is the new key concept obtained the i-th time. $b_1$ is then the new key concept obtained the 1st time, i.e. a non-core concept obtained from the key concept o and taken as a new key concept. The similarity formula of o and a is as follows:

[formula missing from the source text]

where the term defined alongside the formula is the similarity obtained the i-th time.
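The idea of similarity transfer can be sketched as below. The text states only that the first similarity and the similarity obtained the i-th time are combined; the combination rule itself is not recoverable here, so multiplying the two values is an assumption made purely for illustration, as are the function name `transferred_similarity` and the first similarity of 0.5 for "economist" and "economics". The value 2/13 is the "finance"/"economist" similarity from the earlier example.

```python
import operator


def transferred_similarity(first_similarity, previous_similarity,
                           combine=operator.mul):
    """Combine a new key concept's first similarity with the similarity
    obtained in the previous round.

    combine defaults to multiplication, which is an ASSUMPTION of this
    sketch and not a rule taken from the source text.
    """
    return combine(first_similarity, previous_similarity)


sim_finance_economist = 2 / 13        # similarity obtained the 1st time
first_sim_economist_economics = 0.5   # hypothetical first similarity

sim = transferred_similarity(first_sim_economist_economics,
                             sim_finance_economist)
print(round(sim, 4))  # 0.0769
```

With a multiplicative rule, a concept reached through several weakly related key concepts ends up with a small similarity, which matches the stated goal of keeping concepts unrelated to the 1st key concept out of the knowledge base; any other monotone rule can be passed in through `combine`.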
It should be noted here that, when the key concept of the field to be built is determined, multiple key concepts may be found. In that case one key concept can be chosen from them and the target text in which the chosen key concept is located can be obtained. The multiple key concepts can of course also be processed in parallel or one after another. When they are processed in parallel or one after another, after the non-core concepts of any one key concept have been obtained, they need to be compared with the non-core concepts corresponding to the other key concepts, so that a non-core concept that appears under two or more key concepts is processed only once.
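One reading of the comparison described above is a simple de-duplication across key concepts, sketched below. The function name `unique_candidates`, the first-seen-wins policy and the sample data are assumptions of this sketch.

```python
def unique_candidates(non_core_by_key):
    """List (key concept, non-core concept) pairs so that each non-core
    concept shared by several key concepts is scheduled only once.

    The first key concept under which a non-core concept appears keeps it;
    this tie-breaking policy is an assumption.
    """
    seen = set()
    schedule = []
    for key, candidates in non_core_by_key.items():
        for concept in candidates:
            if concept not in seen:
                seen.add(concept)
                schedule.append((key, concept))
    return schedule


pairs = unique_candidates({
    "finance": ["currency", "commodity"],
    "market": ["commodity", "stock"],
})
print(pairs)
# [('finance', 'currency'), ('finance', 'commodity'), ('market', 'stock')]
```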
For brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
Referring to Fig. 4, which shows the structure of the domain knowledge base construction device provided by an embodiment of the present invention, the device may include a first acquisition unit 11, a second acquisition unit 12, a first computing unit 13, a processing unit 14 and a second computing unit 15.
The first acquisition unit 11 is configured to obtain the key concept in the current field to be built and the target text in which the key concept is located.

The field to be built is a specific field extracted from the full knowledge base, and the key concept is a representative concept of the field to be built. For example, when the financial field extracted from the full knowledge base is the field to be built, the entry "finance", which is well known to users, can be taken as the key concept of the financial field. The target text in which the key concept is located can be a text on a website that explains the key concept; for example, when the key concept is "finance", its target text can be the text in which Baidupedia or Wikipedia explains finance.

For how the key concept is determined and how the full knowledge base is obtained, reference can be made to the related description in the method embodiments, which is not repeated here.
The second acquisition unit 12 is configured to obtain at least one non-core concept from the target text. A non-core concept is a concept in the full concept set that is extracted from the target text. The full concept set is the set of key concepts and non-core concepts, including those of the field to be built, so the full knowledge base mentioned above can be the set formed by the full concept set and the relations between its concepts. For the way in which the second acquisition unit 12 obtains at least one non-core concept, reference can be made to the related description in the method embodiments, which is not repeated here.
The first computing unit 13 is configured to obtain the similarity between the key concept and the non-core concept. The similarity indicates the degree of similarity between the non-core concept and the key concept, in order to determine whether the non-core concept can serve as a concept in the domain knowledge base of the field to be built. The similarity between the key concept and the non-core concept can be obtained by cosine similarity, the Pearson correlation coefficient or Jaccard similarity. Because Jaccard similarity is better than cosine similarity and the Pearson correlation coefficient in computational complexity and computational efficiency, the embodiments of the present invention use Jaccard similarity to illustrate how the similarity between the key concept and the non-core concept is obtained.
Accordingly, when the key concept is the concept obtained the 1st time, the first computing unit 13 is configured to obtain the target text in which the non-core concept is located, obtain from that target text at least one first concept that is in the full concept set, and obtain the similarity between the key concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the key concept.

When the key concept is a non-core concept obtained the i-th time and taken as a new key concept, the first computing unit 13 is configured to obtain, from the target text in which the non-core concept corresponding to the new key concept is located, at least one second concept that is in the full concept set, and to obtain the similarity between the new key concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new key concept, and the similarity obtained the i-th time. The similarity obtained the i-th time is thereby transferred to the similarity of the new key concept formed from the non-core concept obtained the i-th time, so that a non-core concept is associated, through its own key concept, with the key concepts obtained earlier, which reduces the probability that erroneous concepts exist in the domain knowledge base.

The similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the key concept corresponding to that non-core concept, where 1 ≤ i ≤ N, N = M - 1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
Correspondingly, the structure of the first computing unit 13 is shown in Fig. 5 and may include a first obtaining subunit 131, a second obtaining subunit 132, a first computation subunit 133, a third obtaining subunit 134, a second computation subunit 135 and a third computation subunit 136.
The first obtaining subunit 131 is configured to, when the key concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located and obtain from that target text at least one first concept that is in the full concept set; and, when the key concept is a non-core concept obtained the i-th time and taken as a new key concept, obtain from the target text in which the non-core concept corresponding to the new key concept is located at least one second concept that is in the full concept set.
The second obtaining subunit 132 is configured to obtain the number of identical first concepts between the at least one first concept and the at least one non-core concept corresponding to the key concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the key concept, where the total number of concepts is the sum of the number of identical first concepts and the number of different concepts in the at least one first concept and the at least one non-core concept corresponding to the key concept.

The first computation subunit 133 is configured to obtain the similarity between the key concept and the non-core concept according to the number of identical first concepts and the total number of concepts.
The third obtaining subunit 134 is configured to obtain the number of identical second concepts between the at least one second concept and the at least one non-core concept corresponding to the new key concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept, where that total is the sum of the number of identical second concepts and the number of different concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept.

The second computation subunit 135 is configured to obtain the first similarity between the new key concept and its corresponding non-core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new key concept.

The third computation subunit 136 is configured to obtain the similarity between the new key concept and its corresponding non-core concept according to the first similarity and the similarity obtained the i-th time.
In the embodiments of the present invention, for the specific implementation and examples of the first obtaining subunit 131, the second obtaining subunit 132, the first computation subunit 133, the third obtaining subunit 134, the second computation subunit 135 and the third computation subunit 136, reference can be made to the related description in the method embodiments, which is not repeated here.
The processing unit 14 is configured to, when the similarity between the key concept and the non-core concept satisfies the preset condition, judge whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built; if not, retain the non-core concept that satisfies the preset condition in the domain knowledge base of the field to be built, take the non-core concept as a new key concept, and trigger the first acquisition unit 11; if so, discard the non-core concept that satisfies the preset condition.
In the embodiments of the present invention, one feasible form of the preset condition is based on the average similarity of the non-core concept to the full concept set. The structure of the corresponding processing unit 14 is shown in Fig. 6 and may include a fourth computation subunit 141, a fifth computation subunit 142, a judgment subunit 143 and a processing subunit 144.
The fourth computation subunit 141 is configured to obtain the similarity between the non-core concept and each concept in the full concept set.

The fifth computation subunit 142 is configured to obtain the average similarity of the non-core concept to the full concept set according to the similarity between the non-core concept and each concept in the full concept set.
The judgment subunit 143 is configured to, when the similarity between the key concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, judge whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built.

The processing subunit 144 is configured to, when the non-core concept is not identical to any concept already in the domain knowledge base of the field to be built, retain in that domain knowledge base the non-core concept whose similarity is greater than its average similarity to the full concept set, take the non-core concept as a new key concept, and trigger the first acquisition unit 11; and, when the non-core concept is identical to a concept already in the domain knowledge base of the field to be built, discard the non-core concept whose similarity is greater than its average similarity to the full concept set.
In the embodiments of the present invention, for the specific implementation of the fourth computation subunit 141, the fifth computation subunit 142, the judgment subunit 143 and the processing subunit 144, reference can be made to the related description in the method embodiments, which is not repeated here.
The second computing unit 15 is configured to, after all concepts in the domain knowledge base of the field to be built have been obtained, obtain the relation between any two concepts, thereby obtaining the domain knowledge base of the field to be built, where all concepts include all key concepts and all non-core concepts of the field to be built.

Optionally, the second computing unit 15 is configured to obtain the non-core concepts corresponding to each of any two concepts, obtain the number of identical concepts and the number of different concepts in the non-core concepts corresponding to the two concepts, and obtain the similarity between the two concepts according to the number of identical concepts and the number of different concepts. The similarity between any two concepts indicates the degree of similarity between them. For the specific implementation and examples, reference can be made to the related description in the method embodiments, which is not repeated here.
With the above technical solution, after a core concept of the field currently to be built and the target text in which the core concept is located are obtained, at least one non-core concept can be obtained from the target text, and the similarity between the core concept and the non-core concept can be obtained. When that similarity satisfies the preset condition, it is judged whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built. If not, the non-core concept that satisfies the preset condition is retained in the domain knowledge base of the field to be built, the non-core concept is taken as a new core concept, the target text in which the new core concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is performed again. After all concepts in the knowledge base of the field to be built have been obtained, the relation between any two concepts is obtained, thereby obtaining the domain knowledge base of the field to be built. The domain knowledge base is thus built automatically, so that experts in the field to be built or editing personnel no longer need to build the knowledge base manually. After the domain knowledge base of any field has been built, it can also be updated automatically by repeating the steps used to build it, so that maintenance personnel do not need to understand the contents of the domain knowledge base, which reduces the difficulty of maintaining it.
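The iterative procedure summarized above can be sketched as a worklist loop. This is a minimal sketch under stated assumptions, not the patented implementation: `get_text`, `extract_concepts`, `similarity`, and `meets_condition` are hypothetical callables standing in for target-text lookup, concept extraction, the similarity computation, and the preset condition, and the toy corpus is invented.

```python
from collections import deque


def build_concept_set(seed, get_text, extract_concepts, similarity, meets_condition):
    """Expand a knowledge base from one seed core concept.

    All four callables are hypothetical placeholders supplied by the caller.
    """
    knowledge_base = {seed}            # concepts retained so far
    pending = deque([seed])            # core concepts still to be expanded
    while pending:
        core = pending.popleft()
        text = get_text(core)          # target text in which the core concept is located
        for candidate in extract_concepts(text):
            if candidate == core:
                continue
            if not meets_condition(similarity(core, candidate), candidate):
                continue               # similarity fails the preset condition
            if candidate in knowledge_base:
                continue               # already present: discard
            knowledge_base.add(candidate)
            pending.append(candidate)  # the candidate becomes a new core concept
    return knowledge_base


# Toy corpus: each concept maps to the text in which it is located.
texts = {"a": "a b c", "b": "b d", "c": "c", "d": "d a"}
kb = build_concept_set(
    "a",
    get_text=lambda concept: texts.get(concept, ""),
    extract_concepts=lambda text: text.split(),
    similarity=lambda core, cand: 0.0 if cand == "c" else 1.0,
    meets_condition=lambda sim, cand: sim > 0.5,
)
```

The loop terminates because a concept is queued at most once: anything already in the knowledge base is discarded rather than expanded again.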
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the device embodiments are substantially similar to the method embodiments, they are described briefly; for related parts, refer to the description of the method embodiments.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A domain knowledge base construction method, characterized in that the method comprises:
obtaining a core concept of the field currently to be built and the target text in which the core concept is located;
obtaining at least one non-core concept from the target text, the non-core concept being a concept in a full concept set that is extracted from the target text, the full concept set being the set of core concepts and non-core concepts within and outside the field to be built;
obtaining the similarity between the core concept and the non-core concept;
when the similarity between the core concept and the non-core concept satisfies a preset condition, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built; if not, retaining the non-core concept that satisfies the preset condition in the domain knowledge base of the field to be built, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text; if so, discarding the non-core concept that satisfies the preset condition;
after all concepts in the domain knowledge base of the field to be built have been obtained, obtaining the relation between any two concepts, thereby obtaining the domain knowledge base of the field to be built, wherein all concepts include all core concepts and all non-core concepts of the field to be built.
2. The method according to claim 1, characterized in that obtaining the similarity between the core concept and the non-core concept comprises:
when the core concept is the concept obtained the 1st time, obtaining the target text in which the non-core concept is located, obtaining from that target text at least one first concept that is in the full concept set, and obtaining the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept;
when the core concept is a new core concept formed from the non-core concept obtained the i-th time, obtaining, from the target text in which the non-core concept corresponding to the new core concept is located, at least one second concept that is in the full concept set, and obtaining the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time, wherein the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the core concept corresponding to that non-core concept, 1≤i≤N, N=M-1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
3. The method according to claim 2, characterized in that obtaining the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept comprises:
obtaining the number of identical first concepts between the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, wherein the total number of concepts is the sum of the number of identical first concepts and the number of different concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept;
obtaining the similarity between the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
and obtaining the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time comprises:
obtaining the number of identical second concepts between the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, wherein that total number of concepts is the sum of the number of identical second concepts and the number of different concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
obtaining a first similarity between the new core concept and the non-core concept corresponding to the new core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
obtaining the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the first similarity and the similarity obtained the i-th time.
4. The method according to claim 1, characterized in that the step of, when the similarity between the core concept and the non-core concept satisfies the preset condition, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built, if not, retaining the non-core concept that satisfies the preset condition in the domain knowledge base of the field to be built, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text, and if so, discarding the non-core concept that satisfies the preset condition, comprises:
obtaining the similarity between the non-core concept and each concept in the full concept set;
obtaining, according to the similarity between the non-core concept and each concept in the full concept set, the average similarity of the non-core concept to the full concept set;
when the similarity between the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built;
if not, retaining in the domain knowledge base of the field to be built the non-core concept whose similarity is greater than its average similarity to the full concept set, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text;
if so, discarding the non-core concept whose similarity is greater than its average similarity to the full concept set.
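The preset condition in claim 4 compares a candidate's similarity to the core concept against the candidate's average similarity to every concept in the full concept set. A minimal sketch follows; `pair_similarity` is a hypothetical callable supplied by the caller, and the toy scores are invented for illustration.

```python
def average_similarity(candidate, full_concept_set, pair_similarity):
    """Mean similarity between the candidate and each concept in the full set."""
    concepts = list(full_concept_set)
    if not concepts:
        return 0.0
    return sum(pair_similarity(candidate, c) for c in concepts) / len(concepts)


def passes_threshold(core_similarity, candidate, full_concept_set, pair_similarity):
    """True when the candidate is closer to the core concept than it is to the
    full concept set on average (strictly greater, as in the claim)."""
    return core_similarity > average_similarity(candidate, full_concept_set, pair_similarity)


# Invented scores between the candidate "insulin" and three concepts.
scores = {"diabetes": 0.8, "weather": 0.0, "diet": 0.4}
lookup = lambda cand, other: scores[other]
avg = average_similarity("insulin", scores.keys(), lookup)
keep = passes_threshold(0.8, "insulin", scores.keys(), lookup)
drop = passes_threshold(0.3, "insulin", scores.keys(), lookup)
```

Using the candidate's own average as the threshold means no fixed cut-off has to be tuned per field: a concept that is loosely related to everything has a high average and is harder to admit.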
5. The method according to claim 1, characterized in that obtaining the relation between any two concepts after all concepts in the knowledge base of the field to be built have been obtained comprises:
obtaining the non-core concepts corresponding to each of the two concepts;
obtaining the number of identical concepts and the number of different concepts among the non-core concepts corresponding to each of the two concepts;
obtaining the similarity between the two concepts according to the number of identical concepts and the number of different concepts, the similarity between the two concepts indicating the degree of similarity between them.
6. A domain knowledge base construction device, characterized in that the device comprises:
a first acquisition unit, configured to obtain a core concept of the field currently to be built and the target text in which the core concept is located;
a second acquisition unit, configured to obtain at least one non-core concept from the target text, the non-core concept being a concept in a full concept set that is extracted from the target text, the full concept set being the set of core concepts and non-core concepts within and outside the field to be built;
a first computing unit, configured to obtain the similarity between the core concept and the non-core concept;
a processing unit, configured to judge, when the similarity between the core concept and the non-core concept satisfies a preset condition, whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built; if not, to retain the non-core concept that satisfies the preset condition in the domain knowledge base of the field to be built, take the non-core concept as a new core concept, and trigger the first acquisition unit; and if so, to discard the non-core concept that satisfies the preset condition;
a second computing unit, configured to obtain the relation between any two concepts after all concepts in the domain knowledge base of the field to be built have been obtained, thereby obtaining the domain knowledge base of the field to be built, wherein all concepts include all core concepts and all non-core concepts of the field to be built.
7. The device according to claim 6, characterized in that the first computing unit is configured to: when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located, obtain from that target text at least one first concept that is in the full concept set, and obtain the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept; and, when the core concept is a new core concept formed from the non-core concept obtained the i-th time, obtain, from the target text in which the non-core concept corresponding to the new core concept is located, at least one second concept that is in the full concept set, and obtain the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time, wherein the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the core concept corresponding to that non-core concept, 1≤i≤N, N=M-1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
8. The device according to claim 7, characterized in that the first computing unit comprises:
a first acquisition subunit, configured to: when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located and obtain from that target text at least one first concept that is in the full concept set; and, when the core concept is a new core concept formed from the non-core concept obtained the i-th time, obtain, from the target text in which the non-core concept corresponding to the new core concept is located, at least one second concept that is in the full concept set;
a second acquisition subunit, configured to obtain the number of identical first concepts between the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, wherein the total number of concepts is the sum of the number of identical first concepts and the number of different concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept;
a first computation subunit, configured to obtain the similarity between the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
a third acquisition subunit, configured to obtain the number of identical second concepts between the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, wherein that total number of concepts is the sum of the number of identical second concepts and the number of different concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
a second computation subunit, configured to obtain a first similarity between the new core concept and the non-core concept corresponding to the new core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
a third computation subunit, configured to obtain the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the first similarity and the similarity obtained the i-th time.
9. The device according to claim 6, characterized in that the processing unit comprises:
a fourth computation subunit, configured to obtain the similarity between the non-core concept and each concept in the full concept set;
a fifth computation subunit, configured to obtain, according to the similarity between the non-core concept and each concept in the full concept set, the average similarity of the non-core concept to the full concept set;
a judgment subunit, configured to judge, when the similarity between the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built;
a processing subunit, configured to: when the non-core concept is not identical to any concept already present in the domain knowledge base of the field to be built, retain in the domain knowledge base of the field to be built the non-core concept whose similarity is greater than its average similarity to the full concept set, take the non-core concept as a new core concept, and trigger the first acquisition unit; and, when the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built, discard the non-core concept whose similarity is greater than its average similarity to the full concept set.
10. The device according to claim 6, characterized in that the second computing unit is configured to obtain the non-core concepts corresponding to each of the two concepts, obtain the number of identical concepts and the number of different concepts among the non-core concepts corresponding to each of the two concepts, and obtain the similarity between the two concepts according to the number of identical concepts and the number of different concepts, the similarity between the two concepts indicating the degree of similarity between them.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611220184.8A CN106650940B (en) | 2016-12-26 | 2016-12-26 | A kind of domain knowledge base construction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106650940A | 2017-05-10 |
CN106650940B | 2019-01-22 |
Family
ID=58826830
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201611220184.8A | Active | CN106650940B (en) | 2016-12-26 | 2016-12-26 | A kind of domain knowledge base construction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650940B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102117281A (en) * | 2009-12-30 | 2011-07-06 | 北京亿维讯科技有限公司 | Method for constructing domain ontology |
CN102637163A (en) * | 2011-01-09 | 2012-08-15 | 华东师范大学 | Method and system for controlling multi-level ontology matching based on semantemes |
CN102214232A (en) * | 2011-06-28 | 2011-10-12 | 东软集团股份有限公司 | Method and device for calculating similarity of text data |
CN102306182A (en) * | 2011-08-30 | 2012-01-04 | 西华大学 | Method for excavating user interest based on conceptual semantic background image |
US20140229161A1 (en) * | 2013-02-12 | 2014-08-14 | International Business Machines Corporation | Latent semantic analysis for application in a question answer system |
CN104809128A (en) * | 2014-01-26 | 2015-07-29 | 中国科学院声学研究所 | Method and system for acquiring statement emotion tendency |
CN104636430A (en) * | 2014-12-30 | 2015-05-20 | 东软集团股份有限公司 | Case knowledge base representation and case similarity obtaining method and system |
CN104715042A (en) * | 2015-03-24 | 2015-06-17 | 清华大学 | Conceptual design knowledge representation method and knowledge management system based on ontology |
CN105912637A (en) * | 2016-04-08 | 2016-08-31 | 西藏飞跃智能科技有限公司 | Knowledge-based user interest mining method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664595A (en) * | 2018-05-08 | 2018-10-16 | 和美(深圳)信息技术股份有限公司 | Domain knowledge base construction method, device, computer equipment and storage medium |
CN108664595B (en) * | 2018-05-08 | 2020-10-16 | 和美(深圳)信息技术股份有限公司 | Domain knowledge base construction method and device, computer equipment and storage medium |
CN112699909A (en) * | 2019-10-23 | 2021-04-23 | 中移物联网有限公司 | Information identification method and device, electronic equipment and computer readable storage medium |
CN112699909B (en) * | 2019-10-23 | 2024-03-19 | 中移物联网有限公司 | Information identification method, information identification device, electronic equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106650940B (en) | 2019-01-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||