CN106650940A - Field knowledge base establishment method and device - Google Patents

Publication number
CN106650940A
Authority
CN
China
Prior art keywords
concept
core
similarity
key
core concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611220184.8A
Other languages
Chinese (zh)
Other versions
CN106650940B (en)
Inventor
王书剑
张霞
赵立军
崔朝辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201611220184.8A priority Critical patent/CN106650940B/en
Publication of CN106650940A publication Critical patent/CN106650940A/en
Application granted granted Critical
Publication of CN106650940B publication Critical patent/CN106650940B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The invention provides a field knowledge base establishment method and device. After a core concept in the field currently to be established and the target text in which the core concept is located are obtained, at least one non-core concept is obtained from the target text, and the similarity between the core concept and the non-core concept is obtained. When the similarity satisfies a preset condition, it is judged whether the non-core concept is the same as a concept already present in the field knowledge base of the to-be-established field. If it is not, the non-core concept is retained in the field knowledge base of the to-be-established field and is taken as a new core concept, the target text in which the new core concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is carried out again. After all concepts in the knowledge base of the to-be-established field have been obtained, the relation between any two concepts is obtained, which yields the field knowledge base of the to-be-established field. The knowledge base of the to-be-established field is thus established automatically.

Description

Domain knowledge base construction method and device
Technical field
The invention belongs to the technical field of information processing and, more particularly, relates to a domain knowledge base construction method and device.
Background
A domain knowledge base is the set of concepts contained in a domain together with the relations between those concepts. A concept is a piece of knowledge belonging to the domain and can be indicated by an entry in the domain; a relation between concepts is the degree of similarity between them and can be represented by a numerical value. For example, in the domain knowledge base of the financial domain, entries such as finance, economics and circulation can serve as concepts. As information technology develops, a domain knowledge base makes knowledge organized and ordered and facilitates the sharing and exchange of knowledge.
At present, a domain knowledge base is typically built by domain experts or editorial staff, who convert the knowledge in their minds into a form that a computer can understand. For the financial domain, for example, construction can be done by economists, who supply the concepts of the financial domain and the relations between them from their own professional knowledge. Building a domain knowledge base manually in this way takes considerable time, effort and cost. Moreover, when the content is later updated, the person doing the update must fully understand what the knowledge base already contains before making changes. Manual construction therefore makes a domain knowledge base hard to maintain.
Summary of the invention
In view of this, an object of the invention is to provide a domain knowledge base construction method and device for automatically building the knowledge base of any domain, thereby solving the problems caused by manual construction. The specific technical scheme is as follows:
The present invention provides a domain knowledge base construction method, the method including:
obtaining the core concept in the domain currently to be built and the target text in which the core concept is located;
obtaining at least one non-core concept from the target text, where a non-core concept is a concept extracted from the target text that is located in the full concept set, and the full concept set is the set of core concepts and non-core concepts in the domain to be built and other domains;
obtaining the similarity of the core concept and the non-core concept;
when the similarity of the core concept and the non-core concept satisfies a preset condition, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built; if not, retaining the non-core concept that satisfies the preset condition in the domain knowledge base of the domain to be built, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text; if so, discarding the non-core concept that satisfies the preset condition;
after all concepts in the domain knowledge base of the domain to be built have been acquired, obtaining the relation between any two concepts, thereby obtaining the domain knowledge base of the domain to be built, where all concepts include all core concepts and all non-core concepts of the domain to be built.
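The iterative expansion described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `LINKS` table stands in for the target texts (it maps each concept to the concepts found in its text), the preset condition is simplified to a fixed `threshold`, and the similarity is a Jaccard ratio over linked concepts. The names `LINKS`, `jaccard`, `build_concepts` and `threshold` are hypothetical.

```python
from collections import deque

# Hypothetical toy data: concept -> concepts found in its target text.
LINKS = {
    "finance": {"currency", "commodity", "economist", "silver"},
    "currency": {"commodity", "finance", "economist"},
    "commodity": {"currency", "finance"},
    "economist": {"currency", "commodity", "economics"},
    "economics": {"economist", "currency"},
    "silver": {"jewelry"},
    "jewelry": {"silver"},
}


def jaccard(first, second):
    """Jaccard ratio of the concept sets linked from two concepts."""
    a, b = LINKS.get(first, set()), LINKS.get(second, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_concepts(seed, threshold):
    """Expand from a seed core concept until no new concept is retained."""
    knowledge_base = {seed}
    pending = deque([seed])  # concepts still to be treated as core concepts
    while pending:
        core = pending.popleft()
        for candidate in sorted(LINKS.get(core, set())):
            # Assumed preset condition: similarity above a fixed threshold.
            if jaccard(core, candidate) <= threshold:
                continue
            # Discard candidates already present in the knowledge base.
            if candidate in knowledge_base:
                continue
            knowledge_base.add(candidate)
            pending.append(candidate)  # becomes a new core concept
    return knowledge_base


concepts = build_concepts("finance", threshold=0.15)
```

With this toy data the loop retains the finance-related concepts and never reaches "jewelry", because "silver" shares no linked concept with "finance".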
Preferably, obtaining the similarity of the core concept and the non-core concept includes:
when the core concept is the concept obtained the 1st time, obtaining the target text in which the non-core concept is located, obtaining from that target text at least one first concept located in the full concept set, and obtaining the similarity of the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept;
when the core concept is a non-core concept obtained the i-th time and taken as a new core concept, obtaining at least one second concept located in the full concept set from the target text of the non-core concept corresponding to the new core concept, and obtaining the similarity of the new core concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time. The similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and its corresponding core concept; 1≤i≤N, N=M-1, where M is the total number of times non-core concepts have been obtained when all concepts in the knowledge base of the domain to be built have been acquired.
Preferably, obtaining the similarity of the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept includes:
obtaining the number of identical first concepts shared by the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, where the total number of concepts is the sum of the number of identical first concepts and the number of different concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept;
obtaining the similarity of the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts.
Obtaining the similarity of the new core concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time includes:
obtaining the number of identical second concepts shared by the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, where this total number of concepts is the sum of the number of identical second concepts and the number of different concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
obtaining a first similarity of the new core concept and its corresponding non-core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
obtaining the similarity of the new core concept and its corresponding non-core concept according to the first similarity and the similarity obtained the i-th time.
Preferably, the step of judging, when the similarity of the core concept and the non-core concept satisfies the preset condition, whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built, retaining the non-core concept and taking it as a new core concept if it is not, and discarding it if it is, includes:
obtaining the similarity of the non-core concept and each concept in the full concept set;
obtaining the average similarity of the non-core concept to the full concept set according to the similarity of the non-core concept and each concept in the full concept set;
when the similarity of the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, judging whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built;
if not, retaining in the domain knowledge base of the domain to be built the non-core concept whose similarity is greater than its average similarity to the full concept set, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text;
if so, discarding the non-core concept whose similarity is greater than its average similarity to the full concept set.
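The average-similarity condition above can be sketched as follows. The pairwise scores in `SIM` are made-up illustrative values, and the helper names are hypothetical. The text does not say whether the candidate itself is counted when averaging over the full concept set; this sketch assumes it is excluded, and it labels the case where the condition is not met as "below_threshold", which is also an assumption.

```python
# Hypothetical pairwise similarity scores (symmetric, illustrative only).
SIM = {
    frozenset({"finance", "currency"}): 0.4,
    frozenset({"finance", "silver"}): 0.05,
    frozenset({"currency", "silver"}): 0.25,
}
FULL_SET = ["finance", "currency", "silver"]


def similarity(first, second):
    """Look up a pairwise score; unknown pairs default to 0.0."""
    return SIM.get(frozenset({first, second}), 0.0)


def average_similarity(candidate, full_set):
    """Mean similarity of the candidate to the other concepts in the full set.

    Assumption: the candidate itself is left out of the average.
    """
    others = [c for c in full_set if c != candidate]
    if not others:
        return 0.0
    return sum(similarity(candidate, c) for c in others) / len(others)


def decide(core, candidate, knowledge_base, full_set):
    """Return 'retain', 'discard' or 'below_threshold' for a candidate."""
    if similarity(core, candidate) <= average_similarity(candidate, full_set):
        return "below_threshold"  # preset condition not satisfied
    if candidate in knowledge_base:
        return "discard"  # already present in the domain knowledge base
    return "retain"  # keep it and treat it as a new core concept
```

For instance, `decide("finance", "currency", {"finance"}, FULL_SET)` compares 0.4 against an average of 0.325 and retains the candidate.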
Preferably, obtaining the relation between any two concepts after all concepts in the knowledge base of the domain to be built have been acquired includes:
obtaining the non-core concepts corresponding to each of the two concepts;
obtaining the number of identical concepts and the number of different concepts among the non-core concepts corresponding to each of the two concepts;
obtaining the similarity between the two concepts according to the number of identical concepts and the number of different concepts, where the similarity between the two concepts indicates the degree of similarity between them.
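The relation step can be sketched as below. The text does not spell out how the two counts are combined; this sketch assumes the ratio identical / (identical + different), which is consistent with the counting used for the core and non-core similarity. The `linked` table and the function name are hypothetical.

```python
from itertools import combinations


def pairwise_relations(linked):
    """Score every pair of concepts from their non-core concept sets.

    `linked` maps each concept to the set of its non-core concepts.
    Assumed scoring: identical / (identical + different).
    """
    relations = {}
    for first, second in combinations(sorted(linked), 2):
        identical = len(linked[first] & linked[second])
        different = len(linked[first] ^ linked[second])
        total = identical + different
        relations[(first, second)] = identical / total if total else 0.0
    return relations


# Hypothetical toy data for illustration.
linked = {
    "finance": {"currency", "commodity", "silver"},
    "economist": {"currency", "commodity", "economics"},
    "silver": {"jewelry"},
}
relations = pairwise_relations(linked)
```

Here "economist" and "finance" share two of four distinct concepts, so their relation value is 0.5, while pairs with nothing in common score 0.0.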
The present invention also provides a domain knowledge base construction device, the device including:
a first acquisition unit, configured to obtain the core concept in the domain currently to be built and the target text in which the core concept is located;
a second acquisition unit, configured to obtain at least one non-core concept from the target text, where a non-core concept is a concept extracted from the target text that is located in the full concept set, and the full concept set is the set of core concepts and non-core concepts in the domain to be built and other domains;
a first computing unit, configured to obtain the similarity of the core concept and the non-core concept;
a processing unit, configured to judge, when the similarity of the core concept and the non-core concept satisfies a preset condition, whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built; if not, to retain the non-core concept that satisfies the preset condition in the domain knowledge base of the domain to be built, take the non-core concept as a new core concept and trigger the first acquisition unit; and if so, to discard the non-core concept that satisfies the preset condition;
a second computing unit, configured to obtain, after all concepts in the domain knowledge base of the domain to be built have been acquired, the relation between any two concepts, thereby obtaining the domain knowledge base of the domain to be built, where all concepts include all core concepts and all non-core concepts of the domain to be built.
Preferably, the first computing unit is configured to: when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located, obtain from that target text at least one first concept located in the full concept set, and obtain the similarity of the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept; and, when the core concept is a non-core concept obtained the i-th time and taken as a new core concept, obtain at least one second concept located in the full concept set from the target text of the non-core concept corresponding to the new core concept, and obtain the similarity of the new core concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time. The similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and its corresponding core concept; 1≤i≤N, N=M-1, where M is the total number of times non-core concepts have been obtained when all concepts in the knowledge base of the domain to be built have been acquired.
Preferably, the first computing unit includes:
a first obtaining subunit, configured to obtain, when the core concept is the concept obtained the 1st time, the target text in which the non-core concept is located and to obtain from that target text at least one first concept located in the full concept set; and, when the core concept is a non-core concept obtained the i-th time and taken as a new core concept, to obtain at least one second concept located in the full concept set from the target text of the non-core concept corresponding to the new core concept;
a second obtaining subunit, configured to obtain the number of identical first concepts shared by the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, where the total number of concepts is the sum of the number of identical first concepts and the number of different concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept;
a first computation subunit, configured to obtain the similarity of the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
a third obtaining subunit, configured to obtain the number of identical second concepts shared by the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, where this total number of concepts is the sum of the number of identical second concepts and the number of different concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
a second computation subunit, configured to obtain a first similarity of the new core concept and its corresponding non-core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
a third computation subunit, configured to obtain the similarity of the new core concept and its corresponding non-core concept according to the first similarity and the similarity obtained the i-th time.
Preferably, the processing unit includes:
a fourth computation subunit, configured to obtain the similarity of the non-core concept and each concept in the full concept set;
a fifth computation subunit, configured to obtain the average similarity of the non-core concept to the full concept set according to the similarity of the non-core concept and each concept in the full concept set;
a judgment subunit, configured to judge, when the similarity of the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built;
a processing subunit, configured to: when the non-core concept differs from every concept present in the domain knowledge base of the domain to be built, retain in that domain knowledge base the non-core concept whose similarity is greater than its average similarity to the full concept set, take the non-core concept as a new core concept and trigger the first acquisition unit; and, when the non-core concept is identical to a concept present in the domain knowledge base of the domain to be built, discard the non-core concept whose similarity is greater than its average similarity to the full concept set.
Preferably, the second computing unit is configured to obtain the non-core concepts corresponding to each of any two concepts, obtain the number of identical concepts and the number of different concepts among the non-core concepts corresponding to each of the two concepts, and obtain the similarity between the two concepts according to the number of identical concepts and the number of different concepts, where the similarity between the two concepts indicates the degree of similarity between them.
Compared with the prior art, the technical scheme provided by the present invention has the following advantages:
With the above technical scheme, after the core concept in the domain currently to be built and the target text in which the core concept is located are obtained, at least one non-core concept can be obtained from the target text and the similarity of the core concept and the non-core concept can be obtained. When that similarity satisfies the preset condition, it is judged whether the non-core concept is identical to a concept already present in the domain knowledge base of the domain to be built. If not, the non-core concept that satisfies the preset condition is retained in the domain knowledge base of the domain to be built and taken as a new core concept, the target text in which the new core concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is performed again. After all concepts in the knowledge base of the domain to be built have been acquired, the relation between any two concepts is obtained, which yields the domain knowledge base of the domain to be built. The domain knowledge base is thus built automatically, so experts in the domain to be built or editorial staff no longer need to build the knowledge base manually. After the domain knowledge base of a domain has been built, it can also be updated automatically by repeating the construction steps, so maintainers do not need to know the existing content of the domain knowledge base, which reduces the difficulty of maintaining it.
Description of the drawings
To illustrate the technical schemes of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the domain knowledge base construction method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of one way of obtaining the similarity provided by an embodiment of the present invention;
Fig. 3 is a flow chart of another way of obtaining the similarity provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the domain knowledge base construction device provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the first computing unit in the domain knowledge base construction device provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of the processing unit in the domain knowledge base construction device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the embodiments of the present invention clearer, the technical scheme in the embodiments is described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which shows a flow chart of the domain knowledge base construction method provided by an embodiment of the present invention, the method automatically builds the knowledge base of any domain in order to solve the problems caused by building a domain knowledge base manually. Specifically, the domain knowledge base construction method shown in Fig. 1 may include the following steps:
101: Obtain the core concept in the domain currently to be built and the target text in which the core concept is located. The domain to be built is a specific domain extracted from the full knowledge base, and the core concept is a representative concept in that domain. For example, when the financial domain extracted from the full knowledge base is the domain to be built, the entry "finance", which is well known to users, can serve as the core concept of the financial domain. The target text in which the core concept is located can be a text on some website that explains the core concept; for example, when the core concept is "finance", its target text can be the text in which Baidupedia or Wikipedia explains finance.
In the embodiment of the present invention, the core concept is determined as follows: obtain the number of times each concept occurs in the texts of the financial domain, and select as core concepts the concepts whose occurrence counts fall within a preset range. Each concept can be obtained from the texts of the financial domain by data crawling, or can be specified manually by domain experts or editorial staff. After each concept of the financial domain is obtained, the number of times it occurs in the texts of the financial domain can be marked when the full knowledge base is built, and the preset range can be determined according to the actual application. Alternatively, the core concepts are determined by having experts mark the core concepts of their research domain.
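The count-based selection of core concepts can be sketched as follows. This is only an illustration: occurrences are counted by plain substring matching, and the function name and the bounds `min_count` and `max_count` (standing in for the preset range) are hypothetical.

```python
from collections import Counter


def select_core_concepts(texts, concepts, min_count, max_count):
    """Pick concepts whose total occurrence count lies in [min_count, max_count].

    Counting uses substring matching, a simplification chosen for this sketch.
    """
    counts = Counter()
    for text in texts:
        for concept in concepts:
            counts[concept] += text.count(concept)
    return {c for c in concepts if min_count <= counts[c] <= max_count}


# Hypothetical sample texts for the financial domain.
texts = [
    "finance and currency; finance drives circulation",
    "finance regulates currency",
]
core = select_core_concepts(
    texts, ["finance", "currency", "circulation"], min_count=3, max_count=10
)
```

In this sample "finance" occurs three times, "currency" twice and "circulation" once, so only "finance" falls within the assumed range.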
The full knowledge base mentioned above is the set of knowledge of the domain to be built and other domains, including all concepts in those domains and the relations between the concepts. The domain knowledge base construction method of the embodiment of the present invention obtains, on the basis of the full knowledge base, the concepts that belong to the domain knowledge base of the domain to be built and the relations between them. Ways of obtaining the full knowledge base include, but are not limited to, the following:
One way is to obtain the full knowledge base by data crawling. Specifically, a web crawler captures web page information from the Internet, the captured information is compared with the information provided by portal websites, and the resulting text information is stored in the full knowledge base. Each entry in such text information can be regarded as a concept, and entries in the same text information can be regarded as related concepts. A portal website is an application system that leads to a certain kind of comprehensive Internet information resources and provides related information services. Alternatively, the data is captured from existing full-knowledge-base websites such as Baidupedia or Wikipedia.
Another way is to obtain the full knowledge base by manual organization. Specifically, experts or editorial staff edit according to existing knowledge bases and the knowledge they themselves have, and a full knowledge base is produced through the cooperation of many people. This work usually cannot be completed by an individual or a single organization, so manual organization is generally carried out through online collaboration; the full-knowledge-base website Baidupedia, for example, is completed manually through online collaboration.
102: Obtain at least one non-core concept from the target text. A non-core concept is a concept extracted from the target text that is located in the full concept set, and the full concept set is the set of core concepts and non-core concepts in the domain to be built and other domains. The full knowledge base described above can therefore be regarded as the full concept set together with the relations between concepts.
After the target text is obtained, the entries that carry hyperlinks are obtained from it, and the concept indicated by each entry is compared with the concepts in the full concept set. If the concept indicated by an entry is identical to some concept in the full concept set, that concept is taken as a non-core concept. An entry that carries a hyperlink is one that, when triggered, leads to a text explaining the entry.
Besides the above way of obtaining non-core concepts, the target text can also be segmented with a Chinese word segmentation technique, and the concept indicated by each resulting entry is compared with the concepts in the full concept set. If the concept indicated by an entry is identical to some concept in the full concept set, that concept is taken as a non-core concept.
For example, when the core concept is "finance", the entries obtained from its target text are: "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency" and "commodity". If the concepts indicated by these entries are identical to concepts in the full concept set, each of them is taken as a non-core concept. If the concept indicated by some entry, for example "intermediary", differs from every concept in the full concept set, it cannot be taken as a non-core concept.
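The hyperlink-based extraction can be sketched as below. The sketch assumes, purely for illustration, that hyperlinked entries in the target text are written in wiki-style double brackets such as `[[currency]]`; the markup, the function name and the sample text are hypothetical and not taken from the source.

```python
import re


def extract_non_core(target_text, full_concept_set):
    """Return hyperlinked entries that also appear in the full concept set.

    Assumes hyperlinked entries are marked as [[entry]] in the text.
    Order of first appearance is kept and duplicates are dropped.
    """
    entries = re.findall(r"\[\[([^\[\]]+)\]\]", target_text)
    non_core = []
    for entry in entries:
        if entry in full_concept_set and entry not in non_core:
            non_core.append(entry)
    return non_core


text = (
    "Finance concerns [[currency]] and [[commodity]] handled by an "
    "[[intermediary]]; see also [[currency]]."
)
found = extract_non_core(text, {"currency", "commodity", "economist"})
```

The entry "intermediary" is dropped here because it is missing from the sample full concept set, mirroring the filtering rule described above.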
103: Obtain the similarity of the core concept and the non-core concept. The similarity indicates how similar the non-core concept is to the core concept and is used to determine whether the non-core concept can serve as a concept in the domain knowledge base of the domain to be built. The similarity of the core concept and the non-core concept can be obtained through cosine similarity, the Pearson correlation coefficient or the Jaccard similarity. Because the Jaccard similarity is better than cosine similarity and the Pearson correlation coefficient in computational complexity and efficiency, the embodiment of the present invention uses the Jaccard similarity to illustrate how the similarity of the core concept and the non-core concept is obtained.
The Jaccard similarity measures the similarity between individuals described by symbolic or Boolean-valued attributes. Its formula is:
$$\mathrm{Sim}_{a,o} = \frac{|A \cap O|}{|A \cup O|}$$
Here A and O denote the concept sets of a and o in X, a and o denote the core concept and the non-core concept being compared (the measure is symmetric, so the assignment of the two symbols does not affect the result), and X is the full concept set. In the embodiment of the present invention, the similarity of the core concept and the non-core concept can therefore be calculated as follows: obtain the concept set of the core concept and the concept set of the non-core concept, and divide the number of concepts in the intersection of the two sets by the number of concepts in their union.
For example, o refers to the key concept "finance", and O refers to the set of non-core concepts that "finance" links to in the full dose concept set X, namely the above-mentioned "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency" and "commodity".
Likewise, a refers to the non-core concept "economist", and A refers to the set of other concepts that "economist" links to in the full dose concept set X, namely "currency", "means of production", "distribution", "economics" and "commodity". Since "currency" and "commodity" belong to both sets, they form the intersection, and the numerator of the computing formula $Sim_{a,o}$ is 2. The union is "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency", "commodity", "means of production", "distribution" and "economics", so the denominator of the computing formula $Sim_{a,o}$ is 13, and the similarity of the two concepts is 2/13 ≈ 0.154.
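The worked example above can be checked with a minimal Python sketch. The function name `jaccard` and the variable names are illustrative assumptions and do not appear in the original text; the two sets are copied from the example.

```python
def jaccard(a: set, o: set) -> float:
    """Jaccard similarity |A ∩ O| / |A ∪ O|; taken as 0.0 when both sets are empty."""
    union = a | o
    if not union:
        return 0.0
    return len(a & o) / len(union)


# O: concepts that the key concept "finance" links to in the full dose concept set X
O = {"circulation", "evolution finance", "evolution security", "draft bank",
     "draft", "silver", "intermediary", "economist", "currency", "commodity"}
# A: concepts that the non-core concept "economist" links to in X
A = {"currency", "means of production", "distribution", "economics", "commodity"}

similarity = jaccard(A, O)
print(len(A & O), len(A | O), round(similarity, 3))  # 2 13 0.154
```

Returning 0.0 for two empty sets is a choice made for this sketch only; the text does not address that case.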
As can be seen from the computing formula $Sim_{a,o}$, the process of obtaining the similarity of the key concept and the non-core concept is shown in Fig. 2 and may comprise the following steps:
201: Obtain the target text of the non-core concept, and obtain from that target text at least one first concept located in the full dose concept set.
In the embodiment of the present invention, the at least one first concept is obtained in the same way as the non-core concepts corresponding to the key concept, which is not described again in detail. Taking the non-core concept "economist" as an example, the first concepts obtained are "currency", "means of production", "distribution", "economics" and "commodity".
202: Obtain the number of identical first concepts, that is, the concepts shared by the at least one first concept and the at least one non-core concept corresponding to the key concept, and obtain the concept total of the at least one first concept and the at least one non-core concept corresponding to the key concept. The concept total is the sum of the number of identical first concepts and the number of different concepts among the at least one first concept and the at least one non-core concept corresponding to the key concept.
Take the above "finance" as the key concept, with the first concepts being those obtained through "economist", a non-core concept of the key concept "finance". The non-core concepts corresponding to the key concept "finance" are "circulation", "evolution finance", "evolution security", "draft bank", "draft", "silver", "intermediary", "economist", "currency" and "commodity", and the first concepts obtained through the non-core concept "economist" are "currency", "means of production", "distribution", "economics" and "commodity". The identical first concepts in the two concept sets are "currency" and "commodity", so the number of identical first concepts is 2, the number of different concepts is 11, and the concept total is 13.
203: Obtain the similarity of the key concept and the non-core concept according to the number of identical first concepts and the concept total. Steps 202 and 203 thus together obtain the similarity of the key concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the key concept.
It should be noted here that, when the similarity of the key concept and a certain non-core concept is to be obtained, the number of identical first concepts and the concept total used are the information corresponding to that non-core concept, not the information corresponding to other non-core concepts. For example, when the similarity of the key concept "finance" and the non-core concept "economist" is to be obtained, the number of identical first concepts and the concept total are the information corresponding to the non-core concept "economist".
104: Judge whether the similarity of the key concept and the non-core concept meets a preset condition; if so, execute step 105, and if not, execute step 108. When the similarity of the key concept and the non-core concept meets the preset condition, the non-core concept is a concept in the domain knowledge base of the field to be built; when the similarity does not meet the preset condition, the non-core concept is not a concept in the domain knowledge base of the field to be built.
In the embodiment of the present invention, one feasible form of the preset condition is based on the average similarity of the non-core concept to the full dose concept set. It is obtained as follows: obtain the similarity of the non-core concept to each concept in the full dose concept set, and from these similarities obtain the average similarity of the non-core concept to the full dose concept set. The specific computing formula is as follows.
Let the full dose concept set be $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ represents the i-th concept in the full dose concept set X. The average similarity of the non-core concept a to the full dose concept set is then:
$$\overline{Sim}(a, X) = \frac{1}{n}\sum_{i=1}^{n} Sim(a, x_i)$$
where $Sim(a, x_i)$ is the similarity of the non-core concept a and $x_i$, whose computing formula can refer to the computing formula of $Sim_{a,o}$. When the similarity of the key concept and the non-core concept is greater than the average similarity of the non-core concept to the full dose concept set, the similarity of the key concept and the non-core concept is judged to meet the preset condition; when it is less than or equal to that average similarity, the similarity of the key concept and the non-core concept is judged not to meet the preset condition.
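The preset condition can be illustrated with a short Python sketch. It assumes a hypothetical dictionary `links` that maps every concept in the full dose concept set X to the set of concepts it links to; the toy data and the names `average_similarity` and `meets_preset_condition` are assumptions introduced only for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two concept sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_similarity(concept: str, links: dict) -> float:
    """Mean of Sim(a, x_i) over every concept x_i in X (the keys of `links`)."""
    sims = [jaccard(links[concept], links[x]) for x in links]
    return sum(sims) / len(sims)


def meets_preset_condition(core: str, non_core: str, links: dict) -> bool:
    """True when Sim(non_core, core) exceeds the average similarity of non_core to X."""
    sim = jaccard(links[non_core], links[core])
    return sim > average_similarity(non_core, links)


# Hypothetical toy data, not taken from the original text.
links = {
    "finance": {"economist", "currency", "commodity"},
    "economist": {"currency", "commodity", "economics"},
    "currency": {"commodity"},
    "weather": {"rain"},
}

print(meets_preset_condition("finance", "economist", links))  # True
print(meets_preset_condition("finance", "weather", links))    # False
```

Following the formula literally, the average includes the similarity of the non-core concept to itself, since the concept is itself a member of X.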
105: When the similarity of the key concept and the non-core concept meets the preset condition, judge whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built; if not, execute step 106, and if so, execute step 107.
106: Retain the non-core concept that meets the preset condition in the domain knowledge base of the field to be built, take the non-core concept as a new key concept, obtain the target text of the new key concept, and continue with step 102.
107: Discard the non-core concept that meets the preset condition, and execute step 109.
When the similarity of the key concept and the non-core concept meets the preset condition, the non-core concept is a concept in the domain knowledge base of the field to be built, but it must further be determined whether the same concept already exists in that domain knowledge base. If it does, the non-core concept has already been written into the domain knowledge base, and step 107 is executed to discard it so that concepts are not repeated in the domain knowledge base. If it does not, the non-core concept has not yet been written into the domain knowledge base, and step 106 is executed to retain it in the domain knowledge base and take it as a new key concept; the target text of the new key concept is obtained, and at least one non-core concept continues to be obtained from that target text. In this way, other concepts in the domain knowledge base of the field to be built continue to be obtained so as to improve the domain knowledge base.
When the preset condition is based on the average similarity of the non-core concept to the full dose concept set, step 107 discards a non-core concept whose similarity is greater than its average similarity to the full dose concept set but which is already present in the domain knowledge base. Correspondingly, step 106 retains a non-core concept whose similarity is greater than its average similarity to the full dose concept set and which is not yet present in the domain knowledge base, and that non-core concept can be taken as a new key concept.
108: Discard the non-core concept that does not meet the preset condition, and execute step 109. When the similarity of the key concept and the non-core concept does not meet the preset condition, the non-core concept is not a concept in the domain knowledge base of the field to be built and can be discarded directly, for example by directly discarding a non-core concept whose similarity is less than or equal to its average similarity to the full dose concept set.
109: After all concepts in the domain knowledge base of the field to be built have been obtained, obtain the relation between any two concepts, so as to obtain the domain knowledge base of the field to be built, where all concepts include all key concepts and all non-core concepts of the field to be built.
In the embodiment of the present invention, if all the non-core concepts obtained in step 102 have been discarded through step 107 and step 108, all remaining non-core concepts have already been written into the domain knowledge base, which further indicates that all concepts in the domain knowledge base have been obtained. The relation between any two concepts can then be obtained to complete the construction of the domain knowledge base.
If step 106 still yields a non-core concept as a new key concept, there is still a non-core concept that has not been written into the domain knowledge base. That non-core concept is then taken as a new key concept and step 102 is executed again, so as to improve the domain knowledge base.
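Steps 102 to 108 can be sketched as a queue-driven loop in Python. This is only an illustration under stated assumptions: the dictionary `links` stands in for "the concepts extracted from a concept's target text", its keys stand in for the full dose concept set X, and the names `collect_concepts` and `meets_preset_condition` as well as the toy data are hypothetical. Sorting the candidates serves only to make the sketch deterministic.

```python
from collections import deque


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def meets_preset_condition(core: str, non_core: str, links: dict) -> bool:
    """Step 104: similarity must exceed the average similarity of non_core to X."""
    sim = jaccard(links[non_core], links[core])
    avg = sum(jaccard(links[non_core], links[x]) for x in links) / len(links)
    return sim > avg


def collect_concepts(seed: str, links: dict) -> list:
    knowledge_base = [seed]   # concepts retained so far
    queue = deque([seed])     # key concepts whose target text is still to be processed
    while queue:
        core = queue.popleft()
        for non_core in sorted(links[core]):   # step 102: candidates from the target text
            if non_core not in links:          # entry is not a concept of X
                continue
            if not meets_preset_condition(core, non_core, links):
                continue                       # step 108: discard
            if non_core in knowledge_base:
                continue                       # step 107: already present, discard
            knowledge_base.append(non_core)    # step 106: retain ...
            queue.append(non_core)             # ... and use as a new key concept
    return knowledge_base


# Hypothetical toy data, not taken from the original text.
links = {
    "finance": {"economist", "currency", "commodity"},
    "economist": {"currency", "commodity", "economics"},
    "currency": {"commodity"},
    "commodity": {"currency"},
    "economics": {"economist"},
    "weather": {"rain"},
}

result = collect_concepts("finance", links)
print(result)  # ['finance', 'commodity', 'currency', 'economist']
```

The loop ends when no retained concept produces a further new key concept, which corresponds to the point at which step 109 can compute the relations between concepts.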
In the embodiment of the present invention, the relation between any two concepts can be a subordinate relation or a same-level relation. For example, the relation between a key concept and a non-core concept under that key concept can be a subordinate relation, and the relation between multiple non-core concepts of the same key concept can be a same-level relation.
Of course, the relation between any two concepts can also be indicated by the similarity between them. The similarity between any two concepts can be obtained by cosine similarity, the Pearson similarity coefficient or Jaccard similarity. Because Jaccard similarity is better than cosine similarity and the Pearson similarity coefficient in computational complexity and computational efficiency, the embodiment of the present invention uses Jaccard similarity to illustrate how the similarity between any two concepts is obtained.
Let the concept set of the domain knowledge base of the field to be built be S, and let one of any two concepts be a and the other be b. The similarity between concept a and concept b is calculated as follows:
$$Sim_{a,b} = \frac{|A \cap B|}{|A \cup B|}$$
where A and B represent the concept sets of a and b in S respectively.
For example, the concepts in the domain knowledge base of the field to be built are "finance", "economist", "economics", "currency", "commodity", "stock" and "market". Here a is "economist", and A is the set of non-core concepts that correspond to "economist", taken as a key concept, within the concept set S of the domain knowledge base. Originally, the non-core concepts obtained when "economist" served as a key concept were "currency", "means of production", "distribution", "economics" and "commodity", but after processing only "currency", "economics" and "commodity" are retained in the domain knowledge base, so the concept set A contains the three concepts "currency", "economics" and "commodity".
b is "market", and B is the set of non-core concepts that correspond to "market", taken as a key concept, within the concept set S of the domain knowledge base. Originally, the non-core concepts obtained when "market" served as a key concept were "stock", "transaction", "value" and "commodity", but after processing only "stock" and "commodity" are retained in the domain knowledge base, so the concept set B contains the two concepts "stock" and "commodity".
Since "commodity" is the intersection of set A and set B, the numerator of the computing formula $Sim_{a,b}$ is 1. The union of set A and set B is "currency", "economics", "commodity" and "stock", so the denominator of the computing formula $Sim_{a,b}$ is 4, and the similarity between concept a and concept b is 1/4 = 0.25. The similarity relation between "economist" and "market" is therefore 0.25.
It follows from the above computing formula that, when the relation between any two concepts is indicated by the similarity between them, the relation can be obtained as follows: obtain the non-core concepts corresponding to each of the two concepts; obtain the number of identical concepts and the number of different concepts among those non-core concepts; and obtain the similarity between the two concepts according to the number of identical concepts and the number of different concepts. The similarity between any two concepts indicates the degree of similarity between them.
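The pairwise computation of step 109 can be sketched in Python as follows. The dictionary `retained` is a hypothetical structure that maps each concept in S to the non-core concepts retained for it in the domain knowledge base; its contents reproduce only the "economist" and "market" example above, and the function name `pairwise_similarity` is an assumption.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(retained: dict) -> dict:
    """Similarity for every unordered pair of concepts, keyed by the pair."""
    return {
        (a, b): jaccard(retained[a], retained[b])
        for a, b in combinations(retained, 2)
    }


retained = {
    "economist": {"currency", "economics", "commodity"},  # set A
    "market": {"stock", "commodity"},                     # set B
}

relations = pairwise_similarity(retained)
print(relations[("economist", "market")])  # 0.25
```

Because Jaccard similarity is symmetric, each unordered pair needs to be computed only once.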
With the above technical solution, after the key concept in the current field to be built and the target text of the key concept are obtained, at least one non-core concept can be obtained from the target text, and the similarity of the key concept and the non-core concept is obtained. When the similarity of the key concept and the non-core concept meets the preset condition, it is judged whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built. If not, the non-core concept that meets the preset condition is retained in the domain knowledge base of the field to be built and taken as a new key concept, the target text of the new key concept is obtained, and the step of obtaining at least one non-core concept from the target text is executed again. After all concepts in the knowledge base of the field to be built have been obtained, the relation between any two concepts is obtained, so as to obtain the domain knowledge base of the field to be built. The domain knowledge base of the field to be built is thus constructed automatically, so that experts in the field to be built or editing personnel need not construct the knowledge base manually. After the domain knowledge base of any field has been built, it can also be updated automatically through the same steps used to build it, so that personnel need not understand the content of the domain knowledge base, which reduces the difficulty of maintaining it.
It should be noted here that, when the similarity of the key concept and the non-core concept is obtained, if the key concept is the concept obtained the 1st time, that is, it is not a previously obtained non-core concept taken as a new key concept, the similarity can be obtained by the above computing formula $Sim_{a,o}$. When the key concept is a non-core concept obtained the i-th time and taken as a new key concept, however, similarity transfer needs to be considered. For example, when the similarity of the above "economist" and "economics" is calculated, the similarity of "economist" and "finance" needs to be considered in the similarity formula. Here 1 ≤ i ≤ N and N = M - 1, where M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
Similarity transfer is considered because, as the number of acquisitions increases, an obtained non-core concept may be unrelated to the key concept obtained the 1st time. Such a non-core concept should not be written into the domain knowledge base, but if similarity transfer is not considered it may still meet the preset condition set in the embodiment of the present invention and be retained in the domain knowledge base, so that the domain knowledge base contains concepts that do not belong to the field. For this reason, the embodiment of the present invention considers similarity transfer, so that a non-core concept is associated, through its own corresponding key concept, with the key concepts obtained before it, which reduces the probability that erroneous concepts appear in the domain knowledge base. Accordingly, when the non-core concept obtained the i-th time is taken as a new key concept, the process of calculating the similarity of the new key concept and a non-core concept is shown in Fig. 3 and may comprise the following steps:
301: Obtain, from the target text of each non-core concept corresponding to the new key concept, at least one second concept located in the full dose concept set. In the embodiment of the present invention, the at least one second concept is obtained in the same way as the non-core concepts corresponding to a key concept, which is not described again in detail. Still taking the above "finance" and "economist" as an example, "finance" is the key concept obtained the 1st time and "economist" is a non-core concept obtained the 1st time, which can be taken as a new key concept. When "economist" is taken as the new key concept, the non-core concepts obtained are "currency", "means of production", "distribution", "economics" and "commodity". At least one second concept located in the full dose concept set is then obtained from the target text of each of these non-core concepts, that is, the set of second concepts of each non-core concept is obtained.
302: Obtain the number of identical second concepts, that is, the concepts shared by the at least one second concept and the at least one non-core concept corresponding to the new key concept, and obtain the concept total of the at least one second concept and the at least one non-core concept corresponding to the new key concept. The concept total is the sum of the number of identical second concepts and the number of different concepts among the at least one second concept and the at least one non-core concept corresponding to the new key concept.
It can be understood that these quantities are obtained per set of second concepts: for each set of second concepts, the number of second concepts it shares with the at least one non-core concept corresponding to the new key concept and the number of concepts that differ are obtained, and the concept total for that set of second concepts is obtained from these two numbers.
303: Obtain a first similarity of the new key concept and the non-core concept corresponding to the new key concept, according to the number of identical second concepts and the concept total of the at least one second concept and the at least one non-core concept corresponding to the new key concept. The corresponding computing formula is:
$$Sim_{a,b_n} = \frac{|A \cap C|}{|A \cup C|}$$
where $b_n$ is the new key concept obtained the n-th time and a is its corresponding non-core concept; A and C represent the concept sets of a and $b_n$ in S respectively; $|A \cap C|$ is the number of identical second concepts; and $|A \cup C|$ is the concept total of the at least one second concept and the at least one non-core concept corresponding to the new key concept.
304: Obtain the similarity of the new key concept and the non-core concept corresponding to the new key concept according to the first similarity and the similarity obtained the i-th time. Steps 302 to 304 thus together obtain the similarity of the new key concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new key concept, and the similarity obtained the i-th time. The similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the key concept corresponding to that non-core concept.
In the following, the full dose concept set is X, the key concept obtained the 1st time is o, the non-core concept is a, and the set of new key concepts is $B = \{b_1, b_2, \ldots, b_n\}$, where $b_i$ is the new key concept obtained the i-th time. Thus $b_1$ is the new key concept obtained the 1st time, that is, a non-core concept obtained from the key concept o and taken as a new key concept. The similarity formula of o and a is then as follows:
[formula not reproduced in the source text]
in which one term denotes the similarity obtained the i-th time.
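Step 304 combines the first similarity with the similarity obtained earlier. The Python sketch below illustrates the idea only. It assumes, purely as a stand-in, that the two values are combined by multiplication; the text available here does not state the combination rule, so the function `transferred_similarity` and the links assumed for "economics" are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transferred_similarity(first_similarity: float, inherited_similarity: float) -> float:
    """Combine the first similarity with the similarity obtained the i-th time.

    ASSUMPTION: multiplication is an illustrative choice, not the formula of the source.
    """
    return first_similarity * inherited_similarity


# Similarity of "economist" to the 1st key concept "finance", from the earlier example.
inherited = 2 / 13

# Non-core concepts of the new key concept "economist" (from the text).
economist_links = {"currency", "means of production", "distribution", "economics", "commodity"}
# Second concepts of "economics": hypothetical values chosen for illustration.
economics_links = {"currency", "commodity", "market"}

first = jaccard(economics_links, economist_links)   # first similarity, step 303
final = transferred_similarity(first, inherited)    # step 304 under the stated assumption
print(round(first, 3), round(final, 4))
```

Under this assumption the transferred value can never exceed the first similarity, so concepts reached through weakly related key concepts are less likely to pass the preset condition, which matches the stated purpose of similarity transfer.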
It should also be noted here that, when the key concept in the field to be built is determined, multiple key concepts may be determined. In that case one key concept can be selected from the multiple key concepts and its target text obtained. Of course, the multiple key concepts can also be processed in parallel or one after another. When they are processed in parallel or one after another, the non-core concepts obtained for any one key concept need to be compared with the non-core concepts corresponding to the other key concepts, so that a non-core concept that appears among the non-core concepts of two or more key concepts is processed only once.
For brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention certain steps can be performed in another order or simultaneously. Those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
Referring to Fig. 4, which shows the structure of the domain knowledge base construction device provided by the embodiment of the present invention, the device may include: a first acquisition unit 11, a second acquisition unit 12, a first computing unit 13, a processing unit 14 and a second computing unit 15.
The first acquisition unit 11 is used to obtain the key concept in the current field to be built and the target text of the key concept.
The field to be built is a specific field extracted from the full dose knowledge base, and the key concept is a representative concept in the field to be built. For example, when the financial field extracted from the full dose knowledge base is the field to be built, "finance", an entry well known to users, can be taken as the key concept of the financial field. The target text of the key concept can be a text on some website that explains the key concept. For example, when the key concept is "finance", its target text can be the text in which Baidupedia or Wikipedia explains finance.
For how the key concept is determined and how the full dose knowledge base is obtained, refer to the related description in the method embodiment, which is not repeated here.
The second acquisition unit 12 is used to obtain at least one non-core concept from the target text, where a non-core concept is a concept extracted from the target text that is located in the full dose concept set. The full dose concept set is the set of the key concepts and non-core concepts of the fields, including the field to be built, so the above full dose knowledge base can be the set formed by the full dose concept set and the relations between concepts. For the way in which the second acquisition unit 12 obtains at least one non-core concept, refer to the related description in the method embodiment, which is not repeated here.
The first computing unit 13 is used to obtain the similarity of the key concept and the non-core concept. The similarity indicates the degree of similarity between the non-core concept and the key concept, and is used to determine whether the non-core concept can serve as a concept in the domain knowledge base of the field to be built. The similarity of the key concept and the non-core concept can be obtained by cosine similarity, the Pearson similarity coefficient or Jaccard similarity. Because Jaccard similarity is better than cosine similarity and the Pearson similarity coefficient in computational complexity and computational efficiency, the embodiment of the present invention uses Jaccard similarity to illustrate how the similarity of the key concept and the non-core concept is obtained.
Accordingly, when the key concept is the concept obtained the 1st time, the first computing unit 13 is used to obtain the target text of the non-core concept, obtain from that target text at least one first concept located in the full dose concept set, and obtain the similarity of the key concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the key concept.
When the key concept is a non-core concept obtained the i-th time and taken as a new key concept, the first computing unit 13 is used to obtain, from the target text of the non-core concept corresponding to the new key concept, at least one second concept located in the full dose concept set, and to obtain the similarity of the new key concept and its corresponding non-core concept according to the at least one second concept, the at least one non-core concept corresponding to the new key concept, and the similarity obtained the i-th time. The similarity obtained the i-th time is thereby transferred to the similarity calculated when the non-core concept obtained the i-th time serves as the new key concept, so that a non-core concept is associated, through its own corresponding key concept, with the key concepts obtained before it, which reduces the probability that erroneous concepts appear in the domain knowledge base.
The similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the key concept corresponding to that non-core concept, where 1 ≤ i ≤ N and N = M - 1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
Correspondingly, the structure of the first computing unit 13 is shown in Fig. 5 and may include: a first obtaining subunit 131, a second obtaining subunit 132, a first computation subunit 133, a third obtaining subunit 134, a second computation subunit 135 and a third computation subunit 136.
The first obtaining subunit 131 is used, when the key concept is the concept obtained the 1st time, to obtain the target text of the non-core concept and obtain from that target text at least one first concept located in the full dose concept set; and, when the key concept is a non-core concept obtained the i-th time and taken as a new key concept, to obtain from the target text of the non-core concept corresponding to the new key concept at least one second concept located in the full dose concept set.
The second obtaining subunit 132 is used to obtain the number of identical first concepts shared by the at least one first concept and the at least one non-core concept corresponding to the key concept, and the concept total of the at least one first concept and the at least one non-core concept corresponding to the key concept, where the concept total is the sum of the number of identical first concepts and the number of different concepts among the at least one first concept and the at least one non-core concept corresponding to the key concept.
The first computation subunit 133 is used to obtain the similarity of the key concept and the non-core concept according to the number of identical first concepts and the concept total.
The third obtaining subunit 134 is used to obtain the number of identical second concepts shared by the at least one second concept and the at least one non-core concept corresponding to the new key concept, and the concept total of the at least one second concept and the at least one non-core concept corresponding to the new key concept, where the concept total is the sum of the number of identical second concepts and the number of different concepts among the at least one second concept and the at least one non-core concept corresponding to the new key concept.
The second computation subunit 135 is used to obtain the first similarity of the new key concept and the non-core concept corresponding to the new key concept, according to the number of identical second concepts and the concept total of the at least one second concept and the at least one non-core concept corresponding to the new key concept.
The third computation subunit 136 is used to obtain the similarity of the new key concept and the non-core concept corresponding to the new key concept according to the first similarity and the similarity obtained the i-th time.
In the embodiment of the present invention, for the specific implementation and examples of the first obtaining subunit 131, the second obtaining subunit 132, the first computation subunit 133, the third obtaining subunit 134, the second computation subunit 135 and the third computation subunit 136, refer to the related description of the method embodiment, which is not repeated here.
The processing unit 14 is used, when the similarity of the key concept and the non-core concept meets the preset condition, to judge whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built; if not, to retain the non-core concept that meets the preset condition in the domain knowledge base of the field to be built, take the non-core concept as a new key concept and trigger the first acquisition unit 11; and if so, to discard the non-core concept that meets the preset condition.
In the embodiment of the present invention, one feasible form of the preset condition is based on the average similarity of the non-core concept to the full dose concept set. The corresponding structure of the processing unit 14 is shown in Fig. 6 and may include: a fourth computation subunit 141, a fifth computation subunit 142, a judgment subunit 143 and a processing subunit 144.
The fourth computation subunit 141 is used to obtain the similarity of the non-core concept to each concept in the full dose concept set.
The fifth computation subunit 142 is used to obtain the average similarity of the non-core concept to the full dose concept set according to the similarity of the non-core concept to each concept in the full dose concept set.
The judgment subunit 143 is used, when the similarity of the key concept and the non-core concept is greater than the average similarity of the non-core concept to the full dose concept set, to judge whether the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built.
The processing subunit 144 is used, when the non-core concept differs from the concepts already present in the domain knowledge base of the field to be built, to retain in that domain knowledge base the non-core concept whose similarity is greater than its average similarity to the full dose concept set, take the non-core concept as a new key concept and trigger the first acquisition unit 11; and, when the non-core concept is identical to a concept already present in the domain knowledge base of the field to be built, to discard the non-core concept whose similarity is greater than its average similarity to the full dose concept set.
In the embodiment of the present invention, for the specific implementation of the fourth computation subunit 141, the fifth computation subunit 142, the judgment subunit 143 and the processing subunit 144, refer to the related description of the method embodiment, which is not repeated here.
The second computing unit 15 is used, after all concepts in the domain knowledge base of the field to be built have been obtained, to obtain the relation between any two concepts, so as to obtain the domain knowledge base of the field to be built, where all concepts include all key concepts and all non-core concepts of the field to be built.
Optionally, the second computing unit 15 is used to obtain the non-core concepts corresponding to each of any two concepts, obtain the number of identical concepts and the number of different concepts among those non-core concepts, and obtain the similarity between the two concepts according to the number of identical concepts and the number of different concepts. The similarity between any two concepts indicates the degree of similarity between them. For the specific implementation and examples, refer to the related description of the method embodiment, which is not repeated here.
With the above technical solution, after a core concept in the current field to be built and the target text in which the core concept is located are obtained, at least one non-core concept can be obtained from the target text, and the similarity between the core concept and the non-core concept is obtained. When that similarity meets a preset condition, it is judged whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built. If not, the non-core concept that meets the preset condition is retained in the domain knowledge base of the field to be built and taken as a new core concept, the target text in which the new core concept is located is obtained, and the step of obtaining at least one non-core concept from the target text is performed again. After all concepts in the knowledge base of the field to be built have been obtained, the relation between any two concepts is obtained, thereby obtaining the domain knowledge base of the field to be built. The domain knowledge base is thus built automatically, so that experts in the field or editing personnel no longer need to construct it manually. After the domain knowledge base of any field has been built, it can also be updated automatically by repeating the steps used to build it, so that personnel need not understand the contents of the domain knowledge base, which reduces the difficulty of maintaining it.
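The iterative procedure summarized above can be sketched as a worklist loop. This is a minimal sketch under stated assumptions, not the patented implementation: `find_text`, `extract_concepts`, `similarity`, and `meets_condition` are hypothetical callbacks standing in for the acquisition and computing steps, and the queue order is an illustrative choice.

```python
from collections import deque


def build_concepts(seed, find_text, extract_concepts, similarity, meets_condition):
    """Collect the concepts of a domain knowledge base starting from `seed`.

    All four callbacks are hypothetical placeholders:
      find_text(core)             -> target text containing the core concept
      extract_concepts(text)      -> non-core concepts found in that text
      similarity(core, candidate) -> similarity score
      meets_condition(core, candidate, score) -> bool (the preset condition)
    """
    knowledge_base = {seed}
    queue = deque([seed])
    while queue:
        core = queue.popleft()
        text = find_text(core)
        for candidate in extract_concepts(text):
            if candidate == core:
                continue
            score = similarity(core, candidate)
            if not meets_condition(core, candidate, score):
                continue
            if candidate in knowledge_base:
                continue  # identical to an existing concept: discard
            knowledge_base.add(candidate)
            queue.append(candidate)  # becomes a new core concept
    return knowledge_base
```

Because each concept is queued at most once, the loop ends when no new concept passes the preset condition; the pairwise relations are then computed over the returned set.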
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Because the device embodiments are substantially similar to the method embodiments, they are described briefly; for related details, refer to the corresponding description of the method embodiments.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (10)

1. A domain knowledge base construction method, characterized in that the method comprises:
obtaining a core concept in the current field to be built and the target text in which the core concept is located;
obtaining at least one non-core concept from the target text, the non-core concept being a concept in a full concept set that is extracted from the target text, the full concept set being the set of the field to be built and the core concepts and non-core concepts in the field;
obtaining the similarity between the core concept and the non-core concept;
when the similarity between the core concept and the non-core concept meets a preset condition, judging whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built; if not, retaining the non-core concept that meets the preset condition in the domain knowledge base of the field to be built, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text; if so, discarding the non-core concept that meets the preset condition;
after all concepts in the domain knowledge base of the field to be built have been obtained, obtaining the relation between any two concepts, thereby obtaining the domain knowledge base of the field to be built, wherein all concepts include all core concepts and all non-core concepts of the field to be built.
2. The method according to claim 1, characterized in that obtaining the similarity between the core concept and the non-core concept comprises:
when the core concept is the concept obtained the 1st time, obtaining the target text in which the non-core concept is located, obtaining from the target text in which the non-core concept is located at least one first concept that is in the full concept set, and obtaining the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept;
when the core concept is a non-core concept obtained the i-th time that has been taken as a new core concept, obtaining, from the target text in which the non-core concept corresponding to the new core concept is located, at least one second concept that is in the full concept set, and obtaining the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time, wherein the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the core concept corresponding to it, 1 ≤ i ≤ N, N = M - 1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
3. The method according to claim 2, characterized in that obtaining the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept comprises:
obtaining the number of identical first concepts among the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, wherein the total number of concepts is the sum of the number of identical first concepts and the number of different concepts among the at least one first concept and the at least one non-core concept corresponding to the core concept;
obtaining the similarity between the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
and obtaining the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time comprises:
obtaining the number of identical second concepts among the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, wherein that total number of concepts is the sum of the number of identical second concepts and the number of different concepts among the at least one second concept and the at least one non-core concept corresponding to the new core concept;
obtaining a first similarity between the new core concept and the non-core concept corresponding to the new core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
obtaining the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the first similarity and the similarity obtained the i-th time.
4. The method according to claim 1, characterized in that the step of, when the similarity between the core concept and the non-core concept meets a preset condition, judging whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built, and, if not, retaining the non-core concept that meets the preset condition in the domain knowledge base of the field to be built, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text, or, if so, discarding the non-core concept that meets the preset condition, comprises:
obtaining the similarity between the non-core concept and each concept in the full concept set;
obtaining, according to the similarity between the non-core concept and each concept in the full concept set, the average similarity of the non-core concept to the full concept set;
when the similarity between the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, judging whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built;
if not, retaining in the domain knowledge base of the field to be built the non-core concept whose similarity is greater than the average similarity of the non-core concept to the full concept set, taking the non-core concept as a new core concept, obtaining the target text in which the new core concept is located, and returning to the step of obtaining at least one non-core concept from the target text;
if so, discarding the non-core concept whose similarity is greater than the average similarity of the non-core concept to the full concept set.
5. The method according to claim 1, characterized in that obtaining the relation between any two concepts after all concepts in the knowledge base of the field to be built have been obtained comprises:
obtaining the non-core concepts corresponding to each of the any two concepts;
obtaining the number of identical concepts and the number of different concepts among the non-core concepts corresponding to each of the any two concepts;
obtaining the similarity between the any two concepts according to the number of identical concepts and the number of different concepts, wherein the similarity between the any two concepts indicates how similar the any two concepts are.
6. A domain knowledge base construction device, characterized in that the device comprises:
a first acquisition unit, configured to obtain a core concept in the current field to be built and the target text in which the core concept is located;
a second acquisition unit, configured to obtain at least one non-core concept from the target text, the non-core concept being a concept in a full concept set that is extracted from the target text, the full concept set being the set of the field to be built and the core concepts and non-core concepts in the field;
a first computing unit, configured to obtain the similarity between the core concept and the non-core concept;
a processing unit, configured to: when the similarity between the core concept and the non-core concept meets a preset condition, judge whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built; if not, retain the non-core concept that meets the preset condition in the domain knowledge base of the field to be built, take the non-core concept as a new core concept, and trigger the first acquisition unit; if so, discard the non-core concept that meets the preset condition;
a second computing unit, configured to, after all concepts in the domain knowledge base of the field to be built have been obtained, obtain the relation between any two concepts, thereby obtaining the domain knowledge base of the field to be built, wherein all concepts include all core concepts and all non-core concepts of the field to be built.
7. The device according to claim 6, characterized in that the first computing unit is configured to: when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located, obtain from the target text in which the non-core concept is located at least one first concept that is in the full concept set, and obtain the similarity between the core concept and the non-core concept according to the at least one first concept and the at least one non-core concept corresponding to the core concept; and, when the core concept is a non-core concept obtained the i-th time that has been taken as a new core concept, obtain, from the target text in which the non-core concept corresponding to the new core concept is located, at least one second concept that is in the full concept set, and obtain the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the at least one second concept, the at least one non-core concept corresponding to the new core concept, and the similarity obtained the i-th time, wherein the similarity obtained the i-th time is the similarity between the non-core concept obtained the i-th time and the core concept corresponding to it, 1 ≤ i ≤ N, N = M - 1, and M is the total number of times non-core concepts are obtained by the time all concepts in the knowledge base of the field to be built have been obtained.
8. The device according to claim 7, characterized in that the first computing unit comprises:
a first obtaining sub-unit, configured to: when the core concept is the concept obtained the 1st time, obtain the target text in which the non-core concept is located and obtain from the target text in which the non-core concept is located at least one first concept that is in the full concept set; and, when the core concept is a non-core concept obtained the i-th time that has been taken as a new core concept, obtain, from the target text in which the non-core concept corresponding to the new core concept is located, at least one second concept that is in the full concept set;
a second obtaining sub-unit, configured to obtain the number of identical first concepts among the at least one first concept and the at least one non-core concept corresponding to the core concept, and the total number of concepts in the at least one first concept and the at least one non-core concept corresponding to the core concept, wherein the total number of concepts is the sum of the number of identical first concepts and the number of different concepts among the at least one first concept and the at least one non-core concept corresponding to the core concept;
a first computation sub-unit, configured to obtain the similarity between the core concept and the non-core concept according to the number of identical first concepts and the total number of concepts;
a third obtaining sub-unit, configured to obtain the number of identical second concepts among the at least one second concept and the at least one non-core concept corresponding to the new core concept, and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept, wherein that total number of concepts is the sum of the number of identical second concepts and the number of different concepts among the at least one second concept and the at least one non-core concept corresponding to the new core concept;
a second computation sub-unit, configured to obtain a first similarity between the new core concept and the non-core concept corresponding to the new core concept according to the number of identical second concepts and the total number of concepts in the at least one second concept and the at least one non-core concept corresponding to the new core concept;
a third computation sub-unit, configured to obtain the similarity between the new core concept and the non-core concept corresponding to the new core concept according to the first similarity and the similarity obtained the i-th time.
9. The device according to claim 6, characterized in that the processing unit comprises:
a fourth computation sub-unit, configured to obtain the similarity between the non-core concept and each concept in the full concept set;
a fifth computation sub-unit, configured to obtain, according to the similarity between the non-core concept and each concept in the full concept set, the average similarity of the non-core concept to the full concept set;
a judgment sub-unit, configured to, when the similarity between the core concept and the non-core concept is greater than the average similarity of the non-core concept to the full concept set, judge whether the non-core concept is identical to a concept already in the domain knowledge base of the field to be built;
a processing sub-unit, configured to: when the non-core concept differs from every concept already in the domain knowledge base of the field to be built, retain in the domain knowledge base of the field to be built the non-core concept whose similarity is greater than the average similarity of the non-core concept to the full concept set, take the non-core concept as a new core concept, and trigger the first acquisition unit; and, when the non-core concept is identical to a concept already in the domain knowledge base of the field to be built, discard the non-core concept whose similarity is greater than the average similarity of the non-core concept to the full concept set.
10. The device according to claim 6, characterized in that the second computing unit is configured to obtain the non-core concepts corresponding to each of the any two concepts, obtain the number of identical concepts and the number of different concepts among the non-core concepts corresponding to each of the any two concepts, and obtain the similarity between the any two concepts according to the number of identical concepts and the number of different concepts, wherein the similarity between the any two concepts indicates how similar the any two concepts are.
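The similarity computation recited in claims 2, 3, 7, and 8 can be illustrated with a short Python sketch. Two points are assumptions rather than claim language: the similarity is taken to be the number of identical concepts divided by the total number of concepts, and the first similarity is combined with the similarity obtained the i-th time by multiplication. The claims only state that the result is obtained "according to" these quantities, and the function names are hypothetical.

```python
def overlap_ratio(text_concepts, core_related):
    """Identical concepts divided by total concepts (assumed formula).

    Total concepts = identical + different, which equals the size of
    the union of the two collections.
    """
    a, b = set(text_concepts), set(core_related)
    total = len(a | b)
    return len(a & b) / total if total else 0.0


def chained_similarity(second_concepts, new_core_related, previous_similarity=None):
    """Similarity for a core concept, optionally chained to an earlier one.

    previous_similarity is the similarity obtained the i-th time; it is
    None for the concept obtained the 1st time. Multiplying the two
    values is an assumption made for illustration.
    """
    first = overlap_ratio(second_concepts, new_core_related)
    if previous_similarity is None:
        return first
    return first * previous_similarity
```

With multiplication as the assumed combination, concepts reached through longer chains from the original core concept can never score higher than the link that introduced them.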
CN201611220184.8A 2016-12-26 2016-12-26 Domain knowledge base construction method and device Active CN106650940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611220184.8A CN106650940B (en) 2016-12-26 2016-12-26 A kind of domain knowledge base construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611220184.8A CN106650940B (en) 2016-12-26 2016-12-26 A kind of domain knowledge base construction method and device

Publications (2)

Publication Number Publication Date
CN106650940A true CN106650940A (en) 2017-05-10
CN106650940B CN106650940B (en) 2019-01-22

Family

ID=58826830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611220184.8A Active CN106650940B (en) 2016-12-26 2016-12-26 Domain knowledge base construction method and device

Country Status (1)

Country Link
CN (1) CN106650940B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664595A (en) * 2018-05-08 2018-10-16 和美(深圳)信息技术股份有限公司 Domain knowledge base construction method, device, computer equipment and storage medium
CN112699909A (en) * 2019-10-23 2021-04-23 中移物联网有限公司 Information identification method and device, electronic equipment and computer readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117281A (en) * 2009-12-30 2011-07-06 北京亿维讯科技有限公司 Method for constructing domain ontology
CN102214232A (en) * 2011-06-28 2011-10-12 东软集团股份有限公司 Method and device for calculating similarity of text data
CN102306182A (en) * 2011-08-30 2012-01-04 西华大学 Method for excavating user interest based on conceptual semantic background image
CN102637163A (en) * 2011-01-09 2012-08-15 华东师范大学 Method and system for controlling multi-level ontology matching based on semantemes
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN104636430A (en) * 2014-12-30 2015-05-20 东软集团股份有限公司 Case knowledge base representation and case similarity obtaining method and system
CN104715042A (en) * 2015-03-24 2015-06-17 清华大学 Conceptual design knowledge representation method and knowledge management system based on ontology
CN104809128A (en) * 2014-01-26 2015-07-29 中国科学院声学研究所 Method and system for acquiring statement emotion tendency
CN105912637A (en) * 2016-04-08 2016-08-31 西藏飞跃智能科技有限公司 Knowledge-based user interest mining method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117281A (en) * 2009-12-30 2011-07-06 北京亿维讯科技有限公司 Method for constructing domain ontology
CN102637163A (en) * 2011-01-09 2012-08-15 华东师范大学 Method and system for controlling multi-level ontology matching based on semantemes
CN102214232A (en) * 2011-06-28 2011-10-12 东软集团股份有限公司 Method and device for calculating similarity of text data
CN102306182A (en) * 2011-08-30 2012-01-04 西华大学 Method for excavating user interest based on conceptual semantic background image
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN104809128A (en) * 2014-01-26 2015-07-29 中国科学院声学研究所 Method and system for acquiring statement emotion tendency
CN104636430A (en) * 2014-12-30 2015-05-20 东软集团股份有限公司 Case knowledge base representation and case similarity obtaining method and system
CN104715042A (en) * 2015-03-24 2015-06-17 清华大学 Conceptual design knowledge representation method and knowledge management system based on ontology
CN105912637A (en) * 2016-04-08 2016-08-31 西藏飞跃智能科技有限公司 Knowledge-based user interest mining method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664595A (en) * 2018-05-08 2018-10-16 和美(深圳)信息技术股份有限公司 Domain knowledge base construction method, device, computer equipment and storage medium
CN108664595B (en) * 2018-05-08 2020-10-16 和美(深圳)信息技术股份有限公司 Domain knowledge base construction method and device, computer equipment and storage medium
CN112699909A (en) * 2019-10-23 2021-04-23 中移物联网有限公司 Information identification method and device, electronic equipment and computer readable storage medium
CN112699909B (en) * 2019-10-23 2024-03-19 中移物联网有限公司 Information identification method, information identification device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN106650940B (en) 2019-01-22

Similar Documents

Publication Publication Date Title
Pedraza-Fariña et al. A network theory of patentability
CN108388876A (en) A kind of image-recognizing method, device and relevant device
CN106250707A (en) A kind of based on degree of depth learning algorithm process head construction as the method for data
CN109918511A (en) A kind of knowledge mapping based on BFS and LPA is counter to cheat feature extracting method
CN107818815A (en) The search method and system of electronic health record
CN106845061A (en) Intelligent interrogation system and method
CN105528422A (en) Focused crawler processing method and apparatus
CN107506350A (en) A kind of method and apparatus of identification information
CN108597605A (en) A kind of life big data acquisition of personal health and analysis system
CN107819790A (en) The recognition methods of attack message and device
CN106650940A (en) Field knowledge base establishment method and device
CN108133752A (en) A kind of optimization of medical symptom keyword extraction and recovery method and system based on TFIDF
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
Arshad et al. A comprehensive knowledge management process framework for healthcare information systems in healthcare industry of Pakistan
CN104536972B (en) Web page contents sensory perceptual system based on CDN and method
CN107967332A (en) Enterprise's address recognition methods and identifying system
Gerteis et al. Nationalism in America: the case of the Populist movement
CN102982011B (en) A kind of method and apparatus for recognizing out-of-sequence text
CN109616165A (en) Medical information methods of exhibiting and device
CN111859238A (en) Method and device for predicting data change frequency based on model and computer equipment
CN106487540A (en) A kind of rules process method and equipment
CN107527289A (en) A kind of investment combination industry distribution method, apparatus, server and storage medium
CN109657907A (en) Method of quality control, device and the terminal device of geographical national conditions monitoring data
CN110010231A (en) A kind of data processing system and computer readable storage medium
CN109299081A (en) Clean method, apparatus, computer equipment and the storage medium of room rate data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant