CN108304488A - Method for automatically learning an ontology using a topic model - Google Patents
- Publication number
- CN108304488A CN108304488A CN201810009239.3A CN201810009239A CN108304488A CN 108304488 A CN108304488 A CN 108304488A CN 201810009239 A CN201810009239 A CN 201810009239A CN 108304488 A CN108304488 A CN 108304488A
- Authority
- CN
- China
- Prior art keywords
- concept
- ontology
- vocabulary
- concept set
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Animal Behavior & Ethology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a method for automatically learning an ontology using a topic model. The method supports fully automatic domain-ontology construction and introduces a new measure of semantic similarity between concepts, used to compare the concepts generated by an LDA model. Ontology learning proceeds in two steps: the first step identifies concepts from a text or web corpus; the second step establishes relationships between concepts using the semantic similarity measure CP defined herein. The method requires no seed ontology as an aid to initial learning. Experimental results show that the proposed approach to automatic ontology construction using topic models is very effective.
Description
Technical field
The present invention relates to a method of ontology construction that uses a topic model to generate the basic concept units, so that an ontology can be learned, and thereby constructed, without any ontology seed.
Background technology
Ontology construction has been applied in many fields, such as artificial intelligence, information extraction, and machine translation. Manually building an ontology, however, is extremely time-consuming and labor-intensive: as concepts and domain information keep expanding and being updated, building a large ontology such as web directories or WordNet by hand demands ever more manpower, material resources, and effort. There is therefore a strong need to construct ontologies automatically, so as to keep pace with the rapid growth of domain information and to reduce the cost of building and maintaining ontologies. Building ontologies automatically by means of computer-based data analysis and data mining has thus become a meaningful line of research that has attracted many researchers to extensive, in-depth study.
Automatic ontology construction has become a research field in its own right, and many methods have been proposed for it. Ontologies already have numerous practical applications, and machine learning techniques can help knowledge engineers build and extend ontologies automatically or semi-automatically, greatly reducing the cost of manual construction and maintenance. Most existing ontology learning methods focus on extending and updating an existing ontology seed, extracting concepts or lexical units from document collections to update and grow the seed. Methods for learning an ontology fully automatically also exist, but most of them are tied to ontology construction in a specific knowledge domain, such as SKOS models, and all of these methods have certain limitations.

There are many methods for learning ontologies from text corpora, for example lexico-syntactic ontology construction methods. These mainly use natural language processing techniques and existing lexical resources to learn the is-a relations between concepts, the so-called Hearst patterns. Their drawback is that Hearst patterns require frequently occurring lexical patterns, which in practice do not occur frequently, and they can only handle rather vague lexical-semantic relations. P. Cimiano, F. M. Suchanek, and others have used web resources such as Wikipedia and WordNet to extract more language patterns.
Statistical learning methods based on clustering and classification have also been applied to ontology learning; these methods usually use similarity and dissimilarity measures to establish conceptual relations. Their limitation is that clustering- and classification-based ontology learning is difficult to carry out in practice. Ontology learning methods based on information extraction learn the hierarchical structure of an ontology, but can only extract very general concepts such as person, place, and animal, together with their sub-concepts.

Topic models are probabilistic models that can identify concepts from scientific publications without any prior knowledge, and have been proven very effective in industry. Topic models are now widely applied in text mining. Using topic models for ontology learning is a new research direction. Elias Zavitsanos et al. proposed an automatic ontology learning method based on statistics: it repeatedly applies a topic model to train concept sets and then uses conditional independence tests to judge the connections between the identified concepts; however, this method cannot establish connections between concepts across two hierarchy levels. Wang Wei et al. proposed two Semantic-Web-oriented ontology learning methods that combine information theory with topic models; they show good recall and precision, but must limit the number of child concept nodes under the nearest root node.
Invention content
The object of the present invention is to provide a method of automatic ontology learning that can accurately determine the relatedness between concepts, that can learn an ontology without any prior knowledge being provided, and that can determine both the depth of the ontology and the termination point of the ontology learning process.
In order to achieve the above object, the technical solution of the present invention is a method for automatically learning an ontology using a topic model, characterized in that it comprises the following steps:

The first step: extract concepts from a given document corpus using an LDA model, form a concept set from the extracted concepts, and then subdivide the concept hierarchy to generate the hierarchical structure G of the ontology, G = {T, E}, where T = {t1, t2, ..., tm} is a concept set, defined as the upper-level concept set; T' = {t1', t2', ..., tm'} is the sub-concept set, defined as the concept set one level below the upper-level concept set T, so that T and T' are two adjacent levels; and E is the set of edges, where each eij ∈ E indicates that the i-th concept ti in T and the j-th concept tj' in T' are connected by an edge;
The second step: use the CosTMI similarity measure to identify the semantic similarity between adjacent levels in the hierarchical structure G. Given the p-th concept tp in the upper-level concept set T and the context of tp, the semantic similarity of the s-th concept ts' and the r-th concept tr' in the next-level concept set T' is CosTMI(ts', tr'; tp),

where tp contains the word sequence {wp1, wp2, ..., wpn}; ts' contains the word sequence {ws'1, ws'2, ..., ws'n}; tr' contains the word sequence {wr'1, wr'2, ..., wr'n}; and PMI(·) is the pointwise mutual information of two words. The pointwise mutual information of two words w and w' is PMI(w, w'):

PMI(w, w') = log [ P(w, w') / (P(w) P(w')) ]

where P(w, w') = P(w) P(w' | w);

P(w) = Σ_{j=1..k} P(z = j) P(w | z = j)

where z is a topic; P(z = j) is the probability that the topic is j; P(w | z = j) is the conditional probability of word w when the topic is j; and k is the number of concepts;

P(w' | w) = Σ_{j=1..k} P(w' | z = j) P(z = j | w)

where P(w' | z = j) is the conditional probability of w' when the topic is j, and P(z = j | w) is the conditional probability of topic j when the word is w;

If CosTMI(ts', tr'; tp) exceeds a certain threshold thc, a relationship is established between tp and ts', tr';
The third step: calculate the standard similarity measure L(ts', tr'; tp), where P(ts' | tp) is the probability of ts' occurring in the context-word environment of tp, and P(tr' | tp) is the probability of tr' occurring in the context-word environment of tp.

When the standard similarity measure L(ts', tr'; tp) is used to define the relationships between ontology concepts, every concept learned by the topic model corresponds to a concept of the ontology. The conditional probability of each concept ts' or tr' under the context environment of tp is used to calculate the semantic similarity between concepts on the same level; the smaller the value, the higher the semantic similarity.
The fourth step: determine the hierarchical structure of the ontology.

Suppose three concept levels Th, Tm, Tl are learned with the topic model, where Th is the highest level, Tm is the intermediate level, and Tl is the lowest level. Denote the entropies of these three variables by H(Th), H(Tm), H(Tl), and let H(Tl | Tm) be the conditional entropy. The information gain Δ(I(Th, Tm, Tl)) between adjacent concept levels is then defined as:

Δ(I(Th, Tm, Tl)) = H(Th) - H(Tl | Tm)

When Δ(I(Th, Tm, Tl)) is less than the defined threshold ω, learning of concept sets with the LDA model stops.
Preferably, in the first step, the following rules are observed when the concept hierarchy is subdivided to generate the hierarchical structure G of the ontology:

Rule 1: if ti ∈ T, tj' ∈ T', and NT < NT', the conclusion is that the sub-concept set T' is more specific than the concept set T, where NT and NT' are the level ranks of the concept set T and the sub-concept set T' respectively;

Rule 2: if ti ∈ T, tj' ∈ T', and ti ∩ tj' = ∅ (the empty set), then a superior-subordinate relationship is very likely to exist between ti and tj'.
The present invention proposes a new method that learns an ontology automatically from a given text corpus. The concepts generated by a widely used probabilistic model, the topic model, serve as the concept units required for ontology construction. Given these concepts, a method is also needed to measure their similarity, in order to define the connections between adjacent upper and lower concept levels in the ontology structure, that is, to set up edges between concepts and thus form the hierarchical framework of the ontology. This ensures that the learned concepts are related, and that the connections between concepts are as compact and reasonable as possible. To this end, two similarity measures are defined, and a new criterion for judging the construction depth of the ontology hierarchy is proposed, that is, a new method for deciding when the learning cycle should terminate.

The present invention generates concepts by repeatedly applying the LDA topic model, and defines semantic similarity measures that can accurately quantify the similarity between concepts, so as to build the hierarchical structure between the concepts of the ontology. The validity and practicality of the proposed ontology construction method are verified using the real GENIA corpus and the GENIA ontology.
Description of the drawings

Fig. 1 shows how the accuracy rate varies with the number of topics;

Fig. 2 shows how the accuracy rate varies with ontology depth under the CP measure;

Fig. 3 shows how the accuracy rate varies with ontology depth under the L1 measure.
Specific implementation mode
The present invention will be further explained below with reference to specific embodiments. It should be understood that these embodiments are merely illustrative of the present invention and do not limit its scope. It should also be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
The method of automatically learning an ontology using a topic model provided by the present invention generally comprises the following steps:

The first step: extract concepts from a given document corpus using an LDA model, and learn the concept sets required to build the ontology by repeatedly applying the model.

The second step: design the CP similarity measure to identify the similarity between concepts across hierarchy levels, i.e. the potential connections between concepts on adjacent levels; and formulate the L1 criterion to judge the number of levels of the constructed ontology.

The third step: verify the validity and practical effect of this ontology construction method experimentally.
The above steps involve the following technical points.

One) Ontology construction process

Fig. 1 illustrates the process of ontology construction. A hierarchical structure G = {T, E} is built, where T = {t1, t2, ..., tm} is a concept set, called a concept level, produced as LDA model output and defined as the upper-level concept set; T' = {t1', t2', ..., tm'} is the sub-concept set, defined as the concept set one level below the upper-level concept set T; and E is the set of edges, where each eij ∈ E indicates that the i-th concept ti in T and the j-th concept tj' in T' are connected by an edge.

To build the connections between upper- and lower-level concepts, the concept level to which each concept node belongs must be determined: which concepts belong to the higher-level set, and which to the lower-level set. Establishing the connections between these two levels of concept sets can be rather complex, because the boundaries between the concepts produced by the LDA model are not especially clear; some measure is needed to stratify the concepts and to establish the relations between levels. Some concepts may have several parents, and some may have no children. The more concept levels are generated, the closer the relations between concept levels become, so the number of levels cannot grow without limit, and the number of levels of the constructed ontology needs to be set.
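The two-level structure G = {T, E} described above can be sketched as a small data structure. The concept names and word lists below are invented for illustration, not taken from the GENIA experiments:

```python
from dataclasses import dataclass, field

@dataclass
class ConceptHierarchy:
    """Two adjacent concept levels of the ontology hierarchy G = {T, E}."""
    upper: list          # T: each concept is a list of high-probability words
    lower: list          # T': sub-concepts, one level below T
    edges: set = field(default_factory=set)  # E: (i, j) links ti and tj'

    def connect(self, i: int, j: int) -> None:
        """Add the edge e_ij between the i-th upper and j-th lower concept."""
        self.edges.add((i, j))

    def parents(self, j: int) -> list:
        """All upper-level concepts linked to the j-th sub-concept
        (a concept may have several parents, or none)."""
        return [i for (i, jj) in self.edges if jj == j]

# Toy example: two upper concepts, three sub-concepts (words are invented).
g = ConceptHierarchy(
    upper=[["cell", "protein"], ["gene", "dna"]],
    lower=[["t_cell", "b_cell"], ["membrane", "receptor"], ["promoter", "exon"]],
)
g.connect(0, 0)
g.connect(0, 1)
g.connect(1, 2)
print(g.parents(1))  # sub-concept 1 is linked only to upper concept 0
```

This mirrors the definition in the text: edges are only ever drawn between two adjacent levels, and `parents` shows that a sub-concept may have several parents or none.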
Two) Related rules

Before presenting the implementation of the automatic ontology learning method, two basic rules are defined. In the general case, the LDA model is applied repeatedly to produce the concept sets needed to build the hierarchical structure. The present invention defines rules that constrain the concepts produced by the model when they are used to build the hierarchical ontology.

Intuitively, the higher a concept lies in the hierarchy, the more abstract it is, and conversely the more specific; likewise, the higher the level, the fewer the concepts, and conversely the more. Based on this common sense, the following rule is defined:

Rule 1: if ti ∈ T, tj' ∈ T', and NT < NT', the conclusion is that the sub-concept set T' is more specific than the concept set T, where NT and NT' are the level ranks of the concept set T and the sub-concept set T' respectively.

When the LDA model is repeatedly applied to learn and generate concept sets, NT < NT' must be determined first; this rule is therefore essential to the ontology construction method.

Every concept on every level learned by LDA from the document corpus consists of words that occur with high frequency in the documents. A word that occurs with high frequency in a high-level concept set is very likely to occur with equally high frequency in a low-level concept set, so during ontology construction these identical words might establish connections, which is unreasonable. The following rule is therefore defined:

Rule 2: if ti ∈ T, tj' ∈ T', and ti ∩ tj' = ∅ (the empty set), then a superior-subordinate relationship is very likely to exist between ti and tj'.

This rule supports the similarity measure between concepts that this patent defines below.
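Rule 2 reduces to a set-intersection test on the word lists of two concepts. A minimal sketch, with invented word lists:

```python
def may_link(ti, tj_sub):
    """Rule 2: a superior-subordinate link between an upper-level concept ti
    and a lower-level concept tj' is only considered when their word sets
    do not overlap, i.e. when the intersection is the empty set."""
    return len(set(ti) & set(tj_sub)) == 0

# A high-frequency word shared across levels blocks the candidate link:
print(may_link(["cell", "protein"], ["cell", "membrane"]))   # shared word "cell"
print(may_link(["cell", "protein"], ["promoter", "exon"]))   # disjoint word sets
```

Only disjoint word sets pass the test; shared high-frequency vocabulary, which the text identifies as a source of unreasonable links, is filtered out before any similarity is computed.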
Three) Similarity measures

The present invention builds the hierarchical structure of the ontology using similarity measures; that is, the connections between concepts are established through the similarity between concepts. A connection is established between two concepts drawn from two concept levels only when their similarity reaches a certain value; otherwise they are considered unconnected. To compute the semantic similarity between two concepts, a concept matrix is generated from the concept sets produced by the LDA model, each entry of which is the likelihood that a concept appears in the ontology.

Similarity between concepts is usually measured with pointwise mutual information, PMI (Pointwise Mutual Information). The invention defines a new semantic similarity measure between two words w and w', defining PMI through the expectations given by the two concepts; each concept consists of a series of words, which is a special property of LDA models. The pointwise mutual information of two words w and w' is PMI(w, w'):

PMI(w, w') = log [ P(w, w') / (P(w) P(w')) ]

where P(w, w') = P(w) P(w' | w);

P(w) = Σ_{j=1..k} P(z = j) P(w | z = j)

where z is a topic; P(z = j) is the probability that the topic is j; P(w | z = j) is the probability of word w when the topic is j; and k is the number of concepts;

P(w' | w) = Σ_{j=1..k} P(w' | z = j) P(z = j | w)

where P(w' | z = j) is the probability of w' when the topic is j, and P(z = j | w) is the conditional probability that the topic is j when the word is w.
The calculation formula given here for the pointwise mutual information of two words prepares the ground for organizing the concept hierarchy when the ontology is subsequently constructed, and the semantic similarity defined below between concepts also uses this formula.
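The probabilities above can be computed directly from what an LDA model estimates: the topic weights P(z = j), the per-topic word distributions P(w | z = j), and, via Bayes' rule, P(z = j | w). A minimal sketch with invented toy distributions; `topic_probs` and `word_given_topic` stand in for a trained model's output:

```python
import math

def p_word(w, topic_probs, word_given_topic):
    """P(w) = sum_j P(z=j) P(w|z=j)."""
    return sum(pz * word_given_topic[j].get(w, 0.0)
               for j, pz in enumerate(topic_probs))

def p_topic_given_word(j, w, topic_probs, word_given_topic):
    """Bayes' rule: P(z=j|w) = P(z=j) P(w|z=j) / P(w)."""
    return (topic_probs[j] * word_given_topic[j].get(w, 0.0)
            / p_word(w, topic_probs, word_given_topic))

def pmi(w, w2, topic_probs, word_given_topic):
    """PMI(w, w') = log( P(w'|w) / P(w') ), with
    P(w'|w) = sum_j P(w'|z=j) P(z=j|w)."""
    p_w2_given_w = sum(word_given_topic[j].get(w2, 0.0)
                       * p_topic_given_word(j, w, topic_probs, word_given_topic)
                       for j in range(len(topic_probs)))
    return math.log(p_w2_given_w / p_word(w2, topic_probs, word_given_topic))

# Toy model: k = 2 topics over a 3-word vocabulary (numbers are invented).
topic_probs = [0.5, 0.5]
word_given_topic = [
    {"cell": 0.7, "protein": 0.2, "gene": 0.1},
    {"cell": 0.1, "protein": 0.2, "gene": 0.7},
]
# Negative: "cell" and "gene" concentrate in different topics.
print(round(pmi("cell", "gene", topic_probs, word_given_topic), 4))
```

Note the sign behavior: a word paired with itself gives a positive PMI (co-occurrence above chance), while two words that dominate different topics give a negative one.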
Each concept generated by LDA corresponds to one concept inside the ontology structure. The semantic similarity measure quantifies the semantic similarity between two concepts within the context of a particular linguistic environment: given the p-th concept tp in the upper-level concept set T and the context of tp, the semantic similarity of the s-th concept ts' and the r-th concept tr' in the next-level concept set T' is CosTMI(ts', tr'; tp), where tp contains the word sequence {wp1, wp2, ..., wpn}, ts' contains the word sequence {ws'1, ws'2, ..., ws'n}, and tr' contains the word sequence {wr'1, wr'2, ..., wr'n}.

A threshold thct is preset; if the value of CosTMI(ts', tr'; tp) exceeds the threshold thct, relationships are established between tp and ts', tr'. Through the above definition and the calculation of semantic similarity, every concept for which a relationship is established becomes a concept of the ontology in the ontology structure. The threshold thct is determined experimentally; the larger its value, the greater the semantic similarity between the two concepts, and conversely the smaller the similarity.
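The thresholding step can be sketched generically: any similarity function sim(ts', tr'; tp) can stand in for CosTMI (whose exact formula is not reproduced in this text), and relationships are kept only when the value exceeds the threshold thct. The stand-in similarity and the sample concepts below are invented for illustration:

```python
def build_relations(upper, lower, sim, thct):
    """For each upper-level concept tp, link it to every pair of sub-concepts
    (ts', tr') whose similarity in tp's context exceeds the threshold thct.
    `sim(ts, tr, tp)` stands in for CosTMI(ts', tr'; tp)."""
    edges = set()
    for p, tp in enumerate(upper):
        for s, ts in enumerate(lower):
            for r, tr in enumerate(lower):
                if s < r and sim(ts, tr, tp) > thct:
                    edges.add((p, s))
                    edges.add((p, r))
    return edges

# Stand-in similarity: fraction of tp's context words shared with each sub-concept.
def overlap_sim(ts, tr, tp):
    ctx = set(tp)
    return len(ctx & set(ts)) * len(ctx & set(tr)) / (len(ctx) ** 2)

upper = [["cell", "protein", "membrane"]]
lower = [["cell", "t_cell"], ["membrane", "receptor"], ["promoter", "exon"]]
print(build_relations(upper, lower, overlap_sim, thct=0.05))
```

Raising thct prunes edges, which matches the role the threshold plays in the text: it controls how compact the connections between levels are.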
L1 standard similarity measure:

The standard similarity measure L(ts', tr'; tp) is defined in terms of P(ts' | tp), the probability of ts' occurring in the context-word environment of tp, and P(tr' | tp), the probability of tr' occurring in the context-word environment of tp.

When the standard similarity measure L(ts', tr'; tp) is used to define the relationships between ontology concepts, every concept learned by the topic model corresponds to a concept of the ontology. The conditional probability of each concept ts' or tr' under the context environment of tp is used to calculate the semantic similarity between concepts on the same level; the smaller the value, the higher the semantic similarity.
Determining the hierarchical structure of the ontology:

According to Rule 1, low-level concepts are more specific than high-level concepts, and in natural language processing the most abstract concepts can be found, namely those that cannot be subdivided further. In the ontology learning process, therefore, concepts cannot be subdivided indefinitely, and a new method is proposed to determine a suitable size for the ontology, i.e. to determine the number of ontology levels for a given domain knowledge base.

Suppose three concept levels Th, Tm, Tl are learned with the topic model, where Th is the highest level, Tm is the intermediate level, and Tl is the lowest level. Denote the entropies of these three variables by H(Th), H(Tm), H(Tl), and let H(Tl | Tm) be the conditional entropy. The information gain Δ(I(Th, Tm, Tl)) between adjacent concept levels is defined as:

Δ(I(Th, Tm, Tl)) = H(Th) - H(Tl | Tm)

When Δ(I(Th, Tm, Tl)) is less than the defined threshold ω, learning of concept sets with the LDA model stops. The inequality means that when the value of Δ(I(Th, Tm, Tl)) falls below the threshold, the concept distributions of Tm and Tl are very similar: LDA concept learning has then reached the best achievable result for learning the ontology's concept hierarchy, and the number of concept levels at that point is the number of concept levels of the ontology. In the actual experiments, ω is set close to 0.
The proposed ontology construction process was verified experimentally with the GENIA corpus and the corresponding GENIA ontology. GENIA is a biological corpus containing 1,999 medical terms collected under the MeSH terms human and blood cells; the GENIA ontology contains 45 concepts and 42 relations. In the experiments, the GENIA corpus is input to the LDA model to compute the concepts required for the ontology to be built. The proposed method is compared with the algorithm of Zavitsanos et al.; both were executed on a Pentium 4 PC with 2 GB of memory. CosTMI is compared with the CI method proposed by Zavitsanos et al.; the parameter settings of the three measures CI, CP, and L1 are given in Table 1.

Table 1: parameter settings of the similarity measures

The experimental results are described in detail below. The GENIA ontology hierarchy comprises two different ontologies. Recall, precision, and the F1 measure are used to assess the execution efficiency of the proposed method and the quality of the resulting ontology. Recall Rec is computed as:

Rec = nrc / Nr

where nrc is the number of correct concepts learned by the algorithm and Nr is the total number of concepts computed by the model.

Precision Prec is defined as:

Prec = npc / Np

where npc is the number of correctly learned concepts and Np is the total number of concepts learned by the algorithm.

The F1 measure is computed as:

F1 = 2 * Prec * Rec / (Prec + Rec)
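The three evaluation measures reduce to a few lines of code; the counts below are invented for illustration, not taken from the GENIA results:

```python
def recall(n_rc, n_r):
    """Rec = n_rc / N_r: correct learned concepts over all reference concepts."""
    return n_rc / n_r

def precision(n_pc, n_p):
    """Prec = n_pc / N_p: correct learned concepts over all learned concepts."""
    return n_pc / n_p

def f1(prec, rec):
    """F1: harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)

# Toy evaluation: 30 of 45 reference concepts recovered; 30 of 40 learned correct.
rec = recall(30, 45)
prec = precision(30, 40)
print(round(f1(prec, rec), 3))  # → 0.706
```

The harmonic mean penalizes an imbalance between the two: a method that learns many concepts but few correct ones (or vice versa) scores low even if one of the two measures is high.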
The comparison of the two methods on concept learning is shown in Table 2:

Table 2: execution results of the algorithms on learning the concepts C under the similarity measures

Table 2 shows that the execution results of the proposed method AOL are effective and that it can be used for ontology construction in other domains of knowledge, with both precision and recall higher than those of the CI method.

Table 3: execution results of relation construction under the different similarity measures

As shown in Tables 2 and 3, the comparison results are very satisfactory, and the proposed algorithm proves effective for constructing domain ontologies. The comparison also shows that the algorithm is somewhat weaker in the accuracy of concept identification; the reason may be that it still falls short in identifying very specific concepts.
Fig. 1 illustrates the number of words each concept contains. During the experiments it was found that the number of words per concept affects the accuracy of ontology construction. The results show that fewer than 10 words per concept seriously degrades the accuracy of ontology construction, whereas a larger number of words per concept yields higher accuracy. More is not always better, however: testing and analysis show that 16 words per concept gives relatively good results, while too many words per concept pulls in low-frequency corpus words that contribute little to the concept's abstract meaning in the ontology and instead harm the actual quality of the constructed ontology. As shown by the results in Fig. 1, the CP and L1 measures proposed in this patent perform very well. The results also demonstrate the validity of the defined word semantic similarity measure for measuring concept semantic similarity during ontology construction.
We also examined, during ontology construction, how the accuracy rate changes as the depth of the ontology hierarchy changes. The experimental results are shown in Figs. 2 and 3, which mainly present how the accuracy measure F1 varies with ontology depth under the CP and L1 measures. The experimental parameters were set to thcp = 0.93 and thfl = 1.24; the termination criterion of the algorithm is the value of the DMI defined above, set to ω = 0.01. Fig. 2 shows that the accuracy measure F1 reaches its highest value when the depth of the ontology reaches 7; Fig. 3 shows that F1 reaches its highest value when the ontology depth reaches 8.
Finally, some factors affecting ontology construction must be mentioned. Automatic ontology construction is an open research field, and there is as yet no fixed standard for assessing the quality and effect of ontologies. Moreover, the GENIA ontology used as the comparison baseline in the experiments of this patent was itself built by domain experts, that is, by a subjective human method, so it is difficult to evaluate a subjective method by objective means. In addition, assessing automatically built ontologies is more complex and difficult than assessing the extension and updating of an ontology on the basis of a seed ontology.
Claims (2)
1. A method for automatically learning an ontology using a topic model, characterized in that it comprises the following steps:

The first step: extract concepts from a given document corpus using an LDA model, form a concept set from the extracted concepts, and then subdivide the concept hierarchy to generate the hierarchical structure G of the ontology, G = {T, E}, where T = {t1, t2, ..., tm} is a concept set, defined as the upper-level concept set; T' = {t1', t2', ..., tm'} is the sub-concept set, defined as the concept set one level below the upper-level concept set T, so that T and T' are two adjacent levels; and E is the set of edges, where each eij ∈ E indicates that the i-th concept ti in T and the j-th concept tj' in T' are connected by an edge;

The second step: use the CosTMI similarity measure to identify the semantic similarity between adjacent levels in the hierarchical structure G, where, given the p-th concept tp in the upper-level concept set T and the context of tp, the semantic similarity of the s-th concept ts' and the r-th concept tr' in the next-level concept set T' is CosTMI(ts', tr'; tp); tp contains the word sequence {wp1, wp2, ..., wpn}; ts' contains the word sequence {ws'1, ws'2, ..., ws'n}; tr' contains the word sequence {wr'1, wr'2, ..., wr'n}; PMI(·) is the pointwise mutual information of two words, and the pointwise mutual information of two words w and w' is PMI(w, w'):

PMI(w, w') = log [ P(w, w') / (P(w) P(w')) ]

where P(w, w') = P(w) P(w' | w);

P(w) = Σ_{j=1..k} P(z = j) P(w | z = j)

where z is a topic; P(z = j) is the probability that the topic is j; P(w | z = j) is the conditional probability of word w when the topic is j; and k is the number of concepts;

P(w' | w) = Σ_{j=1..k} P(w' | z = j) P(z = j | w)

where P(w' | z = j) is the conditional probability of w' when the topic is j, and P(z = j | w) is the conditional probability of topic j when the word is w;

if CosTMI(ts', tr'; tp) exceeds a certain threshold thc, a relationship is established between tp and ts', tr';

The third step: calculate the standard similarity measure L(ts', tr'; tp), where P(ts' | tp) is the probability of ts' occurring in the context-word environment of tp, and P(tr' | tp) is the probability of tr' occurring in the context-word environment of tp; when the standard similarity measure L(ts', tr'; tp) is used to define the relationships between ontology concepts, every concept learned by the topic model corresponds to a concept of the ontology, and the conditional probability of each concept ts' or tr' under the context environment of tp is used to calculate the semantic similarity between concepts on the same level, a smaller value indicating higher semantic similarity;

The fourth step: determine the hierarchical structure of the ontology: suppose three concept levels Th, Tm, Tl are learned with the topic model, where Th is the highest level, Tm is the intermediate level, and Tl is the lowest level; denote the entropies of these three variables by H(Th), H(Tm), H(Tl), and let H(Tl | Tm) be the conditional entropy; the information gain Δ(I(Th, Tm, Tl)) between adjacent concept levels is defined as:

Δ(I(Th, Tm, Tl)) = H(Th) - H(Tl | Tm)

when Δ(I(Th, Tm, Tl)) is less than the defined threshold ω, learning of concept sets with the LDA model stops.
2. a kind of body constructing method based on Topic Model as described in claim 1, which is characterized in that described the
In one step, carry out following following rule when concept hierarchy subdivision generates the hierarchical structure G of ontological construction:
Rule 1: If ti ∈ T and tj' ∈ T', the conclusion is that the sub-concept set T' lies below the concept set T, where NT and NT'
are the level ranks of the concept set T and the sub-concept set T', respectively;
Rule 2: If ti ∈ T and tj' ∈ T', a superior-subordinate relationship is very likely to exist between ti and tj', where ∅ denotes the empty set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810009239.3A CN108304488A (en) | 2018-01-04 | 2018-01-04 | A method of utilizing the automatic study ontology of Topic Model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108304488A true CN108304488A (en) | 2018-07-20 |
Family
ID=62868677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810009239.3A Pending CN108304488A (en) | 2018-01-04 | 2018-01-04 | A method of utilizing the automatic study ontology of Topic Model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108304488A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547739A * | 2016-11-03 | 2017-03-29 | Tongji University | A text semantic similarity analysis method |
CN107133283A * | 2017-04-17 | 2017-09-05 | University of Science and Technology Beijing | A method for automatically constructing a legal ontology knowledge base |
- 2018-01-04 CN CN201810009239.3A patent/CN108304488A/en active Pending
Non-Patent Citations (1)
Title |
---|
ZHIJIE LIN: "Terminological ontology learning based on LDA", The 2017 4th International Conference on Systems and Informatics (ICSAI 2017) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Feature extraction and analysis of natural language processing for deep learning English language | |
CN107992597B (en) | Text structuring method for power grid fault case | |
CN109376242B (en) | Text classification method based on cyclic neural network variant and convolutional neural network | |
CN111078889B (en) | Method for extracting relationship between medicines based on various attentions and improved pre-training | |
CN104268197B | A fine-grained sentiment analysis method for industry comment data | |
CN109145112A | A commodity comment classification method based on a global-information attention mechanism | |
CN108897857A | A domain-oriented method for generating Chinese text topic sentences | |
CN106815293A (en) | System and method for constructing knowledge graph for information analysis | |
CN109284406A | An intention recognition method based on a differential recurrent neural network | |
Sadr et al. | Unified topic-based semantic models: a study in computing the semantic relatedness of geographic terms | |
CN109325231A | A method for generating word vectors with a multi-task model | |
CN113343690B (en) | Text readability automatic evaluation method and device | |
CN111274790A (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
Gu et al. | Enhancing text classification by graph neural networks with multi-granular topic-aware graph | |
Sadr et al. | Exploring the efficiency of topic-based models in computing semantic relatedness of geographic terms | |
Siddharth et al. | Toward automatically assessing the novelty of engineering design solutions | |
Jeon et al. | Measuring the novelty of scientific publications: a fastText and local outlier factor approach | |
Xia et al. | Study of text emotion analysis based on deep learning | |
CN107895012B (en) | Ontology construction method based on Topic Model | |
Han et al. | Automatic business process structure discovery using ordered neurons LSTM: a preliminary study | |
CN117077631A (en) | Knowledge graph-based engineering emergency plan generation method | |
Lin et al. | Learning ontology automatically using topic model | |
Atmaja et al. | Deep learning-based categorical and dimensional emotion recognition for written and spoken text | |
Hua | Study on the application of rough sets theory in machine learning | |
Revanesh et al. | An Optimized Question Classification Framework Using Dual-Channel Capsule Generative Adversarial Network and Atomic Orbital Search Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180720 |