CN106611038A

CN106611038A - Ontology concept-based lexical semantic similarity solving method

Info

Publication number: CN106611038A
Application number: CN201610833103.5A
Authority: CN
Inventors: 金平艳
Original assignee: Sichuan Yonglian Information Technology Co Ltd
Current assignee: Sichuan Yonglian Information Technology Co Ltd
Priority date: 2016-07-28
Filing date: 2016-09-20
Publication date: 2017-05-03

Abstract

The invention provides a method for computing lexical semantic similarity based on ontology concepts. The words to be compared are input into a statistical method module and mapped to ontology concepts. For each word, the ontology concept of maximum depth is selected from the ontology concept module. The distance between these two concepts and the depth of their lowest common ancestor are then calculated, and finally the similarity between the two words is computed. The quantified similarity values produced by the method are closer to expert judgments. Because the method more fully considers factors such as the distance between the maximum-depth ontology concepts of the words to be compared (c₁, c₂) and the depths of those concepts, it greatly improves the accuracy of semantic similarity results and improves ontology reasoning.

Description

Ontology concept-based lexical semantic similarity solving method

Technical field

The present invention relates to the field of Semantic Web technology, and in particular to a method for computing lexical semantic similarity based on ontology concepts.

Background

At present, many scholars are studying methods for computing the similarity of ontology concepts, and the similarity problem has been studied and analyzed in depth in philosophy, semantics and several other disciplines. Earlier work mainly assesses concept similarity in terms of a concept's name, attributes, structure and so on. One earlier approach divides concept similarity into two layers: an "initial similarity" and a "similarity expressed through non-hyponymy relations". The former is computed mainly from the distance between concepts; the latter builds on that result and is computed from the concepts' non-hyponymic relations. The two are then combined to obtain the actual similarity of concepts in a domain ontology. Beyond this, semantic similarity between concepts within a domain is mainly computed from the hyponymy (hypernym/hyponym) relations between concepts and other factors. For example, a comprehensive similarity method has been proposed that first filters out the most relevant concepts according to the similarity of the two concept names, then computes concept similarity separately from concept instances, concept attributes and concept relations, and finally combines the results. Many applications that rely on massive data can mask this problem to some extent, but in many situations the massive-data approach does not apply and ignores semantics, so the computed results often differ greatly from people's subjective perception and lead to large errors. Semantic similarity computation is therefore particularly important in such cases: if the similar words of each word can be obtained, querying those similar words can improve the sharing of user information. To meet this need, the present invention proposes a method for computing lexical semantic similarity based on ontology concepts.

Summary of the invention

To address the problem of how to obtain the similar terms of each term, the invention provides a method for computing lexical semantic similarity based on ontology concepts.

To solve the above problem, the present invention is implemented through the following technical solution:

Step 1: Initialize the statistical method module.

Step 2: Input the words to be compared (c₁, c₂) into the initialized statistical method module.

Step 3: Map the words to be compared (c₁, c₂) to the ontology concept module.

Step 4: For each word to be compared (c₁, c₂), select the corresponding ontology concept of maximum depth, g₁ and g₂ respectively.

Step 5: Calculate the distance dis(g₁, g₂) between the two maximum-depth ontology concepts corresponding to the words (c₁, c₂).

Step 6: Using the results of the preceding steps, calculate the depth D(c₁, c₂) of the lowest common ancestor of the two words (c₁, c₂).

Step 7: Calculate the similarity sim(c₁, c₂) of the two words (c₁, c₂).

The present invention has the following advantages:

1. The lexical similarity values computed by this method are quantitatively closer to expert judgments.

2. The method more fully considers factors such as the distance between the maximum-depth ontology concepts of the words to be compared (c₁, c₂) and the depths of those concepts, which greatly improves the accuracy of the semantic similarity results.

3. It improves the effect of ontology reasoning.

Description of the drawings

Fig. 1 is a flow chart of the ontology concept-based lexical semantic similarity solving method.

Specific embodiment

To solve the problem of how to obtain the similar terms of each term, the present invention is described in detail with reference to Fig. 1. The specific implementation steps are as follows:

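Step 1: Initialize the statistical method module.

Step 2: Input the words to be compared (c₁, c₂) into the initialized statistical method module.

Step 3: Map the words to be compared (c₁, c₂) to the ontology concept module.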

Step 4: For each word to be compared (c₁, c₂), select the corresponding ontology concept of maximum depth, g₁ and g₂. Specifically:

A word to be compared C ∈ {c₁, c₂} has a one-to-many relationship with concepts. The deeper the selected concept, the more specific the sense of C, and the easier it is to calculate the semantic similarity of C. This depth is easily found in the statistical module; for example, the ontology concepts corresponding to a word can be looked up in HowNet.
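Step 4 amounts to an arg-max over a word's candidate concepts. The sketch below assumes a hypothetical in-memory lexicon that already stores each candidate concept's depth (in practice this might come from a resource such as HowNet); the names `Lexicon` and `deepest_concept` are illustrative and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon layout: word -> list of (concept, depth) candidates.
# A word may map to several concepts (the one-to-many relation in Step 4).
Lexicon = Dict[str, List[Tuple[str, int]]]


def deepest_concept(word: str, lexicon: Lexicon) -> Tuple[str, int]:
    """Return the (concept, depth) candidate of `word` with the greatest depth."""
    candidates = lexicon.get(word)
    if not candidates:
        raise KeyError(f"no ontology concept found for {word!r}")
    # max() keeps the first candidate when several share the maximum depth.
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy = {"bank": [("institution", 2), ("financial_bank", 4), ("river_bank", 3)]}
    print(deepest_concept("bank", toy))  # ('financial_bank', 4)
```

Ties are broken by list order here; the description does not say how equally deep concepts should be handled.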

Step 5: Calculate the distance dis(g₁, g₂) between the two maximum-depth ontology concepts corresponding to the words (c₁, c₂). This requires first computing the sememe similarity sim(g₁, g₂) between the two ontology concepts and then the relative depth deepth(g₁, g₂) between them. The specific calculation is as follows:

5.1) Sememe similarity sim(g₁, g₂) between the two ontology concepts

Suppose the maximum-depth ontology concept g₁ of c₁ contains n sememes, i.e. g₁ = {y₁, y₂, ..., y_n}, and the maximum-depth ontology concept g₂ of c₂ contains m sememes, i.e. g₂ = {y₁′, y₂′, ..., y_m′}.

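Calculate the pairwise similarity between the sememes of g₁ and g₂, i.e. sim(y_i, y_j′) for i ∈ {1, 2, ..., n} and j ∈ {1, 2, ..., m}. This yields the sememe similarity matrix J(g₁, g₂) of g₁ and g₂:

$$
J(g_1, g_2) = \begin{pmatrix}
sim(y_1, y_1') & sim(y_1, y_2') & \cdots & sim(y_1, y_m') \\
sim(y_2, y_1') & sim(y_2, y_2') & \cdots & sim(y_2, y_m') \\
\vdots & \vdots & \ddots & \vdots \\
sim(y_n, y_1') & sim(y_n, y_2') & \cdots & sim(y_n, y_m')
\end{pmatrix}
$$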

From this matrix, take the maximum sememe similarity S_i in each row vector:

$$
S_i = \max_{1 \le j \le m} sim(y_i, y_j'), \quad i = 1, 2, \ldots, n
$$

Finally obtain the similarity sim (g of justice original item between two Ontological concepts₁, g₂), it is as follows：

sim(g₁, g₂)=max (S₁, S₂..., S_n)
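Step 5.1 reduces to building the n × m matrix J and taking maxima. In the sketch below, the sememe-level similarity sim(y, y′) is supplied by the caller, since the description does not say how it is computed; `toy_sememe_sim` and its hand-set scores are hypothetical.

```python
from typing import Callable, List, Sequence

# sim(y, y') between two sememes; supplied by the caller (assumption: the
# description does not specify how individual sememe similarity is computed).
SememeSim = Callable[[str, str], float]


def sememe_matrix(g1: Sequence[str], g2: Sequence[str], sim: SememeSim) -> List[List[float]]:
    """Build J(g1, g2): n rows (sememes of g1) by m columns (sememes of g2)."""
    return [[sim(y, y_prime) for y_prime in g2] for y in g1]


def concept_sememe_similarity(g1: Sequence[str], g2: Sequence[str], sim: SememeSim) -> float:
    """sim(g1, g2) = max(S_1, ..., S_n), where S_i is the maximum of row i of J."""
    if not g1 or not g2:
        raise ValueError("both concepts need at least one sememe")
    J = sememe_matrix(g1, g2, sim)
    row_maxima = [max(row) for row in J]  # S_1 ... S_n
    return max(row_maxima)


# Hypothetical toy sememe similarity: identical sememes score 1.0, a few
# related pairs have hand-set scores, and everything else scores 0.0.
_RELATED = {frozenset({"fruit", "plant"}): 0.7, frozenset({"human", "occupation"}): 0.6}


def toy_sememe_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return _RELATED.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    g1 = ["fruit", "edible"]
    g2 = ["plant", "tree"]
    print(sememe_matrix(g1, g2, toy_sememe_sim))              # [[0.7, 0.0], [0.0, 0.0]]
    print(concept_sememe_similarity(g1, g2, toy_sememe_sim))  # 0.7
```

Because max(S₁, ..., S_n) equals the largest entry of J, sim(g₁, g₂) as defined reflects only the single best-matching sememe pair; the row maxima S_i are kept in the sketch to mirror the two-stage description.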

5.2) Calculate the relative depth deepth(g₁, g₂) between the two ontology concepts

deepth(g₁, g₂) = d₁ - d₂

In the above formula, d₁ is the depth in the module of g₁, the maximum-depth ontology concept of c₁; likewise, d₂ is the depth in the module of g₂, the maximum-depth ontology concept of c₂. Both values are easily obtained from the module.
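Step 5.2 needs each concept's depth in the module. Assuming a hypothetical child-to-parent map and a root at depth 0 (the description does not fix this convention), the depths and their difference can be sketched as below. Since the formula for dis(g₁, g₂) is not given in this text, it is unclear whether the distance uses the sign of d₁ - d₂ or only its magnitude; `relative_depth` returns the signed value exactly as written.

```python
from typing import Dict, Optional

# Hypothetical taxonomy: concept -> parent concept (None for the root).
# Assumption: depth counts edges from the root, so the root has depth 0.
Taxonomy = Dict[str, Optional[str]]


def depth(concept: str, parents: Taxonomy) -> int:
    """Number of parent links followed from `concept` to reach the root."""
    d = 0
    node = parents[concept]
    while node is not None:
        d += 1
        node = parents[node]
    return d


def relative_depth(g1: str, g2: str, parents: Taxonomy) -> int:
    """deepth(g1, g2) = d1 - d2, signed, as written in the description."""
    return depth(g1, parents) - depth(g2, parents)


TOY_TAXONOMY: Taxonomy = {
    "entity": None,
    "thing": "entity",
    "plant": "thing",
    "fruit": "plant",
    "pear": "fruit",
    "animal": "thing",
    "dog": "animal",
}

if __name__ == "__main__":
    print(depth("pear", TOY_TAXONOMY))                  # 4
    print(relative_depth("pear", "dog", TOY_TAXONOMY))  # 1
```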

5.3) Calculate the distance dis(g₁, g₂) between the two maximum-depth ontology concepts corresponding to the words (c₁, c₂)

In the formula for dis(g₁, g₂), which is computed from sim(g₁, g₂) and deepth(g₁, g₂), α is a smoothing factor; its value depends on the specific case and is given by an expert.

Step 6: Using the results of the preceding steps, calculate the depth D(c₁, c₂) of the lowest common ancestor of the two words (c₁, c₂). Specifically:

The depth D(c₁, c₂) of the lowest common ancestor of the two words (c₁, c₂) can be found in the module. The closer this common ancestor lies to the bottom of the hierarchy, the more similar the two words (c₁, c₂) are.
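Step 6 is a lowest-common-ancestor lookup. The sketch below assumes a single-parent (tree) taxonomy stored as a hypothetical child-to-parent map, applies the lookup to the concepts that the words map to, and counts the root as depth 0; `path_to_root` and `lca_depth` are illustrative names. In an ontology with multiple inheritance, a concept can have several paths to the root and this simple upward walk would not suffice.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: concept -> parent concept (None for the root).
Taxonomy = Dict[str, Optional[str]]


def path_to_root(concept: str, parents: Taxonomy) -> List[str]:
    """Return [concept, parent, grandparent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def lca_depth(g1: str, g2: str, parents: Taxonomy) -> int:
    """Depth (root = 0) of the lowest common ancestor of g1 and g2."""
    ancestors_of_g1 = set(path_to_root(g1, parents))
    # Walking upward from g2, the first node that is also an ancestor of g1
    # (a concept counts as its own ancestor) is the lowest common ancestor.
    for node in path_to_root(g2, parents):
        if node in ancestors_of_g1:
            return len(path_to_root(node, parents)) - 1
    raise ValueError(f"{g1!r} and {g2!r} share no common ancestor")


TOY_TAXONOMY: Taxonomy = {
    "entity": None,
    "thing": "entity",
    "plant": "thing",
    "fruit": "plant",
    "pear": "fruit",
    "animal": "thing",
    "dog": "animal",
}

if __name__ == "__main__":
    print(lca_depth("pear", "dog", TOY_TAXONOMY))    # 1 (LCA is "thing")
    print(lca_depth("pear", "fruit", TOY_TAXONOMY))  # 3 (LCA is "fruit")
```

A deeper result corresponds to an ancestor closer to the bottom of the hierarchy, which the description treats as a sign of greater similarity.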

Step 7: Calculate the similarity sim(c₁, c₂) of the two words (c₁, c₂). The specific calculation is as follows:

In the formula for sim(c₁, c₂), β is a weight factor. When β > 0.5, the common-ancestor depth D(c₁, c₂) has the greater influence on sim(c₁, c₂); otherwise, the distance dis(g₁, g₂) between the two ontology concepts has the greater influence. Experience shows that the latter has the greater influence on sim(c₁, c₂).
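The description gives only the qualitative role of β. To make that role concrete, the sketch below uses a hypothetical convex combination, sim = β·D/(D+1) + (1 − β)·1/(1 + dis), chosen only because a deeper common ancestor raises the score, a larger distance lowers it, and β > 0.5 shifts the weight toward depth. It is an assumption for illustration, not the patent's formula.

```python
def combined_similarity(lca_d: int, dis: float, beta: float) -> float:
    """Hypothetical sim(c1, c2) = beta * D/(D+1) + (1 - beta) * 1/(1+dis).

    NOT the patent's formula; it only reproduces the described behaviour of
    the weight factor beta.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if lca_d < 0 or dis < 0:
        raise ValueError("depth and distance must be non-negative")
    depth_term = lca_d / (lca_d + 1)    # in [0, 1), grows with depth D
    distance_term = 1.0 / (1.0 + dis)   # in (0, 1], shrinks as dis grows
    return beta * depth_term + (1.0 - beta) * distance_term


if __name__ == "__main__":
    # Same inputs, different weightings: with D = 3 and dis = 1.0 the depth
    # term (0.75) exceeds the distance term (0.5), so a larger beta scores higher.
    for beta in (0.3, 0.7):
        print(beta, round(combined_similarity(3, 1.0, beta), 3))
```

Following the empirical remark that distance matters more, β would typically be set below 0.5 in this kind of combination.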

Pseudocode of the ontology concept-based lexical semantic similarity method:

Input: the initialized module and the words to be compared (c₁, c₂)

Output: the similarity sim(c₁, c₂) of the words to be compared (c₁, c₂).
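The procedure above lists only its input and output. As a hedged illustration, the sketch below chains Steps 4 to 7 in Python. Because the formulas for dis(g₁, g₂) and sim(c₁, c₂) are not given in this text, they are passed in as caller-supplied functions. The parent-map taxonomy, the lexicon, the sememe lists, `exact_match`, `toy_distance` and `toy_combine` are all hypothetical stand-ins, not the patent's definitions.

```python
from typing import Callable, Dict, List, Optional, Sequence

Taxonomy = Dict[str, Optional[str]]  # concept -> parent (None for the root)


def _path_to_root(concept: str, parents: Taxonomy) -> List[str]:
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def word_similarity(
    w1: str,
    w2: str,
    lexicon: Dict[str, List[str]],               # word -> candidate concepts (Steps 2-3)
    parents: Taxonomy,                           # the ontology hierarchy
    sememes: Dict[str, Sequence[str]],           # concept -> its sememes
    sememe_sim: Callable[[str, str], float],     # sim(y_i, y_j')
    distance_fn: Callable[[float, int], float],  # stand-in for dis(g1, g2)
    combine_fn: Callable[[int, float], float],   # stand-in for sim(c1, c2)
) -> float:
    def depth(concept: str) -> int:
        return len(_path_to_root(concept, parents)) - 1  # root = depth 0

    # Step 4: deepest candidate concept for each word.
    g1 = max(lexicon[w1], key=depth)
    g2 = max(lexicon[w2], key=depth)
    # Step 5.1: the largest entry of J(g1, g2), i.e. the max of the row maxima S_i.
    s = max(max(sememe_sim(y, yp) for yp in sememes[g2]) for y in sememes[g1])
    # Step 5.2: signed relative depth d1 - d2.
    rel = depth(g1) - depth(g2)
    # Step 5.3: distance from a caller-supplied stand-in formula.
    dis = distance_fn(s, rel)
    # Step 6: depth of the lowest common ancestor of g1 and g2.
    up1 = set(_path_to_root(g1, parents))
    lca = next((c for c in _path_to_root(g2, parents) if c in up1), None)
    if lca is None:
        raise ValueError("concepts share no common ancestor")
    # Step 7: combine with a caller-supplied stand-in formula.
    return combine_fn(depth(lca), dis)


# Hypothetical toy data and stand-in formulas, for demonstration only.
EXAMPLE_PARENTS: Taxonomy = {
    "entity": None, "thing": "entity", "plant": "thing", "fruit": "plant",
    "pear_fruit": "fruit", "apple_fruit": "fruit", "animal": "thing", "dog": "animal",
}
EXAMPLE_LEXICON = {"pear": ["plant", "pear_fruit"], "apple": ["plant", "apple_fruit"], "dog": ["dog"]}
EXAMPLE_SEMEMES = {
    "plant": ["plant"], "pear_fruit": ["fruit", "edible"],
    "apple_fruit": ["fruit", "edible", "red"], "dog": ["animal", "livestock"],
}


def exact_match(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def toy_distance(s: float, rel: int) -> float:
    return (1.0 - s) + abs(rel) / 10


def toy_combine(d: int, dis: float, beta: float = 0.4) -> float:
    # beta < 0.5 weights distance more, echoing the description's empirical remark.
    return beta * d / (d + 1) + (1 - beta) / (1 + dis)


def example(w1: str, w2: str) -> float:
    return word_similarity(w1, w2, EXAMPLE_LEXICON, EXAMPLE_PARENTS, EXAMPLE_SEMEMES,
                           exact_match, toy_distance, toy_combine)


if __name__ == "__main__":
    print(example("pear", "apple"), example("pear", "dog"))
```

With these toy stand-ins, "pear" and "apple" share a deep ancestor ("fruit") and a sememe, so they score higher than "pear" and "dog", whose lowest common ancestor is the shallow "thing".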

Claims

1. An ontology concept-based lexical semantic similarity solving method, relating to the field of Semantic Web technology and in particular to a method for computing lexical semantic similarity based on ontology concepts, characterized in that it comprises the following steps:

Step 1: Initialize the statistical method module;

Step 2: Input the words to be compared (c₁, c₂) into the initialized statistical method module;

Step 3: Map the words to be compared (c₁, c₂) to the ontology concept module;

Step 4: For each word to be compared (c₁, c₂), select the corresponding ontology concept of maximum depth, g₁ and g₂;

Step 5: Calculate the distance dis(g₁, g₂) between the two maximum-depth ontology concepts corresponding to the words (c₁, c₂);

Step 6: Using the results of the preceding steps, calculate the depth D(c₁, c₂) of the lowest common ancestor of the two words (c₁, c₂);

Step 7: Calculate the similarity sim(c₁, c₂) of the two words (c₁, c₂).

2. The ontology concept-based lexical semantic similarity solving method according to claim 1, characterized in that the specific calculation in step 5 is as follows:

Step 5: Calculate the distance dis(g₁, g₂) between the two maximum-depth ontology concepts corresponding to the words (c₁, c₂). This requires first computing the sememe similarity sim(g₁, g₂) between the two ontology concepts and then the relative depth deepth(g₁, g₂) between them. The specific calculation is as follows:

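5.1) Sememe similarity sim(g₁, g₂) between the two ontology concepts

Suppose the maximum-depth ontology concept g₁ of c₁ contains n sememes, i.e. g₁ = {y₁, y₂, ..., y_n}, and the maximum-depth ontology concept g₂ of c₂ contains m sememes, i.e. g₂ = {y₁′, y₂′, ..., y_m′}.

Calculate the pairwise similarity between the sememes of g₁ and g₂, i.e. sim(y_i, y_j′) for i ∈ {1, 2, ..., n} and j ∈ {1, 2, ..., m}.

This yields the sememe similarity matrix J(g₁, g₂) of g₁ and g₂:

$$
J(g_1, g_2) = \begin{pmatrix}
sim(y_1, y_1') & sim(y_1, y_2') & \cdots & sim(y_1, y_m') \\
sim(y_2, y_1') & sim(y_2, y_2') & \cdots & sim(y_2, y_m') \\
\vdots & \vdots & \ddots & \vdots \\
sim(y_n, y_1') & sim(y_n, y_2') & \cdots & sim(y_n, y_m')
\end{pmatrix}
$$

From this matrix, take the maximum sememe similarity S_i in each row vector:

$$
S_i = \max_{1 \le j \le m} sim(y_i, y_j'), \quad i = 1, 2, \ldots, n
$$

Finally, the sememe similarity sim(g₁, g₂) between the two ontology concepts is:

sim(g₁, g₂) = max(S₁, S₂, ..., S_n)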

5.2) Calculate the relative depth deepth(g₁, g₂) between the two ontology concepts

deepth(g₁, g₂) = d₁ - d₂

In the above formula, d₁ is the depth in the module of g₁, the maximum-depth ontology concept of c₁; likewise, d₂ is the depth in the module of g₂, the maximum-depth ontology concept of c₂. Both values are easily obtained from the module.

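5.3) Calculate the distance dis(g₁, g₂) between the two maximum-depth ontology concepts corresponding to the words (c₁, c₂)

In the formula for dis(g₁, g₂), which is computed from sim(g₁, g₂) and deepth(g₁, g₂), α is a smoothing factor; its value depends on the specific case and is given by an expert.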

3. The ontology concept-based lexical semantic similarity solving method according to claim 1, characterized in that the specific calculation in step 7 is as follows:

Step 7: Calculate the similarity sim(c₁, c₂) of the two words (c₁, c₂). The specific calculation is as follows:

In the formula for sim(c₁, c₂), β is a weight factor. When β > 0.5, the common-ancestor depth D(c₁, c₂) has the greater influence on sim(c₁, c₂); otherwise, the distance dis(g₁, g₂) between the two ontology concepts has the greater influence. Experience shows that the latter has the greater influence on sim(c₁, c₂).