TW589549B

TW589549B - Method for calculating conceptual similarity based on common sense tree marking method

Info

Publication number: TW589549B
Application number: TW91137148A
Authority: TW
Inventors: Wen-Tai Hsieh; Shih-Chun Chou
Original assignee: Inst Information Industry
Priority date: 2002-12-24
Filing date: 2002-12-24
Publication date: 2004-06-01
Also published as: TW200411422A

Abstract

A method for calculating conceptual similarity based on a common sense tree marking method comprises the following steps. First, a common sense tree structure containing multiple common sense trees is provided. Next, a first word and a second word are provided. The first word has at least a first definition and the second word has at least a second definition. It is then determined whether the first definition and the second definition are located in the same common sense tree. When they are, at least one score value of the corresponding common sense tree is determined. Finally, the similarity between the first word and the second word is determined based on the score value.

Description

[Technical Field of the Invention]
The present invention relates to a method for calculating conceptual similarity. In particular, it relates to a method that calculates conceptual similarity with a common sense tree marking method and uses the result to build a thesaurus (synonym dictionary) and an antonym dictionary.

[Prior Art]
An existing thesaurus can usually only indicate whether two words are synonyms. It cannot express how similar the definitions of the two words are. As a result, search engines and automatic summarization cannot offer efficient services such as document similarity comparison. For example, 老師 (teacher) and 教授 (professor) are not synonyms, yet both include the definitions "person", "teaching", and "education". In addition, there is no fast and objective rule for adding new words.

[Summary of the Invention]
The main object of the present invention is to provide a method that calculates conceptual similarity with a common sense tree marking method and builds a thesaurus and an antonym dictionary from it. To achieve this object, the method according to an embodiment of the present invention comprises the following steps:
1. Provide a common sense tree structure having a plurality of common sense trees.
2. Provide a first word and a second word, where the first word has at least a first definition and the second word has at least a second definition.
3. Judge whether the first definition and the second definition are located in the same common sense tree.
4. When they are, determine at least one score value of the corresponding common sense tree.
5. Determine the similarity between the first word and the second word based on the score value.
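The following is a minimal Python sketch of the first steps of this summary: providing the words' definitions and checking whether two definitions fall in the same common sense tree. It models a definition as a hypothetical `Definition` record holding the English spelling, the literal Chinese form, and the name of the tree. These names are illustrative assumptions, not structures defined in the patent, and the sememe is not modeled separately. The sample data reuses the 教師 (teacher) and 導師 (tutor) definitions listed in the embodiment.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Definition:
    """One definition of a word, e.g. human|人 (hypothetical data model)."""
    english: str  # English spelling, e.g. "human"
    literal: str  # literal Chinese form, e.g. "人"
    tree: str     # name of the common sense tree holding the definition


def same_tree_pairs(defs_a, defs_b):
    """Return every (a, b) pair of definitions that lie in the same tree."""
    return [(a, b) for a, b in product(defs_a, defs_b) if a.tree == b.tree]


def trees_spanned(defs_a, defs_b):
    """Return the set of trees over which the two words' definitions spread."""
    return {d.tree for d in defs_a} | {d.tree for d in defs_b}


# Definitions of 教師 (teacher) and 導師 (tutor) as listed in the embodiment.
teacher = [
    Definition("human", "人", "entity"),
    Definition("occupation", "職位", "attribute"),
    Definition("teach", "教", "event"),
    Definition("education", "教育", "secondary feature"),
]
tutor = [
    Definition("human", "人", "entity"),
    Definition("guide", "引導", "event"),
    Definition("desired", "良", "secondary feature"),
    Definition("teach", "教", "event"),
    Definition("education", "教育", "secondary feature"),
]

pairs = same_tree_pairs(teacher, tutor)
# prints: 5 ['entity', 'event', 'secondary feature'] 4
print(len(pairs), sorted({a.tree for a, _ in pairs}), len(trees_spanned(teacher, tutor)))
```

Because the dataclass compares all fields, `a == b` picks out the identical pairs (human, teach, education). Those are the pairs that receive the highest-similarity score in the embodiment.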

589549 V. Description of the Invention

To make the above and other objects, features, and advantages of the present invention clearer, preferred embodiments are described in detail below with reference to the accompanying drawings.

[Embodiment]
Please refer to FIG. 1, which is a flowchart of the method for calculating conceptual similarity based on the common sense tree marking method according to an embodiment of the present invention.

First, in step S10, two words are extracted from an article, for example 教師 (teacher) and 導師 (tutor). A common sense tree structure has a plurality of common sense trees. Examples are an entity common sense tree, an attribute common sense tree, an event common sense tree, and a secondary feature common sense tree (see the attachment). The common sense trees serve as a dictionary of words used by the computer system and contain the definitions of different words. The deeper a definition lies in a common sense tree, the higher its similarity.

Next, in step S12, the definitions contained in the two words are looked up in the common sense tree structure. The definitions are distributed over the trees as follows.

Definitions included in "teacher":
human|人 (person): entity common sense tree
occupation|職位 (occupation): attribute common sense tree
teach|教 (teach): event common sense tree
education|教育 (education): secondary feature common sense tree

Definitions included in "tutor":
human|人 (person): entity common sense tree
guide|引導 (guide): event common sense tree
desired|良 (good): secondary feature common sense tree
teach|教 (teach): event common sense tree
education|教育 (education): secondary feature common sense tree

Next, in step S14, the definitions of the two words are compared one by one to judge whether they are located in the same common sense tree.

In step S16, when two definitions lie in the same common sense tree, a score value is calculated for them. Suppose the two definitions have the same literal form, the same sememe, and the same English spelling. They are then treated as the same definition, and the common sense tree is given the score for the highest similarity. This score is the deepest level of the tree divided by that level plus 1, i.e. deepest level / (deepest level + 1).

For example, "person" in "teacher" and "person" in "tutor" have the same literal form, sememe, and English spelling, so they score 8/(8+1), the highest similarity in that tree. Here 8 is the deepest level of the entity common sense tree. Likewise, "teach" in both words scores 13/(13+1), where 13 is the deepest level of the event common sense tree. "Education" in both words scores 2/(2+1), where 2 is the deepest level of the secondary feature common sense tree.

0213-8982TWF(nl);STLC-01-B9192;FRANKLIN.ptd

If the two definitions differ in literal form, sememe, or English spelling, the deepest common parent node of the two definitions in that common sense tree is found. The score value is then the level of the deepest common parent node divided by the deepest level of the tree plus 1, i.e. deepest common parent level / (deepest level + 1). In addition, if the two definitions are antonyms or opposites, a negative sign is added to the calculated score, giving a negative score value.

Entity common sense tree: score(human|人, human|人) = 8/(8+1)
Event common sense tree: score(teach|教, teach|教) = 13/(13+1)
Secondary feature common sense tree: score(education|教育, education|教育) = 2/(2+1)

Next, in step S18, the calculated score values are recorded in a record table, for example 8/(8+1), 13/(13+1), and 2/(2+1). The record table also records two counts:
- the number of common sense trees that have score values, for example 3 (the entity, event, and secondary feature common sense trees);
- the number of common sense trees over which the definitions of the two words are distributed, for example 4 (the entity, event, secondary feature, and attribute common sense trees).
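The scoring rules of step S16 can be sketched as a single function. This is an illustrative Python sketch, not the patent's implementation. The name `pair_score` and its parameters are hypothetical, and the tree levels are passed in directly rather than read from a real common sense tree. `Fraction` keeps the scores exact, so they match the fractions written in the text.

```python
from fractions import Fraction


def pair_score(deepest_level, identical, common_parent_level=None, antonym=False):
    """Score two definitions that lie in the same common sense tree (step S16).

    deepest_level: deepest level of the tree (e.g. 8 for the entity tree).
    identical: True if the definitions share literal form, sememe and
        English spelling.
    common_parent_level: level of the deepest common parent node; required
        only when the definitions are not identical.
    antonym: True if the definitions are antonyms or opposites.
    """
    if identical:
        # Highest similarity in this tree: deepest level / (deepest level + 1)
        score = Fraction(deepest_level, deepest_level + 1)
    else:
        if common_parent_level is None:
            raise ValueError("non-identical definitions need common_parent_level")
        # Deepest common parent level / (deepest level + 1)
        score = Fraction(common_parent_level, deepest_level + 1)
    # Antonyms or opposites receive a negative score
    return -score if antonym else score


# Identical definitions from the teacher/tutor example.
print(pair_score(8, True), pair_score(13, True), pair_score(2, True))  # prints: 8/9 13/14 2/3
```

The same function also covers the antonym case of the 貧賤/富貴 example. For instance, `pair_score(4, False, common_parent_level=3, antonym=True)` gives -3/5.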

In step S20, the record table is read to produce a ratio value. The ratio value is the number of common sense trees with score values in the record table divided by the number of common sense trees over which the definitions of the two words are distributed, for example 3/4.

In step S22, a similarity is calculated from the information in the record table and the ratio value. The similarity has two terms:
- A depth term: the sum of the score values in the record table, divided by the number of common sense trees with score values, multiplied by the ratio value. It reflects the average depth of the definitions within the common sense trees. For example, (8/(8+1) + 13/(13+1) + 2/(2+1)) / 3 is the average score value of the three trees.
- A breadth term: the ratio value multiplied by (1 minus the ratio value), for example 3/4 * (1 - 3/4). It reflects the breadth of the definitions contained in the trees.

In the example, the similarity is 0.8085:

((8/(8+1) + 13/(13+1) + 2/(2+1)) / 3) * 3/4 + 3/4 * (1 - 3/4) = 0.8085

In general:

similarity = (sum of score values / number of common sense trees with score values in the record table) * ratio value + ratio value * (1 - ratio value)

Finally, please refer to FIG. 3, which shows the association dictionary of the method according to the embodiment. In step S24, a thesaurus and an antonym dictionary are built from the similarity of the two words. The calculated similarity is stored in the association dictionary, and the similarity between each word in the dictionary and the other words is tallied. A thesaurus is built from words with high similarity, and an antonym dictionary is built from words with low similarity.
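Steps S20 to S24 can be sketched as follows, under the same hedges as above. `similarity` implements the step S22 formula using the step S20 ratio value. `build_dictionaries` and its thresholds `synonym_min` and `antonym_max` are hypothetical: the text only says that high-similarity words go into the thesaurus and low-similarity words into the antonym dictionary.

```python
from fractions import Fraction


def similarity(scores, trees_with_scores, trees_spanned):
    """Similarity from the record table (steps S20 and S22).

    scores: score values recorded in the record table.
    trees_with_scores: number of common sense trees that have score values.
    trees_spanned: number of trees over which both words' definitions spread.
    """
    ratio = Fraction(trees_with_scores, trees_spanned)   # step S20 ratio value
    depth = sum(scores, Fraction(0)) / trees_with_scores  # average score value
    breadth = ratio * (1 - ratio)                         # breadth term
    return depth * ratio + breadth


def build_dictionaries(pair_similarity, synonym_min=0.7, antonym_max=0.0):
    """Split word pairs into a thesaurus and an antonym dictionary (step S24).

    Both thresholds are hypothetical; the source gives no values.
    """
    thesaurus = [pair for pair, s in pair_similarity.items() if s >= synonym_min]
    antonyms = [pair for pair, s in pair_similarity.items() if s < antonym_max]
    return thesaurus, antonyms


# Teacher/tutor example: scores 8/9, 13/14, 2/3; 3 scored trees out of 4.
sim = similarity([Fraction(8, 9), Fraction(13, 14), Fraction(2, 3)], 3, 4)
print(sim, round(float(sim), 4))  # prints: 815/1008 0.8085
```

Keeping the similarity as an exact fraction until display avoids rounding drift when many word pairs are tallied in the association dictionary.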

The following embodiment illustrates the case where the definitions differ in literal form or English spelling. The deepest common parent node of the two definitions in the same common sense tree is found, and the score value is calculated as (deepest common parent level / (deepest level of the tree + 1)).

First, two words are extracted from an article, for example 貧賤 (poor and lowly) and 富貴 (rich and honored). Next, the definitions contained in the two words are looked up in the common sense tree structure.

Definitions included in 貧賤:
aValue|屬性值 (attribute value): attribute value common sense tree
rank|等級 (rank): attribute common sense tree
LowRank|低等 (low rank): attribute common sense tree
undesired|莠 (undesired): secondary feature common sense tree
richness|貧富 (richness): attribute common sense tree
poor|窮 (poor): attribute value common sense tree

Definitions included in 富貴:
aValue|屬性值 (attribute value): attribute value common sense tree
richness|貧富 (richness): attribute common sense tree
rich|富 (rich): attribute value common sense tree
desired|良 (good): secondary feature common sense tree

Next, it is judged whether the definitions of the two words are located in the same common sense tree. If two definitions in the same tree have the same literal form, sememe, and English spelling, the tree is given the score for the highest similarity. For example, "richness" in 貧賤 and "richness" in 富貴 have the highest similarity in the attribute common sense tree.

Next, please refer to FIG. 2, which shows the secondary feature common sense tree according to the embodiment. The definition "undesired" (莠) of 貧賤 and the definition "desired" (良) of 富貴 differ in literal form, sememe, and English spelling. The deepest common parent node of the two definitions is therefore searched for in the secondary feature common sense tree. Its level is, for example, 2, so the scoring rule gives 2/(2+1). Because "undesired" and "desired" are antonyms, a negative sign is added, giving a negative score of -(2/(2+1)).

Likewise, the definition "poor" (窮) of 貧賤 and the definition "rich" (富) of 富貴 differ in literal form, sememe, and English spelling. The deepest common parent node is therefore searched for in the attribute value common sense tree. Its level is, for example, 3, so the scoring rule (deepest common parent level / (deepest level of the tree + 1)) gives 3/(4+1). Because "poor" and "rich" are antonyms, a negative sign is added, giving a negative score of -(3/(4+1)).
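Finding the deepest common parent node amounts to a lowest-common-ancestor lookup. Below is a minimal Python sketch under two assumptions: each definition is located by its root-to-node path of node labels, and the root counts as level 1. Both conventions are assumptions, and the paths in the example are hypothetical toy paths, not the tree shown in FIG. 2.

```python
from fractions import Fraction


def deepest_common_parent_level(path_a, path_b):
    """Level of the deepest node shared by two root-to-node paths.

    Assumes each path lists node labels from the root downward and that the
    root is level 1, so the level equals the length of the shared prefix.
    """
    level = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        level += 1
    return level


def non_identical_score(path_a, path_b, deepest_level, antonym):
    """Deepest common parent level / (deepest level + 1), negated for antonyms."""
    score = Fraction(deepest_common_parent_level(path_a, path_b), deepest_level + 1)
    return -score if antonym else score


# Hypothetical toy paths: the two definitions share "root" and "n1".
a = ["root", "n1", "leaf_a"]
b = ["root", "n1", "leaf_b"]
print(deepest_common_parent_level(a, b), non_identical_score(a, b, 4, antonym=True))  # prints: 2 -2/5
```

If one path is a prefix of the other, this sketch returns the level of the shorter path's last node. The source does not say how that case should be treated.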

Attribute common sense tree: score(richness|貧富, richness|貧富) = 3/(3+1)
Attribute value common sense tree: score(poor|窮, rich|富) = -(3/(4+1))
Secondary feature common sense tree: score(undesired|莠, desired|良) = -(2/(2+1))

Next, the calculated score values are recorded in a record table. The record table records two counts:
- the number of common sense trees that have score values, for example 3 (the attribute, attribute value, and secondary feature common sense trees);
- the number of common sense trees over which the definitions of the two words are distributed, for example 3 (the same three trees).

Reading the record table, the number of trees with score values (3) is divided by the number of trees over which all the definitions are distributed (3), giving a ratio value of 3/3. Finally, the similarity is calculated from the information in the record table and the ratio value. It combines the average depth of the definitions in the three common sense trees with the breadth of the definitions, giving approximately -0.1722:

((3/(3+1) - 3/(4+1) - 2/(2+1)) / 3) * 3/3 + 3/3 * (1 - 3/3) ≈ -0.1722

In summary, the method of the present invention for calculating conceptual similarity based on the common sense tree marking method can correctly express the similarity between two words. It also provides a fast and objective rule for adding new words, which gives it considerable industrial value in knowledge management applications. Although the present invention has been disclosed above with preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention. The protection scope of the present invention shall therefore be defined by the appended claims.

Brief Description of the Drawings
FIG. 1 is a flowchart of the method for calculating conceptual similarity based on the common sense tree marking method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the secondary feature common sense tree according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the association dictionary of the method for calculating conceptual similarity based on the common sense tree marking method according to an embodiment of the present invention.

[Description of Symbols]
S10: extract two words;
S12: look up the definitions contained in the two words according to the common sense tree structure;
S14: judge whether the definitions contained in the two words are located in the same common sense tree;
S16: when two definitions are located in the same common sense tree, calculate the score value of the two definitions;
S18: record the score values in a record table;
S20: read the record table to produce a ratio value;
S22: calculate a similarity value from the relationship between the information in the record table and the ratio value;
S24: build a thesaurus and an antonym dictionary based on the similarity of the two words;
30: association dictionary.


Claims

589549 VI. Scope of the Patent Application

1. A method for calculating conceptual similarity based on a common sense tree marking method, comprising the following steps: providing a common sense tree structure having a plurality of common sense trees; providing a first word and a second word, the first word having at least a first definition and the second word having at least a second definition; judging whether the first definition and the second definition are located in the same common sense tree; when the first definition and the second definition are located in the same common sense tree, determining at least one score value of the corresponding common sense tree; and determining the similarity between the first word and the second word based on the score value.

2. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 1, wherein, when the first definition and the second definition are located in the same common sense tree and have the same literal form, the same sememe, and the same English spelling, the score value of their common sense tree is the highest similarity of the first definition and the second definition in that common sense tree.

3. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 2, wherein the highest similarity of the common sense tree is calculated as (deepest level of the tree / (deepest level of the tree + 1)).

4. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 1, wherein the steps further include: when the first definition differs from the second definition in literal form, sememe, or English spelling, finding the deepest common parent node in the common sense tree and calculating a score value of the common sense tree.

5. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 4, wherein the score value of the common sense tree is calculated as (level of the deepest common parent node / (deepest level of the common sense tree + 1)).

6. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 1, further including a step of recording the calculated positive and negative score values in a record table.

7. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 6, wherein the record table further records the number T1 of common sense trees with score values in the record table and the number T2 of common sense trees over which the first definition and the second definition are distributed.

8. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 7, wherein T1 is divided by T2 to produce a ratio value.

9. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 1, wherein the steps further include: when the first definition and the second definition are antonyms or opposites, adding a negative sign to the score value to produce a negative score value.
10. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 6, wherein a similarity value is produced from the record table. The similarity value is the sum of the score values divided by the number of common sense trees with score values in the record table, multiplied by the ratio value, plus the ratio value multiplied by (1 minus the ratio value): ((sum of score values / number of common sense trees with score values in the record table) * ratio value) + (ratio value * (1 - ratio value)).

11. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 10, wherein the similarity is stored in an association dictionary, the similarity between each word in the association dictionary and the other words is tallied, a thesaurus is built from words with high similarity, and an antonym dictionary is built from words with low similarity.

12. A method for calculating conceptual similarity based on a common sense tree marking method, comprising the following steps: providing a common sense tree structure; providing a first word and a second word, the first word having at least a first definition and the second word having at least a second definition, the definitions being located in the common sense tree structure; and determining the similarity between the first word and the second word according to the correlation between the first definition and the second definition.

13. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 12, wherein, when the first definition and the second definition are located in the same common sense tree and the first definition has the same literal form, the same sememe, and the same English spelling as the second definition, the score value of their common sense tree is the highest similarity of the first definition and the second definition in that common sense tree.

14. The method for calculating conceptual similarity based on the common sense tree marking method as described in the scope of the patent application, wherein the highest similarity of the common sense tree is calculated as (deepest level of the tree / (deepest level of the tree + 1)).

15. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 12, wherein the steps further include: when the first definition differs from the second definition in literal form, sememe, or English spelling, finding the deepest common parent node in the common sense tree and calculating a score value of the common sense tree.

16. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 15, wherein the score value of the common sense tree is calculated as (level of the deepest common parent node / (deepest level of the common sense tree + 1)).

17. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 12, wherein the steps further include recording the calculated positive and negative score values in a record table.

18. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 17, wherein the record table further records the number T1 of common sense trees with score values in the record table and the number T2 of common sense trees over which the first definition and the second definition are distributed.

19. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 18, wherein T1 is divided by T2 to produce a ratio value.

20. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 12, wherein the steps further include: when the first definition and the second definition are antonyms or opposites, adding a negative sign to the score value to produce a negative score value.

21. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 17, wherein a similarity value is produced from the record table as ((sum of score values / number of common sense trees with score values in the record table) * ratio value) + (ratio value * (1 - ratio value)).

22. The method for calculating conceptual similarity based on the common sense tree marking method as described in claim 21, wherein the similarity is stored in an association dictionary, the similarity between each word in the association dictionary and the other words is tallied, a thesaurus is built from words with high similarity, and an antonym dictionary is built from words with low similarity.
