TW589549B - Method for calculating conceptual similarity based on common sense tree marking method - Google Patents

Method for calculating conceptual similarity based on common sense tree marking method

Info

Publication number
TW589549B
Authority
TW
Taiwan
Prior art keywords
common sense
tree
definition
similarity
value
Prior art date
Application number
TW91137148A
Other languages
Chinese (zh)
Other versions
TW200411422A (en)
Inventor
Wen-Tai Hsieh
Shih-Chun Chou
Original Assignee
Inst Information Industry
Priority date
Filing date
Publication date
Application filed by Inst Information Industry
Priority to TW91137148A
Application granted
Publication of TW589549B
Publication of TW200411422A

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for calculating conceptual similarity based on a common sense tree marking method comprises the following steps. First, a common sense tree structure having a plurality of common sense trees is provided. Next, a first word and a second word are provided, the first word having at least one first definition and the second word having at least one second definition. It is then determined whether the first definition and the second definition are located in the same common sense tree. When they are, at least one score is determined for the corresponding common sense tree. Finally, the similarity of the first word and the second word is determined from the score.

Description

[Technical Field of the Invention]

The present invention relates to a method for building a synonym thesaurus and an antonym thesaurus, and in particular to a method that calculates conceptual similarity according to a common sense tree marking method and builds a synonym thesaurus and an antonym thesaurus from the result.

[Prior Art]

Existing synonym thesauri can usually indicate only whether two words are synonyms; they cannot express how similar the definitions of two words are. As a result, search engines and semantic-similarity applications such as document similarity comparison and automatic document summarization cannot provide accurate and efficient service. For example, "teacher" (老師) and "professor" (教授) are not synonyms, yet both include the definitions "person", "teaching", and "education". A conventional synonym thesaurus therefore offers no fast and objective rule for adding new words.

[Summary of the Invention]

In view of this, the main object of the present invention is to provide a method that builds a synonym thesaurus and an antonym thesaurus by calculating the similarity of words.

This object is achieved by the method for calculating conceptual similarity according to a common sense tree marking method provided by the present invention. The method according to an embodiment of the invention includes the following steps. First, a common sense tree structure having a plurality of common sense trees is provided. Next, a first word and a second word are provided, the first word having at least one first definition and the second word having at least one second definition. It is then determined whether the first definition and the second definition are located in the same common sense tree. When the first definition and the second definition fall in the same common sense tree, at least one score is determined for the corresponding common sense tree. Finally, the similarity of the first word and the second word is determined from the score.

To make the above and other objects, features, and advantages of the present invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings.

[Embodiments]

Refer to FIG. 1, a schematic diagram showing the operation flowchart of the method for calculating conceptual similarity according to a common sense tree marking method in an embodiment of the present invention.

First, in step S10, two words are extracted from an article, for example "teacher" (教師) and "tutor" (導師). A common sense tree structure has a plurality of common sense trees, for example an entity common sense tree, an attribute common sense tree, an event common sense tree, and a secondary feature common sense tree (see the attachment). The common sense trees serve as the dictionary of words used by the computer system and contain the definitions of different words. The lower the level at which definitions sit in a common sense tree, the higher their similarity.

In step S12, the definitions contained in the two words are looked up in the common sense tree structure. For the word "teacher", the definition "person" is found in the entity common sense tree, "occupation" in the attribute common sense tree, "teach" in the event common sense tree, and "education" in the secondary feature common sense tree. For the word "tutor", one definition, "person", is found in the entity common sense tree; two definitions, "guide" and "teach", are found in the event common sense tree; and two definitions, "desired" and "education", are found in the secondary feature common sense tree.

Definitions contained in "teacher":

  • human|人: entity common sense tree
  • occupation|職位: attribute common sense tree
  • teach|教: event common sense tree
  • education|教育: secondary feature common sense tree

Definitions contained in "tutor":

  • human|人: entity common sense tree
  • guide|引導: event common sense tree
  • desired|良: secondary feature common sense tree
  • teach|教: event common sense tree
  • education|教育: secondary feature common sense tree

Next, in step S14, it is determined whether the definitions contained in the two words are located in the same common sense tree. The definitions of the two words are compared one by one to decide whether each pair lies in the same tree.

In step S16, when two definitions are located in the same common sense tree, their score is calculated. If the two definitions have the same wording, the same sememe (義元), and the same English spelling, the common sense tree is given a score that represents the highest similarity the two definitions can have in that tree. The score is the deepest level of the tree (TheDeepestLevel) divided by the deepest level plus one: deepest level / (deepest level + 1).

For example, the definition "person" in "teacher" and the definition "person" in "tutor" have the same wording, sememe, and English spelling, so the calculation gives a score of 8/(8+1), the highest similarity in that tree, where 8 is the deepest level of the entity common sense tree. Similarly, the definition "teach" in "teacher" and the definition "teach" in "tutor" are identical in wording, sememe, and English spelling, so the calculation gives a score of 13/(13+1), again the highest similarity in that tree.
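Steps S12 to S16 for identical definitions can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tree keys, the list layout, and the function names are assumptions introduced here, while the definitions and deepest levels (8, 13, 2) are the ones quoted in the worked example.

```python
from fractions import Fraction

# Deepest level of each tree, as quoted in the worked example.
DEEPEST_LEVEL = {"entity": 8, "event": 13, "secondary_feature": 2}

# Each definition is a (label, tree) pair; the tree keys are hypothetical short names.
TEACHER = [
    ("human|人", "entity"),
    ("occupation|職位", "attribute"),
    ("teach|教", "event"),
    ("education|教育", "secondary_feature"),
]
TUTOR = [
    ("human|人", "entity"),
    ("guide|引導", "event"),
    ("desired|良", "secondary_feature"),
    ("teach|教", "event"),
    ("education|教育", "secondary_feature"),
]

def shared_trees(defs_a, defs_b):
    """Step S14: trees that hold at least one definition of each word."""
    return {tree for _, tree in defs_a} & {tree for _, tree in defs_b}

def identical_definitions(defs_a, defs_b):
    """Definitions that both words carry with the same label in the same tree."""
    return sorted(set(defs_a) & set(defs_b))

def max_score(deepest_level):
    """Step S16 for identical definitions: deepest level / (deepest level + 1)."""
    return Fraction(deepest_level, deepest_level + 1)

if __name__ == "__main__":
    for label, tree in identical_definitions(TEACHER, TUTOR):
        print(tree, label, max_score(DEEPEST_LEVEL[tree]))
```

Exact fractions are used so that the printed scores keep the form given in the text (8/9, 13/14, 2/3) instead of rounded decimals.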

In the score 13/(13+1) for the definition "teach", 13 is the deepest level of the event common sense tree. Likewise, the definition "education" in "teacher" and the definition "education" in "tutor" have the same wording, sememe, and English spelling, so the calculation gives a score of 2/(2+1), the highest similarity in that tree, where 2 is the deepest level of the secondary feature common sense tree.

If the two definitions differ in wording, sememe, or English spelling, the deepest common parent node of the two definitions in the shared common sense tree is found, and the score is the value of that node divided by the deepest level of the tree (TheDeepestLevel) plus one: deepest common parent node value / (deepest level of the common sense tree + 1). In addition, if the two definitions are antonyms or opposites, a minus sign is applied to the calculated score to give a negative score.

  • Entity common sense tree: score(human|人, human|人) = 8/(8+1)
  • Event common sense tree: score(teach|教, teach|教) = 13/(13+1)
  • Secondary feature common sense tree: score(education|教育, education|教育) = 2/(2+1)

Next, in step S18, the calculated scores are recorded in a record table, for example the three scores 8/(8+1), 13/(13+1), and 2/(2+1). The record table also records the number of common sense trees that have a score, for example 3 (entity, event, and secondary feature common sense trees), and the number of common sense trees over which the definitions of the two words are spread, for example 4 (entity, event, secondary feature, and attribute common sense trees).
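The record table of step S18 can be pictured as a small structure holding the per-tree scores and the two counts. The sketch below assumes a dictionary layout and field names (`scored_trees`, `spanned_trees`) that do not appear in the source; the scores and the tree memberships are those of the teacher/tutor example.

```python
from fractions import Fraction

# Trees in which each word has at least one definition (from the lists in the text).
TEACHER_TREES = {"entity", "attribute", "event", "secondary_feature"}
TUTOR_TREES = {"entity", "event", "secondary_feature"}

# Per-tree scores from the worked example.
SCORES = {
    "entity": Fraction(8, 9),
    "event": Fraction(13, 14),
    "secondary_feature": Fraction(2, 3),
}

def build_record_table(scores, trees_a, trees_b):
    """Collect the scores plus the two counts the later steps rely on."""
    return {
        "scores": dict(scores),
        # Number of trees that received a score.
        "scored_trees": len(scores),
        # Number of trees holding any definition of either word.
        "spanned_trees": len(trees_a | trees_b),
    }

if __name__ == "__main__":
    table = build_record_table(SCORES, TEACHER_TREES, TUTOR_TREES)
    print(table["scored_trees"], table["spanned_trees"])
```

The attribute tree counts toward the spanned trees because "teacher" has the definition "occupation" there, even though no pair of definitions is scored in it.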

In step S20, the record table is read to produce a ratio. The number of common sense trees in the record table that have a score, for example 3, is divided by the number of common sense trees over which the definitions of the two words are spread, for example 4, giving a ratio of 3/4.

In step S22, a similarity is calculated from the information in the record table and the ratio. The sum of the scores in the record table is divided by the number of common sense trees that have a score and then multiplied by the ratio. For example, ((8/(8+1) + 13/(13+1) + 2/(2+1)) / 3) * 3/4 gives the average depth, within the common sense trees, of the definitions contained in these three trees, where (8/(8+1) + 13/(13+1) + 2/(2+1)) / 3 is the mean of the scores of the three common sense trees. To this is added the breadth of the definitions contained in the common sense trees, which is the ratio multiplied by (1 minus the ratio), for example 3/4 * (1 - 3/4). The result is the similarity, for example 0.8085:

((8/(8+1) + 13/(13+1) + 2/(2+1)) / 3) * 3/4 + (3/4 * (1 - 3/4)) = 0.8085

In general form:

(sum of scores / number of common sense trees with a score * ratio) + (ratio * (1 - ratio))

Finally, refer to FIG. 3, a schematic diagram showing the associated thesaurus of the method for calculating conceptual similarity according to a common sense tree marking method in an embodiment of the present invention. In step S24, a synonym thesaurus and an antonym thesaurus are built from the similarity of the two words. The calculated similarity of the two words is stored in the associated thesaurus, the similarity between each word in the associated thesaurus and the other words is tallied, a synonym thesaurus is built from words with high similarity, and an antonym thesaurus is built from words with low similarity.
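Steps S20 to S24 can be sketched as below. The `similarity` function follows the general form given in the text. The handling of empty inputs, the function names, and in particular the `high` and `low` thresholds in `build_thesauri` are assumptions made for illustration, since the source does not say where "high" and "low" similarity begin.

```python
def similarity(scores, scored_trees, spanned_trees):
    """(sum of scores / scored trees * ratio) + (ratio * (1 - ratio))."""
    if scored_trees == 0 or spanned_trees == 0:
        # Assumption: with nothing to compare, report no similarity.
        return 0.0
    ratio = scored_trees / spanned_trees      # step S20
    average = sum(scores) / scored_trees      # mean score of the scored trees
    return average * ratio + ratio * (1 - ratio)  # step S22

def build_thesauri(pair_similarities, high=0.8, low=0.0):
    """Step S24 sketch: split word pairs by hypothetical thresholds."""
    synonyms, antonyms = [], []
    for pair, value in pair_similarities.items():
        if value >= high:
            synonyms.append(pair)
        elif value <= low:
            antonyms.append(pair)
    return synonyms, antonyms

if __name__ == "__main__":
    # Teacher/tutor example: three scored trees out of four spanned trees.
    teacher_tutor = similarity([8 / 9, 13 / 14, 2 / 3], 3, 4)
    print(round(teacher_tutor, 4))  # 0.8085
```

With the teacher/tutor scores the function reproduces the 0.8085 of the worked example. Pairs whose similarity falls between the two thresholds are left out of both thesauri in this sketch.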

The following embodiment illustrates the case in which the definitions differ in wording, sememe, or English spelling, so that the score is calculated from the deepest common parent node of the two definitions in the shared common sense tree: deepest common parent node value / (deepest level of the common sense tree (TheDeepestLevel) + 1).

First, two words are extracted from an article, for example "poor and lowly" (貧賤) and "rich and noble" (富貴). Next, the definitions contained in the two words are looked up in the common sense tree structure. The word 貧賤 contains the definitions "attribute value", "rank", "low rank", "undesired", "richness", and "poor", where "attribute value" and "poor" are located in the attribute value common sense tree, "rank", "low rank", and "richness" are located in the attribute common sense tree, and "undesired" is located in the secondary feature common sense tree. The word 富貴 contains the definitions "attribute value", "richness", "rich", and "desired", where "attribute value" and "rich" are located in the attribute value common sense tree, "richness" is located in the attribute common sense tree, and "desired" is located in the secondary feature common sense tree.

Definitions contained in 貧賤:

  • aValue|屬性值: attribute value common sense tree
  • rank|等級: attribute common sense tree
  • LowRank|低等: attribute common sense tree
  • undesired|莠: secondary feature common sense tree
  • richness|貧富: attribute common sense tree
  • poor|窮: attribute value common sense tree

Definitions contained in 富貴:

  • aValue|屬性值: attribute value common sense tree
  • richness|貧富: attribute common sense tree
  • rich|富: attribute value common sense tree
  • desired|良: secondary feature common sense tree

Next, it is determined whether the definitions of the two words are located in the same common sense tree. If two definitions in the same common sense tree have the same wording, sememe, and English spelling, the tree is given a score that represents the highest similarity. For example, the definition "richness" of 貧賤 and the definition "richness" of 富貴 have the highest similarity in the attribute common sense tree.

Next, refer to FIG. 2, a schematic diagram showing the secondary feature common sense tree according to an embodiment of the present invention. The definition "undesired" (莠) of 貧賤 and the definition "desired" (良) of 富貴 differ in wording, sememe, and English spelling, so the deepest common parent node value of the two definitions is searched for in the secondary feature common sense tree, for example 2, and the scoring formula above gives a score of 2/(2+1). Because "undesired" and "desired" are antonyms, a minus sign is applied to the calculated score, producing a negative score of -(2/(2+1)).

Similarly, the definition "poor" (窮) of 貧賤 and the definition "rich" (富) of 富貴 differ in wording, sememe, and English spelling, so the deepest common parent node value of the two definitions is searched for in the attribute value common sense tree, for example 3, and the scoring formula (deepest common parent node value / (deepest level of the common sense tree + 1)) gives a score of 3/(4+1). Because "poor" and "rich" are antonyms, a minus sign is applied to the calculated score, producing a negative score of -(3/(4+1)).
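The deepest-common-parent scoring applied to "poor" and "rich" above can be sketched with a toy tree. Everything about the tree is hypothetical: the intermediate node names are invented, the root is assumed to sit at level 1, and the "deepest common parent node value" is assumed to mean that node's level. The tree is shaped only so that its deepest level is 4 and the common parent of "poor" and "rich" sits at level 3, matching the numbers in the example.

```python
from fractions import Fraction

# Hypothetical toy tree as a child -> parent map (None marks the root).
PARENT = {
    "aValue": None,
    "property": "aValue",
    "wealth": "property",
    "poor": "wealth",
    "rich": "wealth",
}

def path_to_root(node, parent):
    """Nodes from `node` up to the root, starting with `node` itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def level(node, parent):
    """Level of a node, counting the root as level 1 (assumption)."""
    return len(path_to_root(node, parent))

def deepest_level(parent):
    return max(level(node, parent) for node in parent)

def deepest_common_parent(a, b, parent):
    """First strict ancestor of b that is also a strict ancestor of a."""
    ancestors_a = set(path_to_root(a, parent)[1:])
    for node in path_to_root(b, parent)[1:]:
        if node in ancestors_a:
            return node
    return None

def pair_score(a, b, parent, antonyms=False):
    """Identical pair: deepest / (deepest + 1); otherwise use the common parent."""
    deepest = deepest_level(parent)
    if a == b:
        value = deepest
    else:
        value = level(deepest_common_parent(a, b, parent), parent)
    score = Fraction(value, deepest + 1)
    return -score if antonyms else score

if __name__ == "__main__":
    print(pair_score("poor", "rich", PARENT, antonyms=True))  # -3/5
```

Whether two definitions are antonyms is passed in as a flag because the source does not describe how that relation is detected.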

  • Attribute common sense tree: score(richness|貧富, richness|貧富) = 3/(3+1)
  • Attribute value common sense tree: score(poor|窮, rich|富) = -(3/(4+1))
  • Secondary feature common sense tree: score(undesired|莠, desired|良) = -(2/(2+1))

Next, the calculated scores are recorded in a record table. The record table records the number of common sense trees that have a score, for example 3 (attribute, attribute value, and secondary feature common sense trees), and the number of common sense trees over which the definitions of the two words are spread, for example 3 (attribute, attribute value, and secondary feature common sense trees). The information in the record table is read, and the number of common sense trees that have a score, for example 3, is divided by the number of common sense trees over which all the definitions are spread, for example 3, producing a ratio, for example 3/3.

Finally, from the information in the record table and the ratio, the average depth within the common sense trees of the definitions contained in these three trees is calculated and the breadth of the definitions is added, producing a similarity, which this example states as -0.5166:

(3/(3+1)+(-3/(4+1)+(-2/(2+1)))/3)*3/3+(3/3*(1-3/3)) = -0.5166

As described above, the method of the present invention for calculating conceptual similarity according to a common sense tree marking method correctly expresses the similarity between two words and provides a fast and objective rule for adding new words, so it has considerable industrial value in knowledge management applications.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention, so the scope of protection of the present invention is defined by the appended claims.

Brief Description of the Drawings

FIG. 1 is a schematic diagram showing the operation flowchart of the method for calculating conceptual similarity according to a common sense tree marking method in an embodiment of the present invention.

FIG. 2 is a schematic diagram showing the secondary feature common sense tree according to an embodiment of the present invention.

FIG. 3 is a schematic diagram showing the associated thesaurus of the method for calculating conceptual similarity according to a common sense tree marking method in an embodiment of the present invention.

[Description of Symbols]

  • S10: extract two words;
  • S12: look up the definitions contained in the two words in the common sense tree structure;
  • S14: determine whether the definitions contained in the two words are located in the same common sense tree;
  • S16: when two definitions are located in the same common sense tree, calculate the score of the two definitions;
  • S18: record the scores in a record table;
  • S20: read the record table to produce a ratio;
  • S22: calculate a similarity value from the information in the record table and the ratio;
  • S24: build a synonym thesaurus and an antonym thesaurus from the similarity of the two words;
  • 30: associated thesaurus.


Claims (1)

1. A method for calculating conceptual similarity according to a common sense tree marking method, comprising the following steps: providing a common sense tree structure having a plurality of common sense trees; providing a first word and a second word, the first word having at least one first definition and the second word having at least one second definition; determining whether the first definition and the second definition are located in the same common sense tree; when the first definition and the second definition are located in the same common sense tree, determining at least one score for the corresponding common sense tree; and determining the similarity of the first word and the second word according to the score.

2. The method for calculating conceptual similarity according to a common sense tree marking method of claim 1, wherein when the first definition and the second definition are located in the same common sense tree and have the same wording, the same sememe, and the same English spelling, the common sense tree shared by the first definition and the second definition is given a score, the score being the highest similarity that the first definition and the second definition have in the common sense tree.

3. The method for calculating conceptual similarity according to a common sense tree marking method of claim 2, wherein the highest similarity of the common sense tree is calculated as (deepest level of the tree / (deepest level of the tree + 1)).

4. The method for calculating conceptual similarity according to a common sense tree marking method of claim 1, wherein the steps further include: if the first definition and the second definition differ in wording, sememe, or English spelling, finding the deepest common parent node in the common sense tree and calculating a score for the common sense tree.

5. The method for calculating conceptual similarity according to a common sense tree marking method of claim 4, wherein the score of the common sense tree is calculated as (deepest common parent node value / (deepest level of the common sense tree + 1)).

6. The method for calculating conceptual similarity according to a common sense tree marking method of claim 1, further including a step of recording the calculated positive and negative scores in a record table.

7. The method for calculating conceptual similarity according to a common sense tree marking method of claim 6, wherein the record table further records the total number T1 of common sense trees that have a score in the record table and the total number T2 of common sense trees over which the first definition and the second definition are spread.

8. The method for calculating conceptual similarity according to a common sense tree marking method of claim 7, wherein T1 is divided by T2 to produce a ratio.

9. The method for calculating conceptual similarity according to a common sense tree marking method of claim 1, wherein the steps further include: when the first definition and the second definition are antonyms or opposites, applying a minus sign to the score to produce a negative score.

10. The method for calculating conceptual similarity according to a common sense tree marking method of claim 6, wherein a similarity value is produced from the record table as the sum of the scores divided by the total number of common sense trees that have a score in the record table, plus the ratio multiplied by (1 minus the ratio): (sum of scores / number of common sense trees with a score in the record table * ratio) + (ratio * (1 - ratio)).

11. The method for calculating conceptual similarity according to a common sense tree marking method of claim 10, wherein the similarity is stored in an associated thesaurus, the similarity between each word in the associated thesaurus and the other words is tallied, a synonym thesaurus is built from the words with high similarity, and an antonym thesaurus is built from the words with low similarity.

12. A method for calculating conceptual similarity according to a common sense tree marking method, comprising the following steps: providing a common sense tree structure; providing a first word and a second word, the first word having at least one first definition and the second word having at least one second definition; and determining the similarity of the first word and the second word according to the relation between the first definition and the second definition in the common sense tree structure.

13. The method for calculating conceptual similarity according to a common sense tree marking method of claim 12, wherein when the first definition and the second definition are located in the same common sense tree and have the same wording, the same sememe, and the same English spelling, the common sense tree shared by the first definition and the second definition is given a score, the score being the highest similarity that the first definition and the second definition have in the common sense tree.

14. The method for calculating conceptual similarity according to a common sense tree marking method of claim 13, wherein the highest similarity of the common sense tree is calculated as (deepest level of the tree / (deepest level of the tree + 1)).

15. The method for calculating conceptual similarity according to a common sense tree marking method of claim 12, wherein the steps further include: if the first definition and the second definition differ in wording, sememe, or English spelling, finding the deepest common parent node in the common sense tree and calculating a score for the common sense tree.

16. The method for calculating conceptual similarity according to a common sense tree marking method of claim 15, wherein the score of the common sense tree is calculated as (deepest common parent node value / (deepest level of the common sense tree + 1)).

17. The method for calculating conceptual similarity according to a common sense tree marking method of claim 12, wherein the steps further include recording the calculated positive and negative scores in a record table.

18. The method for calculating conceptual similarity according to a common sense tree marking method of claim 17, wherein the record table further records the total number T1 of common sense trees that have a score in the record table and the total number T2 of common sense trees over which the first definition and the second definition are spread.

19. The method for calculating conceptual similarity according to a common sense tree marking method of claim 18, wherein T1 is divided by T2 to produce a ratio.

20. The method for calculating conceptual similarity according to a common sense tree marking method of claim 12, wherein the steps further include: when the first definition and the second definition are antonyms or opposites, applying a minus sign to the score to produce a negative score.

21. The method for calculating conceptual similarity according to a common sense tree marking method of claim 17, wherein a similarity value is produced from the record table as the sum of the scores divided by the total number of common sense trees that have a score in the record table, plus the ratio multiplied by (1 minus the ratio): (sum of scores / number of common sense trees with a score in the record table * ratio) + (ratio * (1 - ratio)).

22. The method for calculating conceptual similarity according to a common sense tree marking method of claim 21, wherein the similarity is stored in an associated thesaurus, the similarity between each word in the associated thesaurus and the other words is tallied, a synonym thesaurus is built from the words with high similarity, and an antonym thesaurus is built from the words with low similarity.
The method for calculating the similarity of a concept according to the common sense tree designation method described in item 21 of the scope of patent application, wherein the similarity is stored in a related dictionary, and each word in the related dictionary is counted with other The vocabulary similarity is based on the above high similarity vocabulary to establish a thesaurus, and the low similarity vocabulary is based on the antonym. 0213-8982TWF(nl);STLC-01-B9192;FRANKLIN.ptd 第18頁0213-8982TWF (nl); STLC-01-B9192; FRANKLIN.ptd Page 18
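The scoring arithmetic recited in claims 13 to 21 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several points are assumptions rather than statements from the claims: each definition's position in a common sense tree is modeled as a root-to-node path of labels, levels are counted from 1 at the root, the "value of the deepest common parent node" is read as that node's level, two definitions with identical paths stand in for definitions with the same literal form, semantic element, and English spelling, and the similarity formula is read left to right as (S / T1) * R + R * (1 - R). All function names are hypothetical.

```python
def identical_score(bottom_level):
    """Highest similarity a tree can give: bottom / (bottom + 1)."""
    return bottom_level / (bottom_level + 1)


def common_parent_level(path_a, path_b):
    """Level of the deepest shared ancestor (assumption: root is level 1).

    Returns 0 when the two paths do not even share a root.
    """
    level = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        level += 1
    return level


def tree_score(path_a, path_b, bottom_level, opposite=False):
    """Score for one common sense tree.

    Identical paths get the tree's highest similarity; otherwise the score
    is the deepest common parent level divided by (bottom level + 1).
    A negative sign is attached when the definitions are opposites.
    """
    if path_a == path_b:
        score = identical_score(bottom_level)
    else:
        score = common_parent_level(path_a, path_b) / (bottom_level + 1)
    return -score if opposite else score


def similarity(scores, t2):
    """Combine the record table into one similarity value.

    scores: one score per tree that has a score in the record table (T1 = len).
    t2: number of trees across which the two definitions are distributed.
    Assumed precedence: (S / T1) * R + R * (1 - R), with R = T1 / T2.
    """
    t1 = len(scores)
    if t1 == 0 or t2 == 0:
        return 0.0  # assumption: no scored tree means no measurable similarity
    ratio = t1 / t2
    return (sum(scores) / t1) * ratio + ratio * (1 - ratio)


if __name__ == "__main__":
    dog = ("entity", "animal", "dog")
    cat = ("entity", "animal", "cat")
    table = [tree_score(dog, dog, 4), tree_score(dog, cat, 4)]
    print(similarity(table, t2=4))
```

Under these assumptions, a tree whose lowest level is 4 gives identical definitions a score of 4/5, while two definitions whose deepest common parent sits at level 2 receive 2/5. The ratio term rewards definition pairs whose scored trees cover a larger share of the trees they appear in.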
TW91137148A 2002-12-24 2002-12-24 Method for calculating conceptual similarity based on common sense tree marking method TW589549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW91137148A TW589549B (en) 2002-12-24 2002-12-24 Method for calculating conceptual similarity based on common sense tree marking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW91137148A TW589549B (en) 2002-12-24 2002-12-24 Method for calculating conceptual similarity based on common sense tree marking method

Publications (2)

Publication Number Publication Date
TW589549B (en) 2004-06-01
TW200411422A (en) 2004-07-01

Family

ID=34058081

Family Applications (1)

Application Number Title Priority Date Filing Date
TW91137148A TW589549B (en) 2002-12-24 2002-12-24 Method for calculating conceptual similarity based on common sense tree marking method

Country Status (1)

Country Link
TW (1) TW589549B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI419071B (en) * 2010-03-05 2013-12-11 Ceci Engineering Consultants Inc Active knowledge management system, method and computer program product for problem solving

Also Published As

Publication number Publication date
TW200411422A (en) 2004-07-01

Similar Documents

Publication Publication Date Title
Schwarm et al. Reading level assessment using support vector machines and statistical language models
Banerjee et al. An adapted Lesk algorithm for word sense disambiguation using WordNet
Heilman Automatic factual question generation from text
Pino et al. A selection strategy to improve cloze question quality
WO2019165678A1 (en) Keyword extraction method for mooc
Chen et al. Automated essay scoring by capturing relative writing quality
Rickford et al. African American, Creole, and other vernacular Englishes in education: A bibliographic resource
Babaii et al. Author-assigned keywords in research articles: Where do they come from
Majumder et al. Automatic selection of informative sentences: The sentences that can generate multiple choice questions
Narendra et al. Automatic cloze-questions generation
McNamee et al. Learning named entity hyponyms for question answering
Odom et al. Some “cloze” technique studies of language capability in the deaf
TW589549B (en) Method for calculating conceptual similarity based on common sense tree marking method
Wang A corpus-based study of English vocabulary in art research articles
Agarwal Cloze and open cloze question generation systems and their evaluation guidelines
Russu et al. An opinion mining approach for Romanian language
Quebec Morphologic Segmentation Linearity in Jose Garcia Villa's PROEM
Nurdiani et al. Morphosyntactic Analysis of Political News on Online Tempo
Yang A Semantic FAQ System for Online Community Learning.
Monokpo et al. Adequacy, currency and influence of organization of information resources on undergraduate students’ utilization of information resources in academic libraries in Rivers State, Nigeria
Rose Automatic Word Quiz Construction Using Regular and Simple English Wikipedia
Okpala Use of Electronic Resources by National Open University of Nigeria (NOUN) Undergraduate and Postgraduate Students
van Helden Essay: How to survive in a world of publication metrics in public sector accounting research
Taşkın “Evaluation” Game in Book Publishing
Kittler Jam Sessions: Celant and Battisti, Modern and Early Modern Connections

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees