TW469386B - Document retrieval and classification method and apparatus - Google Patents

Publication number
TW469386B
Authority
TW
Taiwan
Prior art keywords
classification
retrieval
search
retrieved
document
Application number
TW89117245A
Other languages
Chinese (zh)
Inventor
Naohiko Noguchi
Yuji Sugano
Mitsuhiro Sato
Kai Itou
Takao Fukushige
Original Assignee
Matsushita Electric Ind Co Ltd
Application filed by Matsushita Electric Ind Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In a document retrieval and classification system, a retrieving operation is performed on a database of documents in accordance with retrieval conditions entered by a user, so as to pick up the intended documents. The user can then input classification standards for a plurality of classifications in response to the retrieved documents. The classification standards are converted into retrieval conditions, and the similarity between these converted retrieval conditions and the retrieved documents is calculated. An attribute of each retrieved document with respect to each classification is then calculated with reference to the similarity, and each retrieved document is classified into the classification for which its attribute is highest.

Description

Field of the Invention

The present invention relates to a document retrieval and classification method for searching, as desired, for documents in a database that stores a collection of electronic document data. The present invention further relates to a document retrieval and classification system that implements this method.

Background of the Invention

The present invention is applicable to various kinds of document information stored in a database on a memory device installed in a word processor, an office computer, a personal computer, or the like, and to information on storage media that can be loaded into such devices.

Recent developments in data communication, including electronic mail, electronic catalogs, and electronic publishing, have given users access to a large volume of document information. The number of Internet users has also increased greatly. Consequently, demand has grown for searching or collecting documents as desired from huge databases, together with demand for classifying the retrieved documents as desired.

In conventional document retrieval and classification systems, however, the retrieval conditions and classification standards are usually fixed in advance or according to the user's preferences. In this respect, conventional systems are static with regard to the viewpoints of retrieval and classification.

Summary of the Invention

An object of the present invention is to improve flexibility in document retrieval and classification.

Another object of the present invention is to allow users to change the viewpoints of the retrieval conditions and classification standards arbitrarily.
Another object of the present invention is to allow users to perform retrieval and classification operations in response to retrieval results, based on their judgment at that moment.

Another object of the present invention is to realize automatic classification that assists the user's mental activity.

To accomplish the above and other related objects, the present invention provides a first document retrieval and classification system comprising an input/output device that allows a user to input retrieval conditions and classification standards. A retrieval device performs a retrieval operation on a database of documents according to retrieval conditions containing arbitrary words and character strings, and calculates the similarity between the retrieved documents and the retrieval conditions. A retrieval result storage device stores the retrieved documents picked up by the retrieval operation. A classification standard conversion device converts classification standards into retrieval conditions; each classification standard is expressed as a set of arbitrary words or arbitrary character strings. A retrieval result classification device classifies the retrieved documents according to a plurality of classification standards.

The present invention can therefore provide a flexible document retrieval and classification system that assists mental activity during document retrieval and classification.

According to a preferred embodiment of the present invention, the retrieval device responds to retrieval conditions input by the user via the input/output device and performs the retrieval operation on the database of documents according to those conditions. The retrieval result storage device stores the retrieved documents picked up by that retrieval operation. The classification standard conversion device responds to a plurality of classification standards input by the user via the input/output device and generates converted retrieval conditions from them. The retrieval device calculates the similarity between the converted retrieval conditions and the retrieved documents and stores it in the retrieval result storage device. The retrieval result classification device then refers to the calculated similarity to compute the attribute of each retrieved document with respect to each classification standard, thereby classifying the documents.

With this arrangement, users can input as retrieval conditions whatever words or character strings come to mind during the retrieval operation. Users can also classify the retrieval results in any way they wish.

According to a preferred embodiment of the present invention, the input/output device allows the user to input a plurality of classification standards, each consisting of a set of arbitrary words or arbitrary character strings, and the classification standard conversion device converts each set into retrieval conditions.
With this arrangement, users can input any words or character strings that come to mind as classification standards (that is, as viewpoints of classification). This gives great flexibility in setting the viewpoints of classification.

According to a preferred embodiment of the present invention, the document retrieval and classification system further comprises a keyword detection device for extracting keywords from an arbitrary sentence or document. In this case, the keyword detection device responds to a plurality of classification standards expressed as arbitrary sentences input by the user via the input/output device, and extracts keywords from the input sentences. The classification standard conversion device converts each set of extracted keywords into retrieval conditions.

With this arrangement, users can directly input arbitrary sentences belonging to a desired field as classification standards. This makes it possible to express complex viewpoints of classification, so the viewpoints can be set flexibly and from multiple angles.

According to a preferred embodiment of the present invention, the input/output device allows the user to designate a plurality of documents for use as the classification standards. The designated documents are selected from the retrieved documents picked up by the retrieval operation. The keyword detection device extracts keywords from the designated documents, and the classification standard conversion device converts each set of extracted keywords into retrieval conditions.

With this arrangement, after checking the retrieved documents, users can select a retrieved document itself, or part of it, to express a viewpoint of classification. The viewpoints of classification can therefore be set easily.

Furthermore, the present invention provides a second document retrieval and classification system comprising an input/output device that allows a user to input retrieval conditions. A retrieval device performs a retrieval operation on a database of documents according to retrieval conditions containing arbitrary words and character strings, and calculates the similarity between the retrieved documents and the retrieval conditions. A retrieval result storage device stores the retrieved documents picked up by the retrieval operation. A keyword detection device extracts keywords from the retrieved documents. An automatic keyword classification device automatically classifies the extracted keywords into a plurality of clusters. A classification standard conversion device converts classification standards into retrieval conditions, each classification standard being the set of keywords classified into one cluster. A retrieval result classification device classifies the set of retrieved documents according to the classification standards.

The present invention can therefore provide an automatic document retrieval and classification system that assists mental activity during document retrieval and classification.

According to a preferred embodiment of the present invention, the retrieval device responds to retrieval conditions input by the user via the input/output device and performs the retrieval operation on the database of documents according to those conditions. The retrieval result storage device stores the retrieved documents picked up by that retrieval operation. The keyword detection device extracts keywords from the retrieved documents. The automatic keyword classification device automatically classifies the extracted keywords into a plurality of clusters. The classification standard conversion device generates converted retrieval conditions from the classification standards, each of which is the set of keywords classified into one cluster. The retrieval device calculates the similarity between the converted retrieval conditions and the retrieved documents and stores it in the retrieval result storage device. The retrieval result classification device then refers to the calculated similarity to compute the attribute of each retrieved document with respect to each classification standard, thereby classifying the documents.

With this arrangement, the viewpoints of classification inherent in the retrieval results can be extracted automatically, without relying on classification standards input by the user. Users can obtain, automatically and without special effort, viewpoints of classification they had not thought of. As a result, the work of classifying documents can be assisted effectively.
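The automatic grouping of extracted keywords into clusters, each cluster then serving as one classification standard, can be illustrated with a small sketch. The text above does not say how the clusters are formed, so the method below is purely an assumption: keywords are grouped greedily when the sets of retrieved documents containing them overlap strongly (Jaccard overlap). The names `jaccard`, `cluster_keywords`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets of document numbers (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(keyword_docs, threshold=0.5):
    """Greedy single-pass clustering (an assumed method, not from the patent).

    keyword_docs maps each extracted keyword to the set of retrieved
    documents that contain it. A keyword joins the first cluster whose
    seed keyword shares enough documents with it; otherwise it starts
    a new cluster.
    """
    clusters = []  # each cluster is a list of keywords; element 0 is the seed
    for keyword, docs in keyword_docs.items():
        for cluster in clusters:
            seed_docs = keyword_docs[cluster[0]]
            if jaccard(docs, seed_docs) >= threshold:
                cluster.append(keyword)
                break
        else:
            clusters.append([keyword])
    return clusters


# Hypothetical keywords and the retrieved documents in which they occur.
keyword_docs = {
    "rice price": {1, 2, 3},
    "staple food act": {1, 2},
    "North Korea": {4, 5, 6},
    "China": {5, 6},
    "US military": {7, 8},
}

standards = cluster_keywords(keyword_docs)
for number, words in enumerate(standards, start=1):
    print(f"classification standard {number}: {words}")
```

Each resulting cluster plays the role of one classification standard, which would then be converted into retrieval conditions in the same way as a user-supplied set of words.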
Furthermore, the present invention provides a first document retrieval and classification method comprising: a step of performing a retrieval operation on a database of documents according to retrieval conditions input by a user, so as to pick up retrieved documents as desired; a step of allowing the user to input classification standards for a plurality of classifications in response to the retrieved documents; a step of converting the classification standards into retrieval conditions; a step of calculating the similarity between the converted retrieval conditions and the retrieved documents; and a step of calculating, with reference to the similarity, the attribute of each retrieved document with respect to each classification, thereby classifying each retrieved document into the classification with the highest attribute.

With this method, users can input as retrieval conditions whatever words come to mind during the retrieval operation, and can classify the retrieval results in any way they wish. The present invention can thus assist mental activity during document retrieval and classification.

According to a preferred embodiment of the present invention, when the user inputs a set of arbitrary words or arbitrary character strings as the classification standard for each classification, the input words or character strings are converted into retrieval conditions, and the similarity between the converted retrieval conditions and the retrieved documents is calculated.

With this method, users can input any words or character strings that come to mind as classification standards (that is, as viewpoints of classification). This gives great flexibility in setting the viewpoints of classification.

According to a preferred embodiment of the present invention, when the user inputs arbitrary sentences as the classification standards for the classifications, keywords are extracted from each sentence, each set of extracted keywords is converted into retrieval conditions, and the similarity between the converted retrieval conditions and the retrieved documents is calculated.

With this method, users can directly input arbitrary sentences belonging to a desired field as classification standards. This makes it possible to express complex viewpoints of classification, so the viewpoints can be set flexibly and from multiple angles.

According to a preferred embodiment of the present invention, the user designates, from among the retrieved documents, a plurality of documents for use as the classification standards. Keywords are then extracted from the designated documents, each set of extracted keywords is converted into retrieval conditions, and the similarity between the converted retrieval conditions and the retrieved documents is calculated.

With this method, after checking the retrieved documents, users can select a retrieved document itself, or part of it, to express a viewpoint of classification. The viewpoints of classification can therefore be set easily.
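The step of converting a classification standard (a set of words) into retrieval conditions can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that the words of one standard are joined with OR and that the result is optionally joined with AND to the user's original retrieval condition, which is the pattern used in the example of the first embodiment. The function name `to_retrieval_condition` and its parameters are hypothetical.

```python
def to_retrieval_condition(standard_words, base_condition=None):
    """Build a Boolean retrieval condition string from one classification standard.

    standard_words: the words the user entered for this classification.
    base_condition: the original retrieval condition, if the converted
    condition should be restricted to it (an assumption of this sketch).
    """
    if not standard_words:
        raise ValueError("a classification standard needs at least one word")
    condition = "(" + " OR ".join(standard_words) + ")"
    if base_condition is None:
        return condition
    return f"{condition} AND {base_condition}"


# Original retrieval condition: rice (kanji) OR rice (katakana) OR policy.
base = "(米 OR コメ OR 政策)"

# Classification standard built from the words for North Korea and China.
print(to_retrieval_condition(["北朝鮮", "中国"], base))
# (北朝鮮 OR 中国) AND (米 OR コメ OR 政策)
```

Restricting each converted condition to the original query keeps the classification searches inside the set of documents that the first retrieval already found.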
Furthermore, the present invention provides a second document retrieval and classification method comprising: a step of performing a retrieval operation on a database of documents according to retrieval conditions input by a user, so as to pick up retrieved documents as desired; a step of extracting keywords from the retrieved documents; a step of classifying the extracted keywords into a plurality of clusters; a step of converting the set of extracted keywords belonging to each cluster into retrieval conditions; a step of calculating the similarity between the converted retrieval conditions and the retrieved documents; and a step of calculating, with reference to the similarity, the attribute of each retrieved document with respect to each classification, thereby classifying each retrieved document into the classification with the highest attribute.

With this method, the viewpoints of classification inherent in the retrieval results can be extracted automatically, without relying on classification standards input by the user. Users can obtain, automatically and without special effort, viewpoints of classification they had not thought of. As a result, the work of classifying documents can be assisted effectively.

Brief Description of the Drawings

The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description, read in conjunction with the accompanying drawings, in which:

Fig. 1 is a functional block diagram showing the arrangement of a document retrieval and classification system according to a first embodiment of the present invention;

Fig. 2 is a view showing retrieval results obtained by the document retrieval and classification system according to the first embodiment;

Fig. 3 is a view showing retrieval results based on the classification standards according to the first embodiment;

Fig. 4 is a view showing the calculation of attributes according to the first embodiment;

Fig. 5 is a view showing the result of document classification according to the first embodiment;

Fig. 6 is a functional block diagram showing the arrangement of a document retrieval and classification system according to a second embodiment of the present invention; and

Fig. 7 is a functional block diagram showing the arrangement of a document retrieval and classification system according to a third embodiment of the present invention.

Detailed Description of the Preferred Embodiments

Preferred embodiments of the present invention are explained below with reference to the accompanying drawings. Identical parts are denoted by the same reference numerals throughout the drawings. The embodiments are based on documents containing Japanese text, so the following explanation includes kanji and/or katakana followed by an English translation in parentheses.

First Embodiment

Fig. 1 is a functional block diagram showing the arrangement of a document retrieval and classification system for implementing a document retrieval and classification method according to the first embodiment of the present invention.
In the document retrieval and classification system shown in Fig. 1, an input/output section 21 allows the user to input retrieval conditions and classification standards, and also outputs retrieval results and classification results. A document storage section 24 stores documents. A retrieval section 23 calculates the similarity between retrieved documents and retrieval conditions. A retrieval result storage section 25 stores retrieval results such as the retrieved documents. A classification standard conversion section 22 receives the classification standards supplied from the input/output section 21 and converts them into retrieval conditions that the retrieval section 23 can process. A retrieval result classification section 26 refers to the similarity calculated by the retrieval section 23 and classifies the retrieved documents according to the classification standards.

The details of the document retrieval and classification processing according to the first embodiment are explained below.

First, the user inputs retrieval conditions to the input/output section 21. For example, the following logical (Boolean) expression 1 may be given as the retrieval conditions.

(米 OR コメ OR 政策) (1)

Here 米 is the kanji for rice, コメ is the katakana for rice, and 政策 is the kanji for policy.

The retrieval section 23 retrieves documents stored in the document storage section 24 according to the retrieval conditions. The retrieval section 23 can perform retrieval based on retrieval conditions containing arbitrary words or arbitrary character strings, and can also calculate the similarity between the retrieval results and the retrieval conditions.

Such a retrieval section may incorporate a full-text retrieval section capable of detecting all documents in which a designated word appears, as disclosed in published Japanese patent application No. 9-319766.

The similarity between the retrieval conditions and a retrieved document (that is, a retrieval result) can be expressed by the following formula.

$$S(D_j) = \sum_i f_{ij} \times \left(1 - \log\frac{d_i}{N}\right)$$

where Σ is the sum over the variable i, f_ij is the frequency of occurrence of word t_i in document D_j, d_i is the number of documents in which word t_i appears, and N is the total number of documents searched.

The formula sums the contributions of the individual words t_i contained in the retrieval conditions. It is based on the word weighting known as the TF-IDF method and on similarity calculation using an inner-product measure.

Now suppose that, in a certain retrieved document D_j, the words in the current retrieval conditions have the following frequencies of occurrence; that is, f_ij is given as follows.

米 (rice, kanji): 3
コメ (rice, katakana): 2
政策 (policy): 1

Suppose also that, among all the documents stored in the document storage section 24, the number of documents containing each word is as follows; that is, d_i is given as follows.

米 (rice, kanji): 5000
コメ (rice, katakana): 1250
政策 (policy): 2500

When N = 10000, the similarity S(D_j) of D_j is calculated as follows.

$$S(D_j) = 3 \times \left(1 - \log\frac{5000}{10000}\right) + 2 \times \left(1 - \log\frac{1250}{10000}\right) + 1 \times \left(1 - \log\frac{2500}{10000}\right) = 6 + 8 + 3 = 17$$

(These values correspond to a logarithm taken to base 2.)

Fig. 2 shows the details of the retrieval results, in which the document number of each retrieved document is displayed together with its similarity and content. According to Fig. 2, a total of 10 documents were picked up under the above retrieval condition and ranked in order of similarity. The similarity of each document is normalized by assigning 100 to the maximum value. The retrieval results are stored in the retrieval result storage section.

After viewing the output retrieval results, the user may either start a new search or classify the retrieval results obtained so far.

When the user wants to classify the retrieval results shown in Fig. 2, the user can input a plurality of classification criteria via the input/output section 21. According to the first embodiment, a classification criterion is a set of words expressing a standpoint of classification. For example, the user can input the following words as classification criteria via the input/output section 21.

Classification criterion 1: コメ (rice; katakana), 米価 (rice price), 新食糧法 (New Staple Food Law)
Classification criterion 2: 北朝鮮 (North Korea), 中国 (China), 米朝協議 (US-North Korea talks)
Classification criterion 3: 米国 (United States), 米軍 (US military)

The classification criterion conversion section 22 converts the input classification criteria into retrieval conditions that can be processed by the retrieval section 23.

For example, it is preferable to form a logical expression by appropriately combining the words input as a classification criterion, and then to connect it by AND with logical expression 1 above, which served as the preceding retrieval condition. The following are examples of converted retrieval conditions.

Retrieval condition 1: (コメ OR 米価) AND (米 OR コメ OR 政策)
Retrieval condition 2: (北朝鮮 OR 中国) AND (米 OR コメ OR 政策)
Retrieval condition 3: (米国 OR 米軍) AND (米 OR コメ OR 政策)

Connecting the preceding retrieval condition (i.e., logical expression 1) with the classification criterion by AND is preferable because it reduces the size of the population to be searched.

Next, the retrieval section 23 performs a search based on retrieval conditions 1 to 3 above and obtains the retrieval results shown in Fig. 3.

As shown in Fig. 3, the retrieval results obtained from retrieval conditions 1 to 3 are subsets of the entire set of retrieved documents shown in Fig. 2. The similarity shown for each retrieved document is the similarity value calculated with respect to the corresponding retrieval condition. Now let S(i,j) denote the similarity of a document "i" to a retrieval condition (i.e., classification criterion) "j".

Next, the retrieval result classification section 26 calculates an attribute T(i,j) of the document "i" with respect to the classification criterion "j". For example, the following formula is used to calculate the attribute T(i,j).

$$T(i,j) = c \cdot S(i,j) + (1 - c) \cdot 100 \cdot \frac{S(i,j)}{\sum_k S(i,k)} \qquad (2)$$

where Σ denotes the sum over the variable "k", and c is a constant in the range 0 < c < 1.

Formula 2 is only one example of how to obtain the attribute; the method for calculating the attribute T(i,j) is not limited to formula 2.

Fig. 4 shows the attributes T(i,j) calculated according to formula 2 for documents 1 to 10 and classifications 1 to 3 under the condition c = 0.5.

The retrieval result classification section 26 uses the following formula 3 to identify, for each document "i", the classification having the highest attribute T(i,j).

$$c(i) = \max_j \{ T(i,j) \} \qquad (3)$$

where "max" is the maximum taken over the variable "j", and c(i) denotes the classification "j" that attains this maximum.

Finally, the retrieval result classification section 26 outputs the conclusion that the document "i" belongs to the classification c(i). This conclusion is displayed or notified to the user via the input/output section 21.

Fig. 5 shows an output example of the classification finalized according to the example shown in Fig. 4.
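The similarity formula and formulas 2 and 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names are invented here, the base-2 logarithm is inferred from the worked example (6 + 8 + 3 = 17), and the similarity values passed to `attributes` are made-up numbers rather than values taken from Fig. 3.

```python
import math

def similarity(freqs, doc_counts, n_docs):
    """S(Dj) = sum over search words of f_ij * (1 - log2(d_i / N)).

    The base-2 logarithm is an assumption inferred from the worked example.
    """
    return sum(f * (1 - math.log2(doc_counts[word] / n_docs))
               for word, f in freqs.items())

def attributes(sims, c=0.5):
    """Formula 2 for one document: sims[j] is S(i,j) for each classification j."""
    total = sum(sims)
    if total == 0:
        # The document matched none of the converted retrieval conditions.
        return [0.0] * len(sims)
    return [c * s + (1 - c) * 100 * s / total for s in sims]

def classify(sims, c=0.5):
    """Formula 3: 1-based index of the classification with the highest attribute."""
    attrs = attributes(sims, c)
    return max(range(len(attrs)), key=attrs.__getitem__) + 1

# Worked example from the text: frequencies in Dj and document counts, N = 10000.
freqs = {"米": 3, "コメ": 2, "政策": 1}
doc_counts = {"米": 5000, "コメ": 1250, "政策": 2500}
score = similarity(freqs, doc_counts, 10000)

# Hypothetical similarities of one document to retrieval conditions 1 to 3.
example_attrs = attributes([80, 20, 0])
example_class = classify([80, 20, 0])
```

With c = 0.5, the hypothetical document above is assigned to classification 1, because both its absolute similarity and its share of the total similarity are largest for the first condition.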

As shown in Fig. 5, the retrieval results shown in Fig. 2 (i.e., the entire set of retrieved documents) can be classified into a plurality of subsets according to classification criteria 1 to 3, which are words input by the user.

In the above example, the kanji "米" is one of the search elements given in logical expression 1. However, the Japanese character "米" has more than one meaning. Among the retrieved documents shown in Fig. 2, some documents (documents No. 6 and No. 10) contain "米" meaning "rice", while other documents (documents No. 3, 4, 5, 7 and 9) contain "米" meaning America (i.e., the United States). The user can nevertheless separate these documents into different classes by inputting appropriate classification criteria.

Furthermore, when inputting retrieval conditions or classification criteria, the user can choose search words freely, without paying special attention to compound words made up of several characters, such as "新食糧法" and "米朝協議".

Furthermore, after the classification has been finalized according to the given classification criteria, the user can designate the subset corresponding to one classification criterion as a new population for a further classification (i.e., a refined classification).

As described above, the first embodiment of the present invention provides a flexible document retrieval and classification method and system. According to the first embodiment, a retrieving operation is performed on a database of documents in accordance with retrieval conditions entered by the user so as to pick up retrieved documents as intended. The user is allowed to input classification criteria for a plurality of classifications in response to the retrieved documents picked up by the retrieving operation. The classification criteria are converted into retrieval conditions. The similarity between the converted retrieval conditions resulting from the classification criteria and the retrieved documents picked up by the retrieving operation is calculated. Then, the attribute of each retrieved document with respect to each classification is calculated with reference to the similarity, and each retrieved document is classified into the classification having the highest attribute.

According to the first embodiment of the present invention, the user can freely input retrieval conditions using whatever words come to mind during the retrieving operation. Furthermore, the user can classify the retrieved documents. The first embodiment can therefore assist the user's mental activity during document retrieval and classification.

According to the first embodiment of the present invention, when the user inputs a set of arbitrary words or arbitrary character strings as the classification criterion of each classification, the input words or character strings are converted into retrieval conditions, and the similarity between the converted retrieval conditions and the retrieved documents picked up by the retrieving operation is calculated.

This allows users to input any words or character strings that come to mind as classification criteria (i.e., standpoints of classification). Great flexibility is therefore given in setting the standpoint of classification.

Second Embodiment

Fig. 6 is a functional block diagram showing the structural arrangement of a document retrieval and classification system according to a second embodiment of the present invention.

In the document retrieval and classification system shown in Fig. 6, an input/output section 11 allows the user to input retrieval conditions and classification criteria, and also outputs retrieval results and classification results. A document storage section 15 stores documents. A retrieval section 14 calculates the similarity between retrieved documents and retrieval conditions. A retrieval result storage section stores the retrieval results, such as the retrieved documents. A keyword detection section 12 receives sentences expressing standpoints of classification, input by the user through the input/output section 11, and detects keywords from the received sentences. The detected keywords serve as classification criteria. A classification criterion conversion section 13 receives the keywords (i.e., classification criteria) and converts them into retrieval conditions that can be processed by the retrieval section 14. A retrieval result classification section 17 classifies the retrieved documents according to the classification criteria with reference to the similarity calculated by the retrieval section 14.

The details of the document retrieval and classification processing according to the second embodiment are explained below.

First, the user inputs a retrieval condition to the input/output section 11. As in the first embodiment, it is assumed that the retrieval condition defined by logical expression 1 is input, and that the retrieval results shown in Fig. 2 are obtained.

When the user wants to classify the retrieval results shown in Fig. 2, the user can input a plurality of classification criteria through the input/output section 11. According to the second embodiment, a classification criterion is a sentence expressing a standpoint of classification, a reference number identifying a retrieved document, or an essential part of a retrieved document.

For example, the user can input the following sentences as classification criteria via the input/output section 11.

Classification criterion 4: コメ市場や政府の米価政策 (the rice market and the government's rice price policy)
Classification criterion 5: 北朝鮮や中国などに対する米国対応 (the US response toward North Korea and China)
Classification criterion 6: 韓国や日本における米軍問題 (the issue of US forces in South Korea and Japan)

In response to the input of such sentences, the document retrieval and classification system of the second embodiment performs the following processing.

The keyword detection section 12 extracts the words appearing in each sentence by morphological analysis using a dictionary database (not shown), and selects the keywords (essential words) of each sentence.

Regarding the selection of the keywords of each sentence, it is desirable to check in advance the occurrence frequency (or degree) of words in all the documents stored in the document storage section 15, and then to select the keywords according to the "TFIDF" word weighting method or the like. Such a word weighting method is disclosed, for example, in "Principle of Word Weighting Based on Occurrence Frequency Information" by Umino, Library and Information Science No. 26 (1988).

As another method for selecting keywords, when extracting words (i.e., character strings) from a sentence in a Japanese document, it is preferable to take into account differences in character type, such as katakana, hiragana and kanji. This is effective for detecting new words or compound words that are not registered in the dictionary.

Needless to say, it is preferable to combine the above two methods appropriately.

According to the second embodiment, the keywords of each sentence are selected with reference to the dictionary database. Now suppose that the following words are extracted from classification criteria 4 to 6 above.

Classification criterion 4': コメ (rice), 市場 (market), 政府 (government), 米価政策 (rice price policy)
Classification criterion 5': 北朝鮮 (North Korea), 中国 (China), 米国 (United States)
Classification criterion 6': 韓国 (South Korea), 日本 (Japan), 米軍 (US military), 問題 (issue)

The classification criterion conversion section 13 then converts classification criteria 4' to 6' into retrieval conditions that can be processed by the retrieval section 14, in the same manner as the processing performed by the classification criterion conversion section 22 of the first embodiment.

Alternatively, after viewing the retrieval results shown in Fig. 2, the user may designate the document numbers of retrieved documents as classification criteria.

Classification criterion 7: 1, 2
Classification criterion 8: 4, 5
Classification criterion 9: 9

In response to the input of such reference numbers, the document retrieval and classification system of the second embodiment performs the following processing.

The keyword detection section 12 reads from the document storage section 15 the text of the documents designated by the reference numbers (i.e., classification criteria), and extracts the keywords contained in the designated documents.

The keywords can be extracted in the same manner as in the above example of extracting keywords from sentences. Alternatively, it is preferable to extract keywords from each document in advance and to store the extracted keywords in the document storage section 15 together with the corresponding document. In this case, the keyword detection section 12 refers to the reference numbers of the designated documents (i.e., classification criteria) and reads the keywords stored in the document storage section 15.

Now suppose that the following words are extracted from classification criteria 7 to 9 above.

Classification criterion 7': コメ (rice), 備蓄 (stockpile), 食糧 (food), 米価 (rice price), 農協 (agricultural cooperative), 生産 (production), 農家 (farmers), 稲作 (rice farming), 消費者 (consumers), 米 (rice)
Classification criterion 8': 北朝鮮 (North Korea), 会談 (talks), 韓国 (South Korea), 協議 (consultation), 米 (US), 米韓 (US-South Korea), 問題 (issue), 南北 (North-South), 朝鮮半島 (Korean Peninsula), 米軍 (US military)
Classification criterion 9': 沖縄 (Okinawa), 米国 (United States), 連邦 (federal), 調査 (investigation), 返還 (return), 公文書 (official documents), 資料 (materials), 仮処分 (provisional disposition), 地裁 (district court), 決定 (decision)

The classification criterion conversion section 13 then converts classification criteria 7' to 9' into retrieval conditions that can be processed by the retrieval section 14, in the same manner as the processing performed by the classification criterion conversion section 22 of the first embodiment.

After the classification criteria have been converted into retrieval conditions, the document retrieval and classification system of the second embodiment performs the same processing as disclosed in the first embodiment.

As explained above, the retrieval results shown in Fig. 2 (i.e., the entire set of retrieved documents) can be classified into a plurality of subsets according to classification criteria 4 to 6 (or 7 to 9), which are sentences expressing the standpoint of classification input by the user, reference numbers identifying retrieved documents, or essential parts of retrieved documents. The user can therefore perform classification in various ways. For example, the user can flexibly and selectively use a complex standpoint or a simplified standpoint when classifying the retrieved documents.

As described above, according to the second embodiment of the present invention, when the user inputs an arbitrary sentence as the classification criterion of each classification, keywords are extracted from the sentence, the set of extracted keywords is converted into a retrieval condition, and the similarity between the converted retrieval condition and the retrieved documents picked up by the retrieving operation is calculated.
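The conversion of a keyword set into a retrieval condition, as performed by the classification criterion conversion sections, can be illustrated with a minimal Python sketch. The function names are hypothetical, every keyword is simply joined by OR (the text only says the words are "appropriately combined"), and plain substring matching stands in for the real retrieval section, which is an assumption made for brevity.

```python
def to_retrieval_condition(keywords, base_terms):
    """Join the criterion keywords by OR, then AND them with the original condition."""
    left = " OR ".join(keywords)
    right = " OR ".join(base_terms)
    return f"({left}) AND ({right})"

def matches(text, keywords, base_terms):
    """Naive evaluation of the condition: substring match on both OR groups."""
    has_keyword = any(k in text for k in keywords)
    has_base = any(b in text for b in base_terms)
    return has_keyword and has_base

# Logical expression 1 and the keywords of classification criterion 5'.
base_terms = ["米", "コメ", "政策"]
criterion_5 = ["北朝鮮", "中国", "米国"]

condition = to_retrieval_condition(criterion_5, base_terms)
hit = matches("北朝鮮へのコメ支援", criterion_5, base_terms)
miss = matches("中国の経済", criterion_5, base_terms)
```

Because the converted condition is ANDed with the original one, a document that mentions a criterion keyword but none of the original search words, such as the second sample text, is excluded from the subset.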

According to the second embodiment of the present invention, the user can directly input an arbitrary sentence belonging to a desired field as a classification criterion. This makes it possible to express a complex standpoint of classification. The standpoint of classification can therefore be set flexibly and from many angles.

Also according to the second embodiment of the present invention, the user designates a plurality of documents among the retrieved documents picked up by the retrieving operation, and the designated documents serve as the classification criterion of each classification. Keywords are then extracted from the designated documents. The set of extracted keywords is converted into a retrieval condition. Then, the similarity between the converted retrieval condition and the retrieved documents picked up by the retrieving operation is calculated.

After checking the retrieved documents picked up by the retrieving operation, the user can select a retrieved document itself, or part of it, to express a standpoint of classification. The standpoint of classification can therefore be set easily.

Third Embodiment

The third embodiment provides a document retrieval and classification system characterized in that the classification criteria are determined automatically and the retrieved documents are classified automatically.

Fig. 7 is a functional block diagram showing the structural arrangement of a document retrieval and classification system according to the third embodiment of the present invention.

In the document retrieval and classification system shown in Fig. 7, an input/output section 71 allows the user to input retrieval conditions and classification criteria, and also outputs retrieval results and classification results. A document storage section 76 stores documents. A retrieval section 75 calculates the similarity between retrieved documents and retrieval conditions. A retrieval result storage section 77 stores the retrieval results, such as the retrieved documents. A keyword detection section 72 detects keywords from the retrieved documents stored in the retrieval result storage section 77. An automatic keyword classification section 73 classifies the set of detected keywords into a plurality of clusters. The classified keywords in each cluster serve as a classification criterion. A classification criterion conversion section 74 receives the keywords (i.e., classification criteria) sent from the automatic keyword classification section 73 and converts them into retrieval conditions that can be processed by the retrieval section 75. A retrieval result classification section 78 classifies the retrieved documents according to the classification criteria with reference to the similarity calculated by the retrieval section 75.

The details of the document retrieval and classification processing according to the third embodiment are explained below.

First, the user inputs a retrieval condition to the input/output section 71. As in the first embodiment, it is assumed that the retrieval condition defined by logical expression 1 is input, and that the retrieval results shown in Fig. 2 are obtained.

The third embodiment differs from the first and second embodiments described above in that the classification criteria are determined automatically, without relying on classification criteria input by the user.

The automatic classification according to the third embodiment is explained in detail below. First, the keyword detection section 72 detects the keywords of each retrieved document stored in the retrieval result storage section 77. The details of keyword extraction are disclosed in the second embodiment of the present invention described above. It is also possible to use the keyword extraction method disclosed in Published Japanese Patent Application No. 9-176822.

Next, the automatic keyword classification section 73 classifies the set of detected keywords into a plurality of subsets. The following method is used for the automatic keyword classification.

Now suppose that the document storage section 76 stores a total of n documents D1 to Dn, in which a total of m words W1 to Wm appear.

In this case, the following n-dimensional vector Vj can be introduced for each word Wj.

$$V_j = (e_1, e_2, e_3, \ldots, e_n)$$

The following formula 4 shows the calculation of each vector element e_i (i = 1, ..., n).

$$e_i = TF_i(W_j) \times \log\frac{n}{DF(W_j)} \qquad (4)$$

where TF_i(W_j) represents the occurrence frequency (or degree) of the word W_j in the document D_i, and DF(W_j) represents the number of documents in which the word W_j appears.

Preferably, the vector V_j is normalized so that its length becomes 1.

In this way, vectors V_1 to V_m can be obtained for the m words, respectively.

Next, consider a plurality of word groups G_1 to G_p. Each word group contains specific words that frequently appear in a specific field. Each word group can be produced manually, or produced automatically by using a dictionary or the occurrence distribution of words in a large-scale collection of documents.

In this case, the following n-dimensional vector VG_k can be introduced for each word group G_k.

$$VG_k = (e'_1, e'_2, e'_3, \ldots, e'_n)$$

The following formula 5 shows the calculation of each vector element e'_i (i = 1, ..., n).

$$e'_i = TF_i(G_k) \times \log\frac{n}{DF(G_k)} \qquad (5)$$

where TF_i(G_k) represents the occurrence frequency (or degree), in the document D_i, of the words belonging to the group G_k, and DF(G_k) represents the number of documents in which any word belonging to the group G_k appears.

Preferably, the vector VG_k is normalized so that its length becomes 1.

In this way, vectors VG_1 to VG_p can be obtained for the p word groups, respectively.

The similarity S_jk between each word W_j and each word group G_k can be obtained as the inner product of the vector V_j and the vector VG_k.

Using the above vectors and similarity calculation makes it easy to classify keywords automatically. For example, suppose that there are three word groups G1, G2 and G3 whose words are frequently used in the following fields.

G1: internal combustion engines for automobiles
G2: aircraft accidents
G3: the Internet

The retrieval section 75 retrieves documents concerning "engine". The keyword detection section 72 then extracts the following keywords.

gasoline, accident, WWW, fuel consumption, retrieval, explosion, airport, URL

The similarity of each keyword to the individual word groups G1, G2, and G3 is calculated in the following manner, each triple giving the similarities to (G1, G2, G3):

S(gasoline) = (0.8, …, …)
S(accident) = (0.2, 0.6, 0.3)
S(WWW) = (0.1, 0.2, 0.8)
S(fuel consumption) = (0.7, 0.1, 0.2)
S(retrieval) = (0.0, 0.2, 0.6)
S(explosion) = (0.4, 0.6, 0.1)
S(airport) = (0.0, 0.9, 0.2)
S(URL) = (0.1, 0.0, 0.9)

Then, considering that each keyword belongs to the word group having the highest similarity, all of the extracted keywords are classified into the individual word groups G1, G2, and G3 as follows.

G1: gasoline, fuel consumption
G2: accident, explosion, airport
G3: WWW, retrieval, URL

The keyword groups thus obtained by the automatic keyword classification section 73 are input into the classification standard conversion section 74.

When the number of word groups G is large (for example, 100), or when the number of keyword groups used as classification standards needs to be reduced (for example, to 2), the automatic keyword classification section 73 operates in the following manner.

1st step: For each word group G, the sum of the weight values of the keywords classified into it (here, their similarities to that group) is obtained, and the sum is regarded as the score of that word group.
2nd step: A predetermined number of groups is selected in descending order of score.

According to the above example,

Score of G1: 0.8 + 0.7 = 1.5
Score of G2: 0.6 + 0.6 + 0.9 = 2.1
Score of G3: 0.8 + 0.6 + 0.9 = 2.3

Accordingly, when the number of keyword groups needs to be reduced to 2, the automatic keyword classification section 73 selects groups G2 and G3 on the basis of the score of each word group.

When the automatic keyword classification section 73 performs the above processing, the set of keywords extracted from the retrieved documents can be automatically classified into a plurality of groups. According to the above example, the following classification standards are obtained.

Classification standard 10: gasoline, fuel consumption
Classification standard 11: accident, explosion, airport
Classification standard 12: WWW, retrieval, URL

Thereafter, the classification standard conversion section 74 converts classification standards 10 to 12 into retrieval conditions usable in the retrieval section, in the same manner as the processing performed in the classification standard conversion section 22 of the first embodiment.

After the conversion of the classification standards into retrieval conditions is completed, the document retrieval and classification system of the third embodiment performs the processing disclosed in the first embodiment.

As explained above, the third embodiment automatically determines the fields of the words that frequently appear in the retrieved documents. The detected fields are regarded as classification standards. It therefore becomes possible to classify documents according to the nature of the retrieval result. In other words, the third embodiment provides simplified document classification.

The keyword groups obtained by the automatic keyword classification section 73 may first be presented to the user through the input/output section 71. The user can modify or correct the presented keyword groups. Then, the classification standard conversion section 74 converts the modified keyword groups (that is, the modified classification standards) into retrieval conditions. Such classification processing makes it possible to let the user know classification viewpoints that the user had not thought of. As a result, the third embodiment effectively assists the document classification work.

As described above, the third embodiment of the present invention provides an automatic document retrieval and classification method and system. A retrieving operation is performed on a database of documents in accordance with retrieval conditions entered by a user so as to pick up retrieved documents as intended. Keywords are extracted from the retrieved documents picked up by the retrieving operation. The extracted keywords are classified into a plurality of clusters. The set of extracted keywords belonging to each cluster is converted into retrieval conditions. The similarity between the converted retrieval conditions resulting from the extracted keywords and the retrieved documents picked up by the retrieving operation is calculated. An attribute of each retrieved document picked up by the retrieving operation to each classification is calculated with reference to the similarity, thereby classifying each retrieved document into the classification having the highest attribute.

According to the third embodiment of the present invention, it becomes possible to automatically extract the classification viewpoints inherent in the retrieval result without relying on classification standards entered by the user. The user can automatically obtain classification viewpoints that had not been thought of, without any special effort. As a result, it becomes possible to effectively assist the document classification work.

The present invention may be embodied in several forms without departing from the spirit of its essential characteristics. The embodiments as described are therefore intended to be only illustrative and not restrictive, since the scope of the invention is defined by the appended claims rather than by the description preceding them. All changes that fall within the metes and bounds of the claims, or equivalents of such metes and bounds, are therefore intended to be embraced by the claims.

Reference numerals

11, 21, 71 … input/output section
12, 72 … keyword detection section
13, 22, 74 … classification standard conversion section
14, 23, 75 … retrieval section
15, 24, 76 … document storage section
16, 25, 77 … retrieval result storage section
17, 26, 78 … retrieval result classification section
73 … automatic keyword classification section
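The keyword grouping of the third embodiment can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, groups are indexed 0 to 2 for G1 to G3, and the second and third similarity values for "gasoline" are placeholders, because the text only implies its G1 similarity of 0.8 through the score of G1.

```python
import math
from collections import defaultdict


def group_vector(tf_per_doc, df):
    """Equation 5: e'_i = TF_i(G_k) * log(n / DF(G_k)), scaled to unit length.

    tf_per_doc: frequency of the group's words in each of the n documents.
    df: number of documents containing any word of the group.
    """
    n = len(tf_per_doc)
    if df == 0:
        return [0.0] * n
    idf = math.log(n / df)
    raw = [tf * idf for tf in tf_per_doc]
    norm = math.sqrt(sum(e * e for e in raw))
    return [e / norm for e in raw] if norm > 0 else raw


def similarity(v_word, v_group):
    """Inner product of a word vector V_j and a group vector VG_k."""
    return sum(a * b for a, b in zip(v_word, v_group))


def assign_keywords(sims):
    """Put each keyword into the group for which its similarity is highest."""
    groups = defaultdict(list)
    for keyword, vec in sims.items():
        best = max(range(len(vec)), key=lambda k: vec[k])
        groups[best].append(keyword)
    return dict(groups)


def group_scores(sims, groups):
    """1st step: score of a group = sum of its keywords' similarities to it."""
    return {k: sum(sims[kw][k] for kw in kws) for k, kws in groups.items()}


def top_groups(scores, count):
    """2nd step: pick `count` groups in descending order of score."""
    return sorted(scores, key=scores.get, reverse=True)[:count]


# Similarities to (G1, G2, G3) from the worked example.
sims = {
    "gasoline": (0.8, 0.0, 0.0),  # ASSUMPTION: only the 0.8 is implied by the text
    "accident": (0.2, 0.6, 0.3),
    "WWW": (0.1, 0.2, 0.8),
    "fuel consumption": (0.7, 0.1, 0.2),
    "retrieval": (0.0, 0.2, 0.6),
    "explosion": (0.4, 0.6, 0.1),
    "airport": (0.0, 0.9, 0.2),
    "URL": (0.1, 0.0, 0.9),
}

groups = assign_keywords(sims)
scores = group_scores(sims, groups)
selected = top_groups(scores, 2)

for k in sorted(groups):
    print(f"G{k + 1}: {', '.join(groups[k])} (score {scores[k]:.1f})")
print("selected:", [f"G{k + 1}" for k in selected])
```

With the values of the example, the sketch reproduces the grouping into G1, G2, and G3 and selects G3 and G2 when the number of groups is reduced to 2. Ties between equal similarities or equal scores are resolved by first occurrence here, which is a choice of this sketch and not something the description specifies.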

Claims (12)

1. A document retrieval and classification system, comprising:
input/output means (21, 11) for allowing a user to input retrieval conditions and classification standards;
retrieval means (23, 14) for performing a retrieving operation on a database of documents in accordance with the retrieval conditions, which contain arbitrary words or arbitrary character strings, and for calculating the similarity between the retrieved documents picked up by the retrieving operation and the retrieval conditions;
retrieval result storage means (25, 16) for storing the retrieved documents picked up by the retrieving operation;
classification standard conversion means (22, 13) for converting the classification standards into retrieval conditions, the classification standards being expressed as a set of arbitrary words or arbitrary character strings; and
retrieval result classification means (26, 17) for classifying the retrieved documents picked up by the retrieving operation in accordance with a plurality of classification standards.

2. The document retrieval and classification system according to claim 1, wherein:
the retrieval means (23) is responsive to the retrieval conditions entered by the user via the input/output means (21), and performs the retrieving operation on the database of documents in accordance with the retrieval conditions entered by the user,
the retrieval result storage means (25) stores the retrieved documents picked up by the retrieving operation of the retrieval means,
the classification standard conversion means (22) is responsive to a plurality of classification standards entered by the user via the input/output means, and produces converted retrieval conditions resulting from the entered classification standards,
the retrieval means (23) calculates the similarity between the converted retrieval conditions and the retrieved documents picked up by the retrieving operation and stored in the retrieval result storage means, and
the retrieval result classification means (26) calculates an attribute of each retrieved document picked up by the retrieving operation to each classification standard with reference to the similarity calculated by the retrieval means, thereby performing a document classifying operation.

3. The document retrieval and classification system according to claim 1, wherein:
the input/output means (21) allows the user to input a plurality of classification standards each containing a set of arbitrary words or arbitrary character strings, and
the classification standard conversion means (22) converts the set of arbitrary words or arbitrary character strings into retrieval conditions.

4. The document retrieval and classification system according to claim 1, further comprising:
keyword detection means (12) for extracting keywords from an arbitrary sentence or document,
wherein the keyword detection means (12) is responsive to a plurality of classification standards, each expressed by an arbitrary sentence entered by the user via the input/output means, and extracts keywords from the entered sentences, and
the classification standard conversion means (13) converts a set of extracted keywords into retrieval conditions.

5. The document retrieval and classification system according to claim 1, wherein:
the input/output means (11) allows the user to designate a plurality of documents to be used as a plurality of classification standards, the designated documents being selected from the retrieved documents picked up by the retrieving operation,
the keyword detection means (12) extracts keywords from the designated documents, and
the classification standard conversion means (13) converts a set of extracted keywords into retrieval conditions.

6. A document retrieval and classification system, comprising:
input/output means (71) for allowing a user to input retrieval conditions;
retrieval means (75) for performing a retrieving operation on a database of documents in accordance with the retrieval conditions, which contain arbitrary words or arbitrary character strings, and for calculating the similarity between the retrieved documents picked up by the retrieving operation and the retrieval conditions;
retrieval result storage means (77) for storing the retrieved documents picked up by the retrieving operation;
keyword detection means (72) for extracting keywords from the retrieved documents picked up by the retrieving operation;
automatic keyword classification means (73) for automatically classifying the extracted keywords into a plurality of clusters;
classification standard conversion means (74) for converting classification standards into retrieval conditions, each of the classification standards being a set of keywords classified into each cluster; and
retrieval result classification means (78) for classifying a set of the retrieved documents picked up by the retrieving operation in accordance with the classification standards.

7. The document retrieval and classification system according to claim 6, wherein:
the retrieval means (75) is responsive to the retrieval conditions entered by the user via the input/output means (71), and performs the retrieving operation on the database of documents in accordance with the retrieval conditions entered by the user,
the retrieval result storage means (77) stores the retrieved documents picked up by the retrieving operation of the retrieval means,
the keyword detection means (72) extracts the keywords from the retrieved documents picked up by the retrieving operation,
the automatic keyword classification means (73) automatically classifies the extracted keywords into the plurality of clusters,
the classification standard conversion means (74) produces converted retrieval conditions resulting from the classification standards, each being a set of keywords classified into each cluster,
the retrieval means (75) calculates the similarity between the converted retrieval conditions and the retrieved documents picked up by the retrieving operation and stored in the retrieval result storage means, and
the retrieval result classification means (78) calculates an attribute of each retrieved document picked up by the retrieving operation to each classification standard with reference to the similarity calculated by the retrieval means, thereby performing a document classifying operation.

8. A document retrieval and classification method, comprising the steps of:
performing a retrieving operation on a database of documents in accordance with retrieval conditions entered by a user so as to pick up retrieved documents as intended;
allowing the user to input classification standards of a plurality of classifications in response to the retrieved documents picked up by the retrieving operation;
converting the classification standards into retrieval conditions;
calculating the similarity between the converted retrieval conditions resulting from the classification standards and the retrieved documents picked up by the retrieving operation; and
calculating an attribute of each retrieved document picked up by the retrieving operation to each classification with reference to the similarity, thereby classifying each retrieved document into the classification having the highest attribute.

9. The document retrieval and classification method according to claim 8, wherein:
when the user inputs a set of arbitrary words or arbitrary character strings as the classification standard of each classification,
the entered arbitrary words or arbitrary character strings are converted into retrieval conditions, and
the similarity between the converted retrieval conditions and the retrieved documents picked up by the retrieving operation is calculated.

10. The document retrieval and classification method according to claim 8, wherein:
when the user inputs an arbitrary sentence to be used as the classification standard of each classification,
keywords are extracted from the sentence,
a set of extracted keywords is converted into retrieval conditions, and
the similarity between the converted retrieval conditions and the retrieved documents picked up by the retrieving operation is calculated.

11. The document retrieval and classification method according to claim 8, wherein:
when the user designates a plurality of documents among the retrieved documents picked up by the retrieving operation to be used as the classification standard of each classification,
keywords are extracted from the designated documents,
a set of extracted keywords is converted into retrieval conditions, and
the similarity between the converted retrieval conditions and the retrieved documents picked up by the retrieving operation is calculated.

12. A document retrieval and classification method, comprising the steps of:
performing a retrieving operation on a database of documents in accordance with retrieval conditions entered by a user so as to pick up retrieved documents as intended;
extracting keywords from the retrieved documents picked up by the retrieving operation;
classifying the extracted keywords into a plurality of clusters;
converting the set of extracted keywords belonging to each cluster into retrieval conditions;
calculating the similarity between the converted retrieval conditions resulting from the extracted keywords and the retrieved documents picked up by the retrieving operation; and
calculating an attribute of each retrieved document picked up by the retrieving operation to each classification with reference to the similarity, thereby classifying each retrieved document into the classification having the highest attribute.
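The final step shared by the method claims, assigning each retrieved document to the classification with the highest attribute, can be sketched as follows. This is a hedged illustration only: the claims above do not say how the attribute is derived from the similarity, so the sketch assumes it is the document's similarity to one classification divided by the sum of its similarities to all classifications. The function name, document identifiers, and similarity values are hypothetical.

```python
def classify_documents(similarity):
    """Assign each document to the classification with the highest attribute.

    similarity: {doc_id: {classification: similarity to that classification's
                          converted retrieval condition}}
    Returns {doc_id: (best_classification, attribute)}.
    """
    result = {}
    for doc_id, per_class in similarity.items():
        if not per_class:
            continue  # nothing to compare against
        total = sum(per_class.values())
        # ASSUMPTION: attribute = share of the document's total similarity.
        attributes = {
            name: (value / total if total > 0 else 0.0)
            for name, value in per_class.items()
        }
        best = max(attributes, key=attributes.get)
        result[doc_id] = (best, attributes[best])
    return result


# Hypothetical similarities of two retrieved documents to three standards.
example = {
    "doc-1": {"standard 10": 0.6, "standard 11": 0.3, "standard 12": 0.1},
    "doc-2": {"standard 10": 0.0, "standard 11": 0.2, "standard 12": 0.6},
}

assignment = classify_documents(example)
for doc_id, (name, attribute) in assignment.items():
    print(f"{doc_id} -> {name} (attribute {attribute:.2f})")
```

Because the assumed normalization divides by a per-document constant, it does not change which classification wins; it only makes the attributes of one document comparable as shares.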
TW89117245A 1999-02-26 2000-08-25 Document retrieval and classification method and apparatus TW469386B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP05080399A JP3693514B2 (en) 1999-02-26 1999-02-26 Document retrieval / classification method and apparatus

Publications (1)

Publication Number Publication Date
TW469386B true TW469386B (en) 2001-12-21

Family

ID=12868946

Family Applications (1)

Application Number Title Priority Date Filing Date
TW89117245A TW469386B (en) 1999-02-26 2000-08-25 Document retrieval and classification method and apparatus

Country Status (2)

Country Link
JP (1) JP3693514B2 (en)
TW (1) TW469386B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4142881B2 (en) 2002-03-07 2008-09-03 富士通株式会社 Document similarity calculation device, clustering device, and document extraction device
JP2003281161A (en) * 2002-03-19 2003-10-03 Seiko Epson Corp Information classification method, information classification device, program and record medium
US7428530B2 (en) 2004-07-01 2008-09-23 Microsoft Corporation Dispersing search engine results by using page category information
JP4536445B2 (en) * 2004-07-26 2010-09-01 三菱電機株式会社 Data classification device
JP4857448B2 (en) * 2006-03-10 2012-01-18 独立行政法人情報通信研究機構 Information retrieval apparatus and program using multiple meanings
US20090094210A1 (en) * 2007-10-05 2009-04-09 Fujitsu Limited Intelligently sorted search results
JP5751318B2 (en) * 2012-12-10 2015-07-22 キヤノンマーケティングジャパン株式会社 Document classification apparatus, document classification method, and program
CN107766371B (en) * 2016-08-19 2023-11-17 中兴通讯股份有限公司 Text information classification method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0554037A (en) * 1991-08-28 1993-03-05 Fujitsu Ltd Document classifying system
JP3577819B2 (en) * 1995-07-14 2004-10-20 富士ゼロックス株式会社 Information search apparatus and information search method

Also Published As

Publication number Publication date
JP3693514B2 (en) 2005-09-07
JP2000250925A (en) 2000-09-14

Similar Documents

Publication Publication Date Title
US9323827B2 (en) Identifying key terms related to similar passages
Lewis et al. QDA Miner 2.0: Mixed-model qualitative data analysis software
Qin et al. Ranking with multiple hyperplanes
US8214363B2 (en) Recognizing domain specific entities in search queries
US8386453B2 (en) Providing search information relating to a document
US8364668B2 (en) User interfaces for a document search engine
JP6381775B2 (en) Information processing system and information processing method
JP2009517750A (en) Information retrieval
Sarkar et al. Machine learning based keyphrase extraction: comparing decision trees, naïve Bayes, and artificial neural networks
JP2002537604A (en) Document similarity search
CN109918555A (en) Method, apparatus, equipment and the medium suggested for providing search
TW469386B (en) Document retrieval and classification method and apparatus
CN113239681B (en) Court case file identification method
Saenko et al. Filtering abstract senses from image search results
Soni Text Classification Feature extraction using SVM
CN112183063A (en) Medical literature similarity discrimination method combining biological information body and attention mechanism
Denda Fugitive literature in the cross hairs: An examination of bibliographic control and access
JP4497337B2 (en) Concept search device and recording medium recording computer program
Aref Mining publication papers via text mining Evaluation and Results
Fahmi Examining learning algorithms for text classification in digital libraries
Thakur et al. Design Of Boolean Retrival Model For Information Extraction
Win et al. Relevancy Prediction of Scientific Articles Using Similarity Measures and Citation Mention Rate
Gunawan et al. Observing the Performance of the TextRank Algorithm on Automatic Text Summarization for Bahasa Indonesia.
Yoon Improving recall of browsing sets in image retrieval from a semiotics perspective
JP2005338992A (en) Document search device and program

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees