JP4656330B2

JP4656330B2 - Synonym integration system

Info

Publication number: JP4656330B2
Application number: JP2006338632A
Authority: JP
Inventors: 喜博宇野
Original assignee: 一般財団法人工業所有権協力センター
Priority date: 2006-12-15
Filing date: 2006-12-15
Publication date: 2011-03-23
Anticipated expiration: 2026-12-15
Also published as: JP2008152454A

Description

The present invention relates to a synonym dictionary integration system. For searches of a text database using search logical expressions, the system creates and integrates a synonym dictionary from a plurality of synonym groups obtained by extracting the OR-combined word groups of those expressions.

Search systems are used to search a "text database", a collection of materials containing characters (that is, text), and to find the materials a searcher needs.
FIG. 1 shows a configuration example of such a search system. In FIG. 1, a search terminal 23, equipped with a display device 25 such as a liquid crystal display and an input device 26 such as a keyboard and mouse, accesses a text database system 20 built on a server or the like over a telecommunication line 27 such as the Internet or a LAN. A search expression or the like is entered at the terminal, and the target documents are retrieved from the database. In this search system, the searcher creates and enters a text search expression, and materials containing a character sequence that matches the expression are extracted. The "text database" built on the server may consist of patent gazettes, newspaper and magazine articles, academic papers, text materials on the Internet, other materials made up of text, secondary materials such as abstracts of these, or headwords and tags assigned in relation to them. It may also be a collection of textual explanations, headwords, tags, and the like attached to materials consisting of figures, images, video, sound, and so on.

In a text search, suppose "digital camera", representing the concept "a camera that captures images electronically without silver halide film", is used in the text search expression. The computer then extracts only materials that contain exactly the character string "digital camera". People, however, phrase the same thing in widely different ways. Other expressions include "electronic camera", "CCD camera", "still image camera", "electronic still camera", "digicam" (デジカメ, a colloquial abbreviation), and many more. Words whose concepts can be regarded as nearly the same are called "synonyms", such a group of words is called a "synonym group", and a collection of synonym groups organized for convenient use is a "synonym dictionary".

Consequently, if only "digital camera" is used in the search expression, materials that use other expressions in the sense of "digital camera" are not extracted. Extraction omissions can be reduced by OR-combining the synonyms, that is, by writing the search expression as
digital camera + electronic camera + CCD camera + still image camera + electronic still camera + digicam -- (1)
Creating a synonym dictionary and then correcting and maintaining it takes an enormous amount of labor when done by hand, so attempts have been made to use computers.
Patent Document 1 describes attempts to create a synonym dictionary automatically, using a computer to collect and classify words. It gives examples in which sentences in the documents to be searched are structurally analyzed using morphological analysis and words are grouped based on the relationships obtained between them, examples that use association, examples that perform semantic analysis, and examples in which words exceeding a threshold on co-occurrence and frequency of appearance are registered in the dictionary.
Patent Documents 2 to 5 show configurations in which search expressions entered by searchers, or log data, are analyzed and accumulated in a synonym dictionary.

Patent Document 1: JP-A-8-161343
Patent Document 2: JP-A-6-314296
Patent Document 3: JP-A-10-320419
Patent Document 4: JP-A-11-312168
Patent Document 5: JP-A-9-319767

The accumulation of synonyms by the natural language processing described in the patent documents cited above (morphological analysis, structural analysis, association, semantic analysis, use of co-occurrence relationships, and so on) has not yet reached the point of capturing the diversity and complexity of the words people use, and the synonym dictionaries generated this way are not practical enough.
Configurations that analyze search expressions entered by searchers, or log data, and accumulate them in a synonym dictionary do not address the problem of organizing and consolidating the word groups.
An object of the present invention is to provide a synonym dictionary integration system that creates a synonym dictionary from synonym groups extracted from search expressions.

To achieve the above object, the present invention is a synonym integration system that creates a synonym dictionary from synonym groups, a set in which each row consists of the OR-combined words of a search expression entered at search time. The system comprises: synonym group import means that sets the word frequency of each word in a row of the synonym groups to 1 and imports the rows, preserved as rows, into the synonym dictionary; main row / sub row determination means that takes one row of the synonym dictionary as the main row and another row as the sub row; row comparison means that compares the main row with the sub row, checks whether a predetermined condition is satisfied, and decides whether to perform integration; and row integration means that acts when the comparison shows the predetermined condition is satisfied. For each sub-row word identical to a main-row word, the row integration means adds 1 to the word frequency of the corresponding main-row word. It appends sub-row words absent from the main row to the main row, thereby integrating the sub row into the main row, and reorders the words in the main row by word frequency.
The system may further comprise row sorting means that, after row integration of the synonym dictionary is complete, sorts the rows in a fixed order using the first word of each row.
When the synonym dictionary already contains an integrated portion, the synonym group import means preferably distinguishes the imported portion from the integrated portion, and the main row / sub row determination means preferably takes main rows starting from the integrated portion and sub rows from the newly imported portion. The predetermined condition of the row comparison means may limit the number of main-row words subject to comparison and require that the number of words matching the sub row be at least a predetermined number. The row integration means may also delete a sub row once it has been integrated.
The main row / sub row determination means preferably creates in advance a list of sub rows that contain at least two of the main row's comparison words, and selects sub rows from that list.
The system may further have an exact-synonym dictionary (同義語辞書, as distinct from the near-synonym 類義語 dictionary that is the subject of the invention) and comprise exact-synonym processing means that, before the synonym group import means imports rows into the synonym dictionary, uses the exact-synonym dictionary to merge exact synonyms into a single word.
A program that causes a computer system to implement each function of the above synonym integration system, and a recording medium on which the program is recorded, are also part of the present invention.
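As a rough illustration of the row integration step described above, the following Python sketch merges a sub row into a main row. The function name `integrate_row`, the list-of-pairs row representation, and the English stand-in words are assumptions made for illustration. The source does not say what frequency an appended word receives, so this sketch simply carries over the sub row's value.

```python
def integrate_row(main, sub):
    """Merge `sub` into `main`; each row is a list of (word, frequency) pairs."""
    freq = dict(main)  # insertion order keeps the main row's current word order
    for word, sub_freq in sub:
        if word in freq:
            freq[word] += 1        # shared word: add 1 to the main-row frequency
        else:
            freq[word] = sub_freq  # assumption: appended word keeps its own frequency
    # Reorder by descending frequency; sorted() is stable, so ties keep their order.
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)


main_row = [("memory", 3), ("storage", 2), ("record", 1)]
sub_row = [("record", 1), ("storage", 1), ("buffer", 1)]
merged = integrate_row(main_row, sub_row)
```

In this example `merged` starts with "memory" and "storage" (both at frequency 3), followed by "record" and the newly appended "buffer".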

With the configuration described above, organizing the synonym groups greatly reduces duplication, which makes it possible to create a synonym dictionary that is easy to read and understand. Moreover, the quality of the dictionary improves as reprocessing is repeated.
The invention makes it possible to put to effective use the valuable word combinations, that is, synonym groups, that searchers think up for each situation and that in most cases are discarded after a single use.

Embodiment of the Invention

Embodiments of the present invention are described below with reference to the drawings.

OR-combined word groups in a search expression entered by a searcher can be regarded as synonyms, so these synonym groups are extracted from search expressions and collected. Consider creating a synonym dictionary from the collected synonym groups. The extracted synonym groups present the following problems.
1) There are many rows. Depending on where and for how long they are collected, the synonym groups may run to 10 million rows or more, for example, and many of these rows duplicate the same concept.
With so much duplication, the synonym groups are very hard to read when displayed for use in searching.
Finding and integrating rows of the same concept also takes substantial computing resources, so a procedure that achieves the goal with little computation is needed.

2) The items searchers want to find vary endlessly, and so do the ways searchers express them. Word groups for the "same concept" therefore come in many forms, and it is difficult to judge which ones share a concept.
Take the synonyms of "記憶" (memory, in the sense of storing) as an example. This word appears OR-combined in many search expressions, and the synonym groups extracted from those OR combinations include cases that make the integration decision difficult. They are shown below.

a) Exactly the same combination of words in the same order (these can be integrated without any problem)
蓄積 (accumulation), 保存 (saving), 記憶 (memory), メモリ (memory, loanword), 格納 (storing), 記録 (recording) ---- (2)
蓄積, 保存, 記憶, メモリ, 格納, 記録 ---- (3)

b) The same combination of words as group (2) but in a different order
メモリ, 保存, 記録, 格納, 記憶, 蓄積 ---- (4)

c) Some words in common with group (2), others different
記憶, 蓄積, 保存, メモリ, 取り込み (capture), 書き込み (writing) ---- (5)

d) Only some of the words of group (2)
記憶, 蓄積, 保存 ---- (6)

e) Includes words that could be called synonyms but are conceptually slightly apart
記憶, メモリ, 蓄積, 格納, 登録 (registration), ファイル (file) ---- (7)

f) By common sense some of the words are synonyms and others are not, but they were OR-combined because the search called for it
記憶, 蓄積, 保存, ナレーション (narration), 会話 (conversation), 音声 (voice) ---- (8)

g) Groups whose concept, and therefore whose words, differ with the context of use.
(Word order varies in practice; in the examples below "記憶" is placed first for readability.)
記憶 (memory), メモリ (memory, loanword), RAM, ROM, HD, ノンボラ (abbreviation of "non-volatile") ---- (9)
記憶, メモリ, 固定長 (fixed length), 可変長 (variable length) ---- (10)
記憶, 記念 (commemoration), 思い出 (recollections), でき事 (events) ---- (11)
記憶, レジスタ (register), バッファ (buffer), ラッチ (latch) ---- (12)
記憶, リセット (reset), プリセット (preset), 書き込み (write), 読み出し (read) ---- (13)

h) Rows of a different concept. This example is a row for the concept of AD conversion.
記憶 (memory), AD変換, A/D変換, A・D変換 (three notations for "A/D conversion") ---- (14)
i) In b) to h) above, the order of the words varies.
These examples used word groups containing "記憶", but which word in a row to focus on is itself an issue. An appropriate method is needed for judging whether two rows represent the same concept.

3) Handling of exact synonyms.
A set of words such as "ディスプレイ, デスプレイ, デイスプレイ, デスプレー, ディスプレー, デイスプレー, 表示" (six katakana spellings of the loanword "display" plus the native word for "display") is called exact synonyms (同義語) or variant spellings. The character sequences differ, but the words are 100% identical in meaning. Exact synonyms are a special case contained within synonyms. Although they express exactly the same concept, they occupy many cells in a displayed row and make the screen hard to read.
Furthermore, in the following example the two rows should be integrated in the integration processing described later, but because many words differ between the rows they are wrongly judged as rows that should not be integrated.
ディスプレイ, デイスプレイ, ディスプレー, 表示, 掲示 (posting) --- (15)
デスプレイ, デスプレー, デイスプレー, 表示, 警告 (warning) --- (16)
A solution to this problem is also needed.
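The effect of variant spellings on rows (15) and (16) can be sketched in Python as follows. The `CANONICAL` mapping is a hypothetical exact-synonym dictionary that folds only the katakana spellings into one form, and the names `normalize_row` and `shared_words` are likewise illustrative assumptions, not taken from the source.

```python
# Hypothetical exact-synonym dictionary: variant spelling -> canonical word.
CANONICAL = {
    "デスプレイ": "ディスプレイ",
    "デイスプレイ": "ディスプレイ",
    "デスプレー": "ディスプレイ",
    "ディスプレー": "ディスプレイ",
    "デイスプレー": "ディスプレイ",
}


def normalize_row(row, canonical):
    """Replace each word by its canonical form, dropping duplicates in order."""
    result = []
    for word in row:
        word = canonical.get(word, word)
        if word not in result:
            result.append(word)
    return result


def shared_words(row_a, row_b):
    """Count the distinct words two rows have in common."""
    return len(set(row_a) & set(row_b))


ROW_15 = ["ディスプレイ", "デイスプレイ", "ディスプレー", "表示", "掲示"]
ROW_16 = ["デスプレイ", "デスプレー", "デイスプレー", "表示", "警告"]

shared_before = shared_words(ROW_15, ROW_16)
shared_after = shared_words(
    normalize_row(ROW_15, CANONICAL), normalize_row(ROW_16, CANONICAL)
)
```

Before normalization the two rows share a single word out of five; afterwards each row shrinks to three words, two of which are shared, so a match-count test has a much better chance of recognizing them as the same concept.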

4) The synonym dictionary that results from integration and organization must be easy to read and easy to use. It should be easy to tell which rows are important, which words are important, and which words to use when composing a synonym search expression.

5) Integration requires comparing at least two rows to judge whether they should be integrated. As the synonym dictionary grows, the number of two-row combinations grows in proportion to the square of the number of rows in the dictionary. The computation needed for this judgment alone becomes enormous, so reducing the amount of computation is also required.
The embodiments of the present invention described below solve all of the problems above.
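To give a sense of the scale in problem 5), and of how the candidate list mentioned earlier (sub rows containing at least two of the main row's comparison words) might cut it down, here is a small Python sketch. Using an inverted index from word to row numbers is an assumption of this sketch; the source does not say how the list is built, and `build_index` and `candidate_sub_rows` are hypothetical names.

```python
from collections import Counter, defaultdict
from math import comb

# Number of unordered row pairs for a 10-million-row dictionary.
pairs_for_10_million_rows = comb(10_000_000, 2)


def build_index(rows):
    """Map each word to the set of row numbers that contain it."""
    index = defaultdict(set)
    for row_id, words in rows.items():
        for word in set(words):
            index[word].add(row_id)
    return index


def candidate_sub_rows(main_words, index, min_shared=2):
    """Return row numbers that share at least `min_shared` words with the main row."""
    hits = Counter()
    for word in set(main_words):
        for row_id in index.get(word, ()):
            hits[row_id] += 1
    return sorted(row_id for row_id, n in hits.items() if n >= min_shared)


rows = {
    10: ["memory", "storage", "record"],
    11: ["memory", "buffer"],
    12: ["display", "screen"],
    13: ["storage", "memory", "file"],
}
candidates = candidate_sub_rows(["memory", "storage", "save"], build_index(rows))
```

Only rows 10 and 13 survive as candidates here, so the full comparison runs on two rows instead of four; on a large dictionary the saving comes from never touching rows that share no word with the main row.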

First, one embodiment of the present invention is described in detail with reference to FIGS. 2 to 8.
FIG. 2 outlines an example hardware configuration of the "synonym integration system" of the present invention.
In FIG. 2, the processing device 15 is the central device of the computer system; with the processing program 16 for synonym integration installed, it functions as the synonym integration system. The storage device 10 can exchange information with the processing device 15 and stores, or can store, the various tables used by the synonym dictionary integration system. The storage device 10 holds the new synonym groups 11, the integrated synonym dictionary 12, and other data 14 used by the synonym integration system; these are described in detail later. The storage device 10 also holds, as needed, various tables and data used in the course of processing. The new synonym groups are OR combinations that the server 20 has selected and accumulated from search expressions entered at the search terminal 23 of the search system described in the background section; they have not yet been integrated into the synonym dictionary.
The display device 17 displays processing results and the like, and the input device 18, for example a keyboard or mouse, is used to give instructions to the processing device 15.
The synonym integration system configuration shown in FIG. 2 may also be built inside the server 20 of FIG. 1.

FIG. 3 is a flowchart of the integration processing performed by the system of FIG. 2.
The flowchart of FIG. 3 is described on the premise that newly collected synonym groups are added to a "synonym dictionary" that has been integrated in the past, and integration is then performed. For the first integration run, processing can simply proceed as if no integrated synonym dictionary exists.

A matrix-style table is an easy-to-follow way to display a synonym dictionary, so dictionaries and the like are shown and described here in table form. In the tables used below, the synonym group for one concept is one row, and within a row each word occupies one cell. Different rows are different synonym groups.
The present invention handles synonyms extracted from the expressions used as search logical expressions. Some groups may contain words that cannot really be called synonyms, but any word group used in an OR combination in a search logical expression is treated as a set of synonyms. An OR combination here means a group of words joined in the sense of "or", whether by OR, +, a space, or the like.
Examples of search logical expressions follow.
camera + display + LCD + shooting --- (17)
(camera + display + LCD + shooting) * (guidance + sightseeing) -- (18)
(camera + display + LCD + shooting) + (guidance + sightseeing) --- (19)
camera + display + LCD + shooting + guidance + sightseeing --- (20)
(camera + display + LCD + shooting) [proximity operator] (guidance + sightseeing) --- (21)
The synonym groups are built from varied expressions like these by collecting each OR-combined part as one group, that is, as one row of synonyms, and arranging the rows. Of the expressions above, (18), (19), and (21) are each collected as two rows of synonym groups.
FIG. 4 shows synonym groups collected from OR-combined search expressions. These synonym groups are created, for example, by collecting the search expressions sent from the search terminal 23 in the text search system 20 shown in FIG. 1. In this pre-integration synonym dictionary, the word order within a row is the order used in the search expression.
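The extraction of OR groups from expressions such as (17) to (20) might look like the following Python sketch. It is deliberately simplified and rests on assumptions not in the source: OR is written only as "+", groups are not nested, and an expression is either fully parenthesized into groups or not parenthesized at all. The function name `extract_or_groups` is hypothetical.

```python
import re


def extract_or_groups(expr):
    """Return one list of words per OR-combined group found in `expr`."""
    # Each parenthesized part is one group; with no parentheses the whole
    # expression is treated as a single OR group.
    groups = re.findall(r"\(([^()]*)\)", expr)
    if not groups:
        groups = [expr]
    rows = []
    for group in groups:
        words = [w.strip() for w in group.split("+") if w.strip()]
        if len(words) >= 2:  # a single word carries no synonym information
            rows.append(words)
    return rows


rows_17 = extract_or_groups("camera + display + LCD + shooting")
rows_18 = extract_or_groups(
    "(camera + display + LCD + shooting) * (guidance + sightseeing)"
)
rows_19 = extract_or_groups(
    "(camera + display + LCD + shooting) + (guidance + sightseeing)"
)
rows_20 = extract_or_groups(
    "camera + display + LCD + shooting + guidance + sightseeing"
)
```

With these inputs, (17) and (20) each produce one row while (18) and (19) each produce two, matching the row counts the text gives for (18) and (19). Mixed forms such as a bare word OR-combined with a parenthesized group would need a real expression parser.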

First, the synonym groups are imported into the synonym dictionary (S110). FIGS. 5A and 5B show the synonym dictionary after import.
FIG. 5A shows the synonyms; each horizontal row is one synonym group.
In the synonym dictionary of FIG. 5 ("FIG. 5" refers to FIGS. 5A and 5B together; the same applies to other figure numbers that have A and B parts), each word has corresponding word frequency data, shown in FIG. 5B. The dictionary is presented as two tables, FIGS. 5A and 5B, for convenience; the data structure need not be the table form shown here as long as each word is associated with its word frequency.

Rows 1 to 9 of FIG. 5 are the part of the synonym dictionary that was integrated previously. Rows 10 to 34 are the new part added to the synonym dictionary, namely the synonym groups shown in FIG. 4.
In FIG. 5A, column B is the new/old identification tag: 0 for the part of the dictionary whose integration is finished, and 1 for the newly imported part whose integration is not yet finished. Column A holds the row number, column F the number of words in the row, and column J the frequency of the first word. Column K is the position of the first word, and the row's synonyms are laid out from there onward. These columns are displayed so that the course of processing can be followed; they are not all required for the processing itself.
The word frequency associated with each word of the synonym dictionary, shown in FIG. 5B, is initialized to 1 in the newly imported part. This word frequency is rewritten by integration and indicates the number of words that have been integrated.
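A minimal Python sketch of the import step (S110) is shown below. Since the text notes that any structure associating words with frequencies will do, the `Row` dataclass here is only one possible layout; its field and property names, and the function `import_groups`, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    words: list   # synonyms of the row, first word at index 0
    freqs: list   # word frequency for each entry in `words`
    is_new: int   # new/old tag: 1 = newly imported, 0 = already integrated

    @property
    def word_count(self):
        return len(self.words)

    @property
    def head_freq(self):
        return self.freqs[0]


def import_groups(dictionary, groups):
    """Append each collected synonym group as a new row with all frequencies 1."""
    for words in groups:
        dictionary.append(Row(words=list(words), freqs=[1] * len(words), is_new=1))
    return dictionary


dictionary = [Row(words=["memory", "storage"], freqs=[3, 2], is_new=0)]
import_groups(dictionary, [["camera", "display"], ["memory", "buffer", "latch"]])
```

After the call, the previously integrated row keeps its tag 0 and its frequencies, and the two imported rows follow it with tag 1 and every frequency set to 1.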

The processing so far prepares for integration. Next, rows are integrated with one another.
Row integration proceeds from the top row of the synonym dictionary in FIG. 5 toward the bottom. It is always performed on two rows, a main row and a sub row: the main row is the row that drives the processing, and it is integrated with its counterpart, the sub row.
First, the main row (mn: main row number) and the sub row (sn: sub row number) are initialized (S150). At this step the first main row is row 1 of the integrated part of the dictionary (mn = 1), and the sub row is row 10 (sn = 10), the first row of the new part below it. This is because there is little need to integrate the already-integrated part with itself, so the processing mainly integrates the newly imported part with the integrated part.

The main row and the sub row are then compared to decide whether they can be integrated (S160).
A "main row comparison word count", mg, can be set for the main row. The main-row words compared with the sub-row words are the first mg words, counting from the first word; these are the "main row comparison words". If the main row has fewer than mg words, all of its words are main row comparison words.
As will become clear later, the number of words in a main row grows through integration, which is why main row comparison words are defined.
mg may be held constant throughout integration, for example mg = 6. Alternatively it may cover all words, or it may be varied adaptively according to attributes of the main row, for example as a function of the number of words in the row or of the frequency of the first word.

The sub row likewise has a "sub row comparison word count", sg, which limits the words used for comparison with the main row. sg also counts from the first word at the left end of the sub row, and if the sub row has fewer than sg words, all of its words are used.
Like mg, sg may be a constant (for example sg = 6), may cover all words, or may be varied adaptively according to attributes of the sub row.
Limiting the sub row in this way requires that the words within the row be ordered by some measure of importance, such as frequency of use. In a sub row whose new/old tag is 1 (new), however, the word order is arbitrary and does not reflect the row's concept, so sg should cover all words in the row. Rows in the new part also become main rows, and in that case too all words in the row should be used as main row comparison words.

A row of synonyms after integration is a group of words expressing one concept. As described later, the present invention uses word frequency to arrange each row so that the first word at the left end represents the row's concept most strongly. Moving rightward, words not only represent the row's concept less and less but also increasingly represent other concepts. Using too many words to decide whether rows can be integrated can therefore lead to wrong decisions. This is why the number of words compared is limited.

Whether to integrate is decided by comparing the words of the main row and the sub row with each other and counting the matches. The conditions are described in detail below. Let the comparison word count hn be the smaller of the number of main row comparison words and the number of sub row comparison words of the two rows being compared. Let in be the number of words that match when the main row comparison words are compared with the sub row comparison words.
In the conditions below, all words are used when the first-word frequency of the sub row is 1, because such a row has never been integrated and its word order is still arbitrary.
(Judgment condition example 1)
mg = 6; sg: all words when the first-word frequency is 1, sg = 6 when it is 2 or more.
Integration is allowed in the following cases:
hn ≥ 4: in ≥ 3
hn = 3: in ≥ 2
hn = 2: in = 2
In all other cases integration is not allowed.
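Judgment condition example 1 can be written out as a small Python predicate. This is a sketch under stated assumptions: rows are plain word lists without duplicates, the main row is treated as already integrated (so it is always cut to mg words), and English stand-ins loosely modeled on rows (2), (5), (6), and (14) replace the Japanese words. The function name `can_integrate` is hypothetical.

```python
def can_integrate(main_words, sub_words, sub_head_freq, mg=6, sg=6):
    """Apply judgment condition example 1 to a main row and a sub row."""
    main_cmp = main_words[:mg]
    # A sub row whose first-word frequency is 1 has never been integrated,
    # so all of its words are compared.
    sub_cmp = sub_words if sub_head_freq == 1 else sub_words[:sg]
    hn = min(len(main_cmp), len(sub_cmp))          # comparison word count
    matches = len(set(main_cmp) & set(sub_cmp))    # "in" in the text
    if hn >= 4:
        return matches >= 3
    if hn == 3:
        return matches >= 2
    if hn == 2:
        return matches == 2
    return False


row_2 = ["accumulate", "save", "store", "memory", "house", "record"]
row_5 = ["store", "accumulate", "save", "memory", "capture", "write"]
row_6 = ["store", "accumulate", "save"]
row_14 = ["store", "AD conversion", "A/D conversion", "A.D conversion"]

merge_2_5 = can_integrate(row_2, row_5, sub_head_freq=1)    # 4 matches, hn = 6
merge_2_6 = can_integrate(row_2, row_6, sub_head_freq=1)    # 3 matches, hn = 3
merge_2_14 = can_integrate(row_2, row_14, sub_head_freq=1)  # 1 match, hn = 4
```

Under this condition the partially overlapping row and the subset row are accepted, while the AD-conversion row, which shares only one word, is rejected.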

(Judgment condition example 2)
mg = 3; sg: all words when the head-word frequency is 1, sg = 3 when the head-word frequency is 2 or more.
Integration is permitted when in is 2 or more.
(Judgment condition example 3)
mg = 4; sg: all words when the head-word frequency is 1, sg = 4 when the head-word frequency is 2 or more.
Integration is permitted when in is 2 or more.

(Judgment condition example 4)
mg = 6; sg: all words when the head-word frequency is 1, sg = 6 when the head-word frequency is 2 or more.
Integration is permitted in the following cases:
hn of 5 or more: in is 4 or more
hn = 4: in is 3 or more
hn = 3: in is 3
hn = 2: in is 2
(Judgment condition example 5)
mg = 7; sg: all words when the head-word frequency is 1, sg = 7 when the head-word frequency is 2 or more.
Integration is permitted in the following cases:
hn of 5 or more: in is 4 or more
hn = 4: in is 3 or more
hn = 3: in is 2 or more
hn = 2: in is 2
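Judgment condition examples 4 and 5 differ only in their thresholds, so they lend themselves to a table-driven form. The sketch below is illustrative only; the names THRESHOLDS, required_matches and is_mergeable are hypothetical, and the key 5 is assumed to stand for "hn of 5 or more".

```python
# Minimum number of matching words ("in") required for each hn.
# Key 5 stands for "hn of 5 or more".
THRESHOLDS = {
    4: {5: 4, 4: 3, 3: 3, 2: 2},  # judgment condition example 4
    5: {5: 4, 4: 3, 3: 2, 2: 2},  # judgment condition example 5
}


def required_matches(example, hn):
    """Return the required match count, or None when no rule applies."""
    if hn < 2:
        return None  # the examples give no rule for hn below 2
    return THRESHOLDS[example][min(hn, 5)]


def is_mergeable(example, hn, n_match):
    """True when n_match satisfies the chosen example for this hn."""
    need = required_matches(example, hn)
    return need is not None and n_match >= need
```

With hn = 3 and two matching words, example 5 permits integration while example 4 does not, which is the only point where the two tables differ.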

Besides the examples above, the values of mg, sg, and in can be chosen to suit the characteristics of the synonym dictionary, using the judgment condition examples as a guide.
The flowchart of FIG. 3 uses judgment condition example 1.
When the integration decision (S160) under judgment condition example 1 finds that integration is not permitted (No), processing moves to the sub-row update (S180). When integration is permitted (Yes), processing moves to the row integration processing (S170).

In the integration decision (S160), a preliminary check may be performed first to see whether main-row comparison words are present among the sub-row comparison words. Because this check is simpler than the full decision under the conditions above, performing it first reduces the computation needed for the decision. If the main-row comparison words and the sub-row comparison words do not share the predetermined number of identical words, integration is immediately judged not permitted.
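The preliminary check can be sketched as an early-exit overlap test. This is an assumption-laden illustration: the function name quick_reject and the default min_shared = 2 (the smallest in among the judgment condition examples) are hypothetical choices, not taken from the source.

```python
def quick_reject(main_words, sub_words, min_shared=2):
    """Return True when the pair can be rejected without the full decision."""
    sub_set = set(sub_words)
    shared = 0
    for word in main_words:
        if word in sub_set:
            shared += 1
            if shared >= min_shared:
                return False  # enough overlap: run the full decision
    return True  # too few shared words: integration not permitted
```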

The row integration processing (S170) of FIG. 3 compares all words of the main row and the sub-row. For each matching word, the word frequency in the sub-row is added to the frequency of that word in the main row. Each sub-row word with no match in the main row is appended, together with its frequency, to the right end of the main row's word sequence (so the number of words in the main row increases). The word frequencies of the integrated sub-row are all set to 0. All words of the sub-row are thereby integrated into the main row together with their frequencies.
After integration, the words of the main row are re-sorted in order of their word frequency within the row. The main row contains no duplicate words.
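The row integration step (S170) can be sketched as follows, assuming rows are stored as lists of [word, frequency] pairs that are modified in place. The function name merge_rows is hypothetical, and ties in frequency are assumed to keep their previous left-to-right order.

```python
def merge_rows(main_row, sub_row):
    """Integrate sub_row into main_row; both lists are modified in place."""
    position = {word: i for i, (word, _) in enumerate(main_row)}
    for word, freq in sub_row:
        if word in position:
            # Matching word: add the sub-row frequency to the main row.
            main_row[position[word]][1] += freq
        else:
            # New word: append it with its frequency at the right end.
            position[word] = len(main_row)
            main_row.append([word, freq])
    # Mark the sub-row as integrated by zeroing all of its frequencies.
    for entry in sub_row:
        entry[1] = 0
    # Re-sort the main row by descending frequency (the sort is stable).
    main_row.sort(key=lambda entry: -entry[1])
```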

Processing then moves to "update sub-row; sub-row finished?" (S180). In S180, the sub-row is first moved down to the next row whose head-word frequency is not 0 (sn = sn + n, n: a positive integer), because a row whose head-word frequency is 0 has already been integrated into a main row.
It is then checked whether the sub-row has reached the end of the synonym dictionary. If not, processing returns to the integration decision (S160). If the sub-rows for the current main row have been processed through to the end of the synonym dictionary, processing goes to "update main row; main row finished?" (S190).

In "update main row; main row finished?" (S190), the main row is moved down to the next row whose head-word frequency is not 0 (mn = mn + m, m: a positive integer). At this point, rows in the newly added part of the synonym dictionary can also become the main row. If the main row is not at the end of the synonym dictionary, the sub-row is determined: it is the first row that lies below the main row, belongs to the newly added synonym part, and has a non-zero head-word frequency. Processing then returns to the integration decision (S160).
If the main row is the last row of the synonym dictionary (YES in S190), row integration has been completed for all rows, and processing goes to the deletion of unnecessary words and rows (S200).
FIGS. 6A and 6B (together, FIG. 6) show the synonym dictionary when row integration has been completed for all rows (before the deletion of unnecessary words and rows (S200)).
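The loop formed by S160 to S190 can be sketched as a nested scan that skips rows whose head-word frequency is 0. This is a simplified illustration: it treats every row as newly added (it does not distinguish the existing dictionary part from the new part), and the function name integrate_all and the callbacks can_merge and merge are hypothetical placeholders for the integration decision and the row integration.

```python
def integrate_all(rows, can_merge, merge):
    """Scan all main-row/sub-row pairs once, integrating where permitted."""
    for mn in range(len(rows)):                 # main-row position
        if rows[mn][0][1] == 0:
            continue                            # already integrated elsewhere
        for sn in range(mn + 1, len(rows)):     # sub-rows lie below the main row
            if rows[sn][0][1] == 0:
                continue                        # skip integrated rows (S180)
            if can_merge(rows[mn], rows[sn]):   # integration decision (S160)
                merge(rows[mn], rows[sn])       # row integration (S170)
    return rows
```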

In the deletion of unnecessary words and rows (S200), rows whose head-word frequency is 0 are deleted. Such rows have been integrated into a main row and therefore need to be removed.
If the newly added synonym dictionary part is large, a main row that has absorbed sub-rows can contain a very large number of words. A row with that many words may not be suitable for an easy-to-use dictionary. In that case, it is advisable to keep the words from the head word up to the cn-th word and discard the rest. For example, with cn = 36, the 37th and later words are deleted. The truncation range cn may be unlimited (no truncation), a constant, or varied adaptively according to attributes of the row. Examples of adaptive variation include making cn a function of the number of words in the row, or a function of the head-word frequency or the word frequencies.
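The deletion step (S200) with a constant cn can be sketched as below. The function name prune is hypothetical, rows are assumed to be lists of [word, frequency] pairs, and cn=None is used here to mean "no truncation"; adaptive choices of cn are not shown.

```python
def prune(rows, cn=36):
    """Drop integrated rows and keep at most cn words per remaining row."""
    # Rows whose head-word frequency is 0 were integrated into a main row.
    kept = [row for row in rows if row and row[0][1] != 0]
    if cn is None:
        return kept                      # no truncation
    return [row[:cn] for row in kept]    # keep the head word up to the cn-th word
```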

The deletion of rows whose head-word frequency is 0 and the deletion of unnecessary words may instead be performed during row integration, as described later. In that case the deletion of unnecessary words and rows (S200) need not be performed.
Rows whose head word has a low usage frequency can also be deleted, for example rows whose head-word usage frequency is 3 or less. Here, "usage frequency" means, for example, the frequency of use within the created synonym dictionary.
As described above, deleting "words with a low usage frequency" and "rows whose head word has a low usage frequency" makes the dictionary easier to use.

The rows are then sorted (S210): for example, by the usage frequency of the new head word; when the head words are the same, by head-word frequency; when the head-word frequencies are also the same, by the number of words in the row; and when the word counts are also the same, by the usage frequency of the words in the row.
The usage-frequency table may be created not only from the synonym dictionary but also from various text databases, such as patent gazette databases, newspaper databases, magazine databases, and academic journal databases. Although the example above builds the word-order table from importance based on frequency, a word-order table based on importance by other criteria can also be used. The order need not be by frequency; any unique order, such as JIS code order, may be used.
When this processing finishes, the integration processing is complete.
FIGS. 7A and 7B show the synonym dictionary obtained when the integration processing has finished for the example shown in FIG. 5. As FIG. 7 shows, the result is a compact synonym dictionary in which each row begins with its most important word (the word with the highest word frequency in the row).
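The row sort (S210) can be sketched with a composite sort key. Several points are assumptions made for illustration: all criteria are taken as descending, the head word itself is added as a tie-breaker so that rows with the same head word stay adjacent, the last criterion (usage frequency of the words in the row) is omitted, and usage is a hypothetical dictionary mapping a word to its usage frequency.

```python
def sort_rows(rows, usage):
    """Sort rows by head-word usage, head-word frequency, then row length."""
    def key(row):
        head, head_freq = row[0]
        return (
            -usage.get(head, 0),  # usage frequency of the head word
            head,                 # keep identical head words together
            -head_freq,           # head-word frequency within the row
            -len(row),            # number of words in the row
        )
    return sorted(rows, key=key)
```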

A further feature of the synonym dictionary integration according to the present invention is that rows expressing different concepts remain separate rows even when they have the same head word. The feature is not very pronounced in FIG. 7A because the table is small, but the rows are:
Row 3: 地図 (map), MAP, マップ (map), surveying, measurement, coordinates, GPS, ...
Row 4: 地図 (map), sightseeing, destination, course
Row 5: 地図 (map), printing, camera
Row 3 is the row for the concept "the map itself, in the surveying sense", row 4 for "maps for tourist guidance", and row 5 for "map printing".
By incorporating each newly created synonym group into the synonym dictionary and repeating the integration processing described above, the synonym dictionary keeps evolving into a better one.
In addition, applying the integration processing to an already created synonym dictionary, with all rows treated as new, can be expected to integrate the dictionary further. In this case the integration conditions must be set so that rows of different concepts are not integrated. For example, the comparison word counts of the main row and the sub-row should be set to the same, reasonably large value, and the judgment condition should be strict; judgment condition example 5 above may be used.

FIG. 8 illustrates how a search is performed in the search system of FIG. 1 using the synonym dictionary created by the processing above. First, the synonym dictionary is stored on the server 20 or the search terminal 23 of the text search system of FIG. 1 so that it can be searched from the search terminal 23.
FIG. 8 is an example of the screen for composing a search expression, displayed on the display device 25 of the search terminal 23 of FIG. 1. A search expression is composed by entering characters from the input device 26 into the "search expression creation area" 34.
The following describes how search terms are composed on the search input screen of FIG. 8 using the synonym dictionary created by the present invention.
First, a search word is entered in the search word input area 31, which is used to search the synonym dictionary. When, for example, the word 地図 ("map") is entered and the "word search" button 32 is pressed, the synonym dictionary is searched and the results appear in the "synonym search result display area" 33. In FIG. 8, the rows of the synonym dictionary whose head word is 地図 are displayed. The head-word frequency column 36 shows the values 3 and 1 in this example. Within a row, words are displayed from left to right in descending order of frequency. Rows with a higher head-word frequency are displayed higher up.
The user can refer to the displayed synonyms and copy the needed words into the "search expression creation area" 34 by cut and paste to complete a suitable search expression. Pressing the "document search" button 35 then searches for documents using the entered search expression.

Other embodiments

<Index table>
In the embodiment described above, the combinations of main row and sub-row to be integrated are taken out in a kind of brute-force (all-pairs) fashion, and the two rows are compared to decide whether they can be integrated. The number of integration decisions is therefore large; as the dictionary grows, the amount of computation grows and it takes longer to obtain the result.
In the processing described below, an index table is created and used to pick the combinations of two rows (main row and sub-row) for row integration. This reduces the processing cost compared with brute-force processing.

First, the index table is described. FIG. 9 shows an index table; it corresponds to the synonym dictionary of FIG. 5, into which a new synonym group has been incorporated.
The index table (see FIG. 9) has the word in column Z as its entry word and lists, along the row, the row numbers of all rows of the synonym dictionary (FIG. 5) that contain that entry word.
In the index table of FIG. 9, column H is the word number, column Z is the word, and column J is the usage frequency. Taking 表示 ("display"), word number 1 in FIG. 9, as an example, columns K and onward list all row numbers of the synonym dictionary of FIG. 5A that contain 表示. The same holds for the other words: all row numbers containing the word are listed. This is the index table. The index table is arranged in order of the usage frequency shown in column J.
The index table needs to be created only once per integration run, which is why using it reduces the processing cost.
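An index table of this kind is essentially an inverted index from each word to the row numbers that contain it. The sketch below is illustrative: the function name build_index is hypothetical, rows are assumed to be sequences of (word, frequency) pairs, row numbers are assumed to start at 1, and the word-number and usage-frequency columns of FIG. 9 are not reproduced.

```python
from collections import defaultdict


def build_index(rows):
    """Map each word to the 1-based numbers of the rows that contain it."""
    index = defaultdict(list)
    for row_no, row in enumerate(rows, start=1):
        for word, _ in row:
            if row_no not in index[word]:   # guard against repeated words
                index[word].append(row_no)
    return dict(index)
```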

The following describes how pairs of main row and sub-row are determined using the index table in the integration flowchart of FIG. 3. First, in the flowchart of FIG. 3, the index table is created after the new synonym group has been incorporated.
Whenever a main row is set (FIG. 3: S150, S190), the index table (FIG. 9) is used to create a table showing the range of sub-rows for that main row (see FIG. 11), and the sub-rows to undergo row integration are selected from within that table.

As an example, consider creating the sub-row range table (see FIG. 11) when the sixth row of the synonym dictionary of FIG. 5 is the main row (during S190 of FIG. 3).
The sixth row of the synonym dictionary of FIG. 5 consists of nine words, but when the main-row comparison word count is six (mg = 6), the first six words are used for the integration decision. A table of the rows that contain these main-row comparison words (see FIG. 10) is first created from the index table (see FIG. 9).
Because the table of FIG. 10 lists the rows containing the main-row comparison words, column Z holds the first six words of the main row, arranged vertically in word order, and column H holds each word's word number (the word number of FIG. 9). In columns K and onward, the row numbers of the rows containing each main-row comparison word are obtained from the index table of FIG. 9 and stored in the table of FIG. 10. Words that are not main-row comparison words are ignored, so entries are created for six words only.

Next, the table of FIG. 10 is used to create a table of the sub-rows that contain at least N of the main-row comparison words (the sub-row range table; see FIG. 11). The sub-row range table of FIG. 11 lists the row numbers of the rows that contain two (N = 2) of the main-row comparison words of the sixth row of FIG. 5A (FIG. 10: the six words from 観光 ("sightseeing") to 名所 ("famous place")).
In the flowchart of FIG. 3, the minimum number of matching words in required by the judgment condition used in the integration decision (S160) is two, so a two-word match (N = 2) is also used when determining the sub-row range. N must be set to the smallest value of in used in the integration decision. This reduces the number of sub-rows for which the integration decision has to be made.
The sub-row range always consists of rows below the main row (here, the sixth row); the main row itself is not entered.
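The sub-row range can be derived from the index table by counting, for each row below the main row, how many main-row comparison words it contains. The sketch below is a hedged illustration: the function name sub_row_range and the parameter n_min (standing for N) are hypothetical, the index is assumed to map each word to a list of row numbers, and the main-row comparison words are assumed to be distinct.

```python
from collections import Counter


def sub_row_range(index, main_no, main_words, n_min=2):
    """Return the rows below main_no that contain at least n_min main words."""
    hits = Counter()
    for word in main_words:
        for row_no in index.get(word, []):
            if row_no > main_no:          # only rows below the main row
                hits[row_no] += 1
    return sorted(row_no for row_no, count in hits.items() if count >= n_min)
```

Only the rows returned here would need the full integration decision (S160), rather than every row below the main row.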

In the steps of FIG. 3 that update the sub-row (S150, S180, S190), sub-rows can be selected one after another from the row numbers in this table and integrated with the main row.
The sub-row range table of FIG. 11 is rebuilt each time the set of main-row comparison words changes, for example because rows were integrated and the words re-sorted during row integration. When a sub-row is integrated into the main row, that row is deleted from the sub-row range table of FIG. 11.
By narrowing the candidate sub-rows in advance with the index table in this way, the sub-rows subject to the integration decision (S160) can be limited, and brute-force processing is no longer necessary.

<Deletion of integrated sub-rows>
The processing cost can also be reduced by deleting the integrated sub-row when row integration is performed (FIG. 3: S170). Unnecessary words can likewise be deleted during row integration.
In this case, because the sub-row integrated in the row integration processing (S170) of FIG. 3 is deleted, there is no need to skip down to a row whose head-word frequency is not 0 when updating the sub-row (S180) or the main row (S190); the next row can simply be used. Nor is it necessary to delete unnecessary rows and words after integration has reached the last row (see S200).
When integrated sub-rows are deleted during row integration in this way, the synonym dictionary at the point where integration has reached the last row (before S210), shown in FIGS. 6A and 6B, becomes that of FIGS. 12A and 12B.

<Processing of exact synonyms>
Exact synonyms (同義語) are a special case among synonyms (類義語). Synonyms are a set of words with close concepts, whereas exact synonyms have almost 100% the same concept. Spelling variants are word groups that differ only in their character strings; in the present invention, words whose concepts are nearly 100% identical, including spelling variants, are called exact synonyms. For some words it cannot be determined whether they are synonyms or exact synonyms, so the present invention treats word groups registered in the exact-synonym dictionary as exact synonyms and word groups not registered there as synonyms.

The following word group is an example of exact synonyms:
ディスプレイ, デスプレイ, デイスプレイ, デスプレー, ディスプレー, デイスプレー --- (21)
These six words, all spelling variants of "display", are used with exactly the same concept. Suppose a row of the synonym dictionary is
ディスプレイ, デスプレイ, デイスプレイ, デスプレー, ディスプレー, デイスプレー, 表示 (display), 掲示 (posting), --- (22)
and that it is the main row. Even if the main-row comparison word count is set to mg = 6, the six words are all taken up by exact synonyms, which is equivalent to mg = 1 and can impair the integration decision. Such a row is also a problem when the synonym dictionary is displayed for use: words with exactly the same concept and only slightly different character strings occupy as many cells as there are exact synonyms, which makes the screen harder to read, forces the searcher to scan many cells to find the needed word, and increases the searcher's fatigue.

Next, consider the following two rows, where the number in square brackets is the frequency of the word.
ディスプレイ[1], デイスプレイ[1], ディスプレー[1], 表示[1], 掲示[1] --- (23)
デスプレイ[1], デスプレー[1], デイスプレー[1], 表示[1], 案内[1] --- (24)
With the judgment criteria described for S160 of FIG. 3, integration is judged not permitted, because 表示 is the only word the two rows have in common. The two rows should be integrated, yet the decision is that they cannot be. Because exact synonyms cause this kind of problem, they must be processed to remove the obstacle.

For exact synonyms, an exact-synonym dictionary such as the one illustrated in FIG. 13A (word part) and FIG. 13B (relative word frequency part) is prepared. In each row of the exact-synonym dictionary, a representative word (head word) is defined. The appropriate representative word is the most frequently used word in the row, and it is written in the leftmost cell of the row.
In the "exact-synonym processing", before the pre-integration synonym dictionary is loaded (S110) in the flowchart of FIG. 3, every word in the synonym groups that is listed in the exact-synonym dictionary is replaced with its representative head word. If the same representative head word then appears more than once in a row, the occurrences can be combined into one.
As shown below, the word frequencies of the exact synonyms may either be summed or not summed, as needed.
For the synonym groups (23) and (24) above, the exact-synonym processing gives the following word frequencies.
(When word frequencies are summed after exact-synonym processing)
ディスプレイ[3], 表示[1], 掲示[1] --- (25)
ディスプレイ[3], 表示[1], 案内[1] --- (26)
These two rows are easy to read when displayed, and in the integration processing they are judged integrable under judgment condition example 2.
(When word frequencies are not summed after exact-synonym processing)
ディスプレイ[1], 表示[1], 掲示[1] --- (27)
ディスプレイ[1], 表示[1], 案内[1] --- (28)
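The replacement by representative head words, with and without summing frequencies, can be sketched as follows. The function name normalize_row and the mapping representative (from each variant to its representative head word) are hypothetical; in the non-summing case the sketch assumes that the frequency of the first occurrence is kept.

```python
def normalize_row(row, representative, sum_freq=True):
    """Replace variants with their representative word and combine duplicates."""
    merged = {}  # insertion order is preserved (Python 3.7+)
    for word, freq in row:
        head = representative.get(word, word)
        if head not in merged:
            merged[head] = freq          # first occurrence
        elif sum_freq:
            merged[head] += freq         # summing variant frequencies
    return list(merged.items())
```

Applied to row (23) with all six spelling variants mapped to ディスプレイ, this yields the frequencies of row (25) when summing and those of row (27) when not summing.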

Replacing the exact synonyms in a synonym dictionary with their representative head words using the exact-synonym dictionary is useful not only for consolidating synonym dictionaries but also for general synonym dictionaries that contain exact synonyms. A synonym dictionary, whether integrated or not, in which each exact synonym has been replaced with its representative word (head word) is expanded back into multiple exact synonyms using the exact-synonym dictionary when the dictionary is used. The exact synonyms produced by this expansion are not necessarily the same as the word group before replacement. Examples of expanding a representative word (head word) follow.

1) When a word present in the exact-synonym dictionary is used while composing a search logical expression, the exact-synonym dictionary is consulted and the word is automatically expanded within the expression. When the search expression is
ディスプレー + 表示 + 掲示 --- (29)
and the exact-synonym dictionary is that of FIG. 13A, the automatically expanded search expression is
ディスプレー + デスプレイ + デイスプレイ + デスプレー + ディスプレー + デイスプレー + 表示 + 掲示 --- (30)
The number of expanded words can be limited, for example to the top three words, giving
ディスプレー + デスプレイ + デイスプレイ + 表示 + 掲示 --- (31)
The limit on the number of expanded words can be set as:
a) a predetermined number of words, taken from the highest frequency down;
b) words whose frequency relative to the representative word (head word) is at or above a certain ratio;
c) any other limit set as needed.
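Automatic expansion with a word-count limit (option a) can be sketched as below. This is only an illustration with made-up English data: the function name expand_query is hypothetical, each dictionary row is assumed to be a list of words in descending frequency order, and the sketch assumes that the term typed by the searcher is always kept first, so it is not claimed to reproduce the exact word order of expressions (30) and (31).

```python
def expand_query(terms, synonym_rows, max_words=None):
    """Expand each OR-ed term with the words of its dictionary row."""
    expanded = []
    for term in terms:
        row = next((r for r in synonym_rows if term in r), [term])
        # Keep the typed term first, then the row in frequency order.
        words = [term] + [w for w in row if w != term]
        if max_words is not None:
            words = words[:max_words]    # limit the number of expanded words
        for word in words:
            if word not in expanded:
                expanded.append(word)
    return expanded


rows = [["color", "colour", "kolor"]]
print(" + ".join(expand_query(["colour", "paint"], rows, max_words=2)))
```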

To show the frequency of each word, words and their frequencies can be displayed together, arranged either side by side or one above the other. The searcher copies the needed words from the displayed synonyms into the "search expression creation area" to compose the document search expression. Once the search expression is complete, the database is searched.

2) When a word that exists in the synonym dictionary (同義語辞書) is used while creating a search logical expression, the corresponding row of that dictionary is expanded and displayed, together with the frequencies, on the display device 25 of the search terminal in FIG. 1, so that the creator of the search logical expression can select the synonyms needed.
The synonym dictionary (同義語辞書) may also be displayed at the same time as the synonym (類義語) display shown in FIG. 8.
FIG. 14 is an example of a screen showing the synonyms (同義語) of the word "display" (ディスプレー) and their frequencies. It is displayed automatically, or in response to a manual instruction, when a search expression containing the word "display", such as expression (29) above, is entered. In FIG. 14, the synonym representative word (first word) is shown with the frequency with which it was used, and every other word is shown with a relative frequency normalized so that the frequency of the representative word is 1. The searcher looks at the display of FIG. 14 and decides how far down the list of synonyms to go. Even when synonyms are added automatically, displaying a table like that of FIG. 14 for confirmation reassures the creator of the search formula.
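The normalization described for FIG. 14 can be sketched as follows. This is only an illustration under assumptions: the function name `frequency_table`, the rounding to two decimals, and the sample words and counts are invented and do not reproduce the figure.

```python
from typing import List, Tuple, Union

Number = Union[int, float]


def frequency_table(row: List[Tuple[str, int]],
                    digits: int = 2) -> List[Tuple[str, Number]]:
    """Build display rows: raw frequency for the first word, relative for the rest."""
    head_word, head_freq = row[0]
    # The representative (first) word keeps its raw usage frequency.
    table: List[Tuple[str, Number]] = [(head_word, head_freq)]
    # Every other word is normalized so that the representative word counts as 1.
    for word, freq in row[1:]:
        table.append((word, round(freq / head_freq, digits)))
    return table


# Invented sample row, ordered from highest frequency.
sample = [("display", 40), ("monitor", 30), ("screen", 10)]
for word, value in frequency_table(sample):
    print(f"{word}\t{value}")
```

A searcher reading such a table could, for example, decide to use only the words whose relative frequency is 0.5 or higher.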

A diagram showing the configuration of a conventional text search system.
A diagram showing an example system configuration for creating a synonym dictionary.
A diagram showing a flowchart of the integration processing.
A diagram showing new synonym groups.
A diagram showing the synonym dictionary (word part).
A diagram showing the synonym dictionary (word frequency part).
The synonym dictionary (word part) after integration up to the last row (S190).
The synonym dictionary (word frequency part) after integration up to the last row (S190).
The synonym dictionary (word part) after the integration processing has finished (S210).
The synonym dictionary (word frequency part) after the integration processing has finished (S210).
A diagram explaining a usage example of the synonym dictionary.
A diagram showing the index table.
A diagram showing the rows in which the first six words of the main row (row 6 of FIG. 5) appear.
A diagram showing the range of sub-rows for the main row (row 6 of FIG. 5).
The synonym dictionary (word part) after integration up to the last row (S190) when integrated sub-rows are deleted.
The synonym dictionary (word frequency part) after integration up to the last row (S190) when integrated sub-rows are deleted.
A diagram showing the synonym dictionary (同義語辞書) (word part).
A diagram showing the synonym dictionary (同義語辞書) (word relative frequency part).
A diagram showing the screen display of the synonym dictionary (同義語辞書).

Explanation of symbols

10 Storage device such as a disk
11 New synonym groups
12 Integrated synonym dictionary
14 Other data
15 Processing device
16 Processing program
17 Display device
18 Input device
20 Text database system
23 Search terminal
25 Display device
26 Input device
27 Telecommunication line
30 Search screen
31 Search term input area
32 Word search button
33 Synonym search result display area
34 Search formula creation area
35 Material search button
36 First word frequency display area

Claims

A synonym integration system that creates a synonym dictionary from synonym groups, each being a set of OR-combined words in a search expression entered at the time of a search, the system comprising:
synonym group importing means for setting the word frequency of each word in a row of a synonym group to 1, storing the row, and importing it into the synonym dictionary;
main row / sub-row determination means for taking one row of the synonym dictionary as a main row and another row as a sub-row;
row comparison means for comparing the main row with the sub-row, checking whether a predetermined condition is satisfied, and determining whether to perform integration processing; and
row integration means for, when the comparison by the row comparison means shows that the predetermined condition is satisfied, adding the word frequency of each word in the sub-row that also appears in the main row to the word frequency of the corresponding word in the main row, adding each word in the sub-row that is not in the main row to the main row, thereby integrating the sub-row into the main row, and rearranging the words of the main row in order of word frequency.

The synonym integration system according to claim 1,
further comprising row rearrangement means for rearranging the synonym dictionary in a certain order, using the first word of each row of the synonym dictionary, after the row integration processing of the synonym dictionary has ended.

The synonym integration system according to claim 1 or 2,
wherein, if an integrated synonym part exists in the synonym dictionary,
the synonym group importing means keeps the imported synonym part distinct from the integrated synonym part, and
the main row / sub-row determination means sets the main row from the integrated synonym part and the sub-row from the newly imported synonym group part.

The synonym integration system according to any one of claims 1 to 3,
wherein the predetermined condition of the row comparison means limits the number of words of the main row to be compared and requires that the number of those words that match the sub-row be not less than a predetermined number.

The synonym integration system according to any one of claims 1 to 4,
wherein the row integration means further deletes the integrated sub-row.

The synonym integration system according to any one of claims 1 to 4,
wherein the main row / sub-row determination means creates in advance a list of sub-rows that contain at least two of the words to be compared in the main row, and selects the sub-row from that list.

The synonym integration system according to any one of claims 1 to 6,
further comprising a synonym dictionary, and
synonym processing means for performing, before the synonym group importing means works on the synonym dictionary, synonym processing that groups synonyms in the synonym dictionary into the same word.

A program for causing a computer system to implement each function of the synonym integration system according to any one of claims 1 to 7.

A recording medium on which is recorded a program for causing a computer system to implement each function of the synonym integration system according to any one of claims 1 to 7.
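For illustration only, the importing, comparison, integration, sub-row deletion, and row rearrangement recited in the claims above might be sketched as follows. This is not the patented implementation: the function names, the default values `compare_top=6` and `min_matches=2`, the choice of every later row as a candidate sub-row, and the sample word groups are assumptions made for the example.

```python
from typing import Dict, List

Row = Dict[str, int]  # word -> frequency; insertion order is the word order


def import_group(words: List[str]) -> Row:
    """Import one OR-combined word group as a row with every frequency set to 1."""
    return {w: 1 for w in words}


def should_integrate(main: Row, sub: Row,
                     compare_top: int = 6, min_matches: int = 2) -> bool:
    """Compare only the first `compare_top` words of the main row with the sub-row."""
    head = list(main)[:compare_top]
    return sum(1 for w in head if w in sub) >= min_matches


def integrate(main: Row, sub: Row) -> Row:
    """Add sub-row frequencies to the main row, append new words, sort by frequency."""
    merged = dict(main)
    for word, freq in sub.items():
        merged[word] = merged.get(word, 0) + freq
    # sorted() is stable, so words with equal frequency keep their previous order.
    return dict(sorted(merged.items(), key=lambda kv: -kv[1]))


def integrate_dictionary(rows: List[Row],
                         compare_top: int = 6, min_matches: int = 2) -> List[Row]:
    """Treat each row in turn as the main row and merge matching later rows into it."""
    rows = [dict(r) for r in rows]  # work on copies
    i = 0
    while i < len(rows):
        j = i + 1
        while j < len(rows):
            if should_integrate(rows[i], rows[j], compare_top, min_matches):
                rows[i] = integrate(rows[i], rows[j])
                del rows[j]  # assumption: the integrated sub-row is deleted
                continue
            j += 1
        i += 1
    # Rearrange the rows by their first word once integration has finished.
    rows.sort(key=lambda r: next(iter(r)))
    return rows


# Invented sample groups, as if extracted from OR-combined search terms.
groups = [["display", "monitor", "screen"],
          ["monitor", "display", "panel"],
          ["bolt", "screw"]]
result = integrate_dictionary([import_group(g) for g in groups])
for row in result:
    print(list(row.items()))
```

In this sample the first two groups share two words and are merged, so "display" and "monitor" end up with frequency 2, while the unrelated third group stays a separate row.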