JP3427674B2 - Related word presentation device and medium recording related word presentation program

Info

Publication number: JP3427674B2
Application number: JP13730197A
Authority: JP
Inventors: 博増市
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1997-05-27
Filing date: 1997-05-27
Publication date: 2003-07-22
Anticipated expiration: 2017-05-27
Also published as: JPH10334106A

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a related word presentation device and to a medium on which a related word presentation program is recorded. More particularly, it relates to a related word presentation device that presents words related to a search condition, and to a medium on which is recorded a related word presentation program that causes a computer to present words related to a search condition.

[0002]

[Description of the Related Art] Search systems that cover an enormous number of documents generally use keyword-based search. When an arbitrary keyword (search word) is input to the search system as a search condition, all documents whose content includes the search word are obtained as the search result. This type of search is called full-text search. Another widely used method attaches search keywords to each document in advance and returns, as the search result, the documents to which a keyword matching the input search word has been attached.

[0003] The only documents such a search system returns are those containing a word that exactly matches the search word input by the user, or those to which a word exactly matching the keyword input by the user has been attached as a search keyword. Because an exact match between the search word and the keyword is required, the user cannot necessarily obtain every desired document exhaustively.

[0004] Therefore, as proposed in Japanese Unexamined Patent Publication No. H2-297290, a method is used in which, to prevent search omissions, a related word dictionary is used to present related words of the search word to the user, encouraging the creation of a search expression that better matches the search intent.

[0005] For example, when the search word input by the user is "SGML", the words "HTML", "ODA", "structured document", and the like are acquired from the related word dictionary as related words of "SGML" and presented to the user. The user picks the presented related words judged appropriate and connects them to "SGML" with the logical OR operator, which reduces search omissions. In this way, connecting some of the presented related words to the search expression with the logical OR operator makes it possible to prevent search omissions.

[0006] The presented related words can be used not only to reduce search omissions but also to help narrow down the search results. That is, when the number of documents obtained as the search result is too large, connecting some of the presented related words to the search expression with the logical AND operator allows appropriate narrowing.
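The two uses of related words described above (broadening with OR, narrowing with AND) can be sketched as follows. This is a minimal illustration only: the function name `connect_related_words` and the textual query syntax are assumptions, not part of the patent.

```python
def connect_related_words(expression: str, related_words: list[str], operator: str) -> str:
    """Connect user-selected related words to a search expression."""
    if operator not in ("OR", "AND"):
        raise ValueError("operator must be 'OR' or 'AND'")
    # Parenthesize the original expression so its own operators keep their grouping.
    parts = [f"({expression})", *related_words]
    return f" {operator} ".join(parts)


# Few results: broaden with OR. Too many results: narrow with AND.
broadened = connect_related_words("SGML", ["HTML", "ODA"], "OR")
narrowed = connect_related_words("SGML", ["structured document"], "AND")
```

Which operator to use is the user's decision, based on how many documents the current expression returns.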

[0007] In this way, the user judges from the number of search results whether to prevent search omissions or to narrow down the results, and then selects appropriate related words and connects them to the search expression.

[0008]

[Problems to Be Solved by the Invention] The related words that should be connected to the search expression differ depending on whether the goal is to prevent search omissions or to narrow down the search results. The related words the user wants presented therefore differ as well. In general, when there are few search results, a broad and detailed set of related words is desired in order to prevent search omissions. When there are many search results, by contrast, the emphasis is on narrowing them down, and what is desired is not broad or detailed related words but related words that can narrow the results to an appropriate number.

[0009] In the conventional technique described above, however, a fixed set of related words is always presented regardless of the number of search results, so the user cannot accurately judge which related words should be connected to the search expression. In other words, it was not easy to create a search expression that effectively achieves both purposes: reducing search omissions and narrowing down the search results.

[0010] The present invention has been made in view of these points, and its object is to provide a related word presentation device that presents related words with which both purposes, reducing search omissions and narrowing down search results, can be achieved effectively.

[0011] Another object of the present invention is to provide a medium on which is recorded a related word presentation program that causes a computer to present related words with which both purposes, reducing search omissions and narrowing down search results, can be achieved effectively.

[0012]

[Means for Solving the Problems] To solve the above problems, the present invention provides a related word presentation device that presents words related to a search condition, the device comprising: document storage means for storing a plurality of documents; search condition receiving means for receiving an input search condition; document search means for acquiring, from the document storage means, a document set that matches the search condition received by the search condition receiving means; related word calculation means for acquiring a related word set consisting of words highly relevant to the search condition received by the search condition receiving means; occurrence count calculation means for calculating, for each related word in the related word set acquired by the related word calculation means, an occurrence count, which is the number of documents containing that related word within the document set obtained from the document search means; related word selection means for selecting the related words to be displayed by setting a lower limit whose value increases stepwise as the number of documents in the document set acquired by the document search means increases, and using a calculation formula that selects related words whose occurrence count is equal to or greater than the lower limit; and related word display means for displaying, on a display device, the related words selected by the related word selection means.

[0013] According to this related word presentation device, when a search condition is input, it is received by the search condition receiving means. The document search means then acquires, from the document storage means, a document set that matches the search condition received by the search condition receiving means. The related word calculation means also acquires a related word set consisting of words highly relevant to the received search condition. The occurrence count calculation means then calculates the occurrence count, within the document set obtained from the document search means, of each related word in the related word set acquired by the related word calculation means. The related word selection means then selects the related words to be displayed by using a calculation formula whose variables are the number of documents in the document set acquired by the document search means and the occurrence count of each related word obtained from the occurrence count calculation means. The selected related words are displayed on the display device by the related word display means.

[0014] The related words to be displayed can thus be narrowed down according to the number of documents in the document set acquired by the document search means and the occurrence count of each related word obtained from the occurrence count calculation means. In particular, because a lower limit whose value increases stepwise with the number of documents in the document set is set, and the related words to be displayed are selected by a calculation formula that selects related words whose occurrence count is equal to or greater than that lower limit, the related words can be narrowed down appropriately for the number of documents.

[0015]

[0016]

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a first principle configuration diagram of the present invention. The document storage means 1 stores a plurality of documents. The search condition receiving means 2 receives a search condition "S" input by the user with a keyboard or the like. The document search means 3 acquires, from the document storage means 1, a document set "X" that matches the search condition "S" received by the search condition receiving means 2. The related word calculation means 4 acquires a related word set "Wn" consisting of words highly relevant to the search condition "S" received by the search condition receiving means 2. For example, it uses a related word dictionary to extract related words of the words contained in the search condition and takes them as the related word set. The occurrence count calculation means 5 calculates the occurrence count "R(Wn)", within the document set acquired by the document search means 3, of each related word in the related word set "Wn" obtained from the related word calculation means 4.

[0018] The related word selection means 6 selects the related words to be displayed by using a calculation formula whose variables are the number of documents "N" in the document set obtained from the document search means 3 and the occurrence count "R(Wn)" of each related word obtained from the occurrence count calculation means 5. For example, it sets a lower limit G1(N) whose value increases stepwise as the number of documents in the document set acquired by the document search means 3 increases, and an upper limit G2(N) whose value decreases stepwise as that number of documents increases. It then selects the related words to be displayed by using a calculation formula that selects related words whose occurrence count is at least the lower limit and at most the upper limit.
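The selection rule described above can be sketched as a pair of step functions of N. The patent text here gives no concrete boundaries or limit values, so the tables `LOWER_STEPS` and `UPPER_STEPS` below are purely hypothetical numbers chosen for illustration, as are the function names.

```python
import bisect

# Hypothetical step tables of (minimum N, limit). The values are illustrative only.
LOWER_STEPS = [(0, 1), (100, 5), (1000, 20)]          # G1(N): rises stepwise with N
UPPER_STEPS = [(0, 10**9), (100, 500), (1000, 200)]   # G2(N): falls stepwise with N


def step_value(steps, n_docs):
    """Return the limit of the last step whose boundary is <= n_docs."""
    boundaries = [boundary for boundary, _ in steps]
    index = bisect.bisect_right(boundaries, n_docs) - 1
    return steps[index][1]


def select_related_words(n_docs, occurrences):
    """Keep related words whose occurrence count R lies in [G1(N), G2(N)]."""
    lower = step_value(LOWER_STEPS, n_docs)
    upper = step_value(UPPER_STEPS, n_docs)
    return [word for word, r in occurrences.items() if lower <= r <= upper]


few = select_related_words(50, {"HTML": 3, "ODA": 1})
many = select_related_words(2000, {"HTML": 150, "ODA": 4, "markup": 900})
```

With these assumed tables, a small result set keeps even rarely occurring related words, while a large result set keeps only words whose occurrence count falls inside the narrower band.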

[0019] The related word display means 7 displays the related words selected by the related word selection means 6 on the display device. With a related word presentation device configured in this way, related words that are effective both for reducing search omissions and for narrowing down search results are displayed on the display device. That is, when the number of search results N is large, related words that occur many times in the search results are presented. This form of presentation is effective for narrowing down the search results gradually.

[0020] When the number of search results N is small, related words with a small occurrence count are presented. This gives a broader presentation of related words, which is effective when connecting them to the search expression with the logical OR operator in order to reduce search omissions.

[0021] FIG. 2 is a second principle configuration diagram of the present invention. The major difference between the configuration shown in this figure and that of FIG. 1 is that the order of the related word calculation process and the occurrence count calculation process is reversed. Accordingly, the occurrence count is calculated not for the related word set but for the set of words contained in the document set (because the related word calculation process has not yet been executed). In the description of this figure, however, the set of words contained in the document set is denoted by "Wn", the same symbol used for the related word set.

[0022] The document storage means 11 stores a plurality of documents. The search condition receiving means 12 receives an input search condition "S". The document search means 13 acquires, from the document storage means 11, a document set "X" that matches the search condition received by the search condition receiving means 12. The occurrence count calculation means 14 takes the set of all words present in the document set acquired by the document search means 13 as "Wn" and calculates the occurrence count "R(Wn)" of each word within the document set.

[0023] The related word candidate selection means 15 selects the related word candidates to be displayed by using a calculation formula whose variables are the number of documents "N" in the document set acquired by the document search means 13 and the occurrence count "R(Wn)" of each word obtained from the occurrence count calculation means 14.

[0024] The related word calculation means 16 extracts related words from the related word candidates by the following procedure. First, it acquires a first value, which is the number of documents acquired by the document search means 13; a second value for each related word candidate, which is the number of documents containing that candidate among the documents acquired by the document search means 13; and a third value for each related word candidate, which is the number of documents containing that candidate among the documents stored in the document storage means 11. Next, it calculates for each related word candidate a fourth value, which is the product or the sum of the first value and the third value, and calculates the degree of relevance between the search condition received by the search condition receiving means 12 and each related word candidate on the basis of the ratio between the second value and the fourth value. It then extracts the related word candidates with a high degree of relevance as related words. The extended mutual information, the extended Dice coefficient, and the extended t-score described later can be used as the formula for calculating the degree of relevance.
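The first to fourth values described above can be sketched with plain set operations. This is an illustrative sketch only: representing documents as sets of identifiers, and the names `candidate_values` and `ratio_to_fourth`, are assumptions; the actual relevance formulas are the extended measures introduced later in the description.

```python
def candidate_values(result_docs: set, docs_with_candidate: set):
    """Return the first, second, and third values for one related word candidate."""
    first = len(result_docs)                         # documents matching the search condition
    second = len(result_docs & docs_with_candidate)  # matching documents that contain the candidate
    third = len(docs_with_candidate)                 # all stored documents that contain the candidate
    return first, second, third


def ratio_to_fourth(first: int, second: int, third: int, combine: str = "product") -> float:
    """Ratio of the second value to the fourth value (product or sum of first and third)."""
    fourth = first * third if combine == "product" else first + third
    return second / fourth if fourth else 0.0


first, second, third = candidate_values({1, 2, 3, 4}, {3, 4, 5})
by_product = ratio_to_fourth(first, second, third, "product")
by_sum = ratio_to_fourth(first, second, third, "sum")
```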

[0025] The related word display means 17 displays the related words extracted by the related word calculation means 16 on the display device. With such a related word presentation device, the related word candidates are selected before the degree of relevance is calculated, so the relevance calculation can be skipped for words that do not need it. Therefore, even when a complex relevance calculation is performed, related words that are effective for narrowing down documents and the like can be displayed quickly.

[0026] FIG. 3 is a third principle configuration diagram of the present invention. This related word presentation device has document storage means 21, search condition receiving means 22, document search means 23, occurrence count calculation means 24, related word candidate selection means 25, related word calculation means 26, related word pair occurrence count calculation means 27, inter-related-word relevance calculation means 28, and related word display means 29. The document storage means 21, search condition receiving means 22, document search means 23, occurrence count calculation means 24, related word candidate selection means 25, and related word calculation means 26 have the same functions as the components of the same names shown in FIG. 2, so their description is omitted. The other components are described below.

[0027] When an arbitrary related word is specified, the related word pair occurrence count calculation means 27 calculates, over the document set acquired by the document search means 23, the number of co-occurrences of the specified related word (the specific related word) with each of the other related words in the related word set obtained from the related word calculation means 26.
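The co-occurrence count described above can be sketched as follows, assuming that each word is mapped to the set of identifiers of the documents containing it. The mapping `postings` and the function name `cooccurrence_counts` are hypothetical names introduced for this illustration.

```python
def cooccurrence_counts(specific, related_words, postings, result_docs):
    """Count, within result_docs, the documents containing both `specific` and each other word."""
    # Documents of the result set that contain the specific related word.
    base = postings.get(specific, set()) & result_docs
    return {
        word: len(base & postings.get(word, set()))
        for word in related_words
        if word != specific
    }


postings = {"HTML": {1, 2, 3}, "ODA": {2, 5}, "DTD": {3, 4}}
counts = cooccurrence_counts("HTML", ["HTML", "ODA", "DTD"], postings, {1, 2, 3})
```

Restricting `base` to the result set first means documents outside the current search result never contribute to a count.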

[0028] The inter-related-word relevance calculation means 28 calculates the degree of relevance between related words on the basis of a calculation formula whose variables are the occurrence count of each related word obtained from the occurrence count calculation means 24 and the number of co-occurrences between the specific related word and each other related word obtained from the related word pair occurrence count calculation means 27. It then extracts the related words that have a high degree of relevance to the specific related word.

[0029] The related word display means 29 displays on the display device the related words extracted by the related word calculation means 26, and also displays the related words extracted by the inter-related-word relevance calculation means 28.

[0030] With such a related word presentation device, the user can specify an arbitrary related word and learn which other related words stand in a certain relationship to it, and can therefore easily select the related words that should be connected to the search expression with logical operators.

[0031] The configuration in this figure is obtained by adding the related word pair occurrence count calculation means and the inter-related-word relevance calculation means to the configuration of FIG. 2 and extending the function of the related word display means. The same means and functions can also be added to the configuration of FIG. 1.

[0032] The functions of the components of each principle configuration described above can be realized by having a computer execute a program in which the instructions for each processing function are written. In that case, the program is stored on a computer-readable recording medium. A semiconductor memory device, a magnetic recording device, an optical disk, or the like can be used as the recording medium.

[0033] The related word calculation means of the present invention can calculate the similarity between a search expression and a word by extending mutual information, the Dice coefficient, and the t-score, which are statistics originally used as measures of similarity between words, and can take words with high similarity as related words. Examples of the use of mutual information, the Dice coefficient, and the t-score for calculating similarity between words include "Haruno, Yamazaki: Bilingual alignment using dictionaries and statistics, IPSJ SIG Natural Language Processing research report, 96-NL-112, pp. 23-30 (1996)" and "Omori, Tsutsumi, Nakanishi: Creation of a bilingual word dictionary using statistical information, Proceedings of the 2nd Annual Meeting of the Association for Natural Language Processing, pp. 49-52 (1996)".

[0034] The following describes how mutual information and the other measures are extended for application to the present invention. The mutual information (MI) between the words word1 and word2 is defined by the following equation.

[0035]

[Equation 1]

[0036] Here, let M be the total number of documents to be searched, a the number of documents containing both word1 and word2, b the number of documents containing only word1, and c the number of documents containing only word2. Under these definitions, the following equations hold.

[0037]

[Equation 2]

[0038]

[Equation 3]

[0039]

[Equation 4]
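The equation images for Equations 1 to 4 are not reproduced in this text. As a hedged reconstruction, assuming the standard definition of mutual information between two words and probabilities estimated from the document counts a, b, c, and M defined in paragraph [0036], they would read roughly as follows. The symbol P and the unspecified logarithm base are assumptions.

```latex
\begin{align}
MI(\mathit{word}_1, \mathit{word}_2) &= \log \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)} \\
P(\mathit{word}_1, \mathit{word}_2) &= \frac{a}{M} \\
P(\mathit{word}_1) &= \frac{a + b}{M} \\
P(\mathit{word}_2) &= \frac{a + c}{M}
\end{align}
```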

[0040] In contrast, the present invention defines the mutual information (MI₀) between a search expression S and a word "word" by the following equation.

[0041]

[Equation 5]

[0042] Here, let M be the total number of documents to be searched, a₀ the number of documents that contain word and are obtained by the search expression S, b₀ the number of documents obtained by the search expression S that do not contain word, and c₀ the number of documents that contain word, excluding the documents obtained by the search expression S. Under these definitions, the following equations hold.

[0043]

[Equation 6]

[0044]

[Equation 7]

[0045]

[Equation 8]

[0046] Here, "a₀ + b₀" corresponds to the "first value" in the description of FIG. 2, "a₀" corresponds to the "second value", and "a₀ + c₀" corresponds to the "third value". Equation (5) can therefore be rewritten as the following equation.

[0047]

[Equation 9]
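The equation images for Equations 5 to 9 are likewise not reproduced in this text. Assuming the extension simply substitutes the search expression S for one of the two words, with the counts a₀, b₀, c₀, and M defined in paragraph [0042], a hedged reconstruction would be as follows. The last line is the form expressed in M and the first value (a₀ + b₀), second value (a₀), and third value (a₀ + c₀); the symbol P and the logarithm base are assumptions.

```latex
\begin{align}
MI_0(S, \mathit{word}) &= \log \frac{P(S, \mathit{word})}{P(S)\,P(\mathit{word})} \\
P(S, \mathit{word}) &= \frac{a_0}{M} \\
P(S) &= \frac{a_0 + b_0}{M} \\
P(\mathit{word}) &= \frac{a_0 + c_0}{M} \\
MI_0(S, \mathit{word}) &= \log \frac{M\,a_0}{(a_0 + b_0)(a_0 + c_0)}
\end{align}
```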

[0048] Written this way, Equation (5) becomes a calculation formula whose variables are the total number of documents to be searched M, the "first value", the "second value", and the "third value". Like mutual information, the Dice coefficient and the t-score are statistics for obtaining the similarity between words. The Dice coefficient (DC) and the t-score (TS) are defined by the following equations.

[0049]

[Equation 10]

[0050]

[Equation 11]

[0051] Like mutual information, these can also be extended as follows to calculate the similarity between a search expression and a word.

[0052]

[Equation 12]

[0053]

[Equation 13]

[0054] For each of MI₀(S, word), DC₀(S, word), and TS₀(S, word), a larger value means a higher similarity between the search expression S and the word. Hereinafter, MI₀(S, word) is called the "extended mutual information", DC₀(S, word) the "extended DC", and TS₀(S, word) the "extended TS". As with the mutual information, the extended DC and the extended TS can each be expressed by the following equations.

[0055]

[Equation 14]

[0056]

[Equation 15]

[0057] As Equation (14) shows, the total number of documents to be searched M is not needed to obtain the extended DC. Next, an embodiment of the related word presentation device of the present invention is described specifically.
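The three extended measures can be sketched as functions of M and the first, second, and third values. The equation images are not reproduced in this text, so the formulas below assume the commonly used forms of mutual information, the Dice coefficient, and the t-score with S substituted for one word. The base-2 logarithm, the handling of zero counts, and all function names are assumptions made for illustration.

```python
import math


def extended_mi(m, first, second, third):
    """Assumed form: log2( M * a0 / ((a0 + b0) * (a0 + c0)) )."""
    if second == 0:
        return float("-inf")  # assumption: no co-occurrence means lowest similarity
    return math.log2(m * second / (first * third))


def extended_dc(first, second, third):
    """Assumed form: 2 * a0 / ((a0 + b0) + (a0 + c0)); M is not needed."""
    total = first + third
    return 2 * second / total if total else 0.0


def extended_ts(m, first, second, third):
    """Assumed form: (a0 - (a0 + b0) * (a0 + c0) / M) / sqrt(a0)."""
    if second == 0:
        return float("-inf")  # assumption, as in extended_mi
    return (second - first * third / m) / math.sqrt(second)


# Example: M = 1000 documents, 100 match S, 100 contain the word, 50 do both.
mi = extended_mi(1000, 100, 50, 100)
dc = extended_dc(100, 50, 100)
ts = extended_ts(1000, 100, 50, 100)
```

In every case a larger value indicates a stronger association between the search expression and the word, which matches the ranking use described in the text.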

[0058] FIG. 4 is a block diagram showing the configuration of an embodiment of the present invention. This embodiment is based on the configuration shown in FIG. 3. The document storage means 31 is a storage device that stores the content of each digitized document to be searched, paired with the document identifier assigned by the morphological analysis means 32.

[0059] The morphological analysis means 32 assigns a document identifier to each document stored in the document storage means 31, then applies morphological analysis to each document to extract the independent words (the words that should serve as keywords), and stores them paired with the corresponding document identifier.

[0060] Based on the results of the morphological analysis by the morphological analysis means 32, the index structure generation means 33 creates, as the index structure, a word-word identifier list 34a, a word identifier-document identifier list 34b, and a document identifier-word identifier list 34c.

[0061] The index structure storage means 34 is a storage device that stores the word-word identifier list 34a, the word identifier-document identifier list 34b, and the document identifier-word identifier list 34c created by the index structure generation means 33.

[0062] The word-word identifier list 34a is a list describing the correspondence between each word character string and the word identifier that identifies that word. The word identifier-document identifier list 34b is a list describing, for each word identifier, the set of document identifiers of the documents that contain the word character string indicated by that word identifier.

[0063] The document identifier-word identifier list 34c is a list describing, for each document identifier, the set of word identifiers of the words contained in the document indicated by that document identifier.
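The three lists 34a, 34b, and 34c can be sketched as ordinary dictionaries. This is a minimal illustration that assumes the morphological analysis has already produced a list of independent words per document; the function name `build_index` and the choice of sequential integer word identifiers are assumptions.

```python
from collections import defaultdict


def build_index(doc_words: dict[int, list[str]]):
    """Build the word -> id, word id -> document ids, and document id -> word ids lists."""
    word_to_id: dict[str, int] = {}               # corresponds to list 34a
    word_docs: dict[int, set] = defaultdict(set)  # corresponds to list 34b
    doc_word_ids: dict[int, list[int]] = {}       # corresponds to list 34c
    for doc_id, words in doc_words.items():
        ids = set()
        for word in words:
            # Assign the next unused identifier the first time a word is seen.
            word_id = word_to_id.setdefault(word, len(word_to_id))
            word_docs[word_id].add(doc_id)
            ids.add(word_id)
        doc_word_ids[doc_id] = sorted(ids)        # each word id once per document
    return word_to_id, dict(word_docs), doc_word_ids


word_to_id, word_docs, doc_word_ids = build_index(
    {1: ["SGML", "HTML"], 2: ["SGML", "ODA", "SGML"]}
)
```

Storing each word identifier only once per document keeps list 34c a set, as the description states, so later counts over it are document counts rather than raw term frequencies.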

[0064] The search condition receiving means 41 is a user interface that accepts, from an input device such as a keyboard, the input of a search condition (search expression) composed of words connected by logical OR or logical AND operators.

【００６５】[0065]

The document search means 42 refers to the word-word identifier list 34a and the word identifier-document identifier list 34b to obtain the document identifiers of all documents that match the search condition input to the search condition receiving means 41, and saves the resulting document identifier set. To the related word candidate selection means 45, it passes the number of identifiers in the saved document identifier set. To the related word calculation means 46, it passes the number of identifiers in the saved document identifier set together with the total number of documents containing the word that corresponds to a word identifier supplied by the related word calculation means 46. To the related word pair appearance number calculation means 47, it passes the total number of documents that contain the word identifier pair supplied by that means and that also belong to the saved document identifier set.
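As a rough illustration of how the document search means 42 could resolve a search expression against lists 34a and 34b, the Python sketch below represents a query as nested tuples. The tuple encoding, the function name `evaluate`, and the sample data are assumptions made for this example; the patent does not specify a query representation.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical query encoding: a bare word, or ("and" | "or", left, right).
Query = Union[str, Tuple[str, "Query", "Query"]]


def evaluate(query: Query,
             word_to_id: Dict[str, int],
             id_to_docs: Dict[int, Set[int]]) -> Set[int]:
    """Return the set of document identifiers that match the query."""
    if isinstance(query, str):
        # Look the word up in list 34a, then its documents in list 34b.
        word_id = word_to_id.get(query)
        if word_id is None:
            return set()
        return set(id_to_docs.get(word_id, set()))
    op, left, right = query
    lhs = evaluate(left, word_to_id, id_to_docs)
    rhs = evaluate(right, word_to_id, id_to_docs)
    if op == "and":
        return lhs & rhs
    if op == "or":
        return lhs | rhs
    raise ValueError(f"unknown operator: {op}")


# Illustrative data only (not taken from the patent's figures).
word_to_id = {"airplane": 1, "engine": 2, "wing": 3}   # stands in for list 34a
id_to_docs = {1: {1, 2, 3}, 2: {2, 4}, 3: {3, 4}}      # stands in for list 34b

matched = evaluate(("and", "airplane", ("or", "engine", "wing")),
                   word_to_id, id_to_docs)
print(sorted(matched), len(matched))
```

The size of the returned set plays the role of the identifier count that the document search means hands to the downstream means.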

【００６６】[0066]

The in-document word search means 43 refers to the document identifier-word identifier list 34c to obtain, for each document in the document set that matches the search condition (obtained from the document search means 42), the set of identifiers of the words contained in that document, and concatenates these into a single word identifier set.

【００６７】[0067]

The word appearance number calculation means 44 calculates the number of appearances of each word identifier in the word identifier set obtained from the in-document word search means 43, and creates a list in which each word identifier is paired with its number of appearances.
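The concatenate-then-count behaviour of means 43 and 44 can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `count_word_ids` and the sample contents of the document identifier-word identifier list are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_word_ids(matched_docs: Iterable[int],
                   doc_to_word_ids: Dict[int, List[int]]) -> Counter:
    """Concatenate the word identifiers of the matched documents and count them."""
    concatenated: List[int] = []
    for doc_id in matched_docs:
        concatenated.extend(doc_to_word_ids.get(doc_id, []))
    # Each word identifier occurs at most once per document, so the count
    # equals the number of matched documents that contain the word.
    return Counter(concatenated)


# Illustrative stand-in for list 34c.
doc_to_word_ids = {1: [1, 3], 2: [1, 2], 3: [1, 2, 3], 4: [2]}
counts = count_word_ids([1, 2, 3], doc_to_word_ids)
pairs = sorted(counts.items())
print(pairs)
```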

【００６８】[0068]

The related word candidate selection means 45 determines the word identifiers that are candidates for related words from the number of appearances of each word identifier, obtained from the word appearance number calculation means 44, and the total number of document identifiers matching the search condition, obtained from the document search means 42.

【００６９】[0069]

The related word calculation means 46 calculates the extended mutual information between the search condition input to the search condition receiving means 41 and each related word candidate computed by the related word candidate selection means 45. The calculation is based on three values: the number of appearances corresponding to each candidate word identifier computed by the related word candidate selection means 45, the total number of document identifiers matching the search condition obtained from the document search means 42, and the number of documents containing the word corresponding to the word identifier, also obtained from the document search means 42. A related word candidate whose extended mutual information is larger than a predetermined threshold is taken as a related word.

【００７０】[0070]

The related word pair appearance number calculation means 47 obtains the number of appearances of each related word from the related word calculation means 46. It also obtains from the document search means 42, for any pair of two related words, the number of documents that contain both words of the pair within the document set matching the search condition input to the search condition receiving means 41.
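One way to obtain the pair counts described here is to intersect the document sets of the two words with the matched document set. The sketch below assumes a word identifier-document identifier mapping held as Python sets; the function name `pair_counts` and the data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def pair_counts(related_ids: Iterable[int],
                matched_docs: Set[int],
                id_to_docs: Dict[int, Set[int]]) -> Dict[Tuple[int, int], int]:
    """For each pair of related words, count matched documents containing both."""
    result: Dict[Tuple[int, int], int] = {}
    for a, b in combinations(sorted(related_ids), 2):
        both = id_to_docs.get(a, set()) & id_to_docs.get(b, set()) & matched_docs
        result[(a, b)] = len(both)
    return result


# Illustrative data: document 5 contains words 2 and 3 but did not match the query.
id_to_docs = {2: {2, 4, 5}, 3: {3, 4, 5}, 4: {1}}
matched_docs = {1, 2, 3, 4}
r2 = pair_counts([2, 3, 4], matched_docs, id_to_docs)
print(r2)
```

Keys are stored with the smaller identifier first, since the count is symmetric in the two words.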

【００７１】[0071]

The inter-related-word relevance calculation means 48 calculates the degree of association (relevance) between related words, based on the number of appearances of each related word and the number of documents containing each related word pair within the document set matching the search condition, both obtained from the related word pair appearance number calculation means 47.

【００７２】[0072]

The related word display means 49 is a user interface that outputs each related word calculated by the related word calculation means 46 in accordance with the inter-related-word relevance calculated by the inter-related-word relevance calculation means 48.

【００７３】[0073]

The search result display means 50 is a user interface that refers to the document storage means 31 and outputs the document set, obtained from the document search means 42, that matches the search condition input to the search condition receiving means 41.

【００７４】[0074]

The functions of the components described above are realized by a computer executing predetermined program modules. The computer program that realizes them is recorded on a recording medium such as a semiconductor memory or a magnetic recording medium. The document storage means 31 and the index structure storage means 34, however, are functions realized by controlling an actual storage device such as an HDD (hard disk drive).

【００７５】[0075]

The components of the related word presentation device of FIG. 4 correspond to the components of FIG. 3 as follows. The document storage means 31 and the index structure storage means 34 correspond to the document storage means 21. The search condition receiving means 41 corresponds to the search condition receiving means 22. The document search means 42 corresponds to the document search means 23. The in-document word search means 43 and the word appearance number calculation means 44 correspond to the appearance number calculation means 24. The related word candidate selection means 45 corresponds to the related word candidate selection means 25. The related word calculation means 46 corresponds to the related word calculation means 26. The related word pair appearance number calculation means 47 corresponds to the related word pair appearance number calculation means 27. The inter-related-word relevance calculation means 48 corresponds to the inter-related-word relevance calculation means 28. The related word display means 49 corresponds to the related word display means 29.

【００７６】[0076]

In this embodiment, the index structure generation process must be executed before a related document search is performed. The index structure generation process is therefore described first.

【００７７】[0077]

The index structure generation process presupposes that a morphological analysis result list has been generated. FIG. 5 shows an example of the morphological analysis result list 32a stored in the morphological analysis means 32. The morphological analysis means 32 assigns an identifier to each search target document stored in the document storage means 31, applies morphological analysis to each document to extract independent words, and stores them paired with the corresponding document identifier. If the same independent word is extracted more than once from the same document, the second and subsequent extractions are ignored, so that no independent word is duplicated under a single document identifier.
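The per-document deduplication rule can be illustrated with a short Python sketch. It assumes the independent words have already been extracted by some morphological analyzer (not shown); the function name `build_analysis_list` and the sample words are hypothetical.

```python
from typing import Dict, List


def build_analysis_list(extracted: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Keep only the first occurrence of each independent word per document."""
    # dict.fromkeys preserves insertion order, so later repeats are dropped.
    return {doc_id: list(dict.fromkeys(words))
            for doc_id, words in extracted.items()}


# Illustrative extraction results keyed by document identifier.
extracted = {1: ["wing", "engine", "wing"], 2: ["engine", "airplane", "engine"]}
analysis = build_analysis_list(extracted)
print(analysis)
```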

【００７８】[0078]

Based on this morphological analysis result list 32a, the index structure generation means 33 generates the various index structures. FIGS. 6 to 8 show examples of the index structures created by the index structure generation means 33 and stored in the index structure storage means 34. The data in FIGS. 6 to 8 are examples created from the data in FIG. 5.

【００７９】[0079]

FIG. 6 shows an example of the word-word identifier list. The word-word identifier list 34a stores each extracted word together with the identifier assigned to that word.

【００８０】[0080]

FIG. 7 shows an example of the word identifier-document identifier list. The word identifier-document identifier list 34b stores each word identifier together with the identifiers (document identifiers) of the documents that contain the word to which that word identifier is assigned.

【００８１】[0081]

FIG. 8 shows an example of the document identifier-word identifier list. The document identifier-word identifier list 34c stores each document identifier together with the word identifiers of the words contained in the document to which that document identifier is assigned.

【００８２】[0082]

The algorithm by which the index structure generation means 33 generates the index structure is as follows. FIG. 9 is a flowchart showing the index structure generation algorithm.

[S1] Generation of the word-word identifier list 34a. A list is created of all words in the morphological analysis result list stored in the morphological analysis means 32, without duplicates and sorted in order of the value of the word character string. Natural numbers starting from 1 are assigned to the words as word identifiers, in order from the head of the list.

[S2] Generation of the document identifier-word identifier list 34c. Each word in the morphological analysis result list stored in the morphological analysis means 32 is replaced with the word identifier assigned in step S1, and the word identifiers corresponding to each document identifier are sorted in ascending order.

[S3] Generation of the word identifier-document identifier list 34b. The word identifiers are arranged in order from 1. For each word identifier, the document identifiers of the documents containing the corresponding word are extracted by referring to the document identifier-word identifier list 34c created in step S2, and are stored paired with the word identifier.
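Steps S1 to S3 can be sketched in Python roughly as follows. This is an illustration under assumptions rather than the patented implementation: the function name `build_index`, the use of dictionaries for the three lists, and sorting by Python's default string order (code point order) as the "value of the word character string" are all choices made for the example.

```python
from typing import Dict, List, Tuple


def build_index(analysis: Dict[int, List[str]]
                ) -> Tuple[Dict[str, int], Dict[int, List[int]], Dict[int, List[int]]]:
    """Build stand-ins for lists 34a, 34b and 34c from per-document word lists."""
    # S1: sort the distinct words and number them from 1.
    vocabulary = sorted({word for words in analysis.values() for word in words})
    word_to_id = {word: i for i, word in enumerate(vocabulary, start=1)}

    # S2: per document, replace words by identifiers and sort ascending.
    doc_to_word_ids = {doc_id: sorted({word_to_id[w] for w in words})
                       for doc_id, words in analysis.items()}

    # S3: for word identifiers in order from 1, collect the documents
    # by scanning the list produced in S2.
    id_to_docs: Dict[int, List[int]] = {wid: [] for wid in range(1, len(vocabulary) + 1)}
    for doc_id in sorted(doc_to_word_ids):
        for wid in doc_to_word_ids[doc_id]:
            id_to_docs[wid].append(doc_id)
    return word_to_id, id_to_docs, doc_to_word_ids


# Illustrative morphological analysis result list (not the data of FIG. 5).
analysis = {1: ["wing", "engine"], 2: ["engine", "airplane"]}
word_to_id, id_to_docs, doc_to_word_ids = build_index(analysis)
print(word_to_id)
print(id_to_docs)
print(doc_to_word_ids)
```

Building the word identifier-document identifier list from the S2 output, rather than from the raw analysis list, mirrors the ordering of steps given in the text.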

【００８３】[0083]

The index structure is generated by the above algorithm. Once the index structure generation process has been performed, the search condition receiving means 41 can accept a search expression. When the user enters a desired search expression using an input device such as a keyboard and issues a command to start the search, the related word presentation process begins.

【００８４】[0084]

FIGS. 10 and 11 show the algorithm for obtaining related documents from the search expression input to the search condition receiving means 41. FIG. 10 is the first half of a flowchart showing the processing procedure of the present invention, and FIG. 11 is the second half. Each step of FIGS. 10 and 11 is described below. In the following description, the word-word identifier list 34a is written L1, the word identifier-document identifier list 34b is written L2, and the document identifier-word identifier list 34c is written L3.

[S11] The search condition receiving means 41 receives a search expression in which words are joined by logical AND operators or logical OR operators. This search expression is called S.

[S12] The document search means 42 refers to L1 and L2 to obtain the document identifiers of the documents that match S. The resulting document identifier set is called X, and the number of elements of X is N.

[S13] If N = 0 in step S12, the process proceeds to step S14; otherwise it proceeds to step S15.

[S14] The related word display means 49 displays a message that there are no related documents for S, and the process ends.

[S15] The in-document word search means 43 refers to L3 to obtain all word identifiers corresponding to each document identifier belonging to X. The set of obtained word identifiers is called Y.

[S16] The word appearance number calculation means 44 removes duplicates from the word identifiers belonging to Y and records the number of duplications of each word identifier. The word identifier set with duplicates removed is again called Y, and the number of duplications of each element Wn (n = 1, 2, ..., P) of Y is R(Wn), where P is the number of elements of Y.

[S17] The related word candidate selection means 45 removes from Y every Wn that does not satisfy G1(N) ≤ R(Wn) ≤ G2(N). The resulting set is again called Y, and the number of duplications of each element Wn (n = 1, 2, ..., P) of Y is R(Wn), where P is now the number of elements of the new Y. Here, G1(x) is an increasing (step) function of x, and G2(x) is a decreasing (step) function of x.

【００８５】[0085]

For example, G1(x) and G2(x) can be functions such as the following.

【００８６】[0086]

【数１６】[Equation 16]

$$G_1(x) = n, \quad \text{where } n \text{ is the natural number satisfying } 20n - 19 \le x \le 20n$$
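Read literally, the example G1 maps 1 to 20 onto 1, 21 to 40 onto 2, and so on, which is the ceiling of x / 20. The Python sketch below shows that function and the step S17 filter. Because the example definition of G2 (Equation 17) appears only as a placeholder in this text, the upper bound is passed in by the caller as a plain number; `select_candidates` and `upper_bound` are hypothetical names for this illustration.

```python
from typing import Dict


def g1(x: int) -> int:
    """Natural number n with 20n - 19 <= x <= 20n, i.e. ceil(x / 20) for x >= 1."""
    if x < 1:
        raise ValueError("x must be a natural number")
    return (x + 19) // 20


def select_candidates(counts: Dict[int, int], n_docs: int,
                      upper_bound: int) -> Dict[int, int]:
    """Keep word identifiers whose count R(Wn) lies in [G1(N), upper_bound].

    upper_bound stands in for G2(N), which is not defined in this sketch.
    """
    lower_bound = g1(n_docs)
    return {wid: r for wid, r in counts.items()
            if lower_bound <= r <= upper_bound}


# Illustrative counts R(Wn) for N = 45 matched documents, so G1(N) = 3.
candidates = select_candidates({1: 50, 2: 3, 3: 1}, n_docs=45, upper_bound=40)
print(g1(45), candidates)
```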

【００８７】[0087]

【数１７】 [Equation 17]

【００８８】[0088]

[S18] For every word identifier Wn (n = 1, 2, ..., P) belonging to Y, the document search means 42 obtains from L2 the total number of document identifiers corresponding to Wn. The number of document identifiers corresponding to element Wn of Y is denoted F(Wn).

[S19] For each word identifier Wn (n = 1, 2, ..., P) belonging to Y, with M denoting the total number of documents subject to search, the related word calculation means 46 calculates

【００８９】[0089]

【数１８】 [Equation 18]

【００９０】[0090]

【数１９】[Equation 19]

$$\mathrm{prob}(W_n) = \frac{F(W_n)}{M} \tag{19}$$

and stores these values in a list, paired with Wn. It also calculates

【００９１】[0091]

【数２０】[Equation 20]

$$\mathrm{prob}(S) = \frac{N}{M} \tag{20}$$

[S20] For each word identifier Wn (n = 1, 2, ..., P) belonging to Y, the related word calculation means 46 calculates the extended mutual information MI₀(S, Wn) according to equation (5).

[S21] With respect to a preset threshold T, if there is any Wn (n = 1, 2, 3, ..., P) satisfying T ≤ MI₀(S, Wn), the process proceeds to step S22; if there is none, it proceeds to step S23.

[S22] The related word display means 49 outputs the words corresponding to the Wn (n = 1, 2, 3, ..., P) that satisfy T ≤ MI₀(S, Wn) as related words of S, in descending order of MI₀(S, Wn), and the process ends.

[S23] The related word display means 49 displays a message that there are no related documents for S, and the process ends.
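Equations (19) and (20) and the threshold test of steps S21 and S22 can be sketched in Python as below. The extended mutual information itself (equation (5)) is defined elsewhere in the specification, so this sketch takes the scores as given input rather than computing them; the function names and sample numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def probabilities(doc_freq: Dict[int, int], n_matched: int, m_total: int
                  ) -> Tuple[Dict[int, float], float]:
    """Equations (19) and (20): prob(Wn) = F(Wn) / M and prob(S) = N / M."""
    prob_w = {wid: f / m_total for wid, f in doc_freq.items()}
    return prob_w, n_matched / m_total


def rank_related(scores: Dict[int, float], threshold: float
                 ) -> List[Tuple[int, float]]:
    """Steps S21-S22: keep identifiers with T <= score, largest score first.

    An empty result corresponds to the branch to step S23.
    """
    kept = [(wid, s) for wid, s in scores.items() if threshold <= s]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Illustrative numbers: M = 100 documents in total, N = 20 matched.
prob_w, prob_s = probabilities({1: 50, 2: 10}, n_matched=20, m_total=100)
# Placeholder scores standing in for MI0(S, Wn); not computed from equation (5).
ranked = rank_related({1: 0.4, 2: 1.5, 3: 0.9}, threshold=0.9)
print(prob_w, prob_s, ranked)
```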

【００９２】[0092]

The above processing makes it possible to obtain related words from a search expression. Moreover, owing to step S17, when many documents match the search expression S (that is, when there are many search results), the related word calculation is based on words that appear frequently in the search results, so related words can be presented that are useful for gradually narrowing down the search results. Words with an extremely large number of appearances are commonplace words and are excluded as unsuitable for presentation as related words. G2 in the expression of step S17 is the function that removes such extremely frequent words.

【００９３】[0093]

When there are few search results, the related word calculation also includes words with few appearances. This makes it possible to present a wider range of related words for connecting to the search expression with a logical OR, with the aim of reducing search omissions.

【００９４】[0094]

The algorithm by which the related word pair appearance number calculation means 47 and the inter-related-word relevance calculation means 48 obtain the inter-related-word relevance (the relationship between related words) is as follows. FIG. 12 is a flowchart showing the procedure for calculating the inter-related-word relevance.

[S31] The set of related words of S obtained by the processing of steps S11 to S23 above is called Z, and the elements of Z are Vm (m = 1, 2, 3, ..., Q), where Q is the number of elements of Z.

[S32] For any pair of two related words (Vm1, Vm2) (m1 = 1, 2, ..., Q; m2 = 1, 2, ..., Q; m1 ≠ m2), the related word pair appearance number calculation means 47 obtains the total number of documents that contain both Vm1 and Vm2 and that belong to the document set corresponding to X obtained in step S12. This number is denoted R2(Vm1, Vm2).

[S33] For each Vm (m = 1, 2, ..., Q), the inter-related-word relevance calculation means 48 obtains upper(Vm), lower(Vm), equivalent(Vm), and similar(Vm). Each of these is the subset of Z whose elements are the Vn (n = 1, 2, ..., Q) that satisfy the corresponding conditions below.

upper(Vm): Tu1 ≤ R(Vn)/R(Vm), Tu2 ≤ R2(Vm, Vn)/R(Vm)
lower(Vm): Tl1 ≤ R(Vm)/R(Vn), Tl2 ≤ R2(Vm, Vn)/R(Vm)
equivalent(Vm): Tr1 ≤ R(Vn)/R(Vm) ≤ Tr2, R2(Vm, Vn)/R(Vm) ≤ Tr3
similar(Vm): Ts1 ≤ R(Vn)/R(Vm) ≤ Ts2, Ts3 ≤ R2(Vm, Vn)/R(Vm)

Here, Tu1, Tu2 (≤ 1), Tl1, Tl2 (≤ 1), Tr1, Tr2 (≥ Tr1), Tr3, Ts1, Ts2 (≥ Ts1), and Ts3 are preset constants.

【００９５】[0095]

For example, the following constants are set.

Tu1 = Tl1 = 4
Tu2 = Tl2 = Tr1 = Ts1 = Ts3 = 0.9
Tr2 = Ts2 = 1.1
Tr3 = 0.1

The expressions above have the following meanings.

【００９６】[0096]

upper(Vm) consists of related words that are contained in a larger number of documents than Vm is, and that are contained in most of the documents containing Vm. lower(Vm) consists of related words that are contained in fewer documents than Vm is, such that most of the documents containing the related word also contain Vm.

【００９７】[0097]

equivalent(Vm) consists of related words that are contained in about the same number of documents as Vm but are hardly ever contained in the same documents as Vm. similar(Vm) consists of related words that are contained in about the same number of documents as Vm and are contained in most of the documents containing Vm.
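The four relations can be sketched in Python using the example constants of paragraph [0095]. Two points in this sketch are assumptions and not statements of the patent. First, Vm itself is skipped, since step S32 only defines R2 for distinct words. Second, for lower(Vm) the overlap is divided by R(Vn), following the prose description ("most of the documents containing the related word also contain Vm"); the condition as printed in step S33 divides by R(Vm), which, because R2(Vm, Vn) cannot exceed R(Vn), could never reach 0.9 once R(Vm)/R(Vn) is at least 4. The function name `classify` and the counts are hypothetical.

```python
from typing import Dict, List, Tuple

# Example constants from paragraph [0095].
TU1 = TL1 = 4.0
TU2 = TL2 = TR1 = TS1 = TS3 = 0.9
TR2 = TS2 = 1.1
TR3 = 0.1


def classify(vm: int, r: Dict[int, int], r2: Dict[Tuple[int, int], int]
             ) -> Dict[str, List[int]]:
    """Sort every other related word Vn into upper/lower/equivalent/similar of Vm."""
    groups: Dict[str, List[int]] = {
        "upper": [], "lower": [], "equivalent": [], "similar": []}
    for vn in sorted(r):
        if vn == vm:
            continue  # assumption: a word is not compared with itself
        both = r2.get((min(vm, vn), max(vm, vn)), 0)  # R2(Vm, Vn)
        ratio = r[vn] / r[vm]                         # R(Vn) / R(Vm)
        overlap = both / r[vm]                        # R2(Vm, Vn) / R(Vm)
        if TU1 <= ratio and TU2 <= overlap:
            groups["upper"].append(vn)
        # Assumption: denominator R(Vn), following the prose of [0096].
        if TL1 <= r[vm] / r[vn] and TL2 <= both / r[vn]:
            groups["lower"].append(vn)
        if TR1 <= ratio <= TR2 and overlap <= TR3:
            groups["equivalent"].append(vn)
        if TS1 <= ratio <= TS2 and TS3 <= overlap:
            groups["similar"].append(vn)
    return groups


# Illustrative counts R(V) within the matched set and pair counts R2.
r = {1: 10, 2: 50, 3: 2, 4: 10, 5: 10}
r2 = {(1, 2): 10, (1, 3): 2, (1, 4): 0, (1, 5): 10}
groups = classify(1, r, r2)
print(groups)
```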

【００９８】[0098]

FIG. 13 conceptually shows the relationships between related words Vn and Vm. All of the document sets shown are contained in the document set X that matches the search expression S. Part (A) shows the relationship between the document set 61 containing a related word Vn that belongs to equivalent(Vm) and the document set 71 containing the related word Vm. As the figure shows, the document set 61 containing Vn and the document set 71 containing Vm hold almost the same number of documents, and the number of documents belonging to both sets is very small (or zero).

【００９９】[0099]

Part (B) shows the relationship between the document set 62 containing a related word Vn that belongs to similar(Vm) and the document set 72 containing the related word Vm. As the figure shows, the document set 62 containing Vn and the document set 72 containing Vm hold almost the same number of documents, and most of the documents containing one of the related words also contain the other.

【０１００】[0100]

Part (C) shows the relationship between the document set 63 containing a related word Vn that belongs to upper(Vm) and the document set 73 containing the related word Vm. As the figure shows, most of the document set 73 containing Vm is also included in the document set 63 containing Vn. In addition, the document set 63 containing Vn holds the larger number of documents.

【０１０１】[0101]

Part (D) shows the relationship between the document set 64 containing a related word Vn that belongs to lower(Vm) and the document set 74 containing the related word Vm. As the figure shows, most of the document set 64 containing Vn is also included in the document set 74 containing Vm. In addition, the document set 74 containing the related word Vm holds the larger number of documents.

【０１０２】[0102]

If these relationships are presented to the user together with the related words, the user can more appropriately choose related words for narrowing down the results when there are many search results. Next, the user interface used to present the related words classified as described above to the user is explained.

【０１０３】[0103]

FIG. 14 shows a related word search screen. The related word search screen 80 is divided into three subwindows 81 to 83. Subwindow 81 is a window for entering a search expression and is provided with a text input field 81a and a search button 81b. The user can issue a search command by entering a search expression in the text input field 81a with an input device such as a keyboard and pressing the search button 81b. Subwindow 81 provides the function corresponding to the search condition receiving means 41.

【０１０４】[0104]

Subwindow 82 is provided with a related word display field 82a and a related word relation display field 82b. The related word display field 82a displays the related words associated with the search expression entered in the text input field 81a. The related word relation display field 82b displays the related words that have a predetermined relationship with the related word selected in the related word display field 82a.

【０１０５】[0105]

Subwindow 83 is a window for displaying search results. It displays document information for the documents that match the search expression entered in the text input field 81a.

【０１０６】[0106]

With this user interface, a user searching for documents first enters a search expression in the text input field 81a using an input device such as a keyboard and presses the search button 81b. The search condition receiving means 41 then passes the search expression to the related document search device, and the processing of steps S11 to S23 is executed. The related words obtained as a result are displayed in the related word display field 82a of subwindow 82. Subwindow 83 displays the document information of the document set corresponding to the document identifier set X obtained in step S12.

【０１０７】[0107]

When one of the related words displayed in the related word display field 82a is selected, the processing of steps S21 to S23 is performed. As a result, the related words belonging to upper(Vm), lower(Vm), equivalent(Vm), and similar(Vm) of the specified related word are obtained. These related words are displayed in the related word relation display field 82b.

【０１０８】[0108]

FIG. 15 shows a display example when "airplane" is entered as the search expression. The text input field 81a displays the entered search expression "airplane". The related word display field 82a displays the related words obtained from "airplane". Subwindow 83 displays the document information of the documents containing "airplane".

【０１０９】[0109]

In this example there are many search results, so the processing of step S17 causes the related word calculation to use words with a high number of appearances in the search results as related word candidates. Consequently, even when the displayed related words are used to narrow down the results (that is, when they are connected to the original search expression with a logical AND operator), the narrowing is not excessive, and an extreme increase in search omissions can be prevented.

【０１１０】[0110]

FIG. 16 shows a display example when the related word "main wing" is specified. When the user specifies "main wing" in the related word display field 82a with the mouse cursor, the related word relation display field 82b displays the related words that have a predetermined relationship (upper, lower, equivalent, similar) with "main wing". The user creates a search expression for narrowing down the results while referring to these relationships, and can thereby create an appropriate search expression.

【０１１１】[0111]

FIG. 17 shows a display example when "(fighter and cockpit and missile)" is entered as the search expression. The text input field 81a displays the entered search expression "(fighter and cockpit and missile)". The related word display field 82a displays the related words obtained from "(fighter and cockpit and missile)". Subwindow 83 displays the document information of the documents that contain all of "fighter", "cockpit", and "missile".

【０１１２】[0112]

In this case there are few search results, so the processing of step S17 causes the related word calculation to include words with a low appearance frequency in the search results. Detailed related words are therefore displayed, and by adding the related words the user considers necessary to the search expression (connecting them with a logical OR operator), the user can obtain search results with little noise (documents that do not match the purpose of the search).

【０１１３】[0113]

[Effects of the Invention] As described above, the related word presentation device of the present invention selects the related words to be displayed using a calculation formula whose variables are the number of documents in the document set acquired by the document search means and the number of appearances of each related word obtained from the appearance number calculation means. The displayed related words can therefore be varied according to how many documents match the search condition and how frequently each related word appears. As a result, related words that are effective for narrowing down the search results can be presented selectively, and the user can effectively achieve both goals: reducing search omissions and narrowing down the search results. In particular, a lower limit value that increases stepwise as the number of documents in the document set increases is defined, and the related words to be displayed are selected using a calculation formula that selects the related words whose number of appearances is equal to or greater than the lower limit value, so the related words can be narrowed down appropriately in accordance with the number of documents.
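The stepwise lower limit described above can be sketched as follows. This is a minimal illustration only: the tier boundaries, the limit values, and the names `TIER_STARTS`, `LOWER_LIMITS`, `lower_limit`, and `select_related_words` are assumptions made for the example and are not taken from the patent.

```python
from bisect import bisect_right

# Hypothetical tiers (assumed values, not from the patent):
# a result set with at least TIER_STARTS[i] documents uses LOWER_LIMITS[i].
TIER_STARTS = [0, 100, 1000, 10000]
LOWER_LIMITS = [1, 3, 10, 30]


def lower_limit(num_docs: int) -> int:
    """Return a lower limit that rises in steps as the document set grows."""
    tier = bisect_right(TIER_STARTS, num_docs) - 1
    return LOWER_LIMITS[max(tier, 0)]


def select_related_words(appearances: dict[str, int], num_docs: int) -> list[str]:
    """Keep the related words whose number of appearances is >= the lower limit."""
    limit = lower_limit(num_docs)
    return sorted(word for word, count in appearances.items() if count >= limit)


if __name__ == "__main__":
    counts = {"main wing": 12, "rivet": 2}
    print(select_related_words(counts, num_docs=50))
    print(select_related_words(counts, num_docs=5000))
```

With a small result set both words pass the limit, while with a large result set only the frequently appearing word remains, which is the behaviour the paragraph attributes to the lower limit.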

【０１１４】[0114]

[Brief Description of the Drawings]

FIG. 1 is a first principle configuration diagram of the present invention.

FIG. 2 is a second principle configuration diagram of the present invention.

FIG. 3 is a third principle configuration diagram of the present invention.

FIG. 4 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 5 is a diagram showing an example of the morphological analysis result list stored in the morphological analysis means.

FIG. 6 is a diagram showing an example of a word-word identifier list.

FIG. 7 is a diagram showing an example of a word identifier-document identifier list.

FIG. 8 is a diagram showing an example of a document identifier-word identifier list.

FIG. 9 is a flowchart showing the index structure generation algorithm.

FIG. 10 is the first half of a flowchart showing the processing procedure of the present invention.

FIG. 11 is the second half of the flowchart showing the processing procedure of the present invention.

FIG. 12 is a flowchart showing the procedure for calculating the degree of relevance between related words.

FIG. 13 is a diagram conceptually showing the relationship between related words Vn and Vm.
(A) shows the relationship between the set of documents containing a related word Vn that belongs to equivalent(Vm) and the set of documents containing the related word Vm.
(B) shows the relationship between the set of documents containing a related word Vn that belongs to similar(Vm) and the set of documents containing the related word Vm.
(C) shows the relationship between the set of documents containing a related word Vn that belongs to upper(Vm) and the set of documents containing the related word Vm.
(D) shows the relationship between the set of documents containing a related word Vn that belongs to lower(Vm) and the set of documents containing the related word Vm.

FIG. 14 is a diagram showing the related word search screen.

FIG. 15 is a diagram showing a display example when "airplane" is entered as the search expression.

FIG. 16 is a diagram showing a display example when the related word "main wing" is designated.

FIG. 17 is a diagram showing a display example when "(fighter and cockpit and missile)" is entered as the search expression.

[Explanation of symbols]

1 Document storage means
2 Search condition receiving means
3 Document search means
4 Related word calculation means
5 Appearance number calculation means
6 Related word selecting means
7 Related word display means

Continuation of front page
(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/30

Claims

(57) [Claims]

1. A related word presentation device for presenting words related to a search condition, comprising: document storage means for storing a plurality of documents; search condition receiving means for receiving an input search condition; document search means for acquiring, from the document storage means, a document set that matches the search condition received by the search condition receiving means; related word calculation means for acquiring a related word set, taking as related words those words that are highly relevant to the search condition received by the search condition receiving means; appearance number calculation means for calculating, for each related word, the number of appearances, which is the number of documents in the document set obtained from the document search means that include that related word of the related word set acquired by the related word calculation means; related word selecting means for defining a lower limit value that increases stepwise as the number of documents in the document set acquired by the document search means increases, and for selecting the related words to be displayed using a calculation formula that selects the related words whose number of appearances is equal to or greater than the lower limit value; and related word display means for displaying the related words selected by the related word selecting means on a display device.

2. A related word presentation device for presenting words related to a search condition, comprising: document storage means for storing a plurality of documents; search condition receiving means for receiving an input search condition; document search means for acquiring, from the document storage means, a document set that matches the search condition received by the search condition receiving means; related word calculation means for acquiring a related word set, taking as related words those words that are highly relevant to the search condition received by the search condition receiving means; appearance number calculation means for calculating, for each related word, the number of appearances, which is the number of documents in the document set obtained from the document search means that include that related word of the related word set acquired by the related word calculation means; related word selecting means for defining an upper limit value that decreases stepwise as the number of documents in the document set acquired by the document search means increases, and for selecting the related words to be displayed using a calculation formula that selects the related words whose number of appearances is equal to or less than the upper limit value; and related word display means for displaying the related words selected by the related word selecting means on a display device.

3. A related word presentation device for presenting words related to a search condition, comprising: document storage means for storing a plurality of documents; search condition receiving means for receiving an input search condition; document search means for acquiring, from the document storage means, a document set that matches the search condition received by the search condition receiving means; appearance number calculation means for calculating, for each word present in the document set acquired by the document search means, the number of appearances, which is the number of documents that include that word; related word candidate selection means for selecting related word candidates using a calculation formula whose variables are the number of documents in the document set acquired by the document search means and the number of appearances of each word obtained from the appearance number calculation means; related word calculation means for obtaining a first value, which is the number of documents acquired by the document search means, a second value, which is the number of appearances of each related word candidate acquired by the appearance number calculation means, and a third value for each related word candidate, which is the number of documents stored in the document storage means that include that related word candidate, for calculating, for each related word candidate, a fourth value, which is the product or the sum of the first value and the third value, for calculating, based on the ratio of the second value to the fourth value, the degree of relevance between the search condition received by the search condition receiving means and each related word candidate, and for extracting the related word candidates with a high degree of relevance as related words; and related word display means for displaying the related words extracted by the related word calculation means on a display device.
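The four values named in claim 3 can be illustrated with a short sketch. It assumes each document is already reduced to a set of words; the function names `document_frequency` and `relevance_ratios`, the `combine` parameter, and the zero-denominator guard are hypothetical choices for this example rather than details from the patent.

```python
from collections import Counter


def document_frequency(docs: list[set[str]]) -> Counter:
    """Count, for each word, how many documents contain it."""
    counts: Counter = Counter()
    for words in docs:
        counts.update(words)  # each document is a set, so a word counts once
    return counts


def relevance_ratios(all_docs, hit_docs, candidates, combine="product"):
    """Return (second value) / (fourth value) for each related word candidate."""
    alpha = len(hit_docs)                 # first value: documents matching the search
    beta = document_frequency(hit_docs)   # second value: appearances within the hits
    gamma = document_frequency(all_docs)  # third value: appearances in all stored documents
    ratios = {}
    for word in candidates:
        # fourth value: product or sum of the first and third values
        fourth = alpha * gamma[word] if combine == "product" else alpha + gamma[word]
        ratios[word] = beta[word] / fourth if fourth else 0.0  # guard is an assumption
    return ratios


if __name__ == "__main__":
    stored = [{"airplane", "wing"}, {"airplane", "engine"},
              {"car", "engine"}, {"car", "wheel"}]
    hits = [d for d in stored if "airplane" in d]
    print(relevance_ratios(stored, hits, ["wing", "engine"]))
```

In this toy collection "wing" appears only in documents that match the search, so its ratio is higher than that of "engine", which also appears in non-matching documents.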

4. The related word presentation device according to claim 3, wherein the related word candidate selection means defines a lower limit value that increases stepwise as the number of documents in the document set acquired by the document search means increases, and selects the related words to be displayed using a calculation formula that selects the related words whose number of appearances is equal to or greater than the lower limit value.

5. The related word presentation device according to claim 3, wherein the related word candidate selection means defines an upper limit value that decreases stepwise as the number of documents in the document set acquired by the document search means increases, and selects the related words to be displayed using a calculation formula that selects the related words whose number of appearances is equal to or less than the upper limit value.

6. The related word presentation device according to claim 3, wherein, where M is the number of all documents stored in the document storage means, α is the first value, β is the second value for each related word candidate, and γ is the third value for each related word candidate, the related word calculation means uses the value of the extended mutual information obtained by the following calculation formula, extended mutual information = log2{(Mβ)/(αγ)}, as the degree of relevance between the search condition received by the search condition receiving means and each related word candidate.

7. The related word presentation device according to claim 3, wherein, where M is the number of all documents stored in the document storage means, α is the first value, β is the second value for each related word candidate, and γ is the third value for each related word candidate, the related word calculation means uses the value of the extended TS obtained by the following calculation formula, extended TS (t-score) = M{(Mβ-αγ)/(αγ)}, as the degree of relevance between the search condition received by the search condition receiving means and each related word candidate.

8. The related word presentation device according to claim 3, wherein, where α is the first value, β is the second value for each related word candidate, and γ is the third value for each related word candidate, the related word calculation means uses the value of the extended DC obtained by the following calculation formula, extended DC (Dice-coefficient) = 2β/(α+γ), as the degree of relevance between the search condition received by the search condition receiving means and each related word candidate.
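The three relevance formulas recited in claims 6 to 8 can be written out directly. The sketch below transcribes them as they appear in the claim text; the function names are made up for illustration, and the sample numbers (M = 1000, α = 50, β = 20, γ = 100) are arbitrary assumptions, not figures from the patent.

```python
import math


def extended_mutual_information(M: int, alpha: int, beta: int, gamma: int) -> float:
    """log2{(M*beta) / (alpha*gamma)}; requires beta > 0 so the logarithm is defined."""
    return math.log2((M * beta) / (alpha * gamma))


def extended_t_score(M: int, alpha: int, beta: int, gamma: int) -> float:
    """M * {(M*beta - alpha*gamma) / (alpha*gamma)}, as written in claim 7."""
    return M * ((M * beta - alpha * gamma) / (alpha * gamma))


def extended_dice(alpha: int, beta: int, gamma: int) -> float:
    """2*beta / (alpha + gamma); here the fourth value is the sum alpha + gamma."""
    return 2 * beta / (alpha + gamma)


if __name__ == "__main__":
    M, alpha, beta, gamma = 1000, 50, 20, 100
    print(extended_mutual_information(M, alpha, beta, gamma))
    print(extended_t_score(M, alpha, beta, gamma))
    print(extended_dice(alpha, beta, gamma))
```

The first two formulas use the product αγ as the fourth value and the third uses the sum α+γ, which matches the "product or sum" wording of claim 3.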

9. The related word presentation device according to claim 3, further comprising: related word pair appearance number calculation means for calculating, when an arbitrary related word is specified, the number of simultaneous appearances of the specified related word and each of the other related words in the related word set obtained from the related word calculation means, within the document set acquired by the document search means; and inter-related-word relevance calculation means for calculating the degree of relevance between related words based on a calculation formula whose variables are the number of appearances of each related word obtained from the appearance number calculation means and the number of simultaneous appearances of the specified related word and the other related words obtained from the related word pair appearance number calculation means, and for extracting the related words with a high degree of relevance between related words, wherein the related word display means displays the related words extracted by the related word calculation means on the display device and also displays the related words extracted by the inter-related-word relevance calculation means on the display device.
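The pair counting in claim 9 can be sketched as below. The claim text here does not state the calculation formula for the degree of relevance between related words, so the Dice-style score in `inter_word_relevance` is purely an assumption chosen for illustration; the function names and sample documents are likewise hypothetical.

```python
def pair_appearances(hit_docs, specified, related_words):
    """Count documents in the result set containing both the specified word and each other word."""
    counts = {}
    for other in related_words:
        if other == specified:
            continue
        counts[other] = sum(1 for doc in hit_docs if specified in doc and other in doc)
    return counts


def inter_word_relevance(hit_docs, specified, related_words):
    """Score each other related word against the specified one (assumed Dice-style formula)."""
    n_specified = sum(1 for doc in hit_docs if specified in doc)
    scores = {}
    for other, both in pair_appearances(hit_docs, specified, related_words).items():
        n_other = sum(1 for doc in hit_docs if other in doc)
        denominator = n_specified + n_other
        scores[other] = 2 * both / denominator if denominator else 0.0
    return scores


if __name__ == "__main__":
    hits = [{"airplane", "main wing", "flap"},
            {"airplane", "main wing"},
            {"airplane", "engine"}]
    print(inter_word_relevance(hits, "main wing", ["main wing", "flap", "engine"]))
```

Both inputs the claim names as variables are present here: the per-word appearance counts and the simultaneous appearance counts, all taken over the retrieved document set only.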

10. A medium on which is recorded a related word presentation program for causing a computer to present words related to a search condition, the program causing the computer to function as: document storage means for storing a plurality of documents; search condition receiving means for receiving an input search condition; document search means for acquiring, from the document storage means, a document set that matches the search condition received by the search condition receiving means; related word calculation means for acquiring a related word set, taking as related words those words that are highly relevant to the search condition received by the search condition receiving means; appearance number calculation means for calculating, for each related word, the number of appearances, which is the number of documents in the document set obtained from the document search means that include that related word of the related word set acquired by the related word calculation means; related word selecting means for defining a lower limit value that increases stepwise as the number of documents in the document set acquired by the document search means increases, and for selecting the related words to be displayed using a calculation formula that selects the related words whose number of appearances is equal to or greater than the lower limit value; and related word display means for displaying the related words selected by the related word selecting means on a display device.

11. A medium on which is recorded a related word presentation program for causing a computer to present words related to a search condition, the program causing the computer to function as: document storage means for storing a plurality of documents; search condition receiving means for receiving an input search condition; document search means for acquiring, from the document storage means, a document set that matches the search condition received by the search condition receiving means; appearance number calculation means for calculating, for each word present in the document set acquired by the document search means, the number of appearances, which is the number of documents that include that word; related word candidate selection means for selecting related word candidates using a calculation formula whose variables are the number of documents in the document set acquired by the document search means and the number of appearances of each word obtained from the appearance number calculation means; related word calculation means for obtaining a first value, which is the number of documents acquired by the document search means, a second value, which is the number of appearances of each related word candidate acquired by the appearance number calculation means, and a third value for each related word candidate, which is the number of documents stored in the document storage means that include that related word candidate, for calculating, for each related word candidate, a fourth value, which is the product or the sum of the first value and the third value, for calculating, based on the ratio of the second value to the fourth value, the degree of relevance between the search condition received by the search condition receiving means and each related word candidate, and for extracting the related word candidates with a high degree of relevance as related words; and related word display means for displaying the related words extracted by the related word calculation means on a display device.