JP3563682B2

JP3563682B2 - Next search candidate word presentation method and apparatus, and recording medium storing next search candidate word presentation program

Info

Publication number: JP3563682B2
Application number: JP2000277195A
Authority: JP
Inventors: 正之杉崎; 俊朗牧野; 大二郎森; 博人稲垣
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-09-12
Filing date: 2000-09-12
Publication date: 2004-09-08
Anticipated expiration: 2020-09-12
Also published as: JP2002092032A

Description

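[0001]
[Technical Field of the Invention]
The present invention relates to a next search candidate word presentation method and apparatus, and to a recording medium storing a next search candidate word presentation program. To support re-searching when desired information is retrieved from a large number of documents, they extract next search candidate words from search words previously entered as search requests and present them.
[0002]
[Prior Art]
In recent years, computer networks such as the Internet have made it possible to exchange large numbers of electronic documents and to publish information to a large, unspecified audience. Consequently, services that let individuals retrieve the information they need from such documents are offered on networks to a large, unspecified number of users.
[0003]
However, users often cannot obtain the results they want. The causes include insufficient search words and variation in how a search word is written (for example, English versus katakana spelling: "searchengine" versus "サーチエンジン"). As a way to resolve these problems, methods in which the search service proposes words that the user can use in the next search have been studied ("A Study on Extraction of Candidate Words for Narrowing Searches," Inoue et al., Proceedings of the 56th National Convention of IPSJ, Vol. 3, pp. 3-95, 1998).
[0004]
The methods of preparing candidate words for the next search (next search candidate words) can be broadly divided into three types: (1) words created independently of the search target, (2) words extracted from the documents, and (3) words extracted from the actual input of search service users. For (1), when the search target does not change much, as with an encyclopedia, it may be worth preparing a dictionary that absorbs spelling variations, but this approach is unsuitable when the search target changes daily, as Web pages on the Internet do. Because (2) presents words that exist in the search target, it avoids cases where adding a word in the next search leaves very few results; from the user's point of view, however, the words the user is trying to search for are not necessarily presented, which can be inconvenient. With (3), the user can see similar words that other search service users entered, so the service provider can offer search words that users find easy to understand.
[0005]
A method of generating groups of search words from the words entered as search input, their input times, and an ID identifying the person who entered them is also known ("Extraction of Information Needs Based on WWW Search Logs," Okubo et al., Transactions of IPSJ, Vol. 39, No. 7, pp. 2250-2258, 1998). It is used as follows: when returning search results, the system checks whether a group containing the entered search word exists among the extracted groups and, if so, displays the words in that group as next search candidate words.
[0006]
[Problem to Be Solved by the Invention]
Words grouped in this simple way are certainly related, but each group mixes spelling variants with words used to seek some specific piece of information, which makes it difficult to choose an appropriate word for the next search.
[0007]
The present invention was made in view of the above. Its object is to provide a next search candidate word presentation method and apparatus, and a recording medium storing a next search candidate word presentation program, that classify the words within a group into finer subgroups and present those subgroups when presenting next search candidate words.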
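[0008]
[Means for Solving the Problem]
To achieve the above object, a first aspect of the present invention is a next search candidate word presentation method that, to support re-searching when desired information is retrieved from a large number of documents, extracts next search candidate words from search words previously entered as search requests and presents them. The method records each entered search word as search input history information, together with its input time and searcher identification information identifying the searcher; analyzes the recorded search input history information to calculate the relevance between search words; extracts next search candidate words using the calculated relevance and the entered search word; and presents the extracted next search candidate words. In calculating the relevance between search words, the interval relevance, the selection relevance, and the distance between search words are calculated from the search words, input times, and searcher identification information recorded as search input history information, together with information on the items selected from the search results. In extracting the next search candidate words, related words are first extracted as next search candidate words using the calculated distance between search words; further next search candidate words are then extracted, from the search words other than those already extracted, using the calculated interval relevance and selection relevance; and the extracted next search candidate words are grouped based on the calculated distance between search words.
[0009]
In the present invention, entered search words are recorded as search input history information together with their input times and searcher identification information; this history is analyzed to calculate the relevance between search words; and next search candidate words are extracted and presented using that relevance and the entered search word. Search users can therefore reach the desired information easily by using the presented next search candidate words.
[0011]
Also, in the present invention, the interval relevance, the selection relevance, and the distance between search words are calculated from the search input history information; related words are extracted as next search candidate words using the calculated distance; further next search candidate words are extracted using the interval relevance and selection relevance; and the extracted words are grouped based on the distance between search words. Appropriate words for the next search can therefore be extracted accurately.
[0012]
A second aspect of the present invention is a next search candidate word presentation apparatus that, to support re-searching when desired information is retrieved from a large number of documents, extracts next search candidate words from search words previously entered as search requests and presents them. The apparatus comprises: search word recording means for recording each entered search word as search input history information, together with its input time and searcher identification information identifying the searcher; search word analysis means for analyzing the recorded search input history information and calculating the relevance between search words; next search candidate word extraction means for extracting next search candidate words using the calculated relevance and the entered search word; and next search candidate word presentation means for presenting the extracted next search candidate words. The search word analysis means includes means for calculating the interval relevance, the selection relevance, and the distance between search words from the search words, input times, and searcher identification information recorded as search input history information, together with information on the items selected from the search results. The next search candidate word extraction means includes means for extracting related words as next search candidate words using the calculated distance between search words, extracting further next search candidate words, from the search words other than those already extracted, using the calculated interval relevance and selection relevance, and grouping the extracted next search candidate words based on the calculated distance between search words.
[0013]
In the present invention, entered search words are recorded as search input history information together with their input times and searcher identification information; this history is analyzed to calculate the relevance between search words; and next search candidate words are extracted and presented using that relevance and the entered search word. Search users can therefore reach the desired information easily by using the presented next search candidate words, and appropriate words for the next search can be extracted accurately.
[0015]
Also, in the present invention, the interval relevance, the selection relevance, and the distance between search words are calculated from the search input history information; related words are extracted as next search candidate words using the calculated distance; further next search candidate words are extracted using the interval relevance and selection relevance; and the extracted words are grouped based on the distance between search words.
[0016]
A third aspect of the present invention is a recording medium storing a next search candidate word presentation program that, to support re-searching when desired information is retrieved from a large number of documents, extracts next search candidate words from search words previously entered as search requests and presents them. The program causes search word recording means to record each entered search word as search input history information, together with its input time and searcher identification information identifying the searcher; causes search word analysis means to analyze the recorded search input history information and calculate the relevance between search words; causes next search candidate word extraction means to extract next search candidate words using the calculated relevance and the entered search word; and causes next search candidate word presentation means to present the extracted next search candidate words. It further causes the search word analysis means to calculate the interval relevance, the selection relevance, and the distance between search words from the search words, input times, and searcher identification information recorded as search input history information, together with information on the items selected from the search results; and causes the next search candidate word extraction means to extract related words as next search candidate words using the calculated distance between search words, to extract further next search candidate words, from the search words other than those already extracted, using the calculated interval relevance and selection relevance, and to group the extracted next search candidate words based on the calculated distance between search words.
[0017]
In the present invention, a next search candidate word presentation program that records entered search words as search input history information together with their input times and searcher identification information, analyzes this history to calculate the relevance between search words, and extracts and presents next search candidate words using that relevance and the entered search word is stored on a recording medium, so the program can be distributed more easily by means of that medium.
[0019]
Also, in the present invention, a next search candidate word presentation program that calculates the interval relevance, the selection relevance, and the distance between search words from the search input history information, extracts related words as next search candidate words using the calculated distance, extracts further next search candidate words using the interval relevance and selection relevance, and groups the extracted words based on the distance between search words is stored on a recording medium, so the program can be distributed more easily by means of that medium.
[0020]
[Embodiments of the Invention]
First, as the basic concept of the next search candidate word presentation method of the present invention, a method of grouping words from the search input histories of multiple people who use a search service is described.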
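[0021]
First, the following are acquired as the search input history: (1) the word entered as search input, (2) the time it was entered, and (3) an ID identifying the person who entered it. Search words used by the same user can be assumed most likely to have been used to seek the same information if the interval between their uses is short, and to seek different information if the interval is long. Accordingly, let tmin_i[x, y] be the minimum difference between the times at which user i used search words x and y, and define the interval relevance T_xy of words x and y by
(Equation 1)

where 0 < t_1 < t_2 and a is a real number.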
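Equation 1 itself is not reproduced in this text. The Python sketch below shows how its inputs could be obtained: it derives tmin_i[x, y] per user from a log of (user ID, word, time) records. It then scores each pair with a purely hypothetical piecewise weight (1 for gaps up to t_1, a for gaps up to t_2, nothing beyond), summed over users. That weighting, the function names, and the log layout are illustrative assumptions, not the patent's formula.

```python
from collections import defaultdict
from itertools import combinations


def min_time_gaps(log):
    """Return {user_id: {(x, y): tmin_i[x, y]}} for every pair of distinct words x < y.

    `log` is an iterable of (user_id, word, time_in_seconds) records
    (a hypothetical layout standing in for the FIG. 2 history).
    """
    times = defaultdict(lambda: defaultdict(list))
    for user, word, t in log:
        times[user][word].append(t)
    gaps = {}
    for user, words in times.items():
        gaps[user] = {
            (x, y): min(abs(tx - ty) for tx in words[x] for ty in words[y])
            for x, y in combinations(sorted(words), 2)
        }
    return gaps


def interval_relevance(log, t1, t2, a):
    """Hypothetical stand-in for Equation 1 (the real formula is not reproduced).

    Each user contributes 1 if tmin_i[x, y] <= t1, `a` if t1 < tmin_i[x, y] <= t2,
    and nothing otherwise; contributions are summed over users.
    """
    if not 0 < t1 < t2:
        raise ValueError("require 0 < t1 < t2")
    T = defaultdict(float)
    for pairs in min_time_gaps(log).values():
        for pair, tmin in pairs.items():
            if tmin <= t1:
                T[pair] += 1.0
            elif tmin <= t2:
                T[pair] += a
    return dict(T)


log = [
    ("u1", "travel", 0), ("u1", "cheap", 30), ("u1", "hotel", 5000),
    ("u2", "travel", 0), ("u2", "cheap", 100),
]
print(interval_relevance(log, t1=60, t2=600, a=0.5))  # → {('cheap', 'travel'): 1.5}
```

With t_1 = 60 s, t_2 = 600 s and a = 0.5, user u1 (30 s gap) contributes 1 and user u2 (100 s gap) contributes 0.5 to the pair ("cheap", "travel"), while "hotel", used 4970 s away from "cheap", gets no score.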
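[0022]
In addition to the search input history above, (4) the items selected from the search results returned for each entered word are acquired and used to calculate the relevance between words. Using information (4), the selection relevance CL_xy of words x and y is defined by
(Equation 2)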
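Equation 2 is likewise not reproduced. The text later explains that words are treated as related when a high proportion of the same result items is selected for them. The sketch below therefore assumes, hypothetically, a Jaccard overlap between the sets of items selected for each word; the actual formula may differ.

```python
from collections import defaultdict


def selection_relevance(selections):
    """Hypothetical stand-in for Equation 2: Jaccard overlap of selected items.

    `selections` is an iterable of (search_word, selected_item) records, where
    selected_item identifies the result the user chose (e.g. a URL).
    Returns {(x, y): CL_xy} for every pair of words x < y.
    """
    items = defaultdict(set)
    for word, item in selections:
        items[word].add(item)
    words = sorted(items)
    CL = {}
    for i, x in enumerate(words):
        for y in words[i + 1:]:
            shared = items[x] & items[y]
            CL[(x, y)] = len(shared) / len(items[x] | items[y])
    return CL


selections = [
    ("cherry blossom spots", "url1"), ("cherry blossom spots", "url2"),
    ("cherry blossom viewing", "url2"), ("cherry blossom viewing", "url3"),
    ("stock prices", "url9"),
]
CL = selection_relevance(selections)
print(CL[("cherry blossom spots", "cherry blossom viewing")])
```

Here the two cherry-blossom queries share one of three distinct selected items (CL = 1/3), while "stock prices" shares none with either.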
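[0023]
Next, words are grouped using the relevance between words. The method (1) defines a distance (similarity) between words and (2) assigns words to a group, starting from the closest ones.
[0024]
The interval relevance T_xy, the selection relevance CL_xy, and similar measures obtained above can of course be used directly as the distance between words, but other definitions are also possible. For example, the distance Dis_xy between words x and y can be defined as
(Equation 3)

that is, as a so-called vector inner product (the cosine, cos θ, of the angle between the vectors), with larger values meaning that the words are closer.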
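A minimal sketch of such a distance, assuming the feature vector WV_j = (T_j1, …, T_jn) that the description of FIG. 6 uses, and assuming plain cosine similarity as the exact form of Equation 3 (not reproduced here). Storing T as a dict keyed by alphabetically sorted word pairs is an illustrative choice.

```python
import math


def feature_vector(word, vocab, T):
    """WV_j = (T_j1, ..., T_jn): interval relevance of `word` to each word in `vocab`.

    T is a dict keyed by alphabetically sorted word pairs; missing pairs count as 0,
    and the component for the word itself is set to 0 (an assumption).
    """
    return [0.0 if w == word else T.get(tuple(sorted((word, w))), 0.0) for w in vocab]


def cosine_distance(u, v):
    """Cosine of the angle between u and v; a larger value means the words are closer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


T = {("cheap", "travel"): 2.0, ("cheap", "overseas travel"): 2.0,
     ("hotel", "travel"): 1.0, ("hotel", "overseas travel"): 1.0}
vocab = ["cheap", "hotel", "overseas travel", "travel"]
dis = cosine_distance(feature_vector("travel", vocab, T),
                      feature_vector("overseas travel", vocab, T))
print(round(dis, 6))  # → 1.0
```

"travel" and "overseas travel" never co-occur in this toy table, yet they come out maximally close because both relate to the same other words in the same proportions. This is the behavior the text attributes to a distance built from interval relevance alone.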
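[0025]
When the distance Dis_xy is computed from the interval relevance of search inputs alone, words entered as spelling variants, and words that can easily be substituted for one another, come out close (for example, "コンピュータ" and "コンピューター" (both "computer"); "旅行" (travel), "国際旅行" (international travel), and "国内旅行" (domestic travel); "プロバイダ" and "プロバイダー" (both "provider")). This is because, with this distance, words that show similar usage distributions tend to score high. For example, if the actual inputs include "旅行格安" (travel, cheap), "海外旅行格安" (overseas travel, cheap), and "国内旅行格安" (domestic travel, cheap), then "旅行", "海外旅行", and "国内旅行" are used in similar ways, and the distance values between them become large (that is, the words are close).
[0026]
When the interval relevance or the selection relevance is used directly as the distance between words, the additional words that make a search more specific come out close instead (for example, for "携帯電話" (mobile phone): "着信メロディ" (ring melody), "通話料金" (call charges), "割引" (discount), "通話エリア" (coverage area), and "着メロ" (ringtone)).
[0027]
The grouping of step (2) does not simply list the words that are close (that is, related) to the search input side by side. Instead, within that set of close words, it groups together the words that are even closer to each other and displays them as groups. Using the earlier example of words related to "携帯電話" (mobile phone), they are presented in three groups: "着信メロディ" and "着メロ"; "通話料金" and "割引"; and "通話エリア". The procedure is as follows: first, words related to the search word for which candidates are to be produced are extracted using the interval relevance or the selection relevance; then the distances Dis_xy between the words in the extracted word group are computed, and the words are grouped.
[0028]
One way to use the grouping result is to extract a representative word from each group of words at display time and present only that word, which avoids presenting redundant, unnecessary next search candidate words.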
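The patent does not say how the representative word of a group is chosen. As one hypothetical rule, the sketch below keeps, from each group, the member with the highest relevance to the current query; any of T_ky, CL_ky, or Dis_ky could serve as that score.

```python
def representatives(groups, relevance_to_query):
    """Pick one word per group (hypothetical rule): the member most related to the query.

    `relevance_to_query` maps each candidate word to, e.g., its T_ky or CL_ky value;
    words missing from it score 0, and ties keep the first member listed.
    """
    return [max(group, key=lambda w: relevance_to_query.get(w, 0.0)) for group in groups]


groups = [["ring melody", "ringtone"], ["call charges", "discount"], ["coverage area"]]
scores = {"ring melody": 5.0, "ringtone": 3.0, "call charges": 2.0,
          "discount": 4.0, "coverage area": 1.0}
print(representatives(groups, scores))  # → ['ring melody', 'discount', 'coverage area']
```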
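[0029]
By using these relevance measures, the search service can readily determine which words users need to add as words for the next search.
[0030]
Next, an embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a next search candidate word presentation apparatus according to an embodiment of the present invention. The apparatus shown in FIG. 1 comprises: a search word recording unit 1 that records each search word entered during normal operation of a search service as search input history information, as shown in FIG. 2, together with its input time, the computer's address, a personal ID serving as searcher identification information, and information on the items selected in the search results; a search word analysis unit 3 that analyzes the search input history information recorded by the search word recording unit 1 and calculates the relevance between the search words entered during searches, specifically the interval relevance T_xy, the selection relevance CL_xy, and the distance Dis_xy between words; a search word input unit 5 that accepts search words entered into the search service; a next search candidate creation unit 7 that extracts word groups suitable as next search candidates using the relevance calculated by the search word analysis unit 3 and the search word accepted by the search word input unit 5; and a next search candidate output unit 9 that outputs the word groups extracted by the next search candidate creation unit 7 so that they can be used in the next search. Naturally, search words entered through the search word input unit 5 are automatically recorded in the search word recording unit 1 and used in the processing of the search word analysis unit 3.
[0031]
As described above, the search word analysis unit 3 analyzes the search input history information recorded by the search word recording unit 1, calculates the interval relevance T_xy, the selection relevance CL_xy, and the distance Dis_xy between words, and denotes the set of all extracted words as LW. The information recorded by the search word recording unit 1 may be used in full, covering every search input from the start of recording up to some point in time, or only for a limited period.
[0032]
One example of limiting the period is to "calculate the relevance from the search inputs recorded between 0:00 on the Sunday two weeks earlier and 0:00 on Sunday of each week." When all records from the start of recording up to some point are used, relevance tends to be high between words that have been used consistently over a long period (for example, with the distance Dis_xy, "映画" (movie) and "映画館" (movie theater)). When the period is limited, relevance tends to be high between words used within a short period (for example, with the distance Dis_xy, "映画" (movie) and the title of a currently popular movie).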
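The two-week window in this example can be computed as follows. The sketch assumes naive `datetime` timestamps in a single time zone and a half-open window (start inclusive, end exclusive), neither of which the text specifies; the (user ID, word, timestamp) record layout is also illustrative.

```python
from datetime import datetime, timedelta


def two_week_window(now):
    """Return (start, end): end is the latest Sunday 0:00 at or before `now`,
    start is exactly two weeks earlier."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_since_sunday = (midnight.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    end = midnight - timedelta(days=days_since_sunday)
    return end - timedelta(weeks=2), end


def limit_to_window(log, now):
    """Keep (user_id, word, timestamp) records with start <= timestamp < end."""
    start, end = two_week_window(now)
    return [rec for rec in log if start <= rec[2] < end]


# Tuesday 12 September 2000 → window from Sunday 27 August 0:00 to Sunday 10 September 0:00
print(two_week_window(datetime(2000, 9, 12, 15, 30)))
```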
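[0033]
FIG. 3 shows the relevance between words calculated by the search word analysis unit 3. FIG. 3(a) shows the words extracted from search word records such as those in FIG. 2 recorded by the search word recording unit 1, and FIG. 3(b) shows a related word table recording the relevance between each pair of extracted words. A table like the one in FIG. 3(b) is created for each of the interval relevance T_xy, the selection relevance CL_xy, and the distance Dis_xy.
[0034]
The selection relevance is explained with reference to FIG. 4. Consider, for example, the two search inputs "桜の名所" (famous cherry blossom spots) and "桜の花見" (cherry blossom viewing). Because these inputs share a word, they are rarely entered in close succession, so the interval relevance cannot capture the relevance of such words. The selection relevance is therefore a function that judges whether two requests are related by looking at the items selected from their search results: words for which a high proportion of the same items are selected in the search results are considered related.
[0035]
The next search candidate creation unit 7 extracts word groups suitable as next search candidates from the relevance between search words, which is the analysis result of the search word analysis unit 3, and the search word entered through the search word input unit 5. More specifically, it performs the processing shown in the flowcharts of FIGS. 5 and 6.
[0036]
In the flowchart of FIG. 5, the next search candidate creation unit 7 first extracts from the word set LW the top n1 words whose distance Dis_ky to the search input k from the search word input unit 5, as calculated by the search word analysis unit 3, is at least a first threshold th1, and stores the result in Res1 (step S11).
[0037]
Next, excluding the words stored in Res1, it extracts from LW the top n2 words whose interval relevance T_ky to the search input k, as calculated by the search word analysis unit 3, is at least a second threshold th2, and stores the result in Res2 (step S13). Further, excluding the words stored in Res1 and Res2, it extracts from LW the top n3 words whose selection relevance CL_ky to the search input k is at least a third threshold th3, and stores the result in Res3 (step S15).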
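Steps S11 to S15 amount to three successive thresholded top-n selections, each excluding the words already chosen. A sketch under stated assumptions: each relevance table is a dict keyed by alphabetically sorted word pairs (a layout chosen here for illustration, not specified by the patent), missing pairs score 0, and ties are broken alphabetically.

```python
def top_related(k, LW, table, threshold, n, exclude):
    """Return up to n words y from LW, best first, with table[(k, y)] >= threshold.

    `table` maps alphabetically sorted word pairs to a relevance value (assumed
    layout); k itself and the words in `exclude` are skipped.
    """
    scored = []
    for y in LW:
        if y == k or y in exclude:
            continue
        score = table.get(tuple(sorted((k, y))), 0.0)
        if score >= threshold:
            scored.append((score, y))
    scored.sort(key=lambda p: (-p[0], p[1]))  # highest score first, then alphabetical
    return [y for _, y in scored[:n]]


def extract_candidates(k, LW, Dis, T, CL, th1, th2, th3, n1, n2, n3):
    """Steps S11, S13 and S15 of FIG. 5, sketched."""
    res1 = top_related(k, LW, Dis, th1, n1, set())                 # S11: distance Dis_ky
    res2 = top_related(k, LW, T, th2, n2, set(res1))               # S13: interval relevance T_ky
    res3 = top_related(k, LW, CL, th3, n3, set(res1) | set(res2))  # S15: selection relevance CL_ky
    return res1, res2, res3


LW = ["internet", "Internet", "settings", "fees", "provider", "LAN"]
Dis = {("Internet", "internet"): 0.9, ("internet", "settings"): 0.2}
T = {("Internet", "internet"): 5.0, ("internet", "settings"): 3.0,
     ("fees", "internet"): 2.0, ("internet", "provider"): 0.5}
CL = {("LAN", "internet"): 0.4, ("internet", "settings"): 0.6}
print(extract_candidates("internet", LW, Dis, T, CL, 0.5, 1.0, 0.3, 3, 3, 3))
# → (['Internet'], ['settings', 'fees'], ['LAN'])
```

Note that "settings" lands in Res2 rather than Res3 even though its selection relevance is high, because step S15 excludes words already stored in Res1 and Res2.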
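[0038]
As a result, for the search word entered through the search word input unit 5, the words in LW that are highly related to it, namely those whose distance Dis_ky, interval relevance T_ky, and selection relevance CL_ky are at or above the respective thresholds, are stored in Res1, Res2, and Res3, respectively. In this processing, highly related words can be found easily by using the values of related word tables such as the one in FIG. 3(b) computed by the search word analysis unit 3.
[0039]
As shown in the flowchart of FIG. 6, the next search candidate creation unit 7 further classifies and groups the words stored in the word set Res2. Because the processing in FIG. 6 uses the distance between words, it first computes the feature vector WV_j = (T_j1, …, T_jn) for each word w_j in Res2 (step S21), where T_ij denotes the interval relevance. It then searches Res2 for the most closely related pair of words (x, y) using the distance Dis_xy (step S23) and checks a termination condition (step S25). If the condition is not satisfied, it generates a new cluster and registers it in Res2 (step S27), deletes the information on words x and y from Res2 (step S29), and returns to step S23. The same processing is repeated until the termination condition is satisfied, thereby grouping the words in Res2.
[0040]
In other words, this processing classifies words by using the distance Dis_xy based on the interval relevance T_xy to define the relevance between words, and by assigning the most closely related words to a single set (cluster) first.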
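The FIG. 6 loop is a form of agglomerative (bottom-up) clustering. The patent does not specify how a merged cluster's vector is formed or what the termination condition of step S25 is. The sketch below assumes, hypothetically, that a merged cluster's vector is the element-wise sum of its members' vectors and that merging stops once the best cosine similarity falls below a threshold. The English words stand in for the Res2 example given for the query "本" (book).

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_res2(vectors, stop_below):
    """Bottom-up grouping of the Res2 words (steps S21-S29 of FIG. 6, sketched).

    `vectors` maps each word w_j to its feature vector WV_j (step S21).
    Assumptions: a merged cluster's vector is the element-wise sum of its
    members' vectors, and the loop stops when the closest pair's cosine
    similarity is below `stop_below` or only one cluster remains.
    """
    clusters = [((w,), list(v)) for w, v in vectors.items()]
    while len(clusters) > 1:
        # S23: find the most closely related pair of clusters.
        score, i, j = max(
            ((cosine(clusters[i][1], clusters[j][1]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if score < stop_below:  # S25: termination condition (assumed)
            break
        (words_i, vec_i), (words_j, vec_j) = clusters[i], clusters[j]
        merged = (words_i + words_j, [a + b for a, b in zip(vec_i, vec_j)])  # S27
        # S29: remove x and y, keeping the new cluster in their place.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [set(words) for words, _ in clusters]


vectors = {
    "search": [1, 0, 0, 0], "catalog": [0, 1, 0, 0], "reservation": [0, 0, 1, 0],
    "mail order": [0, 0, 0, 1], "mail-order sales": [0, 0, 0.2, 1],
}
print(group_res2(vectors, stop_below=0.5))
```

With these toy vectors, only "mail order" and "mail-order sales" are similar enough to merge, mirroring the {"通販", "通信販売"} cluster in the example that follows.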
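[0041]
For example, suppose that for the search input "本" (book), the word set Res2 contains {"検索" (search), "カタログ" (catalog), "予約" (reservation), "通販" (mail order), "通信販売" (mail-order sales)}. The processing of the next search candidate creation unit 7 shown in FIG. 6 then groups the words in Res2 as {"検索", "カタログ", "予約", {"通販", "通信販売"}}, assigning "通販" and "通信販売" to one cluster. If {"通販", "通信販売"} is selected as the words to add, the search "本 AND (通販 OR 通信販売)" can be generated as the next search input.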
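Turning a selected group into the next query, as in the "本 AND (通販 OR 通信販売)" example, is a simple string construction. The AND/OR operators and the quoting of multi-word terms below are assumptions about the target search engine's syntax.

```python
def _term(word):
    """Quote multi-word terms so they are treated as phrases (assumed syntax)."""
    return f'"{word}"' if " " in word else word


def build_next_query(query, group):
    """Combine the current query with a selected word group: query AND (w1 OR w2 ...)."""
    terms = [_term(w) for w in group]
    if not terms:
        return _term(query)
    if len(terms) == 1:
        return f"{_term(query)} AND {terms[0]}"
    return f"{_term(query)} AND ({' OR '.join(terms)})"


print(build_next_query("本", ["通販", "通信販売"]))  # → 本 AND (通販 OR 通信販売)
```

ORing the members of a group keeps spelling variants and near-synonyms from splitting the result set, while the AND keeps the original query as a constraint.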
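[0042]
Next, the processing flow of the next search candidate word presentation apparatus of this embodiment is described concretely. The documents searched by the apparatus are WWW pages (HTML files) on the Internet, a computer network. While the search service is in operation, each entered search word is recorded by the search word recording unit 1 as search input history information, as shown in FIG. 2, together with its input time, the computer's address, a personal ID serving as searcher identification information, and information on the items selected in the search results.
[0043]
Next, the search word analysis unit 3 analyzes the search words recorded by the search word recording unit 1, calculates the relevance between words (the interval relevance T_xy, the selection relevance CL_xy, and the distance Dis_xy), and creates a table like the one in FIG. 3 for each of T_xy, CL_xy, and Dis_xy.
[0044]
Using the relevance between words calculated by the search word analysis unit 3, the next search candidate creation unit 7 extracts and groups words that are useful for re-searching the word entered through the search word input unit 5.
[0045]
For example, as shown in FIG. 7, suppose the word entered through the search word input unit 5 is "インタネット" (a variant spelling of "Internet"). Finding related words for this input with the distance Dis_xy based on the interval relevance T_xy yields words such as "インターネット" and "Internet".
[0046]
Finding related words with the interval relevance T_xy itself yields words that narrow the search, such as "設定" (settings), "説明" (explanation), "パソコン" (PC), "接続方法" (how to connect), "料金" (fees), and "プロバイダ" (provider).
[0047]
When these extracted words are further classified with the distance Dis_xy based on the interval relevance T_xy, they are divided, as shown in FIG. 7, into a group of "料金" (fees) and "プロバイダ" (provider), a group of "接続方法" (how to connect), "パソコン" (PC), and "設定" (settings), and a group consisting of "説明" (explanation).
[0048]
Furthermore, finding related words with the selection relevance between words yields, as shown in FIG. 8, an even wider variety of other words beyond those above, such as "インターネットTV" (Internet TV) and "ケーブルインターネット" (cable Internet), as well as "LAN", "無線" (wireless), "米国" (United States), and "軍事ネットワーク" (military network).
[0049]
As described above, the word groups extracted by the next search candidate creation unit 7 are sent to the next search candidate output unit 9, which presents them together with the search results.
[0050]
The processing procedure of the next search candidate word presentation method of the above embodiment can, of course, be recorded on a recording medium as a program. By incorporating this medium into a computer system, or by downloading or installing the program recorded on it into a computer system, and operating the computer system with that program, the computer system can function as a next search candidate word presentation apparatus that carries out the method. Using such a recording medium also makes the program easier to distribute.
[0051]
[Effects of the Invention]
As described above, according to the present invention, entered search words are recorded as search input history information together with their input times and searcher identification information; this history is analyzed to calculate the relevance between search words; and next search candidate words are extracted and presented using that relevance and the entered search word. Search users can therefore reach the desired information easily by using the presented next search candidate words.
[0052]
Also, according to the present invention, the interval relevance, the selection relevance, and the distance between search words are calculated from the search input history information; related words are extracted as next search candidate words using the calculated distance; further next search candidate words are extracted using the interval relevance and selection relevance; and the extracted words are grouped based on the distance between search words. Appropriate words for the next search can therefore be extracted accurately.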
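[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a next search candidate word presentation apparatus according to an embodiment of the present invention.
[FIG. 2] A diagram showing search input history information, an example of the search records recorded by the search word recording unit of the apparatus shown in FIG. 1.
[FIG. 3] A diagram showing the relevance between words calculated by the search word analysis unit of the apparatus shown in FIG. 1.
[FIG. 4] An explanatory diagram showing the concept of the selection relevance calculated by the search word analysis unit of the apparatus shown in FIG. 1.
[FIG. 5] A flowchart showing next search candidate word extraction processing (1), performed for a search input by the next search candidate creation unit of the apparatus shown in FIG. 1.
[FIG. 6] A flowchart showing next search candidate word extraction processing (2), performed for a search input by the next search candidate creation unit of the apparatus shown in FIG. 1.
[FIG. 7] An explanatory diagram showing output image (1) of the apparatus shown in FIG. 1.
[FIG. 8] An explanatory diagram showing output image (2) of the apparatus shown in FIG. 1.
[Description of Reference Numerals]
1 Search word recording unit
3 Search word analysis unit
5 Search word input unit
7 Next search candidate creation unit
9 Next search candidate output unit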
TECHNICAL FIELD OF THE INVENTION
The present invention provides a next search candidate word presentation method for extracting and presenting a next search candidate word from a search word input as a search request in the past to support re-search when searching for desired information from a large number of documents. In addition, the present invention relates to an apparatus and a recording medium that records a next search candidate word presentation program.
[0002]
[Prior art]
2. Description of the Related Art In recent years, it has become possible to exchange a large amount of electronic documents and transmit information to an unspecified large number through a computer network such as the Internet. For this reason, a service capable of retrieving information required by an individual for such document information has been realized on an unspecified majority on a network.
[0003]
However, in many cases, they cannot get the results they want. This is because the search terms used to obtain the results are insufficient, or the search terms are swayed (for example, English and katakana: search engine, search engine, etc.). There. As a method to solve them, a method is proposed that proposes words that can be searched from the search service side so that they can be used in the next search ("A Study on Extraction of Narrowed Search Word Candidates", Inoue et al., IPSJ) 56th National Convention, Third Volume, pp. 3-95, 1998).
[0004]
It can be considered that there are roughly three types of methods for preparing words to be used in the next search (next search candidate words). (1) a document created completely independently of a search target, (2) a document extracted from a document, and (3) a document extracted from an actual input of a search service user. For (1), if the search target does not change significantly, such as in an encyclopedia, it may be worthwhile to prepare a dictionary that absorbs the sway of the notation, but it is possible to search like a Web page on the Internet. It is not suitable when the subject changes daily. (2) presents a word existing in the search target, so that it is possible to avoid a case where the search result becomes extremely small by adding a word at the next search, but from the user's point of view, try to search. Is not always presented and may be inconvenient. In the case of (3), since a similar word entered by a search service user other than the user at the time of the next search can be seen, the service provider can provide a search word that is easy for the user to understand.
[0005]
In addition, a method of generating a group of search words from a word input from a search input, an input time, and an ID for specifying a person who performed the search input is a known fact (“information based on WWW search log”). Extraction of Needs ", Okubo et al., Transactions of Information Processing Society of Japan, Vol. 39, No. 7, pp. 2250-2258, 1998). As a method of using it, when returning search results, it is checked whether the group containing the input search word exists in the extracted group, and if so, the words in that group are searched for next search candidates. Display as words.
[0006]
[Problems to be solved by the invention]
However, words that are simply grouped are certainly related, but some of them are extracted as a mixture of words due to swaying of the notation and words to seek some specific information, so There is a problem that it is difficult to select an appropriate word at the time of search.
[0007]
SUMMARY OF THE INVENTION The present invention has been made in view of the above, and an object of the present invention is to provide a next search candidate word by classifying words in a group to generate a more subdivided group and presenting it. An object of the present invention is to provide a candidate word presenting method and apparatus and a recording medium recording a next search candidate word presenting program.
[0008]
[Means for Solving the Problems]
To achieve the above objectives,FirstThe present invention provides a next search candidate word presentation for extracting and presenting a next search candidate word from a search word previously input as a search request to assist in re-searching when searching for desired information from a large number of documents. A method, recording the input search word as search input history information together with the input time of the search word, searcher identification information for specifying a searcher, analyzing the recorded search input history information, Calculate the relevance between the search terms, extract the next search candidate word using the calculated relevance between the search terms and the input search term, and present the extracted next search candidate wordIn the calculation process of the degree of relevance between the search terms, the search term, the input time, the searcher identification information recorded as the search input history information, and the information of the item selected from the search results, the search The inter-word interval relevance, the selection relevance, and the distance between search terms are calculated, and in the next search candidate word extraction process, the next search for related words is performed using the calculated distance between search terms. The next search candidate word is extracted as a candidate word, and the next search candidate word is further extracted from the search word excluding the extracted next search candidate word using the calculated interval relevance and selection relevance. Grouping words based on the calculated distance between search termsIs the gist.
[0009]
The present inventionIn, the entered search term is recorded as search input history information together with the input time and the searcher identification information, the search input history information is analyzed, the relevance between the search terms is calculated, and the search term The next search candidate word is extracted and presented using the degree of relevance and the input search word, so that the search user can easily reach desired information by using the presented next search candidate word. be able to.
[0011]
Also,In the present invention, the interval relevance between search terms, the selection relevance, and the distance between the search terms are calculated using the search input history information, and the calculated distance between the search terms is used. To extract the next search candidate word using the interval relevance and the selection relevance, and to group the extracted next search candidate words based on the distance between the search words. Thus, it is possible to accurately extract appropriate words that can be used in the next search.
[0012]
SecondThe present invention provides a next search candidate word presentation for extracting and presenting a next search candidate word from a search word previously input as a search request to assist in re-searching when searching for desired information from a large number of documents. A search word recording means for recording an input search word as search input history information together with an input time of the search word and searcher identification information for specifying a searcher; and the recorded search input history Search term analysis means for analyzing information and calculating a degree of relevance between search terms, and a next search candidate word for extracting a next search candidate word using the calculated degree of relevance between search terms and the input search term Extraction means and next search candidate word presenting means for presenting the extracted next search candidate wordThe search term calculation means, using the search terms recorded as the search input history information, input time, searcher identification information, and information of the item selected from the search results, between the search terms Means for calculating the interval relevance, the selection relevance, and the distance between the search terms, and the next search candidate word extracting means uses the calculated distance between the search terms to calculate a related word next. It is extracted as a search candidate word, and the extracted next search candidate word is further extracted. From the search terms excluding the above, the next search candidate words are extracted using the calculated interval relevance and the selection relevance, and the extracted next search candidate words are grouped based on the calculated distance between the search terms. Having meansThat is the gist.
[0013]
The present inventionIn, the entered search term is recorded as search input history information together with the input time and the searcher identification information, the search input history information is analyzed, the relevance between the search terms is calculated, and the search term The next search candidate word is extracted and presented using the degree of relevance and the input search word, so that the search user can easily reach desired information by using the presented next search candidate word. Therefore, appropriate words that can be used in the next search can be accurately extracted.
[0015]
Also,In the present invention, the interval relevance between search terms, the selection relevance, and the distance between the search terms are calculated using the search input history information, and the calculated distance between the search terms is used. Is extracted as a next search candidate word, a next search candidate word is further extracted using the interval relevance and the selection relevance, and the extracted next search candidate words are grouped based on the distance between the search words.
[0016]
ThirdThe present invention provides a next search candidate word presentation program for extracting and presenting a next search candidate word for assisting re-search when searching for desired information from a large number of documents, from a search word previously input as a search request. A recording medium on which is recordedFor search term recording meansRecord the input search word as search input history information together with the input time of the search word and searcher identification information for specifying the searcherLet,For search term analysisAnalyze this recorded search input history informationLet, Calculate relevance between search termsLet,Next search candidate word extraction meansThe next search candidate word is extracted using the calculated relevance between search words and the input search word.Let,Next search candidate word presentation meansPresent the extracted next search candidate wordThe search term calculation means uses the search term recorded as the search input history information, the input time, the searcher identification information, and the information of the item selected from the search result to determine the interval between the search terms. Degree, the degree of selection relevance, and the distance between search terms are calculated, and the next search candidate word extracting means is used to extract a related word as the next search candidate word using the calculated distance between search terms, Further, from the search word excluding the extracted next search candidate word, a next search candidate word is extracted using the calculated interval relevance and the selection relevance, and the extracted next search candidate word is further calculated by the calculated search. Perform grouping based on the distance between wordsThe gist is to record the next search candidate word presentation program on a recording medium.
[0017]
The present inventionIn, the entered search term is recorded as search input history information together with the input time and the searcher identification information, the search input history information is analyzed, the relevance between the search terms is calculated, and the search term The next search candidate word presentation program that extracts and presents the next search candidate word using the degree of relevance and the input search word is recorded on a recording medium. Can be enhanced.
[0019]
Also,In the present invention, the interval relevance between search terms, the selection relevance, and the distance between the search terms are calculated using the search input history information, and the calculated distance between the search terms is used. Is extracted as a next search candidate word, a next search candidate word is further extracted using the interval relevance and the selection relevance, and the extracted next search candidate words are grouped based on the distance between the search words. Since the search candidate word presentation program is recorded on a recording medium, the distribution can be enhanced by using the recording medium.
[0020]
BEST MODE FOR CARRYING OUT THE INVENTION
First, as a basic concept of the next search candidate word presenting method of the present invention, a method of grouping words from search input histories of a plurality of persons using a search service will be described.
[0021]
First, as the search input history, (1) a word input and searched, (2) an input time, and (3) an ID for specifying a person who input the search are acquired. It can be assumed that the search terms used by the same user are more likely to be used to find the same information if the use time interval is short, and to find different information if the use time interval is long. Therefore, the minimum value of the use time difference between the search words x and y of the user i is set to tmin._i[X, y], and the interval relevance between words x, y is T_xyToss
(Equation 1)

Is defined as However, 0 <t₁<T₂And a is a real value.
[0022]
In addition, in addition to the above-mentioned search input history, (4) an item selected from a search result for the searched and input word is obtained and used for calculating the degree of association between words. Using the information of (4), the degree of selection relevance CL of words x and y_xyTo
(Equation 2)

Is defined as
[0023]
Next, words are grouped using the degree of association between words. As the method, (1) the distance (similarity) between words is defined, and (2) words with short distances are assigned to one group.
[0024]
The distance between words is naturally the interval relevance T determined above._xyAnd selection relevance CL_xyEtc. can be used as is, or there is another definition method. For example, the distance Dis between words x and y_xyTo
(Equation 3)

The so-called inner product of vectors (cos θ of a trigonometric function) can be defined, and the larger the value, the closer the distance.
[0025]
Distance Dis using only interval relevance to search input_xyAs a characteristic of, the distance between a word input due to a sway of the notation or a word that can be easily replaced with another word becomes shorter (for example, “computer”, “computer”, “travel”, “international travel”, “domestic travel” , "Provider", "provider"). Because the distance Dis using only the interval relevance_xyThis is because words having a similar distribution tend to have larger values. For example, if the actual input is “Travel cheap”, “Overseas travel cheap”, “Domestic travel cheap”, “Travel”, “Overseas travel”, and “Domestic travel” will use similar words. , These distance values increase.
[0026]
When the interval relevance or the selection relevance is used as it is as the distance between words, additional words for realizing a more detailed search (for example, “ring melody”, “call charge”, “discount for“ mobile phone ”) ), “Call area”, “ringtone” etc.).
[0027]
In the grouping of (2), words that are closer (that is, related) to the search input are not simply displayed side by side, but words that are further closer to each other are grouped in a word set that is closer to each other. What you want to display. For example, using the above example of the word related to “mobile phone”, the words are presented in three groups of “ring melody”, “ringtone”, “call rate”, “discount”, and “call area”. As the method, first, words related to a search word to be candidates are extracted using the interval relevance or the selection relevance. Next, the distance Dis between words in the extracted word group_xyAnd perform grouping.
[0028]
As a method of using the result of grouping, for example, a representative word is extracted from a group of words that are grouped at the time of display and presented, thereby avoiding presentation of redundant unnecessary next search candidate words. It becomes possible.
[0029]
As described above, by using these degrees of relevance, it is possible to easily examine what words need to be added to the search service user as words for the next search.
[0030]
Next, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration of a next search candidate word presentation device according to one embodiment of the present invention. The next search candidate word presentation device shown in FIG. 1 uses a search word input during a normal search service as input time of the search word, a computer address, and searcher identification information for specifying a searcher. A search word recording unit 1 that records as a search input history information as shown in FIG. 2 together with a certain personal ID and information on an item selected in a search result, and analyzes the search input history information recorded in the search word recording unit 1 Then, the degree of relevance between search words input at the time of search, specifically, the degree of relevance T between the words,_xy, Selectivity CL between words_xy, Distance between words Dis_xy, A search term input unit 5 that accepts a search term input to a search service, and a degree of relevance between search terms calculated by the search term analysis unit 3 and a search term input unit 5 that accepts the search term. A next search candidate creating unit 7 for extracting a word group suitable as a next search candidate using the input search word, and outputting the word group extracted by the next search candidate creating unit 7 so that it can be used in the next search And a next search candidate output unit 9. The search word input from the search word input unit 5 is automatically recorded in the search word recording unit 1 and is used for processing by the search word analysis unit 3 as a matter of course.
[0031]
The search term analysis unit 3 analyzes the search input history information recorded by the search term recording unit 1 as described above, and_xy, Selectivity CL between words_xy, Distance between words Dis_xyIs calculated, and the set of all the extracted words is defined as LW. At this time, the information recorded in the search word recording unit 1 may use the record of all search inputs from the start of recording to a certain point in time, or may be used only for a certain period. Conceivable.
[0032]
As an example of limiting the period, there is a limiting method such as "determining the degree of relevance from the record of the search input from 0:00 on Sunday of each week to 0:00 on Sunday two weeks earlier than that". Further, in the method of using the record of all search inputs from the start of recording to a certain point in time, there is a tendency that the degree of relevance between words that have been widely used for a long period of time is high (for example, distance Dis_xyWith the use of “movie” and “movie theater”, the method of limiting the period tends to increase the degree of relevance between words used in a short period of time (for example, distance Dis_xyTo use "movie" and "(popular movie name)").
[0033]
FIG. 3 is a diagram showing the degree of relevance between words calculated by the search word analysis unit 3. FIG. 3 (a) shows words extracted from the search word record as shown in FIG. 2 recorded in the search word recording unit 1, and FIG. 3 (b) shows such extracted words. 3 shows a related word table in which the degrees of relevance between the respective words are recorded. The related word table shown in FIG._xy, Selection relevance CL_xy, Distance Dis_xyA similar table is created for each of.
[0034]
The selection relevance will be described with reference to FIG. 4. Consider, for example, the two search inputs "famous places of cherry blossoms" and "cherry blossom viewing". Because these search inputs express essentially the same request, they are rarely both entered within a short period, so the interval relevance cannot capture the relevance between such words. The selection relevance addresses this by judging whether two words express the same request from the items selected in their search results: words for which a high proportion of the same items are selected in the search results are regarded as related.
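The text states only that words whose search results lead to a high proportion of the same selected items are related. As a minimal sketch, the measure below assumes that "proportion" is the Jaccard coefficient of the two words' sets of selected items; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def selection_relevance(
    records: Iterable[Tuple[str, Optional[str]]],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical CL_xy = |I_x & I_y| / |I_x | I_y| over selected items.

    records: (search word, selected item or None) pairs.
    Only pairs sharing at least one selected item are stored.
    """
    items = defaultdict(set)
    for word, item in records:
        if item is not None:
            items[word].add(item)
    words = sorted(items)
    table: Dict[Tuple[str, str], float] = {}
    for i, x in enumerate(words):
        for y in words[i + 1:]:
            shared = items[x] & items[y]
            if shared:
                table[(x, y)] = len(shared) / len(items[x] | items[y])
    return table
```

Two inputs that never co-occur in time can still score highly here if users click the same pages for both.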
[0035]
The next search candidate creating unit 7 extracts a word group suitable as next search candidates from the relevance between search words, which is the analysis result of the search word analysis unit 3, and the search word input from the search word input unit 5. Specifically, it performs the processing shown in the flowcharts of FIGS. 5 and 6.
[0036]
In the flowchart shown in FIG. 5, the next search candidate creating unit 7 first extracts from the word set LW the top n1 words whose distance Dis_ky to the search input k from the search word input unit 5, as calculated by the search word analysis unit 3, is equal to or greater than a first threshold th1, and stores the extraction result in Res1 (step S11).
[0037]
Next, excluding the words stored in Res1, it extracts from the word set LW the top n2 words whose interval relevance T_ky to the search input k, as calculated by the search word analysis unit 3, is equal to or greater than a second threshold th2, and stores the result in Res2 (step S13). Then, excluding the words stored in Res1 and Res2, it extracts from LW the top n3 words whose selection relevance CL_ky to the search input k is equal to or greater than a third threshold th3, and stores the result in Res3 (step S15).
[0038]
As a result, for the search word input from the search word input unit 5, the highly relevant words in LW whose distance Dis_ky, interval relevance T_ky, and selection relevance CL_ky are at or above the respective thresholds are stored in Res1, Res2, and Res3, respectively. Highly relevant words can be found easily in this step by looking up values in a related word table such as the one shown in FIG. 3(b).
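Steps S11 to S15 can be sketched as three threshold-and-top-n passes, each excluding the words already taken. The thresholds th1 to th3 and counts n1 to n3 come from the text; passing the three measures as score callables `score(k, y)` and breaking ties alphabetically are assumptions made for this illustration.

```python
from typing import Callable, List, Sequence, Tuple

Score = Callable[[str, str], float]  # score(k, y): relevance of word y to input k


def top_n(k: str, candidates: Sequence[str], score: Score, th: float, n: int) -> List[str]:
    """Up to n words y (y != k) with score(k, y) >= th, highest first."""
    kept = [(score(k, y), y) for y in candidates if y != k]
    kept = [(s, y) for s, y in kept if s >= th]
    kept.sort(key=lambda p: (-p[0], p[1]))  # ties broken alphabetically (assumption)
    return [y for _, y in kept[:n]]


def extract_candidates(
    k: str, LW: Sequence[str], dis: Score, T: Score, CL: Score,
    th: Tuple[float, float, float], n: Tuple[int, int, int],
) -> Tuple[List[str], List[str], List[str]]:
    th1, th2, th3 = th
    n1, n2, n3 = n
    res1 = top_n(k, LW, dis, th1, n1)                                 # step S11: Dis_ky
    used = set(res1)
    res2 = top_n(k, [w for w in LW if w not in used], T, th2, n2)     # step S13: T_ky
    used |= set(res2)
    res3 = top_n(k, [w for w in LW if w not in used], CL, th3, n3)    # step S15: CL_ky
    return res1, res2, res3
```

The exclusion order means a word appears in at most one of Res1, Res2, Res3, with the distance taking priority.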
[0039]
Next, as shown in the flowchart of FIG. 6, the next search candidate creating unit 7 further classifies the words stored in the set Res2 into groups. In the processing shown in FIG. 6, a feature vector WVj = (Tj1, ..., Tjn) is first obtained for each word wj in the set Res2 using the interval relevance (step S21); each component of WVj is an interval relevance value. Next, the pair of words (x, y) in Res2 with the highest relevance according to the distance Dis_xy is searched for (step S23), and the termination condition is checked (step S25). If the termination condition is not satisfied, a new cluster combining x and y is generated and registered in Res2 (step S27), the entries for words x and y are deleted from Res2 (step S29), and the process returns to step S23. This is repeated until the termination condition is satisfied, thereby grouping the words in Res2.
[0040]
In other words, this processing defines the relevance between words by the distance Dis_xy computed from the interval relevance T_xy, and performs classification by repeatedly assigning the most related words to one set (cluster).
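The loop of steps S21 to S29 is a form of agglomerative clustering, sketched below under several labeled assumptions: Dis is taken to be the cosine similarity of the interval-relevance feature vectors (the text treats larger Dis as more related), a merged cluster's vector is the sum of its members' vectors, and the termination condition, which the text leaves unspecified, is "best similarity falls below `stop_below`".

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_res2(vectors: Dict[str, Sequence[float]], stop_below: float) -> List[List[str]]:
    # Step S21: each word wj starts as its own cluster with feature vector WVj.
    clusters: List[Tuple[List[str], List[float]]] = [([w], list(v)) for w, v in vectors.items()]
    while len(clusters) > 1:
        # Step S23: find the most related pair (x, y).
        best, bi, bj = -1.0, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s > best:
                    best, bi, bj = s, i, j
        # Step S25: hypothetical termination condition.
        if best < stop_below:
            break
        # Step S27: new cluster combining x and y (vector = sum, an assumption).
        words = clusters[bi][0] + clusters[bj][0]
        vec = [a + b for a, b in zip(clusters[bi][1], clusters[bj][1])]
        # Step S29: delete x and y (larger index first so bi stays valid).
        clusters.pop(bj)
        clusters.pop(bi)
        clusters.append((words, vec))
    return [sorted(ws) for ws, _ in clusters]
```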
[0041]
For example, suppose that, for the search input "book", the words "search", "catalog", "reservation", "mail order", and "mail order" (two different Japanese expressions for mail order) are extracted into the set Res2. As a result of the processing of the next search candidate creating unit 7 shown in FIG. 6, the words in Res2 are assigned to clusters, with the two "mail order" expressions placed together in one cluster. If the user then selects {"mail order", "mail order"} as the words to add, the search "book and (mail order or mail order)" can be generated as the next search input.
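Turning a selected cluster into the next search input, as in the "book" example, can be sketched as building an AND/OR query string. The function names and the query syntax (uppercase AND/OR, double quotes around multi-word terms) are hypothetical; a real search service would use its own syntax.

```python
from typing import Sequence


def quote(term: str) -> str:
    """Quote multi-word terms so the boolean query stays unambiguous."""
    return f'"{term}"' if " " in term else term


def next_query(original: str, added: Sequence[str]) -> str:
    """Build 'original AND (w1 OR w2 ...)' from the cluster words a user picked."""
    if not added:
        return quote(original)
    terms = [quote(w) for w in added]
    if len(terms) == 1:
        return f"{quote(original)} AND {terms[0]}"
    return f"{quote(original)} AND ({' OR '.join(terms)})"
```

ORing the words of one cluster is what lets notation variants broaden the narrowed search rather than over-restrict it.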
[0042]
Next, the processing flow of the next search candidate word presentation device of this embodiment will be described concretely. The documents searched by the device are WWW pages (HTML files) on the Internet, a computer network. When the search service is provided, each input search word is recorded in the search word recording unit 1 as search input history information, as shown in FIG. 2, together with its input time, the computer address, the personal ID serving as searcher identification information, and the item selected in the search result.
[0043]
Next, the search word analysis unit 3 analyzes the search words recorded in the search word recording unit 1, calculates the relevance between words, namely the interval relevance T_xy, the selection relevance CL_xy, and the distance Dis_xy, and creates a table such as the one shown in FIG. 3(b) for each of these three measures.
[0044]
Using the relevance between words calculated by the search word analysis unit 3, the next search candidate creating unit 7 extracts and groups words that are effective for re-searching with the word input from the search word input unit 5.
[0045]
For example, as shown in FIG. 7, assume that the word input from the search word input unit 5 is "Internet". When related words are obtained for this input using the distance Dis_xy computed from the interval relevance T_xy between words, notation variants of "Internet" (for example, katakana and Latin-alphabet spellings) are extracted as related words.
[0046]
When related words are obtained using the interval relevance T_xy between words, words that narrow down the search, such as "setting", "description", "computer", "connection method", "charge", and "provider", are extracted.
[0047]
These extracted words are then classified using the distance Dis_xy computed from the interval relevance T_xy between words: as shown in FIG. 7, they fall into a group of "charge" and "provider", a group of "connection method", "PC", and "setting", and a group of "description".
[0048]
Furthermore, when related words are obtained using the selection relevance between words, a wider variety of words, such as "Internet TV", "cable Internet", "LAN", "wireless", "USA", and "military network", are extracted in addition to the words above, as shown in FIG. 8.
[0049]
The word groups extracted by the next search candidate creating unit 7 in this way are sent to the next search candidate output unit 9, which presents them together with the search results.
[0050]
The processing procedure of the next search candidate word presentation method of the above embodiment can also be recorded as a program on a recording medium. By loading this recording medium into a computer system, downloading or installing the recorded program into the computer system, and running the computer system with the program, the computer system can of course function as a next search candidate word presentation device that implements the next search candidate word presentation method. Using such a recording medium also makes the program easier to distribute.
[0051]
【Effect of the Invention】
As described above, according to the present invention, an input search word is recorded as search input history information together with its input time and searcher identification information; the search input history information is analyzed to determine the relevance between search words; and next search candidate words are extracted and presented using this relevance and the input search word. The search user can therefore reach the desired information easily by using the presented next search candidate words.
[0052]
Further, according to the present invention, the interval relevance, the selection relevance, and the distance between search words are calculated from the search input history information; related words are extracted as next search candidate words using the calculated distance; further next search candidate words are extracted using the interval relevance and the selection relevance; and the extracted next search candidate words are grouped based on the distance between search words. Appropriate words for use in the next search can therefore be extracted accurately.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a next search candidate word presentation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing search input history information, an example of the search records stored in the search word recording unit used in the next search candidate word presentation device shown in FIG. 1.
FIG. 3 is a diagram showing the degree of relevance between words calculated by the search word analysis unit used in the next search candidate word presentation device shown in FIG. 1.
FIG. 4 is an explanatory diagram showing the concept of the selection relevance calculated by the search word analysis unit used in the next search candidate word presentation device shown in FIG. 1.
FIG. 5 is a flowchart showing next search candidate word extraction process (1) for a search input, performed by the next search candidate creating unit used in the next search candidate word presentation device shown in FIG. 1.
FIG. 6 is a flowchart showing next search candidate word extraction process (2) for a search input, performed by the next search candidate creating unit used in the next search candidate word presentation device shown in FIG. 1.
FIG. 7 is an explanatory diagram showing output image (1) of the next search candidate word presentation device shown in FIG. 1.
FIG. 8 is an explanatory diagram showing output image (2) of the next search candidate word presentation device shown in FIG. 1.
[Explanation of symbols]
1 Search word recording section
3 Search word analysis section
5 Search word input section
7 Next search candidate creation section
9 Next search candidate output section

Claims

A next search candidate word presentation method for extracting and presenting a next search candidate word to assist re-search when searching for desired information from a large number of documents, from a search word previously input as a search request,
The input search word is recorded as search input history information together with the input time of the search word and searcher identification information for specifying the searcher;
The recorded search input history information is analyzed to calculate the relevance between search words;
Next search candidate words are extracted using the calculated relevance between search words and the input search word;
The extracted next search candidate words are presented;
In calculating the relevance between search words, the interval relevance, the selection relevance, and the distance between search words are calculated using the search words recorded as the search input history information, their input times, the searcher identification information, and information on the items selected from the search results; and
In extracting the next search candidate words, related words are extracted as next search candidate words using the calculated distance between search words, further next search candidate words are extracted, from the search words excluding those already extracted, using the calculated interval relevance and selection relevance, and the extracted next search candidate words are grouped based on the calculated distance between search words.
A next search candidate word presentation method characterized by the above.

A next search candidate word presentation device that extracts and presents a next search candidate word from a search word previously input as a search request in order to support a re-search when searching for desired information from a large amount of documents,
Search term recording means for recording the input search term as search input history information together with the input time of the search term and searcher identification information for specifying the searcher;
A search term analyzing means for analyzing the recorded search input history information and calculating the relevance between the search terms;
A next search candidate word extracting means for extracting a next search candidate word using the calculated relevance between the search words and the input search word;
Next search candidate word presenting means for presenting the extracted next search candidate words;
Wherein the search term analyzing means has means for calculating the interval relevance, the selection relevance, and the distance between search terms, using the search terms recorded as the search input history information, their input times, the searcher identification information, and information on the items selected from the search results; and
The next search candidate word extracting means has means for extracting related words as next search candidate words using the calculated distance between search terms, further extracting next search candidate words, from the search terms excluding those already extracted, using the calculated interval relevance and selection relevance, and grouping the extracted next search candidate words based on the calculated distance between search terms.
A next search candidate word presentation device characterized by the above.

A recording medium recording a next search candidate word presentation program for extracting and presenting next search candidate words from search words previously input as search requests, in order to support re-search when searching for desired information in a large amount of documents, the program:
Causing search word recording means to record an input search word as search input history information together with the input time of the search word and searcher identification information for specifying the searcher;
Causing search word analyzing means to analyze the recorded search input history information and calculate the relevance between search words;
Causing next search candidate word extracting means to extract next search candidate words using the calculated relevance between search words and the input search word;
Causing next search candidate word presenting means to present the extracted next search candidate words;
Causing the search word analyzing means to calculate the interval relevance, the selection relevance, and the distance between search words, using the search words recorded as the search input history information, their input times, the searcher identification information, and information on the items selected from the search results; and
Causing the next search candidate word extracting means to extract related words as next search candidate words using the calculated distance between search words, to further extract next search candidate words, from the search words excluding those already extracted, using the calculated interval relevance and selection relevance, and to group the extracted next search candidate words based on the calculated distance between search words.