JP2006309377A

JP2006309377A - Document retrieval device, document retrieval method, its program, and recording medium

Info

Publication number: JP2006309377A
Application number: JP2005129079A
Authority: JP
Inventors: Ayahiro Nakajima; 紋宏中島
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2005-04-27
Filing date: 2005-04-27
Publication date: 2006-11-09

Abstract

PROBLEM TO BE SOLVED: To provide a document retrieval device that can automatically create a related term dictionary and can further improve the accuracy of document retrieval by detecting related terms of the words included in a retrieval sentence more accurately than conventional devices.

SOLUTION: Vector retrieval of the retrieval target documents is carried out using the retrieval sentence and a retrieval sentence in which words have been replaced by related terms. Words included in the subject selected from the retrieval result and words included in the retrieval sentence are used to automatically generate the related term dictionary database.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a document search apparatus, a document search method, a program therefor, and a recording medium, which output search target documents related to an input search sentence.

In recent years, interest in document search systems has grown with the spread of technology for accessing Web servers connected to the Internet and displaying Web pages on a terminal, and with the digitization of large volumes of documents. In a document search system, a user obtains a desired document by entering a search sentence or search terms. However, when the user searches with a search sentence containing a word that does not appear in the documents, the search may fail or return inappropriate results. To address this problem, some document search systems are equipped with a related word dictionary. When a word that is not contained in the search target documents appears in the search sentence entered by the user, such a system obtains a related word for that word from the related word dictionary and supplements the search sentence by replacing the word with the related word or by adding the related word. Because adding related words to the dictionary by hand is laborious, a method of adding them semi-automatically or automatically is desired. Patent Document 1 discloses a method of automatically creating a related word dictionary.
Patent Document 1: JP H11-312168 A

The synonym calculation device and program of Patent Document 1 propose a method that acquires a search history, obtains a relevance from the time interval and frequency of the words contained in the history, and registers words with high relevance to each other in a related word dictionary. Incorporating this method into a document search system would make it possible to create the related word dictionary automatically and supplement the search sentence. With this method, however, words are associated simply because they were entered close together in time, so words contained in search sentences for which the search failed or the desired result was not obtained are also associated with each other. As a result, the search results are not improved.

Accordingly, an object of the present invention is to provide a document search apparatus that can automatically create a related word dictionary database and that further improves the accuracy of document search by detecting related words of the words contained in a search sentence more accurately than before, together with a document search method, a program therefor, and a recording medium.

The present invention has been made to solve the above problem, and is a document search apparatus that outputs search target documents related to an input search sentence, comprising: a related word storage unit that stores, as already learned information, the correspondence between a base word (a word contained in the search sentence) and a related word related to that base word, together with a relevance indicating the strength of the relation between the base word and the related word; a search word replacement processing unit that, for each word contained in the search sentence for which a related word is stored in the related word storage unit, replaces the word with that related word; a search target document extraction processing unit that generates a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the relevance, generates a target document vector based on the weight in each search target document of each word contained in the search sentence after replacement, and extracts, from the plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small; a search result screen generation processing unit that generates search result screen data in which the subjects of the extracted search target documents can be displayed in order according to the angle formed by the search document vector and the target document vector; and a search result screen output processing unit that outputs the search result screen data.
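The angle-based extraction described above can be illustrated with a minimal Python sketch. It assumes vectors are held as `{word: weight}` dictionaries and that the query vector has already been built from the search sentence after replacement; the function names, the sample weights, and the treatment of an all-zero vector as orthogonal are illustrative assumptions, not details taken from the specification.

```python
import math


def angle_between(u, v):
    """Angle in radians between two sparse vectors given as {word: weight}."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return math.pi / 2  # assumption: an all-zero vector counts as orthogonal
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def extract_top_documents(query_vec, documents, top_n=3):
    """Return the top_n (angle, doc_id) pairs with the smallest angles.

    documents maps doc_id -> {word: weight}. The target document vector is
    restricted to the words of the query, as in the description above.
    """
    scored = []
    for doc_id, doc_weights in documents.items():
        target_vec = {word: doc_weights.get(word, 0.0) for word in query_vec}
        scored.append((angle_between(query_vec, target_vec), doc_id))
    scored.sort()  # smallest angle first
    return scored[:top_n]


# Hypothetical data: "notebook" carries weight 1.0 x relevance 0.8.
docs = {
    "d1": {"notebook": 1.0, "battery": 1.0},
    "d2": {"printer": 1.0, "ink": 1.0},
    "d3": {"battery": 1.0},
}
ranking = extract_top_documents({"notebook": 0.8, "battery": 1.0}, docs, top_n=2)
```

The returned list is already in display order, so a result screen could list the subjects of the corresponding documents from the smallest angle upward.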

According to the present invention, words are replaced with related words that have been learned so that search target documents can be retrieved more accurately, and the search is performed with the search sentence after replacement. The search can therefore be performed with higher accuracy than before.

The present invention is further characterized by comprising: a subject selection determination processing unit that detects whether any of the subjects of the search target documents displayed on the output search result screen has been selected, based at least on a display instruction for a search target document received through selection of its subject; a same session determination processing unit that, when a plurality of search sentences are input, determines whether those search sentences belong to the same session; a related word detection processing unit that detects, as related words, the words in the search sentence that caused the output of a search result screen on which a subject was selected; a base word detection processing unit that identifies an earlier search in the same session for which no subject was selected on the search result screen, and detects the words in that search sentence as base words (the words for which the related words stand in); and a related word learning processing unit that, when the correspondence between a detected base word and related word is not already stored in the related word storage unit, registers the correspondence in the related word dictionary DB in association with a predetermined relevance.

According to the present invention, when a user enters different search sentences in a series of searches managed as the same session, those search sentences can be regarded as related, so the words contained in them can be registered automatically as related word pairs (combinations of a related word and the base word it stands in for). This reduces the work of the administrator who registers entries in the related word dictionary database. In addition, as more related word pairs are registered, searches using related words other than the words in the search sentence entered by the user are performed automatically, so search accuracy is improved compared with the related art.
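As a rough illustration of this session-based learning, the following Python sketch pairs the words of earlier searches in which no subject was selected with the words of a later search in the same session in which a subject was selected. The record layout (`words`, `selected`), the initial relevance of 0.5, and the rule of skipping identical words are assumptions made for the example; the specification only speaks of "a predetermined relevance".

```python
INITIAL_RELEVANCE = 0.5  # assumed value; the source does not give a number


def learn_pairs(session_history, related_dict, initial=INITIAL_RELEVANCE):
    """Register new (base word, related word) pairs for one session.

    session_history: searches of a single session, oldest first; each search
    is a dict with 'words' (list of str) and 'selected' (True if the user
    selected a subject on that result screen).
    related_dict: {(base_word, related_word): relevance}, updated in place.
    Returns the list of newly registered pairs.
    """
    new_pairs = []
    for index, current in enumerate(session_history):
        if not current["selected"]:
            continue  # related words come only from searches with a selection
        for earlier in session_history[:index]:
            if earlier["selected"]:
                continue  # base words come only from searches without a selection
            for base in earlier["words"]:
                for related in current["words"]:
                    if base == related:
                        continue  # assumption: a word is not paired with itself
                    if (base, related) not in related_dict:
                        related_dict[(base, related)] = initial
                        new_pairs.append((base, related))
    return new_pairs


history = [
    {"words": ["laptop"], "selected": False},
    {"words": ["notebook", "pc"], "selected": True},
]
dictionary = {}
added = learn_pairs(history, dictionary)
```

Pairs that are already present are left untouched here; adjusting the relevance of existing pairs is a separate step.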

The present invention is further characterized by comprising: subject selection determination means that detects whether any of the subjects of the search target documents displayed on the output search result screen has been selected, based at least on a display instruction for a search target document received through selection of its subject; a same session determination processing unit that, when a plurality of search sentences are input, determines whether those search sentences belong to the same session; a base word detection processing unit that identifies, among the plurality of search sentences input in the same session, a previously input search sentence for which no subject was selected on the search result screen output in response to it, and detects the words in that search sentence as base words (the words for which related words stand in); a related word detection processing unit that detects as related words, among the words contained in the subject selected on the search result screen output in response to the currently input search sentence of the same session, those words that were not displayed in the subjects of the search result screen output in response to the previously input search sentence; and a related word learning processing unit that, when the correspondence between a detected base word and related word is not already stored in the related word storage unit, registers the correspondence in association with a predetermined relevance.

According to the present invention, when a user enters different search sentences in a series of searches managed by the same session ID, the words of an earlier search sentence whose search result screen led to no selection and the words of the subject selected on the search result screen output for the next search sentence can be regarded as related, so those words can be registered automatically as related word pairs. This reduces the work of the administrator who registers entries in the related word dictionary database. In addition, as more related word pairs are registered, searches using related words other than the words in the search sentence entered by the user are performed automatically, so search accuracy is improved compared with the related art.
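The subject-based variant can be sketched in Python as follows. The sketch assumes that subjects have already been split into words and that the previous search in the session ended without a selection; the function name, the argument layout, and the sample words are hypothetical.

```python
def detect_pairs_from_subject(prev_query_words, prev_subjects, selected_subject_words):
    """Pair the previous query's words with new words of the selected subject.

    prev_query_words: words of the earlier search sentence (no subject selected).
    prev_subjects: list of word lists, one per subject shown on the earlier
        result screen.
    selected_subject_words: words of the subject selected on the current
        result screen.
    Returns (base word, related word) pairs; a related word must not have
    appeared in any subject of the earlier result screen.
    """
    shown_before = set()
    for subject_words in prev_subjects:
        shown_before.update(subject_words)

    related_words = [w for w in selected_subject_words if w not in shown_before]
    return [
        (base, related)
        for base in prev_query_words
        for related in related_words
        if base != related  # assumption: a word is not paired with itself
    ]


pairs = detect_pairs_from_subject(
    prev_query_words=["laptop"],
    prev_subjects=[["laptop", "bag"], ["desk", "lamp"]],
    selected_subject_words=["notebook", "bag", "repair"],
)
```

In this example "bag" is excluded because it was already visible on the earlier result screen, so only "notebook" and "repair" become related-word candidates for "laptop".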

The present invention is further characterized by comprising a relevance increase processing unit that, when the correspondence between the detected base word (the word for which a related word stands in) and related word is already stored in the related word storage unit, adds a relevance obtained from a predetermined calculation formula to the relevance stored in association with that base word and related word. This makes it possible to automatically correct the relevance of a strongly related pair of base word and related word.

The present invention is further characterized by comprising: a subject selection determination processing unit that detects whether any of the subjects of the search target documents displayed on the output search result screen has been selected, based at least on a display instruction for a search target document received through selection of its subject; and a relevance reduction processing unit that, when none of the subjects of the search target documents displayed on the output search result screen was selected, identifies, among the correspondences between the words in the search sentence used to output that search result screen and the words contained in the unselected subjects, those correspondences that are already stored in the related word storage unit, and subtracts a relevance obtained from a predetermined calculation formula from the relevance recorded for each such combination of words. This makes it possible to automatically correct the relevance of a weakly related pair of base word and related word.
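The increase and reduction of the relevance can be sketched in Python as below. The source refers only to "a predetermined calculation formula", so the fixed step of 0.1 is purely an assumption for illustration; the clamp to the range 0 to 1 follows the range that the description gives for the relevance a.

```python
STEP = 0.1  # assumed increment; the source does not specify the formula


def increase_relevance(related_dict, pair, step=STEP):
    """Raise the relevance of an already stored pair, capped at 1.0."""
    related_dict[pair] = min(1.0, related_dict[pair] + step)


def decrease_relevance(related_dict, pair, step=STEP):
    """Lower the relevance of an already stored pair, floored at 0.0."""
    related_dict[pair] = max(0.0, related_dict[pair] - step)


def penalize_unselected(query_words, unselected_subject_words, related_dict, step=STEP):
    """Lower the relevance of stored pairs when no subject was selected.

    Only pairs (query word, subject word) that already exist in the
    dictionary are touched; returns the pairs that were lowered.
    """
    lowered = []
    for base in query_words:
        for related in unselected_subject_words:
            pair = (base, related)
            if pair in related_dict:
                decrease_relevance(related_dict, pair, step)
                lowered.append(pair)
    return lowered


dictionary = {("laptop", "notebook"): 0.95, ("laptop", "tablet"): 0.05}
increase_relevance(dictionary, ("laptop", "notebook"))
lowered = penalize_unselected(["laptop"], ["tablet", "desk"], dictionary)
```

A real system might scale the step by how often a pair is confirmed or rejected; that choice is left open here.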

The present invention is also a document search method in a document search apparatus that outputs search target documents related to an input search sentence, wherein: a related word storage unit stores, as already learned information, the correspondence between a base word (a word contained in the search sentence) and a related word related to that base word, together with a relevance indicating the strength of the relation between the base word and the related word; a search word replacement processing unit replaces each word contained in the search sentence for which a related word is stored in the related word storage unit with that related word; a search target document extraction processing unit generates a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the relevance, generates a target document vector based on the weight in each search target document of each word contained in the search sentence after replacement, and extracts, from the plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small; a search result screen generation processing unit generates search result screen data in which the subjects of the extracted search target documents are displayed in order according to the angle formed by the search document vector and the target document vector; and a search result screen output processing unit outputs the search result screen data.

The present invention is also a program to be executed by a computer of a document search apparatus that outputs search target documents related to a search sentence and that comprises a related word storage unit storing, as already learned information, the correspondence between a base word (a word contained in the search sentence) and a related word related to that base word, together with a relevance indicating the strength of the relation between the base word and the related word, the program causing the computer to execute: a process of replacing each word contained in the search sentence for which a related word is stored in the related word storage unit with that related word; a process of generating a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the relevance, generating a target document vector based on the weight in each search target document of each word contained in the search sentence after replacement, and extracting, from the plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small; a process of generating search result screen data in which the subjects of the extracted search target documents are displayed in order according to the angle formed by the search document vector and the target document vector; and a process of outputting the search result screen data.

The present invention is also a recording medium storing a program to be executed by a computer of a document search apparatus that outputs search target documents related to a search sentence and that comprises a related word storage unit storing, as already learned information, the correspondence between a base word (a word contained in the search sentence) and a related word related to that base word, together with a relevance indicating the strength of the relation between the base word and the related word, the program causing the computer to execute: a process of replacing each word contained in the search sentence for which a related word is stored in the related word storage unit with that related word; a process of generating a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the relevance, generating a target document vector based on the weight in each search target document of each word contained in the search sentence after replacement, and extracting, from the plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small; a process of generating search result screen data in which the subjects of the extracted search target documents are displayed in order according to the angle formed by the search document vector and the target document vector; and a process of outputting the search result screen data.

Hereinafter, a document search system (document search apparatus) according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the document search system according to the embodiment. In this figure, reference numeral 1 denotes a document search server, 2 denotes a Web server, and 3 denotes a terminal such as a PC (Personal Computer). The document search server 1 is connected to the Web server 2 via a communication network, and the terminal 3 is connected to the Web server 2 via the Internet or the like. In the present embodiment, when the terminal 3 accesses the Web server 2 and sends it a search sentence, the Web server 2 transfers the search sentence to the document search server 1. The document search server 1 then outputs to the Web server 2 information on those search target documents that are related to the search sentence. The Web server 2 transmits to the terminal 3 the data of a web page for displaying the information on the search target documents related to the search sentence. In this process, the document search server 1 uses the processing described later to extract search target documents with high search accuracy, that is, search target documents whose content is closer to the search sentence than was previously possible.

FIG. 2 is a diagram showing functional blocks of the Web server and the document search server.
As shown in this figure, the Web server 2 includes a session ID generation unit 21 and a web page processing unit 22. The session ID is identification information that identifies a series of document searches. It makes it possible to identify the search sentences received during a series of processes in which a user searched for documents on a particular topic. The session ID generation unit 21 generates the session ID by the processing described later. The web page processing unit 22 receives search sentences and transmits the web page data of the search results.

In the document search server 1, reference numeral 101 denotes a control unit that controls each processing unit. Reference numeral 102 denotes a document vector generation unit that generates a search sentence vector and target document vectors. Reference numeral 103 denotes a document search unit that extracts, from a plurality of search target documents, a predetermined number of search target documents related to the search sentence. Reference numeral 104 denotes a related word learning unit that performs processing such as registering related words of the words contained in a search sentence in a database. Reference numeral 105 denotes a morphological analysis dictionary DB (database) that stores the words and other data used for morphological analysis. Reference numeral 106 denotes a related word dictionary DB that stores, in association with one another, a word contained in a search sentence (base word), a word related to that word (related word), and the relevance of that combination of words. Reference numeral 107 denotes a search target document DB that stores a plurality of search target documents. Reference numeral 108 denotes a search history DB that stores, each time a search sentence is received, a history of the information produced when the search was performed with that search sentence.

FIG. 3 is a diagram showing the structure of the data stored in the search target document DB.
As shown in this figure, the search target document DB 107 stores, in association with one another, a document ID, the subject, the body, a document vector, and the words contained in the subject. The document vector stored in the search target document DB 107 is represented by the weight values of the words contained in the subject and the body.

FIG. 4 is a diagram showing the structure of the data stored in the search history DB.
As shown in this figure, the search history DB 108 stores, in association with one another, a session ID, a search ID, search sentence words, search results, viewed documents, and related word pairs. The search ID is information that identifies the processing performed each time a search sentence is received. The search sentence words are the words contained in the search sentence. The search results are the document IDs of the search target documents that the document search server 1 retrieved based on the search sentence. The viewed documents are the document IDs of those search target documents in the search results that the user of the terminal 3 viewed. A related word pair records the combination of a word contained in the search sentence (base word) and the word (related word) stored in the related word dictionary DB 106 in association with that word.
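One way to picture a row of the search history DB is the following Python sketch. The field names are hypothetical renderings of the columns listed above, and treating a non-empty list of viewed documents as "a subject was selected" is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchHistoryRecord:
    """One search history row; field names are illustrative only."""
    session_id: str
    search_id: str
    query_words: list[str]
    result_doc_ids: list[str]
    viewed_doc_ids: list[str] = field(default_factory=list)
    related_pairs: list[tuple[str, str]] = field(default_factory=list)

    @property
    def subject_selected(self) -> bool:
        # Assumption: viewing any document means a subject was selected.
        return bool(self.viewed_doc_ids)


def searches_in_session(history, session_id):
    """Return the records of one session, preserving their stored order."""
    return [record for record in history if record.session_id == session_id]


history = [
    SearchHistoryRecord("S1", "Q1", ["laptop"], ["d2"]),
    SearchHistoryRecord("S1", "Q2", ["notebook"], ["d1"], viewed_doc_ids=["d1"]),
    SearchHistoryRecord("S2", "Q3", ["printer"], ["d3"]),
]
same_session = searches_in_session(history, "S1")
```

Grouping records by session ID in this way gives the learning step the sequence of searches it needs to compare.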

FIG. 5 is a diagram showing the structure of the data stored in the related word dictionary DB.
As shown in this figure, the related word dictionary DB 106 stores, in association with one another, a word contained in a search sentence (base word), a word related to that word (related word), and a relevance a (0 ≦ a ≦ 1) indicating the strength of the relation between the two words. Each time a search sentence is input, the document search server 1 uses the processing described later to determine whether to add to the combinations of base word and related word stored in the related word dictionary DB 106, and adds or removes combinations accordingly. It also increases or decreases the relevance a. In this way the related word dictionary DB 106 is generated automatically, which reduces the labor of the administrator, and accurate search results are output based on the information stored in the related word dictionary DB 106.

FIG. 6 is a diagram showing the processing flow of the document search server.
Next, the processing flow of the document search server is described with reference to FIG. 6.
First, the Web server 2, having accepted access from the Web browser of the terminal 3, outputs a search sentence input screen for document search to the terminal 3. When the user of the terminal 3 enters a search sentence on this screen and issues a search instruction, the terminal 3 transmits the search sentence to the Web server 2. The Web server 2 receives the search sentence, the session ID generation unit 21 generates a session ID, and the Web server 2 transmits the search sentence and the session ID to the document search server 1. It is assumed here that the document search server 1 has already received an earlier search sentence with the same session ID and that the search history for it has been recorded in the search history DB 108. As for the session ID, search sentences received at intervals not exceeding a predetermined time interval may, for example, be managed under the same session ID. Alternatively, search sentences may be managed under the same session ID until an instruction such as "change search" is accepted on the search sentence input screen and that information is received by the document search server 1. Until it determines that a series of processes searching for the same content has ended, the document search server 1 stores the search sentences received from the terminal 3 in the search history DB 108 in association with the same session ID, through the processing described later.
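The first session rule mentioned above (search sentences arriving at intervals not exceeding a predetermined time share a session ID) could be sketched as follows. The function name, the use of integer session IDs, and the timestamps in seconds are assumptions made for illustration; the specification does not define how the session ID generation unit 21 is implemented.

```python
def assign_session_ids(timestamps, max_gap_seconds):
    """Group search sentences into sessions by arrival time.

    timestamps: arrival times in seconds, in ascending order.
    A new session starts whenever the gap to the previous search
    sentence exceeds max_gap_seconds (the "predetermined interval").
    """
    session_ids = []
    current_session = 0
    previous = None
    for t in timestamps:
        if previous is not None and t - previous > max_gap_seconds:
            current_session += 1
        session_ids.append(current_session)
        previous = t
    return session_ids
```

For example, with a 60-second limit, search sentences arriving at 0, 30, and 50 seconds share one session, while one arriving at 500 seconds starts a new session.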

When the document search server 1 accepts a search sentence, the document vector generation unit 102 breaks the search sentence into words (step S101). Treating each word in the search sentence as a source word, the document vector generation unit 102 replaces it with the related word recorded for it in the related word dictionary DB 106 (step S102) and generates a search sentence vector for the search sentence after replacement (step S103). At this point, the weight of each word in that search sentence vector is adjusted using the relevance. The document search unit 103 then calculates the angle between the search sentence vector of the replaced search sentence and the target document vector for each combination of search sentence and search target document.

Next, as the vector search processing, the document vector generation unit 102 identifies the target document vectors that form a small angle with the search sentence vector of the replaced search sentence, and determines a predetermined number of search target documents as the search results (step S104). The search may use the target document vectors of all search target documents in the search target document DB 107 together with the search sentence vector of the replaced search sentence. Alternatively, it may use only the target document vectors of those search target documents in the search target document DB 107 that contain at least one of the words in the replaced search sentence. The processing for determining the search target documents that become the search results is described in more detail later.

Next, once the search results are determined, the document search unit 103 generates a search ID unique to the search sentence. It reads the document IDs and subject information of the search target documents determined as search results from the search target document DB 107, and registers them in the search history DB 108 in association with the session ID received from the session ID generation unit 21 and the generated search ID. The document search unit 103 also extracts each word from the search sentence by morphological analysis and registers those words in the search history DB 108 under the same association. In addition, it registers in the search history DB 108, as a related word pair, each combination of a related word that the document vector generation unit 102 used to generate the search sentence vector and the source word from which that related word was obtained. This completes the search processing by the document search server 1.

The control unit 101 then transmits the document IDs and subject information of the search target documents determined as search results to the Web server 2. The web page processing unit 22 of the Web server 2 generates search result screen data that lists those document IDs and subjects, and transmits the search result screen data to the terminal 3.

At the terminal 3, if the subjects displayed on the search result screen include one that represents the content of the document the user wants to view, the user enters an instruction to display the search target document with that subject. If no such subject is displayed, no display instruction is entered. When an instruction to display the search target document for a given subject is entered, the terminal 3 transmits a search target document display request containing that subject and document ID to the Web server 2. The Web server 2 forwards the search target document display request to the document search server 1. The document search server 1 determines whether the user viewed any search target document in the search results according to whether it receives this search target document display request (step S105). If it determines that a document was viewed, the document search unit 103 reads the body text and other information recorded in the search target document DB 107 in association with the document ID and transmits it to the Web server 2 via the control unit 101.
The web page processing unit of the Web server 2 generates body text display screen data and transmits it to the terminal 3.

When the body text display screen data is transmitted to the terminal 3, the related word learning unit 104 automatically updates the related word dictionary DB 106. This automatic update includes, among other steps, creation of related word pair candidates (step S106), determination of whether each related word pair candidate is already registered in the related word dictionary DB 106 (step S107), registration of unregistered related word pair candidates in the related word dictionary DB 106 (step S108), and an increase of the relevance of already registered related word pairs (step S109). If no subject displayed on the search result screen was selected (that is, if it is determined that none of the search target documents in the search results was viewed), the related word learning unit 104 decreases, through the processing described later, the relevance of the related word pairs made up of the word before replacement and the word after replacement in the replaced search sentence (step S110). It is then determined again whether search sentence information has been accepted (step S111), and if a search sentence has been accepted, the search processing starts.

Next, the detailed processing flow of the document search server 1 from step S105 onward is described for the case where step S105 determines that one of the search target documents on the search result screen was viewed.
First, after the body text display screen is transmitted to the terminal 3, the related word learning unit 104 reads the words of the search sentence that led to the transmission of the body text display screen. It also reads the words of other search sentences recorded in the search history DB 108 under the same session ID as that search sentence, limited to search IDs for which nothing is recorded in the viewed document field (that is, search sentences for which none of the search target documents on the search result screen was viewed). It then creates related word pair candidates by associating the words read in this way with one another (the processing of step S106 above). Here, the words of the search sentence that led to the transmission of the body text display screen are treated as related words, and the words of the search sentences for which no search target document was viewed are treated as source words. The total relevance assigned per source word is 0.5. This total can be changed as appropriate.

The processing of step S106 is now described in more detail with reference to FIG. 4. Suppose the search ID of the search sentence that led to the transmission of the body text display screen is "R103", and the search IDs of the earlier search sentences in the same session whose search results were not viewed are "R102" and "R101". Taking the combination of search IDs "R103" and "R102" first, the search sentence of "R103" contains the words "compression tool" and "restore" (related words), and the search sentence of "R102" contains the word "cab" (source word). The related word learning unit 104 therefore creates the relations "cab" → "compression tool" and "cab" → "restore" as related word pair candidates. Because the current search sentence contains two words, the relevance of each related word pair candidate is calculated as "total relevance ÷ number of words in the current search sentence" = 0.5 ÷ 2 = 0.25. FIG. 7 is a first diagram showing the extracted related word pair candidates (source word and related word) and their relevance.

Also in FIG. 4, the related word learning unit 104 confirms that the search sentence of search ID "R103" contains the words "compression tool" and "restore" (related words) and that the search sentence of search ID "R101" contains the words "zip" and "expand" (source words). It then extracts the four relations "zip" → "compression tool", "zip" → "restore", "expand" → "compression tool", and "expand" → "restore" as related word pairs. In calculating the relevance, since there are two search sentence words (related words) of search ID "R101" for the source word "zip", the relevance of each of "zip" → "compression tool" and "zip" → "restore" is "total relevance ÷ number of words in the current search sentence" = 0.25. Likewise, the relevance of each of "expand" → "compression tool" and "expand" → "restore" is 0.25. FIG. 8 is a second diagram showing the extracted related word pair candidates (source word and related word) and their relevance.
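The candidate creation of step S106 can be illustrated with a short sketch. It assumes the words of each search sentence are already available as lists; the function name and the dict return type are hypothetical, while the default total of 0.5 and the division by the number of words in the current (viewed) search sentence follow the text.

```python
def make_pair_candidates(viewed_query_words, unviewed_query_words,
                         total_relevance=0.5):
    """Create (source word, related word) -> relevance candidates.

    viewed_query_words: words of the search sentence whose results were
        viewed (these become the related words).
    unviewed_query_words: words of an earlier search sentence in the same
        session whose results were not viewed (these become source words).
    """
    # Each source word's total relevance is split evenly across the
    # words of the current search sentence.
    share = total_relevance / len(viewed_query_words)
    return {
        (source, related): share
        for source in unviewed_query_words
        for related in viewed_query_words
    }
```

Called with the R103 words and the R102 word "cab", this yields the two candidates of FIG. 7 at 0.25 each; called with the R101 words "zip" and "expand", it yields the four candidates of FIG. 8 at 0.25 each.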

When related word pair candidates are created as in FIG. 7 and FIG. 8, the following reasoning applies. If a user enters different search sentences within a series of search processes managed under the same session ID, the search sentence whose results were viewed and the search sentence whose results were not viewed can be regarded as related to each other. The words of those two search sentences can therefore be registered automatically as related word pairs. This reduces the work of the administrator who maintains the related word dictionary DB 106. In addition, as more related word pairs are registered, searches are also performed automatically with related words beyond the words in the search sentence the user entered, so search accuracy can be improved over conventional techniques.

In addition to the processing described above, the related word learning unit 104 also creates related word pair candidates by the following processing.
Among the search IDs associated with the same session ID, the related word learning unit 104 identifies the search ID of a search sentence for which none of the search target documents on the search result screen was viewed, and reads from the search history DB 108 the document IDs of the search target documents retrieved for that search sentence. It then reads the words contained in the subjects recorded in the search target document DB 107 in association with those document IDs (subject words). Next, from the words contained in the subject that was viewed in the search results, it obtains those words that are not among the words just read. Finally, it extracts each obtained word (related word) and each word in the search sentence for which no search target document was viewed (source word) as a related word pair. This processing is described in more detail below.

First,
1. The words included in the search sentence for which no search target document was viewed (call this group of words search sentence words A) are "zip" and "expand".
Also,
2. The words included in the subjects displayed on the search result screen for which no search target document was viewed (call this group of words subject words A) are "drive", "folder", "error", and "computer".
Also,
3. The words included in the subject that was selected on a search result screen, causing the body text to be transmitted (call this group of words subject words B), are "compress", "folder", and "right-click".

This situation can be restated as the following sequence: "The user enters search sentence words A, and subject words A are output as a result, but on seeing subject words A the user judges that the desired search target document has not been obtained. The user then enters search sentence words B as the next search, the search result screen for it is output, and if that screen shows the subject of the desired search target document, the user selects the subject containing subject words B."

In this case, the words that remain after removing the words of subject words A ("drive", "folder", "error", "computer") from subject words B ("compress", "folder", "right-click"), namely "compress" and "right-click", can be regarded as related to search sentence words A ("zip", "expand"). The reason is that the subject words displayed on the search result screen that was not selected are unsuitable as words representing the desired search target document. Once those unsuitable words are removed from the words of the selected subject, the remaining words can be said to be related to the words of the search sentence that produced the unselected search result screen. Accordingly, when a user enters different search sentences within a series of search processes managed under the same session ID, the following are extracted as related word pair candidates: the four combinations of the words of the earlier search sentence whose search result screen was not selected ("zip", "expand"; source words) with those words of the selected subject on the search result screen for the next search sentence that were not displayed on the earlier search result screen ("compress", "right-click"; related words).
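The subject-based extraction just described amounts to a set difference followed by a cross product. The sketch below assumes the three word groups are given as lists; the function and parameter names are hypothetical.

```python
def subject_based_candidates(search_words_a, subject_words_a, subject_words_b):
    """Pair each source word with the subject words unique to the viewed result.

    search_words_a: words of the search sentence whose results were not viewed.
    subject_words_a: subject words shown on that unviewed result screen.
    subject_words_b: words of the subject the user actually selected.
    """
    rejected = set(subject_words_a)
    # Keep only the selected-subject words that never appeared on the
    # rejected result screen; these become the related words.
    related_words = [w for w in subject_words_b if w not in rejected]
    return [(source, related)
            for source in search_words_a
            for related in related_words]
```

With the word groups from the example, "folder" is dropped because it also appeared on the rejected screen, leaving four candidate pairs.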

With this extraction of related word pairs, the following reasoning applies. When a user enters different search sentences within a series of search processes managed under the same session ID, two groups of words can be regarded as related: the words of the earlier search sentence whose search result screen was not selected, and the words of the selected subject on the search result screen for the next search sentence, excluding the subject words displayed on the unselected search result screen. Those words can therefore be registered automatically as related word pairs. This reduces the work of the administrator who maintains the related word dictionary DB 106. In addition, as more related word pairs are registered, searches are also performed automatically with related words beyond the words in the search sentence the user entered, so search accuracy can be improved over conventional techniques.

Next, the related word learning unit 104 determines whether each related word pair candidate extracted by the above processing is already registered in the related word dictionary DB 106 (the processing of step S107 above). If it is already registered, the related word learning unit 104 increases the relevance recorded in the related word dictionary DB 106 for that related word pair (the processing of step S108 above). If it is not registered, the extracted related word pair candidate is registered in the related word dictionary DB 106 (the processing of step S109 above). To illustrate the relevance increase, suppose the related word pairs are "cab" → "restore" and "cab" → "expand". Suppose also that relevance values for these related word pairs are already registered in the related word dictionary DB 106: 0.5 for "cab" → "restore" and 0.6 for "cab" → "expand".

First, the related word learning unit 104 reads from the search history DB 108 the document ID of the viewed search target document, as determined in step S105 above, whose subject contains the related words (restore, expand) of the related word pair candidates created by the above processing. It then reads the document vector recorded in the search target document DB 107 in association with this document ID, and extracts from it the vector values corresponding to the related words (restore, expand) of the related word pair candidates. Let the vector value for "restore" be W1 and the vector value for "expand" be W2. As described above, each vector value is the weight of the word within the document. This weight is calculated and recorded in advance using conventional measures such as the TF (term frequency: the number of times a word occurs in one document) value and the IDF (inverse document frequency: based on how often the word occurs across all documents) value. Let the maximum increase for the related word pair candidates be 0.1, and let that value be distributed in proportion to the vector values of the related words and assigned to the respective "source word" → "related word" pairs. Then, for the related word pair whose related word is "restore", the increase in relevance is calculated as

0.1 × W1 / (W1 + W2) ... (1)

and for the related word pair whose related word is "expand", as

0.1 × W2 / (W1 + W2) ... (2)

For example, if the result of equation (1) is 0.01, the relevance of the related word pair "cab" → "restore" increases to 0.51, and if the result of equation (2) is 0.09, the relevance of the related word pair "cab" → "expand" increases to 0.69. Although the maximum increase for the related word pairs is set to 0.1 here, it is not limited to this value and can be set as appropriate.

Next, the processing performed when no subject is selected on the search result screen is described. The document search server 1 determines that no subject displayed on the search result screen was selected (the processing of step S105 above) in cases such as the following: when the next search sentence is transmitted from the terminal 3 and received, when an instruction such as "change search" is received from the terminal 3, or when it detects that a predetermined time has elapsed since the search result screen was output. It then reads, from the search result field of the search history DB 108, the document IDs of the search target documents recorded in association with the search ID of the search sentence that produced that search result screen. Next, it reads the subject words recorded in the search target document DB 107 in association with those document IDs. Treating the subject words read in this way as related words and the words of the search sentence as source words, it decreases the relevance of each source word → related word pair. The amount of the decrease is preferably 0.1 or less, for example. The relevance a can be increased or decreased within the range 0 ≦ a ≦ 1. The reason for this is described later.

Next, the search for search target documents described above is explained in detail.
In step S104, suppose for example that the original search sentence contains the words "zip" and "expand", and that the related word pairs "zip" → "compression tool" and "zip" → "restore" are registered in the related word dictionary DB 106. The words in the search sentence after the source word is replaced with its related words are then "compression tool", "restore", and "expand". A search sentence vector is generated from the words in the replaced search sentence by the same technique as conventional vector search. Here, the words in the search sentence after replacement are
1. "compression tool"
2. "restore"
3. "expand"
and the related word pairs registered in the related word dictionary DB 106 are
zip (source word) → compression tool (related word): 0.2 (relevance)
zip (source word) → restore (related word): 0.3 (relevance)
Therefore, if the weights in the search target document of the words in the replaced search sentence are (compression tool, restore, expand) = (W1, W2, W3), the search sentence vector is calculated as (W1 × 0.2, W2 × 0.3, W3). In other words, when the search sentence vector of the replaced search sentence is generated, the weight of each word in the search sentence is multiplied by the relevance associated with the related word pair that has that word as its related word, giving the vector value for that word. The vector of a search target document (target document vector) is determined by the weight components, corresponding to the words in the search sentence, of a search target document that contains any of those words. That is, the words in the search sentence are "compression tool", "restore", and "expand", and the target document vector of a search target document containing all of these words is expressed by the weights of "compression tool", "restore", and "expand" in that document. These weights are obtained by taking, from the values contained in the document vector in the search target document DB 107, the values corresponding to each word.

The document search unit 103 then calculates the angle between the search sentence vector of the replaced search sentence and the target document vector for each combination of the search sentence and the multiple search target documents. The document vector generation unit 102 determines, as the search results, a predetermined number of search target documents whose target document vectors form a small angle with the search sentence vector. The search target documents are retrieved by this processing. The search target documents for which target document vectors are calculated may be all search target documents in the search target document DB 107, or only those search target documents in the search target document DB 107 that contain the words in the search sentence. The document search unit 103 then registers the document IDs of the search target documents determined as search results in the search history DB 108, in association with the search ID identifying that search processing.
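Ranking by the angle between vectors can be sketched with the standard cosine formula. The function names, the dict of document vectors, and the choice to treat a zero-length vector as orthogonal are assumptions for illustration; the text only states that documents with a small angle are selected up to a predetermined number.

```python
import math


def angle_between(u, v):
    """Angle in radians between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        # Assumption: a zero vector is treated as orthogonal to everything.
        return math.pi / 2
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def rank_documents(query_vector, document_vectors, top_k):
    """Return the IDs of the top_k documents with the smallest angle."""
    ranked = sorted(
        document_vectors,
        key=lambda doc_id: angle_between(query_vector, document_vectors[doc_id]),
    )
    return ranked[:top_k]
```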

According to this processing, a degree of relevance a in the range 0 ≦ a ≦ 1 is multiplied by the weight value of each word included in the search sentence after replacement with a related word, and the search sentence vector is generated from the resulting values. As a result, the angle formed with the target document vector of a search target document is smaller than for the search sentence vector of a search sentence that is not replaced with related words. Accordingly, because the angle formed between the search sentence vector of the search sentence after replacement with related words and the target document vector of a search target document is expected to be smaller than the angle formed between the search sentence vector of the original search sentence and that target document vector, a search result screen can be output in which the results of the original search sentence are ranked as the higher search results. This makes it possible to display results on the terminal 3 with priority given to the search results obtained using the original search sentence.

Next, the control unit 101 transmits the document IDs and subject information of the search target documents determined as the search result to the Web server 2. The web page processing unit 22 of the Web server 2 then generates a search result screen that displays a list of the document IDs and subject information of those search target documents, and transmits the data of the search result screen to the terminal 3.

Each of the servers and terminals described above contains a computer system. The steps of the processing described above are stored in a computer-readable recording medium in the form of a program, and the processing is performed when a computer reads and executes this program. Here, a computer-readable recording medium means a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer via a communication line, and the computer that receives the distribution may execute the program.

The program may implement only a part of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the computer system.

A block diagram showing the configuration of the document search system.
A diagram showing the functional blocks of the Web server and the document search server.
A diagram showing the structure of the data stored in the search target document DB.
A diagram showing the structure of the data stored in the search history DB.
A diagram showing the structure of the data stored in the related word dictionary DB.
A diagram showing the processing flow of the document search server.
A first diagram showing related word pairs, each consisting of a to-be-related word and a related word, and their degrees of relevance.
A second diagram showing related word pairs, each consisting of a to-be-related word and a related word, and their degrees of relevance.

Explanation of symbols

1 ... document search server, 2 ... Web server, 3 ... terminal, 101 ... control unit, 102 ... document vector generation unit, 103 ... document search unit, 104 ... related word learning unit, 105 ... morphological analysis dictionary DB, 106 ... related word dictionary DB, 107 ... search target document DB, 108 ... search history DB, 21 ... session ID generation unit, 22 ... web page processing unit

Claims

A document search apparatus that outputs a search target document related to an input search sentence, comprising:
a related word storage unit that stores, as already-learned information, a correspondence between a to-be-related word, which is a word included in the search sentence, and a related word related to the to-be-related word, in association with a degree of relevance indicating the strength of the relation between the to-be-related word and the related word;
a search word replacement processing unit that replaces, among the words included in the search sentence, each word stored in the related word storage unit in association with a related word with that related word;
a search target document extraction processing unit that generates a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the degree of relevance, generates a target document vector based on the weight values, in a search target document, of the words included in the search sentence after replacement, and extracts, from a plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small;
a search result screen generation processing unit that generates search result screen data in which the subjects of the extracted search target documents can be displayed in order according to the angle formed by the search document vector and the target document vector; and
a search result screen output processing unit that outputs the search result screen data.

The document search apparatus according to claim 1, further comprising:
a subject selection detection processing unit that detects, based at least on a display instruction for a search target document received through selection of a subject, whether any of the subjects of the search target documents displayed on the output search result screen has been selected;
a same-session determination processing unit that determines, when search sentences are input a plurality of times, whether the search sentences belong to the same session;
a related word detection processing unit that detects, as a related word, a word in the search sentence that caused the output of a search result screen on which a subject was selected;
a to-be-related word detection processing unit that identifies, in the same session as that search, an earlier search for which no subject was selected on the search result screen, and detects a word in the search sentence of that search as a to-be-related word; and
a related word learning processing unit that, when the correspondence between the detected to-be-related word and the detected related word is not already stored in the related word storage unit, registers the correspondence in the related word dictionary DB in association with a predetermined degree of relevance.

The document search apparatus according to claim 1, further comprising:
subject selection detection means for detecting, based at least on a display instruction for a search target document received through selection of a subject, whether any of the subjects of the search target documents displayed on the output search result screen has been selected;
a same-session determination processing unit that determines, when search sentences are input a plurality of times, whether the search sentences belong to the same session;
a to-be-related word detection processing unit that identifies, among the plurality of search sentences input in the same session, the previously input search sentence for which no subject was selected on the search result screen output in response to that search sentence, and detects a word in that search sentence as a to-be-related word;
a related word detection processing unit that detects, as a related word, a word that is included in the subject selected on the search result screen output in response to the currently input search sentence, among the plurality of search sentences input in the same session, and that does not appear in the subjects on the search result screen output in response to the previously input search sentence; and
a related word learning processing unit that, when the correspondence between the detected to-be-related word and the detected related word is not already stored in the related word storage unit, registers the correspondence in association with a predetermined degree of relevance.

The document search apparatus according to claim 2, further comprising:
a degree-of-relevance increase processing unit that, when the correspondence between the detected to-be-related word and the detected related word is already stored in the related word storage unit, adds a degree of relevance obtained by a predetermined calculation formula to the degree of relevance stored in association with the to-be-related word and the related word.

The document search apparatus according to claim 1, further comprising:
a subject selection detection processing unit that detects, based at least on a display instruction for a search target document received through selection of a subject, whether any of the subjects of the search target documents displayed on the output search result screen has been selected; and
a degree-of-relevance reduction processing unit that, when no subject is selected from the subjects of the search target documents displayed on the output search result screen, identifies, among the correspondences between the words in the search sentence used to output the search result screen and the words included in the unselected subjects, the correspondences already stored in the related word storage unit, and subtracts a degree of relevance obtained by a predetermined calculation formula from the degree of relevance recorded for the word combination of each identified correspondence.

A document search method in a document search apparatus that outputs a search target document related to an input search sentence, wherein:
a related word storage unit stores, as already-learned information, a correspondence between a to-be-related word, which is a word included in the search sentence, and a related word related to the to-be-related word, in association with a degree of relevance indicating the strength of the relation between the to-be-related word and the related word;
a search word replacement processing unit replaces, among the words included in the search sentence, each word stored in the related word storage unit in association with a related word with that related word;
a search target document extraction processing unit generates a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the degree of relevance, generates a target document vector based on the weight values, in a search target document, of the words included in the search sentence after replacement, and extracts, from a plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small;
a search result screen generation processing unit generates search result screen data for displaying the subjects of the extracted search target documents in order according to the angle formed by the search document vector and the target document vector; and
a search result screen output processing unit outputs the search result screen data.

A program to be executed by a computer of a document search apparatus that outputs a search target document related to a search sentence and that includes a related word storage unit storing, as already-learned information, a correspondence between a to-be-related word, which is a word included in the search sentence, and a related word related to the to-be-related word, in association with a degree of relevance indicating the strength of the relation between the to-be-related word and the related word, the program causing the computer to execute:
a process of replacing, among the words included in the search sentence, each word stored in the related word storage unit in association with a related word with that related word;
a process of generating a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the degree of relevance, generating a target document vector based on the weight values, in a search target document, of the words included in the search sentence after replacement, and extracting, from a plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small;
a process of generating search result screen data for displaying the subjects of the extracted search target documents in order according to the angle formed by the search document vector and the target document vector; and
a process of outputting the search result screen data.

A recording medium storing a program to be executed by a computer of a document search apparatus that outputs a search target document related to a search sentence and that includes a related word storage unit storing, as already-learned information, a correspondence between a to-be-related word, which is a word included in the search sentence, and a related word related to the to-be-related word, in association with a degree of relevance indicating the strength of the relation between the to-be-related word and the related word, the program causing the computer to execute:
a process of replacing, among the words included in the search sentence, each word stored in the related word storage unit in association with a related word with that related word;
a process of generating a search document vector from the words of the search sentence after replacement, using for each replaced word a value obtained by multiplying the weight of the word by the degree of relevance, generating a target document vector based on the weight values, in a search target document, of the words included in the search sentence after replacement, and extracting, from a plurality of search target documents stored in a search target document storage unit, a predetermined number of search target documents for which the angle formed by the search document vector and the target document vector is small;
a process of generating search result screen data for displaying the subjects of the extracted search target documents in order according to the angle formed by the search document vector and the target document vector; and
a process of outputting the search result screen data.