JP5104329B2

JP5104329B2 - Document search system

Info

Publication number: JP5104329B2
Application number: JP2008006743A
Authority: JP
Inventors: 智子坪田; 真理難波; 和也武田; 征二松本
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2008-01-16
Filing date: 2008-01-16
Publication date: 2012-12-19
Anticipated expiration: 2028-01-16
Also published as: JP2009169651A

Description

The present invention relates to a document search system in which a terminal and a server are connected via a network and the server presents keywords related to a search keyword received from the terminal.

A typical document search system searches the target documents for a keyword entered by a user and presents the storage locations of the documents found. Here, “document” includes, for example, text files, HTML (HyperText Markup Language) files, XML (Extensible Markup Language) files, image files, and any other files created with document layout software, word processing software, spreadsheet software, and the like.

More recently, services have appeared that take a keyword entered by a user (call it keyword 1), consult the past search history (the search history of all users), and present keywords that were used together with keyword 1 or that contain keyword 1. For example, when the user enters “curry”, the service presents “curry recipe”, “how to make curry”, and “curry udon” as the top three past search keywords to support the user's search. However, an approach based on past search history cannot present effective keywords when the documents are updated daily.

On the other hand, a service that refers to the documents being searched and presents words related to the search keyword has also been proposed (Patent Document 1). To present highly accurate related words, Patent Document 1 treats words that appear in the searched documents in the order “noun, parallel particle, noun” as a combination of related words.
Patent Document 1: JP 2004-94388 A

However, because the mechanism of Patent Document 1 pursues the accuracy of related words so strongly, the number of related words it can present may be small. That is, there may be many search keywords for which no related word can be presented. In addition, because the presented related words are so closely related to the keyword, they offer little breadth. For example, when the search target documents are facility information, users often decide “what they want to do”, “where they want to go”, “what they want to eat”, and so on in the course of searching. In such cases the mechanism of Patent Document 1, which treats only words in a grammatically parallel relationship as related words, cannot adequately support the user's search.

The present invention has been made in view of the problems described above. Its object is to provide a document search system that supports the user's search by presenting, for an input search keyword, related keywords that are contained in the search target documents and that have breadth (that is, are not limited to words in a grammatically parallel relationship).

To achieve this object, a first invention is a document search system in which a terminal and a server are connected via a network and the server presents keywords related to a search keyword received from the terminal, wherein the server comprises: means for holding word attribute appearance pattern information, in which a type of word is defined as a word attribute, a combination of word attributes that appear within a certain proximity of one another in the search target documents is defined as an appearance pattern, and a weight applied to each appearance pattern when calculating a word group relevance is defined as an appearance pattern weight; relevance calculating means for searching the search target documents for word groups that match an appearance pattern and calculating the word group relevance from the distance between the words of each word group found and the appearance pattern weight; means for holding the word group relevance calculated by the relevance calculating means as word relevance information; and related keyword presenting means for presenting, upon receiving the search keyword from the terminal, keywords related to the search keyword by referring to the word relevance information. The document search system according to the first invention can support the user's search by presenting, for an input search keyword, related keywords that are contained in the search target documents and that have breadth.

The word group relevance in the first invention is, for example, the product of the appearance pattern weight and the reciprocal of the distance between the words of the word group, summed over all pre-registered words and all appearance patterns.
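The summation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Match` record, the function name, and the choice to measure distance as a positive token gap are assumptions made for the example, since the text here does not state how distance is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One occurrence of a word group that matched an appearance pattern (hypothetical record)."""
    pattern_weight: float  # appearance pattern weight of the matched pattern
    distance: int          # assumed: gap between the words in tokens, always >= 1


def word_group_relevance(matches):
    """Sum (appearance pattern weight) * (1 / distance) over every match of one word group."""
    return sum(m.pattern_weight / m.distance for m in matches)


# Made-up matches for a single word group, for illustration only.
matches = [Match(1.0, 2), Match(0.5, 4), Match(1.0, 1)]
score = word_group_relevance(matches)  # 0.5 + 0.125 + 1.0
```

Under this formula, words that appear close together contribute more than distant ones, and a pattern with a higher weight scales up every occurrence that matches it.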

It is also desirable that two values can be set for the appearance pattern weight: one for the case where the order of appearance of the word group fully matches the registered order, and one for all other cases. With appropriate values, this raises the accuracy of the presented related keywords (here, “high accuracy” means, for example, a high probability of being able to support the search for many users).

It is also desirable that the word attribute appearance pattern information be updatable. Even when the volume or content of the search target documents changes, the accuracy of the related keywords can then be maintained by updating the word attribute appearance pattern information appropriately.

It is also desirable that the related keyword presenting means present the related keywords together with the search results for the search keyword and/or the word attributes of the related keywords. The user can then compare the search results for the keyword they entered with the presented related keywords and decide whether to issue another search request. The user also obtains broad related-keyword information for the input search keyword automatically, without having to think of it themselves.

A second invention is a server that is connected to a terminal via a network and presents keywords related to a search keyword received from the terminal, comprising: means for holding word attribute appearance pattern information, in which a type of word is defined as a word attribute, a combination of word attributes that appear within a certain proximity of one another in the search target documents is defined as an appearance pattern, and a weight applied to each appearance pattern when calculating a word group relevance is defined as an appearance pattern weight; relevance calculating means for searching the search target documents for word groups that match an appearance pattern and calculating the word group relevance from the distance between the words of each word group found and the appearance pattern weight; means for holding the word group relevance calculated by the relevance calculating means as word relevance information; and related keyword presenting means for presenting, upon receiving the search keyword from the terminal, keywords related to the search keyword by referring to the word relevance information.

A third invention is a document search method in which a terminal and a server are connected via a network, the server holds word attribute appearance pattern information, in which a type of word is defined as a word attribute, a combination of word attributes that appear within a certain proximity of one another in the search target documents is defined as an appearance pattern, and a weight applied to each appearance pattern when calculating a word group relevance is defined as an appearance pattern weight, and the server presents keywords related to a search keyword received from the terminal, the method comprising: a step in which the server searches the search target documents for word groups that match an appearance pattern and calculates the word group relevance from the distance between the words of each word group found and the appearance pattern weight; a step in which the server holds the word group relevance so calculated as word relevance information; a step in which the terminal transmits the search keyword to the server; and a step in which the server refers to the word relevance information and presents keywords related to the received search keyword.

A fourth invention is a program that causes a computer to function as the server according to the second invention.

The present invention provides a document search system that supports the user's search by presenting, for an input search keyword, related keywords that are contained in the search target documents and that have breadth. Such a document search system can adequately support the user's search when, for example, the search target documents are facility information.

Embodiments of the present invention are described in detail below with reference to the drawings.

First, the schematic configuration of a document search system 1 according to an embodiment of the present invention is described with reference to FIG. 1.

FIG. 1 is a diagram showing the schematic configuration of the document search system 1. As shown in FIG. 1, in the document search system 1 an administrator terminal 3, an index creation server 5, a search server 7, a web server 9, a user terminal 11, and the like are connected via a network (not shown).

An index is data created so that the related words behind the related keywords presented to the user can be retrieved quickly. In this embodiment, an index is created and the related-word information (association with the search word, storage location information, attribute information of the related word, and so on) is divided among a plurality of files and held.

The administrator terminal 3 is a terminal used by the administrator of the document search system 1 and communicates with the index creation server 5. Through the administrator terminal 3, the administrator sends data used by the document search system 1 to the index creation server 5. The administrator may also enter such data directly into the index creation server 5.

The index creation server 5 is a server that creates the index of related keywords, and communicates with the administrator terminal 3, the search server 7, and the web server 9.

The search server 7 is a server that holds the related keyword data, and communicates with the index creation server 5 and the web server 9.

The web server 9 is a server that responds to search and other requests sent from the user terminal 11, and communicates with the index creation server 5 and the search server 7.

The user terminal 11 is a terminal used by a user of the document search system 1 and communicates with the web server 9. To use the document search system 1, the user terminal 11 basically needs no special functions as long as commercially available OS (Operating System) software and web browsing software are installed.

The index creation server 5, the search server 7, and the web server 9 may be implemented as a single device. Alternatively, with load balancing, fault tolerance, and the like in mind, a plurality of devices having the same function may be installed for each of the index creation server 5, the search server 7, and the web server 9.

In the following, when the index creation server 5, the search server 7, and the web server 9 are not distinguished or are referred to collectively, they are simply called the “server”. Likewise, when the administrator terminal 3 and the user terminal 11 are not distinguished or are referred to collectively, they are simply called the “terminal”.

Next, the hardware configuration of each device is described with reference to FIG. 2.

FIG. 2 is a hardware configuration diagram of a computer that implements the terminal and the server. The hardware configuration in FIG. 2 is only an example, and various configurations can be adopted depending on the application and purpose.
In the computer that implements the terminal and the server, a control unit 21, a storage unit 23, a media input/output unit 25, a communication control unit 27, an input unit 29, a display unit 31, a peripheral device I/F unit 33, and the like are connected via a bus 35.

The control unit 21 comprises a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.

The CPU loads programs stored in the storage unit 23, the ROM, a recording medium, or the like into a work memory area in the RAM and executes them, drives and controls each device connected via the bus 35, and thereby carries out the processing of the terminal and the server described later.
The ROM is a non-volatile memory that permanently holds the computer's boot program, programs such as the BIOS, data, and the like.
The RAM is a volatile memory that temporarily holds programs, data, and the like loaded from the storage unit 23, the ROM, a recording medium, or the like, and provides a work area that the control unit 21 uses for its various processes.

The storage unit 23 is an HDD (hard disk drive) and stores the programs executed by the control unit 21, the data needed to execute them, and the like. The stored programs include a control program corresponding to the OS and application programs corresponding to the processing described later.
The control unit 21 reads each of these program codes as needed and transfers it to the RAM, where the CPU reads and executes it as the various means.

The media input/output unit 25 (drive device) inputs and outputs data and includes media input/output devices such as a CD drive (-ROM, -R, -RW, etc.), a DVD drive (-ROM, -R, -RW, etc.), and an MO drive.

The communication control unit 27 has a communication control device, a communication port, and the like. It is a communication interface that mediates communication between the computer and the network 13, and it controls communication with other computers via the network 13.

The input unit 29 is used to input data and includes input devices such as a keyboard, a pointing device such as a mouse, and a numeric keypad.
Operating instructions, commands, data, and the like can be entered into the computer through the input unit 29.

The display unit 31 includes a display device such as a CRT monitor or a liquid crystal panel, and a logic circuit (such as a video adapter) that works with the display device to implement the computer's video functions.

The peripheral device I/F (interface) unit 33 is a port for connecting peripheral devices to the computer; the computer exchanges data with peripheral devices through it. The peripheral device I/F unit 33 is made up of USB, IEEE 1394, RS-232C, and similar interfaces, and usually has a plurality of peripheral device I/Fs. The connection to a peripheral device may be wired or wireless.

The bus 35 is a path that carries control signals, data signals, and the like between the devices.

Next, the software configuration of the server is described with reference to FIG. 3.

FIG. 3 is a software configuration diagram of the server. As shown in FIG. 3, the server includes a related keyword index creation application 41, a related keyword search application 43, and the like.

The related keyword index creation application 41 is installed on the index creation server 5 (the part that takes input from the administrator terminal 3 is installed on the administrator terminal 3 if necessary) and provides functions such as word information registration 51, document registration 52, index creation 53, index distribution 54, and index creation history reference 56. The index creation 53 function in turn executes processes such as an index update target document discrimination process 71, a morphological analysis process 72, a word conversion process 73, a keyword assist word selection process 74, a relevance calculation process 75, and an index creation process 76. The functions and processes are described in detail later with reference to FIGS. 4 to 16.

“Keyword assist” means supporting the user's search by presenting keywords related to the search keyword.

The related keyword index creation application 41 also holds, temporarily or semi-permanently, data such as word attribute appearance pattern information 101, word attribute information 102, word conversion information 103, document information 104, updated document history information 105, index execution processing history information 106, update target document information 107, morphological analysis result information 108, word conversion result information 109, relevance calculation target word information 110, word relevance information 111, index creation processing history information 112, keyword assist hash information 113, keyword assist search word information 114, and keyword assist related word information 115. The data are described in detail later with reference to FIGS. 4 to 16.

The related keyword search application 43 is installed on the search server 7 (the part that responds to requests from the user terminal 11 is installed on the web server 9) and provides functions such as related keyword search 55. The functions are described in detail later with reference to FIGS. 4 to 16.

The related keyword search application 43 also holds semi-permanently the data distributed from the related keyword index creation application 41, such as the keyword assist hash information 113, the keyword assist search word information 114, and the keyword assist related word information 115. The data are described in detail later with reference to FIGS. 4 to 16.

Next, the work flow in the document search system 1 is described with reference to FIG. 4.

FIG. 4 is a diagram outlining the work flow in the document search system 1. As shown in FIG. 4, work in the document search system 1 proceeds in the order of word information definition 61, document collection/update 62, index creation 63, index distribution 64, search request reception 65, and so on.

In word information definition 61, the administrator registers, updates, and deletes the word attribute appearance pattern information 101, the word attribute information 102, the word conversion information 103, and the like, using the word information registration 51 function of the related keyword index creation application 41. Word information definition 61 is preferably performed in one batch before the system goes into operation, and the administrator should also perform it periodically after operation begins.

In document collection/update 62, the administrator, for example, registers, updates, and deletes the document information 104 and the like, using the document registration 52 function of the related keyword index creation application 41. Alternatively, a computer may collect documents automatically.

In index creation 63, the index creation server 5 creates the keyword assist hash information 113, the keyword assist search word information 114, the keyword assist related word information 115, and the like, using the index creation 53 function of the related keyword index creation application 41. Index creation 63 may be run on the administrator's instruction, or periodically as a batch process, for example once a day.

In index distribution 64, the index creation server 5 distributes the keyword assist hash information 113, the keyword assist search word information 114, the keyword assist related word information 115, and the like, using the index distribution 54 function of the related keyword index creation application 41.

In search request reception 65, the web server 9 accepts search requests from the user terminal 11, and the search server 7 performs the search for each request, using the related keyword search 55 function of the related keyword search application 43. Search request reception 65 may take place while document collection/update 62, index creation 63, or index distribution 64 is in progress. However, if document collection/update 62, index creation 63, and index distribution 64 have never been performed, the search returns zero results.
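The zero-result behaviour before any index has been distributed can be pictured with a small lookup sketch. The dictionary layout, the function name, the `limit` parameter, and the sample data are all assumptions for illustration; the actual index files (keyword assist hash information 113 and so on) are described elsewhere in the specification.

```python
def related_keywords(index, search_keyword, limit=3):
    """Return up to `limit` related keywords for `search_keyword`, highest relevance first.

    `index` is assumed to map a search word to a {related word: relevance} dict.
    """
    scores = index.get(search_keyword, {})
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:limit]]


# Before index creation and distribution have ever run, the search side holds nothing.
empty_index = {}

# After distribution (made-up relevance values, for illustration only).
distributed_index = {"curry": {"udon": 1.625, "recipe": 0.5}}

before = related_keywords(empty_index, "curry")
after = related_keywords(distributed_index, "curry")
```

Because the lookup only reads whatever index is currently held, requests can be served while a new index is being built, which is consistent with reception 65 being allowed to overlap the other steps.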

Although not shown in the figure, the administrator can also consult the index creation history while working on word information definition 61. To do so, the administrator uses the index creation history reference 56 function of the related keyword index creation application 41 to view the index creation processing history information 112, which contains history data on index creation processes executed in the past.

In the following, the functions, processes, and data shown in FIG. 3 are described in detail for each of the tasks shown in FIG. 4, with reference to FIGS. 5 to 16.

(Word information definition 61)
The functions and other items related to word information definition 61 are described with reference to FIGS. 5 to 7. This embodiment is described using the case where the search target documents are facility information as an example.

FIG. 5 is a diagram showing an example of the word attribute appearance pattern information 101. An appearance pattern is a template describing which types of words appear in a document and in what order.

The administrator defines a type of word as a word attribute, defines a combination of word attributes that appear within a certain proximity of one another in the search target documents as an appearance pattern, and defines the weight applied to each appearance pattern when calculating the relevance between words as an appearance pattern weight. The word attribute appearance pattern information 101 therefore has data items such as an appearance pattern ID, the word attributes defined for each appearance pattern, and the appearance pattern weights. Up to about 128 appearance patterns, for example, can be registered.

The number of appearance patterns affects the accuracy of the presented related keywords (here, “high accuracy” means, for example, a high probability of being able to support the search for many users), the performance of search processing, and so on, so maintenance should continue as needed after the system goes into operation. Specifically, the administrator should update the word attribute appearance pattern information 101 when the content of many documents has changed or when the number of documents has increased or decreased substantially. In this embodiment, the word attribute appearance pattern information 101 is structured so that the administrator can update it easily.

単語の種類、すなわち単語属性は、例えば、地名等に関する単語である「地名単語」、店舗の名称等に関する単語である「店名単語」、料理の名称等に関する単語である「料理単語」、食材等に関する単語である「食材単語」、駅の名称等に関する単語である「駅名単語」等である。出現パターンごとに並べる単語属性は２個以上であり、例えば、最大８個である。 The types of words, that is, the word attributes, are, for example, “place name word” (words relating to place names and the like), “store name word” (words relating to store names and the like), “cooking word” (words relating to names of dishes and the like), “food word” (words relating to ingredients and the like), and “station name word” (words relating to station names and the like). Each appearance pattern lists two or more word attributes, for example up to eight.

出現パターン重みの意義は、管理者が検索対象のドキュメント群を閲覧し（またはドキュメント作成者からの要望でも良い。）、例えば、各ドキュメントを特定する確率が高い出現パターンに対して高い値を設定する、等によって提示する関連キーワードの精度を高めることができるというものである。また、出現パターン重みは、登録した単語群の出現順序が全て一致する場合と、それ以外の場合との両方の値を設定可能である。これによって、更に、提示する関連キーワードの精度を高めることができる。 The purpose of the appearance pattern weight is to raise the accuracy of the presented related keywords: the administrator browses the group of documents to be searched (or acts on a request from a document creator) and, for example, sets a high value for an appearance pattern that has a high probability of identifying each document. In addition, the appearance pattern weight can be set separately for the case where the appearance order of the registered word group matches completely and for all other cases. This further increases the accuracy of the presented related keywords.

図５に示すように、出現パターンＩＤが「１」のデータは、単語属性の出現順序が「地名単語」、「店名単語」であり、単語群の出現順序が全て一致する場合の出現パターン重みが「１」、それ以外の場合の出現パターン重みが「０．５」である。また、出現パターンＩＤが「２」のデータは、単語属性の出現順序が「店名単語」、「料理単語」、「食材単語」であり、単語群の出現順序が全て一致する場合の出現パターン重みが「１」、それ以外の場合の出現パターン重みが「０．５」である。また、出現パターンＩＤが「３」のデータは、単語属性の出現順序が「地名単語」、「駅名単語」であり、単語群の出現順序が全て一致する場合の出現パターン重みが「１」、それ以外の場合の出現パターン重みが「０．５」である。 As shown in FIG. 5, in the data with the appearance pattern ID “1”, the appearance pattern weights when the appearance order of the word attributes is “place name word” and “store name word” and the appearance order of the word groups all match. Is “1”, and the appearance pattern weight in other cases is “0.5”. In addition, in the data with the appearance pattern ID “2”, the appearance pattern weights when the appearance order of the word attributes is “store name word”, “cooking word”, and “food word” and the appearance order of the word groups all match. Is “1”, and the appearance pattern weight in other cases is “0.5”. In addition, in the data with the appearance pattern ID “3”, the appearance order of the word attributes is “place name word” and “station name word”, and the appearance pattern weight is “1” when the appearance order of the word groups all match. In other cases, the appearance pattern weight is “0.5”.
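The rows described for FIG. 5 can be pictured as a small record type. The following is a minimal sketch, not the patent's actual data layout: the class name, field names, and the English attribute labels are assumptions introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AppearancePattern:
    """One row of the word attribute appearance pattern information 101 (hypothetical layout)."""

    pattern_id: int
    attributes: tuple[str, ...]  # word attributes in their registered order
    weight_in_order: float       # weight when the appearance order fully matches
    weight_otherwise: float      # weight in every other case

    def weight_for(self, observed: tuple[str, ...]) -> float:
        """Pick the weight for a word group whose attributes appear as `observed`."""
        if observed == self.attributes:
            return self.weight_in_order
        return self.weight_otherwise


# The three rows described for FIG. 5.
PATTERNS = [
    AppearancePattern(1, ("place name", "store name"), 1.0, 0.5),
    AppearancePattern(2, ("store name", "cooking", "food"), 1.0, 0.5),
    AppearancePattern(3, ("place name", "station name"), 1.0, 0.5),
]
```

Keeping both weights on the row mirrors the description that each pattern carries one value for a full order match and another for all other cases.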

図６は、単語属性情報１０２の一例を示す図である。管理者は、検索対象のドキュメント群を閲覧し（またはドキュメント作成者からの要望でも良い。）、検索キーワード、または関連キーワードとして使用する単語を決定する。そして、管理者は、検索キーワード、または関連キーワードとして使用する単語の単語属性を定義する。尚、基本的には、ここで定義した単語のみが後述する処理で用いられる。 FIG. 6 is a diagram illustrating an example of the word attribute information 102. The administrator browses a group of documents to be searched (or may be a request from the document creator) and determines a word to be used as a search keyword or a related keyword. Then, the administrator defines word attributes of words used as search keywords or related keywords. Basically, only the words defined here are used in the processing described later.

単語属性情報１０２は、単語ＩＤ、単語、単語属性等のデータ項目を有する。図６に示すように、単語ＩＤが「１」のデータは、単語が「北海道」、単語属性が「地名単語」である。また、単語ＩＤが「４」のデータは、単語が「マクドナルド（登録商標）」、単語属性が「店名単語」である。 The word attribute information 102 includes data items such as a word ID, a word, and a word attribute. As shown in FIG. 6, the data with the word ID “1” has the word “Hokkaido” and the word attribute “place name word”. The data with the word ID “4” has the word “McDonald's (registered trademark)” and the word attribute “store name word”.

図７は、単語変換情報１０３の一例を示す図である。単語は、一般に表記の揺れ、略語等によって同じ意味の文字列が複数存在する。これら同じ意味の文字列を一つの単語として扱うために、管理者は、単語変換情報１０３を登録することが望ましい。 FIG. 7 is a diagram illustrating an example of the word conversion information 103. In general, a word has several character strings with the same meaning because of spelling variations, abbreviations, and the like. To treat these character strings with the same meaning as a single word, the administrator desirably registers the word conversion information 103.

単語変換情報１０３は、変換前単語、変換後単語等のデータ項目を有する。図７に示すように、変換前単語が「マクド（マクドナルド（登録商標）の略語の一つ）」のデータは、変換後単語が「マクドナルド（登録商標）」である。 The word conversion information 103 includes data items such as a pre-conversion word and a post-conversion word. As shown in FIG. 7, the data whose pre-conversion word is “Makudo” (one of the abbreviations of McDonald's (registered trademark)) has the post-conversion word “McDonald's (registered trademark)”.

（ドキュメントの収集・更新６２） (Document collection / update 62)
次に、図８を参照しながら、ドキュメントの収集・更新６２の作業に関する機能等について説明する。 Next, functions and the like related to the work of document collection / update 62 will be described with reference to FIG. 8.

図８は、ドキュメント情報１０４のデータ項目の一例を示す図である。管理者は、サーバが最終的に利用者に提示するドキュメント（厳密に言うと、サーバはドキュメントの保管場所を提示する。）に係るドキュメント情報１０４の登録・更新・削除作業を行う。また、コンピュータがドキュメントの自動収集作業を行い、ドキュメント情報１０４の登録・更新・削除作業を行っても良い。 FIG. 8 is a diagram illustrating an example of data items of the document information 104. The administrator performs registration, update, and deletion of the document information 104 related to the document that the server finally presents to the user (more precisely, the server presents the document storage location). In addition, the computer may perform an automatic document collection operation, and may register, update, or delete the document information 104.

図８に示すように、ドキュメント情報１０４は、ドキュメントＩＤ、ＵＲＩ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＩｄｅｎｔｉｆｉｅｒ）、サイトＩＤ、タイトル、カテゴリ、キーワード、本文、最終更新日、ダウンロード時間、状態等のデータ項目を有する。 As illustrated in FIG. 8, the document information 104 includes data items such as a document ID, a URI (Uniform Resource Identifier), a site ID, a title, a category, a keyword, a text, a last update date, a download time, and a status.

ドキュメントＩＤは、ドキュメントのユニークなＩＤ番号である。ＵＲＩは、ドキュメントのＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）である。サイトＩＤは、ドキュメントが存在するサイトのサイト名である。タイトルは、ドキュメントのタイトルである。カテゴリは、ドキュメントのカテゴリ名である。キーワードは、ドキュメントに付加するキーワードである。本文は、ドキュメントの本文である。最終更新日は、ドキュメントの最終更新日である。ダウンロード時間は、ドキュメントをダウンロードした時間である。状態は、新規または更新／変更なし／削除を表す数値である。 The document ID is a unique ID number of the document. The URI is a document URL (Uniform Resource Locator). The site ID is the site name of the site where the document exists. The title is the title of the document. The category is a category name of the document. The keyword is a keyword added to the document. The text is the text of the document. The last update date is the last update date of the document. The download time is the time when the document is downloaded. The status is a numerical value indicating new or updated / no change / deleted.

本発明の実施の形態で利用するデータ項目は、ドキュメントＩＤ、タイトル、本文、状態等である。 Data items used in the embodiment of the present invention are a document ID, a title, a text, a state, and the like.

（インデックスの作成６３およびインデックスの配布６４） (Index creation 63 and index distribution 64)
次に、図９から図１５を参照しながら、インデックスの作成６３およびインデックスの配布６４の作業に関する機能等について説明する。 Next, functions and the like related to the work of index creation 63 and index distribution 64 will be described with reference to FIGS. 9 to 15.

図９は、インデックス更新対象ドキュメント判別処理７１におけるデータの流れを示す図である。図９に示すように、インデックス更新対象ドキュメント判別処理７１では、サーバがドキュメント情報１０４を基に、更新ドキュメント履歴情報１０５、インデックス実行処理履歴情報１０６を参照し、更新対象ドキュメント情報１０７を作成する。ここで、更新対象ドキュメント情報１０７は、次の処理（形態素解析処理７２）が終了すれば不要であることから、一時的に保持していれば良い。 FIG. 9 is a diagram illustrating a data flow in the index update target document determination processing 71. As shown in FIG. 9, in the index update target document determination processing 71, the server creates update target document information 107 by referring to the update document history information 105 and index execution processing history information 106 based on the document information 104. Here, the update target document information 107 is not necessary when the next process (morpheme analysis process 72) is completed, and may be temporarily held.

例えば、管理者がインデックスの作成を指示する場合、全ドキュメントからインデックスを新規に作成するか（新規作成モード）、新規登録／更新／削除された差分のみを対象に既存のインデックスを更新するか（更新モード）を選択するようにしても良い。また、バッチ処理の場合、例えば、特定曜日、または月末日のみ新規作成モードとし、それ以外の日は更新モードとするようにしても良い。 For example, when an administrator instructs creation of an index, whether to create a new index from all documents (new creation mode), or to update an existing index only for newly registered / updated / deleted differences ( Update mode) may be selected. In the case of batch processing, for example, the new creation mode may be set only on a specific day of the week or the last day of the month, and the update mode may be set on other days.

サーバは、選択されたモード、更新ドキュメント履歴情報１０５、インデックス実行処理履歴情報１０６を参照し、ドキュメント情報１０４からインデックス作成の対象となるドキュメントを抽出し、更新対象ドキュメント情報１０７を作成する。 The server refers to the selected mode, the updated document history information 105, and the index execution processing history information 106, extracts the document to be indexed from the document information 104, and creates the update target document information 107.
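The mode-dependent extraction can be sketched as a simple filter over the status field of the document information 104. This is only an illustration under stated assumptions: the numeric status codes, the mode names "create" and "update", and the dictionary layout are hypothetical, and the sketch ignores the updated document history information 105 and the index execution processing history information 106 that the server also consults.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical codes; the source only says the status is a numeric value.
    NEW_OR_UPDATED = 1
    UNCHANGED = 2
    DELETED = 3


def select_update_targets(documents, mode):
    """Return the documents that the next indexing run should process.

    documents: iterable of dicts with at least "id" and "status" keys.
    mode: "create" rebuilds from all live documents, "update" takes only the diff.
    """
    if mode == "create":
        return [d for d in documents if d["status"] != Status.DELETED]
    if mode == "update":
        return [d for d in documents if d["status"] != Status.UNCHANGED]
    raise ValueError(f"unknown mode: {mode!r}")
```

In update mode the deleted documents stay in the result on purpose, since their contribution has to be removed from the existing index.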

図１０は、形態素解析処理７２におけるデータの流れを示す図である。図１０に示すように、形態素解析処理７２では、サーバが更新対象ドキュメント情報１０７を基に形態素解析結果情報１０８を作成する。ここで、形態素解析結果情報１０８は、次の処理（単語変換処理７３）が終了すれば不要であることから、一時的に保持していれば良い。 FIG. 10 is a diagram showing a data flow in the morphological analysis process 72. As shown in FIG. 10, in the morphological analysis process 72, the server creates morphological analysis result information 108 based on the update target document information 107. Here, the morphological analysis result information 108 is not necessary when the next process (word conversion process 73) is completed, and may be temporarily held.

サーバは、更新対象ドキュメント情報１０７から、ドキュメントのタイトルと本文を形態素レベルに分解することで、形態素解析結果情報１０８を作成する。図１０に示すように、サーバは、例えば、「恵比寿駅から３分のところにあるマクドナルド（登録商標）・・・」という文字列に対し、「恵比寿／駅／から／３／分／の／ところ／に／ある／マクドナルド（登録商標）・・・」と形態素レベルに分解する。 The server creates the morphological analysis result information 108 by decomposing the title and body of each document in the update target document information 107 to the morpheme level. As shown in FIG. 10, for example, the server decomposes the character string “McDonald's (registered trademark), three minutes from Ebisu Station ...” into the morphemes “Ebisu / station / from / 3 / minutes / of / place / at / located / McDonald's (registered trademark) ...”.

図１１は、単語変換処理７３におけるデータの流れを示す図である。図１１に示すように、単語変換処理７３では、サーバが形態素解析結果情報１０８を基に単語変換結果情報１０９を作成する。ここで、単語変換結果情報１０９は、次の処理（キーワードアシスト利用単語選別処理７４）が終了すれば不要であることから、一時的に保持していれば良い。 FIG. 11 is a diagram illustrating a data flow in the word conversion process 73. As shown in FIG. 11, in the word conversion process 73, the server creates word conversion result information 109 based on the morphological analysis result information 108. Here, the word conversion result information 109 is not necessary when the next process (keyword assist using word selection process 74) is completed, and may be temporarily held.

サーバは、単語の表記の揺れ等を統一するため、単語変換情報１０３を参照し、形態素解析結果情報１０８を変換して、単語変換結果情報１０９を作成する。図１１に示すように、「マック（マクドナルド（登録商標）の略語の一つ）」、「マクド（マクドナルド（登録商標）の略語の一つ）」、「マクドナルド（登録商標）」、「ＭｃＤｏｎａｌｄ’ｓ（登録商標）」は、全て同じ意味、すなわち「マクドナルド（登録商標）」という店舗の名称であることから、「マクドナルド（登録商標）」に変換する。 To unify spelling variations and the like, the server refers to the word conversion information 103, converts the morphological analysis result information 108, and creates the word conversion result information 109. As shown in FIG. 11, “Makku” (one of the abbreviations of McDonald's (registered trademark)), “Makudo” (another such abbreviation), “マクドナルド (registered trademark)” (the katakana spelling), and “McDonald's (registered trademark)” (the Latin-alphabet spelling) all have the same meaning, namely the store name McDonald's (registered trademark), so they are all converted to the single form “マクドナルド (registered trademark)”.
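The word conversion step amounts to a table lookup applied to each morpheme. The sketch below assumes the word conversion information 103 can be held as a plain dictionary from pre-conversion word to post-conversion word; the function name and the ASCII spelling "McDonald's" are illustrative assumptions.

```python
# Pre-conversion word -> post-conversion word, following the FIG. 11 example.
WORD_CONVERSION = {
    "マック": "マクドナルド",
    "マクド": "マクドナルド",
    "McDonald's": "マクドナルド",
}


def convert_words(morphemes, conversion=WORD_CONVERSION):
    """Replace each morpheme by its registered canonical form, if it has one."""
    return [conversion.get(m, m) for m in morphemes]


print(convert_words(["恵比寿", "駅", "マクド"]))
```

Morphemes without an entry pass through unchanged, so only registered variants are rewritten.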

図１２は、キーワードアシスト利用単語選別処理７４におけるデータの流れを示す図である。図１２に示す「特定の形態素解析エンジンによる品詞の分類の例１１６」は、キーワードアシストに利用する単語の品詞の一例であり、品詞の分類はこれに限定されるものではない。図１２に示すように、キーワードアシスト利用単語選別処理７４では、サーバが単語変換結果情報１０９を基に、単語属性情報１０２、特定の形態素解析エンジンによる品詞の分類の例１１６によって例示されるキーワードアシストに利用する単語の品詞の分類情報を参照し、関連度計算対象単語情報１１０を作成する。 FIG. 12 is a diagram illustrating a data flow in the keyword assist using word selection process 74. The “example 116 of part-of-speech classification by a specific morphological analysis engine” shown in FIG. 12 is one example of the parts of speech of words used for keyword assist, and the part-of-speech classification is not limited to it. As shown in FIG. 12, in the keyword assist using word selection process 74, the server starts from the word conversion result information 109, refers to the word attribute information 102 and to the part-of-speech classification information for words used for keyword assist (exemplified by the example 116), and creates the relevance calculation target word information 110.

サーバは、キーワードアシストを高速に行う為、キーワードアシストに利用する単語を選別する。具体的には、サーバは、キーワードアシストに利用する単語の品詞１１６に示す品詞の単語、かつ単語属性情報１０２に登録されている単語を条件として、単語変換結果情報１０９から単語を抽出し、関連度計算対象単語情報１１０を作成する。 To perform keyword assist at high speed, the server selects the words used for keyword assist. Specifically, the server extracts from the word conversion result information 109 the words that both have a part of speech listed in the parts of speech 116 used for keyword assist and are registered in the word attribute information 102, and creates the relevance calculation target word information 110.
尚、前述のキーワードアシストに利用する単語の品詞は、検索対象とするドキュメントの内容、用途等によっては名詞以外を含めても良い。 Note that the parts of speech of the words used for keyword assist may include parts of speech other than nouns, depending on the content, purpose, and so on of the documents to be searched.
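The two selection conditions (allowed part of speech, and registration in the word attribute information 102) can be sketched as a single filter. The tuple layout, the part-of-speech labels, and the rule that the appearance position counts every word of the document from 1 are assumptions made for this illustration.

```python
def select_assist_words(converted, word_attributes, allowed_pos=frozenset({"noun"})):
    """Keep words that have an allowed part of speech and a registered word attribute.

    converted: list of (word, part_of_speech) pairs in document order.
    word_attributes: dict mapping a registered word to its word attribute.
    Returns (word, attribute, appearance_position) tuples.
    """
    selected = []
    for position, (word, pos) in enumerate(converted, start=1):
        if pos in allowed_pos and word in word_attributes:
            selected.append((word, word_attributes[word], position))
    return selected


sentence = [("恵比寿", "noun"), ("駅", "noun"), ("から", "particle"), ("マクドナルド", "noun")]
registered = {"恵比寿": "place name", "マクドナルド": "store name"}
print(select_assist_words(sentence, registered))
```

Widening `allowed_pos` corresponds to the remark that parts of speech other than nouns may be included for some document collections.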

図１３は、関連度算出処理７５の処理の流れを示す図である。また、図１４は、関連度算出処理７５におけるデータの流れを示す図である。以下では、図１３を中心として、図１４を参照しながら、関連度算出処理７５について説明する。 FIG. 13 is a diagram illustrating the processing flow of the relevance calculation process 75. FIG. 14 is a diagram illustrating the data flow in the relevance calculation process 75. The relevance calculation process 75 is described below mainly with reference to FIG. 13, with FIG. 14 used as a supplement.

関連度算出処理７５では、サーバが、検索対象ドキュメントの中で出現パターンと一致する単語群を検索し、検索した単語群の距離と出現パターン重みとによって単語群関連度を算出する。また、サーバは、算出した単語群関連度を単語関連度情報１１１として保持する。 In the degree-of-association calculation process 75, the server searches a word group that matches the appearance pattern in the search target document, and calculates the degree of word group relevance based on the distance of the searched word group and the appearance pattern weight. In addition, the server holds the calculated word group association degree as word association degree information 111.

図１３に示すように、サーバの制御部２１は、ドキュメントＩＤの添字ｋ（ｋは自然数）に１を代入し（ステップ１００１）、単語抽出回数ｎ（ｎは０または自然数）に０を代入し（ステップ１００２）、出現パターンの添字ｍ（ｍは自然数）に１を代入し（ステップ１００３）、初期化処理を行う。ここで、ドキュメントＩＤの添字ｋとは、図１４に示す関連度計算対象単語情報１１０に係る表のドキュメントＩＤの欄の番号である。図１４では、一例として、１、２、・・・、８６２が図示されている。尚、同じ表の出現位置の欄の値は、その単語がドキュメントの中で何単語目に出現したかを示すものである。また、単語抽出回数ｎとは、図１４に示す単語抽出８１の矢印を１回目とし、単語抽出８１を行う回数を示す値である。また、出現パターンの添字ｍとは、図１４に示す単語属性出現パターン情報１０１に係る表の先頭の番号である。図１４では、一例として、１、２、３が図示されている。 As shown in FIG. 13, the control unit 21 of the server performs initialization by substituting 1 for the subscript k of the document ID (k is a natural number) (step 1001), 0 for the word extraction count n (n is 0 or a natural number) (step 1002), and 1 for the subscript m of the appearance pattern (m is a natural number) (step 1003). Here, the subscript k of the document ID is the number in the document ID column of the table for the relevance calculation target word information 110 shown in FIG. 14. In FIG. 14, 1, 2, ..., 862 are illustrated as an example. Note that the value in the appearance position column of the same table indicates at which word position in the document the word appears. The word extraction count n is a value indicating the number of times the word extraction 81 has been performed, with the arrow of the word extraction 81 shown in FIG. 14 counted as the first time. The subscript m of the appearance pattern is the number in the first column of the table for the word attribute appearance pattern information 101 shown in FIG. 14. In FIG. 14, 1, 2, and 3 are illustrated as an example.

次に、サーバの制御部２１は、ｋ番目のドキュメントの中で、関連度計算対象単語情報１１０に登録されている（ｎ×Ｐ−ｎ）番目（Ｐは自然数）の単語を先頭として、Ｐ個の単語を抽出する（ステップ１００４）。ここで、抽出個数Ｐは、例えば、１００個である。また、抽出個数Ｐを大きい値にすれば、ほとんどのドキュメントに対して、ドキュメント全体の単語を抽出して後続の処理を実行することが可能である。図１４では、単語抽出８１の矢印が示す表によって、ステップ１００４で抽出した単語を表わしている。 Next, in the k-th document, the control unit 21 of the server extracts P words, starting from the (n × P − n)-th word (P is a natural number) registered in the relevance calculation target word information 110 (step 1004). Here, the extraction count P is, for example, 100. If the extraction count P is set to a large value, the words of the entire document can be extracted and passed to the subsequent processing for most documents. In FIG. 14, the table indicated by the arrow of the word extraction 81 represents the words extracted in step 1004.

次に、サーバの制御部２１は、ｍ番目の出現パターンに一致する全ての単語群に対して関連度を算出し（ステップ１００５）、単語関連度情報１１１を更新する（ステップ１００６）。具体的には、ステップ１００５およびステップ１００６を繰り返すことで、サーバの制御部２１は、出現パターン重みと単語群の距離の逆数との積を、事前に登録した全ての単語（但し、検索対象のドキュメントに出現する単語に限る。）および全ての出現パターンに対して合算し、単語群関連度とする。尚、図５の説明にて前述したように、出現パターン重みは、登録した単語群の出現順序が全て一致する場合と、それ以外の場合との両方の値を設定可能である。 Next, the control unit 21 of the server calculates relevance levels for all word groups that match the mth appearance pattern (step 1005), and updates the word relevance information 111 (step 1006). Specifically, by repeating Step 1005 and Step 1006, the server control unit 21 calculates the product of the appearance pattern weight and the reciprocal of the distance of the word group for all the words registered in advance (however, the search target (Only words that appear in the document.) And all occurrence patterns are added together to obtain the word group relevance. As described above with reference to FIG. 5, the appearance pattern weights can be set to values in both cases where the appearance order of the registered word groups is identical and in other cases.

図１４に示す関連度算出８２の矢印では、ステップ１００５およびステップ１００６を二つのデータ（後述するデータ（１）、データ（２）に相当）に対してだけ実行したときの例を表わしている。具体的には、サーバの制御部２１は、単語属性の出現パターン情報１０１から、出現パターンの添字ｍが「１」のデータを取得する。次に、サーバの制御部２１は、単語属性が「地名単語」と「料理単語」である単語群を抽出する。図１４に示す例では、単語が「東京」（出現位置が「１０」）と「カレー」（出現位置が「８７」）（データ（１））、および単語が「カレー」（出現位置が「８７」）と「東京」（出現位置が「９２８」）（データ（２））の二つのデータを抽出している。次に、サーバの制御部２１は、出現パターン重みと距離の逆数との積を算出する。まず、データ（１）の場合、登録した単語群に係る単語属性の出現順序（最初に「地名単語」、次に「料理単語」）が全て一致することから、出現パターン重みは「１」となる。また、距離は、例えば、単語が「カレー」の出現位置「８７」から、単語が「東京」の出現位置「１０」を引いた値とする。一方、データ（２）の場合、登録した単語群に係る単語属性の出現順序と一致しないことから、出現パターン重みは「０．５」となる。また、距離は、例えば、単語が「東京」の出現位置「９２８」から、単語が「カレー」の出現位置「８７」を引いた値とする。これを式で示すと、１×（１／（８７−１０））＋０．５×（１／（９２８−８７））＝０．０１３５８・・・となる。従って、図１４に示す単語関連度情報１１１に係る表において、単語１が「東京」、単語２が「カレー」のレコードは、関連度が「０．０１３６」となっている。 The arrow of the relevance calculation 82 shown in FIG. 14 represents an example in which step 1005 and step 1006 are executed for only two data items (corresponding to data (1) and data (2) described below). Specifically, the control unit 21 of the server acquires, from the word attribute appearance pattern information 101, the data whose appearance pattern subscript m is “1”. Next, the control unit 21 of the server extracts word groups whose word attributes are “place name word” and “cooking word”. In the example shown in FIG. 14, two data items are extracted: the words “Tokyo” (appearance position “10”) and “curry” (appearance position “87”) (data (1)), and the words “curry” (appearance position “87”) and “Tokyo” (appearance position “928”) (data (2)). Next, the control unit 21 of the server calculates the product of the appearance pattern weight and the reciprocal of the distance. First, in the case of data (1), the appearance order of the word attributes of the registered word group (first “place name word”, then “cooking word”) matches completely, so the appearance pattern weight is “1”. The distance is, for example, the value obtained by subtracting the appearance position “10” of the word “Tokyo” from the appearance position “87” of the word “curry”. In the case of data (2), on the other hand, the order does not match the appearance order of the word attributes of the registered word group, so the appearance pattern weight is “0.5”. The distance is, for example, the value obtained by subtracting the appearance position “87” of the word “curry” from the appearance position “928” of the word “Tokyo”. Expressed as a formula, 1 × (1 / (87 − 10)) + 0.5 × (1 / (928 − 87)) = 0.01358.... Therefore, in the table for the word association degree information 111 shown in FIG. 14, the record in which word 1 is “Tokyo” and word 2 is “curry” has the relevance “0.0136”.
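The arithmetic of the FIG. 14 example can be checked directly. The helper below is a hypothetical name introduced for this check; it computes the appearance pattern weight multiplied by the reciprocal of the positional distance, and sums the contributions of data (1) and data (2).

```python
def pair_contribution(weight, first_position, second_position):
    """Appearance pattern weight times the reciprocal of the positional distance."""
    return weight * (1.0 / (second_position - first_position))


data_1 = pair_contribution(1.0, 10, 87)    # "Tokyo" at 10, "curry" at 87, order matches
data_2 = pair_contribution(0.5, 87, 928)   # "curry" at 87, "Tokyo" at 928, order differs
relevance = data_1 + data_2

print(round(relevance, 4))  # 0.0136
```

The first term dominates because the in-order pair is both closer together and more heavily weighted.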

尚、図１４に示す例では、距離の算出は、出現位置の差で求めていたが、他の算出方法でも良い。例えば、各ドキュメントに含まれる総単語数を用いて正規化するようにしても良い。 In the example shown in FIG. 14, the distance is calculated from the difference in appearance position, but other calculation methods may be used. For example, normalization may be performed using the total number of words included in each document.

図１３の説明に戻る。次に、サーバの制御部２１は、出現パターンの添字ｍにｍ＋１を代入し（ステップ１００７）、ｍ＞Ｍ（Ｍは出現パターンの添字の最終番号）を満たすかどうか確認する（ステップ１００８）。 Returning to the description of FIG. 13, the control unit 21 of the server next substitutes m + 1 for the subscript m of the appearance pattern (step 1007) and checks whether m > M (M is the final number of the appearance pattern subscript) is satisfied (step 1008).
条件を満たさない場合（ステップ１００８のＮｏ）、サーバの制御部２１は、ステップ１００５から繰り返す。 If the condition is not satisfied (No in step 1008), the control unit 21 of the server repeats from step 1005.
条件を満たす場合（ステップ１００８のＹｅｓ）、サーバの制御部２１は、ステップ１００９に進む。 If the condition is satisfied (Yes in step 1008), the control unit 21 of the server proceeds to step 1009.

次に、サーバの制御部２１は、単語抽出回数ｎにｎ＋１を代入し（ステップ１００９）、ステップ１００４でＰ個の単語を抽出しているかどうか確認する（ステップ１０１０）。 Next, the control unit 21 of the server substitutes n + 1 for the word extraction count n (step 1009) and checks whether P words were extracted in step 1004 (step 1010).
条件を満たす場合（ステップ１０１０のＹｅｓ）、サーバの制御部２１は、ステップ１００３から繰り返す。 If the condition is satisfied (Yes in step 1010), the control unit 21 of the server repeats from step 1003.
条件を満たさない場合（ステップ１０１０のＮｏ）、サーバの制御部２１は、ステップ１０１１に進む。 If the condition is not satisfied (No in step 1010), the control unit 21 of the server proceeds to step 1011.

次に、サーバの制御部２１は、ドキュメントＩＤの添字ｋにｋ＋１を代入し（ステップ１０１１）、ｋ＞Ｋ（ＫはドキュメントＩＤの添字の最終番号）を満たすかどうか確認する（ステップ１０１２）。 Next, the control unit 21 of the server substitutes k + 1 for the subscript k of the document ID (step 1011) and checks whether k > K (K is the final number of the document ID subscript) is satisfied (step 1012).
条件を満たさない場合（ステップ１０１２のＮｏ）、サーバの制御部２１は、ステップ１００２から繰り返す。 If the condition is not satisfied (No in step 1012), the control unit 21 of the server repeats from step 1002.
条件を満たす場合（ステップ１０１２のＹｅｓ）、サーバの制御部２１は、処理を終了する。 If the condition is satisfied (Yes in step 1012), the control unit 21 of the server ends the process.
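The nested loops of steps 1001 to 1012 can be sketched as follows. This is a simplified illustration rather than the patented procedure itself: it handles only two-attribute appearance patterns, treats the (n × P − n)-th word as a 0-based index, and uses the positional difference as the distance. The function name, argument layout, and the guard on the extraction count are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def word_relevance(documents, patterns, extract_count=100):
    """Accumulate weight / distance for every word pair matching a pattern.

    documents: {doc_id: [(word, attribute, position), ...]} sorted by position.
    patterns: [((attr_a, attr_b), weight_in_order, weight_otherwise), ...]
    """
    if extract_count < 2:
        raise ValueError("extract_count must be at least 2")
    scores = defaultdict(float)
    for words in documents.values():                      # k loop (steps 1001, 1011, 1012)
        n = 0                                             # step 1002
        while True:
            start = n * extract_count - n                 # (n x P - n)-th word, 0-based
            chunk = words[start:start + extract_count]    # step 1004
            for attrs, w_order, w_other in patterns:      # m loop (steps 1003, 1007, 1008)
                for first, second in combinations(chunk, 2):  # first precedes second
                    observed = (first[1], second[1])
                    if observed == attrs:
                        key, weight = (first[0], second[0]), w_order
                    elif observed == attrs[::-1]:
                        key, weight = (second[0], first[0]), w_other
                    else:
                        continue
                    scores[key] += weight / (second[2] - first[2])  # steps 1005-1006
            n += 1                                        # step 1009
            if len(chunk) < extract_count:                # step 1010
                break
    return dict(scores)


docs = {1: [("東京", "place name", 10), ("カレー", "cooking", 87), ("東京", "place name", 928)]}
pats = [(("place name", "cooking"), 1.0, 0.5)]
print(word_relevance(docs, pats))
```

Because consecutive windows overlap by a single word, no pair of words falls into two windows, so no pair is counted twice in this sketch.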

図１５は、インデックス作成処理７６におけるデータの流れを示す図である。図１５に示すように、インデックス作成処理７６では、サーバが単語関連度情報１１１を基に、キーワードアシストハッシュ情報１１３、キーワードアシスト検索単語情報１１４、キーワードアシスト関連単語情報１１５を作成する。ここで、キーワードアシストハッシュ情報１１３、キーワードアシスト検索単語情報１１４、キーワードアシスト関連単語情報１１５は、検索処理の高速化の為、図１５に示すインデックスデータ１１７に相当するデータを分割して保持している。 FIG. 15 is a diagram illustrating the data flow in the index creation process 76. As shown in FIG. 15, in the index creation process 76, the server creates the keyword assist hash information 113, the keyword assist search word information 114, and the keyword assist related word information 115 based on the word association degree information 111. To speed up search processing, the data corresponding to the index data 117 shown in FIG. 15 is split up and held as these three pieces of information.

図４に示すインデックスの配布６４の作業に関する機能等については、特に図示はしていない。インデックスの配布６４は、関連キーワード検索アプリケーション４３をインストールする装置に対して、キーワードアシストハッシュ情報１１３、キーワードアシスト検索単語情報１１４、キーワードアシスト関連単語情報１１５を配布する作業である。 The functions related to the work of index distribution 64 shown in FIG. 4 are not particularly shown. The index distribution 64 is an operation of distributing the keyword assist hash information 113, the keyword assist search word information 114, and the keyword assist related word information 115 to an apparatus that installs the related keyword search application 43.

（検索要求の受付６５） (Search request reception 65)
次に、図１６を参照しながら、検索要求の受付６５の作業に関する機能等について説明する。 Next, functions and the like related to the work of the search request reception 65 will be described with reference to FIG. 16.

図１６は、検索要求の受付６５に関する処理の流れを示す図である。検索要求の受付６５に関する処理では、サーバは、利用者が検索キーワードとして入力した検索単語を受信して関連単語を検索する。 FIG. 16 is a diagram showing a flow of processing related to the search request reception 65. In the processing related to the search request reception 65, the server receives a search word input as a search keyword by the user and searches for a related word.

図１６に示すように、サーバの制御部２１は、検索単語のトリミング、文字変換を行う（ステップ２００１）。検索単語のトリミング、文字変換とは、（１）検索単語の両端の空白を削除、（２）全角英数字から半角英数字への変換、（３）半角カタカナから全角カタカナへの変換、（４）大文字英字から小文字英字への変換、等を行うことである。 As shown in FIG. 16, the control unit 21 of the server trims the search word and converts its characters (step 2001). Trimming and character conversion of the search word mean (1) deleting whitespace at both ends of the search word, (2) converting full-width alphanumeric characters to half-width alphanumeric characters, (3) converting half-width katakana to full-width katakana, (4) converting uppercase letters to lowercase letters, and so on.
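In Python, the four conversions of step 2001 can be approximated with Unicode NFKC normalization followed by trimming and lowercasing. This is a sketch under one notable assumption: NFKC is broader than the four listed conversions (it also folds other compatibility characters), so it stands in for, rather than reproduces, whatever conversion table the system actually uses. The function name is hypothetical.

```python
import unicodedata


def normalize_search_word(text):
    """Trim and normalize a search word before the index lookup.

    NFKC maps full-width alphanumerics to half-width and half-width katakana
    to full-width; strip() removes surrounding whitespace (including an
    ideographic space already folded by NFKC); lower() folds uppercase letters.
    """
    return unicodedata.normalize("NFKC", text).strip().lower()


print(normalize_search_word("　ＣＵＲＲＹ１２３ "))
print(normalize_search_word("ｶﾚｰ"))
```

Normalizing before trimming matters here, because the full-width space only becomes an ordinary space after NFKC has been applied.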

次に、サーバの制御部２１は、既定のハッシュ関数を利用してキーワードアシストハッシュ情報１１３から検索単語ファイル位置を検索し（ステップ２００２）、対象のデータが存在するかどうか確認する（ステップ２００３）。 Next, the control unit 21 of the server uses a predetermined hash function to look up the search word file position in the keyword assist hash information 113 (step 2002) and checks whether the target data exists (step 2003).
対象のデータが存在する場合（ステップ２００３のＹｅｓ）、サーバの制御部２１は、ステップ２００４に進む。 If the target data exists (Yes in step 2003), the control unit 21 of the server proceeds to step 2004.
対象のデータが存在しない場合、（ステップ２００３のＮｏ）、サーバの制御部２１は、処理を終了する。 If the target data does not exist (No in step 2003), the control unit 21 of the server ends the process.

次に、サーバの制御部２１は、キーワードアシスト検索単語情報１１４から関連単語ファイル位置を検索し（ステップ２００４）、対象のデータが存在するかどうか確認する（ステップ２００５）。 Next, the control unit 21 of the server looks up the related word file position in the keyword assist search word information 114 (step 2004) and checks whether the target data exists (step 2005).
対象のデータが存在する場合（ステップ２００５のＹｅｓ）、サーバの制御部２１は、ステップ２００６に進む。 If the target data exists (Yes in step 2005), the control unit 21 of the server proceeds to step 2006.
対象のデータが存在しない場合、（ステップ２００５のＮｏ）、サーバの制御部２１は、処理を終了する。 If the target data does not exist (No in step 2005), the control unit 21 of the server ends the process.

次に、サーバの制御部２１は、キーワードアシスト関連単語情報１１５から関連単語の一覧を取得する（ステップ２００６）。ここで、関連単語とともに、関連単語に紐付く関連度、単語属性も合わせて取得することが望ましい。取得した関連単語、関連度、単語属性は、サーバの制御部２１が関連キーワードとして利用者の端末に送信する。送信する関連キーワードは、取得した関連単語等の全てでも良いし、関連度が閾値以上のものだけでも良いし、関連度が上位のものだけでも良い。 Next, the control unit 21 of the server obtains a list of related words from the keyword assist related word information 115 (step 2006). Here, it is desirable to also obtain, together with each related word, the degree of relevance and the word attribute associated with it. The control unit 21 of the server transmits the acquired related words, degrees of relevance, and word attributes to the user's terminal as related keywords. The related keywords to be transmitted may be all of the acquired related words, only those whose relevance is at or above a threshold, or only those with the highest relevance.
尚、サーバの制御部２１は、利用者が入力した検索キーワードによる検索を別途行い、検索キーワードによる検索結果とともに、関連キーワードを提示するようにしても良い。これによって、利用者は、自ら入力した検索キーワードによる検索結果と、提示された関連キーワードとを比較して、再び検索要求を行うかどうか判断することができる。また、サーバの制御部２１は、単語属性を取得する場合、単語属性も含めて利用者に提示するようにしても良い。これによって、利用者は、入力された検索キーワードに対して、広がりのある関連キーワード情報を自動的に（利用者自らが思考することなく）入手することができる。 Note that the control unit 21 of the server may separately perform a search using the search keyword entered by the user and present the related keywords together with the results of that search. The user can then compare the results for the search keyword he or she entered with the presented related keywords and decide whether to make another search request. When the control unit 21 of the server acquires word attributes, it may also present them to the user. As a result, the user can obtain broad related keyword information for the entered search keyword automatically (without having to think of it himself or herself).
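The three-stage lookup of steps 2002 to 2006 can be sketched with in-memory structures standing in for the keyword assist hash information 113, search word information 114, and related word information 115. Everything concrete here is an assumption for illustration: the source does not name the hash function (CRC32 is used only as a stand-in), the file positions are modelled as list indices, and the function names are hypothetical.

```python
import zlib

BUCKETS = 64  # hypothetical table size


def bucket_of(word):
    """Stand-in for the predetermined hash function."""
    return zlib.crc32(word.encode("utf-8")) % BUCKETS


def build_index(relations):
    """relations: {search_word: [(related_word, attribute, relevance), ...]}"""
    hash_info = {}      # bucket -> positions in search_words   (cf. 113)
    search_words = []   # (word, position in related_lists)     (cf. 114)
    related_lists = []  # related words sorted by relevance     (cf. 115)
    for word, related in relations.items():
        related_lists.append(sorted(related, key=lambda r: r[2], reverse=True))
        search_words.append((word, len(related_lists) - 1))
        hash_info.setdefault(bucket_of(word), []).append(len(search_words) - 1)
    return hash_info, search_words, related_lists


def lookup(word, index, threshold=0.0, top_n=None):
    """Return related words for `word`, or [] when any stage finds no data."""
    hash_info, search_words, related_lists = index
    for position in hash_info.get(bucket_of(word), []):   # steps 2002-2003
        candidate, related_position = search_words[position]  # steps 2004-2005
        if candidate == word:
            hits = [r for r in related_lists[related_position] if r[2] >= threshold]
            return hits[:top_n]                            # step 2006
    return []
```

The `threshold` and `top_n` parameters correspond to the options of sending only related words at or above a relevance threshold, or only the highest-ranked ones.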

また、更に、サーバの制御部２１は、端末から関連キーワードを入力することなく、端末が関連キーワードによって再び検索要求を送信できるように提示することが望ましい。具体的には、（ウェブサーバ９としての）サーバの制御部２１は、例えば、「関連キーワードの表示部分は利用者が端末の入力部２９で選択可能であり、かつ利用者が入力部２９を介して関連キーワードの表示部分を選択すると、端末が関連キーワードによる検索要求を送信する」ＨＴＴＰ（ＨｙｐｅｒＴｅｘｔＴｒａｎｓｆｅｒＰｒｏｔｏｃｏｌ）レスポンスを端末に送信すれば良い。これによって、利用者は検索作業をスムーズに行うことができる。 Furthermore, it is desirable that the control unit 21 of the server present the related keywords in such a way that the terminal can send a search request again using a related keyword without the user having to type the related keyword at the terminal. Specifically, the control unit 21 of the server (acting as the web server 9) may, for example, send the terminal an HTTP (HyperText Transfer Protocol) response in which the displayed portion of each related keyword can be selected by the user with the input unit 29 of the terminal and, when the user selects that portion via the input unit 29, the terminal sends a search request using the related keyword. As a result, the user can carry out the search work smoothly.
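One common way to make a related keyword selectable is to render it as a hyperlink whose target is the follow-up search. The sketch below assumes that approach; the `/search` path, the `q` parameter name, and the label format are hypothetical and are not taken from the source.

```python
from html import escape
from urllib.parse import urlencode


def related_keyword_links(query, related, search_path="/search"):
    """Render each related keyword as a link that reruns the search with it added.

    query: the search keywords already entered, as one string.
    related: list of (attribute, related_word) pairs.
    """
    links = []
    for attribute, word in related:
        new_query = f"{query} {word}"
        href = f"{search_path}?{urlencode({'q': new_query})}"
        label = f"+{attribute}: [{new_query}]"
        links.append(f'<a href="{escape(href)}">{escape(label)}</a>')
    return "\n".join(links)


print(related_keyword_links("curry tokyo", [("place", "ginza")]))
```

Escaping both the URL and the label keeps user-supplied keywords from breaking out of the generated markup.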

次に、図１７から図２１を参照しながら、本発明の実施の形態に係る実施例について説明する。本実施例は、情報ポータルサイトにおいて、サイト内検索機能の補助機能として、キーワードアシスト機能を利用する例である。 Next, an example according to the embodiment of the present invention will be described with reference to FIGS. 17 to 21. In this example, an information portal site uses the keyword assist function as an auxiliary function of its in-site search function.

図１７は、実施例のシステム構成を示す図である。図１７に示すように、本実施例では、インターネット１９ａを介してサイト閲覧者端末１１ａとＷｅｂサーバ（情報ポータルサイト）９ａとが接続し、例えば、ファイアウォール（図示しない）内でキーワードアシスト機能インデックス作成用サーバ５ａ、キーワードアシスト機能検索用サーバ７ａ、Ｗｅｂサーバ９ａとが接続している。 FIG. 17 is a diagram illustrating the system configuration of the example. As shown in FIG. 17, in this example the site viewer terminal 11a and the Web server (information portal site) 9a are connected via the Internet 19a, and the keyword assist function index creation server 5a, the keyword assist function search server 7a, and the Web server 9a are connected to one another, for example inside a firewall (not shown).

図１８は、実施例で使用する主なデータを示す図である。図１８に示すように、本実施例では、図１８に示す単語属性出現パターン情報１０１ａ、単語属性情報１０２ａを使用する。 FIG. 18 is a diagram illustrating main data used in the embodiment. As shown in FIG. 18, in this embodiment, the word attribute appearance pattern information 101a and the word attribute information 102a shown in FIG. 18 are used.

サイト管理者は、情報ポータルサイト内のコンテンツ（＝検索対象のドキュメント）の内容から、単語の出現パターンを推測し、単語属性出現パターン情報１０１ａ、単語属性情報１０２ａをキーワードアシスト機能インデックス作成用サーバ５ａに登録する。データの登録は、例えば、サイト管理者の端末から、専用のＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）アプリケーションによって行う。 The site administrator infers the word appearance pattern from the content of the content (= document to be searched) in the information portal site, and uses the word attribute appearance pattern information 101a and the word attribute information 102a as the keyword assist function index creation server 5a. Register with. Data registration is performed by a dedicated GUI (Graphical User Interface) application from the site administrator's terminal, for example.

また、サイト管理者は、Ｗｅｂサーバ９ａから検索対象のドキュメントをキーワードアシスト機能インデックス作成用サーバ５ａにインポートするように指示する。ドキュメントのインポートは、例えば、サイト管理者の端末から、専用のＧＵＩアプリケーションによって行う。同様に、サイト管理者は、インデックスの作成、インデックスの配布について、サイト管理者の端末から、専用のＧＵＩアプリケーションによってサーバに指示する。 In addition, the site administrator instructs the Web server 9a to import the search target document into the keyword assist function index creation server 5a. For example, the document is imported from a site administrator's terminal using a dedicated GUI application. Similarly, the site administrator instructs the server about the creation of the index and the distribution of the index from the terminal of the site manager by the dedicated GUI application.

FIG. 19 is a diagram showing input of a search keyword. The Web server 9a receives the search string entered in the site search form of the information portal site; that is, it accepts a search request from the site viewer terminal 11a. The Web server 9a then performs morphological analysis to break the string into words. The example search keyword shown in FIG. 19, "カレー有名店東京" ("curry famous store Tokyo"), is decomposed into "カレー／有名／店／東京" ("curry / famous / store / Tokyo").
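The decomposition step can be illustrated with a minimal sketch. It assumes a toy greedy longest-match segmenter over a hand-written vocabulary; the names `VOCABULARY` and `segment` are hypothetical, and the source does not say which morphological analysis engine the Web server 9a actually uses.

```python
# Hypothetical vocabulary for illustration only; a real morphological
# analysis engine ships with its own dictionary and grammar model.
VOCABULARY = {"カレー", "有名", "店", "東京"}


def segment(text, vocabulary=VOCABULARY):
    """Split text into words by greedy longest match against vocabulary."""
    longest = max(map(len, vocabulary))
    words, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in vocabulary:
                words.append(candidate)
                pos += size
                break
        else:
            # Unknown character: emit it as a single-character token.
            words.append(text[pos])
            pos += 1
    return words


print("／".join(segment("カレー有名店東京")))  # カレー／有名／店／東京
```

Greedy longest match is only a stand-in here. A production analyzer resolves ambiguous boundaries with part-of-speech and cost information, which this sketch does not attempt.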

FIG. 20 is a diagram showing the search for related keywords. From the decomposed words, the Web server 9a extracts only those contained in the word attribute information 102a registered by the site administrator and searches for related words for each of them. As shown in FIG. 20, searching first for "curry" returns the related words "place name: India", "place name: Ginza", "ingredient: beef", and "ingredient: seafood". Searching next for "Tokyo" returns the related words "place name: Shinagawa", "place name: Ginza", "action: tourism", and "ingredient: seafood".
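The filtering and lookup described for FIG. 20 might be sketched as follows. `REGISTERED_WORDS` and `RELATED_WORDS` are hypothetical in-memory stand-ins for the word attribute information 102a and the distributed index, filled with the values from the example. The sketch assumes that "famous" and "store" are not registered, which the text implies but does not state.

```python
# Hypothetical stand-in for the words registered in word attribute information 102a.
REGISTERED_WORDS = {"curry", "Tokyo"}

# Hypothetical stand-in for the related-word index: word -> [(attribute, related word)].
RELATED_WORDS = {
    "curry": [("place name", "India"), ("place name", "Ginza"),
              ("ingredient", "beef"), ("ingredient", "seafood")],
    "Tokyo": [("place name", "Shinagawa"), ("place name", "Ginza"),
              ("action", "tourism"), ("ingredient", "seafood")],
}


def search_related(words):
    """Keep only registered words and look up related words for each one."""
    return {word: RELATED_WORDS.get(word, [])
            for word in words if word in REGISTERED_WORDS}


results = search_related(["curry", "famous", "store", "Tokyo"])
for word, related in results.items():
    print(word, "->", related)
```

Unregistered words are dropped before the lookup, so they never contribute a result list to later steps.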

FIG. 21 is a diagram showing presentation of related keywords. The Web server 9a extracts the words common to the related-word search results and presents related keywords to the site viewer. In the example shown in FIG. 21, the common words are "Ginza" and "seafood", so the first related keyword presented, together with its word attribute, is "+place name: [curry Tokyo Ginza]", and the second is "+ingredient: [curry Tokyo seafood]". In this way, the example can present related keywords that extend beyond the entered search keyword, and can also present related keywords that narrow the search results more appropriately for the set of search target documents.
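As a rough illustration of the presentation step, the sketch below intersects the per-word result lists and formats each common word with its attribute. The function name `present_related_keywords` and the data layout are assumptions made for this example; only the sample values and the output format come from the FIG. 20 and FIG. 21 description.

```python
def present_related_keywords(query_words, related):
    """Return related keywords built from the words common to every result list.

    related maps each query word to a list of (attribute, related word) pairs.
    """
    result_lists = [related[word] for word in query_words if word in related]
    if not result_lists:
        return []
    # Keep pairs from the first list that also occur in all other lists.
    common = [pair for pair in result_lists[0]
              if all(pair in other for other in result_lists[1:])]
    base = " ".join(query_words)
    return [f"+{attribute}: [{base} {word}]" for attribute, word in common]


# Sample data taken from the FIG. 20 example.
related = {
    "curry": [("place name", "India"), ("place name", "Ginza"),
              ("ingredient", "beef"), ("ingredient", "seafood")],
    "Tokyo": [("place name", "Shinagawa"), ("place name", "Ginza"),
              ("action", "tourism"), ("ingredient", "seafood")],
}

for keyword in present_related_keywords(["curry", "Tokyo"], related):
    print(keyword)
# +place name: [curry Tokyo Ginza]
# +ingredient: [curry Tokyo seafood]
```

Requiring a pair to appear in every list is what narrows the suggestions: "India" and "tourism" each relate to only one of the two query words, so they are not offered.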

Preferred embodiments of the document search system and the like according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope of the technical idea disclosed in this application, and these are naturally understood to fall within the technical scope of the present invention.

FIG. 1: Diagram showing the schematic configuration of the document search system 1
FIG. 2: Hardware configuration diagram of a computer
FIG. 3: Software configuration diagram of the server
FIG. 4: Diagram showing an overview of the work flow in the document search system 1
FIG. 5: Diagram showing an example of the word attribute appearance pattern information 101
FIG. 6: Diagram showing an example of the word attribute information 102
FIG. 7: Diagram showing an example of the word conversion information 103
FIG. 8: Diagram showing an example of the data items of the document information 104
FIG. 9: Diagram showing the data flow in the index update target document determination process 71
FIG. 10: Diagram showing the data flow in the morphological analysis process 72
FIG. 11: Diagram showing the data flow in the word conversion process 73
FIG. 12: Diagram showing the data flow in the keyword assist word selection process 74
FIG. 13: Diagram showing the processing flow of the relevance calculation process 75
FIG. 14: Diagram showing the data flow in the relevance calculation process 75
FIG. 15: Diagram showing the data flow in the index creation process 76
FIG. 16: Diagram showing the processing flow for the reception 65 of a search request
FIG. 17: Diagram showing the system configuration of the example
FIG. 18: Diagram showing the main data used in the example
FIG. 19: Diagram showing input of a search keyword
FIG. 20: Diagram showing the search for related keywords
FIG. 21: Diagram showing presentation of related keywords

Explanation of symbols

1 ……… Document search system
3 ……… Administrator terminal
5 ……… Index creation server
7 ……… Search server
9 ……… Web server
11 ……… User terminal
13 ……… Network
21 ……… Control unit
23 ……… Storage unit
25 ……… Media input/output unit
27 ……… Communication control unit
29 ……… Input unit
31 ……… Display unit
33 ……… Peripheral device I/F unit
35 ……… Bus
41 ……… Related keyword index creation application
43 ……… Related keyword search application
51 ……… Word information registration
52 ……… Document registration
53 ……… Index creation
54 ……… Index distribution
55 ……… Related keyword search
56 ……… Index creation history reference
61 ……… Definition of word information
62 ……… Collection and update of documents
63 ……… Creation of the index
64 ……… Distribution of the index
65 ……… Reception of a search request
71 ……… Index update target document determination process
72 ……… Morphological analysis process
73 ……… Word conversion process
74 ……… Keyword assist word selection process
75 ……… Relevance calculation process
76 ……… Index creation process
101 ……… Word attribute appearance pattern information
102 ……… Word attribute information
103 ……… Word conversion information
104 ……… Document information
105 ……… Updated document history information
106 ……… Index execution process history information
107 ……… Update target document information
108 ……… Morphological analysis result information
109 ……… Word conversion result information
110 ……… Relevance calculation target word information
111 ……… Word relevance information
112 ……… Index creation process history information
113 ……… Keyword assist hash information
114 ……… Keyword assist search word information
115 ……… Keyword assist related word information
116 ……… Example of part-of-speech classification by a specific morphological analysis engine
117 ……… Index data

Claims

1. A document search system in which a terminal and a server are connected via a network and the server presents a related keyword of a search keyword received from the terminal, wherein
the server comprises:
means for holding word attribute appearance pattern information in which, where a type of word is defined as a word attribute and a combination of the word attributes that appear within a certain proximity range in a search target document is defined as an appearance pattern, a weight for each appearance pattern used when calculating a word group relevance is defined as an appearance pattern weight;
relevance calculating means for searching the search target document for a word group that matches the appearance pattern, and calculating the word group relevance from the distance of the retrieved word group and the appearance pattern weight;
means for holding the word group relevance calculated by the relevance calculating means as word relevance information; and
related keyword presenting means for presenting, when the search keyword is received from the terminal, the related keyword of the search keyword with reference to the word relevance information.

2. The document search system according to claim 1, wherein the word group relevance is the product, taken over all previously registered words and all the appearance patterns, of the product of the appearance pattern weight and the reciprocal of the distance of the word group.

3. The document search system according to claim 1, wherein the appearance pattern weight can be set both for the case where the appearance order of the registered word group fully matches and for other cases.

4. The document search system according to claim 1, wherein the word attribute appearance pattern information is updatable.

5. The document search system according to claim 1, wherein the related keyword presenting means presents the related keyword together with a search result based on the search keyword and/or a word attribute related to the related keyword.

6. A server that is connected to a terminal via a network and presents a related keyword of a search keyword received from the terminal, the server comprising:
means for holding word attribute appearance pattern information in which, where a type of word is defined as a word attribute and a combination of the word attributes that appear within a certain proximity range in a search target document is defined as an appearance pattern, a weight for each appearance pattern used when calculating a word group relevance is defined as an appearance pattern weight;
relevance calculating means for searching the search target document for a word group that matches the appearance pattern, and calculating the word group relevance from the distance of the retrieved word group and the appearance pattern weight;
means for holding the word group relevance calculated by the relevance calculating means as word relevance information; and
related keyword presenting means for presenting, when the search keyword is received from the terminal, the related keyword of the search keyword with reference to the word relevance information.

7. The server according to claim 6, wherein the word group relevance is the product, taken over all previously registered words and all the appearance patterns, of the product of the appearance pattern weight and the reciprocal of the distance of the word group.

8. The server according to claim 6, wherein the appearance pattern weight can be set both for the case where the appearance order of the registered word group fully matches and for other cases.

9. The server according to claim 6, wherein the word attribute appearance pattern information is updatable.

10. The server according to claim 6, wherein the related keyword presenting means presents the related keyword together with a search result based on the search keyword and/or a word attribute related to the related keyword.

11. A document search method in which a terminal and a server are connected via a network, the server holds word attribute appearance pattern information in which, where a type of word is defined as a word attribute and a combination of the word attributes that appear within a certain proximity range in a search target document is defined as an appearance pattern, a weight for each appearance pattern used when calculating a word group relevance is defined as an appearance pattern weight, and the server presents a related keyword of a search keyword received from the terminal, the method comprising:
a step in which the server searches the search target document for a word group that matches the appearance pattern, and calculates the word group relevance from the distance of the retrieved word group and the appearance pattern weight;
a step in which the server holds the word group relevance calculated in the step of calculating the word group relevance as word relevance information;
a step in which the terminal transmits the search keyword to the server; and
a step in which the server presents the related keyword of the received search keyword with reference to the word relevance information.

12. The document search method according to claim 11, wherein the word group relevance is the product, taken over all previously registered words and all the appearance patterns, of the product of the appearance pattern weight and the reciprocal of the distance of the word group.

13. The document search method according to claim 11, wherein the appearance pattern weight can be set both for the case where the appearance order of the registered word group fully matches and for other cases.

14. The document search method according to claim 11, wherein the word attribute appearance pattern information is updatable.

15. The document search method according to claim 11, wherein the step of presenting the related keyword presents the related keyword together with a search result based on the search keyword and/or a word attribute related to the related keyword.

16. A program that causes a computer to function as the server according to any one of claims 6 to 10.