JP2005352878A

JP2005352878A - Document retrieval system, retrieval server and retrieval client

Info

Publication number: JP2005352878A
Application number: JP2004174363A
Authority: JP
Inventors: Osamu Konichi; 修今一; Yoko Oi; 洋子大井; Yoshiki Niwa; 芳樹丹羽
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-06-11
Filing date: 2004-06-11
Publication date: 2005-12-22
Also published as: US20050278293A1

Abstract

PROBLEM TO BE SOLVED: To provide summaries of a retrieval result from a plurality of viewpoints in an associative retrieval system.
SOLUTION: One document database is indexed in a plurality of ways, so that a retrieval result can be displayed in overview from a plurality of viewpoints. The documents in the indexed document databases 403, 503 and 603 are managed with a common identifier, so that a summary of the document group obtained as the retrieval result can be created using each of the indexes.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a document search system, and more particularly to an associative search system that displays an overview of search results from a plurality of viewpoints.

With the spread of computers and the Internet, the digitization of document information is progressing rapidly. As the amount of available information grows, finding the necessary information within it becomes an important problem. There is also growing demand for examining how document groups in different document databases relate to one another. For example, users often want to look up the encyclopedia entries related to a newspaper article that interests them.

With the keyword search currently in practical use, it is possible to search while switching between multiple document databases. However, it is not possible, for a document group contained in one document database, to retrieve the related document group from the same document database or from a different one (a search method called document associative search).

Within a single document database, document associative search that takes a document group as the search input can be realized by computing the degree of relevance between documents in advance. For multiple document databases, however, the number of document combinations whose relevance must be precomputed grows explosively as the number of databases increases, so this approach is not feasible in practice.

In response, Japanese Patent Laid-Open No. 2000-155758, "Document Search Method and Document Search Service Targeting Multiple Document Databases," discloses a method for efficiently retrieving, for an arbitrary document group in a document database designated by the user, the related document group from any document database. This method achieves fast document associative search by using only the characteristic words in the search input, which is given as a document group. With this method, the user can search accurately and efficiently by examining the relationships between document groups while switching among multiple document databases of different kinds. The method also extracts characteristic words that appear in the document group obtained as the search result and presents them to the user as an overview (summary) of the result, which helps the user judge whether the search result is acceptable.

JP 2000-155758 A

In word-based document search, each document is indexed by the words that appear in it. The method disclosed in JP 2000-155758 A works the same way: to extract characteristic words from a document, the importance of each word in the document is computed with a statistical measure (tf*idf being a typical example), and words are extracted in descending order of importance. A document database is generally indexed in only one way. However, technical terms (in the biomedical field, disease names, gene names, protein names, and so on) and fact information (for example, protein-protein interactions in the biomedical field) are buried in the general word distribution and are therefore unlikely to be extracted as characteristic words. Moreover, with only one indexing scheme, the overview of the search results is limited to a single viewpoint; if that viewpoint does not match the user's search request or interests, the overview cannot be appropriate.

In view of the above, an object of the present invention is to provide a document search system that displays an overview of search results from a plurality of viewpoints that match the user's interests.

To solve the above problem, the present invention indexes a single document database in a plurality of ways, so that search results can be displayed in overview from a plurality of viewpoints.

For example, one document database is indexed by ordinary words, by technical terms, and by fact information. To keep the indexed document databases in correspondence with one another, each document is managed with a common identifier, so that a summary of a given document can be created using each of the indexes.

The document search system of the present invention includes a search client and a search server. The search client has an input unit for entering a search request, a search result display unit for displaying the retrieved document group, and a summary display unit for displaying a summary of the retrieved document group. The search server has a document database storing a plurality of indexed documents, a search unit that retrieves from the document database the documents most relevant to a received search request, and a summary creation unit that creates a summary of a given document group using the index. The system holds a plurality of indexes of different types.

The summary display unit of the search client displays the several types of summary separately, one section per viewpoint. The search result display unit has a document selection unit for selecting, from the displayed document group, the documents to be used as the key for the next search. The summary display unit has a summary selection unit for selecting, from the displayed summary elements, the elements to be used as the key for the next search.

By viewing overviews of the retrieved document set from a plurality of viewpoints, the user can grasp the nature of the search results more accurately. In addition, because the relationships between viewpoints can be understood through the retrieved documents as an intermediary, the search results can be analyzed in more detail.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a schematic diagram showing a configuration example of a system for realizing the present invention. The system comprises a search client 20, with which a user enters search requests and views search results; search servers 40, 50, and 60 for searching document databases; and an associative search server 30 that mediates between the search client 20 and the search servers 40, 50, and 60. These are connected via a communication network 10. In the illustrated example, three search servers are connected to the communication network, but any number of search servers may be connected. The number of search clients is also arbitrary.

The search means 402, 502, and 602 of the search servers 40, 50, and 60 respond to a search request sent from the associative search server by retrieving highly relevant documents from the document databases 403, 503, and 603, and return the results to the associative search server 30 together with relevance weights. The search means can be realized by, for example, a known keyword search technique.

In a keyword search technique, to make the search process more efficient, the documents in the document database are divided into words (by morphological analysis for Japanese documents and by stemming for English documents), and an index recording which words appear in which documents is created in advance. When a search is executed, the index is loaded into main memory, so the search can be performed at high speed. In FIG. 1, indexes 404, 504, and 604 are created for the document databases 403, 503, and 603 of the search servers 40, 50, and 60, respectively, and are used for search processing.
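The index construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `tokenize` function is a stand-in for morphological analysis or stemming, and the names `build_index` and `docs` and the sample texts are assumptions introduced here.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Placeholder for morphological analysis (Japanese) or stemming (English):
    # here we only lowercase and split on whitespace.
    return text.lower().split()


def build_index(documents):
    """Return {word: {doc_id: frequency}} for a {doc_id: text} mapping."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, freq in Counter(tokenize(text)).items():
            index[word][doc_id] = freq
    return index


# Hypothetical sample documents, keyed by document ID.
docs = {
    "12345": "kinase binds receptor and kinase activates receptor",
    "12346": "receptor signalling in disease",
}
index = build_index(docs)
print(index["receptor"])
```

Keeping the per-document frequency in the index, rather than only membership, is what later allows the same structure to be reused for summary creation.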

The summary creation means 401, 501, and 601 of the search servers 40, 50, and 60 create summaries of the document groups retrieved from the document databases 403, 503, and 603. Here, a summary is a set of words that represents the content of a document group well. Existing techniques, such as that of JP 2000-155758 A, can be used as the summary creation means. The index described above is also used when creating a summary: the index is consulted to find out which words a given document contains.

As an example, the frequencies of the words contained in all documents of the document group to be summarized are first tallied. In general, the more often a word appears in a document group, the better it represents that group, so words with a high frequency in the group are more likely to be included in the summary. However, common words that appear frequently in any document, such as "do," are not suitable as summary words. The summary words are therefore usually selected by also taking into account their frequency in the document database to which the document group belongs. A word that appears frequently in the specified document group but has a low total frequency in the entire document database is characteristic in the sense that it appears almost only in that group, and is therefore suitable for a summary that characterizes the group. Concretely, for each word in the document group, a weight is computed by a suitable function that takes as input the word's frequency in the document group and its frequency in the document database, and the words whose weight is at or above a threshold are adopted as the summary.
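The selection of summary words described above can be sketched in a few lines. The text only asks for "a suitable function" of the two frequencies, so the weighting used here (group frequency times the logarithm of the inverse relative database frequency) is an assumption chosen for illustration, as are the function name `summary_words` and all sample counts.

```python
import math
from collections import Counter


def summary_words(group_docs, db_freq, threshold):
    """group_docs: list of {word: freq} dicts, one per document in the group.
    db_freq: {word: total occurrences in the whole document database}."""
    group_freq = Counter()
    for doc in group_docs:
        group_freq.update(doc)  # adds the per-document counts

    db_total = sum(db_freq.values())
    weights = {}
    for word, gf in group_freq.items():
        # Assumed weighting: high when the word is frequent in the group
        # and rare in the database as a whole.
        weights[word] = gf * math.log(db_total / db_freq[word])

    selected = [w for w, wt in weights.items() if wt >= threshold]
    return sorted(selected, key=lambda w: -weights[w])


# Hypothetical counts: "do" is common everywhere, "p53" and "mdm2" are not.
group = [{"do": 5, "p53": 3}, {"do": 4, "p53": 2, "mdm2": 2}]
db_freq = {"do": 9000, "p53": 40, "mdm2": 10, "cell": 950}
result = summary_words(group, db_freq, threshold=5.0)
print(result)
```

With these counts the common word "do" falls below the threshold even though it is the most frequent word in the group, which is the behavior the paragraph above calls for.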

The search client 20 includes search request input means 201, search result display means 202, and summary display means 203.

FIG. 2 shows an example of the initial screen of the search client. The user enters a search request in the search request input area 2011 and clicks the search button 2012 to run the search.

FIG. 3 shows an example of search results in the search client. The search results are displayed by the search result display means 202, and a summary of the search results is displayed by the summary display means 203. The search result display means 202 also serves as document group designation means: by selecting any number of articles with the document selection check boxes 2021 and clicking the associative search button 2001, the user can search for documents related to the selected articles. The summary display means 203 also serves as word group designation means: by selecting any number of words with the word selection check boxes 2031 and 2032 and clicking the associative search button 2001, the user can search from the summary words.

The associative search server 30 includes search request analysis means 301, which analyzes the search request sent from the search client 20; search request issuing means 302, which distributes the search request sent from the search client 20 to the search servers 40, 50, and 60; and summary word request means 303, which requests summary words for a document group from the search servers 40, 50, and 60.

The search request analysis means 301 analyzes the search request sent from the search client 20, identifies the words it contains, and creates a search key. The search request analysis means 301 may be of any kind, but it includes at least morphological analysis, which divides sentences into words, for Japanese text, and stemming, which reduces words to their base form and assigns parts of speech, for English text.

The search request sent to the search request issuing means 302 is one of the following: (1) a word set created by the search request analysis means 301, (2) a set of document IDs sent from the search result display means (document group designation means) 202 of the search client 20, or (3) a word set sent from the summary display means (word group designation means) 203 of the search client 20. In cases (1) and (3), the word set is sent to the search server as the search request. In case (2), the summary word request means 303 asks the search server for a summary of the document group corresponding to the set of document IDs, and the summary word set that is returned is sent to the search server as the search request. Which search server the search request issuing means 302 sends the request to depends on the contents of the index held by each search server; this behavior is illustrated with the example below.
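The three request types handled by the search request issuing means can be sketched as a small dispatch function. This is an illustrative sketch only: the request dictionary layout, the `kind` labels, and the stub functions standing in for the search server are all assumptions and do not appear in the patent.

```python
def issue_search_request(request, request_summary_words, search):
    """Turn any of the three request types into a word-set query and run it."""
    kind = request["kind"]
    if kind in ("analyzed_words", "selected_words"):
        # Cases (1) and (3): the word set is already the query.
        words = set(request["words"])
    elif kind == "document_ids":
        # Case (2): first obtain summary words for the selected documents.
        words = set(request_summary_words(request["doc_ids"]))
    else:
        raise ValueError(f"unknown request kind: {kind}")
    return search(words)


# Stub search server, for illustration only (hypothetical data).
DOC_WORDS = {
    "12345": {"p53", "mdm2"},
    "12346": {"p53", "apoptosis"},
    "12347": {"insulin"},
}


def stub_summary(doc_ids):
    # Every word of the selected documents counts as a summary word here.
    return set().union(*(DOC_WORDS[d] for d in doc_ids))


def stub_search(words):
    # A document matches if it shares at least one word with the query.
    return sorted(d for d, ws in DOC_WORDS.items() if ws & words)


by_words = issue_search_request(
    {"kind": "selected_words", "words": ["insulin"]}, stub_summary, stub_search)
by_docs = issue_search_request(
    {"kind": "document_ids", "doc_ids": ["12345"]}, stub_summary, stub_search)
print(by_words, by_docs)
```

Passing the summary and search operations in as functions keeps the dispatch logic independent of which search server ends up handling each step.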

In conventional associative search systems, a document database was indexed from only one viewpoint. The present invention aims to improve user convenience by indexing a single document database from a plurality of viewpoints. Two requirements must be met to achieve this: (1) indexes are created from a plurality of viewpoints, and (2) the same document contained in the several indexed document databases is managed with a common identifier. Managing the same document with a common identifier preserves the identity of the retrieved document set across the indexes, so that summary words can be created for the same document set from each viewpoint.

FIGS. 4, 5, and 6 show examples of the indexes obtained when one document database is indexed from a plurality of viewpoints.

FIG. 4 shows an example in which the document with document ID 12345 is indexed by general words, protein names, and protein-protein interactions. The number before each word in the index column is the frequency with which that word appears in the document. FIG. 5 shows an example in which the document with document ID 12345 is indexed by protein names. FIG. 6 shows an example in which the document with document ID 12345 is indexed by protein-protein interactions. To satisfy requirement (2) above, the common document ID "12345" is used in each index. The index for each viewpoint may be created in any way, but in practice it is convenient to create the indexes so that one index includes the others. In the example above, the index of FIG. 4 includes the indexes of FIGS. 5 and 6. With this arrangement, every search request sent to the search request issuing means 302 can simply be sent to the search server 40. The search servers 50 and 60 are used only when creating summaries of the search results.
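The relationship between the three indexes can be illustrated with a small sketch. The actual contents of FIGS. 4 to 6 are not reproduced in this text, so the terms and frequencies below are invented placeholders; only the document ID "12345" and the idea that one index includes the others come from the description. The helper names `includes` and `views_of` are likewise assumptions.

```python
# Three indexes over the same document, keyed by the common document ID.
# Terms and counts are hypothetical placeholders.
general_index = {"12345": {"bind": 2, "p53": 3, "mdm2": 1, "p53-mdm2": 1}}
protein_index = {"12345": {"p53": 3, "mdm2": 1}}
interaction_index = {"12345": {"p53-mdm2": 1}}


def includes(big, small):
    """True if every (doc_id, term, freq) entry of `small` is also in `big`."""
    return all(
        doc_id in big and all(big[doc_id].get(t) == f for t, f in terms.items())
        for doc_id, terms in small.items()
    )


def views_of(doc_id, indexes):
    """Look up the same document in each index via the common identifier."""
    return {name: idx.get(doc_id, {}) for name, idx in indexes.items()}


views = views_of("12345", {"protein": protein_index,
                           "interaction": interaction_index})
print(includes(general_index, protein_index), views)
```

Because the general index includes the other two, a query made of protein names or interactions can still be answered from the general index alone, while the narrower indexes supply the per-viewpoint summaries.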

FIG. 3 shows an example of an associative search performed with the indexes of FIGS. 4, 5, and 6. Article titles are displayed as the search results. As the summary of the search results, the protein names and protein-protein interactions contained in these articles are displayed.

The processing flow is described below with reference to the sequence diagrams of FIGS. 7 and 8. For the purpose of explanation, assume that the indexes 404, 504, and 604 of the document databases 403, 503, and 603 in the search servers 40, 50, and 60 are created as shown in FIGS. 4, 5, and 6, respectively. With this indexing, the search request issuing means 302 operates as follows. For a search request entered by the user, the search request issuing means 302 issues the search request to the search server 40. To create summary words for the search results obtained from the search server 40, the summary word request means 303 issues summary word creation requests to the search servers 50 and 60. When the user designates a document group and searches again from that document group, the search request is issued to the search server 40. When the user designates a word group and searches again from that word group, the search request is likewise issued to the search server 40. In this way, all searches are performed by the search server 40; the search servers 50 and 60 are used only to create the summary words of the search results. Even when words from both "protein name" and "protein interaction" are designated, the search works without problems because the index of the search server 40 includes the indexes of the search servers 50 and 60.

Next, the processing flow is described with reference to the sequence diagram of FIG. 7. The user enters a search request using the search request input means 201 of the search client 20. The entered search request is sent to the associative search server (T11). The search request analysis means 301 of the associative search server 30 analyzes the search request and creates the search request to be sent to the search server. The search request issuing means 302 sends the search request to the search server 40 (T12). The search means 402 of the search server 40 searches the document database 403 using the index 404 and sends the result to the associative search server 30 (T13). The summary word request means 303 of the associative search server 30 sends summary creation requests for the obtained search results to the search server 50 and the search server 60 (T14, T16). The summary word creation means 501 and 601 of the search servers 50 and 60 create summary words using the indexes 504 and 604, respectively. In this example, the summary word creation means 501 creates summary words consisting of protein names, and the summary word creation means 601 creates summary words consisting of protein-protein interactions. The summary words created by each summary word creation means are sent to the associative search server 30 (T15, T17). Finally, the search results and the summary words are sent from the associative search server 30 to the search client 20 (T18) and presented to the user by the search result display means 202 and the summary display means 203 of the search client 20.
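The message flow T11 to T18 can be sketched as a single orchestration function. This is a simplified illustration under stated assumptions: the `SearchServer` class, the scoring by summed query-word frequency, the summarization by plain summed frequency (instead of the weighted selection described earlier), and all sample index contents are hypothetical.

```python
class SearchServer:
    """Hypothetical search server holding one index {doc_id: {term: freq}}."""

    def __init__(self, index):
        self.index = index

    def search(self, words):
        # Assumed relevance: summed frequency of the query words per document.
        scores = {d: sum(terms.get(w, 0) for w in words)
                  for d, terms in self.index.items()}
        hits = [d for d, s in scores.items() if s > 0]
        return sorted(hits, key=lambda d: (-scores[d], d))

    def summarize(self, doc_ids):
        # Simplified summary: terms ranked by total frequency in the group.
        totals = {}
        for d in doc_ids:
            for term, freq in self.index.get(d, {}).items():
                totals[term] = totals.get(term, 0) + freq
        return sorted(totals, key=lambda t: (-totals[t], t))


def associative_search(words, search_server, summary_servers):
    doc_ids = search_server.search(words)            # T12, T13
    summaries = {name: server.summarize(doc_ids)     # T14 to T17
                 for name, server in summary_servers.items()}
    return doc_ids, summaries                        # T18


server40 = SearchServer({
    "12345": {"bind": 2, "p53": 3, "mdm2": 1, "p53-mdm2": 1},
    "12346": {"p53": 1, "bax": 2, "p53-bax": 1},
    "12347": {"insulin": 4},
})
server50 = SearchServer({"12345": {"p53": 3, "mdm2": 1},
                         "12346": {"p53": 1, "bax": 2},
                         "12347": {"insulin": 4}})
server60 = SearchServer({"12345": {"p53-mdm2": 1}, "12346": {"p53-bax": 1}})

doc_ids, summaries = associative_search(
    {"p53"}, server40, {"protein": server50, "interaction": server60})
print(doc_ids, summaries)
```

The document IDs returned by the first server are passed unchanged to the other two, which only works because all three indexes share the same identifiers.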

Next, the sequence diagram of FIG. 8 is described. This sequence diagram shows the processing flow when a new search is performed from the documents obtained as search results or from their summary words.

First, the case of searching again from documents obtained as search results is described. The user selects the documents to be used as the key for the new search using the document group designation means 202 of the search client 20. The identifiers of the selected documents are sent to the associative search server 30 (T21). The summary word request means 303 of the associative search server 30 sends a summary creation request for the selected documents to the search server 40 (T22). The summary word creation means 401 of the search server 40 creates summary words using the index 404. That is, as described above, statistically important words are selected as summary words by the same technique as in JP 2000-155758 A. The created summary words are sent to the associative search server 30 (T23).

When the user searches again from documents only, the search request issuing means 302 of the associative search server 30 sends the obtained summary words to the search server 40 (T25). The search means 402 of the search server 40 searches the document database 403 using the index 404 and sends the result to the associative search server 30 (T26). The subsequent processing is the same as the processing from the summary word creation means onward in the sequence diagram of FIG. 7.
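The re-search from selected documents (T21 to T26) amounts to summarizing the selected documents and using the resulting words as the next query. The sketch below illustrates this under simplifying assumptions: summary words are chosen by plain summed frequency rather than a statistical weight, a document matches if it contains any query word, and the names `INDEX_404`, `summary_words`, `search`, and `research_from_documents` and the sample data are hypothetical.

```python
# Hypothetical contents of the index held by the search server.
INDEX_404 = {
    "12345": {"p53": 3, "mdm2": 1},
    "12346": {"p53": 1, "bax": 2},
    "12347": {"insulin": 4},
}


def summary_words(doc_ids, index, top_n=2):
    # Simplified: take the top_n terms by total frequency in the selection.
    totals = {}
    for d in doc_ids:
        for term, freq in index[d].items():
            totals[term] = totals.get(term, 0) + freq
    return sorted(totals, key=lambda t: (-totals[t], t))[:top_n]


def search(words, index):
    # A document matches if it contains at least one query word.
    return sorted(d for d, terms in index.items()
                  if any(w in terms for w in words))


def research_from_documents(selected_ids, index):
    words = summary_words(selected_ids, index)  # T22, T23
    return search(words, index)                 # T25, T26


related = research_from_documents(["12346"], INDEX_404)
print(related)
```

Here the selected document "12346" yields the query words "bax" and "p53", which in turn bring in document "12345" through the shared term "p53".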

When the user searches again from summary words, the user selects the words to be used as the key for the new search using the word group designation means 203 of the search client 20. Words from several viewpoints can be designated at the same time. The selected words, or their identifiers, are sent to the associative search server 30 (T24). The subsequent processing is the same as the processing from the search request issuing means onward in the sequence of FIG. 8.

By searching again with summary words created from one viewpoint, the user can grasp how that viewpoint relates to the other viewpoints, with the document database acting as the intermediary. For example, when a new search is run with summary words consisting of protein names, documents related to the selected protein names are obtained, and the protein-protein interactions related to the selected protein names can also be seen. This makes it possible to analyze the search results in detail from multiple perspectives.

FIG. 9 shows an example in which protein names and disease names are used as the indexes. By following the same procedure as described above, the user can start from a protein name of interest and find the disease names related to it. Conversely, the user can start from a disease name of interest and find the protein names related to it.

Next, a modification of the present invention will be described with reference to FIG. 10.
In the first embodiment, the viewpoints from which the summary of the search results is created were fixed in advance. However, it is also possible to prepare in advance a plurality of search servers holding indexes from a plurality of viewpoints and to let the user select the viewpoints he or she wants to use. FIG. 10 shows an example of an initial screen on which the user selects viewpoints.

The viewpoint selection means 2013 presents three selectable viewpoints (indexing by gene, "gene"; indexing by protein, "protein"; and protein-protein interaction, "protein interaction") for each view (view1, view2). For each view, the user selects the viewpoint from which an overview is wanted. In the example of FIG. 10, the user has selected indexing by protein, "protein," as view1 and protein-protein interaction, "protein interaction," as view2.

The user then enters a search request in the search request input area 2011 and clicks the search button 2012 to run the search. The subsequent processing is the same as in the first embodiment.

Next, another modification of the present invention will be described with reference to FIG. 11.
In the first embodiment, the indexes created from the different viewpoints were held by separate servers. That is, the index of FIG. 4 is held as the index 404 of the search server 40, the index of FIG. 5 as the index 504 of the search server 50, and the index of FIG. 6 as the index 604 of the search server 60. However, more than one search server is not necessarily required; a single search server can hold a plurality of indexes.

FIG. 11 is a configuration diagram for the case where a single search server holds a plurality of indexes. For the document database 703 of the search server 70, the indexes created from the plurality of viewpoints are held as indexes 704, 705, and 706. When a plurality of indexes are held in one search server, each index is usually held independently. Each index can take, for example, the form of a matrix with documents as rows and words as columns. Each element of the matrix holds the number of times the word appears in the document. In this case, the identity of the documents along the row axis must be preserved across the indexes (matrices), so the same document is managed by the same identifier in all of the indexes.
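The matrix form of the indexes and the role of the shared identifier can be illustrated with a minimal sketch. The sample documents, the viewpoint names, and the helpers `build_index` and `outline` are hypothetical, and ranking outline words by summed frequency is an assumption made for illustration, not a method stated in this description.

```python
from collections import Counter

# Hypothetical document database: shared identifier -> terms extracted per viewpoint.
DOCUMENTS = {
    "doc1": {"gene": ["g1", "g2", "g1"], "protein": ["pA"]},
    "doc2": {"gene": ["g2"], "protein": ["pA", "pB"]},
    "doc3": {"gene": ["g3"], "protein": ["pB", "pB"]},
}


def build_index(documents, viewpoint):
    """Build a sparse document-by-word frequency matrix for one viewpoint.

    Rows are keyed by the shared document identifier; each row maps a word
    to the number of times it occurs in that document.
    """
    return {doc_id: Counter(terms.get(viewpoint, []))
            for doc_id, terms in documents.items()}


def outline(index, doc_ids, top_n=2):
    """Sum the rows of the given documents and return the most frequent words.

    Ranking by summed frequency is an assumption of this sketch.
    """
    totals = Counter()
    for doc_id in doc_ids:
        totals.update(index[doc_id])
    # Descending frequency, then alphabetical order to keep the result stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Two independent indexes over the same database, sharing document identifiers.
indexes = {view: build_index(DOCUMENTS, view) for view in ("gene", "protein")}

# The same retrieved document group can be outlined from either viewpoint.
hits = ["doc1", "doc2"]
gene_outline = outline(indexes["gene"], hits)
protein_outline = outline(indexes["protein"], hits)
```

Because both matrices use the same row keys, a document group retrieved through one index can be looked up directly in the other.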

In the first embodiment, the search request issuing means 302 of the associative search server 30 controls which search server a search request is issued to, according to the type of the search request. When there is only one search server, as in FIG. 11, the search request issuing means 302 need only control which index of the search server 70 is used for the search, according to the type of the search request. By regarding all the search servers in the sequence diagrams of FIGS. 7 and 8 as the same search server, the same processing as in the first embodiment is performed.
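The index selection performed by the search request issuing means 302 can be sketched as a lookup from request type to index name. This is an illustration under stated assumptions: the request type names, the index names, the `SearchServer` class, and the frequency-sum scoring are all hypothetical and are not taken from this description.

```python
from collections import Counter


class SearchServer:
    """Stand-in for a single search server that holds several named indexes."""

    def __init__(self, indexes):
        # index name -> {document identifier: Counter of word frequencies}
        self.indexes = indexes

    def search(self, index_name, words):
        """Rank documents by the summed frequency of the query words (assumed scoring)."""
        index = self.indexes[index_name]
        scores = {doc_id: sum(row[word] for word in words)
                  for doc_id, row in index.items()}
        hits = [doc_id for doc_id, score in scores.items() if score > 0]
        return sorted(hits, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical mapping from request type to the index to consult.
INDEX_FOR_REQUEST_TYPE = {
    "gene_words": "index_a",
    "protein_words": "index_b",
}


def issue_search_request(server, request_type, words):
    """Choose the index from the request type, then search the one server."""
    try:
        index_name = INDEX_FOR_REQUEST_TYPE[request_type]
    except KeyError:
        raise ValueError(f"unsupported request type: {request_type!r}") from None
    return server.search(index_name, words)


server = SearchServer({
    "index_a": {"doc1": Counter(g1=2, g2=1), "doc2": Counter(g2=1)},
    "index_b": {"doc1": Counter(pA=1), "doc2": Counter(pA=1, pB=1)},
})
gene_hits = issue_search_request(server, "gene_words", ["g2"])
protein_hits = issue_search_request(server, "protein_words", ["pB"])
```

With several search servers, as in the first embodiment, the same table could map a request type to a server instead of to an index name.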

FIG. 1 is a schematic diagram showing a configuration example of a system for realizing the present invention.
FIG. 2 is a diagram showing an example of the initial screen of the search client.
FIG. 3 is a diagram showing an example of a search result in the search client.
FIG. 4 is a diagram showing an example of indexing.
FIG. 5 is a diagram showing an example of indexing.
FIG. 6 is a diagram showing an example of indexing.
FIG. 7 is a sequence diagram showing the flow of data and processing among the search client, the associative search server, and the search servers.
FIG. 8 is a sequence diagram showing the flow of data and processing among the search client, the associative search server, and the search servers.
FIG. 9 is a diagram showing a display example of a search result in the search client.
FIG. 10 is a diagram showing an example of the initial screen of the search client.
FIG. 11 is a schematic diagram showing another configuration example of a system for realizing the present invention.

Explanation of symbols

10: Communication network
20: Search client
2001: Associative search instruction button
201: Search request input means
2011: Search request input area
2012: Search instruction button
2013: Viewpoint selection means
202: Search result display means (document group designation means)
2021: Document selection check box
203: Outline display means (word group designation means)
2031: Word selection check box
2032: Word selection check box
30: Associative search server
301: Search request analysis means
302: Search request issuing means
303: Outline word request means
40: Search server
401: Outline creation means
402: Search means
403: Document database
404: Index
50: Search server
501: Outline creation means
502: Search means
503: Document database
504: Index
60: Search server
601: Outline creation means
602: Search means
603: Document database
604: Index
70: Search server
701: Outline creation means
702: Search means
703: Document database
704: Index
705: Index
706: Index

Claims

1. A document search system comprising:
a search client including an input unit for inputting a search request, a search result display unit for displaying a retrieved document group, and an outline display unit for displaying an outline of the retrieved document group; and
a search server including a document database storing a plurality of indexed documents, a search unit that searches the document database for documents highly relevant to a received search request, and an outline creation unit that creates an outline of a given document group using the index,
wherein the document search system has a plurality of different types of indexes as the index.

2. The document search system according to claim 1, comprising a plurality of search servers, wherein each search server has a different type of index, and the same document is managed by the same identifier across the document databases of the plurality of search servers.

3. The document search system according to claim 1, wherein one search server has a plurality of different types of indexes, and the same document is managed by the same identifier across the plurality of indexes.

4. The document search system according to claim 1, wherein one of the plurality of indexes is an index obtained by integrating the remaining indexes.

5. The document search system according to claim 1, wherein the outline display unit of the search client includes per-index outline display units that each display a different outline corresponding to a different index.

6. The document search system according to claim 5, wherein the search client includes means for selecting an outline element displayed on the outline display unit, and transmits the selected element as the search request.

7. A search server comprising:
a document database storing a plurality of documents;
a plurality of types of indexes assigned from different viewpoints to the documents in the document database;
a search unit that searches the document database for documents highly relevant to a received search request; and
an outline creation unit that creates a plurality of types of outlines of a given document group using the indexes,
wherein the same document is managed by the same identifier across the plurality of indexes.

8. A search client comprising:
an input unit for inputting a search request;
a search result display unit for displaying a document group received as a search result; and
an outline display unit for displaying an outline of the document group for each of a plurality of different viewpoints,
wherein the search result display unit has a document selection unit for selecting, from the displayed document group, a document to serve as a key for the next search,
the outline display unit has an outline selection unit for selecting, from the displayed outline elements, an element to serve as a key for the next search, and
the search client transmits, as a search request, the search request input to the input unit, the document selected by the document selection unit, or information on the outline element selected by the outline selection unit.