JP2009230296A

JP2009230296A - Document retrieval system

Info

Publication number: JP2009230296A
Application number: JP2008072830A
Authority: JP
Inventors: Toshiko Aizono; 敏子相薗; Akinori Koike; 彰規小池
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-03-21
Filing date: 2008-03-21
Publication date: 2009-10-08

Abstract

PROBLEM TO BE SOLVED: To acquire a document adapted to a retrieval key and a document relevant to the adapted document in the case of retrieving the document. SOLUTION: A document retrieval server 1 is provided with: a document group generation part 11a for generating a document group by gathering quotation origin documents quoting the same sections of quotation destination documents based on inter-document quotation relation; a document group retrieval index generation part 11b for generating a retrieval index by extracting feature words from the generated document group; a document group retrieval part 11c for retrieving a document group retrieval index; and a document retrieval part 11h for retrieving the document retrieval index for retrieving the document included in the retrieval target document group, and configured to transmit the acquired document group retrieval result 13b and a document retrieval result 13d to a document retrieval client 2. A document retrieval client displays the received document group retrieval result 13b and the document retrieval result 13d in association at a retrieval result display part 11g. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a document search system, and more specifically to a system that generates document groups from a set of documents to be searched and builds a search index over those groups.

In document search, it is important that users can find the documents they want efficiently and without omission. To this end, Elsevier's bibliographic search service "SCOPUS", described in Non-Patent Document 1, provides a function that lists the documents having a citation relationship with a hit document. Because this function lets the user easily obtain documents related to a hit, it can reduce search omissions. The digital library service of the ACM (Association for Computing Machinery), offered on the site described in Non-Patent Document 2, likewise displays links to documents that have a citation relationship with a hit.

The document search device described in Patent Document 1 classifies the documents in a search result according to their citation relationships. This lets the user view related documents in the result together, so the result can be checked efficiently.

Non-Patent Document 1: Yasushi Adachi. "Academic Information Navigation Service: Scopus". Information Management. Vol. 47, No. 8, (2004), 558-562.
Non-Patent Document 2: ACM Portal http://portal.acm.org/portal
Patent Document 1: JP 2007-328714 A

The related documents displayed by the techniques of Non-Patent Documents 1 and 2 do not necessarily match the search key. For a user who is searching with that key, such documents can be noise.

The technique of Patent Document 1 classifies only documents that hit the search key. Moreover, the documents that have a citation relationship with a given document are not necessarily cited from the same viewpoint. A document group formed by classifying documents on citation relationships alone may therefore mix diverse viewpoints and be cumbersome to review.

The document search system of the present invention has document group generation means, which generates document groups from the set of documents to be searched on the basis of the citation relationships between documents, and document group search index generation means, which extracts feature words from the documents in a citation relationship and generates a search index for the document groups. Here, treating a document that is cited by another document as a cited document and a document that cites another document as a citing document, the document group generation means generates a document group by gathering the documents that share the same cited document.

According to the present invention, documents that match a search key, and documents related to them, can be viewed efficiently.

Embodiments of the present invention are described below with reference to the drawings.
The document search apparatus of the present invention generates document groups from the set of documents to be searched on the basis of the citation relationships between documents, extracts feature words from the documents in a citation relationship, and generates a search index for the document groups.

FIG. 1 outlines the procedure for generating a search index in document search: FIG. 1(a) outlines the procedure in a general document search method, and FIG. 1(b) outlines the procedure in the present invention. As FIG. 1(a) shows, a general document search generates a search index for each document in the set to be searched. In the present invention, by contrast, as FIG. 1(b) shows, document groups are generated from the set of documents to be searched on the basis of the citation relationships between documents, and the search index is generated by treating each document group as a single document.

The citation relationships between documents in the present invention are now described in detail with reference to FIGS. 2 and 3.

FIG. 2 shows an example of documents in a citation relationship, here a citation between papers. In the figure, the part of document 1 enclosed by a dotted line, "According to reference [1], long-term ...", cites another paper. A document that cites another document in this way is hereinafter called a "citing document" (citation source document). Document 2 in FIG. 2, on the other hand, is the document cited as "reference [1]" in that passage. A document that is cited by another document in this way is hereinafter called a "cited document" (citation destination document). The citing passage of document 1 and the cited passage of document 2 in the figure are described later. A relationship between documents that is defined by a document's author, as in the papers of FIG. 2, is hereinafter called an "author citation".

In some cases, however, the relationship between documents is defined by someone other than a document's author. FIG. 3 shows an example of such a relationship, specifically an examination citation for a patent application. In the figure, document 3 is a notice of reasons for refusal issued for a patent application, document 4 is the patent document under examination named in that notice, and document 5 is the reference cited as the reason for rejecting the patent of document 4. In the notice of reasons for refusal (document 3), the dotted-line part "claim 1" corresponds to the dotted-line part of document 4, and "described in [0027] of Cited Document 1 ..." corresponds to the dotted-line part of document 5; the notice thus states that document 5 is related to document 4. The present invention also treats such a relationship, defined by someone other than the author, as a citation relationship. When a document (document 5 in the figure, the cited reference) is defined as related to another document (document 4 in the figure, the patent under examination), the document under examination is called the "citing document" and the cited reference is called the "cited document".

The document group generation process gathers documents that are in the citation relationships described above into document groups, and a search index is generated from them. When generating the document groups of FIG. 1(b), if the set to be searched contains a document that has no citation relationship with any other document, one option is to generate a document group consisting of that single document.

As described above, a citation relationship is a relationship between documents defined by a document's author or by a person who refers to the documents, and is therefore highly reliable. The present invention generates document groups on the basis of such reliable relationships and extracts feature words from them. It is generally known that, when generating a search index for a document, the wider the range over which the index is built, that is, the wider the range of text from which feature words are extracted, the fewer the search omissions. Accordingly, if the documents in a citation relationship with a given document are gathered into a document group and the search index is generated by treating the group as a single document, fewer search omissions can be expected for that document. In addition, the user can view related documents efficiently.
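The difference between indexing each document and indexing each document group can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper `build_index`, the whitespace tokenizer, and the sample documents are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def build_index(units):
    """Build a toy inverted index: term -> set of unit ids.

    `units` maps a unit id (a document id or a group id) to its text.
    """
    index = defaultdict(set)
    for unit_id, text in units.items():
        for term in text.lower().split():
            index[term].add(unit_id)
    return index

# Hypothetical documents; D1 and D3 both cite D2.
docs = {"D1": "drug trial", "D2": "symptom relief", "D3": "drug dosage"}
groups = {"G1": ["D2", "D1", "D3"]}

# FIG. 1(a) style: one index entry per document.
doc_index = build_index(docs)

# FIG. 1(b) style: each group is indexed as if it were one document.
group_texts = {g: " ".join(docs[d] for d in members) for g, members in groups.items()}
group_index = build_index(group_texts)
```

With the per-document index, the key "symptom" reaches only D2; with the group index it reaches G1, and through it the citing documents D1 and D3 as well.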

In the document group generation process, documents that share the same cited document are gathered into a document group. The generation of document groups on the basis of citation relationships, centered on the cited document, is described next.

Some documents are cited often by other documents. Such documents tend to be highly regarded in their field or to describe matters that are important in the field. If the document group generation process collects, for a given document (the cited document), the documents that cite it (the citing documents), the resulting group is likely to consist of such a highly regarded or important document together with its related documents. The present invention focuses on this point and generates a document group by gathering the documents that share the same cited document. This lets the user efficiently search for documents that are frequently cited in a field.
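A minimal sketch of this grouping step follows. It assumes the citation relationships are available as `(citing_id, cited_id)` pairs; that input format, the function name `build_groups`, and the sample ids are illustrative assumptions. Documents with no citation relationship become single-document groups, which is one of the options the description mentions.

```python
from collections import defaultdict

def build_groups(citations, all_docs):
    """Group documents by shared cited document.

    citations: iterable of (citing_id, cited_id) pairs (assumed format).
    Returns a list of groups; each group lists the cited document first,
    followed by the documents that cite it.
    """
    citations = list(citations)
    citing_by_cited = defaultdict(list)
    for citing_id, cited_id in citations:
        if citing_id not in citing_by_cited[cited_id]:
            citing_by_cited[cited_id].append(citing_id)

    groups = [[cited] + citing for cited, citing in citing_by_cited.items()]

    # Documents that take part in no citation relationship form
    # single-document groups.
    linked = {doc for pair in citations for doc in pair}
    groups.extend([doc] for doc in all_docs if doc not in linked)
    return groups

citations = [("D1", "D2"), ("D3", "D2"), ("D4", "D5")]
groups = build_groups(citations, ["D1", "D2", "D3", "D4", "D5", "D6"])
```

Here D1 and D3 end up in the same group because both cite D2, while D6 forms a group on its own.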

The document group generation process may include a cited-part acquisition process, which acquires the part of a cited document that is cited by other documents, and a document aggregation process, which generates a document group by gathering the documents whose acquired cited part is the same. In the present invention, the part of a citing document that cites the cited document is called the "citing passage", and the part of the cited document that is cited by the citing document is called the "cited passage". An example follows.

In FIG. 2, shown earlier, the part of document 1 enclosed by a dotted line cites document 2, the cited document. Such a part is called a "citing passage". In the figure, the citing passage of document 1 cites document 2 with regard to the improvement of symptoms through administration of a drug. The part of document 2 enclosed by a dotted line, in turn, describes the improvement of symptoms through administration of the drug and corresponds to that citing passage. A part containing a description that is cited by another document in this way is called a "cited passage".

In FIG. 3, which shows an example of a citation relationship between patent documents, document 3, the notice of reasons for refusal, cites paragraph [0027] of the cited reference as the reason for rejecting "claim 1" of the patent under examination. In a citation relationship defined by someone other than the author, as here, the corresponding parts of the citing document and the cited document are likewise called the "citing passage" and the "cited passage". Specifically, in FIG. 3, claim 1 of the patent under examination, that is, the part of document 4 enclosed by a dotted line, is the "citing passage", and paragraph [0027] of the cited reference, that is, the part of document 5 enclosed by a dotted line, is the "cited passage".

In generating document groups on the basis of the citation relationships shown in FIG. 1, a document group generation process that takes citing and cited passages into account generates a document group by collecting the citing documents that have the same cited document and that contain a citing passage corresponding to the same cited passage in that document.
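Passage-aware grouping can be sketched by keying each group on the pair (cited document, cited passage) instead of on the cited document alone. The tuple layout of `links` and the passage labels such as "para 5" are assumptions made for this example.

```python
from collections import defaultdict

def group_by_cited_passage(links):
    """Group citing documents by (cited document, cited passage).

    links: iterable of (citing_id, citing_passage, cited_id, cited_passage)
    tuples (assumed format). Returns a dict keyed by (cited_id, cited_passage).
    """
    groups = defaultdict(list)
    for citing_id, citing_passage, cited_id, cited_passage in links:
        groups[(cited_id, cited_passage)].append((citing_id, citing_passage))
    return dict(groups)

links = [
    ("D1", "sec 2", "D2", "para 5"),
    ("D3", "sec 1", "D2", "para 5"),
    ("D4", "sec 4", "D2", "para 9"),
]
groups = group_by_cited_passage(links)
```

All three documents cite D2, but D4 cites a different passage, so it lands in a separate group from D1 and D3.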

The citing passages of citing documents and the corresponding cited passages of cited documents may be extracted manually from the set of documents in advance, or extracted automatically by analyzing the documents in a citation relationship. For example, when a document is registered in a database, the passages that the author cites may be registered at the same time. Alternatively, when the expressions that cite a document follow a fixed pattern, such as "[xxxx] of Cited Document 1" in the notice of reasons for refusal shown in FIG. 3, the pattern can be used to extract the citing and cited passages.
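Pattern-based extraction of this kind could be sketched with regular expressions. The two patterns below are assumptions modeled on the expressions quoted from FIG. 3 ("請求項１" for a claim and "引用文献１の［００２７］" for a paragraph of a cited reference); the sample sentence is made up for illustration, and real notices would need broader patterns. Full-width digits and brackets are normalized with NFKC before matching.

```python
import re
import unicodedata

# Assumed patterns, applied after NFKC normalization.
CLAIM_RE = re.compile(r"請求項(\d+)")                # "claim N"
CITED_RE = re.compile(r"引用文献(\d+)の\[(\d{4})\]")  # "[xxxx] of Cited Document N"

def extract_passage_links(notice_text):
    """Return (claim numbers, [(reference number, paragraph id), ...])."""
    text = unicodedata.normalize("NFKC", notice_text)
    claims = [int(n) for n in CLAIM_RE.findall(text)]
    cited = [(int(ref), para) for ref, para in CITED_RE.findall(text)]
    return claims, cited

# Made-up sentence in the style of a notice of reasons for refusal.
sample = "請求項１に係る発明は、引用文献１の［００２７］に記載の発明と同一である。"
claims, cited = extract_passage_links(sample)
```

In this sketch the claim numbers identify citing passages in the document under examination, and the (reference, paragraph) pairs identify cited passages.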

A document generally contains several topics. Consequently, multiple documents (citing documents) that cite the same document (the cited document) do not necessarily cite the same topic; each may cite a different topic in the cited document. Because a document group generation process that takes citing and cited passages into account gathers the citing documents that share the same cited passage in the cited document, the resulting group is more likely to consist of documents that cite the same topic. This lets the user find the desired documents efficiently.

The document group generation process may also include a cited document division process, which divides a cited document into parts, and a document allocation process, which generates document groups by allocating citing documents to the parts of the divided cited document.

As noted above, cited passages may be extracted manually or automatically. Compared with citing passages, however, cited passages can be hard to extract, for example because the citing document does not state them explicitly. In such cases the present invention generates document groups by dividing the cited document and allocating the citing documents to its parts.

This is described with reference to FIG. 4. First, the citing documents that cite the same cited document are collected from the set of documents to be searched to form provisional document groups ((i) in the figure). The following is then repeated for each provisional document group. First, the cited document is divided ((ii) in the figure); the means of division are described later. Next, the citing documents in the same provisional document group are allocated to the parts of the divided cited document ((iii) in the figure). Specifically, for each citing document in the same provisional group as the cited document, the similarity between the citing document and each part of the divided cited document is computed, and the citing document is allocated to the part with the highest similarity. Finally, a document group is generated consisting of the cited document and the citing documents allocated to the same part of the cited document ((iv) in the figure). Parts of the divided cited document to which no citing document was allocated are not included in any document group. With this method, even when the cited passage of a cited document is unknown, document groups consisting of a part of the cited document and its citing documents can be generated at low cost.
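Step (iii), allocating each citing document to the most similar part of the cited document, might look like the sketch below. The description does not say which similarity measure is used; cosine similarity over bag-of-words counts is an assumption here, as are the tokenizer, the function names, and the sample texts.

```python
import math
import re
from collections import Counter, defaultdict

def bag_of_words(text):
    """Lowercased word counts (assumed tokenizer for English sample text)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def allocate(cited_parts, citing_docs):
    """Allocate each citing document to the most similar cited part.

    cited_parts: list of part texts; citing_docs: dict id -> text.
    Returns {part index: [citing ids]}. Parts that receive no document are
    absent from the result, so they belong to no group. Ties (including
    all-zero similarity) go to the earliest part.
    """
    part_vecs = [bag_of_words(p) for p in cited_parts]
    allocation = defaultdict(list)
    for doc_id, text in citing_docs.items():
        vec = bag_of_words(text)
        scores = [cosine(vec, pv) for pv in part_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        allocation[best].append(doc_id)
    return dict(allocation)

cited_parts = [
    "long term administration of the drug improved symptoms",
    "the index is stored on a disk array",
]
citing_docs = {
    "D1": "symptoms improved after drug administration",
    "D3": "disk storage of the index",
    "D4": "drug dose and symptoms",
}
allocation = allocate(cited_parts, citing_docs)
```

Each entry of `allocation`, together with the corresponding part of the cited document, corresponds to one document group in step (iv).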

One example of a cited document division method divides the document according to its document structure. Specifically, the document is divided into chapters, sections, or paragraphs, using as cues either the tags attached to the document that indicate its structure or the document layout. Alternatively, when the cited document is a patent document, the claims section may be taken as the target of division and split into individual claims, each of which becomes a part of the cited document. This method makes it easy to divide a cited document. It also has the advantage that, because each resulting part coincides with a structural unit of the original cited document (a chapter, section, paragraph, and so on), the user can easily grasp how the part relates structurally to the original.

Another example of a cited document division method is cited document term clustering, which divides the document by clustering its feature words. Specifically, feature words are extracted from the cited document, a term clustering technique is applied to them, and each resulting term cluster is regarded as a part of the cited document. This method also makes it easy to divide a cited document. It has the further advantage that, because each resulting part represents a topic contained in the cited document, the user can easily grasp how the part relates in content to the original.

The document allocation process may include a citing document clustering process, which clusters the multiple citing documents that cite the same document, and a citing document cluster allocation process, which generates document groups by allocating the clusters of citing documents to the parts of the cited document produced by the cited document division process.

In allocating citing documents to the parts of the cited document, shown in (iii) of FIG. 4, the citing documents may be allocated one at a time as described above, or they may first be classified with a document clustering technique and then allocated cluster by cluster. Because the citing documents in a cluster are semantically related to one another, the resulting document group is then more likely to consist of semantically related documents.

The document group search index generation process may generate the search index for a document group by extracting feature words from the cited part of the cited document, acquired by the cited-part acquisition means, and from the parts of the citing documents that cite that cited part. More specifically, the feature words can be extracted either by extracting words from the cited passage and from each citing passage separately and then merging the extracted words, or by first merging the text of the cited passage and the citing passages into a provisional document and then extracting words from that document. The former approach allows words to be weighted by, for example, their frequency of occurrence in each document. The latter allows the weight of a word to reflect its frequency of occurrence across the document group as a whole.
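The two extraction orders can be contrasted in a short sketch. The whitespace tokenizer and the length-normalized term frequency used in the first function are assumptions chosen for illustration; the description only says that per-document or group-wide frequencies can inform the weights.

```python
from collections import Counter

def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def extract_then_merge(passages):
    """Extract words per passage, weight by per-passage relative frequency,
    then merge the weighted words."""
    merged = Counter()
    for passage in passages:
        tokens = tokenize(passage)
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            merged[term] += count / len(tokens)
    return merged

def merge_then_extract(passages):
    """Concatenate the passages into one provisional document, then count
    words over the whole group."""
    return Counter(tokenize(" ".join(passages)))

# First passage plays the role of the cited passage, the second a citing one.
passages = ["drug drug symptoms", "symptoms improved"]
per_passage = extract_then_merge(passages)
whole_group = merge_then_extract(passages)
```

In the first result a word's weight depends on how prominent it is within each passage; in the second it depends only on its total count across the group.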

Extracting feature words from the related parts of documents in a citation relationship in this way makes it possible to generate a search index that represents the characteristics of the document group well. This lets the user efficiently obtain the document groups that match the search key.

When feature words are extracted from the cited part of the cited document and from the parts of the citing documents, the words that appear in the cited part may be given a higher weight before the feature words are extracted and the search index for the document group is generated.

Every citing document in a document group is related to the cited passage of the cited document. The cited passage of the cited document is therefore positioned as the center of the documents in the group. Giving a high weight to the words that appear in this central cited passage makes it possible to generate a search index that represents the characteristics of the document group well.
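A minimal sketch of this weighting follows. The boost factor of 2.0, the function name, and the sample passages are assumptions; the description says only that words from the cited passage receive a higher weight.

```python
from collections import Counter

def weighted_terms(cited_passage, citing_passages, cited_boost=2.0):
    """Accumulate term weights, counting each occurrence in the cited
    passage `cited_boost` times as much as an occurrence in a citing
    passage (the boost value is an assumed parameter)."""
    weights = Counter()
    for term in cited_passage.lower().split():
        weights[term] += cited_boost
    for passage in citing_passages:
        for term in passage.lower().split():
            weights[term] += 1.0
    return weights

weights = weighted_terms("drug symptoms", ["drug trial", "trial design"])
```

With these inputs "symptoms" appears once, only in the cited passage, yet it carries the same weight as "trial", which appears twice in citing passages.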

Instead of extracting feature words from the cited part acquired by the cited-part acquisition means and from the parts of the citing documents that cite it, as described above, the document group search index generation process may generate the search index for a document group by extracting feature words from a part of the cited document produced by the document division process and from the documents allocated to that part by the document allocation process.

This method, too, extracts feature words from the related parts of documents in a citation relationship, so it can generate a search index that represents the characteristics of the document group well. This lets the user efficiently obtain the document groups that match the search key.

When feature words are extracted from a part of the cited document and from the documents allocated to that part, the words that appear in the part of the cited document may be given a higher weight before the feature words are extracted and the search index for the document group is generated. In a document group produced by the document group generation process, the part of the cited document is positioned as the center of the relationship among the citing documents. At the same time, because the cited passage is only a portion of a document, fewer words are expected to be extracted from it than from the citing documents. This method compensates by giving a high weight to the words that appear in the cited passage, which makes it possible to generate a search index that represents the characteristics of the document group well.

The document search apparatus of the present invention has document group search means, which searches the document group search index generated by the document group search index generation means, and document group search result display means, which displays the document group search result obtained by the document group search means. When displaying a document group, the document group search result display means lists the cited document and the citing documents, displays the cited passage of the cited document so that it can be identified easily, and highlights, within the citing documents, the words that appear in the cited passage.

As described above, the cited passage of the cited document occupies the central position among the documents in a document group. With the document group search result display described above, the user can easily consult the cited passage, which is a part of the cited document, and can therefore grasp the characteristics of the document group efficiently. In addition, because the words that appear in the cited passage are highlighted in the citing documents, the user can easily grasp how each citing document relates to the cited passage. Here, the cited passage of a cited document refers to a part of the cited document, or to a term cluster whose elements are words appearing in the cited document.

The document search apparatus of the present invention also has document group search means, which searches the document group search index generated by the document group search index generation means; document search means, which searches a document search index generated by extracting feature words from each document in the set to be searched; and document search result display means, which displays the document group search result obtained by the document group search means side by side with the document search result obtained by the document search means. In this case the user can consult multiple search results for the same set of documents, and can therefore find the desired documents efficiently.

文書検索結果表示手段は、文書群検索手段により取得した文書群検索結果に含まれる引用元文書を、文書検索手段により取得した文書検索結果におけるランクで並び替えて表示してもよい。文書群には、ひとつ以上の引用元文書が含まれる。これら引用元文書を入力検索キーとの適合度順に表示することにより、文書群として入力に適合していて、かつ文書としても適合しているものを容易に参照することが可能となる。 The document search result display unit may display the citation source documents included in the document group search result acquired by the document group search unit by rearranging them according to the rank in the document search result acquired by the document search unit. The document group includes one or more citation documents. By displaying these citation documents in the order of suitability with the input search key, it is possible to easily refer to documents that are suitable for input as a document group and also suitable as documents.
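The reordering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `order_by_document_rank`, the rank dictionary, and the document number `doc071200` are assumptions introduced for the example, and placing unranked documents last is a design choice the text does not specify.

```python
def order_by_document_rank(citing_docs, document_ranks):
    """Sort a group's citation source documents by their rank in the
    per-document search result (rank 1 is the best match).

    Documents that did not appear in the document search result are
    placed last; that choice is an assumption, not stated in the source.
    """
    unranked = float("inf")
    return sorted(citing_docs, key=lambda doc_id: document_ranks.get(doc_id, unranked))


# Citation source documents of one document group (doc071200 is hypothetical).
group_members = ["doc070041", "doc068344", "doc071200"]
# Hypothetical ranks taken from the per-document search result.
document_ranks = {"doc068344": 1, "doc070041": 4}

ordered = order_by_document_rank(group_members, document_ranks)
print(ordered)
```

Because `sorted` is stable, documents with equal rank keep the order they had within the group.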

文書検索結果表示手段は、また、文書検索手段により取得した文書検索結果の表示において、文書群検索手段により取得した文書群検索結果におけるランクを表示してもよい。この場合、ユーザは、文書群検索インデックスによる適合度のランクと、文書検索インデックスによる適合度のランクを容易に比較することができる。これにより、文書としてランクが高く、かつ文書群でもランクが高い文書は所望の文書である可能性が高い、あるいは文書としては低いランクでも文書群でランクが高ければ一応参照するなど、参照の際の参考情報が得られるので、参照漏れや効率化に有用である。 The document search result display means may also display, when displaying the document search result acquired by the document search means, the rank in the document group search result acquired by the document group search means. In this case, the user can easily compare the relevance rank based on the document group search index with the relevance rank based on the document search index. This gives the user guidance on what to read: a document that ranks high both as a document and as part of a document group is likely to be a desired document, while a document that ranks low as a document but high as part of a document group may still be worth checking. This helps reduce overlooked documents and makes reviewing more efficient.

以下、本発明を実施例によって詳細に説明する。
［実施例１］
図５を用いて本発明の第一の実施例にかかるシステムの全体構成を示し、続いて図６を用いて本発明の第一の実施例にかかるシステムを構成するコンピュータのハードウェア構成を示す。次に、図７〜図１２を用いて文書検索システムのモジュール構成及び各構成要素の詳細を説明し、図１３〜図１５を用いて本発明の第一の実施例にかかる文書検索システムの処理手順について述べる。最後に、図１６及び図１７を用いて本発明の第一の実施例にかかる文書検索システムの検索結果表示画面の一例を示す。 Hereinafter, the present invention will be described in detail by way of examples.
[Example 1]
FIG. 5 shows the overall configuration of the system according to the first embodiment of the present invention, and FIG. 6 shows the hardware configuration of the computers constituting that system. Next, the module configuration of the document search system and the details of each component are described with reference to FIGS. 7 to 12, and the processing procedure of the document search system according to the first embodiment is described with reference to FIGS. 13 to 15. Finally, an example of the search result display screen of the document search system according to the first embodiment is shown with reference to FIGS. 16 and 17.

（システムの全体構成）
まず、図５を用いて本発明の第一の実施例にかかるシステムの全体構成について説明する。図５に示すように文書検索システムは、文書検索サーバ１、文書検索クライアント２、及び通信回線３から構成される。文書検索サーバ１は、検索対象の文書集合及び検索インデックスを格納し、前記検索インデックスを検索するコンピュータである。文書検索クライアント２は、文書の検索を行うユーザが用いるコンピュータであり、文書検索サーバ１に格納された文書を検索するために用いる。通信回線３は、コンピュータ間にデータを受け渡す伝送路であり、文書検索サーバ１及び文書検索クライアント２を接続しており、有線、無線を問わない。具体的には、ＬＡＮやインターネットを指す。なお、図５では文書検索サーバ１は１台の構成であるが、これに限定しない。文書検索サーバ１は複数のコンピュータから構成されてもよい。 (Overall system configuration)
First, the overall configuration of the system according to the first embodiment of the present invention will be described with reference to FIG. 5. As shown in FIG. 5, the document search system includes a document search server 1, a document search client 2, and a communication line 3. The document search server 1 is a computer that stores the document set to be searched and a search index, and searches that index. The document search client 2 is a computer used by a user who searches for documents, and is used to search the documents stored in the document search server 1. The communication line 3 is a transmission path for transferring data between computers; it connects the document search server 1 and the document search client 2 and may be wired or wireless. Specifically, it is a LAN or the Internet. Although FIG. 5 shows a single document search server 1, the present invention is not limited to this; the document search server 1 may be composed of a plurality of computers.

（ハードウェア構成）
続いて図６を用いて、本発明の第一の実施例にかかる文書検索サーバ１、及び文書検索クライアント２のハードウェア構成について説明する。これらはすべて同一の構成であるため、ここでは図６に示す文書検索サーバ１のハードウェア構成図を用いて説明する。 (Hardware configuration)
Next, the hardware configuration of the document search server 1 and the document search client 2 according to the first embodiment of the present invention will be described with reference to FIG. 6. Since both have the same configuration, the description here uses the hardware configuration diagram of the document search server 1 shown in FIG. 6.

図６に示すように、文書検索サーバ１は、ＣＰＵ１０、ハードディスク２０、メモリ３０、ディスプレイ４０、ディスプレイ制御部４１、キーボード５０、キーボード制御部５１、マウス６０、マウス制御部６１、及びバス７０から構成される。ＣＰＵ１０は、データの入出力、読み込み、格納及び各種処理を実行する。ハードディスク２０はデータを保存する装置、メモリ３０はプログラム及びデータをロードして記憶する装置である。ディスプレイ４０はユーザにデータを表示する装置であり、ディスプレイ制御部４１によって制御される。キーボード５０及びマウス６０はユーザからの入力を受け付ける装置であり、それぞれキーボード制御部５１及びマウス制御部６１によって制御される。バス７０は、各構成要素間にデータを受け渡す。 As shown in FIG. 6, the document search server 1 consists of a CPU 10, a hard disk 20, a memory 30, a display 40, a display control unit 41, a keyboard 50, a keyboard control unit 51, a mouse 60, a mouse control unit 61, and a bus 70. The CPU 10 performs data input/output, reading, storage, and various other processes. The hard disk 20 is a device that stores data, and the memory 30 is a device into which programs and data are loaded and held. The display 40 is a device that displays data to the user, and is controlled by the display control unit 41. The keyboard 50 and the mouse 60 are devices that receive input from the user, and are controlled by the keyboard control unit 51 and the mouse control unit 61, respectively. The bus 70 passes data between the components.

（モジュール構成）
次に、図７〜図１２を用いて、本発明の第一の実施例にかかる文書検索システムのモジュール構成について説明する。図７は、本発明の第一の実施例にかかる文書検索システムのモジュール構成を示す図である。図７に示すように、文書検索システムは次の３種類のモジュールから構成される。すなわち、各種処理を実行する処理部（画面中、矩形で示す。以下、他の実施例の説明においても同じ。）、データを格納するデータ格納部（ドラム形。以下、同じ。）、及びテンポラリに生成されるデータを格納するテンポラリデータ部（テープ形。以下、同じ。）である。以下、図７に示す各モジュールについて説明する。 (Module configuration)
Next, the module configuration of the document search system according to the first embodiment of the present invention will be described with reference to FIGS. 7 to 12. FIG. 7 is a diagram showing the module configuration of the document search system according to the first embodiment of the present invention. As shown in FIG. 7, the document search system is composed of the following three types of modules: processing units that execute various processes (drawn as rectangles in the figure; the same applies in the descriptions of the other embodiments), data storage units that store data (drawn as drums; the same applies hereinafter), and temporary data units that store temporarily generated data (drawn as tapes; the same applies hereinafter). Each module shown in FIG. 7 is described below.

（文書検索システムのモジュール構成：処理部）
図７に示すように、文書検索システムの処理部は、文書検索サーバ１における文書群生成部１１ａ、文書群検索インデックス生成部１１ｂ、文書群検索部１１ｃ、サーバ通信部１１ｄ、及び文書検索クライアント２におけるクライアント通信部１１ｅ、検索キー受付部１１ｆならびに検索結果表示部１１ｇから構成される。 (Module configuration of document search system: processing section)
As shown in FIG. 7, the processing units of the document search system consist of a document group generation unit 11a, a document group search index generation unit 11b, a document group search unit 11c, and a server communication unit 11d in the document search server 1, and a client communication unit 11e, a search key receiving unit 11f, and a search result display unit 11g in the document search client 2.

文書検索サーバ１における文書群生成部１１ａは、文書格納部１２ａに格納された文書から文書間の引用関係に基づき文書群を生成し、文書群格納部１２ｂに格納する。当該文書群生成部１１ａは、さらに引用先文書分割部１１ａ１及び文書割付部１１ａ２から構成される。引用先文書分割部１１ａは引用先文書構造分割部１１ａ１１を有し、引用関係にある文書の集合に含まれる引用先文書を文書の構造に基づいて分割する。文書割付部１１ａ２は、引用関係にある文書の集合のうち引用元文書を前記分割した引用先文書の部分に割り付ける。文書群検索インデックス生成部１１ｂは、文書群格納部１２ｂに格納された文書群の検索インデックスを生成し、文書群検索インデックス格納部１２ｃに格納する。文書群検索部１１ｃは、検索キーデータ１３ａに格納された検索キーに基づき文書群検索インデックス格納部１２ｃに格納された文書群検索インデックスを検索し、その結果を文書群検索結果データ１３ｂに格納する。サーバ通信部１１ｄは、文書検索クライアント２から送信された検索キーデータ１３ａを受信して、文書群検索結果データ１３ｂを文書検索クライアント２に送信する。 The document group generation unit 11a in the document search server 1 generates document groups from the documents stored in the document storage unit 12a based on the citation relationships between documents, and stores them in the document group storage unit 12b. The document group generation unit 11a further consists of a cited document division unit 11a1 and a document allocation unit 11a2. The cited document division unit 11a1 has a cited document structure division unit 11a11, and divides each cited document included in a set of documents having a citation relationship based on the structure of the document. The document allocation unit 11a2 allocates each citation source document in the set of documents having a citation relationship to one of the divided parts of the cited document. The document group search index generation unit 11b generates a search index for the document groups stored in the document group storage unit 12b and stores it in the document group search index storage unit 12c. The document group search unit 11c searches the document group search index stored in the document group search index storage unit 12c based on the search key stored in the search key data 13a, and stores the result in the document group search result data 13b. The server communication unit 11d receives the search key data 13a transmitted from the document search client 2 and transmits the document group search result data 13b to the document search client 2.

文書検索クライアント２におけるクライアント通信部１１ｅは、検索キーデータ１３ａを文書検索サーバ１に送信し、文書検索サーバ１から送信された文書群検索結果データ１３ｂを受信する。検索キー受付部１１ｆは、ユーザから検索キーを受け付けて検索キーデータ１３ａに格納する。検索結果表示部１１ｇは、文書群検索結果データ１３ｂに格納された文書群検索結果に格納された文書群検索結果をユーザに表示する。 The client communication unit 11e in the document search client 2 transmits the search key data 13a to the document search server 1 and receives the document group search result data 13b transmitted from the document search server 1. The search key receiving unit 11f receives a search key from the user and stores it in the search key data 13a. The search result display unit 11g displays to the user the document group search result stored in the document group search result data 13b.

上記処理部のうち文書群検索部１１ｃ、サーバ通信部１１ｄ、クライアント通信部１１ｅ及び検索キー受付部１１ｆは、一般的な文書検索技術を適用することにより実装可能であるので詳細な説明は省略する。文書群生成部１１ａ、文書群検索インデックス生成部１１ｂ、及び検索結果表示部１１ｇの詳細は、フローチャート又は画面例を用いて後述する。 Among the above processing units, the document group search unit 11c, the server communication unit 11d, the client communication unit 11e, and the search key reception unit 11f can be implemented by applying a general document search technology, and thus detailed description thereof is omitted. . Details of the document group generation unit 11a, the document group search index generation unit 11b, and the search result display unit 11g will be described later with reference to flowcharts or screen examples.

（文書検索システムのモジュール構成：データ格納部）
次に、図中にドラム形で示されるデータ格納部について説明する。図７に示すように、文書検索システムの文書検索サーバ１におけるデータ格納部は、文書格納部１２ａ、文書群格納部１２ｂ、及び文書群検索インデックス格納部１２ｃの３つがある。 (Module configuration of document retrieval system: data storage unit)
Next, a data storage unit indicated by a drum shape in the figure will be described. As shown in FIG. 7, there are three data storage units in the document search server 1 of the document search system: a document storage unit 12a, a document group storage unit 12b, and a document group search index storage unit 12c.

文書格納部１２ａは検索対象の文書を格納するエリア、文書群格納部１２ｂは文書格納部１２ａに格納されている文書から生成した文書群を格納するエリア、文書群検索インデックス格納部１２ｃは文書群格納部１２ｂに格納されている文書群を検索するためのインデックスを格納するエリアである。
ここで、各データ格納部の詳細な構成について図８〜図１０を用いて説明する。 The document storage unit 12a is an area for storing documents to be searched, the document group storage unit 12b is an area for storing document groups generated from the documents stored in the document storage unit 12a, and the document group search index storage unit 12c is a document group. This is an area for storing an index for searching a document group stored in the storage unit 12b.
Here, the detailed configuration of each data storage unit will be described with reference to FIGS. 8 to 10.

（文書格納部１２ａの構成）
図８は、文書格納部１２ａの構成を示す図である。図８に示すように、文書格納部１２ａは、文書番号１２ａ０１、テキスト１２ａ０２、及び引用先文書番号１２ａ０３から構成される。文書番号１２ａ０１は、文書を識別するための番号を格納する。テキスト１２ａ０２は、文書の文字列を格納する。引用先文書番号１２ａ０３は、文書が引用している文書の文書番号を格納する。より詳細には、論文などの文書において、参考文献として記載されている文書の書誌的な情報に基づき引用先文書を同定し、当該文書に対応する文書番号１２ａ０１を取得し、格納する。引用先文書の同定は、「コンピュータソフトウェア，Vol.13, No.1, pp.35-39, (1997)」などに記載の公知の技術を用いて行うことができる。 (Configuration of Document Storage Unit 12a)
FIG. 8 is a diagram showing the configuration of the document storage unit 12a. As shown in FIG. 8, the document storage unit 12a includes a document number 12a01, a text 12a02, and a cited document number 12a03. The document number 12a01 stores a number for identifying a document. The text 12a02 stores a character string of the document. The cited document number 12a03 stores the document number of the document cited by the document. More specifically, in a document such as a paper, a cited document is identified based on bibliographic information of a document described as a reference, and a document number 12a01 corresponding to the document is acquired and stored. The cited document can be identified using a known technique described in “Computer Software, Vol. 13, No. 1, pp. 35-39, (1997)”.

図８中１行目には、文書番号が「doc050789」である文書のタグ付のテキスト「<doc><title>門脈肺高血圧症の一例について</title>…」が格納されており、さらには当該文書で引用している文書の番号として「doc013024」、「doc027863」等が格納されている。 The first row in FIG. 8 stores the tagged text “<doc><title>On a case of portopulmonary hypertension</title>...” of the document whose document number is “doc050789”, together with “doc013024”, “doc027863”, and so on as the numbers of the documents cited in that document.
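As a rough sketch of the layout in FIG. 8, the document storage unit 12a can be modelled as a list of records with the three fields described above. The class name `DocumentRecord`, the helper `citing_documents`, and the placeholder text strings are assumptions made for illustration; the second and third records are inferred from the FIG. 9 example, where doc070041 and doc068344 cite doc050789, and their other contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    doc_number: str                      # document number 12a01
    text: str                            # tagged text 12a02
    cited_doc_numbers: list = field(default_factory=list)  # cited document numbers 12a03


# Text bodies are placeholders; only the numbers come from the figures.
store = [
    DocumentRecord("doc050789", "<doc><title>...</title>...</doc>",
                   ["doc013024", "doc027863"]),
    DocumentRecord("doc070041", "<doc>...</doc>", ["doc050789"]),
    DocumentRecord("doc068344", "<doc>...</doc>", ["doc050789"]),
]


def citing_documents(store, doc_number):
    """Return the numbers of all documents whose 12a03 field lists doc_number."""
    return [rec.doc_number for rec in store if doc_number in rec.cited_doc_numbers]


print(citing_documents(store, "doc050789"))
```

Looking up citation source documents by scanning field 12a03 in this way corresponds to what step 10104 of the document group generation procedure needs; a real system would likely keep a reverse index instead of scanning.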

（文書群格納部１２ｂの構成）
図９は、文書群格納部１２ｂの構成を示す図である。図９に示すように、文書群格納部１２ｂは、文書群番号１２ｂ０１、引用先文書番号１２ｂ０２、引用先箇所１２ｂ０３、及び引用元文書番号１２ｂ０４から構成される。文書群番号１２ｂ０１は、生成した文書群を識別する番号を格納する。引用先文書番号１２ｂ０２は、当該生成された文書群における引用先文書の番号を格納する。詳細には引用先文書番号１２ｂ０２は、引用先文書が格納されている文書格納部１２ａにおける文書番号１２ａ０１の値を格納する。引用先箇所１２ｂ０３は、前記引用先文書において、後述する引用元文書から引用されている部分の文字列を格納する。引用元文書番号１２ｂ０４は、前記引用先箇所を引用している文書の番号を格納する。詳細には引用元文書番号１２ｂ０４は、前記引用箇所を引用している文書が格納されている文書格納部１２ａにおける文書番号１２ａ０１を格納する。 (Configuration of Document Group Storage Unit 12b)
FIG. 9 is a diagram showing a configuration of the document group storage unit 12b. As shown in FIG. 9, the document group storage unit 12b includes a document group number 12b01, a citation destination document number 12b02, a citation destination location 12b03, and a citation source document number 12b04. The document group number 12b01 stores a number for identifying the generated document group. The cited document number 12b02 stores the number of the cited document in the generated document group. Specifically, the cited document number 12b02 stores the value of the document number 12a01 in the document storage unit 12a in which the cited document is stored. The citation location 12b03 stores a character string of a portion quoted from a later-described citation source document in the citation destination document. The citation source document number 12b04 stores the number of the document quoting the citation destination location. Specifically, the citation source document number 12b04 stores the document number 12a01 in the document storage unit 12a in which the document quoting the cited part is stored.

図９中１行目には、文書群番号１２ｂ０１が「cl048901」である文書群が格納されている。当該文書群の引用先文書番号１２ｂ０２は「doc050789」であり、当該引用先文書の引用先箇所１２ｂ０３は「＜ｐ＞症例検査の…＜／ｐ＞」である。さらに当該引用先箇所を引用している引用元文書番号１２ｂ０４は「doc070041」、「doc068344」である。 The first row in FIG. 9 stores the document group whose document group number 12b01 is “cl048901”. The cited document number 12b02 of this document group is “doc050789”, and the citation destination location 12b03 of that cited document is “<p>The case examination ...</p>”. The citation source document numbers 12b04 of the documents citing this location are “doc070041” and “doc068344”.

図中、３行目、４行目、５行目には、文書群番号１２ｂ０１がそれぞれ「cl048903」、「cl048904」、「cl048905」である文書群が格納されている。これら文書群の引用先文書番号１２ｂ０２は「doc050791」であり、同じである。一方で、これら３つの文書群の引用先箇所１２ｂ０３は、それぞれ「＜ｐ＞はじめに…」、「＜ｐ＞調査…」、「＜ｐ＞まとめ…」と異なる。これは、ひとつの引用先文書に複数の引用先箇所が含まれており、それぞれ別個に引用元文書が割り付けられて文書群が生成されたことを示している。 In the figure, the third, fourth, and fifth rows store the document groups whose document group numbers 12b01 are “cl048903”, “cl048904”, and “cl048905”, respectively. These document groups share the same cited document number 12b02, “doc050791”. Their citation destination locations 12b03, however, differ: “<p>Introduction...”, “<p>Research...”, and “<p>Summary...”, respectively. This shows that a single cited document contains several citation destination locations, and that citation source documents were allocated to each location separately to generate the document groups.
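The rows of FIG. 9 discussed above can be pictured with a small record type. This is only an illustrative sketch: the class name `DocumentGroup`, the helper `groups_by_cited_document`, and the citation source document numbers `doc_x1` to `doc_x3` are hypothetical, since the figure's values for those rows are not given in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DocumentGroup:
    group_number: str          # document group number 12b01
    cited_doc_number: str      # cited document number 12b02
    cited_portion: str         # citation destination location 12b03
    citing_doc_numbers: list   # citation source document numbers 12b04


groups = [
    # The citing document numbers below are hypothetical placeholders.
    DocumentGroup("cl048903", "doc050791", "<p>Introduction...", ["doc_x1"]),
    DocumentGroup("cl048904", "doc050791", "<p>Research...", ["doc_x2"]),
    DocumentGroup("cl048905", "doc050791", "<p>Summary...", ["doc_x3"]),
]


def groups_by_cited_document(groups):
    """Map each cited document number to the document groups built from it."""
    by_doc = defaultdict(list)
    for group in groups:
        by_doc[group.cited_doc_number].append(group.group_number)
    return dict(by_doc)


print(groups_by_cited_document(groups))
```

One cited document mapping to three group numbers reflects the point made above: a document group is keyed by the cited location, not by the cited document as a whole.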

（文書群検索インデックス格納部１２ｃの構成）
図１０は、文書群検索インデックス格納部１２ｃの構成を示す図である。図１０に示すように、文書群検索インデックス格納部１２ｃは、文書群番号１２ｃ０１、及び特徴語１２ｃ０２から構成される。文書群番号１２ｃ０１は、インデックスに対応する文書群の番号を格納する。ここで文書群の番号とは、文書群格納部１２ｂに格納された文書群の番号を指す。特徴語１２ｃ０２には、文書群を構成する引用先文書の引用先箇所及び当該引用先を引用している引用元文書から抽出した特徴語を格納する。文書群からの特徴語の抽出は、「情報処理学会論文誌、Vol.38, No.2, pp.299-309，(1997)」などに記載の文書から特徴語を抽出する公知の技術を適用することにより実装可能である。なお、図１０中、特徴語１２ｃ０２には、特徴語の文字列のみを記載したが、これに限らない。文書群における特徴語の出現頻度あるいは重要度を示す値などとともに格納してもよい。 (Configuration of Document Group Search Index Storage Unit 12c)
FIG. 10 is a diagram showing the configuration of the document group search index storage unit 12c. As shown in FIG. 10, the document group search index storage unit 12c includes a document group number 12c01 and a feature word 12c02. The document group number 12c01 stores the number of the document group corresponding to the index. Here, the document group number refers to the document group number stored in the document group storage unit 12b. In the feature word 12c02, the feature word extracted from the citation destination part of the citation destination document constituting the document group and the citation source document quoting the citation destination are stored. Extraction of feature words from a document group is performed by a known technique for extracting feature words from documents described in “Information Processing Society Journal, Vol.38, No.2, pp.299-309, (1997)”. It can be implemented by applying. In FIG. 10, only the character string of the feature word is described in the feature word 12c02, but this is not restrictive. You may store with the value etc. which show the appearance frequency or importance of the feature word in a document group.

図１０中１行目には、文書群番号１２ｃ０１が「cl048901」である文書群の検索インデックスが格納されている。具体的には特徴語１２ｃ０２に当該文書群の特徴語として「高血圧」、「診断」、「肺機能」等が格納されている。 The first row in FIG. 10 stores a search index of a document group whose document group number 12c01 is “cl048901”. Specifically, “high blood pressure”, “diagnosis”, “pulmonary function”, and the like are stored in the feature word 12c02 as feature words of the document group.

（文書検索システムのモジュール構成：テンポラリデータ部）
次に、図中にテープ形で示されるテンポラリデータ部について説明する。図７に示すように、文書検索システムのテンポラリデータ部は、検索キーデータ１３ａ、文書群検索結果データ１３ｂ、及び文書本文データ１３ｃの３つがある。 (Module structure of document search system: temporary data section)
Next, the temporary data portion shown in the form of a tape in the figure will be described. As shown in FIG. 7, there are three temporary data portions of the document search system: search key data 13a, document group search result data 13b, and document body data 13c.

検索キーデータ１３ａは、ユーザが文書検索クライアント２にて入力した文書を検索するための検索キーを格納する。文書群検索結果データ１３ｂは、文書群検索部１１ｃが検索キーデータ１３ａに基づいて文書群検索インデックス格納部１２ｃに格納された文書群検索インデックスを検索した結果を格納する。具体的には、検索キーデータ１３ａに適合する文書群の番号のリストがそれぞれ検索キーとの適合度とともに格納される。文書本文データ１３ｃは、検索結果に含まれる文書群において、ユーザが指定した文書の本文を格納する。 The search key data 13 a stores a search key for searching for a document input by the user using the document search client 2. The document group search result data 13b stores the result of the document group search unit 11c searching the document group search index stored in the document group search index storage unit 12c based on the search key data 13a. Specifically, a list of document group numbers that match the search key data 13a is stored together with the matching degree with the search key. The document text data 13c stores the text of the document specified by the user in the document group included in the search result.

ここで、各テンポラリデータ部のうち、検索キーデータ１３ａ及び文書群検索結果データ１３ｂの一例を、図１１〜図１２を用いて説明する。 Here, examples of the search key data 13a and the document group search result data 13b among the temporary data units will be described with reference to FIGS. 11 and 12.

（検索キーデータ１３ａの一例）
図１１は、検索キーデータ１３ａの一例を示す図である。ユーザが入力した検索キーとして、「高血圧症」、「肺機能」、「血管」が格納されている。なお図１１では、検索キーとして単語のみを示したが、各単語に重みを示す数値が付与されていてもよい。 (Example of search key data 13a)
FIG. 11 is a diagram illustrating an example of the search key data 13a. As search keys input by the user, “hypertension”, “pulmonary function”, and “blood vessel” are stored. In FIG. 11, only the word is shown as the search key, but a numerical value indicating the weight may be given to each word.

（文書群検索結果データ１３ｂの一例）
図１２は、文書群検索結果データ１３ｂの一例を示す図である。図中、検索キーにマッチする文書群の番号のリストとして「cl050989」、「cl029845」等が格納されている。なお、図１２では文書群番号と共に文書群の検索キーに対する適合度を示す数値「９．３２８」などが格納してあるが、これに限らない。検索キーに適合した順位などでもよい。 (Example of document group search result data 13b)
FIG. 12 is a diagram illustrating an example of the document group search result data 13b. In the figure, “cl050989”, “cl029845”, and the like are stored as a list of document group numbers that match the search key. In FIG. 12, the numerical value “9.328” indicating the degree of matching of the document group with the search key is stored together with the document group number. However, the present invention is not limited to this. The order suitable for the search key may be used.

（文書検索システムの処理手順）
図１３は、本発明の第一の実施例における文書検索システムの処理手順を示すフローチャートである。 (Processing procedure of document retrieval system)
FIG. 13 is a flowchart showing the processing procedure of the document search system in the first embodiment of the present invention.

図１３に示すように、文書検索システムは、まず文書検索サーバ１において文書格納部１２ａに格納された検索対象の文書から文書群を生成して文書群格納部１２ｂに格納し（１０１）、前記格納された文書群から特徴語を抽出して文書群の検索インデックスを生成して文書群検索インデックス格納部１２ｃに格納する（１０２）。ユーザが文書検索クライアント２において文書検索の終了を指示すれば処理を終了し（１０３）、そうでなければ検索キーの入力を受け付けて検索キーデータ１３ａに格納して文書検索サーバ１に送信する（１０４）。文書検索サーバ１は文書検索クライアント２が送信した検索キーデータ１３ａを受信して、前記検索キーデータ１３ａに格納されていた検索キーに基づいて文書群検索インデックス格納部１２ｃに格納された文書群検索インデックスを検索して結果を文書群検索結果データ１３ｂに格納して文書検索クライアント２に送信する（１０５）。文書検索クライアント２は前記文書検索サーバ１から送信された文書群検索結果データ１３ｂを受信して、格納されている文書群を適合度のランクと共に表示する。なお、ステップ１０１及びステップ１０２の処理は、文書検索サーバ１で一度実行しておけばよく、検索のたびに実行する必要はない。 As shown in FIG. 13, the document search system first generates a document group from a search target document stored in the document storage unit 12a in the document search server 1 and stores it in the document group storage unit 12b (101). A feature word is extracted from the stored document group, a search index for the document group is generated, and stored in the document group search index storage unit 12c (102). If the user instructs the document search client 2 to end the document search, the process ends (103). Otherwise, the input of the search key is accepted, stored in the search key data 13a, and transmitted to the document search server 1 ( 104). The document search server 1 receives the search key data 13a transmitted from the document search client 2, and searches for the document group stored in the document group search index storage unit 12c based on the search key stored in the search key data 13a. The index is searched and the result is stored in the document group search result data 13b and transmitted to the document search client 2 (105). The document search client 2 receives the document group search result data 13b transmitted from the document search server 1, and displays the stored document group together with the rank of the matching level. 
Note that the processing of step 101 and step 102 need only be executed once by the document search server 1 and does not need to be executed each time a search is performed.
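Step 105 can be sketched as a lookup over the document group search index. The source leaves the matching method to general document search technology, so the scoring used here (the sum of the weights of the search keys found among a group's feature words) is an assumption, as are the function name `search_document_groups` and the group `cl_hypothetical`. The feature words and search keys are English renderings of the FIG. 10 and FIG. 11 examples.

```python
def search_document_groups(index, search_keys):
    """Return (group number, score) pairs sorted by descending score.

    index:       group number -> set of feature words (FIG. 10 layout)
    search_keys: search term -> weight (FIG. 11 allows optional weights)
    """
    results = []
    for group_number, feature_words in index.items():
        # Assumed scoring: add the weight of every search key that is a feature word.
        score = sum(weight for term, weight in search_keys.items()
                    if term in feature_words)
        if score > 0:
            results.append((group_number, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


index = {
    "cl048901": {"high blood pressure", "diagnosis", "pulmonary function"},
    # Hypothetical second group, added so that the ranking is visible.
    "cl_hypothetical": {"blood vessel", "pulmonary function"},
}
search_keys = {"hypertension": 1.0, "pulmonary function": 1.0, "blood vessel": 1.0}

results = search_document_groups(index, search_keys)
print(results)
```

Exact string matching means "hypertension" does not match "high blood pressure" here; a practical system would normalize terms or use a proper retrieval model.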

ここで、上記処理のうち文書群生成ステップ１０１、及び文書群検索インデックス生成ステップ１０２の処理について図１４及び図１５に示す詳細なフローチャートを用いて説明する。 Here, the document group generation step 101 and the document group search index generation step 102 of the above processing will be described using the detailed flowcharts shown in FIGS. 14 and 15.

（文書群生成ステップ１０１の詳細な処理手順）
図１４は、文書群生成ステップ１０１の詳細な処理手順を示すフローチャートである。図１４に示すように、文書群生成ステップ１０１は、文書群を識別する番号を初期化し（１０１０１）、文書群格納部１２ｂのインデックスを示す変数ｉに０を、文書格納部１２ａのインデックスを示す変数ｊに１をセットして（１０１０２）、文書格納部１２ａのｊ番目の文書を取得する（１０１０３）。次に文書格納部１２ａを参照し、ｊ番目の文書番号１２ａ０１の値が引用先文書番号１２ａ０３に格納されている文書を引用元文書として取得する（１０１０４）。取得した引用元文書、すなわちｊ番目の文書を引用先とする文書の数が０よりも大きければステップ１０１０６に進み、それ以外はステップ１０１１２に進む（１０１０５）。ステップ１０１０６では文書群番号を追加し、文書格納部１２ａのｊ番目のテキスト１２ａ０２に格納されたテキストを、タグを利用して文書の構造ごとに分割する（１０１０７）。 (Detailed processing procedure of document group generation step 101)
FIG. 14 is a flowchart showing a detailed processing procedure of the document group generation step 101. As shown in FIG. 14, the document group generation step 101 initializes a number for identifying a document group (10101), sets 0 to a variable i indicating an index of the document group storage unit 12b, and indicates an index of the document storage unit 12a. The variable j is set to 1 (10102), and the jth document in the document storage unit 12a is acquired (10103). Next, the document storage unit 12a is referred to, and the document in which the value of the jth document number 12a01 is stored in the citation destination document number 12a03 is acquired as a citation source document (10104). If the number of acquired citation documents, that is, the document with the jth document as a citation destination is larger than 0, the process proceeds to step 10106, and otherwise, the process proceeds to step 10112 (10105). In step 10106, a document group number is added, and the text stored in the jth text 12a02 of the document storage unit 12a is divided for each document structure using a tag (10107).

次に、引用元文書の数を示す変数ｋに１をセットし（１０１０８）、ステップ１０１０４において取得した引用元文書のうちｋ番目の文書をステップ１０１０７において分割された引用先文書に割り付ける（１０１０９）。より詳細には、引用先文書がｘ個の部分に分割されたとき、ｋ番目の引用元文書とｘ個の部分との類似度をそれぞれ計算し、ｘ個の部分のうち類似度が最も高い部分にｋ番目の引用元文書を割り付ける。引用元文書と分割された文書の部分との類似度は、ベクトル空間法など文書間の類似度を求める公知の技術を適用することにより実現することができる。次にｋを１増やして（１０１１０）、ｋがステップ１０１０４において取得した引用元文書の数より大きくなればステップ１０１１２に進み、それ以外はステップ１０１０９に戻る（１０１１１）。ステップ１０１１２では、ｊを１増やして（１０１１２）、ｊが文書格納部１２ａに格納された文書数より大きくなればリターン、それ以外はステップ１０１０３に戻る（１０１１３）。 Next, 1 is set to a variable k indicating the number of citation source documents (10108), and the kth document among the citation source documents acquired in step 10104 is assigned to the citation destination document divided in step 10107 (10109). . More specifically, when the cited document is divided into x parts, the degree of similarity between the kth source document and the x parts is calculated, and the degree of similarity is the highest among the x parts. Assign the kth source document to the part. The similarity between the citation source document and the divided document portion can be realized by applying a known technique for obtaining the similarity between documents, such as a vector space method. Next, k is incremented by 1 (10110), and if k becomes larger than the number of the citation source documents acquired in step 10104, the process proceeds to step 10112, and otherwise, the process returns to step 10109 (10111). In step 10112, j is incremented by 1 (10112), and if j becomes larger than the number of documents stored in the document storage unit 12a, the process returns, otherwise returns to step 10103 (10113).
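Steps 10107 to 10111 can be sketched as follows for a single cited document. This is a simplified illustration under stated assumptions: the cited document is split on `<p>` tags only, similarity is a plain bag-of-words cosine standing in for the vector space method, and the function names, sample texts, and document numbers `docA` to `docC` are all hypothetical.

```python
import math
import re
from collections import Counter


def split_by_paragraph(tagged_text):
    """Step 10107 (simplified): split the cited document on <p>...</p> tags."""
    return re.findall(r"<p>(.*?)</p>", tagged_text, flags=re.DOTALL)


def cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def allocate_citing_documents(cited_text, citing_texts):
    """Steps 10108-10111: assign each citing document to its most similar part.

    Returns the parts and a mapping part index -> list of citing document numbers.
    On ties (including all-zero similarity) the earliest part wins.
    """
    parts = split_by_paragraph(cited_text)
    allocation = {}
    if not parts:
        return parts, allocation
    for doc_number, text in citing_texts.items():
        best = max(range(len(parts)), key=lambda i: cosine(parts[i], text))
        allocation.setdefault(best, []).append(doc_number)
    return parts, allocation


cited = ("<doc><p>survey of patients with lung disease</p>"
         "<p>diagnosis of hypertension by blood test</p></doc>")
citing = {
    "docA": "blood test results for hypertension",
    "docB": "lung disease patients in a survey",
    "docC": "hypertension diagnosis",
}
parts, allocation = allocate_citing_documents(cited, citing)
print(allocation)
```

Each entry of `allocation` corresponds to one document group: the cited part plus the citing documents assigned to it. Parts that attract no citing document produce no group in this sketch.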

（文書群検索インデックス生成ステップ１０２の詳細な処理手順）
図１５は、文書群検索インデックス生成ステップ１０２の詳細な処理手順を示すフローチャートである。図１５に示すように、文書群検索インデックス生成ステップ１０２は、文書群検索インデックス格納部１２ｃのインデックスを示す変数ｍ及び文書群格納部１２ｂのインデックスを示す変数ｎにそれぞれ０をセットし（１０２０１）、ｎを１増やして（１０２０２）、ｎが文書群格納部ｂに格納された文書群数よりも大きければリターン、それ以外はステップ１０２０４に進む（１０２０３）。ステップ１０２０４ではｍを１増やし、文書群格納部１２ｂのｎ番目の文書群番号１２ｂ０１に格納された文書群番号を、文書検索インデックス格納部１２ｃのｍ番目の文書群番号１２ｃ０１に格納し（１０２０５）、文書群格納部１２ｂのｎ番目の引用先箇所１２ｂ０３に格納された文字列から特徴語を抽出し（１０２０６）、さらに文書群格納部１２ｂのｎ番目の引用元文書番号１２ｂ０４を参照して、引用元文書を文書格納部１２ａから取得してそれぞれ特徴語を抽出し（１０２０７）、抽出した特徴語を文書群検索インデックス格納部１２ｃのｍ番目の特徴語１２ｃ０２に格納して（１０２０８）、ステップ１０２０２に戻る。 (Detailed processing procedure of document group search index generation step 102)
FIG. 15 is a flowchart showing the detailed processing procedure of the document group search index generation step 102. As shown in FIG. 15, step 102 sets to 0 both a variable m, the index into the document group search index storage unit 12c, and a variable n, the index into the document group storage unit 12b (10201), and increments n by 1 (10202). If n is larger than the number of document groups stored in the document group storage unit 12b, the procedure returns; otherwise it proceeds to step 10204 (10203). In step 10204, m is incremented by 1. The document group number stored in the n-th document group number 12b01 of the document group storage unit 12b is then stored in the m-th document group number 12c01 of the document group search index storage unit 12c (10205), and feature words are extracted from the character string stored in the n-th citation destination location 12b03 of the document group storage unit 12b (10206). Next, the citation source documents are obtained from the document storage unit 12a by referring to the n-th citation source document number 12b04 of the document group storage unit 12b, and feature words are extracted from each of them (10207). The extracted feature words are stored in the m-th feature word 12c02 of the document group search index storage unit 12c (10208), and the procedure returns to step 10202.
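The loop of FIG. 15 can be sketched as below. The feature word extractor is a deliberately naive stand-in (the most frequent non-stopword tokens) for the known technique the description refers to; the stopword list, the `top_n` parameter, the function names, and the sample texts are assumptions for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "of", "a", "in", "and", "for", "with"}


def extract_feature_words(text, top_n=3):
    """Naive feature word extraction: strip tags, drop stopwords, keep top_n by frequency."""
    plain = re.sub(r"<[^>]+>", " ", text)
    tokens = [t for t in re.findall(r"[a-z]+", plain.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def build_group_index(groups, documents):
    """Build group number -> feature words, following steps 10205 to 10208."""
    index = {}
    for group in groups:
        # Step 10206: feature words from the cited portion.
        words = set(extract_feature_words(group["cited_portion"]))
        # Step 10207: feature words from every citing document in the group.
        for doc_number in group["citing_doc_numbers"]:
            words.update(extract_feature_words(documents[doc_number]))
        # Step 10208: store the merged feature words under the group number.
        index[group["group_number"]] = words
    return index


# Sample texts are hypothetical; only the identifiers come from FIG. 9.
groups = [{
    "group_number": "cl048901",
    "cited_portion": "<p>diagnosis of hypertension</p>",
    "citing_doc_numbers": ["doc070041"],
}]
documents = {"doc070041": "pulmonary function and hypertension"}

index = build_group_index(groups, documents)
print(index)
```

Merging the feature words of the cited portion and of its citing documents into one entry is what lets a search key match a group even when it appears only in a citing document.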

以上、フローチャートを用いて本発明の第一の実施例にかかる文書検索システムの文書群生成ステップ１０１、及び文書群検索インデックス生成ステップ１０２の詳細な処理手順について説明した。 The detailed processing procedure of the document group generation step 101 and the document group search index generation step 102 of the document search system according to the first embodiment of the present invention has been described above using the flowchart.

次に、上記処理のうち検索結果表示ステップ１０６の処理について、図１６及び図１７に示す検索結果表示画面を用いて説明する。 Next, the search result display step 106 of the above processing will be described using the search result display screens shown in FIGS. 16 and 17.

（文書検索システムにおける検索結果表示画面の一例）
図１６は、本発明の第一の実施例にかかる文書検索システムの文書検索クライアント２における文書群検索結果表示画面の一例を示す図である。図に示すように、文書検索クライアント２の文書検群索結果表示画面は、検索キー入力エリア４０１、検索ボタン４０２、文書群検索結果表示エリア４０３、及び文書件数表示エリア４０４から構成される。 (Example of search result display screen in document search system)
FIG. 16 is a diagram showing an example of the document group search result display screen in the document search client 2 of the document search system according to the first embodiment of the present invention. As shown in the figure, the document group search result display screen of the document search client 2 consists of a search key input area 401, a search button 402, a document group search result display area 403, and a document number display area 404.

検索キー入力エリア４０１は、ユーザが検索キーを入力するエリアである。図１６の例では、検索キーとして「高血圧症」、「肺機能」、「血管」が入力された状態が表示してある。検索ボタン４０２は、ユーザが検索の実行を指示するボタンである。文書群検索結果表示エリア４０３は、文書検索クライアント２が文書群検索結果データ１３ｂを検索キーとの適合度の高い順に表示するエリアである。文書件数表示エリア４０４は、文書群検索結果表示エリア４０３に表示した文書群に含まれる文書数やチェックした文書群の数等を表示するエリアである。 A search key input area 401 is an area where the user inputs a search key. In the example of FIG. 16, a state in which “hypertension”, “pulmonary function”, and “blood vessel” are input as search keys is displayed. A search button 402 is a button for the user to instruct execution of the search. The document group search result display area 403 is an area where the document search client 2 displays the document group search result data 13b in descending order of the degree of matching with the search key. The document number display area 404 is an area for displaying the number of documents included in the document group displayed in the document group search result display area 403, the number of checked document groups, and the like.

ここで、文書群検索結果表示エリア４０３、及び文書件数表示エリア４０４について詳細に説明する。 Here, the document group search result display area 403 and the document number display area 404 will be described in detail.

(Detailed description of the document group search result display area 403)
The document group search result display area 403 consists of a document group search result rank 40301, a cited document 40302, a cited location 40303, and a citing document 40304. The document group search result rank 40301 shows the rank of the document group by degree of matching with the search keys. The cited document 40302 shows the cited document of the document group, specifically its title, author names, or a part of the document. The cited location 40303 shows the part of the cited document to which the citing documents of the document group are allocated. The citing document 40304 shows the citing documents in the document group that cite the same part of the cited document, specifically their titles, author names, or a part of each document.

In FIG. 16, for the document group with the second highest degree of matching with the search keys, the cited document 40302 shows the title of the cited document, “Hypertension affecting lung function...”, and the citing document 40304 shows the titles of the citing documents that cite the same location in that cited document, “Pulmonary ventilation disorder and hypertension...” and “Chronic lung disease patient...”.

(Detailed description of the document count display area 404)
The document count display area 404 consists of a display count 40401, a reference estimate 40402, and a referenced count 40403.

The number of documents differs from one document group to another in the document group search result display area 403. The display count 40401 therefore shows the number of documents contained in the displayed document groups. In the example of the figure, the display count 40401 shows that the search result contains “20” document groups and that these 20 document groups contain “42” documents (cited documents and citing documents) in total. More precisely, when the number of document groups displayed in the document group search result display area 403 is N and the number of documents included in the i-th document group is Docnum(i), the number of documents M included in the document groups is M = Docnum(1) + Docnum(2) + Docnum(3) + ... + Docnum(N).
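The count M can be sketched in a few lines of Python. The function name and the per-group sizes below are made up for illustration and are not taken from FIG. 16. The sketch is a plain sum, as in the formula; the text does not say whether a document that belongs to several groups is counted once or several times.

```python
def total_documents(docnum):
    """Return M = Docnum(1) + ... + Docnum(N) for the displayed document groups."""
    return sum(docnum)


# Hypothetical sizes of N = 4 displayed document groups
# (each size counts the cited document plus its citing documents).
docnum = [3, 2, 4, 3]
n_groups = len(docnum)
m_documents = total_documents(docnum)
print(f"{n_groups} document groups, {m_documents} documents")
```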

The reference estimate 40402 shows how many of the top-ranked document groups displayed in the document group search result display area 403 must be viewed in order to view x documents. Specifically, the user can enter either the number of document groups to view or the number of documents they contain. In FIG. 16, the user has entered “20” as the “number of documents included”, and the system shows that viewing the top “12” document groups is enough to view a total of 20 documents.
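One way to compute such an estimate is a running total over the group sizes in rank order. The following is a minimal sketch under that assumption; the function names and sample sizes are hypothetical, and the per-group sizes behind the “20” and “12” of FIG. 16 are not given in the text, so that example is not reproduced here.

```python
from itertools import accumulate


def groups_needed(docnum, x):
    """Smallest k such that the top k groups contain at least x documents.

    docnum lists the group sizes in rank order. Returns None when all
    displayed groups together hold fewer than x documents.
    """
    for k, running_total in enumerate(accumulate(docnum), start=1):
        if running_total >= x:
            return k
    return None


def documents_in_top(docnum, k):
    """Number of documents contained in the top k groups (the reverse estimate)."""
    return sum(docnum[:k])


sizes = [3, 2, 4, 3]  # hypothetical sizes of the displayed groups, best rank first
print(groups_needed(sizes, 6), documents_in_top(sizes, 3))
```

Both directions are covered because the text says the user may enter either the number of document groups or the number of documents.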

The referenced count 40403 shows the number of document groups the user has viewed in the document group search result and the number of documents those groups contain. In the example of the figure, the referenced count 40403 shows that the user has viewed “3” document groups, which contain “10” documents.

(Example of document group display screen in the document search system)
FIG. 17 is a diagram showing an example of the document group display screen of the document search client 2 in the document search system according to the first embodiment of the present invention. It is an example of the screen that displays the document group the user selected on the document group search result display screen shown in FIG. 16.

As shown in the figure, the document group display screen of the document search client 2 consists of a cited location 501, a cited document 502, and a citing document 503. The cited location 501 shows the part of the cited document that forms the document group, that is, the part to which the citing documents in the document group are allocated. The cited document 502 shows the cited document that contains the cited location. The citing document 503 shows the citing documents allocated to the cited location.

In FIG. 17, the cited location 501 shows “2. Subjects and methods. During this past year, at this hospital...”. The cited document 502 shows the document containing the cited location, “Hypertension affecting lung function...”, with the part corresponding to the cited location 501 highlighted. The citing document 503 shows several documents that cite the cited location 501, with words that appear in the cited location 501, such as “hypertension” and “elderly”, highlighted.
The document search system according to the first embodiment of the present invention has been described above.

[Example 2]
Next, a document search system according to a second embodiment of the present invention will be described with reference to the drawings. The overall system configuration and the hardware configuration of the computers that make up the system are the same as in the first embodiment described above, so their description is omitted.

(Module configuration)
FIG. 18 is a diagram showing the module configuration of the document search system according to the second embodiment of the present invention. Each module shown in FIG. 18 is described below.
(Module configuration of the document search system: processing units)
The processing units of the document search system are the document group generation unit 11a, the document group search index generation unit 11b, the document group search unit 11c, and the server communication unit 11d in the document search server 1, and the client communication unit 11e, the search key receiving unit 11f, and the search result display unit 11g in the document search client 2.

The document group generation unit 11a in the document search server 1 generates document groups from the documents stored in the document storage unit 12a on the basis of the citation relationships between documents, and stores them in the document group storage unit 12b. The document group generation unit 11a further includes a cited document division unit 11a1 and a document allocation unit 11a2. The cited document division unit 11a1 has a cited document term clustering unit 11a12 and divides each cited document in a set of documents linked by citation into several content parts by clustering the words the document contains. The document allocation unit 11a2 includes a citation source document clustering unit 11a21 and a citation source document cluster allocation unit 11a22. The citation source document clustering unit 11a21 clusters the citing documents in the set of documents linked by citation and generates document clusters. The citation source document cluster allocation unit 11a22 allocates the generated clusters of citing documents to the parts of the divided cited document.
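The allocation performed by the document allocation unit 11a2 can be illustrated with a small sketch. It assumes that the term clusters of the cited document and the clusters of citing documents have already been computed, and it allocates by simple word overlap. The text does not specify an allocation criterion, so the overlap measure, the function name, and the sample data are all assumptions made for illustration.

```python
def allocate_clusters(term_clusters, citing_clusters):
    """Allocate each cluster of citing documents to one part of the cited document.

    term_clusters:   dict part_id -> set of words (one term cluster per part)
    citing_clusters: list of dicts doc_id -> set of words (one dict per cluster)
    Returns dict part_id -> sorted list of allocated citing doc ids.
    Assumed criterion: the part whose term cluster shares the most words
    with the cluster wins; ties go to the first part listed.
    """
    allocation = {part_id: [] for part_id in term_clusters}
    for cluster in citing_clusters:
        cluster_words = set().union(*cluster.values())
        best_part = max(
            term_clusters,
            key=lambda part_id: len(term_clusters[part_id] & cluster_words),
        )
        allocation[best_part].extend(cluster)  # extends with the doc ids (dict keys)
    return {part_id: sorted(ids) for part_id, ids in allocation.items()}


# Hypothetical data: two parts of one cited document, two citing-document clusters.
term_clusters = {
    "part1": {"hypertension", "elderly"},
    "part2": {"lung", "ventilation"},
}
citing_clusters = [
    {"docA": {"hypertension", "patient"}, "docB": {"elderly", "hypertension"}},
    {"docC": {"lung", "ventilation", "disorder"}},
]
groups = allocate_clusters(term_clusters, citing_clusters)
print(groups)
```

Each entry of the result, one part of the cited document together with the citing documents allocated to it, corresponds to one document group.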

Of the processing units shown in FIG. 18, those other than the document group generation unit 11a, namely the document group search index generation unit 11b, the document group search unit 11c, the server communication unit 11d, the client communication unit 11e, the search key receiving unit 11f, and the search result display unit 11g, are the same as in the first embodiment described above, so their description is omitted.

(Module configuration of the document search system: data storage units)
As shown in FIG. 18, there are three data storage units, drawn as drum shapes in the figure: the document storage unit 12a, the document group storage unit 12b, and the document group search index storage unit 12c. All of these are the same as in the first embodiment described above, so a detailed description is omitted.

(Module configuration of the document search system: temporary data)
As shown in FIG. 18, there are three items of temporary data, drawn as tape shapes in the figure: the search key data 13a, the document group search result data 13b, and the document body data 13c. All of these are the same as in the first embodiment described above, so a detailed description is omitted.

The module configuration of the document search system according to the second embodiment of the present invention has been described above.

(Processing procedure of the document search system)
The processing procedure of the document search system in the second embodiment of the present invention is the same as that of the first embodiment shown in the flowchart of FIG. 13. First, the document search server 1 generates document groups from the search target documents stored in the document storage unit 12a and stores them in the document group storage unit 12b (101), then extracts feature words from the stored document groups, generates a search index for the document groups, and stores it in the document group search index storage unit 12c (102). If the user instructs the document search client 2 to end the document search, the process ends (103). Otherwise the client accepts a search key, stores it in the search key data 13a, and transmits it to the document search server 1 (104). The document search server 1 receives the search key data 13a, searches the document group search index stored in the document group search index storage unit 12c using the search key it contains, stores the result in the document group search result data 13b, and transmits it to the document search client 2 (105). The document search client 2 receives the document group search result data 13b and displays the document groups it contains together with their matching ranks (106). Steps 101 and 102 need to be executed only once on the document search server 1, not for every search.
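The index generation (102) and document group search (105) steps can be sketched as follows. The sketch assumes that the feature words of each document group have already been extracted and that a group's matching degree is the summed frequency of the search keys among its feature words. That scoring rule, the function names, and the sample data are assumptions for illustration; the text does not fix a particular ranking formula.

```python
from collections import Counter


def build_group_index(group_feature_words):
    """Step 102 (sketch): map each group id to feature-word frequencies."""
    return {gid: Counter(words) for gid, words in group_feature_words.items()}


def search_groups(index, search_keys):
    """Step 105 (sketch): return (group id, score) pairs, best match first.

    Assumed score: sum of the frequencies of the search keys in the group.
    Groups with a score of zero are treated as non-matching and dropped.
    """
    scores = {gid: sum(freq[key] for key in search_keys) for gid, freq in index.items()}
    hits = [(gid, score) for gid, score in scores.items() if score > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical feature words per document group.
index = build_group_index({
    "g1": ["hypertension", "lung function", "hypertension"],
    "g2": ["blood vessel"],
    "g3": ["diabetes"],
})
ranked = search_groups(index, ["hypertension", "blood vessel"])
print(ranked)
```

Because the index is built once and only `search_groups` runs per query, the sketch also reflects the remark that steps 101 and 102 need not be repeated for every search.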

The second embodiment differs from the first in the details of the document group generation step 101. Specifically, the document group generation step 101 in the second embodiment first extracts, from the set of search target documents, each cited document and the citing documents that cite it. It then extracts words from the cited document and applies a term clustering technique to gather related words into clusters, applies a document clustering technique to the citing documents to generate document clusters, and generates document groups by allocating the clusters of citing documents to the term clusters.
The document search system according to the second embodiment of the present invention has been described above.

[Example 3]
Next, a document search system according to a third embodiment of the present invention will be described with reference to the drawings. The overall system configuration and the hardware configuration of the computers that make up the system are the same as in the first embodiment of the present invention described above, so their description is omitted.

(Module configuration)
FIG. 19 is a diagram showing the module configuration of the document search system according to the third embodiment of the present invention. Each module shown in FIG. 19 is described below.
(Module configuration of the document search system: processing units)
As shown in FIG. 19, the processing units of the document search system are the document group generation unit 11a, the document group search index generation unit 11b, the document group search unit 11c, and the server communication unit 11d in the document search server 1, and the client communication unit 11e, the search key receiving unit 11f, and the search result display unit 11g in the document search client 2.

The document group generation unit 11a in the document search server 1 generates document groups from the documents stored in the document storage unit 12a on the basis of the citation relationships between documents, and stores them in the document group storage unit 12b. The document group generation unit 11a further has a cited part acquisition unit 11a3 and a document collection unit 11a4. The cited part acquisition unit 11a3 obtains the cited locations in each cited document. The document collection unit 11a4 gathers the citing documents that share the same cited location into a document group.
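The grouping done by the document collection unit 11a4 amounts to collecting citing documents under a shared key. The sketch below assumes that the cited locations have already been obtained and are available as (citing document, cited document, cited location) triples. This input format, the function name, and the identifiers are hypothetical; how the cited part acquisition unit 11a3 determines a location is not covered here.

```python
from collections import defaultdict


def group_by_cited_location(citations):
    """Gather citing documents that cite the same location of the same document.

    citations: iterable of (citing_doc, cited_doc, cited_location) triples.
    Returns dict (cited_doc, cited_location) -> sorted list of citing docs.
    """
    groups = defaultdict(set)
    for citing_doc, cited_doc, location in citations:
        groups[(cited_doc, location)].add(citing_doc)
    return {key: sorted(docs) for key, docs in groups.items()}


# Hypothetical citation records.
citations = [
    ("docA", "docX", "section 2"),
    ("docB", "docX", "section 2"),
    ("docC", "docX", "section 4"),
    ("docA", "docY", "section 1"),
]
document_groups = group_by_cited_location(citations)
print(document_groups)
```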

Of the processing units shown in FIG. 19, those other than the document group generation unit 11a, namely the document group search index generation unit 11b, the document group search unit 11c, the server communication unit 11d, the client communication unit 11e, the search key receiving unit 11f, and the search result display unit 11g, are the same as in the first embodiment of the present invention described above, so their description is omitted.

(Module configuration of the document search system: data storage units)
As shown in FIG. 19, there are three data storage units, drawn as drum shapes in the figure: the document storage unit 12a, the document group storage unit 12b, and the document group search index storage unit 12c. All of these are the same as in the first embodiment of the present invention described above, so a detailed description is omitted.

(Module configuration of the document search system: temporary data)
As shown in FIG. 19, there are three items of temporary data, drawn as tape shapes in the figure: the search key data 13a, the document group search result data 13b, and the document body data 13c. All of these are the same as in the first embodiment of the present invention described above, so a detailed description is omitted.

The module configuration of the document search system according to the third embodiment of the present invention has been described above.

(Processing procedure of the document search system)
The processing procedure of the document search system in the third embodiment of the present invention is the same as that of the first embodiment shown in the flowchart of FIG. 13. First, the document search server 1 generates document groups from the search target documents stored in the document storage unit 12a and stores them in the document group storage unit 12b (101), then extracts feature words from the stored document groups, generates a search index for the document groups, and stores it in the document group search index storage unit 12c (102). If the user instructs the document search client 2 to end the document search, the process ends (103). Otherwise the client accepts a search key, stores it in the search key data 13a, and transmits it to the document search server 1 (104). The document search server 1 receives the search key data 13a, searches the document group search index stored in the document group search index storage unit 12c using the search key it contains, stores the result in the document group search result data 13b, and transmits it to the document search client 2 (105). The document search client 2 receives the document group search result data 13b and displays the document groups it contains together with their matching ranks (106). Steps 101 and 102 need to be executed only once on the document search server 1, not for every search.

The third embodiment differs from the first in the details of the document group generation step 101. Specifically, the document group generation step in the third embodiment extracts, from the set of search target documents, each cited document and the citing documents that cite it, obtains the cited locations from the cited document, and generates a document group by gathering the citing documents that cite each obtained cited location.
The document search system according to the third embodiment of the present invention has been described above.

[Example 4]
Next, a document search system according to a fourth embodiment of the present invention will be described with reference to the drawings. The overall system configuration and the hardware configuration of the computers that make up the system are the same as in the first embodiment of the present invention described above, so their description is omitted.

(Module configuration)
FIG. 20 is a diagram showing the module configuration of the document search system according to the fourth embodiment of the present invention. Each module shown in FIG. 20 is described below.

(Module configuration of the document search system: processing units)
As shown in FIG. 20, the processing units of the document search system are the document group generation unit 11a, the document group search index generation unit 11b, the document group search unit 11c, the server communication unit 11d, and the document search unit 11h in the document search server 1, and the client communication unit 11e, the search key receiving unit 11f, and the search result display unit 11g in the document search client 2. Of these, the document group generation unit 11a, the document group search index generation unit 11b, the document group search unit 11c, the server communication unit 11d, the client communication unit 11e, the search key receiving unit 11f, and the search result display unit 11g are the same as in the first embodiment of the present invention described above, so their description is omitted.

The document search unit 11h searches the document search index stored in the document search index storage unit 12d using the search key stored in the search key data 13a, and stores the result in the document search result data 13d. The document search unit 11h can be implemented with general document search techniques, so a detailed description is omitted.

(Module configuration of the document search system: data storage units)
As shown in FIG. 20, there are four data storage units, drawn as drum shapes in the figure: the document storage unit 12a, the document group storage unit 12b, the document group search index storage unit 12c, and the document search index storage unit 12d. Of these, the document storage unit 12a, the document group storage unit 12b, and the document group search index storage unit 12c are the same as in the first embodiment of the present invention described above, so a detailed description is omitted.

The document search index storage unit 12d is the area that stores the index for searching the documents stored in the document storage unit 12a. Whereas the index stored in the document group search index storage unit 12c is a document group search index for searching a document group made up of several documents as a single unit, the index stored in the document search index storage unit 12d is a document search index for searching individual documents. The document search index can be generated with general document search techniques, for example a known method such as extracting feature words from each document and recording them together with their frequencies of occurrence. A detailed description of how it is generated is therefore omitted.

(Configuration of the document search index storage unit 12d)
FIG. 21 is a diagram showing the configuration of the document search index storage unit 12d. As shown in FIG. 21, the document search index storage unit 12d consists of a document number 12d01 and feature words 12d02. The document number 12d01 stores the number of the document to which the index entry corresponds, that is, the number of a document stored in the document storage unit 12a. The feature words 12d02 store the feature words extracted from the document. As with the feature words of a document group, the extraction can be implemented with a known technique for extracting feature words from a document. As in FIG. 10, the feature words 12d02 in FIG. 21 show only the character strings of the feature words, but this is not a limitation: each feature word may be stored together with a value such as its frequency of occurrence or its importance in the document.

The first row in FIG. 21 stores the search index of the document whose document number 12d01 is “doc050789”. Specifically, the feature words 12d02 store “lung function”, “male”, “case”, and so on as the feature words of that document.
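A record of this kind, a document number paired with feature words and optional frequencies, can be sketched as below. The extraction rule used here (keep the most frequent terms that are not in a stop list) is only one simple stand-in for the known techniques the text refers to, and the stop list, the function names, and the term list for “doc050789” are hypothetical.

```python
from collections import Counter

STOP_WORDS = {"the", "of", "in", "a", "and"}  # hypothetical stop list


def extract_feature_words(terms, top_k=3):
    """Return the top_k most frequent non-stop terms with their frequencies."""
    counts = Counter(term for term in terms if term not in STOP_WORDS)
    return dict(counts.most_common(top_k))


def build_document_index(documents, top_k=3):
    """Map each document number (12d01) to its feature words (12d02)."""
    return {
        doc_no: extract_feature_words(terms, top_k)
        for doc_no, terms in documents.items()
    }


# Hypothetical term sequence for the document numbered "doc050789".
documents = {
    "doc050789": ["lung function", "male", "case", "lung function",
                  "male", "lung function", "the"],
}
doc_index = build_document_index(documents)
print(doc_index)
```

Storing the frequency next to each feature word follows the option mentioned in the text; dropping the values and keeping only the keys gives the string-only form shown in FIG. 21.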

(Module configuration of the document search system: temporary data)
As shown in FIG. 20, there are four items of temporary data, drawn as tape shapes in the figure: the search key data 13a, the document group search result data 13b, the document body data 13c, and the document search result data 13d. Of these, the search key data 13a, the document group search result data 13b, and the document body data 13c are the same as in the first embodiment of the present invention described above, so a detailed description is omitted.

The document search result data 13d stores the result of the document search unit 11h searching the document search index stored in the document search index storage unit 12d on the basis of the search key data 13a. Specifically, it stores a list of the numbers of the documents that match the search key data 13a, each with its degree of matching with the search key.

(Example of document search result data 13d)
FIG. 22 is a diagram showing an example of the document search result data 13d. In the figure, “doc053028”, “doc039402”, and so on are stored as the list of numbers of documents that match the search key. In the example of FIG. 22, a numerical value such as “5.798” indicating the degree of matching of each document with the search key is stored, but this is not a limitation. The rank of the match may be stored instead.

The module configuration of the document search system according to the fourth embodiment of the present invention has been described above.

(Processing procedure of the document search system)
FIG. 23 is a flowchart showing the processing procedure of the document search system in the fourth embodiment of the present invention.

As shown in FIG. 23, the document search server 1 first generates document groups from the search target documents stored in the document storage unit 12a and stores them in the document group storage unit 12b (201), then extracts feature words from the stored document groups, generates a search index for the document groups, and stores it in the document group search index storage unit 12c (202). In the same way, it extracts feature words from the documents stored in the document storage unit 12a, generates a search index for the documents, and stores it in the document search index storage unit 12d (203). If the user instructs the document search client 2 to end the document search, the process ends (204). Otherwise the client accepts a search key, stores it in the search key data 13a, and transmits it to the document search server 1 (205).

The document search server 1 receives the search key data 13a transmitted from the document search client 2 and, based on the search key stored in the search key data 13a, searches the document group search index stored in the document group search index storage unit 12c. It stores the result in the document group search result data 13b and transmits it to the document search client 2 (206). Similarly, the document search server 1 searches the document search index stored in the document search index storage unit 12d based on the search key, stores the result in the document search result data 13d, and transmits it to the document search client 2 (207). The document search client 2 receives the document group search result data 13b and the document search result data 13d transmitted from the document search server 1. It displays the documents included in the document groups of the document group search result in order of their degree of match in the document search result stored in the document search result data 13d, and displays the documents of the document search result together with their rank of match in the document group search result stored in the document group search result data 13b (208).
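The two indexes and the two searches of steps 202, 203, 206, and 207 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: feature words are approximated by whitespace-separated tokens, the degree of match is a plain sum of term frequencies, and the names `build_index` and `search` are hypothetical. The specification does not prescribe a particular extraction or scoring method.

```python
from collections import Counter, defaultdict

def build_index(units):
    """Build a toy inverted index: term -> {unit id: term frequency}.

    `units` maps an id (document or document group) to its text.
    Whitespace tokens stand in for feature words (an assumption).
    """
    index = defaultdict(dict)
    for unit_id, text in units.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][unit_id] = tf
    return index

def search(index, search_keys):
    """Return (id, score) pairs in descending score order.

    The score is a simple sum of term frequencies (an assumption).
    """
    scores = Counter()
    for key in search_keys:
        for unit_id, tf in index.get(key.lower(), {}).items():
            scores[unit_id] += tf
    # Break ties by id so that the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: three documents and two document groups.
documents = {
    "d1": "hypertension and lung function",
    "d2": "blood vessel imaging",
    "d3": "lung function in hypertension hypertension",
}
groups = {"g1": ["d1", "d3"], "g2": ["d2"]}
group_texts = {g: " ".join(documents[d] for d in members)
               for g, members in groups.items()}

group_index = build_index(group_texts)  # corresponds to step 202
doc_index = build_index(documents)      # corresponds to step 203

group_result = search(group_index, ["hypertension", "vessel"])  # step 206
doc_result = search(doc_index, ["hypertension"])                # step 207
```

The same search key is run against both indexes, which gives the client two ranked lists to cross-reference in step 208.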

Of the above steps, the document group generation step 201, the document group search index generation step 202, the end step 204, the search key reception step 205, and the document group search step 206 are the same as in the first embodiment of the present invention shown in FIG. 13. That is, the document group generation step 201 is the same as the document group generation step 101, the document group search index generation step 202 as the document group search index generation step 102, the end step 204 as the end step 103, the search key reception step 205 as the search key reception step 104, and the document group search step 206 as the document group search step 105, so detailed descriptions are omitted. The document search step 207 can be realized by applying general document search techniques, so a detailed description of it is also omitted.

Hereinafter, the process of the search result display step 208 will be described using the search result display screen shown in FIG. 24.

(Example of search result display screen in document search system)
FIG. 24 is a diagram showing an example of a document search result display screen in the document search client 2 of the document search system according to the fourth embodiment of the present invention. As shown in the figure, the document search result display screen of the document search client 2 consists of a search key input area 601, a search button 602, a document group search result display area 603, a document search result display area 604, and a document count display area 605.

The search key input area 601 is an area where the user inputs a search key. FIG. 24 shows a state in which “hypertension”, “pulmonary function”, and “blood vessel” have been input as search keys. The search button 602 is a button with which the user instructs execution of the search. The document group search result display area 603 is an area where the document search client 2 displays the document group search result data 13b in descending order of the degree of match with the search key, and the document search result display area 604 is an area where it displays the document search result data 13d in descending order of the degree of match with the search key. The document count display area 605 is an area that displays, among other things, the number of documents displayed in the document group search result display area 603 and the document search result display area 604.

Here, the document group search result display area 603, the document search result display area 604, and the document count display area 605 will be described in detail.

(Details of document group search result display area 603)
The document group search result display area 603 consists of a document group search result rank 60301, a document search result rank 60302, a citation destination document 60303, and a citation source document 60304. The document group search result rank 60301 displays the rank of the degree of match between the document group and the search key. The document search result rank 60302 displays the rank, in the document search result, of each citation destination document or citation source document included in the document group. When a citation destination document or citation source document is not included in the document search result, “-” or the like is displayed to indicate that the rank is unknown. This can occur, for example, when the citation destination document or citation source document, taken as an individual document, has a low degree of match with the search key. The citation destination document 60303 displays the citation destination document of the document group. The citation source document 60304 displays the documents in the document group that cite the same part of that citation destination document.

In the example of FIG. 24, “Hypertension affecting pulmonary function...” is displayed as the citation destination document 60303 of the document group with the second highest degree of match with the search key. The “(12)” shown for this citation destination document indicates that it ranked 12th in the document search result. In addition, “Pulmonary ventilation disorder and hypertension...” and “Chronic lung disease patients...” are displayed as the citation source documents that cite the same cited part of that citation destination document, and the display shows that these citation source documents ranked 1st and 5th, respectively, in the document search result.
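The cross-referencing performed for the document group search result display area 603 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `DocumentGroup` class, the function names, and the dictionary layout of the rows are hypothetical, and placing documents without a rank after the ranked ones is a choice made here, not something the specification states.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DocumentGroup:
    cited: str         # citation destination document number
    citing: List[str]  # citation source documents citing the same part

def rank_label(doc_id: str, doc_ranks: Dict[str, int]) -> str:
    """Return "(n)" for a ranked document, "(-)" if it is not in the result."""
    rank = doc_ranks.get(doc_id)
    return f"({rank})" if rank is not None else "(-)"

def render_group_area(groups: List[DocumentGroup], doc_ranks: Dict[str, int]):
    """Build display rows for area 603.

    `groups` is already sorted by degree of match (best first).
    `doc_ranks` maps a document number to its rank in the document
    search result; documents absent from that result have no entry.
    """
    rows = []
    for group_rank, group in enumerate(groups, start=1):
        # Order citing documents by their document search rank;
        # unranked documents go last (an assumption).
        citing = sorted(group.citing,
                        key=lambda d: doc_ranks.get(d, float("inf")))
        rows.append({
            "group_rank": group_rank,
            "cited": (group.cited, rank_label(group.cited, doc_ranks)),
            "citing": [(d, rank_label(d, doc_ranks)) for d in citing],
        })
    return rows

# Hypothetical data loosely mirroring the ranks mentioned for FIG. 24.
groups = [
    DocumentGroup(cited="docA", citing=["docB"]),
    DocumentGroup(cited="doc12", citing=["doc5", "doc1"]),
]
doc_ranks = {"doc1": 1, "doc5": 5, "doc12": 12}
rows = render_group_area(groups, doc_ranks)
```

With this data the second row pairs the cited document with “(12)” and lists its citing documents as ranks 1 and 5, while the documents of the first group, which are absent from the document search result, receive “(-)”.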

(Details of document search result display area 604)
The document search result display area 604 consists of a document search result rank 60401, a document group search result rank 60402, and a document 60403. The document search result rank 60401 displays the rank of the degree of match between the document and the search key. The document group search result rank 60402 displays the rank, in the document group search result, of the document group that includes the document. When the document group that includes the document is not included in the document group search result, “-” or the like is displayed to indicate that the rank is unknown. The document 60403 displays the document that matched the search key.

In the example of FIG. 24, the “(2)” shown for the document “Chronic lung disease patients...”, which had the fifth highest degree of match with the search key, indicates that the document is included in the document group ranked second in the document group search result display area 603.
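The reverse lookup used for the document search result display area 604 can be sketched in the same way. The function names are hypothetical. The specification does not say which rank is shown when a document belongs to several document groups, so this sketch assumes the best-ranked group is reported.

```python
def group_rank_of(doc_id, ranked_groups):
    """Return the rank of the best-ranked group containing doc_id, or None.

    `ranked_groups` is a list of sets of document numbers, ordered by
    degree of match of the group with the search key (best first).
    """
    for rank, members in enumerate(ranked_groups, start=1):
        if doc_id in members:
            return rank
    return None

def render_document_area(ranked_docs, ranked_groups):
    """Build (document rank, group rank label, document) rows for area 604."""
    rows = []
    for doc_rank, doc_id in enumerate(ranked_docs, start=1):
        group_rank = group_rank_of(doc_id, ranked_groups)
        # "-" marks a document whose group is not in the group search result.
        label = "-" if group_rank is None else str(group_rank)
        rows.append((doc_rank, label, doc_id))
    return rows

# Hypothetical data: two ranked groups and three ranked documents.
ranked_groups = [{"a", "b"}, {"c", "d", "e"}]
ranked_docs = ["d", "x", "a"]
rows = render_document_area(ranked_docs, ranked_groups)
```

Here document "d" is annotated with group rank 2, "x" with "-", and "a" with group rank 1.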

(Details of document count display area 605)
The document count display area 605 consists of a display count 60501, a reference estimate 60502, and a referenced 60503.

The display count 60501 shows how many documents are displayed in the document group search result display area 603 and in the document search result display area 604, respectively. The number of documents contained in each document group displayed in the document group search result display area 603 differs from group to group. In addition, the documents included in those document groups overlap with the documents displayed in the document search result display area 604. For this reason, the number of document groups and the number of documents included in the search results are displayed, and for document groups, the number of documents included in those document groups is also displayed.

In the example in the figure, the display count 60501 shows that the search results include “20” document groups and “20” documents, and that the 20 document groups contain “42” documents in total (citation destination documents and citation source documents). For the document search result, the number of results is itself the number of documents, so the count is “20”. It also shows that the number of distinct documents, after removing duplicates between the documents included in the document group search result and the documents included in the document search result, is “46” in total.
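The counts shown in the display count 60501 amount to simple set arithmetic, sketched below in Python. The function name and the result keys are hypothetical, and the sketch assumes that the number of documents in the document groups is itself counted without duplicates. Under that same assumption, the figures in the example imply that 42 + 20 - 46 = 16 documents appear in both results.

```python
def display_counts(ranked_groups, ranked_docs):
    """Compute the four numbers shown in the display count.

    `ranked_groups` is a list of collections of document numbers and
    `ranked_docs` is a list of document numbers.
    """
    docs_in_groups = set()
    for members in ranked_groups:
        docs_in_groups |= set(members)
    return {
        "groups": len(ranked_groups),
        "docs_in_groups": len(docs_in_groups),
        "docs": len(ranked_docs),
        # Distinct documents across both results, duplicates removed.
        "total_distinct": len(docs_in_groups | set(ranked_docs)),
    }

# Hypothetical toy data (not the data behind FIG. 24).
counts = display_counts([["a", "b", "c"], ["c", "d"]], ["a", "x", "y"])
```

For this toy input the two groups hold four distinct documents, the document result holds three, and the distinct total is six because "a" appears in both.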

The reference estimate 60502 shows the result of calculating how many documents in total the user will have consulted if the document group search result and the document search result are each consulted down to a given rank. Specifically, in the reference estimate 60502 the user can enter the number of document groups to consult, the number of documents to consult, or the total number of documents to consult. For example, when the user enters “20” as the “total” value, the system calculates the numbers of document groups and documents needed to consult a total of 20 documents, and displays that it suffices to consult “5” document groups (containing 15 documents) and “5” documents. In detail, the system takes document groups from the document group search result and documents from the document search result one at a time each, in descending order of rank, and calculates the total number of documents after excluding any duplicates. Once the total number of documents exceeds the number specified by the user (“20” in the example in the figure), the numbers of document groups and documents taken from the top of the document group search result and the document search result are displayed as the document group count and document count of the reference estimate 60502.
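The estimation procedure described above can be sketched as a short loop in Python. The function name and return layout are hypothetical. One point is an explicit assumption: the text says the loop stops when the total "exceeds" the specified number, while the worked example (15 documents in groups plus 5 documents for a total of 20) suggests stopping once the target is reached, so this sketch stops when the total is greater than or equal to the target.

```python
def estimate_reference(ranked_groups, ranked_docs, target_total):
    """Estimate how far down both results the user must read.

    Takes one group and one document per round, best rank first,
    and counts distinct documents. Returns a tuple:
    (groups taken, distinct docs in those groups, docs taken, distinct total).
    """
    seen = set()
    n_groups = n_docs = 0
    while len(seen) < target_total and (
        n_groups < len(ranked_groups) or n_docs < len(ranked_docs)
    ):
        if n_groups < len(ranked_groups):
            seen |= set(ranked_groups[n_groups])
            n_groups += 1
        if n_docs < len(ranked_docs):
            seen.add(ranked_docs[n_docs])
            n_docs += 1
    docs_in_groups = set()
    for members in ranked_groups[:n_groups]:
        docs_in_groups |= set(members)
    return n_groups, len(docs_in_groups), n_docs, len(seen)

# Hypothetical toy data.
ranked_groups = [["a", "b", "c"], ["d", "e"], ["f", "g", "h"]]
ranked_docs = ["a", "x", "d"]
estimate = estimate_reference(ranked_groups, ranked_docs, 6)
```

With a target of 6, two rounds are enough: two groups (five distinct documents) and two documents give six distinct documents. If the target cannot be reached, the loop ends when both lists are exhausted.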

As another example, when the user enters “20” as the “total” value and “5” as the “document” value, the system calculates the number of document groups needed to consult a total of 20 documents. In detail, the system takes document groups from the document group search result one at a time in descending order of rank, and calculates the total number of documents after excluding any that duplicate the documents in the top five of the document search result. Once the total number of documents exceeds the number specified by the user, the number of document groups taken from the top of the document group search result is displayed as the document group count of the reference estimate 60502.
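This variant, in which the number of documents is fixed and only the number of document groups is computed, can be sketched as follows. The function name is hypothetical, and as an assumption the loop stops once the distinct total reaches the target rather than strictly exceeding it.

```python
def estimate_groups_needed(ranked_groups, ranked_docs, target_total, n_docs):
    """Count the groups needed when the top n_docs documents are fixed.

    Starts from the top n_docs of the document search result, then adds
    groups in rank order, ignoring documents already counted.
    Returns (groups taken, distinct total).
    """
    seen = set(ranked_docs[:n_docs])
    n_groups = 0
    for members in ranked_groups:
        if len(seen) >= target_total:
            break
        seen |= set(members)
        n_groups += 1
    return n_groups, len(seen)

# Hypothetical toy data.
ranked_groups = [["a", "b", "c"], ["d", "e"], ["f", "g", "h"]]
ranked_docs = ["a", "x", "d"]
needed = estimate_groups_needed(ranked_groups, ranked_docs, 6, 2)
```

Starting from the two fixed documents "a" and "x", the first group adds "b" and "c" and the second adds "d" and "e", so two groups reach the target of six distinct documents.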

The referenced 60503 shows the number of document groups the user has consulted in the document group search result and the number of documents included in those document groups, as well as the number of documents consulted in the document search result and the total of these. In the example in the figure, the referenced 60503 shows that the user has consulted “2” document groups in the document group search result, that those document groups include “5” documents, that “0” documents have been consulted in the document search result, and that “5” documents have been consulted in total.

The document search system according to the fourth embodiment of the present invention has been described above.

The present invention is applicable to systems that search documents, and in particular to systems for searching documents that have citation relationships with one another, such as papers and patents.

FIG. 1 is a diagram showing an outline of the procedure for generating a search index in document search.
FIG. 2 is an explanatory diagram of citation relationships between documents.
FIG. 3 is an explanatory diagram of citation relationships between documents.
FIG. 4 is a diagram explaining an example of a document group generation method.
FIG. 5 is a diagram showing a configuration example of a document search system.
FIG. 6 is a diagram showing a hardware configuration example of the document search server and the document search client.
FIG. 7 is a diagram showing a module configuration example of the document search system according to the present invention.
FIG. 8 is a diagram showing a configuration example of the document storage unit.
FIG. 9 is a diagram showing a configuration example of the document group storage unit.
FIG. 10 is a diagram showing a configuration example of the document group search index storage unit.
FIG. 11 is a diagram showing an example of search key data.
FIG. 12 is a diagram showing an example of document group search result data.
FIG. 13 is a flowchart showing an example of the processing procedure of the document search system according to the present invention.
FIG. 14 is a flowchart showing an example of the processing procedure for document group generation.
FIG. 15 is a flowchart showing an example of the processing procedure for document group search index generation.
FIG. 16 is a diagram showing an example of a search result display screen of the document search system according to the present invention.
FIG. 17 is a diagram showing an example of a search result display screen of the document search system according to the present invention.
FIG. 18 is a diagram showing a configuration example of the document search system according to the present invention.
FIG. 19 is a diagram showing a configuration example of the document search system according to the present invention.
FIG. 20 is a diagram showing a configuration example of the document search system according to the present invention.
FIG. 21 is a diagram showing a configuration example of the document search index storage unit.
FIG. 22 is a diagram showing an example of document search result data.
FIG. 23 is a flowchart showing an example of the processing procedure of the document search system according to the present invention.
FIG. 24 is a diagram showing an example of a search result display screen of the document search system according to the present invention.

Explanation of symbols

1 ... document search server, 2 ... document search client, 3 ... communication line, 10 ... CPU, 20 ... hard disk, 30 ... memory, 40 ... display, 41 ... display control unit, 50 ... keyboard, 51 ... keyboard control unit, 60 ... mouse, 61 ... mouse control unit, 70 ... bus, 11a ... document group generation unit, 11a1 ... citation destination document division unit, 11a11 ... citation destination document structure division unit, 11a2 ... document allocation unit, 11b ... document group search index generation unit, 11c ... document group search unit, 11d ... server communication unit, 11e ... client communication unit, 11f ... search key reception unit, 11g ... search result display unit, 12a ... document storage unit, 12b ... document group storage unit, 12c ... document group search index storage unit, 13a ... search key data, 13b ... document group search result data, 13c ... document text

Claims

1. A document search system comprising: document group generation means for generating document groups, based on citation relationships between documents, from a set of documents to be searched; and document group search index generation means for generating a search index of the document groups by extracting feature words from the documents having the citation relationship.

2. The document search system according to claim 1, wherein the document group generation means, taking a document cited by another document as a citation destination document and a document that cites another document as a citation source document, collects documents having the same citation destination document to generate a document group.

3. The document search system according to claim 2, wherein the document group generation means comprises: cited part acquisition means for acquiring, taking a document cited by another document as a citation destination document and a document that cites another document as a citation source document, the cited part of the citation destination document that is cited by another document; and document collecting means for collecting documents having the same cited part to generate a document group.

4. The document search system according to claim 2, wherein the document group generation means, taking a document cited by another document as a citation destination document and a document that cites another document as a citation source document, comprises: citation destination document dividing means for dividing the citation destination document; and document assignment means for generating a document group by assigning citation source documents to the parts of the divided citation destination document.

5. The document search system according to claim 4, wherein the citation destination document dividing means divides the document according to the document structure of the citation destination document.

6. The document search system according to claim 4, wherein the citation destination document dividing means divides the document by clustering feature words of the citation destination document.

7. The document search system according to claim 4, wherein the document assignment means comprises: citation source document clustering means for clustering a plurality of citation source documents that cite the same document; and citation source document cluster assignment means for generating a document group by assigning the clusters of citation source documents to the parts of the citation destination document divided by the citation destination document dividing means.

8. The document search system according to claim 3, wherein the document group search index generation means extracts feature words from the cited part of the citation destination document acquired by the cited part acquisition means and from the part of the citation source document that cites the acquired cited part, and generates a search index of the document group.

9. The document search system according to claim 8, wherein the document group search index generation means, when extracting feature words from the cited part of the citation destination document and the part of the citation source document, extracts the feature words by assigning higher weights to words appearing in the cited part.

10. The document search system according to claim 4, wherein the document group search index generation means extracts feature words from the parts of the citation destination document divided by the citation destination document dividing means and from the documents assigned to those parts by the document assignment means.

11. The document search system according to claim 10, wherein the document group search index generation means, when extracting feature words from a part of the citation destination document and the documents assigned to that part, extracts the feature words by assigning higher weights to words appearing in that part.

12. The document search system according to claim 2, further comprising: document group search means for searching the search index of the document groups; and display means for displaying search results, wherein the display means displays the document group search result acquired by the document group search means, displays the citation destination document and the citation source documents side by side in the display of each document group, and further, when the cited location in the citation destination document can be identified, displays the words appearing in the citation source documents in reverse video and highlights the words appearing at the cited location.

13. The document search system according to claim 12, further comprising document search means for searching a document search index generated by extracting feature words from the documents included in the document set to be searched, wherein the display means displays the document group search result acquired by the document group search means in comparison with the document search result acquired by the document search means.

14. The document search system according to claim 13, wherein the display means displays the citation source documents included in the document group search result acquired by the document group search means rearranged in order of their rank in the document search result acquired by the document search means.

15. The document search system according to claim 13, wherein the display means adds, to the display of the document search result acquired by the document search means, the rank in the document group search result acquired by the document group search means.