JP5060601B2

JP5060601B2 - Document analysis apparatus and program

Info

Publication number: JP5060601B2
Application number: JP2010174791A
Authority: JP
Inventors: 泰成宮部; 茂松本; 和之後藤; 博司平; 秀樹岩崎
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2010-08-03
Filing date: 2010-08-03
Publication date: 2012-10-31
Anticipated expiration: 2030-08-03
Also published as: JP2012037936A

Description

Embodiments of the present invention relate to a document analysis apparatus and program that present to a user the trends in the categories to which documents important to that user belong.

With the growing sophistication of information systems in recent years, documents such as patent documents, newspaper articles, web pages, and books can now be digitized and stored in large quantities.

There is therefore a demand to make effective use of such large document collections in day-to-day work.

For example, a vast archive of past newspaper articles could be classified and organized so that many people can use it easily, or the trends in one's own and other companies' patents related to a technology under research and development could be analyzed to identify new fields of research and development.

However, extracting relevant documents from a large document collection, classifying the collection into groups of documents with similar content (that is, categories), and analyzing trends in the content of the collection all require a great deal of labor and cost.

Various technologies, such as document retrieval, document classification, and document analysis, have therefore been developed to support these tasks.

Japanese Patent No. 4240280

Takenobu Tokunaga and Junichi Tsujii, “Information Retrieval and Language Processing,” Language and Computation 5, University of Tokyo Press, pp. 39-43 (1999)

Consider the case of searching a large document collection classified into a plurality of categories for documents that are important to a user but not yet known to that user (hereinafter, unknown important documents), based on, for example, documents that are important to the user and already known to that user (hereinafter, known important documents).

In this case, if unknown important documents are retrieved from the entire collection regardless of category, a document that is not actually important may be retrieved as an important document merely because, for example, words that appear in a known important document happen to appear in it.

When a user selects (designates) important documents (documents important to that user) from a document collection classified into a plurality of categories, the user often judges whether a document is important while browsing with attention to only one category. In other words, the user rarely knows which other categories a document judged to be important also belongs to.

Consequently, if unknown important documents are searched for only in the categories designated by the user (categories the user believes contain important documents), they cannot be retrieved from categories the user did not designate (that is, categories the user has not realized contain important documents).

Therefore, if trends such as which categories the user's important documents mostly belong to can be grasped, it should become possible both to discover categories that the user had not realized contain important documents and to use those trends to search for (that is, narrow down) unknown important documents.

Specifically, suppose a user wants to search, based on important patents (documents) describing the core technology of the company's own product, for important patents of other companies that pose a threat to that product. If trends such as “important patents are common among Company A's patents that solve problem B” or “Company C filed many applications in 1995” can be discovered, then narrowing the search to those categories should make it possible to find other companies' important patents with little effort.

An object is therefore to provide a document analysis apparatus and program capable of presenting to a user the trends in the categories to which documents important to that user belong.

A document analysis apparatus according to an embodiment includes storage means that stores, for each category into which a plurality of documents are classified, category identification information for identifying the category in association with the documents belonging to the category.

The document analysis apparatus according to the embodiment includes important document designation means that designates a first document stored in the storage means in response to a user operation.

The document analysis apparatus according to the embodiment includes first specifying means that specifies the category identification information stored in the storage means in association with the designated first document.

The document analysis apparatus according to the embodiment includes second specifying means that specifies a second document stored in the storage means in association with the category identification information specified by the first specifying means.

The document analysis apparatus according to the embodiment includes similarity calculation means that calculates the similarity between the designated first document and the second document specified by the second specifying means, based on the category identification information stored in the storage means in association with the first document and the category identification information stored in the storage means in association with the second document.

The document analysis apparatus according to the embodiment includes important document candidate determination means that determines, based on the similarity calculated by the similarity calculation means, the second document specified by the second specifying means to be an important document candidate.

The document analysis apparatus according to the embodiment includes importance calculation means that calculates the importance of a category identified by category identification information stored in the storage means. The calculation is based on those similarities, among the similarities calculated by the similarity calculation means, that are between the designated first document and the second documents that were determined to be important document candidates and are stored in the storage means in association with that category identification information.

The document analysis apparatus according to the embodiment includes important category candidate determination means that determines, based on the importance calculated by the importance calculation means, the category identified by the category identification information stored in the storage means to be an important category candidate.

The document analysis apparatus according to the embodiment includes presentation means that presents the category determined to be the important category candidate.

FIG. 1 is a block diagram showing the hardware configuration of the document analysis apparatus according to the first embodiment.
FIG. 2 is a block diagram mainly showing the functional configuration of the document analysis apparatus 30 shown in FIG. 1.
FIG. 3 is a diagram showing an example of the data structure of a document stored in the document storage unit 22 shown in FIG. 2.
FIG. 4 is a diagram showing an example of the data structure of information on the root category in the category hierarchy.
FIG. 5 is a diagram showing an example of the data structure of information on a child category of the category shown in FIG. 4.
FIG. 6 is a diagram showing an example of the data structure of information on a child category of the category shown in FIG. 5.
FIG. 7 is a diagram showing an example of the data structure of information on a child category of the category shown in FIG. 4.
FIG. 8 is a diagram showing an example of the data structure of information on a child category of the category shown in FIG. 7.
FIG. 9 is a diagram showing an example of the data structure of information on a child category of the category shown in FIG. 8.
FIG. 10 is a flowchart showing the processing procedure of the document analysis apparatus 30 according to the embodiment.
FIG. 11 is a diagram showing an example of the important document designation screen displayed by the category display operation unit 311.
FIG. 12 is a diagram showing an example of the display screen after an operation designating an important document has been performed on the important document designation screen 100 shown in FIG. 11.
FIG. 13 is a flowchart showing the procedure of the important document candidate extraction processing performed by the important document candidate extraction unit 321.
FIG. 14 is a flowchart showing the procedure of the category vector generation processing executed within the important document candidate extraction processing.
FIG. 15 is a flowchart showing the procedure of the important category candidate extraction processing performed by the important category candidate extraction unit 322.
FIG. 16 is a diagram showing an example of the display screen when important category candidates are displayed by the category display operation unit 311.
FIG. 17 is a flowchart showing the procedure of the related important category candidate extraction processing performed by the related important category candidate extraction unit 323.
FIG. 18 is a diagram showing an example of the display screen when related important category candidates are displayed by the category display operation unit 311.
FIG. 19 is a diagram showing an example of the display screen when a confirmation target category is selected on the display screen 300 shown in FIG. 18.
FIG. 20 is a block diagram mainly showing the functional configuration of the document analysis apparatus according to the second embodiment.
FIG. 21 is a flowchart showing the processing procedure of the document analysis apparatus 40 according to the embodiment.
FIG. 22 is a diagram showing an example of the result of tabulating important documents by the important document tabulation unit 411.

Each embodiment is described below with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing the hardware configuration of the document analysis apparatus according to the first embodiment. As shown in FIG. 1, a computer 10 is connected to an external storage device 20 such as a hard disk drive (HDD). The external storage device 20 stores a program 21 executed by the computer 10. The computer 10 and the external storage device 20 constitute a document analysis apparatus 30.

FIG. 2 is a block diagram mainly showing the functional configuration of the document analysis apparatus 30 shown in FIG. 1. As shown in FIG. 2, the document analysis apparatus 30 includes a user interface unit 31 and an important document processing unit 32. In this embodiment, the units 31 and 32 are assumed to be realized by the computer 10 shown in FIG. 1 executing the program 21 stored in the external storage device 20. The program 21 can be stored in advance on a computer-readable storage medium and distributed. The program 21 may also be downloaded to the computer 10 via, for example, a network.

The document analysis apparatus 30 also includes a document storage unit 22 and a category storage unit 23. In this embodiment, the document storage unit 22 and the category storage unit 23 are stored in, for example, the external storage device 20.

The document storage unit 22 stores a plurality of documents to be analyzed by the document analysis apparatus 30. Each document stored in the document storage unit 22 includes, for example, a document number (document identification information) for identifying the document and its body text.

The category storage unit 23 stores information on the plurality of categories into which the documents stored in the document storage unit 22 are classified. For each category, the category storage unit 23 stores a category number (category identification information) for identifying the category and the document numbers for identifying the documents classified into the category (the documents belonging to the category). The categories into which the documents stored in the document storage unit 22 are classified form, for example, a hierarchical structure.

In other words, a storage unit (not shown) that includes the document storage unit 22 and the category storage unit 23 stores, for each category into which a plurality of documents are classified, category identification information for identifying the category in association with the documents belonging to the category.

The document storage unit 22 and the category storage unit 23 may be implemented using, for example, a file system or a database.

The user interface unit 31 includes a category display operation unit 311, an important document designation unit 312, and a category selection unit 313.

The category display operation unit 311 refers to the category storage unit 23 and presents (displays) to the user the categories into which the documents stored in the document storage unit 22 are classified, together with the documents belonging to those categories. The category display operation unit 311 also has a function of accepting the user's operations on the presented categories and documents. The user can thus select and designate the categories and documents presented by the category display operation unit 311.

The category display operation unit 311 may be implemented using a technology such as a graphical user interface (GUI).

The important document designation unit 312 designates a document (first document) stored in the document storage unit 22 in accordance with the user operation accepted by the category display operation unit 311. The important document designation unit 312 may designate a plurality of documents. A document designated by the important document designation unit 312 is, for example, a document that the user has designated as important to that user (hereinafter, important document).

The category selection unit 313 has a function of selecting, in accordance with the user operation accepted by the category display operation unit 311, a category identified by a category number stored in the category storage unit 23 (for example, a category presented by the category display operation unit 311).

The important document processing unit 32 includes an important document candidate extraction unit 321, an important category candidate extraction unit 322, and a related important category candidate extraction unit 323.

The important document candidate extraction unit 321 has a function of using the document (important document) designated by the important document designation unit 312 to extract, from the plurality of documents stored in the document storage unit 22, candidates for important documents (hereinafter, important document candidates).

The important document candidate extraction unit 321 specifies the categories to which the important document designated by the important document designation unit 312 belongs. To do so, the important document candidate extraction unit 321 specifies the categories identified by the category numbers stored in the category storage unit 23 in association with the document number identifying the important document.

The important document candidate extraction unit 321 also specifies the documents (second documents) belonging to the specified categories. To do so, the important document candidate extraction unit 321 specifies the documents identified by the document numbers stored in the category storage unit 23 in association with the category numbers identifying the specified categories.

The important document candidate extraction unit 321 calculates the similarity between the important document and each specified document based on the category numbers stored in the category storage unit 23 in association with the document number identifying the important document and the category numbers stored in the category storage unit 23 in association with the document number identifying the specified document. In other words, the important document candidate extraction unit 321 calculates the similarity between the important document and a specified document based on the categories to which each of them belongs.

When calculating the similarity between the important document and a specified document, the important document candidate extraction unit 321 uses category vectors (described later) that represent the categories to which the important document and the specified document belong.

The important document candidate extraction unit 321 determines a specified document to be an important document candidate based on the calculated similarity between the important document and that document.
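The extraction steps above can be sketched as follows. This is only an illustrative sketch: the category vector is defined later in the specification, so the code assumes a binary vector over category numbers, cosine similarity, and a hypothetical `threshold`. The store contents (`c05`, `c06`, `d40`, `d41`) and all function names are made up for illustration.

```python
import math

# Hypothetical category store: category number -> document numbers in that category.
CATEGORY_DOCS = {
    "c04": {"d15", "d23", "d36"},
    "c05": {"d15", "d23", "d40"},
    "c06": {"d36", "d41"},
}

def categories_of(doc_id):
    """Return the set of category numbers associated with a document number."""
    return {c for c, docs in CATEGORY_DOCS.items() if doc_id in docs}

def cosine_similarity(cats_a, cats_b):
    """Cosine similarity of two binary category vectors, represented as sets (assumed measure)."""
    if not cats_a or not cats_b:
        return 0.0
    return len(cats_a & cats_b) / math.sqrt(len(cats_a) * len(cats_b))

def extract_candidates(important_doc, threshold=0.5):
    """Score every document that shares a category with the important document."""
    important_cats = categories_of(important_doc)
    # Second documents: all other documents in any category of the important document.
    second_docs = set().union(*(CATEGORY_DOCS[c] for c in important_cats)) - {important_doc}
    scores = {d: cosine_similarity(important_cats, categories_of(d)) for d in second_docs}
    # Keep documents whose similarity reaches the (hypothetical) threshold.
    return {d: s for d, s in scores.items() if s >= threshold}

candidates = extract_candidates("d15")
```

Because similarity here depends only on shared categories, a document that merely shares vocabulary with the important document gains no score, which matches the motivation given earlier.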

The important category candidate extraction unit 322 has a function of extracting, from the categories identified by the category numbers stored in the category storage unit 23, candidates for categories to which documents important to the user belong (hereinafter, important category candidates).

The important category candidate extraction unit 322 calculates the importance of each category identified by a category number stored in the category storage unit 23. The calculation is based on those similarities, among the similarities calculated by the important document candidate extraction unit 321, that are between the important documents and the documents determined to be important document candidates by the important document candidate extraction unit 321 (hereinafter simply, important document candidates). The importance of a category is calculated from the similarities between the important documents and the important document candidates belonging to that category.

The important category candidate extraction unit 322 determines a category to be an important category candidate based on the calculated importance of that category.

A category determined to be an important category candidate by the important category candidate extraction unit 322 (hereinafter simply, important category candidate) is presented to the user via the category display operation unit 311.

The related important category candidate extraction unit 323 extracts, from the important category candidates extracted (determined) by the important category candidate extraction unit 322, categories related to, for example, the category selected by the category selection unit 313 (hereinafter, related important category candidates). In doing so, the related important category candidate extraction unit 323 determines whether an important category candidate is a related important category candidate based on the important document candidates belonging to the category selected by the category selection unit 313 and the important document candidates belonging to that important category candidate.

The related important category candidates extracted by the related important category candidate extraction unit 323 are presented to the user by the category display operation unit 311.
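The passage does not give the importance formula. As a minimal sketch, assume a category's importance is the sum of the similarities of the important document candidates that belong to it, and that categories at or above a hypothetical `min_importance` become important category candidates. The data and names below are illustrative only.

```python
# Hypothetical inputs: category membership and candidate similarities
# (similarity between each important document candidate and the important document).
CATEGORY_DOCS = {
    "c04": {"d15", "d23", "d36"},
    "c05": {"d15", "d23", "d40"},
    "c06": {"d36", "d41"},
}
CANDIDATE_SCORES = {"d23": 1.0, "d36": 0.5, "d40": 0.7}

def category_importance(category_docs, candidate_scores):
    """Sum candidate similarities per category (assumed aggregation)."""
    return {
        category: sum(candidate_scores.get(doc, 0.0) for doc in docs)
        for category, docs in category_docs.items()
    }

def important_category_candidates(category_docs, candidate_scores, min_importance=1.0):
    """Return (category, importance) pairs at or above the cutoff, highest first."""
    importance = category_importance(category_docs, candidate_scores)
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [(category, value) for category, value in ranked if value >= min_importance]

ranking = important_category_candidates(CATEGORY_DOCS, CANDIDATE_SCORES)
```

Other aggregations (a mean, or a count of candidates) would fit the same interface; the sum is chosen here only because it rewards categories that hold many highly similar candidates.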
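How the two sets of important document candidates are compared is not stated in this passage. One possible reading, shown purely as an assumption, is to treat an important category candidate as related when it shares enough important document candidates with the selected category. The overlap measure, the `min_overlap` cutoff, and the sample data are all hypothetical.

```python
def related_important_categories(selected, important_categories, category_candidates,
                                 min_overlap=0.5):
    """Return important category candidates that share candidates with the selected category.

    category_candidates maps a category number to the set of important document
    candidates belonging to it. Overlap is measured against the smaller set
    (an assumed measure, not one taken from the source).
    """
    selected_docs = category_candidates.get(selected, set())
    related = []
    for category in important_categories:
        if category == selected:
            continue
        docs = category_candidates.get(category, set())
        if not docs or not selected_docs:
            continue
        overlap = len(selected_docs & docs) / min(len(selected_docs), len(docs))
        if overlap >= min_overlap:
            related.append(category)
    return related

# Illustrative data only.
CATEGORY_CANDIDATES = {
    "c04": {"d23", "d36"},
    "c05": {"d23", "d40"},
    "c06": {"d41"},
}
related = related_important_categories("c04", ["c04", "c05", "c06"], CATEGORY_CANDIDATES)
```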

FIG. 3 shows an example of the data structure of a document stored in the document storage unit 22 shown in FIG. 2. The document shown in FIG. 3 (a document stored in the document storage unit 22) is assumed to be, for example, a document relating to a patent application (patent document).

As shown in FIG. 3, a document stored in the document storage unit 22 includes (information indicating) a document number, a document name, body text, an applicant, a filing date, and an importance. The document number is document identification information for uniquely identifying the document. The document name and body text are text data indicating the name and the body of the document identified by the document number. The applicant indicates the applicant of the patent application made by the document (patent document) identified by the document number. The filing date indicates the filing date of that patent application. The importance indicates, for example, whether the user has designated (set) the document identified by the document number as an important document. The applicant and filing date included in a document are attribute data of that document (patent document).

In the example shown in FIG. 3, a document 221 is stored in the document storage unit 22. The document 221 includes the document number “d0001”, the document name and body text of the document 221, the applicant “Company A”, the filing date “2006/01/25”, and the importance “important”.

This indicates that the applicant of the patent application made by the document (patent document) 221 identified by the document number “d0001” is Company A, and that the filing date of that application is January 25, 2006. It also indicates that the user has designated the document 221 as an important document.

Although the document 221 in the example shown in FIG. 3 has the importance “important”, the importance included in a document can also take the values “unnecessary” and “none”. The importance “unnecessary” indicates, for example, that the user has designated the document identified by the document number as an unnecessary document. The importance “none” indicates, for example, that the user has not designated (set) whether the document is important or unnecessary.

The importance is described here as taking the values “important”, “unnecessary”, and “none”, but it may instead be expressed as, for example, a five-level numerical value reflecting the degree of importance.
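The record layout of FIG. 3 can be pictured with a small data class. This is a sketch under stated assumptions: the field names, the `Importance` enum, and the placeholder name and body strings are hypothetical, and only the number, applicant, filing date, and importance of document 221 come from the text.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Importance(Enum):
    """The three importance values described for FIG. 3."""
    IMPORTANT = "important"
    UNNECESSARY = "unnecessary"
    NONE = "none"  # the user has not set any importance

@dataclass
class Document:
    number: str                  # document number (document identification information)
    name: str                    # document name
    body: str                    # body text
    applicant: str               # attribute data
    filing_date: date            # attribute data
    importance: Importance = Importance.NONE

# Document 221 from FIG. 3; name and body are placeholders because the
# text does not give them.
doc_221 = Document(
    number="d0001",
    name="(document name)",
    body="(body text)",
    applicant="Company A",
    filing_date=date(2006, 1, 25),
    importance=Importance.IMPORTANT,
)
```

If the five-level numerical importance mentioned above were used instead, the enum could be replaced by an integer field.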

図４〜図９は、図２に示すカテゴリ記憶部２３に記憶されている階層構造を構成するカテゴリ（に関する情報）のデータ構造の一例を示す。図４〜図９に示すように、カテゴリ記憶部２３には、カテゴリ毎に、カテゴリ番号、親カテゴリ番号、カテゴリ名および文書番号が対応づけて記憶されている。 4 to 9 show an example of the data structure of categories (information relating to) constituting the hierarchical structure stored in the category storage unit 23 shown in FIG. As shown in FIGS. 4 to 9, the category storage unit 23 stores a category number, a parent category number, a category name, and a document number in association with each category.

カテゴリ番号は、カテゴリを一意に識別するためのカテゴリ識別情報である。親カテゴリ番号は、カテゴリの階層構造においてカテゴリ番号によって識別されるカテゴリの上位に位置するカテゴリ（つまり、親カテゴリ）を識別するためのカテゴリ識別情報である。カテゴリ名は、カテゴリ番号によって識別されるカテゴリの名称を示すテキストデータである。文書番号は、カテゴリ番号によって識別されるカテゴリに属する（分類された）文書を識別するための文書識別情報である。 The category number is category identification information for uniquely identifying a category. The parent category number is category identification information for identifying a category (that is, a parent category) located at a higher level of the category identified by the category number in the hierarchical structure of the category. The category name is text data indicating the name of the category identified by the category number. The document number is document identification information for identifying a document belonging to (categorized) the category identified by the category number.

図４は、カテゴリの階層構造におけるルートのカテゴリを示す。図４に示すカテゴリにおいては、カテゴリ番号「ｃ０１」がカテゴリ記憶部２３に記憶されている。図４に示すカテゴリはルートのカテゴリであるため、親カテゴリを持たない（つまり、カテゴリ番号「ｃ０１」に対応づけて親カテゴリ番号は記憶されていない）。また、図４に示す例では文書番号が記憶されていないため、カテゴリ番号「ｃ０１」によって識別されるカテゴリ（ルートのカテゴリ）には、文書が分類されていない（つまり、当該カテゴリに属する文書は存在しない）ことが示される。 FIG. 4 shows the root category in the category hierarchy. In the category illustrated in FIG. 4, the category number “c01” is stored in the category storage unit 23. Since the category shown in FIG. 4 is a root category, it does not have a parent category (that is, no parent category number is stored in association with the category number “c01”). In the example shown in FIG. 4, since the document number is not stored, the document is not classified into the category (root category) identified by the category number “c01” (that is, documents belonging to the category are not classified). Does not exist).

FIG. 5 shows a child category of the category shown in FIG. 4 (a category positioned below the category of FIG. 4 in the category hierarchy). For the category shown in FIG. 5, the category number "c02", the parent category number "c01", and the category name "By applicant" are stored in association with one another in the category storage unit 23.

This indicates that the parent category of the category identified by the category number "c02" is the category identified by the parent category number "c01" (that is, the category shown in FIG. 4), and that the name of the category identified by the category number "c02" is "By applicant". Because no document number is stored in the example of FIG. 5, no document is classified into the category identified by the category number "c02" (that is, no document belongs to that category).

FIG. 6 shows a child category of the category shown in FIG. 5. For the category shown in FIG. 6, the category number "c04", the parent category number "c02", the category name "Company T", and the document numbers "d15, d23, d36, …" are stored in association with one another in the category storage unit 23.

This indicates that the parent category of the category identified by the category number "c04" is the category identified by the parent category number "c02" (that is, the category shown in FIG. 5), and that the name of the category identified by the category number "c04" is "Company T". It further indicates that the documents identified by the document numbers "d15", "d23", and "d36" belong to (are classified into) the category identified by the category number "c04".

As shown in FIG. 6, the category storage unit 23 may also store a condition that documents classified into a category must satisfy. In the example of FIG. 6, the condition 'applicant = "Company T"' is stored in the category storage unit 23. This indicates that a document is classified into the category identified by the category number "c04" only if the applicant included in the document is Company T. For example, the document 221 shown in FIG. 3 is not classified into the category identified by the category number "c04", because the applicant included in the document 221 is Company A.

In the example of FIG. 6, the document numbers and the condition are described as both being stored in association with each other in the category storage unit 23. However, the category storage unit 23 may instead store only the condition, without explicitly listing the document numbers.
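When only the condition is stored, the members of a category have to be derived by evaluating the condition against the stored documents. The following is a minimal sketch under assumptions: the function `documents_matching`, the dictionary layout of a document, and the document number "d01" with applicant "Company A" are all hypothetical and serve only to illustrate the condition 'applicant = "Company T"'.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]  # hypothetical layout: attribute name -> value


def documents_matching(condition: Callable[[Document], bool],
                       documents: Dict[str, Document]) -> List[str]:
    """Return the numbers of all documents that satisfy the category condition."""
    return sorted(no for no, doc in documents.items() if condition(doc))


def applicant_is_company_t(doc: Document) -> bool:
    """The condition 'applicant = "Company T"' expressed as a predicate."""
    return doc.get("applicant") == "Company T"


# Hypothetical document store; "d01" is invented for illustration.
documents = {
    "d01": {"applicant": "Company A"},
    "d15": {"applicant": "Company T"},
    "d23": {"applicant": "Company T"},
}

members = documents_matching(applicant_is_company_t, documents)
```

Here `members` contains "d15" and "d23" but not "d01", in the same way that a document whose applicant is Company A is excluded from the category "c04".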

FIG. 7 shows another child category of the category shown in FIG. 4. For the category shown in FIG. 7, the category number "c03", the parent category number "c01", and the category name "By technology" are stored in association with one another in the category storage unit 23.

This indicates that the parent category of the category identified by the category number "c03" is the category identified by the parent category number "c01" (that is, the category shown in FIG. 4), and that the name of the category identified by the category number "c03" is "By technology". Because no document number is stored in the example of FIG. 7, no document is classified into the category identified by the category number "c03" (that is, no document belongs to that category).

FIG. 8 shows a child category of the category shown in FIG. 7. For the category shown in FIG. 8, the category number "c31", the parent category number "c03", the category name "Dialogue classification", and the document numbers "d07, d23, d58, …" are stored in association with one another in the category storage unit 23.

This indicates that the parent category of the category identified by the category number "c31" is the category identified by the parent category number "c03" (that is, the category shown in FIG. 7), and that the name of the category identified by the category number "c31" is "Dialogue classification". It further indicates that the documents identified by the document numbers "d07", "d23", and "d58" belong to (are classified into) the category identified by the category number "c31".

FIG. 9 shows a child category of the category shown in FIG. 8. For the category shown in FIG. 9, the category number "c43", the parent category number "c31", the category name "Supervised classification", and the document numbers "d15, d32, d69, …" are stored in association with one another in the category storage unit 23.

This indicates that the parent category of the category identified by the category number "c43" is the category identified by the parent category number "c31" (that is, the category shown in FIG. 8), and that the name of the category identified by the category number "c43" is "Supervised classification". It further indicates that the documents identified by the document numbers "d15", "d32", and "d69" belong to (are classified into) the category identified by the category number "c43".

As the categories shown in FIGS. 4 to 9 illustrate, the same document may belong to more than one category. For example, the document identified by the document number "d15" belongs to (is classified into) both the category identified by the category number "c04" shown in FIG. 6 and the category identified by the category number "c43" shown in FIG. 9. Likewise, the document identified by the document number "d23" belongs to both the category identified by the category number "c04" shown in FIG. 6 and the category identified by the category number "c31" shown in FIG. 8.
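The six records described for FIGS. 4 to 9 can be written out together to check these relationships. This is a sketch under assumptions: the dictionary layout and the helper names `categories_of` and `path_to_root` are hypothetical, and only the document numbers explicitly listed in the description are included (the elided remainder of each list is left out).

```python
from typing import Dict, List, Optional

# category number -> record, as described for FIGS. 4 to 9
categories: Dict[str, dict] = {
    "c01": {"parent": None,  "name": None,                        "docs": []},
    "c02": {"parent": "c01", "name": "By applicant",              "docs": []},
    "c03": {"parent": "c01", "name": "By technology",             "docs": []},
    "c04": {"parent": "c02", "name": "Company T",                 "docs": ["d15", "d23", "d36"]},
    "c31": {"parent": "c03", "name": "Dialogue classification",   "docs": ["d07", "d23", "d58"]},
    "c43": {"parent": "c31", "name": "Supervised classification", "docs": ["d15", "d32", "d69"]},
}


def categories_of(doc_no: str) -> List[str]:
    """Category numbers of every category that lists the document directly."""
    return sorted(c for c, rec in categories.items() if doc_no in rec["docs"])


def path_to_root(cat_no: str) -> List[str]:
    """Follow parent category numbers from a category up to the root."""
    path: List[str] = []
    current: Optional[str] = cat_no
    while current is not None:
        path.append(current)
        current = categories[current]["parent"]
    return path


d15_categories = categories_of("d15")   # "c04" and "c43"
d23_categories = categories_of("d23")   # "c04" and "c31"
c43_path = path_to_root("c43")          # c43, c31, c03, c01
```

Because membership is stored per category, a document that appears in several lists belongs to several categories without any extra bookkeeping.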

Next, the processing procedure of the document analysis apparatus 30 according to the present embodiment is described with reference to the flowchart of FIG. 10.

First, the category display operation unit 311 included in the user interface unit 31 displays a screen on which the user designates (selects) documents that are important to the user (important documents); this screen is hereinafter referred to as the important document designation screen (step S1).

The important document designation screen displays, for example, the categories identified by the category numbers stored in the category storage unit 23, together with the documents (specifically, the document names and body text they contain) identified by the document numbers stored in the category storage unit 23 in association with those category numbers.

FIG. 11 shows an example of the important document designation screen displayed by the category display operation unit 311.

The important document designation screen 100 shown in FIG. 11 is provided with areas 101 to 103.

In the area 101, the categories identified by the category numbers stored in the category storage unit 23 are displayed in a hierarchical structure together with their names (that is, their category names).

In the area 102, the document names of the documents belonging to the category that the user has designated among the categories displayed in the area 101 are displayed. In other words, the area 102 displays a list of the document names included in the documents identified by the document numbers stored in the category storage unit 23 in association with the category number of the category designated by the user. Each document name is included in the corresponding document stored in the document storage unit 22. On the important document designation screen 100 shown in FIG. 11, the document names of the documents belonging to the category "Dialogue classification", one of the categories displayed in the area 101, are displayed.

In the area 103, the body text of the document that the user has designated among the documents (document names) displayed in the area 102 is displayed. The body text is included in the document stored in the document storage unit 22. On the important document designation screen 100 shown in FIG. 11, the body text of the document named "Document classification method and apparatus", taken from the list of document names displayed in the area 102, is displayed.

When the important document designation screen 100 described above is displayed, the user can refer to it to check the categories (the classification results) and the contents of the documents (document names, body text, and so on), and can then perform, for example, an operation on the document analysis apparatus 30 that designates an important document (a document that is important to the user).

When the user performs an operation designating an important document on the document analysis apparatus 30, the operation is accepted by the category display operation unit 311.

The important document designation unit 312 designates an important document from among the documents stored in the document storage unit 22 in accordance with the user operation accepted by the category display operation unit 311 (step S2).

When the important document designation unit 312 designates an important document, the result of the designation is stored in the document storage unit 22. Specifically, the importance included in the designated important document stored in the document storage unit 22 is changed to "important". The document designated by the important document designation unit 312 (the important document) is thereby marked as a document that is important to the user.

FIG. 12 shows an example of the display screen after an operation designating important documents has been performed on the important document designation screen 100 shown in FIG. 11.

When the user performs an operation on the important document designation screen 100 that designates, for example, a document (document name) displayed in the area 102 as an important document, a mark 104 indicating that the document has been designated (set) as an important document is displayed, for example to the left of the document name, as shown in FIG. 12.

The example of FIG. 12 shows that the documents named "Document classification method and apparatus" and "Automatic text classification method" have been designated as important documents.

Next, the important document candidate extraction unit 321 included in the important document processing unit 32 executes processing that extracts candidates for important documents (important document candidates) from the documents stored in the document storage unit 22, based on the important documents designated by the important document designation unit 312; this processing is hereinafter referred to as important document candidate extraction processing (step S3).

In the important document candidate extraction processing, the similarity is calculated between a vector representing the categories to which the important documents designated by the important document designation unit 312 belong and a vector representing the categories to which each document subject to the processing (described later) belongs, and the important document candidates are determined based on that similarity. The details of the important document candidate extraction processing are described later.

The important category candidate extraction unit 322 executes processing that extracts, from the categories identified by the category numbers stored in the category storage unit 23, candidates for categories to which documents important to the user belong (important category candidates); this processing is hereinafter referred to as important category candidate extraction processing (step S4). The extraction is based on the important document candidates extracted by the important document candidate extraction unit 321 and on the similarities calculated by that unit (the similarities between the important document candidates and the important documents).

In the important category candidate extraction processing, the importance of each category identified by a category number stored in the category storage unit 23 is calculated from the similarities between the important document candidates and the important documents calculated by the important document candidate extraction unit 321, and the important category candidates are determined based on that importance.

The importance of a category is calculated as the ratio of the similarity between the important document candidates belonging to that category and the important documents to the sum (total) of the similarities between the important document candidates belonging to each of the categories and the important documents. The details of the important category candidate extraction processing are described later.
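The ratio described above can be sketched as follows. This is one possible reading, not the specification's detailed procedure, which is given later: it assumes that the similarity attributed to a category is the sum of the similarities of the important document candidates it contains, and that the denominator is the sum of those per-category values. The function name `category_importance` and all similarity values are hypothetical.

```python
from typing import Dict, List


def category_importance(candidate_similarity: Dict[str, float],
                        category_documents: Dict[str, List[str]]) -> Dict[str, float]:
    """Importance of each category as its share of the total candidate similarity.

    ASSUMPTION: a category's similarity is the sum over the important document
    candidates it contains; documents that are not candidates contribute 0.
    """
    per_category = {
        cat: sum(candidate_similarity.get(doc, 0.0) for doc in docs)
        for cat, docs in category_documents.items()
    }
    total = sum(per_category.values())
    if total == 0:
        return {cat: 0.0 for cat in per_category}
    return {cat: value / total for cat, value in per_category.items()}


# Hypothetical similarities between candidates and the important documents.
similarity = {"d07": 0.8, "d23": 0.6, "d32": 0.6}
membership = {
    "c04": ["d15", "d23", "d36"],
    "c31": ["d07", "d23", "d58"],
    "c43": ["d15", "d32", "d69"],
}
importance = category_importance(similarity, membership)
```

Under this reading the importances of all categories sum to 1, so they can be compared directly when deciding which categories to present as important category candidates.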

Next, the category display operation unit 311 presents (displays) the important category candidates extracted by the important category candidate extraction unit 322 (step S5). In doing so, the category display operation unit 311 displays each important category candidate in accordance with the importance of that category calculated in the important category candidate extraction processing. A specific example of how the important category candidates are displayed is described later.

The user can then perform, for example on the document analysis apparatus 30, an operation that selects one of the important category candidates presented by the category display operation unit 311. In this case, the user selects, for example, a category (important category candidate) that the user wants to check for important documents.

When an operation selecting one of the important category candidates is performed on the document analysis apparatus 30, the operation is accepted by the category display operation unit 311.

The category selection unit 313 selects one of the important category candidates extracted by the important category candidate extraction unit 322 in accordance with the user operation accepted by the category display operation unit 311 (step S6). The important category candidate selected by the category selection unit 313 is hereinafter referred to as the selected important category candidate.

Next, the related important category candidate extraction unit 323 executes processing that extracts, from the important category candidates extracted by the important category candidate extraction unit 322 other than the selected important category candidate, those that are related (highly related) to the selected important category candidate (related important category candidates); this processing is hereinafter referred to as related important category candidate extraction processing (step S7).

In the related important category candidate extraction processing, whether each important category candidate other than the selected important category candidate is a related important category candidate is determined based on the important document candidates belonging to the selected important category candidate and the important document candidates belonging to that other important category candidate. The details of the related important category candidate extraction processing are described later.

The category display operation unit 311 presents (displays) the related important category candidates extracted by the related important category candidate extraction unit 323 (step S8). In doing so, the category display operation unit 311 displays, together with each related important category candidate, the number of important document candidates (and important documents) belonging to it. The number of important document candidates and important documents belonging to a related important category candidate is determined by referring to the category storage unit 23, using the category number that identifies the related important category candidate and the document numbers that identify the important document candidates and important documents.

The user can then perform, for example on the document analysis apparatus 30, an operation that selects, from the important category candidates and related important category candidates presented by the category display operation unit 311, a category that the user wants to check for the presence of important documents.

When an operation selecting such a category is performed on the document analysis apparatus 30, the operation is accepted by the category display operation unit 311.

The category display operation unit 311 determines whether an operation selecting a category to be checked for the presence of important documents has been accepted, that is, whether the user has selected such a category (step S9).

If it is determined that an operation selecting a category to be checked for the presence of important documents has been accepted (YES in step S9), the category selection unit 313 selects that category (the category the user wants to check for important documents) from among the important category candidates and related important category candidates in accordance with the operation accepted by the category display operation unit 311.

Next, the category display operation unit 311 presents (displays) the documents belonging to the category selected by the category selection unit 313 (step S10). In doing so, the category display operation unit 311 obtains from the document storage unit 22, for example, the document names included in the documents identified by the document numbers stored in the category storage unit 23 in association with the category number of the selected category, and displays a list of those document names.

When the documents (document names) belonging to the category selected by the category selection unit 313 are displayed, the important documents designated by the important document designation unit 312 and the important document candidates extracted by the important document candidate extraction unit 321 are, for example, displayed at the top. The important document candidates are displayed in descending order of their similarity to the important documents (the similarity calculated by the important document candidate extraction unit 321).
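The display order described here could be produced as sketched below. The description states only that important documents and important document candidates appear at the top and that candidates are sorted by descending similarity; placing the important documents before the candidates is an assumption, and the function name `display_order`, the document number "d61", and the similarity values are hypothetical.

```python
from typing import Dict, List, Set


def display_order(doc_numbers: List[str],
                  important: Set[str],
                  candidate_similarity: Dict[str, float]) -> List[str]:
    """Order a category's documents for display.

    ASSUMPTION: designated important documents come first, then the important
    document candidates by descending similarity, then all remaining documents.
    """
    important_docs = [d for d in doc_numbers if d in important]
    candidates = sorted(
        (d for d in doc_numbers
         if d not in important and d in candidate_similarity),
        key=lambda d: candidate_similarity[d],
        reverse=True,
    )
    others = [d for d in doc_numbers
              if d not in important and d not in candidate_similarity]
    return important_docs + candidates + others


ordered = display_order(
    ["d07", "d23", "d58", "d61"],          # documents of the selected category
    important={"d23"},                      # designated important documents
    candidate_similarity={"d58": 0.4, "d07": 0.9},
)
```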

By checking the documents displayed by the category display operation unit 311 (the documents belonging to the category the user wants to check for important documents), the user can designate important documents from among them. If those documents include an important document (YES in step S11), the user designates it, and the document analysis apparatus 30 repeats the processing from step S3 onward. If they include no important document (NO in step S11), the processing of the document analysis apparatus 30 ends. In this way, starting from the important documents designated by the important document designation unit 312, the user designates further important documents, and the category display operation unit 311 again presents candidate documents belonging to the categories. As this processing is repeated, the tendency of the important documents gradually becomes clear to the user.

On the other hand, if it is determined that no operation selecting a category to be checked for the presence of important documents has been accepted (NO in step S9), the processing of the document analysis apparatus 30 ends.

Next, the procedure of the important document candidate extraction processing performed by the important document candidate extraction unit 321 (the processing of step S3 shown in FIG. 10) is described with reference to the flowchart of FIG. 13.

First, in steps S21 to S25, a vector (category vector) is generated that represents the categories (more precisely, their frequency) to which the important documents designated by the important document designation unit 312 belong.

The important document candidate extraction unit 321 initializes to empty the category vector (hereinafter, category vector vdd′) of the set of important documents designated by the important document designation unit 312 (hereinafter, important document set Dd′) (step S21).

Next, the important document candidate extraction unit 321 executes the following steps S22 and S23 for each important document in the important document set Dd′. The important document being processed is hereinafter referred to as important document d′.

The important document candidate extraction unit 321 executes processing that generates a category vector vd′ representing the categories to which the important document d′ belongs (hereinafter, category vector generation processing) (step S22).

In the category vector generation processing, a vector is generated whose dimensions are, for example, the categories identified by the category numbers stored in the category storage unit 23. The details of the category vector generation processing are described later.

Next, the important document candidate extraction unit 321 adds the category vector vd′ of the important document d′, generated in the category vector generation processing, to the category vector vdd′ (step S23).

The important document candidate extraction unit 321 determines whether steps S22 and S23 have been executed for all important documents in the important document set Dd′ (step S24).

If it is determined that the processing has not yet been executed for all important documents in the important document set Dd′ (NO in step S24), the procedure returns to step S22 and the processing is repeated, taking as important document d′ an important document in the important document set Dd′ for which steps S22 and S23 have not yet been executed.

Repeating steps S22 and S23 for each important document in the important document set Dd′ in this way generates the category vector vdd′ of the important document set Dd′.

If it is determined in step S24 that the processing has been executed for all important documents in the important document set Dd′, the important document candidate extraction unit 321 normalizes the category vector vdd′ by dividing the value of each of its dimensions by the norm |vdd′| (step S25). As a result, the category vector vdd′ becomes a vector whose norm is 1.
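Steps S21 to S25 can be sketched as follows. Two points are assumptions, because the description defers the details of the category vector generation processing and does not name the norm: each document's category vector is taken to have the value 1 for every category that lists the document directly, and the norm is taken to be the Euclidean norm. The function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def category_vector(doc_no: str,
                    category_documents: Dict[str, List[str]]) -> Dict[str, float]:
    """ASSUMPTION: value 1.0 for each category that lists the document directly."""
    return {cat: 1.0 for cat, docs in category_documents.items() if doc_no in docs}


def important_set_vector(important_docs: Iterable[str],
                         category_documents: Dict[str, List[str]]) -> Dict[str, float]:
    vdd: Dict[str, float] = {}                        # S21: start with an empty vector
    for d in important_docs:                          # S24: loop over all important documents
        vd = category_vector(d, category_documents)   # S22: category vector of d'
        for cat, value in vd.items():                 # S23: add vd' to vdd'
            vdd[cat] = vdd.get(cat, 0.0) + value
    # S25: divide each dimension by the norm (Euclidean norm assumed here)
    norm = math.sqrt(sum(v * v for v in vdd.values()))
    if norm == 0:
        return vdd
    return {cat: v / norm for cat, v in vdd.items()}


membership = {
    "c04": ["d15", "d23", "d36"],
    "c31": ["d07", "d23", "d58"],
    "c43": ["d15", "d32", "d69"],
}
vdd = important_set_vector(["d15", "d23"], membership)
```

With "d15" and "d23" as the important documents, the unnormalized counts are 2 for "c04" and 1 each for "c31" and "c43", so the vector reflects how often each category occurs among the important documents before it is scaled to norm 1.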

次に、ステップＳ２６の処理において、以下のステップＳ２７〜Ｓ３１の処理の対象となる文書（前述した重要文書候補抽出処理の対象となる文書）の集合（以下、対象文書集合Ｄｄと表記）が特定される。 Next, in the process of step S26, a set (hereinafter referred to as a target document set Dd) of documents to be processed in the following steps S27 to S31 (documents to be subjected to the above-described important document candidate extraction process) is specified. Is done.

重要文書候補抽出部３２１は、上記した重要文書集合Ｄｄ´中の重要文書の各々が属するカテゴリを特定する。この場合、重要文書候補抽出部３２１は、重要文書集合Ｄｄ´中の重要文書の各々を識別する文書番号に対応づけてカテゴリ記憶部２３に記憶されているカテゴリ番号（によって識別されるカテゴリ）を特定する。 The important document candidate extraction unit 321 specifies a category to which each important document in the important document set Dd ′ belongs. In this case, the important document candidate extraction unit 321 uses the category number (category identified by) stored in the category storage unit 23 in association with the document number for identifying each important document in the important document set Dd ′. Identify.

重要文書候補抽出部３２１は、特定されたカテゴリ番号によって識別されるカテゴリのうちの少なくとも１つに属する文書の集合を対象文書集合Ｄｄとして特定する（ステップＳ２６）。この場合、重要文書候補抽出部３２１は、特定されたカテゴリ番号に対応づけてカテゴリ記憶部２３に記憶されている文書番号によって識別される文書の集合を対象文書集合Ｄｄとする。 The important document candidate extraction unit 321 specifies a set of documents belonging to at least one of the categories identified by the specified category number as the target document set Dd (step S26). In this case, the important document candidate extraction unit 321 sets a document set identified by the document number stored in the category storage unit 23 in association with the identified category number as the target document set Dd.

つまり、対象文書集合Ｄｄは、重要文書集合Ｄｄ´中の重要文書の各々が属するカテゴリのうちの少なくとも１つに属する文書の集合である。 That is, the target document set Dd is a set of documents belonging to at least one of the categories to which each important document in the important document set Dd ′ belongs.
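
The construction of the target document set Dd in step S26 can be sketched as a union over categories. The dictionary `category_store` is a hypothetical stand-in for the category storage unit 23, and the document and category identifiers are invented for illustration.

```python
# Hypothetical stand-in for the category storage unit 23:
# category number -> set of document numbers belonging to that category.
category_store = {
    "c1": {"d1", "d2", "d5"},
    "c2": {"d2", "d3"},
    "c3": {"d4"},
}

def categories_of(doc, store):
    """Return every category whose document set contains `doc`."""
    return {cat for cat, docs in store.items() if doc in docs}

def target_document_set(important_docs, store):
    """Collect all documents that share at least one category with an important document."""
    categories = set()
    for doc in important_docs:
        categories |= categories_of(doc, store)
    target = set()
    for cat in categories:
        target |= store[cat]
    return target

# d1 belongs to c1 and d3 belongs to c2, so documents of c1 and c2 are collected.
target = target_document_set({"d1", "d3"}, category_store)
```

As defined in the text, the resulting set is determined only by category membership, so in this sketch it also contains the important documents themselves.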

次に、ステップＳ２７〜Ｓ３２の処理によって対象文書集合Ｄｄ中から重要文書候補となる文書が抽出される。 Next, a document that is an important document candidate is extracted from the target document set Dd by the processes of steps S27 to S32.

重要文書候補抽出部３２１は、対象文書集合Ｄｄ中の文書の各々について、以下のステップＳ２７〜Ｓ３１の処理を実行する。以下、この処理の対象となる文書を対象文書ｄとする。 The important document candidate extraction unit 321 performs the following steps S27 to S31 for each of the documents in the target document set Dd. Hereinafter, a document to be processed is referred to as a target document d.

重要文書候補抽出部３２１は、対象文書ｄが属するカテゴリを表すカテゴリベクトルｖｄを生成する処理（カテゴリベクトル生成処理）を実行する（ステップＳ２７）。このステップＳ２７においては、上記したステップＳ２２におけるカテゴリベクトル生成処理と同様の処理が対象文書ｄに対して実行される。つまり、カテゴリベクトル生成処理によって、カテゴリ記憶部２３に記憶されているカテゴリ番号によって識別されるカテゴリの各々を次元とするベクトルが生成される。 The important document candidate extraction unit 321 executes a process (category vector generation process) for generating a category vector vd representing the category to which the target document d belongs (step S27). In step S27, processing similar to the category vector generation processing in step S22 described above is executed for the target document d. That is, the category vector generation process generates a vector having each category identified by the category number stored in the category storage unit 23 as a dimension.

重要文書候補抽出部３２１は、生成されたカテゴリベクトルｖｄの各次元の値をノルム｜ｖｄ｜で割ることによって、当該カテゴリベクトルｖｄを正規化する（ステップＳ２８）。この結果、カテゴリベクトルｖｄは、ノルムが１のベクトルとなる。 The important document candidate extraction unit 321 normalizes the category vector vd by dividing the value of each dimension of the generated category vector vd by the norm | vd | (step S28). As a result, the category vector vd is a vector whose norm is 1.

次に、重要文書候補抽出部３２１は、上記したステップＳ２５において正規化されたカテゴリベクトルｖｄｄ´とステップＳ２８において正規化されたカテゴリベクトルｖｄとの類似度（以下、類似度ｓと表記）を算出する（ステップＳ２９）。この類似度ｓは、例えばカテゴリベクトルｖｄｄ´およびカテゴリベクトルｖｄの余弦値である。 Next, the important document candidate extraction unit 321 calculates the similarity between the category vector vdd ′ normalized in step S25 and the category vector vd normalized in step S28 (hereinafter referred to as similarity s). (Step S29). The similarity s is, for example, a cosine value of the category vector vdd ′ and the category vector vd.
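
Written out, the cosine value used in step S29 takes the following form. Because both vectors have already been scaled to norm 1 in steps S25 and S28, the denominator equals 1 and the similarity reduces to the inner product. The index c ranging over the categories and the component notation are introduced here only for illustration.

```latex
s = \cos(v_{dd'}, v_{d})
  = \frac{v_{dd'} \cdot v_{d}}{|v_{dd'}|\,|v_{d}|}
  = \sum_{c} v_{dd',c}\, v_{d,c}
  \qquad \text{since } |v_{dd'}| = |v_{d}| = 1
```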

重要文書候補抽出部３２１は、算出された類似度ｓが予め定められた値（以下、閾値と表記）以上であるか否かを判定する（ステップＳ３０）。 The important document candidate extraction unit 321 determines whether or not the calculated similarity s is greater than or equal to a predetermined value (hereinafter referred to as a threshold) (step S30).

類似度ｓが閾値以上であると判定された場合（ステップＳ３０のＹＥＳ）、重要文書候補抽出部３２１は、対象文書ｄを重要文書候補として決定する。 When it is determined that the similarity s is greater than or equal to the threshold (YES in step S30), the important document candidate extraction unit 321 determines the target document d as an important document candidate.

この場合、重要文書候補抽出部３２１は、対象文書ｄ（を識別するための文書番号）およびステップＳ２９において算出された類似度ｓを対応づけてリスト（変数list）に格納する（ステップＳ３１）。 In this case, the important document candidate extraction unit 321 associates the target document d (document number for identifying) with the similarity s calculated in step S29 and stores it in the list (variable list) (step S31).

一方、類似度ｓが閾値以上でないと判定された場合（ステップＳ３０のＮＯ）、以下のステップＳ３１の処理は実行されない。 On the other hand, when it is determined that the similarity s is not greater than or equal to the threshold (NO in step S30), the following process in step S31 is not executed.

重要文書候補抽出部３２１は、対象文書集合Ｄｄ中の全ての文書について上記したステップＳ２７〜Ｓ３１の処理が実行されたか否かを判定する（ステップＳ３２）。 The important document candidate extraction unit 321 determines whether or not the processing in steps S27 to S31 described above has been executed for all the documents in the target document set Dd (step S32).

対象文書集合Ｄｄ中の全ての文書について処理が実行されていないと判定された場合（ステップＳ３２のＮＯ）、上記したステップＳ２７に戻って処理が繰り返される。この場合、ステップＳ２７〜Ｓ３１の処理が実行されていない対象文書集合Ｄｄ中の文書を対象文書ｄとして処理が実行される。 When it is determined that the processing has not been executed for all the documents in the target document set Dd (NO in step S32), the process returns to the above step S27 and is repeated. In this case, the processing is executed with the document in the target document set Dd for which the processing of steps S27 to S31 has not been executed as the target document d.

一方、対象文書集合Ｄｄ中の全ての文書について処理が実行されたと判定された場合（ステップＳ３２のＹＥＳ）、重要文書候補抽出処理は終了される。 On the other hand, when it is determined that the processing has been executed for all the documents in the target document set Dd (YES in step S32), the important document candidate extraction process ends.

なお、上記したステップＳ３１においてリストに格納された文書（の各々）が、重要文書候補抽出処理において抽出された重要文書候補である。このステップＳ３１において重要文書候補および類似度が格納されたリスト（以下、重要文書候補リストと表記）は、以下に説明する重要カテゴリ候補抽出処理において用いられる。 Note that each of the documents stored in the list in step S31 described above is an important document candidate extracted in the important document candidate extraction process. The list in which the important document candidate and the similarity are stored in step S31 (hereinafter referred to as an important document candidate list) is used in the important category candidate extraction process described below.
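
The loop of steps S27 to S32 can be sketched as below. This is an illustration under stated assumptions, not the apparatus's implementation: the function names, the sparse dictionary vectors, the sample data, and the threshold value 0.5 are all hypothetical (the text gives no concrete threshold for step S30).

```python
import math

def normalize(vector):
    """Scale a sparse vector (category -> value) to Euclidean norm 1."""
    norm = math.sqrt(sum(v * v for v in vector.values()))
    return {c: v / norm for c, v in vector.items()} if norm else {}

def similarity(unit_a, unit_b):
    """Cosine of two unit vectors, which equals their inner product."""
    return sum(v * unit_b.get(c, 0.0) for c, v in unit_a.items())

def extract_candidates(vdd, target_vectors, threshold):
    """Return (document, similarity) pairs whose similarity reaches the threshold."""
    vdd_unit = normalize(vdd)                        # step S25
    candidates = []                                  # the "list" of step S31
    for doc_id, vd in target_vectors.items():        # loop closed by step S32
        s = similarity(vdd_unit, normalize(vd))      # steps S28 and S29
        if s >= threshold:                           # step S30
            candidates.append((doc_id, s))           # step S31
    return candidates

# Hypothetical data: category vector of the important document set and of three targets.
vdd = {"X": 2, "Y": 1}
targets = {"d1": {"X": 1, "Y": 1}, "d2": {"Z": 1}, "d3": {"Y": 1, "Z": 1}}
result = extract_candidates(vdd, targets, threshold=0.5)
```

Storing the similarity next to each document number mirrors the important document candidate list, so that the later category scoring can reuse the values without recomputing them.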

次に、図１４のフローチャートを参照して、上述した重要文書候補抽出処理において実行されるカテゴリベクトル生成処理（図１３に示すステップＳ２２およびＳ２７の処理）の処理手順について説明する。 Next, the processing procedure of the category vector generation process (the processes in steps S22 and S27 shown in FIG. 13) executed in the above-described important document candidate extraction process will be described with reference to the flowchart in FIG. 14.

以下、カテゴリベクトル生成処理の対象となる文書を文書ｄとして説明する。なお、図１３に示すステップＳ２２においてカテゴリベクトル生成処理の対象となる文書とは重要文書ｄ´である。また、図１３に示すステップＳ２７においてカテゴリベクトル生成処理の対象となる文書とは対象文書ｄである。 Hereinafter, a document that is a target of category vector generation processing will be described as a document d. Note that the document that is the target of the category vector generation process in step S22 shown in FIG. 13 is the important document d ′. Further, the document that is the target of the category vector generation process in step S27 shown in FIG. 13 is the target document d.

まず、重要文書候補抽出部３２１は、文書ｄが属するカテゴリを表すカテゴリベクトル（以下、カテゴリベクトルｖｄと表記）を空とする（ステップＳ４１）。 First, the important document candidate extraction unit 321 empties a category vector (hereinafter referred to as a category vector vd) representing a category to which the document d belongs (step S41).

次に、重要文書候補抽出部３２１は、文書ｄが属するカテゴリ（つまり、文書ｄを識別するための文書番号に対応づけてカテゴリ記憶部２３に記憶されているカテゴリ番号によって識別されるカテゴリ）の各々について、以下のステップＳ４２およびＳ４３の処理を実行する。以下、この処理の対象となるカテゴリをカテゴリｃとする。 Next, the important document candidate extraction unit 321 executes the following steps S42 and S43 for each category to which the document d belongs (that is, each category identified by a category number stored in the category storage unit 23 in association with the document number identifying the document d). Hereinafter, the category being processed is referred to as category c.

重要文書候補抽出部３２１は、文書記憶部２２およびカテゴリ記憶部２３を参照して、カテゴリｃが不要なカテゴリ（以下、不要カテゴリと表記）であるか否かを判定する（ステップＳ４２）。 The important document candidate extraction unit 321 refers to the document storage unit 22 and the category storage unit 23 to determine whether the category c is an unnecessary category (hereinafter referred to as an unnecessary category) (step S42).

ここで、不要カテゴリとは、当該カテゴリに属する文書の全てについて重要度「不要」が設定されているカテゴリをいう。つまり、ステップＳ４２の処理は、カテゴリｃを識別するためのカテゴリ番号に対応づけてカテゴリ記憶部２３に記憶されている文書番号を特定し、文書記憶部２２に記憶されている当該文書番号によって識別される文書に含まれる重要度を参照することによって実行される。 Here, an unnecessary category is a category in which the importance “unnecessary” is set for all documents belonging to that category. That is, the process in step S42 is executed by specifying the document numbers stored in the category storage unit 23 in association with the category number identifying the category c, and then referring to the importance included in each document that is stored in the document storage unit 22 and identified by those document numbers.

カテゴリｃが不要カテゴリでないと判定された場合（ステップＳ４２のＮＯ）、重要文書候補抽出部３２１は、カテゴリｃをカテゴリベクトルｖｄにおける１つの次元とし、当該次元の値を１とする（ステップＳ４３）。 When it is determined that the category c is not an unnecessary category (NO in step S42), the important document candidate extraction unit 321 sets the category c as one dimension in the category vector vd and sets the value of the dimension as 1 (step S43). .

次に、重要文書候補抽出部３２１は、文書ｄが属する全てのカテゴリについて上記したステップＳ４２およびＳ４３の処理が実行されたか否かを判定する（ステップＳ４４）。 Next, the important document candidate extraction unit 321 determines whether or not the processing in steps S42 and S43 described above has been executed for all categories to which the document d belongs (step S44).

文書ｄが属する全てのカテゴリについて処理が実行されていないと判定された場合（ステップＳ４４のＮＯ）、上記したステップＳ４２に戻って処理が繰り返される。この場合、ステップＳ４２およびＳ４３の処理が実行されていない文書ｄが属するカテゴリをカテゴリｃとして処理が実行される。 If it is determined that the processing has not yet been executed for all categories to which the document d belongs (NO in step S44), the process returns to step S42 described above and is repeated. In this case, the process is executed with a category to which the document d belongs, and for which the processes of steps S42 and S43 have not yet been executed, as the category c.

一方、文書ｄが属する全てのカテゴリについて処理が実行されたと判定された場合（ステップＳ４４のＹＥＳ）、カテゴリベクトル生成処理は終了される。 On the other hand, if it is determined that the process has been executed for all categories to which the document d belongs (YES in step S44), the category vector generation process ends.

なお、上記したステップＳ４２においてカテゴリｃが不要カテゴリであると判定された場合には、ステップＳ４３の処理は実行されず、ステップＳ４４の処理が実行される。 In addition, when it determines with the category c being an unnecessary category in above-mentioned step S42, the process of step S43 is not performed but the process of step S44 is performed.

上記したカテゴリベクトル生成処理によって生成される文書ｄのカテゴリベクトル（当該文書ｄが属するカテゴリを表すカテゴリベクトル）ｖｄは、当該文書ｄが属するカテゴリであって不要カテゴリでないカテゴリの各々を１つの次元とし、当該各次元の値が１であるベクトルである。 The category vector vd of the document d generated by the category vector generation process described above (the category vector representing the categories to which the document d belongs) is a vector in which each category that the document d belongs to and that is not an unnecessary category forms one dimension, and the value of each such dimension is 1.
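
The category vector generation process of steps S41 to S44 can be sketched as follows. The dictionaries `doc_importance` and `category_store` are hypothetical stand-ins for the document storage unit 22 and the category storage unit 23; only the importance value "unnecessary" comes from the text, and the other labels and identifiers are invented. Treating an empty category as not unnecessary is an assumption of this sketch, since the text does not cover that case.

```python
# Hypothetical stand-in for the document storage unit 22: document -> importance.
doc_importance = {"d1": "important", "d2": "unnecessary", "d3": "unnecessary", "d4": None}

# Hypothetical stand-in for the category storage unit 23: category -> documents.
category_store = {"c1": {"d1", "d2"}, "c2": {"d2", "d3"}, "c3": {"d1", "d4"}}

def is_unnecessary(cat):
    """Step S42: a category is unnecessary when all of its documents are marked 'unnecessary'."""
    docs = category_store[cat]
    # Assumption: a category with no documents is not treated as unnecessary.
    return bool(docs) and all(doc_importance.get(d) == "unnecessary" for d in docs)

def category_vector(doc):
    """Build the sparse category vector vd of `doc` (category -> 1)."""
    vd = {}                                  # step S41: start from an empty vector
    for cat, docs in category_store.items():
        if doc not in docs:
            continue                         # only categories the document belongs to
        if is_unnecessary(cat):
            continue                         # step S42 YES: skip step S43
        vd[cat] = 1                          # step S43: one dimension with value 1
    return vd

vd_d1 = category_vector("d1")
vd_d2 = category_vector("d2")
```

In this sample, d2 belongs to c1 and c2, but c2 contains only documents marked unnecessary, so only c1 remains as a dimension of its vector.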

ここで、上述した重要文書候補抽出処理について具体的に説明する。ここでは、重要文書指定部３１１によって第１〜第４の重要文書が指定されているものとする。また、第１の重要文書は、カテゴリＡ、Ｂ、Ｃ、ＤおよびＥに属するものとする。第２の重要文書は、カテゴリＡ、Ｃ、Ｆ、ＧおよびＨに属するものとする。第３の重要文書は、カテゴリＢ、Ｃ、Ｅ、ＧおよびＩに属するものとする。第４の重要文書は、カテゴリＡ、Ｃ、Ｅ、ＧおよびＩに属するものとする。 Here, the important document candidate extraction process described above will be specifically described. Here, it is assumed that the first to fourth important documents are designated by the important document designation unit 311. Further, it is assumed that the first important document belongs to categories A, B, C, D, and E. It is assumed that the second important document belongs to categories A, C, F, G, and H. It is assumed that the third important document belongs to categories B, C, E, G, and I. It is assumed that the fourth important document belongs to categories A, C, E, G, and I.

この場合における重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´（つまり、第１〜第４の重要文書の各々が属するカテゴリの頻度を表すカテゴリベクトル）は、「カテゴリＡ：３、カテゴリＢ：２、カテゴリＣ：４、カテゴリＤ：１、カテゴリＥ：３、カテゴリＦ：１、カテゴリＧ：３、カテゴリＨ：１、カテゴリＩ：２」となる。重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´は、第１〜第４の重要文書の各々のカテゴリベクトル（ｖｄ´）の合計である。 In this case, the category vector vdd ′ of the important document set Dd ′ (that is, the category vector representing the frequency of the category to which each of the first to fourth important documents belongs) is “Category A: 3, Category B: 2, Category C: 4, Category D: 1, Category E: 3, Category F: 1, Category G: 3, Category H: 1, Category I: 2 ". The category vector vdd ′ of the important document set Dd ′ is the sum of the category vectors (vd ′) of the first to fourth important documents.
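
The frequencies above can be reproduced by summing the per-document category vectors. The sketch assumes that none of the categories A to I is an unnecessary category, so that every membership contributes a value of 1; the dictionary keys are hypothetical labels for the four important documents.

```python
from collections import Counter

# Categories of the first to fourth important documents, as listed in the text.
important_docs = {
    "first": "ABCDE",
    "second": "ACFGH",
    "third": "BCEGI",
    "fourth": "ACEGI",
}

# Summing the 0/1 category vectors is the same as counting category memberships.
vdd = Counter()
for categories in important_docs.values():
    vdd.update(categories)  # adds 1 for each category letter of the document

frequencies = dict(sorted(vdd.items()))
```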

重要文書候補抽出処理においては、第１〜第４の重要文書の各々が属するカテゴリＡ〜Ｉのうちの少なくとも１つに属する文書の各々を対象文書として、当該対象文書のカテゴリベクトル（ｖｄ）が生成される。次に、対象文書のカテゴリベクトルｖｄと重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´との類似度ｓが算出され、当該類似度ｓが閾値以上である場合に当該対象文書は重要文書候補として決定（抽出）される。 In the important document candidate extraction process, each of the documents belonging to at least one of the categories A to I to which each of the first to fourth important documents belongs is set as a target document, and the category vector (vd) of the target document is set. Generated. Next, the similarity s between the category vector vd of the target document and the category vector vdd ′ of the important document set Dd ′ is calculated, and when the similarity s is equal to or greater than a threshold, the target document is determined as an important document candidate ( Extracted).

ここで、重要文書候補抽出処理において第１〜第３の重要文書候補が抽出されたものとする。 Here, it is assumed that the first to third important document candidates are extracted in the important document candidate extraction process.

第１の重要文書候補は、カテゴリＡ、Ｃ、ＧおよびＥに属する文書であって、当該第１の重要文書候補のカテゴリベクトルｖｄと重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´との類似度は、例えば０．８６で閾値以上である。したがって、第１の重要文書候補は、重要文書候補抽出処理において抽出される。 The first important document candidate is a document belonging to the categories A, C, G, and E, and the similarity between the category vector vd of the first important document candidate and the category vector vdd ′ of the important document set Dd ′ is, for example, 0.86, which is equal to or greater than the threshold. Accordingly, the first important document candidate is extracted in the important document candidate extraction process.

第２の重要文書候補は、カテゴリＡ、Ｃ、Ｆ、ＧおよびＥに属する文書であって、当該第２の重要文書候補のカテゴリベクトルｖｄと重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´との類似度は、例えば０．８３で閾値以上である。したがって、第２の重要文書候補は、重要文書候補抽出処理において抽出される。 The second important document candidate is a document belonging to the categories A, C, F, G, and E, and the similarity between the category vector vd of the second important document candidate and the category vector vdd ′ of the important document set Dd ′ is, for example, 0.83, which is equal to or greater than the threshold. Accordingly, the second important document candidate is extracted in the important document candidate extraction process.

また、第３の重要文書候補は、カテゴリＢ、Ｃ、ＤおよびＨに属する文書であって、当該第３の重要文書候補のカテゴリベクトルｖｄと重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´との類似度は、例えば０．６４で閾値以上である。したがって、第３の重要文書候補は、重要文書候補抽出処理において抽出される。 The third important document candidate is a document belonging to the categories B, C, D, and H, and the similarity between the category vector vd of the third important document candidate and the category vector vdd ′ of the important document set Dd ′ is, for example, 0.64, which is equal to or greater than the threshold. Accordingly, the third important document candidate is extracted in the important document candidate extraction process.

つまり、重要文書候補抽出処理においては、重要文書が属するカテゴリと同じカテゴリに多く属する文書、すなわち重要文書と同じ観点に多く属する文書が重要文書候補として抽出される。 In other words, the important document candidate extraction process extracts, as important document candidates, documents that belong to many of the same categories as the important documents, that is, documents that share many viewpoints with the important documents.

よって、上記した第１〜第４の重要文書の各々が属するカテゴリＡ〜Ｉ以外のカテゴリに多く属するような文書は、当該文書のカテゴリベクトルｖｄと重要文書集合Ｄｄ´のカテゴリベクトルｖｄｄ´との類似度は低く、重要文書候補としては抽出されない。 Accordingly, for a document that belongs mostly to categories other than the categories A to I to which the first to fourth important documents belong, the similarity between the category vector vd of that document and the category vector vdd ′ of the important document set Dd ′ is low, and the document is not extracted as an important document candidate.

次に、図１５のフローチャートを参照して、重要カテゴリ候補抽出部３２２による重要カテゴリ候補抽出処理（図１０に示すステップＳ４の処理）の処理手順について説明する。 Next, the processing procedure of the important category candidate extraction process (the process of step S4 shown in FIG. 10) by the important category candidate extraction unit 322 will be described with reference to the flowchart of FIG. 15.

重要カテゴリ候補抽出処理においては、後述するようにカテゴリ記憶部２３に記憶されているカテゴリの各々の重要度が算出され、当該重要度に基づいて当該カテゴリの中から重要文書が属するカテゴリの候補（重要カテゴリ候補）が抽出される。 In the important category candidate extraction process, as described later, the importance of each category stored in the category storage unit 23 is calculated, and candidates for the categories to which important documents belong (important category candidates) are extracted from those categories based on the importance.

なお、重要カテゴリ候補抽出処理においては、上記した重要文書候補リストが用いられる。重要文書候補リストには、上記したように重要文書候補（を識別するための文書番号）および当該重要文書候補のカテゴリベクトルと上記した重要文書集合のカテゴリベクトルとの類似度（以下、単に当該重要文書候補の類似度と表記）が対応づけて格納されている。 The important category candidate extraction process uses the important document candidate list described above. As described above, the important document candidate list stores each important document candidate (the document number identifying it) in association with the similarity between the category vector of that important document candidate and the category vector of the important document set (hereinafter simply referred to as the similarity of the important document candidate).

まず、重要カテゴリ候補抽出部３２２は、カテゴリ記憶部２３に記憶されているカテゴリ番号によって識別される全てのカテゴリの集合（以下、カテゴリ集合Ｄｃと表記）のスコア（以下、スコアｓｄｃと表記）を空とする（ステップＳ５１）。このカテゴリ集合Ｄｃのスコアｓｄｃは、後述するようにカテゴリの重要度の算出に用いられる。 First, the important category candidate extraction unit 322 empties the score (hereinafter referred to as score sdc) of the set of all categories identified by the category numbers stored in the category storage unit 23 (hereinafter referred to as category set Dc) (step S51). The score sdc of the category set Dc is used for calculating the importance of each category, as described later.

次に、重要カテゴリ候補抽出部３２２は、カテゴリ集合Ｄｃ中のカテゴリの各々について、以下のステップＳ５２〜Ｓ５６の処理を実行する。以下、この処理の対象となるカテゴリをカテゴリｃとする。 Next, the important category candidate extraction unit 322 performs the following processes of steps S52 to S56 for each category in the category set Dc. Hereinafter, the category to be processed is referred to as category c.

重要カテゴリ候補抽出部３２２は、カテゴリｃのスコア（以下、スコアｓｃと表記）を空とする（ステップＳ５２）。 The important category candidate extraction unit 322 empties the score of the category c (hereinafter referred to as a score sc) (step S52).

次に、重要カテゴリ候補抽出部３２２は、カテゴリｃに属する文書の各々について、以下のステップＳ５３およびＳ５４の処理を実行する。なお、カテゴリｃに属する文書とは、カテゴリｃを識別するためのカテゴリ番号に対応づけてカテゴリ記憶部２３に記憶されている文書番号によって識別される文書である。以下、この処理の対象となる文書を文書ｄとする。 Next, the important category candidate extraction unit 322 executes the following processes of steps S53 and S54 for each of the documents belonging to the category c. A document belonging to category c is a document identified by a document number stored in category storage unit 23 in association with a category number for identifying category c. Hereinafter, a document to be processed is referred to as a document d.

重要カテゴリ候補抽出部３２２は、文書ｄが重要文書候補リストに格納されているか、つまり、文書ｄが重要文書候補であるか否かを判定する（ステップＳ５３）。 The important category candidate extraction unit 322 determines whether or not the document d is stored in the important document candidate list, that is, whether or not the document d is an important document candidate (step S53).

文書ｄが重要文書候補であると判定された場合（ステップＳ５３のＹＥＳ）、重要カテゴリ候補抽出部３２２は、当該文書ｄ（重要文書候補）に対応づけて重要文書候補リストに格納されている類似度（つまり、当該重要文書候補の類似度）をカテゴリｃのスコアｓｃに加算する（ステップＳ５４）。 If it is determined that the document d is an important document candidate (YES in step S53), the important category candidate extraction unit 322 adds the similarity stored in the important document candidate list in association with the document d (that is, the similarity of that important document candidate) to the score sc of the category c (step S54).

一方、文書ｄが重要文書候補でないと判定された場合（ステップＳ５３のＮＯ）、ステップＳ５４の処理は実行されない。 On the other hand, when it is determined that the document d is not an important document candidate (NO in step S53), the process in step S54 is not executed.

重要カテゴリ候補抽出部３２２は、カテゴリｃに属する全ての文書について上記したステップＳ５３およびＳ５４の処理が実行されたか否かを判定する（ステップＳ５５）。 The important category candidate extraction unit 322 determines whether or not the processes in steps S53 and S54 described above have been executed for all documents belonging to the category c (step S55).

カテゴリｃに属する全ての文書について処理が実行されていないと判定された場合（ステップＳ５５のＮＯ）、上記したステップＳ５３に戻って処理が繰り返される。この場合、ステップＳ５３およびＳ５４の処理が実行されていないカテゴリｃに属する文書を文書ｄとして処理が実行される。 If it is determined that processing has not been performed for all documents belonging to category c (NO in step S55), the process returns to step S53 described above and is repeated. In this case, the processing is executed with the document belonging to the category c for which the processing of steps S53 and S54 has not been executed as the document d.

このようにステップＳ５３およびＳ５４の処理がカテゴリｃに属する文書の各々について繰り返されることによって当該カテゴリｃのスコアｓｃが算出される。つまり、カテゴリｃのスコアｓｃは、当該カテゴリｃに属する重要文書候補の類似度の合計である。 In this way, the process of steps S53 and S54 is repeated for each of the documents belonging to category c, thereby calculating the score sc of the category c. That is, the score sc of the category c is the total similarity of the important document candidates belonging to the category c.

ステップＳ５５においてカテゴリｃに属する全ての文書について処理が実行されたと判定された場合、重要カテゴリ候補抽出部３２２は、カテゴリｃのスコアｓｃをカテゴリ集合Ｄｃのスコアｓｄｃに加算する（ステップＳ５６）。 When it is determined in step S55 that the processing has been executed for all documents belonging to category c, the important category candidate extraction unit 322 adds the score sc of category c to the score sdc of category set Dc (step S56).

次に、重要カテゴリ候補抽出部３２２は、カテゴリ集合Ｄｃ中の全てのカテゴリについて上記したステップＳ５２〜Ｓ５６の処理が実行されたか否かを判定する（ステップＳ５７）。 Next, the important category candidate extraction unit 322 determines whether or not the processing in steps S52 to S56 described above has been executed for all categories in the category set Dc (step S57).

カテゴリ集合Ｄｃ中の全てのカテゴリについて処理が実行されていないと判定された場合（ステップＳ５７のＮＯ）、上記したステップＳ５２に戻って処理が繰り返される。この場合、ステップＳ５２〜Ｓ５６の処理が実行されていないカテゴリをカテゴリｃとして処理が実行される。 If it is determined that processing has not been performed for all categories in the category set Dc (NO in step S57), the process returns to step S52 described above and is repeated. In this case, the process is executed with the category for which the processes of steps S52 to S56 have not been executed as the category c.

このようにステップＳ５２〜Ｓ５６の処理がカテゴリ集合Ｄｃ中のカテゴリの各々について繰り返されることによって当該カテゴリ集合Ｄｃのスコアｓｄｃが算出される。つまり、カテゴリ集合Ｄｃのスコアｓｄｃは、当該カテゴリ集合Ｄｃ中のカテゴリの各々のスコア（ｓｃ）の合計である。 In this way, the process of steps S52 to S56 is repeated for each of the categories in the category set Dc, whereby the score sdc of the category set Dc is calculated. That is, the score sdc of the category set Dc is the total score (sc) of each category in the category set Dc.
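
The accumulation of the scores sc and sdc in steps S51 to S57 can be sketched as two nested loops. The dictionaries `candidate_list` and `category_store` are hypothetical stand-ins for the important document candidate list and the category storage unit 23, with invented sample values.

```python
# Hypothetical important document candidate list: document -> similarity.
candidate_list = {"d1": 0.9, "d2": 0.7}

# Hypothetical stand-in for the category storage unit 23: category -> documents.
category_store = {"c1": {"d1", "d2", "d3"}, "c2": {"d2"}, "c3": {"d3"}}

def category_scores(store, candidates):
    """Return (scores per category, score sdc of the whole category set)."""
    scores = {}
    sdc = 0.0                                # step S51
    for cat, docs in store.items():          # loop closed by step S57
        sc = 0.0                             # step S52
        for doc in docs:                     # loop closed by step S55
            if doc in candidates:            # step S53
                sc += candidates[doc]        # step S54
        scores[cat] = sc
        sdc += sc                            # step S56
    return scores, sdc

scores, sdc = category_scores(category_store, candidate_list)
```

Because a document can belong to several categories, its similarity is counted once per category, so sdc can exceed the plain sum of the similarities in the candidate list.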

ステップＳ５７においてカテゴリ集合Ｄｃ中の全てのカテゴリについて処理が実行されたと判定された場合、重要カテゴリ候補抽出部３２２は、当該カテゴリ集合Ｄｃ中のカテゴリの各々について、以下のステップＳ５８〜Ｓ６１の処理を実行する。以下、この処理の対象となるカテゴリをカテゴリｃとする。 If it is determined in step S57 that the processing has been executed for all categories in the category set Dc, the important category candidate extraction unit 322 performs the following steps S58 to S61 for each category in the category set Dc. Execute. Hereinafter, the category to be processed is referred to as category c.

重要カテゴリ候補抽出部３２２は、上記したカテゴリ集合Ｄｃのスコアｓｄｃおよびカテゴリｃのスコアｓｃを用いて、当該カテゴリｃの重要度を算出する（ステップＳ５８）。ここで、カテゴリｃの重要度は、ｓｃ／ｓｄｃによって算出されるものとする。つまり、カテゴリｃの重要度は、カテゴリ集合Ｄｃのスコアｓｄｃ（カテゴリ集合Ｄｃ中のカテゴリのスコアの合計値）に対する、当該カテゴリｃのスコアｓｃの割合である。 The important category candidate extraction unit 322 calculates the importance of the category c using the score sdc of the category set Dc and the score sc of the category c (step S58). Here, the importance of category c is calculated by sc / sdc. That is, the importance of the category c is the ratio of the score sc of the category c to the score sdc of the category set Dc (the total value of the scores of the categories in the category set Dc).

次に、重要カテゴリ候補抽出部３２２は、算出されたカテゴリｃの重要度が予め定められた値（以下、閾値と表記）以上であるか否かを判定する（ステップＳ５９）。このステップＳ５９の処理において用いられる閾値は、例えば０．４（４０％）である。 Next, the important category candidate extraction unit 322 determines whether or not the calculated importance level of the category c is equal to or greater than a predetermined value (hereinafter referred to as a threshold value) (step S59). The threshold value used in the process of step S59 is, for example, 0.4 (40%).

カテゴリｃの重要度が閾値以上でないと判定された場合（ステップＳ５９のＮＯ）、以下のステップＳ６２の処理が実行される。 When it is determined that the importance level of the category c is not equal to or higher than the threshold value (NO in step S59), the following process in step S62 is executed.

一方、カテゴリｃの重要度が閾値以上であると判定された場合（ステップＳ５９のＹＥＳ）、重要カテゴリ候補抽出部３２２は、当該カテゴリｃを重要カテゴリ候補として決定する（ステップＳ６０）。 On the other hand, when it is determined that the importance level of the category c is equal to or higher than the threshold (YES in step S59), the important category candidate extraction unit 322 determines the category c as an important category candidate (step S60).

次に、重要カテゴリ候補抽出部３２２は、算出されたカテゴリｃの重要度に応じて、当該カテゴリｃ（重要カテゴリ候補として決定されたカテゴリｃ）をユーザに対して提示する際の背景色を決定する（ステップＳ６１）。なお、重要カテゴリ候補抽出部３２２によって決定される背景色は、カテゴリｃの重要度に応じて複数種類用意されているものとする。 Next, the important category candidate extraction unit 322 determines a background color when presenting the category c (the category c determined as the important category candidate) to the user according to the calculated importance of the category c. (Step S61). Note that a plurality of types of background colors determined by the important category candidate extraction unit 322 are prepared according to the importance of the category c.

重要カテゴリ候補抽出部３２２は、例えばカテゴリｃの重要度（割合）が０．８（８０％）以上である場合、当該重要度が０．６以上０．８未満（６０％以上８０％未満）である場合、当該重要度が０．４以上０．６未満（４０％以上６０％未満）である場合の３段階で背景色を決定する。なお、背景色の種類の数およびカテゴリｃの重要度の範囲等については、適宜、変更可能である。 The important category candidate extraction unit 322 determines the background color in three levels, for example: when the importance (ratio) of the category c is 0.8 (80%) or more, when it is 0.6 or more and less than 0.8 (60% or more and less than 80%), and when it is 0.4 or more and less than 0.6 (40% or more and less than 60%). The number of background colors and the importance ranges for the category c can be changed as appropriate.

後述するように、この重要カテゴリ候補抽出部３２２によって決定された背景色によってカテゴリｃに重要文書が属する程度（度合い）がユーザに対して提示される。 As will be described later, the degree (degree) of the important document belonging to the category c is presented to the user based on the background color determined by the important category candidate extraction unit 322.
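
The importance ratio of step S58 and the three-level color choice of steps S59 to S61 can be sketched as follows, using the example boundaries 0.4, 0.6, and 0.8 given above. The function names, the numeric level codes (3 for the darkest color), and the handling of a zero total score are assumptions made for this illustration.

```python
def importance(sc, sdc):
    """Step S58: ratio of the category score to the score of the whole category set."""
    # Assumption: when no candidate contributes any score, the importance is taken as 0.
    return sc / sdc if sdc else 0.0

def background_level(value):
    """Steps S59 to S61: map an importance to a color level, or None below the threshold."""
    if value < 0.4:
        return None      # step S59 NO: not an important category candidate
    if value >= 0.8:
        return 3         # darkest background color
    if value >= 0.6:
        return 2
    return 1             # lightest background color (0.4 or more and less than 0.6)

levels = [background_level(v) for v in (0.85, 0.6, 0.4, 0.39)]
ratio = importance(1.6, 2.3)
```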

次に、重要カテゴリ候補抽出部３２２は、カテゴリ集合Ｄｃ中の全てのカテゴリについて上記したステップＳ５８〜Ｓ６１の処理が実行されたか否かを判定する（ステップＳ６２）。 Next, the important category candidate extraction unit 322 determines whether or not the above-described processing in steps S58 to S61 has been executed for all categories in the category set Dc (step S62).

カテゴリ集合Ｄｃ中の全てのカテゴリについて処理が実行されていないと判定された場合（ステップＳ６２のＮＯ）、上記したステップＳ５８に戻って処理が繰り返される。この場合、ステップＳ５８〜Ｓ６１の処理が実行されていないカテゴリをカテゴリｃとして処理が実行される。 When it is determined that the processing has not yet been executed for all categories in the category set Dc (NO in step S62), the process returns to the above-described step S58 and is repeated. In this case, the process is executed with a category for which the processes of steps S58 to S61 have not yet been executed as the category c.

このようにステップＳ５８〜Ｓ６１の処理がカテゴリ集合Ｄｃ中のカテゴリの各々について繰り返されることによって、当該カテゴリ集合Ｄｃの中から重要カテゴリ候補が決定（抽出）される。 As described above, the processes of steps S58 to S61 are repeated for each of the categories in the category set Dc, whereby an important category candidate is determined (extracted) from the category set Dc.

一方、カテゴリ集合Ｄｃ中の全てのカテゴリについて処理が実行されたと判定された場合（ステップＳ６２のＹＥＳ）、重要カテゴリ候補抽出処理は終了される。 On the other hand, when it is determined that the process has been executed for all categories in the category set Dc (YES in step S62), the important category candidate extraction process is terminated.

ここで、上述したように重要カテゴリ候補抽出処理が終了されると、カテゴリ表示操作部３１１によって当該重要カテゴリ候補抽出処理において抽出された重要カテゴリ候補がユーザに対して提示（表示）される。 Here, when the important category candidate extraction process ends as described above, the category display operation unit 311 presents (displays) the important category candidates extracted in the important category candidate extraction process to the user.

図１６は、カテゴリ表示操作部３１１によって重要カテゴリ候補が表示された場合の表示画面の一例を示す。 FIG. 16 shows an example of a display screen when an important category candidate is displayed by the category display operation unit 311.

図１６に示す表示画面２００においては、カテゴリ記憶部２３に記憶されているカテゴリ番号によって識別されるカテゴリが、当該カテゴリ名とともに階層構造で表示される。 On the display screen 200 shown in FIG. 16, the category identified by the category number stored in the category storage unit 23 is displayed in a hierarchical structure together with the category name.

この表示画面２００において表示されているカテゴリのうち、背景色があるカテゴリ２０１〜２０７が重要カテゴリ候補として抽出されたカテゴリである。なお、これらのカテゴリ２０１〜２０７以外のカテゴリは、重要カテゴリ候補として抽出されていないカテゴリである。 Among the categories displayed on the display screen 200, categories 201 to 207 having a background color are extracted as important category candidates. Note that categories other than these categories 201 to 207 are categories that are not extracted as important category candidates.

図１６に示す例では、重要カテゴリ候補であるカテゴリ２０１〜２０７は、上記したように例えば３段階で背景色（の例えば濃度）が異なる。この背景色は、例えばカテゴリ２０１〜２０７の重要度に比例して濃く表示されるものとする。 In the example illustrated in FIG. 16, the categories 201 to 207 that are important category candidates have different background colors (for example, density) in three stages as described above. This background color is assumed to be darkly displayed in proportion to the importance of the categories 201 to 207, for example.

ここでは、カテゴリ「操作」２０５は、カテゴリ２０１〜２０７の中で背景色が最も濃いため、重要度が高いカテゴリである。つまり、カテゴリ「操作」２０５は、カテゴリ２０１〜２０７の中では重要文書が属する程度（度合い）が高いカテゴリである。 Here, the category “operation” 205 has the darkest background color among the categories 201 to 207 and is therefore a category of high importance. That is, the category “operation” 205 is a category to which important documents belong to a high degree among the categories 201 to 207.

一方、カテゴリ「Ｔ社」２０１、カテゴリ「Ｎ社」２０３およびカテゴリ「１９９６年」２０７は、カテゴリ２０１〜２０７の中で背景色が最も薄いため、重要度が低いカテゴリである。つまり、カテゴリ「Ｔ社」２０１、カテゴリ「Ｎ社」２０３およびカテゴリ「１９９６年」２０７は、カテゴリ２０１〜２０７の中では重要文書が属する程度（度合い）が低いカテゴリである。 On the other hand, the category “Company T” 201, the category “Company N” 203, and the category “1996” 207 have the lightest background color among the categories 201 to 207 and are therefore categories of low importance. That is, the category “Company T” 201, the category “Company N” 203, and the category “1996” 207 are categories to which important documents belong to a low degree among the categories 201 to 207.

ここで、上記したように重要カテゴリ候補が表示された画面（つまり、図１６に示す表示画面２００）においては、ユーザは、例えば文書分析装置３０に対して当該重要カテゴリ候補（ここでは、カテゴリ２０１〜２０７）のうちの１つを選択する操作を行うことができる。 Here, on the screen on which the important category candidates are displayed as described above (that is, the display screen 200 shown in FIG. 16), the user can, for example, perform an operation on the document analysis device 30 to select one of the important category candidates (here, the categories 201 to 207).

このような操作がユーザによって行われた場合には、カテゴリ表示操作部３１１は当該操作を受け付け、カテゴリ選択部３１３はカテゴリ表示操作部３１１によって受け付けられた操作に応じて重要カテゴリ候補のうちの１つのカテゴリを選択する。 When such an operation is performed by the user, the category display operation unit 311 accepts the operation, and the category selection unit 313 selects one of the important category candidates according to the operation accepted by the category display operation unit 311. Select one category.

カテゴリ選択部３１３によって１つの重要カテゴリ候補が選択されると、上述したように関連重要カテゴリ候補抽出部３２３によって関連重要カテゴリ候補抽出処理（図１０に示すステップＳ７の処理）が実行される。 When one important category candidate is selected by the category selection unit 313, the related important category candidate extraction unit 323 executes the related important category candidate extraction process (the process of step S7 shown in FIG. 10) as described above.

次に、図１７のフローチャートを参照して、関連重要カテゴリ候補抽出部３２３による関連重要カテゴリ候補抽出処理の処理手順について説明する。 Next, the processing procedure of the related important category candidate extraction process by the related important category candidate extraction unit 323 will be described with reference to the flowchart of FIG. 17.

以下の説明では、カテゴリ選択部３１３によって選択された重要カテゴリ候補（カテゴリ表示操作部３１１によって表示された重要カテゴリ候補のうちの１つ）を選択重要カテゴリ候補ｃｃとする。 In the following description, the important category candidate selected by the category selection unit 313 (one of the important category candidates displayed by the category display operation unit 311) is set as the selected important category candidate cc.

関連重要カテゴリ候補抽出部３２３は、重要カテゴリ候補抽出部３２２によって抽出された重要カテゴリ候補（つまり、カテゴリ表示操作部３１１によって表示された背景色が無色でないカテゴリ）のうちの選択重要カテゴリ候補ｃｃ以外の重要カテゴリ候補の各々について、以下のステップＳ７１〜Ｓ７３の処理を実行する。以下、この処理の対象となる重要カテゴリ候補を重要カテゴリ候補ｃとする。 The related important category candidate extraction unit 323 executes the following steps S71 to S73 for each of the important category candidates extracted by the important category candidate extraction unit 322 (that is, the categories displayed by the category display operation unit 311 with a background color that is not colorless) other than the selected important category candidate cc. Hereinafter, the important category candidate being processed is referred to as the important category candidate c.

関連重要カテゴリ候補抽出部３２３は、重要カテゴリ候補ｃおよび選択重要カテゴリ候補ｃｃの両方のカテゴリに属する重要候補文書および重要文書の数と、当該重要カテゴリ候補ｃに属する重要候補文書および重要文書の数と、当該選択重要カテゴリ候補ｃｃに属する重要候補文書および重要文書の数とを用いて、当該重要カテゴリ候補ｃが統計的に有意であるか否かを判定する（ステップＳ７１）。 The related important category candidate extraction unit 323 determines whether the important category candidate c is statistically significant, using the number of important document candidates and important documents that belong to both the important category candidate c and the selected important category candidate cc, the number of important document candidates and important documents that belong to the important category candidate c, and the number of important document candidates and important documents that belong to the selected important category candidate cc (step S71).

関連重要カテゴリ候補抽出部３２３は、例えばχ二乗検定で重要カテゴリ候補ｃが統計的に有意であるか否かを検定する。 The related important category candidate extraction unit 323 tests whether the important category candidate c is statistically significant by, for example, a chi-square test.

この場合のχ二乗検定によれば、χ二乗統計量（χ）が、自由度１の有意水準５％のχ二乗分布（３．８４）や自由度１の有意水準１％のχ二乗分布（６．６３）よりも大きい場合には統計的に有意となる。一方、χ二乗統計量（χ）が、自由度１の有意水準５％のχ二乗分布（３．８４）や自由度１の有意水準１％のχ二乗分布（６．６３）よりも小さい場合には統計的に有意とならない。なお、χ二乗統計量（χ）は、以下の数式により算出される。

According to the chi-square test in this case, the important category candidate c is statistically significant when the chi-square statistic (χ) is larger than the critical value of the chi-square distribution with 1 degree of freedom at the 5% significance level (3.84) or at the 1% significance level (6.63). Conversely, when the chi-square statistic (χ) is smaller than that critical value, the candidate is not statistically significant. The chi-square statistic (χ) is calculated by the following formula.
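
The formula itself is not reproduced in this text. As a hedged reconstruction, assuming the patent intends the standard Pearson chi-square statistic for a 2×2 contingency table (an assumption that the text does not confirm), it can be written with the quantities defined in the following paragraph as:

```latex
\chi^2 = \frac{n\,\left(x_{11}\,x_{22} - x_{12}\,x_{21}\right)^2}
              {a_1\,(n - a_1)\,b_1\,(n - b_1)}
```

Here a1 and b1 act as the marginal totals of the table, which agrees with the relations x12 = a1 − x11 and x21 = b1 − x11 given in the text.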

この数式において、ｘ１１は、重要カテゴリ候補ｃおよび選択重要カテゴリ候補ｃｃの両方に属する重要文書候補および重要文書の数である。ａ１は、重要カテゴリ候補ｃに属する重要文書候補および重要文書の数である。ｂ１は、選択重要カテゴリ候補ｃｃに属する重要文書候補および重要文書の数である。ｎは、全ての重要文書候補および重要文書の数である。また、ｘ１２はａ１−ｘ１１であり、ｘ２１はｂ１−ｘ１１であり、ｘ２２はｎ−ａ１−ｘ２１である。 In this formula, x11 is the number of important document candidates and important documents that belong to both the important category candidate c and the selected important category candidate cc. a1 is the number of important document candidates and important documents belonging to the important category candidate c. b1 is the number of important document candidates and important documents belonging to the selected important category candidate cc. n is the total number of important document candidates and important documents. In addition, x12 is a1 − x11, x21 is b1 − x11, and x22 is n − a1 − x21.
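
The quantities defined above can be turned into a small calculation. The following is a minimal sketch, assuming the standard Pearson chi-square statistic for a 2×2 table (the patent's own formula is not reproduced in this text) and the usual convention that a result is significant when the statistic exceeds the critical value 3.84. The function names `chi_square_2x2` and `is_significant` are hypothetical.

```python
def chi_square_2x2(x11, a1, b1, n):
    """Pearson chi-square statistic for a 2x2 table (assumed formula).

    x11: documents in both category c and category cc
    a1:  documents in category c
    b1:  documents in category cc
    n:   all important documents and important document candidates
    """
    x12 = a1 - x11          # in c but not in cc
    x21 = b1 - x11          # in cc but not in c
    x22 = n - a1 - x21      # in neither category
    denominator = a1 * (n - a1) * b1 * (n - b1)
    if denominator == 0:
        # A category that is empty or covers every document carries no signal.
        return 0.0
    return n * (x11 * x22 - x12 * x21) ** 2 / denominator


def is_significant(x11, a1, b1, n, critical_value=3.84):
    """True when the statistic exceeds the critical value (5% level, 1 d.f.)."""
    return chi_square_2x2(x11, a1, b1, n) > critical_value
```

For instance, with x11 = 20, a1 = 30, b1 = 40 and n = 100 the overlap is well above the 12 documents expected under independence, so the candidate would be treated as significant.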

χ二乗検定で重要カテゴリ候補ｃが統計的に有意であると判定された場合（ステップＳ７１のＹＥＳ）、関連重要カテゴリ候補抽出部３２３は、当該重要カテゴリ候補ｃを関連重要カテゴリ候補（選択重要カテゴリ候補ｃｃと関連のあるカテゴリ）として決定する（ステップＳ７２）。 When the chi-square test determines that the important category candidate c is statistically significant (YES in step S71), the related important category candidate extraction unit 323 determines the important category candidate c to be a related important category candidate (a category related to the selected important category candidate cc) (step S72).

次に、関連重要カテゴリ候補抽出部３２３は、カテゴリ記憶部２３を参照して、関連重要カテゴリ候補として決定された重要カテゴリ候補ｃおよび当該重要カテゴリ候補ｃに属する重要文書候補および重要文書（つまり、当該重要カテゴリ候補ｃを識別するためのカテゴリ番号に対応づけて当該カテゴリ記憶部２３に記憶されている文書番号によって識別される重要文書候補および重要文書）の数を返り値リスト（以下、関連重要カテゴリ候補リストと表記）に格納する（ステップＳ７３）。 Next, the related important category candidate extraction unit 323 refers to the category storage unit 23 and stores, in a return value list (hereinafter referred to as the related important category candidate list), the important category candidate c determined to be a related important category candidate together with the number of important document candidates and important documents belonging to it (that is, the important document candidates and important documents identified by the document numbers stored in the category storage unit 23 in association with the category number identifying the important category candidate c) (step S73).

なお、ステップＳ７３において関連重要カテゴリ候補リストに格納された重要カテゴリ候補が関連重要カテゴリ候補抽出処理によって抽出された関連重要カテゴリ候補となる。 Note that the important category candidates stored in the related important category candidate list in step S73 are related important category candidates extracted by the related important category candidate extraction process.

一方、上記したステップＳ７１においてχ二乗検定で重要カテゴリ候補ｃが統計的に有意でないと判定された場合には、ステップＳ７２およびＳ７３の処理は実行されない。 On the other hand, when it is determined in step S71 described above that the important category candidate c is not statistically significant by the χ square test, the processes in steps S72 and S73 are not executed.

関連重要カテゴリ候補抽出部３２３は、選択重要カテゴリ候補ｃｃ以外の全ての重要カテゴリ候補について上記したステップＳ７１〜Ｓ７３の処理が実行されたか否かを判定する（ステップＳ７４）。 The related important category candidate extraction unit 323 determines whether or not the processes in steps S71 to S73 described above have been executed for all important category candidates other than the selected important category candidate cc (step S74).

選択重要カテゴリ候補ｃｃ以外の全ての重要カテゴリ候補について処理が実行されていないと判定された場合（ステップＳ７４のＮＯ）、上記したステップＳ７１に戻って処理が繰り返される。この場合、ステップＳ７１〜Ｓ７３の処理が実行されていない選択重要カテゴリ候補ｃｃ以外の重要カテゴリ候補を重要カテゴリ候補ｃとして処理が実行される。 When it is determined that the processing has not yet been executed for all of the important category candidates other than the selected important category candidate cc (NO in step S74), the process returns to step S71 and is repeated. In this case, an important category candidate other than the selected important category candidate cc for which steps S71 to S73 have not yet been executed is taken as the important category candidate c.

一方、選択重要カテゴリ候補ｃｃ以外の全ての重要カテゴリ候補について処理が実行されたと判定された場合（ステップＳ７４のＹＥＳ）、関連重要カテゴリ候補抽出処理は終了される。 On the other hand, when it is determined that the process has been executed for all the important category candidates other than the selected important category candidate cc (YES in step S74), the related important category candidate extraction process ends.
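
The loop over steps S71 to S74 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: categories are modelled as a hypothetical dictionary mapping a category name to the set of document numbers (important documents and important document candidates) that belong to it, the statistic is the standard 2×2 Pearson chi-square, and significance means exceeding the critical value 3.84. The name `extract_related_candidates` is hypothetical.

```python
def chi_square_2x2(x11, a1, b1, n):
    """Pearson chi-square statistic for a 2x2 table (assumed formula)."""
    x12, x21 = a1 - x11, b1 - x11
    x22 = n - a1 - x21
    denominator = a1 * (n - a1) * b1 * (n - b1)
    return 0.0 if denominator == 0 else n * (x11 * x22 - x12 * x21) ** 2 / denominator


def extract_related_candidates(candidates, selected, n, critical_value=3.84):
    """Return [(category, document count)] for categories related to `selected`.

    candidates: dict mapping category name -> set of document numbers
    selected:   name of the selected important category candidate cc
    n:          total number of important documents and candidates
    """
    cc_docs = candidates[selected]
    related = []                                   # related important category candidate list
    for name, c_docs in candidates.items():        # S74: visit every other candidate once
        if name == selected:
            continue
        x11 = len(c_docs & cc_docs)                # documents in both c and cc
        chi2 = chi_square_2x2(x11, len(c_docs), len(cc_docs), n)
        if chi2 > critical_value:                  # S71: statistically significant?
            related.append((name, len(c_docs)))    # S72 and S73
    return related
```

Note that a chi-square test flags any departure from independence, including categories that overlap less than expected; the text does not say whether the direction of the association is checked, so this sketch does not check it either.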

ここで、上述したように関連重要カテゴリ候補抽出処理が終了されると、関連重要カテゴリ候補リストに格納されている重要カテゴリ候補（関連重要カテゴリ候補抽出処理によって抽出された関連重要カテゴリ候補）と当該重要カテゴリ候補に属する重要文書候補および重要文書の数とがカテゴリ表示操作部３１１によってユーザに対して提示（表示）される。 When the related important category candidate extraction process ends as described above, the category display operation unit 311 presents (displays) to the user the important category candidates stored in the related important category candidate list (the related important category candidates extracted by the process) and the number of important document candidates and important documents belonging to each of them.

図１８は、カテゴリ表示操作部３１１によって関連重要カテゴリ候補が表示された場合の表示画面の一例を示す。なお、図１８に示す表示画面３００は、上述した図１６に示す表示画面２００において例えばカテゴリ「操作性」２０５を選択する操作がユーザによって行われた後に関連重要カテゴリ候補抽出処理が実行され、当該関連重要カテゴリ候補抽出処理において抽出された関連重要カテゴリが表示された場合の画面である。 FIG. 18 shows an example of the display screen when related important category candidates are displayed by the category display operation unit 311. The display screen 300 shown in FIG. 18 is the screen displayed when, for example, the user selects the category “operability” 205 on the display screen 200 shown in FIG. 16, the related important category candidate extraction process is then executed, and the related important categories extracted by that process are displayed.

図１８に示す表示画面３００においては、選択重要カテゴリ候補であるカテゴリ「操作性」２０５の付近の領域３０１に関連重要カテゴリ候補が表示される。 In the display screen 300 shown in FIG. 18, related important category candidates are displayed in an area 301 near the category “operability” 205 which is a selected important category candidate.

図１８に示す例では、カテゴリ「操作性」２０５と関連がある関連重要カテゴリ候補としてカテゴリ「Ｈ社」２０２、カテゴリ「対話分類」２０４およびカテゴリ「１９９５年」２０６が表示されている。 In the example shown in FIG. 18, the category “Company H” 202, the category “dialogue classification” 204, and the category “1995” 206 are displayed as related important category candidates related to the category “operability” 205.

なお、関連重要カテゴリ候補であるカテゴリ「Ｈ社」２０２、カテゴリ「対話分類」２０４およびカテゴリ「１９９５年」２０６の近傍には、当該カテゴリに属する重要文書候補および重要文書の数（件数）が表示されている。例えばカテゴリ「Ｈ社」２０２の上部には、当該カテゴリ「Ｈ社」２０２に属する重要文書候補および重要文書の数として「３０件」が表示されている。 The number of important document candidates and important documents belonging to each category is displayed near the related important category candidates, namely the category “Company H” 202, the category “dialogue classification” 204, and the category “1995” 206. For example, “30” is displayed above the category “Company H” 202 as the number of important document candidates and important documents belonging to that category.

ここで、ユーザは、このような表示画面３００に表示されたカテゴリ「操作性」２０５と関連のあるカテゴリ（ここでは、「Ｈ社」２０２、カテゴリ「対話分類」２０４およびカテゴリ「１９９５年」２０６）を参照して、重要文書が存在するか否かを確認したいカテゴリ（以下、確認対象カテゴリと表記）を選択する操作を行うことができる。これにより、ユーザは、例えば上述した図１６に示す表示画面２００において選択した重要カテゴリ候補（ここでは、カテゴリ「操作性」２０５）のみから重要文書を探すか、または、当該カテゴリ「操作性」２０５と関連のあるカテゴリ（ここでは、「Ｈ社」２０２、カテゴリ「対話分類」２０４およびカテゴリ「１９９５年」２０６）からも重要文書を探すかを選択することができる。 Here, by referring to the categories related to the category “operability” 205 displayed on the display screen 300 (here, the category “Company H” 202, the category “dialogue classification” 204, and the category “1995” 206), the user can perform an operation of selecting a category for which the user wants to confirm whether important documents exist (hereinafter referred to as a confirmation target category). The user can thereby choose either to search for important documents only in the important category candidate selected on the display screen 200 shown in FIG. 16 (here, the category “operability” 205), or to search for them also in the categories related to the category “operability” 205 (here, the category “Company H” 202, the category “dialogue classification” 204, and the category “1995” 206).

確認対象カテゴリを選択する操作が当該ユーザによって行われた場合、当該カテゴリに属する文書（の文書名）が表示される。 When the operation for selecting the confirmation target category is performed by the user, the document (document name) belonging to the category is displayed.

ここで、図１９は、図１８に示す表示画面３００において確認対象カテゴリとしてカテゴリ「操作性」２０５およびカテゴリ「Ｈ社」２０２が選択された場合の表示画面の一例を示す。 Here, FIG. 19 shows an example of the display screen when the category “operability” 205 and the category “Company H” 202 are selected as the confirmation target categories on the display screen 300 shown in FIG.

図１９に示す表示画面３００においては、当該表示画面３００の例えば右上に設けられた領域３０２に確認対象カテゴリとして選択されたカテゴリ「操作性」２０５およびカテゴリ「Ｈ社」２０２に属する文書の文書名の一覧が表示される。 On the display screen 300 shown in FIG. 19, a list of the document names of the documents belonging to the category “operability” 205 and the category “Company H” 202 selected as confirmation target categories is displayed in an area 302 provided, for example, at the upper right of the display screen 300.

この場合、領域３０２においては、カテゴリ「操作性」２０５およびカテゴリ「Ｈ社」２０２に属する文書のうちの重要文書が最上位に表示され、次に重要文書候補が表示され、最後に他の文書（つまり、重要文書および重要文書候補以外の文書）が表示される。なお、重要文書候補は、上述した類似度（重要文書との類似度）の順に表示される。 In this case, in the area 302, among the documents belonging to the category “operability” 205 and the category “Company H” 202, important documents are displayed at the top, important document candidates next, and other documents (that is, documents that are neither important documents nor important document candidates) last. The important document candidates are displayed in order of the similarity described above (their similarity to the important documents).
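
The display order described above can be expressed as a sort key. The following is a minimal sketch in which each document is a hypothetical dictionary with a `status` field ("important", "candidate", or "other") and, for candidates, a `similarity` value; these field names and the function name `order_for_display` are assumptions made for illustration. It also assumes that "in order of similarity" means highest similarity first.

```python
def order_for_display(docs):
    """Sort documents: important first, then candidates by similarity, then others."""
    rank = {"important": 0, "candidate": 1, "other": 2}

    def sort_key(doc):
        # Only candidates are ordered by similarity (descending, by assumption).
        similarity = doc.get("similarity", 0.0) if doc["status"] == "candidate" else 0.0
        return (rank[doc["status"]], -similarity)

    # sorted() is stable, so documents with equal keys keep their input order.
    return sorted(docs, key=sort_key)
```

Using a tuple key keeps the three groups separate while still ranking the candidates inside their own group.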

なお、ユーザは、図１９に示す表示画面３００に表示された重要文書候補等を参照することによって、文書分析装置３０に対して当該重要文書候補を重要文書として指定する操作を行うことができる。文書分析装置３０に対して重要文書候補を指定する操作がユーザによって行われた場合、文書記憶部２２において上記したように当該指定された重要文書候補に含まれる重要度が「重要」に変更されることにより、当該重要文書候補は重要文書とされる。 The user can perform an operation of designating the important document candidate as an important document with respect to the document analysis apparatus 30 by referring to the important document candidate displayed on the display screen 300 shown in FIG. When an operation for designating an important document candidate is performed on the document analysis apparatus 30 by the user, the importance included in the designated important document candidate is changed to “important” in the document storage unit 22 as described above. Thus, the important document candidate is set as an important document.

上記したように本実施形態においては、ユーザの操作に応じて指定された重要文書と重要文書候補抽出処理の対象となる文書との類似度が算出され、当該類似度に基づいて重要文書候補が抽出（決定）される。また、本実施形態においては、算出された類似度のうちのカテゴリに属する重要文書候補と重要文書との類似度に基づいて当該カテゴリの重要度が算出され、当該カテゴリの重要度に基づいて抽出（決定）された重要カテゴリ候補がユーザに対して提示される。 As described above, in the present embodiment, the similarity between the important document specified in accordance with the user's operation and the document that is the target of the important document candidate extraction process is calculated, and the important document candidate is determined based on the similarity. Extracted (determined). In the present embodiment, the importance of the category is calculated based on the similarity between the important document candidate belonging to the category and the important document among the calculated similarities, and extracted based on the importance of the category. The (determined) important category candidates are presented to the user.

これにより、本実施形態においては、ユーザにとって重要な文書が属するカテゴリの傾向を当該ユーザに提示することができるため、当該ユーザは当該重要な文書の傾向（つまり、当該重要な文書が含まれていそうなカテゴリ）に気づくことが可能となる。つまり、本実施形態においては、ユーザは漏れが少なく、効率よく重要な文書を探すことが可能となり、当該重要な文書の発見のための労力を削減することができる。 Thus, in the present embodiment, the tendency of the categories to which documents important to the user belong can be presented to the user, so the user can notice the tendency of the important documents (that is, the categories likely to contain them). In other words, the user can search for important documents efficiently and with few omissions, which reduces the effort required to find them.

また、本実施形態においては、カテゴリ（重要カテゴリ候補）の重要度に応じて当該カテゴリの背景色を変化させて当該カテゴリがユーザに対して提示（表示）される。これにより、本実施形態においては、ユーザはカテゴリに重要な文書が属する程度（度合い）によって重要文書が属しているかを確認すべきカテゴリを知ることができ、より効率よく重要文書を発見することが可能となる。 In the present embodiment, a category (important category candidate) is presented (displayed) to the user with a background color that varies according to the importance of the category. The user can thus tell, from the degree to which important documents belong to each category, which categories should be checked for important documents, and can find important documents more efficiently.

また、本実施形態においては、ユーザの操作に応じて選択された重要カテゴリ候補に関連がある重要カテゴリ候補（関連重要カテゴリ候補）が抽出（決定）され、当該関連重要カテゴリ候補をユーザに対して提示することができる。これにより、本実施形態においては、ユーザの操作に応じて選択された重要カテゴリ候補だけでなく、当該重要カテゴリ候補に関連がある重要カテゴリ候補についてもユーザに対して提示することで、当該ユーザは漏れなく重要文書を発見することが可能となる。 In the present embodiment, important category candidates related to the important category candidate selected in accordance with the user's operation (related important category candidates) are extracted (determined) and can be presented to the user. Because not only the selected important category candidate but also the important category candidates related to it are presented, the user can find important documents without omission.

なお、本実施形態においては、ユーザの操作に応じて重要文書が指定されるものとして説明したが、例えば当該ユーザの操作に応じてカテゴリが指定されることによって当該カテゴリに属する全ての文書が重要文書として指定される構成であっても構わない。 In the present embodiment, the description has been made on the assumption that an important document is designated according to a user operation. However, for example, when a category is designated according to the user operation, all documents belonging to the category are important. The configuration may be specified as a document.

（第２の実施形態） (Second Embodiment)
次に、図２０を参照して、第２の実施形態について説明する。図２０は、本実施形態に係る文書分析装置の主として機能構成を示すブロック図である。なお、前述した図２と同様の部分には同一参照符号を付してその詳しい説明を省略する。ここでは、図２と異なる部分について主に述べる。 Next, a second embodiment will be described with reference to FIG. 20. FIG. 20 is a block diagram mainly showing the functional configuration of the document analysis apparatus according to the present embodiment. Parts that are the same as those in FIG. 2 described above are denoted by the same reference numerals, and detailed description thereof is omitted. Here, the parts that differ from FIG. 2 will be mainly described.

また、本実施形態に係る文書分析装置のハードウェア構成は、前述した第１の実施形態と同様であるため、適宜、図１を用いて説明する。 The hardware configuration of the document analysis apparatus according to this embodiment is the same as that of the first embodiment described above, and will be described with reference to FIG. 1 as appropriate.

図２０に示すように、本実施形態に係る文書分析装置４０は、重要文書処理部４１を含む。本実施形態において、重要文書処理部４１は、図１に示すコンピュータ１０が外部記憶装置２０に格納されているプログラム２１を実行することにより実現されるものとする。 As shown in FIG. 20, the document analysis apparatus 40 according to the present embodiment includes an important document processing unit 41. In the present embodiment, the important document processing unit 41 is realized by the computer 10 illustrated in FIG. 1 executing the program 21 stored in the external storage device 20.

重要文書処理部４１は、重要文書集計部４１１を含む。重要文書集計部４１１は、文書記憶部２２に記憶されている文書のうちの重要文書について、ユーザによって指定された複数の分類軸でクロス集計を行う。重要文書集計部４１１は、ユーザによって指定された複数の分類軸の各々に該当する複数のカテゴリ（の全て）に属する重要文書の数を集計する。 The important document processing unit 41 includes an important document totaling unit 411. The important document totaling unit 411 cross-tabulates the important documents among the documents stored in the document storage unit 22 along a plurality of classification axes designated by the user. Specifically, the important document totaling unit 411 counts the number of important documents belonging to (all of) the categories, one corresponding to each of the designated classification axes.

重要文書集計部４１１による集計の結果は、カテゴリ表示操作部３１１を介してユーザに対して提示される。 The result of counting by the important document totaling unit 411 is presented to the user via the category display operation unit 311.

次に、図２１のフローチャートを参照して、本実施形態に係る文書分析装置４０の処理手順について説明する。 Next, the processing procedure of the document analysis apparatus 40 according to the present embodiment will be described with reference to the flowchart of FIG. 21.

まず、前述したステップＳ１〜Ｓ１０の処理に相当するステップＳ８１〜Ｓ９０の処理が実行される。 First, steps S81 to S90 corresponding to the steps S1 to S10 described above are executed.

なお、ユーザは、文書分析装置４０を操作することによって、例えばステップＳ９０において提示された文書（つまり、重要カテゴリ候補および関連重要カテゴリ候補に属する重要文書候補）の中から重要文書（第３の文書）を指定することができる。この場合、文書記憶部２２において重要文書として指定された文書に含まれる重要度は「重要」に変更される。 Note that the user operates the document analysis device 40, for example, an important document (third document) from among the documents presented in step S90 (that is, important document candidates belonging to the important category candidate and the related important category candidate). ) Can be specified. In this case, the importance included in the document designated as the important document in the document storage unit 22 is changed to “important”.

ここで、ユーザは、例えばステップＳ９０において提示された文書の中から重要文書を指定し、全ての重要文書が確定した後に、当該重要文書の集計を指示するための操作を文書分析装置４０に対して行うことができる。なお、重要文書の集計を指示するための操作において、ユーザは、例えば２つの分類軸を指定（選択）する。 Here, for example, the user designates an important document from the documents presented in step S90, and after all the important documents are confirmed, an operation for instructing the aggregation of the important documents is performed on the document analysis apparatus 40. Can be done. In the operation for instructing the aggregation of important documents, the user designates (selects), for example, two classification axes.

カテゴリ表示操作部３１１は、ユーザによって重要文書の集計を指示するための操作が行われた場合には、当該操作を受け付ける。 The category display operation unit 311 accepts an operation when an operation for instructing the aggregation of important documents is performed by the user.

カテゴリ表示操作部３１１は、重要文書の集計を指示するための操作が受け付けられたか、つまり、ユーザからの当該重要文書の集計の指示（以下、集計指示と表記）があるか否かを判定する（ステップＳ９１）。 The category display operation unit 311 determines whether an operation for instructing the aggregation of important documents has been accepted, that is, whether there is an instruction to aggregate the important documents (hereinafter referred to as an aggregation instruction) from the user. (Step S91).

ユーザからの集計指示があると判定された場合（ステップＳ９１のＹＥＳ）、重要文書集計部４１１は、文書記憶部２２に記憶されている重要文書（重要度が「重要」である文書）を、当該集計指示においてユーザによって指定された２つの分類軸でクロス集計する（ステップＳ９２）。 When it is determined that there is a totaling instruction from the user (YES in step S91), the important document totaling unit 411 selects an important document (a document whose importance is “important”) stored in the document storage unit 22, Cross tabulation is performed on the two classification axes designated by the user in the tabulation instruction (step S92).

この場合、重要文書集計部４１１は、カテゴリ記憶部２３を参照して、ユーザによって指定された２つの分類軸（によって示される観点）のうちの一方の分類軸に該当するカテゴリの各々と他方の分類軸に該当するカテゴリの各々との両方に属する重要文書の数を、当該カテゴリの組み合わせ毎に集計する。 In this case, the important document totaling unit 411 refers to the category storage unit 23 and, for each combination of a category corresponding to one of the two classification axes (the viewpoints they indicate) designated by the user and a category corresponding to the other classification axis, counts the number of important documents belonging to both categories.
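
The cross-tabulation in step S92 amounts to counting set intersections. The following is a minimal sketch, assuming each classification axis is available as a hypothetical dictionary mapping a category name to the set of document numbers in that category, and that the important documents are given as a set of document numbers. The function name `cross_tabulate` and these data shapes are assumptions, not structures defined in the text.

```python
def cross_tabulate(important_docs, axis_a, axis_b):
    """Count important documents for every pair of categories from two axes.

    important_docs: set of document numbers whose importance is "important"
    axis_a, axis_b: dicts mapping category name -> set of document numbers
    Returns a dict keyed by (category on axis_a, category on axis_b).
    """
    table = {}
    for cat_a, docs_a in axis_a.items():
        for cat_b, docs_b in axis_b.items():
            # A document is counted only if it is important and in both categories.
            table[(cat_a, cat_b)] = len(important_docs & docs_a & docs_b)
    return table
```

Every combination appears in the result, including those with a count of 0, which matches a table layout in which empty cells are still shown.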

ここで、分類軸とは、複数のカテゴリにおいて共通する観点を示す。例えば文書記憶部２２に記憶されている文書（重要文書）が特許文書であるものとすると、分類軸には、例えば「出願人別」および「出願年別」等が指定される。分類軸「出願人別」に該当するカテゴリには、例えばカテゴリ「Ｔ社」および「Ｈ社」等が含まれる。また、分類軸「出願年別」に該当するカテゴリには、例えばカテゴリ「１９９５年」および「１９９６年」等が含まれる。 Here, the classification axis indicates a viewpoint common to a plurality of categories. For example, if the document (important document) stored in the document storage unit 22 is a patent document, for example, “by applicant” and “by application year” are designated as the classification axis. The category corresponding to the classification axis “by applicant” includes, for example, the categories “Company T” and “Company H”. The category corresponding to the classification axis “by application year” includes, for example, the categories “1995” and “1996”.

具体的には、ユーザは、階層構造を構成する複数のカテゴリのうち、子カテゴリを持つカテゴリ（つまり、親カテゴリ）を分類軸として選択することができる。この場合における分類軸に該当するカテゴリとは、ユーザによって選択されたカテゴリ（親カテゴリ）の下位に位置する子カテゴリである。 Specifically, the user can select a category having a child category (that is, a parent category) as a classification axis from among a plurality of categories constituting the hierarchical structure. The category corresponding to the classification axis in this case is a child category positioned below the category (parent category) selected by the user.

ここで、図２２は、重要文書集計部４１１による重要文書の集計結果の一例を示す。図２２においては、分類軸として「出願人別（縦軸）」および「出願年別（横軸）」が指定されている。 FIG. 22 shows an example of the result of totaling important documents by the important document totaling unit 411. In FIG. 22, “by applicant” (vertical axis) and “by application year” (horizontal axis) are designated as the classification axes.

図２２に示す例では、分類軸「出願人別」に該当するカテゴリには、カテゴリ「Ｔ社」、「Ｈ社」、「Ｎ社」および「Ｍ社」が含まれている。一方、分類軸「出願年別」には、カテゴリ「１９９５年」、「１９９６年」、「１９９７年」および「１９９８年」等が含まれている。 In the example illustrated in FIG. 22, categories corresponding to the classification axis “by applicant” include the categories “Company T”, “Company H”, “Company N”, and “Company M”. On the other hand, the classification axis “by application year” includes categories “1995”, “1996”, “1997”, “1998”, and the like.

図２２に示す重要文書の集計結果においては、例えばカテゴリ「Ｔ社」およびカテゴリ「１９９５年」の両方のカテゴリに属する重要文書の数が１０であることが示されている。また、例えばカテゴリ「Ｈ社」およびカテゴリ「１９９６年」に両方のカテゴリに属する重要文書の数が３０であることが示されている。また、カテゴリ「Ｍ社」に属する文書の中には重要文書が存在しないため、当該カテゴリ「Ｍ社」とカテゴリ「１９９５年」、「１９９６年」、「１９９７年」および「１９９８年」の各々との両方のカテゴリに属する重要文書の数が０であることが示されている。 The totaling result shown in FIG. 22 indicates, for example, that the number of important documents belonging to both the category “Company T” and the category “1995” is 10, and that the number of important documents belonging to both the category “Company H” and the category “1996” is 30. Because no important document exists among the documents belonging to the category “Company M”, the number of important documents belonging to both the category “Company M” and each of the categories “1995”, “1996”, “1997”, and “1998” is shown as 0.

再び図２１に戻ると、カテゴリ表示操作部３１１は、重要文書集計部４１１による集計結果をユーザに対して提示（表示）する（ステップＳ９３）。 Returning to FIG. 21 again, the category display operation unit 311 presents (displays) the totaled result by the important document totaling unit 411 to the user (step S93).

上記した図２２に示すような重要文書の集計結果がユーザに提示されることで、当該ユーザは、当該重要文書の傾向を容易に把握することが可能となる。 By presenting the summary results of the important documents as shown in FIG. 22 to the user, the user can easily grasp the tendency of the important documents.

具体的には、図２２に示す重要文書の集計結果によれば、例えばカテゴリ「Ｈ社」およびカテゴリ「１９９６年」の両方に属する重要文書（つまり、Ｈ社によって１９９６年に出願された重要な特許文書）が最も多いことが容易に把握できる。 Specifically, according to the result of tabulating important documents shown in FIG. 22, for example, important documents belonging to both the category “Company H” and the category “1996” (that is, important documents filed in 1996 by Company H). It is easy to grasp that there are the most patent documents).

なお、図２２に示す重要文書の集計結果がユーザに対して提示（表示）される場合には、重要文書の数に応じて背景色を変更しても構わない。これによって、より重要文書の傾向を容易に把握することが可能となる。 When the totaling result of important documents shown in FIG. 22 is presented (displayed) to the user, the background color may be changed according to the number of important documents. This makes the tendency of the important documents even easier to grasp.
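
One way to vary the background color with the number of important documents is a linear gray scale. The text does not specify any particular mapping, so the following is only a sketch under that assumption: white for a count of 0 and a dark gray (`#404040`, an arbitrary choice) for the largest count in the table. The function name `shade_for_count` is hypothetical.

```python
def shade_for_count(count, max_count):
    """Map a document count to a gray background color as a hex string.

    A count of 0 gives white; max_count gives the darkest shade (#404040).
    """
    if max_count <= 0:
        return "#ffffff"                      # nothing to compare against
    ratio = min(max(count / max_count, 0.0), 1.0)
    level = 255 - round(191 * ratio)          # 255 (white) down to 64 (dark gray)
    return "#{0:02x}{0:02x}{0:02x}".format(level)
```

With the counts mentioned for FIG. 22, the cell holding 30 documents would get the darkest shade and the cells for “Company M” would stay white.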

再び図２１に戻ると、ステップＳ８９において重要文書が存在するか否かを確認したいカテゴリを選択する操作が受け付けられていないと判定された場合には、上記したステップＳ９１の処理が実行される。また、ステップＳ９１においてユーザからの集計指示がないと判定された場合、処理は終了される。 Returning to FIG. 21 again, if it is determined in step S89 that an operation for selecting a category for which it is desired to confirm whether or not an important document exists is not accepted, the processing in step S91 described above is executed. If it is determined in step S91 that there is no aggregation instruction from the user, the process ends.

上記したように本実施形態においては、ユーザによって指定された２つの分類軸（の各々に該当する複数のカテゴリ）に属する重要文書の集計結果をユーザに対して提示することができるため、当該ユーザは、重要文書（が属するカテゴリ）の傾向を容易に把握することが可能となる。 As described above, in the present embodiment, since the aggregation result of important documents belonging to two classification axes (a plurality of categories corresponding to each) specified by the user can be presented to the user, the user Makes it possible to easily grasp the tendency of important documents (category to which the document belongs).

なお、本実施形態においては、文書記憶部２２に記憶されている文書のうちの重要文書のみの集計（クロス集計）結果が例えば２軸マップ上に表示されるものとして説明したが、当該重要文書に加えて重要文書候補抽出部３２１によって抽出された重要文書候補の集計結果が表示されてもよく、更に、当該重要文書および重要文書候補以外の文書の集計結果が表示されても構わない。 In the present embodiment, the result of totaling (cross-tabulating) only the important documents among the documents stored in the document storage unit 22 has been described as being displayed on, for example, a two-axis map. However, the totaling result of the important document candidates extracted by the important document candidate extraction unit 321 may be displayed in addition to that of the important documents, and the totaling result of documents other than the important documents and important document candidates may also be displayed.

上述した第１および第２の実施形態によれば、ユーザにとって重要な文書が属するカテゴリの傾向を当該ユーザに提示することが可能な文書分析装置およびプログラムを提供することができる。 According to the first and second embodiments described above, it is possible to provide a document analysis apparatus and a program capable of presenting a tendency of a category to which a document important for a user belongs to the user.

なお、本願発明は、上記各実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記各実施形態に開示されている複数の構成要素の適宜な組合せにより種々の発明を形成できる。例えば、各実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。更に、異なる実施形態に亘る構成要素を適宜組合せてもよい。 Note that the present invention is not limited to the above-described embodiments as they are, and can be embodied by modifying the constituent elements without departing from the scope of the invention in the implementation stage. Further, various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in each embodiment. Furthermore, constituent elements over different embodiments may be appropriately combined.

１０…コンピュータ、２０…外部記憶装置、２２…文書記憶部、２３…カテゴリ記憶部、３０，４０…文書分析装置、３１…ユーザインタフェース部、３２，４１…重要文書処理部、３１１…カテゴリ表示操作部、３１２…重要文書指定部、３１３…カテゴリ選択部、３２１…重要文書候補抽出部、３２２…重要カテゴリ候補抽出部、３２３…関連重要カテゴリ候補抽出部、４１１…重要文書集計部。 DESCRIPTION OF SYMBOLS 10 ... Computer, 20 ... External storage device, 22 ... Document storage part, 23 ... Category storage part, 30, 40 ... Document analysis apparatus, 31 ... User interface part, 32, 41 ... Important document processing part, 311 ... Category display operation 312 ... important document designation part, 313 ... category selection part, 321 ... important document candidate extraction part, 322 ... important category candidate extraction part, 323 ... related important category candidate extraction part, 411 ... important document aggregation part.

Claims

Storage means for associating and storing category identification information for identifying the category and documents belonging to the category for each category into which a plurality of documents are classified;
An important document designating unit for designating a first document stored in the storage unit in response to a user operation;
First specifying means for specifying category identification information stored in the storage means in association with the designated first document;
Second specifying means for specifying a second document stored in the storage means in association with the category identification information specified by the first specifying means;
Similarity calculation means for calculating the similarity between the designated first document and the second document specified by the second specifying means, based on the category identification information stored in the storage means in association with the first document and the category identification information stored in the storage means in association with the second document;
An important document candidate determining means for determining the second document specified by the second specifying means as an important document candidate based on the similarity calculated by the similarity calculating means;
Importance calculating means for calculating the importance of the category identified by the category identification information stored in the storage means, based on, among the similarities calculated by the similarity calculation means, the similarity between the designated first document and the second document that is stored in the storage means in association with that category identification information and has been determined as the important document candidate;
Based on the importance calculated by the importance calculating means, an important category candidate determining means for determining a category identified by the category identification information stored in the storage means as an important category candidate;
Presenting means for presenting a category determined as the important category candidate.
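
For illustration only, the flow recited above can be sketched as follows. The claim does not fix a similarity formula or any thresholds, so this sketch assumes a Jaccard coefficient over the sets of category identification information, assumes importance is the sum of candidate similarities per category, and uses hypothetical names throughout (DOC_CATEGORIES, analyze, doc_threshold, cat_threshold).

```python
from collections import defaultdict

# Hypothetical storage means: document id -> set of category ids.
DOC_CATEGORIES = {
    "d1": {"A", "B"},
    "d2": {"A", "B"},
    "d3": {"A"},
    "d4": {"B", "C"},
    "d5": {"C"},
}


def jaccard(a, b):
    """Assumed similarity: overlap of two category-id sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def analyze(store, first_doc, doc_threshold=0.4, cat_threshold=0.5):
    # First specifying step: categories of the designated first document.
    first_cats = store[first_doc]

    # Second specifying step: other documents sharing any of those categories.
    second_docs = {
        doc for doc, cats in store.items()
        if doc != first_doc and cats & first_cats
    }

    # Similarity between the first document and each second document.
    sims = {doc: jaccard(first_cats, store[doc]) for doc in second_docs}

    # Important document candidates: similarity at or above the threshold.
    candidates = {doc: s for doc, s in sims.items() if s >= doc_threshold}

    # Importance per category: sum of similarities of its candidates
    # (an assumption; the claim only says "based on the similarity").
    importance = defaultdict(float)
    for doc, s in candidates.items():
        for cat in store[doc]:
            importance[cat] += s

    # Important category candidates: importance at or above the threshold.
    important = {c: v for c, v in importance.items() if v >= cat_threshold}
    return candidates, important


if __name__ == "__main__":
    docs, cats = analyze(DOC_CATEGORIES, "d1")
    print("important document candidates:", docs)
    print("important category candidates:", cats)
```

Raising doc_threshold narrows the set of important document candidates, which in turn lowers the importance of categories that were supported only by weakly similar documents.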

The importance calculation means includes:
First calculating means for calculating, for each piece of category identification information stored in the storage means, the similarity between the designated first document and the second document that is stored in the storage means in association with that category identification information and has been determined as the important document candidate;
Second calculation means for calculating a total value of similarities calculated for each category identification information by the first calculation means;
Third calculating means for calculating, as the importance of the category identified by the category identification information, the ratio of the similarity calculated for each piece of category identification information by the first calculating means to the total value of similarities calculated by the second calculating means;
The document analysis apparatus according to claim 1, wherein the presenting means presents the category determined as the important category candidate according to the ratio calculated as the importance by the third calculating means.
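
As a minimal sketch of the ratio recited above, the importance of a category c can be read as S_c divided by the sum of S_k over all categories k, where S_c is the similarity obtained for category c. The function names and the per-category input values below are hypothetical, and treating a zero total as zero importance is an assumption not stated in the claim.

```python
def category_importance_ratios(per_category_similarity):
    """Return each category's share of the total similarity."""
    # Second calculation: total of the per-category similarities.
    total = sum(per_category_similarity.values())
    if total == 0:
        # Assumed handling: no similarity anywhere means zero importance.
        return {cat: 0.0 for cat in per_category_similarity}
    # Third calculation: ratio of each category's similarity to the total.
    return {cat: s / total for cat, s in per_category_similarity.items()}


def rank_for_presentation(ratios):
    """Order categories from highest to lowest importance ratio."""
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-category similarities from the first calculation.
    ratios = category_importance_ratios({"A": 1.5, "B": 1.0, "C": 0.0})
    for category, ratio in rank_for_presentation(ratios):
        print(category, round(ratio, 2))
```

Because the ratios sum to 1 whenever the total is non-zero, they can be presented directly as percentages per category.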

Selection means for selecting a category determined as the important category candidate in accordance with a user operation;
Determination means for determining, based on the second documents that are stored in the storage means in association with each piece of category identification information identifying a category determined as the important category candidate and that have been determined as important document candidates, whether a category determined as the important category candidate other than the selected category is a category related to the selected category;
The document analysis apparatus according to claim 1, wherein the presenting means further presents, as a category related to the selected category, a category that is determined as the important category candidate other than the selected category and is determined to be related to the selected category.
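
The claim leaves the relatedness criterion open. One possible reading, sketched here purely as an assumption, treats two important category candidates as related when they share at least min_shared important document candidates. The function name, the min_shared parameter, and the sample data are hypothetical.

```python
def related_categories(candidates_by_category, selected, min_shared=1):
    """List important category candidates related to the selected one.

    candidates_by_category maps a category id to the set of important
    document candidates stored in association with that category.
    """
    selected_docs = candidates_by_category.get(selected, set())
    related = []
    for category, docs in candidates_by_category.items():
        if category == selected:
            continue  # only categories other than the selected one
        # Assumed criterion: enough important document candidates in common.
        if len(docs & selected_docs) >= min_shared:
            related.append(category)
    return sorted(related)


if __name__ == "__main__":
    candidates_by_category = {
        "A": {"d2", "d3"},
        "B": {"d2", "d4"},
        "C": {"d5"},
    }
    print(related_categories(candidates_by_category, "A"))
```

Other criteria, such as a ratio of shared documents, would fit the same interface; only the comparison inside the loop would change.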

Further comprising counting means,
The important document designating means designates, in response to a user operation, a third document stored in the storage means in association with the category identification information identifying the presented category,
The counting means refers to the storage means and counts the number of the designated first and third documents belonging to a plurality of categories designated by the user,
The document analysis apparatus according to claim 1, wherein the presenting means further presents the counting result.
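
The counting step can be sketched as a per-category tally of the designated documents. This is a minimal illustration that assumes the storage means is a mapping from document id to category ids; count_designated and the sample data are hypothetical names and values.

```python
def count_designated(store, designated_docs, categories):
    """Count designated documents belonging to each given category."""
    counts = {}
    for category in categories:
        # Refer to the storage mapping to test membership of each document.
        counts[category] = sum(
            1 for doc in designated_docs if category in store.get(doc, set())
        )
    return counts


if __name__ == "__main__":
    # Hypothetical storage: document id -> set of category ids.
    store = {
        "d1": {"A", "B"},
        "d4": {"B", "C"},
        "d5": {"C"},
    }
    # d1 stands in for a designated first document, d4 for a third document.
    print(count_designated(store, ["d1", "d4"], ["A", "B", "C"]))
```

A document that belongs to several of the designated categories is counted once in each of them, so the per-category counts can add up to more than the number of designated documents.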

A program executed by a computer in a document analysis apparatus comprising an external storage device and the computer that uses the external storage device, the external storage device having storage means for storing, for each category into which a plurality of documents are classified, category identification information for identifying the category and the documents belonging to the category in association with each other,
The program causing the computer to execute:
Designating a first document stored in the storage means in response to a user operation;
Identifying category identification information stored in the storage means in association with the designated first document;
Identifying a second document stored in the storage means in association with the identified category identification information;
Calculating the similarity between the designated first document and the identified second document, based on the category identification information stored in the storage means in association with the first document and the category identification information stored in the storage means in association with the second document;
Determining the identified second document as an important document candidate based on the calculated similarity;
Calculating the importance of the category identified by the category identification information stored in the storage means, based on, among the calculated similarities, the similarity between the designated first document and the second document that is stored in the storage means in association with that category identification information and has been determined as the important document candidate;
Determining a category identified by category identification information stored in the storage means as an important category candidate based on the calculated importance;
Presenting the category determined as the important category candidate.