JP2012203868A

JP2012203868A - Skim reading support system, skim reading support method and program

Info

Publication number: JP2012203868A
Application number: JP2011070926A
Authority: JP
Inventors: Yumi Wakagi; 裕美若木; Toshihiro Yamazaki; 智弘山崎; Masaru Suzuki; 優鈴木; Kazuo Sumita; 一男住田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2011-03-28
Filing date: 2011-03-28
Publication date: 2012-10-22
Anticipated expiration: 2031-03-28
Also published as: JP5259764B2

Abstract

PROBLEM TO BE SOLVED: To support skim reading of a large number of documents. SOLUTION: According to an embodiment, a system comprises a document storage, a display unit, an input unit, a classified information storage, an extractor and a specifying unit. The document storage stores a plurality of documents together with identifying information. A user accesses a document on the display unit, and designates the classification type to be assigned to the document from the input unit. The classified information storage stores the classification type of the document designated by the user. The extractor extracts, out of one or more of the documents assigned the same classification type, one or more words or phrases to be highlighted in display regarding the classification type. The specifying unit specifies, for each document to which the user has assigned no classification type, the section of the document in which the word(s) or phrase(s) to be highlighted appear. The display unit, in displaying a document, highlights the word(s) or phrase(s) in the section.

Description

Embodiments of the present invention relate to a browsing support system, method, and program for supporting browsing of a large number of documents.

With the spread of computers and the evolution of hardware, such as faster communication and processing and larger hard disks and memories, documents have increasingly been digitized, and handling large amounts of information has become an everyday matter. Thanks to advances in software such as information retrieval, general users can now get assistance in making use of the information they want out of that large volume.

However, in business settings where content must be investigated exhaustively without omission, such as patent searches, literature surveys, and market research, users themselves have to look through a large number of documents, and it is difficult to reduce the documents to be read to a small number with high accuracy through mechanical processing such as search and classification. Time is also usually limited, so users typically browse, manually or unconsciously, for example by picking out the passages that deserve close reading. Alternatively, so that many documents can be divided among several people, documents are allocated by browsing, based on each person's field of expertise, as a preprocessing step before close reading.

JP 2008-171164 A

No technology for supporting the browsing of a large number of documents has been known.

An object of the present embodiment is to provide a browsing support system, a browsing support method, and a program capable of supporting browsing of a large number of documents.

According to the embodiment, the system includes a document storage unit, a display unit, an input unit, a classification information storage unit, an extraction unit, and a specifying unit. The document storage unit stores a plurality of documents, each associated with identification information. The display unit displays all or some of the plurality of documents, with or without highlighting of words or phrases. The input unit receives from the user a designation of a specific document among the displayed documents and a designation of a specific classification type, chosen from a plurality of predetermined classification types, to be assigned to that document. The classification information storage unit stores classification information that associates the identification information with the specific classification type. The extraction unit extracts, from the one or more documents associated with the same classification type, one or more words or phrases to be highlighted for that classification type. The specifying unit specifies, for each of all or some of the documents, the locations to be highlighted in the document when a word or phrase extracted by the extraction unit is present in it.

A diagram showing a functional configuration example of the browsing support system according to the first embodiment.
A diagram showing an example of document data stored in the document data storage unit.
A diagram showing an example of document classification information (initial state) stored in the classification information storage unit.
A diagram showing an example of a scored word list (initial state) created by the browsing word extraction unit.
A diagram for explaining a system screen example and an operation example of the browsing support system.
A flowchart showing an example of the processing procedure of the browsing support system of the first embodiment.
A diagram showing a document display example without highlighting.
A diagram showing an example of updated document classification information.
A diagram showing an example of an updated scored word list.
A flowchart showing an example of the processing procedure of the browsing word extraction unit.
A flowchart showing an example of the processing procedure of the highlight location specifying unit.
A flowchart showing an example of the processing procedure of the text display unit.
A diagram showing a document display example with highlighting.
A diagram showing a functional configuration example of the browsing support system according to the second embodiment.
A flowchart showing an example of the processing procedure of the browsing word extraction unit.
A diagram for explaining an example of a scored word list and word replacement between different languages.
A flowchart showing another example of the processing procedure of the text display unit.
A diagram for explaining an example of a scored word list and word replacement between different languages.
A diagram showing a document display example with highlighting in a different language.
A diagram showing a functional configuration example of the browsing support system according to the third embodiment.
A diagram showing a functional configuration example of the browsing support system according to the fourth embodiment.

Hereinafter, a browsing support system according to an embodiment of the present invention will be described in detail with reference to the drawings. In the following embodiments, parts given the same reference number perform the same operation, and repeated description of them is omitted.

(First embodiment)
Conventionally, when reading through a large number of documents, as in document surveys such as patent searches, literature surveys, and market research, it has been difficult to reduce the documents to be read to a small number with high accuracy by mechanical processing such as search and classification alone, so users scanned the words with their own eyes and browsed.

The first embodiment is described using the following case as an example: keywords for browsing are automatically extracted based on classification information (classification types) that the user has entered for some documents (for example, after freely reading the displayed documents and judging their classification types), and those keywords are highlighted in documents the user has not yet classified, thereby supporting the user's browsing.

The following description centers on a specific example in which at least the following two classification types, based on whether or not a document is needed, are provided as the types the user can assign to a document.
(a) A classification type indicating a document the user considers necessary (hereinafter, required document type),
(b) A classification type indicating a document the user considers unnecessary (hereinafter, unnecessary document type).

In this example, the user can enter either "required document type" or "unnecessary document type" as the classification type of a desired document.

The above is only one example, and various other classification schemes can be used.

For example, documents can be classified according to which person in charge they relate to. When five persons in charge, A to E, are set, at least the following five classification types are provided.
・A classification type indicating a document related to person in charge A (person A document type),
・A classification type indicating a document related to person in charge B (person B document type),
・A classification type indicating a document related to person in charge C (person C document type),
・A classification type indicating a document related to person in charge D (person D document type),
・A classification type indicating a document related to person in charge E (person E document type).

In this example, the user can enter any one of "person A document type" to "person E document type" as the classification type of a desired document.

The two examples above can also be combined to provide, for example, at least the following six classification types.
・A classification type indicating a document related to person in charge A (person A document type),
・A classification type indicating a document related to person in charge B (person B document type),
・A classification type indicating a document related to person in charge C (person C document type),
・A classification type indicating a document related to person in charge D (person D document type),
・A classification type indicating a document related to person in charge E (person E document type),
・A classification type indicating a document the user considers unnecessary (unnecessary document type).

As one of the classification types, a type indicating that the user has not yet entered a classification type for the document (here called the unread document type) may also be provided. For example, if the unread document type is additionally used in the specific example above, and the user enters the required document type for some documents and the unnecessary document type for some others, the remaining documents are automatically assigned the "unread document type".
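The classification types described so far can be sketched as a small enumeration. This is a minimal illustration only: the names `ClassificationType` and `effective_type` and the string labels are assumptions made for the example and do not appear in the source.

```python
from enum import Enum
from typing import Optional


class ClassificationType(Enum):
    """Hypothetical labels; the text only requires predetermined types."""
    REQUIRED = "required"        # document the user considers necessary
    UNNECESSARY = "unnecessary"  # document the user considers unnecessary
    UNREAD = "unread"            # optional type: no user input yet


def effective_type(user_input: Optional[ClassificationType]) -> ClassificationType:
    """Return the user's classification, or UNREAD when none was entered."""
    return user_input if user_input is not None else ClassificationType.UNREAD


# The user has classified two of three documents; the rest become UNREAD.
labels = {"0001": ClassificationType.REQUIRED, "0002": ClassificationType.UNNECESSARY}
all_ids = ["0001", "0002", "0003"]
resolved = {doc_id: effective_type(labels.get(doc_id)) for doc_id in all_ids}
```

A per-person scheme (person A to person E document types) would simply add further members to the same enumeration.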

FIG. 1 shows a functional configuration example of the browsing support system according to the first embodiment.

As shown in FIG. 1, the browsing support system of the present embodiment includes a browsing word extraction unit 101, a highlight location specifying unit 102, a text display unit 103, a user input unit 104, a document data storage unit 105, and a classification information storage unit 106.

The document data storage unit 105 stores the data of a plurality of documents.

A document may be of any kind. For example, it may be the body of some document, or it may be an abstract of such a body. For example, when the document bodies are patent specifications and each document stored in the document data storage unit 105 is the abstract of one patent specification, the user may browse each abstract to decide whether to actually read the corresponding patent specification. In the present embodiment, abstracts of patent specifications are used as the example documents.

FIG. 2 shows an example of the document data stored in the document data storage unit 105. In the example of FIG. 2, a document identifier (hereinafter, document ID) (0001 to 0006 in the figure) is assigned to each piece of document data. The storage format of the document data is not limited to that of FIG. 2.

The document data stored in the document data storage unit 105 may have been read from a recording medium, downloaded via a network such as the Internet, or obtained by any other method, and may include documents entered from a keyboard.

The document data stored in the document data storage unit 105 may be sorted, for example by document ID or by some other criterion, or may be left unsorted.

Of the plurality of documents stored in the document data storage unit 105 (let their number be N), the text display unit 103 displays a desired number d of documents at a time. The description here assumes 2 ≤ d ≤ N, but a display state with d = 1 is also allowed.

There is no restriction on how the documents are displayed. For example, each displayed document may be shown in full, or only in part (for example, the document is shown up to a predetermined maximum number of characters, and the remainder is shown when the user performs a predetermined operation). The user may also be allowed to specify the number d of documents displayed simultaneously on one screen, or the maximum number of characters displayed per document, and various other display methods are possible.
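The display options above (d documents per screen, an optional per-document character limit) can be sketched as follows. The function name `page_of_documents`, its parameters, and the trailing `"..."` marker for a truncated document are illustrative assumptions, not part of the source.

```python
def page_of_documents(docs, page, per_page, max_chars=None):
    """Return the (doc_id, text) pairs shown on one screen.

    docs      -- list of (doc_id, text) pairs in display order
    page      -- zero-based screen index
    per_page  -- number d of documents shown at once
    max_chars -- optional upper limit on characters shown per document
    """
    start = page * per_page
    shown = []
    for doc_id, text in docs[start:start + per_page]:
        if max_chars is not None and len(text) > max_chars:
            # Show only the leading part; the rest would be revealed by a
            # user operation in a real user interface.
            text = text[:max_chars] + "..."
        shown.append((doc_id, text))
    return shown


docs = [("0001", "abcdef"), ("0002", "xy"), ("0003", "hello")]
first_screen = page_of_documents(docs, page=0, per_page=2, max_chars=3)
second_screen = page_of_documents(docs, page=1, per_page=2, max_chars=3)
```

Scrolling or paging to further documents then amounts to calling the function with the next `page` value.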

When only some of the documents are displayed, there is no restriction on how the documents to display are selected. For example, they may be selected by document ID or by some other criterion, or the user may specify them.

The user input unit 104 receives the user's designation of a document and of the classification type for that document. For example, the user may select a desired document from among those displayed on the text display unit 103 by a predetermined method, and select the desired classification type for it by a predetermined method. There is no particular restriction on how the document or the classification type is selected.

The classification scheme may be fixed in advance to a single scheme (for example, a choice between "required document type" and "unnecessary document type"); several schemes may be defined in advance for the user to choose from (for example, either "required document type"/"unnecessary document type", or "person A document type" to "person E document type"); the user may be allowed to define arbitrary classification types at any time; or these approaches may be combined.

The classification information storage unit 106 stores document classification information indicating the correspondence between each document ID and the classification type for that document ID.

FIG. 3(a) shows an example of the document classification information stored in the classification information storage unit 106. The example of FIG. 3(a) shows the initial state, in which no classification type has been entered. In the following, as an example, the document classification information stores the value "A" for the "unnecessary document type" and the value "B" for the "required document type".

When the "unread document type" described above is not used, the example of FIG. 3(a) can be used as is. When the "unread document type" is used, a value other than those of the classification types entered by the user (for example, the value "C") may be stored for all documents as the initial state of the document classification information, as in FIG. 3(b), and documents storing the value "C" may be treated as the "unread document type". Alternatively, the example of FIG. 3(a) may be used, and documents for which no value is stored (or whose value is "null") may be treated as the "unread document type".
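The two initial states described above can be sketched as a mapping from document ID to a stored value. The values "A", "B", and "C" follow the text; the function names and the use of a Python dictionary with `None` standing in for an empty entry are assumptions made for illustration.

```python
UNNECESSARY, REQUIRED, UNREAD = "A", "B", "C"  # stored values used in the text


def initial_classification(doc_ids, use_unread_value=False):
    """Build the initial document classification information.

    With use_unread_value=False every entry is empty (None), as in FIG. 3(a);
    with use_unread_value=True every entry starts as "C", as in FIG. 3(b).
    """
    initial = UNREAD if use_unread_value else None
    return {doc_id: initial for doc_id in doc_ids}


def is_unread(classification, doc_id):
    """A document counts as unread if its entry is empty or holds "C"."""
    return classification[doc_id] in (None, UNREAD)


ids = ["0001", "0002", "0003"]
info_a = initial_classification(ids)                         # FIG. 3(a) style
info_b = initial_classification(ids, use_unread_value=True)  # FIG. 3(b) style
info_a["0002"] = REQUIRED  # the user marks document 0002 as required
```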

In either case, when the number of documents stored in the document data storage unit 105 is N, the number c of entries in the document classification information that hold a value corresponding to a classification type entered by the user satisfies 0 ≤ c ≤ N.

In the following, no highlighting is performed when c = 0. When c reaches N, every document has been classified by the user, so no further processing by the browsing word extraction unit 101 and the highlight location specifying unit 102 is performed. However, when the "unread document type" is used, highlighting may also be performed when c = 0.

The following description also assumes that only the (N - c) documents to which the user has not yet assigned a classification type are targets of highlighting. However, all N documents may be made targets of highlighting.
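The conditions on c can be sketched as a small helper that returns the documents to be highlighted. It covers only the default behaviour described above (no highlighting when c = 0, nothing left to do when c = N, and only unclassified documents as targets); the function name and the use of `None` for an unclassified entry are assumptions.

```python
def highlight_targets(classification):
    """Return the IDs of documents that should receive highlighting.

    classification maps document ID -> "A"/"B" (user input) or None.
    """
    n = len(classification)
    # c: number of documents the user has already classified
    c = sum(1 for value in classification.values() if value is not None)
    if c == 0 or c == n:
        # Nothing to learn from yet, or every document is already classified.
        return []
    # Only the (N - c) unclassified documents are highlight targets.
    return [doc_id for doc_id, value in classification.items() if value is None]


targets = highlight_targets({"0001": "A", "0002": "B", "0003": None, "0004": None})
```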

When the document classification information holds a user-entered classification type for at least one document (that is, when c ≥ 1), the browsing word extraction unit 101 creates, for each classification type that has a corresponding value in the document classification information, a list of words characteristic of that classification type from the documents associated with it. For each word, it calculates a score representing how characteristic the word is of the classification type.
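A scored word list per classification type could be built along the following lines. The passage above does not define the score, so the measure used here (the fraction of a type's documents that contain the word) is only a placeholder assumption, as is the whitespace tokenization; Japanese text would need a morphological analyzer instead.

```python
from collections import Counter, defaultdict


def scored_word_lists(documents, classification):
    """Create one scored word list per classification type in use.

    documents      -- dict: document ID -> text
    classification -- dict: document ID -> type value, or None if unclassified
    Returns dict: type value -> {word: score}.
    """
    texts_by_type = defaultdict(list)
    for doc_id, ctype in classification.items():
        if ctype is not None:
            texts_by_type[ctype].append(documents[doc_id])

    lists = {}
    for ctype, texts in texts_by_type.items():
        doc_freq = Counter()
        for text in texts:
            # Count each word once per document (assumed tokenization).
            doc_freq.update(set(text.lower().split()))
        # Placeholder score: share of this type's documents containing the word.
        lists[ctype] = {word: n / len(texts) for word, n in doc_freq.items()}
    return lists


documents = {"0001": "laser printer toner", "0002": "laser scanner", "0003": "ink"}
classification = {"0001": "B", "0002": "B", "0003": None}
word_lists = scored_word_lists(documents, classification)
```

Because lists are built only for types that actually occur in the classification information, a type with no classified documents gets no list at all.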

FIG. 4 shows an example of the scored word lists that the browsing word extraction unit 101 creates during its processing. When the "required document type" and the "unnecessary document type" are used, as in this specific example, a scored word list is created for each type. FIGS. 4(a) and 4(b) show the initial states of the scored word list for the "unnecessary document type" and for the "required document type", respectively. When the "unread document type" is used, a scored word list for the "unread document type" is also provided.

For each classification type, the words in its scored word list are selected as the words to be highlighted. When, for example, the number of words in the scored word list exceeds a predetermined number k, the k highest-scoring words in the list may be selected, only the words whose score is at or above a predetermined threshold may be used, or these methods may be combined; various other methods are also possible.
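The selection rules just mentioned (top k, a score threshold, or both) can be sketched as one function. The function name, the parameter names, and the alphabetical tie-breaking are assumptions made for the example.

```python
def select_candidates(scored, k=None, threshold=None):
    """Pick the words to highlight from one scored word list.

    scored    -- dict: word -> score
    k         -- if given, keep at most the k highest-scoring words
    threshold -- if given, keep only words with score >= threshold
    """
    # Highest score first; ties broken alphabetically so results are stable.
    items = sorted(scored.items(), key=lambda pair: (-pair[1], pair[0]))
    if threshold is not None:
        items = [(word, score) for word, score in items if score >= threshold]
    if k is not None:
        items = items[:k]
    return [word for word, _ in items]


scored = {"laser": 1.0, "toner": 0.5, "ink": 0.2}
top_two = select_candidates(scored, k=2)
above_half = select_candidates(scored, threshold=0.5)
```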

No scored word list is created for a classification type while no corresponding value exists in the document classification information. For example, in the specific example above, if only the value "B" exists in the document classification information of FIG. 3(a), or only the values "B" and "C" exist in that of FIG. 3(b), no scored word list is created for the "unnecessary document type" corresponding to the value "A".

Hereinafter, a word (or term or phrase) selected for highlighting for a classification type is called a "browsing word candidate".

For each document for which the user has not entered a classification type, the highlight location specifying unit 102 searches the document for occurrences of each "browsing word candidate" (some or all of the candidates may not occur in it) and selects the locations to highlight from among those occurrences. For example, if only one location per document is to be highlighted for a given "browsing word candidate" and that candidate occurs several times in a document, the unit selects which occurrence to highlight.
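Locating the occurrences to highlight can be sketched with plain substring search. The passage only says that one occurrence is chosen when a candidate appears several times; choosing the first occurrence, as done here, is an assumption, as are the function and parameter names.

```python
def highlight_spans(text, candidates, first_only=True):
    """Find where browsing word candidates occur in one document.

    Returns a sorted list of (start, end, word) tuples. With first_only=True
    only the first occurrence of each candidate is kept (assumed policy).
    """
    spans = []
    for word in candidates:
        if not word:
            continue  # skip empty strings, which would match everywhere
        start = text.find(word)
        while start != -1:
            spans.append((start, start + len(word), word))
            if first_only:
                break
            start = text.find(word, start + len(word))
    return sorted(spans)


text = "laser beam hits laser sensor"
one_each = highlight_spans(text, ["laser", "toner"])
every_hit = highlight_spans(text, ["laser", "toner"], first_only=False)
```

A candidate that does not occur in the document, such as "toner" here, simply contributes no span.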

After the user has entered a classification type for at least one document, the text display unit 103 highlights, in each document for which the user has not entered a classification type, those words that were extracted for some classification type by the browsing word extraction unit 101 and that lie at locations specified by the highlight location specifying unit 102.

When a "browsing word candidate" is highlighted, it may be shown in a highlight style that corresponds to its classification type. For example, when candidates are marked by changing the font, font attributes such as the character color may differ for each classification type; when candidates are marked by enclosing them in a frame, the frame's shape, line type, color, presence or absence of hatching inside the frame, type of hatching, and so on may differ for each classification type. These may be combined, and various other highlight styles are possible.

It is also possible, for example, to give the required document type the most conspicuous highlight style, the unnecessary document type the next most conspicuous, and the unread document type the next after that.
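Type-dependent highlight styles can be sketched with a plain-text renderer that wraps each highlighted span in a marker chosen by classification type. The marker strings and type labels below are purely illustrative assumptions; an actual display would use colors, fonts, or frames as described above.

```python
# Hypothetical markers standing in for visual styles, most conspicuous first.
MARKERS = {
    "required": ("[[", "]]"),
    "unnecessary": ("((", "))"),
    "unread": ("<<", ">>"),
}


def render(text, spans):
    """Return text with each span wrapped in its type's marker.

    spans -- list of (start, end, classification_type); must not overlap.
    """
    out, pos = [], 0
    for start, end, ctype in sorted(spans):
        open_mark, close_mark = MARKERS[ctype]
        out.append(text[pos:start])                            # unhighlighted run
        out.append(open_mark + text[start:end] + close_mark)   # highlighted word
        pos = end
    out.append(text[pos:])
    return "".join(out)


rendered = render("laser beam and ink", [(0, 5, "required"), (15, 18, "unnecessary")])
```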

When input from the user occurs at the user input unit 104, the classification information storage unit 106 is updated, and the series of processes of the browsing word extraction unit 101, the highlight location specifying unit 102, and the text display unit 103 is performed.

This series of processes may be triggered each time document classification information for one document is entered from the user input unit 104. Alternatively, entering document classification information alone may not trigger it, and the series of processes may instead be triggered when a predetermined instruction to execute it is entered (for example, from the user input unit 104).
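The two trigger policies can be sketched with a small session object. The class `BrowsingSession`, its method names, and the `refresh` callback (standing in for the extraction, location, and display steps) are hypothetical names introduced for this illustration.

```python
class BrowsingSession:
    """Holds classification input and decides when to rerun the pipeline."""

    def __init__(self, refresh, auto_refresh=True):
        self.classification = {}          # document ID -> classification type
        self._refresh = refresh           # callback running the processing series
        self.auto_refresh = auto_refresh  # True: rerun on every single input

    def classify(self, doc_id, ctype):
        """Record one {document, classification type} input from the user."""
        self.classification[doc_id] = ctype
        if self.auto_refresh:
            self._refresh(self.classification)

    def request_refresh(self):
        """Explicit instruction to run the processing series now."""
        self._refresh(self.classification)


calls = []
manual = BrowsingSession(lambda c: calls.append(dict(c)), auto_refresh=False)
manual.classify("0001", "B")   # stored, but no processing yet
manual.classify("0002", "A")
manual.request_refresh()       # processing runs once, with both inputs
```

Deferring the refresh lets a user classify several documents before the highlighting changes, while the automatic mode updates the display after every input.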

An overview of the overall operation of the present embodiment is now given with reference to the system screen example of FIG. 5. In FIG. 5, each document is the abstract of a patent specification.

In the following, the required document type is also denoted by "○" and the unnecessary document type by "×".

First, the text display unit 103 displays each document (see 122 in the figure) inside the system screen (121) of FIG. 5. In the initial state the user has not yet entered any classification type, so, unlike in FIG. 5, no "○" or "×" is shown and no highlighting is performed.

In the specific example of FIG. 5, the documents are displayed in five rows and two columns inside the system screen (121).

Next, the user freely reads the documents displayed in the system screen (121) of FIG. 5.

In the five-row, two-column display of FIG. 5, when there are 11 or more documents, the user may bring up other documents, for example by scrolling or by turning the page. The display of documents is, of course, not limited to five rows and two columns.

The user then enters document classification information {document, classification type} by selecting a document to classify and selecting the classification type the user has judged appropriate for it. That is, the system accepts, at the user input unit 104, the {document, classification type} input selected by the user.

For example, if the user reads the document indicated by 124 and judges it unnecessary (for example, judges that the full text of the patent specification corresponding to this abstract need not be read), the user selects the unnecessary document type symbol "×" for this document data (124).

Similarly, if the user reads the document indicated by 125 and judges it necessary (for example, judges that the full text of the patent specification corresponding to this abstract needs to be read), the user selects the required document type symbol "○" for this document data (125).

なお、ユーザは、それら以外の文書には分類を付与していないとする。 It is assumed that the user has not assigned a classification to other documents.

この場合に、図５に示されるように、分類が付与された文書１２４，１２５についてそれぞれ付与された分類タイプを示す「×」「○」が表示されても良い。もちろん、他の付与された分類タイプを識別可能にしても良い。 In this case, as shown in FIG. 5, “x” and “◯” may be displayed indicating the classification type assigned to each of the documents 124 and 125 to which the classification is assigned. Of course, other assigned classification types may be identifiable.

さて、文書データ１２４，１２５に対して分類タイプが付与されたときに、システム内では、文書データ１２４の文書ＩＤ及び付与された分類タイプを示す文書分類情報と、文書データ１２５の文書ＩＤ及び付与された分類タイプを示す文書分類情報を受け取る。 When classification types are assigned to the document data 124 and 125, the system receives document classification information indicating the document ID of the document data 124 and its assigned classification type, and document classification information indicating the document ID of the document data 125 and its assigned classification type.

そして、上記二つの文書分類情報をもとに、拾い読み単語抽出部１０１の処理、ハイライト箇所特定部１０２の処理、テキスト表示部１０３の処理からなる一連の処理を行って、図５に例示されるように、文書データ１２４，１２５以外の文書に対して、拾い読み単語をハイライト表示する。 Then, based on these two pieces of document classification information, a series of processes consisting of the process of the browsing word extraction unit 101, the process of the highlight location specifying unit 102, and the process of the text display unit 103 is performed, and the browsing words are highlighted in the documents other than the document data 124 and 125, as illustrated in FIG. 5.

なお、拾い読み単語抽出部１０１では、分類タイプごとにスコアを計算する。 The browsing word extraction unit 101 calculates a score for each classification type.

例えば、図５中のタイプ分け凡例（１２３）のように、必要文書タイプ・不要文書タイプ・未読文書タイプの３分類でそれぞれスコアの高い語を用意することで、タイプごとに単語のハイライト方法を変えることができる。 For example, as in the type legend (123) in FIG. 5, by preparing high-scoring words for each of the three classifications (necessary document type, unnecessary document type, and unread document type), the word highlighting method can be varied by type.

異なるハイライト方法を適用する例として、例えば、前述のようにハイライト色を変更しても良い。例えば、必要文書タイプをピンク、不要文書タイプを黄色、未読文書タイプを緑で各タイプの拾い読み単語をハイライトしても良い。 As an example of applying different highlighting methods, the highlight color may be changed as described above. For instance, the browsing words of each type may be highlighted in pink for the necessary document type, yellow for the unnecessary document type, and green for the unread document type.

図５では、タイプごとに単語のハイライト方法を変える様子を例示するために、必要文書タイプでハイライトする単語の部分についてはクロスハッチング枠で、不要文書タイプについては斜線ハッチング枠で、未読文書タイプについてはハッチングなしの枠で、それぞれハイライトを行う例を示した。 To illustrate how the highlighting method varies by type, FIG. 5 shows an example in which words highlighted for the necessary document type are marked with a cross-hatched frame, those for the unnecessary document type with a diagonally hatched frame, and those for the unread document type with an unhatched frame.

図５に例示するような表示状態において、ユーザは、単語にハイライトが付加された文書群を閲覧しながら、ハイライトされた単語を中心に拾い読みすることができ、更に、未分類の文書へ分類を付与していくことができる。その際、例えば、必要に応じてハイライトされた単語の周辺単語も合わせて読むこともできる。 In the display state illustrated in FIG. 5, the user can view the group of documents with highlighted words, skim them with a focus on the highlighted words, and go on assigning classifications to unclassified documents. In doing so, the user can also read the words surrounding a highlighted word as needed.

図６に、本実施形態の拾い読み支援システムの処理手順の一例を示す。 FIG. 6 shows an example of the processing procedure of the browsing support system of this embodiment.

ステップＳ１において、テキスト表示部１０３は、初期的に文書を表示する。 In step S1, the text display unit 103 initially displays a document.

図７に、図２に例示した文書を表示した例を示す。 FIG. 7 shows an example in which the document illustrated in FIG. 2 is displayed.

なお、この初期の段階では、分類情報記憶部１０６に記憶される文書分類情報は、図３（ａ）又は図３（ｂ）に例示したようになる。また、スコア付単語リストは、図４に例示したようになる。 At this initial stage, the document classification information stored in the classification information storage unit 106 is as illustrated in FIG. 3A or FIG. 3B. The scored word list is as illustrated in FIG. 4.

ステップＳ２において、ユーザ入力部１０４は、ユーザから文書分類情報｛文書ＩＤ，分類タイプ｝の入力を受け付ける。 In step S2, the user input unit 104 receives input of document classification information {document ID, classification type} from the user.

ステップＳ３において、入力された上記の文書分類情報｛文書ＩＤ，分類タイプ｝を、分類情報記憶部１０６に記録する。 In step S3, the input document classification information {document ID, classification type} is recorded in the classification information storage unit 106.

ここでは、図２の文書ＩＤ＝０００１〜０００３の各文書に対して、それぞれ、「不要文書タイプ」「不要文書タイプ」「必要文書タイプ」がユーザにより入力されているものとすると、分類情報記憶部１０６に記憶された文書分類情報例は、図３（ａ）については図８（ａ）に例示するようになり、図３（ｂ）については図８（ｂ）に例示するようになる。 Here, assuming that the user has entered “unnecessary document type”, “unnecessary document type”, and “necessary document type” for the documents with document IDs 0001 to 0003 in FIG. 2, respectively, the document classification information stored in the classification information storage unit 106 becomes as illustrated in FIG. 8A for the case of FIG. 3A, and as illustrated in FIG. 8B for the case of FIG. 3B.

なお、ステップＳ４において、終了条件が成立したならば、処理を終了し、終了条件が成立していないならば、次のステップＳ５に進む。 In step S4, if the end condition is satisfied, the process is ended. If the end condition is not satisfied, the process proceeds to the next step S5.

終了条件には、種々のものが考えられる。例えば、文書データ記憶部１０５に記憶されている文書の文書数がＮである場合に、Ｎ個の文書すべてについて上記の文書分類情報｛文書ＩＤ，分類タイプ｝がユーザにより入力されたことを終了条件としても良いし、あるいは、上記の文書分類情報｛文書ＩＤ，分類タイプ｝がユーザにより入力された文書の数をｃとして、（Ｎ−ｃ）の値（すなわち、まだユーザにより分類タイプが付与されていない文書の数）が、予め定められた閾値を下回ったことを終了条件としても良い。もちろん、これらに制限されない。 Various end conditions are possible. For example, when the number of documents stored in the document data storage unit 105 is N, the end condition may be that the user has entered the document classification information {document ID, classification type} for all N documents. Alternatively, letting c be the number of documents for which the user has entered the document classification information {document ID, classification type}, the end condition may be that the value of (N − c), that is, the number of documents to which the user has not yet assigned a classification type, has fallen below a predetermined threshold. Of course, the end condition is not limited to these.
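The two end conditions described above can be sketched in a few lines of Python. This is only an illustration: the function name `end_condition_met` and the parameter `threshold` are assumptions introduced here, not names from the specification. With `threshold=1` the check reduces to "all N documents have been classified".

```python
def end_condition_met(total_docs: int, classified_docs: int, threshold: int = 1) -> bool:
    """Return True when the number of unclassified documents (N - c)
    has fallen below `threshold` (hypothetical parameter name)."""
    remaining = total_docs - classified_docs  # (N - c)
    return remaining < threshold


# Example: 10 documents, 8 already classified, stop when fewer than 3 remain.
print(end_condition_met(10, 8, threshold=3))
```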

さて、ステップＳ４において、終了条件が成立していないならば、以下の一連の処理が行われる。 In step S4, if the end condition is not satisfied, the following series of processing is performed.

ステップＳ５において、拾い読み単語抽出部１０１の処理を行って、スコア付単語リストを作成する。 In step S5, the browsing word extraction unit 101 performs processing to create a scored word list.

例えば、「不要文書タイプ」用のスコア付単語リストが図９（ａ）に例示するようになり、「必要文書タイプ」用のスコア付単語リストが図９（ｂ）に例示するようになる。なお、ａ１，ａ２，ｂ１，ｂ２はそれぞれのスコア値を示している。また、各スコア付単語リストは、スコア順にソートされても良い。 For example, a scored word list for “unnecessary document type” is illustrated in FIG. 9A, and a scored word list for “necessary document type” is illustrated in FIG. 9B. Here, a1, a2, b1, and b2 indicate the respective score values. Each scored word list may be sorted in the order of score.

ステップＳ６において、ハイライト箇所特定部１０２の処理を行って、各文書中のハイライト箇所を特定する。 In step S6, the highlight part specifying unit 102 performs processing to specify a highlight part in each document.

ステップＳ７において、テキスト表示部１０３の処理を行って、テキスト表示（のハイライト状態）を更新する（例えば後で説明する図１３参照）。 In step S7, the text display unit 103 is processed to update the text display (highlighted state thereof) (see, for example, FIG. 13 described later).

以下、拾い読み単語抽出部１０１、ハイライト箇所特定部１０２、テキスト表示部１０３の各処理について順番に詳しく説明する。 The processes of the browsing word extraction unit 101, the highlight location specifying unit 102, and the text display unit 103 are described in detail below, in that order.

まず、拾い読み単語抽出部１０１について説明する。 First, the browsing word extraction unit 101 will be described.

拾い読み単語抽出部１０１は、文書データと既知の分類とから、拾い読みに適した単語を抽出するモジュールである。 The browsing word extraction unit 101 is a module that extracts words suitable for browsing from document data and known classifications.

図１０に、拾い読み単語抽出部１０１の処理手順の一例を示す。 FIG. 10 shows an example of the processing procedure of the browsing word extraction unit 101.

ステップＳ１１において、拾い読み単語抽出部１０１は、文書データ記憶部１０５及び分類情報記憶部１０６から、文書データとそれに対してユーザにより付与された分類を読み込む。 In step S11, the browsing word extraction unit 101 reads the document data and the classifications assigned to it by the user from the document data storage unit 105 and the classification information storage unit 106.

ステップＳ１２において、各文書の分類及び各文書の単語から、拾い読み単語としてのスコアを計算する。 In step S12, a score as a browsing word is calculated from the classification of each document and the word of each document.

ステップＳ１３において、スコア順に単語をソートし、上位の単語を拾い読み単語候補とし、ステップＳ１４において、拾い読み単語候補とそのスコアを出力する。 In step S13, the words are sorted by score and the top-ranked words are taken as browsing word candidates. In step S14, the browsing word candidates and their scores are output.
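Steps S13 and S14 amount to a sort followed by a cut-off. A minimal sketch, assuming the scores from step S12 are already available as a word-to-score mapping; the function name `top_candidates`, the cut-off `k`, and the tie-break by word are illustrative choices, not taken from the specification.

```python
def top_candidates(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Sort words by descending score (ties broken by the word itself)
    and keep the top k as browsing word candidates."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores for three nouns.
print(top_candidates({"操作": 0.9, "音声": 0.7, "出力": 0.2}, k=2))
```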

拾い読み単語のスコア計算として、例えば、次のような式で計算を行っても良い。ここでは、『ある文書Ｄ内に出現する単語ｔを見た場合、この文書Ｄが文書タイプＣであるとすぐに判断できるかどうか』、というスコアとして速判度（ｔ，Ｃ）を導入する。 The score of a browsing word may be calculated, for example, with the following formula. Here, a quick-judgment score (t, C) is introduced as a score expressing “when the word t appearing in a document D is seen, can it immediately be determined that D is of document type C?”
速判度（ｔ，Ｃ）＝読みコスト（ｔ）×判別度（ｔ，Ｃ） …（１） Quick-judgment score (t, C) = reading cost (t) × discrimination degree (t, C) … (1)
ここで、読みコストとは、人が単語を認識するのにかかるコストを指し、文字長や文字の複雑さなどに依存する。例えば、文字数の逆数（例：１／（文字数））などとする。また、ひらがな・カタカナなどの文字種は、漢字に比べて画数が少なく目につきやすいことを考慮して、スコアを上げるなどの工夫が考えられる（例：ｋ／（文字数））。 Here, the reading cost refers to the cost for a person to recognize a word, and depends on factors such as the length of the word and the complexity of its characters. For example, the reciprocal of the number of characters (e.g., 1 / (number of characters)) is used. In addition, considering that character types such as hiragana and katakana have fewer strokes than kanji and catch the eye more easily, the score may be raised for such words (e.g., k / (number of characters)).

判別度とは、ある文書タイプＣの判別に使える単語であるかの度合いを指す。ある単語が、文書タイプＣらしい単語あるいは文書タイプＣらしくない単語であれば、判別度は高く、逆に、文書タイプＣなのかがわかりにくい単語であれば、判別度は低い。例えば、ｔｆ（ｔ）＊ｌｏｇ（ｄｆ（ｔ｜Ｃ）／ｄｆ（ｔ））＊ｓｃｏｒｅ＿ｐｏｓ（ｔ）といった式で計算する。ここで、ｔｆ（ｔ）は、単語ｔの文書D内での単語頻度、ｄｆ（ｔ｜Ｃ）は、文書タイプＣに分類された文書での、単語ｔの出る文書数、ｄｆ（ｔ）は、単語ｔの出る文書数、ｓｃｏｒｅ＿ｐｏｓ（ｔ）は、単語ｔの品詞のスコアで例えば単語tが名詞のときの名詞スコアを指す。このとき、名詞のスコアを高く設定するなどして、特定の品詞の単語のスコアが高くなるようにもできる。 The discrimination degree indicates how useful a word is for discriminating a certain document type C. If a word is characteristic of document type C, or characteristic of not being document type C, its discrimination degree is high; conversely, if the word gives little indication of whether a document is of type C, its discrimination degree is low. For example, it is calculated with an expression such as tf(t) * log(df(t|C) / df(t)) * score_pos(t). Here, tf(t) is the frequency of the word t in the document D, df(t|C) is the number of documents classified as document type C in which t appears, df(t) is the number of documents in which t appears, and score_pos(t) is the score for the part of speech of t (for example, the noun score when t is a noun). The scores of words with a specific part of speech can be raised by, for example, setting a high noun score.
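The following Python sketch transcribes equation (1) and the example expressions for the reading cost and the discrimination degree literally. The function names, the kana boost value `k = 2.0`, and the use of Unicode ranges to detect hiragana and katakana are assumptions made for illustration; the specification gives only the formulas.

```python
import math


def is_kana(word: str) -> bool:
    """True if every character is hiragana or katakana (U+3040 to U+30FF)."""
    return bool(word) and all("\u3040" <= ch <= "\u30ff" for ch in word)


def reading_cost(word: str, kana_boost: float = 2.0) -> float:
    """k / (number of characters); k is raised for all-kana words (assumed value)."""
    k = kana_boost if is_kana(word) else 1.0
    return k / len(word)


def discrimination(tf: float, df_in_class: int, df_total: int, pos_score: float = 1.0) -> float:
    """tf(t) * log(df(t|C) / df(t)) * score_pos(t), as written in the text.
    Since df(t|C) <= df(t), the logarithm is never positive, so values
    closer to zero rank higher under this literal form."""
    if not 0 < df_in_class <= df_total:
        raise ValueError("requires 0 < df(t|C) <= df(t)")
    return tf * math.log(df_in_class / df_total) * pos_score


def quick_judgment(word: str, tf: float, df_in_class: int, df_total: int,
                   pos_score: float = 1.0) -> float:
    """Equation (1): reading cost x discrimination degree."""
    return reading_cost(word) * discrimination(tf, df_in_class, df_total, pos_score)
```

Setting `pos_score` to 1 for nouns and 0 for all other parts of speech restricts the candidates to nouns.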

次に、ハイライト箇所特定部１０２について説明する。 Next, the highlight location specifying unit 102 will be described.

ハイライト箇所特定部１０２は、各々の文書について、当該文書中で拾い読み単語抽出部１０１により選択された拾い読み単語候補が出現する箇所を探し、当該文書中でハイライトすべき箇所を選択するモジュールである。 The highlight location specifying unit 102 is a module that searches each document for a location where a browsing word candidate selected by the browsing word extraction unit 101 appears in the document and selects a location to be highlighted in the document. is there.

図１１に、ハイライト箇所特定部１０２の処理手順の一例を示す。 FIG. 11 shows an example of the processing procedure of the highlight location specifying unit 102.

ここで、Ｌはハイライト語数の上限、Ｎは文書数、Ｍはハイライト単語候補の総数、Ｘは文書ｉ中のハイライト単語として記憶している単語数である。 Here, L is the upper limit of the number of highlight words, N is the number of documents, M is the total number of highlight word candidates, and X is the number of words stored as highlight words in document i.

まず、ハイライトする箇所が多いと拾い読みにならないため、ハイライト語数の上限Ｌを設定する。 First, because highlighting too many locations would defeat the purpose of browsing, an upper limit L on the number of highlighted words is set.

ステップＳ２１において、ハイライト箇所特定部１０２は、文書データ記憶部１０５から文書データを読み込むとともに、拾い読み単語抽出部１０１の出力する単語リストとスコアを読み込む。 In step S21, the highlight location specifying unit 102 reads the document data from the document data storage unit 105, and also reads the word list and scores output by the browsing word extraction unit 101.

ステップＳ２２において、ｉに１を代入する。 In step S22, 1 is substituted for i.

ステップＳ２３〜Ｓ２５では、各文書について、各拾い読み単語候補が出現しているかを確認する。 In steps S23 to S25, it is confirmed whether each browsing word candidate appears for each document.

ただし、文書データ記憶部１０５の総文書数をＮ、拾い読み単語候補の総数をＭとする。 However, the total number of documents in the document data storage unit 105 is N, and the total number of read word candidates is M.

文書ｉで、単語ｊがｐ回出現した場合、ステップＳ２７において、ハイライトする単語位置を、ｐ個の単語ｊの中から選択する。例えば、初出の箇所を選択する、近傍単語重要度で選択する、複合語に含まれない単語を選択するなどである。ここで、近傍単語重要度とは、単語ｔの近傍に出現する語の重要度を指す。ハイライト箇所には目が行くため、その近傍にも重要語がある方をよりスコアを高くするものである。なお、重要語の計算には、一般的にキーワード抽出として知られる計算式などを使って計算できる。例えば、近傍単語重要度（ｔｉ）を、その周辺の語ｔｊの重要度とｔｉまでの距離を使いスコア付けし、Σｊ｛ｔｆ（ｔｊ）＊ｌｏｇ（Ｎ／ｄｆ（ｔｊ））＊（１／｜ｊ−ｉ｜）｝といった式で表す。ここで、Ｎは、全文書数を表し、ｊ−ｉは単語ｔｉとｔｊの距離を表す。また、複合語を構成する１単語となっている場合には、１単語だけでは意味をなさない可能性があるため、スコアを低くするなどでもよい。 When the word j appears p times in the document i, the word position to highlight is selected from among the p occurrences of the word j in step S27. For example, the first occurrence may be selected, the selection may be made by neighboring word importance, or an occurrence that is not part of a compound word may be selected. Here, the neighboring word importance refers to the importance of the words appearing near the word t. Because a highlighted location draws the eye, an occurrence that also has important words nearby is given a higher score. The importance of a word can be calculated using a formula generally known for keyword extraction. For example, the neighboring word importance of ti is scored using the importance of each surrounding word tj and its distance from ti, and is expressed as Σj {tf(tj) * log(N / df(tj)) * (1 / |j − i|)}. Here, N is the total number of documents, and j − i represents the distance between the words ti and tj. If an occurrence is one component word of a compound word, its score may be lowered, because the single word on its own may not be meaningful.
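The neighboring word importance formula above can be sketched as follows. The text does not say how far "nearby" extends, so the `window` parameter is an assumption, as are the function names and the representation of a document as a token list with precomputed `tf` and `df` tables.

```python
import math


def neighbor_importance(tokens, i, tf, df, n_docs, window=3):
    """Sum over nearby positions j != i of
    tf(t_j) * log(N / df(t_j)) * (1 / |j - i|).
    `window` (how many positions on each side count as nearby) is assumed."""
    total = 0.0
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j == i:
            continue
        word = tokens[j]
        if df.get(word, 0) == 0:
            continue  # words without document-frequency data contribute nothing
        total += tf.get(word, 0) * math.log(n_docs / df[word]) / abs(j - i)
    return total


def best_occurrence(tokens, word, tf, df, n_docs, window=3):
    """Among all positions of `word`, pick the one with the most important
    neighbors (the first such position wins ties)."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return max(positions, key=lambda i: neighbor_importance(tokens, i, tf, df, n_docs, window))
```

An occurrence surrounded by rare, frequently used words scores higher than one surrounded by words that appear in every document, whose log term is zero.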

ステップＳ２８では、文書ｉに対しハイライトする単語とその出現位置を記憶し、ハイライト単語数Ｘがハイライト語数の上限Ｌ以下である間同様の操作を繰り返す。 In step S28, the word to be highlighted with respect to the document i and its appearance position are stored, and the same operation is repeated while the number of highlighted words X is not more than the upper limit L of the number of highlighted words.

ステップＳ２５において条件一致がＮｏになった場合、ステップＳ２３へ戻り、次の文書ｉ＋１に進み、ステップＳ２４〜Ｓ２８について同様の作業を繰り返す。 If the condition match is No in step S25, the process returns to step S23, proceeds to the next document i + 1, and repeats the same operation for steps S24 to S28.

ステップＳ２１で読み込んだ文書すべてにハイライト箇所特定が終わったならば、ステップＳ２９で各文書のハイライト単語とその位置を出力する。 When the highlight location has been specified for all the documents read in step S21, the highlighted word and its position of each document are output in step S29.
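The per-document loop of steps S23 to S28 can be sketched as below. This is a simplified illustration: it takes candidates already sorted by score, uses the first occurrence as the highlight position (one of the options the text mentions), and stops once L words are stored, which is one reading of the upper-limit condition. The function name `select_highlights` is hypothetical.

```python
def select_highlights(tokens, candidates, limit):
    """For one document, walk the candidate words in score order and record
    (word, position of first occurrence) until `limit` words are stored."""
    highlights = []
    for word in candidates:
        if len(highlights) >= limit:
            break
        if word in tokens:
            highlights.append((word, tokens.index(word)))
    return highlights


# One document as a token list, three candidates, at most two highlights.
doc = ["音声", "出力", "操作", "音声"]
print(select_highlights(doc, ["操作", "音声", "出力"], limit=2))
```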

次に、テキスト表示部１０３について説明する。 Next, the text display unit 103 will be described.

テキスト表示部１０３は、文書データ記憶部１０５の各文書データに対し、ハイライト箇所特定部１０２で特定したハイライト単語位置をハイライトして表示するモジュールである。 The text display unit 103 is a module that highlights and displays the highlighted word position specified by the highlighted part specifying unit 102 for each document data in the document data storage unit 105.

図１２に、テキスト表示部１０３の処理手順の一例を示す。 FIG. 12 shows an example of the processing procedure of the text display unit 103.

ステップＳ３１において、テキスト表示部１０３は、文書データ記憶部１０５の文書データを読み込むとともに、ハイライト箇所特定部１０２で出力するハイライトの単語とその位置を読み込む。 In step S31, the text display unit 103 reads the document data from the document data storage unit 105, and also reads the highlighted words and their positions output by the highlight location specifying unit 102.

ステップＳ３２において、文書データ中のハイライト箇所にハイライトを施して、各文書データを出力する。 In step S32, the highlighted portion in the document data is highlighted and each document data is output.
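As a rough illustration of step S32, the sketch below wraps each highlighted span in a pair of marker strings chosen by document type, standing in for the colors or hatched frames of the figures. The marker strings, the span format (start, end, type), and the function name are all assumptions; spans are assumed not to overlap.

```python
def render_highlights(text, spans, markers):
    """Wrap each (start, end, doc_type) span of `text` in the marker pair
    registered for its type. Spans are applied right to left so that
    earlier offsets stay valid while markers are inserted."""
    out = text
    for start, end, doc_type in sorted(spans, reverse=True):
        left, right = markers[doc_type]
        out = out[:start] + left + out[start:end] + right + out[end:]
    return out


# Hypothetical plain-text markers for two classification types.
markers = {"necessary": ("<<", ">>"), "unnecessary": ("[[", "]]")}
print(render_highlights("voice output device",
                        [(0, 5, "necessary"), (13, 19, "unnecessary")],
                        markers))
```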

図５は、各文書データのハイライトの一例である。 FIG. 5 is an example of highlighting of each document data.

また、図１３に、各文書データの他のハイライト例を示す。図１３は、図７に例示された文書群のうち、最初の３文書に分類が付与された際の他の文書の表示例である。また、図５と同様、必要文書タイプはクロスハッチング枠で、不要文書タイプは斜線ハッチング枠で、未読文書タイプはハッチングなしの枠でそれぞれハイライトする例を示した。 FIG. 13 shows another example of highlighting in each document data. FIG. 13 is a display example of the other documents after classifications have been assigned to the first three documents in the document group illustrated in FIG. 7. As in FIG. 5, the necessary document type is highlighted with a cross-hatched frame, the unnecessary document type with a diagonally hatched frame, and the unread document type with an unhatched frame.

さて、上記のようなハイライト表示された後に、ユーザは更に未分類の文書に対して、分類を付与することができる。 Now, after the highlight display as described above, the user can further assign a classification to an unclassified document.

ユーザ入力部１０４は、テキスト表示部１０３で表示した各文書に対し、ユーザが付与する分類を受け付けるモジュールである。図５の各文書に対し、ユーザは分類を付与する（例：「○」「×」）。このとき、ユーザ入力部１０４がこの分類情報を受け取る。ユーザ入力部１０４がユーザの付与した分類を受け取ると、続けて１０１〜１０３の処理を行い、ハイライト単語の更新を行う。 The user input unit 104 is a module that receives a classification given by the user to each document displayed on the text display unit 103. The user assigns a classification to each document in FIG. 5 (example: “◯” “×”). At this time, the user input unit 104 receives this classification information. When the user input unit 104 receives the classification assigned by the user, the processing of 101 to 103 is subsequently performed to update the highlight word.

次に、処理の具体的な例を示す。 Next, a specific example of processing will be shown.

図１３のような６つの文書１２０１〜１２０６が存在した場合で、最初の３つの文書１２０１〜１２０３に分類が付与された場合で考える。分類情報記憶部１０６には、文書１２０１に「×」、文書１２０２に「×」、文書１２０３に「○」が与えられている。 Consider a case in which there are six documents 1201 to 1206 as shown in FIG. 13 and classification is given to the first three documents 1201 to 1203. In the classification information storage unit 106, “x” is given to the document 1201, “x” is given to the document 1202, and “◯” is given to the document 1203.

この３文書と分類を元に、拾い読み単語抽出部１０１で、単語抽出を行う。式（１）のｓｃｏｒｅ＿ｐｏｓにおいて、名詞について１、それ以外について０である場合に、各文書から名詞を抽出する。文書１２０１から、「文書」「音声」「指示」「入力」「頻度」「状況」「合成」「売上げ」「スタイル」「出力」「装置」といった名詞を抽出する。同様に、文書１２０２、文書１２０３からも名詞を抽出する。そして、抽出された各々の名詞について、式（１）の計算をする。この例では文書数が少なく単語間でスコアがほとんど変わらないため、各名詞が出現する文書の分類に対応する拾い読み単語候補となる。文書１２０４で上記３文書に出てきた名詞は、「操作」「音声」「出力」である。このうち、「出力」は文書１２０１中で複合語でしか出てこないため、スコアを下げ、「操作」と「音声」をハイライトするとする。 Based on these three documents and their classifications, the browsing word extraction unit 101 extracts words. When score_pos in equation (1) is 1 for nouns and 0 for everything else, nouns are extracted from each document. From the document 1201, nouns such as “document”, “voice”, “instruction”, “input”, “frequency”, “situation”, “synthesis”, “sales”, “style”, “output”, and “device” are extracted. Nouns are likewise extracted from the documents 1202 and 1203. Equation (1) is then calculated for each extracted noun. In this example, because the number of documents is small and the scores hardly differ between words, each noun becomes a browsing word candidate for the classification of the document in which it appears. The nouns in the document 1204 that also appeared in the three documents above are “operation”, “voice”, and “output”. Of these, “output” appears in the document 1201 only as part of a compound word, so its score is lowered, and “operation” and “voice” are highlighted.

続いて、ハイライト箇所特定部で、文書１２０４中でハイライトする単語を選ぶ。「操作」は一回しか出現しないため、そのまま選択する。また、「音声」は、「音声出力処理方式」と「音声出力」の２箇所に出現する。この際には、例えば、いずれも複合語なので、より短い「音声出力」の方の「音声」をハイライトすることにし、２回目の「音声」をハイライト箇所とする。 Subsequently, a highlighted word in the document 1204 is selected by the highlight location specifying unit. Since “operation” appears only once, it is selected as it is. In addition, “voice” appears in two places, “voice output processing method” and “voice output”. In this case, for example, since both are compound words, the shorter “speech output” “speech” is highlighted, and the second “speech” is taken as the highlighted part.

なお、上記では、具体例として日本語を用いて説明したが、本実施形態は他の言語の場合にも同様に適用可能である。 In the above description, Japanese has been described as a specific example. However, the present embodiment can be similarly applied to other languages.

本実施形態によれば、例えば特許調査・文献調査・市場調査などのように多数の文書を漏れなく網羅的に内容を調査したいようなケースにおいて、ユーザが多数の文書（例えばドキュメント本文又はその要約文）のうちの幾つかの文書を読みながら、例えば読んで確認すべき或いは精読すべき文書とそうでない文書等の分類や、分担して読む担当を分けるための分類などを付与し、ユーザによる分類が付与された文書をもとに、例えば未分類の文書から分類判断のための拾い読みに適した単語や、未分類の文書を弁別する根拠となりそうなキーワードなどを抽出し、適切な箇所にハイライト表示やマーカ付与を行って提示することによって、ユーザ自身が行っている拾い読みを効率的に行えるように支援することができる。 According to the present embodiment, in cases where the contents of a large number of documents are to be examined exhaustively and without omission, such as patent searches, literature surveys, and market research, the user reads some of the documents (for example, document bodies or their abstracts) and assigns classifications to them. Such classifications may, for example, distinguish documents that should be checked or read closely from those that need not be, or divide the documents among readers who share the work. Based on the documents the user has classified, the system extracts, for example, words from the unclassified documents that are suited to browsing for a classification decision, or keywords likely to serve as grounds for discriminating the unclassified documents, and presents them with highlighting or markers at appropriate locations. This helps the user carry out his or her own browsing efficiently.

（第２の実施形態） (Second Embodiment)
第２の実施形態では、第１の言語で既に分類が付与されているときに、その分類を利用して、第１の言語とは異なる第２の言語の文書で拾い読み用のキーワードをハイライトする例を示す。 The second embodiment shows an example in which, when classifications have already been assigned to documents in a first language, those classifications are used to highlight keywords for browsing in documents in a second language different from the first language.

例えば、第１の言語をユーザの母国語とし、第２の言語を外国語としても良いし、逆に、第２の言語をユーザの母国語とし、第１の言語を外国語としても良い。 For example, the first language may be the user's native language, the second language may be the foreign language, and conversely, the second language may be the user's native language and the first language may be the foreign language.

ここでは、第１の言語を英語、第２の言語を日本語とする場合を例に取りつつ説明する。 Here, the case where the first language is English and the second language is Japanese will be described as an example.

図１４に、第２の実施形態に係る支援システムの機能構成例を示す。 FIG. 14 shows a functional configuration example of the support system according to the second embodiment.

図１４に示されるように、拾い読み支援システムの構成は、拾い読み単語抽出部１０１、置換部２０１、二言語間辞書２０２、ハイライト箇所特定部１０２、テキスト表示部１０３、ユーザ入力部１０４、第１言語文書データ記憶部２０３、第１言語分類情報記憶部２０４、第２言語文書データ記憶部２０５、第２言語分類情報記憶部２０６を備える。 As shown in FIG. 14, the browsing support system includes a browsing word extraction unit 101, a replacement unit 201, a bilingual dictionary 202, a highlight location specifying unit 102, a text display unit 103, a user input unit 104, a first A language document data storage unit 203, a first language classification information storage unit 204, a second language document data storage unit 205, and a second language classification information storage unit 206 are provided.

第１言語文書データ記憶部２０３には、第１言語での文書データが保存されている。また、各文書に対応した分類がすでに付与されており、その分類情報が第１言語分類情報記憶部２０４に保存されている。 The first language document data storage unit 203 stores document data in the first language. In addition, a classification corresponding to each document is already assigned, and the classification information is stored in the first language classification information storage unit 204.

拾い読み単語抽出部１０１は、基本的には第１の実施形態と同様にして、第１言語文書データ記憶部２０３及び第１言語分類情報記憶部２０４の情報を読み込み、拾い読み単語抽出を行う。第１言語分類情報記憶部２０４の内容は変化しないので、ここでの抽出は、１回のみ行えば良い。 The browsing word extraction unit 101 basically reads the information in the first language document data storage unit 203 and the first language classification information storage unit 204, and performs browsing word extraction in the same manner as in the first embodiment. Since the contents of the first language classification information storage unit 204 do not change, the extraction here needs to be performed only once.

一方、第２言語文書データ記憶部２０５は、第１の実施形態の文書データ記憶部１０５に対応し、第２言語分類情報記憶部２０６は、第１の実施形態の分類情報記憶部１０６に対応する。拾い読み単語抽出部１０１は、第２言語文書データ記憶部２０５及び第２言語分類情報記憶部２０６について、第１の実施形態と同様の処理を繰り返し行うことになる。 On the other hand, the second language document data storage unit 205 corresponds to the document data storage unit 105 of the first embodiment, and the second language classification information storage unit 206 corresponds to the classification information storage unit 106 of the first embodiment. To do. The browsing word extraction unit 101 repeatedly performs the same processing as in the first embodiment for the second language document data storage unit 205 and the second language classification information storage unit 206.

拾い読み単語抽出部１０１で出力された拾い読み単語候補とスコアは、置換部２０１に入力される。 The browsing word candidate and the score output by the browsing word extraction unit 101 are input to the replacement unit 201.

置換部２０１は、第１言語文書データ記憶部２０３及び第１言語分類情報記憶部２０４を対象として拾い読み単語抽出部１０１により抽出された第１言語による単語を、（文書データ上の第２言語の単語との対応を付けるために）第２言語の単語に置き換えるためのモジュールである。 The replacement unit 201 is a module that replaces the first-language words extracted by the browsing word extraction unit 101 from the first language document data storage unit 203 and the first language classification information storage unit 204 with second-language words (in order to match them against the second-language words in the document data).

置換部２０１では、第１言語で記載された拾い読み単語候補から、二言語間辞書２０２を用いて第２言語への翻訳単語を検索し、第２言語の翻訳語を作成する。このとき、第２言語の翻訳語が複数ある単語の場合や、第２言語の翻訳語になる第１言語の単語が他にもある場合には、第１言語から第２言語へ翻訳すると曖昧性が生じている可能性があるため、このような場合にはこの単語の拾い読み単語スコアを下げるなどして、あいまい性のない他の単語を優先する。 The replacement unit 201 searches for a translation word into the second language using the bilingual dictionary 202 from the browsing word candidates described in the first language, and creates a translation word in the second language. At this time, when there are a plurality of translated words in the second language, or when there are other words in the first language that become translated words in the second language, it is ambiguous if translated from the first language to the second language. In such a case, priority is given to other words having no ambiguity by, for example, lowering the reading word score of this word.
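A minimal sketch of the replacement step with the ambiguity penalty described above. The dictionary format (a mapping from a first-language word to a list of second-language translations), the fixed subtractive `penalty`, and the function name are assumptions; the specification says only that the score of an ambiguous word is lowered.

```python
from collections import defaultdict


def translate_candidates(candidates, dictionary, penalty=1.0):
    """Replace each (first-language word, score) with its second-language
    translations. A translation counts as ambiguous when the source word has
    several translations, or when another source word maps to the same
    translation; ambiguous entries have `penalty` subtracted from their score."""
    reverse = defaultdict(set)
    for src, targets in dictionary.items():
        for tgt in targets:
            reverse[tgt].add(src)

    result = []
    for word, score in candidates:
        targets = dictionary.get(word, [])
        for tgt in targets:
            ambiguous = len(targets) > 1 or len(reverse[tgt]) > 1
            result.append((tgt, score - penalty if ambiguous else score))
    # Unambiguous words float to the top after the penalty is applied.
    return sorted(result, key=lambda pair: -pair[1])
```

Words missing from the dictionary are simply dropped in this sketch, since they cannot be matched against the second-language documents.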

ハイライト箇所特定部１０２、テキスト表示部１０３、ユーザ入力部１０４の一連の処理は、基本的には第１の実施形態と同様である。ただし、ハイライト箇所特定部１０２とテキスト表示部１０３が読み込む文書データは、第２言語文書データ記憶部２０５、ユーザ入力部１０４で付与された分類情報を記憶するのは第２言語分類情報記憶部２０６である。 The series of processes of the highlight location specifying unit 102, the text display unit 103, and the user input unit 104 is basically the same as in the first embodiment. However, the highlight location specifying unit 102 and the text display unit 103 read their document data from the second language document data storage unit 205, and the classification information assigned via the user input unit 104 is stored in the second language classification information storage unit 206.

第２の実施形態では、ユーザが分類したい文書データである第２言語の文書データに分類がまだ付与されていない状況でも、ユーザが既に分類を付与した第１言語のデータ、すなわち第１言語文書データ記憶部２０３と第１言語分類情報記憶部２０４から、拾い読み単語抽出部１０１の処理を行うことができる。 In the second embodiment, even in a situation where the classification is not yet given to the second language document data that is the document data that the user wants to classify, the first language data that the user has already given the classification, that is, the first language document The browsing word extraction unit 101 can perform processing from the data storage unit 203 and the first language classification information storage unit 204.

また、第２言語分類情報記憶部２０６にデータが追加された後は、第２言語文書データ記憶部２０５と第２言語分類情報記憶部２０６のデータから拾い読み単語抽出部１０１の処理を行うこともできる。 After data has been added to the second language classification information storage unit 206, the browsing word extraction unit 101 can also perform its processing on the data in the second language document data storage unit 205 and the second language classification information storage unit 206.

後者の場合には、置換部２０１が不要となる（図１の構成に切り替わる）。 In the latter case, the replacement unit 201 is not required (switching to the configuration of FIG. 1).

図１５に、拾い読み単語抽出部１０１の処理手順の一例を示す。 FIG. 15 shows an example of the processing procedure of the browsing word extraction unit 101.

まだ、ユーザにより文書に対して分類が付与されていない初期の状態において処理を行う場合には（ステップＳ４１でＹｅｓ）、拾い読み単語抽出部１０１は、第１言語文書データ記憶部２０３及び第１言語分類情報記憶部２０４の情報を用いて、第１言語のスコア付単語リストを作成する（ステップＳ４２）。図１６に、そのスコア付単語リストの一例を示す。なお、図１６（ａ）のスコア付単語リストは、置換部２０１及び二言語間辞書２０２により、単語が第１言語から第２言語へ置換される。図１６（ｂ）に、その一例を示す。なお、ｓ１はスコア値を示している。 When processing is performed in the initial state in which the user has not yet assigned a classification to any document (Yes in step S41), the browsing word extraction unit 101 creates a first-language scored word list using the information in the first language document data storage unit 203 and the first language classification information storage unit 204 (step S42). FIG. 16 shows an example of this scored word list. The words in the scored word list of FIG. 16A are replaced from the first language into the second language by the replacement unit 201 and the bilingual dictionary 202; FIG. 16B shows an example of the result. Here, s1 denotes a score value.

ユーザにより少なくとも一つの文書に対して分類が付与された後に処理を行う場合には（ステップＳ４１でＮｏ）、拾い読み単語抽出部１０１は、第２言語文書データ記憶部２０５及び第２言語分類情報記憶部２０６の情報を用いて、第２言語のスコア付単語リストを作成する（ステップＳ４３）。本具体例では、例えば、図９のようになる。そして、ステップＳ４２で既に作成されている第１言語のスコア付単語リスト（単語を置換したもの）と、このステップＳ４３で作成された第２言語のスコア付単語リストとを、マージする。 When processing is performed after the user has assigned a classification to at least one document (No in step S41), the browsing word extraction unit 101 creates a second-language scored word list using the information in the second language document data storage unit 205 and the second language classification information storage unit 206 (step S43). In this specific example, the list is, for example, as shown in FIG. 9. The first-language scored word list already created in step S42 (with its words replaced) is then merged with the second-language scored word list created in step S43.
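The text does not specify how the two scored word lists are merged. The sketch below shows one possible policy, stated here as an assumption: take the union of the two lists and keep the higher score when a word appears in both. The function name and the dictionary representation are likewise illustrative.

```python
def merge_scored_lists(translated_first, second):
    """Merge the translated first-language list with the second-language list.
    Assumed policy: union of words, keeping the higher score on a clash."""
    merged = dict(translated_first)
    for word, score in second.items():
        merged[word] = max(score, merged.get(word, score))
    return merged


# Hypothetical scores; "音声" appears in both lists.
print(merge_scored_lists({"音声": 0.4, "操作": 0.6}, {"音声": 0.8, "出力": 0.3}))
```

Other policies, such as summing the scores or weighting the second-language list more heavily as the user classifies more documents, would fit the same interface.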

なお、予め定められた条件が成立した場合には、ステップＳ４３を行わずに、第２言語のスコア付単語リストのみを使用するようにしても良い。 When a predetermined condition is satisfied, step S43 may be skipped and only the second-language scored word list may be used.

予め定められた条件は、例えば、第２言語の全文書数をＮ、第２言語の文書に対してユーザにより分類タイプが付与された文書数をｃとして、ｃ／Ｎが予め定められた値を上回った場合、若しくは、ｃが予め定められた値を上回った場合、又は、最初に第２言語の文書に対してユーザにより分類タイプが付与されてから、所定の時間が経過した場合など、様々なものが可能である。 The predetermined condition is, for example, a value in which c / N is predetermined, where N is the total number of documents in the second language, and c is the number of documents to which the classification type is assigned by the user to the documents in the second language. When c exceeds a predetermined value, or when a predetermined time has elapsed since the classification type was first assigned to the second language document by the user, etc. Various things are possible.

（第２の実施形態の変形例１） (Modification 1 of the Second Embodiment)
なお、上記において第１の言語を第２の言語と同じにすることも可能である。この場合には、置換部２０１及び二言語間辞書２０２が不要になる。 In the above description, the first language may also be the same as the second language. In this case, the replacement unit 201 and the bilingual dictionary 202 are unnecessary.

（第２の実施形態の変形例２） (Modification 2 of the Second Embodiment)
第２の実施形態の変形例２では、第２の実施形態で第２言語の文書中の単語をハイライト表示する代わりに、第１言語で単語を翻訳して表示する例を示す。 Modification 2 of the second embodiment shows an example in which, instead of highlighting words in the second-language documents as in the second embodiment, the words are translated into the first language and displayed.

第２の実施形態の変形例の機能構成例は、図１４と同様で構わない。 The functional configuration of this modification of the second embodiment may be the same as that of FIG. 14.

ただし、置換部２０１からの出力は、拾い読み単語抽出部１０１から出力された第１言語の単語の第２言語への翻訳だけでなく、第１言語と第２言語への翻訳のセットにし、ハイライト箇所特定部１０２へと渡される。 However, the output of the replacement unit 201 is not just the second-language translation of each first-language word output by the browsing word extraction unit 101; it is a set pairing the first-language word with its second-language translation, and this set is passed to the highlight location specifying unit 102.

ハイライト箇所特定部１０２の処理は、第２の実施例と同様の処理を行うが、その際の出力は、第１言語と第２言語への翻訳のセットにしてテキスト表示部１０３へ渡される。 The processing of the highlight location specifying unit 102 performs the same processing as in the second embodiment, but the output at that time is passed to the text display unit 103 as a set of translation into the first language and the second language. .

テキスト表示部１０３の処理は、第２の実施形態と異なる。 The processing of the text display unit 103 is different from that of the second embodiment.

図１７に、テキスト表示部１０３の処理手順の一例を示す。 FIG. 17 shows an example of the processing procedure of the text display unit 103.

In step S51, the text display unit 103 receives the second-language document data together with the output of the highlight location specifying unit 102: the highlight words, their first-language translations, and the highlight word positions.

In step S52, the text display unit 103 displays the second-language document data, converting each word at a highlight location into its first-language translation and displaying it within the text.
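Steps S51 and S52 can be sketched as follows. This is a minimal illustration, not the procedure of FIG. 17 itself: the `Highlight` record, the function name, and the bracket markers standing in for visual highlighting are all assumptions, as is the choice to skip entries whose position does not match the text.

```python
from typing import List, NamedTuple


class Highlight(NamedTuple):
    start: int        # character offset of the highlight word in the document
    word: str         # highlight word as it appears in the second-language text
    translation: str  # first-language translation of the word


def render_with_translations(
    text: str,
    highlights: List[Highlight],
    open_mark: str = "[",
    close_mark: str = "]",
) -> str:
    """Replace each highlight word with its marked first-language translation."""
    pieces = []
    cursor = 0
    for h in sorted(highlights, key=lambda item: item.start):
        end = h.start + len(h.word)
        # Skip entries that overlap an earlier one or do not match the text.
        if h.start < cursor or text[h.start:end] != h.word:
            continue
        pieces.append(text[cursor:h.start])
        pieces.append(f"{open_mark}{h.translation}{close_mark}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


document = "新しい検索技術の特許"
marks = [Highlight(3, "検索", "search"), Highlight(8, "特許", "patent")]
print(render_with_translations(document, marks))  # 新しい[search]技術の[patent]
```

In a real display the markers would be replaced by whatever highlight form corresponds to the classification type.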

FIG. 18 shows an example of the scored word lists in Modification 2 of the second embodiment.

(a) shows an example of the first-language scored word list generated by the browsing word extraction unit 101. (b) shows an example in which the first-language words are replaced with second-language words and these are added to the scored word list. Here, e1 denotes a score value.

(c) shows an example of the second-language scored word list generated by the browsing word extraction unit 101. (d) shows an example in which the second-language words are replaced with first-language words and these are added to the scored word list. Here, j1 denotes a score value.
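The step from a scored word list to a list that also carries translations can be sketched as below. The tuple layout, the function name, and the dictionary contents are hypothetical. The sketch also assumes that every translation inherits the score of its source word and that words missing from the bilingual dictionary are dropped; the source does not specify either point.

```python
from typing import Dict, List, Tuple

ScoredWord = Tuple[str, float]
ScoredPair = Tuple[str, str, float]  # (source word, translated word, score)


def attach_translations(
    scored_words: List[ScoredWord],
    bilingual_dict: Dict[str, List[str]],
) -> List[ScoredPair]:
    """Pair each scored word with its translations from a bilingual dictionary."""
    pairs = []
    for word, score in scored_words:
        # Assumption: each translation keeps the score of the source word.
        for translated in bilingual_dict.get(word, []):
            pairs.append((word, translated, score))
    return pairs


english_list = [("search", 0.9), ("unknown", 0.5)]
dictionary = {"search": ["検索", "探索"]}
print(attach_translations(english_list, dictionary))
```

Keeping the source word and its translation in one record matches the set of first-language and second-language forms that is passed on for display.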

FIG. 19 shows an example in which, when a second-language (Japanese) document is displayed, the words extracted for each classification type are highlighted in the first language (English). In the example of FIG. 19, words for the necessary document type are additionally highlighted with a frame and words for the unnecessary document type with a diagonally hatched frame, while words for the unread document type are highlighted in the first language (English) only.

(Third embodiment)
The third embodiment shows an example in which the user selects the words used for browsing.

FIG. 20 shows an example of the functional configuration of the support system according to the third embodiment.

FIG. 20 is an example in which the user first enters several words to be used for browsing, and the initial operation starts from those words.

The word input unit 301 accepts input of the words that the user wants to use for browsing. In this case, the words entered by the user are used as browsing word candidates as they are, and the highlight location specifying unit 102 specifies highlight locations based on these candidates. The other processing is the same as in the first embodiment. New words may also be entered from the word input unit 301 when, following user input through the user input unit 104, the classification information storage unit 106 is updated and the series of processes by the browsing word extraction unit 101, the highlight location specifying unit 102, and the text display unit 103 is performed.

(Fourth embodiment)
The fourth embodiment shows an example in which the user selects the words used for browsing.

FIG. 21 shows an example of the functional configuration of the support system according to the fourth embodiment.

FIG. 21 is an example in which the user selects the words to use for browsing from among the browsing word candidates calculated by the browsing word extraction unit 101.

The processing of the browsing word extraction unit 101, the highlight location specifying unit 102, the text display unit 103, and the user input unit 104 is the same as in the first embodiment.

The browsing word selection unit 401 presents the browsing word candidates extracted by the browsing word extraction unit 101 to the user. The user selects the browsing words that he or she wants to use, or does not want to use, and the browsing word selection unit 401 accepts the selection. As a result of the selection, the priority of the words that the user wants to use as browsing words is raised, and the words are input to the highlight location specifying unit 102 as a scored word list.
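One way to reflect the user's selection in the scored word list is sketched below. The source states only that the priority of wanted words is raised; the multiplicative `boost` factor, the function name, and the decision to drop unwanted words entirely are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def apply_user_selection(
    candidates: Dict[str, float],
    wanted: Set[str],
    unwanted: Set[str],
    boost: float = 2.0,
) -> List[Tuple[str, float]]:
    """Return a scored word list adjusted by the user's selection.

    Wanted words have their score multiplied by a hypothetical boost factor.
    Unwanted words are removed, which is an assumption of this sketch.
    """
    adjusted = {
        word: score * boost if word in wanted else score
        for word, score in candidates.items()
        if word not in unwanted
    }
    # Highest score first, so the preferred words are used first.
    return sorted(adjusted.items(), key=lambda item: item[1], reverse=True)


candidates = {"search": 0.5, "index": 0.6, "the": 0.9}
print(apply_user_selection(candidates, wanted={"search"}, unwanted={"the"}))
# [('search', 1.0), ('index', 0.6)]
```

The resulting list has the same shape as the scored word list that the highlight location specifying unit 102 receives in the other embodiments.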

Note that the embodiments and modifications described so far can be implemented in any combination.

In addition, each of the embodiments and modifications described so far, and any combination of them, makes it possible to support browsing of a large number of documents.

The instructions in the processing procedures described in the above embodiments can be executed based on a software program. A general-purpose computer system that stores this program in advance and reads it in can obtain the same effects as the browsing support system of the above embodiments. The instructions described in the above embodiments are recorded, as a program executable by a computer, on a magnetic disk (flexible disk, hard disk, etc.), an optical disc (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, etc.), a semiconductor memory, or a similar recording medium. The storage format may be of any kind as long as the recording medium is readable by a computer or an embedded system. A computer that reads the program from this recording medium and has its CPU execute the instructions described in the program can realize the same operation as the browsing support system of the above embodiments. The computer may, of course, also acquire or read the program over a network.
In addition, an OS (operating system) running on the computer, database management software, MW (middleware) such as network software, or the like may execute part of each process for realizing the present embodiment, based on the instructions of the program installed from the recording medium onto the computer or embedded system.
Furthermore, the recording medium in the present embodiment is not limited to a medium independent of the computer or embedded system. It also includes a recording medium that stores, or temporarily stores, a program downloaded via a LAN, the Internet, or the like.
The number of recording media is not limited to one. A case in which the processing of the present embodiment is executed from a plurality of media is also covered by the recording medium of the present embodiment, and the media may have any configuration.

The computer or embedded system in the present embodiment executes each process of the present embodiment based on the program stored in the recording medium. It may have any configuration, such as a single apparatus (a personal computer, a microcomputer, or the like) or a system in which a plurality of apparatuses are connected over a network.
The computer in the present embodiment is not limited to a personal computer. It also includes arithmetic processing units and microcomputers contained in information processing equipment, and is a generic term for equipment and apparatuses capable of realizing the functions of the present embodiment by means of a program.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 101 ... browsing word extraction unit, 102 ... highlight location specifying unit, 103 ... text display unit, 104 ... user input unit, 105 ... document data storage unit, 106 ... classification information storage unit, 201 ... replacement unit, 202 ... bilingual dictionary, 203 ... first-language document data storage unit, 204 ... first-language classification information storage unit, 205 ... second-language document data storage unit, 206 ... second-language classification information storage unit, 301 ... word input unit, 401 ... browsing word selection unit.

Claims

A browsing support system comprising: a document storage unit that stores a plurality of documents in association with identification information;
a display unit that displays all or some of the plurality of documents, with or without highlighting of words or phrases;
an input unit that receives from a user a designation of a specific document among the displayed documents and a designation of a specific classification type, among a plurality of predetermined classification types, to be assigned to the specific document;
a classification information storage unit that stores classification information in which the identification information is associated with the specific classification type;
an extraction unit that extracts, from one or more of the documents associated with the same classification type, one or more words or phrases to be highlighted for that classification type; and
a specifying unit that specifies, for each of all or some of the documents, a location to be highlighted in the document when a word or phrase extracted by the extraction unit is present in the document.

The browsing support system according to claim 1, wherein the specifying unit performs the specification for each of the documents other than the specific document.

The browsing support system according to claim 1, wherein the display unit displays all or some of the plurality of documents without highlighting before the locations to be highlighted are specified, and, after the locations to be highlighted are specified, when displaying all or some of the plurality of documents, highlights each word or phrase at a location specified to be highlighted in the document, in a highlight form corresponding to the classification type.

The browsing support system according to claim 1, wherein the extraction by the extraction unit, the specification by the specifying unit, and the update of the display content by the display unit are triggered when the content of the classification information is updated.

The browsing support system according to claim 1, wherein the extraction by the extraction unit, the specification by the specifying unit, and the update of the display content by the display unit are performed when an update instruction is input by the user.

The browsing support system according to claim 1, wherein, when displaying all or some of the plurality of documents, the display unit also displays, for each specific document, information indicating the classification type assigned to that document.

The browsing support system according to claim 1, wherein the documents stored in the document storage unit are written in a specific language,
the browsing support system further comprises:
a different-language document storage unit that stores a plurality of documents, written in a language different from the specific language, in association with identification information; and
a different-language classification information storage unit that stores, for all documents stored in the different-language document storage unit, classification information in which the identification information is associated with the classification type assigned in advance by the user,
the extraction unit also extracts, based on the documents stored in the different-language document storage unit and the classification information stored in the different-language classification information storage unit, one or more words or phrases to be highlighted in the language different from the specific language, and
the browsing support system further comprises
a replacement unit that replaces the words or phrases in the language different from the specific language with words or phrases in the specific language.

The browsing support system according to claim 7, wherein, until a predetermined condition is satisfied, the extraction unit uses as the result of the extraction either only the words or phrases that were extracted based on the documents stored in the different-language document storage unit and the classification information stored in the different-language classification information storage unit and replaced with the specific language by the replacement unit, or a merge of those replaced words or phrases with the words or phrases in the specific language extracted based on the documents stored in the document storage unit and the classification information stored in the classification information storage unit, and, after the predetermined condition is satisfied, uses as the result of the extraction only the words or phrases in the specific language extracted based on the documents stored in the document storage unit and the classification information stored in the classification information storage unit.

The browsing support system according to claim 8, wherein the replacement unit performs the replacement by referring to a bilingual dictionary in which correspondences between the specific language and the language different from the specific language are registered for a plurality of words or phrases, and
the extraction unit preferentially extracts words or phrases whose correspondence between the specific language and the language different from the specific language is one-to-one in the bilingual dictionary.

The browsing support system according to claim 1, wherein the classification type indicated through the input unit includes at least a classification type related to a conforming document or a classification type related to a nonconforming document.

The browsing support system according to claim 1, wherein the classification types designated via the input unit include at least classification types each relating to documents assigned to a respective one of a plurality of persons in charge.

The browsing support system according to claim 1, further comprising a word input unit for inputting a word or phrase from the user,
wherein the word or phrase input via the word input unit is used in addition to, or instead of, the word or phrase extracted by the extraction unit.

The browsing support system wherein the extraction unit extracts candidates for the word or phrase,
the browsing support system further comprises a word selection unit that displays the word or phrase candidates and receives from the user a selection of a word or phrase among the displayed candidates, and
the extraction unit, when selecting the word or phrase candidates, makes the word or phrase selected via the word selection unit more likely to be selected.

The browsing support system according to claim 1, wherein the document is a target document text or a summary thereof.

A browsing support method for a browsing support system including a document storage unit that stores a plurality of documents in association with identification information, the method comprising:
displaying all or some of the plurality of documents, with or without highlighting of words or phrases;
receiving from a user a designation of a specific document among the displayed documents and a designation of a specific classification type, among a plurality of predetermined classification types, to be assigned to the specific document;
storing classification information in which the identification information is associated with the specific classification type;
extracting, from one or more of the documents associated with the same classification type, one or more words or phrases to be highlighted for that classification type; and
specifying, for each of all or some of the documents, a location to be highlighted in the document when an extracted word or phrase is present in the document.

A program for causing a computer to function as a browsing support system including a document storage unit that stores a plurality of documents in association with identification information, the program causing the computer to function as:
a document storage unit that stores a plurality of documents in association with identification information;
a display unit that displays all or some of the plurality of documents, with or without highlighting of words or phrases;
an input unit that receives from a user a designation of a specific document among the displayed documents and a designation of a specific classification type, among a plurality of predetermined classification types, to be assigned to the specific document;
a classification information storage unit that stores classification information in which the identification information is associated with the specific classification type;
an extraction unit that extracts, from one or more of the documents associated with the same classification type, one or more words or phrases to be highlighted for that classification type; and
a specifying unit that specifies, for each of all or some of the documents, a location to be highlighted in the document when a word or phrase extracted by the extraction unit is present in the document.