JP2006277091A - Index data generation device, data retrieval device, and program

Info

Publication number: JP2006277091A
Application number: JP2005092519A
Authority: JP
Inventors: Kei Tanaka; 圭田中; Toshiya Koyama; 俊哉小山; Shoichi Tateno; 昌一舘野; Masayoshi Sakakibara; 正義榊原; Teruka Saito; 照花斎藤; Kotaro Nakamura; 浩太郎中村; Takashi Nagao; 隆長尾; Shinu Ho; 新宇彭
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-03-28
Filing date: 2005-03-28
Publication date: 2006-10-12

Abstract

PROBLEM TO BE SOLVED: To provide a means of obtaining higher-quality retrieval results when searching for a specific part of a document.

SOLUTION: A specifying part 1012 specifies the sections of a document to which notes are attached. A first retrieving part 1013 extracts keywords contained in the annotated sections. An index data generating part 1014 generates index data indicating the positions at which each extracted keyword appears in the document. When a note is attached to a section containing a character string that matches the keyword, the index data generating part 1014 associates data identifying that note with the index data. Using a keyword entered by the user as the retrieval key, a second retrieving part 1015 consults the generated index data to retrieve the sections of the document that contain the keyword, together with the notes attached to those sections, and has a display part 103 display the retrieval result.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a document retrieval technique.

A reader may add notes to a document. Passages to which notes are attached are generally more important than the rest of the document. Accordingly, if the annotated passages are picked out and displayed together with their notes, a reader of the document can easily grasp the important points. Patent Document 1, for example, discloses a technique that enables such document display.
Patent Document 1: Japanese Patent Laid-Open No. 11-219245

In addition, when a reader adds a note to a document, it is convenient if passages in the same document or in other documents that relate to the annotated passage are displayed automatically, because the reader who added the note can then easily obtain related information on the matter of interest. Patent Document 2, for example, discloses a technique that enables such document display.
Patent Document 2: JP 2000-10981 A

A common practice is to extract the keywords that are statistically most frequent in a text and to generate a search index based on the extracted keywords. However, a document often contains general-purpose terms that appear frequently but are not important to that document. As a result, search results obtained with a statistically generated index are not necessarily of high quality.

As described above, the parts of a document that a reader has annotated are generally more important than the other parts. It is therefore conceivable to focus on the annotated parts when searching. The techniques disclosed in Patent Documents 1 and 2 do focus on annotated parts, but they do not make searching more efficient.

In view of the above circumstances, an object of the present invention is to provide a means that, by focusing on the annotated parts of a document, makes it possible to obtain higher-quality search results when searching for a specific part of the document.

To solve the above problem, the present invention provides an index data generation device comprising: storage means for storing document data representing a document and one or more pieces of note data, each representing a note associated with one of the partial documents constituting the document; and index data generation means for generating index data indicating the positions in the document of phrases that match a phrase contained in a partial document with which a note represented by any of the one or more pieces of note data is associated.

With an index data generation device configured in this way, index data that enables efficient searching is generated based on the keywords contained in the annotated parts of the document.

In a preferred aspect, the index data generation device may comprise specifying means for specifying, within the document, the partial document to which a given note is attached, and the storage means may store the note data representing that note as note data associated with the partial document specified by the specifying means.

In that case, the specifying means may perform the specification based on the spatial positional relationship between the document represented by the document data and the note represented by the note data when both are displayed on the same plane.

With an index data generation device configured in this way, index data that enables efficient searching is generated even from document data and note data for which the correspondence between each note and its annotated partial document is not known in advance.

In another preferred aspect, the index data generation device may comprise conversion means for converting a phrase expressed in one language into a translated phrase with the same meaning expressed in another language, and the index data generation means may associate the translated phrase obtained from the phrase by the conversion means with the search data.

With an index data generation device configured in this way, the generated index data enables efficient searching even when a keyword in a language different from that of the document is used.

In another preferred aspect, the storage means of the index data generation device may store keyword data representing keywords in advance, and the index data generation means may use those keywords as the phrase.

With an index data generation device configured in this way, index data is generated without processing such as statistical keyword extraction.

In another preferred aspect, the index data generation means of the index data generation device may generate note index data indicating the positions, within the notes represented by the one or more pieces of note data, of phrases that match a phrase contained in a note represented by any of the one or more pieces of note data.

With an index data generation device configured in this way, index data that enables efficient searching is generated based on the keywords contained in the notes attached to the document.

In another preferred aspect, the index data generation device may comprise display means for displaying the document represented by the document data and handwriting data generation means for generating handwriting data representing handwriting in response to the user's writing operation, and the storage means may store the handwriting data generated by the handwriting data generation means as the note data.

In another preferred aspect, the index data generation device may comprise display means for displaying the document represented by the document data, handwriting data generation means for generating handwriting data representing handwriting in response to the user's writing operation, and character recognition means for performing character recognition on the handwriting data generated by the handwriting data generation means and generating text data representing the recognized character string, and the storage means may store the text data generated by the character recognition means as the note data.

With an index data generation device configured in this way, index data that enables efficient searching is generated based on a note even when the user writes the note with a pointing device or the like.

In another preferred aspect, the index data generation device may comprise: image reading means for optically reading an image of an original that contains a printed document and one or more handwritten notes on that document, and generating image data representing the read image; image data separation means for generating, from the image data generated by the image reading means, image data representing the printed document and one or more pieces of image data each representing one of the handwritten notes; and character recognition means for performing character recognition on the image data representing the document and generating text data representing the content of the document. The storage means may store the text data representing the content of the document as the document data, and store the one or more pieces of image data each representing a handwritten note as the one or more pieces of note data.

In another preferred aspect, the index data generation device may comprise the same image reading means and image data separation means, together with character recognition means that performs character recognition on the image data representing the document to generate text data representing the content of the document, and also performs character recognition on each of the one or more pieces of image data representing the notes to generate one or more pieces of text data representing the content of the notes. The storage means may store the text data representing the content of the document as the document data, and store the one or more pieces of text data representing the content of the notes as the one or more pieces of note data.

With an index data generation device configured in this way, index data that enables efficient searching is generated based on a note even when the user adds the note by hand to printed matter.

The present invention also provides a data search device comprising: storage means for storing document data representing a document, one or more pieces of note data each representing a note associated with one of the partial documents constituting the document, and index data indicating, for each of one or more keywords, the positions in the document of phrases that match the keyword; input means for receiving keyword data representing a keyword; search means for searching, based on the index data, the plurality of partial documents constituting the document for one or more partial documents that contain the keyword; and output means for outputting note data representing the notes associated with the partial documents found by the search means.

With a data search device configured in this way, entering a keyword yields the notes attached to the partial documents that contain the keyword.

In a preferred aspect, the data search device may comprise conversion means for converting a phrase expressed in one language into a translated phrase with the same meaning expressed in another language, and the search means may perform the search using, instead of the keyword, the translated phrase obtained from the keyword by the conversion means.

With a data search device configured in this way, entering a keyword yields the notes attached to partial documents written in a language different from that of the keyword.

The present invention also provides a data search device comprising: storage means for storing first document data representing a first document expressed in one language, one or more pieces of first note data each representing a note associated with one of the partial documents constituting the first document, index data indicating, for each of one or more keywords, the positions in the first document of phrases that match the keyword, second document data representing a second document that has the same meaning as the first document and is expressed in another language, and one or more pieces of second note data each representing a note associated with one of the partial documents constituting the second document; input means for receiving keyword data representing a keyword; search means for searching, based on the index data, the plurality of partial documents constituting the first document for one or more partial documents that contain the keyword; and output means for outputting note data representing the notes associated with the partial documents found by the search means, together with note data representing the notes associated with the partial documents of the second document that correspond to the partial documents found by the search means.

With a data search device configured in this way, entering a keyword yields, in addition to the notes attached to the partial documents that contain the keyword, the notes attached to the other partial documents that correspond to those partial documents.

The present invention also provides a program that causes a computer to execute processing equivalent to that performed by the index data generation device or the data search device described above. With such a program, the index data generation device or the data search device can be realized on a general-purpose computer.

[1. First Embodiment]
As a first embodiment of the present invention, the following describes a device in which a user, as a reader, uses a word processor function to add notes to a text document prepared in advance. The device then generates index data for the annotated document and, when the user enters a keyword, uses the generated index data to display the parts of the document that contain the keyword.

FIG. 1 is a block diagram showing the configuration of a search device 10 according to the first embodiment. The search device 10 includes a control unit 101 that controls the components of the search device 10; a storage unit 102 that stores programs instructing the various processes of the control unit 101 as well as various data, and that also serves as a work area for the control unit 101 and other components; a display unit 103 that displays characters and graphics to the user; and an operation unit 104 that generates predetermined signals in response to user operations and passes them to the control unit 101. The operation unit 104 in this embodiment is, for example, a keyboard or a mouse. The search device 10 may be realized by having a general-purpose computer perform processing according to an application program, or by dedicated hardware.

The storage unit 102 stores in advance document data 1021 in text format and a keyword DB 1023 (hereinafter, a database is called a "DB"), which holds keyword data representing candidate keywords that may appear in the document represented by the document data 1021.

The user can display the content of the document data 1021 on the display unit 103 and, by operating the operation unit 104, enter a note at a specific position in the document. The control unit 101 includes a note adding unit 1011, which associates the text data representing a note entered by the user with the part of the document to which the note is attached (hereinafter called a "partial document") and stores it in the storage unit 102 as note data. Following instructions from the note adding unit 1011, the storage unit 102 stores the note data in a note DB 1022.

FIG. 2 illustrates how the document represented by the document data 1021 and the notes entered by the user are displayed on the display unit 103. In the display shown in FIG. 2, the notes entered by the user appear as footnotes in the bottom margin. The markers (*1) and (*2) in the document indicate the positions at which the user attached those notes.

FIG. 3 illustrates the content of the note DB 1022. The note DB 1022 stores multiple pieces of note data, each containing a "number" that identifies the note, the "position" in the document at which the note was inserted, and the "content" of the note. The "position" field holds data in a format such as "paragraph 3, sentence 1, character 24", which indicates that the note was inserted after the 24th character of the first sentence of the third paragraph, counting from the beginning of the document.
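The note data layout described above can be pictured with a small data structure. This is only an illustrative sketch: the class and field names (`Position`, `Note`, `note_db`) are hypothetical and do not appear in the publication, and the note content is abbreviated exactly as it is in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Position:
    """A location in the document, using 1-based counts as in FIG. 3."""
    paragraph: int  # counted from the beginning of the document
    sentence: int   # counted within the paragraph
    char: int       # counted within the sentence


@dataclass
class Note:
    number: str         # identifier of the note, e.g. "*1"
    position: Position  # the note is inserted after this character
    content: str        # text of the note


# One record corresponding to the example "paragraph 3, sentence 1, character 24".
note_db = [
    Note(number="*1", position=Position(3, 1, 24), content="In Japan ..."),
]
```

Making `Position` orderable lets later steps compare positions directly, for instance to test whether a keyword occurrence falls inside an annotated range.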

The control unit 101 includes a specifying unit 1012, which determines the partial document to which the note represented by the "content" of each piece of note data is attached and cuts the data representing that partial document (hereinafter called "partial document data") out of the document data 1021. The specifying unit 1012 first determines, for each piece of note data in the note DB 1022, the position in the document at which the note is attached, based on the data stored in "position". For example, based on the note data numbered "*1" illustrated in FIG. 3, the specifying unit 1012 determines that the note "In Japan ..." is inserted at the position of the marker (*1) shown in FIG. 2.

Next, the specifying unit 1012 determines the partial document that the note refers to. Specifically, it searches backward from the insertion position of the note for the nearest punctuation mark or the nearest character string representing a grammatical subject. It takes the character immediately after the punctuation mark, or the first character of the subject string, as the start position and the character immediately before the insertion position as the end position, and specifies the character string in that range as the annotated partial document. For the note numbered (*1) (hereinafter written as "note (*1)"), for example, "application of the provisions of Article 7 (2)" is specified as the partial document.
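The punctuation-based rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation in the publication: the document is held as one string, the note position has already been converted to a 0-based character offset, and the variant that looks for a grammatical subject is omitted. The names `PUNCTUATION` and `find_partial_document` and the sample sentence are hypothetical.

```python
# Punctuation marks that end the backward scan (Japanese and Western forms).
PUNCTUATION = set("、。，．,.;:!?")


def find_partial_document(text: str, insert_at: int) -> tuple[int, int]:
    """Return (start, end) offsets of the span a note refers to.

    insert_at is the offset at which the note marker is inserted; the span
    ends just before it (end is exclusive).
    """
    start = insert_at
    # Walk backward until the character before `start` is a punctuation mark.
    while start > 0 and text[start - 1] not in PUNCTUATION:
        start -= 1
    # Skip whitespace that follows the punctuation mark.
    while start < insert_at and text[start].isspace():
        start += 1
    return start, insert_at


text = "In such cases, application of the provisions of Article 7 (2) is excluded."
insert_at = text.index(" is excluded")  # the note marker follows "Article 7 (2)"
start, end = find_partial_document(text, insert_at)
partial_document = text[start:end]
```

The returned pair doubles as the range of the partial document within the whole document, expressed here as flat offsets rather than paragraph, sentence, and character counts.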

The method by which the specifying unit 1012 determines the annotated partial document is not limited to the above. For example, the whole sentence or the whole paragraph containing the insertion position of the note may be specified as the partial document, or, when text around the insertion position is underlined, the underlined portion may be specified as the partial document. The user may also be allowed to select the unit used for partial documents.

The specifying unit 1012 then passes the partial document data representing the specified partial document to a first search unit 1013. It also generates range data indicating the range that the partial document occupies within the whole document and passes it to an index data generation unit 1014. The range data has a format such as "(*1) paragraph 3, sentence 1, character 13 to paragraph 3, sentence 1, character 24". The leading (*1) is the number of the corresponding note data.

The control unit 101 includes the first search unit 1013, which searches the partial documents for the keywords held in the keyword DB 1023. FIG. 4 illustrates the content of the keyword DB 1023, which stores multiple pieces of keyword data, each representing a keyword.

Keywords in the keyword DB 1023 may contain wildcards. In the keyword "Article * (*)", for example, each "*" may stand for any character or character string. A search that uses the keyword "Article * (*)" as the search key therefore returns results such as "Article 1 (1)" and "Article 9 (12)".
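One way to realize such wildcard keywords is to translate them into regular expressions. The sketch below is an assumption about how this could be done, not the method of the publication: `keyword_to_regex` is a hypothetical helper, and each wildcard is taken to match one or more characters, as few as possible.

```python
import re


def keyword_to_regex(keyword: str, wildcard: str = "*") -> re.Pattern:
    """Compile a keyword containing wildcards into a regular expression.

    Literal parts are escaped so that characters such as "(" match themselves;
    each wildcard becomes a non-greedy "one or more characters" pattern.
    """
    literal_parts = [re.escape(part) for part in keyword.split(wildcard)]
    return re.compile(".+?".join(literal_parts))


pattern = keyword_to_regex("Article * (*)")
match = pattern.search("application of the provisions of Article 7 (2)")
matched_text = match.group() if match else None
```

A keyword without any wildcard compiles to a plain literal match, so the same helper can serve every entry in the keyword DB. In practice the wildcard would probably need to be restricted (for example to a single line or a maximum length) so that one match cannot span unrelated text.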

For each piece of keyword data in the keyword DB 1023, the first search unit 1013 searches the partial documents represented by the partial document data received from the specifying unit 1012 for character strings that match the keyword. For example, the keyword "Article 7 (2)" is obtained as a search result from the partial document "application of the provisions of Article 7 (2)". The first search unit 1013 passes the result, namely the subset of the keyword data in the keyword DB 1023 whose keywords appear in annotated partial documents, to the index data generation unit 1014 of the control unit 101 as extracted keyword data.
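The extraction step amounts to filtering the keyword DB by what actually occurs in annotated text. A minimal sketch, assuming plain keywords without wildcards and partial documents already cut out as strings; `extract_keywords` and the sample data are hypothetical.

```python
def extract_keywords(keyword_db: list[str], partial_documents: list[str]) -> list[str]:
    """Return the keywords that occur in at least one annotated partial document.

    The order of keyword_db is preserved, and each keyword is reported once
    no matter how many partial documents contain it.
    """
    return [
        keyword
        for keyword in keyword_db
        if any(keyword in part for part in partial_documents)
    ]


keyword_db = ["international application", "Article 7 (2)", "priority"]
partial_documents = ["application of the provisions of Article 7 (2)"]
extracted = extract_keywords(keyword_db, partial_documents)
```

Only keywords that survive this filter are indexed later, which is how the annotated parts steer the index toward terms the reader considered important.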

For each keyword extracted by the first search unit 1013, the index data generation unit 1014 generates index data indicating the positions at which the keyword appears in the document. As described above, the index data generation unit 1014 receives from the specifying unit 1012 the range data indicating the range of each partial document within the document, and receives from the first search unit 1013 the extracted keyword data representing the keywords that appear in any of the annotated partial documents. The index data generation unit 1014 searches the whole document for character strings that match each keyword in the extracted keyword data and generates position data indicating where each matching string occurs. It then determines whether each position falls within a range indicated by the range data, and generates index data based on the results. The generated index data is stored in an index DB 1024 in the storage unit 102.

FIG. 5 illustrates the content of the index DB 1024. The index DB 1024 stores index data with the fields "keyword", "position", and "corresponding note number". The "keyword" is one of the keywords in the extracted keyword data. The "position" indicates where the first character of the keyword occurs in the document. When the same keyword appears more than once in the document, "position" holds multiple values.

The "corresponding note number" is the number of the note attached to the partial document that contains the keyword occurrence. When a keyword occurs in a partial document to which no note is attached, the "corresponding note number" for that "position" is left blank ("−").

For example, the first row in FIG. 5 shows that the keyword "international application" appears at paragraph 2, sentence 1, character 10, and that the partial document containing it has no note attached. The eighth row (counting each "・" row, used to abbreviate the figure, as one row) shows that the keyword "Article 7 (2)" appears at paragraph 3, sentence 1, character 13, and that note (*1) is attached to the partial document containing it.
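The index generation described above can be sketched as follows. This is an illustrative sketch rather than the implementation in the publication: positions are flat 0-based character offsets instead of paragraph, sentence, and character counts, each occurrence gets its own entry instead of one row with several positions, and `IndexEntry`, `build_index`, and the sample text are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    keyword: str
    position: int                # offset of the keyword's first character
    note_number: Optional[str]   # None when the occurrence is not annotated


def build_index(text: str, keywords: list[str],
                ranges: dict[str, tuple[int, int]]) -> list[IndexEntry]:
    """Index every occurrence of each keyword in the whole document.

    ranges maps a note number to the (start, end) offsets of its annotated
    partial document, with end exclusive.
    """
    entries = []
    for keyword in keywords:
        pos = text.find(keyword)
        while pos != -1:
            # Find the note, if any, whose range contains this occurrence.
            note = next(
                (number for number, (start, end) in ranges.items()
                 if start <= pos < end),
                None,
            )
            entries.append(IndexEntry(keyword, pos, note))
            pos = text.find(keyword, pos + 1)
    return entries


text = ("Article 7 (2) is defined here. In such cases, "
        "application of the provisions of Article 7 (2) is excluded.")
ranges = {"*1": (text.index("application"), text.index(" is excluded"))}
index = build_index(text, ["Article 7 (2)"], ranges)
```

Here the first occurrence of the keyword lies outside every annotated range and is indexed without a note number, while the second lies inside the range of note "*1", mirroring the blank and filled "corresponding note number" cells of FIG. 5.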

With the index DB 1024 stored in the storage unit 102 as described above, the user can operate the operation unit 104 to enter any keyword and display the partial documents containing it. If a note is attached to a partial document, the note is displayed at the same time. FIG. 6 illustrates the screen displayed on the display unit 103 when the user searches for partial documents containing a keyword (hereinafter referred to as the “search screen”).

The search screen shows a box 1031 in which the user enters a keyword directly or selects one from a pull-down menu, a button 1032 with which the user instructs execution of the search, and a list 1033 in which the search results are displayed. The pull-down menu of the box 1031 lists all keywords contained in the index DB 1024 as options.

The control unit 101 includes a second search unit 1015 that uses the input keyword as a search key to search for partial documents and the notes attached to them. When the user enters a keyword in the box 1031 and presses the button 1032, the second search unit 1015 searches the index DB 1024 for index data whose “keyword” field contains the input keyword, cuts out from the document data 1021 the partial document containing the position indicated by the “position” field of the retrieved index data together with the portions immediately before and after it, and displays the result in the left column of the list 1033. In doing so, the second search unit 1015 emphasizes the keyword in the left column of the list 1033, for example by enclosing it in a frame.

When the “corresponding note number” field of the retrieved index data is not blank, the second search unit 1015 also searches the note DB 1022 for the note data whose “number” field contains that number, and displays the contents of the “content” field of the retrieved note data in the right column of the list 1033.
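The lookup performed by the second search unit 1015 can be sketched as below. This is a simplified Python illustration under stated assumptions: the function name `search`, the tuple and dictionary layouts, the sample sentences, and the sample note text are all hypothetical, square brackets stand in for the frame that emphasizes the keyword, and only the sentence containing the keyword is returned (the surrounding portions mentioned above are omitted for brevity).

```python
def search(keyword, index_db, sentences, note_db):
    """Return (highlighted excerpt, note content or None) for each hit."""
    results = []
    for entry_keyword, (paragraph, sentence, _character), note_number in index_db:
        if entry_keyword != keyword:
            continue
        # Cut out the partial document (here: one sentence) containing the hit.
        text = sentences[(paragraph, sentence)]
        # Emphasize the keyword; brackets stand in for the on-screen frame.
        highlighted = text.replace(keyword, f"[{keyword}]")
        # A blank corresponding note number is modeled as None.
        note = note_db[note_number] if note_number is not None else None
        results.append((highlighted, note))
    return results


# Hypothetical sample data.
sentences = {
    (2, 1): "An international application shall contain a request.",
    (3, 1): "Drawings are required under Article 7(2) in some cases.",
}
index_db = [
    ("international application", (2, 1, 4), None),
    ("Article 7(2)", (3, 1, 29), 1),
]
note_db = {1: "Check whether drawings are needed here."}

hits = search("Article 7(2)", index_db, sentences, note_db)
```

With this sample data, `hits` holds one pair: the bracketed sentence for the left column of the list and the note text for the right column.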

In the above description, the document data 1021 and the note DB 1022 are treated as separate data, but they may be combined into a single integrated data set.

Also, in the above description, the search device 10 generates the note DB 1022 from notes input by the user, but a note DB 1022 generated by another device may instead be input to the search device 10 via a storage medium, a communication line, or the like.

Also, in the above description, keyword data indicating candidate search keys is stored in the keyword DB 1023 in advance. Alternatively, the control unit 101 may generate a keyword DB 1023 suited to the document, for example by selecting as keywords those words that appear frequently in the entire document or in the partial documents to which notes are attached.
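A frequency-based keyword DB of the kind suggested above might be built as in the following Python sketch. The description only says that frequently appearing words are selected; the regular-expression tokenization, the minimum word length, the `top_n` cutoff, and the function name `build_keyword_db` are assumptions made for this example.

```python
import re
from collections import Counter


def build_keyword_db(text, top_n=3, min_length=4):
    """Pick the most frequent words of a text as keyword candidates."""
    # Naive tokenization: lower-case alphabetic runs only (an assumption).
    words = re.findall(r"[a-z]+", text.lower())
    # Skip very short words, which are mostly function words.
    counts = Counter(word for word in words if len(word) >= min_length)
    return [word for word, _count in counts.most_common(top_n)]


sample = (
    "The application names the applicant. The application contains a request. "
    "The application may contain a request for priority."
)
keywords = build_keyword_db(sample, top_n=2)
```

The same function can be applied either to the whole document or only to the concatenated annotated partial documents, matching the two alternatives mentioned in the text.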

As described above, the search device 10 automatically generates index data using the keywords contained in partial documents to which notes are attached. Compared with index data generated from keywords extracted from the full text, this conveniently yields high-quality index data that is more likely to contain the keywords important to the document.

Further, when the user performs a search with the index data generated as described above and a note is attached to a partial document containing the keyword entered as the search key, the search device 10 displays that note as well. The user can therefore easily identify partial documents that contain a specific keyword and carry a note, and can conveniently read the retrieved partial document while referring to the note.

[2. Second Embodiment]
As a second embodiment of the present invention, the following describes a system in which index data is generated from notes that a user handwrites, using a tablet PC (Personal Computer), on a text-format document prepared in advance, and in which that index data is shared with other users via a server device to enable efficient in-document search. Because the system of the second embodiment translates index data and search results, the user can use a keyword in a language different from that of the document as a search key. To avoid repetition, only the differences from the first embodiment are described. In the drawings referred to below, components that are the same as or correspond to those of the first embodiment are given the same reference numerals.

FIG. 7 is a block diagram showing the configuration of the search system 2 according to the second embodiment. The search system 2 comprises tablet PCs 20-1 to 20-k (where k is an arbitrary natural number), which the users use to attach notes to documents and to perform in-document searches; a server device 21, which stores the index data generated in each tablet PC 20 and provides search results in response to requests from the tablet PCs 20; and a network 22, which interconnects the tablet PCs 20 and the server device 21. In the following description, when the tablet PCs 20-1 to 20-k need not be distinguished from one another, each is simply referred to as a tablet PC 20.

The display unit 103 is, for example, a liquid crystal display. The operation unit 104 is a pen tablet integrated with the display unit 103; in response to the user's writing motion with a pen-type stylus, it measures the position at which writing pressure is applied and the magnitude of that pressure, and generates data indicating the measured position and magnitude (hereinafter referred to as “handwriting data”). The handwriting data is displayed on the display unit 103 together with the document, so the user can add handwritten notes to the displayed document. FIG. 8 illustrates a document and notes displayed on the display unit 103.

The control unit 101 includes a character recognition processing unit 2011 that performs character recognition on the handwriting data representing a note handwritten by the user and generates text data indicating the content of the note. The character recognition processing unit 2011 generates note data by associating the generated text data with position data indicating where on the screen the note was written, and stores the note data in the note DB 1022. As the position where the note was written, for example, the upper-left corner of the rectangle enclosing the handwriting data of the note is used as a representative point.

FIG. 9 illustrates the contents of the note DB 1022 stored in the storage unit 102. In the second embodiment, the “position” field of the note data holds an array of three elements: the first element is the page number, the second is the distance from the left edge of the page (in millimeters, for example), and the third is the distance from the top edge of the page (in millimeters, for example).
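The three-element “position” array can be derived from the handwriting data as in the following Python sketch. It assumes, for illustration, that the stroke points are already available as (x, y) pairs in millimeters measured from the left and top edges of the page; the function name `note_position` and the sample coordinates are hypothetical.

```python
def note_position(page, stroke_points):
    """Return [page, x_mm, y_mm] for the upper-left corner of the
    rectangle enclosing all stroke points of a handwritten note."""
    xs = [x for x, _y in stroke_points]
    ys = [y for _x, y in stroke_points]
    # y grows downward from the top edge, so the upper-left corner is
    # the minimum x paired with the minimum y.
    return [page, min(xs), min(ys)]


# Hypothetical stroke points of one note on page 1.
points = [(120.0, 45.5), (135.2, 40.0), (128.0, 52.3)]
position = note_position(1, points)  # [1, 120.0, 40.0]
```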

Based on the positional relationship between the document displayed on the screen and the note, the specifying unit 1012 specifies the partial document to which the note is attached, passes partial document data indicating the specified partial document to the first search unit 1013, and passes range data indicating the range of the partial document within the document to the index data generation unit 1014.

The first search unit 1013 extracts the keywords contained in the partial document data and generates extracted keyword data indicating them. The control unit 101 also includes a conversion unit 2012 that translates words or documents in one language into another language. The first search unit 1013 passes the generated extracted keyword data to the conversion unit 2012. On receiving extracted keyword data in a language other than the reference language, the conversion unit 2012 translates each keyword indicated by that data into the reference language and generates reference language extracted keyword data indicating the result. In the following description, as an example, the user's language is Japanese and the reference language is English. The conversion unit 2012 passes the generated reference language extracted keyword data to the index data generation unit 1014.

The index data generation unit 1014 generates index data based on the range data received from the specifying unit 1012 and the reference language extracted keyword data received from the conversion unit 2012, and stores it in the index DB 1024 of the storage unit 102. FIG. 10 illustrates the contents of the index DB 1024, which differs from the index DB 1024 of the first embodiment in that English keywords are stored in the “keyword” field.

In the second embodiment, in addition to generating index data for the keywords contained in annotated partial documents as described above, the index data generation unit 1014 also generates index data for the keywords contained in the notes themselves.

Specifically, the index data generation unit 1014 searches the notes indicated by the “content” field of the note DB 1022 for each of the keywords indicated by the keyword DB 1023. When a search succeeds, the index data generation unit 1014 passes keyword data indicating the keyword used as the search key to the conversion unit 2012 and receives from it reference language keyword data indicating the corresponding English keyword.

The index data generation unit 1014 then generates index data with the items “keyword”, which holds the reference language keyword data; “note number”, which indicates the number of the note containing the keyword; and “corresponding position”, which indicates the position in the document of the first character of the partial document to which the note is attached. It stores this index data in the note index DB 2021 of the storage unit 102. FIG. 11 illustrates the contents of the note index DB 2021.
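The generation of the note index can be sketched in Python as follows. This is an illustration under assumptions, not the embodiment itself: the function name `build_note_index`, the dictionary layouts, the sample notes, and the toy dictionary that stands in for the conversion unit 2012 are all hypothetical.

```python
def build_note_index(note_db, keyword_db, translate, note_targets):
    """Build note index records for every keyword found in a note.

    note_db:      note number -> recognized note text
    keyword_db:   candidate keywords in the user's language
    translate:    callable mapping a keyword to the reference language
    note_targets: note number -> position of the first character of the
                  partial document the note is attached to
    """
    index = []
    for number, content in note_db.items():
        for keyword in keyword_db:
            if keyword in content:  # the search within the note succeeded
                index.append({
                    "keyword": translate(keyword),
                    "note_number": number,
                    "corresponding_position": note_targets[number],
                })
    return index


# Hypothetical data; the dictionary lookup stands in for translation.
toy_dictionary = {"図面": "drawing", "優先権": "priority"}
note_db = {1: "図面の要否を確認", 2: "優先権の期限に注意"}
note_targets = {1: (3, 1, 1), 2: (5, 2, 1)}

note_index = build_note_index(
    note_db, list(toy_dictionary), toy_dictionary.__getitem__, note_targets
)
```

Each resulting record carries the English keyword, the note number, and the position of the annotated partial document, mirroring the three items described above.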

The tablet PC 20 includes a communication unit 205 that exchanges data with the server device 21 via the network 22. Using the communication unit 205, the control unit 101 transmits the document data 1021, the note DB 1022, the index DB 1024, and the note index DB 2021 to the server device 21 together with name data indicating the name of the document.

The server device 21 includes a control unit 211 that controls the components of the server device 21; a storage unit 212 that stores programs instructing the various processes of the control unit 211 as well as various data, and that serves as a work area for the control unit 211 and other components; and a communication unit 213 that exchanges data with each tablet PC 20 via the network 22.

The control unit 211 includes a registration unit 2111 that stores data received from the tablet PCs 20 in the storage unit 212. The registration unit 2111 adds document data 1021 received from a tablet PC 20 to the document data group 2121. The document data group 2121 is the collection of document data that the server device 21 has received from the tablet PCs 20, and the individual document data are distinguished from one another by, for example, their name data.

The registration unit 2111 also adds the data contained in the note DB 1022, the index DB 1024, and the note index DB 2021 received from a tablet PC 20 to the note DB 2122, the index DB 2123, and the note index DB 2124, respectively. FIGS. 12, 13, and 14 illustrate the contents of the note DB 2122, the index DB 2123, and the note index DB 2124, respectively. In addition to the items of the note DB 1022, the index DB 1024, and the note index DB 2021, each of these has a “document name” field that stores the name data.

With the document data group 2121 stored in the storage unit 212 and the data transmitted from the tablet PCs 20 added to the note DB 2122, the index DB 2123, and the note index DB 2124 as described above, a user can enter any keyword on a tablet PC 20 and display the partial documents containing that keyword together with the notes attached to them. Partial documents and notes from documents written in a language other than that of the entered keyword are displayed as well.

FIG. 15 illustrates the search screen displayed on the display unit 103. In addition to the objects on the search screen of the first embodiment, the search screen of the second embodiment includes option buttons 2031 for selecting either documents or notes as the search target.

When the user selects the option “document” with the option buttons 2031 and then presses the button 1032, the conversion unit 2012 generates reference language keyword data indicating the English translation of the entered keyword and transmits it to the server device 21 together with selection data indicating the option “document” selected by the user.

The control unit 211 of the server device 21 includes a search unit 2112 that performs in-document search with the index DB 2123 or the note index DB 2124, using the reference language keyword received from the tablet PC 20 as the search key. When it receives selection data indicating the option “document” from the tablet PC 20, the search unit 2112 transmits the result of a search using the index DB 2123 to the tablet PC 20.

Specifically, the search unit 2112 first searches the index data in the index DB 2123 for records whose “keyword” field matches the reference language keyword data received from the tablet PC 20. From the document data specified by the “document name” of each retrieved record, it obtains partial document data indicating the partial document specified by the record's “position”. Further, when the “corresponding note number” field of a retrieved record is not blank, the search unit 2112 searches the note data in the note DB 2122 for the note data whose “document name” and number match the record's “document name” and “corresponding note number”, and obtains the data in its “content” field.

Having thus obtained the partial document data indicating the partial documents that contain the keyword and, where a note is attached to a partial document, the data indicating that note, the search unit 2112 transmits them to the tablet PC 20 as search result data.

When the search result data received from the server device 21 includes partial documents or notes in a language other than Japanese, the tablet PC 20 translates them into Japanese with the conversion unit 2012. The tablet PC 20 then displays the list 1033 of the search screen (see FIG. 15) using the received search result data or its translation.

FIG. 16 illustrates the search screen when the user has selected the option “note” with the option buttons 2031. When the option “note” is selected and the button 1032 is pressed, the tablet PC 20 and the server device 21 operate in the same way as when the option “document” is selected, except that the search unit 2112 of the server device 21 uses the note index DB 2124 instead of the index DB 2123. A detailed description is therefore omitted.
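The way the search unit 2112 switches between the two indexes according to the selected option can be sketched as follows. This Python fragment is a simplified illustration: the function name `server_search`, the option strings, the record layouts, and the sample document names are assumptions, and the retrieval of partial documents and note contents is left out.

```python
def server_search(keyword, option, index_db, note_index_db):
    """Return the index records matching a reference language keyword,
    using the index that corresponds to the selected option."""
    if option == "document":
        table = index_db        # plays the role of the index DB 2123
    elif option == "note":
        table = note_index_db   # plays the role of the note index DB 2124
    else:
        raise ValueError(f"unknown option: {option!r}")
    return [record for record in table if record["keyword"] == keyword]


# Hypothetical records; each carries the "document name" field that the
# server-side databases add.
index_db = [
    {"document_name": "treaty_ja", "keyword": "drawing",
     "position": (3, 1, 13), "corresponding_note_number": 1},
]
note_index_db = [
    {"document_name": "treaty_en", "keyword": "drawing",
     "note_number": 2, "corresponding_position": (4, 1, 1)},
]

document_hits = server_search("drawing", "document", index_db, note_index_db)
note_hits = server_search("drawing", "note", index_db, note_index_db)
```

Keeping the option as the only difference between the two paths reflects the statement above that the remaining operation is identical in both cases.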

In the above description, the conversion unit 2012 is located in each tablet PC 20, but it may instead be located in the server device 21, in which case the server device 21 translates the keywords and search results.

Also, in the above description of index data generation, the search unit 2112 translates each keyword word by word to generate the reference language keyword data. Alternatively, the search unit 2112 may translate the sentence or paragraph containing the keyword and generate reference language keyword data indicating the keyword portion of the resulting translation. Compared with word-by-word translation, this selects a translation suited to the context and therefore yields more accurate index data.

Also, in the above description, index data usable across languages is generated by translating the keywords used for the index data into a reference language. Alternatively, without designating a reference language, index data containing keywords in each language may be generated in advance; when the user enters a Japanese keyword, for example, the search is performed using the Japanese index data.

Also, in the above description, the note data in the search result data transmitted from the server device 21 to the tablet PC 20 may contain a mixture of languages, which the conversion unit 2012 translates. The stage at which the note data is translated can, however, be changed freely. For example, the conversion unit 2012 of the tablet PC 20 may translate the text contained in the document data 1021, the note DB 1022, the index DB 1024, and the note index DB 2021 into multiple languages before transmitting them to the server device 21. The server device 21 can then store in the storage unit 212, in advance, a document data group 2121, note DB 2122, index DB 2123, and note index DB 2124 for each language, and can use those corresponding to the language of the user who issued the search request to transmit, as search result data, note data written only in that language.

Also, in the above description, the user uses a tablet PC 20 as the terminal device, but the terminal device may be, for example, an ordinary PC connected to a pointing device such as a mouse or a pen tablet separate from the display unit. In that case, the user adds notes to the document by writing with that pointing device.

Also, in the above description, the text data generated by the character recognition processing unit 2011 is used as the data indicating the content of a note, but the handwriting data of the note may be used instead of, or in addition to, the text data. This makes it possible, for example, to show in the search results even notes consisting of handwritten characters that are difficult to recognize or of figures that cannot be recognized by character recognition.

As described above, the search system 2 conveniently generates index data in an efficient manner from notes that users handwrite on documents. Because the keywords in the index data and the documents or notes in the search results are translated into other languages as needed, documents in various languages can be searched. Furthermore, because the search system 2 generates index data based on the keywords contained in notes, notes containing a specific keyword can be searched for. The partial document to which a note is attached is displayed together with the note, which is convenient when examining the content of the note.

[3. Third Embodiment]
At an international conference, for example, printed documents presenting the same content in several languages are often distributed to the participants, who then handwrite notes on the printed matter in whichever language is convenient for them. As a third embodiment of the present invention, the following describes a search device that associates with one another the notes handwritten on printed copies of a document that exists in different language versions, and thereby enables efficient search. To avoid repetition, only the differences from the first and second embodiments are described. In the drawings referred to below, components that are the same as or correspond to those of the first or second embodiment are given the same reference numerals.

FIG. 17 is a block diagram showing the configuration of the search system 3 according to the third embodiment. The search system 3 comprises a search device 30, which the user uses to generate index data and to search, and a scanner 31 connected to the search device 30. The scanner 31 optically reads characters, line drawings, images, and the like on paper, and generates and outputs image data representing them.

The storage unit 102 stores in advance a document data group 3021 containing document data for each language version of the document distributed to the participants of the international conference. The document data in the document data group 3021 are distinguished from one another by language data indicating the language of each version.

The storage unit 102 also stores an inter-document correspondence DB 3022 indicating the correspondence between partial documents in the different language versions. FIG. 18 illustrates the contents of the inter-document correspondence DB 3022. The inter-document correspondence DB 3022 stores inter-document correspondence data for each partial document constituting the document. Each record has an item for each language, such as “English” and “Japanese”, and each item stores position data indicating the position of the first character of the partial document in that language. If every language version were translated sentence by sentence, the correspondence between partial documents in different languages could be determined from the order of the sentences alone. In some cases, however, one sentence in one language is translated into several sentences in another. The inter-document correspondence DB 3022 allows the search device 30 to determine the correspondence between partial documents even in such cases.
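One possible way to use such a correspondence table is sketched below in Python. The description does not specify the lookup procedure, so everything here is an assumption: positions are modeled as (paragraph, sentence, character) tuples, the rows are assumed to be sorted in document order for every language, and a position is mapped to the row with the latest start position not after it. The function name `corresponding_start` and the sample rows are hypothetical.

```python
from bisect import bisect_right

# Hypothetical rows: the second English sentence corresponds to the second
# and third Japanese sentences, so the next row starts at Japanese sentence 4.
correspondence = [
    {"English": (1, 1, 1), "Japanese": (1, 1, 1)},
    {"English": (1, 2, 1), "Japanese": (1, 2, 1)},
    {"English": (1, 3, 1), "Japanese": (1, 4, 1)},
]


def corresponding_start(position, source, target, table):
    """Map a position in the source-language version to the start of the
    corresponding partial document in the target-language version."""
    starts = [row[source] for row in table]
    # Find the last row whose start is at or before the given position.
    row_index = bisect_right(starts, position) - 1
    if row_index < 0:
        raise ValueError("position precedes the first partial document")
    return table[row_index][target]


# A note written in the third Japanese sentence maps to English sentence 2.
english_start = corresponding_start((1, 3, 5), "Japanese", "English", correspondence)
```

Because the lookup works on start positions rather than sentence counts, it still resolves correctly when one sentence is rendered as several sentences in another language.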

A user of the search system 3, for example the organizer of the international conference, collects the distributed printed matter from the participants and reads its contents with the scanner 31. The image data generated by the scanner 31 is passed to the search device 30. The control unit 101 includes an image data separation unit 3011 that, from image data in which printed and handwritten characters are mixed, separately generates image data for each kind of character. The image data separation unit 3011 applies, for example, filter processing to the image data received from the scanner 31 to generate image data representing the document portion originally printed on the paper and image data representing the note portion handwritten by the participant.

Various existing techniques can be used for the filter that separates the image data of printed characters from that of handwritten characters. For example, the filter may extract as printed characters those line figures in which the ratio of straight-line components to non-straight-line components is equal to or greater than a predetermined threshold, and extract as handwritten characters those in which the ratio is below the threshold. Alternatively, the filter may extract as printed characters those line figures whose density variance is below a predetermined threshold, and extract as handwritten characters those whose density variance is equal to or greater than the threshold.
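
The two threshold rules above can be sketched as follows. This is a minimal illustration that assumes the per-figure features (straight and non-straight component counts, pixel density samples) have already been measured; the function names and the default threshold values are hypothetical and are not taken from the specification.

```python
from statistics import pvariance


def classify_by_line_ratio(straight: float, non_straight: float,
                           threshold: float = 2.0) -> str:
    """Printed if the straight/non-straight ratio is at or above the threshold."""
    if non_straight == 0:
        # Only straight components: treat the ratio as unbounded (assumption).
        return "printed"
    return "printed" if straight / non_straight >= threshold else "handwritten"


def classify_by_density_variance(densities: list, threshold: float = 0.01) -> str:
    """Printed if the variance of the stroke density is below the threshold."""
    return "printed" if pvariance(densities) < threshold else "handwritten"


print(classify_by_line_ratio(8, 2))                          # ratio 4.0
print(classify_by_density_variance([0.2, 0.9, 0.5, 0.7]))    # uneven density
```

In practice the thresholds would have to be tuned for the scanner and the typeface in use.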

Having generated image data for the document portion and for the note portion as described above, the image data separation unit 3011 passes the image data to the character recognition processing unit 2011. The character recognition processing unit 2011 recognizes the characters represented by each piece of image data and generates text data representing the recognition result. In doing so, it also recognizes which language the characters belong to. The character recognition processing unit 2011 associates the generated text data with position data indicating the on-screen position of the image corresponding to that text data, and passes both to the specifying unit 1012 together with language data indicating the recognized language.

The specifying unit 1012 identifies the part of the document-portion text data that corresponds to the received note-portion text data, based on, for example, the vertical positional relationship between the document portion and the note portion on the screen. The part of the text data identified in this way is partial document data representing the partial document to which the note is attached. Next, the specifying unit 1012 searches the document data identified by the received language data for a portion that matches the identified partial document data. The specifying unit 1012 associates position data, which indicates the position within the document of the first character of the partial document found by this search, with the text data of the note portion, and stores the result as note data in the note DB of the storage unit 102.
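
The association step above can be sketched as follows. This is a simplified illustration, assuming each recognized text fragment carries a single vertical coordinate `y` and that the nearest document line is the annotated one; the dictionary layout and the function names are hypothetical. The `position` and `content` fields mirror the "position" and "content" columns of the note data.

```python
from typing import Optional


def nearest_line(doc_lines: list, note_y: float) -> dict:
    """Pick the recognized document line closest to the note vertically."""
    return min(doc_lines, key=lambda line: abs(line["y"] - note_y))


def make_note_record(document_text: str, doc_lines: list, note: dict) -> Optional[dict]:
    """Build a note record: start offset of the annotated text plus note content."""
    line = nearest_line(doc_lines, note["y"])
    position = document_text.find(line["text"])  # search the stored document data
    if position < 0:
        return None  # recognized text not found in the stored document
    return {"position": position, "content": note["text"]}


document_text = "Index data is generated. Notes are stored per language."
doc_lines = [
    {"y": 100, "text": "Index data is generated."},
    {"y": 130, "text": "Notes are stored per language."},
]
note = {"y": 128, "text": "check this"}
print(make_note_record(document_text, doc_lines, note))
```

Searching the stored document data for the recognized fragment, rather than trusting the scan coordinates, is what lets the note be tied to a character position in the document itself.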

The storage unit 102 stores a note DB, a keyword DB, and an index DB for each language. In FIG. 17, the set of note DBs is shown as a note DB group 3023, the set of keyword DBs as a keyword DB group 3024, and the set of index DBs as an index DB group 3025. For example, when the document read by the scanner 31 is in Japanese, the specifying unit 1012 stores the note data in the note DB corresponding to Japanese. The structure of the note DB stored in the storage unit 102 of the search device 30 is the same as that of the note DB 1022 in the first embodiment (see FIG. 3).

Once the specifying unit 1012 has stored the note data in the note DB as described above, the first search unit 1013 performs keyword extraction, and the index data generation unit 1014 generates index data and stores it in the index DB, in the same manner as in the first embodiment. In the third embodiment, however, the first search unit 1013 and the index data generation unit 1014 select and use the document data, note DB, keyword DB, and index DB corresponding to the language recognized by the character recognition processing unit 2011. The structure of the index DB stored in the storage unit 102 of the search device 30 is the same as that of the index DB 1024 in the first embodiment (see FIG. 5).

Once the storage unit 102 holds, for each of the languages, a note DB containing note data for the notes added by multiple readers and an index DB containing index data for the keywords included in the partial documents to which those notes are attached, the user can enter a keyword and display the partial documents containing that keyword together with the notes attached to them. FIG. 19 illustrates the search screen displayed on the display unit 103 of the search device 30.

In addition to the objects included in the search screen of the first embodiment (see FIG. 6), the search screen includes a box 3031 with which the user selects the language used for keyword entry and result display. In the following, the user is assumed to have entered "Japanese" in the box 3031. When "Japanese" is entered in the box 3031, the pull-down menu of the box 1031 shows the data in the "keyword" column of the index DB corresponding to Japanese.

Next, when the user enters a keyword in the box 1031 and presses the button 1032, the second search unit 1015 searches the index DB corresponding to Japanese for index data containing the entered keyword.

The second search unit 1015 then searches the inter-document correspondence DB 3022 for inter-document correspondence data whose "Japanese" column holds data matching the "position" column of the retrieved index data. It extracts the position data for each language from the retrieved inter-document correspondence data, and searches the note DB corresponding to each language for note data containing data that matches the extracted position data.

The second search unit 1015 passes the text data in the "content" column of the retrieved note data to the conversion unit 2012, and receives from the conversion unit 2012 text data representing the Japanese translation of the note, together with language data indicating the language of the note before translation. Using the partial document data identified by the "position" column of the Japanese note data, along with the translated text data and language data received from the conversion unit 2012, the second search unit 1015 displays the search results in the list 1033 of FIG. 19.
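
The lookup chain described in the preceding paragraphs (index DB, then inter-document correspondence DB, then the note DB of each language, then translation) can be sketched as follows. The dictionary-based DB layouts, the function name `search_notes`, and the `translate` callback standing in for the conversion unit 2012 are assumptions made for illustration only.

```python
def search_notes(keyword, ui_lang, index_dbs, correspondence_db, note_dbs, translate):
    """Collect notes in every language for partial documents containing `keyword`."""
    results = []
    # 1. Index DB of the selected language: keyword -> partial document positions.
    for position in index_dbs[ui_lang].get(keyword, []):
        # 2. Correspondence record whose item for the selected language matches.
        for record in correspondence_db:
            if record.get(ui_lang) != position:
                continue
            notes = []
            # 3. Look up the note DB of each language at the corresponding position.
            for lang, lang_pos in record.items():
                for note in note_dbs.get(lang, []):
                    if note["position"] == lang_pos:
                        # 4. Translate into the selected language, keeping the
                        #    original language for display.
                        notes.append((lang, translate(note["content"], lang, ui_lang)))
            results.append({"position": position, "notes": notes})
    return results


def fake_translate(text, src, dst):
    # Hypothetical stand-in for the conversion unit; it only tags the text.
    return text if src == dst else f"[{src}->{dst}] {text}"


index_dbs = {"Japanese": {"索引": [31]}}
correspondence_db = [{"English": 58, "Japanese": 31}]
note_dbs = {
    "English": [{"position": 58, "content": "important"}],
    "Japanese": [{"position": 31, "content": "要確認"}],
}
print(search_notes("索引", "Japanese", index_dbs, correspondence_db, note_dbs, fake_translate))
```

A single keyword entry in the selected language thus yields the notes from every language version without the user repeating the search per language.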

The above description assumed that the language of the document printed on the material, and the page of the document to which the material corresponds, are identified from the text data obtained by performing character recognition on the image data of the scanned material. Alternatively, a symbol indicating the language and a page number may, for example, be printed on the material in advance, and the character recognition processing unit 2011 may recognize them in order to identify the language and the page.

The above description also assumed that the image data representing the document portion and the image data representing the note portion are separated by filtering the image data that the search device 30 receives from the scanner 31, but the method of separating the two is not limited to filtering. For example, because the search device 30 stores the document data, it can match the image data received from the scanner 31 against the display data of each page of each document and identify which page of which document the received image data corresponds to. The search device 30 may then use the display data of the identified page as the image data representing the document portion, and use the difference between the image data received from the scanner 31 and the display data of the identified page as the image data representing the note portion. In that case, the search device 30 does not need to perform character recognition on the image data representing the document portion and can use the corresponding page of the document data directly.
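
The difference-based separation above can be sketched with binary pixel grids. This is a toy illustration under strong assumptions: the scan and the rendered pages are already aligned and binarized (1 = ink), and page matching uses a simple count of differing pixels. Both function names are hypothetical, and a real system would need registration and noise handling.

```python
def best_matching_page(scanned: list, pages: list) -> int:
    """Return the index of the rendered page with the fewest differing pixels."""
    def distance(page):
        return sum(s != p
                   for s_row, p_row in zip(scanned, page)
                   for s, p in zip(s_row, p_row))
    return min(range(len(pages)), key=lambda i: distance(pages[i]))


def extract_note_pixels(scanned: list, page: list) -> list:
    """Keep only ink present in the scan but absent from the rendered page."""
    return [[1 if s and not p else 0 for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(scanned, page)]


pages = [
    [[1, 1, 0], [0, 0, 0]],   # rendered page 0
    [[0, 0, 0], [1, 1, 1]],   # rendered page 1
]
scanned = [[1, 1, 0], [0, 1, 0]]  # page 0 plus one handwritten pixel
index = best_matching_page(scanned, pages)
print(index, extract_note_pixels(scanned, pages[index]))
```

Because the document portion is taken from the stored page rather than from the scan, only the extracted note pixels need to go through character recognition.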

The above description further assumed that the search results do not distinguish between notes written by different participants on the document in the same language. Alternatively, the ID, name, or similar information of the participant receiving each printed material may, for example, be printed on it in advance. The character recognition processing unit 2011 then recognizes that information, and it is displayed in the search results so that the user can identify the participant who wrote each note.

As described above, when versions of the same document exist in different languages, the search system 3 conveniently lets the user obtain search results that include the notes written in multiple languages, without repeating the search operation for each language.

[4. Modifications]
The present invention is not limited to the embodiments described above, and various modifications are possible. For example, a device or system that produces similar results may be built by freely combining separable technical elements of the first to third embodiments. For example, a plurality of search devices according to the first embodiment may be interconnected via a network, as in the second embodiment, so that index data can be shared.

The arrangement of the components of each device shown in the embodiments above can also be changed freely. For example, the search system according to the third embodiment may be implemented by building the scanner into the search device, or by connecting to the search device a translation device having a function similar to that of the conversion unit instead of providing the conversion unit in the search device.

FIG. 1 is a block diagram showing the configuration of the search device according to the first embodiment.
FIG. 2 illustrates how a document and notes are displayed in the first embodiment.
FIG. 3 illustrates the contents of the note DB according to the first embodiment.
FIG. 4 illustrates the contents of the keyword DB according to the first embodiment.
FIG. 5 illustrates the contents of the index DB according to the first embodiment.
FIG. 6 illustrates the search screen according to the first embodiment.
FIG. 7 is a block diagram showing the configuration of the search system according to the second embodiment.
FIG. 8 illustrates how a document and notes are displayed in the second embodiment.
FIG. 9 illustrates the contents of the note DB according to the second embodiment.
FIG. 10 illustrates the contents of the index DB according to the second embodiment.
FIG. 11 illustrates the contents of the note index DB according to the second embodiment.
FIG. 12 illustrates the contents of the note DB according to the second embodiment.
FIG. 13 illustrates the contents of the index DB according to the second embodiment.
FIG. 14 illustrates the contents of the note index DB according to the second embodiment.
FIG. 15 illustrates the search screen according to the second embodiment.
FIG. 16 illustrates the search screen according to the second embodiment.
FIG. 17 is a block diagram showing the configuration of the search system according to the third embodiment.
FIG. 18 illustrates the contents of the inter-document correspondence DB according to the third embodiment.
FIG. 19 illustrates the search screen according to the third embodiment.

Explanation of symbols

2, 3: search system; 10, 30: search device; 20: tablet PC; 21: server device; 22: network; 31: scanner; 101, 211: control unit; 102, 212: storage unit; 103: display unit; 104: operation unit; 205, 213: communication unit; 1011: note addition unit; 1012: specifying unit; 1013: first search unit; 1014: index data generation unit; 1015: second search unit; 1021: document data; 1022, 2122: note DB; 1023: keyword DB; 1024, 2123: index DB; 2011: character recognition processing unit; 2012: conversion unit; 2021, 2124: note index DB; 2111: registration unit; 2112: search unit; 2121, 3021: document data group; 3011: image data separation unit; 3022: inter-document correspondence DB; 3023: note DB group; 3024: keyword DB group; 3025: index DB group.

Claims

An index data generation device comprising:
storage means for storing document data indicating one document, and one or more pieces of note data each indicating a note associated with any of the partial documents constituting the one document; and
index data generation means for generating index data indicating a position, in the one document, of a phrase that matches one phrase included in the partial document associated with the note indicated by any of the one or more pieces of note data.

The index data generation device according to claim 1, further comprising specifying means for specifying, from the one document, a partial document to which one note is attached,
wherein the storage means stores note data indicating the one note as note data associated with the partial document specified by the specifying means.

The index data generation device wherein the specifying means performs the specification based on the spatial positional relationship between the document indicated by the document data and the note indicated by the note data when both are displayed on the same plane.

Conversion means for converting a phrase expressed in one language into a translation expressed in another language having the same meaning as the phrase;
The index data generation device according to claim 1, wherein the index data generation unit associates a translated word obtained from the one word / phrase by the conversion unit with the search data.

The storage means stores in advance keyword data indicating a keyword,
The index data generation device according to claim 1, wherein the index data generation unit uses the keyword as the one phrase.

The index data generation device according to claim 1, wherein the index data generation means generates index data indicating a position, in the note indicated by any of the one or more pieces of note data, of a phrase that matches one phrase included in the note indicated by any of the one or more pieces of note data.

Display means for displaying a document indicated by the document data;
Handwriting data generating means for generating handwriting data indicating handwriting according to a user's writing operation,
The index data generation device according to claim 1, wherein the storage unit stores the handwriting data generated by the handwriting data generation unit as the annotation data.

Display means for displaying a document indicated by the document data;
Handwriting data generating means for generating handwriting data indicating handwriting according to a user's writing operation;
Character recognition processing means for performing character recognition processing on the handwriting data generated by the handwriting data generation means, and generating text data indicating the recognized character string, and
The index data generation apparatus according to claim 1, wherein the storage unit stores the text data generated by the character recognition processing unit as the annotation data.

The index data generation device according to claim 1, further comprising:
image reading means for optically reading an image from an original that contains a document written in printed type and one or more notes handwritten on the document, and generating image data indicating the read image;
image data separation means for generating, from the image data generated by the image reading means, image data indicating the document written in printed type and one or more pieces of image data each indicating one of the one or more handwritten notes; and
character recognition processing means for performing character recognition processing on the image data indicating the document generated by the image data separation means, and generating text data indicating the content of the one document,
wherein the storage means stores the text data indicating the content of the one document generated by the character recognition processing means as the document data, and stores the one or more pieces of image data each indicating one of the handwritten notes generated by the image data separation means as the one or more pieces of note data.

The index data generation device according to claim 1, further comprising:
image reading means for optically reading an image from an original that contains a document written in printed type and one or more notes handwritten on the document, and generating image data indicating the read image;
image data separation means for generating, from the image data generated by the image reading means, image data indicating the document written in printed type and one or more pieces of image data each indicating one of the one or more handwritten notes; and
character recognition processing means for performing character recognition processing on the image data indicating the document generated by the image data separation means to generate text data indicating the content of the one document, and for performing character recognition processing on each of the one or more pieces of image data indicating the notes generated by the image data separation means to generate one or more pieces of text data indicating the content of the one or more notes,
wherein the storage means stores the text data indicating the content of the one document generated by the character recognition processing means as the document data, and stores the one or more pieces of text data indicating the content of the one or more notes generated by the character recognition processing means as the one or more pieces of note data.

A data search apparatus comprising:
storage means for storing document data indicating one document, one or more pieces of note data each indicating a note associated with any of the partial documents constituting the one document, and index data indicating, for each of one or more keywords, a position in the one document of a phrase that matches the keyword;
input means for receiving keyword data indicating one keyword;
search means for searching, based on the index data, the plurality of partial documents constituting the one document for one or more partial documents including the one keyword; and
output means for outputting note data indicating a note associated with the partial document found by the search means.

Conversion means for converting a phrase expressed in one language into a translation expressed in another language having the same meaning as the phrase;
The data search apparatus according to claim 11, wherein the search means performs the search using a translation obtained from the one keyword by the conversion means instead of the one keyword.

Storage means for storing first document data indicating a first document expressed in one language, one or more pieces of first note data each indicating a note associated with any of the partial documents constituting the first document, index data indicating, for each of one or more keywords, a position in the first document of a phrase that matches the keyword, second document data indicating a second document that is expressed in another language and has the same meaning as the first document, and one or more pieces of second note data each indicating a note associated with any of the partial documents constituting the second document;
Input means for receiving keyword data indicating one keyword;
Search means for searching, based on the index data, the plurality of partial documents constituting the first document for one or more partial documents including the one keyword; and
Output means for outputting note data indicating a note associated with the partial document found by the search means, and note data indicating a note associated with the partial document that constitutes the second document and corresponds to the partial document found by the search means.

A program for causing a computer to execute:
processing for storing document data indicating one document and one or more pieces of note data each indicating a note associated with any of the partial documents constituting the one document; and
processing for generating index data indicating a position, in the one document, of a phrase that matches one phrase included in the partial document associated with the note indicated by any of the one or more pieces of note data.

Document data indicating one document, one or more note data each indicating a note associated with any partial document constituting the one document, and a phrase that matches the keyword with respect to each of the one or more keywords Storing index data indicating a position included in the one document;
Processing to receive keyword data indicating one keyword,
A process of searching for one or more partial documents including the one keyword from a plurality of partial documents constituting the one document based on the index data;
A program for causing a computer to execute processing for outputting note data indicating a note associated with a partial document searched in the search.