JP2006252167A - Document processing device

Info

Publication number: JP2006252167A
Application number: JP2005067541A
Authority: JP
Inventors: Masayoshi Sakakibara; 正義榊原; Shoichi Tateno; 昌一舘野; Kei Tanaka; 圭田中; Kotaro Nakamura; 浩太郎中村; Takashi Nagao; 隆長尾; Shinu Ho; 新宇彭; Teruka Saito; 照花斎藤; Toshiya Koyama; 俊哉小山
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-03-10
Filing date: 2005-03-10
Publication date: 2006-09-21

Abstract

PROBLEM TO BE SOLVED: To provide a mechanism that enables good reading comprehension of a document without forcing the reader to perform tedious dictionary lookups.

SOLUTION: When an annotation is added to a word or phrase appearing in a Chinese document and the document is scanned, an explanation of the pronunciation, the meaning, or the grammar/syntax of the annotated word is presented. Each annotation pattern is associated in advance with pronunciation, meaning, or grammar/syntax, and the type of explanation to present is uniquely determined by the annotation pattern obtained by analyzing the document.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a technique for presenting explanations of words and phrases contained in a document.

Various techniques have been proposed to support efficient reading of documents written in Japanese or in a foreign language. For example, Patent Document 1 discloses a reading support device that quickly assists the reading of a foreign-language document without forcing the user to perform complicated operations. Patent Document 2 discloses a technique that converts a paper document into a digital document with embedded hypertext, thereby establishing continuity and relevance of information between the digital world of computers and the paper document. Patent Document 3 discloses a technique that, when a user reads text on a computer monitor, automatically looks up and presents words and idioms the user is unlikely to know, according to the user's reading level.
Patent Document 1: JP-A-8-095978
Patent Document 2: JP-A-10-283364
Patent Document 3: JP 2000-089882 A

Reading a document, particularly one written in a foreign language, generally proceeds as follows: the reader goes through the document once while marking the words that need to be looked up, then looks up the marked words in a dictionary, and finally rereads the document. When reading this way, it is more efficient to vary the style of the mark according to what needs to be looked up later, for example a straight line where the pronunciation of a word is unknown and a wavy line where its meaning is unknown.
However, reading a document by this procedure has the problem that looking up the marked words in a dictionary is tedious.
The present invention has been devised against this background, and its object is to provide a mechanism that enables good reading comprehension of a document without forcing the reader to perform tedious dictionary lookups.

A document processing apparatus according to a preferred aspect of the present invention comprises: explanation information storage means that stores, for each word or phrase that may appear in a document, a per-phrase explanation information group collecting pairs that each consist of one of several kinds of explanation information about that phrase and a pattern of annotation style; input means for inputting a bitmap representing a document that includes one or more annotated phrases; extraction means for extracting an annotation from the document represented by the bitmap when the bitmap is input from the input means; target phrase specifying means for specifying, from the bitmap, the phrase to which the extracted annotation has been added; explanation information group specifying means for specifying the per-phrase explanation information group stored in the explanation information storage means in association with the specified phrase; explanation acquisition means for analyzing the extracted annotation to identify the annotation style pattern to which it corresponds, and acquiring, from the specified per-phrase explanation information group, the explanation information paired with the identified pattern; and explanation output means for outputting the acquired explanation information.

In this aspect, the target phrase specifying means may cut out the image on the bitmap surrounding the extracted annotation and perform character recognition on the cut-out image, thereby specifying the annotated phrase as a character or character string.

The apparatus may further comprise annotation erasing means for erasing annotations from the input bitmap.

The apparatus may further comprise legend generation means that generates an explanation legend image showing the relationship between the annotation style patterns associated with the various kinds of explanation information in the explanation information storage means and the explanation type corresponding to each pattern, and that overwrites the generated legend image at a predetermined drawing position in the new bitmap obtained by erasing the annotation images.

A document processing apparatus according to another preferred aspect of the present invention comprises: explanation information storage means that stores, for each word or phrase that may appear in a document, a per-phrase explanation information group collecting pairs that each consist of one of several kinds of explanation information about that phrase and a pattern of annotation style; input means for inputting a character code string of a document; display means for displaying the input character code string as a character string representing the document; annotation adding means for adding an annotation to part of the displayed character string; extraction means for extracting the added annotation; target phrase specifying means for specifying, as a phrase, the character or group of characters to which the extracted annotation has been added; explanation information group specifying means for specifying the per-phrase explanation information group stored in the explanation information storage means in association with the specified phrase; explanation acquisition means for analyzing the extracted annotation to identify the annotation style pattern to which it corresponds, and acquiring, from the specified per-phrase explanation information group, the explanation information paired with the identified pattern; and explanation output means for outputting the acquired explanation information.

According to the present invention, good reading comprehension of a document can be achieved without forcing the reader to perform tedious dictionary lookups.

(First embodiment)
A first embodiment of the present invention will be described.
This embodiment has two features. The first is that when an annotation is added to a phrase appearing in a Chinese document to be read and the document is scanned, an explanation giving the "reading", the "meaning", or the "grammar/syntax" of the annotated phrase is presented immediately. The second is that each pattern of annotation style is associated in advance with "reading", "meaning", or "grammar/syntax", so that the type of explanation to present is uniquely determined by the annotation style found by analyzing the document.
In the following description, "explanation information" means a piece of information giving exactly one of the "reading", the "meaning", or the "grammar/syntax" of a given phrase.

FIG. 1 is a block diagram showing the hardware configuration of the document processing apparatus according to this embodiment. As shown in the figure, the apparatus includes an explanation information storage unit 10, a document image input unit 11, an annotation extraction unit 12, a target phrase specifying unit 13, a character recognition unit 14, an explanation acquisition unit 15, and an explanation output unit 16.
The explanation information storage unit 10 holds two databases: an explanation database 10a and an annotation pattern database 10b.
FIG. 2 is a data structure diagram of the explanation database 10a. This database is a collection of records, each having four fields: "phrase", "reading", "meaning", and "grammar/syntax". The "phrase" field stores the character codes of a phrase that may appear in a document. The other three fields store the explanation information: the "reading" field holds the explanation of how the phrase is read, the "meaning" field holds the explanation of what the phrase means, and the "grammar/syntax" field holds the explanation of its grammar and syntax.
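The record layout of the explanation database 10a described for FIG. 2 can be sketched as follows. This is a minimal illustration only: the names `ExplanationRecord`, `EXPLANATION_DB`, and `find_record` are hypothetical, an in-memory dictionary stands in for whatever storage the apparatus actually uses, and the sample entry is illustrative content that is not taken from the patent figures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExplanationRecord:
    """One record of the explanation database 10a (four fields, as in FIG. 2)."""
    phrase: str   # "phrase" field: the word as character codes
    reading: str  # "reading" field
    meaning: str  # "meaning" field
    grammar: str  # "grammar/syntax" field


# Hypothetical sample content for illustration, keyed by the "phrase" field.
EXPLANATION_DB: Dict[str, ExplanationRecord] = {
    "越": ExplanationRecord(
        phrase="越",
        reading="yuè (fourth tone)",
        meaning="to exceed; to cross over",
        grammar="越[A]越[B]: the more [A], the more [B]",
    ),
}


def find_record(phrase: str) -> Optional[ExplanationRecord]:
    """Return the record whose "phrase" field matches, or None if absent."""
    return EXPLANATION_DB.get(phrase)
```

Keying the collection by phrase makes the later lookup a single dictionary access; the description does not say how the real database is indexed.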

FIG. 3 is a data structure diagram of the annotation pattern database 10b. This database is a collection of three records, each having two fields: "explanation type" and "annotation style". The "explanation type" field of each record stores an explanation type identifier, which denotes one of the three kinds of explanation: "reading", "meaning", or "grammar/syntax". The "annotation style" field stores an annotation style identifier, which denotes a pattern of how an annotation is drawn. There are three such patterns: "triangle", in which the target phrase is enclosed in a triangle; "circle", in which the target phrase is enclosed in a circle; and "underline", in which a line is drawn under the target phrase. As FIG. 3 shows, in this database the "reading" explanation is associated with the "triangle" pattern, the "meaning" explanation with the "circle" pattern, and the "grammar/syntax" explanation with the "underline" pattern.
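The three records of the annotation pattern database 10b can be written down directly. The sketch below assumes a plain in-memory list; the names `ExplanationKind`, `AnnotationStyle`, `PATTERN_DB`, and `kind_for_style` are hypothetical, while the three pairings themselves come from the description of FIG. 3.

```python
from enum import Enum
from typing import List, Tuple


class ExplanationKind(Enum):
    """Explanation type identifiers."""
    READING = "reading"
    MEANING = "meaning"
    GRAMMAR = "grammar/syntax"


class AnnotationStyle(Enum):
    """Annotation style identifiers."""
    TRIANGLE = "triangle"
    CIRCLE = "circle"
    UNDERLINE = "underline"


# The three records of database 10b: (explanation type, annotation style).
PATTERN_DB: List[Tuple[ExplanationKind, AnnotationStyle]] = [
    (ExplanationKind.READING, AnnotationStyle.TRIANGLE),
    (ExplanationKind.MEANING, AnnotationStyle.CIRCLE),
    (ExplanationKind.GRAMMAR, AnnotationStyle.UNDERLINE),
]


def kind_for_style(style: AnnotationStyle) -> ExplanationKind:
    """Look up the explanation type paired with an annotation style."""
    for kind, recorded_style in PATTERN_DB:
        if recorded_style is style:
            return kind
    raise KeyError(style)
```

Because each style appears in exactly one record, the lookup always yields a single explanation type, which is the "uniquely determined" property the embodiment relies on.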

Next, the function of each unit shown in FIG. 1 is outlined. Document image data is input through the document image input unit 11. This document image data is bitmap data representing a document that includes one or more annotated phrases. The annotation extraction unit 12 extracts annotations from the document image data. The target phrase specifying unit 13 locates, in the document image data, the image of the phrase to which an annotation extracted by the annotation extraction unit 12 has been added. The character recognition unit 14 converts the phrase located by the target phrase specifying unit 13 into character codes. The explanation acquisition unit 15 acquires, from the explanation information storage unit 10, the explanation information that corresponds to both the annotation style pattern matched by the extracted annotation and the phrase converted into character codes by the character recognition unit 14. The explanation output unit 16 prints the acquired explanation information on a sheet of paper.

FIG. 4 is a flowchart showing the operation of the apparatus.
In step 100, document image data is input from the document image input unit 11. The input document image data is stored in a bitmap memory (not shown).
Once the document image data has been stored in the bitmap memory, the annotation extraction unit 12 extracts one of the annotations drawn on the document image data (S110).

When an annotation has been extracted, the target phrase specifying unit 13 cuts out the image of the annotated phrase from the document image data in the bitmap memory (S120). The character recognition unit 14 then attempts character recognition on the image cut out in step 120 and stores the character codes of the recognized phrase in a recognition result memory (not shown) (S130).
Once the character codes of the phrase have been stored in the recognition result memory, the explanation acquisition unit 15 analyzes the annotation extracted in step 110 and identifies which of the patterns "triangle", "circle", or "underline" its style matches (S140).
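The description does not say how step 140 tells the three patterns apart. One plausible heuristic, offered purely as an assumption, is to look at the outline of the extracted mark: a mark that is much wider than it is tall is treated as an underline, and otherwise the ratio of the enclosed area to the bounding-box area separates a triangle (about 0.5) from a circle or ellipse (about 0.785). The function name `classify_annotation` and the thresholds 0.2 and 0.65 are hypothetical choices, not values from the patent.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def classify_annotation(outline: List[Point]) -> str:
    """Classify a mark outline as "underline", "triangle", or "circle".

    `outline` is an ordered list of points along the mark (assumed input
    format). The thresholds below are illustrative assumptions.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    # A flat, wide mark is taken to be a line drawn under the phrase.
    if height <= 0.2 * width:
        return "underline"
    if width == 0:
        raise ValueError("degenerate outline: zero width")

    # Shoelace formula for the area enclosed by the outline.
    twice_area = 0.0
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    area = abs(twice_area) / 2.0

    # Triangle fills about half of its bounding box; an ellipse about pi/4.
    fill = area / (width * height)
    return "triangle" if fill < 0.65 else "circle"
```

Hand-drawn marks are rarely this clean, so a real implementation would likely need smoothing and a tolerance for open or overlapping strokes.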

Next, the explanation acquisition unit 15 reads from the annotation pattern database 10b the explanation type identifier associated with the pattern identified in step 140 (S150).
The explanation acquisition unit 15 then finds, in the explanation database 10a, the record corresponding to the phrase represented by the character codes stored in the recognition result memory and, of the three kinds of explanation information stored in that record, acquires the one corresponding to the type identifier read in step 150 (S160).

The explanation acquisition unit 15 stores the explanation information acquired in step 160 in a page memory (not shown) (S170).
After step 170, it is determined whether any annotations that have not yet been extracted remain in the document image data in the bitmap memory. If unextracted annotations remain, the process returns to step 110 and the subsequent steps are repeated. If all annotations have been extracted, the process proceeds to step 180, in which the explanation output unit 16 superimposes the explanation information stored in the page memory on the document image data in the bitmap memory and prints the resulting document with explanations on paper.
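The loop of FIG. 4 from step 110 to step 170 can be sketched as below. Image cropping, character recognition, and pattern identification are passed in as callables because the description does not specify them; the function name `collect_explanations`, the dictionary-based database, and the decision to skip phrases that have no record are all assumptions made for this sketch.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Pairings of FIG. 3: annotation style -> explanation type.
STYLE_TO_KIND = {"triangle": "reading", "circle": "meaning", "underline": "grammar"}


def collect_explanations(
    annotations: Iterable[object],
    crop_and_recognize: Callable[[object], str],
    classify: Callable[[object], str],
    explanation_db: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str, str]]:
    """Run S110 to S170 for every annotation and return the page memory."""
    page_memory: List[Tuple[str, str, str]] = []
    for annotation in annotations:               # S110, repeated until none remain
        phrase = crop_and_recognize(annotation)  # S120 and S130
        style = classify(annotation)             # S140
        kind = STYLE_TO_KIND[style]              # S150
        record = explanation_db.get(phrase)      # S160: find the record
        if record is None:
            # Assumption: the description does not cover unknown phrases.
            continue
        page_memory.append((phrase, kind, record[kind]))  # S170
    return page_memory  # S180 would overlay these on the page and print
```

A caller would supply real image-processing functions for the two callables; in a test they can be simple stand-ins that read precomputed values.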

The relationship between the Chinese document image to be read and the document with explanations is now described using a concrete example.
FIG. 5(A) shows a Chinese document image to be read, and FIG. 5(B) shows the document with explanations obtained after processing by the apparatus.
In FIG. 5(A), annotation 21 has been added to a phrase in the top line, annotation 22 to a phrase in the second line, and annotation 23 to a phrase in the third line. In FIG. 5(B), the phrase with annotation 21 is associated with explanation information giving the reading "nan (second tone)", the phrase with annotation 22 with explanation information giving the meaning "to think that ...", and the phrase with annotation 23 with explanation information giving the syntax: "The construction 越[A]越[B] is used to express 'the more [A], the more [B]'." This means that annotation 21 was judged to match the "triangle" pattern, annotation 22 the "circle" pattern, and annotation 23 the "underline" pattern.

In the embodiment described above, when an annotation is added to a phrase appearing in a Chinese document to be read and the document is scanned, an explanation giving the "reading", the "meaning", or the "grammar/syntax" is presented immediately, according to the pattern of the annotation style. A user reading the document can therefore obtain the desired explanations as needed while continuing to read.

(Second embodiment)
A second embodiment of the present invention will be described.
In the first embodiment, the Chinese document to be read is input as a bitmap, and phrases are specified by recognizing the characters on that bitmap. In this embodiment, by contrast, the document to be read is input as a character code string, so no character recognition is performed.

FIG. 6 is a block diagram showing the hardware configuration of the document processing apparatus according to this embodiment. As shown in the figure, the apparatus includes an explanation information storage unit 10, a document data input unit 17, a display unit 18, an annotation adding unit 19, an annotation extraction unit 12, a target phrase specifying unit 13, an explanation acquisition unit 15, and an explanation output unit 16. Unlike the first embodiment, it has no character recognition unit 14.
The explanation information storage unit 10 holds the explanation database 10a and the annotation pattern database 10b, and the data structures of both databases are the same as in the first embodiment.

Next, the function of each unit shown in FIG. 6 is outlined. Document data is input through the document data input unit 17. This document data represents a Chinese document as a character code string. The display unit 18 is a touch display that serves as both display device and input device, and it displays the document data input through the document data input unit 17 as a character string representing the document. The annotation adding unit 19 is a stylus pen, used to add an annotation to any character or group of characters on the display. The annotation extraction unit 12 extracts the annotation added on the display unit 18. The target phrase specifying unit 13 specifies, as a phrase, the character codes of the character or group of characters to which the annotation was added. The explanation acquisition unit 15 acquires, from the explanation information storage unit 10, the explanation information that corresponds to both the annotation style pattern matched by the extracted annotation and the phrase specified by the target phrase specifying unit 13. The explanation output unit 16 displays the explanation information acquired by the explanation acquisition unit 15 as a pop-up on the display unit 18.

FIG. 7 is a flowchart showing the operation of the apparatus.
In step 200, document data is input from the document data input unit 17. The input document data is stored in a document data memory (not shown).
The document data stored in the document data memory is displayed on the display unit 18 as a Chinese character string representing the document (S210).

The user reads the Chinese character string displayed on the display unit 18 and, on encountering a phrase whose reading, meaning, or grammar is unclear, uses the annotation adding unit 19 to add an annotation to that phrase.
When an annotation is added, the annotation extraction unit 12 extracts it (S220).

The target phrase specifying unit 13 specifies, as a phrase, the character or character string to which the annotation was added (S230).
Next, the explanation acquisition unit 15 analyzes the annotation extracted in step 220 and identifies which of the patterns "triangle", "circle", or "underline" its style matches (S240).
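Because the second embodiment already holds the document as character codes, step 230 reduces to mapping the position of the mark back to characters. The sketch below is a deliberately simplified assumption: it handles a single displayed line, takes the horizontal span of each character and of the mark as given, and selects the characters whose horizontal centre falls under the mark. The name `phrase_under_mark` and the sample text are hypothetical and not taken from the patent figures.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (left, right) x-coordinates on the display


def phrase_under_mark(text: str, char_spans: List[Span],
                      mark_left: float, mark_right: float) -> str:
    """Return the characters of one displayed line covered by the mark.

    `char_spans[i]` is the horizontal extent of `text[i]` (assumed to be
    available from the display layout).
    """
    picked = [
        ch
        for ch, (left, right) in zip(text, char_spans)
        if mark_left <= (left + right) / 2 <= mark_right
    ]
    return "".join(picked)
```

For example, with five characters each 10 units wide, a mark spanning x = 9 to x = 31 covers the second and third characters. A full implementation would also have to decide which line of text a mark belongs to.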

The explanation acquisition unit 15 reads from the annotation pattern database 10b the explanation type identifier associated with the pattern identified in step 240 (S250).
The explanation acquisition unit 15 then finds, in the explanation database 10a, the record corresponding to the phrase specified in step 230 and, of the three kinds of explanation information stored in that record, acquires the one corresponding to the type identifier read in step 250 (S260).

The explanation acquisition unit 15 displays the explanation information acquired in step 260 as a pop-up on the display unit 18 (S270). As described later, this pop-up is displayed on the display unit 18 together with a button labeled "Close explanation"; when the user selects this button, the pop-up is dismissed and the apparatus returns to the state of step 210. When the user adds another annotation, the processing from step 220 onward is repeated.

The transition of states from the display of the Chinese document data on the display unit 18 to the display of the explanation information as a pop-up is now described using a concrete example.
FIG. 8(A) shows the display unit 18 after step 210 has been executed. FIG. 8(B) shows the display after an annotation has been added. FIG. 8(C) shows the display unit 18 after step 270 has been executed.
In FIG. 8(A), buttons labeled "Close document" and "Close explanation" are displayed at the top of the screen, and below them the Chinese document to be read is displayed over several lines. In FIG. 8(B), an annotation has been added to a phrase in the second line from the top. In FIG. 8(C), the annotated phrase is associated with explanation information giving the meaning "to think that ...". This means that the annotation was judged to match the "circle" pattern.
According to this embodiment, simply adding an annotation with the stylus pen to a phrase in the displayed document brings up the explanation information for that phrase as a pop-up.

(Third embodiment)
A third embodiment of the present invention will be described.
This embodiment adds a function that takes Chinese document image data bearing annotations unrelated to "reading", "meaning", or "grammar/syntax", erases those annotations, overwrites the document with a legend showing the correspondence between "reading", "meaning", and "grammar/syntax" and the annotation style patterns, and prints the result on paper.

FIG. 9 is a block diagram showing the hardware configuration of the document processing apparatus according to this embodiment. As shown in the figure, in addition to the explanation information storage unit 10, the document image input unit 11, the annotation extraction unit 12, the target phrase specifying unit 13, the character recognition unit 14, the explanation acquisition unit 15, and the explanation output unit 16, the apparatus includes an annotation erasing unit 20, an explanation legend generating unit 21, and a legend-added document output unit 22.

The data structures of the two databases held by the explanation information storage unit 10, and the functions of the explanation information storage unit 10, the document image input unit 11, the annotation extraction unit 12, the target phrase specifying unit 13, the character recognition unit 14, the explanation acquisition unit 15, and the explanation output unit 16, are the same as in the first embodiment.

The annotation erasing unit 20 erases only the annotations from the bitmap input from the document image input unit 11. The explanation legend generating unit 21 generates, for the new bitmap obtained once the annotation erasing unit 20 has erased the annotations, an explanation legend image showing the correspondence between each explanation type and each annotation style. The legend-added document output unit 22 overwrites the explanation legend image onto the bitmap from which the annotations were erased and prints the result on paper.

FIG. 10 is a flowchart showing the operation of this embodiment.
In step 10, document image data is input from the document image input unit 11. This document image data is a bitmap obtained by scanning a document bearing annotations unrelated to "reading", "meaning", or "grammar/syntax". The input document image data is stored in a bitmap memory (not shown).
Once the document image data has been stored in the bitmap memory, the annotation extraction unit 12 extracts one of the annotations drawn on the document image data (S20).
Next, the annotation erasing unit 20 erases the annotation extracted in step 20 from the document image data in the bitmap memory (S30).

Once step S30 has been executed, it is determined whether any annotation that has not yet been extracted remains in the document image data in the bitmap memory. If such an annotation remains, the process returns to step S20 and the subsequent processing is repeated. When all annotations have been extracted, the process proceeds to step S40, in which the commentary legend generating unit 21 generates a commentary legend image representing the correspondence between the commentary type identifier and the annotation form identifier in each record of the annotation pattern database 10b.
Once the commentary legend image has been generated, the legend-added document output unit 22 overwrites the commentary legend image onto the document image data from which the annotation erasing unit 20 has erased all annotations, and prints the resulting legend-added document image on paper for output (S50).
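The loop over steps S20 and S30 and the overwrite in step S50 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the bitmap is modeled as a list of rows of 0/1 pixels, each annotation is assumed to have already been located and is given as a set of pixel coordinates, and the function names `erase_annotations` and `overlay_legend` are hypothetical. How the apparatus actually separates annotations from printed text is not described in this passage, and printing is omitted.

```python
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]      # (row, column); an assumed representation
Bitmap = List[List[int]]     # 0 = white, 1 = black; an assumed representation


def erase_annotations(bitmap: Bitmap, annotations: Iterable[Set[Pixel]]) -> Bitmap:
    """Repeat S20 (take one annotation) and S30 (erase it) until none remain."""
    cleaned = [row[:] for row in bitmap]   # work on a copy of the input bitmap
    pending = list(annotations)
    while pending:                         # "does any unextracted annotation remain?"
        annotation = pending.pop(0)        # S20: extract one annotation
        for r, c in annotation:            # S30: erase its pixels
            cleaned[r][c] = 0
    return cleaned


def overlay_legend(bitmap: Bitmap, legend: Bitmap, top: int, left: int) -> Bitmap:
    """S50 without printing: overwrite the legend image at (top, left)."""
    out = [row[:] for row in bitmap]
    for dr, legend_row in enumerate(legend):
        for dc, value in enumerate(legend_row):
            out[top + dr][left + dc] = value
    return out


if __name__ == "__main__":
    page = [[1, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    marks = [{(1, 1)}, {(1, 2)}]           # two hypothetical annotations
    cleaned = erase_annotations(page, marks)
    for row in overlay_legend(cleaned, [[1, 1]], top=2, left=2):
        print(row)
```

Both functions return new bitmaps rather than modifying their input, so the scanned original stays available if the erasure needs to be checked.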

The user reads through the legend-added document output in step S50 while adding the prescribed annotation to each phrase whose "reading", "meaning", or "grammar / syntax" is unclear. The user then scans the document once the annotations have been added, and the resulting document image data is input again from the document image input unit 11. When this document image data is input, the operations from step 100 onward shown in FIG. 4 are executed.

The relationship between a document to which annotations unrelated to "reading", "meaning", or "grammar / syntax" have been added and the corresponding legend-added document is now described using a concrete example.
FIG. 11(A) shows a document to which annotations unrelated to "reading", "meaning", or "grammar / syntax" have been added, and FIG. 11(B) shows the legend-added document obtained after processing by the apparatus. In FIG. 11(A), annotations unrelated to "reading", "meaning", or "grammar / syntax" have been added to the sentences in the second and third rows from the top. In FIG. 11(B), the annotations that were present in FIG. 11(A) have been erased, and in their place a commentary legend image has been overwritten, consisting of the notes "Legend", "△ Reading", "○ Meaning", and "Underline Grammar / Syntax" enclosed in a rectangle.
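The content of such a legend could be derived from the records of the annotation pattern database 10b roughly as below. This is a hedged text-only sketch: the record field names `commentary_type` and `annotation_form`, the sample records, and the helper names are assumptions made for illustration, and the actual apparatus renders the legend as an image rather than as text.

```python
from typing import Dict, List

# Hypothetical records mirroring the correspondence described for database 10b.
PATTERN_RECORDS: List[Dict[str, str]] = [
    {"commentary_type": "Reading", "annotation_form": "△"},
    {"commentary_type": "Meaning", "annotation_form": "○"},
    {"commentary_type": "Grammar / Syntax", "annotation_form": "Underline"},
]


def build_legend_lines(records: List[Dict[str, str]]) -> List[str]:
    """Return the legend title followed by one 'form type' line per record."""
    lines = ["Legend"]
    for record in records:
        lines.append(f'{record["annotation_form"]} {record["commentary_type"]}')
    return lines


def frame(lines: List[str]) -> List[str]:
    """Enclose the legend lines in a simple rectangular text border."""
    width = max(len(line) for line in lines)
    border = "+" + "-" * (width + 2) + "+"
    body = [f"| {line.ljust(width)} |" for line in lines]
    return [border, *body, border]


if __name__ == "__main__":
    print("\n".join(frame(build_legend_lines(PATTERN_RECORDS))))
```

Generating the legend from the database records, rather than hard-coding it, keeps the printed legend in step with whatever correspondence the database currently holds.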

According to the present embodiment described above, annotations unrelated to "reading", "meaning", or "grammar / syntax" are first erased from a document in which they have been written, after which the user can add new annotations indicating phrases whose "reading", "meaning", or "grammar / syntax" is unclear.

(Other embodiments)
The invention according to the present application can be modified in various ways.
The above embodiments were described on the assumption that the document to be read is written in Chinese, but the invention may of course be applied to documents written in other languages.

In the annotation pattern database 10b of the above embodiments, the "reading" commentary was associated with the "triangle" pattern, the "meaning" commentary with the "circle" pattern, and the "grammar / syntax" commentary with the "underline" pattern. Alternatively, the correspondence between commentary types and annotation form patterns in this database may be varied according to the language in which the document to be read is written. For example, as shown in FIG. 12(A), an annotation may be associated with "reading" when it is added to a single character, with "meaning" when it is added to two characters, and with "grammar / syntax" when it is added to three or more characters. This correspondence is particularly suitable for reading Japanese and Chinese documents. As shown in FIG. 12(B), an annotation may instead be associated with "meaning" when it is added to two characters or fewer and with "grammar / syntax" when it is added to three or more characters. This correspondence is particularly suitable for reading Korean documents, which are written in a phonetic script and therefore do not require readings to be looked up. As shown in FIG. 12(C), an annotation may also be associated with "reading and meaning" when it is added to one word, with "phrase meaning" when it is added to two or three words, and with "grammar / syntax" when it is added to four or more words. This correspondence is suitable for reading documents in Indo-European languages such as English.
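The three variants of FIG. 12 amount to threshold tables keyed by language group, which can be sketched as follows. The group keys (`"cjk"`, `"korean"`, `"indo_european"`), the function name `commentary_type`, and the table layout are assumptions introduced for illustration; only the thresholds and commentary types come from the description above. The count is in characters for variants (A) and (B) and in words for variant (C).

```python
from typing import Dict, List, Optional, Tuple

# Each rule is (inclusive upper bound or None for "no upper bound", commentary type).
RULES: Dict[str, List[Tuple[Optional[int], str]]] = {
    # FIG. 12(A): Japanese / Chinese, counted in characters
    "cjk": [(1, "reading"), (2, "meaning"), (None, "grammar/syntax")],
    # FIG. 12(B): Korean, counted in characters
    "korean": [(2, "meaning"), (None, "grammar/syntax")],
    # FIG. 12(C): Indo-European languages such as English, counted in words
    "indo_european": [
        (1, "reading and meaning"),
        (3, "phrase meaning"),
        (None, "grammar/syntax"),
    ],
}


def commentary_type(language_group: str, unit_count: int) -> str:
    """Pick the commentary type for an annotation covering unit_count units."""
    if unit_count < 1:
        raise ValueError("an annotation must cover at least one character or word")
    for upper_bound, kind in RULES[language_group]:
        if upper_bound is None or unit_count <= upper_bound:
            return kind
    raise ValueError("rule table has no open-ended final entry")


if __name__ == "__main__":
    for group, count in [("cjk", 1), ("korean", 1), ("indo_european", 3)]:
        print(group, count, commentary_type(group, count))
```

Keeping the thresholds in a table means a further language could be supported by adding one entry rather than another branch of code.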

FIG. 1 is a hardware configuration diagram of the document processing apparatus.
FIG. 2 is a data structure diagram of the commentary database.
FIG. 3 is a data structure diagram of the annotation pattern database.
FIG. 4 is a flowchart showing the operation of the document processing apparatus.
FIG. 5 shows a Chinese document image and the corresponding document with commentary.
FIG. 6 is a hardware configuration diagram of the document processing apparatus.
FIG. 7 is a flowchart showing the operation of the document processing apparatus.
FIG. 8 is a diagram showing the transition of a document image.
FIG. 9 is a hardware configuration diagram of the document processing apparatus.
FIG. 10 is a flowchart showing the operation of the document processing apparatus.
FIG. 11 is a diagram showing the transition of a document image.
FIG. 12 is a data structure diagram of the annotation pattern database (modification).

Explanation of symbols

11: document image input unit; 12: annotation extraction unit; 13: target phrase specifying unit; 14: character recognition unit; 15: commentary acquisition unit; 16: commentary output unit; 17: document data input unit; 18: display unit; 19: annotation adding unit; 20: annotation erasing unit; 21: commentary legend generating unit; 22: document output unit

Claims

A document processing apparatus comprising:
commentary information storage means for storing, for each phrase that can appear in a document, a per-phrase commentary information group that gathers pairs each consisting of one of several types of commentary information about that phrase and a pattern of an annotation form added to a document;
input means for inputting a bitmap representing a document that includes one or more phrases to which annotations have been added;
extraction means for extracting, when the bitmap is input from the input means, an annotation from the document represented by the bitmap;
target phrase specifying means for specifying, from the bitmap, the phrase to which the extracted annotation has been added;
commentary information group specifying means for specifying the per-phrase commentary information group stored in the commentary information storage means in association with the specified phrase;
commentary acquisition means for analyzing the extracted annotation to identify the pattern of annotation form to which the annotation corresponds, and for acquiring, from the specified per-phrase commentary information group, the commentary information paired with the identified pattern; and
commentary output means for outputting the acquired commentary information.

The document processing apparatus according to claim 1,
wherein the target phrase specifying means extracts an image of the region of the bitmap around the extracted annotation and performs character recognition on the extracted image, thereby specifying, as a character or a character string, the phrase to which the annotation has been added.

The document processing apparatus according to claim 1 or 2,
further comprising annotation erasing means for erasing the annotation from the input bitmap.

The document processing apparatus according to claim 3,
further comprising legend generating means for generating a commentary legend image representing the relationship between the annotation form patterns associated with the various types of commentary information in the commentary information storage means and the commentary type corresponding to each pattern, and for overwriting the generated commentary legend image at a predetermined drawing position in the new bitmap obtained by erasing the annotation.

A document processing apparatus comprising:
commentary information storage means for storing, for each phrase that can appear in a document, a per-phrase commentary information group that gathers pairs each consisting of one of several types of commentary information about that phrase and a pattern of an annotation form added to a document;
input means for inputting a character code string of a document;
display means for displaying the input character code string as a character string representing the document;
annotation adding means for adding an annotation to a part of the displayed character string;
extraction means for extracting the added annotation;
target phrase specifying means for specifying, as a phrase, the character or group of characters to which the extracted annotation has been added;
commentary information group specifying means for specifying the per-phrase commentary information group stored in the commentary information storage means in association with the specified phrase;
commentary acquisition means for analyzing the extracted annotation to identify the pattern of annotation form to which the annotation corresponds, and for acquiring, from the specified per-phrase commentary information group, the commentary information paired with the identified pattern; and
commentary output means for outputting the acquired commentary information.