JP6268888B2

JP6268888B2 - Information processing apparatus and information processing program

Info

Publication number: JP6268888B2
Application number: JP2013208306A
Authority: JP
Inventors: 木村　俊一; 丸山　耕輔; 田中　瑛一; 越　裕
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2013-10-03
Filing date: 2013-10-03
Publication date: 2018-01-31
Anticipated expiration: 2033-10-03
Also published as: JP2015072622A

Description

The present invention relates to an information processing apparatus and an information processing program.

Patent Document 1 addresses the problem of reliably confirming, on the surface of the entry sheet itself, a correction to characters written on that sheet with a handwriting input system. It discloses a handwriting input system comprising: a digital pen that reads a dot pattern on a form to generate handwriting information; a handwriting image generation unit that generates, from the handwriting information, a handwriting image in which the digital pen's handwriting is converted into image data; a character recognition unit that performs character recognition on the input handwriting image; a correction instruction detection unit that detects, from the handwriting image, a cancellation-character designation line designating the characters to be deleted from the character recognition result and a deletion-count designation line designating the number of characters to be deleted; and a character recognition result correction unit that corrects the recognition result of the character recognition unit based on the detection result of the correction instruction detection unit.

Patent Document 2 addresses the problem of providing, for input with a tablet or electronic pen, a means by which changes to written characters, such as deletion, addition, replacement, and emphasis, can be made on a computer with as nearly as possible the same feel as ordinary writing with paper and pen. It discloses associating the symbols writers use in ordinary writing (double lines or filling meaning deletion, mountain-shaped or valley-shaped marks and arrows meaning addition, and enclosing lines or underlines meaning emphasis) with control commands that start the corresponding change operation and specify its target characters, automatically detecting these commands in the written information, and automatically executing the processing. To detect the characters to be changed, it uses the character segmentation information generated by the character recognition method: when a control symbol such as a double line or an enclosing line is written, the degree of overlap between the character segmentation information and the control symbol is calculated and used to determine the target characters, achieving highly accurate detection.

Patent Document 3 addresses the problem of preventing leakage of written information after processing based on information written on a medium has been performed. It discloses that, when a user writes on a printed document with an electronic pen and connects the pen to a communication device, the communication device transmits stroke information (the digitized writing) to a terminal device. The terminal device authenticates the electronic pen using a pen authentication server, queries an identification information server to obtain electronic document information indicating, among other things, the storage location of the electronic document with which the stroke information is to be associated, and requests registration by transmitting the electronic document information and the stroke information to a document server. The document server then transmits result information indicating success or failure of the registration to the terminal device. On receiving the result information, the communication device erases the stroke information in the electronic pen and, if the result information indicates a registration failure, outputs a message prompting re-entry.

Patent Document 1: JP 2008-040759 A
Patent Document 2: JP 2004-152040 A
Patent Document 3: JP 2008-077553 A

An object of the present invention is to provide an information processing apparatus and an information processing program that determine deleted characters from a stroke group that may include a deletion symbol.

The gist of the present invention for achieving this object lies in the inventions of the following items.
The invention of claim 1 is an information processing apparatus comprising: receiving means that receives a stroke group; stroke extracting means that extracts, from the stroke group received by the receiving means, a stroke group for each unit delimited according to a predetermined rule; rectangle extracting means that extracts a rectangle surrounding the stroke group of each unit; region dividing means that divides the rectangle extracted by the rectangle extracting means into regions; counting means that counts, for each stroke extracted by the stroke extracting means, the number of regions, among those divided by the region dividing means, in which the stroke exists; and determining means that determines, based on the number of regions counted by the counting means, whether the stroke group in the rectangle has been deleted.

The invention of claim 2 is the information processing apparatus according to claim 1, further comprising calculating means that calculates the intersections between the stroke and the boundary lines of the regions, wherein the counting means treats the regions in contact with the intersections calculated by the calculating means as the regions in which the stroke exists.
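As a rough illustration of the counting rule in claim 2, the following Python sketch (not the patented implementation) takes a stroke as a polyline of (x, y) points, assumes the rectangle is given as (left, top, right, bottom) with y increasing downward, divides it into K rows and L columns, and collects the cells that touch an intersection between a stroke segment and an internal boundary line. The function names and the choice to ignore the rectangle's outer border are assumptions. Under this rule, a stroke that stays inside a single cell crosses no boundary and yields no cells.

```python
Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (left, top, right, bottom), y grows downward


def _cells_touching(v: float, lo: float, size: float, n: int) -> set[int]:
    """Indices of the cells along one axis whose closed interval contains v."""
    pos = (v - lo) / size * n
    cells = {min(max(int(pos), 0), n - 1)}
    if pos == int(pos) and 0 < int(pos) < n:
        # v lies exactly on an internal boundary, so both neighbouring cells touch it
        cells.add(int(pos) - 1)
    return cells


def cells_at_intersections(stroke: list[Point], rect: Rect,
                           k: int = 6, l: int = 4) -> set[tuple[int, int]]:
    """(row, col) cells in contact with stroke/boundary-line intersections."""
    left, top, right, bottom = rect
    w, h = right - left, bottom - top
    cells: set[tuple[int, int]] = set()
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        # internal vertical boundary lines x = left + j * w / l (outer border skipped)
        for j in range(1, l):
            gx = left + j * w / l
            if x0 != x1 and (x0 - gx) * (x1 - gx) <= 0:
                y = y0 + (gx - x0) / (x1 - x0) * (y1 - y0)
                for row in _cells_touching(y, top, h, k):
                    cells.update({(row, j - 1), (row, j)})
        # internal horizontal boundary lines y = top + i * h / k
        for i in range(1, k):
            gy = top + i * h / k
            if y0 != y1 and (y0 - gy) * (y1 - gy) <= 0:
                x = x0 + (gy - y0) / (y1 - y0) * (x1 - x0)
                for col in _cells_touching(x, left, w, l):
                    cells.update({(i - 1, col), (i, col)})
    return cells
```

A horizontal stroke across the top row of a 4 × 6 rectangle crosses the three internal vertical lines and so touches all four cells of that row.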

The invention of claim 3 is the information processing apparatus according to claim 1 or 2, wherein the rectangle extracting means extracts the rectangle using the height of the line of units as the height of the rectangle, or using the width of the line of units as the width of the rectangle.

The invention of claim 4 is an information processing apparatus comprising: receiving means that receives a stroke group; stroke extracting means that extracts, from the stroke group received by the receiving means, a stroke group for each unit delimited according to a predetermined rule; rectangle extracting means that extracts a rectangle surrounding the stroke group of each unit; region dividing means that divides the rectangle extracted by the rectangle extracting means into regions; counting means that counts, for each stroke extracted by the stroke extracting means, the number of regions, among those divided by the region dividing means, in which the stroke exists; measuring means that measures the length of each stroke extracted by the stroke extracting means; normalizing means that normalizes the stroke length measured by the measuring means using the height, the width, or both of the rectangle containing the stroke; and determining means that determines, based on the number of regions counted by the counting means and the stroke length normalized by the normalizing means, whether the stroke group in the rectangle has been deleted.
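The measuring and normalizing means of claim 4 can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: the stroke length is taken to be the sum of the straight segment lengths of the sampled polyline, and the "both" option (dividing by the mean of height and width) is a hypothetical way of using both sides, since the text does not say how they are combined. How the normalized length is weighed against the region count is not described here, so the sketch stops at the normalized value.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (left, top, right, bottom)


def stroke_length(stroke: list[Point]) -> float:
    """Stroke length as the sum of the lengths of its straight segments."""
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))


def normalized_length(stroke: list[Point], rect: Rect, by: str = "height") -> float:
    """Stroke length divided by the height and/or width of its rectangle."""
    left, top, right, bottom = rect
    height, width = bottom - top, right - left
    length = stroke_length(stroke)
    if by == "height":
        return length / height
    if by == "width":
        return length / width
    if by == "both":
        # hypothetical combination: divide by the mean of height and width
        return length / ((height + width) / 2)
    raise ValueError(f"unknown normalization: {by!r}")
```

Normalizing makes the length comparable across characters written at different sizes: a scribble that runs back and forth many times is long relative to its rectangle, while an ordinary character stroke usually is not.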

The invention of claim 5 is an information processing program for causing a computer to function as: receiving means that receives a stroke group; stroke extracting means that extracts, from the stroke group received by the receiving means, a stroke group for each unit delimited according to a predetermined rule; rectangle extracting means that extracts a rectangle surrounding the stroke group of each unit; region dividing means that divides the rectangle extracted by the rectangle extracting means into regions; counting means that counts, for each stroke extracted by the stroke extracting means, the number of regions, among those divided by the region dividing means, in which the stroke exists; and determining means that determines, based on the number of regions counted by the counting means, whether the stroke group in the rectangle has been deleted.

The invention of claim 6 is an information processing program for causing a computer to function as: receiving means that receives a stroke group; stroke extracting means that extracts, from the stroke group received by the receiving means, a stroke group for each unit delimited according to a predetermined rule; rectangle extracting means that extracts a rectangle surrounding the stroke group of each unit; region dividing means that divides the rectangle extracted by the rectangle extracting means into regions; counting means that counts, for each stroke extracted by the stroke extracting means, the number of regions, among those divided by the region dividing means, in which the stroke exists; measuring means that measures the length of each stroke extracted by the stroke extracting means; normalizing means that normalizes the stroke length measured by the measuring means using the height, the width, or both of the rectangle containing the stroke; and determining means that determines, based on the number of regions counted by the counting means and the stroke length normalized by the normalizing means, whether the stroke group in the rectangle has been deleted.

According to the information processing apparatus of claim 1, the deleted stroke group can be determined from a stroke group that may include a deletion symbol.

According to the information processing apparatus of claim 2, a region in contact with an intersection between a stroke and a region boundary line can be determined to be a region in which the stroke exists.

According to the information processing apparatus of claim 3, the rectangle can be extracted using the height or the width of the line of units.

According to the information processing apparatus of claim 4, the deleted stroke group can be determined from a stroke group that may include a deletion symbol with higher accuracy than without this configuration.

According to the information processing program of claim 5, the deleted stroke group can be determined from a stroke group that may include a deletion symbol.

According to the information processing program of claim 6, the deleted stroke group can be determined from a stroke group that may include a deletion symbol with higher accuracy than without this configuration.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the first embodiment.
FIG. 2 is a flowchart showing a processing example according to the first embodiment.
FIG. 3 is a flowchart showing a processing example according to the first embodiment.
FIG. 4 is an explanatory diagram showing an example data structure of stroke information.
FIG. 5 is an explanatory diagram showing a processing example according to the first embodiment.
FIG. 6 is an explanatory diagram showing a processing example according to the first embodiment.
FIG. 7 is an explanatory diagram showing a processing example according to the second embodiment.
FIG. 8 is a conceptual module configuration diagram of a configuration example of the third embodiment.
FIG. 9 is an explanatory diagram showing an example of correcting a character string in the prior art.
FIG. 10 is an explanatory diagram showing an example of correcting a character string in the prior art.
FIG. 11 is an explanatory diagram showing an example of correcting a character string in the prior art.
FIG. 12 is an explanatory diagram showing an example system for implementing the present embodiment.
FIG. 13 is an explanatory diagram showing an example of electronic pen paper on which an information image is printed.
FIG. 14 is an explanatory diagram showing an example internal configuration of the electronic pen.
FIG. 15 is a flowchart showing a processing example by the electronic pen.
FIG. 16 is an explanatory diagram showing an example of an information image (code pattern image) handled in the present embodiment.
FIG. 17 is an explanatory diagram showing an example of information encoding processing and an example of information image (dot code image) generation processing in the present embodiment.
FIG. 18 is a block diagram showing an example hardware configuration of a computer that implements the present embodiment.

First, before the present embodiment is described, the technology on which it is premised will be explained with reference to FIGS. 9 to 11. This explanation is intended to make the present embodiment easier to understand.
In a device that performs character recognition using an electronic pen, the user may want to delete (erase) a character that has already been written. As shown in the example of FIG. 9, after writing the character string "ABC", the user may realize that it should actually be corrected to "AC".
When electronic information on a computer is used, as in a word processor, the character "B" can simply be selected and deleted.
With an electronic pen, however, "ABC" has already been written on the paper in ink. It is desirable that the user can delete the character on the ink notation itself, and that this deletion is interpreted directly as deletion of the character in the electronic data.

The following examples show online character recognition with an electronic pen in which the character string "ABC" is corrected to "AC" (that is, "B" is deleted from "ABC").
FIG. 10 is an explanatory diagram showing an example of correcting a character string with the technique described in Patent Document 1. As shown in the example of FIG. 10, the characters to be deleted are specified with a character-string strikethrough consisting of a horizontal line that specifies the character string to be deleted and a diagonal line that specifies the characters to be deleted.
FIG. 11 is an explanatory diagram showing an example of correcting a character string with the technique described in Patent Document 2. As shown in the example of FIG. 11, multiple lines (including a single line) or filling drawn over characters are treated as deletion.
When deletion is indicated with horizontal and diagonal lines as in Patent Document 1, characters of the same shape may exist, so the deletion symbol may be confused with a character, particularly when only one character is deleted.
When multiple lines or filling are used as in Patent Document 2, characters can be distinguished from deletion symbols.
However, Patent Document 2 requires that "the shape of each control symbol and the corresponding operation be registered as a pair in the storage means". That is, the operator must write exactly the deletion symbol that was determined in advance.

Examples of various preferred embodiments for implementing the present invention are described below with reference to the drawings.
FIG. 1 is a conceptual module configuration diagram of a configuration example according to the first embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, the modules in the present embodiment include not only modules in a computer program but also modules in a hardware configuration. The present embodiment therefore also serves as a description of a computer program for causing a computer to function as these modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to implement each function), as well as of a system and a method. For convenience of explanation, the terms "store", "cause to store", and their equivalents are used; when the embodiment is a computer program, these mean storing in a storage device or controlling so that data is stored in a storage device. Modules may correspond one-to-one to functions, but in an implementation one module may consist of one program, multiple modules may consist of one program, or conversely one module may consist of multiple programs. Multiple modules may be executed by one computer, and one module may be executed by multiple computers in a distributed or parallel environment. One module may include another module. In the following, "connection" is used not only for physical connections but also for logical connections (exchange of data, instructions, reference relationships between data, and the like). "Predetermined" means determined before the processing in question: it covers values determined before the processing of the present embodiment starts and also values determined after it starts, as long as they are determined before the processing in question, according to the situation or state at that time or up to that time.
When there are multiple "predetermined values", they may differ from one another, or two or more of them (including, of course, all of them) may be the same. A statement meaning "if A, do B" is used to mean "determine whether A holds, and do B if it is determined that A holds", except where that determination is unnecessary.
A system or apparatus includes not only a configuration in which multiple computers, hardware units, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), but also one implemented by a single computer, hardware unit, device, or the like. "Apparatus" and "system" are used as synonyms. Naturally, "system" does not include a mere social "mechanism" (social system) that is an artificial arrangement.
For each process performed by a module, or for each of multiple processes performed within a module, the target information is read from a storage device, and after the process is performed, the processing result is written to the storage device. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and so on.

The information processing apparatus of the present embodiment determines, from a stroke group that may include a deletion symbol, which stroke groups have been deleted. As shown in the example of FIG. 1, it has an online character recognition module 110, a stroke scratch determination module 120, and a character removal module 130. Characters are used as the main example of "units delimited according to a predetermined rule". The rectangle extraction performed by this information processing apparatus is often done per character; however, the apparatus does not strictly determine that each unit is a character. Because the boundary between characters may also be unclear, stroke groups cannot always be extracted "per character". The rectangle described here therefore means no more than "the circumscribed rectangle of a stroke group that happened to be extracted as a character-like cluster". The "predetermined rule" is, for example, the rule by which the online character recognition module 110 extracts a stroke group as a character; specific examples include the strokes being written within a predetermined size range, being written within a predetermined time, there being an interval of at least a predetermined time between consecutive strokes, or a combination of these. A "stroke group" is a set of one or more strokes, so a stroke group may consist of a single stroke.
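One of the example rules above, splitting units where the interval between consecutive strokes is at least a predetermined time, can be sketched in Python as follows. This is a minimal illustration, not the rule used by the online character recognition module 110: the `Stroke` class and the 0.5-second `max_gap` are hypothetical, and the size-range and writing-time rules are not shown.

```python
from dataclasses import dataclass


@dataclass
class Stroke:
    # (t, x, y) samples from pen-down to pen-up, in time order
    points: list[tuple[float, float, float]]

    @property
    def start_time(self) -> float:
        return self.points[0][0]

    @property
    def end_time(self) -> float:
        return self.points[-1][0]


def group_into_units(strokes: list[Stroke], max_gap: float = 0.5) -> list[list[Stroke]]:
    """Start a new unit whenever the pause since the previous stroke is >= max_gap."""
    units: list[list[Stroke]] = []
    for stroke in sorted(strokes, key=lambda s: s.start_time):
        if units and stroke.start_time - units[-1][-1].end_time < max_gap:
            units[-1].append(stroke)   # short pause: same unit
        else:
            units.append([stroke])     # first stroke or long pause: new unit
    return units
```

A unit produced this way may well be a single stroke, which matches the note that a stroke group can consist of one stroke.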

The online character recognition module 110 is connected to the stroke scratch determination module 120 and receives stroke information 105. The stroke information 105 (group) it receives contains multiple strokes and, in addition to strokes representing a character string, may contain strokes representing a deletion symbol. The deletion symbol is a symbol that fills in (scribbles over) a character, referred to as a scratch. It is described in detail later with reference to the example of FIG. 5.
An example of the stroke information 105 is stroke information 400. FIG. 4 is an explanatory diagram showing an example data structure of the stroke information 400. The stroke information 400 has a time column 410, a coordinate position column 420, and a pen up/down column 430. The time column 410 stores a time, which may be any information indicating the time series (the order in which the coordinate positions were generated). The coordinate position column 420 stores a coordinate position, which indicates the position of the electronic pen on the paper (for example, XY coordinates). The pen up/down column 430 stores information indicating pen-up or pen-down: pen-up indicates that the electronic pen has been lifted from the paper, and pen-down indicates that the electronic pen has been (or is being) pressed against the paper.
For example, the stroke information 400 is generated from the coordinate positions of the electronic pen, obtained at a rate of about 70 to 100 samples per second, together with the pen up/down information.
A pen-down state can be regarded as a state in which a character is being written on the paper. That is, the interval from when the electronic pen goes down until it goes up can be regarded as one stroke (a line of a character drawn in a single movement).
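The pen-down to pen-up rule can be sketched in Python as below, modelling one row of the stroke information 400 as a `Sample` (a hypothetical name) with the time, the XY coordinate position, and the pen up/down flag. As the text allows, the time is used only to order the samples; the coordinate recorded in a pen-up sample is assumed not to belong to any stroke.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    t: float        # time stamp (only the order matters)
    x: float        # X coordinate of the pen on the paper
    y: float        # Y coordinate of the pen on the paper
    pen_down: bool  # True while the pen is pressed against the paper


def split_strokes(samples: list[Sample]) -> list[list[tuple[float, float]]]:
    """Return one list of (x, y) points per stroke, i.e. per pen-down run."""
    strokes: list[list[tuple[float, float]]] = []
    current: list[tuple[float, float]] = []
    for s in sorted(samples, key=lambda s: s.t):
        if s.pen_down:
            current.append((s.x, s.y))
        elif current:          # pen lifted: close the running stroke
            strokes.append(current)
            current = []
    if current:                # trailing stroke with no final pen-up sample
        strokes.append(current)
    return strokes
```

At 70 to 100 samples per second, a typical stroke yields a few dozen points, which is enough for the per-region tracing used later.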
Using this stroke information 105, the online character recognition module 110 recognizes the characters written on the paper. Various techniques exist for recognizing characters from the stroke information 105.
As the result of character recognition by the online character recognition module 110, one or more characters are output as a recognition result (with stroke information) 115. The character recognition also makes it possible to determine, for each of the strokes, which character that stroke corresponds to.
Furthermore, a rectangle surrounding each character can be formed, for example the circumscribed rectangle of the character. The circumscribed rectangle can be calculated from the maximum and minimum X and Y values of the coordinates of the strokes corresponding to that character.
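The min/max computation of the circumscribed rectangle is small enough to show directly. In this Python sketch the `Rect` type and its (left, top, right, bottom) field order are assumptions, with y taken to increase downward so that "top" is the minimum Y value.

```python
from typing import NamedTuple

Point = tuple[float, float]


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def circumscribed_rect(strokes: list[list[Point]]) -> Rect:
    """Axis-aligned bounding box of all points in the character's strokes."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    return Rect(min(xs), min(ys), max(xs), max(ys))
```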

The stroke scratch determination module 120 is connected to the online character recognition module 110 and the character removal module 130. It receives the recognition result (with stroke information) 115 from the online character recognition module 110. As described above, the recognition result (with stroke information) 115 includes the strokes corresponding to each character. The stroke scratch determination module 120 extracts the stroke group in the received recognition result (with stroke information) 115 for each character; that is, for each recognized character, it extracts the strokes that were used to recognize that character.
Next, the stroke scratch determination module 120 extracts, for each character, a rectangle surrounding that character. The circumscribed rectangle described above may be used. Alternatively, the rectangle may be extracted using the height of the character string as the height of the rectangle, or the width of the character string as the width of the rectangle, where the character string is the one containing the target character. For horizontal writing, the height of the character string is used as the height of the rectangle; for vertical writing, the width of the character string is used as the width of the rectangle. For example, in horizontal writing, the rectangle for a punctuation mark such as "。" or for a small kana such as "っ" or "ょ" (indicating a geminate consonant or a contracted sound) is its circumscribed rectangle extended vertically.
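The line-based rectangle can be sketched as follows. This Python illustration assumes that the extent of the character string is the union of its characters' circumscribed rectangles and that the writing direction is already known; both `line_rect` and `char_rect_in_line` are hypothetical names, and how the string itself is found is not covered here.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def line_rect(char_rects: list[Rect]) -> Rect:
    """Rectangle spanning every character rectangle of one character string."""
    return Rect(min(r.left for r in char_rects), min(r.top for r in char_rects),
                max(r.right for r in char_rects), max(r.bottom for r in char_rects))


def char_rect_in_line(char: Rect, line: Rect, horizontal: bool = True) -> Rect:
    """Horizontal writing: use the string's height; vertical writing: its width."""
    if horizontal:
        return Rect(char.left, line.top, char.right, line.bottom)
    return Rect(line.left, char.top, line.right, char.bottom)
```

For a small "。" sitting on the baseline, its own circumscribed rectangle is only a few units tall; stretching it to the string height gives it a mesh comparable to that of a full-size character.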

Next, the stroke scratch determination module 120 divides the extracted rectangle into regions (hereinafter also called meshes). Because the regions are obtained by dividing the rectangle, each region is a smaller rectangle. The number of divisions may be set in advance, for example K vertically and L horizontally, such as K = 6 and L = 4.
Next, for each stroke extracted for each character, the stroke scratch determination module 120 counts the number of regions in which that stroke is present. For example, the module may trace the stroke and extract the regions that contain its coordinates. Alternatively, it may calculate the intersections between the stroke and the region boundaries and count the regions adjacent to those intersections as regions in which the stroke is present.
Next, the stroke scratch determination module 120 determines, based on the counted number of regions, whether the character in the rectangle has been deleted. If the target stroke is a scratch, the character in the rectangle has been deleted. Let a be the number of regions in which the stroke is present.
For example, the module calculates
P = a / (K × L)
The number of regions containing the stroke, normalized in this way by the total number of regions, is called the stroke existence ratio (P).
If P is greater than a predetermined threshold T, the stroke is determined to be a deletion symbol (scratch). T is, for example, 0.7.
The stroke scratch determination module 120 then passes the determination result 125 to the character removal module 130 as its processing result. The determination result 125 includes the recognition result and information indicating the deleted characters (for example, information indicating that the X-th character is a deleted character).
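The following is a minimal sketch of the stroke existence ratio P = a / (K × L) and the threshold test. It uses the first counting method above, which traces the stroke's sampled coordinates. The function names and the (left, top, right, bottom) rectangle layout are illustrative assumptions. So is the rule that points on or outside the border fall into the nearest mesh. K = 6, L = 4, and T = 0.7 are the example values from the text.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def stroke_existence_ratio(stroke: Sequence[Point], rect: Rect,
                           k_rows: int = 6, l_cols: int = 4) -> float:
    """P = a / (K x L): fraction of the K x L meshes containing a sampled stroke point."""
    left, top, right, bottom = rect
    width, height = right - left, bottom - top
    occupied = set()
    for x, y in stroke:
        # Map each sampled point to its mesh; clamp points on or outside the border.
        col = min(max(int((x - left) / width * l_cols), 0), l_cols - 1)
        row = min(max(int((y - top) / height * k_rows), 0), k_rows - 1)
        occupied.add((row, col))
    a = len(occupied)
    return a / (k_rows * l_cols)


def is_scratch(stroke: Sequence[Point], rect: Rect, threshold: float = 0.7,
               k_rows: int = 6, l_cols: int = 4) -> bool:
    """A stroke is a deletion symbol (scratch) when P exceeds the threshold T."""
    return stroke_existence_ratio(stroke, rect, k_rows, l_cols) > threshold
```

Only the sampled coordinates are checked. A fast stroke with widely spaced points can therefore skip meshes it actually crosses. The boundary-intersection method of FIG. 6 avoids that problem.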

The character removal module 130 is connected to the stroke scratch determination module 120. It removes the characters overwritten with strokes judged to be deletion symbols and outputs the character string of the recognition result 135.

FIG. 2 is a flowchart illustrating a processing example according to the first exemplary embodiment.
In step S202, the online character recognition module 110 receives the stroke information 105. FIG. 5A shows a character string before erasure (target stroke group 500). FIG. 5B shows the case where only "B" (deleted character (B) 510) of "ABC" (target stroke group 500) has been erased with a scratch. Suppose "ABC" is first handwritten with the electronic pen as the target stroke group 500, as in FIG. 5A. If a scratch is then drawn over the character "B" with the electronic pen, as in FIG. 5B, the operator has designated the deleted character (B) 510 by writing.
In step S204, the online character recognition module 110 performs character recognition on the stroke information 105. As a result, a circumscribed rectangle (circumscribed rectangles 520, 530, and 540) can be set for each character, as shown in FIG. 5C, and the character string "ABC" is obtained as the recognition result. The circumscribed rectangle 530 also contains the scratch stroke. The recognizable portion may be extracted in that case. Alternatively, the scratch may have been written after the characters, as in FIG. 5B following FIG. 5A (that is, written over "B" after "C" was written). Then the character string "ABC" is obtained from the time-series information.
In step S206, the stroke scratch determination module 120 determines from the stroke information 105 whether each stroke is a scratch and identifies the deleted characters. The processing of step S206 is described later with reference to FIG. 3.
In step S208, the character removal module 130 deletes the identified deleted characters from the character string obtained by character recognition.
In step S210, the character removal module 130 outputs the recognition result 135.
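Steps S208 and S210 can be illustrated with the FIG. 5 example. The `DeterminationResult` class is a hypothetical stand-in for the determination result 125. It records deleted positions as 0-based indices, which is an assumption, since the text only says "the X-th character".

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DeterminationResult:
    """Hypothetical shape of the determination result 125."""
    text: str                                       # recognition result, e.g. "ABC"
    deleted: Set[int] = field(default_factory=set)  # 0-based positions judged deleted


def remove_deleted_characters(result: DeterminationResult) -> str:
    """Step S208: drop the characters marked as deleted; the string is output in S210."""
    return "".join(ch for i, ch in enumerate(result.text) if i not in result.deleted)


print(remove_deleted_characters(DeterminationResult("ABC", {1})))  # AC
```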

FIG. 3 is a flowchart illustrating a processing example according to the first exemplary embodiment.
In step S302, the circumscribed rectangle of each character is extracted. For the example of FIG. 5B, this yields the circumscribed rectangles 520, 530, and 540 shown in FIG. 5C.
FIG. 5C delimits the characters by their circumscribed rectangles. As described above, however, characters cannot always be delimited exactly by their circumscribed rectangles, so the regions are not required to correspond exactly to "characters".
In step S304, the circumscribed rectangle is divided into a predetermined number of meshes. For example, FIG. 5D shows the circumscribed rectangle 530 divided into meshes. FIG. 5D is an enlarged view showing only the scratch stroke over the character B (deleted character (B) 510). The circumscribed rectangle 530 of B (deleted character (B) 510) is divided into meshes as shown there. The circumscribed rectangles 520 and 540 are divided into meshes in the same way, but they are determined to contain no scratch (step S314).
In step S306, one stroke in the circumscribed rectangle is extracted.

In step S308, the meshes in which the stroke extracted in step S306 is present are counted. That is, it is determined whether the scratch stroke is present in each mesh. For example, FIG. 5E shows, by hatching, the meshes within the circumscribed rectangle 530 in which the stroke is present. Let N be the number of meshes and a the number of meshes containing the stroke. In the example of FIG. 5D, there are 4 meshes horizontally and 6 vertically, so N = 4 × 6 = 24. The meshes through which the stroke passes are the hatched portions in FIG. 5E, and a = 22.
In step S310, it is determined whether the stroke existence density exceeds the threshold. If it does, the process proceeds to step S316. Otherwise, it proceeds to step S312. In terms of the example above, the stroke is judged to be a scratch when a / N is larger than the threshold. For example, suppose the threshold is 0.7. In FIG. 5E, a / N = 22 / 24 ≈ 0.92, which exceeds 0.7, so this stroke is judged to be a scratch.

In step S312, it is determined whether the target character contains another stroke to be evaluated. If so, the process returns to step S306. Otherwise, it proceeds to step S314. That is, steps S306 to S310 are performed for every stroke in the target circumscribed rectangle.
In step S314, it is determined that the target circumscribed rectangle contains no scratch (the character is not a deleted character).
In step S316, the stroke is determined to be a scratch (the character is a deleted character).
In step S318, it is determined whether there is another character to be processed. If so, the process returns to step S302. Otherwise, the process ends (step S399).
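The loop structure of FIG. 3 can be sketched as below. This is a sketch under assumptions. The single-stroke scratch test is passed in as a callable, for example a ratio-versus-threshold check, and all names are hypothetical. `any()` stops at the first stroke judged to be a scratch, which matches the jump from step S310 to step S316.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # whatever type represents one stroke


def find_deleted_characters(characters: Sequence[Sequence[S]],
                            is_scratch: Callable[[S], bool]) -> List[int]:
    """Return the indices of characters whose rectangle contains a scratch stroke."""
    deleted: List[int] = []
    for index, strokes in enumerate(characters):      # S302 ... S318: next character
        # S306-S312: test strokes one by one, stopping at the first scratch.
        if any(is_scratch(stroke) for stroke in strokes):
            deleted.append(index)                      # S316: deleted character
        # Otherwise S314: no scratch in this character's rectangle.
    return deleted
```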

FIG. 6 is an explanatory diagram illustrating a processing example according to the first exemplary embodiment. It is an example of the processing of step S308 and uses the intersections between the stroke and the mesh boundary lines described above.
A method for calculating the number m of meshes in which the stroke is present is described concretely below.
First, an X axis and a Y axis are set as shown in FIG. 6. Any Cartesian coordinate system that can specify the position of a point in two dimensions may be used.
The lines along the X-axis direction that divide the circumscribed rectangle of the character into meshes are denoted X1 to Xn. Similarly, the lines along the Y-axis direction are denoted Y1 to Ym. A memory (array) indicating whether the stroke is present in each mesh is also prepared. Each element takes one of two values: ON (the stroke is present) or OFF (the stroke is not present). All elements are initialized to OFF.
One stroke, from the start point 610 to the end point 620, consists of a sequence of points.
Two consecutive points are taken, and the line segment connecting them is processed.
First, the meshes containing the end points of the line segment are set to ON.
Next, the maximum Xmax and minimum Xmin of the X coordinates of the two end points, and the maximum Ymax and minimum Ymin of their Y coordinates, are obtained.
The lines Xi (i = 1 to n) lying between Xmin and Xmax are extracted.
The position at which the line segment intersects each such Xi is calculated. The intersected line and the position of the intersection together determine which meshes the line segment passes through.
For example, in FIG. 6, the position at which the line segment intersects Xi is indicated by the arrow. The line segment passes through the meshes that lie at this position and have Xi as a boundary. The array elements corresponding to the hatched meshes are set to ON.
This is done for every extracted Xi.
The above is then repeated for the line segments formed by every pair of consecutive points.
Finally, the number of meshes whose array elements are ON is counted.
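The FIG. 6 counting procedure can be sketched under the following assumptions. The meshes are equal in size, and the rectangle is given as (left, top, right, bottom). A point lying exactly on a grid line is assigned to the mesh on its larger-coordinate side. The text spells out the crossing step only for the lines Xi. This sketch also applies the same step to the lines Yj, as the computation of Ymin and Ymax suggests, so that segments crossing only horizontal boundaries are handled. All names are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def _grid_lines(lo: float, hi: float, n: int) -> List[float]:
    """Interior boundaries that split [lo, hi] into n equal meshes."""
    return [lo + (hi - lo) * i / n for i in range(1, n)]


def meshes_on(stroke: Sequence[Point], rect: Rect,
              rows: int = 6, cols: int = 4) -> Set[Tuple[int, int]]:
    """Return the (row, col) meshes whose array element ends up ON."""
    left, top, right, bottom = rect
    xs = _grid_lines(left, right, cols)   # boundaries crossed when moving along X
    ys = _grid_lines(top, bottom, rows)   # boundaries crossed when moving along Y
    on: Set[Tuple[int, int]] = set()

    def mesh_of(x: float, y: float) -> Tuple[int, int]:
        # bisect_right also clamps points outside the rectangle to the edge meshes.
        return bisect_right(ys, y), bisect_right(xs, x)

    if stroke:
        on.add(mesh_of(*stroke[0]))
    for (px, py), (qx, qy) in zip(stroke, stroke[1:]):
        on.add(mesh_of(qx, qy))                      # end point of this segment
        xmin, xmax = min(px, qx), max(px, qx)
        for i, xi in enumerate(xs):
            if xmin < xi < xmax:                     # Xi lies between Xmin and Xmax
                y = py + (xi - px) * (qy - py) / (qx - px)
                row = bisect_right(ys, y)
                on.update({(row, i), (row, i + 1)})  # both meshes bounded by Xi
        ymin, ymax = min(py, qy), max(py, qy)
        for j, yj in enumerate(ys):                  # assumption: same step for Yj
            if ymin < yj < ymax:
                x = px + (yj - py) * (qx - px) / (qy - py)
                col = bisect_right(xs, x)
                on.update({(j, col), (j + 1, col)})
    return on


def stroke_existence_ratio(stroke: Sequence[Point], rect: Rect,
                           rows: int = 6, cols: int = 4) -> float:
    """Number of ON meshes divided by the total number of meshes."""
    return len(meshes_on(stroke, rect, rows, cols)) / (rows * cols)
```

Consider the single segment from (0.5, 2.5) to (3.5, 2.5) in a 4 × 6 rectangle. The stroke has only two points, yet all four meshes of the third row are turned ON. Checking sampled coordinates alone would miss the two middle meshes.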

<Second Embodiment>
FIG. 7 is an explanatory diagram illustrating a processing example according to the second exemplary embodiment. In the second embodiment, the stroke scratch determination module 120 receives a stroke group and extracts the received strokes character by character. It then extracts, for each character, a rectangle surrounding that character. Next, it measures the length of each stroke and normalizes that length using the height, the width, or both, of the rectangle containing the stroke. Based on the normalized stroke length, it determines whether the character in the rectangle has been deleted. Receiving the stroke group, extracting strokes per character, and extracting each surrounding rectangle are the same processes as in the first embodiment.

In the second embodiment, the stroke scratch determination module 120 evaluates the length of each stroke.
First, the height H of the rectangle surrounding the character is obtained. This may be the height of the character's own rectangle or the height of the entire character string.
Next, the length of the stroke is measured. The stroke length can be measured, for example, as the sum of the distances between all pairs of consecutive points. Let Q be the stroke length. A threshold S is prepared in advance, and if (Q / H) > S, the stroke is judged to be a scratch. The stroke length normalized in this way by the size H of the rectangle surrounding the character is called the stroke length ratio.
This example uses the height H of the rectangle surrounding the character, but the width W of that rectangle may be used instead. Alternatively, a value calculated from both the height and the width of the rectangle may be used.
For example, either of the following expressions may be used:
(H + W) / 2
sqrt(H × W) (where sqrt() is a function that returns the square root)
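The stroke length ratio Q / H and its alternative normalizers can be sketched as follows. The text gives no value for the threshold S, so it is a required parameter here. The `mode` strings are hypothetical labels for H, W, (H + W) / 2, and sqrt(H × W).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def stroke_length(stroke: Sequence[Point]) -> float:
    """Q: sum of the distances between consecutive points."""
    return sum(math.hypot(qx - px, qy - py)
               for (px, py), (qx, qy) in zip(stroke, stroke[1:]))


def normalizer(h: float, w: float, mode: str = "height") -> float:
    """Size of the surrounding rectangle used to normalize Q."""
    if mode == "height":
        return h
    if mode == "width":
        return w
    if mode == "mean":
        return (h + w) / 2
    if mode == "geometric":
        return math.sqrt(h * w)
    raise ValueError(f"unknown mode: {mode}")


def stroke_length_ratio(stroke: Sequence[Point], h: float, w: float,
                        mode: str = "height") -> float:
    return stroke_length(stroke) / normalizer(h, w, mode)


def is_scratch(stroke: Sequence[Point], h: float, w: float, s: float,
               mode: str = "height") -> bool:
    """Scratch when the normalized length exceeds the threshold S."""
    return stroke_length_ratio(stroke, h, w, mode) > s
```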

<Third Embodiment>
FIG. 8 is a conceptual module configuration diagram of an example configuration according to the third embodiment.
In the first embodiment, scratches are determined using the existence ratio of the stroke within the meshes. In the second embodiment, scratches are determined using the stroke length ratio. In the third embodiment, the determination combines the two.
FIG. 8 shows an example of the module configuration inside the stroke scratch determination module 120 of the first embodiment (or the second embodiment).
The stroke scratch determination module 120 includes a stroke existence ratio A calculation module 810, a stroke length ratio B calculation module 820, and a determination module 830.
The stroke existence ratio A calculation module 810 is connected to the determination module 830. It receives the recognition result (with stroke information) 115 from the online character recognition module 110. It then performs the processing of the stroke scratch determination module 120 of the first embodiment to calculate the stroke existence ratio A.
The stroke length ratio B calculation module 820 is connected to the determination module 830. It receives the recognition result (with stroke information) 115 from the online character recognition module 110. It then performs the processing of the stroke scratch determination module 120 of the second embodiment to calculate the stroke length ratio B.
The determination module 830 is connected to the stroke existence ratio A calculation module 810 and the stroke length ratio B calculation module 820. It determines whether the character in the rectangle has been deleted from two inputs. The first is the number of regions counted by the stroke existence ratio A calculation module 810. The second is the stroke length normalized by the stroke length ratio B calculation module 820. These inputs include the stroke existence ratio A and the stroke length ratio B. The determination module 830 then passes the determination result 125 to the character removal module 130.

A stroke may be judged to be a scratch when both "the stroke existence ratio A is greater than a threshold T1" and "the stroke length ratio B is greater than a threshold T2" hold. Alternatively, it may be judged to be a scratch when either condition holds. The two conditions may also be evaluated in order. Whether B > T2 may be checked only when A > T1, or whether A > T1 may be checked only when B > T2. When an order is used, one of the two calculation modules (810 or 820) finishes its processing first. The determination module 830 may then decide whether the other module should run.
Instead of preparing a threshold for each individual test in advance, the determination may use a general two-class machine learning method. Various methods can be used, such as a neural network, AdaBoost, or an SVM. For example, the determination module 830 may be a two-class classifier built with such a learner. In that case, the determination module 830 learns in advance how the values of A and B correspond to scratch/non-scratch determinations, and it holds the learning result. It receives A from the stroke existence ratio A calculation module 810 and B from the stroke length ratio B calculation module 820, and it outputs the determination result 125.
In the example of FIG. 8, only the stroke existence ratio A and the stroke length ratio B are input, but other values may also be input. In that case, however, the classifier must be trained in advance with those values as well.
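The threshold-based combinations described above can be sketched as follows. T1 and T2 have no values in the text, so they are parameters here, and the function names are hypothetical. The ordered variant takes callables so that the second ratio is computed only when the first test passes. This mirrors running one calculation module only after the other has finished. A learned two-class classifier would replace these functions entirely and is not shown.

```python
from typing import Callable


def scratch_and(a: float, b: float, t1: float, t2: float) -> bool:
    """Scratch only if both A > T1 and B > T2."""
    return a > t1 and b > t2


def scratch_or(a: float, b: float, t1: float, t2: float) -> bool:
    """Scratch if either A > T1 or B > T2."""
    return a > t1 or b > t2


def scratch_ordered(compute_a: Callable[[], float], compute_b: Callable[[], float],
                    t1: float, t2: float) -> bool:
    """Evaluate A first and compute B only when A > T1 (the reverse order is analogous)."""
    if compute_a() <= t1:
        return False
    return compute_b() > t2
```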

FIG. 12 is an explanatory diagram showing an example system for implementing the present embodiment.
An electronic pen paper printing system 1220 and a writing information processing system 1230 are connected via a communication line 1299 (wired, wireless, or a combination of both). A printing device 1225 is connected to the electronic pen paper printing system 1220, and an electronic pen 1235 is connected to the writing information processing system 1230. The module configurations illustrated in FIGS. 1, 8, and elsewhere are mainly implemented in the writing information processing system 1230.
The electronic pen paper printing system 1220 prints documents on paper with the printing device 1225. Each document has an information image (hereinafter also called a dot code image) superimposed on it using a paper ID. Writing may be performed with the electronic pen 1235 on paper printed this way by the electronic pen paper printing system 1220. The writing information processing system 1230 then superimposes the writing information on the document. When it detects a deletion symbol written by the operator with the electronic pen 1235, it superimposes the corrected character string on the document. The corrected string omits the characters overlaid by the deletion symbol.

FIG. 13 is an explanatory diagram illustrating an example of electronic pen paper 1310 with an information image printed by the electronic pen paper printing system 1220. The electronic pen paper printing system 1220 prints the electronic pen paper 1310 using the printing device 1225, and a dot code image is printed on it. For example, a dot code image like the one in FIG. 13B is printed in a region 1320 of the electronic pen paper 1310. FIG. 13B is an enlarged view of the region 1320. The dot code image represents the paper ID assigned to each individual sheet of electronic pen paper 1310 and position information (X, Y coordinate values) on the sheet.
For example, the paper ID is a numerical value in a 32-bit space. When written as a character string, it is expressed in hexadecimal, so the paper ID ranges from "00000000" to "FFFFFFFF".
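Converting between the 32-bit numeric paper ID and its hexadecimal notation can be sketched as below. The uppercase, zero-padded eight-digit form follows the range "00000000" to "FFFFFFFF" given above. The function names are illustrative.

```python
import string


def paper_id_to_str(paper_id: int) -> str:
    """Format a 32-bit paper ID as eight uppercase hexadecimal digits."""
    if not 0 <= paper_id <= 0xFFFFFFFF:
        raise ValueError("paper ID must fit in 32 bits")
    return f"{paper_id:08X}"


def paper_id_from_str(text: str) -> int:
    """Parse the eight-digit hexadecimal notation back into an integer."""
    if len(text) != 8 or not all(c in string.hexdigits for c in text):
        raise ValueError("expected exactly 8 hexadecimal digits")
    return int(text, 16)
```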

FIG. 14 is an explanatory diagram illustrating an example of the internal configuration of the electronic pen 1235.
An overview is given first. Suppose the user writes with the electronic pen 1235 on the electronic pen paper 1499. When the pressure sensor turns ON (the pen-down described above), the pen captures the dot code image on the electronic pen paper 1499 and decodes it. It extracts the paper ID of the sheet and the position information (X, Y coordinate values) on it, and stores them in memory. The information stored in memory is then transmitted to the writing information processing system 1230 via the communication circuit. The memory may also store writing-pressure ON and OFF records, as in the stroke information 400 of FIG. 4. The ON record (pen-down) indicates that the pressure sensor has turned ON, and the OFF record (pen-up) indicates that it has turned OFF. Each record may also include the time at which it was generated (year, month, day, seconds, fractions of a second, or a combination of these).
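The sketch below shows how a stream of pen-down, position, and pen-up records could be grouped into strokes for the online character recognition module. The record classes and their fields are hypothetical, loosely modeled on the stroke information described above. Flushing a stroke that is still open at a new pen-down or at the end of the stream is an arbitrary choice made here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class PenDown:      # pressure sensor turned ON (hypothetical record)
    time: float


@dataclass
class PenUp:        # pressure sensor turned OFF (hypothetical record)
    time: float


@dataclass
class Sample:       # one decoded dot-code reading (hypothetical record)
    paper_id: str
    x: float
    y: float
    time: float


Event = Union[PenDown, PenUp, Sample]


def split_strokes(events: List[Event]) -> List[List[Tuple[float, float]]]:
    """Group position samples between pen-down and pen-up into strokes."""
    strokes: List[List[Tuple[float, float]]] = []
    current: Optional[List[Tuple[float, float]]] = None
    for event in events:
        if isinstance(event, PenDown):
            if current:                      # missing pen-up: keep what we have
                strokes.append(current)
            current = []
        elif isinstance(event, PenUp):
            if current:
                strokes.append(current)
            current = None
        elif current is not None:            # samples outside a stroke are ignored
            current.append((event.x, event.y))
    if current:                              # stroke left open at end of stream
        strokes.append(current)
    return strokes
```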

The configuration is now described in detail. As illustrated, the electronic pen 1235 includes a control circuit 1401 that controls the operation of the entire pen. The control circuit 1401 includes an image processing unit 1401a, which processes the dot code image detected in the input image. It also includes a data processing unit 1401b, which extracts the paper ID and position information from that processing result.
A pressure sensor 1402 is connected to the control circuit 1401. It detects a writing operation of the electronic pen 1235 from the pressure applied to the pen tip 1409. Also connected are an infrared LED 1403, which irradiates the paper with infrared light, and an infrared CMOS 1404, which inputs images. The following components are connected as well:
- an information memory 1405 that stores the paper ID, position information, and the like;
- a communication circuit 1406 for communicating with external devices;
- a battery 1407 that drives the electronic pen 1235;
- a pen ID memory 1408 that stores identification information (a pen ID) of the electronic pen 1235.

An outline of the operation of the electronic pen 1235 is given here.
When writing is performed with the electronic pen 1235, the pressure sensor 1402 connected to the pen tip 1409 detects the writing operation. The infrared LED 1403 then turns on, and the infrared CMOS 1404 captures an image of the paper with its CMOS sensor.
To reduce power consumption, the infrared LED 1403 is pulsed in synchronization with the shutter timing of the CMOS sensor.
The infrared CMOS 1404 uses a global-shutter CMOS sensor, which transfers a whole captured image at once, and the sensor is sensitive in the infrared region. To reduce the influence of disturbances, a visible-light cut filter covers the entire surface of the CMOS sensor. The CMOS sensor captures images at about 70 fps to 100 fps (frames per second). The image sensor is not limited to a CMOS sensor, and another image sensor such as a CCD may be used.

When the captured image is input to the control circuit 1401, the control circuit 1401 obtains the dot code image from it. The circuit decodes the dot code image and obtains the paper ID and position information embedded in it.
The operation of the control circuit 1401 at this point is described below.
FIG. 15 is a flowchart illustrating an example of processing performed by the electronic pen 1235 (control circuit 1401).
In step S1501, the image processing unit 1401a inputs an image.
In step S1502, noise in the image is removed. Such noise includes variations in CMOS sensitivity and noise generated by the electronic circuitry. The noise removal method should be chosen according to the characteristics of the imaging system of the electronic pen 1235. For example, blurring, or sharpening such as unsharp masking, can be applied.
In step S1503, the image processing unit 1401a detects the dot pattern (the positions of the dot images) in the image. For example, binarization can separate the dot pattern from the background, and the dot pattern can then be detected from the positions of the individual binarized components. If the binarized image contains many noise components, binarization must be combined with filtering. That filtering judges dot patterns by, for example, the area and shape of the binarized components.
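Step S1503 can be sketched as binarization followed by connected-component labeling. A simple area filter stands in here for the area-and-shape filtering mentioned above. Several details are illustrative, not values from the text: dots are assumed to image darker than the background, 4-connectivity is used, and the threshold and area limits are arbitrary.

```python
from collections import deque
from typing import List, Sequence, Tuple


def detect_dots(image: Sequence[Sequence[int]], threshold: int = 128,
                min_area: int = 2, max_area: int = 20) -> List[Tuple[float, float]]:
    """Return (x, y) centroids of dark connected components whose area is in range."""
    height, width = len(image), len(image[0])
    # Binarization: assume dots appear darker than the paper background.
    dark = [[image[y][x] < threshold for x in range(width)] for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    centroids: List[Tuple[float, float]] = []
    for sy in range(height):
        for sx in range(width):
            if not dark[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search over 4-connected dark pixels.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and dark[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Area filter: drop specks and blobs too small or too large to be dots.
            if min_area <= len(pixels) <= max_area:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cx, cy))
    return centroids
```

The resulting centroids would then be snapped to the dot grid to form the two-dimensional array of step S1504.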

また、ステップＳ１５０４では、画像処理部１４０１ａは、検出したドットパターンを２次元配列上のデジタルデータに変換する。例えば、２次元配列上で、ドットがある位置を「１」、ドットがない位置を「０」というように変換する。そして、この２次元配列上のデジタルデータは、画像処理部１４０１ａからデータ処理部１４０１ｂへと受け渡される。
次いで、ステップＳ１５０５では、データ処理部１４０１ｂは、受け渡されたデジタルデータから、図１６（ａ）に示した２つのドットの組み合わせからなるビットパターンを検出する。例えば、ビットパターンに対応するブロックの境界位置を２次元配列上で動かし、ブロック内に含まれるドットの数が２つになるような境界位置を検出することにより、ビットパターンを検出することができる。
このようにしてビットパターンが検出されると、ステップＳ１５０６では、データ処理部１４０１ｂは、ビットパターンの種類を参照することにより、同期符号を検出する。
そして、ステップＳ１５０７では、同期符号からの位置関係に基づいて、識別符号及び位置符号を検出する。
その後、ステップＳ１５０８では、データ処理部１４０１ｂは、識別符号を復号して紙ＩＤを取得し、位置符号を復号して位置情報を取得する。識別符号については、ＲＳ復号処理を施すことで紙ＩＤを得る。一方、位置符号については、読み出した部分系列の位置を、画像生成時に使用したＭ系列と比較することで、位置情報を得る。 In step S1504, the image processing unit 1401a converts the detected dot pattern into digital data on a two-dimensional array. For example, on a two-dimensional array, a position where there is a dot is converted to “1”, and a position where there is no dot is converted to “0”. The digital data on the two-dimensional array is transferred from the image processing unit 1401a to the data processing unit 1401b.
In step S1505, the data processing unit 1401b detects, in the received digital data, the bit patterns, each consisting of a combination of two dots as shown in FIG. 16A. For example, the bit patterns can be detected by shifting the position of the block boundaries corresponding to the bit patterns over the two-dimensional array and finding the boundary position at which each block contains exactly two dots.
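Steps S1504 and S1505 can be sketched on the two-dimensional 0/1 array. The sketch assumes, hypothetically, that each candidate dot position occupies one array cell, so a bit pattern is a 3 × 3 block. It tries the 9 possible block-boundary offsets and keeps the one under which the most blocks contain exactly two dots. The name `find_block_offset` is illustrative.

```python
def find_block_offset(grid, size=3, dots_per_block=2):
    """Return the (row, col) boundary offset at which the most size x size blocks hold exactly two dots."""
    h, w = len(grid), len(grid[0])
    best, best_score = None, -1
    for dy in range(size):
        for dx in range(size):
            score = 0
            # Visit every block that fits entirely inside the array for this offset.
            for top in range(dy, h - size + 1, size):
                for left in range(dx, w - size + 1, size):
                    n = sum(grid[r][c] for r in range(top, top + size) for c in range(left, left + size))
                    if n == dots_per_block:
                        score += 1
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

Scoring by the number of valid blocks, rather than requiring every block to be valid, keeps a few noisy blocks from rejecting the correct offset.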
When the bit pattern is detected in this way, in step S1506, the data processing unit 1401b detects the synchronization code by referring to the type of the bit pattern.
In step S1507, the identification code and the position code are detected based on their positions relative to the synchronization code.
Thereafter, in step S1508, the data processing unit 1401b decodes the identification code to obtain the paper ID, and decodes the position code to obtain the position information. The paper ID is obtained by applying RS decoding to the identification code. The position information is obtained by comparing the read partial sequence with the M-sequence used when the image was generated, to find where in that sequence it occurs.

Next, the electronic document for storing writing information will be described. Hereinafter, the term “electronic document” alone refers to the electronic document for storing writing information.
The electronic document for storing writing information is data that gathers the content written with the electronic pen 1235 on the electronic pen paper 1499 together with the field definitions. It consists of the following:
(1) Paper ID: the paper ID assigned to the electronic pen paper associated with the electronic document
(2) Field definition: the field definition used to process writing on the electronic document
It may also include the following:
(3) Form image: the form image printed on the electronic document
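The three components listed above can be modeled as a small record type. This is a hypothetical sketch. The source does not specify how a field definition is represented, so the rectangle-in-millimetres `FieldDefinition`, the `WritingDocument` name, and the `field_at` lookup are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDefinition:
    # Hypothetical representation: a named rectangle on the form, in paper coordinates (mm).
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


@dataclass
class WritingDocument:
    paper_id: int                       # (1) paper ID of the associated electronic pen paper
    fields: list[FieldDefinition]       # (2) field definitions used to process the writing
    form_image: Optional[bytes] = None  # (3) optional form image

    def field_at(self, px: float, py: float) -> Optional[str]:
        """Name of the field that contains a pen position, or None if it is outside every field."""
        for f in self.fields:
            if f.contains(px, py):
                return f.name
        return None
```

Keeping the form image optional mirrors the text, where only the paper ID and field definitions are required.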

Next, a code pattern that is the basis of a dot code image generated by the electronic pen paper printing system 1220 will be described.
FIG. 16 is an explanatory diagram illustrating an example of an information image (code pattern image) handled by the electronic pen paper printing system 1220.
First, the bit pattern constituting the code pattern will be described.
FIG. 16A shows an example of bit pattern arrangement.
A bit pattern is the minimum unit of information embedding. Here, as shown in FIG. 16A, bits are placed at 2 locations chosen from 9 locations. In the figure, black squares indicate positions where bits are placed, and hatched squares indicate positions where no bit is placed. There are 36 (= ₉C₂) ways to choose 2 of the 9 locations, so this arrangement can represent 36 distinct values (about 5.2 bits) of information.
However, the paper ID and the position information are expressed using 32 (5 bits) of these 36 patterns.
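The 2-of-9 encoding can be enumerated directly. In this sketch, assigning the first 32 combinations (in lexicographic order) to the values 0 to 31 is an assumption, since the text does not say which 32 of the 36 patterns are used. Cell positions are numbered 0 to 8 in row-major order in a 3 × 3 grid, and the 4 remaining patterns are left unassigned.

```python
from itertools import combinations

# All 36 ways to choose 2 of the 9 cell positions (0..8, row-major in a 3 x 3 grid).
PATTERNS = list(combinations(range(9), 2))
# Hypothetical assignment: the first 32 patterns carry the 5-bit values 0..31.
DATA_PATTERNS = PATTERNS[:32]
PATTERN_TO_VALUE = {p: v for v, p in enumerate(DATA_PATTERNS)}


def encode_symbol(value):
    """Map a 5-bit value (0..31) to a 3 x 3 grid containing exactly two dots."""
    a, b = DATA_PATTERNS[value]
    return [[1 if r * 3 + c in (a, b) else 0 for c in range(3)] for r in range(3)]


def decode_symbol(grid):
    """Inverse of encode_symbol; returns None for an unassigned or invalid pattern."""
    on = tuple(r * 3 + c for r in range(3) for c in range(3) if grid[r][c])
    return PATTERN_TO_VALUE.get(on)
```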
The minimum square shown in FIG. 16A measures 2 × 2 dots at 600 dpi. Since one dot at 600 dpi is 0.0423 mm, one side of this minimum square is 84.6 μm (= 0.0423 mm × 2). The larger the dots that make up the code pattern, the more noticeable they become, so they should be as small as possible. However, dots that are too small cannot be printed by a printer. Therefore a dot size larger than 50 μm and smaller than 100 μm, namely the value above, is adopted, which yields dots of the optimum size that a printer can print. In other words, 84.6 μm × 84.6 μm is the smallest size that a printer can form stably.
With dots of this size, one side of a bit pattern is about 0.5 mm (= 0.0423 mm × 2 × 6).
A code pattern composed of such bit patterns will be described.
FIG. 16B shows an example of the arrangement of code patterns.
Here, the minimum square shown in FIG. 16B corresponds to the bit pattern shown in FIG. 16A. That is, the identification code obtained by encoding the paper ID is embedded using 16 (= 4 × 4) bit patterns. The X position code, which encodes the position in the X direction, and the Y position code, which encodes the position in the Y direction, are each embedded using four bit patterns. Further, a synchronization code for detecting the position and rotation of the code pattern is embedded in the upper left corner using one bit pattern.
Since the size of one code pattern is equal to the width of five bit patterns, it is about 2.5 mm. In the electronic pen paper printing system 1220, a code pattern image obtained by imaging the code pattern generated in this way is arranged on the entire surface of the paper.
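The dimensions above follow from the printer resolution, as the worked check below shows. Computing 25.4 / 600 without rounding gives a minimum square of about 84.7 μm; the 84.6 μm in the text comes from rounding one dot to 0.0423 mm before doubling. The 6 squares per bit-pattern side and 5 bit patterns per code-pattern side are taken from the text.

```python
MM_PER_INCH = 25.4

pixel = MM_PER_INCH / 600        # one printer dot at 600 dpi, in mm
square = 2 * pixel               # minimum square: 2 x 2 printer dots
bit_pattern = 6 * square         # one side of a bit pattern
code_pattern = 5 * bit_pattern   # one side of a code pattern (5 bit patterns wide)

print(f"{pixel:.4f} mm, {square * 1000:.1f} um, {bit_pattern:.3f} mm, {code_pattern:.2f} mm")
# prints: 0.0423 mm, 84.7 um, 0.508 mm, 2.54 mm
```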

FIG. 17 is an explanatory diagram showing an example of information encoding processing and an example of information image (dot code image) generation processing in the electronic pen paper printing system 1220.
First, paper ID encoding will be described.
The paper ID is encoded with a Reed-Solomon (RS) code, a block code. As described for FIG. 16, the electronic pen paper printing system 1220 embeds information using bit patterns that each represent 5 bits. Errors therefore also occur in units of 5 bits, so the RS code, a block code with good coding efficiency, is used. However, the encoding method is not limited to the RS code; other methods, such as a BCH code, can also be used.
As described above, the electronic pen paper printing system 1220 embeds information using bit patterns carrying 5 bits each, so the block length of the RS code (the size of one RS symbol) must be 5 bits. The paper ID is therefore divided into 5-bit blocks. In FIG. 17, a first block “00111” and a second block “01101” are cut out of the paper ID “0011101101001...”.
RS encoding is then applied to the blocked paper ID. In FIG. 17, the paper ID is divided into blocks “blk1”, “blk2”, “blk3”, “blk4”, ..., which are then RS-encoded.
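The blocking and RS encoding can be sketched as below. The text fixes only the 5-bit symbol size and the RS(16, 8) parameters. The field polynomial x^5 + x^2 + 1, the generator roots α^0 to α^7, zero-padding the 32-bit paper ID to 40 bits, and all function names are assumptions. Over GF(2^5) a full RS code has length 31, so RS(16, 8) is treated here as a shortened code. Only encoding is shown; decoding is omitted.

```python
# GF(2^5) log/antilog tables built from the primitive polynomial x^5 + x^2 + 1 (assumption).
PRIM = 0b100101
EXP = [0] * 62   # doubled so EXP[LOG[a] + LOG[b]] never needs a modulo
LOG = [0] * 32
_x = 1
for _i in range(31):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0b100000:      # degree reached 5: reduce by the field polynomial
        _x ^= PRIM
for _i in range(31, 62):
    EXP[_i] = EXP[_i - 31]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    """Multiply polynomials over GF(2^5); coefficients are highest degree first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r


def poly_eval(p, x):
    """Horner evaluation of p at x."""
    y = 0
    for c in p:
        y = gf_mul(y, x) ^ c
    return y


def rs_encode(msg, nsym=8):
    """Systematic RS encoding: append the remainder of msg(x) * x^nsym divided by g(x)."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])   # g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1))
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = buf[i]
        if coef:
            for j in range(1, len(g)):
                buf[i + j] ^= gf_mul(g[j], coef)
    return list(msg) + buf[len(msg):]


def to_symbols(bits, width=5):
    """Split a bit string into width-bit integers."""
    if len(bits) % width:
        raise ValueError("bit string length must be a multiple of the block width")
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


def encode_paper_id(paper_id):
    """Hypothetical RS(16, 8) encoding of a 32-bit paper ID zero-padded to 40 bits."""
    bits = format(paper_id, "032b") + "0" * 8
    return rs_encode(to_symbols(bits), nsym=8)


def syndromes(codeword, nsym=8):
    """All zero for a valid codeword."""
    return [poly_eval(codeword, EXP[i]) for i in range(nsym)]
```

For example, `to_symbols("0011101101")` splits the first 10 bits of the FIG. 17 paper ID into the blocks 00111 and 01101, that is `[7, 13]`.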
In the electronic pen paper printing system 1220, the paper ID is divided into 16 (= 4 × 4) blocks, so the number of code blocks of the RS code can be set to 16.
The number of information blocks can be designed according to how often errors occur. For example, with 8 information blocks the code is an RS(16, 8) code, which can correct errors in up to 4 blocks (= (16 − 8) ÷ 2) of the encoded information. If the positions of the errors can be identified, the correction capability can be improved further. In this case the information blocks hold 40 bits (= 5 bits × 8 blocks), of which 32 bits are used.
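The correction capability quoted above follows from the standard Reed-Solomon bound. For an RS(n, k) code that must handle ν symbol errors at unknown positions and ρ erasures (errors whose positions are known), decoding succeeds when 2ν + ρ ≤ n − k:

```latex
t = \left\lfloor \frac{n-k}{2} \right\rfloor = \left\lfloor \frac{16-8}{2} \right\rfloor = 4,
\qquad 2\nu + \rho \le n - k = 8,
\qquad k \times 5\ \text{bits} = 40\ \text{bits}.
```

So with no knowledge of error positions at most 4 erroneous blocks can be corrected, whereas if all error positions are known up to 8 erased blocks can be recovered.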

Next, encoding of position information will be described.
The position information is encoded with an M-sequence code, a kind of pseudo-random sequence. An M-sequence is the sequence of maximum period that a K-stage linear feedback shift register can generate, and its length is $2^K - 1$. Any K consecutive bits taken from an M-sequence do not appear at any other position in the same M-sequence, and this property can be used to encode position information.
The electronic pen paper printing system 1220 determines the required order of the M-sequence from the length of the position information to be encoded and then generates the M-sequence. However, if the length of the position information is known in advance, the M-sequence need not be generated each time; a fixed M-sequence can be generated beforehand and stored in memory or the like.
For example, suppose an M-sequence with K = 13, whose length is 8191 (= $2^{13} - 1$), is used.
Because the position information is also embedded in 5-bit units, this M-sequence of length 8191 is divided into 5-bit blocks. In FIG. 17, the M-sequence “11010011011010...” is divided into 5-bit blocks.
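An M-sequence and its position lookup can be sketched as follows. The recurrence below, with characteristic polynomial x^13 + x^4 + x^3 + x + 1, is one primitive polynomial for K = 13 and is an assumption, since the text does not name the polynomial. The seed and function names are likewise hypothetical. Because every non-zero 13-bit window occurs exactly once per period, a dictionary from window value to offset recovers the position from any 13 consecutive bits read.

```python
def m_sequence(k=13, taps=(0, 1, 3, 4)):
    """One period (2^k - 1 bits) of s[n + k] = XOR of s[n + t] for t in taps.

    The default taps give characteristic polynomial x^13 + x^4 + x^3 + x + 1 (assumed primitive).
    """
    length = (1 << k) - 1
    s = [0] * (k - 1) + [1]          # any non-zero seed works for a primitive polynomial
    while len(s) < length:
        n = len(s) - k
        bit = 0
        for t in taps:
            bit ^= s[n + t]
        s.append(bit)
    return s


def window_table(seq, k=13):
    """Map every cyclic k-bit window (as an integer) to its starting position."""
    n = len(seq)
    table = {}
    for pos in range(n):
        w = 0
        for j in range(k):
            w = (w << 1) | seq[(pos + j) % n]
        table[w] = pos
    return table


def to_blocks(seq, width=5):
    """Cut a bit list into width-bit blocks (the last block may be shorter)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]
```

Building the 8191-entry table once and keeping it in memory matches the text's suggestion to store a fixed M-sequence in advance.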

Thus, the electronic pen paper printing system 1220 uses different encoding methods for the position information and the paper ID. This is because the detection capability for the paper ID must be set higher than that for the position information. Position information serves only to obtain a position on the paper surface, so if part of it cannot be decoded because of noise or the like, only that part is lost and other parts are unaffected. In contrast, if decoding of the paper ID fails, the target to which the writing information should be applied cannot be identified. This configuration also minimizes the image reading range needed to decode the position information and the paper ID. If an encoding method with block boundaries, such as an RS code, were used for the position information, decoding would require reading the code between boundaries, so the image reading range would need to be twice the area shown in FIG. 16B. Using an M-sequence, however, only an area of the same size as that shown in FIG. 16B needs to be read, because, by the nature of M-sequences, the position information can be decoded from any partial sequence of the M-sequence.
That is, decoding the paper ID and the position information requires reading an area of the size shown in FIG. 16B, but the reading position need not coincide with the boundaries shown in FIG. 16B. The position information can be decoded from a partial sequence at any position of the M-sequence. As for the paper ID, because the same information is arranged over the entire surface of the paper, the original information can be restored by rearranging the pieces of information read, even if the reading position is shifted from the boundaries shown in FIG. 16B.
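The rearrangement of the paper-ID fragments can be sketched as below. It assumes the layout the text describes for FIG. 17: each code pattern is 5 × 5 bit patterns, with the synchronization code at the top-left, X position codes along the top row, Y position codes down the left column, and the 16 identification-code blocks inside, stored row-major (that ordering is an assumption). Given a 5 × 5 window read at an arbitrary offset and the location of a synchronization code inside it, each read block is mapped back to its slot modulo 5. The name `restore_id` is hypothetical.

```python
def restore_id(window, sync_pos, size=5):
    """Rebuild the (size - 1)^2 identification-code symbols from a size x size window.

    window:   size x size grid of symbols read from the paper at any offset
    sync_pos: (row, col) of a synchronization code inside the window
    """
    sr, sc = sync_pos
    ident = [None] * ((size - 1) ** 2)
    for i in range(size):
        for j in range(size):
            # Position of this block inside its own code pattern, relative to the sync code.
            lr, lc = (i - sr) % size, (j - sc) % size
            if lr and lc:  # skip the sync code, the X-code row and the Y-code column
                ident[(lr - 1) * (size - 1) + (lc - 1)] = window[i][j]
    return ident
```

Because a 5 × 5 window covers every residue modulo 5 exactly once, each identification slot is filled once regardless of where the window starts.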

As described above, the paper ID is divided into blocks and encoded with the RS code, and the position information is encoded with the M-sequence and then divided into blocks; these blocks are then combined as shown in the figure. That is, they are laid out on a two-dimensional plane in the format shown in the figure. The format shown in FIG. 17 corresponds to the format shown in FIG. 16B: the black square denotes the synchronization code.
In addition, “1”, “2”, “3”, “4”, ... arranged horizontally denote the X position codes, and “1”, “2”, “3”, “4”, ... arranged vertically denote the Y position codes. The position codes are shown as numbers corresponding to coordinate positions because different information is placed at different positions on the paper. The hatched squares, on the other hand, denote the identification code; because the same information is placed regardless of the position on the paper, they are all shown with the same mark.
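The two-dimensional format of FIG. 17 can be sketched as a function that assembles one code pattern from already-encoded 5-bit symbols. The placeholder value for the synchronization code and the function name are hypothetical, and placing the 16 identification-code symbols row-major inside the pattern is an assumption.

```python
def compose_tile(id_symbols, x_symbols, y_symbols, sync="SYNC"):
    """Lay out one 5 x 5 code pattern: sync code at the top-left, X position codes along
    the top row, Y position codes down the left column, identification code inside."""
    if len(id_symbols) != 16 or len(x_symbols) != 4 or len(y_symbols) != 4:
        raise ValueError("expected 16 ID, 4 X and 4 Y symbols")
    tile = [[sync] + list(x_symbols)]
    for r in range(4):
        tile.append([y_symbols[r]] + list(id_symbols[4 * r:4 * r + 4]))
    return tile
```

Repeating such tiles across the page, with the same identification symbols and position symbols taken from successive M-sequence blocks, gives the arrangement the text describes.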
As the figure shows, there are four bit patterns between two synchronization codes, so a 20-bit (= 5 × 4) partial sequence of the M-sequence can be placed there. By extracting a 13-bit partial sequence from these 20 bits, it is possible to determine which part of the whole sequence (length 8191) those 13 bits come from. When 13 of the 20 bits are used to determine the position in this way, the remaining 7 bits can be used to detect or correct errors in the extracted 13 bits. That is, errors can be detected and corrected by checking the consistency of the 20 bits using the same generator polynomial that was used to generate the M-sequence.
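The consistency check on the 20 bits between two synchronization codes can be sketched as follows: each of the 7 bits beyond the first 13 must equal the XOR predicted by the M-sequence recurrence, giving 7 parity checks. The recurrence (characteristic polynomial x^13 + x^4 + x^3 + x + 1) is an assumption, and only error detection is shown; the correction the text mentions is not implemented here.

```python
K = 13
# s[n + 13] = s[n] ^ s[n + 1] ^ s[n + 3] ^ s[n + 4], i.e. x^13 + x^4 + x^3 + x + 1 (assumption).
TAPS = (0, 1, 3, 4)


def parity_failures(bits):
    """Indices n (with n + K < len(bits)) where the recurrence is violated.

    For 20 read bits there are 7 such checks, one for each bit beyond the first 13.
    """
    failures = []
    for n in range(len(bits) - K):
        expected = 0
        for t in TAPS:
            expected ^= bits[n + t]
        if bits[n + K] != expected:
            failures.append(n)
    return failures


def extend(seed, total):
    """Continue a K-bit seed with the recurrence up to `total` bits (handy for test data)."""
    s = list(seed)
    while len(s) < total:
        n = len(s) - K
        bit = 0
        for t in TAPS:
            bit ^= s[n + t]
        s.append(bit)
    return s
```

For example, flipping bit 15 of a valid 20-bit read violates only the check that predicts it, so `parity_failures` reports index 2.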
The bit pattern of each block is then rendered as an image by referring to the dot images, producing an output image that represents the information with dots, as shown at the far right of FIG. 17.

Note that, as illustrated in FIG. 18, the hardware configuration of the computer that executes the program according to the present embodiment is that of a general computer, specifically a personal computer, a computer that can serve as a server, or the like. As a specific example, a CPU 1801 is used as the processing unit (calculation unit), and a RAM 1802, a ROM 1803, and an HD 1804 are used as storage devices. A hard disk, for example, may be used as the HD 1804. The computer consists of: the CPU 1801, which executes programs such as the online character recognition module 110, the stroke scratch determination module 120, the character removal module 130, the stroke existence ratio A calculation module 810, the stroke length ratio B calculation module 820, and the determination module 830; the RAM 1802, which stores those programs and data; the ROM 1803, which stores a program for starting the computer and the like; the HD 1804, an auxiliary storage device (which may be a flash memory or the like); a receiving device 1806 that accepts data based on user operations on a keyboard, mouse, touch panel, or the like; an output device 1805 such as a CRT or liquid crystal display; a communication line interface 1807, such as a network interface card, for connecting to a communication network; and a bus 1808 that connects these components so they can exchange data.
A plurality of these computers may be connected to one another via a network.

Among the embodiments described above, those implemented by a computer program are realized by loading the computer program, which is software, into a system having this hardware configuration, so that the software and hardware resources cooperate.
Note that the hardware configuration illustrated in FIG. 18 is only one configuration example; the present embodiment is not limited to it, and any configuration capable of executing the modules described in the present embodiment may be used. For example, some modules may be implemented in dedicated hardware (for example, an ASIC), some modules may reside in an external system connected via a communication line, and a plurality of the systems shown in FIG. 18 may be connected to one another via communication lines and operate in cooperation. In particular, besides a personal computer, the configuration may be incorporated into an information appliance, a copier, a fax machine, a scanner, a printer, a multifunction device (an image processing apparatus having two or more of the functions of a scanner, printer, copier, fax machine, and so on), or the like.

In the embodiment described above, whether a stroke is a scratch is determined using the stroke existence ratio or the stroke length ratio. The thresholds used for this determination may be changed according to the character size (the size of the rectangle). For example, when the numbers of vertical and horizontal pixels of a rectangle are smaller than a predetermined value, it may be difficult to draw a scratch over the character. In consideration of such cases, processing such as lowering the threshold may be performed. Alternatively, the vertical and horizontal pixel counts of the rectangle themselves may be accepted as features of a classifier implemented with a machine learning method such as a neural network, AdaBoost, or an SVM, and learned.
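The size-dependent threshold can be sketched as below. Everything concrete here is hypothetical: the 4 × 4 area grid, the existence ratio taken as (areas the stroke passes through) / (total areas), the length normalized by the longer side of the rectangle, the threshold values, and the rule of scaling both thresholds down for small rectangles. It only illustrates lowering thresholds when a character box is small, not the embodiment's actual criteria.

```python
import math
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def areas_touched(stroke, rect, rows=4, cols=4, step=0.5):
    """Count grid areas of `rect` that the polyline `stroke` passes through (sampled every `step`)."""
    cells = set()
    segments = list(zip(stroke, stroke[1:])) or [(stroke[0], stroke[0])]
    for (x0, y0), (x1, y1) in segments:
        n = max(1, int(math.hypot(x1 - x0, y1 - y0) / step))
        for i in range(n + 1):
            t = i / n
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            col = min(cols - 1, max(0, int((x - rect.x) / rect.w * cols)))
            row = min(rows - 1, max(0, int((y - rect.y) / rect.h * rows)))
            cells.add((row, col))
    return len(cells)


def stroke_length(stroke):
    return sum(math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]))


def is_scratch(stroke, rect, rows=4, cols=4, ratio_thr=0.5, length_thr=3.0,
               small_size=20.0, small_scale=0.7):
    """Hypothetical rule: a scratch if both ratio A and normalized length B reach their thresholds."""
    a = areas_touched(stroke, rect, rows, cols) / (rows * cols)
    b = stroke_length(stroke) / max(rect.w, rect.h)
    if max(rect.w, rect.h) < small_size:   # small character box: relax both thresholds
        ratio_thr *= small_scale
        length_thr *= small_scale
    return a >= ratio_thr and b >= length_thr
```

In this sketch a zigzag across a 40 × 40 box touches all 16 areas and is judged a scratch, while a short diagonal tick touches 2 areas and is not.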
Further, in the description of the embodiment above, comparisons with a predetermined value stated as “equal to or greater than”, “equal to or less than”, “greater than”, and “less than” may instead be “greater than”, “less than”, “equal to or greater than”, and “equal to or less than”, respectively, as long as no contradiction arises in the combination.

The program described above may be provided stored on a recording medium, or it may be provided via a communication means. In that case, the program described above may be regarded, for example, as an invention of a “computer-readable recording medium on which the program is recorded”.
The “computer-readable recording medium on which a program is recorded” refers to a computer-readable recording medium on which a program is recorded and which is used for installing, executing, and distributing the program, among other purposes.
Examples of the recording medium include digital versatile discs (DVDs), such as “DVD-R, DVD-RW, DVD-RAM, etc.” (standards established by the DVD Forum) and “DVD+R, DVD+RW, etc.” (standards established by the DVD+RW Alliance); compact discs (CDs), such as read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW); Blu-ray (registered trademark) Discs; magneto-optical disks (MO); flexible disks (FD); magnetic tape; hard disks; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory cards.
All or part of the program may be recorded on such a recording medium for storage, distribution, and so on. It may also be transmitted over a transmission medium, for example a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, or an extranet, a wireless communication network, or a combination of these, or it may be carried on a carrier wave.
Furthermore, the program may be part of another program, or may be recorded on a recording medium together with a separate program. It may also be divided and recorded on a plurality of recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

DESCRIPTION OF SYMBOLS
105 ... Stroke information
110 ... Online character recognition module
115 ... Recognition result (with stroke information)
120 ... Stroke scratch determination module
125 ... Determination result
130 ... Character removal module
135 ... Recognition result
810 ... Stroke existence ratio A calculation module
820 ... Stroke length ratio B calculation module
830 ... Determination module

Claims

An information processing apparatus comprising:
receiving means for receiving a stroke group;
stroke extracting means for extracting, from the stroke group received by the receiving means, a stroke group for each unit divided according to a predetermined rule;
rectangle extracting means for extracting a rectangle surrounding the stroke group of each unit;
area dividing means for dividing the rectangle extracted by the rectangle extracting means into areas;
counting means for counting, for each stroke extracted by the stroke extracting means, the number of the areas divided by the area dividing means in which the stroke exists; and
determination means for determining, based on the number of areas counted by the counting means, whether the stroke group in the rectangle has been deleted.

The information processing apparatus according to claim 1, further comprising calculation means for calculating intersections between the stroke and the boundary lines of the areas, wherein the counting means treats an area in contact with an intersection calculated by the calculation means as an area in which the stroke exists.

The information processing apparatus according to claim 1, wherein the rectangle extracting means extracts the rectangle by using the height of the line of units as the height of the rectangle, or the width of the line of units as the width of the rectangle.

An information processing apparatus comprising:
receiving means for receiving a stroke group;
stroke extracting means for extracting, from the stroke group received by the receiving means, a stroke group for each unit divided according to a predetermined rule;
rectangle extracting means for extracting a rectangle surrounding the stroke group of each unit;
area dividing means for dividing the rectangle extracted by the rectangle extracting means into areas;
counting means for counting, for each stroke extracted by the stroke extracting means, the number of the areas divided by the area dividing means in which the stroke exists;
measuring means for measuring, for each stroke extracted by the stroke extracting means, the length of the stroke;
normalizing means for normalizing the length of the stroke measured by the measuring means using one or both of the height and width of the rectangle containing the stroke; and
determination means for determining, based on the number of areas counted by the counting means and the length of the stroke normalized by the normalizing means, whether the stroke group in the rectangle has been deleted.

An information processing program for causing a computer to function as:
receiving means for receiving a stroke group;
stroke extracting means for extracting, from the stroke group received by the receiving means, a stroke group for each unit divided according to a predetermined rule;
rectangle extracting means for extracting a rectangle surrounding the stroke group of each unit;
area dividing means for dividing the rectangle extracted by the rectangle extracting means into areas;
counting means for counting, for each stroke extracted by the stroke extracting means, the number of the areas divided by the area dividing means in which the stroke exists; and
determination means for determining, based on the number of areas counted by the counting means, whether the stroke group in the rectangle has been deleted.

An information processing program for causing a computer to function as:
receiving means for receiving a stroke group;
stroke extracting means for extracting, from the stroke group received by the receiving means, a stroke group for each unit divided according to a predetermined rule;
rectangle extracting means for extracting a rectangle surrounding the stroke group of each unit;
area dividing means for dividing the rectangle extracted by the rectangle extracting means into areas;
counting means for counting, for each stroke extracted by the stroke extracting means, the number of the areas divided by the area dividing means in which the stroke exists;
measuring means for measuring, for each stroke extracted by the stroke extracting means, the length of the stroke;
normalizing means for normalizing the length of the stroke measured by the measuring means using one or both of the height and width of the rectangle containing the stroke; and
determination means for determining, based on the number of areas counted by the counting means and the length of the stroke normalized by the normalizing means, whether the stroke group in the rectangle has been deleted.