JPH1166238A

JPH1166238A - Handwritten character recognition method

Info

Publication number: JPH1166238A
Application number: JP22684297A
Authority: JP
Inventors: Harunobu Oyama; 晴信大山; Masaki Nakagawa; 正樹中川
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 1997-08-22
Filing date: 1997-08-22
Publication date: 1999-03-09
Anticipated expiration: 2017-08-22
Also published as: JP3216800B2

Abstract

PROBLEM TO BE SOLVED: To determine the writing direction accurately and to enable batch recognition, by comparing the ratio of the summed rightward and downward components of vectors with a threshold for horizontal/vertical writing discrimination. SOLUTION: When characters are handwritten on the input surface of an input device 2, the coordinate data sequences of the pen points making up each stroke are input from the input device 2 in stroke order and stored in a storage device 6. When the user selects the 'Recognize' command button after writing arbitrary characters, a handwritten character recognition program 61 is started and reads the pen-point coordinate data of the handwritten characters from the storage device 6. The program then performs writing-direction determination, line-break position determination, character size determination, splitting/merging of character elements, and recognition using a dictionary 62.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a handwritten character string recognition method for recognizing handwritten characters input from a handwritten character input device such as a tablet or an electronic blackboard.

[0002]

[Prior Art] The following handwritten character recognition methods and recognition devices of this kind have been proposed.

[0003] (1) JP-A-61-29982 (title: Online handwritten character string recognition system)
(2) JP-A-5-174185 (title: Japanese character recognition device)
(3) JP-A-6-162269 (title: Handwritten character recognition device)
(4) JP-A-8-50632 (title: Handwritten character segmentation method and device)

The online handwritten character string recognition method disclosed in JP-A-61-29982 aims to remove the constraints on recognizing a character string written in free format on a data tablet and to segment the characters correctly. The stroke sequence input from the data tablet is divided into a plurality of basic segment sequences, and the basic segments are then combined to generate candidate characters. Each generated candidate character is recognized in turn by matching it against standard characters, and the character name and dissimilarity of the recognition result are accumulated; this is repeated for all candidate characters. Finally, a minimum-path search algorithm assigns to the input stroke sequence the series of character names that minimizes the total dissimilarity.
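The minimum-path assignment described for JP-A-61-29982 can be illustrated with a small dynamic program. This is only a sketch under assumptions: the function name `best_segmentation`, the `max_span` limit, and the `recognize` callback (which returns a character name and a dissimilarity for a run of basic segments) are hypothetical and are not taken from that publication.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical recognizer signature: takes a run of basic segments and
# returns (character name, dissimilarity); lower dissimilarity is better.
Recognizer = Callable[[Sequence[str]], Tuple[str, float]]


def best_segmentation(segments: Sequence[str], recognize: Recognizer,
                      max_span: int = 3) -> Tuple[List[str], float]:
    """Choose character boundaries that minimize the summed dissimilarity."""
    n = len(segments)
    cost = [float("inf")] * (n + 1)   # cost[i]: best total for the first i segments
    back = [(-1, "")] * (n + 1)       # back[i]: (previous boundary, character name)
    cost[0] = 0.0
    for end in range(1, n + 1):
        # Try every candidate character that ends at this boundary.
        for start in range(max(0, end - max_span), end):
            name, dissimilarity = recognize(segments[start:end])
            if cost[start] + dissimilarity < cost[end]:
                cost[end] = cost[start] + dissimilarity
                back[end] = (start, name)
    # Trace the cheapest path back from the end of the segment sequence.
    names: List[str] = []
    i = n
    while i > 0:
        i, name = back[i][0], back[i][1]
        names.append(name)
    return names[::-1], cost[n]
```

With a toy recognizer that scores the pair of segments `("a", "b")` better than the two segments separately, the search keeps them together as one character, which is the behavior the paragraph describes.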

[0004] Next, the Japanese character recognition device disclosed in JP-A-5-174185 aims to minimize segmentation errors and recognition errors for Japanese character strings input online or offline from a scanner or the like. It detects the range of a character string in which separated characters or half-width characters may be lined up, obtains all segmentation candidates in that range, recognizes them, and outputs the most probable recognized character code by jointly judging segmentation priority and recognition similarity. To do so, it extracts the circumscribed figures of the connected components of the character portions and merges adjacent circumscribed figures into basic rectangles when they overlap in the vertical direction for a horizontally written document, or in the horizontal direction for a vertically written document. It then determines whether each basic rectangle can be decided to be a single character on its own. If not, it detects the range of such basic rectangles, obtains the combinations of merging adjacent basic rectangles in that range as segmentation candidates, assigns a priority to each, recognizes all segmentation candidates, and outputs the most probable recognized character code based on segmentation priority and recognition similarity.

[0005] Next, the handwritten character recognition device disclosed in JP-A-6-162269 aims to allow handwritten characters to be entered smoothly at any position and at any speed. It detects the distance and direction between the strokes of the input handwritten characters and the positions of their start points, separates the coordinate data into character units, and recognizes the character represented by the strokes from the coordinate data of each character unit.

[0006] Next, the handwritten character segmentation method and device disclosed in JP-A-8-50632 aim to allow characters to be segmented without providing input frames. The height H of the input handwritten character string is obtained, and a width L is determined from H. The range of width L extending horizontally from a base point O is taken as a preliminary search range, within which the number of strokes S, the maximum height h, and a shape feature x (the longest blank length) are obtained. A search range is determined according to the variables S, h, and x, the intervals in which the histogram takes its minimum value are searched for within that search range, and the longest of those intervals is taken to be the gap before the following character, so that one character is segmented.
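The final step of that method, finding the longest interval in which the projection histogram takes its minimum value, can be sketched as follows. The sketch assumes each stroke is reduced to an integer x-extent `(x0, x1)` inside a search range of known width; the function names are illustrative, and the way the publication derives the search range from S, h, and x is not reproduced here.

```python
from typing import List, Optional, Tuple


def projection_histogram(spans: List[Tuple[int, int]], width: int) -> List[int]:
    """Count, for every x column in [0, width), how many strokes cover it."""
    hist = [0] * width
    for x0, x1 in spans:
        for x in range(max(0, x0), min(width, x1 + 1)):
            hist[x] += 1
    return hist


def longest_minimum_run(hist: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest run of columns holding the minimum value."""
    if not hist:
        return None
    lowest = min(hist)
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # A sentinel larger than the minimum closes a run that reaches the last column.
    for x, value in enumerate(hist + [lowest + 1]):
        if value == lowest and run_start is None:
            run_start = x
        elif value != lowest and run_start is not None:
            if best is None or (x - run_start) > (best[1] - best[0] + 1):
                best = (run_start, x - 1)
            run_start = None
    return best
```

For three strokes covering columns 0 to 3, 2 to 5, and 9 to 12, the empty columns 6 to 8 form the longest minimum run and would be taken as the gap between characters.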

[0007]

[Problems to Be Solved by the Invention] However, all of the handwritten character recognition methods described in the above publications assume that the writing direction is designated in advance as horizontal or vertical, or is fixed, and further assume that the line-break positions are also designated. Consequently, a handwritten document for which neither the writing direction nor the line-break positions are designated, for example a multi-line handwritten document written on an electronic blackboard, cannot be captured online and recognized in a single batch.

[0008] In the online handwritten character string recognition system disclosed in JP-A-61-29982, the input stroke sequence is divided into basic segment sequences as follows. For a horizontally written input character pattern, the ratio between the degree of overlap of the strokes' projections onto the horizontal axis and the height of the circumscribed figure of the input character pattern is compared with a threshold to divide the strokes, and each resulting group of strokes is taken as a basic segment. When the handwritten characters are written on a slant, the height of the circumscribed figure therefore becomes abnormally larger than the character height, and a group that includes the segments of the neighboring character is delimited as a single basic segment sequence. As a result, handwritten input characters written on a slant cannot be recognized correctly.

[0009] In the Japanese character recognition device disclosed in JP-A-5-174185, strokes that overlap one another in the vertical direction for horizontal writing, or in the horizontal direction for vertical writing, are merged to form basic segments that can constitute one character. In other words, the division into basic segments relies on the deterministic criterion of whether or not an overlap exists. When the character spacing is narrow and the circumscribed figures of adjacent characters overlap, there is therefore a risk that the strokes of several characters are merged into the basic segment of a single character, and handwritten input characters with narrow spacing may not be recognized correctly.

[0010] In the handwritten character recognition device disclosed in JP-A-6-162269, a plurality of handwritten characters are segmented one at a time by looking at the start point of the first stroke. If the start point of the last stroke of the immediately preceding character lies below a predetermined threshold and the start point of the first stroke of the current character lies above that threshold, this location is taken as a candidate character boundary. The distance and direction between the start point of the first stroke of the preceding character and that of the first stroke of the current character are then examined; if the distance exceeds a threshold and the direction matches the character input direction, the strokes are adopted as a segmentation candidate for one character. A bounding box is created for that candidate and checked for overlap with the bounding box created immediately before. If the two overlap, they are merged as the stroke group of the same character; if they do not, the stroke group of the previous segmentation candidate is cut out as one character. This approach cannot be applied to vertically written handwriting, in which the start point of a character's first stroke always lies below the start point of the last stroke of the preceding character. Even for horizontal writing, for the same reason, when an entire line is written on a slant descending to the right and the start point of the first stroke of the current character lies below the start point of the last stroke of the preceding character, that first stroke is classified as belonging to the preceding character, and segmentation fails.

[0011] In the handwritten character segmentation method disclosed in JP-A-8-50632, the height H of the input handwritten character string is obtained, a width L is determined from H, the range of width L extending horizontally from a base point O is taken as a preliminary search range, the number of strokes S, the maximum height h, and a shape feature x (the longest blank length) are obtained within it, a search range is determined according to S, h, and x, the intervals in which the histogram takes its minimum value are searched for within that search range, and the longest of those intervals is taken to be the gap before the following character so that one character is segmented. Consequently, when for example the three-digit number "111" is written in tall, narrow strokes, the digits may be segmented as the stroke sequence of a single character and misrecognized as the kanji "川". Furthermore, although multi-line handwriting is divided into lines at the line-break positions, no consideration is given to how the line-break positions are detected. Handwritten characters written over multiple lines therefore cannot be recognized in a batch, line by line.

[0012] A first object of the present invention is to provide a handwritten character recognition method that can capture online handwritten characters written on an electronic blackboard or the like without a designated writing direction, accurately determine their writing direction, and recognize the handwritten characters in a batch according to the result.

[0013] A second object of the present invention is to provide a handwritten character recognition method that can capture online handwritten characters written on an electronic blackboard or the like without designated line-break positions, accurately determine their line-break positions, and recognize the multi-line handwritten characters in a batch according to the result.

[0014] A third object of the present invention is to provide a handwritten character recognition method that can segment each character accurately even when the handwriting is slanted or narrowly spaced, and recognize the handwritten characters of arbitrary lines in a batch according to the segmentation result.

[0015] A fourth object of the present invention is to provide a handwritten character recognition method that can capture online handwritten characters written on an electronic blackboard or the like and recognize them in a batch, regardless of whether the writing is vertical or horizontal, the number of lines, and the presence or absence of writing frames.

[0016]

[Means for Solving the Problems] To achieve the first object, the present invention takes as its input a group of strokes entered in stroke order from a handwritten character input device. For strokes that are adjacent in input time within that group, it obtains the vector from the end point of one stroke to the start point of the next, and sums the rightward components and the downward components of these vectors separately. It compares the ratio of the summed rightward component to the summed downward component with a threshold for horizontal/vertical discrimination, and determines the writing direction to be horizontal if the ratio is greater than or equal to the threshold and vertical if it is less. The handwritten character string consisting of the stroke group is then recognized according to this determination.
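The writing-direction test can be sketched in a few lines. The sketch rests on several assumptions that this passage does not state: coordinates are screen-style with y growing downward, only positive (rightward or downward) components are summed, the threshold defaults to 1.0, and input with no downward movement at all is treated as horizontal. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y); y is assumed to grow downward
Stroke = Sequence[Point]


def back_strokes(strokes: Sequence[Stroke]) -> List[Point]:
    """Vectors from each stroke's end point to the next stroke's start point."""
    vectors = []
    for prev, nxt in zip(strokes, strokes[1:]):
        (x0, y0), (x1, y1) = prev[-1], nxt[0]
        vectors.append((x1 - x0, y1 - y0))
    return vectors


def writing_direction(strokes: Sequence[Stroke], threshold: float = 1.0) -> str:
    """Return 'horizontal' or 'vertical' from the right/down component ratio."""
    vectors = back_strokes(strokes)
    # Assumption: leftward and upward movement is ignored rather than subtracted.
    right = sum(dx for dx, _ in vectors if dx > 0)
    down = sum(dy for _, dy in vectors if dy > 0)
    if down == 0:
        return "horizontal"   # assumption: fallback when nothing moves downward
    return "horizontal" if right / down >= threshold else "vertical"
```

For two characters written side by side, the pen travels further to the right than downward between strokes, so the ratio exceeds the threshold; swapping the x and y coordinates of the same strokes yields the vertical case.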

[0017] To achieve the second object, the invention takes the stroke group as input and obtains its histogram projected in the writing direction. From the histogram it selects the portions containing few pen points as line-break position candidates. It further obtains the vectors from end point to start point of strokes adjacent in input time, together with the average length of those vectors. It compares the length of each vector lying within a line-break position candidate with the average vector length, and determines the position of any vector exceeding the line-break threshold to be a line-break position. The handwritten character string consisting of the stroke group is then recognized according to this determination.
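A rough sketch of this line-break test for horizontal writing follows. Several details are assumptions, since the passage leaves them open: pen points are binned into integer rows, a row with no points counts as a candidate, a vector is considered to lie in a candidate if it crosses a candidate row, and the threshold is applied to the ratio of the vector length to the average length. All names and default values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y); y is assumed to grow downward
Stroke = Sequence[Point]


def low_density_rows(strokes: Sequence[Stroke], height: int,
                     max_points: int = 0) -> List[int]:
    """Rows of the projection along the writing direction holding few pen points."""
    hist = [0] * height
    for stroke in strokes:
        for _, y in stroke:
            row = int(y)
            if 0 <= row < height:
                hist[row] += 1
    return [row for row, count in enumerate(hist) if count <= max_points]


def line_break_indices(strokes: Sequence[Stroke], height: int,
                       ratio_threshold: float = 2.0) -> List[int]:
    """Indices i such that a line break is assumed between stroke i and i + 1."""
    gaps = [(a[-1], b[0]) for a, b in zip(strokes, strokes[1:])]
    lengths = [math.dist(p, q) for p, q in gaps]
    if not lengths:
        return []
    mean = sum(lengths) / len(lengths)
    sparse = set(low_density_rows(strokes, height))
    breaks = []
    for i, ((_, y0), (_, y1)) in enumerate(gaps):
        lo, hi = sorted((int(y0), int(y1)))
        # Assumption: the vector "lies in" a candidate if it crosses a sparse row.
        crosses_candidate = any(row in sparse for row in range(lo, hi + 1))
        if crosses_candidate and lengths[i] > ratio_threshold * mean:
            breaks.append(i)
    return breaks
```

Combining the two conditions is the point of the design: the histogram alone cannot tell an empty band from sparse sampling, and vector length alone cannot tell a line break from a wide gap between characters.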

[0018] To achieve the third object, the invention evaluates the distance between the strokes of the stroke group according to a predetermined relational expression, and repeats the process of merging strokes whose evaluated distance is smaller than the provisional-merging threshold until no mergeable strokes remain, thereby dividing the stroke group into a plurality of character elements. It then obtains the circumscribed rectangle of each character element and estimates the maximum or average height and the maximum or average width of those rectangles as the standard character size of the handwriting. In the space of this estimated standard character size, it calculates a parameter expressing the relationship between adjacent character elements according to a predetermined relational expression, and repeats the process of merging character elements whose parameter is smaller than the provisional-merging threshold until no mergeable character elements remain, thereby dividing the character elements into a plurality of character element sets. Finally, it searches the dictionary with the character element sets and outputs as the recognition result the character whose evaluation value against the handwritten character patterns registered in the dictionary is highest.
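The first two steps, merging strokes until nothing more can be merged and then estimating the standard character size, might look like the sketch below. The "predetermined relational expression" is not given in this passage, so the smallest point-to-point distance is used as a stand-in; the names `gap`, `merge_until_stable`, and `standard_size` are hypothetical. The second merging stage and the dictionary search are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]
Element = List[Stroke]   # a character element: strokes judged to belong together


def gap(a: Element, b: Element) -> float:
    """Stand-in distance: smallest point-to-point distance between two elements."""
    return min(math.dist(p, q) for s in a for p in s for t in b for q in t)


def merge_until_stable(strokes: Sequence[Stroke], threshold: float) -> List[Element]:
    """Repeatedly merge elements closer than the threshold until none can be merged."""
    elements: List[Element] = [[s] for s in strokes]
    merged = True
    while merged:
        merged = False
        for i in range(len(elements)):
            for j in range(i + 1, len(elements)):
                if gap(elements[i], elements[j]) < threshold:
                    elements[i] = elements[i] + elements.pop(j)
                    merged = True
                    break
            if merged:
                break   # restart the scan, since the element list has changed
    return elements


def bounding_box(element: Element) -> Tuple[float, float, float, float]:
    """Circumscribed rectangle as (x_min, y_min, x_max, y_max)."""
    xs = [x for s in element for x, _ in s]
    ys = [y for s in element for _, y in s]
    return min(xs), min(ys), max(xs), max(ys)


def standard_size(elements: Sequence[Element], use_max: bool = True) -> Tuple[float, float]:
    """Estimate (width, height) from the elements' rectangles, by maximum or mean."""
    boxes = [bounding_box(e) for e in elements]
    widths = [x1 - x0 for x0, _, x1, _ in boxes]
    heights = [y1 - y0 for _, y0, _, y1 in boxes]
    if use_max:
        return max(widths), max(heights)
    return sum(widths) / len(widths), sum(heights) / len(heights)
```

Two crossing strokes share a pen point and are merged into one element, while a stroke 20 units away stays separate when the threshold is 3.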

[0019] Further, to achieve the fourth object, the invention takes the stroke group as input and determines the writing direction as follows: it obtains the vectors from end point to start point of strokes adjacent in input time, sums the rightward components and the downward components of these vectors separately, compares the ratio of the summed rightward component to the summed downward component with the threshold for horizontal/vertical discrimination, and judges the writing to be horizontal if the ratio is greater than or equal to the threshold and vertical if it is less. It also obtains the histogram of the stroke group projected in the writing direction, selects the portions with few pen points as line-break position candidates, obtains the vectors from end point to start point of strokes adjacent in input time together with their average length, compares the length of each vector within a line-break position candidate with that average, and determines the position of any vector exceeding the line-break threshold to be a line-break position. It then evaluates, line by line and according to a predetermined relational expression, the distance between the strokes of the stroke group, and repeats the process of merging strokes whose evaluated distance is smaller than the provisional-merging threshold until no mergeable character elements remain, thereby dividing the stroke group into a plurality of character elements. Next, it obtains the circumscribed rectangle of each character element and estimates the maximum or average height and the maximum or average width of those rectangles as the standard character size of the handwriting. In the space of this estimated standard character size, it calculates a parameter expressing the relationship between adjacent character elements according to a predetermined relational expression, and repeats the process of merging character elements whose parameter is smaller than the provisional-merging threshold until no mergeable character elements remain, thereby dividing the character elements into a plurality of character element sets. Finally, it searches the dictionary with the character element sets and outputs as the recognition result the character whose evaluation value against the handwritten character patterns registered in the dictionary is highest.

[0020]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0021] FIG. 1 is a block diagram showing an embodiment of a handwritten character recognition device to which the present invention is applied. The device comprises: a handwritten character input device 2, consisting of a tablet, an electronic blackboard, or the like, which outputs in stroke order the pen-point coordinates of handwritten characters written on its input surface with a pen 1; a display device 3, which displays the recognition results for the handwritten characters; a central processing unit (CPU) 4, which merges and splits the stroke group of the handwritten characters input from the handwritten character input device 2 into character element candidates and recognizes them by matching against a dictionary; a keyboard 5 for entering the various parameters and commands needed for recognition processing; and a storage device 6, which stores a handwritten character recognition program 61, a dictionary 62, and so on.

[0022] The handwritten character input device 2 is not limited to an electronic blackboard or a tablet; any device that outputs the pen-point coordinates of handwritten characters in stroke order can be used. An input device in which a display screen is mounted beneath a transparent tablet can also be used.

[0023] In the handwritten character recognition device of this embodiment, as shown in FIG. 2, no input frames for handwritten characters are provided on the input surface 21 of the handwritten character input device 2. After arbitrary handwritten characters, such as the illustrated text "On the recognition of frameless handwritten characters" (枠無し手書き文字の認識について), have been written with the pen 1 at any position on the input surface 21 over multiple lines, selecting the "Recognize" command button 22 causes the handwritten characters on the input surface 21 to be recognized in a batch, and the recognition result is displayed as text on the display screen of the display device 3. If the recognition result contains an error, selecting the "Re-recognize" command button 23 re-executes the series of processes starting from the determination of the writing direction, and the re-recognition result is displayed. If a wrong handwritten character has been written, it can be cancelled one character at a time by selecting the "Cancel" command button 24.

[0024] The terms used in this specification are defined as follows.

[0025] (1) Stroke
A stroke is a single handwritten line drawn from the moment the pen 1 touches the input surface 21 of the input device 2 until it leaves the surface; it corresponds to what is called one "kaku" (一画) in Japanese. Except for punctuation marks and the like, one handwritten character consists of multiple strokes.

[0026] (2) Pen point
A pen point is the smallest unit making up a stroke. It is expressed either as the coordinate value at which the pen 1 presses the input surface 21 or as a logical coordinate value derived from that value, and it carries attributes such as being the start point or end point of a stroke.

[0027] (3) Character element
A character element is a set of strokes that clearly belong to a single character. It is obtained by applying to an arbitrary set of strokes such processing as merging strokes that intersect and merging strokes that are close together. FIG. 3 illustrates the distinction between strokes and character elements.

[0028] (4) Handwriting pattern
As illustrated in FIG. 3, a handwriting pattern is the entire group of strokes making up the handwritten characters to be recognized that have been written on the input surface of the input device 2. The extent to be recognized can be indicated either by the user explicitly marking the end with a button, menu, or the like, or by treating as the end the moment at which the pen 1 has left the input surface 21 and no contact has been made for a certain time or longer.

[0029] (5) Back stroke
A back stroke is the vector from the end point of one stroke to the start point of the next stroke. In the present invention, back strokes are subdivided into intra-character back strokes, inter-character back strokes, and line-break back strokes.

[0030] (6) Intra-character back stroke
An intra-character back stroke is a back stroke that occurs between two consecutive strokes belonging to one character.

[0031] (7) Inter-character back stroke
An inter-character back stroke is a back stroke that occurs between the end point of the last stroke of one character and the start point of the next character.

[0032] (8) Line-break back stroke
A line-break back stroke is a back stroke that occurs between the end point of the last stroke of the last character on one line and the start point of the first stroke of the first character on the next line.
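The three kinds of back stroke defined above can be made concrete with a small sketch. It assumes each stroke already carries a line index and a character index; these labels are hypothetical and serve only to illustrate the definitions, since in practice they are what the recognition process has to work out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class LabeledStroke:
    points: List[Point]   # pen points in writing order
    line: int             # hypothetical label: index of the line
    char: int             # hypothetical label: index of the character within the line


def classify_back_strokes(strokes: List[LabeledStroke]) -> List[Tuple[Point, str]]:
    """Pair each back stroke vector with its kind."""
    result = []
    for prev, nxt in zip(strokes, strokes[1:]):
        (x0, y0), (x1, y1) = prev.points[-1], nxt.points[0]
        if prev.line != nxt.line:
            kind = "line-break"
        elif prev.char != nxt.char:
            kind = "inter-character"
        else:
            kind = "intra-character"
        result.append(((x1 - x0, y1 - y0), kind))
    return result
```

In horizontal writing, a line-break back stroke typically runs a long way to the left and downward, which is why its length stands out against the other two kinds.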

【００３３】図４は、本実施形態の手書き文字入力装置
の機能構成図であり、入力装置２の入力面２１で手書き
文字が筆記されると、その手書き文字の各ストロークを
構成する複数の筆点の座標データ列が入力装置２からス
トローク順に出力される。この各ストロークの筆点座標
データ列は、記憶装置６に順次格納される。FIG. 4 is a functional block diagram of the handwritten character input device according to the present embodiment. When handwritten characters are written on the input surface 21 of the input device 2, the coordinate data sequence of the pen points constituting each stroke of the handwritten characters is output from the input device 2 in stroke order. The pen point coordinate data sequence of each stroke is stored sequentially in the storage device 6.

【００３４】任意の手書き文字の入力が終了し、ユーザ
が「認識」のコマンドボタン２２を選択操作すると、手
書き文字認識プログラム６１が起動され、記憶装置６に
格納された手書き文字の筆点座標データ列を読出し、筆
記方向の判別処理、改行位置の判別処理、文字サイズの
判別処理、ストローク群の分割／結合処理、文字要素の
分割／結合処理、辞書６２を用いた認識処理を行う。When input of arbitrary handwritten characters is finished and the user selects the "recognition" command button 22, the handwritten character recognition program 61 is started. It reads the pen point coordinate data sequences of the handwritten characters stored in the storage device 6 and performs writing direction determination, line feed position determination, character size determination, stroke group division/combination, character element division/combination, and recognition using the dictionary 62.

【００３５】手書き文字認識プログラム６１は、筆記方
向取得部６１１、改行位置取得部６１２、標準文字サイ
ズ取得部６１３、枠無し手書き文字列認識部６１４とか
ら構成される。このうち、枠無し手書き文字列認識部６
１４は、図５に示すように、仮結合処理部６１５、仮分
割処理部６１６、評価・探索処理部６１７とから構成さ
れる。以下、この手書き文字認識プログラム６１を構成
する各部の構成および処理内容について詳細に説明す
る。The handwritten character recognition program 61 comprises a writing direction acquisition unit 611, a line feed position acquisition unit 612, a standard character size acquisition unit 613, and a frameless handwritten character string recognition unit 614. Of these, the frameless handwritten character string recognition unit 614 comprises, as shown in FIG. 5, a temporary combination processing unit 615, a temporary division processing unit 616, and an evaluation/search processing unit 617. The configuration and processing of each unit of the handwritten character recognition program 61 are described in detail below.

【００３６】（１）記憶装置６に格納される筆点座標デ
ータ列の構成入力装置２から出力される手書き文字の各ストローク筆
点座標データ列は、図６に示すように、基本的にはスト
ローク番号６３１と各筆点のｘ，ｙ座標値６３２とから
構成され、認識処理の過程で各ストロークが何文字目の
ストロークに属するかなどのストローク間関係属性６３
３、改行位置に相当するストロークであることを示す改
行位置フラグ６３４などが付加されるようになってい
る。(1) Structure of the pen point coordinate data sequence stored in the storage device 6: As shown in FIG. 6, the pen point coordinate data sequence of each stroke of the handwritten characters output from the input device 2 basically consists of a stroke number 631 and the x, y coordinate values 632 of each pen point. In the course of recognition processing, an inter-stroke relation attribute 633, indicating for example which character each stroke belongs to, and a line feed position flag 634, indicating that the stroke corresponds to a line feed position, are added.
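
As a rough illustration of the record described above, the following Python sketch models one stored stroke. The class and field names are hypothetical; only the reference numerals 631 to 634 come from the source, and the actual layout in FIG. 6 may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StrokeRecord:
    stroke_number: int                   # stroke number 631
    points: List[Tuple[float, float]]    # x, y coordinate values 632
    # The two fields below are filled in during recognition, not at input time.
    char_index: Optional[int] = None     # inter-stroke relation attribute 633
    is_line_feed: bool = False           # line feed position flag 634


record = StrokeRecord(stroke_number=1, points=[(10.0, 12.0), (14.0, 12.5)])
record.char_index = 1  # e.g. "this stroke belongs to the first character"
```

Keeping the recognition-time attributes optional reflects the statement that they are added in the course of processing rather than supplied by the input device.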

【００３７】（２）筆記方向取得部６１１筆記方向取得部６１１は、図７および図８で示される手
順に従って手書きパターンが縦書きか、横書きかを判定
する。図７は、裏ストローク及び縦書き横書き判別ベク
トルの説明図である。裏ストロークとは、前述したよう
に、あるストロークの終点から次のストロークの始点へ
のベクトルである。直感的には、手書きパターンの入力
中のタブレットから離れた状態でのペン１の移動が裏ス
トロークである。裏ストロークはさらに文字内裏ストロ
ークと文字間裏ストロークに分類できる。文字内裏スト
ロークとは、１文字に含まれるストローク間に生じる裏
ストロークである。文字間裏ストロークとは、ある文字
の最後のストロークの終点から次の文字の最初のストロ
ークの始点への裏ストロークである。(2) Writing direction acquisition unit 611: The writing direction acquisition unit 611 determines whether the handwritten pattern is written vertically or horizontally according to the procedure shown in FIGS. 7 and 8. FIG. 7 is an explanatory diagram of back strokes and the vertical/horizontal writing discrimination vector. As described above, a back stroke is the vector from the end point of one stroke to the start point of the next stroke. Intuitively, a back stroke is the movement of the pen 1 while it is lifted off the tablet during input of the handwritten pattern. Back strokes can be further classified into in-character back strokes and inter-character back strokes. An in-character back stroke is a back stroke that occurs between strokes contained in one character. An inter-character back stroke is a back stroke from the end point of the last stroke of one character to the start point of the first stroke of the next character.

【００３８】図７の手書きパターンでは、ＢＳ１,ＢＳ
２,ＢＳ４,ＢＳ６が文字内裏ストローク、ＢＳ３,ＢＳ
５が文字間裏ストロークである。In the handwritten pattern of FIG. 7, BS1, BS2, BS4, and BS6 are in-character back strokes, and BS3 and BS5 are inter-character back strokes.

【００３９】筆記方向取得部６１１は、認識対象の手書
きパターンの全てのストローク群を対象として、各裏ス
トロークに含まれる右方向の成分Ｒ３，Ｒ４，Ｒ５と下
方向の成分Ｄ６のみを加算し、縦書き横書き判別ベクト
ルを求める。図７ではＶtotalが縦書き横書き判別ベク
トルである。The writing direction acquisition unit 611 takes all stroke groups of the handwritten pattern to be recognized, adds up only the rightward components (R3, R4, R5) and the downward components (D6) contained in the back strokes, and thereby obtains the vertical/horizontal writing discrimination vector. In FIG. 7, Vtotal is this discrimination vector.
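
The accumulation described above can be sketched in Python as follows. The sketch assumes tablet coordinates in which x grows to the right and y grows downward, which the source does not state; the function name and sample vectors are illustrative.

```python
from typing import Iterable, Tuple


def discrimination_vector(
    back_strokes: Iterable[Tuple[float, float]],
) -> Tuple[float, float]:
    """Sum only the rightward (dx > 0) and downward (dy > 0) components.

    Leftward and upward components are ignored rather than subtracted.
    Returns (total rightward component, total downward component).
    """
    right = 0.0
    down = 0.0
    for dx, dy in back_strokes:
        right += max(dx, 0.0)
        down += max(dy, 0.0)
    return right, down


v_total = discrimination_vector([(-1, 1), (4, -3), (3, 0)])
print(v_total)  # (7.0, 1.0)
```

Because leftward and upward movements contribute nothing, a long line feed back stroke does not cancel out the accumulated rightward or downward movement.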

【００４０】日本語の場合、横書きの文字列では文字間
裏ストロークは右方向の成分を多く含み、縦書きの文字
列では文字間裏ストロークは下方向の成分を多く含む。
この性質を利用し、筆記方向取得部６１１は図８のよう
な手順で縦書き横書きの判定を行なっている。In Japanese, inter-character back strokes contain mostly rightward components in horizontally written text and mostly downward components in vertically written text. Using this property, the writing direction acquisition unit 611 determines vertical or horizontal writing by the procedure shown in FIG. 8.

【００４１】まず、図７で示した縦書き横書き判別ベク
トルを求める（ステップ８０１）。次に、縦書き横書き
判別ベクトルの右方向の成分を下方向の成分で割った値
Ａ（下方向成分に対する右方向成分の比）と、横書き判
定用の閾値Ｔｈ及び縦書き判定用の閾値Ｔｖとを比較
し、前記の値ＡがＴｈ以上であれば横書き、Ｔｖ以下で
あれば縦書きとして判定する（ステップ８０２）。上記
処理で判定できなかった場合は、筆記された文字数が少
ないと判断し、筆記された手書きパターン全体の外接矩
形の縦横比（高さに対する幅の比）が「１」以上か否か
を調べ、「１」以上ならば横書き、「１」未満ならば縦
書きとして判定する（ステップ８０３）。First, the vertical/horizontal writing discrimination vector shown in FIG. 7 is obtained (step 801). Next, the value A obtained by dividing the rightward component of the discrimination vector by its downward component (the ratio of the rightward component to the downward component) is compared with a threshold Th for horizontal writing determination and a threshold Tv for vertical writing determination. If A is equal to or greater than Th, the pattern is determined to be horizontal writing; if A is equal to or less than Tv, it is determined to be vertical writing (step 802). If neither condition holds, the number of written characters is judged to be small, and the aspect ratio (ratio of width to height) of the circumscribed rectangle of the entire handwritten pattern is examined: a ratio of 1 or more is determined to be horizontal writing, and a ratio of less than 1 vertical writing (step 803).
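
Steps 802 and 803 can be sketched in Python as below. The threshold values `th` and `tv` are placeholders, since the source gives no numbers, and the handling of a zero downward component is an assumption added so that the division is always defined.

```python
def writing_direction(right, down, width, height, th=2.0, tv=0.5):
    """Return "horizontal" or "vertical" for a handwritten pattern.

    right, down   : components of the discrimination vector (step 801)
    width, height : circumscribed rectangle of the whole pattern
    th, tv        : placeholder thresholds Th and Tv (assumed, with th > tv)
    """
    if down > 0:
        a = right / down
    elif right > 0:
        a = float("inf")  # purely rightward movement
    else:
        a = None          # no usable back stroke components

    # Step 802: compare the ratio A with the two thresholds.
    if a is not None:
        if a >= th:
            return "horizontal"
        if a <= tv:
            return "vertical"

    # Step 803: fall back to the aspect ratio (width / height >= 1).
    return "horizontal" if width >= height else "vertical"


print(writing_direction(7.0, 1.0, width=100, height=20))  # horizontal
```

The comparison `width >= height` is equivalent to testing whether the width-to-height ratio is at least 1 and avoids a second division.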

【００４２】従って、図７に示したように、判別ベクト
ルの下方向成分に対する右方向成分の比が横書き判定用
の閾値Ｔｈを超えるものについては、正確に「横書き」
として判定される。このようにして横書きか、縦書きか
を判定することにより、ユーザは予め筆記方向を指定す
る必要がなくなり、手書き文字を筆記する際の煩わしさ
から解放される。Accordingly, as shown in FIG. 7, a pattern whose ratio of the rightward component to the downward component of the discrimination vector exceeds the threshold Th for horizontal writing determination is correctly determined to be horizontal writing. Because horizontal or vertical writing is determined in this way, the user no longer needs to specify the writing direction in advance and is freed from that inconvenience when writing handwritten characters.

【００４３】（３）改行位置取得部６１２改行位置取得部６１２は、入力装置２から入力された手
書き文字の複数ストローク群を対象とし、その筆記方向
へのヒストグラムを求め、そのヒストグラムにより筆記
点が少ない部分を改行位置候補に選定し、さらに前記ス
トローク群の中のストローク入力時刻において隣合うス
トロークの終点から始点へのベクトルおよびそのベクト
ルの長さの平均を求め、前記改行位置候補内のベクトル
の長さと前記ベクトルの長さの平均を比較し、改行判定
用の閾値を超えるベクトルの位置を改行位置として判定
する。(3) Line feed position acquisition unit 612: The line feed position acquisition unit 612 takes the group of strokes of the handwritten characters input from the input device 2, obtains a histogram in the writing direction, and uses it to select portions with few pen points as line feed position candidates. It then obtains the vectors from the end point of each stroke to the start point of the stroke that follows it in input time, together with the average length of those vectors, compares the length of each vector within a line feed position candidate against that average, and determines the position of any vector exceeding the threshold for line feed determination to be a line feed position.

【００４４】すなわち、改行位置取得部６１２は、図１
１のフローチャートに示すように、筆記方向取得部部６
１１が判定した筆記方向の判定結果に基づき、ストロー
ク群の筆記方向へのヒストグラムを求める（ステップ１
１０１）。横書きの場合、図９に示すように、ヒストグ
ラム９０１の「谷」に相当する位置が改行位置であると
推定される。そこで、ヒストグラム９０１で筆点分布度
数が小さい部分（谷の部分）をまたぐ裏ストロークを改
行位置候補に選定する（ステップ１１０２）。That is, as shown in the flowchart of FIG. 11, the line feed position acquisition unit 612 obtains a histogram of the stroke group in the writing direction, based on the writing direction determined by the writing direction acquisition unit 611 (step 1101). In horizontal writing, as shown in FIG. 9, positions corresponding to the "valleys" of the histogram 901 are presumed to be line feed positions. Back strokes that straddle a portion of the histogram 901 where the pen point frequency is low (a valley) are therefore selected as line feed position candidates (step 1102).
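
Steps 1101 and 1102 might be sketched as follows for horizontal writing. This is only an interpretation: it assumes the histogram counts pen points per band of y coordinate, so that valleys fall between text lines, and the bin size and valley criterion (`bin_size`, `valley_max`) are hypothetical parameters not given in the source.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]


def line_feed_candidates(strokes: List[Stroke], bin_size: float = 10.0,
                         valley_max: int = 0) -> List[int]:
    """Return each index i whose back stroke (stroke i to stroke i + 1)
    straddles a histogram valley. Horizontal writing is assumed."""
    ys = [y for s in strokes for _, y in s]
    y0 = min(ys)
    n_bins = int((max(ys) - y0) // bin_size) + 1

    # Step 1101: histogram of pen points per y band.
    hist = [0] * n_bins
    for y in ys:
        hist[int((y - y0) // bin_size)] += 1
    valleys = {b for b, count in enumerate(hist) if count <= valley_max}

    # Step 1102: keep back strokes whose two ends lie on opposite sides
    # of at least one valley band.
    candidates = []
    for i in range(len(strokes) - 1):
        b_end = int((strokes[i][-1][1] - y0) // bin_size)
        b_start = int((strokes[i + 1][0][1] - y0) // bin_size)
        lo, hi = sorted((b_end, b_start))
        if any(lo < b < hi for b in valleys):
            candidates.append(i)
    return candidates


two_lines = [
    [(0, 0), (5, 8)], [(10, 2), (15, 7)],      # first text line
    [(0, 40), (5, 48)], [(10, 42), (15, 47)],  # second text line
]
print(line_feed_candidates(two_lines))  # [1]
```

For vertical writing the same idea would apply with the roles of x and y exchanged.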

【００４５】改行裏ストロークとは、文字間裏ストロー
クの一種であり、図１０に示すように、ある行の末尾の
文字の最後のストロークの終点から次の行の先頭の文字
の最初のストロークの始点への裏ストロークという意味
である。日本語の場合、横書きの文章中の改行裏ストロ
ークは左下方向、縦書きの文章中の裏ストロークは左上
方向である。A line feed back stroke is a kind of inter-character back stroke: as shown in FIG. 10, it is the back stroke from the end point of the last stroke of the last character of one line to the start point of the first stroke of the first character of the next line. In Japanese, a line feed back stroke points toward the lower left in horizontally written text and toward the upper left in vertically written text.

【００４６】そこで、縦書きの場合は、ヒストグラム９
０１で筆点分布度数が小さい部分（谷の部分）をまたぐ
左上方向の裏ストロークを、横書きの場合は左下方向の
裏ストロークを改行裏ストローク候補として選択する。
次に、横書きの場合、上記処理で選択した裏ストローク
の左方向水平成分Ｗｃｒが改行判定用の閾値を超えるも
のを改行裏ストロークと判定し、縦書きの場合は、上記
処理で選択した裏ストロークの上方向鉛直成分Ｈｃｒが
改行判定用の閾値を超えるものを改行裏ストロークと判
定する（ステップ１１０３）。Accordingly, in vertical writing, upper-left back strokes that straddle a portion of the histogram 901 where the pen point frequency is low (a valley) are selected as line feed back stroke candidates; in horizontal writing, lower-left back strokes are selected. Next, in horizontal writing, a selected back stroke whose leftward horizontal component Wcr exceeds the threshold for line feed determination is determined to be a line feed back stroke; in vertical writing, a selected back stroke whose upward vertical component Hcr exceeds the threshold is determined to be a line feed back stroke (step 1103).

【００４７】この場合、改行裏ストロークの水平成分Ｗ
ｃｒおよび鉛直成分Ｈｃｒの大きさは、１行の文字数に
よって異なる。そこで、手書き文字の１文字の標準サイ
ズが図１０に示すように既知であるか、推定できる場
合、その標準文字サイズの幅Ｗｓで水平成分Ｗｃｒを割
った値が閾値を超えるものを横書きの場合の改行裏スト
ロークとして選定し、また標準文字サイズの高さＨｓで
鉛直成分Ｈｃｒを割った値が閾値を超えるものを縦書き
の場合の改行裏ストロークとして選定することにより、
判定精度がさらに向上する。Here, the magnitudes of the horizontal component Wcr and the vertical component Hcr of a line feed back stroke vary with the number of characters per line. Therefore, when the standard size of one handwritten character is known or can be estimated, as shown in FIG. 10, determination accuracy is further improved by selecting as line feed back strokes, in horizontal writing, those for which the horizontal component Wcr divided by the standard character width Ws exceeds a threshold, and, in vertical writing, those for which the vertical component Hcr divided by the standard character height Hs exceeds a threshold.
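
The normalized test of step 1103 can be sketched in Python as follows. It assumes x grows rightward and y grows downward, and the default `threshold` of 3 standard character sizes is a placeholder; the source does not give a value.

```python
def is_line_feed(back_stroke, direction, std_width, std_height, threshold=3.0):
    """Decide whether a candidate back stroke is a line feed back stroke.

    back_stroke : (dx, dy) vector of the candidate back stroke
    direction   : "horizontal" or "vertical"
    std_width   : standard character width Ws
    std_height  : standard character height Hs
    threshold   : assumed limit, in units of the standard character size
    """
    dx, dy = back_stroke
    if direction == "horizontal":
        # A line feed points to the lower left; test Wcr / Ws.
        if dx >= 0 or dy <= 0:
            return False
        return (-dx) / std_width > threshold
    # Vertical writing: a line feed points to the upper left; test Hcr / Hs.
    if dx >= 0 or dy >= 0:
        return False
    return (-dy) / std_height > threshold


print(is_line_feed((-200, 30), "horizontal", std_width=20, std_height=20))  # True
```

Dividing by the standard character size makes the threshold independent of how large the user happens to write.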

【００４８】ところで、手書き文字が斜め方向に傾いて
筆記された場合、水平成分Ｗｃｒおよび鉛直成分Ｈｃｒ
が算定できなくなる恐れがあるが、斜め書きの場合は、
手書き文字パターンを正規直交座標系に変換する補正処
理を施すことによって水平成分Ｗｃｒおよび鉛直成分Ｈ
ｃｒを正常に算定することが可能である。この場合、斜
め書きであるか否かは、例えば、各手書き文字の外接矩
形の中心を結ぶ線を求め、その線の傾斜によって判定す
ることができる。When handwritten characters are written at a slant, it may become impossible to calculate the horizontal component Wcr and the vertical component Hcr. In that case, applying a correction process that converts the handwritten character pattern into an orthonormal coordinate system allows Wcr and Hcr to be calculated normally. Whether the writing is slanted can be determined, for example, by obtaining a line connecting the centers of the circumscribed rectangles of the handwritten characters and examining the inclination of that line.
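
The source does not specify the correction process. One plausible reading, sketched below purely as an assumption, is a rotation that brings the line through the character centers onto the x axis for horizontal writing; here the slant is taken from the first and last centers only, whereas a fitted line could equally be used.

```python
import math


def deskew(points, centers):
    """Rotate points so that the line from the first to the last character
    center becomes horizontal. Rotation is about the first center."""
    (x0, y0), (x1, y1) = centers[0], centers[-1]
    angle = math.atan2(y1 - y0, x1 - x0)  # slant of the writing line
    c, s = math.cos(-angle), math.sin(-angle)
    return [
        ((x - x0) * c - (y - y0) * s + x0,
         (x - x0) * s + (y - y0) * c + y0)
        for x, y in points
    ]


corrected = deskew([(10.0, 10.0)], centers=[(0.0, 0.0), (10.0, 10.0)])
```

After this rotation, a point on the slanted writing line has the same y coordinate as the first center, so horizontal and vertical components can be read off directly.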

【００４９】このようにして改行位置を判定することに
より、ユーザは筆記途中で改行位置を指定する必要がな
くなり、手書き文字を筆記する際の煩わしさから解放さ
れる。By determining the line feed position in this way, the user does not need to specify the line feed position during writing, and is freed from the trouble of writing handwritten characters.

【００５０】（４）標準文字サイズ取得部６１３標準文字サイズ取得部６１３は、入力装置２から入力さ
れた手書き文字の複数ストローク群を構成する各ストロ
ーク間の距離を、予め定めた関係式に従って評価し、そ
の評価した距離が仮結合用の閾値よりも小さいストロー
ク同士を結合する仮結合処理を結合可能なストロークが
なくなるまで繰り返すことにより、複数ストローク群を
複数の文字要素に分割した後、各文字要素の外接矩形を
求め、その外接矩形の高さの最大値または平均値と幅の
最大値または平均値を手書き文字の標準文字サイズとし
て推定する。(4) Standard character size acquisition unit 613: The standard character size acquisition unit 613 evaluates the distance between the strokes of the handwritten characters input from the input device 2 according to a predetermined relational expression, and repeats a temporary combination process, which joins strokes whose evaluated distance is smaller than the threshold for temporary combination, until no joinable strokes remain. Having thereby divided the stroke group into a plurality of character elements, it obtains the circumscribed rectangle of each character element and estimates the maximum or average height and the maximum or average width of those rectangles as the standard character size of the handwritten characters.

【００５１】仮結合処理におけるストローク間の距離
は、図１２および図１３に示すような各パラメータに係
数を乗じて加算した値で評価する。ここで、Ｌは図１２
（ａ）に示すように１つのストロークの標準サイズ（１
辺の長さ）、Ｓは１つのストロークの標準の面積であ
る。１つのストロークの標準サイズＬおよび標準面積Ｓ
は、図１２（ｂ）に破線で示すような各ストロークの外
接矩形を求め、その外接矩形の高さおよび幅のうち、長
い方の値のみを選択し、さらに全てのストロークの高さ
および幅のうち最大のものを選択し、これから１つのス
トロークの標準サイズＬおよび標準面積Ｓ推定する。The distance between strokes in the temporary combination process is evaluated as the sum of the parameters shown in FIGS. 12 and 13, each multiplied by a coefficient. Here, as shown in FIG. 12(a), L is the standard size (side length) of one stroke and S is the standard area of one stroke. To estimate L and S, the circumscribed rectangle of each stroke is obtained, as shown by the broken lines in FIG. 12(b); for each rectangle only the longer of its height and width is selected; and the largest of the values selected across all strokes is taken, from which the standard size L and standard area S of one stroke are estimated.

【００５２】なお、後述する文字要素間の結合処理にお
いては、Ｌは１つの文字要素の標準サイズ、Ｓは１つの
文字要素の標準面積となる。In the combining process between the character elements described later, L is the standard size of one character element, and S is the standard area of one character element.

【００５３】（ａ）評価パラメータ＝ｄ／Ｌ図１２（ｂ）に示すように、隣合うストロークの外接図
形（破線で図示）の筆記方向の変位ｄの１文字の標準サ
イズＬに対する割合い、（ｂ）評価パラメータ＝ｃ／Ｓ図１２（ｃ）に示すように、隣合うストロークの外接図
形（破線で図示）の重なり部分の面積ｃの１文字の標準
面積Ｓに対する割合い、（ｃ）評価パラメータ＝ｄ／Ｌ図１２（ｄ）に示すように、隣合うストロークの重心座
標のユークリッド距離ｄの１文字の標準サイズＬに対す
る割合い、（ｄ）評価パラメータ＝ｄ／Ｌ図１３（ａ）に示すように、隣合うストロークの重心座
標の筆記方向の変位ｄの１文字の標準サイズＬに対する
割合い、（ｅ）評価パラメータ＝ｄ／Ｌ図１３（ｂ）に示すように、先のストロークの末尾の筆
点と後のストロークの先頭の筆点のユークリッド距離ｄ
の１文字の標準サイズＬに対する割合い、（ｆ）評価パラメータ＝ｄ／Ｌ図１３（ｃ）に示すように、先のストロークの末尾の筆
点と後のストロークの先頭の筆点の筆記方向の変位ｄの
１文字の標準サイズＬに対する割合い。(a) Evaluation parameter = d/L: as shown in FIG. 12(b), the ratio of the displacement d in the writing direction between the circumscribed figures (shown by broken lines) of adjacent strokes to the standard size L of one character.
(b) Evaluation parameter = c/S: as shown in FIG. 12(c), the ratio of the area c of the overlapping portion of the circumscribed figures (shown by broken lines) of adjacent strokes to the standard area S of one character.
(c) Evaluation parameter = d/L: as shown in FIG. 12(d), the ratio of the Euclidean distance d between the centroid coordinates of adjacent strokes to the standard size L of one character.
(d) Evaluation parameter = d/L: as shown in FIG. 13(a), the ratio of the displacement d in the writing direction between the centroid coordinates of adjacent strokes to the standard size L of one character.
(e) Evaluation parameter = d/L: as shown in FIG. 13(b), the ratio of the Euclidean distance d between the last pen point of the preceding stroke and the first pen point of the following stroke to the standard size L of one character.
(f) Evaluation parameter = d/L: as shown in FIG. 13(c), the ratio of the displacement d in the writing direction between the last pen point of the preceding stroke and the first pen point of the following stroke to the standard size L of one character.

【００５４】これらの評価パラメータの中から少なくと
も２つを予め選定しておき、その選定した複数の評価パ
ラメータによる評価値が求まったならば、その各評価値
に所定の係数を乗じて加算し、その加算値と仮結合用の
閾値と比較する。この比較処理の結果、加算値が小さい
ものについては、１文字の中に含まれると判定し、その
１対のストロークを同一集合に結合し、１つの文字要素
候補に選定する。この仮結合処理は、閾値以下のストロ
ークがいずれかの文字要素に全て結合されるまで再帰的
に繰り返す。At least two of these evaluation parameters are selected in advance. Once the evaluation values for the selected parameters have been obtained, each is multiplied by a predetermined coefficient, the products are summed, and the sum is compared with the threshold for temporary combination. A pair of strokes whose sum is small is determined to belong to one character; the pair is joined into the same set and selected as one character element candidate. This temporary combination process is repeated recursively until every stroke at or below the threshold has been joined to some character element.
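
The weighted sum and the repeated joining described above can be sketched in Python as follows. A union-find structure is used here as a convenient way to reach the state in which no further joins are possible; that choice, the function names, and the toy distance in the example are assumptions, not the method of the source.

```python
from typing import Callable, List, Sequence


def combined_distance(params: Sequence[float], coeffs: Sequence[float]) -> float:
    """Weighted sum of the selected evaluation parameters."""
    return sum(p * k for p, k in zip(params, coeffs))


def temporary_combination(n_strokes: int,
                          distance: Callable[[int, int], float],
                          threshold: float) -> List[List[int]]:
    """Group stroke indices; any pair closer than threshold shares a group."""
    parent = list(range(n_strokes))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_strokes):
        for j in range(i + 1, n_strokes):
            if distance(i, j) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n_strokes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: strokes reduced to 1-D center positions, with L = 10 and two
# identical parameters weighted 0.5 each (all values invented for illustration).
centers = [0, 1, 5, 6, 20]
L = 10


def toy_distance(i: int, j: int) -> float:
    d = abs(centers[i] - centers[j]) / L
    return combined_distance([d, d], [0.5, 0.5])


print(temporary_combination(len(centers), toy_distance, threshold=0.2))
# [[0, 1], [2, 3], [4]]
```

Each resulting group corresponds to one character element candidate.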

【００５５】例えば、図１４（ａ）に示すように「ソフ
ト」というカナ文字が入力された場合、このカナ文字を
構成するストロークＳＴ₁〜ＳＴ₅について、互いに隣接
するストローク同士で図１２（ｂ）〜図１３（ｃ）に示
す評価パラメータを求め、その評価パラメータを全部使
って総合評価を行い、どのストロークを結合して１つの
文字要素とするかを決定する。図１４（ｂ）に各評価パ
ラメータの値の例を示している。ここで、図１４（ｂ）
における評価パラメータ（ａ）〜（ｃ）は図１２（ａ）
〜（ｃ）の評価パラメータ、評価パラメータ（ｄ）〜
（ｆ）は図１３（ａ）〜（ｃ）の評価パラメータに該当
する。算出した各評価パラメータは、小さいほど結合の
度合いが強いことを示している。For example, when the kana characters "ソフト" are input as shown in FIG. 14(a), the evaluation parameters shown in FIG. 12(b) to FIG. 13(c) are obtained for each pair of mutually adjacent strokes among the strokes ST₁ to ST₅ constituting the characters, and an overall evaluation using all of the parameters determines which strokes are joined into one character element. FIG. 14(b) shows example values of the evaluation parameters. Evaluation parameters (a) to (c) in FIG. 14(b) correspond to the evaluation parameters of FIG. 12(a) to (c), and evaluation parameters (d) to (f) to those of FIG. 13(a) to (c). The smaller a calculated evaluation parameter, the stronger the degree of joining.

【００５６】図１４（ｂ）の評価パラメータに対し、
「仮結合の閾値＝−４．０」、「仮分割の閾値＝−５．
０」を設定した場合、総合評価はストロークＳＴ₁，Ｓ
Ｔ₂間が「−３．２」、ストロークＳＴ₂，ＳＴ₃間が
「−５．４５」、ストロークＳＴ₃,ST₄間が「−７．
４」、ストロークＳＴ₄，ＳＴ₅間が「−１．４１」であ
るので、ストロークＳＴ₁，ＳＴ₂間は「結合」、ストロ
ークＳＴ₂，ＳＴ₃間は「分割」、ストロークＳＴ₃，Ｓ
Ｔ₄間は「分割」、ストロークＳＴ₄，ＳＴ₅間は「結
合」となる。When "temporary combination threshold = −4.0" and "temporary division threshold = −5.0" are set for the evaluation parameters of FIG. 14(b), the overall evaluation is "−3.2" between strokes ST₁ and ST₂, "−5.45" between ST₂ and ST₃, "−7.4" between ST₃ and ST₄, and "−1.41" between ST₄ and ST₅. Strokes ST₁ and ST₂ are therefore "joined", ST₂ and ST₃ "divided", ST₃ and ST₄ "divided", and ST₄ and ST₅ "joined".
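
The worked example above can be reproduced with a short Python sketch. It follows the sign convention implied by the example's numbers (a value above the combination threshold joins, a value below the division threshold divides); the treatment of values between or exactly at the thresholds is an assumption.

```python
def classify_gap(score, join_threshold=-4.0, split_threshold=-5.0):
    """Classify the gap between two adjacent strokes from its overall score."""
    if score > join_threshold:
        return "join"
    if score < split_threshold:
        return "split"
    return "ambiguous"  # neither threshold is decisive


# Overall evaluation values quoted in the example for FIG. 14(b).
scores = {
    ("ST1", "ST2"): -3.2,
    ("ST2", "ST3"): -5.45,
    ("ST3", "ST4"): -7.4,
    ("ST4", "ST5"): -1.41,
}
decisions = {pair: classify_gap(value) for pair, value in scores.items()}
print(decisions)
```

With these inputs the first and last gaps are joined and the two middle gaps are divided, matching the result stated in the text.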

【００５７】ここで、Ｘ軸方向（横書き方向）の単なる
重なり度合いによって「結合」か「分割」かを従来の決
定論的な方法によって判断するようにした場合、ストロ
ークＳＴ₂，ＳＴ₃間の距離ｄ２よりも小さい距離を仮結
合用の閾値に設定した場合、ストロークＳＴ₄，ＳＴ₅間
の距離ｄ３はｄ２＞ｄ３であるのでストロークＳＴ₄，
ＳＴ₅は「結合」となる。しかし、ストロークＳＴ₁，Ｓ
Ｔ₂間の距離ｄ１はｄ１＞ｄ２であるので、これらスト
ロークＳＴ₁，ＳＴ₂間は「分割」となり、ストロークＳ
Ｔ₂，ＳＴ₃間は「結合」となり、ストローク同士の結合
および分割が正しく行われなくなる。By contrast, suppose that "joining" or "division" were decided by the conventional deterministic method, based simply on the degree of overlap in the X-axis direction (the horizontal writing direction), with a distance smaller than the distance d2 between strokes ST₂ and ST₃ set as the threshold for temporary combination. Because the distance d3 between strokes ST₄ and ST₅ satisfies d2 > d3, strokes ST₄ and ST₅ would be "joined". However, because the distance d1 between strokes ST₁ and ST₂ satisfies d1 > d2, strokes ST₁ and ST₂ would be "divided" while strokes ST₂ and ST₃ would be "joined", so the strokes would not be joined and divided correctly.

【００５８】一方、本発明のように、複数の評価パラメ
ータの総合評価によってストローク同士の結合および分
割を決定することにより、ストローク同士の結合および
分割を精度良く行うことができる。On the other hand, as in the present invention, the connection and division of strokes are determined by comprehensive evaluation of a plurality of evaluation parameters, so that the connection and division of strokes can be performed accurately.

【００５９】標準文字サイズ取得部６１３は、以上のよ
うにしてストロークの結合および分割を行い、文字要素
となる候補を定めるこの結果、入力装置２から入力され
た手書き文字の複数ストローク群は、図１５に破線で囲
んで示すように、複数の文字要素に分割される。The standard character size acquisition unit 613 joins and divides strokes as described above and determines the character element candidates. As a result, the stroke group of the handwritten characters input from the input device 2 is divided into a plurality of character elements, shown enclosed by broken lines in FIG. 15.

【００６０】そこで、次に、図１５に破線で示すような
各文字要素の外接矩形を求め、その外接矩形の大きさか
ら１文字の大きさを推定する。文字の大きさは、高さと
幅をそれぞれ別個に計算する。計算には各外接矩形の高
さおよび幅のうち、長い方の値のみを利用する。図１４
のような手書きパターンが与えられた時は、高さの計算
にはＨ₁，Ｈ₃，Ｈ₄，Ｈ₅，Ｈ₆を、幅の計算にはＷ₂，Ｗ
₇を利用する。計算に用いるデータを選択した後、それ
ぞれのデータの平均値と標準偏差を求め、平均値との差
を標準偏差で割った値が閾値以上のものはノイズを含ん
でいるものと見做してデータから削除する。最後に残っ
たデータの最大値もしくは平均値を標準文字の高さ、あ
るいは幅の推定値とする。Next, the circumscribed rectangle of each character element is obtained, as shown by the broken lines in FIG. 15, and the size of one character is estimated from the sizes of those rectangles. The height and width of a character are calculated separately. For each circumscribed rectangle, only the longer of its height and width is used in the calculation. Given a handwritten pattern such as that of FIG. 14, H₁, H₃, H₄, H₅, and H₆ are used to calculate the height, and W₂ and W₇ to calculate the width. After the data for the calculation have been selected, the mean and standard deviation of each data set are obtained, and any value whose difference from the mean divided by the standard deviation is equal to or greater than a threshold is regarded as containing noise and is deleted from the data. Finally, the maximum or mean of the remaining data is taken as the estimated height or width of a standard character.

【００６１】この場合、最終的にデータ不足で高さＨあ
るいは幅Ｗの片方が算出できなかった場合、算出できた
方の値を算出できなかった方の値にも利用する。例え
ば、高さＨだけが算出でき、幅Ｗが求められなかった場
合は幅Ｗ＝高さＨとする。図１４の例では、文字の高さ
＝Ｈ₆、幅＝Ｗ₇として算出している。If, in the end, either the height H or the width W cannot be calculated for lack of data, the value that could be calculated is used for the other as well. For example, if only the height H can be calculated and the width W cannot, the width W is set equal to the height H. In the example of FIG. 14, the character height is calculated as H₆ and the width as W₇.
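
The size estimation of the two preceding paragraphs can be sketched in Python as follows. The noise threshold `z_threshold`, the use of the population standard deviation, and the rule that a square rectangle counts toward the height are all assumptions; the source leaves them open.

```python
from statistics import mean, pstdev


def robust_estimate(values, z_threshold=2.0, use_max=True):
    """Drop values far from the mean, then return the max (or mean)."""
    if not values:
        return None
    mu, sigma = mean(values), pstdev(values)
    if sigma > 0:
        values = [v for v in values if abs(v - mu) / sigma < z_threshold]
    if not values:
        return None
    return max(values) if use_max else mean(values)


def standard_char_size(boxes, z_threshold=2.0):
    """Estimate (width, height) from (width, height) bounding rectangles.

    Only the longer side of each rectangle is used, as the text describes.
    """
    heights = [h for w, h in boxes if h >= w]
    widths = [w for w, h in boxes if w > h]
    height = robust_estimate(heights, z_threshold)
    width = robust_estimate(widths, z_threshold)
    # Fall back to the available value when one side cannot be estimated.
    if height is None:
        height = width
    if width is None:
        width = height
    return width, height


boxes = [(5, 20), (22, 10), (6, 21), (4, 19), (8, 22), (21, 9)]
print(standard_char_size(boxes))  # (22, 22)
```

Passing `use_max=False` to `robust_estimate` would give the mean-based variant that the text also allows.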

【００６２】このようにすることにより、筆記方向や行
数の指定が無い場合でも、文字の大きさの推定が可能に
なる。そして、筆記方向や行数の情報が筆記方向判別処
理および改行位置判別処理で判明すれば、仮結合処理の
精度がさらに向上し、結果として、手書き文字の標準サ
イズの推定精度が向上するという利点がある。In this way, the character size can be estimated even when the writing direction and the number of lines are not specified. If the writing direction and the number of lines become known through the writing direction determination and line feed position determination processes, the accuracy of the temporary combination process improves further, which has the advantage of improving the accuracy of the estimated standard size of the handwritten characters.

【００６３】特に、斜め書きや文字間隔が狭い手書き文
字であっても、各文字要素の切り出しを行うための標準
文字サイズを正確に推定することができる。In particular, the standard character size for extracting each character element can be accurately estimated even for obliquely written characters or handwritten characters with a narrow character interval.

【００６４】例えば、図１６（ａ）に示すように斜め書
きの手書き文字が入力された場合、仮結合処理によって
図１６（ｂ）に示すように結合または分割された文字要
素単位に、その文字要素の外接矩形を求め、その外接矩
形の大きさから１文字の大きさを推定するため、標準文
字サイズを斜め書きの場合であっても正確に推定するこ
とができる。For example, when handwritten characters written at a slant are input as shown in FIG. 16(a), the circumscribed rectangle is obtained for each character element joined or divided by the temporary combination process, as shown in FIG. 16(b), and the size of one character is estimated from the sizes of those rectangles. The standard character size can therefore be estimated accurately even for slanted writing.

【００６５】（５）枠無し手書き文字列認識部６１４枠無し手書き文字列認識部６１４は、図５に詳細を示し
たように仮結合処理部６１５、仮分割処理部６１６、評
価・探索処理部６１７とで構成される。(5) Frameless handwritten character string recognition unit 614: As shown in detail in FIG. 5, the frameless handwritten character string recognition unit 614 comprises a temporary combination processing unit 615, a temporary division processing unit 616, and an evaluation/search processing unit 617.

【００６６】仮結合処理部６１５における処理は、標準
文字サイズ取得部６１３における仮結合処理と全く同様
である。但し、標準文字サイズ取得部６１３における仮
結合処理は個々のストロークを結合し、「１つの文字に
含まれることが明らかな状態の文字要素」を作成するこ
とであるのに対し、仮結合処理部６１５における仮結合
処理は標準文字サイズの推定値を参照し、各文字要素を
さらに結合することである。この場合、文字要素を結合
する際に用いる評価パラメータおよび手順は、標準文字
サイズ取得部６１３における仮結合処理と全く同様のも
のを用いることができる。但し、標準サイズＬは１つの
文字要素の外接矩形の長さの大きい方の値、標準面積Ｓ
は標準サイズＬの正方形の面積を使用する点が異なる。
なお、文字要素の結合に専用に設定した評価パラメータ
を用いてもよい。The processing in the temporary combination processing unit 615 is the same as the temporary combination process in the standard character size acquisition unit 613. The difference is that the latter joins individual strokes to create "character elements that clearly belong to one character", whereas the temporary combination process in the unit 615 refers to the estimated standard character size and joins the character elements further. The evaluation parameters and procedure used to join character elements can be exactly the same as in the temporary combination process of the standard character size acquisition unit 613, except that the standard size L is the larger dimension of the circumscribed rectangle of one character element and the standard area S is the area of a square of side L. Evaluation parameters set specifically for joining character elements may also be used.

【００６７】この文字要素の再帰的な仮結合処理によっ
て、例えば図１７に示すように「問」という漢字につい
ては、「門構え」内の「口」という文字要素は最後に筆
記された文字要素であるにも拘らず、「門構え」内に結
合され、「問」という１つの漢字の文字要素集合とな
る。Through this recursive temporary combination of character elements, for the kanji "問" shown in FIG. 17, for example, the character element "口" inside the gate radical (門構え), although it is the last element written, is joined into the gate radical, yielding the character element set of the single kanji "問".

【００６８】文字要素がさらに結合され、新たな文字要
素集合が作成されたならば、仮分割処理部６１６におい
て仮分割処理を行う。When the character elements are further combined and a new character element set is created, the provisional division processing unit 616 performs a provisional division process.

【００６９】仮分割処理とは、文字要素間の距離を評価
し、仮分割用の閾値よりも大きい距離の文字要素間に、
そこが文字の区切りであることを示す属性フラグを設定
するという処理である。この場合、文字要素間の距離の
評価方法は前述した仮結合処理と同様である。The temporary division process evaluates the distance between character elements and, between character elements whose distance is greater than the threshold for temporary division, sets an attribute flag indicating that a character boundary lies there. The distance between character elements is evaluated in the same way as in the temporary combination process described above.

【００７０】この処理によって、文字区切りの属性フラ
グが設定された２つの文字要素のうち先に筆記された文
字要素の末尾のストロークと、後に筆記された文字要素
の先頭のストロークの間は「文字の区切りであることが
明らかな状態」になる。図２においては、この属性フラ
グを文字の順番号で例示している。属性フラグの表現方
法としては、他の方法を用いても何等構わない。As a result of this process, for two character elements between which the character boundary attribute flag has been set, the gap between the last stroke of the earlier-written element and the first stroke of the later-written element is in a state of "clearly being a character boundary". In FIG. 2, this attribute flag is illustrated by the sequence number of the character. Any other method of representing the attribute flag may be used.

【００７１】この枠無し文字列認識部６１４における仮
結合処理および仮分割処理は、後続の評価・探索処理部
６１７における探索空間を小さくするための処理である
ので、処理時間が問題にならない場合（高速の処理時間
を必要としない場合）は省略することができる。The temporary combination and temporary division processes in the frameless handwritten character string recognition unit 614 serve to reduce the search space for the subsequent evaluation/search processing unit 617, and can therefore be omitted when processing time is not a concern (that is, when fast processing is not required).

【００７２】次に、評価・探索処理部６１７において、
各文字要素集合によって辞書６２を探索し、辞書６２に
登録された手書き文字パターンに対する評価値が最大に
なる文字を判定し、その文字のコードを認識結果として
表示装置３に出力し、表示装置３において文字コードに
対応した文字を表示させる。Next, the evaluation/search processing unit 617 searches the dictionary 62 with each character element set, determines the character whose evaluation value against the handwritten character patterns registered in the dictionary 62 is the highest, outputs the code of that character to the display device 3 as the recognition result, and causes the display device 3 to display the character corresponding to the character code.

【００７３】前記の仮分割処理部６１６の処理が終了し
た段階では、入力装置２から入力された手書きパターン
に含まれる全ての隣接したストローク間の状態は、「１
文字に含まれることが明らかな状態」、「文字の区切り
であることが明らかな状態」、「曖昧な状態」のいずれ
かである。この段階で存在する「曖昧な状態」について
それぞれ、１文字に含まれていると見做すか、文字の区
切りであると見做すかによって、１つの「切り出しパタ
ーン」が定義できる。探索空間にある「切り出しパター
ン」の数は「あいまいな状態」の数をｎとすると、２の
ｎ乗である。When the processing of the temporary division processing unit 616 has finished, the state between every pair of adjacent strokes in the handwritten pattern input from the input device 2 is one of "clearly contained in one character", "clearly a character boundary", or "ambiguous". One "cut-out pattern" is defined by choosing, for each "ambiguous" state remaining at this stage, whether to regard it as contained in one character or as a character boundary. If the number of "ambiguous" states is n, the number of cut-out patterns in the search space is 2 to the power n.

【００７４】この評価・探索処理は、探索空間に含まれ
る「全切り出しパターン」の中から以下で説明する評価
値を最大にする「切り出しパターン」を探索するという
処理である。この場合の探索手法には、動的計画法、全
探索、ビーム探索等の既存の探索手法が利用可能であ
る。本実施形態では、探索空間を図１８に示すように２
分木で表現し、その２分木に対するビーム探索を行うよ
うにしている。This evaluation/search process searches all the cut-out patterns contained in the search space for the cut-out pattern that maximizes the evaluation value described below. Existing search methods such as dynamic programming, exhaustive search, and beam search can be used. In the present embodiment, the search space is represented as a binary tree, as shown in FIG. 18, and a beam search is performed over that tree.
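
A beam search over such a binary tree can be sketched in Python as below. Each ambiguous gap becomes one level of the tree, and a node is a tuple of decisions made so far (True for a character boundary, False for joining). The scoring function is supplied by the caller; the toy score used in the example is invented purely to make the sketch runnable and stands in for the evaluation value of the source.

```python
from typing import Callable, Tuple

Decisions = Tuple[bool, ...]


def beam_search(n_ambiguous: int,
                score: Callable[[Decisions], float],
                beam_width: int = 3) -> Decisions:
    """Return the highest-scoring leaf found while keeping at most
    beam_width partial cut-out patterns per tree level."""
    beam = [()]
    for _ in range(n_ambiguous):
        # Expand every kept node into its "boundary" and "join" children.
        expanded = [prefix + (d,) for prefix in beam for d in (True, False)]
        expanded.sort(key=score, reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Toy score: number of decisions agreeing with a known reference pattern.
reference = (False, True, False)


def toy_score(decisions: Decisions) -> float:
    return sum(1 for d, r in zip(decisions, reference) if d == r)


best = beam_search(len(reference), toy_score, beam_width=2)
print(best)  # (False, True, False)
```

An exhaustive search would score all 2**n leaves; the beam keeps the cost linear in n at the risk of discarding a branch that would have scored best later.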

【００７５】切り出しパターンの評価値は、次に示す評
価パラメータに係数を乗じて加算した値を用いている。As the evaluation value of the cut-out pattern, a value obtained by multiplying the following evaluation parameter by a coefficient and adding it is used.

[0076] (a) an evaluation parameter obtained from the distance between each cut-out handwritten pattern and the handwritten patterns registered in the dictionary;
(b) an evaluation parameter obtained from the transition probabilities between the recognized characters;
(c) an evaluation parameter obtained from the ratio of the size of each cut-out handwritten pattern to the standard character size;
(d) an evaluation parameter obtained from the evaluation value of the distance between character elements, for adjacent strokes judged to be contained in one character, and the threshold for temporary combination;
(e) an evaluation parameter obtained from the evaluation value of the distance between character elements, for adjacent strokes judged to be a character delimiter, and the threshold for temporary division.
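The weighted sum of parameters (a) to (e) can be sketched as follows. The field names, the coefficient values, and the sample numbers are hypothetical; the specification does not state concrete coefficients in this passage.

```python
from dataclasses import dataclass, astuple

@dataclass
class EvalParams:
    shape: float       # (a) from the distance to the dictionary patterns
    transition: float  # (b) from the character transition probabilities
    size: float        # (c) from the size relative to the standard size
    join_gap: float    # (d) from gaps judged to lie inside one character
    split_gap: float   # (e) from gaps judged to be character delimiters

def evaluation_value(params, coefficients):
    """Multiply each evaluation parameter by its coefficient and add."""
    return sum(c * p for c, p in zip(coefficients, astuple(params)))

value = evaluation_value(
    EvalParams(shape=1.0, transition=2.0, size=3.0, join_gap=4.0, split_gap=5.0),
    coefficients=(1.0, 0.5, 0.0, 0.0, 2.0),
)
```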

[0077] In FIG. 18, the dash-dot lines indicate portions where it is ambiguous whether there is a delimiter, the broken arrows indicate that character elements are divided by the division process, and the solid arrows indicate that character elements are combined by the combination process. For example, when the handwritten characters "晴れ" ("sunny") are combined at one ambiguous portion and then divided at another, they are recognized as the characters "晴れ". However, the figure also shows that recognition is impossible when an ambiguous portion is additionally combined.

[0078] The evaluation/search processing unit 617 takes the portions where the connection between character elements is ambiguous in order from the left, moving to the left branch when a portion is judged to be a character delimiter and to the right branch when it is judged to be contained in one character. While evaluating the plausibility of each node of the binary tree in FIG. 18 as a Japanese character string by the method described below, it searches the leaves of the binary tree for the most plausible leaf and takes the character string corresponding to that leaf as the recognition result. This corresponds to evaluation method (c) above.

[0079] By Bayes' theorem, the probability that a handwritten pattern X is the character string C can be expressed by the following Equation 1.

[0080]

(Equation 1)

[0081] Here, P(X) is the probability that event X occurs, and P(X | Y) is the conditional probability that event X occurs given event Y. That is, P(X | C) is the probability that character string C is written as pattern X; P(C) is the probability that character string C is written; and P(X) is the probability that pattern X is written, which is independent of C and is therefore treated as a constant.
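Equation 1 itself is not reproduced in this text. Assuming it is the standard form of Bayes' theorem applied to the three quantities defined above, it would read:

```latex
P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}
```

Because P(X) does not depend on C, maximizing P(C | X) over candidate character strings C amounts to maximizing the product P(X | C) P(C).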

[0082] P(C) can be approximated by Equation 2.

[0083]

(Equation 2)

[0084] Here, P(C_{i+1} | C_i) is the probability that the i-th character and the (i+1)-th character are written consecutively, and is obtained from a table prepared in advance from collected statistics. N is the number of characters.
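The table lookup described here can be sketched as a product of adjacent-character transition probabilities. The exact form of Equation 2, including any term for the first character, is not reproduced in this text, so the function below is an assumption; the table contents and the `floor` fallback for unseen pairs are also hypothetical.

```python
def string_probability(chars, transition_table, floor=1e-6):
    """Approximate P(C) from adjacent-character transition probabilities.

    transition_table maps (previous, next) to P(next | previous).
    floor is an assumed fallback for pairs missing from the table.
    """
    p = 1.0
    for prev, nxt in zip(chars, chars[1:]):
        p *= transition_table.get((prev, nxt), floor)
    return p

# Hypothetical statistics for illustration only.
table = {("晴", "れ"): 0.2, ("れ", "る"): 0.1}
p_known = string_probability("晴れる", table)
p_unseen = string_probability("晴る", table)
```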

[0085] P(X | C) can be approximated by Equation 3.

[0086]

(Equation 3)

[0087] Here, P(X_i | C_i) is the probability that the i-th character C_i in character string C is written as the i-th handwritten pattern X_i obtained by dividing handwritten pattern X into individual characters. It is obtained by comparing the dictionary pattern corresponding to character C_i with handwritten pattern X_i using an online framed character recognition device.

[0088] P(delimiter or combination | d_k) is the probability that, when the distance between the k-th character element and the (k+1)-th character element is d_k, the gap between the two character elements is a character delimiter, or alternatively the probability that the two character elements are contained in one character. Which of the two probabilities is used depends on how handwritten pattern X is divided.

[0089] Under the division of the handwritten pattern being evaluated, if the k-th and (k+1)-th character elements are not contained in one character, the probability that the gap is a character delimiter is used; if they are contained in one character, the probability that they are contained in one character is used.

[0090] P(SIZE_i | standard size) is the likelihood of the size SIZE_i of the i-th character, given that the standard size of one character is the standard size.

[0091] Next, considering computation on a computer, Equation 3 involves many multiplications: (2i + k) multiplications are required. Equation 3 is therefore replaced by a formula with logarithmic terms, as shown in Equation 4, and the result of Equation 4 is adopted as the statistical evaluation value.

[0092]

(Equation 4)
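Equation 4 is not reproduced in this text. As a minimal sketch of the idea in paragraph [0091], assume Equation 3 is a product of the per-character shape probabilities P(X_i | C_i), the gap probabilities P(delimiter or combination | d_k), and the size probabilities P(SIZE_i | standard size) described above. The product can then be replaced by a sum of logarithms, which preserves the ranking of candidates. The function name and the sample values are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math

def log_evaluation(shape_probs, gap_probs, size_probs):
    """Sum of log-probabilities; equals the log of their product."""
    terms = [*shape_probs, *gap_probs, *size_probs]
    return sum(math.log(p) for p in terms)

# Two candidate cutout patterns that differ in one shape probability.
score_a = log_evaluation([0.5, 0.25], [0.8], [0.9, 0.9])
score_b = log_evaluation([0.5, 0.05], [0.8], [0.9, 0.9])
best = "a" if score_a > score_b else "b"
```

Summing logarithms also avoids the numerical underflow that a long product of small probabilities can cause.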

[0093] By evaluating plausibility as Japanese in this way and outputting the characters with the highest evaluation value as the recognition result, a recognition result that fits the context of a character string spanning multiple lines is obtained even when the input contains handwritten characters with uneven spacing or handwritten characters written at a slant, and highly accurate recognition results that cannot be obtained by character-by-character recognition are obtained in a single batch.

[0094] For example, the handwritten characters in FIG. 16(a) are correctly recognized by combining character elements as shown in FIG. 16(c).

[0095] The present invention is not limited to the above embodiment. The processing in the writing direction acquisition unit 611, the line feed position acquisition unit 612, the standard character size acquisition unit 613, and the frameless handwritten character string recognition unit 614 can each be incorporated into existing character recognition processing as a new elemental technology.

[0096] The handwritten character recognition program is stored on a recording medium such as a CD-ROM and provided to the user. Alternatively, it is provided for a fee through a communication medium such as the Internet.

[0097]

[Effects of the Invention] As described above, according to the present invention, the writing direction of handwritten characters written on an electronic blackboard or the like without the writing direction being specified can be determined accurately, and the handwritten characters can be recognized according to the result of that determination.

[0098] In addition, the line feed positions of handwritten characters written on an electronic blackboard or the like without line feed positions being specified can be determined accurately, and handwritten characters spanning multiple lines can be recognized according to the result of that determination.

[0099] Furthermore, even for handwritten characters written at a slant or with narrow character spacing, each character element is cut out accurately, and handwritten characters on arbitrary lines can be recognized according to the cutout result.

[0100] Moreover, handwritten characters written on an electronic blackboard or the like can be recognized with high accuracy regardless of whether the writing is vertical or horizontal, the number of lines, or the presence or absence of writing frames.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of a handwritten character recognition apparatus to which the present invention is applied.

FIG. 2 is an explanatory diagram showing an example of handwritten characters written on the input surface of the handwritten character input device.

FIG. 3 is an explanatory diagram showing the units of data within handwritten characters.

FIG. 4 is a functional configuration diagram of the handwritten character recognition apparatus of FIG. 1.

FIG. 5 is a detailed configuration diagram of the frameless character string recognition unit.

FIG. 6 is a diagram showing an example of the data structure of handwritten characters stored in the storage device.

FIG. 7 is an explanatory diagram of the vertical/horizontal writing discrimination vectors.

FIG. 8 is a flowchart showing the vertical/horizontal writing determination process.

FIG. 9 is an explanatory diagram showing an example of a histogram used to determine line feed positions.

FIG. 10 is an explanatory diagram of a line feed back stroke.

FIG. 11 is a flowchart showing the line feed position determination process.

FIG. 12 is an explanatory diagram of the evaluation parameters used in the temporary combination process between strokes.

FIG. 13 is an explanatory diagram of the evaluation parameters used in the temporary combination process between strokes.

FIG. 14 is an explanatory diagram showing an example of input strokes subject to the temporary stroke combination process and an example of evaluation parameter calculation.

FIG. 15 is an explanatory diagram of the process of estimating the standard character size from the circumscribed rectangles of character elements.

FIG. 16 is a diagram showing an example of the process of temporarily combining obliquely written handwritten characters into character elements.

FIG. 17 is an explanatory diagram showing an example of handwritten characters that can be combined by recursive processing of character elements.

FIG. 18 is an explanatory diagram showing an example of the binary tree used when searching for handwritten characters in the dictionary.

[Explanation of symbols]

1: pen
2: handwritten character input device
3: display device
4: CPU
6: storage device
21: handwritten character input surface
61: handwritten character recognition program
62: dictionary
611: writing direction acquisition unit
612: line feed position acquisition unit
613: standard character size acquisition unit
614: frameless handwritten character string recognition unit
615: temporary combination processing unit
616: temporary division processing unit
617: evaluation/search processing unit

Claims

[Claims]

1. A handwritten character recognition method for recognizing a plurality of handwritten character strings composed of a plurality of stroke groups input in the order of strokes from a handwritten character input device, the method comprising: obtaining, within the plurality of stroke groups, vectors each running from the end point of one stroke to the start point of the stroke adjacent to it in stroke input time; summing the rightward components and the downward components of the vectors separately, each with components of the same direction; comparing the ratio of the summed rightward component to the summed downward component with a threshold for horizontal/vertical writing determination; determining the writing direction to be horizontal writing if the ratio is equal to or greater than the threshold and vertical writing if the ratio is less than the threshold; and recognizing the handwritten character string composed of the plurality of stroke groups according to the result of the writing direction determination.

2. The handwritten character recognition method according to claim 1, wherein the threshold for horizontal/vertical writing determination comprises a horizontal writing determination threshold and a vertical writing determination threshold, and, when the ratio of the summed rightward component to the summed downward component is less than the horizontal writing determination threshold and equal to or greater than the vertical writing determination threshold, whether the writing is horizontal or vertical is determined from the ratio of the height to the width of the circumscribed rectangle of the entire input handwritten character string.
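The determination recited in claims 1 and 2 can be sketched as follows. This is an illustrative reading only: the threshold values, the choice to count only positive rightward and downward movement, the treatment of a zero downward sum, and the rule that a circumscribed rectangle at least as wide as it is tall means horizontal writing are all assumptions.

```python
def writing_direction(strokes, horiz_thresh=2.0, vert_thresh=0.5):
    """Classify writing as 'horizontal' or 'vertical'.

    strokes is a list of strokes in input order; each stroke is a list
    of (x, y) points with y increasing downward.
    """
    right = down = 0.0
    for prev, nxt in zip(strokes, strokes[1:]):
        # Vector from the end point of one stroke to the start of the next.
        dx = nxt[0][0] - prev[-1][0]
        dy = nxt[0][1] - prev[-1][1]
        right += max(dx, 0.0)
        down += max(dy, 0.0)
    # Assumption: no downward movement at all is treated as an infinite ratio.
    ratio = right / down if down > 0 else float("inf")
    if ratio >= horiz_thresh:
        return "horizontal"
    if ratio < vert_thresh:
        return "vertical"
    # Between the two thresholds: use the overall circumscribed rectangle.
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return "horizontal" if width >= height else "vertical"

row = [[(0, 0), (0, 5)], [(10, 0), (10, 5)], [(20, 0), (20, 5)]]
column = [[(0, 0), (5, 0)], [(0, 10), (5, 10)], [(0, 20), (5, 20)]]
unclear = [[(0, 0), (1, 1)], [(2, 2), (30, 3)]]
```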

3. A handwritten character recognition method for recognizing a plurality of handwritten character strings composed of a plurality of stroke groups input in the order of strokes from a handwritten character input device, the method comprising: obtaining a histogram in the writing direction for the plurality of stroke groups; selecting, according to the histogram, portions having a small number of writing points as line feed position candidates; obtaining the vectors from the end point to the start point of strokes adjacent in stroke input time within the stroke groups, and the average of the lengths of the vectors; comparing the length of each vector at a line feed position candidate with the average of the vector lengths; determining the position of a vector that exceeds a threshold for line feed determination to be a line feed position; and recognizing the handwritten character string composed of the plurality of stroke groups according to the result of the line feed position determination.
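One possible reading of the line feed determination in claim 3 is sketched below for horizontal writing. The bin size, the sparse-bin criterion, and the use of a length-to-average ratio as the comparison are assumptions; the claim does not state how the comparison is made.

```python
import math

def line_breaks(strokes, bin_size=10.0, sparse_count=0, length_ratio=1.5):
    """Return indices i where a line feed lies between strokes[i] and strokes[i+1].

    Assumes horizontal writing, so lines are separated along the y axis.
    """
    # Histogram of writing points per y bin.
    hist = {}
    for stroke in strokes:
        for _, y in stroke:
            b = int(y // bin_size)
            hist[b] = hist.get(b, 0) + 1

    # Pen-up vectors between strokes adjacent in input time.
    moves = []
    for prev, nxt in zip(strokes, strokes[1:]):
        (x0, y0), (x1, y1) = prev[-1], nxt[0]
        moves.append((math.hypot(x1 - x0, y1 - y0), y0, y1))
    if not moves:
        return []
    mean_len = sum(m[0] for m in moves) / len(moves)

    breaks = []
    for i, (length, y0, y1) in enumerate(moves):
        lo, hi = sorted((int(y0 // bin_size), int(y1 // bin_size)))
        # Candidate: the movement crosses a bin with few writing points.
        crosses_sparse = any(hist.get(b, 0) <= sparse_count for b in range(lo + 1, hi))
        if crosses_sparse and length > length_ratio * mean_len:
            breaks.append(i)
    return breaks

line1 = [[(x, 0), (x, 5)] for x in (0, 10, 20)]
line2 = [[(x, 30), (x, 35)] for x in (0, 10, 20)]
found = line_breaks(line1 + line2)
```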

4. A handwritten character recognition method for recognizing a plurality of handwritten character strings composed of a plurality of stroke groups input in the order of strokes from a handwritten character input device, the method comprising: obtaining, within the plurality of stroke groups, vectors each running from the end point of one stroke to the start point of the stroke adjacent to it in stroke input time; summing the rightward components and the downward components of the vectors separately, each with components of the same direction; comparing the ratio of the summed rightward component to the summed downward component with a threshold for horizontal/vertical writing determination; determining the writing direction to be horizontal writing if the ratio is equal to or greater than the threshold and vertical writing if the ratio is less than the threshold; selecting portions having a small number of writing points as line feed position candidates; obtaining the vectors from the end point to the start point of strokes adjacent in stroke input time within the stroke groups, and the average of the lengths of the vectors; determining, as a line feed position, the position of a vector at a line feed position candidate whose length, compared with the average of the vector lengths, exceeds a threshold for line feed determination; and recognizing the handwritten character string composed of the plurality of stroke groups according to the result of the line feed position determination and the result of the writing direction determination.

5. A handwritten character recognition method for recognizing a plurality of handwritten character strings composed of a plurality of stroke groups input in the order of strokes from a handwritten character input device, the method comprising: evaluating the distances between the strokes constituting the plurality of stroke groups according to a predetermined relational expression; dividing the plurality of stroke groups into a plurality of character elements by repeating a process of combining strokes whose evaluated distance is smaller than a threshold for temporary combination until no combinable strokes remain; obtaining the circumscribed rectangle of each character element; estimating the maximum or average height and the maximum or average width of the circumscribed rectangles as the standard character size of the handwritten characters; calculating parameters representing the relationship between character elements according to a predetermined relational expression; dividing the plurality of character elements into a plurality of character element sets by repeating a process of combining character elements whose calculated parameter is smaller than the threshold for temporary combination until no combinable character elements remain; searching a dictionary with the character element sets; and outputting, as the recognition result, the character having the maximum evaluation value with respect to the handwritten character patterns registered in the dictionary.
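The repeat-until-stable combination and the size estimate recited in claim 5 can be sketched as follows. The predetermined relational expression is not given in this passage, so `horizontal_gap` is a hypothetical stand-in, and the sketch uses the maximum variant of the size estimate; the box layout `(x0, y0, x1, y1)` and the threshold are likewise assumptions.

```python
def horizontal_gap(a, b):
    """Stand-in distance: horizontal gap between two boxes (x0, y0, x1, y1)."""
    return max(b[0] - a[2], 0)

def merge_until_stable(boxes, distance, threshold):
    """Combine neighbouring boxes until no pair is closer than threshold."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes) - 1):
            a, b = boxes[i], boxes[i + 1]
            if distance(a, b) < threshold:
                # Replace the pair with their common circumscribed rectangle.
                boxes[i:i + 2] = [(min(a[0], b[0]), min(a[1], b[1]),
                                   max(a[2], b[2]), max(a[3], b[3]))]
                merged = True
                break  # restart the scan after every combination
    return boxes

def standard_size(boxes):
    """Estimate the standard character size as (max width, max height)."""
    return (max(b[2] - b[0] for b in boxes), max(b[3] - b[1] for b in boxes))

strokes = [(0, 0, 4, 10), (5, 0, 9, 10), (20, 0, 29, 10)]
elements = merge_until_stable(strokes, horizontal_gap, threshold=3)
size = standard_size(elements)
```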

6. The handwritten character recognition method according to claim 5, wherein an attribute flag indicating a character delimiter is set for each character element for which the parameter representing the relationship between character elements is greater than a threshold for temporary division, the character elements are thereby divided into a state in which the character delimiters are clear, and the dictionary is searched with reference to this division.

7. A handwritten character recognition method for recognizing a plurality of handwritten character strings composed of a plurality of stroke groups input in the order of strokes from a handwritten character input device, the method comprising: obtaining, within the plurality of stroke groups, vectors each running from the end point of one stroke to the start point of the stroke adjacent to it in stroke input time; summing the rightward components and the downward components of the vectors separately, each with components of the same direction; comparing the ratio of the summed rightward component to the summed downward component with a threshold for horizontal/vertical writing determination; determining the writing direction to be horizontal writing if the ratio is equal to or greater than the threshold and vertical writing if the ratio is less than the threshold; selecting portions having a small number of writing points as line feed position candidates; obtaining the vectors from the end point to the start point of strokes adjacent in stroke input time within the stroke groups, and the average of the lengths of the vectors; comparing the length of each vector at a line feed position candidate with the average of the vector lengths; determining the position of a vector that exceeds a threshold for line feed determination to be a line feed position; evaluating, line by line, the distances between the strokes constituting the plurality of stroke groups according to a predetermined relational expression; dividing the plurality of stroke groups into a plurality of character elements by repeating the combining process until no combinable elements remain; obtaining the circumscribed rectangle of each character element; estimating the maximum or average height and the maximum or average width of the circumscribed rectangles as the standard character size of the handwritten characters; calculating parameters representing the relationship between character elements according to a predetermined relational expression; dividing the plurality of character elements into a plurality of character element sets by repeating a process of combining character elements whose calculated parameter is smaller than the threshold for temporary combination until no combinable character elements remain; searching a dictionary with the character element sets; and outputting, as the recognition result, the character having the maximum evaluation value with respect to the handwritten character patterns registered in the dictionary.