JPH11338972A - Device and method for segmenting character string

Info

Publication number: JPH11338972A
Application number: JP11030686A
Authority: JP
Inventors: Yoshinobu Hotta; 悦伸堀田; Satoshi Naoi; 聡直井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-02-08
Filing date: 1999-02-08
Publication date: 1999-12-10
Anticipated expiration: 2017-05-27
Also published as: JP3285837B2

Abstract

PROBLEM TO BE SOLVED: To segment a character string accurately even when overhangs and separated strokes are present. SOLUTION: A character string extraction means 16 extracts partial patterns of characters based on the connection information of the character string. A character size calculation means 18 calculates a histogram of the vertical or horizontal size of the rectangles circumscribing the extracted partial patterns and, from that histogram, calculates the average character size and its variance. A character pitch calculation means 21 calculates a histogram of the pitch between characters and, from that histogram, calculates the average character pitch and its variance. An integration means 22 integrates the characters while changing the integration condition according to the variance of the character size and the variance of the pitch. A small part integration means 26 integrates characters by identifying small partial patterns among the partial patterns in the character string, based on the averages and variances of the character size and the pitch.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character segmentation method and a character segmentation device for segmenting a character string.

[0002]

[Prior Art] In recent years, demand for handwritten character recognition devices has been increasing as input peripherals for handwriting. Such a device segments each character from a character string and then recognizes it. To recognize the character string accurately, the segmentation step is therefore important.

[0003] Conventionally, for documents such as forms in which the writing positions are specified in advance, characters are written neatly within the specified ranges (character frames). When such characters are segmented, the largest rectangle in the character string, or the simple average of the rectangles in the string, is used as the character size for recognition. The pitch between characters is also calculated and used to segment and integrate characters.

[0004] The following techniques are known for calculating the pitch. The technique of Japanese Patent Application Laid-Open No. H4-098477 performs a one-dimensional projection of the character pattern and calculates the pitch from the resulting intervals between white pixels and black pixels. The technique of Japanese Patent Application Laid-Open No. S60-173685 applies a one-dimensional Fourier transform to document image data to estimate the pitch. The technique of Japanese Patent Application Laid-Open No. S62-195893 determines an estimated segmentation range from the height of the character string and searches for the segmentation position within that range.

[0005] For integrating characters, there are techniques such as Japanese Patent Application Laid-Open No. H4-017086, which extract the circumscribed rectangles of connected black pixels and integrate those rectangles based on character size and pitch. With these conventional segmentation methods, each character was written neatly in equally spaced character frames, so characters could be recognized with considerable accuracy. However, writing each character neatly within a frame is burdensome for the writer, so characters have come to be written at free pitch on paper without character frames. Characters written at free pitch exhibit variation in size, overlap, touching, overhangs in which adjacent characters encroach on each other, and separated strokes in which a single numeral splits into multiple patterns.

[0006]

[Problems to Be Solved by the Invention] However, applying the conventional character segmentation methods to character strings written at free pitch causes the following problems. First, for a character string with varying character sizes, if the largest rectangle in the string is used as the reference size, even a single extremely large character makes that size unsuitable as the size of an integrated character.

[0007] In addition, for a character string containing many small partial patterns such as dakuten (voiced sound marks), a simple average of the character sizes is affected by the small rectangles of the dakuten and similar marks, so the calculated average character size is smaller than the actual size.

[0008] Furthermore, when the pitch of a character string containing overhangs is calculated by the one-dimensional projection described above, the boundary between white pixels and black pixels no longer appears. As a result, the character recognition rate decreases.
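The failure described in [0008] can be illustrated with a minimal sketch. The function names and the tiny binary images below are hypothetical, chosen only for illustration, and the cited publication's actual procedure is not reproduced here. The sketch projects black pixels onto the horizontal axis and lists the white gaps between black runs; with an overhang, the two strokes remain separate connected patterns, yet no white gap survives in the projection.

```python
def column_projection(image):
    """Count black pixels (1s) in every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def white_gaps(projection):
    """Return the widths of white runs that lie between two black runs."""
    gaps, run, seen_black = [], 0, False
    for count in projection:
        if count == 0:
            run += 1
        else:
            if seen_black and run > 0:
                gaps.append(run)
            seen_black, run = True, 0
    return gaps


# Two strokes separated by two empty columns.
separated = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
]
# Overhang: the right stroke reaches over the foot of the left stroke.
# The strokes do not touch, but their horizontal extents overlap.
overhang = [
    [1, 0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
]

print(white_gaps(column_projection(separated)))
print(white_gaps(column_projection(overhang)))
```

For the separated image the projection yields one gap, while for the overhang image the gap list is empty, so a projection-based pitch estimate has nothing to measure.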

[0009] The method that determines the estimated segmentation range from the height of the character string works for printed characters, which are nearly square, and for handwritten kanji. For character strings with large variation in character size, such as handwritten numerals, however, the estimated segmentation range cannot be determined clearly.

[0010] Meanwhile, one conventional method of extracting character strings treats the characters in the document image as black pixels and the background as white pixels, projects them in the horizontal (vertical) direction to obtain a histogram, and extracts character strings from the peaks and valleys of the histogram. Other methods apply dilation and erosion to the black pixels several times and treat each resulting cluster as one character string (Transactions of the Institute of Electronics and Communication Engineers, '83/4 Vol. J66-D No 437-444), or obtain the circumscribed rectangles of connected patterns, project each rectangle onto its center coordinate with the rectangle's area as the weight, and take the marginal distribution of the projected values (Transactions of the Institute of Electronics and Communication Engineers, '85/12 Vol. J68-D No 12 2123-2131).

[0011] In handwritten character strings, when the vertical (horizontal) spacing between strings is small, part of a character in one string easily intrudes between the characters of an adjacent string, as shown in FIG. 29. In that case, the histogram method has difficulty finding the valleys in the projected histogram, and the shape of the histogram depends on the complexity of the characters.

[0012] With the method that dilates and erodes black pixels, the upper and lower character strings all merge into one cluster. With the method that takes a weighted projection of rectangle center coordinates based on the circumscribed rectangles of connected patterns, the weight is applied only at the rectangle center, so character size information cannot be read from the projected values, and the range of the marginal distribution is therefore hard to identify. Moreover, for characters such as numerals, whose height is nearly constant but whose width varies, the character area is applied as the weight at the rectangle center, so extraction of character strings from the marginal distribution is easily influenced by wide characters.

[0013] A first object of the present invention is to provide a character segmentation method and a character segmentation device that can segment characters accurately even when overhangs and separated strokes are present.

[0014] A second object of the present invention is to provide a character segmentation method and a character segmentation device that can easily extract character strings even when the spacing between character strings is small and characters in different strings touch each other.

[0015]

[Means for Solving the Problems] To solve the above problems and achieve the objects, the present invention is configured as follows. FIG. 1 is a principle diagram of the character segmentation device according to the present invention. The device segments characters so that each character of a character string can be recognized, and comprises a character string extracting means 16, a character size calculating means 18, a character pitch calculating means 21, an integrating means 22, and a small part integrating means 26.

[0016] The character string extracting means 16 extracts partial patterns of characters based on the connection information of the character string. The character size calculating means 18 calculates a histogram of the vertical or horizontal size of the rectangles circumscribing the extracted partial patterns and, from that histogram, calculates the average character size and its variance.

[0017] The character pitch calculating means 21 calculates a histogram of the pitch between characters and, from that histogram, calculates the average character pitch and its variance. The integrating means 22 integrates characters while changing the integration condition according to the average character size, the average character pitch, the size variance, and the pitch variance.

[0018] The small part integrating means 26 integrates characters by identifying, based on the average character size, the small partial patterns among the partial patterns in the character string. The character size calculating means 18 comprises calculating means 32, 34, and 38 and a determining means 36. The calculating means 32 calculates a histogram of the vertical or horizontal length of the rectangles circumscribing the partial patterns of the characters in the character string. The calculating means 34 calculates a provisional average character size from the histogram of the whole character string. The determining means 36 determines a character size calculation region based on the provisional average character size. The calculating means 38 calculates the average character size within the determined character size region.

[0019] Further, a small separated stroke extracting unit 20 is provided. It uses the calculated average character size to obtain the area ratio or height ratio between the average character size and the size of each circumscribed rectangle and, based on the result, extracts from the partial patterns in the character string the small separated strokes, which consist of the small partial patterns.

[0020] For calculating the pitch between characters, the character pitch calculating means 21 comprises: means 21a that, for the partial patterns other than small separated strokes, takes the distance between circumscribed rectangles as the pitch and calculates a histogram of that pitch; means 21b that calculates a provisional average character pitch from the histogram; means 21c that determines a character pitch calculation region based on the provisional average character pitch; and means 21d that calculates the average character pitch within the determined character pitch region.

[0021] Further, when integrating characters that include small separated strokes, the integrating means 22 changes the character integration condition according to the value of an evaluation function composed of the average character size, the average character pitch, the size variance, and the pitch variance.

[0022] The integrating means 22 preferably includes a certainty-based integration unit 24. When the pitch width is at or below a certain threshold and the evaluation function value lies within a certain range, this unit calculates the distances between an extracted small separated stroke and the patterns located to its left and right, and performs integration based on the ratio of those distances.
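A decision based on the distance ratio in [0022] might be sketched as follows. The text does not state how the ratio is turned into a decision, so the rule below (merge toward the clearly nearer neighbor, otherwise leave the stroke undecided) and the `ratio_limit` value are assumptions made for illustration. Rectangles are assumed to be `(left, top, right, bottom)` tuples.

```python
def choose_neighbor(stroke, left_rect, right_rect, ratio_limit=0.5):
    """Pick the neighbor a small separated stroke should merge with.

    Returns "left", "right", or None when neither gap is clearly smaller.
    The ratio rule and ratio_limit are illustrative assumptions.
    """
    gap_left = max(0, stroke[0] - left_rect[2])    # stroke left edge to left neighbor
    gap_right = max(0, right_rect[0] - stroke[2])  # right neighbor to stroke right edge
    if gap_left == gap_right:
        return None
    if gap_left <= ratio_limit * gap_right:
        return "left"
    if gap_right <= ratio_limit * gap_left:
        return "right"
    return None


stroke = (10, 0, 12, 3)
print(choose_neighbor(stroke, (0, 0, 9, 10), (20, 0, 30, 10)))
print(choose_neighbor(stroke, (0, 0, 6, 10), (17, 0, 30, 10)))
```

In the first call the stroke lies 1 pixel from the left pattern and 8 from the right, so it is assigned to the left; in the second the gaps are 4 and 5, and the sketch returns None so that a later stage can decide.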

[0023] The small part integrating means 26 comprises a line density calculator 42, an inclination calculator 44, and a discriminator 46. The line density calculator 42 calculates the line density of the small separated stroke, of the partial patterns located to its left and right, and of the partial patterns that would result from integrating them.

[0024] The inclination calculator 44 calculates the inclination of the small separated stroke. Based on the line density and the inclination, the discriminator 46 determines which of the partial patterns located to the left and right the small separated stroke should be integrated with.

[0025] When calculating the line density, the line density calculator 42 preferably divides the circumscribed rectangle of the partial pattern into n equal parts and takes the maximum of the line densities counted from the m-th line to the (n-m)-th line.
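The line density rule in [0025] can be sketched as below. This is a minimal illustration under stated assumptions: line density is taken to be the number of black runs crossed by a horizontal scan line, the scan line positions are `i * height // n`, and the defaults `n=8`, `m=1` are arbitrary. None of these details is given in the text.

```python
def count_black_runs(row):
    """Number of black runs (white-to-black transitions) along one scan line."""
    runs, previous = 0, 0
    for pixel in row:
        if pixel and not previous:
            runs += 1
        previous = pixel
    return runs


def line_density(pattern, n=8, m=1):
    """Maximum run count over scan lines m .. n-m out of n equal divisions."""
    height = len(pattern)
    best = 0
    for i in range(m, n - m + 1):
        row_index = min(height - 1, i * height // n)
        best = max(best, count_black_runs(pattern[row_index]))
    return best


# A vertical stroke with a noisy top row: skipping the outermost
# lines (m=1) keeps the noise from inflating the line density.
noisy_top = [[1, 0, 1, 0, 1]] + [[0, 0, 1, 0, 0]] * 7
print(line_density(noisy_top, n=8, m=1))
print(line_density(noisy_top, n=8, m=0))
```

Restricting the count to lines m through n-m ignores the extremities of the rectangle, where stroke ends and noise tend to distort the count.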

[0026] Preferably, a multi-direction line density calculator 52 is further provided. For the straight line segment candidates extracted in the line density calculation, it calculates the line density along one direction of the pattern, then switches to another direction and calculates the line density again, and obtains the total of the line densities in the multiple directions.

[0027] The inclination calculator 44 divides the partial pattern at equal intervals along the longer of the vertical and horizontal sides of its circumscribed rectangle, and calculates the inclination from the intersections of the partial pattern with each dividing line.
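One way to realize the inclination calculation in [0027] is sketched below. The text states only that the inclination is derived from the intersections with equally spaced dividing lines; representing each intersection by the mean position of the black pixels on that line and fitting a least-squares slope through those points are assumptions of this sketch, as are the function name and the `divisions` default.

```python
def stroke_slope(pattern, divisions=4):
    """Slope of a stroke, measured along the longer side of its bounding box.

    Returns the change in the cross coordinate per unit step along the
    long axis (0.0 means the stroke runs parallel to the long axis).
    """
    height, width = len(pattern), len(pattern[0])
    along_rows = height >= width            # the long side is vertical
    length = height if along_rows else width
    points = []
    for k in range(1, divisions):
        pos = k * length // divisions       # position of the k-th dividing line
        if along_rows:
            hits = [x for x, pixel in enumerate(pattern[pos]) if pixel]
        else:
            hits = [y for y in range(height) if pattern[y][pos]]
        if hits:
            points.append((pos, sum(hits) / len(hits)))
    if len(points) < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / len(points)
    mean_c = sum(c for _, c in points) / len(points)
    numerator = sum((t - mean_t) * (c - mean_c) for t, c in points)
    denominator = sum((t - mean_t) ** 2 for t, _ in points)
    return numerator / denominator if denominator else 0.0


diagonal = [[1 if x == y else 0 for x in range(8)] for y in range(8)]
upright = [[1 if x == 3 else 0 for x in range(8)] for y in range(8)]
print(stroke_slope(diagonal), stroke_slope(upright))
```

Sampling only a few dividing lines keeps the cost low for the small strokes this step is applied to.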

[0028] The invention is applicable to handwritten character segmentation devices such as handwritten character recognition devices, printed character recognition devices, and character segmentation devices for drawing recognition. FIG. 2 is a principle diagram of the second invention. The second invention comprises a pattern extracting means 14, a weighted projection means 64, a character string axis determining means 66, and a character string extracting means 68.

[0029] The pattern extracting means 14 extracts partial patterns of characters based on the connection information of the character string. The weighted projection means 64 obtains a projection histogram by performing a weighted projection of the vertical or horizontal line segments of the rectangles circumscribing the extracted partial patterns.

[0030] The character string axis determining means 66 determines the character string axes from the projection histogram. The character string extracting means 68 extracts character strings based on the character string axes. The weighted projection means 64 performs the weighted projection with its peak at the center of the vertical or horizontal line segment of the pattern's circumscribed rectangle and with a weight that depends on the distance from that center. The character string axis determining means 66 determines the central axis of a character string from the peak values of the projection histogram. The character string extracting means 68 extracts the character string to which a pattern belongs based on the distance between the central axis and the center of each circumscribed rectangle. These means are used, for example, when characters in vertically or horizontally adjacent character strings do not touch.
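The center-peaked weighted projection and axis assignment of [0030] can be sketched for horizontal text lines as follows. The triangular weight, the peak threshold `min_ratio`, and the nearest-axis rule are assumptions of this sketch; the text specifies only that the weight peaks at the center and depends on the distance from it. Rectangles are assumed to be `(left, top, right, bottom)` tuples.

```python
def weighted_projection(rects, image_height):
    """Project each rectangle's vertical extent with a center-peaked weight."""
    hist = [0.0] * image_height
    for _, top, _, bottom in rects:
        center = (top + bottom) / 2
        half = (bottom - top) / 2
        for y in range(top, bottom + 1):
            hist[y] += 1.0 - abs(y - center) / (half + 1)
    return hist


def string_axes(hist, min_ratio=0.5):
    """Rows that are local maxima of the histogram and reasonably strong."""
    limit = max(hist) * min_ratio
    axes = []
    for y, value in enumerate(hist):
        before = hist[y - 1] if y > 0 else -1.0
        after = hist[y + 1] if y + 1 < len(hist) else -1.0
        if value > 0 and value >= limit and value > before and value >= after:
            axes.append(y)
    return axes


def assign_to_axes(rects, axes):
    """Index of the nearest axis for the center of every rectangle."""
    result = []
    for _, top, _, bottom in rects:
        center = (top + bottom) / 2
        result.append(min(range(len(axes)), key=lambda i: abs(axes[i] - center)))
    return result


rects = [(0, 2, 4, 8), (6, 3, 10, 9), (12, 2, 16, 8),   # upper text line
         (0, 14, 4, 20), (6, 13, 10, 19)]               # lower text line
axes = string_axes(weighted_projection(rects, 24))
print(axes, assign_to_axes(rects, axes))
```

Because each rectangle contributes most at its own center, the histogram peaks mark the line centers even when the rectangles of one line are vertically offset from each other.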

[0031] Preferably, a calculating means 71 that calculates the average character height of the circumscribed rectangles and a removing means 72 that removes, as touching-character blocks, any circumscribed rectangle at least a predetermined multiple of the average character height are further provided. These means are used, for example, when characters in vertically or horizontally adjacent character strings touch and the character size in the input data is known in advance to be nearly uniform.
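The removal step in [0031] amounts to a simple filter, sketched below. The multiple `factor=1.5` is an assumed value, since the text says only "a predetermined multiple", and the function name is hypothetical. Rectangles are assumed to be `(left, top, right, bottom)` tuples with inclusive coordinates.

```python
def remove_tall_rects(rects, factor=1.5):
    """Split rectangles into kept ones and suspected touching-character blocks."""
    heights = [bottom - top + 1 for _, top, _, bottom in rects]
    mean_height = sum(heights) / len(heights)
    kept = [r for r, h in zip(rects, heights) if h < factor * mean_height]
    removed = [r for r, h in zip(rects, heights) if h >= factor * mean_height]
    return kept, removed


rects = [(0, 0, 4, 6), (6, 0, 10, 6), (12, 0, 16, 6), (18, 0, 22, 6),
         (24, 0, 28, 15)]   # the last rectangle spans two text lines
kept, removed = remove_tall_rects(rects)
print(len(kept), removed)
```

The filter only makes sense under the condition stated in the text, namely that the character size is nearly uniform; otherwise a legitimately tall character would be discarded.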

[0032] A removing means 73 is provided that removes, as touching-character-string blocks, any circumscribed rectangle crossed by multiple character string axes. The weighted projection means 64 performs the weighted projection with peaks at the upper end and the lower end of the vertical or horizontal line segment of the pattern's circumscribed rectangle and with weights that depend on the distance from the upper end and from the lower end. The character string axis determining means 66 determines the central axis of a character string from the upper-end peak value and the lower-end peak value of the projection histogram. The character string extracting means 68 extracts, for the circumscribed rectangles other than the touching-character-string blocks, the character string to which a pattern belongs based on the distance between the central axis and the center of each circumscribed rectangle.

[0033] The weighted projection means 64 performs the weighted projection on the upper end and the lower end of the pattern's circumscribed rectangle. The character string axis determining means 66 determines upper-end candidate positions and lower-end candidate positions from the projection histogram and determines the central axis of a character string from those candidate positions. The character string extracting means 68 extracts the character string to which a pattern belongs based on the distance between the central axis and the center of each circumscribed rectangle.

[0034] Further, a circumscribed rectangle integrating unit 61 is provided that integrates circumscribed rectangles that overlap each other. A calculating means 74 that calculates the average character size of the integration result and a removing means 75 that removes, as touching-character blocks, any circumscribed rectangle at least a predetermined multiple of the average character size are also provided. These means are used, for example, when characters in vertically or horizontally adjacent character strings touch and the character size in the input data is known in advance to be nearly uniform.

[0035] The weighted projection means 64 performs one weighted projection that decays from one end of the circumscribed rectangle toward the other end and another that decays from the other end toward the first, and obtains the peak of each projection histogram. The center position of one character and the region in which one character exists are estimated from the respective peak values. These means are used, for example, for online handwritten character strings.

[0036] The invention also provides a character segmentation method that segments characters so that each character of a character string can be recognized. The method includes: an extraction step of extracting partial patterns of characters based on the connection information of the character string; a calculation step of calculating a histogram of the vertical or horizontal size of the rectangles circumscribing the extracted partial patterns and, from that histogram, calculating the average character size and its variance; a calculation step of calculating a histogram of the pitch between characters and, from that histogram, calculating the average character pitch and its variance; an integration step of integrating characters while changing the integration condition according to the average character size, the average character pitch, the size variance, and the pitch variance; and a small part integration step of integrating characters by identifying, based on the average character size, the small partial patterns among the partial patterns in the character string.

[0037] The pitch calculation step takes the distance between circumscribed rectangles as the pitch, calculates a histogram of that pitch, and calculates a provisional average character pitch from the histogram. It then determines a character pitch calculation region based on the provisional average character pitch and calculates the average character pitch within the determined region.

[0038] The method further includes: an extraction step of extracting partial patterns of characters based on the connection information of the character string; a projection step of obtaining a projection histogram by performing a weighted projection of the vertical or horizontal line segments of the rectangles circumscribing the extracted partial patterns; a determination step of determining character string axes from the projection histogram; and an extraction step of extracting character strings based on the character string axes.

[0039]

[Operation] According to the present invention, the character string extracting means 16 first extracts partial patterns of characters based on the connection information of the character string. The character size calculating means 18 then calculates a histogram of the vertical or horizontal size of the rectangles circumscribing the extracted partial patterns and, from that histogram, calculates the average character size and its variance. The character pitch calculating means 21 calculates a histogram of the pitch between characters and, from that histogram, calculates the average character pitch and its variance.

[0040] Next, the integrating means 22 integrates characters while changing the integration condition according to the average character size, the average character pitch, the size variance, and the pitch variance. The small part integrating means 26 then integrates characters by identifying, based on the average character size, the small partial patterns among the partial patterns in the character string.

[0041] That is, for character strings with irregular pitch and varying character size, the average character size and pitch are calculated rigorously, and the integration condition is changed adaptively according to their averages and variances during integration, so characters can be segmented with high accuracy. In addition, because integration focuses on the small partial patterns, the processing is both accurate and fast.

[0042] Also, the pattern extracting means 14 extracts partial patterns of characters based on the connection information of the character string, and the weighted projection means 64 obtains a projection histogram by performing a weighted projection of the vertical or horizontal line segments of the rectangles circumscribing the extracted partial patterns.

[0043] The character string axis determining means 66 then determines the character string axes from the projection histogram, and the character string extracting means 68 extracts character strings based on those axes. Character strings can therefore be extracted quickly and with high accuracy even when the strings are close together and part of a character intrudes into another character string.

[0044]

[Embodiments] The character segmentation method and device according to the present invention are described below. FIG. 3 is a block diagram of Embodiment 1 of a character segmentation device to which the character segmentation method is applied.

[0045] <Embodiment 1> Embodiment 1 segments characters from, for example, a handwritten character string containing overhangs, extracting the characters one by one from handwriting with irregular pitch and varying character size.

[0046] In FIG. 3, the input pattern unit 12 holds a character string pattern that includes separated strokes and overhangs. The connected pattern extraction unit 14 receives the character string input pattern from the input pattern unit 12 and performs labeling to extract only the connected patterns. The character string extraction unit 16 is connected to the connected pattern extraction unit 14.
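The labeling step in [0046] can be sketched as a connected-component search that returns the circumscribed rectangle of each connected pattern. The text does not specify the labeling algorithm or the connectivity; breadth-first search with 8-connectivity is an assumption of this sketch, and the function name is hypothetical.

```python
from collections import deque


def label_bounding_boxes(image):
    """Return (left, top, right, bottom) for each 8-connected black pattern."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


# Two strokes with an overhang: separate patterns, overlapping rectangles.
image = [
    [1, 0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
]
print(label_bounding_boxes(image))
```

Unlike a column projection, labeling keeps the two overhanging strokes apart, which is why the later steps can work on circumscribed rectangles even when their horizontal extents overlap.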

[0047] The character string extraction unit 16 extracts the labeled character string. The average character size calculation unit 18 is connected to the character string extraction unit 16. The average character size calculation unit 18 calculates the average character size from the extracted character string and consists of a histogram calculation unit 32, a provisional average character size calculation unit 34, a character size calculation region determination unit 36, and an average character size calculation unit 38.

【００４８】[0048]

The histogram calculation unit 32 calculates a histogram of the vertical (or horizontal) lengths of the individual rectangles. The provisional average character size calculation unit 34 calculates the average vertical (or horizontal) character size from this histogram and takes it as the provisional average character size.

【００４９】[0049]

The character size calculation region determination unit 36, for example, finds the character size at which the histogram takes its maximum frequency in the region to the right of the provisional average character size, and then determines the region to the left and right of that character size in which the histogram is at least half of the maximum frequency. The average character size calculation unit 38 recalculates the average character size within that region. The relationship between the maximum frequency and the histogram is shown in FIG. 9; details are given from paragraph [0066] onward.

【００５０】[0050]

A small separation stroke extraction unit 20 is connected to the average character size calculation unit 38. The small separation stroke extraction unit 20 extracts small separation strokes using the average character size together with an area condition and a height condition. A character pitch calculation unit 21 is connected to the small separation stroke extraction unit 20.

【００５１】[0051]

The character pitch calculation unit 21 calculates the average character pitch for the patterns that were not judged to be small separation strokes, taking the distance between circumscribed rectangles as the pitch (see P in FIG. 10). Strictly speaking, this quantity should be called a gap. However, because changing the term could cause confusion, this specification calls the interval indicated by P in FIG. 10 the "pitch".

【００５２】[0052]

The character pitch calculation unit 21 comprises a histogram calculation unit 21a, a provisional average character pitch calculation unit 21b, a character pitch calculation region determination unit 21c, and an average character pitch calculation unit 21d. The histogram calculation unit 21a calculates a histogram of the pitches between the rectangles. The provisional average character pitch calculation unit 21b calculates the average character pitch from this histogram and takes it as the provisional average character pitch. The character pitch calculation region determination unit 21c, for example, finds the character pitch at which the histogram takes its maximum frequency (MAX value) in the region to the right of the provisional average character pitch, and then determines the region to the left and right of that pitch in which the histogram is at least MAX/2. The average character pitch calculation unit 21d recalculates the average character pitch within that region.

【００５３】[0053]

The average character size/pitch integration unit 22 is connected to the small separation stroke extraction unit 20 and the character pitch calculation unit 21. It integrates characters based on the extracted small separation strokes and on the average character pitch, the average character size, the size variance, and the pitch variance. A certainty integration unit 24 is connected to the average character size/pitch integration unit 22.

【００５４】[0054]

The certainty integration unit 24 calculates the distances between an extracted small separation stroke and the character patterns to its left and right, quantifies the ratio of those distances as the certainty of integration, and performs integration when the certainty is high. A simple recognition processing unit 26 is connected to the certainty integration unit 24.

【００５５】[0055]

The simple recognition processing unit 26 handles handwritten numerals that contain overhangs. It integrates characters by making simple determinations of line density, inclination, and character size for a small separation stroke, for the patterns to its left and right, and for the pattern that results from integrating them. The simple recognition processing unit 26 comprises a small separation stroke line density calculation unit 42, an inclination calculation unit 44, character size determination units 46 and 50, a right adjacent rectangle line density calculation unit 48, and a multi-direction line density calculation unit 52.

【００５６】[0056]

The small separation stroke line density calculation unit 42 calculates the line density of a small separation stroke in order to determine whether it is a character written small. An inclination calculation unit 44 is connected to the small separation stroke line density calculation unit 42.

【００５７】[0057]

The inclination calculation unit 44 calculates the inclination of the small separation stroke in the X direction or the Y direction and, from the calculated inclination, determines whether the stroke is the separation stroke of a 5 or that of a 7. The character size determination unit 46 and the right adjacent rectangle line density calculation unit 48 are connected to the inclination calculation unit 44.

【００５８】[0058]

For a stroke whose inclination has reliably been calculated as the angle of a 5, the character size determination unit 46 determines whether the distance to the left rectangle is smaller than a threshold multiple (for example, 1.5 times) of the distance to the right rectangle.

【００５９】[0059]

The right adjacent rectangle line density calculation unit 48 calculates the line density of the stroke to the right of the small separation stroke and determines whether that right stroke is the right-hand part of a 7. The character size determination unit 50 and the multi-direction line density calculation unit 52 are connected to the right adjacent rectangle line density calculation unit 48.

【００６０】[0060]

Working from the right-stroke line density supplied by the right adjacent rectangle line density calculation unit 48, the character size determination unit 50 determines whether the distance to the left stroke is smaller than a threshold multiple (for example, 1.8 times) of the average horizontal size and whether the inclination of the separation stroke lies within a predetermined range. If these conditions are satisfied, the stroke is integrated as a 5.

【００６１】[0061]

The multi-direction line density calculation unit 52 calculates the multi-direction line density when the line density of the right stroke is 1 in the vertical direction and 1 in the horizontal direction.

<Processing of the First Embodiment> Next, the character extraction method realized by the character extraction apparatus of the first embodiment is described. FIG. 4 shows the processing flow of the character extraction method of the first embodiment, FIG. 5 shows the average character size calculation flow, FIG. 30 shows the average character pitch calculation flow, and FIG. 6 shows the small separation stroke integration processing by simple recognition.

【００６２】[0062]

First, the pattern output from the input pattern unit 12 is a binary image that has already been preprocessed: extreme skew and rotation have been corrected, noise has been removed, and gaps caused by faint strokes have been filled. The character string pattern is assumed to contain overhangs between characters and separation strokes, but no overlapping characters, touching characters, or cursive writing.

【００６３】[0063]

A character string pattern is then input from the input pattern unit 12 to the connected pattern extraction unit 14 (step 101). Next, to distinguish the individual strokes, the connected pattern extraction unit 14 performs labeling by assigning a label number to each 8-connected pattern, as shown for example in FIG. 7 (step 102).

【００６４】[0064]

Because the size of the partial patterns obtained by labeling matters later, the connected pattern extraction unit 14 also calculates, at the same time as labeling, the circumscribed rectangle coordinates of each partial pattern (the black dots at the upper left and lower right), as shown in FIG. 8.
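Steps 101 and 102 can be illustrated with a minimal Python sketch that labels 8-connected patterns in a binary image and records each pattern's circumscribed rectangle in the same pass. This is an illustration only, not the apparatus's implementation: the function name `label_patterns`, the list-of-lists image representation, and the breadth-first traversal are all assumptions.

```python
from collections import deque


def label_patterns(img):
    """Label 8-connected foreground (1) pixels of a binary image.

    Returns (labels, boxes). labels has the same shape as img, with 0 for
    background. boxes maps each label to (left, top, right, bottom).
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not img[y][x] or labels[y][x]:
                continue
            # Start a new pattern and flood-fill it over all 8 neighbours.
            labels[y][x] = next_label
            left, top, right, bottom = x, y, x, y
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            boxes[next_label] = (left, top, right, bottom)
            next_label += 1
    return labels, boxes
```

Because diagonal neighbours count as connected, a stroke drawn as a one-pixel diagonal receives a single label, whereas two strokes separated by at least one background pixel in every direction receive different labels.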

【００６５】[0065]

The character string extraction unit 16 then extracts the labeled character string (step 103). When characters do not touch one another, each partial pattern obtained by labeling is either a complete character in itself or a partial pattern that forms part of a character. To distinguish these cases and pick out only the partial patterns, attention is paid to the average character size.

【００６６】[0066]

Next, the average character size calculation unit 18 calculates the average character size from the circumscribed rectangles of the individual characters of the character string supplied by the character string extraction unit 16 (step 104). FIG. 5 shows the flow of this calculation. As shown in FIG. 5, the histogram calculation unit 32 first calculates a histogram of the vertical (or horizontal) lengths of the rectangles (step 151).

【００６７】[0067]

The provisional average character size calculation unit 34 then calculates the average vertical (or horizontal) character size from the histogram and takes it as the provisional average character size (step 152). If the characters are kana, the histogram becomes bimodal, as shown in FIG. 9(a), because of small separation strokes arising from dakuten (voicing marks) and from characters such as ハ, リ, and ク. Likewise, the histogram becomes bimodal for numerals because of small separation strokes arising from 5, 7, and the like, and for alphabetic characters because of small separation strokes arising from A, E, and the like.

【００６８】[0068]

For this reason, the provisional average character size comes out smaller than the true average character size. The character size calculation region determination unit 36 therefore finds the character size at which the histogram takes its maximum frequency (MAX value) in the region to the right of the provisional average character size, and determines the region to the left and right of that character size in which the histogram is at least MAX/2 (step 153).

【００６９】[0069]

When the peak of the histogram is skewed, as shown in FIG. 9(b), the histogram value at the provisional character size is taken as the maximum frequency (MAX value), and the region in which the histogram is at least MAX/2 is determined.

【００７０】[0070]

The average character size calculation unit 38 then recalculates the average character size within that region (step 154). This method yields an average character size that is unaffected by small separation strokes such as dakuten and, as FIG. 9 shows, does not depend on the shape of the histogram distribution.
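The two-pass calculation of steps 151 to 154 can be sketched as follows. This is a minimal Python sketch under stated assumptions rather than the patented implementation: sizes are integers with one histogram bin per value, "to the right" includes the provisional average itself, ties for the peak go to the smaller bin, and the MAX/2 region is grown contiguously from the peak. The function name `robust_average` is hypothetical, and the skewed case of FIG. 9(b) is not handled.

```python
from collections import Counter


def robust_average(values):
    """Two-pass average of integer measurements (sizes or pitches)."""
    hist = Counter(values)                      # step 151: histogram
    provisional = sum(values) / len(values)     # step 152: provisional average
    # Step 153: histogram peak at or to the right of the provisional average.
    right_bins = [b for b in hist if b >= provisional]
    peak = max(right_bins, key=lambda b: (hist[b], -b))
    half = hist[peak] / 2
    # Grow the region left and right while the frequency stays >= MAX/2.
    lo = hi = peak
    while hist[lo - 1] >= half:
        lo -= 1
    while hist[hi + 1] >= half:
        hi += 1
    # Step 154: recompute the average inside the region only.
    total = sum(b * hist[b] for b in range(lo, hi + 1))
    count = sum(hist[b] for b in range(lo, hi + 1))
    return total / count
```

For heights `[4, 4, 5, 19, 20, 20, 20, 21, 21]`, the three small values pull the provisional average down to about 14.9, while the second pass averages only the bins 20 and 21 and returns 20.4. The same routine can be applied to pitches, since the pitch flow of FIG. 30 follows the same steps.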

【００７１】[0071]

Next, using the circumscribed rectangles already extracted for each stroke, the small separation stroke extraction unit 20 determines whether the area of each circumscribed rectangle is no more than 1/2 of the area of the average character size, and whether the height of the rectangle is no more than 4/5 of the height of the average character size (step 105).

【００７２】[0072]

If the area ratio and height ratio conditions are satisfied, the small separation stroke extraction unit 20 extracts the partial pattern in that circumscribed rectangle as a small separation stroke (step 106).

【００７３】[0073]

The horizontal size (width) is not considered here because some small separation strokes, such as that of a 5, are comparable in width to the average size.
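The test of steps 105 and 106 reduces to two comparisons, sketched below in Python. The function name and parameters are hypothetical, and the sketch assumes that "the area of the average character size" means average width times average height.

```python
def is_small_separation_stroke(box_w, box_h, avg_w, avg_h,
                               area_ratio=0.5, height_ratio=0.8):
    """Return True when a circumscribed rectangle passes both conditions.

    area_ratio=0.5 and height_ratio=0.8 correspond to the 1/2 and 4/5
    thresholds in the text. Width alone is deliberately not tested.
    """
    area_ok = box_w * box_h <= area_ratio * avg_w * avg_h
    height_ok = box_h <= height_ratio * avg_h
    return area_ok and height_ok
```

With an average character of 20 by 24 pixels, a 20 by 6 rectangle (such as the top bar of a 5) passes even though it is as wide as a full character, whereas a 5 by 22 rectangle (such as a 1) fails the height condition.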

【００７４】[0074]

Next, for the patterns that were not judged to be small separation strokes in step 105 (those regarded as single characters in themselves), the character pitch calculation unit 21 takes the distance p between adjacent circumscribed rectangles as the pitch, as shown in FIG. 10, calculates a histogram of the pitches, and from it calculates the average character pitch and its variance (step 107).

【００７５】[0075]

The calculation of the average character pitch is described with reference to FIG. 30. First, the histogram calculation unit 21a calculates a histogram of the pitches between the rectangles (step 161). The provisional average character pitch calculation unit 21b calculates the average character pitch from this histogram and takes it as the provisional average character pitch (step 162). The character pitch calculation region determination unit 21c, for example, finds the character pitch at which the histogram takes its MAX value in the region to the right of the provisional average character pitch, and determines the region to the left and right of that pitch in which the histogram is at least MAX/2 (step 163). The average character pitch calculation unit 21d recalculates the average character pitch within that region (step 164).
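The pitch measurements that feed this flow can be obtained from the circumscribed rectangles as sketched below. This Python sketch assumes horizontal writing, rectangles given as `(left, top, right, bottom)`, a pitch defined as the next rectangle's left edge minus the current rectangle's right edge, and a population variance. The function names are hypothetical.

```python
def pitches(boxes):
    """Gaps between horizontally adjacent circumscribed rectangles.

    A negative value means the two rectangles overlap horizontally,
    which is what an overhang produces.
    """
    ordered = sorted(boxes, key=lambda b: b[0])
    return [nxt[0] - cur[2] for cur, nxt in zip(ordered, ordered[1:])]


def mean_and_variance(values):
    """Plain mean and population variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance
```

For three rectangles spanning x = 0 to 10, 14 to 24, and 30 to 40, the pitches are 4 and 6, giving a mean of 5 and a variance of 1.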

【００７６】[0076]

Next, the average character size/pitch integration unit 22 decides whether to perform integration based on the following evaluation function F:

F = MP / MW - (α × VP + β)

The unit determines whether F is zero or greater (step 108). When F is zero or greater, the integration of partial patterns using the size and pitch averages and the size and pitch variances is completed (step 109).

【００７７】[0077]

Here, MP is the pitch average, MW is the size average, and VP is the pitch variance; α is 1.6 and β is 0.5. These parameter values are examples.
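The evaluation function can be written directly in Python. The sketch below uses the example values α = 1.6 and β = 0.5 from the text as defaults. The function name is hypothetical, and the text does not state the units or any normalization of the pitch variance, so the inputs are taken as given.

```python
def evaluation_f(mean_pitch, mean_size, pitch_variance, alpha=1.6, beta=0.5):
    """F = MP / MW - (alpha * VP + beta); step 108 tests whether F >= 0."""
    return mean_pitch / mean_size - (alpha * pitch_variance + beta)
```

For example, MP = 30, MW = 20, and VP = 0.25 give F = 1.5 - 0.9 = 0.6, so the test of step 108 passes. A tighter string with MP = 5, MW = 20, and VP = 1.0 gives F = 0.25 - 2.1 = -1.85, so processing continues with the certainty-based integration.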

【００７８】[0078]

In other words, the integration decision is made according to the degree of spacing between characters (pitch average / size average) and the pitch variance. Next, when F is less than zero in step 108, the certainty integration unit 24 calculates the distances between the extracted small separation stroke and the patterns to its left and right, quantifies the ratio of those distances as the certainty of integration, and performs integration according to this quantified value. As a concrete example, using the distances a, b, c, and d shown in FIGS. 11(b) and 11(c), the certainty integration unit 24 integrates the partial patterns (step 111) when b is greater than 2.6 times a and c is greater than 2.6 times d (step 110). In this case the quantified value mentioned above is 2.6.

【００７９】[0079]

Next, when b is smaller than 2.6 times a and c is smaller than 2.6 times d in step 110, the small separation stroke integration processing by simple recognition described below is executed (step 112).

【００８０】[0080]

The small separation stroke integration processing by simple recognition follows the flow shown in FIG. 6. As the figure shows, the processing first determines whether the width-to-height ratio of the small separation stroke is greater than 2.6. The aim is to decide, without using pattern matching techniques, whether a small separation stroke is a character in itself or a partial pattern of a character. Because no complex processing is required, the processing is fast.

【００８１】[0081]

The small separation stroke integration processing by simple recognition is described in detail below with reference to FIG. 6. First, when the width-to-height ratio of the small separation stroke is smaller than 2.6 (step 121), the small separation stroke line density calculation unit 42 integrates the small separation stroke as a 5 (step 122). When the width-to-height ratio of the small separation stroke is greater than 2.6, the unit determines whether the ratio is smaller than 1/3 (step 123). If the ratio is smaller than 1/3, processing proceeds to the routine for 7. If the ratio is greater than 1/3, the line density is calculated.

【００８２】[0082]

For numerals, what is extracted as a small separation stroke is limited to a character written small or the small separation stroke of a 5 or a 7. The small separation stroke line density calculation unit 42 therefore first calculates the line density of the small separation stroke in order to distinguish a single character written small from the separation stroke of a 5 or a 7.

【００８３】[0083]

To calculate the line density, as shown in FIG. 12, the small separation stroke line density calculation unit 42 checks whether the circumscribed rectangle is wide or tall. If it is tall, the rectangle is divided horizontally into four equal parts, and the line density is calculated on the two lines other than the middle one (FIG. 12(a)). If it is wide, the rectangle is divided vertically into four equal parts and the same processing is performed (FIG. 12(b)). As an alternative method, the circumscribed rectangle may be divided into n equal parts, and the maximum of the line densities counted from the n-th line to the (n-m)-th line may be taken as the line density.
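The line density calculation can be sketched in Python as a count of stroke crossings along scan lines. The sketch rests on several assumptions that the text leaves open: the image is the stroke cropped to its circumscribed rectangle, a tall rectangle is scanned along rows at 1/4 and 3/4 of its height and a wide one along columns at 1/4 and 3/4 of its width, and the two counts are combined by taking their maximum. The function names are hypothetical.

```python
def crossings(cells):
    """Count the runs of foreground (1) cells along one scan line."""
    count, prev = 0, 0
    for value in cells:
        if value and not prev:
            count += 1
        prev = value
    return count


def line_density(img):
    """Line density of a stroke cropped to its circumscribed rectangle."""
    h, w = len(img), len(img[0])
    if h >= w:
        # Tall rectangle: scan along rows at 1/4 and 3/4 of the height.
        lines = [img[h // 4], img[(3 * h) // 4]]
    else:
        # Wide rectangle: scan along columns at 1/4 and 3/4 of the width.
        lines = [[row[x] for row in img] for x in (w // 4, (3 * w) // 4)]
    return max(crossings(line) for line in lines)
```

A tall pattern with two vertical bars yields a line density of 2, while a wide horizontal bar scanned along its columns yields 1.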

【００８４】[0084]

If the line density of a wide stroke such as the one shown in FIG. 13 were calculated in the horizontal direction, an incorrect line density would result. The calculation method is therefore changed according to the shape of the circumscribed rectangle.

【００８５】[0085]

In this way, an accurate line density can be calculated without being affected by irregularities in the pattern. The small separation stroke line density calculation unit 42 then determines whether the line density is 2 or less in the vertical direction, or 1 or less in the horizontal direction (step 124). If neither of these two conditions is satisfied, the pattern is rejected as not being a small separation stroke (step 125).

【００８６】[0086]

If a condition is satisfied, the small separation stroke line density calculation unit 42 determines whether the height-to-width ratio of the small separation stroke is 1 or more (step 126). If it is, the inclination calculation unit 44 calculates the inclination of the small separation stroke in the X direction (step 127). If it is not, the inclination calculation unit 44 calculates the inclination of the small separation stroke in the Y direction (step 128).

【００８７】[0087]

To calculate the inclination, as shown in FIGS. 14(a), (b), and (c), the circumscribed rectangle is divided into four equal parts, and the inclination between the two points where the first and third dividing lines intersect the stroke is calculated. In practice each intersection is not a point but has some width, so its midpoint is used.
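The inclination calculation can be sketched in Python as follows. The sketch is illustrative only and makes assumptions the text does not confirm: the first and third dividing lines lie at 1/4 and 3/4 of the rectangle's longer side, only the first run of stroke pixels on each line is used, and the result is an angle in degrees measured from the scan direction's perpendicular. The sign and reference axis of the angle ranges quoted in the specification are defined by FIGS. 14 and 15 and may differ from this convention. The function names are hypothetical.

```python
import math


def first_run_midpoint(cells):
    """Midpoint index of the first run of foreground cells, or None."""
    start = None
    for i, value in enumerate(cells):
        if value and start is None:
            start = i
        elif not value and start is not None:
            return (start + i - 1) / 2.0
    return None if start is None else (start + len(cells) - 1) / 2.0


def stroke_inclination_deg(img):
    """Angle between the stroke midpoints on the 1st and 3rd dividing lines."""
    h, w = len(img), len(img[0])
    if h >= w:
        # Tall rectangle: intersect rows, measure the shift in x.
        p1, p2 = h // 4, (3 * h) // 4
        m1 = first_run_midpoint(img[p1])
        m2 = first_run_midpoint(img[p2])
    else:
        # Wide rectangle: intersect columns, measure the shift in y.
        p1, p2 = w // 4, (3 * w) // 4
        m1 = first_run_midpoint([row[p1] for row in img])
        m2 = first_run_midpoint([row[p2] for row in img])
    if m1 is None or m2 is None:
        return None  # a dividing line misses the stroke
    return math.degrees(math.atan2(m2 - m1, p2 - p1))
```

Under this convention a perfectly vertical bar in a tall rectangle has an inclination of 0 degrees, and a one-pixel diagonal in a square rectangle has an inclination of 45 degrees.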

【００８８】[0088]

As with the line density, the method of calculating the inclination differs depending on whether the circumscribed rectangle is wide or tall, because calculating the inclination in the horizontal direction for a wide rectangle can produce errors.

【００８９】[0089]

Thus, an appropriate inclination can be calculated by changing the direction of the calculation according to whether the circumscribed rectangle is tall or wide. Next, from the calculated inclination, the inclination calculation unit 44 determines whether the inclination lies in the angle range of the separation stroke of a 5 (-40° to 28°) or in the angle range of the separation stroke of a 7 (step 129). This is possible because, as shown in FIGS. 15(a) and (b), the angles of the separation strokes of handwritten 5s and 7s are almost mutually exclusive.

【００９０】[0090]

Here, the angular distribution of the small separation stroke of a 7 is wider than that of a 5. Therefore, in distinguishing 5 from 7, the character size determination unit 46 determines, for a stroke whose inclination has reliably been calculated as the angle of a 5, whether the distance to the left rectangle is smaller than 1.5 times the distance to the right rectangle (step 130). If this condition is satisfied, the stroke is integrated as a 5 (step 131). If it is not, processing proceeds to the routine for 7 at step 132.

【００９１】[0091]

Strokes rejected on character size in step 129 and strokes calculated as having the angle of a 7 undergo the following processing. First, the right adjacent rectangle line density calculation unit 48 calculates the line density of the stroke to the right of the small separation stroke and determines whether that right stroke is the right-hand part of a 7. As shown in FIG. 16, the line density is examined in both the vertical and horizontal directions in order to distinguish the right-hand part of a 7 from a 2 or a 9.

【００９２】[0092]

It is then determined whether the line density of the right stroke is 2 or less in the vertical direction and 1 or less in the horizontal direction (step 132). If the calculated line density does not satisfy this condition, the character size determination unit 50 determines whether the distance to the left stroke is smaller than 1.8 times the average horizontal size and whether the inclination of the separation stroke is between -80° and 51.6° (step 133). If this condition is satisfied, the stroke is integrated as a 5 (step 131); if not, it is rejected (step 134).

【００９３】[0093]

If, on the other hand, the calculated line density is 2 in the vertical direction and 1 in the horizontal direction (steps 132 and 135), the pattern may be a 7, so the character size after integration with the small separation stroke is examined. If that character size is no more than a threshold multiple of the average character size, the patterns are integrated as a 7 (step 136).

【００９４】[0094]

When the line density calculated by this method in step 135 is 1 in the vertical direction and 1 in the horizontal direction, the multi-direction line density calculation unit 52 recalculates the line density by the following method in order to confirm whether the pattern is the right-hand pattern of a 7.

【００９５】[0095]

Specifically, for patterns with a vertical line density of 1 and a horizontal line density of 1, such as those shown in FIGS. 17(a) and (b), the line density is traced in the vertical direction from the horizontal center of the circumscribed rectangle, as shown in FIG. 17(c). At the point where a line is counted, the trace continues in the horizontal direction, and it is determined whether the right-angle line density is 2 (step 137). Patterns with a right-angle line density of 2 are integrated as a 7 (step 136). For the pattern shown in FIG. 17(d), which has a vertical line density of 1 and a horizontal line density of 1, the multi-direction line density is 1.

【００９６】[0096]

By tracing the line density along perpendicular directions in this way, patterns that could not previously be distinguished by a search in a single direction can be distinguished. Furthermore, the character "ク" shown in FIG. 17(f) and the "L" shown in FIG. 17(g) have a right-angle line density of 2. In the case of the "4" shown in FIG. 17(h), the multiple directions need not be perpendicular.

【００９７】[0097]

In step 136, if the right-angle line density is other than 2, or if the pattern was rejected on character size, the stroke may still be the small separation stroke of a 5. Processing therefore returns to the routine for 5, and the character size after integration as a 5 is examined. If the condition is satisfied, the characters are integrated; if not, the pattern is rejected. FIG. 18 shows examples of numeral character patterns extracted in this embodiment.

【００９８】[0098]

Thus, according to this embodiment, the average character size and pitch are calculated rigorously for character strings with irregular pitch and varying character size, and the integration conditions are changed adaptively according to those averages and variances, so characters can be extracted with high accuracy. In particular, for handwritten numeral strings, the simple recognition processing unit 26 focuses on small separation strokes without using pattern matching techniques, which makes the processing both accurate and fast. That is, rather than applying uniform processing to every pattern in the character string, processing focused on small separation strokes is applied, which speeds up the extraction processing as a whole.

[0099] In addition, a histogram of the widths of all circumscribed rectangles in the character string is calculated, a provisional average character size is obtained first, and the character size is then calculated accurately on the basis of that provisional value. The average character size can therefore be calculated more accurately even when the character size varies widely within the string or when the string contains overhangs. As a result, characters can be integrated appropriately.

[0100] Furthermore, the conditions for integrating small separation strokes are changed adaptively according to the average values and variances of the character size and pitch in the character string, so that more accurate integration is possible regardless of variations in character size and pitch.

[0101] When dakuten (voicing marks) in kana characters, separation strokes of numerals, and the like are present, calculating the pitch between characters with those patterns included yields a pitch smaller than the actual pitch. Excluding these small separation strokes beforehand makes a more accurate pitch calculation possible.

[0102] Because the threshold used when integrating separation strokes is changed adaptively according to the regularity of the arrangement of characters in the character string, characters can be integrated more accurately. Furthermore, even when the arrangement of characters in the string has no regularity, the ratio of the distances between a separation stroke and the patterns located on its left and right is quantified as a certainty factor, and integration is performed according to that value, so accurate integration is still possible.
<Embodiment 2>
Embodiment 2 is described below with reference to the drawings. FIG. 19 is a configuration block diagram of Embodiment 2. Embodiment 2 is a character string extraction device. The character string extraction device comprises an input pattern unit 12, a connection pattern extraction unit 14, a circumscribed rectangle integrating unit 62, a weighting projection unit 64, a character string axis determination unit 66, and a character string extracting unit 68.

[0103] A plurality of character string patterns are input to the input pattern unit 12. The connection pattern extraction unit 14 labels the multi-string input pattern received from the input pattern unit 12 to extract only the connected patterns, and obtains the circumscribed rectangle of each pattern. A character shape determination unit 61 is connected to the connection pattern extraction unit 14.

[0104] The character shape determination unit 61 determines whether the shape of the character patterns is complex. The circumscribed rectangle integrating unit 62 and the weighting projection unit 64 are connected to the character shape determination unit 61. The circumscribed rectangle integrating unit 62 integrates circumscribed rectangles that overlap one another. The output of the circumscribed rectangle integrating unit 62 is connected to the weighting projection unit 64.

[0105] The weighting projection unit 64 calculates a projection histogram by projecting each circumscribed rectangle with a weighting centered on the middle of its height. The character string axis determination unit 66 is connected to the weighting projection unit 64.

[0106] The character string axis determination unit 66 searches the projection histogram for its peak values and determines the axes connecting the respective peak values as character string axes. The character string extracting unit 68 is connected to the character string axis determination unit 66.

[0107] The character string extracting unit 68 extracts character strings on the basis of the character string axes determined by the character string axis determination unit 66. The device further comprises a character size calculation unit 74 that calculates an average character size from the result of the integration, and a contact character removing unit 75 that removes, as contact character blocks, circumscribed rectangles at least a predetermined multiple of the average character size. These units are used, for example, when characters touch between vertically or horizontally adjacent character strings and the character size in the input data is known in advance to be substantially uniform.
<Processing of Embodiment 2>
The processing of Embodiment 2 is described below with reference to the drawings. FIG. 20 is a processing flowchart of Embodiment 2. First, the connection pattern extraction unit 14 receives the character string patterns from the input pattern unit 12 (step 201).

[0108] Then, to distinguish the individual patterns, the connection pattern extraction unit 14 extracts 8-connected patterns by labeling and, at the same time, obtains their circumscribed rectangles as shown in FIG. 21 (step 202).
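
As an illustration only, the labeling of step 202 can be sketched in Python as a breadth-first search over 8-connected black pixels that records each pattern's circumscribed rectangle. The function name `label_bounding_boxes`, the 0/1 list-of-rows image format, and the `(top, left, bottom, right)` tuple layout are assumptions made for this sketch and do not appear in the original disclosure.

```python
from collections import deque


def label_bounding_boxes(image):
    """Return (top, left, bottom, right) boxes of 8-connected black regions.

    `image` is a list of rows of 0/1 values, where 1 is a black pixel.
    Boxes are returned in raster-scan order of each region's first pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new label and grow it over all 8 neighbours.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Because diagonal neighbours are included, two pixels that touch only at a corner end up in the same pattern, which is what distinguishes 8-connectivity from 4-connectivity.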

[0109] Next, the character shape determination unit 61 determines whether the character shape is complex (step 203). A complex character shape means that a single character is composed of a plurality of partial patterns, as in kanji. Complexity is judged by whether the number of overlaps between partial patterns is equal to or greater than a threshold.

[0110] If this is not the case, that is, if the character shape is simple, the weighting projection unit 64 calculates a projection histogram by projecting each circumscribed rectangle with a weighting centered on the middle of its height, as shown in FIG. 22 (step 204). The weight has its peak at the center of the height and decays toward the edges. If the weight were non-zero over the entire height line segment, the projected histogram might have no valleys. For this reason, only n% of the height line segment (n: a real number) carries a non-zero weight. In FIG. 22(a), a triangular weighting is used for the sake of processing speed.
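
The center-weighted projection of step 204 can be sketched as follows, assuming horizontally written character strings so that the histogram is indexed by the row coordinate y. The function name `center_weighted_projection`, the parameter `coverage_percent` (standing in for n), and the `(top, left, bottom, right)` rectangle layout are hypothetical names chosen for this sketch; the source does not fix a value for n.

```python
def center_weighted_projection(boxes, image_height, coverage_percent=50.0):
    """Accumulate a triangular weight for each rectangle's height segment.

    The weight is 1.0 at the centre of the rectangle's height and falls
    linearly to zero at the edge of a band covering `coverage_percent`
    percent of the height, so the outer part of the segment adds nothing.
    """
    hist = [0.0] * image_height
    for top, left, bottom, right in boxes:
        center = (top + bottom) / 2.0
        # Half-width of the non-zero band around the centre.
        half_band = (bottom - top + 1) * coverage_percent / 100.0 / 2.0
        if half_band <= 0:
            continue
        for y in range(top, bottom + 1):
            weight = 1.0 - abs(y - center) / half_band
            if weight > 0:
                hist[y] += weight
    return hist
```

Each rectangle contributes a single height segment regardless of its width, so wide and narrow patterns vote equally for the position of the character string axis.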

[0111] The projection histogram obtained as a result of the projection has comparatively distinct peaks. The character string axis determination unit 66 searches the projection histogram for its peak values and determines the axes connecting the respective peak values as character string axes (step 205). The character string axes X1 and X2 determined in this way are shown in FIG. 23.
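
The source does not specify how the peak search of step 205 is carried out. One simple possibility, shown here purely as an assumption, is to take every local maximum of the histogram above a minimum height and to treat a flat-topped maximum as a single peak at its midpoint. The function name `find_axis_positions` and the parameter `min_height` are hypothetical.

```python
def find_axis_positions(hist, min_height=0.0):
    """Return the positions of local maxima of `hist` above `min_height`.

    A run of equal values that is higher than both of its neighbours is
    reported once, at the midpoint of the run.
    """
    peaks = []
    n = len(hist)
    i = 0
    while i < n:
        if hist[i] > min_height:
            # Extend j to the end of a plateau of equal values.
            j = i
            while j + 1 < n and hist[j + 1] == hist[i]:
                j += 1
            left_lower = i == 0 or hist[i - 1] < hist[i]
            right_lower = j == n - 1 or hist[j + 1] < hist[i]
            if left_lower and right_lower:
                peaks.append((i + j) / 2.0)
            i = j + 1
        else:
            i += 1
    return peaks
```

For horizontally written text each returned position is the y coordinate of one candidate character string axis.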

[0112] The character string extracting unit 68 then calculates the distance between each circumscribed rectangle and each character string axis, assigns each circumscribed rectangle to the character string axis at the smallest distance, and extracts the character strings (step 206).
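
The assignment of step 206 amounts to a nearest-axis search. A minimal sketch follows, assuming horizontal character string axes given as y coordinates and measuring distance from the vertical center of each rectangle, as the claims describe. The function name `assign_to_axes` and the `(top, left, bottom, right)` layout are assumptions of the sketch.

```python
def assign_to_axes(boxes, axes):
    """Group rectangles by the character string axis nearest to their centre.

    Returns a dict mapping each axis index to the list of rectangles
    assigned to it, in input order.
    """
    strings = {index: [] for index in range(len(axes))}
    for box in boxes:
        top, left, bottom, right = box
        center = (top + bottom) / 2.0
        # Index of the axis with the smallest vertical distance.
        nearest = min(range(len(axes)), key=lambda k: abs(center - axes[k]))
        strings[nearest].append(box)
    return strings
```

A rectangle lying between two lines of text is therefore attached to whichever line its center is closer to.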

[0113] On the other hand, when the character patterns are found to be complex in step 203, the weighted projection described above may not produce distinct peaks. Therefore, as preprocessing for the projection, the circumscribed rectangle integrating unit 62 integrates circumscribed rectangles that overlap one another (step 207). After this integration, it is determined whether characters touch one another between vertically (or horizontally) adjacent character strings (step 208). If no characters touch, processing proceeds to step 205.
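
The integration of overlapping circumscribed rectangles in step 207 can be sketched as a repeated pairwise merge that runs until no two rectangles overlap. The function name `merge_overlapping_boxes` and the `(top, left, bottom, right)` layout with inclusive coordinates are assumptions; the quadratic restart loop is chosen for clarity, not speed.

```python
def merge_overlapping_boxes(boxes):
    """Merge rectangles that overlap until all remaining ones are disjoint.

    Two rectangles overlap when their row ranges and their column ranges
    both intersect (coordinates are inclusive).
    """
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                overlap = (a[0] <= b[2] and b[0] <= a[2]
                           and a[1] <= b[3] and b[1] <= a[3])
                if overlap:
                    # Replace the pair with their common bounding rectangle.
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

Restarting after every merge matters because the enlarged rectangle may now overlap a rectangle that neither of its parts touched.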

[0114] If, on the other hand, characters touch between vertically (or horizontally) adjacent character strings and the character size in the input data is known in advance to be substantially uniform, the character size calculation unit 74 calculates the average character size from the result of the integration (step 209). The contact character removing unit 75 then uses this size information to remove contact character blocks, which are circumscribed rectangles of large size (step 210), after which the weighted projection described above is performed. The character string axis determination unit 66 then searches the resulting projection histogram for its peak values and determines the axes connecting the respective peak values as character string axes (step 205). Finally, the character string extracting unit 68 calculates the distance between each circumscribed rectangle and each character string axis, assigns each circumscribed rectangle to the character string axis at the smallest distance, and extracts the character strings (step 206).
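
Steps 209 and 210 can be illustrated with the following sketch, which takes rectangle height as the character size and removes any rectangle whose height is at least a fixed multiple of the average. Using height as the size measure, the function name `remove_contact_blocks`, and the default `size_factor` of 1.5 are assumptions; the source only speaks of a "predetermined multiple".

```python
def remove_contact_blocks(boxes, size_factor=1.5):
    """Split rectangles into ordinary characters and contact character blocks.

    A rectangle is treated as a contact block when its height is at least
    `size_factor` times the average height of all rectangles.
    Returns (kept, removed).
    """
    if not boxes:
        return [], []
    heights = [bottom - top + 1 for top, left, bottom, right in boxes]
    average = sum(heights) / len(heights)
    kept, removed = [], []
    for box, height in zip(boxes, heights):
        if height >= size_factor * average:
            removed.append(box)
        else:
            kept.append(box)
    return kept, removed
```

This relies on the stated precondition that character sizes are roughly uniform; with widely varying sizes, legitimately tall characters would be removed as well.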

[0115] The integration processing yields a more distinct projection histogram, and the weighted projection allows character strings to be extracted quickly and accurately. Furthermore, even when the character shape is complex and characters touch, the contact character blocks are removed, so character strings can be extracted accurately.

[0116] Alternatively, when characters touch between vertically or horizontally adjacent character strings and the character size in the input data is known in advance to be substantially uniform, a character height calculation unit 71 (not shown) may be provided to calculate the average character height over the circumscribed rectangles, together with a contact character removing unit 72 (not shown) that removes, as contact character blocks, circumscribed rectangles at least a predetermined multiple of the average character height. The processing from step 204 onward may then be applied to the circumscribed rectangles that remain after the contact character blocks are removed.

[0117] This makes it possible to extract character strings accurately even when characters touch one another vertically.
<Embodiment 3>
Embodiment 3 is described next. FIG. 24 is a configuration block diagram of Embodiment 3. The weighting projection unit 64a performs a weighted projection in which the upper and lower ends of the vertical or horizontal line segment of each pattern's circumscribed rectangle are the peaks and the weight depends on the distance from the upper end and the distance from the lower end. The character string axis determination unit 66a determines the central axis of a character string from the upper-end peak value and the lower-end peak value of the projection histogram. The character string extracting unit 68a extracts, for the circumscribed rectangles other than contact character string blocks, the character string to which each pattern belongs on the basis of the distance between the central axis and the center of each circumscribed rectangle.

[0118] The device further comprises a contact character removing unit 73 that removes, as contact character string blocks, circumscribed rectangles crossed by a plurality of character string axes.
<Processing of Embodiment 3>
FIG. 25 is a processing flowchart of Embodiment 3. The processing of Embodiment 3 is described next. As shown in FIG. 22(b), the weighting projection unit 64a performs a weighted projection in which the upper and lower ends of the vertical or horizontal line segment of each pattern's circumscribed rectangle are the peaks and the weight depends on the distance from the upper end and the distance from the lower end (step 301). In this method, the projection is weighted so that the value decreases from the upper (or lower) end toward the center of the height of the circumscribed rectangle. This yields the upper-end projection and the lower-end projection shown in FIG. 22(b).
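
The end-weighted projection of step 301 can be sketched by accumulating two histograms, one peaked at each rectangle's upper end and one at its lower end, with both weights falling to zero at the center of the height. The linear decay, the function name `edge_weighted_projections`, and the `(top, left, bottom, right)` layout are assumptions of this sketch.

```python
def edge_weighted_projections(boxes, image_height):
    """Return (top_hist, bottom_hist) for upper-end and lower-end projection.

    Each rectangle adds a weight of 1.0 at its upper end to `top_hist` and
    1.0 at its lower end to `bottom_hist`; both weights decay linearly and
    reach zero at the centre of the rectangle's height.
    """
    top_hist = [0.0] * image_height
    bottom_hist = [0.0] * image_height
    for top, left, bottom, right in boxes:
        half = (bottom - top) / 2.0
        if half <= 0:
            # A one-pixel-high rectangle is both its upper and lower end.
            top_hist[top] += 1.0
            bottom_hist[bottom] += 1.0
            continue
        for y in range(top, bottom + 1):
            w_top = 1.0 - (y - top) / half
            w_bottom = 1.0 - (bottom - y) / half
            if w_top > 0:
                top_hist[y] += w_top
            if w_bottom > 0:
                bottom_hist[y] += w_bottom
    return top_hist, bottom_hist
```

Keeping the two histograms separate lets the upper and lower boundaries of a line of text be located independently before they are combined into a central axis.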

[0119] The character string axis determination unit 66a then obtains the upper-end peak value and the lower-end peak value of the projection histogram and determines the midpoint between those peaks as the central axis of the character string (step 302).

[0120] Next, the contact character removing unit 73 excludes, from among the circumscribed rectangles, those crossed by a plurality of character string axes, treating them as characters that touch vertically (step 303). Here, these are the circumscribed rectangles K1 and K2 in FIG. 22(b). For each remaining circumscribed rectangle, the character string extracting unit 68a calculates the distance between the rectangle and each character string axis, assigns the rectangle to the character string axis at the smallest distance, and extracts the character strings (step 304).
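
Steps 302 and 303 can be sketched together. The central axis of each character string is taken as the midpoint between its upper-end peak and its lower-end peak, and any rectangle spanning two or more axes is set aside as a contact block. Pairing the peaks in sorted order, and the function names `central_axes` and `split_by_axis_crossings`, are assumptions made for illustration.

```python
def central_axes(top_peaks, bottom_peaks):
    """Pair upper-end and lower-end peaks in sorted order and return midpoints."""
    return [(t + b) / 2.0
            for t, b in zip(sorted(top_peaks), sorted(bottom_peaks))]


def split_by_axis_crossings(boxes, axes):
    """Separate rectangles crossed by two or more axes from the rest.

    Rectangles are (top, left, bottom, right); an axis at height `a`
    crosses a rectangle when top <= a <= bottom. Returns (kept, contact).
    """
    kept, contact = [], []
    for box in boxes:
        top, left, bottom, right = box
        crossings = sum(1 for axis in axes if top <= axis <= bottom)
        if crossings >= 2:
            contact.append(box)
        else:
            kept.append(box)
    return kept, contact
```

The rectangles in `kept` can then be assigned to their nearest axis as in step 304, while those in `contact` are excluded from the assignment.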

[0121] FIG. 26 shows black-pixel projection results for the method of Embodiment 2 and the conventional method. FIG. 27 shows weighted projection results about the centers of the circumscribed rectangles for the method of Embodiment 2 and the conventional method. In FIGS. 26 and 27, the histogram at the right end of the character strings is the result of the processing of Embodiment 2, and the histogram at the left end is the result of the conventional method.

[0122] As FIG. 27 shows, the histogram for the prior art (left end of the character strings) is the result of simply performing the weighted projection of the circumscribed rectangles even though characters touch one another vertically. In Embodiment 2, by contrast, the weighted projection is performed after the contact characters have been removed as shown in steps 207 to 210 of FIG. 20, so highly accurate character string extraction is possible.

[0123] FIG. 28 shows the result of character string extraction when characters touch between upper and lower character strings. The histogram at the left end shows the result of the upper-end projection, and the histogram at the right end shows the result of the lower-end projection. The peaks of the respective projection histograms are indicated. The character string axis lies at the midpoint between these peaks.

[0124] As described above, according to Embodiments 2 and 3, even when character strings are close together and part of a character belongs to another character string, character strings can be extracted quickly and with high accuracy by projecting the vertical (width) direction line segment of each pattern's circumscribed rectangle with a weighting that depends on height position.

[0125] The configuration of Embodiment 2 and that of Embodiment 3 may be used in combination. Furthermore, for online handwritten character strings, the weighting projection unit may perform a weighted projection that decays from the right end of the circumscribed rectangle toward the left end and a weighted projection that decays from the left end toward the right end, and obtain the respective peaks of the projection histograms. The center position of one character and the region in which one character exists may then be estimated from the respective peak values.

[0126] For example, when writing "マ" as an online handwritten character, a right-end projection and a left-end projection are first performed for the upper part of "マ" at the first stroke, as shown in FIG. 31(a). Next, a right-end projection and a left-end projection are performed for the lower part of "マ" at the second stroke, as shown in FIG. 31(b). When the projections are then combined for each end, as shown in FIG. 31(c), the region in which one character exists is obtained.

[0127] In the case of a character string with overhangs, a right-end projection and a left-end projection are performed for each of the two characters "63", as shown in FIG. 32(b), to obtain the region in which each character exists.

[0128] With this approach, distinct valleys are produced in the projection histogram, so the right end, left end, and center position of each character can be determined easily and accurately.

【０１２９】[0129]

[Effects of the Invention] According to the present invention, the average character size and pitch are calculated rigorously for character strings with irregular pitch and varying character size, and the integration conditions are changed adaptively according to those values and their variances at integration time, so characters can be segmented with high accuracy.

[0130] In addition, even when character strings are close together and part of a character belongs to another character string, character strings can be extracted quickly and with high accuracy by projecting the vertical (width) direction line segment of each pattern's circumscribed rectangle with a weighting that depends on height position.

[Brief description of the drawings]

FIG. 1 is a diagram showing the principle of the character segmentation device according to the first invention.

FIG. 2 is a diagram showing the principle of the character string extraction device according to the second invention.

FIG. 3 is a configuration block diagram of Embodiment 1.

FIG. 4 is a processing flowchart of Embodiment 1.

FIG. 5 is a flowchart of the average character size calculation process.

FIG. 6 is a flowchart showing the small separation stroke integration process by simple recognition.

FIG. 7 is a diagram showing labeling.

FIG. 8 is a diagram showing the calculation of the circumscribed rectangle coordinate values of a partial pattern.

FIG. 9 is a diagram showing the average character size calculation method.

FIG. 10 is a diagram showing the pitch calculation method.

FIG. 11 is a diagram showing integration.

FIG. 12 is a diagram showing the line density calculation method.

FIG. 13 is a diagram showing an example of failure when the horizontal line density is calculated for a horizontally long stroke.

FIG. 14 is a diagram showing the slope calculation method.

FIG. 15 is a diagram showing the angles of the separation strokes of "5" and "7".

FIG. 16 is a diagram explaining the line density calculation method.

FIG. 17 is a diagram explaining the multi-directional line density calculation method.

FIG. 18 is a diagram showing an example of character segmentation patterns.

FIG. 19 is a configuration block diagram of Embodiment 2.

FIG. 20 is a processing flowchart of Embodiment 2.

FIG. 21 is a diagram showing the circumscribed rectangle of each character.

FIG. 22 is a diagram showing weighted projection.

FIG. 23 is a diagram showing character string axes.

FIG. 24 is a configuration block diagram of Embodiment 3.

FIG. 25 is a processing flowchart of Embodiment 3.

FIG. 26 is a diagram showing black-pixel projection results for the method of Embodiment 2 and the conventional method.

FIG. 27 is a diagram showing weighted projection results about the centers of the circumscribed rectangles for the method of Embodiment 2 and the conventional method.

FIG. 28 is a diagram showing the character string extraction result when characters touch between upper and lower character strings.

FIG. 29 is a diagram showing an example of overhang between character strings.

FIG. 30 is a flowchart of the average character pitch calculation process.

FIG. 31 is a diagram showing an example of the one-character existence region when "マ" is written as an online handwritten character.

FIG. 32 is a diagram showing an example of the one-character existence region for a character string with overhangs.

[Explanation of symbols]

12: input pattern unit
14: connection pattern extraction unit
16: character string extraction unit
18: average character size calculation means
20: small separation stroke extraction unit
21: character pitch calculation unit
22: average character size and pitch integration unit
24: certainty factor integration unit
26: simple recognition processing unit
32: histogram calculation unit
34: provisional average character size calculation unit
36: character size calculation region determination unit
38: average character size calculation unit
42: small separation stroke line density calculation unit
44: slope calculation unit
46, 50: character size discrimination unit
48: right adjacent rectangle line density calculation unit
52: right-angle line density calculation unit
61: character shape determination unit
62: circumscribed rectangle integrating unit
64: weighting projection unit
66: character string axis determination unit
68: character string extracting unit
71: character height calculation unit
72, 73, 75: contact character removing unit
74: character size calculation unit

Claims

[Claims]

1. A character string cutout device comprising: pattern extracting means (14) for extracting partial patterns of characters based on connection information of a character string; weighted projection means (64) for obtaining a projection histogram by performing weighted projection on a vertical or horizontal line segment of a circumscribed rectangle circumscribing each extracted partial pattern; character string axis determining means (66) for determining a character string axis based on a peak value of the projection histogram; and character string extracting means (68) for extracting the character string to which the partial pattern belongs based on the distance between the character string axis and the center of the circumscribed rectangle.

2. The character string cutout device according to claim 1, wherein said weighted projection means (64) performs weighted projection according to the distance from the center of the vertical or horizontal line segment of the circumscribed rectangle of the pattern, with that center as the peak; said character string axis determining means (66) determines the central axis of the character string from the peak value of the projection histogram; and said character string extracting means (68) extracts the character string to which the pattern belongs based on the distance between the central axis and the center of each circumscribed rectangle.

3. A method according to claim 1, further comprising: calculating means for calculating an average of the heights of the characters with respect to the circumscribed rectangle; A character string cutout device comprising:

4. A removing means (7) for removing a circumscribed rectangle crossed by a plurality of character string axes as a contact character string block.
The weighted projection means (64) performs weighted projection in accordance with the distance from the upper end and the lower end, with the upper and lower ends of the vertical or horizontal line segment of the circumscribed rectangle of the pattern as peaks, The character string axis determining means (66) determines the central axis of the character string from the peak value at the upper end and the peak value at the lower end of the projection histogram, and the character string extracting means (68) determines the contact character string chunk. A character string cutout device for extracting a character string to which a pattern belongs based on the distance between the center axis and the center of each of the circumscribed rectangles to be excluded.

5. The method according to claim 1, wherein said weighted projection means (64) performs weighted projection on an upper end and a lower end of a circumscribed rectangle of the pattern, and said character string axis determining means (66) operates on an upper end of the projection histogram. And the center position of the character string is determined from the candidate position of the upper end and the candidate position of the lower end. The character string extracting means (68) determines the central axis and the respective circumscribed rectangles. Character string extracting device for extracting a character string to which a pattern belongs based on a distance from the center of the character string.

6. The method according to claim 2, further comprising:
A character string clipping device including a circumscribed rectangle integrating unit (61) that integrates overlapping circumscribed rectangles when the respective circumscribed rectangles overlap.

7. A method according to claim 6, further comprising: calculating means for calculating an average character size based on the result of the integration; and removing means for removing a circumscribed rectangle having a predetermined size or more of the average character size as a contact character block. 75).

8. The character string cutout device according to claim 1, wherein said weighted projection means (64) performs a weighted projection that attenuates from one end of the circumscribed rectangle toward the other end and a weighted projection that attenuates from the other end toward the one end, obtains the respective peaks of the projection histograms, and estimates the center position of one character and the existence region of one character from the respective peak values.

9. A character string cutout method comprising: an extraction step of extracting partial patterns of characters based on connection information of a character string; a projection step of obtaining a projection histogram by performing weighted projection on a vertical or horizontal line segment of a circumscribed rectangle circumscribing each extracted partial pattern; a determination step of determining a character string axis based on a peak value of the projection histogram; and an extracting step of extracting the character string to which the partial pattern belongs based on the distance between the character string axis and the center of the circumscribed rectangle.