JPH1021332A - Non-linear normalizing method - Google Patents

Non-linear normalizing method

Info

Publication number
JPH1021332A
JPH1021332A (application numbers JP8173406A / JP17340696A)
Authority
JP
Japan
Prior art keywords
character
line
rectangular frame
distance
circumscribed rectangular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP8173406A
Other languages
Japanese (ja)
Inventor
Fumiyoshi Nishio
文祥 西尾
Yasutaka Watanabe
康隆 渡辺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tamura Electric Works Ltd
Original Assignee
Tamura Electric Works Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tamura Electric Works Ltd filed Critical Tamura Electric Works Ltd
Priority to JP8173406A priority Critical patent/JPH1021332A/en
Publication of JPH1021332A publication Critical patent/JPH1021332A/en
Pending legal-status Critical Current

Landscapes

  • Character Input (AREA)

Abstract

PROBLEM TO BE SOLVED: To extract the slightly curved lines of a character, such as a handwritten character, as straight character lines.
SOLUTION: When the distances between the character lines constituting a character to be identified and the distances between each side of a circumscribed rectangular frame 11 and each character line adjacent to the frame 11 are obtained, the reciprocals of the distances are taken as line densities and non-linear normalization processing is performed according to these line densities; in doing so, the line density in the vicinity of each character line is set to zero. A slightly curved character is thereby extracted as straight character lines. Further, a circumscribed rectangular frame 12 is formed outside the frame 11, the distances between the character lines adjacent to the frame 11 and the sides of the frame 12 are obtained, and the reciprocals of these distances are taken as line densities. This prevents the character image obtained by the non-linear normalization processing from being reduced in size.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a character recognition technique for recognizing characters to be identified, such as handwritten characters, and more particularly to a non-linear normalizing method for applying non-linear normalization processing to the image of a character to be identified.

[0002]

[Description of the Related Art] In general, character recognition is performed according to the recognition processing flow shown in FIG. 9. First, in a preprocessing step S1, noise components are removed from the original image of the character to be recognized. Next, in a direction pattern creation step S2, a contour image is obtained from the noise-free original image and divided into pixels, each represented as a binary pixel: a black pixel carrying character information or a white pixel carrying no character information. The direction patterns of the contour are then extracted from this binary image of black and white pixels.
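The patent does not state how the contour image itself is obtained, so the following is only a minimal sketch of one common realization, assuming the binarized character is a 0/1 NumPy array (1 = black): a black pixel is kept as a contour pixel when at least one of its 4-neighbors is white. The function name and array conventions are illustrative assumptions.

```python
import numpy as np

def contour_image(binary: np.ndarray) -> np.ndarray:
    """Keep only black pixels (value 1) that touch at least one white
    4-neighbor; interior black pixels are dropped.  Sketch only - the
    patent does not specify the contour extraction method."""
    padded = np.pad(binary, 1, constant_values=0)
    up    = padded[:-2, 1:-1]
    down  = padded[2:,  1:-1]
    left  = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    has_white_neighbor = (up == 0) | (down == 0) | (left == 0) | (right == 0)
    return ((binary == 1) & has_white_neighbor).astype(np.uint8)
```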

[0003] After the direction patterns have been extracted in this way, the next non-linear normalization step S3 enlarges or reduces the contour image at a different magnification for each point of the image, as described later. Then, in the following feature vector creation step S4, each direction pattern of the non-linearly normalized image is equally divided into 8 × 8 small square regions, and the number of black pixels in each small region is taken as a feature amount, so that a feature vector is created for each direction pattern.
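A minimal sketch of the feature amount described here, assuming the normalized direction pattern is a 0/1 NumPy array whose side length is a multiple of 8: the pattern is split into an 8 × 8 grid of equal squares and the black-pixel count of each square becomes one component of the feature vector. Names and shapes are assumptions for illustration.

```python
import numpy as np

def direction_feature_vector(pattern: np.ndarray, grid: int = 8) -> np.ndarray:
    """Split a normalized direction pattern (e.g. 64 x 64, values 0/1) into
    grid x grid equal squares and return the black-pixel count of each
    square as a feature vector of length grid * grid."""
    h, w = pattern.shape
    cell_h, cell_w = h // grid, w // grid
    counts = (pattern[:grid * cell_h, :grid * cell_w]
              .reshape(grid, cell_h, grid, cell_w)
              .sum(axis=(1, 3)))
    return counts.ravel().astype(np.float32)
```

For a 64 × 64 pattern this yields the 64 block counts used as the direction feature vector in step S4.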

[0004] Next, in a coarse classification step S5, the direction feature vector of the input character (the character to be recognized) obtained in step S4 is compared with the vectors of a dictionary prepared in advance, and a plurality of candidate characters whose dictionary vectors are close are extracted in order of increasing distance. This distance is computed with the common calculation method called the 4-neighbor distance (city-block distance), shown in equation (1).

[0005] That is,

d4((i, j), (h, k)) = |i − h| + |j − k|   (1)

where (i, j) are the coordinate values of the input character and (h, k) are the coordinate values of the dictionary character. Thereafter, in a detailed classification step S6, the final character identification is performed through a detailed comparison using a more precise pattern (for example, a separate 32 × 32-dimensional feature vector is prepared).
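As an illustration of the coarse classification, the sketch below applies the city-block metric of equation (1) component-wise to whole feature vectors and returns the nearest dictionary candidates. Treating the feature vectors with this same metric, and the dictionary layout, are assumptions made for this example.

```python
import numpy as np

def city_block_distance(u: np.ndarray, v: np.ndarray) -> float:
    """4-neighbor (city-block) distance of equation (1): the sum of the
    absolute differences of corresponding components."""
    return float(np.abs(u - v).sum())

def coarse_candidates(query: np.ndarray, dictionary: dict, top_n: int = 10):
    """Return the top_n dictionary entries closest to the query vector,
    nearest first.  `dictionary` maps a character label to its vector."""
    ranked = sorted(dictionary.items(),
                    key=lambda item: city_block_distance(query, item[1]))
    return ranked[:top_n]
```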

[0006] FIG. 8 shows an outline of the non-linear normalization processing. As described above, the direction pattern creation step S2 extracts from the original figure (taken here as the contour image) shown in FIG. 8(a) the horizontal, vertical, right-downward and right-upward direction patterns shown in FIG. 8(b). Separately from this, the non-linear normalization step S3 first obtains the line spacings from the original figure of FIG. 8(a) to produce the line density information shown in FIG. 8(c).

[0007] Here, the original figure is divided into 8 × 8 regions, and the non-linear normalization processing determines the positions of the dividing lines so that, according to the above line density information, each region receives the same line density. FIG. 8(d) shows the result of this non-linear normalization processing. Then, based on the non-linear normalization information indicating the positions of the dividing lines, a non-linear mapping is applied to each of the direction patterns of FIG. 8(b), and a processing result such as that of FIG. 8(e) is obtained for every direction pattern. A feature vector is then extracted from each direction pattern of the non-linearly normalized image in the next feature vector creation step S4.

[0008]

[Problems to be Solved by the Invention] In the conventional non-linear normalization processing, the line density information is obtained from the original figure by counting the spacing (inter-line width) between character lines. Consequently, in the case of a slightly curved, nominally straight character line such as occurs in handwriting, a line density that never arises for a purely straight character line is generated at the curved portion, and the character line is extracted while still curved. This adversely affects the subsequent character recognition, and it is desired that such a line be extracted as a straight character line. It is therefore an object of the present invention to extract straight character lines that vary from person to person, as in handwritten characters, as truly straight character lines.

[0009]

[Means for Solving the Problems] To solve the above problems, the present invention provides a non-linear normalizing method in which a first circumscribed rectangular frame circumscribing the character to be identified is set, the distances between the character lines constituting the character to be identified and the distances between each side of the first circumscribed rectangular frame and each character line adjacent to it are obtained, the reciprocal of each distance is taken as a line density, the first circumscribed rectangular frame is divided into small regions according to this line density information, and the information of each divided small region is mapped individually onto regions obtained by equally dividing an area of a prescribed size; in this method, the line density in the vicinity of each character line is set to zero. Consequently, in the case of a slightly curved character line such as a handwritten one, the line density arising in its vicinity is removed, and such a curved character line can be extracted as a proper straight character line. Furthermore, a second circumscribed rectangular frame, a constant multiple of the first circumscribed rectangular frame, is provided outside the first circumscribed rectangular frame, the distance between each character line adjacent to the first circumscribed rectangular frame and each side of the second circumscribed rectangular frame is obtained, and the reciprocal of this distance is taken as the line density. As a result, the line density of a character image in which the character lines adjacent to the sides of the first circumscribed rectangular frame lie close to those sides can be made small, so that the shrinkage of the character image that would otherwise occur in non-linear normalization is prevented, and problems such as the difficulty of distinguishing similar characters, for example "田" and "由", after non-linear normalization can be avoided.

[0010]

[Embodiments of the Invention] The present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an apparatus to which the present invention is applied. As shown in the figure, the apparatus is a handwritten character input device comprising a CPU 1, a touch panel 2, an LCD 3 formed on the lower surface of the touch panel, a memory 4, and a dictionary memory 5. When an input pen (not shown) is moved, for example, from a point A to a point B on the touch panel 2 while being pressed down, the CPU 1 takes in the movement locus of the input pen during this interval as a plurality of coordinate values through the touch panel 2. The CPU 1 then stores the line segment from point A to point B in the memory 4 and displays that segment at the corresponding position on the LCD 3. In this way handwritten characters based on pen input can be displayed on the LCD 3.

[0011] For the handwritten character thus stored in the memory 4 and displayed on the LCD 3, the CPU 1 performs the processing described below and compares the processing result with the character types stored in the dictionary memory 5, thereby recognizing which character type the handwritten character corresponds to. That is, the CPU 1 first performs preprocessing that removes noise and the like from the input character (the handwritten character to be identified) to obtain a smooth character. Thereafter, a contour image as shown in FIG. 2(b) is obtained from the original image shown in FIG. 2(a), and the contour image is divided into 64 × 64 pixels.

[0012] Then, for every pixel (black pixel) constituting the contour, the direction components possessed by that pixel are detected. That is, attention is paid to an arbitrary black pixel on the contour, and the direction components of the pixel of interest are detected from the connection pattern between the pixel of interest and the black pixels immediately before and after it. FIG. 3 shows how the direction components of such black pixels are detected.

[0013] In FIG. 3, the hatched pixels a to c are black pixels constituting the contour, and the pixels other than the hatched ones are white pixels. Let the pixel of interest be b and the black pixels before and after it be a and c, respectively. The contour enters the pixel of interest b horizontally from the black pixel a and then passes through b to reach the black pixel c, which lies in the right-upward direction; the pixel of interest b therefore has two direction components, a horizontal component and a right-upward component. These two direction components are detected as the direction components of the pixel of interest b. In addition to the two direction components mentioned above, the direction components of black pixels constituting a contour include a vertical component and a right-downward component, and each black pixel is assigned two of these four direction components. When extracting the direction patterns of the contour, the CPU 1 detects the direction components of every black pixel on the contour in turn and, based on the detected direction components, extracts the horizontal, vertical, right-upward and right-downward direction patterns of the contour.
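The following sketch shows one way to assign the two direction components of the pixel of interest b from its preceding and following contour pixels a and c, as in the example above. The (row, column) coordinate convention and the sign test separating right-upward from right-downward steps are assumptions.

```python
def step_direction(p, q):
    """Classify the step from contour pixel p to contour pixel q, both given
    as (row, col), into one of the four direction components."""
    dr, dc = q[0] - p[0], q[1] - p[1]
    if dr == 0:
        return "horizontal"
    if dc == 0:
        return "vertical"
    # Rows grow downward, so a step whose row and column changes have
    # opposite signs runs along a right-upward diagonal.
    return "right-up" if dr * dc < 0 else "right-down"

def pixel_direction_components(a, b, c):
    """Direction components assigned to the pixel of interest b, given the
    preceding contour pixel a and the following contour pixel c."""
    return {step_direction(a, b), step_direction(b, c)}
```

For the example of FIG. 3 this returns the set {"horizontal", "right-up"}.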

[0014] Next, the CPU 1 performs the non-linear normalization processing of the contour image. That is, as shown in FIG. 4(a), the distances between the character lines (character strokes) of the character image "田" (for example, A, B, C and D in the figure) are obtained, and their reciprocals are taken as line densities. The distances between character lines are counted separately in the horizontal direction and in the vertical direction, and their reciprocals give the horizontal line density and the vertical line density, respectively. In this way the line density is obtained for every point between character lines (that is, for every white point other than the character lines, shown in black, in the figure). The line density of the character lines themselves (the black points) is set to "0". In the figure, reference numeral 11 denotes the circumscribed rectangular frame that circumscribes the character being processed.

[0015] Next, the circumscribed rectangular frame 11 is converted to coordinates, and, as shown in FIG. 4(b), the horizontal line densities of all the white points are accumulated along the x-axis and the vertical line densities along the y-axis. This produces line density histograms in the horizontal and vertical directions. Reference numeral 21 in FIG. 4(b) denotes an arbitrary white point. The CPU 1 then divides each line density histogram into eight parts. As shown in FIG. 4(c), the division positions are chosen so that the sums of the eight divided portions of the histogram are all equal. The region of the circumscribed rectangular frame 11 is then divided into eight parts horizontally and eight parts vertically by the dividing lines determined from these division positions, yielding 8 × 8 small regions.
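One way to realize this equal-sum division is to cut the cumulative sum of the histogram at multiples of one eighth of its total, as sketched below; the patent only requires that the eight partial sums be equal, so the cumulative-sum approach and the function name are assumptions.

```python
import numpy as np

def equal_sum_split_positions(histogram: np.ndarray, parts: int = 8):
    """Indices that split a 1-D density histogram into `parts` consecutive
    segments whose sums are (approximately) equal."""
    cum = np.cumsum(histogram)
    targets = cum[-1] * np.arange(1, parts) / parts
    # First index at which the cumulative sum reaches each target value.
    return [int(np.searchsorted(cum, t)) for t in targets]
```

The x-direction dividing lines would come from the column sums of the horizontal density map and the y-direction dividing lines from the row sums of the vertical density map, following FIGS. 4(b) and 4(c).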

[0016] Each of the 8 × 8 small regions divided in this way, shown in FIG. 4(d), is mapped onto the square regions of FIG. 4(e), which are equally divided into 8 × 8 at a prescribed size. Although the original image is shown in FIG. 4 for convenience of explanation, the image actually handled here is the contour of the original image. For each individual region the size of the mapping source differs from the size of the corresponding mapping destination, so each source region is enlarged or reduced to fit the corresponding destination region when it is mapped.
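A sketch of this remapping step: each variable-size source region delimited by the computed dividing lines is rescaled onto a fixed-size cell of the destination grid. Nearest-neighbor resampling and the destination cell size are assumptions, since the patent does not state how the enlargement or reduction is interpolated.

```python
import numpy as np

def remap_cell(src: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor rescaling of one source region to out_h x out_w."""
    h, w = src.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return src[np.ix_(rows, cols)]

def nonlinear_normalize(pattern: np.ndarray, x_splits, y_splits,
                        cell: int = 8) -> np.ndarray:
    """Map the 8 x 8 variable-size regions of `pattern`, delimited by
    x_splits / y_splits, onto an equally divided output image whose cells
    are all cell x cell pixels."""
    ys = [0] + list(y_splits) + [pattern.shape[0]]
    xs = [0] + list(x_splits) + [pattern.shape[1]]
    out = np.zeros((cell * (len(ys) - 1), cell * (len(xs) - 1)), pattern.dtype)
    for i in range(len(ys) - 1):
        for j in range(len(xs) - 1):
            src = pattern[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            if src.size:                   # skip degenerate empty regions
                out[i * cell:(i + 1) * cell,
                    j * cell:(j + 1) * cell] = remap_cell(src, cell, cell)
    return out
```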

[0017] Such non-linear normalization processing is carried out for each of the vertical, horizontal, right-upward and right-downward direction patterns of the contour. After the processing is completed, the CPU 1 counts the number of black pixels in each of the equally divided small regions and uses the count of each small region as the direction feature vector of that direction pattern. Thereafter, for the obtained direction feature vector, a plurality of candidate characters whose dictionary vectors in the dictionary memory 5 are close are extracted in order of increasing distance, and the final character identification is performed by comparing the extracted candidate characters with the input character in further detail.

[0018] When the line density is obtained from the original figure, the distance between character lines (inter-line width) is counted and its reciprocal is taken. However, in the case of a slightly curved, nominally straight character line 31 as shown in FIG. 7(a), the distance h at the curved portion produces a line density that never arises for a purely straight character line, so the line cannot be extracted as a proper straight character. Furthermore, in characters such as "園" and "田", the spacings A, B and C between the circumscribed rectangular frame 11 and the character lines are small, as shown in FIG. 7(b). If such a character image is subjected to non-linear normalization, the line density, being the reciprocal of the spacing, becomes large at the spacings A, B and C; consequently the outline of the character after non-linear normalization separates from the circumscribed rectangular frame 11 and, as shown in FIG. 7(d), becomes considerably smaller than its original size. This makes it difficult to distinguish similar characters such as "田" and "由".

[0019] Therefore, when the line density is to be obtained, attention is paid to the position of a given white point, the directions in which character lines exist around that white point are examined, and the distance to the character line is obtained on the basis of this information; the line density is then obtained as the reciprocal of the obtained distance. First, as shown in FIG. 5, a circumscribed rectangular frame 12, a constant multiple of the circumscribed rectangular frame 11, is provided outside the frame 11. Then, as in FIG. 5(a), when a character line 31 exists in only one of the four directions (above, below, to the left of or to the right of the white point 21), the distance h between the character line 31 and the side of the circumscribed rectangular frame 12 on which the white point 21 lies is counted for the direction in which the character line 31 exists, and its reciprocal is taken as the line density. For the direction in which no character line 31 exists, the reciprocal of the distance w between the opposite sides of the circumscribed rectangular frame 12 is taken as the line density.

[0020] Next, as in FIG. 5(b), when character lines 31A and 31B exist above and below the white point 21 (or when character lines exist to its left and right), the reciprocal of the distance h between the two character lines is taken as the line density for the direction in which 31A and 31B exist, and for the direction in which no character line exists the reciprocal of the distance w between the sides of the circumscribed rectangular frame 12 is taken as the line density. Next, as in FIG. 5(c), when character lines 31A and 31B exist above and to the left of the white point 21 (the combinations above and right, below and left, or below and right are treated likewise), the distances h and w between each of the character lines 31A and 31B and the side of the circumscribed rectangular frame 12 on which the white point 21 lies are counted, and their reciprocals are taken as the respective line densities.

[0021] Next, as in FIG. 5(d), when character lines 31A, 31B and 31C exist to the left and right of and above the white point 21 (the combinations left-right-below, above-below-left, or above-below-right are treated likewise), the distance h between the character line 31A and the side of the circumscribed rectangular frame 12 on which the white point 21 lies is counted and its reciprocal is taken as the line density, while the distance w between the character lines 31B and 31C is counted and its reciprocal is taken as the other line density. Finally, as in FIG. 5(e), when character lines 31A to 31D exist in all four directions around the white point 21, the distances h and w between the opposing character lines are counted and their reciprocals are taken as the respective line densities.
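The per-axis case analysis of FIGS. 5(a) to 5(e) can be summarized as follows: if strokes are found on both sides of the white point along an axis, the stroke-to-stroke distance is used; if on one side only, the distance from that stroke to the side of the outer frame 12 on which the white point lies is used; if on neither side, the full extent of frame 12 along that axis is used. The sketch below follows this reading; the coordinate conventions and function names are assumptions.

```python
import numpy as np

def axis_distance(profile: np.ndarray, pos: int,
                  frame_lo: int, frame_hi: int) -> float:
    """Distance used for the line density along one axis, seen from the white
    point at index `pos` of the 1-D slice `profile` (1 = stroke pixel).
    frame_lo / frame_hi are the coordinates of the two sides of the outer
    circumscribed frame 12 along this axis."""
    before = np.flatnonzero(profile[:pos] == 1)
    after = np.flatnonzero(profile[pos + 1:] == 1)
    lo = before[-1] if before.size else None          # nearest stroke, low side
    hi = pos + 1 + after[0] if after.size else None   # nearest stroke, high side
    if lo is not None and hi is not None:
        return float(hi - lo)              # stroke to stroke (FIG. 5(b), 5(e))
    if lo is not None:
        return float(frame_hi - lo)        # stroke to frame-12 side (FIG. 5(a))
    if hi is not None:
        return float(hi - frame_lo)
    return float(frame_hi - frame_lo)      # no stroke: full frame-12 extent

def white_point_density(binary: np.ndarray, r: int, c: int, frame12):
    """Horizontal and vertical line densities (reciprocal distances) for the
    white point (r, c); frame12 = (top, bottom, left, right) of frame 12."""
    top, bottom, left, right = frame12
    dv = axis_distance(binary[:, c], r, top, bottom)   # scan the column
    dh = axis_distance(binary[r, :], c, left, right)   # scan the row
    return 1.0 / dh, 1.0 / dv
```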

[0022] In this way the line density of the character image can be obtained. It is therefore possible to avoid the problem that, when the distances A and D between the circumscribed rectangular frame 11 and the character lines are obtained and their reciprocals are taken as line densities as in FIG. 6(f), the character image is shrunk by the non-linear normalization if the distances A and D are short. That is, as shown in FIG. 6(g), the circumscribed rectangular frame 12 is provided outside the circumscribed rectangular frame 11, and the distances A and D are widened into the distances A' and D' measured to the circumscribed rectangular frame 12, which makes the line density smaller. As a result, the size of the character outline after the non-linear normalization processing is not reduced but keeps the size it had before the processing, so that problems such as the difficulty of distinguishing similar characters, for example "田" and "由", can be avoided.

[0023] Further, as shown in FIGS. 6(a) to 6(d), the horizontal position of the white point 21 is set to a position roughly corresponding to the tip of a vertical character line (for example, the character line 31B), the vertical distance h from the white point 21 to the horizontal character line 31A is counted, and the horizontal distance w from the vertical character line 31B through the white point 21 to the other end of the horizontal character line 31A (or to another vertical character line 31C) is counted. By comparing the magnitudes of the counted vertical distance h and horizontal distance w, only the horizontal character line 31A can be extracted as a straight character line.

[0024] That is, in cases such as FIGS. 6(b) and 6(d), where the vertical distance h is extremely small compared with the horizontal distance w, the distance h is set to "0" and its line density is therefore set to "0". Accordingly, in the cases of FIGS. 6(b) and 6(d), the lines are regarded as the single character line 31A, and only the straight character line is extracted. In cases such as FIGS. 6(a) and 6(c), where the vertical distance h is not extremely small compared with the horizontal distance w, the distance h is kept instead of being set to "0", and its reciprocal is taken as the line density.

[0025] In this way, when the vertical distance h is extremely small compared with the horizontal distance w, the distance h becomes "0"; likewise, when the horizontal distance w is extremely small compared with the vertical distance h, the distance w becomes "0". Consequently, around the character lines of a handwritten character such as "上" in FIG. 6(e), whose straight portions are slightly curved, there are regions where the line spacing is "0" (that is, the line density is "0"). Thus, because the distances in the vertical and horizontal directions are compared and one distance is set to "0" when it is extremely smaller than the other, the line density around a slightly curved straight character line 31 such as that of FIG. 7(a) becomes "0", and the line can be extracted as a straight character line. The circumscribed rectangular frame 12 does not necessarily have to be provided for this straight-line extraction. Although this embodiment has been described using handwritten characters as an example, any character data may be used as long as it is bitmap data, such as character data read by OCR.
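A sketch of the suppression rule of paragraphs [0024] and [0025]: when one of the two distances is extremely small relative to the other, it is treated as zero so that the line density next to a slightly curved stroke vanishes. The ratio threshold used to decide "extremely small" is an assumption, since the patent does not quantify it.

```python
def suppress_near_stroke_density(dh: float, dv: float, ratio: float = 0.1):
    """Return (horizontal density, vertical density) from the horizontal and
    vertical distances dh and dv, setting a distance to zero when it is
    extremely small compared with the other one (assumed ratio threshold)."""
    if dv < ratio * dh:
        dv = 0.0
    elif dh < ratio * dv:
        dh = 0.0
    density_h = 1.0 / dh if dh > 0 else 0.0
    density_v = 1.0 / dv if dv > 0 else 0.0
    return density_h, density_v
```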

[0026]

[Effects of the Invention] As described above, according to the present invention, a first circumscribed rectangular frame circumscribing the character to be identified is provided, the distances between the character lines constituting the character to be identified and the distances between each side of the first circumscribed rectangular frame and each character line adjacent to it are obtained, the reciprocal of each distance is taken as a line density, the first circumscribed rectangular frame is divided into small regions according to this line density information, and the information of each divided small region is mapped individually onto regions equally divided at a prescribed size; in doing so, the line density in the vicinity of each character line is set to zero, so that in the case of a slightly curved character line such as a handwritten one the line density arising in its vicinity is removed and the line can be extracted as a proper straight character line. In addition, a second circumscribed rectangular frame, a constant multiple of the first circumscribed rectangular frame, is provided outside the first circumscribed rectangular frame, the distance between each character line adjacent to the first circumscribed rectangular frame and each side of the second circumscribed rectangular frame is obtained, and the reciprocal of this distance is taken as the line density. The line density of a character image whose character lines adjacent to the sides of the first circumscribed rectangular frame lie close to those sides can therefore be made small, the shrinkage of the character image that would otherwise occur under non-linear normalization is prevented, and problems such as the difficulty of distinguishing similar characters, for example "田" and "由", after non-linear normalization can be avoided.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an apparatus to which the present invention is applied.

FIG. 2 shows a character image recognized by this apparatus and the contour of that character image.

FIG. 3 shows how the direction components of pixels are detected in this apparatus.

FIG. 4 shows the non-linear normalization processing of a contour.

FIG. 5 shows how the distances (spacings) between character lines are calculated during non-linear normalization.

FIG. 6 shows the main part of the processing for calculating the distances between character lines.

FIG. 7 shows examples of problems that occur in conventional non-linear normalization.

FIG. 8 shows the processing procedure of conventional non-linear normalization.

FIG. 9 is a recognition processing flow showing the process of character recognition.

[Explanation of Symbols]

1: CPU, 2: touch panel, 3: LCD, 4: memory, 5: dictionary memory, 11, 12: circumscribed rectangular frames.

Claims (2)

[Claims]

1. A non-linear normalizing method in which a first circumscribed rectangular frame circumscribing a character to be identified is provided, the distances between the character lines constituting the character to be identified and the distances between each side of the first circumscribed rectangular frame and each character line adjacent to the first circumscribed rectangular frame are obtained, the reciprocal of each distance is taken as a line density, the first circumscribed rectangular frame is divided into small regions according to this line density information, and the information of each divided small region is individually mapped onto regions equally divided at a prescribed size, the method being characterized in that the line density in the vicinity of each character line is set to zero.

2. The non-linear normalizing method according to claim 1, characterized in that a second circumscribed rectangular frame, a constant multiple of the first circumscribed rectangular frame, is provided outside the first circumscribed rectangular frame, the distance between each character line adjacent to the first circumscribed rectangular frame and each side of the second circumscribed rectangular frame is obtained, and the reciprocal of this distance is taken as the line density.
JP8173406A 1996-07-03 1996-07-03 Non-linear normalizing method Pending JPH1021332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP8173406A JPH1021332A (en) 1996-07-03 1996-07-03 Non-linear normalizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP8173406A JPH1021332A (en) 1996-07-03 1996-07-03 Non-linear normalizing method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP8200089A Division JPH1021333A (en) 1996-07-30 1996-07-30 Non-linear normalizing method

Publications (1)

Publication Number Publication Date
JPH1021332A true JPH1021332A (en) 1998-01-23

Family

ID=15959844

Family Applications (1)

Application Number Title Priority Date Filing Date
JP8173406A Pending JPH1021332A (en) 1996-07-03 1996-07-03 Non-linear normalizing method

Country Status (1)

Country Link
JP (1) JPH1021332A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6795051B2 (en) 2000-05-22 2004-09-21 Nec Corporation Driving circuit of liquid crystal display and liquid crystal display driven by the same circuit
WO2013036329A1 (en) * 2011-09-06 2013-03-14 Qualcomm Incorporated Text detection using image regions
US8942484B2 (en) 2011-09-06 2015-01-27 Qualcomm Incorporated Text detection using image regions

Similar Documents

Publication Publication Date Title
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
JP2933801B2 (en) Method and apparatus for cutting out characters
CN110197153B (en) Automatic wall identification method in house type graph
US20080212837A1 (en) License plate recognition apparatus, license plate recognition method, and computer-readable storage medium
JP2002133426A (en) Ruled line extracting device for extracting ruled line from multiple image
JP2014153820A (en) Character segmentation device and character segmentation method
JP3830998B2 (en) Ruled line removal method and character recognition apparatus using the same
CN116052152A (en) License plate recognition system based on contour detection and deep neural network
JP2000235619A (en) Surface image processor and its program storage medium
JPH0950527A (en) Frame extracting device and rectangle extracting device
JPH08190690A (en) Method for determining number plate
JPH1021332A (en) Non-linear normalizing method
JP3172498B2 (en) Image recognition feature value extraction method and apparatus, storage medium for storing image analysis program
JP2868134B2 (en) Image processing method and apparatus
JP2005182660A (en) Recognition method of character/figure
JPH1021333A (en) Non-linear normalizing method
JP4194309B2 (en) Document direction estimation method and document direction estimation program
JP2871590B2 (en) Image extraction method
JP2785747B2 (en) Character reader
JPH02116987A (en) Character recognizing device
JPH0573718A (en) Area attribute identifying system
JP3077929B2 (en) Character extraction method
JP3343305B2 (en) Character extraction device and character extraction method
JP2803709B2 (en) Character recognition device and character recognition method
JPH1021398A (en) Method for extracting directional characteristic vector