JPH10207985A

JPH10207985A - Method and device for segmenting character

Info

Publication number: JPH10207985A
Application number: JP9012875A
Authority: JP
Inventors: Hiroshi Sasaki; 佐々木　　寛; Hirohisa Goto; 裕久後藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1997-01-27
Filing date: 1997-01-27
Publication date: 1998-08-07

Abstract

PROBLEM TO BE SOLVED: To reduce the calculation amount for word generation processing and word collation processing and to more efficiently segment characters by controlling the number of characters constituting a word to generate the word. SOLUTION: A segment extraction part 17 extracts each primary segment being a cluster area of black bits from original picture data stored in a memory of a picture input part 13 in such state that its coordinates can be recognized. Extracted primary segments are integrated in accordance with a prescribed rule by a segment integrating part 19 to generate a secondary segment. A character recognition part 21 performs character recognition on the assumption that each of primary segments and secondary segments is one character, and a candidate character code as the result is stored. A word generation part 23 combines respective candidate characters of primary and secondary segments obtained by character recognition to generate a word. In this case, the number of characters constituting a word is controlled to generate the word. Thus, unnecessary candidate character codes and their combinations are removed from the calculation object at the time of word generation.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a character segmentation method used in character recognition, and to a character segmentation apparatus suitable for carrying out the method.

[0002]

[Description of the Related Art] Handwritten character strings vary more in character spacing and character shape than printed character strings. Consequently, if characters are segmented at fixed intervals from image data of a handwritten character string, they cannot be segmented accurately, which causes misrecognition. One conventional technique that addresses this is the character segmentation method disclosed in Japanese Patent Application Laid-Open No. 5-35917.

[0003] In this prior art, character blocks are first cut out from a line image. A character block here is a cluster region of black bits. In this specification, a character block cut out from a line image is also called a primary segment. A character block may constitute a character pattern by itself, or may be only part of a character pattern. Next, the character blocks (primary segments) are integrated to generate character patterns (also called secondary segments). Character recognition is then performed on the character patterns (primary segments and secondary segments). If a character pattern shares no character block with any other character pattern (that is, there is no overlap of character blocks), that character pattern is cut out as is. If character blocks do overlap, words are generated by combining the recognition result of each of these character patterns with the recognition results of the several characters before or after it. The generated words are collated with a word dictionary.

[0004] For a generated word that matches a word registered in the word dictionary, its character patterns are cut out. If no match is found, an evaluation value is computed for each character pattern prior to word generation and for each word, and the character pattern with the highest evaluation value is cut out.

[0005] According to this conventional character segmentation method, an evaluation value for the certainty of the recognition result is computed only for character patterns whose character blocks overlap and which word collation has shown cannot be segmented. No evaluation values are computed needlessly, so the segmentation process is more efficient.

[0006]

[Problems to be Solved by the Invention] However, regarding the creation of words by combining the recognition results (candidate character codes) for primary segments and/or secondary segments, the prior art states only that each recognition result is combined with the several characters before and after it. The length of the words to be created is not specified. In some cases, needlessly long words are therefore created and computation is wasted.

[0007] Depending on the field to which character recognition is applied, the number of characters in a word can be limited. If the number of characters in a word can be limited in advance, the amount of calculation in word creation and word collation can be reduced, so characters can be segmented more efficiently.

[0008]

[Means for Solving the Problems] The character segmentation method of this application comprises: a process of extracting, from original image data that includes image data of a character string and is stored in a memory, each primary segment, which is a cluster region of black bits, in such a state that its coordinates are known; a process of integrating the extracted primary segments according to a predetermined rule to create secondary segments in such a state that their coordinates are known; a process of performing character recognition on each primary segment and each secondary segment; a process of creating words by combining the candidate character codes obtained by the character recognition for the primary segments and the secondary segments; a process of collating the created words with a word dictionary; and a process of determining, based on the result of the word collation, which of the primary segments and secondary segments is a character segmentation region for one character. The method is characterized in that the words are created while the number of characters constituting each word is restricted.

[0009] According to this character segmentation method, the number of characters in each created word is restricted, so the amount of calculation in word creation and word collation is reduced.

[0010] A primary segment has been described here as a cluster region of black bits. The term "black bit" is used because the pixels making up a character are assumed to be black pixels; it is not meant as a limitation to black bits. When black and white are inverted, a cluster region of white bits is the primary segment.

[0011] In practicing this character segmentation method, it is preferable to determine the reference value for restricting the number of characters according to the field to which character recognition is applied. Such a field is one in which the number of characters in each data item can be expected to stay within a roughly known limit, for example address data or personal name data. Determining the reference value according to the field allows words to be created within an appropriate number of characters.

[0012] Further, in practicing this character segmentation method, it is preferable that the secondary segments be created by integrating, according to a predetermined rule, m primary segments that are consecutive in the X direction, where the X direction is the direction in which the characters are arranged, and that the words be created from candidate paths, each of which is a concatenation of primary segments and/or secondary segments, created by processing that includes the following processes (a) to (d). Here, m is an integer of 2 or more.

[0013] (a) Where the coordinates that delimit the m primary segments in the X direction are cut candidate positions Ci (i = 0 to m), a process of extracting, from the m primary segments and the created secondary segments, every segment whose start point is cut candidate position C0.

[0014] (b) For each segment extracted in process (a), a process of extracting, from the m primary segments and the created secondary segments, another segment that can be concatenated because its start position is the cut candidate position Cj (j = 1 to m) at the end point of that segment, then a further segment that can be concatenated to that other segment by the same relationship of cut candidate positions, and so on until a segment whose end-point cut candidate position is Cm appears.

[0015] (c) Each time another segment is extracted in process (b), a process of determining whether the number of segments in the candidate path formed up to and including that segment is within a specified number.

[0016] (d) A process of adopting, as candidate paths for word creation, those candidate paths whose number of segments is within the specified number and whose last segment has Cm as its end-point cut candidate position.

[0017] According to this preferred example, all candidate paths are extracted from the segmentation region defined by the m primary segments, each candidate path being a sequence of segments that starts at cut candidate position C0, ends at cut candidate position Cm, and contains no more than the specified number of segments. The recognition result (candidate character codes) for each segment in an extracted candidate path is already known from the recognition process, so each extracted candidate path yields words of no more than the specified number of characters.
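As an illustration of this last point, the following Python sketch expands one candidate path into words by picking one candidate character per segment. The function name `words_from_path`, the representation of a segment as a pair of cut-position indices, and the sample candidate characters are assumptions made for this example; the specification does not prescribe a data layout.

```python
from itertools import product

def words_from_path(path, candidates):
    """Build every word obtainable from one candidate path.

    path       -- list of segment keys, here (start_cut, end_cut) index pairs
    candidates -- dict mapping each segment key to its candidate characters
    """
    # One character is chosen per segment, so a path of k segments
    # always yields words of exactly k characters.
    per_segment = [candidates[segment] for segment in path]
    return ["".join(chars) for chars in product(*per_segment)]

# Hypothetical recognition results for a two-segment path.
example_candidates = {(0, 1): ["山", "出"], (1, 3): ["田"]}
example_words = words_from_path([(0, 1), (1, 3)], example_candidates)
# example_words == ["山田", "出田"]
```

Because word length equals path length in this sketch, capping the number of segments in a path also caps the number of characters in every word built from it.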

[0018] Specifically, the candidate path creation in the preferred example above is preferably carried out by processing that includes the following processes (1) to (9).

[0019] (1) A first process of determining, where the coordinates that delimit the m primary segments in the X direction are cut candidate positions C0 to Cm, whether the cut candidate position of interest Ci (i = 0 to m) is Cm.

[0020] (2) A second process, executed when the first process determines that Ci = Cm, of recording the current candidate path in a candidate path memory.

[0021] (3) A third process, executed when the first process determines that Ci ≠ Cm, of determining whether a segment Sk+1 bounded by cut candidate position Ci and cut candidate position Cj (j = i + 1) exists.

[0022] (4) A fourth process, executed when the third process determines that the segment exists, of determining whether the number of segments in the candidate path would stay within the specified number if segment Sk+1 were added to it.

[0023] (5) A fifth process, executed when the fourth process determines that the number is within the specified number, of adding segment Sk+1 to the candidate path.

[0024] (6) A sixth process, executed after the fifth process, of treating cut candidate position Cj as the cut candidate position of interest Ci and re-executing from the first process.

[0025] (7) A seventh process of deleting, from a candidate path created by executing the fifth and sixth processes, the segment most recently added to that candidate path.

[0026] (8) An eighth process, executed when the third process gives a negative result, when the fourth process gives a negative result, or after the seventh process has been executed, of changing j, which defines the cut candidate position, to j = j + 1 and determining whether the changed j satisfies j > m.

[0027] (9) A ninth process, executed when the eighth process determines that j ≤ m, of re-executing from the third process.

[0028] According to processes (1) to (9), all candidate paths are extracted from the segmentation region defined by the m primary segments, each candidate path being a sequence of segments that starts at cut candidate position C0, ends at cut candidate position Cm, and contains no more than the specified number of segments.
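Processes (1) to (9) amount to a depth-first search with backtracking over the cut candidate positions. The Python sketch below is one way to express that search, not the patent's own implementation. It assumes that each existing segment (primary or secondary) is identified by the pair of cut-position indices (i, j) it spans; the names `enumerate_paths`, `segments`, and `max_segments` are hypothetical.

```python
def enumerate_paths(m, segments, max_segments):
    """List every segment sequence from C0 to Cm within the segment limit.

    m            -- index of the last cut candidate position Cm
    segments     -- set of (i, j) pairs: a segment spans Ci to Cj
    max_segments -- upper limit on segments per candidate path
    """
    paths = []   # plays the role of the candidate path memory
    path = []    # candidate path currently under construction

    def visit(i):
        if i == m:                              # process (1)
            paths.append(list(path))            # process (2)
            return
        for j in range(i + 1, m + 1):           # processes (8) and (9)
            if (i, j) not in segments:          # process (3)
                continue
            if len(path) + 1 > max_segments:    # process (4)
                continue
            path.append((i, j))                 # process (5)
            visit(j)                            # process (6)
            path.pop()                          # process (7)

    visit(0)
    return paths

# Three primary segments (0,1), (1,2), (2,3) plus two secondary segments.
example_segments = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)}
example_paths = enumerate_paths(3, example_segments, max_segments=2)
# example_paths == [[(0, 1), (1, 3)], [(0, 2), (2, 3)]]
```

With the limit set to 2, the three-segment path through all primary segments is pruned as soon as its third segment is considered, which is where the saving in computation comes from.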

[0029] Further, in practicing this character segmentation method, the predetermined rule for integrating primary segments is preferably as follows: with the X direction being the direction in which the characters are arranged, the height H of the tallest of the m primary segments consecutive in the X direction is obtained, and any other primary segment lying within a coordinate range of H × N in the X direction from a primary segment of interest is integrated with that primary segment of interest. Here, N is a predetermined value.

[0030] In handwriting, character size, character spacing, and the like normally vary with the writer and the circumstances surrounding the writer, so line height normally varies as well. Taking segment height into account therefore accounts for this variation in line height. As a result, secondary segments can be created under conditions that reflect the character size and similar properties that vary with the writer and the circumstances, so appropriate secondary segments can be created. The value of N is best determined empirically (statistically). According to the inventors' research, setting N to 1.2 gives favorable integration.
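The integration rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure itself: each primary segment is a hypothetical tuple (xs, xe, ys, ye), the segments are already sorted left to right, the distance between two segments is measured between their start X coordinates, and a secondary segment is represented by the bounding box of a run of consecutive primary segments. The function name `secondary_segments` is invented for this example.

```python
def secondary_segments(primaries, n=1.2):
    """Propose secondary segments from primary segments sorted left to right.

    primaries -- non-empty list of (xs, xe, ys, ye) tuples
    n         -- the predetermined factor N (1.2 is the value the text reports)
    Returns (first_index, last_index, bounding_box) triples.
    """
    # H is the height of the tallest primary segment.
    h = max(ye - ys for _, _, ys, ye in primaries)
    limit = h * n

    merged = []
    for a, focus in enumerate(primaries):
        for b in range(a + 1, len(primaries)):
            # Assumption: distance is taken between start X coordinates.
            if primaries[b][0] - focus[0] > limit:
                break
            group = primaries[a:b + 1]
            box = (min(s[0] for s in group), max(s[1] for s in group),
                   min(s[2] for s in group), max(s[3] for s in group))
            merged.append((a, b, box))
    return merged

# Hypothetical segments: H = 20, so the range H x N is 24.
example_primaries = [(0, 10, 0, 20), (15, 25, 0, 20), (40, 50, 0, 20)]
example_merged = secondary_segments(example_primaries)
# example_merged == [(0, 1, (0, 25, 0, 20))]
```

Scaling the range by the tallest segment, rather than using a fixed pixel distance, is what lets the rule adapt to writers whose characters are larger or smaller.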

[0031] To carry out the character segmentation method described above, the character segmentation apparatus is preferably configured as follows.

[0032] A character segmentation apparatus comprising: a segment extraction unit that extracts, from input image data that includes image data of a character string and is stored in a memory, each primary segment, which is a cluster region of black bits, in such a state that its coordinates are known; a segment integration unit that integrates the extracted primary segments according to a predetermined rule to create secondary segments in such a state that their coordinates are known; a character recognition unit that performs character recognition on each primary segment and each secondary segment; a word creation unit that creates words by combining the candidate character codes obtained by the character recognition for the primary segments and/or secondary segments; a word collation unit that collates the created words with a word dictionary; and a result selection unit that determines, based on the result of the word collation, which of the primary segments and secondary segments is a character segmentation region for one character; wherein the word creation unit includes a character number rule checking unit for restricting the number of characters constituting a word.

[0033] In practicing this character segmentation apparatus, it is preferable that the segment integration unit integrate, according to a predetermined rule, m primary segments consecutive in the X direction, where the X direction is the direction in which the characters are arranged, and that the word creation unit consist of (A) a candidate path creation unit that includes the following means (1) to (8) and creates candidate paths, each of which is a concatenation of primary segments and/or secondary segments, and (B) a character number rule checking unit that operates when the third means of the candidate path creation unit determines that a segment exists, and that restricts the number of characters by determining whether the number of segments in the candidate path would stay within the specified number if segment Sk+1 were added to it. Here, m is an integer of 2 or more.

[0034] (1) A first means for determining, where the coordinates that delimit the m primary segments in the X direction are cut candidate positions C0 to Cm, whether the cut candidate position of interest Ci (i = 0 to m) is Cm.

[0035] (2) A second means that operates when the first means determines that Ci = Cm and records the current candidate path in a candidate path memory.

[0036] (3) A third means that operates when the first means determines that Ci ≠ Cm and determines whether a segment Sk+1 bounded by cut candidate position Ci and cut candidate position Cj (j = i + 1) exists.

[0037] (4) A fourth means that operates when the character number rule checking unit determines that the number is within the specified number and adds segment Sk+1 to the candidate path.

[0038] (5) A fifth means that operates after the fourth means, treats cut candidate position Cj as the cut candidate position of interest Ci, and starts the operation of the first means.

[0039] (6) A sixth means that deletes, from a candidate path created as a result of the operation of the fourth and fifth means, the segment most recently added to that candidate path.

[0040] (7) A seventh means that operates when the third means gives a negative result, when the character number rule checking unit determines that the specified number would be exceeded, or after the sixth means has operated, and that changes j, which defines the cut candidate position, to j = j + 1 and determines whether the changed j satisfies j > m.

[0041] (8) An eighth means that operates when the seventh means determines that j ≤ m and causes the third means to operate.

[0042] Also, in practicing this character segmentation apparatus, the segment integration unit preferably includes means for obtaining the height H of the tallest of the m primary segments consecutive in the X direction, where the X direction is the direction in which the characters are arranged, and means for integrating, with a primary segment of interest, any other primary segment lying within a coordinate range of H × N in the X direction from that primary segment of interest. Here, N is a predetermined value.

[0043]

[Embodiments of the Invention] Embodiments of the character segmentation method and the character segmentation apparatus of the invention are described together below with reference to the drawings. The drawings used in the description are shown only schematically, to the extent needed to understand the invention.

[0044] FIG. 1 shows the configuration of the character segmentation apparatus of the embodiment. The character segmentation apparatus 10 includes a control unit 11, an image input unit 13, a character number rule input unit 15, a segment extraction unit 17, a segment integration unit 19, a character recognition unit 21, a word creation unit 23, a word collation unit 25, a result selection unit 27, and a result output unit 29.

[0045] Each of these components 11 to 29 can be implemented with a computer and its peripheral devices. The configuration and operation of each component are described in turn below.

[0046] The control unit 11 controls the operation of components 13 to 29.

[0047] The image input unit 13 includes a memory (not shown); it inputs the original image data that is the target of character recognition and stores it in that memory. Specifically, it inputs original image data represented in black-and-white binary form.

[0048] The image input unit 13 may have any suitable configuration. For example, it may include a scanner that photoelectrically converts the optical signal from a document and loads the original image data into the memory, or it may be another database in which the original image data is already stored. A binary image may of course also be obtained from a multi-valued image.

[0049] The character number rule input unit 15 inputs a reference value for restricting the number of characters in the words created by the word creation unit 23. It can be implemented with, for example, a computer and its keyboard.

[0050] For example, an operator can enter from the keyboard the reference value for restricting the number of characters in the words created by the word creation unit 23. Here, the upper limit on the number of segments in a candidate path (described later) is entered as the reference value. The reference value can be entered according to, for example, the field of the character recognition target. When recognizing surname data, for instance, a surname is at most, say, four characters long, so 4 is entered as the reference value.

[0051] The segment extraction unit 17 extracts, from the original image data stored in the memory of the image input unit 13, each primary segment, which is a cluster region of black bits, in such a state that its coordinates are known.

[0052] The segment extraction in the segment extraction unit 17 can be carried out easily by the well-known method that uses the horizontal and vertical projection distributions of black bits (also called black points).

[0053] Specifically, the memory holding the original image data is scanned in the horizontal direction to obtain a histogram of black points. Each local minimum of this histogram is taken as a cut candidate position Ci for the horizontal direction. The same processing is performed with the scanning direction changed to vertical to extract cut candidate positions for the vertical direction. Each primary segment lies within a rectangular region bounded by cut candidate positions.
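A minimal Python sketch of the projection idea follows. It makes simplifying assumptions that the text does not state: the binary image is a list of rows of 0/1 values with 1 meaning a black point, the histogram counts black points per column, and only columns with a count of zero are treated as minima (touching characters would need true local-minimum detection). The function names are hypothetical.

```python
def column_histogram(image):
    """Count the black points (1s) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]

def x_runs(image):
    """Return (x_start, x_end) for each run of columns that contain black points.

    Assumption: columns whose black-point count is zero separate segments.
    """
    histogram = column_histogram(image)
    runs, start = [], None
    for x, count in enumerate(histogram):
        if count > 0 and start is None:
            start = x                      # a run of black columns begins
        elif count == 0 and start is not None:
            runs.append((start, x - 1))    # the run ended at the previous column
            start = None
    if start is not None:                  # a run reaches the right edge
        runs.append((start, len(histogram) - 1))
    return runs

# Tiny hypothetical image: two blobs separated by empty columns.
example_image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
]
example_runs = x_runs(example_image)
# example_runs == [(1, 2), (5, 5)]
```

Applying the same function to the transposed image (`list(zip(*image))`) gives the corresponding ranges in the vertical direction.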

[0054] To aid understanding of the operation of the segment extraction unit 17, FIG. 2 shows primary segments S0 to S5 extracted from original image data 31, together with cut candidate positions C0 to C6, which are the coordinates that delimit the primary segments in the direction in which the characters are arranged (the X direction in this example).

[0055] The segment extraction unit 17 stores the X-direction start coordinate Xs, X-direction end coordinate Xe, Y-direction start coordinate Ys, and Y-direction end coordinate Ye of each of primary segments S0 to S5 in an internal segment coordinate table (not shown). FIG. 3 schematically shows the segment coordinate table for primary segments S0 to S5. For primary segment S0, for example, Xs = 10, Xe = 26, Ys = 61, and Ye = 105.
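One row of such a segment coordinate table could be modeled as below. This is only an illustrative sketch: the class name `Segment`, its field names, and the derived `width` and `height` properties are assumptions for the example, while the four coordinate values for S0 are the ones quoted in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """One row of a segment coordinate table (hypothetical layout)."""
    xs: int  # X-direction start coordinate Xs
    xe: int  # X-direction end coordinate Xe
    ys: int  # Y-direction start coordinate Ys
    ye: int  # Y-direction end coordinate Ye

    @property
    def width(self) -> int:
        return self.xe - self.xs

    @property
    def height(self) -> int:
        # Height taken as the difference of the Y coordinates.
        return self.ye - self.ys

# Primary segment S0 with the coordinates given in the text.
S0 = Segment(xs=10, xe=26, ys=61, ye=105)
# S0.width == 16 and S0.height == 44
```

Keeping the coordinates with each segment is what allows later steps to compute distances and heights without returning to the image data.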

[0056] The segment extraction unit 17 also stores the X-direction cut candidate positions (coordinates) C0 to C6, which are needed later when creating candidate paths, in a predetermined internal memory (not shown).

[0057] The segment integration unit 19 integrates the primary segments extracted by the segment extraction unit 17 according to a predetermined rule to create secondary segments. Specifically, it considers the shape features of adjacent primary segments and, where the integrated result could still be a single character, integrates those primary segments into a secondary segment. Here, secondary segments are created by the following procedure.

[0058] First, the height H of the tallest of the m primary segments consecutive in the X direction is obtained. The height of each primary segment is found as the difference between its Ys and Ye coordinates. For the six primary segments in the example of FIG. 2, segment S5 has a Y-coordinate difference of 112 - 5 = 107, which is greater than that of any of the other segments S0 to S4. In the example of FIG. 2, the tallest segment is therefore S5.

【００５９】 Next, any other primary segment lying within a coordinate range of H × N in the X direction from the primary segment of interest is integrated with that segment to create a secondary segment.

【００６０】 This secondary segment creation process is described more specifically with reference to the explanatory diagram of the primary segments in FIG. 2, the segment coordinate table in FIG. 3, the flowchart of the segment integration process in FIG. 4, and the explanatory diagram of the secondary segment creation procedure in FIG. 5.

【００６１】 First, the processing of loop 1, which takes each input segment in turn as a starting point, begins. The primary segment S0 is taken first as the segment of interest (segment A in FIG. 4), and loop 1 starts from it (steps 41 to 47 in FIG. 4).

【００６２】 That is, the distance D, in the direction in which the characters are arranged (here, the X direction), between the primary segment S0 and the primary segment S1, which is the segment B to its right, is obtained (steps 42 and 43 in FIG. 4). The distance D is obtained, for example, as the difference between the Xs coordinates of the segments S0 and S1. In this example, D = 52 - 10 = 42.

【００６３】 Next, it is determined whether the distance D is within the range H × N (step 44 in FIG. 4). N is a predetermined value; here N = 1.2. As described above, H is 107 here. In this case, therefore, it is determined whether D ≤ 1.2 × 107 = 128.4 is satisfied.

【００６４】 In the example of FIG. 2, the distance D between the primary segments S0 and S1 satisfies D ≤ 1.2 × 107, so the primary segment S1 is integrated with the primary segment S0 to create the secondary segment S6 (step 45 in FIG. 4; see FIG. 5).

【００６５】 Next, loop 2 is executed again (steps 46 and 42 in FIG. 4), and this time the distance D between the primary segment S0 and the primary segment S2 is obtained (step 43 in FIG. 4). This distance is D = 126 - 10 = 116.

【００６６】 Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). In this case, the primary segment S2 satisfies D ≤ 1.2 × 107 with respect to the primary segment S0, so it is integrated and the secondary segment S7 is created (step 45 in FIG. 4; see FIG. 5).

【００６７】 Next, loop 2 is executed again (steps 46 and 42 in FIG. 4), and this time the distance D between the primary segment S0 and the primary segment S3 is obtained (step 43 in FIG. 4). This distance is D = 180 - 10 = 170.

【００６８】 Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). In this case, the primary segment S3 does not satisfy D ≤ 1.2 × 107 with respect to the primary segment S0, so it is not integrated.

【００６９】 Next, the primary segment of interest is changed to the primary segment S1, and the processing of loop 1 starts again (steps 41 and 42 in FIG. 4). The distance D between the primary segment S1 and the primary segment S2 is obtained (step 43 in FIG. 4). This distance is D = 126 - 52 = 74.

【００７０】 Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). In this case, the primary segment S2 satisfies D ≤ 1.2 × 107 with respect to the primary segment S1, so it is integrated and the secondary segment S8 is created (step 45 in FIG. 4; see FIG. 5).

【００７１】 Next, the distance D between the primary segment S1 and the primary segment S3 is obtained (step 43 in FIG. 4). This distance is D = 180 - 52 = 128.

【００７２】 Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). In this case, the primary segment S3 satisfies D ≤ 1.2 × 107 with respect to the primary segment S1 (128 ≤ 128.4), so it is integrated and the secondary segment S9 is created (step 45 in FIG. 4; see FIG. 5).

【００７３】 Next, the distance D between the primary segment S1 and the primary segment S4 is obtained (step 43 in FIG. 4). This distance is D = 231 - 52 = 179.

【００７４】 Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). In this case, the primary segment S4 does not satisfy D ≤ 1.2 × 107 with respect to the primary segment S1, so it is not integrated.

【００７５】 Likewise, loop 1 is applied with each of the primary segments S2 to S5 as the starting point (segment of interest), and primary segments are integrated to create secondary segments. As a result, nine secondary segments, S6 to S14, are created in this example (FIG. 5 shows only S6 to S9, S13, and S14).
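
The two nested loops of the integration procedure can be sketched as below. This is a hedged illustration under stated assumptions: the function name `integrate` is hypothetical, the inner loop is assumed to stop at the first segment that fails the D ≤ H × N test (which matches the walk-through, where processing moves on to the next segment of interest after a failure), and only S0 to S4 are included because the Xs coordinate of S5 is not quoted in this passage.

```python
def integrate(xs, H, N):
    """Return (a, b) index pairs; each pair means primary segments a..b
    are merged into one secondary segment."""
    limit = H * N
    merged = []
    for a in range(len(xs)):              # loop 1: segment of interest A
        for b in range(a + 1, len(xs)):   # loop 2: segments B to the right of A
            d = xs[b] - xs[a]             # distance D between the Xs coordinates
            if d > limit:
                break                     # assumption: stop at the first failure
            merged.append((a, b))
    return merged


# Xs coordinates of S0..S4 as quoted in the worked example.
XS = [10, 52, 126, 180, 231]
pairs = integrate(XS, H=107, N=1.2)
print(pairs)
```

The first four pairs produced, (0, 1), (0, 2), (1, 2), and (1, 3), correspond to the secondary segments S6 to S9 described in the text; the remaining pairs are what this sketch yields for S2 and S3 as starting points.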

【００７６】 The segment integration unit 19 additionally stores the X-direction start coordinate Xs, X-direction end coordinate Xe, Y-direction start coordinate Ys, and Y-direction end coordinate Ye of each created secondary segment in the segment coordinate table mentioned above (step 48 in FIG. 4).

【００７７】 FIG. 6 schematically shows the segment coordinate table for the primary segments S0 to S5 and the secondary segments S6 to S14.

【００７８】 For the primary segments S0 to S5 and the secondary segments S6 to S14 above, FIG. 7 shows a table (referred to as the "segment table") that sets out which segment lies between one cutout candidate position and another.

【００７９】 This segment table is an adjacency matrix in the graph-theoretic sense. That is, taking the start cutout points as rows and the end cutout points as columns, the table is obtained by assigning a segment number (here, one of S0 to S14) to each element. In FIG. 7, NULL denotes a blank, that is, that no segment lies between the two positions.

【００８０】 FIG. 7 shows, for example, that the primary segment S0 lies between cutout candidate positions C0 and C1, that the secondary segment S6 lies between cutout candidate positions C0 and C2, and that the secondary segment S7 lies between cutout candidate positions C0 and C3.
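
As a rough illustration of the segment table as an adjacency matrix, the sketch below fills in only the three entries named in the text (S0, S6, S7); the rest of FIG. 7 is not reproduced here. The names `segment_table`, `put`, and `segments_from` are hypothetical, and `None` stands in for NULL.

```python
M = 6  # cutout candidate positions run from C0 to C6

# Rows are start positions, columns are end positions; None plays the role of NULL.
segment_table = [[None] * (M + 1) for _ in range(M + 1)]


def put(start, end, name):
    """Record that segment `name` lies between C<start> and C<end>."""
    segment_table[start][end] = name


put(0, 1, "S0")  # primary segment between C0 and C1
put(0, 2, "S6")  # secondary segment between C0 and C2
put(0, 3, "S7")  # secondary segment between C0 and C3


def segments_from(start):
    """List (end, name) for every segment that starts at C<start>."""
    row = segment_table[start]
    return [(end, name) for end, name in enumerate(row) if name is not None]


print(segments_from(0))
```

Looking up a row in this way is what the candidate path search needs: all segments that can follow a given cutout candidate position.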

【００８１】 The character recognition unit 21 performs character recognition on the assumption that each primary segment and each secondary segment is one character, and stores the resulting candidate character codes. The character recognition process itself can be performed by any conventionally known method.

【００８２】 FIGS. 8(A) and 8(B) show an example in which the primary segment S0 of FIG. 2 is assumed to be one character and recognized as such. FIG. 8(A) shows the primary segment S0, and FIG. 8(B) shows the set 51 of candidate characters obtained as the recognition result. The candidate characters are in practice given as candidate character codes. The number of candidates is not particularly limited, but here candidates up to the K-th rank are considered, with K = 10 in this example. The example of FIG. 8(B), however, shows a case in which only one candidate character was obtained.

【００８３】 The recognition results for the other primary segments S1 to S5 are shown in FIGS. 9(A) to 9(E), respectively. Some of the recognition results for the secondary segments S6 to S14 (those for S6, S7, and S11 to S14) are shown in FIGS. 10(A) to 10(C) and FIGS. 11(A) to 11(C), respectively.

【００８４】 The word creation unit 23 creates words by combining the candidate character codes of the primary segments and secondary segments obtained by the character recognition.

【００８５】 The word creation unit 23 of this embodiment includes a candidate path creation unit 23a and a character number rule checking unit 23b. The configuration and operation of the candidate path creation unit 23a and the character number rule checking unit 23b are described below with reference to FIG. 12.

【００８６】 The candidate path creation unit 23a includes the following means (1) to (8) and creates candidate paths, each expressed as a concatenation of primary segments and/or secondary segments.

【００８７】 (1) First means for determining whether the cutout candidate position of interest Ci (i = 0 to m) is Cm, where the cutout candidate positions C0 to Cm are the coordinates that delimit the m primary segments in the X direction (step 61 in FIG. 12).

【００８８】 (2) Second means that operates when the first means determines that Ci = Cm and records the current candidate path in a candidate path memory (not shown) (step 62 in FIG. 12).

【００８９】 (3) Third means that operates when the first means determines that Ci ≠ Cm and determines whether a segment Sk+1 lying between cutout candidate position Ci and cutout candidate position Cj (j = i + 1) exists (steps 63 to 65 in FIG. 12).

【００９０】 (4) Fourth means that operates when the character number rule checking unit determines that the count is within the prescribed number and adds the segment Sk+1 to the candidate path (step 66 in FIG. 12).

【００９１】 (5) Fifth means that operates after the fourth means, regards the cutout candidate position Cj as the cutout candidate position of interest Ci, and starts the operation of the first means (step 67 in FIG. 12).

【００９２】 (6) Sixth means that, for a candidate path created as a result of the operation of the fourth means and the fifth means, deletes from the candidate path the segment most recently added to it (step 68 in FIG. 12).

【００９３】 (7) Seventh means that operates when the third means determines that no such segment exists, when the character number rule checking unit determines that the prescribed number is exceeded, or after the sixth means has operated; it changes j, which defines the cutout candidate position, to j = j + 1 and determines whether the changed j satisfies j > m (steps 69 and 70 in FIG. 12).

【００９４】 (8) Eighth means that operates when the seventh means determines that j ≤ m and causes the third means to operate (steps 70 and 64 in FIG. 12).

【００９５】 Meanwhile, the character number rule checking unit 23b operates when the third means of the candidate path creation unit 23a determines that the segment Sk+1 exists. It restricts the number of characters by determining whether the number of segments in the candidate path would exceed the prescribed number if the segment Sk+1 were added to it (step 71 in FIG. 12).
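
Means (1) to (8) and the character number check can be read as a depth-first search with backtracking and length pruning. The sketch below is one possible reading, not the implementation of the embodiment: `enumerate_paths`, `funct`, and the five-entry table with segment names `a`, `b`, `c`, `ab`, `bc` are hypothetical, and the step numbers in the comments refer to FIG. 12 as described in the text.

```python
def enumerate_paths(table, m, max_len):
    """Enumerate segment sequences from C0 to Cm with at most max_len segments.

    table maps (start, end) cutout position indices to a segment name.
    """
    found = []
    path = []

    def funct(i):
        if i == m:                         # step 61: is Ci the last position?
            found.append(list(path))       # step 62: record the current path
            return
        for j in range(i + 1, m + 1):      # steps 63, 69, 70: j = i+1, ..., m
            seg = table.get((i, j))        # steps 64, 65: segment between Ci and Cj
            if seg is None:
                continue
            if len(path) + 1 > max_len:    # step 71: character number rule
                continue
            path.append(seg)               # step 66: add the segment
            funct(j)                       # step 67: Cj becomes the new Ci
            path.pop()                     # step 68: remove the newest segment

    funct(0)
    return found


# Hypothetical table with four cutout positions C0..C3.
table = {(0, 1): "a", (1, 2): "b", (2, 3): "c", (0, 2): "ab", (1, 3): "bc"}
print(enumerate_paths(table, m=3, max_len=2))
print(enumerate_paths(table, m=3, max_len=3))
```

With a limit of 2, the three-segment path through `a`, `b`, `c` is pruned before its last segment is added, which is the saving the character number rule is meant to provide.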

【００９６】 To aid understanding of the candidate path creation unit 23a and the character number rule checking unit 23b, a specific example of the candidate path creation process is described. Since it suffices to explain the principles of the candidate path creation process and the character number rule checking process, the example uses a reduced number of segments and cutout candidate positions. Specifically, the following describes the two processes for a case with three cutout candidate positions, C0 to C2, and three segments, S0 to S2, where the relationship between the cutout candidate positions and the segments is given by the segment table shown in FIG. 13.

【００９７】 Cutout candidate position C0 is the first cutout candidate position in the direction in which the characters are arranged, and cutout candidate position C2 is the last (rightmost) one.

【００９８】 First, the path P in the candidate path memory (not shown) is cleared, and processing of the function Funct(Ci, P), here first Funct(C0, P), is started (step 60 in FIG. 12).

【００９９】 That is, the candidate path creation unit 23a first determines whether the cutout candidate position Ci, here C0, is the rightmost cutout candidate position (that is, whether C0 = C2) (step 61 in FIG. 12). Here, the control unit 11 transfers the cutout candidate position Ci from the segment extraction unit 17 to the candidate path creation unit 23a.

【０１００】 Since C0 is not the rightmost cutout candidate position, the process moves to step 63, where j = i + 1 = 0 + 1 = 1 is computed. As a result, Cj becomes C1.

【０１０１】 The control unit 11 transfers the segment Sk+1 lying between cutout candidate positions C0 and C1 from the segment extraction unit 17 or the segment integration unit 19 to the candidate path creation unit 23a. In the example of FIG. 13, segment S0 is transferred. If there is no corresponding segment, the control unit 11 transfers a signal to that effect (NULL) to the candidate path creation unit 23a.

【０１０２】 The candidate path creation unit 23a determines whether the segment Sk+1 exists (step 65 in FIG. 12). Since segment S0 exists in this case, the process moves to step 71.

【０１０３】 In step 71, it is determined whether the number of segments in the sequence (segment sequence) formed by adding the segment Sk+1 to the candidate path P is within the prescribed number. The prescribed number is taken to be 4 here. The segment sequence in this case consists of S0 alone, so the number of segments is 1, which satisfies the prescribed number, and the process moves to step 66.

【０１０４】 In step 66, segment S0 is added to the candidate path P, giving P = {S0}. The process then moves to step 67.

【０１０５】 In step 67, Cj becomes the cutout candidate position of interest, so cutout candidate position C1 is now regarded as Ci. That is, the function becomes Funct(C1, P), and processing resumes from step 61.

【０１０６】 The candidate path creation unit 23a therefore now determines whether cutout candidate position C1 is the rightmost cutout candidate position (that is, whether C1 = C2) (step 61 in FIG. 12).

【０１０７】 Since C1 is not the rightmost cutout candidate position, the process moves to step 63, where j = i + 1 = 1 + 1 = 2 is computed. As a result, Cj becomes C2.

【０１０８】 The control unit 11 transfers the segment Sk+1 lying between cutout candidate positions C1 and C2 from the segment extraction unit 17 or the segment integration unit 19 to the candidate path creation unit 23a. In the example of FIG. 13, segment S1 is transferred.

【０１０９】 The candidate path creation unit 23a determines whether the segment Sk+1 exists (step 65 in FIG. 12). Since segment S1 exists in this case, the process moves to step 71.

【０１１０】 In step 71, it is determined whether the number of segments in the sequence (segment sequence) formed by adding the segment Sk+1 to the candidate path P is within the prescribed number. The segment sequence in this case contains two segments, S0 and S1, which satisfies the prescribed number, so the process moves to step 66.

【０１１１】 In step 66, segment S1 is added to the candidate path P, giving P = {S0, S1}. The process then moves to step 67.

【０１１２】 In step 67, cutout candidate position C2 is now regarded as the cutout candidate position of interest Ci. That is, the function becomes Funct(C2, P), and processing resumes from step 61.

【０１１３】 The candidate path creation unit 23a therefore now determines whether cutout candidate position C2 is the rightmost cutout candidate position (that is, whether C2 = C2) (step 61 in FIG. 12).

【０１１４】 Since C2 is the rightmost cutout candidate position, the process moves to step 62, and the candidate path P = {S0, S1} is recorded in the candidate path memory (not shown). The candidate path P = {S0, S1} is thus created as one of the candidate paths that start at C0, end at C2, and contain no more than the prescribed number of segments. This completes the processing of the function Funct(C2, P).

【０１１５】 This candidate path P = {S0, S1} was created as a result of the operation of the fourth means and the fifth means, that is, as a result of steps 66 and 67. The process therefore moves to step 68, which deletes from P = {S0, S1} the most recently added segment, S1. As a result, P = {S0}.

【０１１６】 Next, j is set to j + 1 (step 69 in FIG. 12). The current j is 2, so j = 2 + 1 = 3.

【０１１７】 Next, it is determined whether j has passed the last cutout candidate position (that is, whether j > m) (step 70 in FIG. 12).

【０１１８】 Here j = 3 exceeds the last cutout candidate position index, 2, so processing of the function Funct(C1, P) ends. Processing then resumes from step 68 in the calling function, Funct(C0, P).

【０１１９】 The most recently added segment, S0, is therefore deleted from the candidate path P = {S0}. As a result, the candidate path P is empty (P = {}).

【０１２０】 Next, j is set to j + 1 (step 69 in FIG. 12). The current j is 1, so j = 1 + 1 = 2.

【０１２１】 Next, it is determined whether j has passed the last cutout candidate position (that is, whether j > m) (step 70 in FIG. 12).

【０１２２】 Here j = 2 does not exceed the last cutout candidate position index, 2, so processing continues from step 64. It is therefore determined whether a segment Sk+1 lying between cutout candidate positions C0 and C2 exists. Segment S2 exists as Sk+1 in this case (see FIG. 13), so it is determined whether the number of segments in the sequence formed by adding segment S2 to the candidate path P is within the prescribed number.

【０１２３】 This segment sequence consists of S2 alone, which satisfies the prescribed limit (stated here as 3 or fewer). Segment S2 is therefore added to the candidate path P, giving P = {S2}. The process then moves to step 67.

【０１２４】 Since j is 2 in this case, step 67 regards cutout candidate position C2 as the cutout candidate position of interest Ci. That is, the function becomes Funct(C2, P), and processing resumes from step 61.

【０１２５】 The candidate path creation unit 23a therefore now determines whether cutout candidate position C2 is the rightmost cutout candidate position (that is, whether C2 = C2) (step 61 in FIG. 12).

【０１２６】 Since C2 is the rightmost cutout candidate position, the process moves to step 62, and the candidate path P = {S2} is recorded in the candidate path memory (not shown). The candidate path P = {S2} is thus created as one of the candidate paths that start at C0, end at C2, and contain no more than the prescribed number of segments. This completes the processing of the function Funct(C2, P).

【０１２７】 This candidate path P = {S2} was created as a result of the operation of the fourth means and the fifth means, that is, as a result of steps 66 and 67. The process therefore moves to step 68, which deletes from P = {S2} the most recently added segment, S2. As a result, the candidate path P is empty (P = {}).

【０１２８】 Next, j is set to j + 1 (step 69 in FIG. 12). The current j is 2, so j = 2 + 1 = 3.

【０１２９】 Next, it is determined whether j has passed the last cutout candidate position (that is, whether j > m) (step 70 in FIG. 12).

【０１３０】 Here j = 3 exceeds the last cutout candidate position index, 2, so processing of the function Funct(C0, P) ends.
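
The walk-through above can be replayed with a small tracing sketch. It assumes the FIG. 13 segment table as described in the text (S0 between C0 and C1, S1 between C1 and C2, S2 between C0 and C2) and the prescribed number 4 from paragraph 0103; the function name `trace` and the event labels are hypothetical.

```python
def trace(table, m, max_len):
    """Run the recursive search and log each add / record / remove event."""
    events = []
    path = []

    def funct(i):
        if i == m:                                   # step 61
            events.append(("record", tuple(path)))   # step 62
            return
        for j in range(i + 1, m + 1):                # steps 63, 69, 70
            seg = table.get((i, j))                  # steps 64, 65
            if seg is None or len(path) + 1 > max_len:  # step 71
                continue
            path.append(seg)                         # step 66
            events.append(("add", seg))
            funct(j)                                 # step 67
            path.pop()                               # step 68
            events.append(("remove", seg))

    funct(0)
    return events


FIG13 = {(0, 1): "S0", (1, 2): "S1", (0, 2): "S2"}
events = trace(FIG13, m=2, max_len=4)
for event in events:
    print(event)
```

The event order follows the text: {S0, S1} is recorded first, S1 and then S0 are removed, and {S2} is recorded second.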

【０１３１】 The process described with reference to FIG. 12 is what is known as a recursive algorithm. It can recursively trace the segments in order from the leftmost segment of the character string to the rightmost.

【０１３２】 Following this process, every segment that starts at C0 is extracted. The other segments that connect to each extracted segment, starting at its end point, are then extracted in turn. Candidate paths consisting of a segment sequence (possibly a single segment) that starts at C0, ends at C2, and contains no more than the prescribed number of segments are thus created easily.

【０１３３】 When the process described with reference to FIG. 12 is applied to the primary segments S0 to S5 and the secondary segments S6 to S14 of the original image data 31 shown in FIG. 2, a total of 17 candidate paths, [1] to [17], are created as shown in FIG. 14. This is the result when the prescribed number of segments per candidate path is 4.

【０１３４】 As FIG. 14 shows, candidate paths are created from segments that connect in a tree structure to the segments S0, S6, and S7, each of which starts at cutout candidate position C0 and serves as a root. In every candidate path, the rightmost segment ends at cutout candidate position C6 (specifically, it is one of S13, S14, and S15). Moreover, the number of segments in each candidate path is no more than the prescribed number (here, 4 or fewer).

【０１３５】 FIG. 15 shows the principle of creating words from the 17 candidate paths shown in FIG. 14.

【０１３６】 FIG. 15, however, shows examples of creating words from only candidate path [1], candidate path [2], candidate path [3], and candidate path [17] of the 17 candidate paths.

【０１３７】 In FIG. 15, candidate path [1] consists of segments S0, S1, S2, and S13, as described with reference to FIG. 14. As shown in FIG. 8 and FIGS. 9(A) and 9(B), the recognition result for each of the segments S0, S1, and S2 is a single candidate, "1". Accordingly, "1 (1)" is entered as the candidate character at the positions of segments S0, S1, and S2 in the candidate path [1] column of FIG. 15. The recognition result for segment S13, on the other hand, comprises ten candidates, ranked first to tenth, as shown in FIG. 11(B). The candidate characters "川 (1)" through "ル (10)" are therefore entered at the position of segment S13 in the candidate path [1] column of FIG. 15. Candidate path [1] thus yields ten words, from "111川" to "111ル". Candidate words can be created in the same way from candidate path [2], candidate path [3], ..., candidate path [17].

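The word creation described for candidate path [1] amounts to taking the Cartesian product of the candidate characters of its segments. The following is a minimal Python sketch, assuming a hypothetical helper `words_from_path`; the second through ninth candidates of S13 are not stated in this passage, so placeholder strings stand in for them.

```python
from itertools import product

def words_from_path(path, candidates):
    """Combine the candidate characters of each segment in a path into words."""
    return ["".join(chars) for chars in product(*(candidates[s] for s in path))]

# Ranks 2 to 9 of S13 are unknown here; "<rankN>" entries are placeholders.
CANDIDATES = {
    "S0": ["1"],
    "S1": ["1"],
    "S2": ["1"],
    "S13": ["川"] + [f"<rank{r}>" for r in range(2, 10)] + ["ル"],
}

words = words_from_path(["S0", "S1", "S2", "S13"], CANDIDATES)
```

Because S0, S1, and S2 each contribute a single candidate, the number of words equals the number of candidates of S13, which is why limiting the number of segments per path directly limits the number of combinations.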
[0138] The word matching unit 25 matches the words created by the word creation unit 23 against a word dictionary (not shown). If the word indicated by a candidate path does not exist in the word dictionary as a result of the matching, the word matching unit 25 stores "0" as the evaluation value of that candidate path; if the word does exist, it stores "1" as the evaluation value of that candidate path.

[0139] In the example shown in FIG. 15, for instance, none of the words from "111川" in the row for candidate path [1] through "11リ1" in the row for candidate path [3], nor the word "州ル" in the row for candidate path [17], exists in the word dictionary. Accordingly, "0" is stored as the evaluation value of each of these candidate paths. The word "小川" in the row for candidate path [17], by contrast, exists in the word dictionary. Accordingly, "1" is stored as the evaluation value of this candidate path.
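The 0/1 evaluation can be illustrated with a set lookup. This is a minimal Python sketch; `evaluate_words` and the one-entry dictionary are hypothetical stand-ins for the word matching unit 25 and its word dictionary, whose actual contents and data structure are not described here.

```python
def evaluate_words(words, dictionary):
    """Return {word: 1 if the word is in the dictionary, else 0}."""
    return {word: int(word in dictionary) for word in words}

# Hypothetical dictionary containing only the word found in the example.
WORD_DICTIONARY = {"小川"}

scores = evaluate_words(["111川", "11リ1", "州ル", "小川"], WORD_DICTIONARY)
```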

[0140] The result selection unit 27 determines, based on the result of the word matching, any of the primary segments and/or secondary segments constituting a candidate path as a character cutout area for one character. Here, each segment of the candidate path with the best word-matching evaluation value is taken as a cutout area. In the example of FIG. 15, segment S7 and segment S13, which constitute the candidate path "小 (1) 川 (1)" in candidate path [17], are each taken as a character cutout area.

[0141] The result output unit 29 outputs the cutout candidate positions corresponding to the character cutout areas determined by the result selection unit 27 to, for example, the control unit 11. In the example of FIG. 15, C0, C3, and C6 are each output to the control unit 11 as character cutout candidate positions. Based on this result, the control unit 11 can issue an instruction for character segmentation.
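Selecting the best-scoring path and reading off its cutout candidate positions can be sketched as follows. This is an illustrative Python sketch only: `select_path` and `cut_positions` are hypothetical names, and the spans given for S7 and S13 are inferred from the C0, C3, C6 output in the example rather than taken from the segment table.

```python
def select_path(scored_paths):
    """Pick the path with the best evaluation value (the first one wins on ties)."""
    return max(scored_paths, key=lambda item: item[1])[0]

def cut_positions(path, segments):
    """Return the cutout candidate indices that bound each segment of a path."""
    positions = [segments[path[0]][0]]
    for name in path:
        positions.append(segments[name][1])
    return positions

# Spans are (start, end) cutout candidate indices; S7 and S13 are inferred values.
SEGMENTS = {
    "S0": (0, 1), "S1": (1, 2), "S2": (2, 3),
    "S7": (0, 3), "S13": (3, 6),
}

scored = [(["S0", "S1", "S2", "S13"], 0), (["S7", "S13"], 1)]
best = select_path(scored)
positions = cut_positions(best, SEGMENTS)
```

Here `positions` holds the indices 0, 3, and 6, corresponding to C0, C3, and C6.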

[0142] When a plurality of candidate paths with an evaluation value of "1" appear in the word matching result, the word serving as the criterion for character segmentation may be determined according to a second evaluation method, a third evaluation method, and so on; for example, by focusing on the candidate ranks in the recognition results and selecting the word composed of recognition results with high candidate ranks.
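One possible form of such a second evaluation is to prefer the word whose characters have the smallest total candidate rank. The text does not specify how ranks are combined, so the rank sum below is an assumption, and `pick_by_rank` and the sample hits are hypothetical.

```python
def pick_by_rank(hits):
    """Among dictionary hits, prefer the word with the smallest sum of candidate ranks.

    hits is a list of (word, ranks) pairs, where ranks holds the candidate rank
    of each character in its segment's recognition result (1 is best).
    """
    return min(hits, key=lambda item: sum(item[1]))[0]

# Hypothetical hits: both words are assumed to exist in the dictionary.
hits = [("word_a", [1, 1]), ("word_b", [2, 1])]
chosen = pick_by_rank(hits)
```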

[0143] An embodiment of the present invention has been described above. However, the present invention is in no way limited to the above embodiment, and many modifications or changes can be made.

[0144] For example, in the above embodiment, the number of characters was regulated by requiring it to be no more than a reference value. Alternatively, two reference values T1 and T2 (T1 < T2) may be provided, and the number of characters may be regulated such that T1 < number of characters (the number of segments, in terms of the embodiment) < T2. Doing so also constrains candidate paths on the lower-limit side, so the number of candidate paths can be reduced further. The processing amount of word creation processing and the like can therefore be reduced further.
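The two-sided restriction T1 < number of segments < T2 can be illustrated as a simple filter. This is a minimal Python sketch; the function name `within_limits`, the sample paths, and the values T1 = 1 and T2 = 4 are hypothetical.

```python
def within_limits(paths, t1, t2):
    """Keep only paths whose segment count n satisfies t1 < n < t2."""
    return [p for p in paths if t1 < len(p) < t2]

# Hypothetical candidate paths of 2, 3, 4, and 6 segments.
paths = [
    ["S7", "S13"],
    ["S6", "S2", "S13"],
    ["S0", "S1", "S2", "S13"],
    ["S0", "S1", "S2", "S3", "S4", "S5"],
]

kept = within_limits(paths, t1=1, t2=4)
```

In practice the upper bound would more likely be checked while paths are being extended, as in the embodiment, so that over-long paths are never built; the lower bound can only be checked once a path is complete.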

[0145]

[Effects of the Invention] As is apparent from the above description, according to the character segmentation method of this application, in a character segmentation method including a process of extracting primary segments, each being a cluster area of black bits, from original image data including image data of a character string stored in a memory, a process of integrating the extracted primary segments according to a predetermined rule to create secondary segments, a process of performing character recognition on each primary segment and each secondary segment, a process of creating a word by combining the candidate character codes of the primary segments and secondary segments obtained by the character recognition, a process of matching the created word against a word dictionary, and a process of determining a character cutout area based on the result of the word matching, the word is created while regulating the number of characters constituting the word. Useless candidate character codes and their combinations can therefore be excluded from the computation at word creation time. Wasted computation is thus eliminated, which enables more efficient character segmentation.

[0146] Further, according to the character segmentation device of this application, the character segmentation method described above can be easily implemented.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of the character segmentation device of the embodiment.

FIG. 2 is a diagram explaining primary segments and cutout candidate positions.

FIGS. 3(A) and 3(B) are diagrams showing examples of the segment coordinate table before integration.

FIG. 4 is an explanatory diagram of the segment integration unit.

FIG. 5 is an explanatory diagram of the secondary segment creation procedure.

FIG. 6 is a diagram showing an example of the segment coordinate table after integration.

FIG. 7 is a diagram showing an example of the segment table after integration.

FIG. 8 is a diagram showing an example (part 1) of recognition results, namely the recognition result for segment S0.

FIGS. 9(A) to 9(E) are diagrams showing an example (part 2) of recognition results, namely the recognition results for segments S1 to S5, respectively.

FIGS. 10(A) to 10(C) are diagrams showing an example (part 3) of recognition results, namely the recognition results for segments S6, S7, and S11, respectively.

FIGS. 11(A) to 11(C) are diagrams showing an example (part 4) of recognition results, namely the recognition results for segments S12, S13, and S14, respectively.

FIG. 12 is an explanatory diagram of the candidate path creation unit and the character number rule checking unit.

FIG. 13 is an explanatory diagram of a specific example of candidate path creation processing.

FIG. 14 is an explanatory diagram of candidate paths, showing the candidate paths created from the original image data shown in FIG. 2 with the specified number set to 4.

FIG. 15 is an explanatory diagram of the principle of creating words from candidate paths.

[Explanation of symbols]

10: Character segmentation device of the embodiment
11: Control unit
13: Image input unit
15: Character number rule input unit
17: Segment extraction unit
19: Segment integration unit
21: Character recognition unit
23: Word creation unit
23a: Candidate path creation unit
23b: Character number rule checking unit
25: Word matching unit
27: Result selection unit
29: Result output unit
S0 to S5: Segments (primary segments)
S6 to S14: Segments (secondary segments)
C0 to C6: Cutout candidate positions
51: Set of candidate characters

Claims

[Claims]

1. A character segmentation method comprising: a process of extracting primary segments, each being a cluster area of black bits, from original image data including image data of a character string stored in a memory; a process of integrating the extracted primary segments according to a predetermined rule to create secondary segments; a process of performing character recognition on each primary segment and each secondary segment; a process of creating a word by combining the candidate character codes of the primary segments and secondary segments obtained by the character recognition; a process of matching the created word against a word dictionary; and a process of determining, based on the result of the word matching, any of the primary segments and the secondary segments as a character cutout area for one character, wherein the word is created while regulating the number of characters constituting the word.

2. The character segmentation method according to claim 1, wherein a reference value for regulating the number of characters is determined according to the field to be recognized.

3. The character segmentation method according to claim 1, wherein, where the direction in which characters are arranged is the X direction, the secondary segments are created by integrating, according to a predetermined rule, m primary segments that are consecutive in the X direction, and the word is created based on candidate paths, each represented by a concatenation of the primary segments and/or the secondary segments, that are created by a procedure including the following processes (a) to (d) (where m is an integer of 2 or more).
(a) A process of, where the coordinates that partition the m primary segments in the X direction are taken as cutout candidate positions Ci (i = 0 to m), extracting, from among the m primary segments and the created secondary segments, all segments whose cutout start point is the cutout candidate position C0.
(b) A process of extracting, for each segment extracted in process (a), from among the m primary segments and the created secondary segments, another segment that can be connected to it because its cutout start point is the cutout candidate position Cj (j = 1 to m) on the end-point side of that segment, and then further segments that can be connected in the same way, until a segment whose end-point-side cutout candidate position is Cm appears.
(c) A process of determining, each time another segment is extracted in process (b), whether the number of segments of the candidate path including that segment is within a specified number.
(d) A process of adopting, as a candidate path for word creation, a candidate path whose number of segments is within the specified number and in which the end-point-side cutout candidate position of the last segment is Cm.

4. The character segmentation method according to claim 1, wherein, where the direction in which characters are arranged is the X direction, the secondary segments are created by integrating, according to a predetermined rule, m primary segments that are consecutive in the X direction, and the word is created based on candidate paths, each represented by a concatenation of the primary segments and/or the secondary segments, that are created by a procedure including the following processes (1) to (9) (where m is an integer of 2 or more).
(1) A first process of determining, where the coordinates that partition the m primary segments in the X direction are taken as cutout candidate positions C0 to Cm, whether the cutout candidate position of interest Ci (i = 0 to m) is Cm.
(2) A second process that is executed when Ci = Cm is determined in the first process, and records the current candidate path in a candidate path memory.
(3) A third process that is executed when Ci ≠ Cm is determined in the first process, and determines whether a segment Sk+1 sandwiched between the cutout candidate position Ci and the cutout candidate position Cj (j = i + 1) exists.
(4) A fourth process that is executed when the third process determines that such a segment exists, and determines whether the number of segments of the candidate path would not exceed a specified number if the segment Sk+1 were added to the candidate path.
(5) A fifth process that is executed when the fourth process determines that the number is within the specified number, and adds the segment Sk+1 to the candidate path.
(6) A sixth process that is executed following the fifth process, regards the cutout candidate position Cj as the cutout candidate position of interest Ci, and re-executes the first process.
(7) A seventh process of deleting, from a candidate path created by executing the fifth and sixth processes, the segment most recently added to that candidate path.
(8) An eighth process that is executed when the third process determines no, when the fourth process determines no, or when the seventh process has been executed, changes j specifying the cutout candidate position to j = j + 1, and determines whether the changed j satisfies j > m.
(9) A ninth process that is executed when j ≤ m is determined in the eighth process, and re-executes the procedure from the third process.

5. The character segmentation method according to claim 1, wherein the predetermined rule for integrating the primary segments is a rule in which, where the direction in which characters are arranged is the X direction, the height H of the tallest segment among the m primary segments is obtained, and other primary segments existing within a coordinate range of H × N in the X direction from a primary segment of interest are integrated with that primary segment of interest (where N is a predetermined value).

6. The character segmentation method according to claim 1, wherein the character string is a character string including handwritten characters.

7. A character segmentation device comprising: a segment extraction unit that extracts primary segments, each being a cluster area of black bits, from input image data including image data of a character string stored in a memory; a segment integration unit that integrates the extracted primary segments according to a predetermined rule to create secondary segments; a character recognition unit that performs character recognition on each primary segment and each secondary segment; a word creation unit that creates a word by combining the candidate character codes of the primary segments and/or secondary segments obtained by the character recognition; a word matching unit that matches the created word against a word dictionary; and a result selection unit that determines, based on the result of the word matching, any of the primary segments and the secondary segments as a character cutout area for one character, wherein the word creation unit is provided with a character number rule checking unit that regulates the number of characters constituting the word.

8. The character segmentation device according to claim 7, wherein the segment integration unit integrates, according to a predetermined rule, m primary segments that are consecutive in the X direction, where the X direction is the direction in which characters are arranged, and the word creation unit comprises: (A) a candidate path creation unit that includes the following means (1) to (8) and creates candidate paths each represented by a concatenation of the primary segments and/or the secondary segments; and (B) a character number rule checking unit that operates when the third means included in the candidate path creation unit determines that a segment exists, and regulates the number of characters by determining whether the number of segments of the candidate path would not exceed a specified number if the segment Sk+1 were added to the candidate path (where m is an integer of 2 or more).
(1) First means for determining, where the coordinates that partition the m primary segments in the X direction are taken as cutout candidate positions C0 to Cm, whether the cutout candidate position of interest Ci (i = 0 to m) is Cm.
(2) Second means that operates when the first means determines that Ci = Cm, and records the current candidate path in a candidate path memory.
(3) Third means that operates when the first means determines that Ci ≠ Cm, and determines whether a segment Sk+1 sandwiched between the cutout candidate position Ci and the cutout candidate position Cj (j = i + 1) exists.
(4) Fourth means that operates when the character number rule checking unit determines that the number is within the specified number, and adds the segment Sk+1 to the candidate path.
(5) Fifth means that operates following the fourth means, regards the cutout candidate position Cj as the cutout candidate position of interest Ci, and starts the operation of the first means.
(6) Sixth means for deleting, from a candidate path created as a result of the operation of the fourth and fifth means, the segment most recently added to that candidate path.
(7) Seventh means that operates when the third means determines no, when the character number rule checking unit determines that the specified number is exceeded, or after the sixth means has operated, changes j specifying the cutout candidate position to j = j + 1, and determines whether the changed j satisfies j > m.
(8) Eighth means that operates when the seventh means determines that j ≤ m, and causes the third means to operate.

9. The character segmentation device according to claim 7, wherein the segment integration unit comprises: means for obtaining, where the direction in which characters are arranged is the X direction, the height H of the tallest segment among m primary segments that are consecutive in the X direction; and means for integrating, with a primary segment of interest, other primary segments existing within a coordinate range of H × N in the X direction from that primary segment of interest (where N is a predetermined value).