JPH10240867A

JPH10240867A - Method and device for character segmentation

Info

Publication number: JPH10240867A
Application number: JP9043945A
Authority: JP
Inventors: Hiroshi Sasaki (佐々木 寛); Hirohisa Goto (後藤 裕久)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1997-02-27
Filing date: 1997-02-27
Publication date: 1998-09-11

Abstract

PROBLEM TO BE SOLVED: To determine the character segmentation area for each character by matching generated words against a word dictionary, and to select the word that fits the source image even when more than one of the generated words is found in the dictionary. SOLUTION: A word generation part 25 generates words by combining the candidate character codes of the primary segments and secondary segments obtained by a character recognition part 21. A word matching part 27 matches each generated word against the word dictionary. A result selection part 33 compares the word-level evaluation values of the generated words and selects the preferred word as the word used for character segmentation. A result output part 35 outputs the segmentation candidate positions of the characters constituting the selected word to a control part 11. The control part 11 directs character segmentation according to this result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a character segmentation method used in character recognition, and to a character segmentation device suitable for carrying out the method.

[0002]

2. Description of the Related Art

Handwritten character strings vary more in character spacing and character shape than printed character strings. If characters are cut out of the image data of a handwritten string at fixed intervals, they therefore cannot be segmented accurately, which causes misrecognition. One conventional technique for solving this problem is the character segmentation method disclosed in, for example, Japanese Patent Application Laid-Open No. H5-35917.

[0003] In this prior art, character blocks are first cut out of a line image. A character block is a connected region of black bits. In this specification, a character block cut out of a line image is also called a primary segment. There is no guarantee that a character block is a single character: a character block may form a character pattern by itself, or it may be only part of a character pattern. Next, character blocks (primary segments) are integrated, and each pattern that can be regarded as having the size of one character is generated as a character pattern (also called a secondary segment). A primary segment may itself be treated as a character pattern. Character recognition is then performed on each character pattern. If a character pattern shares no character block with any other character pattern (that is, there is no overlap of character blocks), that character pattern is cut out as it is. If character blocks do overlap, words are generated by combining the first candidates of the recognition results of these character patterns with the first candidates of the recognition results of the several characters before or after them. The generated words are matched against a word dictionary.

[0004] For a generated word that matches a word registered in the word dictionary, the character patterns of that word are cut out. If no word matches, an evaluation value is obtained for each character pattern as it was before word generation and for each word, and the character pattern with the highest evaluation value is cut out.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

The prior art, however, says nothing about what to do when words generated by combining character pattern recognition results are matched against the word dictionary and more than one of them is found in the dictionary. This leads to the problem described below.

[0006] Consider, for example, an original image that can be segmented in more than one way. Specifically, suppose that the first candidate of the recognition result is 「矢」 for a character pattern A, 「知」 for a character pattern B, 「田」 for a character pattern C, and 「口」 for a character pattern D. Suppose further that character patterns A and B overlap on the image, and that character patterns C and D also overlap on the image. Then the candidate combination "A → C" yields the word 「矢田」, and "B → D" yields the word 「知口」. Assume that both 「矢田」 and 「知口」 are registered in the word dictionary. Because the prior art does not describe which of these competing words to select, whichever word is matched first automatically becomes the final result. The subsequent character segmentation is performed in units of the character patterns of the word taken as the final result: (1) if 「矢田」 is the final result, 「矢」 and 「田」 are each cut out; (2) if 「知口」 is the final result, 「知」 and 「口」 are each cut out. As a comparison of (1) and (2) shows, the way the characters are cut out differs greatly depending on which word is selected. The prior art can therefore suffer reduced segmentation accuracy and recognition accuracy.

[0007] What is desired, therefore, is a character segmentation method that creates words by combining the candidate character codes of primary segments and secondary segments, matches the created words against a word dictionary, and determines the character segmentation area for one character in the original image from the result of the word matching, and that can select the word that fits the original image even when more than one of the created words exists in the word dictionary. A character recognition device suitable for carrying out the method is also desired.

[0008]

MEANS FOR SOLVING THE PROBLEMS

The character segmentation method of this application includes: a process of extracting primary segments, each a connected region of black bits, from original image data that is stored in a memory and includes image data of a character string; a process of integrating the extracted primary segments according to a predetermined rule to create secondary segments; a process of performing character recognition on each primary segment and each secondary segment; a process of creating words by combining the candidate character codes obtained by the character recognition for the primary segments and secondary segments; a process of matching the created words against a word dictionary; and a process of determining the character segmentation area for one character from the result of the word matching. The method is characterized by further including the following processes.

[0009] A process of obtaining, for each primary segment and each secondary segment, a shape feature evaluation value based on the shape features of that segment.

[0010] A process of obtaining, when the character recognition is performed, up to K recognition result candidates for each segment, and obtaining for each segment a recognition evaluation value expressed as the proportion of character types among those recognition result candidates. Here, K is a predetermined positive integer.

[0011] A process of obtaining, for each created word, a character-level evaluation value calculated from the shape feature evaluation value, the recognition evaluation value, and the number of segments of the constituent segment of each character in the word (the number of segments is 1 when the constituent segment is a primary segment, and the number of integrated segments when it is a secondary segment).

[0012] A process of obtaining, when a created word is matched against the word dictionary, a word-level evaluation value that indicates whether the word exists in the word dictionary.

[0013] As the process of determining the character segmentation area, a process of comparing the word-level evaluation values of the words and selecting the superior word if the comparison decides one; if more than one word has the same word-level evaluation value, comparing the character-level evaluation values of those words to select one word; and taking each segment that constitutes the selected word as a character segmentation area.
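The two-stage selection in [0012] and [0013] can be sketched as follows. This is only an illustration under stated assumptions: the `WordCandidate` class, the 1.0/0.0 encoding of the word-level evaluation value, and the rule that a higher character-level evaluation value wins the tie-break are all hypothetical choices, since the text above does not fix them.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class WordCandidate:
    """Hypothetical container for one created word."""
    text: str
    char_level: float        # character-level evaluation value, assumed precomputed
    word_level: float = 0.0  # filled in by select_word


def word_level_evaluation(word: str, dictionary: Set[str]) -> float:
    # Assumption: 1.0 if the word is registered in the dictionary, else 0.0.
    return 1.0 if word in dictionary else 0.0


def select_word(candidates: List[WordCandidate], dictionary: Set[str]) -> WordCandidate:
    """Pick the word whose segments become the character segmentation areas."""
    if not candidates:
        raise ValueError("no candidate words")
    for c in candidates:
        c.word_level = word_level_evaluation(c.text, dictionary)
    # Stage 1: compare word-level evaluation values.
    best = max(c.word_level for c in candidates)
    tied = [c for c in candidates if c.word_level == best]
    # Stage 2: break ties with the character-level evaluation value
    # (assumption: higher is better).
    return max(tied, key=lambda c: c.char_level)
```

With the competing words from [0006] and made-up character-level values, `select_word([WordCandidate("矢田", 0.62), WordCandidate("知口", 0.81)], {"矢田", "知口"})` returns the 「知口」 candidate, because both words are in the dictionary and the tie is broken by the character-level value.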

[0014] According to this character segmentation method, a shape feature evaluation value and a recognition evaluation value are first obtained for each segment. Words are created by combining the recognition results of the segments. A character-level evaluation value is then obtained for each created word from the shape feature evaluation values, the recognition evaluation values, and the numbers of segments. Here, the shape feature evaluation value indicates whether a segment has a shape that is plausible for a single character. The recognition evaluation value indicates how strongly a segment belongs to a particular character type (for example, one of four types: symbols and digits, katakana, hiragana, and kanji). The character-level evaluation value therefore reflects both of these values. In other words, it takes into account (1) whether each character constituting the created word is plausible as a single character, and (2) whether each character is composed of segments of compatible character types (that is, it excludes characters that are unlikely to exist, such as a character that combines a segment that looks like hiragana with a segment that looks like kanji). Consequently, when the character-level evaluation value is used, a word containing a character whose shape is implausible for a single character, or a character formed from a combination of segments that is normally inconceivable given their character types, is less likely to be selected as the word that determines the character segmentation areas. With the present invention, even when more than one of the created words exists in the word dictionary, it is therefore easier to select from among them the word that fits the original image. Because the character segmentation areas can be determined from a word that fits the original image, segmentation accuracy and recognition accuracy can be improved.
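As an illustration of a recognition evaluation value "expressed as the proportion of character types", the sketch below classifies up to K candidates into the four character types named above and returns the dominant type with its share. The Unicode ranges used for classification and the choice to report the share of the dominant type are assumptions made for this example, not details given in the text.

```python
from collections import Counter
from typing import List, Optional, Tuple


def char_type(ch: str) -> str:
    """Classify one character into one of four types (assumed Unicode ranges)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "symbol_or_digit"


def recognition_evaluation(candidates: List[str], k: int) -> Tuple[Optional[str], float]:
    """Return (dominant character type, its share) among the top-k candidates."""
    top = candidates[:k]
    if not top:
        return None, 0.0
    counts = Counter(char_type(ch) for ch in top)
    dominant, n = counts.most_common(1)[0]
    return dominant, n / len(top)
```

For a segment whose top four candidates are three kanji and one katakana, the function reports kanji with a share of 0.75.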

[0015] In carrying out this character segmentation method, it is preferable to use the aspect ratio of the segment as the shape feature evaluation value. The shape corresponding to the bounding box of the character is then taken into account as the shape feature of the segment. Because a plausible single character is generally close to square, the aspect ratio of a segment gives a relatively easy way to judge how plausible the segment is as a single character.
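A small sketch of an aspect-ratio-based shape feature follows. The text only states that the aspect ratio is used; the `squareness` score, which maps a perfectly square bounding box to 1.0, is a hypothetical normalization added for illustration.

```python
def aspect_ratio(width: float, height: float) -> float:
    """Width divided by height of a segment's bounding box."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width / height


def squareness(width: float, height: float) -> float:
    """Assumed score in (0, 1]: 1.0 for a square box, smaller for elongated boxes."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return min(width, height) / max(width, height)
```

Using the shorter side over the longer side makes the score independent of whether a segment is too wide or too tall, which suits the observation that single characters tend to be close to square.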

[0016] In carrying out this character segmentation method, it is further preferable that, taking the direction in which the characters are arranged as the X direction, the secondary segments be created by integrating, according to the predetermined rule, m primary segments that are consecutive in the X direction, and that the words be created from candidate paths, each expressed as a concatenation of primary segments and/or secondary segments, that are created by processing including processes (a) to (d) below. Here, m is an integer of 2 or more.

[0017] (a) A process of extracting, where the coordinates that divide the m primary segments in the X direction are segmentation candidate positions Ci (i = 0 to m), every segment among the m primary segments and the created secondary segments whose start point is segmentation candidate position C0.

[0018] (b) A process of extracting, for each segment extracted in process (a), from among the m primary segments and the created secondary segments, another segment that can be connected to it because its start position is the segmentation candidate position Cj (j = 1 to m) at the end of that segment, and further segments that can in turn be connected through the same relationship of segmentation candidate positions, until a segment whose end-side segmentation candidate position is Cm appears.

[0019] (c) A process of determining, each time another segment is extracted in process (b), whether the number of segments in the candidate path formed up to that segment is within a specified number.

[0020] (d) A process of taking, as the candidate paths for word creation, the candidate paths whose number of segments is within the specified number and whose last segment has Cm as its end-side segmentation candidate position.

[0021] According to this preferred example, all candidate paths are extracted from the segmentation region defined by the m primary segments, each candidate path being a sequence of segments that starts at segmentation candidate position C0, ends at segmentation candidate position Cm, and contains no more than the specified number of segments. Because the recognition result (candidate character codes) of each segment in an extracted candidate path is already known from the recognition process, each extracted candidate path yields words of no more than the specified number of characters.
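Processes (a) to (d) can be sketched as a path search over segments described by their start and end candidate-position indices. The representation (a dictionary from segment name to an index pair) and the example layout are assumptions for illustration; in particular, the example treats a secondary segment S3 as spanning C0 to C2, which this section does not state.

```python
from typing import Dict, List, Tuple


def candidate_paths(segments: Dict[str, Tuple[int, int]], m: int,
                    max_segments: int) -> List[List[str]]:
    """Enumerate segment sequences running from C0 to Cm.

    segments maps a segment name to (start index, end index) of its
    segmentation candidate positions; every segment must have end > start.
    """
    by_start: Dict[int, List[Tuple[str, int]]] = {}
    for name, (start, end) in segments.items():
        by_start.setdefault(start, []).append((name, end))

    complete: List[List[str]] = []
    # (a) seed the search with every segment that starts at C0
    frontier = [([name], end) for name, end in by_start.get(0, [])]
    while frontier:
        path, end = frontier.pop()
        if len(path) > max_segments:   # (c) enforce the specified number
            continue
        if end == m:                   # (d) keep paths that end at Cm
            complete.append(path)
            continue
        for name, next_end in by_start.get(end, []):  # (b) connect onward
            frontier.append((path + [name], next_end))
    return sorted(complete)
```

With three primary segments S0 = (0, 1), S1 = (1, 2), S2 = (2, 3) and the assumed secondary segment S3 = (0, 2), a limit of three segments yields the paths S0-S1-S2 and S3-S2, while a limit of two leaves only S3-S2.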

[0022] Specifically, the candidate paths in the preferred example above are preferably created by processing that includes processes (1) to (9) below.

[0023] (1) A first process of determining, where the coordinates that divide the m primary segments in the X direction are segmentation candidate positions C0 to Cm, whether the segmentation candidate position of interest Ci (i = 0 to m) is Cm.

[0024] (2) A second process, executed when the first process determines that Ci = Cm, of recording the current candidate path in a candidate path memory.

[0025] (3) A third process, executed when the first process determines that Ci ≠ Cm, of determining whether a segment Sk+1 bounded by segmentation candidate position Ci and segmentation candidate position Cj (j = i + 1) exists.

[0026] (4) A fourth process, executed when the third process determines that the segment exists, of determining whether adding segment Sk+1 to the candidate path would keep the number of segments in the candidate path within the specified number.

[0027] (5) A fifth process, executed when the fourth process determines that the number is within the specified number, of adding segment Sk+1 to the candidate path.

[0028] (6) A sixth process, executed after the fifth process, of treating segmentation candidate position Cj as the segmentation candidate position of interest Ci and re-executing from the first process.

[0029] (7) A seventh process of deleting, from a candidate path created by executing the fifth and sixth processes, the segment most recently added to that candidate path.

[0030] (8) An eighth process, executed when the third process or the fourth process gives a negative determination or when the seventh process has been executed, of changing j, which defines the segmentation candidate position, to j = j + 1 and determining whether the changed j satisfies j > m.

[0031] (9) A ninth process, executed when the eighth process determines that j ≤ m, of re-executing from the third process.

[0032] Through processes (1) to (9), all candidate paths are extracted from the segmentation region defined by the m primary segments, each candidate path being a sequence of segments that starts at segmentation candidate position C0, ends at segmentation candidate position Cm, and contains no more than the specified number of segments.
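Processes (1) to (9) read naturally as recursive backtracking over a single shared candidate path. The sketch below follows that reading; the lookup table `segment_at`, which assumes at most one segment per pair of candidate positions, and the example layout (a secondary segment S3 spanning C0 to C2) are hypothetical.

```python
from typing import Dict, List, Tuple


def enumerate_paths(segment_at: Dict[Tuple[int, int], str], m: int,
                    max_segments: int) -> List[List[str]]:
    """Backtracking search mirroring processes (1) to (9)."""
    recorded: List[List[str]] = []   # plays the role of the candidate path memory
    path: List[str] = []             # the current candidate path

    def visit(i: int) -> None:
        if i == m:                              # (1) is Ci equal to Cm?
            recorded.append(list(path))         # (2) record the current path
            return
        for j in range(i + 1, m + 1):           # (8), (9): advance j while j <= m
            segment = segment_at.get((i, j))    # (3) segment between Ci and Cj?
            if segment is None:
                continue
            if len(path) + 1 > max_segments:    # (4) would the limit be exceeded?
                continue
            path.append(segment)                # (5) add the segment
            visit(j)                            # (6) Cj becomes the new Ci
            path.pop()                          # (7) remove the latest segment

    visit(0)
    return recorded
```

Because the path is extended before each recursive call and restored afterwards, one list is enough to visit every sequence from C0 to Cm without copying partial paths.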

[0033] In carrying out this character segmentation method, it is also preferable to use the following as the predetermined rule for integrating primary segments. Taking the direction in which the characters are arranged as the X direction, the height H of the tallest of the m primary segments that are consecutive in the X direction is obtained, and any other primary segment lying within a coordinate range of H × N in the X direction relative to the primary segment of interest is integrated into that primary segment (where N is a predetermined value).

[0034] The reason is that the size and spacing of handwritten characters normally vary with the writer and the writer's circumstances, so the line height normally varies as well. Taking the segment height into account therefore accounts for this variation in line height.

[0035] As a result, secondary segments can be created under conditions that reflect the character size and other properties that vary with the writer and the writer's circumstances, so appropriate secondary segments can be created. The value of N is best determined empirically (statistically). According to the inventors' research, setting N to 1.2 yields favorable integration.
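One possible reading of the H × N rule is sketched below. The text does not say where the H × N range is measured from, so this example assumes it starts at the left edge (Xs) of the primary segment of interest and that a following segment qualifies only if its right edge (Xe) falls inside the range; inclusive pixel coordinates and the sample boxes are also assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (Xs, Xe, Ys, Ye), inclusive pixel coordinates


def integrate(primaries: List[Box], n: float = 1.2) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs of consecutive primaries to merge.

    primaries must be sorted by Xs. Each returned pair describes one
    secondary segment made of primaries first..last.
    """
    if not primaries:
        return []
    # H: height of the tallest primary segment
    h = max(ye - ys + 1 for (_, _, ys, ye) in primaries)
    limit = h * n
    secondary: List[Tuple[int, int]] = []
    for i, (xs, _, _, _) in enumerate(primaries):
        for j in range(i + 1, len(primaries)):
            # Assumed test: the merged width measured from Xs must fit in H x N.
            if primaries[j][1] - xs + 1 > limit:
                break
            secondary.append((i, j))
    return secondary
```

With hypothetical boxes `[(0, 30, 0, 83), (37, 95, 2, 84), (120, 200, 10, 80)]`, H is 84 and the range is 100.8 pixels, so only the first two primaries are merged.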

[0036] To carry out the character segmentation method described above, the character segmentation device is preferably configured as follows.

[0037] The character segmentation device includes: a segment extraction unit that extracts primary segments, each a connected region of black bits, from input image data that is stored in a memory and includes image data of a character string; a segment integration unit that integrates the extracted primary segments according to a predetermined rule to create secondary segments; a character recognition unit that performs character recognition on each primary segment and each secondary segment; a word creation unit that creates words by combining the candidate character codes obtained by the character recognition for the primary segments and/or secondary segments; a word matching unit that matches the created words against a word dictionary; and a result selection unit that determines the character segmentation area for one character from the result of the word matching. In this device, the character recognition unit is configured to obtain up to K recognition result candidates for each segment, and the word matching unit is configured to obtain a word-level evaluation value that indicates whether a created word exists in the word dictionary.

[0038] The device further includes: a segment shape evaluation value calculation unit that obtains, for each primary segment and each secondary segment, a shape feature evaluation value based on the shape features of that segment; a segment recognition evaluation value calculation unit that obtains, for each segment, a recognition evaluation value expressed as the proportion of character types among the recognition result candidates produced for that segment by the character recognition unit; and a character-level evaluation value calculation unit that obtains, for each created word, a character-level evaluation value calculated from the shape feature evaluation value, the recognition evaluation value, and the number of segments of the constituent segment of each character in the word (the number of segments is 1 when the constituent segment is a primary segment, and the number of integrated segments when it is a secondary segment). In addition, the result selection unit is configured to compare the word-level evaluation values of the words and select the superior word if the comparison decides one; if more than one word has the same word-level evaluation value, to compare the character-level evaluation values of those words to select one word; and to determine each segment that constitutes the selected word as a character segmentation area. Here, K is a predetermined positive integer.

[0039]

EMBODIMENTS OF THE INVENTION

Embodiments of the character segmentation method and character segmentation device of this invention are described below with reference to the drawings. The drawings used in the description are, however, only schematic, drawn to the extent needed for the invention to be understood.

[0040] FIG. 1 shows the configuration of the character segmentation device of the embodiment. This character segmentation device 10 includes a control unit 11, an image input unit 13, a segment extraction unit 15, a segment integration unit 17, a segment shape evaluation value calculation unit 19, a character recognition unit 21, a segment recognition evaluation value calculation unit 23, a word creation unit 25, a word matching unit 27, a word-level evaluation value 29, a character-level evaluation value calculation unit 31, a result selection unit 33, and a result output unit 35.

[0041] Each of these components 11 to 35 can be implemented with a computer and its peripheral devices. The configuration and operation of each component are described in turn below, and the details of the character segmentation method of this invention are described along with them.

[0042] (Control unit) The control unit 11 controls the operation of the components 13 to 35.

[0043] (Image input unit) The image input unit 13 includes a memory (not shown); it receives the original image data that is the target of character recognition and stores it in that memory. Specifically, it receives original image data expressed as a black-and-white binary image.

[0044] The image input unit 13 may have any suitable configuration. For example, it may include a scanner that photoelectrically converts the optical signal from a document and loads the original image data into the memory, or it may be another database that already holds the original image data. A binary image may of course also be obtained from a multi-valued image.

[0045] (Segment extraction unit) The segment extraction unit 15 extracts primary segments, each a connected region of black bits, from the original image data stored in the memory of the image input unit 13, in such a way that the coordinates of each segment are known.

[0046] The segment extraction performed by the segment extraction unit 15 can easily be carried out by the well-known method that uses the horizontal and vertical projection distributions of black bits (also called black points).

[0047] Specifically, the memory holding the original image data is scanned in the horizontal direction and a histogram of black points is obtained. Each local minimum of this histogram is taken as a segmentation candidate position Ci for the horizontal direction. The same processing is performed with the scanning direction changed to vertical to extract the segmentation candidate positions for the vertical direction. Each primary segment is contained in a rectangular area bounded by segmentation candidate positions.
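The projection idea in [0047] can be illustrated with a simplified sketch. It treats only columns with no black points at all as separators, which is a narrower criterion than the local minima described above, and it represents the binary image as a list of rows of 0/1 values; both are assumptions made to keep the example short.

```python
from typing import List, Tuple


def column_projection(image: List[List[int]]) -> List[int]:
    """Count black points (1s) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def ink_runs(projection: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) column ranges that contain black points."""
    runs: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                       # a run of inked columns begins
        elif count == 0 and start is not None:
            runs.append((start, x - 1))     # a blank column ends the run
            start = None
    if start is not None:
        runs.append((start, len(projection) - 1))
    return runs
```

Applying the same two functions to the transposed image gives the corresponding ranges in the vertical direction.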

【００４８】このセグメント抽出部１５の動作の理解を
深めるために、図２に、『弘三』という原画像データ４０
から抽出された１次セグメントＳ０，Ｓ１，Ｓ２と、各
１次セグメントＳ０〜Ｓ２を文字の並ぶ方向（この例で
はＸ方向）で区分けする座標すなわち切り出し候補位置
Ｃ０〜Ｃ３とをそれぞれ示した。ただし図２中のＳ３
は、２次セグメントである。これについては後に説明す
る。To aid understanding of the operation of the segment extraction unit 15, FIG. 2 shows the primary segments S0, S1, and S2 extracted from original image data 40 reading 『弘三』, together with the cutout candidate positions C0 to C3, that is, the coordinates that delimit the primary segments S0 to S2 in the direction in which the characters are arranged (the X direction in this example). Note that S3 in FIG. 2 is a secondary segment, which is described later.

【００４９】なお１次セグメントＳ０〜Ｓ２それぞれ
の、Ｘ方向開始座標Ｘｓ、Ｘ方向終了座標Ｘｅ、Ｙ方向
開始座標Ｙｓ、Ｙ方向終了座標Ｙｅそれぞれを、セグメ
ント抽出部１５は、内部のセグメント座標テーブル（図
示せず）に格納する。図３（Ａ）にＸｓ、Ｘｅ、Ｙｓお
よびＹｅの定義を示し、図３（Ｂ）に１次セグメントＳ
０〜Ｓ３についてのセグメント座標テーブルを模式的に
示した。例えば１次セグメントＳ１についてのＸｓ〜Ｙ
ｅは、Ｘｓ＝３７、Ｘｅ＝１０６、Ｙｓ＝２、Ｙｅ＝８
４であることが分かる。The segment extraction unit 15 stores the X-direction start coordinate Xs, X-direction end coordinate Xe, Y-direction start coordinate Ys, and Y-direction end coordinate Ye of each of the primary segments S0 to S2 in an internal segment coordinate table (not shown). FIG. 3(A) shows the definitions of Xs, Xe, Ys, and Ye, and FIG. 3(B) schematically shows the segment coordinate table for the primary segments S0 to S3. For example, for the primary segment S1, Xs = 37, Xe = 106, Ys = 2, and Ye = 84.

【００５０】また、後に候補パスを作成する際に必要な
Ｘ方向についての切り出し候補位置（座標）Ｃ０〜Ｃ３
を、セグメント抽出部１５は内部の所定メモリ（図示せ
ず）に記憶する。The segment extraction unit 15 also stores the cutout candidate positions (coordinates) C0 to C3 in the X direction, which are needed later when creating candidate paths, in a predetermined internal memory (not shown).

【００５１】（セグメント統合部）セグメント統合部１
７は、セグメント抽出部１５で抽出された各１次セグメ
ントを所定規則に従い統合して２次セグメントを作成す
る。具体的には、隣接する複数の１次セグメントの形状
特徴を考慮し、統合しても１文字としての可能性がある
場合、それら１次セグメントを統合して２次セグメント
を作成する。ここでは、以下の手順で２次セグメントを
作成する。(Segment Integration Unit) The segment integration unit 17 creates secondary segments by integrating the primary segments extracted by the segment extraction unit 15 according to a predetermined rule. Specifically, it considers the shape features of adjacent primary segments and, when they could still form a single character after being combined, integrates them into a secondary segment. Here, secondary segments are created by the following procedure.

【００５２】先ず、該Ｘ方向に連続しているｍ個の１次
セグメントのうちの高さが最高のセグメントの当該高さ
Ｈを求める。この高さＨは、各１次セグメントについて
のＹｓ座標とＹｅ座標との差を求めることで求まる。図
２の例の３個の１次セグメントの例で考えると、セグメ
ントＳ０のＹ座標差が８９−１＝８８であり、他のセグ
メントＳ１，Ｓ２のどれよりも、高さが高い。したがっ
て、図２の例の場合は、高さが最高のセグメントは、セ
グメントＳ０となる。First, the height H of the segment having the highest height among the m primary segments continuous in the X direction is obtained. The height H is obtained by calculating the difference between the Ys coordinate and the Ye coordinate for each primary segment. Considering the example of the three primary segments in the example of FIG. 2, the Y coordinate difference of the segment S0 is 89-1 = 88, which is higher than any of the other segments S1 and S2. Therefore, in the example of FIG. 2, the segment having the highest height is the segment S0.

【００５３】次に、着目する１次セグメントに対しＸ方
向でＨ×Ｎの座標範囲に存在する他の１次セグメントを
該着目する１次セグメントに統合して、２次セグメント
を作成する。Next, a secondary segment is created by integrating another primary segment present in the H × N coordinate range in the X direction with respect to the primary segment of interest into the primary segment of interest.

【００５４】この２次セグメント作成処理について、図
２に示した１次セグメントの説明図と、図３に示したセ
グメント座標テーブルと、図４に示したセグメント統合
処理の流れ図とを参照して、より具体的に説明する。This secondary segment creation processing will now be described more specifically with reference to the explanatory diagram of the primary segments shown in FIG. 2, the segment coordinate table shown in FIG. 3, and the flowchart of the segment integration processing shown in FIG. 4.

【００５５】先ず、全入力セグメントそれぞれを始点と
したループ１の処理を開始する。そこで、着目セグメン
ト（図４ではセグメントＡと記す）として、先ず１次セ
グメントＳ０を始点としたループ１の処理を開始する
（図４のステップ４１〜４７）。First, the processing of the loop 1 starting from each of all the input segments is started. Therefore, as a target segment (referred to as segment A in FIG. 4), the processing of loop 1 starting from the primary segment S0 is first started (steps 41 to 47 in FIG. 4).

【００５６】すなわち、１次セグメントＳ０と、その右
に並ぶセグメントＢとしての１次セグメントＳ１との、
文字の並ぶ方向（ここではＸ方向）についての距離Ｄを
求める（図４のステップ４２，４３）。この距離Ｄは各
セグメントＳ０，Ｓ１それぞれの例えばＸｓ座標同士の
差により求まる。するとこの例では距離Ｄ＝３７−１＝
３６ということになる。That is, the distance D in the direction in which the characters are arranged (here, the X direction) is obtained between the primary segment S0 and the primary segment S1, which is the segment B located to its right (steps 42 and 43 in FIG. 4). This distance D is obtained from, for example, the difference between the Xs coordinates of the segments S0 and S1. In this example, D = 37 − 1 = 36.

【００５７】次に、距離ＤがＨ×Ｎの範囲か否かを判定
する（図４のステップ４４）。ここで、Ｎは予め定めた
値である。ここではＮ＝１．２とする。また、Ｈは上述
したようにここでは８８である。したがって、この場
合、Ｄ≦１．２×８８＝１０５．６を満たすか否かを判
定する。Next, it is determined whether or not the distance D is in the range of H × N (step 44 in FIG. 4). Here, N is a predetermined value. Here, N = 1.2. H is 88 here as described above. Therefore, in this case, it is determined whether or not D ≦ 1.2 × 88 = 105.6 is satisfied.

【００５８】図２の例の場合は１次セグメントＳ０と１
次セグメントＳ１との距離Ｄ＝３６は、Ｄ≦１．２×８
８の条件を満たすので、１次セグメントＳ１は１次セグ
メントＳ０に統合され、２次セグメントＳ３が作成され
る（図４のステップ４５、図２参照）。In the example of FIG. 2, the distance D = 36 between the primary segment S0 and the primary segment S1 satisfies the condition D ≦ 1.2 × 88, so the primary segment S1 is integrated with the primary segment S0 to create the secondary segment S3 (step 45 in FIG. 4; see FIG. 2).

【００５９】次に、ループ２が再実行されるので（図４
のステップ４６，４２）、今度は、１次セグメントＳ０
と１次セグメントＳ２との距離Ｄを求める（図４のステ
ップ４３）。この距離Ｄは１７３−１＝１７２である。Next, loop 2 is executed again (steps 46 and 42 in FIG. 4), and this time the distance D between the primary segment S0 and the primary segment S2 is obtained (step 43 in FIG. 4). This distance is D = 173 − 1 = 172.

【００６０】次に、この距離ＤがＨ×Ｎの範囲か否かを
判定する（図４のステップ４４）。この場合、１次セグ
メントＳ０に対し１次セグメントＳ２は、Ｄ≦１．２×
８８の条件を満たさないので、統合されない。Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). Here, the primary segment S2 does not satisfy the condition D ≦ 1.2 × 88 with respect to the primary segment S0 (172 > 105.6), so the two are not integrated.

【００６１】次に着目する１次セグメントを１次セグメ
ントＳ１に変更してループ１の処理が開始される（図４
のステップ４１，４２）。そこで、１次セグメントＳ１
と１次セグメントＳ２との距離Ｄを求める（図４のステ
ップ４３）。この距離Ｄは１７３−３７＝１３６であ
る。Next, the primary segment of interest is changed to the primary segment S1, and the processing of loop 1 is started again (steps 41 and 42 in FIG. 4). The distance D between the primary segment S1 and the primary segment S2 is then obtained (step 43 in FIG. 4). This distance is D = 173 − 37 = 136.

【００６２】次に、この距離ＤがＨ×Ｎの範囲か否かを
判定する（図４のステップ４４）。この場合、１次セグ
メントＳ１に対し１次セグメントＳ２は、Ｄ≦１．２×
８８の条件を満たさないので、統合されない。Next, it is determined whether this distance D is within the range H × N (step 44 in FIG. 4). Here, the primary segment S2 does not satisfy the condition D ≦ 1.2 × 88 with respect to the primary segment S1 (136 > 105.6), so the two are not integrated.

【００６３】上記の手順で２次セグメントが作成され
る。これら作成した２次セグメントそれぞれの、Ｘ方向
開始座標Ｘｓ、Ｘ方向終了座標Ｘｅ、Ｙ方向開始座標Ｙ
ｓ、Ｙ方向終了座標Ｙｅそれぞれを、セグメント統合部
１７は、前記のセグメント座標テーブルに追加格納する
（図４のステップ４８）。Secondary segments are created by the above procedure. The segment integration unit 17 additionally stores the X-direction start coordinate Xs, X-direction end coordinate Xe, Y-direction start coordinate Ys, and Y-direction end coordinate Ye of each created secondary segment in the segment coordinate table described above (step 48 in FIG. 4).
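The integration rule walked through above (merge segment B into segment A when D ≦ H × N) might be sketched as follows. This is only an illustration: the `Segment` class and `integrate` function are hypothetical names, the input list is assumed to be ordered left to right, and the coordinates the text does not state (Xe of S0, and Xe, Ys, Ye of S2) are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    xs: int  # X-direction start coordinate
    xe: int  # X-direction end coordinate
    ys: int  # Y-direction start coordinate
    ye: int  # Y-direction end coordinate


def integrate(primaries, n=1.2):
    """Create a secondary segment for each pair (A, B) with D <= H * N."""
    h = max(s.ye - s.ys for s in primaries)   # height H of the tallest segment
    secondaries = []
    for a_idx, a in enumerate(primaries):     # loop 1: segment A
        for b in primaries[a_idx + 1:]:       # loop 2: segment B, right of A
            d = b.xs - a.xs                   # distance D between Xs values
            if d <= h * n:
                # The secondary segment is the bounding box of A and B.
                secondaries.append(Segment(
                    min(a.xs, b.xs), max(a.xe, b.xe),
                    min(a.ys, b.ys), max(a.ye, b.ye)))
    return secondaries


# S1, the Xs/Ys/Ye of S0, and the Xs of S2 follow the text; the rest is assumed.
s0 = Segment(xs=1, xe=30, ys=1, ye=89)
s1 = Segment(xs=37, xe=106, ys=2, ye=84)
s2 = Segment(xs=173, xe=250, ys=20, ye=70)
secondary = integrate([s0, s1, s2])
print(secondary)
```

With these values H = 88 and H × N = 105.6, so only the pair S0 and S1 (D = 36) is merged, which corresponds to the secondary segment S3 in the text.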

【００６４】図５に、１次セグメントＳ０〜Ｓ２および
２次セグメントＳ３についてのセグメント座標テーブル
を模式的に示した。FIG. 5 schematically shows a segment coordinate table for the primary segments S0 to S2 and the secondary segment S3.

【００６５】また、上記の１次セグメントＳ０〜Ｓ２と
２次セグメントＳ３とに関して、ある切り出し候補位置
と他の切り出し候補位置との間にいかなるセグメントが
挟まれているかを整理したテーブル（これを「セグメン
トテーブル」という）を作成すると、図６のようにな
る。Furthermore, organizing which segment lies between each pair of cutout candidate positions, for the primary segments S0 to S2 and the secondary segment S3, yields the table shown in FIG. 6 (referred to as the "segment table").

【００６６】このセグメントテーブルはグラフ理論でい
う隣接行列である。すなわち、開始切り出し点を行と
し、終了切り出し点を列とした隣接行列を考えると、そ
の要素にセグメント番号（ここではＳ０〜Ｓ３のいずれ
か）を与えることにより作成できるテーブルである。た
だし、図６において空白は、挟まれるセグメントが無い
ことを示している。This segment table is an adjacency matrix in graph theory. That is, considering an adjacency matrix in which the start cutout point is a row and the end cutout point is a column, the table can be created by giving a segment number (one of S0 to S3 in this case) to the element. However, a blank in FIG. 6 indicates that there is no segment to be sandwiched.

【００６７】この図６から、１次セグメントＳ０は、切
り出し候補位置Ｃ０と切り出し候補位置Ｃ１とに挟まれ
るセグメントであること、・・・、２次セグメントＳ３
は、切り出し候補位置Ｃ０と切り出し候補位置Ｃ２とに
挟まれるセグメントであること等が分かる。FIG. 6 shows, for example, that the primary segment S0 is the segment lying between the cutout candidate positions C0 and C1, and that the secondary segment S3 is the segment lying between the cutout candidate positions C0 and C2.
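As a rough illustration of the segment table as an adjacency matrix, the sketch below builds a matrix whose rows are start positions and whose columns are end positions. The function name is hypothetical. The spans of S0 and S3 are taken from the text; the spans of S1 and S2 are assumed from the left-to-right order of the segments, since FIG. 6 itself is not reproduced here.

```python
def build_segment_table(num_positions, spans):
    """Rows are start positions Ci, columns are end positions Cj."""
    table = [[None] * num_positions for _ in range(num_positions)]
    for name, (start, end) in spans.items():
        table[start][end] = name
    return table


# (start, end) indices of the cutout candidate positions C0 to C3.
spans = {"S0": (0, 1), "S1": (1, 2), "S2": (2, 3), "S3": (0, 2)}
table = build_segment_table(4, spans)
for row in table:
    # Empty cells mean that no segment lies between the two positions.
    print(["--" if cell is None else cell for cell in row])
```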

【００６８】（セグメント形状評価値計算部）セグメン
ト形状評価値計算部１９は、各１次セグメントおよび前
記統合により作成される各２次セグメントそれぞれの形
状特徴に基づく評価値（これを形状特徴評価値という）
を求める。このセグメント形状評価値計算部１９を、例
えば、セグメントの縦横比を形状評価値として算出する
計算部とする。より具体的には、Ｘ方向に沿うセグメン
ト寸法をΔＸ、Ｙ方向に沿うセグメント寸法をΔＹとし
た場合（例えば図３（Ａ）参照）、下記（１）式により
形状評価値を求める。(Segment Shape Evaluation Value Calculation Unit) The segment shape evaluation value calculation unit 19 obtains, for each primary segment and each secondary segment created by the integration, an evaluation value based on its shape features (referred to as the shape feature evaluation value). The segment shape evaluation value calculation unit 19 is, for example, a calculation unit that computes the aspect ratio of a segment as the shape evaluation value. More specifically, when the segment dimension along the X direction is ΔX and the segment dimension along the Y direction is ΔY (see, for example, FIG. 3(A)), the shape evaluation value is obtained by the following equation (1).

【００６９】形状評価値＝ｍｉｎ（ΔＸ，ΔＹ）／ｍａｘ（ΔＸ，ΔＹ）（１）ここで、ΔＸ＝Ｘｅ−Ｘｓ、ΔＹ＝Ｙｅ−Ｙｓである。
また、ｍｉｎ（ΔＸ，ΔＹ）は、ΔＸおよびΔＹのうち
の小さい方の値をとり、ｍａｘ（ΔＸ，ΔＹ）は、ΔＸ
およびΔＹのうちの大きい方の値をとる意味である。Shape evaluation value = min(ΔX, ΔY) / max(ΔX, ΔY) ... (1)
Here, ΔX = Xe − Xs and ΔY = Ye − Ys. In addition, min(ΔX, ΔY) denotes the smaller of ΔX and ΔY, and max(ΔX, ΔY) denotes the larger of the two.

【００７０】このようにして求まる形状評価値は、セグ
メントの縦横比を０〜１の範囲で表した評価値である。
この場合、正方形に近いセグメント程すなわち形状評価
値が１に近いセグメント程、文字らしさが高いことを意
味している。The shape evaluation value obtained in this manner is an evaluation value representing the aspect ratio of the segment in the range of 0 to 1.
In this case, a segment closer to a square, that is, a segment whose shape evaluation value is closer to 1, means that the character-likeness is higher.
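Equation (1) can be checked with a few lines of Python. The function name is hypothetical, and returning 0.0 for a degenerate segment (both dimensions zero) is an assumption not covered by the text. The sample coordinates are those given for the primary segment S1 (Xs = 37, Xe = 106, Ys = 2, Ye = 84).

```python
def shape_evaluation_value(xs, xe, ys, ye):
    """Aspect-ratio score in the range 0 to 1, per equation (1)."""
    dx = xe - xs  # segment dimension along X
    dy = ye - ys  # segment dimension along Y
    longer = max(dx, dy)
    if longer == 0:
        return 0.0  # assumption: a degenerate segment gets the lowest score
    return min(dx, dy) / longer


value_s1 = shape_evaluation_value(37, 106, 2, 84)  # dx = 69, dy = 82
print(round(value_s1, 3))
```

For S1 this gives 69 / 82, roughly 0.84, which indicates a fairly square and therefore character-like segment.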

【００７１】（文字認識部）文字認識部２１は、各１次
セグメントおよび各２次セグメントそれぞれを１文字と
仮定して文字認識を行ない、その結果である候補文字コ
ードを格納する。文字認識処理自体は、従来公知の任意
の方法により行なうことができる。(Character Recognition Unit) The character recognition unit 21 performs character recognition on the assumption that each primary segment and each secondary segment is one character, and stores a candidate character code as a result. The character recognition processing itself can be performed by any conventionally known method.

【００７２】ただし、この発明における文字認識部２１
は、各セグメントについて最大Ｋ位まで（Ｋは予め定め
た正の整数である。ここではＫ＝１０とする）の候補文
字を求める文字認識部とする。Note, however, that the character recognition unit 21 in this invention obtains, for each segment, candidate characters ranked up to at most K-th (K is a predetermined positive integer; here K = 10).

【００７３】図７（Ａ）には、１次セグメントＳ０に当
たる「弓」というセグメントを文字認識部２１が文字認
識処理し、その結果として、１位から１０位までの１０
個の候補文字が出力された例を示している。なお候補文
字は、実際は候補文字コードにより与えられる。また、
候補文字はもちろん１０個も出力されない場合もあり得
る。これら候補文字はセグメント認識評価値計算部２３
および単語作成部２５にて利用される（詳細は該当各部
で説明する。）。FIG. 7(A) shows an example in which the character recognition unit 21 performs character recognition on the segment 「弓」, which corresponds to the primary segment S0, and outputs ten candidate characters ranked from first to tenth. The candidate characters are actually given as candidate character codes. Naturally, fewer than ten candidate characters may be output in some cases. These candidate characters are used by the segment recognition evaluation value calculation unit 23 and the word creation unit 25 (details are given in the descriptions of those units).

【００７４】（セグメント認識評価値計算部）セグメン
ト認識評価値計算部２３は、各セグメントごとの最大Ｋ
位までの認識結果における文字種の割合で表される認識
評価値を求める。(Segment Recognition Evaluation Value Calculation Unit) The segment recognition evaluation value calculation unit 23 obtains, for each segment, recognition evaluation values expressed as the proportion of each character type among the recognition results ranked up to at most K-th.

【００７５】この場合のセグメント認識評価値計算部２
３は、各セグメントごとの最大Ｋ位までの認識結果中
に、(1) 記号・数字、(2) カタカナ、(3) ひらがな、
(4) 漢字の４種類の文字種がそれぞれいくつあるかを計
数し、かつ、それぞれの計数値を認識結果数Ｋで割るこ
とで、４種類の文字種ごとの認識評価値を算出する。In this case, the segment recognition evaluation value calculation unit 23 counts how many of the recognition results (up to K per segment) belong to each of four character types, namely (1) symbols and digits, (2) katakana, (3) hiragana, and (4) kanji, and divides each count by the number of recognition results K to obtain a recognition evaluation value for each of the four character types.

【００７６】このセグメント認識評価値計算部２３での
処理の理解を深めるために、図７（Ａ）に示した「弓」
についての認識結果から認識評価値計算部２３で算出さ
れる認識評価値を、図７（Ｂ）に示した。この図７
（Ｂ）の場合は、１０個の認識結果中に、記号・数字と
認識された結果が４個、カタカナと認識された結果が２
個、ひらがなと認識された結果が０個、漢字と認識され
た結果が４個含まれているので、４種類の文字種の認識
評価値は、０．４、０．２、０、０．４になっている。To aid understanding of the processing in the segment recognition evaluation value calculation unit 23, FIG. 7(B) shows the recognition evaluation values that the unit calculates from the recognition results for 「弓」 shown in FIG. 7(A). In FIG. 7(B), the ten recognition results contain four results recognized as symbols or digits, two recognized as katakana, none recognized as hiragana, and four recognized as kanji, so the recognition evaluation values for the four character types are 0.4, 0.2, 0, and 0.4.
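The counting described above can be sketched as follows. Everything here is illustrative: the function names are hypothetical, classifying characters by Unicode block is an assumed stand-in for however the recognizer labels its character types, and the ten candidate characters are invented so that the counts match the 4 / 2 / 0 / 4 split described for FIG. 7(B) (the actual candidates in FIG. 7(A) are not reproduced here).

```python
TYPES = ("symbol_or_digit", "katakana", "hiragana", "kanji")


def char_type(ch):
    """Classify one character by Unicode block (an assumed stand-in)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "symbol_or_digit"


def recognition_evaluation_values(candidates, k=10):
    """Count each character type among the top-k candidates and divide by k."""
    counts = {t: 0 for t in TYPES}
    for ch in candidates[:k]:
        counts[char_type(ch)] += 1
    return {t: counts[t] / k for t in TYPES}


# Hypothetical top-10 candidates for one segment (not taken from FIG. 7(A)).
candidates = ["弓", "3", "ヲ", "引", "5", "号", "ヨ", "&", "2", "弔"]
values = recognition_evaluation_values(candidates)
print(values)
```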

【００７７】（単語作成部）単語作成部２５は、上記文
字認識部２１により得られる１次セグメントおよび２次
セグメントそれぞれの候補文字コードを組み合わせて単
語を作成する。より具体的には、前記１次セグメントお
よびまたは２次セグメントの連接で表される候補パスと
いう形式に基づいて単語を作成する。なお、単語を作成
する際に文字数を規制した方が無用に長い単語が作成さ
れるのを防止することができるので、ここでは単語の文
字数を規制して（実際はセグメント数を規制して）以下
に説明するように単語を作成する。もちろん単語作成の
際に文字数を規制せずに単語を作成しても良い。(Word Creation Unit) The word creation unit 25 creates words by combining the candidate character codes of the primary and secondary segments obtained by the character recognition unit 21. More specifically, it creates words based on candidate paths, each of which is represented as a concatenation of primary and/or secondary segments. Because limiting the number of characters prevents needlessly long words from being created, the number of characters per word is limited here (in practice, the number of segments is limited), and words are created as described below. Of course, words may also be created without limiting the number of characters.

【００７８】この実施の形態の単語作成部２５の構成お
よび動作について、図８を参照して以下に説明する。The configuration and operation of the word generator 25 of this embodiment will be described below with reference to FIG.

【００７９】単語作成部２５は、以下の(1) 〜(9) の手
段を含み前記１次セグメントおよびまたは２次セグメン
トの連接で表される候補パスを作成する。The word creation unit 25 includes the following means (1) to (9) and creates candidate paths, each represented as a concatenation of primary and/or secondary segments.

【００８０】(1) ｍ個の１次セグメントそれぞれをＸ方
向で区分けする座標を、切り出し候補位置Ｃ０〜Ｃｍと
したとき、着目した切り出し候補位置Ｃｉ（ｉ＝０〜
ｍ）がＣｍか否かを判定する第１の手段（図８のステッ
プ６１）。(1) First means for determining, where the coordinates that delimit the m primary segments in the X direction are taken as cutout candidate positions C0 to Cm, whether or not the cutout candidate position of interest Ci (i = 0 to m) is Cm (step 61 in FIG. 8).

【００８１】(2) 前記第１の手段でＣｉ＝Ｃｍと判定さ
れた場合に動作し、現在の候補パスを候補パスメモリ
（図示せず）に記録する第２の手段（図８のステップ６
２）。(2) Second means that operates when the first means determines that Ci = Cm, and that records the current candidate path in a candidate path memory (not shown) (step 62 in FIG. 8).

【００８２】(3) 前記第１の手段でＣｉ≠Ｃｍと判定さ
れた場合に動作し、切り出し候補位置Ｃｉと切り出し候
補位置Ｃｊ（ｊ＝ｉ＋１）とに挟まれるセグメントＳｋ
＋１が存在するか否かを判定する第３の手段（図８のス
テップ６３〜６５）。(3) Third means that operates when the first means determines that Ci ≠ Cm, and that determines whether or not a segment Sk+1 lying between the cutout candidate position Ci and the cutout candidate position Cj (j = i + 1) exists (steps 63 to 65 in FIG. 8).

【００８３】(4) 前記第３の処理でセグメントＳｋ＋１
が存在すると判定された場合に動作し、候補パスＰにセ
グメントＳｋ＋１を加えた場合該候補パスのセグメント
数が規定数以内か否かを判定する第４の手段（図８のス
テップ７１）。(4) Fourth means that operates when the third means determines that the segment Sk+1 exists, and that determines whether or not the number of segments in the candidate path P would be within a specified number if the segment Sk+1 were added to it (step 71 in FIG. 8).

【００８４】(5) 第４の手段にて規定数以内と判定され
た場合に動作し、前記セグメントＳｋ＋１を前記候補パ
スに追加する第５の手段（図８のステップ６６）。(5) Fifth means (step 66 in FIG. 8) which operates when the fourth means determines that the number is within the specified number, and adds the segment Sk + 1 to the candidate path.

【００８５】(6) 前記第５の手段に続いて動作し、前記
切り出し候補位置Ｃｊを前記着目した切り出し候補位置
Ｃｉとみなして、前記第１の手段の動作を開始させる第
６の手段（図８のステップ６７）。(6) Sixth means that operates following the fifth means, and that regards the cutout candidate position Cj as the cutout candidate position of interest Ci and starts the operation of the first means (step 67 in FIG. 8).

【００８６】(7) 前記第５の手段および前記第６の手段
が動作した結果作成された候補パスについては、該候補
パスに最新に追加されたセグメントを該候補パスから削
除する第７の手段（図８のステップ６８）。(7) For a candidate path created as a result of the operation of the fifth means and the sixth means, a seventh means for deleting the segment most recently added to the candidate path from the candidate path. (Step 68 in FIG. 8).

【００８７】(8) 前記第３の手段が否と判定した場合、
前記第４の手段が規定数を越えると判定した場合、また
は前記第７の手段が動作した後に動作し、前記切り出し
候補位置を規定しているｊをｊ＝ｊ＋１に変更し、か
つ、変更したｊが前記ｍとの関係でｊ＞ｍを満たすか否
かを判定する第８の手段（図８のステップ６９，７
０）。(8) Eighth means that operates when the third means determines that no such segment exists, when the fourth means determines that the specified number would be exceeded, or after the seventh means has operated, and that changes j, which defines the cutout candidate position Cj, to j = j + 1 and determines whether or not the changed j satisfies j > m (steps 69 and 70 in FIG. 8).

【００８８】(9) 前記第８の手段がｊ≦ｍと判定した場
合に動作し、前記第３の手段を動作させる第９の手段
（図８のステップ７０，６４）。(9) A ninth means which operates when the eighth means determines that j ≦ m, and operates the third means (steps 70 and 64 in FIG. 8).

【００８９】この単語作成部２５の理解を深めるため
に、候補パス作成処理の具体例を説明する。ただし、候
補パス作成処理の原理が説明されれば良いので、ここで
はセグメントの数と切り出し候補位置の数とを少なくし
た例により説明する。すなわち、切り出し候補位置がＣ
０〜Ｃ２の３個で、かつセグメントがＳ０〜Ｓ２の３個
で、然も各切り出し候補位置Ｃ０〜Ｃ２と各セグメント
Ｓ０〜Ｓ２との関係が図９に示したようなセグメントテ
ーブルで表される関係となっている場合での、候補パス
作成処理について以下に説明する。To aid understanding of the word creation unit 25, a specific example of the candidate path creation processing is described. Since it suffices to explain the principle of the processing, the example uses a reduced number of segments and cutout candidate positions. Specifically, the following describes candidate path creation for the case where there are three cutout candidate positions C0 to C2 and three segments S0 to S2, and the relationship between the cutout candidate positions C0 to C2 and the segments S0 to S2 is given by the segment table shown in FIG. 9.

【００９０】なお、切り出し候補位置Ｃ０は文字が並ぶ
方向の最初の切り出し候補位置、また、切り出し候補位
置Ｃ２は文字が並ぶ方向の最終（最右端）の切り出し候
補位置とする。The cutout candidate position C0 is the first cutout candidate position in the direction in which the characters are arranged, and the cutout candidate position C2 is the last (rightmost) cutout candidate position in the direction in which the characters are arranged.

【００９１】先ず、候補パスメモリ（図示せず）のパス
Ｐをクリアし、関数Ｆｕｎｃｔ（Ｃｉ，Ｐ）ここでは先
ず（Ｃ０，Ｐ）についての処理を開始する（図８のステ
ップ６０）。First, the path P in the candidate path memory (not shown) is cleared, and the process for the function Funct (Ci, P), here (C0, P), is first started (step 60 in FIG. 8).

【００９２】すなわち先ず、単語作成部２５は切り出し
候補位置ＣｉここではＣ０が最右端の切り出し候補位置
か否か（すなわちＣ０＝Ｃ２か否か）を判定する（図８
のステップ６１）。なお切り出し候補位置Ｃｉは、ここ
では、制御部１１がセグメント抽出部１５から単語作成
部２５に転送する。That is, the word creation unit 25 first determines whether the cutout candidate position Ci, here C0, is the rightmost cutout candidate position (that is, whether C0 = C2) (step 61 in FIG. 8). Here, the control unit 11 transfers the cutout candidate position Ci from the segment extraction unit 15 to the word creation unit 25.

【００９３】ここでＣ０は最右端の切り出し候補位置で
はないので、ステップ６３の処理に移る。すなわちｊ＝
ｉ＋１＝０＋１＝１の処理が行なわれる。その結果Ｃｊ
はＣ１になる。Since C0 is not the rightmost cutout candidate position, the process proceeds to step 63. That is, j = i + 1 = 0 + 1 = 1 is computed, and as a result Cj becomes C1.

【００９４】切り出し候補位置Ｃ０と切り出し候補位置
Ｃ１とに挟まれるセグメントＳｋ＋１を、セグメント抽
出部１５またはセグメント統合部１７から、制御部１１
は単語作成部２５に転送する。この図９の例の場合はセ
グメントＳ０が転送される。なお該当するセグメントが
無い場合は、制御部１１はその旨の信号（ＮＵＬＬ）を
単語作成部２５に転送する。The control unit 11 transfers the segment Sk+1 lying between the cutout candidate positions C0 and C1 from the segment extraction unit 15 or the segment integration unit 17 to the word creation unit 25. In the example of FIG. 9, the segment S0 is transferred. If there is no corresponding segment, the control unit 11 transfers a signal to that effect (NULL) to the word creation unit 25.

【００９５】単語作成部２５は、セグメントＳｋ＋１が
存在するか否かを判定する（図８のステップ６５）。こ
の場合はセグメントＳ０が存在するので、ステップ７１
に移る。The word creation unit 25 determines whether the segment Sk+1 exists (step 65 in FIG. 8). In this case the segment S0 exists, so the process proceeds to step 71.

【００９６】ステップ７１では、セグメントＳｋ＋１を
候補パスＰに加えて構成した列（セグメント列）のセグ
メント数が規定数以内か否かが判定される。ここでは規
定数を４（もちろん一例）と考える。この場合のセグメ
ント列のセグメントはＳ０のみであるのでセグメント数
は１であるから、規定数を満足するので、ステップ６６
に移る。In step 71, it is determined whether the number of segments in the sequence (segment sequence) formed by adding the segment Sk+1 to the candidate path P is within the specified number. Here the specified number is taken to be 4 (this is, of course, only an example). The segment sequence in this case consists only of S0, so the number of segments is 1, which is within the specified number, and the process proceeds to step 66.

【００９７】ステップ６６では、セグメントＳ０を候補
パスＰに追加する処理がなされる。その結果、候補パス
Ｐ＝｛Ｓ０｝になる。その後、ステップ６７に移る。At step 66, processing for adding the segment S0 to the candidate path P is performed. As a result, the candidate path P = {S0}. Thereafter, the process proceeds to step 67.

【００９８】ステップ６７では、今度はＣｊを着目する
切り出し候補位置とするので、切り出し候補位置Ｃ１が
着目する切り出し候補位置Ｃｉとみなされる。すなわち
関数をＦｕｎｃｔ（Ｃ１，Ｐ）とする。そして、ステッ
プ６１の処理から処理を再開する。In step 67, since Cj is set as the target cutout candidate position this time, the cutout candidate position C1 is regarded as the target cutout candidate position Ci. That is, the function is set to Funct (C1, P). Then, the processing is restarted from the processing of step 61.

【００９９】したがって、単語作成部２５は今度は切り
出し候補位置Ｃ１が最右端の切り出し候補位置か否か
（すなわちＣ１＝Ｃ２か否か）を判定する（図８のステ
ップ６１）。Therefore, the word creating section 25 determines whether or not the cutout candidate position C1 is the rightmost cutout candidate position (ie, whether or not C1 = C2) (step 61 in FIG. 8).

【０１００】ここでＣ１は最右端の切り出し候補位置で
はないので、ステップ６３の処理に移る。すなわちｊ＝
ｉ＋１＝１＋１＝２の処理が行なわれる。その結果、Ｃ
ｊはＣ２になる。Since C1 is not the rightmost cutout candidate position, the process proceeds to step 63. That is, j = i + 1 = 1 + 1 = 2 is computed, and as a result Cj becomes C2.

【０１０１】切り出し候補位置Ｃ１と切り出し候補位置
Ｃ２とに挟まれるセグメントＳｋ＋１を、セグメント抽
出部１５またはセグメント統合部１７から、制御部１１
は単語作成部２５に転送する。この図９の例の場合はセ
グメントＳ１が転送される。The control unit 11 transfers the segment Sk+1 lying between the cutout candidate positions C1 and C2 from the segment extraction unit 15 or the segment integration unit 17 to the word creation unit 25. In the example of FIG. 9, the segment S1 is transferred.

【０１０２】単語作成部２５は、セグメントＳｋ＋１が
存在するか否かを判定する（図８のステップ６５）。こ
の場合はセグメントＳ１が存在するので、ステップ７１
に移る。The word creation unit 25 determines whether the segment Sk+1 exists (step 65 in FIG. 8). In this case the segment S1 exists, so the process proceeds to step 71.

【０１０３】ステップ７１では、セグメントＳｋ＋１を
候補パスＰに加えて構成した列（セグメント列）のセグ
メント数が規定数以内か否かが判定される。この場合の
セグメント列のセグメント数は、Ｓ０およびＳ１の２個
であるから、規定数を満足するので、ステップ６６に移
る。In step 71, it is determined whether the number of segments in the sequence (segment sequence) formed by adding the segment Sk+1 to the candidate path P is within the specified number. Here the segment sequence consists of the two segments S0 and S1, which is within the specified number, so the process proceeds to step 66.

【０１０４】ステップ６６では、セグメントＳ１を候補
パスＰに追加する処理がなされる。その結果、候補パス
Ｐ＝｛Ｓ０，Ｓ１｝になる。その後、ステップ６７に移
る。At step 66, processing for adding the segment S1 to the candidate path P is performed. As a result, the candidate path P = {S0, S1}. Thereafter, the process proceeds to step 67.

【０１０５】ステップ６７では、今度は切り出し候補位
置Ｃ２を着目する切り出し候補位置Ｃｉとみなす。すな
わち関数をＦｕｎｃｔ（Ｃ２，Ｐ）とする。そして、ス
テップ６１の処理から処理を再開する。In step 67, the extraction candidate position C2 is regarded as the extraction candidate position Ci of interest. That is, the function is set to Funct (C2, P). Then, the processing is restarted from the processing of step 61.

【０１０６】したがって、単語作成部２５は今度は切り
出し候補位置Ｃ２が最右端の切り出し候補位置か否か
（すなわちＣ２＝Ｃ２か否か）を判定する（図８のステ
ップ６１）。Therefore, the word creation section 25 determines whether or not the cutout candidate position C2 is the rightmost cutout candidate position (ie, whether or not C2 = C2) (step 61 in FIG. 8).

【０１０７】ここでＣ２は最右端の切り出し候補位置で
あるので、ステップ６２の処理に移る。したがって、候
補パスＰ＝｛Ｓ０，Ｓ１｝が候補パスメモリ（図示せ
ず）に記録される。これにより、始点がＣ０で、終点が
Ｃ２で、かつ、セグメント数が規定数以下である候補パ
スの１つとして、候補パスＰ＝｛Ｓ０，Ｓ１｝が作成さ
れる。また、ここまでの処理により、関数Ｆｕｎｃｔ
（Ｃ２，Ｐ）の処理が終了する。Since C2 is the rightmost cutout candidate position, the process proceeds to step 62, and the candidate path P = {S0, S1} is recorded in the candidate path memory (not shown). In this way, P = {S0, S1} is created as one of the candidate paths whose start point is C0, whose end point is C2, and whose number of segments is no more than the specified number. This completes the processing of the function Funct (C2, P).

【０１０８】この候補パスＰ＝｛Ｓ０，Ｓ１｝は、第５
の手段および第６の手段が動作した結果作成された候補
パスである。すなわちステップ６６、６７の処理が済ん
だ結果作成された候補パスである。そこで、今度は、ス
テップ６８に移る。このステップ６８では、候補パスＰ
＝｛Ｓ０，Ｓ１｝から、これに最新に追加されたセグメ
ントＳ１を削除する処理をする。この結果、候補パスＰ
＝｛Ｓ０｝になる。This candidate path P = {S0, S1} is a candidate path created as a result of the operation of the fifth and sixth means, that is, as a result of completing the processing of steps 66 and 67. The process therefore proceeds to step 68, in which the most recently added segment S1 is deleted from the candidate path P = {S0, S1}. As a result, the candidate path becomes P = {S0}.

【０１０９】次に、ｊ＝ｊ＋１とする（図８のステップ
６９）。ここで現在のｊは２であるので、ｊ＝２＋１＝
３となる。Next, j = j + 1 is set (step 69 in FIG. 8). Since the current j is 2, j = 2 + 1 = 3.

【０１１０】次に、ｊが最大切り出し候補位置か否か
（ｊ＞ｍか否か）を判定する（図８のステップ７０）。Next, it is determined whether j exceeds the index m of the last cutout candidate position (that is, whether j > m) (step 70 in FIG. 8).

【０１１１】この場合のｊ＝３は、最大切り出し候補位
置２を越えているので、関数Ｆｕｎｃｔ（Ｃ１，Ｐ）の
処理が終了する。そこで、今度は元の関数であるＦｕｎ
ｃｔ（Ｃ０，Ｐ）についてステップ６８からの処理をす
る。Since j = 3 exceeds the index 2 of the last cutout candidate position, the processing of the function Funct (C1, P) ends. Processing therefore returns to the calling function Funct (C0, P) and continues from step 68.

【０１１２】したがって、候補パスＰ＝｛Ｓ０｝から、
これに最新に追加されたセグメントＳ０を削除する処理
をする。この結果、候補パスＰ＝｛｝＝０になる。Accordingly, the most recently added segment S0 is deleted from the candidate path P = {S0}. As a result, the candidate path becomes empty (P = {}).

【０１１３】次に、ｊ＝ｊ＋１とする（図８のステップ
６９）。ここで現在のｊは１であるので、ｊ＝１＋１＝
２となる。Next, j = j + 1 is set (step 69 in FIG. 8). Since the current j is 1, j = 1 + 1 = 2.

【０１１４】次に、ｊが最大切り出し候補位置か否か
（ｊ＞ｍか否か）を判定する（図８のステップ７０）。Next, it is determined whether j exceeds the index m of the last cutout candidate position (that is, whether j > m) (step 70 in FIG. 8).

【０１１５】この場合のｊ＝２は、最大切り出し候補位
置２を越えていないので、ステップ６４からの処理が行
なわれる。そのため、切り出し候補位置Ｃ０と切り出し
候補位置Ｃ２とに挟まれるセグメントＳｋ＋１が存在す
るか否かの判定がなされる。この場合のセグメントＳｋ
＋１として、セグメントＳ２が存在するので（図１３参
照）、候補パスＰにセグメントＳ２を加えたセグメント
列のセグメント数が規定数以内か否かを判定する。Since j = 2 in this case does not exceed the index 2 of the last cutout candidate position, the processing from step 64 is performed. It is therefore determined whether a segment Sk+1 lying between the cutout candidate positions C0 and C2 exists. Here the segment S2 exists as the segment Sk+1 (see FIG. 13), so it is then determined whether the number of segments in the segment sequence obtained by adding the segment S2 to the candidate path P is within the specified number.

【０１１６】このセグメント列のセグメントはＳ２だけ
であるので、規定数４以内を満足する。したがって候補
パスＰにセグメントＳ２を加える。その結果、候補パス
Ｐ＝｛Ｓ２｝になる。その後、ステップ６７に移る。Since this segment sequence consists only of S2, it is within the specified number of 4. The segment S2 is therefore added to the candidate path P, so that the candidate path becomes P = {S2}. The process then proceeds to step 67.

【０１１７】この場合のＣｊは２になっているので、こ
のステップ６７では、今度は切り出し候補位置Ｃ２を着
目する切り出し候補位置Ｃｉとみなす。すなわち関数を
Ｆｕｎｃｔ（Ｃ２，Ｐ）とする。そして、ステップ６１
の処理から処理を再開する。Since Cj is now C2, in step 67 the cutout candidate position C2 is regarded as the cutout candidate position of interest Ci. That is, the function becomes Funct (C2, P), and processing restarts from step 61.

【０１１８】したがって、単語作成部２５は今度は切り
出し候補位置Ｃ２が最右端の切り出し候補位置か否か
（すなわちＣ２＝Ｃ２か否か）を判定する（図８のステ
ップ６１）。Therefore, the word creating section 25 determines whether or not the cutout candidate position C2 is the rightmost cutout candidate position (ie, whether or not C2 = C2) (step 61 in FIG. 8).

【０１１９】ここでＣ２は最右端の切り出し候補位置で
あるので、ステップ６２の処理に移る。したがって、候
補パスＰ＝｛Ｓ２｝が候補パスメモリ（図示せず）に記
録される。これにより、始点がＣ０で、終点がＣ２で、
かつ、セグメント数が規定数以下である候補パスの１つ
として、候補パスＰ＝｛Ｓ２｝が作成される。また、こ
こまでの処理で関数Ｆｕｎｃｔ（Ｃ２，Ｐ）についての
処理が終了する。Since C2 is the rightmost cutout candidate position, the process proceeds to step 62, and the candidate path P = {S2} is recorded in the candidate path memory (not shown). In this way, P = {S2} is created as one of the candidate paths whose start point is C0, whose end point is C2, and whose number of segments is no more than the specified number. This completes the processing of the function Funct (C2, P).

【０１２０】この候補パスＰ＝｛Ｓ２｝は、第５の手段
および第６の手段が動作した結果作成された候補パスで
ある。すなわちステップ６６、６７の処理が済んだ結果
作成された候補パスである。そこで、今度は、ステップ
６８に移る。このステップ６８では、候補パスＰ＝｛Ｓ
２｝から、これに最新に追加されたセグメントＳ２を削
除する処理をする。この結果、候補パスＰ＝｛｝＝０
になる。This candidate path P = {S2} is a candidate path created as a result of the operation of the fifth and sixth means, that is, as a result of completing the processing of steps 66 and 67. The process therefore proceeds to step 68, in which the most recently added segment S2 is deleted from the candidate path P = {S2}. As a result, the candidate path becomes empty (P = {}).

【０１２１】次に、ｊ＝ｊ＋１とする（図８のステップ
６９）。ここで現在のｊは２であるので、ｊ＝２＋１＝
３となる。Next, j = j + 1 is set (step 69 in FIG. 8). Since the current j is 2, j = 2 + 1 = 3.

【０１２２】次に、ｊが最大切り出し候補位置か否か
（ｊ＞ｍか否か）を判定する（図８のステップ７０）。Next, it is determined whether j exceeds the index m of the last cutout candidate position (that is, whether j > m) (step 70 in FIG. 8).

【０１２３】この場合のｊ＝３は、最大切り出し候補位
置２を越えているので、関数Ｆｕｎｃｔ（Ｃ０，Ｐ）の
処理が終了する。Since j = 3 in this case exceeds the maximum clipping candidate position 2, the processing of the function Funct (C0, P) ends.

【０１２４】この図８を用い説明した処理は、再帰的ア
ルゴリズムと呼ばれる処理である。文字列の左端のセグ
メントから、右端のセグメントまでを順に再帰的に辿る
ことができる処理である。The process described with reference to FIG. 8 is a process called a recursive algorithm. This is a process capable of recursively tracing from the leftmost segment of the character string to the rightmost segment.

[0125] With this processing, all segments whose start point is C0 are extracted. Then the other segments that connect to each extracted segment, that is, those whose start point is the end point of the extracted segment, are extracted in turn. As a result, candidate paths consisting of a segment sequence (possibly a single segment) that starts at C0, ends at C2, and contains no more than the specified number of segments are easily created.

[0126] When the processing described with reference to FIG. 8 is applied to the primary segments S0 to S2 and the secondary segment S3 of the original image data 40 shown in FIG. 2, two candidate paths are created: candidate path P0, "S0 → S1 → S2", and candidate path P1, "S3 → S2". Many words are created by combining the recognition candidates of the segments S0, S1, and S2 that make up candidate path P0. For convenience of explanation, however, the word "弓ム三" is considered here as one example of a word from candidate path P0. Likewise, many words are created by combining the recognition candidates of the segments S3 and S2 that make up candidate path P1. For convenience of explanation, the word "弘三" is considered here as one example of a word from candidate path P1.
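
The recursive path enumeration described with reference to FIG. 8 can be sketched roughly as follows. This is a minimal illustration, not the flowchart of FIG. 8 itself: the segment spans are assumed from the description of FIG. 2 (S3 integrating S0 and S1), and the names `SEGMENTS`, `enumerate_paths`, and `max_segments` are hypothetical.

```python
from typing import Dict, List, Tuple

# Each segment is (start candidate position, end candidate position).
# Assumed layout for FIG. 2: S0..S2 are primary segments, S3 covers S0 and S1.
SEGMENTS: Dict[str, Tuple[int, int]] = {
    "S0": (0, 1),
    "S1": (1, 2),
    "S2": (2, 3),
    "S3": (0, 2),
}


def enumerate_paths(segments: Dict[str, Tuple[int, int]],
                    last_pos: int,
                    max_segments: int) -> List[List[str]]:
    """Return every segment sequence running from position 0 to last_pos."""
    paths: List[List[str]] = []

    def funct(pos: int, path: List[str]) -> None:
        if pos == last_pos:
            paths.append(list(path))   # record a completed candidate path
            return
        if len(path) >= max_segments:
            return                     # one more segment would exceed the limit
        for name, (start, end) in segments.items():
            if start == pos:
                path.append(name)      # extend the path with this segment
                funct(end, path)       # continue from the segment's end point
                path.pop()             # backtrack: drop the newest segment

    funct(0, [])
    return paths


print(enumerate_paths(SEGMENTS, last_pos=3, max_segments=3))
```

Under these assumptions the sketch yields the two paths named in the text, S0, S1, S2 and S3, S2. The `path.pop()` call plays the role of step 68, which removes the most recently added segment before the next start position is tried.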

[0127] (Word matching unit) The word matching unit 27 matches each word created by the word creation unit 25 against a word dictionary (not shown). In the example above, the words "弓ム三" and "弘三" are matched against the word dictionary.

[0128] (Word-level evaluation value calculation unit) The word-level evaluation value calculation unit 29 obtains an evaluation value for each word created by the word creation unit 25 according to whether the word exists in the word dictionary. Here, the unit stores "1" as the word-level evaluation value of a word that exists in the word dictionary and "0" for a word that does not. In this embodiment, it is assumed that the word "弓ム三" does not exist in the word dictionary, so "0" is stored as its word-level evaluation value, and that the word "弘三" does exist, so "1" is stored.
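
As a rough sketch of word creation and word-level evaluation, the words for a candidate path can be formed as every combination of the per-segment recognition candidates, and each word scored 1 or 0 by dictionary lookup. The recognition candidates and the one-entry dictionary below are hypothetical values chosen only for illustration, as are the function names.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical recognition candidates per segment (illustration only).
CANDIDATES: Dict[str, List[str]] = {
    "S0": ["弓", "己"],
    "S1": ["ム", "厶"],
    "S2": ["三", "ミ"],
    "S3": ["弘", "引"],
}

# Stand-in for the word dictionary (assumed to contain only this word).
DICTIONARY: Set[str] = {"弘三"}


def words_for_path(path: List[str], candidates: Dict[str, List[str]]) -> List[str]:
    """Combine one recognition candidate per segment into every possible word."""
    return ["".join(chars) for chars in product(*(candidates[s] for s in path))]


def word_level_value(word: str, dictionary: Set[str]) -> int:
    """Return 1 if the word is in the dictionary, otherwise 0."""
    return 1 if word in dictionary else 0


for path in (["S0", "S1", "S2"], ["S3", "S2"]):
    for word in words_for_path(path, CANDIDATES):
        print(path, word, word_level_value(word, DICTIONARY))
```

With two candidates per segment, the three-segment path produces eight words and the two-segment path four, which is why the text speaks of many words per candidate path.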

[0129] (Character-level evaluation value calculation unit) For each word created by the word creation unit 25, the character-level evaluation value calculation unit 31 obtains an evaluation value (called the character-level evaluation value) based on the shape feature evaluation value, the recognition evaluation value, and the number of segments of the constituent segment of each character in the word. This is described concretely with reference to FIGS. 10(A) to 10(C).

[0130] FIG. 10(A) shows candidate path P1 created by the word creation unit, that is, the word "弘三". As can also be seen from FIG. 2, this word consists of the secondary segment S3, "弘", and the primary segment S2, "三".

[0131] In the example of FIG. 10(A), the shape evaluation value and the recognition evaluation value are each obtained for the secondary segment "弘" and for the primary segment "三". These evaluation values are obtained as already described. Here, it is assumed that the shape evaluation value and recognition evaluation value of the segment "弘" are the values shown in FIG. 10(B), and that those of the segment "三" are the values shown in FIG. 10(C).

[0132] The constituent segment of the character "弘" is, as described above, the secondary segment S3. However, this secondary segment is, as already explained, the integration of the two primary segments "弓" and "ム", so the number of segments making up the character "弘" is 2. The constituent segment of the character "三" is, as can be seen from FIG. 2, the single primary segment "三", so the number of segments making up the character "三" is 1.

[0133] Next, the character-level evaluation value of the word "弘三" is obtained from the shape evaluation values and recognition evaluation values shown in FIGS. 10(B) and 10(C) and the segment counts above. Here, the character-level evaluation value is obtained according to equation (2) below.

[0134]

(Equation 1)

[0135] In equation (2), P_j is the set of candidate paths, α_i is the number of segments of the constituent segment of the i-th character (2 for "弘", 1 for "三"), K_i is the shape evaluation value of the constituent segment of the i-th character, and R_i is the recognition evaluation value of the constituent segment of the i-th character.

[0136] Evaluating equation (2) for the word "弘三" gives the following character-level evaluation value.

[0137] max[2 × 0.83 × (0, 0, 0, 1) + 0.89 × (0, 0.4, 0.4, 0.2)] = max[(0, 0, 0, 1.66) + (0, 0.356, 0.356, 0.178)] = max[(0 + 0, 0 + 0.356, 0 + 0.356, 1.66 + 0.178)] = max[(0, 0.356, 0.356, 1.838)] = 1.838.

[0138] That is, the maximum value in max[(0, 0.356, 0.356, 1.838)], namely 1.838, is the character-level evaluation value.
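
The arithmetic above can be checked with a short sketch. It assumes that each character contributes its segment count times its shape evaluation value times its recognition evaluation vector, and that the four vector entries are ordered as symbols and numerals, katakana, hiragana, kanji. The function and variable names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (segment count, shape evaluation value, recognition evaluation vector)
Character = Tuple[int, float, Sequence[float]]


def character_level_value(chars: List[Character]) -> Tuple[float, List[float]]:
    """Sum count * shape * recognition per character type, then take the max."""
    n_types = len(chars[0][2])
    totals = [0.0] * n_types
    for seg_count, shape, recog in chars:
        for t in range(n_types):
            totals[t] += seg_count * shape * recog[t]
    return max(totals), totals


KOZO: List[Character] = [
    (2, 0.83, (0, 0, 0, 1)),        # 弘: secondary segment built from 2 primaries
    (1, 0.89, (0, 0.4, 0.4, 0.2)),  # 三: single primary segment
]

value, totals = character_level_value(KOZO)
print(round(value, 3), [round(t, 3) for t in totals])
```

The kanji total is the largest, so it becomes the character-level evaluation value of the word.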

[0139] The values 0, 0.356, 0.356, and 1.838 in max[(0, 0.356, 0.356, 1.838)] are taken to be, for the word "弘三", the recognition evaluation values for symbols and numerals, for katakana, for hiragana, and for kanji, respectively (see the P1 column of FIG. 10, discussed below).

[0140] Character-level evaluation values are obtained by the same procedure for the other words created by the word creation unit 25. Here, the character-level evaluation value is obtained for the word "弓ム三" from candidate path P0.

[0141] As already explained, the word "弓ム三" consists of the primary segments S0, S1, and S2. Assume that the shape evaluation value and recognition evaluation value of each of the segments S0 to S2 are as listed in the S0 to S2 columns of FIG. 11. Then the character-level evaluation value of the word "弓ム三", obtained by the same procedure as for "弘三", is 0.692, as shown in the column for candidate path P0 in FIG. 10.

[0142] (Result selection unit) The result selection unit 33 first compares the word-level evaluation values of the words created by the word creation unit. If that comparison is decisive, it selects the superior word as the word for character area segmentation. If it is not decisive, it compares the character-level evaluation values and selects the superior word as the word for character area segmentation.

[0143] Specifically, for the words "弘三" and "弓ム三" above, the word-level evaluation value of "弘三" is "1" and that of "弓ム三" is "0", so the result selection unit 33 selects the word "弘三" at the stage of comparing word-level evaluation values.

[0144] The case in which two or more of the words created by the word creation unit 25 have a word-level evaluation value of "1", so that comparing word-level evaluation values is not decisive, is described later under (Other processing example).

[0145] (Result output unit) The result output unit 35 outputs the segmentation candidate positions of the characters (segments) constituting the word determined by the result selection unit 33 to, for example, the control unit 11. In the example above, "弘三" is the determined word, "弘" and "三" are the characters (segments) constituting it, and C0, C2, and C3 are output to the control unit 11 as the segmentation candidate positions. The control unit 11 can instruct character segmentation based on this result.

[0146] (Other processing example) In the example above, when the result selection unit 33 selected the word for determining the character segmentation area from among the words created by the word creation unit 25, the selection could be made by comparing word-level evaluation values alone. However, there are also cases in which several of the created words have a word-level evaluation value of "1". In such cases, the word for determining the character segmentation area is selected using the character-level evaluation values. An example is described below.

[0147] Consider the original image data 101 of 『矢吹』 shown in FIG. 12. The processes referred to below (segment extraction, segment integration, calculation of each evaluation value, character recognition, word creation, word matching, and so on) are performed by the procedures already described in detail for the original image data 40, so their details are omitted in the following description.

[0148] From the original image data 101 of 『矢吹』, the segment extraction unit 15 extracts primary segments. Here, three primary segments S0, S1, and S2, namely "矢", "口", and "欠", are extracted. FIG. 12 shows the coordinates that delimit the primary segments S0 to S2 in the direction in which the characters are arranged (the X direction in this example), that is, the segmentation candidate positions C0 to C3.

[0149] Next, the segment integration unit 17 integrates the three primary segments S0, S1, and S2 ("矢", "口", and "欠"). As a result, a secondary segment S3, "知", and a secondary segment S4, "吹", are created (see FIG. 12).

[0150] The segment coordinate table for the primary segments S0 to S2 and the secondary segments S3 and S4 is as shown in the Xs to Ye columns of FIG. 13. The segment shape evaluation value calculation unit 19 obtains the shape evaluation values of the primary segments S0 to S2 and the secondary segments S3 and S4 based on equation (1) described above. The resulting shape evaluation values are those shown in the shape evaluation value columns of FIGS. 13 and 17.

[0151] The segment table created from the primary segments S0 to S2 and the secondary segments S3 and S4 is as shown in FIG. 14.

[0152] The character recognition unit 21 performs character recognition on each of the primary segments S0 to S2 and the secondary segments S3 and S4, and obtains recognition results up to the K-th rank for each segment. Assume that the recognition results obtained for the segments S0 to S4 are as shown in FIG. 15.

[0153] The recognition evaluation value calculation unit 23 obtains, for each of the segments S0 to S4, recognition evaluation values for the four character types. The resulting recognition evaluation values are those shown in the recognition evaluation value column of FIG. 17.

[0154] The word creation unit 25 creates words by combining the recognition results of the segments S0 to S4. Expressed as candidate paths, the created words are as shown in FIG. 16. That is, three candidate paths are generated: candidate path P0, consisting of segments S0-S1-S2; candidate path P1, consisting of segments S3-S2; and candidate path P2, consisting of segments S0-S4. The actual words are combinations of the recognition candidates of the segments, so candidate path P0 yields many words, such as "矢口欠" and "矢口父". Candidate paths P1 and P2 likewise each yield many words.

[0155] The word matching unit 27 matches the many words created by the word creation unit 25 against the word dictionary (not shown). Assume here that none of the words created from candidate path P0 exists in the word dictionary, that the word "知見" among the words created from candidate path P1 exists in the word dictionary, and that the word "矢吹" among the words created from candidate path P2 exists in the word dictionary. The word-level evaluation value calculation unit 29 therefore stores "0" as the word-level evaluation value for candidate path P0, and stores "1" as the word-level evaluation value of each of the word "知見" from candidate path P1 and the word "矢吹" from candidate path P2.

[0156] Meanwhile, the character-level evaluation value calculation unit 31 obtains the character-level evaluation value of each of the candidate paths P0, P1, and P2 from the shape evaluation values, recognition evaluation values, and segment counts, as already described. The character-level evaluation values obtained for candidate paths P0, P1, and P2 are 1.81, 1.88, and 2.245, respectively. These values are shown in the character-level evaluation value column of FIG. 17.

[0157] The result selection unit 33 first compares the word-level evaluation values of the words created by the word creation unit. For the original image data 101 of 『矢吹』, however, the created words include, as described above, two words that exist in the word dictionary: "知見" and "矢吹". Comparing word-level evaluation values is therefore not decisive.

[0158] The result selection unit 33 therefore next compares the character-level evaluation values. The character-level evaluation value of the word "知見" is 1.88 and that of the word "矢吹" is 2.245, so the word "矢吹" is selected as the word for character area segmentation.
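
The two-stage selection can be sketched as follows, assuming each candidate is held as a (word, word-level value, character-level value) triple. The scores are the ones quoted in the text. The word "矢口欠" stands in for candidate path P0 as a whole, and the names `select_word`, `YABUKI`, and `KOZO` are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, float]  # (word, word-level value, character-level value)


def select_word(candidates: List[Candidate]) -> str:
    """Prefer the highest word-level value; break ties by character-level value."""
    best_word_level = max(c[1] for c in candidates)
    tied = [c for c in candidates if c[1] == best_word_level]
    return max(tied, key=lambda c: c[2])[0]


YABUKI: List[Candidate] = [
    ("矢口欠", 0, 1.81),  # representative word for candidate path P0
    ("知見", 1, 1.88),    # candidate path P1
    ("矢吹", 1, 2.245),   # candidate path P2
]

KOZO: List[Candidate] = [
    ("弓ム三", 0, 0.692),
    ("弘三", 1, 1.838),
]

print(select_word(YABUKI), select_word(KOZO))
```

In the first list the word-level comparison leaves two words tied at 1, so the character-level value decides. In the second list the word-level comparison alone is decisive.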

[0159] With the prior art, if the word "知見" happened to be matched first, it would be selected as the word for character area segmentation, and the characters would be segmented as "知" and "見". With the present invention, by contrast, "矢吹" is selected rather than "知見", and the characters are accordingly segmented as "矢" and "吹". It can thus be understood that, even when several of the words created by the word creation unit 25 exist in the word dictionary, the present invention can select the word that fits the original image and can therefore perform character segmentation that fits the original image.

[0160] The result output unit 35 outputs the segmentation candidate positions of the characters (segments) constituting the word determined by the result selection unit 33 to, for example, the control unit 11. In the example above, "矢吹" is the determined word, "矢" and "吹" are the characters (segments) constituting it, and C0, C1, and C3 are output to the control unit 11 as the segmentation candidate positions. The control unit 11 can instruct character segmentation based on this result.
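
Deriving the output positions from the selected word can be sketched as below. The segment spans are assumed from the description of FIG. 12 ("知" covering "矢" and "口", "吹" covering "口" and "欠"), the word "矢吹" is taken to be the segment "矢" followed by the segment "吹", and the names `SPANS` and `cut_positions` are hypothetical.

```python
from typing import Dict, List, Tuple

# (start candidate position, end candidate position) for each segment.
SPANS: Dict[str, Tuple[int, int]] = {
    "S0": (0, 1),  # 矢
    "S1": (1, 2),  # 口
    "S2": (2, 3),  # 欠
    "S3": (0, 2),  # 知 = 矢 + 口
    "S4": (1, 3),  # 吹 = 口 + 欠
}


def cut_positions(path: List[str], spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """List the start of the first segment, then the end of every segment."""
    positions = [spans[path[0]][0]]
    for seg in path:
        positions.append(spans[seg][1])
    return [f"C{p}" for p in positions]


print(cut_positions(["S0", "S4"], SPANS))  # segments for 矢 and 吹
```

Under the same span layout, the path S3, S2 gives C0, C2, and C3, which matches the positions reported for "弘三" in the earlier example.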

[0161]

[Effects of the Invention] As is clear from the above description, the character segmentation method of this application includes: a process of extracting primary segments, each a connected region of black bits, from original image data; a process of creating secondary segments by integrating the primary segments; a process of performing character recognition on each of the primary and secondary segments; a process of creating words by combining the candidate character codes obtained by the character recognition; a process of matching the created words against a word dictionary; and a process of determining a character segmentation area for one character based on the result of the word matching. The method further includes: a process of obtaining a shape feature evaluation value for each segment; a process of obtaining recognition result candidates for each segment up to the K-th rank and obtaining, for each segment, a recognition evaluation value indicated by the proportion of character types among those candidates; a process of obtaining a predetermined character-level evaluation value for each created word; a process of obtaining a word-level evaluation value indicated by whether the created word exists in the word dictionary; and a process of comparing the word-level evaluation values of the words, selecting the superior word if the comparison is decisive, comparing the character-level evaluation values of the words concerned to select one word if several words have the same word-level evaluation value, and setting each of the segments constituting the selected word as the character segmentation area. Therefore, even when several of the created words exist in the word dictionary, the word for character area segmentation can be selected on the basis of the character-level evaluation value.

[0162] This character-level evaluation value can be regarded as an evaluation value that takes into account (1) whether each character constituting the created word is plausible as a single character, and (2) whether each character is composed of segments of compatible character types. Using the character-level evaluation value therefore makes it less likely that a word containing a character whose shape is implausible for a single character, or a character formed from a combination of segments that would not normally occur given their character types, is selected as the word for determining the character segmentation area. Consequently, with the present invention, which uses this character-level evaluation value, even when several of the created words exist in the word dictionary, the word that fits the original image is more readily selected from among them. Moreover, because the character segmentation area can be determined from a word that fits the original image, segmentation accuracy and recognition accuracy can be improved.

[0163] According to the character segmentation device of this application, the character segmentation method described above can be implemented easily.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of the character segmentation device of the embodiment.

FIG. 2 is an explanatory diagram of primary segments and secondary segments.

FIGS. 3(A) and 3(B) are diagrams showing examples of the segment coordinate table before integration.

FIG. 4 is an explanatory diagram of segment integration.

FIG. 5 is a diagram showing an example of the segment coordinate table after integration.

FIG. 6 is a diagram showing an example of the segment table after integration.

FIG. 7(A) is an explanatory diagram of the character recognition unit, and FIG. 7(B) is an explanatory diagram of the segment recognition evaluation value calculation unit.

FIG. 8 is an explanatory diagram of the word creation unit.

FIG. 9 is an explanatory diagram of a concrete example of candidate path creation (operation of the word creation unit).

FIGS. 10(A) to 10(C) are explanatory diagrams of the character-level evaluation value calculation unit.

FIG. 11 is a diagram listing the evaluation values for the original image data 40.

FIG. 12 is an explanatory diagram (part 1) of the other processing example, showing the primary segments extracted from the original image data 『矢吹』 and the secondary segments obtained by integrating those primary segments.

FIG. 13 is an explanatory diagram (part 2) of the other processing example, showing the segment coordinate table for segments S0 to S4.

FIG. 14 is an explanatory diagram (part 3) of the other processing example, showing the segment table for segments S0 to S4.

FIG. 15 is an explanatory diagram (part 4) of the other processing example, showing the character recognition results for segments S0 to S4.

FIG. 16 is an explanatory diagram (part 5) of the other processing example, showing the candidate paths created from the original image data 101.

FIG. 17 is an explanatory diagram (part 6) of the other processing example, listing the evaluation values for the original image data 101.

[Explanation of symbols]

11: Control unit
13: Image input unit
15: Segment extraction unit
17: Segment integration unit
19: Segment shape evaluation value calculation unit
21: Character recognition unit
23: Segment recognition evaluation value calculation unit
25: Word creation unit
27: Word matching unit
29: Word-level evaluation value calculation unit
31: Character-level evaluation value calculation unit
33: Result selection unit
35: Result output unit
S0 to S2: Segments (primary segments)
S3, S4: Segments (secondary segments)
C0 to C3: Segmentation candidate positions
40, 101: Original image data

Claims

[Claims]

1. A character segmentation method comprising: a process of extracting primary segments, each a connected region of black bits, from original image data that is stored in a memory and includes image data of a character string; a process of creating secondary segments by integrating the extracted primary segments according to a predetermined rule; a process of performing character recognition on each primary segment and each secondary segment; a process of creating words by combining the candidate character codes of the primary segments and secondary segments obtained by the character recognition; a process of matching the created words against a word dictionary; and a process of determining a character segmentation area for one character based on the result of the word matching, the method further comprising: a process of obtaining a shape feature evaluation value based on the shape features of each primary segment and each secondary segment; a process of obtaining, when performing the character recognition, recognition result candidates for each segment up to the K-th rank, and obtaining for each segment a recognition evaluation value indicated by the proportion of character types among the recognition result candidates; a process of obtaining, for each created word, a character-level evaluation value calculated from the shape feature evaluation value, the recognition evaluation value, and the number of segments (1 if the constituent segment is a primary segment, and the number of integrated primary segments if it is a secondary segment) of the constituent segment of each character constituting the word; and a process of obtaining a word-level evaluation value indicated by whether the created word exists in the word dictionary when it is matched against the word dictionary, wherein the process of determining the character segmentation area compares the word-level evaluation values of the words, selects the superior word if the comparison is decisive, compares the character-level evaluation values of the words concerned to select one word if several words have the same word-level evaluation value, and sets each of the segments constituting the selected word as the character segmentation area (where K is a predetermined positive integer).

2. The character segmentation method according to claim 1, wherein the shape feature evaluation value is the aspect ratio of a segment.

3. The character segmentation method according to claim 1, wherein, when the direction in which characters are arranged is the X direction, the secondary segments are created by integrating, according to a predetermined rule, m primary segments that are consecutive in the X direction, and the words are created on the basis of candidate paths, each represented by a concatenation of primary segments and/or secondary segments, that are created by a process including the following processes (a) to (d) (where m is an integer of 2 or more). (a) A process of taking the coordinates that delimit the m primary segments in the X direction as segmentation candidate positions Ci (i = 0 to m), and extracting, from among the m primary segments and the created secondary segments, all segments whose segmentation start point is the segmentation candidate position C0. (b) A process of extracting, for each segment extracted in process (a), the other segments that can be connected to it because their segmentation start position is the segmentation candidate position Cj (j = 1 to m) on the end-point side of that segment, and the further segments that can be connected to those other segments in the same relationship, from among the m primary segments and the created secondary segments, until a segment whose end-point segmentation candidate position is Cm appears. (c) A process of determining, each time another segment is extracted in process (b), whether the number of segments of the candidate path including that segment is within a specified number. (d) A process of taking a candidate path whose number of segments is within the specified number, and whose last segment has the end-point segmentation candidate position Cm, as a candidate path for word creation.

4. The character segmentation method according to claim 1, wherein the predetermined rule used when integrating the primary segments is a rule in which, when the direction in which the characters are arranged is the X direction, the height H of the tallest of the m primary segments continuous in the X direction is obtained, and another primary segment existing within a coordinate range of H × N in the X direction from the primary segment of interest is integrated with the primary segment of interest.
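The integration rule can be illustrated with a short sketch. The claim does not state from which point the H × N range is measured, so the code below assumes it starts at the left edge of the primary segment of interest and that a following segment qualifies when its right edge lies inside the range. The box layout `(x_left, x_right, height)`, the function name `integrate` and the returned index pairs are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float]  # hypothetical: (x_left, x_right, height)


def integrate(boxes: List[Box], n: float) -> List[Tuple[int, int]]:
    """Return secondary segments as (i, j) pairs spanning C_i .. C_j.

    `boxes` are the primary segments ordered left to right.
    """
    tallest = max(height for _, _, height in boxes)  # H in the claim
    limit = tallest * n                              # H x N range in X
    secondary = []
    for i, (left, _, _) in enumerate(boxes):
        for j in range(i + 1, len(boxes)):
            # Assumption: the range is measured from the left edge of the
            # segment of interest to the right edge of the neighbour.
            if boxes[j][1] - left > limit:
                break
            secondary.append((i, j + 1))
    return secondary
```

With three boxes of width 10 to 15 and a tallest height of 20, a value of N = 1.5 lets each segment absorb its immediate right-hand neighbour but not two neighbours at once.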

5. A character cutout device comprising:
a segment extraction unit that extracts primary segments, each being a block area of black bits, from input image data including image data of a character string stored in a memory;
a segment integration unit that creates secondary segments by integrating the extracted primary segments according to a predetermined rule;
a character recognition unit that recognizes each primary segment and each secondary segment;
a word creation unit that creates words by combining candidate character codes of the primary segments and/or the secondary segments obtained by the character recognition;
a word matching unit that matches each created word against a word dictionary; and
a result selection unit that determines a character cutout region for each character based on the result of the word matching,
wherein the word matching unit obtains a word level evaluation value indicating whether or not the created word exists in the word dictionary,
the device further comprising:
a segment shape evaluation value calculation unit that obtains a shape feature evaluation value based on the shape characteristics of each primary segment and each secondary segment;
a segment recognition evaluation value calculation unit that obtains, for each segment, a recognition evaluation value indicated by the ratio of a character type among the recognition result candidates created by the character recognition unit; and
a character level evaluation value calculation unit that calculates, for each created word, a character level evaluation value based on the shape feature evaluation value, the recognition evaluation value, and the number of segments (1 when the constituent segment is a primary segment, and the number of integrated segments when the constituent segment is a secondary segment) of the constituent segment of each character constituting the word,
and wherein the result selection unit compares the word level evaluation values of the words and selects a word according to whether its word level evaluation value is higher or lower, and, when a plurality of words have the same word level evaluation value, compares the character level evaluation values of those words with each other to select one word, and sets each of the segments constituting the selected word as the character cutout region (where K is a predetermined positive integer).
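The two-stage comparison performed by the result selection unit can be sketched as follows. This is a minimal illustration under stated assumptions: the word level evaluation value is taken to be 1 when the word is in the dictionary and 0 otherwise, a higher character level evaluation value is assumed to be better, and the character level evaluation value is passed in precomputed because its formula is not given in the claim. `WordCandidate` and `select_word` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordCandidate:
    text: str
    segments: List[Tuple[int, int]]  # hypothetical (i, j) spans C_i .. C_j
    in_dictionary: bool              # basis of the word level evaluation value
    char_level: float                # precomputed character level evaluation value


def select_word(candidates: List[WordCandidate]) -> WordCandidate:
    """Pick one word: word level first, character level as the tie-breaker."""
    # Assumption: dictionary hit = 1, miss = 0, and higher is better for both
    # values. Tuples compare element by element, so char_level only matters
    # when the word level values are equal.
    return max(candidates,
               key=lambda c: (1 if c.in_dictionary else 0, c.char_level))
```

The segments of the returned candidate would then serve as the character cutout regions. If the evaluation values were defined so that lower is better, the same structure would apply with `min` in place of `max`.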

6. The character cutout device according to claim 5, wherein the segment shape evaluation value calculation unit is a calculation unit that obtains an evaluation value based on the aspect ratio of the segment.