JPH0589279A - Character recognizing device

Info

Publication number: JPH0589279A
Application number: JP3251933A
Authority: JP
Inventors: Toshiaki Morita; 敏昭森田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1991-09-30
Filing date: 1991-09-30
Publication date: 1993-04-09

Abstract

PURPOSE: To provide a character recognizing device which takes as input picture data including a character group having a prescribed meaning, in which at least two characters are arranged in the area preliminarily determined for one character, and recognizes each character in this character group in the same manner as normal characters. CONSTITUTION: A character recognizing device 20 is provided with a pattern storage part D1 where character group patterns and standard character patterns are stored in correspondence with their features. If a character group is included in the given picture data, a recognizing part 3 searches the storage part D1 based on the extracted features of the character group segmented by a segmenting part 1. When the pertinent character pattern is found, an intra-group character segmenting part 5 segments each character of the segmented character group, and a feature extracting part 2 and the recognizing part 3 recognize each segmented character while searching the storage part D1 again.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device, and in particular to a character recognition device that takes as input image data containing a character group, that is, at least two characters arranged within the area predetermined for a single character so as to carry a prescribed meaning, determines that the character group differs from a normal character, and then cuts out and recognizes the characters in the group one by one.

[0002]

[Prior Art] A device that recognizes characters is called a character recognition device; a typical example is the optical character reader (OCR), which reads printed or handwritten characters.

[0003] An OCR consists mainly of a scanning and photoelectric conversion section, a recognition processing section, and an output section.

[0004] First, the document bearing the characters to be recognized is fed into the scanning and photoelectric conversion section by an operator or the like. That section scans the two-dimensional document, converts the optical light/dark information of the characters (1 or 0) into an electrical signal of corresponding strength, and passes it to the recognition processing section connected at the next stage.

[0005] The recognition processing section implements the main part of the character recognition method. Recognition methods are designed on the premise that the target character pattern exists in isolation, whereas in practice a document carries many characters written continuously or discretely. It is therefore first necessary to cut out each character pattern on its own (this is also called segmentation). The cut-out relies on suitable marks printed in advance on the document to be recognized, such as separated character frames and marks indicating the position of each line of characters or the beginning and end of the text.

[0006] Once a character pattern has been cut out as described above, pattern processing is performed whose input is the character pattern and whose output is its character name. In summary, a standard pattern corresponding to each character name is stored in advance in a storage device or the like for every character name to be recognized. When an input character pattern is given, it is compared with the stored standard character patterns to find the most similar one, and the character name of the standard pattern so identified is taken to be that of the input pattern.
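The matching step described above can be sketched as a nearest-pattern search. This is only an illustrative sketch: the text does not say how similarity is measured, so the squared Euclidean distance, the function name `nearest_pattern`, and the toy feature vectors are all assumptions.

```python
def nearest_pattern(features, standards):
    """Return the character name whose stored standard pattern is closest.

    `standards` maps a character name to its stored feature vector.
    Squared Euclidean distance is an assumed similarity measure.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(standards, key=lambda name: dist(features, standards[name]))


# Hypothetical standard patterns for two character names.
standards = {"A": [1.0, 0.0, 1.0], "B": [0.0, 1.0, 0.0]}
label = nearest_pattern([0.9, 0.1, 0.8], standards)
```

Here the input vector lies much closer to the stored pattern for "A", so that name is returned as the recognition result.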

[0007] The character name recognized in this way may be held temporarily in a result storage section or the like. If the character recognition device is used as a data input device for another computer, the output section processes the data and transmits the resulting encoded character name to the computer at the next stage.

[0008] The standard patterns prepared in advance include character patterns for numerals, Latin letters, katakana, hiragana, kanji, and so on.

[0009]

[Problems to Be Solved by the Invention] FIG. 5 shows an example of text data to be recognized that contains character groups. As shown in FIG. 5, the text data contains character groups G1 to G3, each of which places four or five characters in the print area of a single kanji character to express one meaning. Text containing such character groups has recently become common in newspapers, on business cards, and elsewhere. When the segmentation method of a conventional character recognition device is applied to the document data of FIG. 5, the character groups G1, G2 and G3 may be misrecognized as a single character or as two half-width characters, so that semantically important information such as "株式会社" (joint-stock company) and "主任研究員" (chief researcher) cannot be obtained.

[0010] Accordingly, an object of the present invention is to provide a character recognition device that can determine that a character group, in which at least two characters are arranged in the area predetermined for one character so as to carry a prescribed meaning, differs from a normal character, and can then cut out and recognize the characters in the group one by one.

[0011]

[Means for Solving the Problems] The character recognition device according to the present invention comprises first cut-out means, first feature extracting means, storage means, pattern specifying means, second cut-out means, second feature extracting means, character name specifying means, and output means.

[0012] The first cut-out means receives image data containing a character group, in which at least two characters are arranged in the area predetermined for one character so as to carry a prescribed meaning, and cuts out a pattern for each character area.

[0013] The first feature extracting means extracts the features of each pattern cut out by the first cut-out means.

[0014] The storage means stores character patterns, prepared individually for every character name to be recognized, and character group patterns, prepared individually for every character group to be recognized and indicating how the characters are arranged within the character area, each in association with its features.

[0015] The pattern specifying means searches the storage means on the basis of the features extracted by the first feature extracting means and specifies the character pattern or character group pattern whose features are most similar.

[0016] When the pattern specified by the pattern specifying means is a character group pattern, the second cut-out means cuts out the pattern of each character in the character group cut out by the first cut-out means.

[0017] The second feature extracting means extracts features for each pattern cut out by the second cut-out means.

[0018] The character name specifying means searches the storage means on the basis of each feature extracted by the second feature extracting means and specifies, for each most similar character pattern, the corresponding character name.

[0019] The output means outputs the character names specified by the character name specifying means, grouped by the character group cut out beforehand by the first cut-out means.

[0020] In the configuration described above, the character recognition device according to the present invention may further comprise information adding means that attaches the relevant prescribed information to each character group output by the output means.

[0021]

[Operation] The character recognition device according to the present invention receives image data in which normal characters are mixed with character groups, each having at least two characters arranged in the area predetermined for one character so as to carry a prescribed meaning. To determine that a character group differs from a normal character and then to cut out and recognize the characters in that group one by one, the device is first provided with storage means.

[0022] The storage means exploits the fact that a character group pattern has blank space between its constituent characters and therefore differs considerably from a normal character pattern. Each pattern is stored together with its features, classified as either a normal character pattern or a character group pattern.

[0023] When image data is input and a character group in it has been identified by the first cut-out means, the first feature extracting means and the pattern specifying means, the second cut-out means cuts out each character pattern within the identified character group. The second feature extracting means extracts features from each cut-out character pattern, and the character name specifying means searches the storage means on the basis of those features to specify the corresponding character name. The output means then outputs the specified character names for each character group, so the meaning expressed by a character group can be obtained as a sequence of character names, just as for normal characters.

[0024]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0025] FIG. 1 is a schematic block diagram of a character recognition device according to an embodiment of the present invention. The character recognition device 20 receives, for example, the document data shown in FIG. 5 as image data and performs character recognition on it. Normal characters are output as their recognized character names. For character groups G1 and G2, in which the four characters "株式会社" are packed into a normal character area, and character group G3, in which "主任研究員" is packed in the same way, each character in the group is recognized individually and the meaning of the group is expressed as a consecutive sequence of output character names.

[0026] The character recognition device 20 comprises a cut-out unit 1, a feature extraction unit 2, a recognition unit 3, a character group determination unit 4, an intra-group cut-out unit 5, an item recognition unit 6, and a result output unit 7. It also comprises a recognition standard pattern storage unit D1, a storage device whose stored data is searched and read by the recognition unit 3, and an item determination database D2, which is accessed and read by the item recognition unit 6.

[0027] FIG. 2 illustrates the data storage format of the recognition standard pattern storage unit D1 according to an embodiment of the present invention.

[0028] In FIG. 2, the recognition standard pattern storage unit D1 is divided in advance into a normal character category and a character group category. The normal character category holds data on the character standard patterns, stored at addresses 0 to (N-1). The character group category holds data on the character group standard patterns, stored at addresses N to N+6.

[0029] As illustrated, the recognition standard pattern storage unit D1 stores the data for each character standard pattern and each character group standard pattern as a record. Each record contains three items: an address, a character pattern or character group pattern I1, and density distribution feature data I2.
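The record layout of the storage unit D1 can be pictured with a small data structure. This is a minimal sketch under stated assumptions: the class name `Record`, the helper `build_store`, and the feature values are hypothetical, and only the three record items and the address split at N come from the description.

```python
from dataclasses import dataclass


@dataclass
class Record:
    address: int   # position in the storage unit D1
    pattern: str   # I1: character name or character group pattern id
    density: list  # I2: density distribution feature data


def build_store(char_patterns, group_patterns):
    """Place character records at 0..N-1 and group records from N upward."""
    records = []
    for name, feats in char_patterns:
        records.append(Record(len(records), name, feats))
    n = len(records)  # first address of the character group category
    for name, feats in group_patterns:
        records.append(Record(len(records), name, feats))
    return records, n


# Hypothetical contents: two character patterns and two group patterns.
store, N = build_store(
    [("株", [0.6, 0.5]), ("式", [0.4, 0.5])],
    [("pattern1", [0.3, 0.3]), ("pattern2", [0.3, 0.2])],
)
```

Keeping both categories in one address space means that a single address comparison against N is enough to tell them apart later.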

[0030] The density distribution feature data I2 holds data representing the image density distribution of the corresponding character pattern or character group pattern I1 within the prescribed one-character area (in this embodiment, the area of a full-width character).
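One common way to express a density distribution is to divide the character area into zones and record the fraction of dark pixels in each. The description does not specify how I2 is computed, so the zoning scheme, the function name `density_features`, and the 2 x 2 grid below are assumptions made purely for illustration.

```python
def density_features(cell, zones=2):
    """Return the dark-pixel fraction of each zone, scanned row by row.

    `cell` is a binary image (list of rows, 1 = dark pixel) whose height
    and width are both at least `zones`.
    """
    h, w = len(cell), len(cell[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            block = [cell[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(block) / len(block))
    return feats


cell = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = density_features(cell)  # [1.0, 0.0, 0.0, 0.75]
```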

[0031] In the area where the character group standard patterns are stored, the density distribution feature data I2 likewise holds data on the image density distribution of the pattern stored as the corresponding character group pattern I1. For character group standard patterns, the data I2 additionally includes feature data reflecting the blank space that exists between the characters of a group.

[0032] Next, patterns 1 to 7 of the character group standard patterns stored in the recognition standard pattern storage unit D1 are described.

[0033] FIGS. 3(a) to 3(g) illustrate the character group standard patterns according to an embodiment of the present invention.

[0034] As illustrated, when a character group is cut out as a full-width character, five patterns arise: patterns 1 to 5, shown in FIGS. 3(a) to 3(e). When a character group is cut out as half-width characters, as shown in FIGS. 3(a) and 3(g), two patterns arise: pattern 6 and pattern 7. These character group patterns 1 to 7 can also be regarded as data representing the arrangement of the characters within one character group.

[0035] Returning to FIG. 1, the cut-out unit 1 receives the image data and cuts out a character pattern for each character area, for example each full-width character area, in the image.

[0036] The feature extraction unit 2 extracts character features (density distribution features) from each character pattern it is given.

[0037] For each cut-out character pattern, the recognition unit 3 accesses the recognition standard pattern storage unit D1 on the basis of the previously extracted features, matches against all of the character standard patterns and character group standard patterns, and identifies and reads out the standard pattern whose features are most similar.

[0038] The character group determination unit 4 determines whether the standard pattern read out by the recognition unit 3 is a character group standard pattern. The determination is based on the address at which the most similar standard pattern found by the matching process of the recognition unit 3 was stored. As FIG. 2 shows, if the address obtained from matching is N or greater, the cut-out character pattern is determined to be a character group pattern. When a pattern is so determined, processing moves on to individual recognition in order to recognize each character in the group.
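The address test performed by the character group determination unit 4 reduces to a single comparison. In this sketch the function name `is_character_group` and the value chosen for N are hypothetical; only the rule "address N or greater means character group" comes from the description.

```python
def is_character_group(address, n):
    """True when the matched record lies in the character group category."""
    return address >= n


N = 3000  # hypothetical number of character standard patterns

group_hit = is_character_group(3002, N)  # matched a group record, e.g. pattern 3
char_hit = is_character_group(120, N)    # matched an ordinary character record
```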

[0039] The intra-group cut-out unit 5 takes histograms of image density in the horizontal and vertical directions, only for the image of a character group that the character group determination unit 4 has determined to be a character group pattern. Based on these histograms, it cuts out each character in the group. Each character cut out of the group in this way has a character standard pattern.
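The histogram-based cut-out inside a character group can be sketched with projection profiles on a binary image. The details below are assumptions: the description does not say how the histograms are thresholded or in what order the pieces are returned, so this sketch cuts at completely empty rows and columns and returns boxes row by row. The names `runs` and `split_group` are hypothetical.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries."""
    out, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def split_group(cell):
    """Split a binary group image into boxes (r0, r1, c0, c1).

    Rows are separated first using the horizontal projection, then each
    row band is split using its own vertical projection.
    """
    boxes = []
    row_profile = [sum(row) for row in cell]
    for r0, r1 in runs(row_profile):
        col_profile = [
            sum(cell[r][c] for r in range(r0, r1)) for c in range(len(cell[0]))
        ]
        for c0, c1 in runs(col_profile):
            boxes.append((r0, r1, c0, c1))
    return boxes


# Toy 2 x 2 arrangement: four small blobs separated by blank gaps.
cell = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
boxes = split_group(cell)
```

Each returned box can then be cropped and passed through feature extraction and matching like an ordinary character pattern.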

[0040] Each character pattern cut out by the intra-group cut-out unit 5 is processed by the feature extraction unit 2 and the recognition unit 3 in the same way as described above. Specifically, the recognition standard pattern storage unit D1 is accessed on the basis of each extracted feature, and matching is performed only against the character standard patterns, that is, those at addresses 0 to (N-1). This yields a recognition result for each character in the group. The recognition results pass through the character group determination unit 4 and are given to the item recognition unit 6.

[0041] When, for example, the recognition unit 3 recognizes character group G1 of FIG. 5 as "株式会社", the item recognition unit 6 outputs "株", "式", "会" and "社" in sequence.

[0042] The item recognition unit 6 also searches the item determination database D2 as needed and appends the information obtained to the data to be output. The item determination database D2 is accessed, for example, when the character recognition device 20 recognizes characters from image data printed on a business card, in order to judge the item of the line containing character group G1 of FIG. 5 to be "company name" or the item of the line containing character group G3 of FIG. 5 to be "title".
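The lookup against the item determination database D2 can be pictured as a simple mapping from a recognized character group to an item label. The contents and structure of D2 are not given in the description, so the dictionary `ITEM_DB`, the function `annotate`, and the output shape are all hypothetical.

```python
# Hypothetical contents of the item determination database D2.
ITEM_DB = {
    "株式会社": "company name",
    "主任研究員": "title",
}


def annotate(group_chars, item_db=ITEM_DB):
    """Attach an item label to the characters recognized in one group.

    Returns the individual character names plus the item found in the
    database, or None when the group is not listed.
    """
    text = "".join(group_chars)
    return {"characters": list(group_chars), "item": item_db.get(text)}


result = annotate(["株", "式", "会", "社"])
```

Groups that are not listed simply come back with no item attached, which matches the "as needed" wording of the description.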

[0043] The result output unit 7 converts the recognized character names output by the item recognition unit 6, and the items read from the item determination database D2, into the corresponding codes and outputs them from the device.

[0044] FIG. 4 is a processing flow chart showing the operation of the character recognition device 20 shown in FIG. 1.

[0045] Next, the character recognition operation of the device 20 is described following the processing flow of FIG. 4, with reference to FIGS. 1 to 5.

[0046] It is assumed that the recognition standard pattern storage unit D1 already holds a character pattern I1 and its density distribution feature data I2 for every character name to be recognized, as well as a character group pattern I1 and the corresponding density distribution feature data I2 for every character group to be recognized.

[0047] It is likewise assumed that the item determination database D2 already holds the data the item recognition unit 6 needs in order to recognize items.

[0048] The character recognition device 20 begins executing the processing from step ST1 of FIG. 4 when, for example, optically read grayscale image data of document data to be recognized, such as that shown in FIG. 5, is given to the cut-out unit 1.

[0049] First, in step ST1, the cut-out unit 1 receives the grayscale image data and performs line extraction and character cut-out.

[0050] Line extraction takes a histogram of image density in the horizontal or vertical direction from the given image data, judges the portions of the histogram with low density to be the spaces between lines, and extracts one line accordingly. At the same time, it is determined whether the image data is written vertically or horizontally.
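Line extraction from a density histogram can be sketched as follows for horizontally written text. The threshold parameter `min_ink`, the function name `extract_lines`, and the use of a binary image are assumptions; the description only states that low-density portions of the histogram are treated as the space between lines, and the vertical/horizontal decision is not covered here.

```python
def extract_lines(image, min_ink=1):
    """Return (top, bottom) row ranges of text lines in a binary image.

    A row whose dark-pixel count is below `min_ink` is treated as
    interline space; consecutive rows at or above it form one line.
    """
    profile = [sum(row) for row in image]
    lines, start = [], None
    # The trailing 0 is a sentinel that closes a line touching the bottom edge.
    for y, ink in enumerate(profile + [0]):
        if ink >= min_ink and start is None:
            start = y
        elif ink < min_ink and start is not None:
            lines.append((start, y))
            start = None
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
lines = extract_lines(page)  # [(1, 3), (4, 5)]
```

For vertically written text the same routine would be applied to column sums instead of row sums.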

[0051] Once lines have been extracted, character cut-out follows. For each extracted line, a histogram of image density is taken in the vertical direction (for vertical writing), and patterns that appear to be single characters are extracted from the variation in density shown by the histogram. Character patterns are extracted per character area, either full-width or half-width.

[0052] Once a character pattern has been extracted as described above, recognition processing is performed in the next step, ST4, by the feature extraction unit 2 and the recognition unit 3.

[0053] As described earlier, the feature extraction unit 2 extracts image density distribution features from the pattern of each prescribed character area cut out by the cut-out unit 1 and passes them to the recognition unit 3. When the given pattern is a character group pattern such as those shown in FIG. 3, the feature extraction unit 2 also passes to the recognition unit 3 the extracted feature data concerning the blank space between the characters.

[0054] The recognition unit 3 accesses the recognition standard pattern storage unit D1 on the basis of the given feature data and performs matching. When matching identifies the standard character pattern or character group pattern I1 whose features are most similar, the recognition unit 3 gives the identified pattern I1 and the address at which it is stored to the character group determination unit 4.

[0055] The character group determination unit 4 executes step ST5 on the basis of the given address. In step ST5, the character group determination unit 4 determines whether the cut-out pattern is a character standard pattern or a character group standard pattern according to whether the given address is smaller than N. If the address is N or greater, the unit determines that the cut-out pattern is a character group standard pattern, and processing proceeds to step ST6 onward. If it determines that the pattern is not a character group pattern, processing proceeds to step ST8 onward, described later.

[0056] In step ST6, a pattern is cut out for each character in the cut-out character group pattern. This is done by the intra-group cut-out unit 5.

[0057] Once the intra-group cut-out unit 5 has cut out each character in the group as a character pattern, the feature extraction unit 2 extracts density distribution features from each character pattern in step ST7, as described above, and the recognition unit 3 accesses the character standard patterns in the recognition standard pattern storage unit D1 on the basis of the extracted features and identifies the matching character pattern I1. In detail, the intra-group cut-out unit 5 takes histograms of image density in the horizontal and vertical directions only for the image of the character group pattern cut out by the cut-out unit 1 and cuts out each character pattern on the basis of these histograms, after which features are extracted from each character pattern. Matching is then performed against the character standard patterns stored in the storage unit D1 on the basis of the extracted features. This yields, as character names, the recognition results for each character in the group. The character names obtained as recognition results, for both character group patterns and normal character patterns, are given to the item recognition unit 6.

[0058] The item recognition unit 6 executes step ST8 of FIG. 4, in which item determination is performed. When, for example, character group G1 of FIG. 5 has been recognized, the item recognition unit 6 outputs "株", "式", "会" and "社" to the result output unit 7 in sequence and without interruption.

[0059] When the character recognition device 20 receives image data printed on a business card and recognizes a character group printed on the card, for example the character group G2 or G3 in FIG. 5, it searches the item determination database D2 on the basis of the recognition result, reads out the corresponding item, and outputs that item together with the character names of the recognized character group. For example, when the character group G2 in FIG. 5 is recognized, the line containing G2 is identified as the item "company name"; when the character group G3 is recognized, the line containing G3 is identified as the item "name". The item recognition result and the recognized character names output by the item recognition unit 6 are passed to the result output unit 7.
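The lookup in the item determination database D2 might be sketched as below. The table contents are assumptions made for illustration: the source does not list the entries of D2 or the characters of G2 and G3, so the entries in `ITEM_DB` and the function names are hypothetical.

```python
# Hypothetical contents of the item determination database D2:
# recognized character group -> item assigned to the line containing it.
ITEM_DB = {
    "株式会社": "company name",
    "財団法人": "company name",
    "部長": "name",  # assumed example of a group that marks a name line
}


def determine_item(char_names):
    """Join the recognized character names and look the group up in D2.

    Returns the item, or None when the group has no entry.
    """
    return ITEM_DB.get("".join(char_names))


def item_output(char_names):
    """Bundle the item with the character names, as passed to the output unit."""
    return {"item": determine_item(char_names), "characters": list(char_names)}
```

A dictionary keyed on the whole group keeps the lookup to one step; a real database could equally be keyed on a group identifier produced by the recognition unit.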

[0060] The result output unit 7 performs the output processing shown in step ST9. It outputs the item recognition result passed to it and the character names obtained by recognition as printed output, image output, audio output, code output, or the like.

[0061] When a computer is connected to the character recognition device 20 and the device 20 itself serves as an input device for that computer, the result output unit 7 converts the recognized character names and the item recognition result into predetermined codes as it outputs them, so that the computer can process the data easily.
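The source does not say which codes are used. As one possible illustration, the sketch below assumes the character names are emitted as Shift_JIS byte sequences and the items as small integers; both the choice of encoding and the `ITEM_CODES` table are assumptions, not part of the disclosure.

```python
# Hypothetical numeric codes for item recognition results (0 = no item).
ITEM_CODES = {"company name": 1, "name": 2}


def to_codes(item, char_names, encoding="shift_jis"):
    """Convert an item and its character names into machine-readable codes.

    Returns (item_code, list of encoded byte strings, one per character).
    """
    item_code = ITEM_CODES.get(item, 0)
    char_codes = [name.encode(encoding) for name in char_names]
    return item_code, char_codes
```

For example, `to_codes("company name", ["株", "式", "会", "社"])` yields the item code 1 and four two-byte sequences that the connected computer can decode with the same encoding.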

[0062] The above describes the processing for a single pattern cut out in step ST3. When a line contains a plurality of patterns, the processing of steps ST3 through ST9 is repeated as many times as there are patterns in the line. Further, when the given image data contains a plurality of lines, the processing of steps ST2 through ST9 is repeated as many times as there are lines in the image data, with the cut-out of the character patterns of each line in step ST3 repeated along the way.
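The repetition over lines and patterns amounts to two nested loops. The sketch below shows only that control flow, under the assumption that line cut-out and pattern cut-out have already produced a list of lines, each a list of patterns. The callables `is_group`, `split_group` and `recognize` are hypothetical stand-ins for the character group determination, intra-group cut-out and recognition stages.

```python
def recognize_document(lines, is_group, split_group, recognize):
    """Run recognition over every pattern of every line.

    lines: list of lines, each a list of cut-out patterns.
    Returns, per line, a list of character-name lists (one per pattern).
    """
    results = []
    for line in lines:                      # repeated once per line
        line_result = []
        for pattern in line:                # repeated once per pattern in the line
            if is_group(pattern):
                # Character group: cut out and recognize each inner character.
                names = [recognize(p) for p in split_group(pattern)]
            else:
                # Normal character: recognize the pattern directly.
                names = [recognize(pattern)]
            line_result.append(names)
        results.append(line_result)
    return results
```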

[0063] When the character groups to be recognized are fixed (for example, "株式会社" (joint-stock company) or "財団法人" (incorporated foundation)) or are known in advance, this knowledge may be used to add a function that checks whether the recognition result for a character group is correct and automatically corrects it when it is wrong.
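One way such a check-and-correct function could work is sketched below. The source does not specify the method; this example assumes a fixed vocabulary of known groups and replaces a recognition result with the known group of the same length that differs in the fewest positions, provided at most `max_mismatches` characters differ. The vocabulary, threshold and function name are all hypothetical.

```python
# Hypothetical vocabulary of character groups known in advance.
KNOWN_GROUPS = ["株式会社", "財団法人", "有限会社"]


def correct_group(char_names, known_groups=KNOWN_GROUPS, max_mismatches=1):
    """Check a recognized group against known groups and correct it if close.

    Returns (group_string, corrected_flag).
    """
    recognized = "".join(char_names)
    if recognized in known_groups:
        return recognized, False            # result confirmed as correct
    best, best_miss = None, None
    for candidate in known_groups:
        if len(candidate) != len(recognized):
            continue
        miss = sum(a != b for a, b in zip(candidate, recognized))
        # Ties keep the earlier candidate in the list.
        if best_miss is None or miss < best_miss:
            best, best_miss = candidate, miss
    if best is not None and best_miss <= max_mismatches:
        return best, True                   # automatically corrected
    return recognized, False                # too different; leave unchanged
```

With the vocabulary above, a result in which "式" was misread as "弐" is corrected back to "株式会社", while a result that differs from every known group in two or more positions is left as recognized.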

[0064] The arrangement of the characters within a character group differs between vertical and horizontal writing. The output order of the character names may therefore be determined as follows: on the basis of the vertical/horizontal writing determination that accompanies the line cut-out result obtained in step ST2 of FIG. 4, predetermined rules for vertical or horizontal writing are consulted, and the character names obtained by recognizing each character in the group are ordered accordingly for output.
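The ordering rules themselves are not given in the source. The sketch below assumes one plausible pair of rules: in horizontal writing the characters of a group are read row by row, left to right; in vertical writing they are read column by column, right to left, each column top to bottom. It also assumes each recognized character comes with grid coordinates `(x, y)` inside the group. The tuple layout and function name are hypothetical.

```python
def order_names(chars, vertical):
    """Order recognized characters of one group for output.

    chars: list of (name, x, y), with x growing rightward and y downward.
    vertical: True if the line was judged to be vertical writing.
    """
    if vertical:
        # Assumed rule: rightmost column first, top to bottom within a column.
        key = lambda c: (-c[1], c[2])
    else:
        # Assumed rule: top row first, left to right within a row.
        key = lambda c: (c[2], c[1])
    return [name for name, _, _ in sorted(chars, key=key)]
```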

[0065]

[Effects of the Invention] As described above, according to the present invention, even when image data is input that includes a character group having a predetermined meaning, formed by arranging at least two characters within a predetermined one-character area, the character group is recognized, each character within it is recognized individually, and the character names making up each character group are output group by group. The meaning expressed by such a character group can therefore be recognized as accurately as that of a normal character.

[0066] Further, the information addition output means outputs each character group with added information that classifies, in advance, the predetermined meaning of that group. This output can therefore also be used for data management: the added information serves as an important keyword, and the data consisting of the character names that follow it can be classified and filed accordingly.

[Brief description of drawings]

FIG. 1 is a schematic configuration diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining the data storage format of the recognition standard pattern storage unit according to an embodiment of the present invention.

FIGS. 3(a) to 3(g) are diagrams for explaining character group standard patterns according to an embodiment of the present invention.

FIG. 4 is a process flow chart showing the operation of the character recognition device shown in FIG. 1.

FIG. 5 is a diagram showing an example of document data to be recognized that includes character groups.

[Explanation of symbols]

1 cutout unit
2 feature extraction unit
3 recognition unit
4 character group determination unit
5 intra-group cutout unit
6 item recognition unit
7 result output unit
D1 recognition standard pattern storage unit
D2 item determination database

In the drawings, the same reference symbols denote the same or corresponding parts.

Claims

[Claims]

1. A character recognition device comprising:
first cutout means for receiving image data that includes a character group having a predetermined meaning, formed by arranging at least two characters within a predetermined one-character area, and for cutting out a pattern for each such character area;
first feature extraction means for extracting features of the pattern cut out by the first cutout means;
storage means for storing in advance, each in association with its features, character patterns prepared individually for all character names to be recognized and character group patterns, prepared individually for all character groups to be recognized, that indicate the arrangement of characters within the character area;
pattern specifying means for searching the storage means on the basis of the features extracted by the first feature extraction means and specifying the character pattern or character group pattern having the most similar features;
second cutout means for cutting out, in response to the pattern specified by the pattern specifying means being a character group pattern, a pattern for each character of the character group cut out by the first cutout means;
second feature extraction means for extracting features of each pattern cut out by the second cutout means;
character name specifying means for searching the storage means on the basis of each feature extracted by the second feature extraction means and specifying the character name corresponding to each character pattern having the most similar features; and
output means for outputting the character names specified by the character name specifying means for each character group cut out by the first cutout means.

2. The character recognition device according to claim 1, wherein the output means further comprises information addition output means for outputting each character group while adding predetermined information corresponding to that character group.