JPH0696286A - Character recognizing device

Info

Publication number: JPH0696286A
Application number: JP4243053A
Authority: JP
Inventors: Yukiya Sugiyama; 幸也杉山
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-09-11
Filing date: 1992-09-11
Publication date: 1994-04-08

Abstract

PURPOSE: To provide a character recognition device that rescues misrecognized parenthesis characters in a character recognition result containing parentheses, and thereby markedly improves the recognition rate. CONSTITUTION: The character recognition device comprises an image reading part 1; a character segmenting part 2 that extracts character-unit image areas from the data of the image reading part 1; a character recognizing part 3 that recognizes the data obtained from the character segmenting part 2 as characters and converts them to character codes; a language processing part 4 that performs phrase setting, word collation, and word connection testing; a parenthesis character array generating part 5 that stores in an array the positions of characters whose first candidate is a parenthesis character; a pair deciding part 6 that decides whether each parenthesis character forms a pair; a partner character searching part 7 that searches for the character forming a pair with the parenthesis character; and a candidate character exchanging part 8 that places the character forming a pair with the parenthesis character as the first candidate at the character position detected by the partner character searching part 7.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that converts printed characters, dot-matrix characters, and handwritten character patterns from newspapers, magazines, novels, and the like into code information such as JIS codes.

[0002]

[Prior Art] In recent years, with the progress of office automation, printed manuscripts have come to be input with a character recognition device rather than a keyboard.

[0003] A character recognition device sets provisional phrases (bunsetsu) from the character-type transitions of the candidate characters, collates words generated from the candidate characters within each provisional phrase against a word dictionary to find matching candidate words, and obtains the solution words by testing whether the candidate words found in this way can be connected to one another.

[0004] A conventional character recognition device is described below. Its post-processing method is explained using the recognition result shown in (Table 1). Provisional phrases are set from the character-type transitions of the recognition result, and the words within each provisional phrase are analyzed.

[0005]

[Table 1]

[0006] Provisional phrase 1: “「”. Since “「” is a symbol, the phrase is valid.

[0007] Provisional phrase 2: “ＡＢＣＪと”. Since “ＡＢＣＪ” is a string of alphabetic characters, it is regarded as a noun.

[0008] “と” is a particle and can follow a noun, so the phrase is valid. Provisional phrase 3: “「”. Since “「” is a symbol, the phrase is valid.

[0009] Provisional phrase 4: “ＤＥＦ”. Since “ＤＥＦ” is a string of alphabetic characters, it is regarded as a noun.

[0010] Provisional phrase 5: “」がある。”. Since “」” is a symbol, the phrase is valid.

[0011] “がある。” is an assertive auxiliary verb and can follow a noun, so the phrase is valid.

[0012] As described above, at provisional phrase 2 the conventional character recognition device carries out post-processing without detecting “Ｊ” as a misrecognized character.

[0013] Next, the post-processing method of the conventional character recognition device is explained using the recognition result shown in (Table 2). Provisional phrases are set from the character-type transitions of the recognition result, and the words within each provisional phrase are analyzed.

[0014]

[Table 2]

[0015] Provisional phrase 1: “［”. Since “［” is a symbol, the phrase is valid.

[0016] Provisional phrase 2: “ＡＢＣてです”. Since “ＡＢＣ” is a string of alphabetic characters, it is regarded as a noun.

[0017] The first candidate, “て”, is a conjunctive particle, the hypothetical form of a ta-row godan verb, or the imperative form of a ta-row godan verb; none of the three can follow a noun.

[0018] The second candidate, “そ”, is the irrealis form of a sa-row godan verb and cannot follow a noun.

[0019] The third candidate, “こ”, is the irrealis form of a ka-row godan verb and cannot follow a noun.

[0020] As described above, the conventional character recognition device can point out that the character in the fourth column is misrecognized, but because the candidate characters do not include the correct character, post-processing proceeds without rescuing it.

[0021]

[Problems to Be Solved by the Invention] However, because the conventional configuration described above mainly extracts words according to grammatical rules, it cannot detect the anomaly when characters that are normally used in pairs, such as parentheses, fail to form a pair. Furthermore, even if it can detect that a character position where a parenthesis should have been recognized has been misrecognized, the character cannot be rescued unless the candidate characters include the correct one.

[0022] The present invention solves these conventional problems. Its object is to provide a character recognition device that rescues misrecognized parenthesis characters in a character recognition result containing parentheses and thereby markedly improves the recognition rate.

[0023]

[Means for Solving the Problems] To achieve this object, the character recognition device of the present invention operates on the recognition result produced by a character recognition unit that recognizes image data read by an image reading device and converts it into character codes. It searches all first-candidate characters of the recognition result for symbols or characters that form pairs, such as parentheses, and stores the character code and image data of each such symbol in an array. For each stored character, it tests whether the character that pairs with it exists among the first-candidate characters of the recognition result. If it does not, the device collates the image data stored in the array for the paired character against the character image data to be recognized. If some character image data matches and its positional relationship to the stored character is appropriate, the paired character is adopted as the first candidate for that character image data. By these means, text containing parentheses and similar symbols is recognized correctly.

[0024] Specifically, the character recognition device comprises: an image reading unit that photoelectrically converts an original image; a character cutout unit that extracts character-unit image areas from the image data of the image reading unit; a character recognition unit that recognizes the image data obtained from the character cutout unit and converts it into character codes; a language processing unit that performs phrase setting, word collation, and word connection testing; a parenthesis character array creation unit that, from the test result of the language processing unit, stores in an array the positions of characters whose first candidate is a parenthesis character; a pair determination unit that determines whether the parenthesis characters stored by the parenthesis character array creation unit form pairs; a partner character search unit that searches for the character that pairs with a parenthesis character judged by the pair determination unit; and a candidate character exchange unit that places the character pairing with the parenthesis character as the first candidate at the character position found by the partner character search unit.

[0025]

[Operation] With this configuration, the device searches for characters whose first candidate in the recognition result is a parenthesis and stores the image data, position, and character code of each. When a stored parenthesis has no partner, the character image data at the position where the partner is expected is collated against stored image data having the same character code as the partner parenthesis. If they match, the first candidate at the expected position is replaced with the partner parenthesis character, so that Japanese text containing parentheses can be recognized correctly.

[0026]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0027] FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention. Reference numeral 1 denotes an image reading unit that photoelectrically converts an original image; 2, a character cutout unit that extracts character-unit image areas from the image data; 3, a character recognition unit that recognizes the image data obtained from the character cutout unit 2 and converts it into character codes; 4, a language processing unit that performs phrase setting, word collation, and word connection testing; 5, a parenthesis character array creation unit that stores in an array the positions of characters whose first candidate is a parenthesis character; 6, a pair determination unit that determines whether the parenthesis characters form pairs; 7, a partner character search unit that searches for the character that pairs with a parenthesis character; 8, a candidate character exchange unit that places the character pairing with the parenthesis character as the first candidate at the character position found by the partner character search unit; and 9, an end determination unit that determines whether all parenthesis characters have been processed.

[0028] The operation of the character recognition device of this embodiment, configured as above, is described below.

[0029] FIG. 2 shows the image data to be recognized, and FIG. 3 is a flowchart showing the symbol recognition method of the character recognition device of this embodiment.

[0030] First, the image data of FIG. 2, photoelectrically converted by the image reading unit 1 (step 21), is cut into character-unit image data by the character cutout unit 2 (step 22) and recognized by the character recognition unit 3 (step 23). The candidate characters are sent to the language processing unit 4, and the recognition result after rescue processing by the language processing unit 4 (step 24) is sent to the parenthesis character array creation unit 5. The recognition result that is sent is shown in (Table 3).

[0031]

[Table 3]

[0032] In the table, the character position is a serial number assigned in the order in which the characters were cut out by the character cutout unit. The first candidate is the most probable recognition result, the second candidate the second most probable, and the third candidate the third most probable.

[0033] The single-character-word flag is determined from the number of characters in the word generated during language processing. A value of 0 indicates that the character is not a single-character word, and 1 indicates that it is.
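As an illustration only, one row of the recognition result described above could be modeled as follows. The class and field names (`RecognitionRow`, `candidates`, `single_char_word`) and the helper `single_char_flag` are hypothetical and do not appear in the patent; the sample candidates are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognitionRow:
    """One character position of the recognition result (hypothetical layout)."""
    position: int            # serial number in cutout order
    candidates: List[str]    # most probable candidate first
    single_char_word: int    # 1 if the word built here has one character, else 0


def single_char_flag(word: str) -> int:
    """Derive the flag from the length of the word produced by language processing."""
    return 1 if len(word) == 1 else 0


# Made-up example row: three candidates for position 7.
row = RecognitionRow(position=7,
                     candidates=["Ｊ", "丁", "了"],
                     single_char_word=single_char_flag("Ｊ"))
```

Keeping the candidates in one ordered list means that "first candidate" is simply `candidates[0]`, which is the only element the parenthesis processing reads at first.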

[0034] In the recognition result after language processing (step 24), the parenthesis character array creation unit 5 performs parenthesis character processing (step 25) and stores in an array the character positions whose first candidate is of a parenthesis character type.

[0035] Next, the procedure by which the parenthesis character array creation unit 5 creates the parenthesis character array is described. FIG. 4 is a flowchart showing this procedure.

[0036] First, the head of the text is set as the character position to be processed (steps 31 and 32).

[0037] The first-candidate character code at the character position being processed is obtained from the recognition result (step 33).

[0038] Whether the character code is of a parenthesis character type is checked (step 34). If it is, the character position is stored in the parenthesis character array entry corresponding to that character code (step 35). This processing is repeated until the character position reaches the last character of the text.
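The scan in steps 31 to 35 can be sketched as below. This is a minimal illustration under stated assumptions: the bracket list `BRACKETS`, the function name `build_bracket_array`, and the sample text are hypothetical, and dropping positions beyond the fifth is a choice of this sketch. The five-slot limit and the use of -1 for an empty slot follow the description of the array.

```python
MAX_POSITIONS = 5   # at most five positions are kept per parenthesis character
EMPTY = -1          # -1 marks a slot with no stored position

# Hypothetical bracket list; each left bracket is followed by its partner.
BRACKETS = ["（", "）", "［", "］", "「", "」"]


def build_bracket_array(first_candidates):
    """Scan first candidates from the head of the text and record bracket positions."""
    table = {ch: [EMPTY] * MAX_POSITIONS for ch in BRACKETS}
    for pos, ch in enumerate(first_candidates):      # steps 31-33
        slots = table.get(ch)
        if slots is None:
            continue                                 # step 34: not a bracket type
        if EMPTY in slots:
            slots[slots.index(EMPTY)] = pos          # step 35: first free slot
        # Assumption of this sketch: positions beyond the fifth are dropped.
    return table


# Hypothetical first-candidate sequence (one character per position).
sample = ["「", "あ", "」", "と", "、", "「", "お", "Ｊ", "の", "内", "、", "「"]
bracket_array = build_bracket_array(sample)
```

With this sample, the entry for “「” holds positions 0, 5, and 11, and the entry for “」” holds position 2; all remaining slots stay at -1.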

[0039] The resulting parenthesis character array is shown in (Table 4).

[0040]

[Table 4]

[0041] In (Table 4), character codes are shown as the characters themselves for convenience. The character code column gives the code of a parenthesis character. The partner character code column gives the code of the parenthesis character that pairs with it.

[0042] The character position column gives the positions of characters whose first candidate is the parenthesis character shown in the character code column; up to five positions can be stored. A value of -1 indicates that no position is stored.

[0043] Character codes are likewise shown as characters in the text below. The pair determination unit 6 tests how the parentheses correspond. One parenthesis character position whose position has been confirmed is taken from the parenthesis character array and stored in cNum (step 41); cNum receives 0. The parenthesis character code at position cNum is stored in code (step 42); code receives “「”. The direction of the parenthesis is then determined (step 43): if the parenthesis character array number is even, the character is a left parenthesis, and if odd, a right parenthesis. Here the array number is 12, which is even, so the character is judged to be a left parenthesis.
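The parity rule of step 43 amounts to a one-line test. The sketch below is illustrative; the function names are hypothetical, and `partner_index` additionally assumes that a pair occupies adjacent array numbers (even then odd), which the text does not state explicitly since the partner code is stored in the array itself.

```python
def bracket_direction(array_index: int) -> str:
    """Step 43: an even array number is a left bracket, an odd one a right bracket."""
    return "left" if array_index % 2 == 0 else "right"


def partner_index(array_index: int) -> int:
    """Assumption: a pair occupies adjacent array numbers (even, then odd)."""
    return array_index + 1 if array_index % 2 == 0 else array_index - 1


direction = bracket_direction(12)   # array number used in the walkthrough
```

Encoding direction in the parity of the array number avoids a separate lookup table for left and right brackets.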

[0044] Because it is a left parenthesis, the partner character search unit 7 starts searching for the corresponding right parenthesis (step 44).

[0045] FIG. 5 is a flowchart showing the method of searching for a right parenthesis, which is described below. First, cNum is assigned to the search character position leftCNum (step 51). leftCNum becomes 0.

[0046] The parenthesis character code that pairs with code is taken from the parenthesis character array and assigned to opCode (step 52). opCode becomes “」”.

[0047] leftCNum is advanced by one character (step 53), so leftCNum = 1. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0048] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “あ”.

[0049] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They do not match.

[0050] leftCNum is advanced by one character (step 53), so leftCNum = 2. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0051] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “」”.

[0052] tempCode is collated with opCode (step 58). They match. Because the corresponding code has been found, the analysis for code ends.

[0053] Whether all parenthesis characters with confirmed positions have been analyzed is determined (step 46). Some parenthesis characters remain unanalyzed.

[0054] One parenthesis character position whose position has been confirmed is taken from the parenthesis character array and stored in cNum (step 41). cNum receives 5.

[0055] The parenthesis character code at position cNum is stored in code (step 42). code receives “「”.

[0056] The direction of the parenthesis is determined (step 43). If the parenthesis character array number is even, the character is a left parenthesis, and if odd, a right parenthesis. Here the array number is 12, which is even, so the character is judged to be a left parenthesis.

[0057] Because it is a left parenthesis, the search for the corresponding right parenthesis is started (step 44). cNum is assigned to the search character position leftCNum (step 51). leftCNum becomes 5.

[0058] The parenthesis character code that pairs with code is taken from the parenthesis character array and assigned to opCode (step 52). opCode becomes “」”.

[0059] leftCNum is advanced by one character (step 53), so leftCNum = 6. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0060] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “お”.

[0061] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They do not match.

[0062] leftCNum is advanced by one character (step 53), so leftCNum = 7. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0063] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “Ｊ”.

[0064] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They do not match.

[0065] leftCNum is advanced by one character (step 53), so leftCNum = 8. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0066] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “の”.

[0067] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They do not match.

[0068] leftCNum is advanced by one character (step 53), so leftCNum = 9. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0069] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “内”.

[0070] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They do not match.

[0071] leftCNum is advanced by one character (step 53), so leftCNum = 10. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0072] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “、”.

[0073] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They do not match.

[0074] leftCNum is advanced by one character (step 53), so leftCNum = 11. Whether leftCNum is at or before the end of the text is checked (step 54). It is.

[0075] The first-candidate character code at position leftCNum is assigned to tempCode (step 57). tempCode becomes “「”.

[0076] tempCode is collated with opCode (step 58). They do not match. tempCode is collated with code (step 5a). They match. Because a left parenthesis was detected before the corresponding right parenthesis appeared, it can be concluded that the right parenthesis has been misrecognized.
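The forward search traced in the walkthrough can be condensed into a short loop. This is a sketch, not the patented flowchart itself: the function name, the `PARTNER` table, and the sample sequence are hypothetical. The return value pairs a "needs analysis" flag with a position. The end-of-text branch (returning the last position) is an assumption, since that branch is not described in the text; the `leftCNum - 1` result for a repeated left bracket follows the assignment to endNum in step 5b.

```python
# Hypothetical partner table for left brackets.
PARTNER = {"「": "」", "（": "）", "［": "］"}


def search_right_bracket(first_candidates, c_num):
    """Search forward from a left bracket at c_num for its right partner.

    Returns (needs_analysis, position): the partner position when found,
    otherwise the analysis end position endNum.
    """
    code = first_candidates[c_num]
    op_code = PARTNER[code]                          # step 52
    last = len(first_candidates) - 1
    left_c_num = c_num                               # step 51
    while True:
        left_c_num += 1                              # step 53
        if left_c_num > last:                        # step 54 fails
            return True, last                        # assumption: not in the text
        temp_code = first_candidates[left_c_num]     # step 57
        if temp_code == op_code:                     # step 58: partner found
            return False, left_c_num
        if temp_code == code:                        # step 5a: same bracket again
            return True, left_c_num - 1              # step 5b: endNum


# Hypothetical first-candidate sequence; positions 3 and 4 are placeholders.
sample = ["「", "あ", "」", "と", "、", "「", "お", "Ｊ", "の", "内", "、", "「"]
first_result = search_right_bracket(sample, 0)
second_result = search_right_bracket(sample, 5)
```

For this sample the search from position 0 finds the partner at position 2, while the search from position 5 reaches another “「” at position 11 and reports that positions up to 10 need analysis.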

[0077] “Required” is assigned to the analysis flag. leftCNum - 1 is assigned to the analysis end position endNum (step 5b).

[0078] Whether the analysis flag is “required” or “not required” is determined (step 5c). It is “required”, so the analysis proceeds.

[0079] FIG. 6 is a flowchart showing the method of processing parenthesis characters. First, cNum is assigned to leftCNum (step 61). leftCNum becomes 5.

[0080] leftCNum is advanced by one character (step 62). leftCNum becomes 6. Whether leftCNum is less than or equal to endNum is checked (step 63). It is.

[0081] Whether the character at position leftCNum is a single-character word is checked (step 64). It is.

[0082] The candidate word group at position leftCNum is searched for opCode (step 65). It is not included. The same processing was performed for character positions 7, 8, 9, and 10 of FIG. 2, but opCode was not included in any of the candidate word groups. Whether opCode was found is checked (step 67). Because it was not found, the corresponding parenthesis character is searched for by collating image data within the text against each other.

[0083] cNum is assigned to leftCNum (step 68). leftCNum is advanced by one character (step 62). leftCNum becomes 6.

[0084] Whether leftCNum is less than or equal to endNum is checked (step 63). It is.

[0085] Whether the character at position leftCNum is a single-character word is checked (step 64). It is.

[0086] One character position whose first candidate is opCode is taken from the parenthesis character array. Position 2 is obtained.

[0087] The character image data at position leftCNum is collated with the character image data at position 2. They match.

[0088] The candidate character exchange unit 8 therefore places opCode as the first candidate character at position leftCNum (step 6d).
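The two-pass rescue of FIG. 6 might be sketched as follows. Everything here is illustrative: the function names, the data layout, and the sample data are hypothetical, and the patent does not specify how image data are collated. This sketch assumes flattened binary bitmaps of equal size compared by counting differing pixels, with a match declared at or below a threshold `max_diff`.

```python
def bitmap_distance(a, b):
    """Number of differing pixels between two equally sized flattened bitmaps."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same size")
    return sum(x != y for x, y in zip(a, b))


def rescue_partner(candidates, single_char, images, c_num, end_num,
                   op_code, known_pos, max_diff=0):
    """Try to restore a missing partner bracket between c_num+1 and end_num.

    Pass 1 looks for op_code among the lower-ranked candidates of
    single-character words; pass 2 collates each position's image with the
    image at known_pos, a position already recognized as op_code.
    Returns the rescued position, or None if nothing matched.
    """
    span = range(c_num + 1, end_num + 1)
    for pos in span:                                   # pass 1: candidate lists
        if single_char[pos] and op_code in candidates[pos]:
            candidates[pos].remove(op_code)
            candidates[pos].insert(0, op_code)         # promote to first candidate
            return pos
    for pos in span:                                   # pass 2: image collation
        if single_char[pos] and \
                bitmap_distance(images[pos], images[known_pos]) <= max_diff:
            candidates[pos].insert(0, op_code)         # place as first candidate
            return pos
    return None


# Made-up data: position 2 is a known "」"; position 5 was misread as "Ｊ".
candidates = [["「"], ["あ"], ["」"], ["「"], ["お", "あ"], ["Ｊ", "丁"], ["「"]]
single_char = [1, 1, 1, 1, 1, 1, 1]
images = {2: (0, 1, 0, 1, 1, 1), 4: (1, 1, 1, 0, 0, 0), 5: (0, 1, 0, 1, 1, 1)}
rescued = rescue_partner(candidates, single_char, images,
                         c_num=3, end_num=5, op_code="」", known_pos=2)
```

Reusing an image that was already recognized as the partner bracket in the same text means that no separate template dictionary is needed, which appears to be the point of storing image data alongside positions.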

[0089] Next, the search for a left parenthesis is described. The recognition result sent to the parenthesis character array creation unit 5 is shown in (Table 5).

[0090]

[Table 5]

[0091] The parenthesis character array created according to the procedure of the parenthesis character array creation unit 5 is shown in (Table 6).

[0092]

[Table 6]

[0093] The pair determination unit 6 tests how the parentheses correspond. In FIG. 4, one parenthesis character position whose position has been confirmed is taken from the parenthesis character array and stored in cNum (step 41). cNum receives 2.

[0094] The parenthesis character code at position cNum is stored in code (step 42). code receives “」”.

[0095] The direction of the parenthesis is determined (step 43). If the parenthesis character array number is even, the character is a left parenthesis, and if odd, a right parenthesis. Here the array number is 13, which is odd, so the character is judged to be a right parenthesis.

【００９６】図７は左括弧の探索方法を示すフローチャ
ートである。図７のフローチャートに従い左括弧の探索
方法を説明する。まず探索文字位置rCNum にcNumを代入
する（ステップ７１）。rCNum は２となる。FIG. 7 is a flow chart showing the method for searching the left parenthesis. A search method for the left parenthesis will be described with reference to the flowchart of FIG. First, cNum is substituted for the search character position rCNum (step 71). rCNum is 2.

【００９７】括弧文字配列よりcodeに対応する括弧文字
コードを取り出してopCodeに代入する（ステップ７
２）。opCodeは“「”となる。A parenthesis character code corresponding to code is taken out from the parenthesis character array and assigned to opCode (step 7
2). The opCode will be "".

【００９８】rCNum を一文字文頭方向に戻す（ステップ
７３）。rCNum ＝１となる。rCNum は文頭位置以上であ
るか否かを確認する（ステップ７４）。文頭位置以上で
ある。The rCNum is returned to the beginning of one character (step 73). rCNum = 1. It is confirmed whether rCNum is equal to or higher than the sentence head position (step 74). It is above the beginning of the sentence.

[0099] The first candidate character code at position rCNum is assigned to tempCode (step 77). tempCode becomes "あ".

[0100] tempCode is compared with opCode (step 78); they do not match. tempCode is then compared with code (step 7a); they do not match either.

[0101] rCNum is moved back one character toward the beginning of the sentence (step 73), giving rCNum = 0. It is then checked whether rCNum is at or after the sentence head position (step 74). It is.

[0102] The first candidate character code at position rCNum is assigned to tempCode (step 77). tempCode becomes "Ｆ".

[0103] tempCode is compared with opCode (step 78); they do not match. tempCode is then compared with code (step 7a); they do not match either.

[0104] rCNum is moved back one character toward the beginning of the sentence (step 73), giving rCNum = -1. It is then checked whether rCNum is at or after the sentence head position (step 74). It is not; rCNum is now before the sentence head position.

[0105] Because the sentence head was reached without the corresponding left parenthesis appearing, it can be concluded that the left parenthesis was misrecognized.

[0106] "Required" is assigned to the analysis sign, and the sentence head position is assigned to the analysis end position endNum (step 7b).

[0107] It is determined whether the analysis sign is "required" or "not required" (step 7c). Since it is "required", the analysis proceeds.
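
The backward scan of steps 71 to 7c can be sketched as follows. This is only a minimal Python illustration, not the patented implementation: the function name, the list-of-first-candidates representation, and the boolean standing in for the analysis sign are assumptions. What happens when step 7a finds a match is not described in this passage, so the handling of that branch is likewise an assumption.

```python
from typing import List, Tuple


def search_left_bracket(
    first_candidates: List[str],
    c_num: int,
    code: str,
    op_code: str,
    head: int = 0,
) -> Tuple[bool, int]:
    """Scan from c_num toward the sentence head for the partner of `code`.

    Returns (needs_analysis, position). When needs_analysis is True the
    position is endNum; otherwise it is where the partner was found.
    """
    r_c_num = c_num                            # step 71
    while True:
        r_c_num -= 1                           # step 73: one character back
        if r_c_num < head:                     # step 74: passed the head
            return True, head                  # step 7b: sign = "required"
        temp_code = first_candidates[r_c_num]  # step 77
        if temp_code == op_code:               # step 78: partner found
            return False, r_c_num
        if temp_code == code:                  # step 7a: same bracket again
            # Assumption: the missing partner must lie after this bracket,
            # so the analysis range ends just after it.
            return True, r_c_num + 1
```

For the worked example, the first candidates at positions 0 to 2 are "Ｆ", "あ", and "」", so `search_left_bracket(["Ｆ", "あ", "」"], 2, "」", "「")` reaches the sentence head and reports that analysis is required with endNum equal to 0.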

[0108] FIG. 8 is a flowchart showing the method of processing parenthesis characters. First, cNum is assigned to rCNum (step 81), so rCNum becomes 2.

[0109] rCNum is moved back one character (step 82), giving rCNum = 1. It is checked whether rCNum is greater than or equal to endNum (step 83). It is.

[0110] It is checked whether the word at position rCNum is a one-character word (step 84). It is.

[0111] The candidate word group at position rCNum is searched for opCode (step 85). It is not found.

[0112] The same processing is performed for character position 0, but opCode is not in the candidate word group there either.
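
The candidate search of steps 81 to 85 might be sketched in Python as below. The data layout is an assumption: `candidates[i]` is taken to be the ranked candidate list at character position i, and `is_one_char_word[i]` stands in for the language processing result consulted in step 84. Skipping positions that are not one-character words is also an assumption, since this passage only shows the case where the check succeeds.

```python
from typing import List, Optional


def find_partner_in_candidates(
    candidates: List[List[str]],
    is_one_char_word: List[bool],
    c_num: int,
    end_num: int,
    op_code: str,
) -> Optional[int]:
    """Return the nearest position before c_num whose candidates hold op_code."""
    r_c_num = c_num                          # step 81
    while True:
        r_c_num -= 1                         # step 82: one character back
        if r_c_num < end_num:                # step 83: left the search range
            return None                      # step 87 then sees "not found"
        if not is_one_char_word[r_c_num]:    # step 84 (assumed: skip position)
            continue
        if op_code in candidates[r_c_num]:   # step 85
            return r_c_num
```

In the worked example neither position 1 nor position 0 lists "「" among its candidates, so a call such as `find_partner_in_candidates([["Ｆ", "Ｅ"], ["あ", "ぁ"], ["」"]], [True, True, True], 2, 0, "「")` returns `None`. The lower-ranked candidates "Ｅ" and "ぁ" are invented for illustration.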

[0113] It is checked whether opCode was found (step 87). Since it was not, the corresponding parenthesis character is searched for by comparing character image data within the sentence against each other.

[0114] cNum is assigned to rCNum (step 88). rCNum is moved back one character (step 89), giving rCNum = 1.

[0115] It is checked whether rCNum is greater than or equal to endNum (step 8a). It is.

[0116] It is checked whether the word at position rCNum is a one-character word (step 8b). It is.

[0117] One character position that has opCode as its first candidate is taken from the parenthesis character array. Position 5 is obtained.

[0118] The character image data at position rCNum is compared with the character image data at position 5. They do not match.

[0119] Another character position that has opCode as its first candidate is taken from the parenthesis character array. Position 11 is obtained.

[0120] The character image data at position rCNum is compared with the character image data at position 11. They do not match.

[0121] rCNum is moved back one character (step 89), giving rCNum = 0. It is checked whether rCNum is greater than or equal to endNum (step 8a). It is.

[0122] It is checked whether the word at position rCNum is a one-character word (step 8b). It is.

[0123] One character position that has opCode as its first candidate is taken from the parenthesis character array. Position 5 is obtained.

[0124] The character image data at position rCNum is compared with the character image data at position 5. They match.

[0125] The candidate character exchange unit 8 therefore places opCode as the first candidate character at position rCNum (step 8d).
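
The image-based fallback of steps 88 to 8d can be sketched as follows. Everything here is a hedged illustration: the text does not say how character images are compared, so `images_match` is a hypothetical pixel-agreement test on small bitmaps, and moving opCode to the front of the candidate list is one assumed reading of "placing it as the first candidate". All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Bitmap = List[List[int]]


def images_match(a: Bitmap, b: Bitmap, threshold: float = 0.9) -> bool:
    """Hypothetical matcher: share of equal pixels in same-sized bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return False
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total > 0 and same / total >= threshold


def find_partner_by_image(
    images: Dict[int, Bitmap],
    is_one_char_word: List[bool],
    c_num: int,
    end_num: int,
    op_positions: Sequence[int],
) -> Optional[int]:
    """Find a position whose image matches a known op_code character image."""
    r_c_num = c_num                              # step 88
    while True:
        r_c_num -= 1                             # step 89: one character back
        if r_c_num < end_num:                    # step 8a: range exhausted
            return None
        if not is_one_char_word[r_c_num]:        # step 8b (assumed: skip)
            continue
        for ref in op_positions:                 # positions with op_code first
            if images_match(images[r_c_num], images[ref]):
                return r_c_num


def promote_candidate(candidates: List[List[str]], pos: int, op_code: str) -> None:
    """Step 8d (assumed form): make op_code the first candidate at pos."""
    if op_code in candidates[pos]:
        candidates[pos].remove(op_code)
    candidates[pos].insert(0, op_code)
```

Applied to the worked example, `op_positions` would be `[5, 11]`: position 1 matches neither reference image, position 0 matches the image at position 5, and "「" is then promoted to first candidate at position 0.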

[0126] All remaining parenthesis characters are analyzed in the same way. The end determination unit 9 determines whether this work is complete, and processing ends when it is.

[0127]

[Effects of the Invention] As described above, the present invention can correct character recognition results so that left and right parenthesis characters are properly paired, realizing a character recognition device with a significantly improved recognition rate.

[Brief description of drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the image data to be recognized, used to explain an embodiment of the present invention.

FIG. 3 is a flowchart showing the symbol recognition method of the character recognition device according to an embodiment of the present invention.

FIG. 4 is a flowchart showing the procedure for creating the parenthesis character array in the character recognition device according to an embodiment of the present invention.

FIG. 5 is a flowchart showing the method of searching for a right parenthesis in the character recognition device according to an embodiment of the present invention.

FIG. 6 is a flowchart showing the method of processing parenthesis characters in the character recognition device according to an embodiment of the present invention.

FIG. 7 is a flowchart showing the method of searching for a left parenthesis in the character recognition device according to an embodiment of the present invention.

FIG. 8 is a flowchart showing the method of processing a right parenthesis character in the character recognition device according to an embodiment of the present invention.

[Explanation of symbols]

1 image reading unit
2 character cutout unit
3 character recognition unit
4 language processing unit
5 parenthesis character array creation unit
6 pair determination unit
7 partner character search unit
8 candidate character exchange unit
9 end determination unit

Claims

[Claims]

1. A character recognition device comprising: an image reading unit that photoelectrically converts an original image; a character cutout unit that extracts image areas in character units from the image data of the image reading unit; a character recognition unit that performs character recognition on the image data obtained from the character cutout unit and converts it into character codes; a language processing unit that performs clause setting, word matching, and word connection verification; a parenthesis character array creation unit that, from the verification result of the language processing unit, stores in an array the positions of characters whose first candidate is a parenthesis character; a pair determination unit that determines whether the parenthesis characters stored by the parenthesis character array creation unit form pairs; a partner character search unit that searches for the character that pairs with a parenthesis character judged by the pair determination unit; and a candidate character exchange unit that places the character pairing with the parenthesis character as the first candidate character at the character position found by the partner character search unit.