JP2890241B2

JP2890241B2 - Optical character recognition device

Info

Publication number: JP2890241B2
Application number: JP6153392A
Authority: JP
Inventors: 正人加納
Original assignee: NIPPON DENKI ENJINIARINGU KK
Current assignee: NIPPON DENKI ENJINIARINGU KK
Priority date: 1994-07-05
Filing date: 1994-07-05
Publication date: 1999-05-10
Anticipated expiration: 2014-05-10
Also published as: JPH0816712A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an optical character recognition device, and more particularly to an optical character recognition method that applies knowledge processing to optically read results.

[0002]

[Prior Art] Conventionally, in this type of optical character recognition method, as shown in FIG. 10, the kanji "宇賀神" and the corresponding furigana "ウガシン" are written one character per box, and a dakuten (voiced sound mark) "゛" or handakuten (semi-voiced sound mark) "゜" attached to the furigana is also written in a box of its own.

[0003] As shown in FIG. 11, the optical character reader outputs the recognition results for the kanji and the furigana as ranked similar characters, called candidate characters. That is, the kanji recognition result is output as similar characters ranked in advance: the first candidate characters are "宇黄神", the second "申賀詳", the third "甲寅柿", the fourth "夕宵仲", the fifth "旬質袖", the sixth "夕脅挿", the seventh "９皆伸", the eighth "宗賛禅", and the ninth "字薄枠".

[0004] The furigana recognition result is likewise output as similar characters ranked in advance: the first candidate characters are "ウカハシン", the second "カタ゛ンニ", and the third "タウソソ".

[0005] In the optical character recognition method, words are formed by combining the kanji candidate characters output from the optical character reader, and these words are compared with the contents of a database in which kanji and furigana are stored in association with each other in advance, to determine kanji candidate words. Once a kanji candidate word is determined, the corresponding furigana word is determined from the contents of the database.

[0006] This furigana word is compared with the candidate characters of the furigana recognition result, and a score for the furigana word is calculated from the candidate ranks. The sum of this score and the score of the kanji candidate word determined as described above is the overall score of the candidate word that pairs the kanji with the furigana.

[0007] The above operation is executed repeatedly, and candidate words are finally output in order, starting with the best overall score. This optical character recognition method is described in detail in Japanese Patent Laid-Open No. 4-328692 and elsewhere.

[0008]

[Problem to Be Solved by the Invention] In the conventional optical character recognition method described above, a dakuten or handakuten is written in a box of its own, like any other character (kanji or furigana). When a misread or a read failure occurs, the character order of the furigana data in the database and that of the furigana recognition result are therefore offset by the dakuten or handakuten, the data do not match, and the correct candidate word cannot be determined.

[0009] Accordingly, an object of the present invention is to eliminate the above problem and to provide an optical character recognition device that can determine candidate words correctly even when the character order is offset by the dakuten or handakuten attached to the furigana.

[0010]

[Means for Solving the Problem] An optical character recognition device according to the present invention optically reads and recognizes kanji and the furigana corresponding to the kanji, and comprises: kanji word selection means for extracting a kanji word by comparing the kanji recognition result with kanji stored in advance; detection means for detecting whether a dakuten or handakuten is present in the furigana corresponding to the kanji word extracted by the kanji word selection means; furigana word selection means for extracting a furigana word, when the detection means detects a dakuten or handakuten, by comparing the furigana with the dakuten and handakuten removed against the furigana recognition result; and means for determining the ranking of candidate words based on the candidate rank of the kanji word extracted by the kanji word selection means and the candidate rank of the furigana word extracted by the furigana word selection means.

[0011] Another optical character recognition device of the present invention comprises, in addition to the above configuration, a database that stores the kanji words in association with furigana, including furigana to which a dakuten or handakuten is attached.

[0012] Another optical character recognition device of the present invention comprises, in addition to the above configuration, holding means for holding the positions of the dakuten and handakuten in the furigana detected by the detection means, and is configured to remove the dakuten and handakuten from the furigana based on the contents of the holding means.

[0013]

[Operation] Words formed by combining the characters of the kanji recognition result are compared with the words stored in advance in the database unit to extract a kanji word, and the score of that kanji word is calculated.

[0014] The furigana corresponding to the kanji word is read from the database unit, and the positions of any dakuten or handakuten within that furigana are detected. Based on the detected positions, the dakuten and handakuten are removed from the furigana data of the database unit 6, this furigana data is compared with the furigana recognition result to extract a furigana word, and the score of that furigana word is calculated.

[0015] The scores of the kanji word and the furigana word are summed to give the overall score of that kanji word, and candidate words are output in descending order of overall score.

[0016] As a result, candidate words can be determined correctly even when the character order of the furigana data in the database unit and that of the furigana recognition result are offset by a dakuten or handakuten and the data do not match.

[0017]

[Embodiment] An embodiment of the present invention is described next with reference to the drawings.

[0018] FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention. In the figure, when the kanji word selection unit 1 receives the kanji recognition result output from an optical character reader (not shown), it compares words formed by combining the characters of the kanji recognition result with the words stored in advance in the database unit 6.

[0019] If the comparison shows that the kanji word exists in the database unit 6, the kanji word selection unit 1 calculates the score of the kanji word from the ranks in the kanji recognition result and passes the kanji word data and the score to the candidate word output unit 5.

[0020] At the same time, the kanji word selection unit 1 passes to the furigana word selection unit 2 a pointer to the location, within the database unit 6, of the furigana corresponding to that kanji word. The furigana word selection unit 2 passes the furigana pointer received from the kanji word selection unit 1 on to the dakuten search unit 3.

[0021] On receiving the furigana pointer from the furigana word selection unit 2, the dakuten search unit 3 uses the pointer to look up the furigana data in the database unit 6 and stores, in the dakuten position holding unit 4, position information indicating where the dakuten or handakuten occur within the furigana data.

[0022] Referring to the position information held in the dakuten position holding unit 4, the furigana word selection unit 2 compares, one character at a time, the furigana recognition result with the furigana obtained by removing the dakuten and handakuten from the furigana data in the database unit 6.

[0023] If the comparison shows that the kanji word exists in the database unit 6, the furigana word selection unit 2 calculates the score of the furigana word from the ranks in the furigana recognition result and passes the furigana word data and the score to the candidate word output unit 5.

[0024] Once the above operation has been repeated until no combinations of words from the kanji recognition result remain, the candidate word output unit 5 sums the score received with the kanji word data from the kanji word selection unit 1 and the score received with the furigana data from the furigana word selection unit 2 to calculate the overall score of that kanji word.

[0025] After calculating the overall scores of all the kanji words, the candidate word output unit 5 outputs them as candidate words in descending order of overall score, highest rank first: the first candidate word, then the second candidate word, and so on.

[0026] FIG. 2 shows an example of a filled-in form according to an embodiment of the present invention. In the figure, form A has an entry field A1 for the furigana "ウガシン" and an entry field A2 for the kanji "宇賀神". The case shown is one in which a furigana character that includes a dakuten, such as "ガ", is written in a single box.

[0027] FIG. 3 shows the configuration of the database unit 6 of FIG. 1. In the figure, when one kanji word has several furigana readings, as the kanji word "宇賀神" does, the database unit 6 stores those furigana words, "ウカカミ", "ウガシン", "ウガカミ", "ウガジン", and "ウガガミ", in association with the kanji word "宇賀神" in advance.

[0028] FIG. 4 shows an example of a kanji recognition result according to an embodiment of the present invention. In the figure, the optical character reader outputs kanji recognition result B as ranked similar characters. FIG. 4 shows the similar characters output when the optical character reader reads and recognizes "宇賀神".

[0029] In this case, kanji recognition result B is output as similar characters ranked in advance: the first candidate characters are "宇黄神", the second "申賀詳", the third "甲寅柿", the fourth "夕宵仲", the fifth "旬質袖", the sixth "夕脅挿", the seventh "９皆伸", the eighth "宗賛禅", and the ninth "字薄枠".

[0030] FIG. 5 shows examples of words created by combining the kanji in kanji recognition result B of FIG. 4. The figure shows the kanji in recognition result B being combined to create words such as "宇黄神", "申黄神", "甲黄神", ..., "宇賀神", "宇寅神", "宇宵神", and so on.
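The combination step illustrated in FIG. 5 can be sketched in Python as a Cartesian product over the candidate characters for each box. This is a minimal sketch under stated assumptions, not the patented implementation: the names `KANJI_ROWS` and `candidate_words` are hypothetical, and the enumeration order is whatever `itertools.product` produces rather than the order drawn in FIG. 5.

```python
from itertools import product

# Ranked kanji candidates from FIG. 4: row k holds the (k+1)-th candidate
# for each of the three character boxes.
KANJI_ROWS = ["宇黄神", "申賀詳", "甲寅柿", "夕宵仲", "旬質袖",
              "夕脅挿", "９皆伸", "宗賛禅", "字薄枠"]


def candidate_words(rows):
    """Return every word formed by picking one candidate per box (hypothetical helper)."""
    columns = list(zip(*rows))  # columns[i] = candidates for box i, best first
    return ["".join(chars) for chars in product(*columns)]


words = candidate_words(KANJI_ROWS)
print(len(words))           # number of combinations for 3 boxes with 9 ranks each
print("宇賀神" in words)    # the intended name is among the combinations
```

Because "夕" appears twice in the first box, some combinations are produced more than once; a real implementation would probably deduplicate them before the database lookup.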

[0031] FIGS. 6 and 7 show examples of the comparison between the furigana recognition result and the contents of the database according to an embodiment of the present invention. FIG. 6 shows the comparison between furigana recognition result C and furigana data 1 when no dakuten or handakuten is detected in the furigana data of the database unit 6.

[0032] In this case, each character of the first candidate characters "ウ？シン", the second candidate characters "カンニ", and the third candidate characters "タソソ" of furigana recognition result C is compared with the character at the same position in furigana data 1 "ウカカミ" of the database unit 6.

[0033] FIG. 7 shows the comparison between furigana recognition result C and furigana data 1 and 2 when a dakuten or handakuten is detected in the furigana data of the database unit 6.

[0034] In this case, the first and second characters of each of the first candidate characters "ウ？シン", the second candidate characters "カンニ", and the third candidate characters "タソソ" of furigana recognition result C are compared with the first and second characters of furigana data 1 "ウカ゛シン" and furigana data 2 "ウカ゛カミ" of the database unit 6.

[0035] The third and fourth characters of each of the first candidate characters "ウ？シン", the second candidate characters "カンニ", and the third candidate characters "タソソ" of furigana recognition result C are compared with the fourth and fifth characters of furigana data 1 "ウカ゛シン" and furigana data 2 "ウカ゛カミ" of the database unit 6, that is, with the characters that follow once the dakuten "゛" is skipped.
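The skip-ahead alignment of FIG. 7 can be illustrated with a short Python sketch. It assumes that the marks are stored in the database as standalone characters (U+309B and U+309C), as the FIG. 7 data appear to be written; the names `MARKS`, `aligned_db_indices`, and `comparison_pairs` are hypothetical, and only the first candidate row "ウ？シン" is used.

```python
MARKS = ("\u309b", "\u309c")  # standalone dakuten and handakuten (assumed encoding)


def aligned_db_indices(db_furigana):
    """Indices of the database characters that remain once the marks are skipped."""
    return [i for i, ch in enumerate(db_furigana) if ch not in MARKS]


def comparison_pairs(recognized, db_furigana):
    """Pair recognized character k with the database character it is compared to."""
    kept = aligned_db_indices(db_furigana)
    if len(kept) != len(recognized):
        return None  # still misaligned, so no character-by-character comparison
    return [(r, db_furigana[i]) for r, i in zip(recognized, kept)]


data1 = "ウカ" + MARKS[0] + "シン"             # furigana data 1 of FIG. 7
print(aligned_db_indices(data1))               # 0-based database positions used
print(comparison_pairs("ウ？シン", data1))     # recognized vs. database character
```

With 0-based indices the database positions used are 0, 1, 3, and 4, which corresponds to the first, second, fourth, and fifth characters named in the text. How an unreadable "？" is then scored is not specified in the source and is left open here.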

[0036] FIGS. 8 and 9 are flowcharts showing the operation of an embodiment of the present invention. The optical character recognition operation according to an embodiment of the present invention is described below with reference to FIGS. 1 to 9.

[0037] When the optical character reader reads form A (see FIG. 2), in which furigana with a dakuten is written in a single box, it outputs ranked similar characters, called candidate characters, consisting of kanji recognition result B (see FIG. 4) and furigana recognition result C (see FIGS. 6 and 7).

[0038] In this case, furigana recognition result C for a furigana character with a dakuten written in a single box of form A is either an unreadable character (shown as "？" in the figures) or a misread.

[0039] On receiving kanji recognition result B from the optical character reader, the kanji word selection unit 1 combines the characters of kanji recognition result B to create kanji words (step S1 in FIG. 8; see FIG. 5).

[0040] The kanji word selection unit 1 compares each created kanji word with the words stored in advance in the database unit 6 and searches for whether that kanji word is registered in the database unit 6 (steps S2 and S3 in FIG. 8).
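Steps S2 and S3 amount to a membership test against the stored words. The sketch below models the database unit 6 as a Python dictionary, which is an assumption made for illustration; the source does not describe the storage format, and `DATABASE` and `registered_words` are hypothetical names.

```python
# Hypothetical in-memory stand-in for database unit 6 (FIG. 3): each kanji
# word maps to the furigana readings registered for it.
DATABASE = {
    "宇賀神": ["ウカカミ", "ウガシン", "ウガカミ", "ウガジン", "ウガガミ"],
}


def registered_words(words, database):
    """Keep only the candidate words that are registered in the database."""
    return [w for w in words if w in database]


# A few of the combinations listed for FIG. 5.
fig5_words = ["宇黄神", "申黄神", "甲黄神", "宇賀神", "宇寅神", "宇宵神"]
hits = registered_words(fig5_words, DATABASE)
print(hits)
```

A dictionary also gives direct access to the readings for each hit, which plays the role of the pointer that the kanji word selection unit 1 hands to the furigana word selection unit 2.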

[0041] In a kanji word determined by this search, each character making up the word is scored according to its rank among the candidate characters, and the sum of the per-character scores is the score of the kanji word.
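The source says that each character is scored by its candidate rank and that the word score is the sum, but it does not give the point values. The sketch below therefore assumes a simple scale in which the first of N candidates earns N points and the last earns 1; that scale, and the names `KANJI_ROWS` and `word_score`, are hypothetical.

```python
# Ranked kanji candidates from FIG. 4 (row k = (k+1)-th candidate per box).
KANJI_ROWS = ["宇黄神", "申賀詳", "甲寅柿", "夕宵仲", "旬質袖",
              "夕脅挿", "９皆伸", "宗賛禅", "字薄枠"]


def word_score(word, rows):
    """Sum per-character points; returns None if the word cannot be formed."""
    columns = list(zip(*rows))  # columns[i] = candidates for box i, best first
    if len(word) != len(columns):
        return None
    total = 0
    for box, ch in enumerate(word):
        if ch not in columns[box]:
            return None  # this character was never proposed for this box
        rank = columns[box].index(ch)   # 0 means first candidate
        total += len(rows) - rank       # assumed point scale, not from the source
    return total


print(word_score("宇賀神", KANJI_ROWS))
print(word_score("宇黄神", KANJI_ROWS))
```

Under this assumed scale "宇黄神" outscores "宇賀神" on the kanji alone, which is why the database lookup and the furigana score are needed to settle the final ranking.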

[0042] Once a kanji word is determined, the kanji word selection unit 1 determines the furigana word corresponding to that kanji word and passes a pointer to that furigana word within the database unit 6 to the furigana word selection unit 2.

[0043] On receiving the pointer from the kanji word selection unit 1, the furigana word selection unit 2 uses it to read the furigana data from the database unit 6 and performs furigana word selection processing.

[0044] That is, the furigana word selection unit 2 compares the furigana data read from the database unit 6 using the pointer from the kanji word selection unit 1 with furigana recognition result C, and searches for whether the word of furigana recognition result C is in the database unit 6 (steps S4 and S5 in FIG. 8).

[0045] In this case, because the furigana with a dakuten is written in a single box, furigana recognition result C is only four characters long in total. It is therefore out of step with the furigana data that the furigana word selection unit 2 references in the database unit 6, and no word is found.

[0046] When no word is found, the furigana word selection unit 2 passes the furigana word pointer from the kanji word selection unit 1 to the dakuten search unit 3. On receiving the furigana word pointer from the furigana word selection unit 2, the dakuten search unit 3 uses it to read the furigana data from the database unit 6 and performs dakuten search processing (step S6 in FIG. 8).

[0047] That is, the dakuten search unit 3 searches for whether the furigana data read from the database unit 6 using the pointer from the furigana word selection unit 2 contains a dakuten or handakuten (steps S6-1 and S6-2 in FIG. 9).

[0048] If the furigana data contains a dakuten or handakuten, the dakuten search unit 3 stores position information indicating the position of the dakuten or handakuten within the furigana data in the dakuten position holding unit 4 (step S6-3 in FIG. 9).
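Steps S6-1 to S6-3 can be sketched as a scan that records where the marks sit. The sketch assumes that the marks are stored as standalone characters (U+309B and U+309C), as in the FIG. 7 data; the names `DAKUTEN`, `HANDAKUTEN`, and `mark_positions` are hypothetical, and the returned list stands in for the dakuten position holding unit 4.

```python
DAKUTEN = "\u309b"      # voiced sound mark written as a separate character
HANDAKUTEN = "\u309c"   # semi-voiced sound mark written as a separate character


def mark_positions(furigana):
    """Return the 0-based indices of dakuten and handakuten in a furigana string."""
    return [i for i, ch in enumerate(furigana) if ch in (DAKUTEN, HANDAKUTEN)]


reading = "ウカ" + DAKUTEN + "シン"    # furigana data 1 of FIG. 7
positions = mark_positions(reading)    # plays the role of holding unit 4
print(positions)
print(mark_positions("ウカカミ"))      # no marks, so nothing to hold
```

An empty list corresponds to the branch in which no mark is found, so the stored reading is compared position by position as in FIG. 6.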

[0049] Once the position information for the dakuten or handakuten is held in the dakuten position holding unit 4, the furigana word selection unit 2 performs furigana word selection processing (step S6-4 in FIG. 9). That is, referring to the position information held in the dakuten position holding unit 4, the furigana word selection unit 2 compares, one character at a time, furigana recognition result C with the furigana obtained by removing the dakuten and handakuten from the furigana data in the database unit 6.

[0050] From the result of that comparison, the furigana word selection unit 2 determines whether the kanji word exists in the database unit 6 (step S7 in FIG. 8). If it does, the furigana word selection unit 2 calculates the score of the furigana word from the ranks in the furigana recognition result and passes the furigana word data and the score to the candidate word output unit 5.

[0051] The candidate word output unit 5 sums the score received with the kanji word data from the kanji word selection unit 1 and the score received with the furigana data from the furigana word selection unit 2 to calculate the overall score of that kanji word (step S8 in FIG. 8).

[0052] Once the above operation has been repeated until no combinations of words from the kanji recognition result remain (steps S1 to S9 in FIG. 8), the candidate word output unit 5 outputs the candidate words in descending order of overall score (step S10 in FIG. 8), highest rank first: the first candidate word, then the second candidate word, and so on.
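The final ranking of steps S8 and S10 reduces to summing two scores and sorting in descending order. In the sketch below the score values are invented purely for illustration, and `rank_candidates` is a hypothetical name; only the sum-then-sort structure comes from the text.

```python
def rank_candidates(scored):
    """scored: list of (kanji_word, furigana_word, kanji_score, furigana_score).

    Returns (kanji_word, furigana_word, total) tuples, best total first.
    """
    totals = [(k, f, ks + fs) for k, f, ks, fs in scored]
    # sorted() is stable, so entries with equal totals keep their input order.
    return sorted(totals, key=lambda t: t[2], reverse=True)


# Hypothetical scores, for illustration only.
scored = [
    ("宇賀神", "ウガカミ", 26, 5),
    ("宇賀神", "ウガシン", 26, 9),
]
ranked = rank_candidates(scored)
for position, (kanji, furigana, total) in enumerate(ranked, start=1):
    print(position, kanji, furigana, total)
```

The first entry printed corresponds to the first candidate word, the second to the second candidate word, and so on.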

[0053] In this way, when furigana with a dakuten or handakuten is written in a single box and a read failure or misread occurs, the dakuten and handakuten are removed from the furigana data in the database unit 6 before it is compared with furigana recognition result C. Candidate words can therefore be determined correctly even when the character order of the furigana data in the database unit 6 and that of furigana recognition result C are offset by the dakuten or handakuten and the data do not match.

[0054]

[Effect of the Invention] As described above, according to the present invention, a kanji word is extracted by comparing the kanji recognition result with kanji stored in advance; when a dakuten or handakuten is detected in the furigana corresponding to the extracted kanji word, a furigana word is extracted by comparing the furigana with the dakuten and handakuten removed against the furigana recognition result; and the ranking of candidate words is determined based on the candidate ranks of the kanji word and the furigana word. This has the effect that candidate words can be determined correctly even when the character order is offset by the dakuten or handakuten attached to the furigana.

[Brief description of the drawings]

[FIG. 1] A block diagram showing the configuration of an embodiment of the present invention.

[FIG. 2] A diagram showing an example of a filled-in form according to an embodiment of the present invention.

[FIG. 3] A diagram showing the configuration of the database unit of FIG. 1.

[FIG. 4] A diagram showing an example of a kanji recognition result according to an embodiment of the present invention.

[FIG. 5] A diagram showing examples of words created by combining the kanji of the kanji recognition result of FIG. 4.

[FIG. 6] A diagram showing an example of the comparison between a furigana recognition result and the contents of the database according to an embodiment of the present invention.

[FIG. 7] A diagram showing an example of the comparison between a furigana recognition result and the contents of the database according to an embodiment of the present invention.

[FIG. 8] A flowchart showing the operation of an embodiment of the present invention.

[FIG. 9] A flowchart showing the operation of an embodiment of the present invention.

[FIG. 10] A diagram showing an example of a filled-in form according to the prior art.

[FIG. 11] A diagram showing an example of the comparison between a recognition result and the contents of the database according to the prior art.

[Explanation of symbols]

1 Kanji word selection unit
2 Furigana word selection unit
3 Dakuten search unit
4 Dakuten position holding unit
5 Candidate word output unit
6 Database unit

Claims

(57) [Claims]

1. An optical character recognition device comprising: optical character recognition means for optically reading and recognizing a kanji and the furigana corresponding to the kanji; candidate word storing means for storing in advance candidate words each consisting of a kanji word and the corresponding furigana; candidate kanji word selecting means for extracting a candidate kanji word by comparing the read kanji recognition result with the kanji words stored in advance; dakuten/handakuten detecting means for detecting whether a dakuten or handakuten is present in the furigana word corresponding to the candidate kanji word; candidate furigana word selecting means for extracting a candidate furigana word, when the total number of characters of the furigana corresponding to the candidate kanji word does not match that of the read furigana, by removing the dakuten or handakuten from the furigana word corresponding to the candidate kanji word and comparing it again with the read furigana; and means for making a final determination based on the candidate rank of the candidate kanji word extracted by the candidate kanji word selecting means and the candidate rank of the candidate furigana word extracted by the candidate furigana word selecting means.

2. The optical character recognition device according to claim 1, further comprising holding means for holding the position of the dakuten or handakuten detected by the dakuten/handakuten detecting means.