JPH076212A

JPH076212A - Intelligence processing unit for optical character reader

Info

Publication number: JPH076212A
Application number: JP5149476A
Authority: JP
Inventors: Yasumasa Murai; 康眞村井; Shinsuke Yamashita; 信介山下
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-06-21
Filing date: 1993-06-21
Publication date: 1995-01-10

Abstract

PURPOSE: To provide a knowledge processing device for an optical character reader that has a short processing time and recognizes characters reliably. CONSTITUTION: The device is provided with character-type word dictionaries 8, one for each character type; a character-type search section 6 that searches the candidate character group output by a recognition section 4, for each character type, for character strings consisting of a single character type, and calculates the priority of each string from the candidate ranks of the candidate characters that make it up; and a collation section 7 that, in accordance with the priority, collates each retrieved string against the character-type word dictionary 8 for the same character type as the string and outputs the reading result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a knowledge processing device for an optical character reader, and more particularly to a knowledge processing device for an optical character reader that performs word matching, using dictionaries, on the candidate character group obtained by reading general text in which characters of multiple character types (kanji, hiragana, katakana, digits, alphabetic letters, and so on) are mixed.

[0002]

[Prior Art] Optical character readers read characters written on a medium such as paper and recognize them. However, complete recognition is difficult to achieve by pattern processing of the character image data alone, so some form of knowledge processing must be applied after the characters are read in order to improve the recognition rate.

[0003] In the knowledge processing of conventional optical character readers, various dictionaries are prepared, such as an address dictionary, a personal name dictionary, a company name dictionary, a job title dictionary, an apartment and condominium name dictionary, an occupation dictionary, and a disease name dictionary. On the entry form, the item to be written in each entry field is fixed, and that information is registered in the device in advance. After the characters are read, the dictionary corresponding to each entry field is consulted to determine the characters. In the example shown in FIG. 8, the entry form 51 has fields for company name, job title, and personal name, and the character strings written in these fields are matched against the company name dictionary, the job title dictionary, and the personal name dictionary, respectively.

[0004] When the character string to be read is general text, all of the dictionaries described above are used, including a general word dictionary. In addition, multiple candidate characters are obtained for each character of the string to be read, forming a candidate character group. Then, as shown in FIG. 9, the first candidate character for the first character of the string is read from the candidate character group, a word beginning with this first candidate character is looked up in the dictionary, and the second and subsequent characters of that word are compared to see whether they are present among the candidates for the second and subsequent characters of the candidate character group. If the word is matched by this comparison, it is output as the reading result. If it is not matched, the dictionary word being compared is advanced to the next one and the matching process is repeated.

[0005] If matching against the whole series of words beginning with the first candidate character for the first character of the candidate character group yields no matching word, the same matching process is carried out for words beginning with the second candidate character for the first character of the candidate character group.

[0006] If word matching finds no word beginning with any of the candidate characters for the first character of the candidate character group, the above processing is carried out in order to match words beginning with the second character of the candidate character group. This processing is carried out for every character of the string to be read, and the result is output.
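The conventional procedure of paragraphs [0004] to [0006] can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `conventional_match`, the list-of-lists representation of the candidate character group, and the choice to resume scanning just after a matched word are not given in the source.

```python
def conventional_match(candidates, dictionary):
    """Sketch of the conventional matching in [0004]-[0006].

    candidates: one list per read character, best candidate first (assumed layout).
    dictionary: iterable of words, NOT partitioned by character type.
    Returns a list of (start_index, word) matches.
    """
    results = []
    start = 0
    while start < len(candidates):
        matched = None
        # Try the 1st candidate as the head character, then the 2nd, and so on.
        for head in candidates[start]:
            for word in dictionary:
                if not word.startswith(head):
                    continue
                if start + len(word) > len(candidates):
                    continue
                # Every character of the word must appear among the
                # candidates at the corresponding position.
                if all(ch in candidates[start + i] for i, ch in enumerate(word)):
                    matched = word
                    break
            if matched:
                break
        if matched:
            results.append((start, matched))
            start += len(matched)  # assumption: continue after the matched word
        else:
            start += 1  # no word starts here: move to the next read character
    return results


# Hypothetical toy data, not taken from the figures.
toy = [["E", "F"], ["N", "M"], ["D", "O"], ["x"]]
print(conventional_match(toy, ["FOG", "END"]))
```

Note that every dictionary word is a potential comparison target at every position, regardless of character type, which is the cost the invention sets out to reduce.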

[0007] Furthermore, Japanese Examined Patent Publication No. 1-19195 discloses a knowledge processing method in which each candidate character is given a weight according to its recognition rank and, when matching against words, the weights are taken into account so that the word with the highest weight value is selected and output.

[0008]

[Problems to Be Solved by the Invention] Japanese text generally mixes characters of multiple character types (kanji, hiragana, katakana, digits, alphabetic letters, and so on). The conventional knowledge processing methods described above perform matching even when the first and second characters of the portion to be matched are of different character types, so processing time is wasted. Moreover, because word matching is performed without regard to differences in character type, a string may be recognized as a word even though it should not be extracted as one, for example when the candidate characters include characters of several character types.

[0009] An object of the present invention is to provide a knowledge processing device for an optical character reader that has a short processing time and can perform character recognition reliably.

[0010]

[Means for Solving the Problems] The knowledge processing device of the present invention is used in an optical character reader that has: a scanning unit that irradiates a medium bearing characters with light, converts the light reflected from the medium into an electric signal, and stores it; a preprocessing unit that segments the stored electric signal character by character and normalizes the size; a feature extraction unit that extracts features from each segmented character; a recognition dictionary; and a recognition unit that compares the extracted features with the contents of the recognition dictionary and outputs multiple candidate characters for each segmented character as the recognition result. The knowledge processing device comprises: character-type word dictionaries, one provided for each character type; a character-type search unit that searches the candidate character group output by the recognition unit, for each character type, for character strings consisting of a single character type, and calculates the priority of each such string from the candidate ranks of the candidate characters that make it up; and a matching unit that, in accordance with the priority, matches the string against the character-type word dictionary for the same character type as the string and outputs the reading result.

[0011]

[Function] Character strings consisting of a single character type are searched for in the candidate character group for each character type, a priority is calculated for each string from the candidate ranks of its candidate characters, and the character-type word dictionary provided for each character type is consulted to obtain the reading result. The range of dictionary words that must be compared in a series of matching operations can therefore be narrowed, which shortens the time required for matching. In addition, because word recognition follows the boundaries between different character types, misrecognition decreases and the recognition rate improves.

[0012]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an optical character reader according to one embodiment of the present invention. This optical character reader has a function for searching for character strings of a single character type.

[0013] A scanning unit 1 irradiates a medium such as paper bearing characters with light, converts the light reflected from the medium into an electric signal, and stores it. A preprocessing unit 2, connected to the scanning unit 1, segments the stored electric signal character by character and adjusts the size (performs normalization). A feature extraction unit 3 extracts the features of each character normalized by the preprocessing unit 2, and a recognition dictionary 5 stores the relationship between characters and features. A recognition unit 4 receives the output of the feature extraction unit 3 and searches the recognition dictionary 5. The recognition unit 4 is configured to compare the extracted features with the contents of the recognition dictionary 5, select multiple candidate characters for each segmented character, and output them as the recognition result.

[0014] Furthermore, this optical character reader has: a plurality of character-type word dictionaries 8, one for each character type such as kanji, hiragana, katakana, alphabetic letters, digits, and symbols; a character-type search unit 6 that performs a character-type search on the candidate character group output by the recognition unit 4 and outputs character strings; and a matching unit 7 that, for each string output by the character-type search unit 6, matches the string, in accordance with its priority, against the character-type word dictionary for the same character type as the string and outputs the reading result. The character-type search unit 6 is configured to search the candidate character group from the recognition unit 4, for each character type, for strings consisting of a single character type, and to calculate the priority of each string from the candidate ranks of the candidate characters that make it up.

[0015] Next, the operation of this embodiment will be described.

[0016] Text written on paper or the like is read by the scanning unit 1, segmented character by character and normalized in size by the preprocessing unit 2, has its features extracted character by character by the feature extraction unit 3, and is input to the recognition unit 4. The recognition unit 4 matches the features against the recognition dictionary 5 and selects and outputs multiple candidate characters for each segmented character. Each candidate character is assigned a candidate rank. The processing up to this point is the same as in a conventional optical character reader.

[0017] The following description assumes that the 18-character sentence 「英字のＥＮＤは，カタカナで表現すると」 ("When the alphabetic word END is written in katakana") is read, and that the recognition unit 4 outputs nine candidate characters for each character. As a result, as shown in FIG. 2, a candidate character group of 18 × 9 = 162 candidate characters is output from the recognition unit 4. The read character No. indicates the position of the character in the read string, counted from the beginning of the sentence.

[0018] Next, the candidate character group described above is input to the character-type search unit 6, and a same-character-type search is performed. In the same-character-type search, the character type of each candidate character is first determined for each read character No., and the number of characters and the sum of the weights are obtained for each character type. The weights are assigned according to recognition rank: 9 for the first rank, 8 for the second rank, and so on down to 1 for the ninth rank.

[0019] In the example described here, the candidate characters for read character No. 1, 「英葵芙莫奨萃夷菓菜」, are all kanji, so the kanji count is 9 and the kanji weight is 45. Similarly, for read character No. 3, the hiragana count is 3 with a weight of 17, the kanji count is 5 with a weight of 20, and the symbol count is 1 with a weight of 8.
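The per-character tally of paragraphs [0018] and [0019] might be sketched as below. The Unicode ranges in `char_type`, the function names, and the candidate list used for read character No. 3 are assumptions made for illustration. The source gives only the totals for No. 3, so the hypothetical list was chosen to reproduce them. The candidates for No. 1 are taken from the text.

```python
def char_type(ch):
    """Rough character-type classifier (the ranges are an assumption)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x3096:
        return "hiragana"
    if 0x30A1 <= code <= 0x30FA:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch in "0123456789" or 0xFF10 <= code <= 0xFF19:
        return "digit"
    if ("A" <= ch <= "Z" or "a" <= ch <= "z"
            or 0xFF21 <= code <= 0xFF3A or 0xFF41 <= code <= 0xFF5A):
        return "alpha"
    return "symbol"


def rank_weight(rank, n_candidates=9):
    """Weight 9 for rank 1, 8 for rank 2, ..., 1 for rank 9, as in [0018]."""
    return n_candidates + 1 - rank


def tally(candidates):
    """Return {character type: (count, weight sum)} for one read character."""
    totals = {}
    for rank, ch in enumerate(candidates, start=1):
        ctype = char_type(ch)
        count, weight = totals.get(ctype, (0, 0))
        totals[ctype] = (count + 1, weight + rank_weight(rank, len(candidates)))
    return totals


# Read character No. 1: candidates quoted in [0019], all kanji.
no1 = list("英葵芙莫奨萃夷菓菜")
print(tally(no1))

# Read character No. 3: HYPOTHETICAL candidates chosen to match the stated totals.
no3 = ["の", "/", "め", "乃", "刀", "力", "万", "方", "あ"]
print(tally(no3))
```

Because the nine weights always sum to 45, the per-type weights for any read character partition that total, as the No. 3 figures (17 + 20 + 8) do.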

[0020] After the above processing has been carried out for every read character, each character type is checked to see whether characters of that type appear among the candidate characters of consecutive read characters, in read character No. order. Where they do, the span is treated as a same-character-type string, and for each such string the total number of characters of that type among its candidate characters and the total of their weights are obtained. In the example shown in the figure, alphabetic letters appear among the candidates of read characters No. 4, No. 5, and No. 6 but not among those of read characters No. 3 and No. 7, so read characters No. 4, No. 5, and No. 6 form alphabetic string No. 1. The total number of alphabetic letters among these read characters and the total of their weights are then obtained. In this example, as shown in the same-character-type string search result column of FIG. 2, the total number of alphabetic letters is 4 and the sum of the weights is 31.
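The run detection of paragraph [0020] can be sketched as follows, taking per-character tallies as input. The function name `find_runs`, the tuple layout, and the tallies themselves are hypothetical. Only the totals for alphabetic string No. 1 (count 4, weight 31) and for read character No. 3 come from the text. The minimum run length of two follows the remark in [0021] that a type must appear at two or more consecutive read character Nos.

```python
def find_runs(tallies, min_len=2):
    """Find same-character-type strings.

    tallies: one dict per read character, {type: (count, weight)} (assumed layout).
    Returns (type, first_no, last_no, total_count, total_weight) tuples,
    with 1-based read character Nos.
    """
    runs = []
    types = sorted({t for tally in tallies for t in tally})
    for t in types:
        start = None
        # Iterate one step past the end so an open run is always closed.
        for i in range(len(tallies) + 1):
            present = i < len(tallies) and t in tallies[i]
            if present and start is None:
                start = i
            elif not present and start is not None:
                if i - start >= min_len:
                    count = sum(tallies[j][t][0] for j in range(start, i))
                    weight = sum(tallies[j][t][1] for j in range(start, i))
                    runs.append((t, start + 1, i, count, weight))
                start = None
    return runs


# HYPOTHETICAL tallies for read characters No. 1 to No. 7.
tallies = [
    {"kanji": (9, 45)},
    {"kanji": (9, 45)},
    {"hiragana": (3, 17), "kanji": (5, 20), "symbol": (1, 8)},
    {"alpha": (1, 9), "kanji": (8, 36)},
    {"alpha": (2, 13), "kanji": (7, 32)},
    {"alpha": (1, 9), "kanji": (8, 36)},
    {"hiragana": (4, 26), "kanji": (5, 19)},
]
for run in find_runs(tallies):
    print(run)
```

With these inputs the alphabetic run spans read characters No. 4 to No. 6, while hiragana and symbols, which never appear at two adjacent positions, form no run.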

[0021] FIG. 2 shows the result of carrying out the above processing for all character types: alphabetic letters, hiragana, katakana, kanji, digits, and symbols. Digits and symbols do not appear at two or more consecutive read character Nos., so they do not form same-character-type strings. For this reason, their entries in the same-character-type string search result column are marked with "()".

[0022] Next, the priority of each same-character-type string is determined from the same-character-type string search results. Here, the priority is determined by comparing, for each same-character-type string, the quotient of the weight divided by the quantity (number of characters). For alphabetic string No. 1, dividing the sum of the weights by the total number of characters gives 31 / 4 = 7.75. The weight per character is obtained in the same way for the other same-character-type strings, and priorities are assigned in descending order of weight per character. The result is shown in the same-character-type string priority column.
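As a worked illustration of the ranking in paragraph [0022], the sketch below orders same-character-type strings by weight per character. The function name `prioritize` and the second entry in the sample data are hypothetical. Only the figures for alphabetic string No. 1 (count 4, weight 31) come from the text.

```python
def prioritize(runs):
    """Sort (label, count, weight) tuples by weight per character, highest first."""
    return sorted(runs, key=lambda run: run[2] / run[1], reverse=True)


runs = [
    ("kanji string (hypothetical)", 51, 233),  # 233 / 51 is about 4.57
    ("alphabetic string No. 1", 4, 31),        # 31 / 4 = 7.75
]
for priority, (label, count, weight) in enumerate(prioritize(runs), start=1):
    print(priority, label, round(weight / count, 2))
```

Dividing by the character count keeps a long run full of low-ranked candidates from outranking a short run of top-ranked ones.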

[0023] Once a priority has been determined for each same-character-type string, the matching unit 7 takes the string with the highest priority, matches the candidate characters of the read character Nos. that make up that string against the character-type word dictionary 8 for the corresponding character type, and searches for a word. In the example shown here, alphabetic string No. 1 has the highest priority, so the candidate characters of read characters No. 4, No. 5, and No. 6, which correspond to alphabetic string No. 1, are matched against the English word dictionary, and as a result the English word "ＥＮＤ" is matched.

[0024] If the dictionary contains no matching word for the same-character-type string with the highest priority, the string with the second-highest priority (in this case hiragana string No. 4) is matched against the corresponding character-type word dictionary. Matching is repeated on strings of successively lower priority until a matching word is found.

[0025] When a matching word is found, the processing is carried out again from the same-character-type string search, as shown in FIG. 3, on the candidate character group with the read characters corresponding to the matched word removed, and word matching is again performed in order from the string with the highest priority. The same-character-type string search and word matching are repeated in this way until no same-character-type strings remain to be processed. FIGS. 4 to 6 show the successive stages of this repetition.
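The overall loop of paragraphs [0023] to [0025] might look like the sketch below. It is a simplified reading, not the patented implementation. The classifier, the rule that already-matched positions break a run, the longest-word-first tie-break, and the fallback to the top-ranked candidate for leftover single characters (where the source applies grammatical processing) are all assumptions, as is the toy input.

```python
def char_type(ch):
    """Very rough character-type classifier (an assumption for this sketch)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x3096:
        return "hiragana"
    if 0x30A1 <= code <= 0x30FA:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alpha"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "symbol"


def find_runs(cands, consumed):
    """Return runs as (type, start, end_exclusive, weight per character), best first."""
    n = len(cands)
    runs = []
    for t in sorted({char_type(c) for row in cands for c in row}):
        start = None
        for i in range(n + 1):
            present = (i < n and i not in consumed
                       and any(char_type(c) == t for c in cands[i]))
            if present and start is None:
                start = i
            elif not present and start is not None:
                if i - start >= 2:
                    count = weight = 0
                    for row in cands[start:i]:
                        for rank, c in enumerate(row):
                            if char_type(c) == t:
                                count += 1
                                weight += len(row) - rank  # top rank weighs most
                    runs.append((t, start, i, weight / count))
                start = None
    return sorted(runs, key=lambda r: r[3], reverse=True)


def match_run(cands, start, end, words):
    """Look for a non-empty dictionary word lying entirely inside [start, end)."""
    for word in sorted(words, key=lambda w: (-len(w), w)):
        for pos in range(start, end - len(word) + 1):
            if all(ch in cands[pos + k] for k, ch in enumerate(word)):
                return pos, word
    return None


def read(cands, dictionaries):
    consumed, result = set(), {}
    while True:
        for ctype, start, end, _avg in find_runs(cands, consumed):
            hit = match_run(cands, start, end, dictionaries.get(ctype, ()))
            if hit:
                pos, word = hit
                for k, ch in enumerate(word):
                    result[pos + k] = ch
                    consumed.add(pos + k)
                break  # a word matched: redo the run search on what remains
        else:
            break  # no run produced a match (or no runs are left)
    # Stand-in for grammatical processing: take the top-ranked candidate.
    return "".join(result.get(i, row[0]) for i, row in enumerate(cands))


# HYPOTHETICAL candidate group, three candidates per read character.
cands = [["の", "め", "乃"], ["E", "F", "巳"], ["N", "M", "H"],
         ["O", "D", "口"], ["は", "ほ", "ば"]]
dictionaries = {"alpha": {"END", "AND"}, "hiragana": {"のは"}, "kanji": set()}
print(read(cands, dictionaries))
```

In this toy input the top-ranked candidate for the fourth character is "O", yet the alphabetic dictionary lookup selects "D", which is the kind of correction the per-type matching is meant to provide.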

[0026] Once no strings remain to be matched, single characters are left unmatched. Such single characters are handled by, for example, grammatical processing, and the final reading result is output. FIG. 7 shows the state when only single characters remain, and shows that the final reading result obtained is 「英字のＥＮＤは，カタカナで表現すると」, which matches the read string exactly.

[0027]

[Effects of the Invention] As described above, the present invention provides character-type word dictionaries, one for each character type; a character-type search unit that searches the candidate character group, for each character type, for strings consisting of a single character type and calculates the priority of each string from the candidate ranks of the candidate characters that make it up; and a matching unit that, in accordance with the priority, matches each retrieved string against the character-type word dictionary for the same character type as the string and outputs the reading result. This shortens the time required for matching until a matching word is found, reduces misrecognition, and improves the recognition rate.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of an optical character reader according to one embodiment of the present invention.

[FIG. 2] A diagram explaining the character-type search (first pass) and the word matching process.

[FIG. 3] A diagram explaining the character-type search (second pass) and the word matching process.

[FIG. 4] A diagram explaining the character-type search (third pass) and the word matching process.

[FIG. 5] A diagram explaining the character-type search (fourth pass) and the word matching process.

[FIG. 6] A diagram explaining the character-type search (fifth pass) and the word matching process.

[FIG. 7] A diagram explaining the character-type search (sixth pass) and the word matching process.

[FIG. 8] A diagram explaining the knowledge processing method of a conventional optical character reader.

[FIG. 9] A diagram explaining the knowledge processing method of a conventional optical character reader.

[Explanation of symbols]

1 Scanning unit
2 Preprocessing unit
3 Feature extraction unit
4 Recognition unit
5 Recognition dictionary
6 Character-type search unit
7 Matching unit
8 Character-type word dictionary

─────────────────────────────────────────────────────

[Written Amendment]

[Submission date] November 9, 1993

[Amendment 1]

[Document to be amended] Specification

[Item to be amended] Claims

[Method of amendment] Change

[Content of amendment]

[Claims]

[Amendment 2]

[Document to be amended] Specification

[Item to be amended] 0010

[Method of amendment] Change

[Content of amendment]

[0010]

[Means for Solving the Problems] The knowledge processing device of the present invention is used in an optical character reader that has: a scanning unit that irradiates a medium bearing characters with light, converts the response light from the medium into an electric signal, and stores it; a preprocessing unit that segments the stored electric signal character by character and normalizes the size; a feature extraction unit that extracts features from each segmented character; a recognition dictionary; and a recognition unit that compares the extracted features with the contents of the recognition dictionary and outputs multiple candidate characters for each segmented character as the recognition result. The knowledge processing device comprises: character-type word dictionaries, one provided for each character type; a character-type search unit that searches the candidate character group output by the recognition unit, for each character type, for character strings consisting of a single character type, and calculates the priority of each such string from the candidate ranks of the candidate characters that make it up; and a matching unit that, in accordance with the priority, matches the string against the character-type word dictionary for the same character type as the string and outputs the reading result.

[Amendment 3]

[Document to be amended] Specification

[Item to be amended] 0011

[Method of amendment] Change

[Content of amendment]

[0011]

[Function] Character strings consisting of a single character type are searched for in the candidate character group for each character type, a priority is calculated for each string from the candidate ranks of its candidate characters, and the character-type word dictionary provided for each character type is consulted to obtain the reading result. The range of dictionary words that must be compared in a series of matching operations can therefore be narrowed, which shortens the time required for matching. In addition, because word recognition follows the boundaries between different character types, misrecognition decreases and the recognition rate improves. One way to calculate the priority is, for example, a method based on a weighted sum of the candidate ranks. Furthermore, if unmatched single characters remain, grammatical processing may be used in combination.

Claims

[Claims]

1. A knowledge processing device for an optical character reader, for use in an optical character reader having a scanning unit that irradiates a medium bearing characters with light, converts the light reflected from the medium into an electric signal, and stores it; a preprocessing unit that segments the stored electric signal character by character and normalizes the size; a feature extraction unit that extracts features from each segmented character; a recognition dictionary; and a recognition unit that compares the extracted features with the contents of the recognition dictionary and outputs a plurality of candidate characters for each segmented character as a recognition result, the knowledge processing device comprising: character-type word dictionaries, one provided for each character type; a character-type search unit that searches the candidate character group output by the recognition unit, for each character type, for character strings consisting of a single character type, and calculates the priority of each such string from the candidate ranks of the candidate characters that make it up; and a matching unit that, in accordance with the priority, matches the string against the character-type word dictionary for the same character type as the string and outputs a reading result.