JPH076212A - Intelligence processing unit for optical character reader - Google Patents

Intelligence processing unit for optical character reader

Info

Publication number
JPH076212A
Authority
JP
Japan
Prior art keywords
character
candidate
character type
recognition
type
Prior art date
Legal status
Pending
Application number
JP5149476A
Other languages
Japanese (ja)
Inventor
Yasumasa Murai
康眞 村井
Shinsuke Yamashita
信介 山下
Current Assignee
NEC Corp
Original Assignee
NEC Corp
Priority date
1993-06-21
Filing date
1993-06-21
Publication date
1995-01-10
1993-06-21 Application filed by NEC Corp
1993-06-21 Priority to JP5149476A
1995-01-10 Publication of JPH076212A
Status: Pending (current legal status)

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To provide a knowledge processing device for an optical character reader that has a short processing time and recognizes characters reliably. CONSTITUTION: The device is provided with a character-type word dictionary 8 for each character type; a character-type search unit 6 that searches the candidate character group output from a recognition unit 4 for character strings consisting of a single character type, for each character type, and calculates the priority of each string from the candidate ranks of the candidate characters that make it up; and a collation unit 7 that, in order of priority, collates each retrieved string against the character-type word dictionary 8 of the same character type and outputs the read result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a knowledge processing device for an optical character reader, and more particularly to a knowledge processing device that performs word matching with dictionaries on the candidate character group obtained by reading general text in which characters of several character types (kanji, hiragana, katakana, digits, Latin letters, etc.) are mixed.

[0002]

[Prior Art] Optical character readers read characters written on a medium such as paper and recognize them. However, complete recognition is difficult to achieve by pattern processing of the character image data alone, so some form of knowledge processing must be applied after reading to improve the recognition rate.

[0003] In the knowledge processing of a conventional optical character reader, various dictionaries are prepared, such as an address dictionary, a full-name dictionary, a company-name dictionary, a job-title dictionary, an apartment/condominium-name dictionary, an occupation dictionary, and a disease-name dictionary. On the entry form, the item to be written in each entry field is specified and this information is registered in the device in advance; after the characters are read, the dictionary corresponding to each entry field is consulted to determine the characters. In the example shown in FIG. 8, the entry form 51 has fields for company name, job title, and name, and the character strings written in these fields are matched against the company-name dictionary, the job-title dictionary, and the full-name dictionary, respectively.
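The field-by-field lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the conventional reader's actual implementation: the field names, the dictionary contents, and the rule that a word fits when each of its characters appears among the candidates for the corresponding read character are all hypothetical.

```python
# Hypothetical per-field knowledge dictionaries (illustrative entries only).
FIELD_DICTIONARIES = {
    "company": ["日本電気", "東京電力"],
    "title": ["部長", "課長"],
    "name": ["村井", "山下"],
}


def match_field(field, candidates):
    """Return the first word in the field's dictionary that fits the candidates.

    candidates[i] holds the candidate characters for the i-th read character.
    A word fits if it has the same length and each of its characters appears
    among the candidates at the same position; otherwise None is returned.
    """
    for word in FIELD_DICTIONARIES[field]:
        if len(word) == len(candidates) and all(
            ch in cands for ch, cands in zip(word, candidates)
        ):
            return word
    return None


# The job-title field consults only the job-title dictionary.
print(match_field("title", [["郁", "部"], ["長", "县"]]))  # → 部長
```

Because the field fixes which dictionary is used, only a small, relevant word list is searched; this approach does not carry over to free text, which is the case the following paragraphs address.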

[0004] When the character string to be read is general text, all of the dictionaries described above, including a general word dictionary, are used. In addition, a plurality of candidate characters are obtained for each character of the string to be read and held as a candidate character group. Then, as shown in FIG. 9, the first candidate for the first character of the string is read from the candidate character group, words beginning with that candidate are looked up in the dictionary, and it is checked whether the second and subsequent characters of each word appear among the candidates for the second and subsequent characters of the candidate character group. If a word matches, it is output as the read result; if not, the next dictionary word is taken and the matching is repeated.

[0005] If this matching finds no word among the series of words beginning with the first candidate for the first character of the candidate character group, the same matching is performed for words beginning with the second candidate for the first character.

[0006] If word matching finds no word beginning with any of the candidates for the first character of the candidate character group, the above processing is performed to match words beginning with the second character of the candidate character group. This processing is executed for every character of the string to be read, and the result is output.
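The conventional procedure in paragraphs [0004] to [0006] amounts to a left-to-right scan over the candidate character group. The Python sketch below is illustrative only: the candidate lists and dictionary are hypothetical, and resuming the scan just after a matched word is an assumed detail that the text does not spell out.

```python
def conventional_match(candidates, dictionary):
    """Left-to-right word matching over a candidate character group.

    candidates[i] is the rank-ordered candidate list for read character i.
    Returns a list of (start position, word) pairs.
    """
    results = []
    pos = 0
    while pos < len(candidates):
        found = None
        # Try the 1st candidate at this position, then the 2nd, and so on.
        for first in candidates[pos]:
            for word in dictionary:
                if word[0] != first or pos + len(word) > len(candidates):
                    continue
                # Check the word's 2nd and later characters against the
                # candidates for the following read characters.
                if all(word[k] in candidates[pos + k] for k in range(1, len(word))):
                    found = word
                    break
            if found:
                break
        if found:
            results.append((pos, found))
            pos += len(found)  # assumption: resume after the matched word
        else:
            pos += 1  # no word starts here; move on to the next read character
    return results


cands = [["犬", "大"], ["学", "字"], ["生", "牛"]]
print(conventional_match(cands, ["学生", "大学"]))  # → [(0, '大学')]
```

Note that every dictionary word is tried against every candidate regardless of character type, which is the source of the wasted processing time pointed out in [0008].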

[0007] Furthermore, Japanese Examined Patent Publication No. Hei 1-19195 discloses a knowledge processing method in which candidate characters are weighted according to their recognition rank and, when matching against words, the word with the highest weight value is selected and output.
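A rank-weighted selection in the spirit of the cited method might look like the Python sketch below. The weighting (the rank-1 candidate receives the largest weight, one less per lower rank) and aligning every word at a fixed start position are assumptions for illustration; the exact scoring of the cited publication is not described here.

```python
def weighted_best_word(candidates, dictionary, start=0):
    """Pick the dictionary word with the highest rank-based weight.

    candidates[i] is the rank-ordered candidate list for read character i.
    A candidate at 0-based index k in a list of n candidates gets weight n - k.
    Words are aligned at `start`; words that do not fit are skipped.
    Returns (best word, score), or (None, 0) if no word fits.
    """
    best_word, best_score = None, 0
    for word in dictionary:
        if start + len(word) > len(candidates):
            continue
        score = 0
        for offset, ch in enumerate(word):
            cands = candidates[start + offset]
            if ch not in cands:
                break  # the word does not fit the candidates
            score += len(cands) - cands.index(ch)
        else:  # runs only if every character of the word was found
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score


print(weighted_best_word([["大", "犬"], ["学", "字"]], ["犬字", "大学"]))  # → ('大学', 4)
```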

[0008]

[Problems to Be Solved by the Invention] Japanese text generally mixes characters of several character types (kanji, hiragana, katakana, digits, Latin letters, etc.). The conventional knowledge processing method described above performs matching even when the first and second characters of the portion being matched belong to different character types, so unnecessary processing time is spent. Moreover, because word matching is performed without regard to differences in character type, a word may be recognized where it should not be extracted at all, for example when characters of several character types appear among the candidate characters.

[0009] An object of the present invention is to provide a knowledge processing device for an optical character reader that has a short processing time and recognizes characters reliably.

[0010]

[Means for Solving the Problems] The knowledge processing device of the present invention is used in an optical character reader having: a scanning unit that irradiates a medium bearing characters with light, converts the light reflected from the medium into an electric signal, and stores it; a preprocessing unit that segments the stored electric signal into individual characters and normalizes their size; a feature extraction unit that extracts features from each segmented character; a recognition dictionary; and a recognition unit that compares the extracted features with the contents of the recognition dictionary and outputs a plurality of candidate characters for each segmented character as the recognition result. The knowledge processing device comprises: character-type word dictionaries, one provided for each character type; a character-type search unit that searches the candidate character group output from the recognition unit for character strings consisting of a single character type, for each character type, and calculates the priority of each string from the candidate ranks of the candidate characters that make it up; and a collation unit that, according to the priority, collates each string against the character-type word dictionary of the same character type and outputs the read result.

[0011]

[Function] Character strings consisting of a single character type are searched for in the candidate character group, for each character type; a priority is calculated for each string from the candidate ranks of its candidate characters; and the read result is obtained by collating the string against the character-type word dictionary provided for that type. This narrows the range of dictionary words subject to each series of collations and shortens the time needed for collation. In addition, because word recognition follows the boundaries between different character types, misrecognition decreases and the recognition rate improves.

[0012]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an optical character reader according to an embodiment of the present invention. This optical character reader has a function for searching for character strings of a single character type.

[0013] A scanning unit 1 irradiates a medium such as paper bearing characters with light, converts the light reflected from the medium into an electric signal, and stores it. A preprocessing unit 2, connected to the scanning unit 1, segments the stored electric signal into individual characters and adjusts their size (normalization). Also provided are a feature extraction unit 3, which extracts the features of each character normalized by the preprocessing unit 2, and a recognition dictionary 5, which stores the relationship between characters and features. A recognition unit 4 receives the output of the feature extraction unit 3 and searches the recognition dictionary 5; it compares the extracted features with the contents of the recognition dictionary 5, selects a plurality of candidate characters for each segmented character, and outputs them as the recognition result.

[0014] The optical character reader further has a plurality of character-type word dictionaries 8, one for each character type such as kanji, hiragana, katakana, Latin letters, digits, and symbols; a character-type search unit 6 that performs a character-type search on the candidate character group output from the recognition unit 4 and outputs character strings; and a collation unit 7 that, for each character string output from the character-type search unit 6, collates the character-type word dictionary of the same character type as the string, in order of the string's priority, and outputs the read result. The character-type search unit 6 searches the candidate character group from the recognition unit 4 for character strings consisting of a single character type, for each character type, and calculates the priority of each string from the candidate ranks of the candidate characters that make it up.

[0015] Next, the operation of this embodiment will be described.

[0016] Text written on paper or the like is read by the scanning unit 1, segmented into individual characters and size-normalized by the preprocessing unit 2, and has its features extracted character by character by the feature extraction unit 3; the features are then input to the recognition unit 4. The recognition unit 4 matches them against the recognition dictionary 5 and selects and outputs a plurality of candidate characters for each segmented character. Each candidate character is assigned a candidate rank. The processing up to this point is the same as in a conventional optical character reader.

[0017] The following description assumes that the 18-character sentence 「英字のENDは,カタカナで表現すると」 (roughly, "When the Latin-letter word END is written in katakana, ...") is read and that the recognition unit 4 outputs nine candidate characters for each character. As a result, as shown in FIG. 2, the recognition unit 4 outputs a candidate character group of 18 × 9 = 162 candidate characters. The read character No. indicates the position of a character in the read character string, counted from the beginning of the sentence.

[0018] Next, the candidate character group is input to the character-type search unit 6, and a same-character-type search is performed. First, for each read character No., the character type of each of its candidate characters is determined, and the number of characters and the sum of their weights are obtained for each character type. Weights are assigned by recognition rank: 9 for rank 1, 8 for rank 2, and so on down to 1 for rank 9.
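The per-character tally in this paragraph can be sketched in Python as below. The character-type classifier is an assumption: it uses basic Unicode block ranges (Hiragana, Katakana, CJK Unified Ideographs, ASCII and fullwidth letters and digits) and treats everything else, including halfwidth katakana and rarer ideographs outside the basic block, as a symbol. The weight rule generalizes "9 for rank 1 down to 1 for rank 9" to n candidates.

```python
from collections import defaultdict


def char_type(ch):
    """Classify a character by Unicode range (a simplifying assumption)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    if ("A" <= ch <= "Z" or "a" <= ch <= "z"
            or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A):
        return "latin"
    if "0" <= ch <= "9" or 0xFF10 <= cp <= 0xFF19:
        return "digit"
    return "symbol"


def tally_position(cands):
    """Count and weight sum per character type for one read character.

    cands is rank-ordered; with n candidates, rank r gets weight n + 1 - r,
    i.e. 9 for rank 1 down to 1 for rank 9 when n = 9.
    """
    tally = defaultdict(lambda: [0, 0])
    for rank, ch in enumerate(cands, start=1):
        entry = tally[char_type(ch)]
        entry[0] += 1
        entry[1] += len(cands) + 1 - rank
    return {t: tuple(v) for t, v in tally.items()}


print(tally_position(list("英葵芙莫奨萃夷菓菜")))  # → {'kanji': (9, 45)}
```

The all-kanji candidates of read character No. 1 reproduce the count of 9 and weight of 45 given in [0019]. A hypothetical nine-candidate list such as `["の", "，", "ぬ", "乃", "之", "刀", "力", "方", "め"]` (invented here, not taken from FIG. 2) yields the same per-type totals quoted for No. 3.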

[0019] In this example, all of the candidate characters 「英葵芙莫奨萃夷菓菜」 for read character No. 1 are kanji, so the kanji count is 9 and the weight is 45. Similarly, for read character No. 3, there are 3 hiragana with a weight of 17, 5 kanji with a weight of 20, and 1 symbol with a weight of 8.
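These figures follow directly from the weighting rule in [0018]. Writing w(r) for the weight of a candidate with recognition rank r (notation introduced here), nine all-kanji candidates give the full weight sum, and the three character types quoted for read character No. 3 together account for all nine candidates and the same total:

```latex
\begin{aligned}
w(r) &= 10 - r, \qquad r = 1, \dots, 9 \\
\sum_{r=1}^{9} w(r) &= 9 + 8 + \cdots + 1 = 45 \\
\text{No. 3:}\quad 3 + 5 + 1 &= 9 \text{ candidates}, \qquad 17 + 20 + 8 = 45
\end{aligned}
```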

[0020] After the above processing has been performed for each read character, it is checked, for each character type, whether characters of that type appear among the candidate characters of consecutive read characters in order of read character No. Where they do, the run is treated as a same-character-type string, and for each such string the number of characters of that type among its candidate characters and the sum of their weights are obtained. In the example shown in FIG. 2, Latin letters appear among the candidates for read characters No. 4, No. 5, and No. 6, but not among the candidates for No. 3 or No. 7, so read characters No. 4 to No. 6 form Latin-letter string No. 1. The total number of Latin letters among the candidates of these read characters and the sum of their weights are then obtained; as shown in the same-character-type string search result column of FIG. 2, the total is 4 and the weight sum is 31.
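The run detection described in this paragraph can be sketched as follows. This Python sketch assumes that, for each read character, a mapping from character type to (count, weight sum) is already available; positions are 0-based list indices rather than read character Nos., and the minimum run length of two reflects the treatment of digits and symbols described in [0021].

```python
def same_type_strings(tallies, min_len=2):
    """Find runs of consecutive read characters sharing a character type.

    tallies[i] maps character type -> (count, weight sum) for read character i.
    Returns one dict per run of at least min_len positions, with the summed
    count and weight of that type over the run.
    """
    strings = []
    for t in sorted({typ for tally in tallies for typ in tally}):
        i = 0
        while i < len(tallies):
            if t not in tallies[i]:
                i += 1
                continue
            j, count, weight = i, 0, 0
            while j < len(tallies) and t in tallies[j]:
                c, w = tallies[j][t]
                count += c
                weight += w
                j += 1
            if j - i >= min_len:
                strings.append({"type": t, "positions": list(range(i, j)),
                                "count": count, "weight": weight})
            i = j
    return strings


# Hypothetical tallies for read characters No. 3 to No. 7; only the Latin-letter
# totals (4 characters, weight 31) are chosen to match the text.
tallies = [
    {"hiragana": (3, 17), "kanji": (5, 20), "symbol": (1, 8)},  # No. 3
    {"latin": (1, 9), "katakana": (8, 36)},                     # No. 4
    {"latin": (2, 13), "kanji": (7, 32)},                       # No. 5
    {"latin": (1, 9), "katakana": (8, 36)},                     # No. 6
    {"hiragana": (9, 45)},                                      # No. 7
]
print(same_type_strings(tallies))
# → [{'type': 'latin', 'positions': [1, 2, 3], 'count': 4, 'weight': 31}]
```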

[0021] FIG. 2 shows the result of performing the above processing for all character types: Latin letters, hiragana, katakana, kanji, digits, and symbols. Digits and symbols do not appear in two or more consecutive read character Nos., so they do not form same-character-type strings; their entries in the same-character-type string search result column are therefore enclosed in parentheses "()".

[0022] Next, a priority is obtained for each same-character-type string from the search result. Here, the priority is determined by comparing, for each string, the quotient of its weight sum divided by its character count. For Latin-letter string No. 1, the weight sum divided by the total number of characters is 31 / 4 = 7.75. The weight per character is obtained in the same way for the other same-character-type strings, and priorities are assigned in descending order of weight per character. The results are shown in the same-character-type string priority column.
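The priority rule reduces to sorting by weight per candidate character. In the Python sketch below, each string is assumed to be a dict with "count" and "weight" keys, and the hiragana figures in the example are hypothetical; only the Latin-letter figures come from the text.

```python
def prioritize(strings):
    """Rank same-type strings by weight per candidate character, highest first.

    Returns (priority, weight per character, string) triples. Ties keep their
    input order because sorted() is stable, also with reverse=True.
    """
    ranked = sorted(strings, key=lambda s: s["weight"] / s["count"], reverse=True)
    return [(rank, s["weight"] / s["count"], s) for rank, s in enumerate(ranked, start=1)]


strings = [
    {"type": "hiragana", "count": 12, "weight": 72},  # hypothetical: 6.0 per character
    {"type": "latin", "count": 4, "weight": 31},      # Latin-letter string No. 1
]
for priority, per_char, s in prioritize(strings):
    print(priority, s["type"], per_char)
# 1 latin 7.75
# 2 hiragana 6.0
```

Dividing by the count keeps a long string with many low-ranked candidates from outranking a short string whose candidates sit at the top ranks.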

[0023] Once a priority has been obtained for each same-character-type string, the collation unit 7 takes the string with priority 1 and searches for a word by collating the candidate characters of the read character Nos. that make up the string against the character-type word dictionary 8 of the corresponding type. In this example, Latin-letter string No. 1 has priority 1, so the candidate characters of read characters No. 4, No. 5, and No. 6 are collated against the English word dictionary, and as a result the English word "END" matches.

[0024] If the dictionary contains no word matching the string with priority 1, the string with priority 2 (in this case, hiragana string No. 4) is collated against the corresponding character-type word dictionary. Collation proceeds through strings of successively lower priority until a matching word is found.
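Collation in priority order, with fallback to lower-priority strings, can be sketched as follows. The Python sketch assumes that each string carries its character type and positions, that the dictionary contents are hypothetical, and that a word may occupy any consecutive part of a string; the text does not say whether a word must cover the whole string.

```python
def collate(ranked_strings, candidates, dictionaries):
    """Collate same-type strings against type-specific dictionaries in priority order.

    ranked_strings: dicts with "type" and "positions", highest priority first.
    candidates[i]: candidate characters for read character i.
    dictionaries: character type -> list of words (hypothetical contents).
    Returns (type, word, matched positions) for the first hit, else None.
    """
    for s in ranked_strings:
        positions = s["positions"]
        # Only the dictionary of the string's own character type is consulted.
        for word in dictionaries.get(s["type"], []):
            # Assumption: a word may occupy any consecutive part of the string.
            for off in range(len(positions) - len(word) + 1):
                span = positions[off:off + len(word)]
                if all(ch in candidates[p] for ch, p in zip(word, span)):
                    return s["type"], word, span
    return None


candidates = [["英"], ["字", "宇"], ["の"], ["E", "F"], ["N", "M"], ["D", "O"]]
ranked = [{"type": "latin", "positions": [3, 4, 5]},
          {"type": "kanji", "positions": [0, 1]}]
dictionaries = {"latin": ["AND", "END"], "kanji": ["英字"]}
print(collate(ranked, candidates, dictionaries))  # → ('latin', 'END', [3, 4, 5])
```

With a Latin-letter dictionary that contains no fitting word, the same call falls through to the kanji string and returns its match instead.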

[0025] When a matching word is found, the processing is repeated from the same-character-type string search on the candidate character group excluding the read characters corresponding to that word, as shown in FIG. 3, and word collation is again performed starting from the string with the highest priority. The same-character-type string search and word collation are repeated until no same-character-type strings remain to be processed. FIGS. 4 to 6 show each stage of this repetition.
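The outer loop of [0025] alternates search and collation, removing matched read characters each round. In this Python sketch, `find_strings` and `collate` are hypothetical callbacks standing in for the search and collation steps, and stopping when no string yields a dictionary word is an assumption (the text only describes the loop ending when no strings remain). The positions left at the end correspond to the single characters handed to later processing in [0026].

```python
def iterative_read(n_chars, find_strings, collate):
    """Alternate same-type search and collation, removing matched characters.

    find_strings(remaining) -> priority-ordered same-type strings (here, lists
    of positions) built only from the read-character positions in `remaining`.
    collate(strings) -> (word, positions) for the first hit, or None.
    Returns the matched words keyed by start position and the unmatched positions.
    """
    remaining = list(range(n_chars))
    matched = {}
    while True:
        strings = find_strings(remaining)
        if not strings:
            break  # no same-type strings remain
        hit = collate(strings)
        if hit is None:
            break  # assumption: stop if no string matches any dictionary word
        word, positions = hit
        matched[positions[0]] = word
        remaining = [i for i in remaining if i not in positions]
    return matched, remaining


# Toy callbacks: a fixed, hypothetical plan of words and their positions.
plan = [("END", [3, 4, 5]), ("英字", [0, 1])]


def toy_find(remaining):
    return [p for _, p in plan if all(i in remaining for i in p)]


def toy_collate(strings):
    for word, p in plan:
        if p in strings:
            return word, p
    return None


print(iterative_read(7, toy_find, toy_collate))  # → ({3: 'END', 0: '英字'}, [2, 6])
```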

[0026] When no character strings remain to be collated, isolated single characters are left uncollated; these are handled by, for example, grammar processing, and the final read result is output. FIG. 7 shows the state when only single characters remain; the final read result, 「英字のENDは,カタカナで表現すると」, matches the read character string exactly.

[0027]

[Effects of the Invention] As described above, the present invention provides character-type word dictionaries, one for each character type; a character-type search unit that searches the candidate character group for character strings consisting of a single character type, for each type, and calculates the priority of each string from the candidate ranks of its candidate characters; and a collation unit that collates each retrieved string, in order of priority, against the character-type word dictionary of the same type and outputs the read result. This shortens the time required for collation until a matching word is found, reduces misrecognition, and improves the recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an optical character reader according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the first character-type search and word collation.

FIG. 3 is a diagram illustrating the second character-type search and word collation.

FIG. 4 is a diagram illustrating the third character-type search and word collation.

FIG. 5 is a diagram illustrating the fourth character-type search and word collation.

FIG. 6 is a diagram illustrating the fifth character-type search and word collation.

FIG. 7 is a diagram illustrating the sixth character-type search and word collation.

FIG. 8 is a diagram illustrating a knowledge processing method of a conventional optical character reader.

FIG. 9 is a diagram illustrating a knowledge processing method of a conventional optical character reader.

[Description of Reference Numerals]

1 Scanning unit
2 Preprocessing unit
3 Feature extraction unit
4 Recognition unit
5 Recognition dictionary
6 Character-type search unit
7 Collation unit
8 Character-type word dictionary

─────────────────────────────────────────────────────

[Written Amendment]

[Submission date] November 9, 1993 (Heisei 5)

[Amendment 1]

[Document to be amended] Specification

[Item to be amended] Claims

[Method of amendment] Change

[Content of amendment]

[Claims]

[Amendment 2]

[Document to be amended] Specification

[Item to be amended] 0010

[Method of amendment] Change

[Content of amendment]

[0010]

[Means for Solving the Problems] The knowledge processing device of the present invention is used in an optical character reader having: a scanning unit that irradiates a medium bearing characters with light, converts the response light from the medium into an electric signal, and stores it; a preprocessing unit that segments the stored electric signal into individual characters and normalizes their size; a feature extraction unit that extracts features from each segmented character; a recognition dictionary; and a recognition unit that compares the extracted features with the contents of the recognition dictionary and outputs a plurality of candidate characters for each segmented character as the recognition result. The knowledge processing device comprises: character-type word dictionaries, one provided for each character type; a character-type search unit that searches the candidate character group output from the recognition unit for character strings consisting of a single character type, for each character type, and calculates the priority of each string from the candidate ranks of the candidate characters that make it up; and a collation unit that, according to the priority, collates each string against the character-type word dictionary of the same character type and outputs the read result.

[Amendment 3]

[Document to be amended] Specification

[Item to be amended] 0011

[Method of amendment] Change

[Content of amendment]

[0011]

[Function] Character strings consisting of a single character type are searched for in the candidate character group, for each character type; a priority is calculated for each string from the candidate ranks of its candidate characters; and the read result is obtained by collating the string against the character-type word dictionary provided for that type. This narrows the range of dictionary words subject to each series of collations and shortens the time needed for collation. In addition, because word recognition follows the boundaries between different character types, misrecognition decreases and the recognition rate improves. One way to calculate the priority is, for example, a method based on a weighted sum of the candidate ranks. Furthermore, when uncollated single characters remain, grammar processing may also be used.

Claims (1)

[Claims]

[Claim 1] A knowledge processing device for an optical character reader, used in an optical character reader that has a scanning unit that irradiates a medium bearing characters with light, converts the light reflected from the medium into an electric signal, and stores it; a preprocessing unit that segments the stored electric signal into individual characters and adjusts their size; a feature extraction unit that extracts features from each segmented character; a recognition dictionary; and a recognition unit that compares the extracted features with the contents of the recognition dictionary and outputs a plurality of candidate characters for each segmented character as a recognition result, the knowledge processing device comprising:
character-type word dictionaries, one provided for each character type;
a character-type search unit that searches the candidate character group output from the recognition unit for character strings consisting of a single character type, for each character type, and calculates the priority of each character string from the candidate ranks of the candidate characters constituting it; and
a collation unit that, according to the priority, collates the character-type word dictionary of the same character type as the character string and outputs a read result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5149476A JPH076212A (en) 1993-06-21 1993-06-21 Intelligence processing unit for optical character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5149476A JPH076212A (en) 1993-06-21 1993-06-21 Intelligence processing unit for optical character reader

Publications (1)

Publication Number Publication Date
JPH076212A true JPH076212A (en) 1995-01-10

Family

ID=15475990

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5149476A Pending JPH076212A (en) 1993-06-21 1993-06-21 Intelligence processing unit for optical character reader

Country Status (1)

Country Link
JP (1) JPH076212A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020166722A (en) * 2019-03-29 2020-10-08 富士ゼロックス株式会社 Character recognition device and program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6419195A (en) * 1987-07-14 1989-01-23 Kubota Ltd Cutting device for grinder pump
JPH01296393A (en) * 1988-05-25 1989-11-29 Toshiba Corp Category deciding device
JPH03189890A (en) * 1989-12-20 1991-08-19 Nippon Telegr & Teleph Corp <Ntt> Compound word collating method
JPH04318687A (en) * 1991-04-17 1992-11-10 N T T Data Tsushin Kk Character recognition unit

Similar Documents

Publication Publication Date Title
US9875254B2 (en) Method for searching for, recognizing and locating a term in ink, and a corresponding device, program and language
US7269547B2 (en) Tokenizer for a natural language processing system
US5615378A (en) Dictionary retrieval device
US7415171B2 (en) Multigraph optical character reader enhancement systems and methods
US20070230787A1 (en) Method for automated processing of hard copy text documents
Lehal et al. A shape based post processor for Gurmukhi OCR
JPH0682403B2 (en) Optical character reader
JP2001175661A (en) Device and method for full-text retrieval
WO2000036530A1 (en) Searching method, searching device, and recorded medium
JP3975825B2 (en) Character recognition error correction method, apparatus and program
JPH076212A (en) Intelligence processing unit for optical character reader
JP2586372B2 (en) Information retrieval apparatus and information retrieval method
JP2570784B2 (en) Document reader post-processing device
JP2560959B2 (en) Post-processing method for character recognition
JPH0256086A (en) Method for postprocessing for character recognition
JP2827066B2 (en) Post-processing method for character recognition of documents with mixed digit strings
JP3350127B2 (en) Character recognition device
JPS63282586A (en) Character recognition device
Le et al. Greek alphabet recognition technique for biomedical documents
JPH11120294A (en) Character recognition device and medium
JPH0589281A (en) Erroneous read correcting and detecting method
JP2917310B2 (en) Word dictionary search method for word matching
Marukawa et al. A post-processing method for handwritten Kanji name recognition using Furigana information
Kwon et al. Contextual postprocessing of a korean ocr system by linguistic constraints
JPH0562020A (en) Character recognition device