JPH0758499B2 - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JPH0758499B2
Authority
JP
Japan
Prior art keywords
word
character
characters
same
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP62223154A
Other languages
Japanese (ja)
Other versions
JPS6466790A (en)
Inventor
浩一 樋口
裕久 後藤
義征 山下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd
Priority to JP62223154A
Publication of JPS6466790A
Publication of JPH0758499B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Description

[Detailed Description of the Invention]

(Field of Industrial Application) The present invention relates to a character recognition device, and more particularly to a character recognition device that performs recognition word by word after performing recognition character by character.

(Prior Art) A conventional character recognition device of this type is disclosed, for example, in "Automatic Recognition of Characters and Figures for Computer Input" (「電子計算機入力のための文字図形の自動認識」), edited by the Institute of Electrical Engineers of Japan, pp. 251-252. It recognizes addresses, family names, given names, product names, sentences and the like as follows. First, the input character pattern (image pattern) obtained by scanning a form is segmented into individual characters, and each character is recognized. If this recognition yields several candidate characters and the character cannot be determined uniquely, the candidate characters are combined, in units of neighboring characters or of words, into several candidate words. These candidate words are matched against a word dictionary prepared in advance, and for a candidate word that exists in the dictionary, the name of the word or the names of the characters making up the word are output as the recognition result.
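The conventional flow described above can be sketched as follows. This is a minimal illustration, not the cited implementation: the function names, the dictionary contents, and the candidate lists are assumptions chosen to match the 虎ノ門 example used later in the description. Note that every spelling variant has to be listed in the dictionary.

```python
from itertools import product

def candidate_words(candidates_per_char):
    """Combine the per-character candidates into every possible candidate word."""
    return ["".join(chars) for chars in product(*candidates_per_char)]

def recognize_word(candidates_per_char, dictionary):
    """Return the first candidate word found in the dictionary, or None."""
    for word in candidate_words(candidates_per_char):
        if word in dictionary:
            return word
    return None

# Hypothetical plain dictionary: each spelling variant is a separate entry.
PLAIN_DICTIONARY = {"虎ノ門", "虎の門", "虎之門", "虎乃門"}

# Hypothetical recognizer output: the third character is ambiguous.
candidates = [["虎"], ["ノ"], ["門", "問"]]
result = recognize_word(candidates, PLAIN_DICTIONARY)
```

With these assumed inputs, `result` is the candidate that appears in the dictionary, and a candidate list that can only produce 虎ノ問 yields `None`.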

(Problems to Be Solved by the Invention) In the conventional character recognition device described above, however, a problem arises with place names, for example. There are many cases in which several words denote the same place and have the same reading, differ only in the characters marked with a dot (・) in the inline example of the original publication, and are used by writers without distinction. All of these words with the same meaning must be registered in the word dictionary, which enlarges the dictionary and in turn increases processing time.

An object of the present invention is to solve this problem and to provide a simple character recognition device with a high processing speed.

(Means for Solving the Problems) To solve the above problem, the present invention provides a character recognition device comprising a first recognition unit, which segments the image pattern obtained by scanning a form into individual characters and recognizes each character, and a second recognition unit, which recognizes words by matching candidate words, assembled word by word from the candidate characters of that recognition result, against the words in a word dictionary. For homophone-synonym words, that is, words with the same reading and meaning in which only some of the constituent characters differ and those differing characters are themselves homophonous and synonymous (homophone-synonym characters, 同音同意文字), a representative word in which the homophone-synonym character is replaced by a special character is stored in the word dictionary in advance, and a table of homophone-synonym characters addressed by the special character is provided.

In a preferred embodiment, when matching against a representative word, the second recognition unit first masks the homophone-synonym character of the candidate word and compares the remaining characters. If they match, it refers to the table and matches the candidate word against the words obtained by substituting each homophone-synonym character from the table, in turn, for the special character of the representative word.

(Operation) With the character recognition device configured as described above, the technical means operate as follows. When the second recognition unit (for example, the word matching unit described later) matches a candidate word, assembled from the candidate characters supplied by the first recognition unit, against a word in the word dictionary, and that dictionary word is a representative word, it masks the homophone-synonym character among the characters of the candidate word and compares the rest. If this comparison matches, the second recognition unit refers to the table entry corresponding to the special character in the representative word, and matches the candidate word against each word obtained by substituting the homophone-synonym characters from that entry, one after another, for the special character of the representative word.
If one of these matches, the second recognition unit outputs the names of the characters (for example, character codes) making up the matched word, in order, as the recognition result. Consequently, several homophone-synonym words need not be registered individually; a single representative word is registered in the word dictionary instead. Because the same homophone-synonym characters can also be shared by words with different meanings, the number of words needed in the word dictionary can be reduced and the processing time for recognition shortened.
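The two-stage matching described above (masked comparison first, then sequential substitution from the table) might look roughly like this. It is a sketch under assumptions: the placeholder character `Ⓐ` stands in for the patent's special character (A), and the function names and the dict-based table are illustrative rather than taken from the source.

```python
from itertools import product

SPECIAL_A = "Ⓐ"  # assumed stand-in for the patent's special character (A)
HOMOPHONE_TABLE = {SPECIAL_A: ["の", "ノ", "之", "乃"]}  # table addressed by special character

def masked_match(candidate, representative, table):
    """Stage 1: compare while ignoring positions that hold a special character."""
    if len(candidate) != len(representative):
        return False
    return all(rep in table or rep == cand
               for cand, rep in zip(candidate, representative))

def match_representative(candidate, representative, table):
    """Stage 2: substitute table characters in turn and compare whole words."""
    if not masked_match(candidate, representative, table):
        return None
    positions = [i for i, ch in enumerate(representative) if ch in table]
    for combo in product(*(table[representative[i]] for i in positions)):
        chars = list(representative)
        for i, substitute in zip(positions, combo):
            chars[i] = substitute
        word = "".join(chars)
        if word == candidate:
            return word
    return None

REPRESENTATIVE = "虎" + SPECIAL_A + "門"
hit = match_representative("虎ノ門", REPRESENTATIVE, HOMOPHONE_TABLE)
miss_stage1 = match_representative("虎ノ問", REPRESENTATIVE, HOMOPHONE_TABLE)
miss_stage2 = match_representative("虎と門", REPRESENTATIVE, HOMOPHONE_TABLE)
```

The third call shows why the second stage is needed: 虎と門 passes the masked comparison but is rejected because と is not in the table entry for the special character.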

(Embodiment) Fig. 1 is a block diagram of a character recognition device showing one embodiment of the present invention. The character recognition device of this embodiment consists of character recognition unit 1, word matching unit 2, word dictionary 3, and homophone-synonym character table 4.

Figs. 2(a) and 2(b) show example formats of word dictionary 3 and homophone-synonym character table 4, respectively (both are memories), and Fig. 3 shows example candidate characters for an example input character pattern.

Next, the operation of this embodiment is described with reference to Figs. 2 and 3.

First, an optical signal, namely the light reflected when the surface of the form is scanned, is input to character recognition unit 1 and photoelectrically converted. The resulting input character pattern (image pattern) is segmented into individual characters, recognition is performed on each segmented pattern, and the resulting candidate character names are output to word matching unit 2. Word matching unit 2 divides the candidate character names received from character recognition unit 1 into word units and, where a character has several candidate character names, combines them to assemble several candidate words. It then matches these candidate words against the words stored in word dictionary 3, referring to homophone-synonym character table 4. The words to be recognized are registered in word dictionary 3 in advance. For homophone-synonym words (the original publication gives examples inline, with the differing characters marked by dots), the marked homophone-synonym character is replaced by a special character and the words are registered as a single word (the representative word). In the example word dictionary of Fig. 2(a), 「の」, 「ノ」 and so on are registered after being replaced by 「(A)」. Homophone-synonym character table 4 stores the actual characters (the dot-marked homophone-synonym characters) corresponding to each special character assigned to the homophone-synonym words in word dictionary 3. In the example of Fig. 2(b), 「の」, 「ノ」, 「之」 and 「乃」 are stored for special character (A) of Fig. 2(a).
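One way to picture the two memories of Fig. 2 is as a list of words and a mapping keyed by special character. This is only a sketch: the patent does not specify an encoding, so the placeholder `Ⓐ` for special character (A), the Python containers, and the helper names are all assumptions.

```python
SPECIAL_A = "Ⓐ"  # assumed single-character stand-in for special character (A)

# Word dictionary 3 (Fig. 2(a)): representative words contain the special character.
WORD_DICTIONARY = ["虎" + SPECIAL_A + "門", "丸" + SPECIAL_A + "内"]

# Table 4 (Fig. 2(b)): the special character acts as the address of a row of characters.
HOMOPHONE_TABLE = {SPECIAL_A: ["の", "ノ", "之", "乃"]}

def special_positions(word):
    """Return the positions in a dictionary word that hold a special character."""
    return [i for i, ch in enumerate(word) if ch in HOMOPHONE_TABLE]

def homophones_at(word, position):
    """Look up the table row addressed by the special character at a position."""
    return HOMOPHONE_TABLE[word[position]]

row = homophones_at(WORD_DICTIONARY[0], special_positions(WORD_DICTIONARY[0])[0])
```

A dictionary word with no special character has no special positions and is simply compared as-is.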

The operation of word matching unit 2 is described for the input character pattern 「虎ノ門」 (Toranomon) in Fig. 3. In this case there are several candidate characters for 「門」, so the combinations 「虎ノ門」 and 「虎ノ問」 become the candidate words. Next, word dictionary 3 is checked for these candidate words. If a candidate word matches a word in word dictionary 3, the names of the characters making up that word are output, for example as character codes. If, as in the input pattern example of Fig. 3, every character other than the special character (which indicates that a homophone-synonym character applies at that position) matches, the homophone-synonym characters corresponding to that special character are retrieved from homophone-synonym character table 4 and substituted for the special character of the dictionary word, and the resulting word is matched against the candidate word. If this matching finds a matching word, the names of the characters making up that word are output as character codes. In the example of Fig. 3, candidate word 「虎ノ問」 does not exist in word dictionary 3, whereas candidate word 「虎ノ門」 corresponds to 「虎(A)門」 in word dictionary 3 of Fig. 2(a). The homophone-synonym characters in the column for special character (A) (that is, address (A)) of homophone-synonym character table 4 in Fig. 2(b) are then retrieved one by one and substituted for special character (A); with 「ノ」 the result matches 「虎ノ門」. The input character pattern 「虎ノ門」 is therefore recognized correctly.
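The worked example can be traced end to end with a short sketch. Several things here are assumptions rather than details from the patent: the placeholder `Ⓐ` for special character (A), the order of the candidate characters, the use of Unicode code points as the output "character codes", and the per-position membership test, which is a shortcut that gives the same result as substituting each table character in turn.

```python
from itertools import product

SPECIAL_A = "Ⓐ"  # assumed stand-in for special character (A)
HOMOPHONE_TABLE = {SPECIAL_A: ["の", "ノ", "之", "乃"]}
WORD_DICTIONARY = ["虎" + SPECIAL_A + "門", "丸" + SPECIAL_A + "内"]

def matches(candidate, entry):
    """Compare a candidate word with a dictionary entry, honoring special characters."""
    if len(candidate) != len(entry):
        return False
    for cand_ch, entry_ch in zip(candidate, entry):
        if entry_ch in HOMOPHONE_TABLE:
            # Special character: any character in the addressed table row is accepted.
            if cand_ch not in HOMOPHONE_TABLE[entry_ch]:
                return False
        elif cand_ch != entry_ch:
            return False
    return True

def recognize(candidates_per_char):
    """Return (word, character codes) for the first candidate found, else None."""
    for chars in product(*candidates_per_char):
        candidate = "".join(chars)
        for entry in WORD_DICTIONARY:
            if matches(candidate, entry):
                return candidate, [ord(ch) for ch in candidate]
    return None

# Assumed candidate characters for the input pattern 虎ノ門 (third character ambiguous).
word, codes = recognize([["虎"], ["ノ"], ["問", "門"]])
```

Here the candidate 虎ノ問 is tried first and rejected, and 虎ノ門 is then accepted through the representative word 虎(A)門. The same call with の in the second position would accept 虎の門 through the same dictionary entry.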

The candidate word 「虎の門」, output by character recognition unit 1 for the input character pattern 「虎の門」, can be recognized by the same operation. Likewise, for the further example shown inline in the original publication, associating the dot-marked character with column (A) of homophone-synonym character table 4 in Fig. 2(b) means that only 「丸(A)内」 needs to be registered in word dictionary 3.

Thus, according to this embodiment, using homophone-synonym character table 4 removes the need to register individually those words that have the same reading and differ in only some characters of the word string, so word dictionary 3 can be made smaller. In addition, as in the inline example of the original publication, a dot-marked homophone-synonym character can be shared between words with different meanings.
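The saving in dictionary entries can be made concrete with a small count. This sketch assumes the two representative words and the four-character row for (A) mentioned in the description, uses `Ⓐ` as a stand-in for the special character, and simply enumerates every spelling the representative words cover; whether each enumerated spelling occurs in practice is not stated in the source.

```python
from itertools import product

SPECIAL_A = "Ⓐ"  # assumed stand-in for special character (A)
HOMOPHONE_TABLE = {SPECIAL_A: ["の", "ノ", "之", "乃"]}
REPRESENTATIVE_WORDS = ["虎" + SPECIAL_A + "門", "丸" + SPECIAL_A + "内"]

def spellings(word):
    """List every concrete spelling covered by one representative word."""
    options = [HOMOPHONE_TABLE.get(ch, [ch]) for ch in word]
    return ["".join(chars) for chars in product(*options)]

# Entries a conventional dictionary would need to cover the same spellings.
explicit_entries = [s for word in REPRESENTATIVE_WORDS for s in spellings(word)]

compact_count = len(REPRESENTATIVE_WORDS)   # dictionary entries with the table
explicit_count = len(explicit_entries)      # dictionary entries without it
```

Under these assumptions two representative words plus one shared table row stand in for eight separately registered words, and the single row for (A) serves both place names.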

Word dictionary 3 can therefore be made smaller without degrading performance, so a high-speed character recognition device with a small hardware scale can be realized.

(Effects of the Invention) As described in detail above, according to the present invention, homophone-synonym words are registered in the word dictionary as a representative word, and a table of homophone-synonym characters addressed by the special character in the representative word is provided. The word dictionary can therefore be made smaller and the recognition processing speed increased.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a character recognition device showing one embodiment of the present invention. Figs. 2(a) and 2(b) show example formats of the word dictionary and the homophone-synonym character table, respectively. Fig. 3 is a diagram explaining the operation of this embodiment.
1: character recognition unit; 2: word matching unit; 3: word dictionary; 4: homophone-synonym character table.

Claims (2)

[Claims]
1. A character recognition device comprising a first recognition unit that segments an image pattern obtained by scanning a form into individual characters and recognizes each character, and a second recognition unit that recognizes words by matching candidate words, assembled word by word from the candidate characters of the recognition result, against words in a word dictionary, characterized in that, for homophone-synonym words in which only some of the constituent characters differ and those characters are homophone-synonym characters, a representative word in which the homophone-synonym character is replaced by a special character is stored in the word dictionary in advance, and a table of homophone-synonym characters corresponding to the special character is provided.
2. The character recognition device according to claim 1, wherein, when matching against a representative word, the second recognition unit masks the homophone-synonym character of the homophone-synonym candidate word and performs the matching, and, if they match, refers to the table and performs matching by substituting the homophone-synonym characters thus obtained, in turn, for the special character of the representative word.
JP62223154A 1987-09-08 1987-09-08 Character recognition device Expired - Lifetime JPH0758499B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62223154A JPH0758499B2 (en) 1987-09-08 1987-09-08 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62223154A JPH0758499B2 (en) 1987-09-08 1987-09-08 Character recognition device

Publications (2)

Publication Number Publication Date
JPS6466790A JPS6466790A (en) 1989-03-13
JPH0758499B2 true JPH0758499B2 (en) 1995-06-21

Family

ID=16793637

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62223154A Expired - Lifetime JPH0758499B2 (en) 1987-09-08 1987-09-08 Character recognition device

Country Status (1)

Country Link
JP (1) JPH0758499B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3542902B2 (en) 1998-03-13 2004-07-14 セイコープレシジョン株式会社 EL element
US7353173B2 (en) * 2002-07-11 2008-04-01 Sony Corporation System and method for Mandarin Chinese speech recognition using an optimized phone set
US7353172B2 (en) * 2003-03-24 2008-04-01 Sony Corporation System and method for cantonese speech recognition using an optimized phone set
US7353174B2 (en) * 2003-03-31 2008-04-01 Sony Corporation System and method for effectively implementing a Mandarin Chinese speech recognition dictionary

Also Published As

Publication number Publication date
JPS6466790A (en) 1989-03-13
