JPS59117673A - Postprocessing system of character recognizing device

Postprocessing system of character recognizing device

Publication number
JPS59117673A
Authority
JP
Japan
Prior art keywords
character
recognition
candidate
result
post
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP57234691A
Other languages
Japanese (ja)
Other versions
JPH0259513B2 (en)
Inventor
Touzen Hai
裴 東善
Yoshihisa Fujii
敬久 藤井
Yukikazu Kaburayama
蕪山 幸和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Basic Technology Research Association Corp
Priority to JP57234691A
Publication of JPS59117673A
Publication of JPH0259513B2
Legal status: Granted

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To shorten post-processing time by exploiting the tendency of Japanese text to repeat the same words and expressions, using previously read and recognized results as a dictionary. CONSTITUTION: When a form A is recognized as a form B and the primary recognition result of the form B is C, the result D is first checked for the character "手" (te, hand) of candidate (1', 1). This character is found to coincide with D10 of the result D. Next, candidate (2', 1) is collated with D11; they do not coincide. Candidate (2', 2) is therefore collated with D11, and because they coincide the second recognition result is judged to be "書" (sho, writing). Similarly, the third character is recognized as "き" (ki). Since candidate (4', 1) is not equal to D13 and candidate (4', 2) is not equal to D13, the series of processing ends at the third character, and new processing starts from the fourth character.

Description

Detailed Description of the Invention
(A) Technical Field of the Invention
The present invention relates to a post-processing method in a character recognition device. In particular, it relates to a post-processing method in which a recognition means outputs a plurality of candidates and, in the post-processing that selects the true recognition result from among those candidates, previously recognized results are used as a dictionary.

(B) Technical Background and Problems
Conventionally, two methods have been known for performing the post-processing described above: (i) when the recognition targets are limited, for example to address names or product names, a method that holds those address names or the like as a dictionary and performs post-processing word by word; and (ii) a method that holds a general dictionary, for example a Japanese dictionary, and performs post-processing with it.

However, the former method can be used only within a predefined, limited range of words, while the latter requires a large-capacity memory to store the Japanese dictionary, and its post-processing time becomes extremely long.

(C) Object and Structure of the Invention
An object of the present invention is to perform the above post-processing efficiently. The post-processing method of the present invention is applied to a character recognition device comprising character input means for inputting characters to be recognized, feature extraction means for extracting features of the characters, and recognition means for comparing the extracted features with the contents of a dictionary registered in advance and sequentially outputting recognition result candidates. It is characterized in that a post-processing mechanism for correcting recognition results is provided, and that this mechanism has a result memory that holds, as a text dictionary, a plurality of character groups from previously recognized results, and a post-processing unit that checks, on a longest-match basis, whether newly extracted recognition result candidates match any characters in the contents of the result memory and obtains the true result from the candidates on the basis of the matching characters, so that the true result is obtained from newly extracted candidates by using previously recognized results. The invention is described below with reference to the drawings.

(D) Embodiments of the Invention
Figs. 1(A) and 1(B) show an example in which a series of Japanese sentences is written across multiple forms. Fig. 2 shows an example of the result of recognizing and correcting the form shown in Fig. 1(A). Fig. 3 shows an example of the raw result of recognizing the form shown in Fig. 1(B) by the recognition means. Fig. 4 shows an example of the result after post-processing according to the present invention. Fig. 5 shows the configuration of one embodiment of the present invention. Fig. 6 shows one embodiment of the processing executed in the post-processing unit shown in Fig. 5.

In general, the same words and expressions tend to appear repeatedly within a Japanese text. The present invention makes use of this property.

Assume now that forms such as those shown in Figs. 1(A) and 1(B) exist and are to be recognized, and that the form shown in Fig. 1(A) has been recognized as shown in Fig. 2.

The result shown in Fig. 2 is assumed to be the correct result, obtained after the raw result produced by the recognition means of the present invention has been corrected through the display/correction unit.

In this state, consider the case where, in recognizing the form shown in Fig. 1(B), the raw recognition result produced by the recognition means is that shown in Fig. 3. In the illustrated case, a first-ranked candidate and a second-ranked candidate are obtained for each character.

In the following, candidates are specified in the form K(I, A), which denotes the A-th ranked candidate for the I-th character on the form shown in Fig. 1(B). Likewise, the true recognition result for the form shown in Fig. 1(A), that is, the recognition result shown in Fig. 2, is specified in the form D_J, which denotes the J-th character of the recognition result shown in Fig. 2.

First, for K(1, 1), the result shown in Fig. 2 is searched to see whether the same character exists. In the illustrated case K(1, 1) is the character "手" (hand), and it matches D10.

Next, K(2, 1) is compared with D11; in this case they are not equal. K(2, 2) is therefore compared with D11. They are equal, so the second recognition result is taken to be "書". Similarly, the third character is taken to be "き".

Next, K(4, 1) ≠ D13 and K(4, 2) ≠ D13, so the series of processing ends at the third character, and processing starts anew from the fourth character.

That is, the result shown in Fig. 2 is searched for a character equal to K(4, 1). In this case no character matches K(4, 1), so the result is next searched for a character equal to K(4, 2). A match for K(4, 2) is found in the result D, the first-ranked candidate of the following character matches the next entry of D, and the character is determined to be "漢" (kan). The candidate for the next character likewise matches the following entry of D, but for the character after that neither the first-ranked nor the second-ranked candidate equals the corresponding entry of D (D120), so that character is not corrected.

In this way, the result shown in Fig. 4 is obtained.
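The walk-through above can be sketched as a short routine. This is one possible reading of the procedure, not the patent's own implementation: it assumes that lower-ranked candidates are tried only when a higher-ranked one has no match anywhere in the stored result, that the longest run is preferred when a character occurs more than once, and that a character with no match keeps its first-ranked candidate. The function name `postprocess` and all sample data are invented for illustration.

```python
def postprocess(candidates, result_memory):
    """Correct ranked candidates against previously confirmed text.

    candidates:    list of ranked candidate lists, one list per character
    result_memory: string of previously confirmed text (the "text dictionary")
    """
    output = []
    i = 0
    n = len(candidates)
    while i < n:
        best_start, best_len = None, 0
        for cand in candidates[i]:  # first-ranked candidate first
            for j, ch in enumerate(result_memory):
                if ch != cand:
                    continue
                # Extend the run while any candidate of the next character
                # equals the next stored character.
                length = 1
                while (i + length < n
                       and j + length < len(result_memory)
                       and result_memory[j + length] in candidates[i + length]):
                    length += 1
                if length > best_len:
                    best_start, best_len = j, length
            if best_start is not None:
                break  # this rank matched; lower ranks are not tried
        if best_start is None:
            output.append(candidates[i][0])  # no match: keep the top candidate
            i += 1
        else:
            output.extend(result_memory[best_start:best_start + best_len])
            i += best_len
    return "".join(output)


# Hypothetical example data, not the contents of Figs. 2 and 3.
memory = "手書き文字と漢字の認識"
cands = [["手", "毛"], ["害", "書"], ["き", "さ"], ["渓", "漢"],
         ["字", "宇"], ["を", "も"], ["読", "続"], ["む", "お"]]
print(postprocess(cands, memory))
```

In this sample the first run ends after the third character, a new search then starts from the fourth character and succeeds on its second-ranked candidate, which follows the same pattern as the walk-through.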

Fig. 5 shows the configuration of one embodiment of the present invention.

In the figure, reference numeral 1 denotes an observation unit, which reads forms such as those shown in Figs. 1(A) and 1(B). Reference numeral 2 denotes a feature extraction unit (means), which extracts the features of each character in a conventionally known manner. Reference numeral 3 denotes a recognition dictionary, in which the features of standard characters and their category names are stored.

Reference numeral 4 denotes a recognition unit (means), which compares the character features supplied from the feature extraction unit 2 with the standard character features read from the recognition dictionary 3 and, in the example of Fig. 3, outputs a first-ranked candidate and a second-ranked candidate for each character. Reference numeral 5 denotes a candidate memory, which stores the candidates corresponding to each character as shown in Fig. 3. Reference numeral 6 denotes a display/correction control unit which, for example for the characters on the first form, reads the plurality of candidates for each character from the candidate memory 5 and causes the display/correction unit 7 to display them. It also stores the true result obtained on the basis of those candidates (such as that shown in Fig. 2) in the result memory (text dictionary) 9. Reference numeral 7 denotes the display/correction unit, which displays the characters being recognized, the candidates, the correction results and so on, and allows the operator to select the correct character as necessary. Reference numeral 8 denotes the post-processing unit, which post-processes the contents of the candidate memory 5 on the basis of the contents of the result memory 9 in the manner described with reference to Figs. 2 to 4. Reference numeral 9 denotes the result memory (text dictionary), in which correctly recognized results are stored as shown in Fig. 2 and made available for post-processing.
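How the candidate memory 5, the display/correction path (6 and 7) and the result memory 9 might cooperate for the first form can be sketched as follows. The class and function names are hypothetical, and the operator's corrections are modelled simply as a mapping from character position to the chosen character; the patent does not specify any of these details.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateMemory:
    """Element 5: ranked candidates for each character of the current form."""
    ranked: list = field(default_factory=list)

    def store(self, candidates_for_char):
        self.ranked.append(list(candidates_for_char))


@dataclass
class ResultMemory:
    """Element 9: confirmed text kept as the text dictionary."""
    text: str = ""

    def store(self, confirmed):
        self.text += confirmed

    def contains(self, ch):
        return ch in self.text


def confirm_first_form(cands, operator_fix, memory):
    """Elements 6 and 7 (assumed behaviour): take the top candidate for each
    character, apply the operator's corrections (0-based position -> character),
    and store the confirmed text in the result memory."""
    chars = [operator_fix.get(i, c[0]) for i, c in enumerate(cands.ranked)]
    confirmed = "".join(chars)
    memory.store(confirmed)
    return confirmed


# Hypothetical usage: the operator corrects the second character.
cm = CandidateMemory()
cm.store(["手", "毛"])
cm.store(["害", "書"])
mem = ResultMemory()
print(confirm_first_form(cm, {1: "書"}, mem))
```

Once the first form has been confirmed this way, the stored text is what the post-processing unit 8 would consult for later forms.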

The processing in the post-processing unit 8 shown in Fig. 5 is as described with reference to Figs. 2 to 4; in more detail, it is carried out as in the embodiment shown in Fig. 6. In Fig. 6, I corresponds to the numbers ①, ②, ... shown in Fig. 3, and J corresponds to the numbers 1, 2, ..., 120 shown in Fig. 2. The route marked in Fig. 6 corresponds to the route in which matching is performed on the basis of the second-ranked candidate.
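The roles of the two indices can be illustrated with a small sketch of the inner step only: once a starting pair (I, J) has been found, both indices advance together for as long as some candidate of character I equals D_J. The function name `extend_run` and the sample data are assumptions made for illustration; Fig. 6 itself is not reproduced here.

```python
def extend_run(candidates, result, i, j):
    """Advance I and J together while a candidate of character I equals D_J.

    Returns a list of (I, J, rank) triples, all 1-based, where rank is the
    position of the matching candidate (2 means the second-candidate route).
    """
    steps = []
    while i <= len(candidates) and j <= len(result):
        ranked = candidates[i - 1]
        target = result[j - 1]
        if target not in ranked:
            break  # neither candidate matches: the run ends here
        steps.append((i, j, ranked.index(target) + 1))
        i += 1
        j += 1
    return steps


# Hypothetical data arranged so that "手" sits at J = 10.
result = "あいうえおかきくけ手書き文字"
candidates = [["手", "毛"], ["害", "書"], ["き", "さ"], ["渓", "漢"]]
print(extend_run(candidates, result, 1, 10))
```

With this sample the run covers J = 10 to 12 and stops at J = 13, and the second step is reached through the second-ranked candidate.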

(E) Effects of the Invention
As described above, according to the present invention, the dictionary that must be held for post-processing need only be as large as a single series of text, so to speak. This is far smaller than a general Japanese dictionary, and processing efficiency is correspondingly good. Furthermore, corrections can be made in units of sentences rather than words, which greatly improves the accuracy of post-processing.

[Brief Description of the Drawings]

Figs. 1(A) and 1(B) show an example in which a series of Japanese sentences is written across multiple forms. Fig. 2 shows an example of the result of recognizing and correcting the form shown in Fig. 1(A). Fig. 3 shows an example of the raw result of recognizing the form shown in Fig. 1(B) by the recognition means. Fig. 4 shows an example of the result after post-processing according to the present invention. Fig. 5 shows the configuration of one embodiment of the present invention. Fig. 6 shows one embodiment of the processing executed in the post-processing unit shown in Fig. 5.
In the figures, 1 denotes the observation unit, 2 the feature extraction unit, 3 the recognition dictionary, 4 the recognition unit, 5 the candidate memory, 6 the display/correction control unit, 7 the display/correction unit, 8 the post-processing unit, and 9 the result memory (text dictionary).

Claims (1)

[Claims]
A post-processing method in a character recognition device comprising character input means for inputting characters to be recognized, feature extraction means for extracting features of the characters, and recognition means for comparing the extracted features with the contents of a dictionary registered in advance and sequentially outputting recognition result candidates, characterized in that a post-processing mechanism for correcting recognition results is provided, the post-processing mechanism comprising a result memory that holds, as a text dictionary, a plurality of character groups from previously recognized results, and a post-processing unit that checks, on a longest-match basis, whether newly extracted recognition result candidates match any characters in the contents of the result memory and obtains the true result from the recognition result candidates on the basis of the matching characters, whereby the true result is obtained from newly extracted recognition result candidates by using previously recognized results.
JP57234691A 1982-12-24 1982-12-24 Postprocessing system of character recognizing device Granted JPS59117673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57234691A JPS59117673A (en) 1982-12-24 1982-12-24 Postprocessing system of character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57234691A JPS59117673A (en) 1982-12-24 1982-12-24 Postprocessing system of character recognizing device

Publications (2)

Publication Number Publication Date
JPS59117673A (en) 1984-07-07
JPH0259513B2 (en) 1990-12-12

Family

ID=16974909

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57234691A Granted JPS59117673A (en) 1982-12-24 1982-12-24 Postprocessing system of character recognizing device

Country Status (1)

Country Link
JP (1) JPS59117673A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62105292A (en) * 1985-10-31 1987-05-15 Toshiba Corp Optical character recognition device
JPS6368989A (en) * 1986-09-11 1988-03-28 Fujitsu Ltd Document reader
JPH02122372A (en) * 1988-11-01 1990-05-10 Kenzo Ikegami Recognizer and translation device containing recognizer

Also Published As

Publication number Publication date
JPH0259513B2 (en) 1990-12-12

Similar Documents

Publication Publication Date Title
CN107153469B (en) Method for searching input data for matching candidate items, database creation method, database creation device and computer program product
Prutskov Algorithmic provision of a universal method for word-form generation and recognition
JPS59117673A (en) Postprocessing system of character recognizing device
JP2003331214A (en) Character recognition error correction method, device and program
JPS59229683A (en) Recognition processor
JP2839515B2 (en) Character reading system
JPS63138479A (en) Character recognizing device
JP3331302B2 (en) Post-processing device for character recognition
JPH0226268B2 (en)
JPH0355874B2 (en)
JPH0438026B2 (en)
JPH10134150A (en) Postprocessing method for character recognition result
JPS61133487A (en) Character recognizing device
JPS6173199A (en) Voice preselection system for large-vocabulary word
JPH05298489A (en) System for recognizing character
JPS59160275A (en) Word recognizing device
JPH05120325A (en) Electronic dictionary
JPH0291785A (en) Character recognizing device
JPH0540854A (en) Post-processing method for character recognizing result
JPH03214198A (en) Word spotting voice recognizing method
JPS5820075B2 (en) pattern recognition device
JPS62154169A (en) Dictionary retrieving method for kana-to-kanji converting device
JPH03198180A (en) Post-processing method for character recognition
JPH01183796A (en) Character recognizing device
JPH04148290A (en) Character recognition device