JPS59117673A

JPS59117673A - Postprocessing system of character recognizing device

Info

Publication number: JPS59117673A
Application number: JP57234691A
Authority: JP
Inventors: Touzen Hai; 裴　東善; Yoshihisa Fujii; 敬久藤井; Yukikazu Kaburayama; 蕪山　幸和
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1982-12-24
Filing date: 1982-12-24
Publication date: 1984-07-07
Also published as: JPH0259513B2

Abstract

PURPOSE: To shorten post-processing time by exploiting a characteristic of Japanese text: the same word or expression tends to recur within one document, so previously read and recognized results can be used as a dictionary.

CONSTITUTION: Suppose form A has already been recognized and its corrected result D is held. Under this condition, the primary recognition result for a following form B is C.

- Candidate (1', 1) is the character 「手」 (te, "hand"). Result D is checked for the same character, and it coincides with D10.
- Candidate (2', 1) is then collated with D11, and they do not coincide.
- Candidate (2', 2) is therefore collated with D11. Because these coincide, the second recognition result is judged to be 「書」 (sho, "write").
- Similarly, the third character is recognized as 「き」 (ki).
- Neither candidate (4', 1) nor candidate (4', 2) equals D13. This series of processing therefore ends at the third character, and new processing starts from the fourth character.

Description

Detailed Description of the Invention

(A) Technical Field of the Invention

The present invention relates to a post-processing method for a character recognition device. In this method, a recognition means outputs a plurality of candidates, and post-processing then selects the true recognition result from among them. The invention concerns a method in which previously recognized results are used as a dictionary for that selection.

(B) Technical Background and Problems

Two methods have conventionally been known for this kind of post-processing:

(i) When the recognition target is restricted, for example to address names or product names, the address names or similar items are held as a dictionary and post-processing is performed word by word.

(ii) A general dictionary, for example a Japanese dictionary, is held and used for post-processing.

However, the former method can be used only within a predefined range of words. The latter requires a large memory to store the Japanese dictionary, and its post-processing time becomes extremely long.
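The first conventional approach, word-by-word post-processing against a restricted lexicon such as address names, can be pictured with the minimal Python sketch below.

The source does not say how that prior-art method scores words, so the rule used here is an assumption for illustration:

- A word must be consistent with some candidate at every character position.
- Ties are broken by the lowest total candidate rank.

`select_word` and the sample lexicon are hypothetical. The sketch also shows the limitation named above: a word outside the lexicon can never be selected.

```python
from typing import Optional, Sequence


def select_word(candidates: Sequence[Sequence[str]],
                lexicon: Sequence[str]) -> Optional[str]:
    """Pick the lexicon word that fits the candidate lattice (hypothetical rule).

    candidates[i] lists the ranked candidates for the i-th character.
    A word fits if it has the same length and each of its characters is
    among the candidates at that position; among fitting words, the one
    with the lowest total candidate rank wins. Returns None if none fits.
    """
    best_word: Optional[str] = None
    best_cost = 0
    for word in lexicon:
        if len(word) != len(candidates):
            continue
        cost = 0
        for ch, cands in zip(word, candidates):
            if ch not in cands:
                break                      # this word cannot be read here
            cost += cands.index(ch)        # rank 0 = first candidate
        else:
            if best_word is None or cost < best_cost:
                best_word, best_cost = word, cost
    return best_word


# Hypothetical address lexicon and candidates for a three-character field.
lexicon = ["東京都", "京都府"]
print(select_word([["東", "束"], ["京", "景"], ["都", "郁"]], lexicon))  # 東京都
```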

(C) Object and Constitution of the Invention

The object of the present invention is to perform the above post-processing efficiently.

The method applies to a character recognition device comprising:

- character input means for inputting the characters to be recognized;
- feature extraction means for extracting the features of each character; and
- recognition means for comparing the extracted features with the contents of a pre-registered dictionary and sequentially outputting recognition result candidates.

The device is provided with a post-processing mechanism that corrects the recognition results. This mechanism has two parts:

- a result memory, which holds a plurality of previously recognized character strings as a text dictionary; and
- a post-processing unit, which checks whether newly extracted recognition result candidates match any characters in the result memory under the longest-match principle, and derives the true result from the candidates on the basis of the matching characters.

In this way, previously recognized results are used to obtain the true result from newly extracted candidates. The invention is described below with reference to the drawings.
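The "longest match" check performed by the post-processing unit can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's own procedure. The assumptions are:

- `candidates[i]` holds the ranked candidates for the i-th newly recognized character.
- `memory` is the result memory (text dictionary), held as a string.
- Indices are 0-based.
- "Longest match" is read as the longest stretch of the result memory that is consistent with consecutive candidate sets.

The function names and the sample text are hypothetical.

```python
from typing import Sequence, Tuple


def match_length(candidates: Sequence[Sequence[str]], i: int,
                 memory: str, p: int) -> int:
    """Count consecutive positions k with memory[p + k] in candidates[i + k]."""
    k = 0
    while (i + k < len(candidates) and p + k < len(memory)
           and memory[p + k] in candidates[i + k]):
        k += 1
    return k


def longest_match(candidates: Sequence[Sequence[str]], i: int,
                  memory: str) -> Tuple[int, int]:
    """Return (position, length) of the longest stretch of memory that is
    consistent with candidates[i:], or (-1, 0) if no character matches."""
    best_pos, best_len = -1, 0
    for p in range(len(memory)):
        n = match_length(candidates, i, memory, p)
        if n > best_len:           # strict '>' keeps the earliest of equal runs
            best_pos, best_len = p, n
    return best_pos, best_len


memory = "漢字を書く。手書きの文字"   # hypothetical previously confirmed text
cands = [["手", "千"], ["言", "書"], ["き", "さ"], ["X", "Y"]]
print(longest_match(cands, 0, memory))  # (6, 3)
```

Here the first three candidate sets line up with 「手書き」 at position 6, and the fourth set breaks the run.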

(D) Embodiment of the Invention

The drawings are as follows:

- Figs. 1(A) and 1(B) show an example in which a continuous Japanese text is written across several forms.
- Fig. 2 shows an example of the result of recognizing and correcting the form of Fig. 1(A).
- Fig. 3 shows an example of the raw result produced when the recognition means recognizes the form of Fig. 1(B).
- Fig. 4 shows an example of the result after post-processing according to the invention.
- Fig. 5 shows the configuration of one embodiment of the invention.
- Fig. 6 shows an example of the processing executed in the post-processing unit of Fig. 5.

In general, the same words and expressions tend to recur within a Japanese text. The present invention takes advantage of this.

The following description assumes that forms such as those shown in Figs. 1(A) and 1(B) are to be recognized, and that the form of Fig. 1(A) has been recognized as shown in Fig. 2.

The result in Fig. 2 is assumed to be correct. It is the raw result produced by the recognition means after correction by the display/correction unit.

Under these conditions, consider the form of Fig. 1(B), and suppose the recognition means produces the raw result shown in Fig. 3. In the illustrated case, a first-ranked and a second-ranked candidate are obtained for each character.

In what follows, a candidate is written K(i, j), the j-th ranked candidate for the i-th character on the form of Fig. 1(B). An entry of the true recognition result for the form of Fig. 1(A), i.e., the result shown in Fig. 2, is written D_J, the J-th entry of that result.

First, the result of Fig. 2 is examined to see whether it contains the same character as K(1, 1).

In the illustrated case, K(1, 1) is the character 「手」 (te, "hand"), and it matches D_10.

Next, K(2, 1) is compared with D_11, and in this case they are not equal. K(2, 2) is therefore compared with D_11. Because these are equal, the second recognition result is taken to be 「書」 (sho, "write"). Similarly, the third character is taken to be 「き」 (ki).

Next, K(4, 1) ≠ D_13 and K(4, 2) ≠ D_13. This series of processing therefore ends at the third character, and processing starts anew from the fourth character.
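The extension step in this walkthrough can be traced with the sketch below. In the text, K(2, 1) fails against D_11, K(2, 2) succeeds, the third character is accepted, and both candidates for the fourth character fail against D_13.

The sketch keeps the text's 1-based indices, so `K[i-1][j-1]` stands for K(i, j) and `D[J-1]` for D_J. `extend_from` is a hypothetical name. The strings are hypothetical stand-ins chosen only so that D_10 to D_12 spell 「手書き」 as in the example; the actual contents of Figs. 2 and 3 are not reproduced in the source.

```python
from typing import List, Sequence, Tuple


def extend_from(K: Sequence[Sequence[str]], i: int,
                D: str, J: int) -> List[Tuple[int, int, str]]:
    """Continue a run after K(i, .) has matched D_J (1-based, as in the text).

    For each following character, K(i+1, 1) is compared with D_{J+1} first,
    then K(i+1, 2), and so on. The run stops at the first character for which
    no candidate equals the corresponding D entry, or at the end of K or D.
    Returns (i, j, char) for every character accepted after the anchor.
    """
    accepted: List[Tuple[int, int, str]] = []
    while i + 1 <= len(K) and J + 1 <= len(D):
        i, J = i + 1, J + 1
        target = D[J - 1]                              # D_J
        for j, cand in enumerate(K[i - 1], start=1):   # K(i, 1), K(i, 2), ...
            if cand == target:
                accepted.append((i, j, cand))
                break
        else:
            break                                      # no K(i, .) equals D_J: run ends
    return accepted


# Hypothetical data: D_10..D_13 spell 手書きで; neither K(4, 1) nor K(4, 2) is で.
D = "日本語の文章を読む手書きで"
K = [["手", "千"], ["言", "書"], ["き", "さ"], ["僕", "漢"]]
assert D[10 - 1] == K[0][0]                 # K(1, 1) = D_10
print(extend_from(K, 1, D, 10))             # [(2, 2, '書'), (3, 1, 'き')]
```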

In this new processing, the result of Fig. 2 is searched for a character equal to K(4, 1). Nothing matches K(4, 1), so a character equal to K(4, 2) is searched for next.

As a result, K(4, 2) = D_J is found at some position J, and K(5, 1) = D_{J+1} is obtained. The fourth character is therefore taken to be 「漢」 (kan).

Further, K(6, 1) = D_{J+2}. However, K(7, 1) ≠ D_{J+3} and K(7, 2) ≠ D_{J+3}, so the seventh character is not corrected.

In this way, the result shown in Fig. 4 is obtained.
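When a run ends, processing restarts from the unmatched character. The whole stored result is searched for K(i, 1) and, only if it occurs nowhere, for K(i, 2).

A minimal sketch of that restart step follows, again with 1-based indices and hypothetical data. `find_anchor` is an illustrative name. Returning every occurrence, rather than choosing one, is an assumption: the text does not say which occurrence is used when a character appears more than once.

```python
from typing import List, Optional, Sequence, Tuple


def find_anchor(K: Sequence[Sequence[str]], i: int,
                D: str) -> Optional[Tuple[int, List[int]]]:
    """Restart step (1-based indices, as in the text).

    Search all of D for K(i, 1); only if it never occurs, search for K(i, 2),
    and so on down the ranks. Returns (j, [J, ...]) for the first rank j that
    occurs, listing every position J with D_J == K(i, j), or None.
    """
    for j, cand in enumerate(K[i - 1], start=1):
        positions = [J for J, ch in enumerate(D, start=1) if ch == cand]
        if positions:
            return j, positions
    return None


# Hypothetical data: K(4, 1) does not occur in D, but K(4, 2) = 漢 = D_14.
D = "日本語の文章を読む手書きで漢字を書く"
K = [["手", "千"], ["言", "書"], ["き", "さ"], ["僕", "漢"]]
print(find_anchor(K, 4, D))   # (2, [14])
```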

Fig. 5 shows the configuration of one embodiment of the present invention.

In the figure, the units are as follows:

- 1 is an observation unit that reads forms such as those in Figs. 1(A) and 1(B).
- 2 is a feature extraction unit (means) that extracts the features of each character, as is conventionally known.
- 3 is a recognition dictionary that stores the features of standard characters together with their category names.

- 4 is a recognition unit (means). It compares the character features supplied by the feature extraction unit 2 with the standard-character features read from the recognition dictionary 3. For each character it outputs, in the example of Fig. 3, a first-ranked and a second-ranked candidate.
- 5 is a candidate memory that stores the candidates for each character position, as shown in Fig. 3.
- 6 is a display/correction control unit. For the characters on the first form, for example, it reads the candidates for each character from the candidate memory 5 and displays them on the display/correction unit 7. It also stores the true result obtained from these candidates (such as that shown in Fig. 2) in the result memory (text dictionary) 9.
- 7 is a display/correction unit. It displays the characters being recognized, their candidates, the correction results and so on, and lets the operator select the correct character when necessary.
- 8 is a post-processing unit. It post-processes the contents of the candidate memory 5 on the basis of the contents of the result memory 9, in the manner described with reference to Figs. 2 to 4.
- 9 is the result memory (text dictionary). It stores correctly recognized results, as shown in Fig. 2, for use in post-processing.
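The data flow among the units of Fig. 5 can be summarized in a small Python sketch. The class and field names are hypothetical. The post-processing rule is passed in as a function because the configuration itself does not fix it, and the operator's correction is modelled as a function that returns the corrected string.

The point the sketch illustrates is the key point of the configuration: each confirmed result is appended to the result memory 9, so it becomes dictionary material for the next form.

```python
from dataclasses import dataclass
from typing import Callable, List

Candidates = List[List[str]]   # candidate memory 5: ranked candidates per character


@dataclass
class RecognitionSession:
    """Hypothetical wiring of units 5 to 9 of Fig. 5 (names are illustrative)."""
    postprocess: Callable[[Candidates, str], str]   # post-processing unit 8
    result_memory: str = ""                         # result memory 9 (text dictionary)

    def process_form(self, candidates: Candidates,
                     operator_fix: Callable[[str], str] = lambda s: s) -> str:
        # Unit 8 proposes a reading using everything confirmed so far.
        draft = self.postprocess(candidates, self.result_memory)
        # Units 6 and 7 show the draft; the operator may correct it.
        final = operator_fix(draft)
        # The confirmed text becomes dictionary material for later forms.
        self.result_memory += final
        return final


def first_choice(cands: Candidates, memory: str) -> str:
    """Placeholder rule for this sketch: take every first-ranked candidate."""
    return "".join(c[0] for c in cands)


session = RecognitionSession(postprocess=first_choice)
session.process_form([["手", "千"], ["言", "書"]], operator_fix=lambda d: "手書")
session.process_form([["き", "さ"]])
print(session.result_memory)   # 手書き
```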

The processing in the post-processing unit 8 of Fig. 5 has been described with reference to Figs. 2 to 4. In more detail, it is performed as in the example processing of Fig. 6.

In Fig. 6, I corresponds to the character numbers ①, ②, ... of Fig. 3, and J corresponds to the numbers 1, 2, ..., 120 of Fig. 2. One of the illustrated routes corresponds to the case in which matching is performed on the basis of the second-ranked candidate.
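Putting the steps together, the loop of Fig. 6 can be sketched as below, with `I` stepping through the characters of the new form and `J` through the stored result. This is a hedged reconstruction, not a transcription of Fig. 6, whose flowchart is not reproduced in the source. It rests on these assumptions:

- Indices are 0-based.
- A character for which no run is found keeps its first-ranked candidate, i.e., it is left uncorrected.
- When the chosen candidate occurs several times in the stored text, the occurrence giving the longest run is used. This is one reading of the longest-match principle.

Names and sample strings are hypothetical.

```python
from typing import List, Sequence


def run_length(K: Sequence[Sequence[str]], D: str, I: int, J: int) -> int:
    """Number of consecutive n with D[J + n] among the candidates K[I + n]."""
    n = 0
    while I + n < len(K) and J + n < len(D) and D[J + n] in K[I + n]:
        n += 1
    return n


def postprocess(K: Sequence[Sequence[str]], D: str) -> List[str]:
    """Hypothetical reconstruction of the Fig. 6 loop (0-based indices)."""
    out = [cands[0] for cands in K]        # default: leave uncorrected
    I = 0
    while I < len(K):
        best_J = None
        for cand in K[I]:                  # rank 1 first; lower ranks only if absent
            hits = [J for J, ch in enumerate(D) if ch == cand]
            if hits:
                # Among occurrences, keep the one giving the longest run
                # (max() returns the earliest one on ties).
                best_J = max(hits, key=lambda J: run_length(K, D, I, J))
                break
        if best_J is None:
            I += 1                         # no anchor: move on to the next character
            continue
        n = run_length(K, D, I, best_J)    # n >= 1, since D[best_J] is in K[I]
        out[I:I + n] = D[best_J:best_J + n]
        I += n                             # restart at the first unmatched character
    return out


# Hypothetical data shaped like the worked example in the text.
D = "日本語の文章を読む手書きで漢字を書く"
K = [["手", "千"], ["言", "書"], ["き", "さ"], ["僕", "漢"],
     ["字", "宇"], ["を", "た"], ["X", "Y"]]
print("".join(postprocess(K, D)))          # 手書き漢字をX
```

The two runs reproduce the shape of the walkthrough. The first, 「手書き」, ends at the fourth character. The second, anchored on the second-ranked candidate 「漢」, ends at the seventh character, which is left as its first candidate.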

(E) Effects of the Invention

As described above, the present invention has the following effects:

- The dictionary needed for post-processing need only cover one continuous body of text. Its capacity is therefore extremely small compared with a general Japanese dictionary, and processing efficiency is also excellent.
- Corrections can be made in units of sentences rather than single words, which greatly improves post-processing accuracy.

[Brief explanation of drawings]

The drawings are as follows:

- Figs. 1(A) and 1(B) show an example in which a continuous Japanese text is written across several forms.
- Fig. 2 shows an example of the result of recognizing and correcting the form of Fig. 1(A).
- Fig. 3 shows an example of the raw result produced when the recognition means recognizes the form of Fig. 1(B).
- Fig. 4 shows an example of the result after post-processing according to the invention.
- Fig. 5 shows the configuration of one embodiment of the invention.
- Fig. 6 shows an example of the processing executed in the post-processing unit of Fig. 5.

In the figures, the reference numerals denote:

- 1: observation unit
- 2: feature extraction unit
- 3: recognition dictionary
- 4: recognition unit
- 5: candidate memory
- 6: display/correction control unit
- 7: display/correction unit
- 8: post-processing unit
- 9: result memory (text dictionary)

Claims

A post-processing method in a character recognition device, the device comprising character input means for inputting characters to be recognized, feature extraction means for extracting features of the characters, and recognition means for comparing the extracted features with the contents of a pre-registered dictionary and sequentially outputting recognition result candidates, the method being characterized in that:

- a post-processing mechanism for correcting the recognition results is provided;
- the post-processing mechanism has a result memory that holds a plurality of previously recognized character groups as a text dictionary;
- the post-processing mechanism has a post-processing unit that checks whether a newly extracted recognition result candidate matches any character in the contents of the result memory under the longest-match principle, and obtains the true result from the recognition result candidates on the basis of the matching characters; and
- previously recognized results are thereby used to obtain the true result from newly extracted recognition result candidates.