JPH0514953B2

Info

Publication number: JPH0514953B2
Application number: JP59156258A
Authority: JP
Inventors: Yasunao Isaki
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-07-26
Filing date: 1984-07-26
Publication date: 1993-02-26
Also published as: JPS6133584A

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a collation device that collates, as a word, a combination of a plurality of characters read from a medium, and in particular to a collation device capable of high-speed collation.

In recent years OCR has advanced remarkably. OCR systems that read printed type and handwritten characters, covering alphanumeric and kana characters, are in wide practical use in form processing and similar work. Recognition technology for Japanese characters including kanji is also being actively developed, and various methods are being tried.

In such OCR, when the kanji to be recognized form a word of several characters, recognition accuracy is improved by recognizing each kanji individually and then collating the combination of kanji against a word dictionary. A method that performs this collation at high speed is therefore desired.

[Conventional technology]

FIG. 3 is a block diagram of an OCR for Japanese characters that handles handwritten characters including kanji.

In the figure, form 1 is a slip on which a customer's address, name, product name, or the like is written in each field.

The reading unit 2 scans the light reflected from the illuminated form 1 with an image sensor 22 through a lens system 21, reads one frame of characters, and sends it as image data to the binarization circuit 3.

The main control unit 4 controls each unit and executes the character reading and recognition processing program.

The image memory 5 stores the binarized image data, that is, the image data of the characters that were read.

The single-character extraction circuit 6 cuts out one character from the one frame of characters stored in the image memory 5, based on the format information sent from the format information memory 9, and sends it to the recognition circuit 10.

The feature extraction circuit 7 extracts the features of a character sent from the recognition circuit 10, such as its stroke count and curve coefficients, and sends them back to the recognition circuit 10.

The dictionary memory 8 stores the features of the characters that serve as the recognition reference, that is, the features of kanji, hiragana, katakana, Latin letters, digits, symbols, and so on, and sends them sequentially to the recognition circuit 10 at the request of the recognition circuit 10.

The format information memory 9 stores information indicating the character entry positions and word lengths on form 1, and sends the entry position, word length, and so on of the read characters to the image memory 5, the single-character extraction circuit 6, and the recognition circuit 10.

The recognition circuit 10 receives from the feature extraction circuit 7 the features of the character sent from the single-character extraction circuit 6, collates them with the character features sent sequentially from the dictionary memory 8 to obtain a degree of match, and forms a candidate string of character codes in descending order of match. It recognizes the characters in the image memory 5 one after another in this way, combines the character codes of the candidate characters according to the word length from the format information memory 9 to form word codes, and sends them to the word candidate string memory 11.
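The combination step performed by the recognition circuit 10 can be sketched as follows. This is only an illustrative sketch: the source does not say how word candidates are ranked, so scoring a word by the product of its per-character match degrees is an assumption, as are the function name `build_word_candidates`, the `limit` parameter, and the alternative character codes used in the sample data.

```python
from itertools import product


def build_word_candidates(per_char_candidates, limit=5):
    """Combine ranked per-character candidates into ranked word candidates.

    per_char_candidates holds one list per character position; each entry is a
    (character_code, match_degree) pair. The word length is the number of lists.
    Ranking by the product of match degrees is an assumption of this sketch.
    """
    scored = []
    for combo in product(*per_char_candidates):
        codes = tuple(code for code, _ in combo)
        score = 1.0
        for _, degree in combo:
            score *= degree
        scored.append((score, codes))
    # Highest combined match degree first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [codes for _, codes in scored[:limit]]


# Hypothetical two-character word: 0xC9C9 and 0xBBCE are the codes quoted in
# the source; 0xC9CA and 0xBBCF are made-up alternatives.
first_char = [(0xC9C9, 0.9), (0xC9CA, 0.6)]
second_char = [(0xBBCE, 0.8), (0xBBCF, 0.7)]
word_candidates = build_word_candidates([first_char, second_char])
```

The resulting list plays the role of the word candidate string that is stored in the word candidate string memory 11.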

The word candidate string memory 11 is a storage means that stores the codes of the word candidate string sent from the recognition circuit 10.

The comparison circuit 12 compares the combinations of character codes in the word candidate string sent from the word candidate string memory 11 with the words sent from the word dictionary memory 13, and outputs words with a high degree of match as candidates.

The word dictionary memory 13 is a storage means that stores word codes as a dictionary.

With this configuration and these functions, character recognition proceeds as follows. First, the characters on form 1 are read, and the binarized image data is stored in the image memory 5.

Next, the image data is sent to the single-character extraction circuit 6, which cuts out one character based on the character position information sent from the format information memory 9 and sends it to the recognition circuit 10.

The recognition circuit 10 sends the input character data to the feature extraction circuit 7, has it extract the features of that character data, and receives them. It then reads the character features sequentially from the dictionary memory 8, collates them with the features of the character data, and takes the characters with a high degree of match as candidate characters, that is, as the recognition result. It assembles candidate word codes according to the word length given by the format information memory 9 and outputs them.

The output candidate string of candidate word codes is stored in the word candidate string memory 11 and then sent to the comparison circuit 12. Meanwhile, word codes are sent from the word dictionary memory 13 to the comparison circuit 12 and compared with the candidate word codes. If the collation finds a match, the word exists in the dictionary, and that candidate word code c is output.

In this way, the image data stored in the image memory 5 undergoes character recognition and then word collation, in sequence, and is output.

[Problem that the invention seeks to solve]

In the conventional method described above, words are collated by comparing the input word code sequentially with the word codes stored in the word dictionary memory 13. The amount of comparison processing is therefore large and collation is slow.
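A minimal sketch of the sequential comparison described above shows why the cost grows with dictionary size. The function name `sequential_collate` and the sample dictionary contents are hypothetical; only the code pair (C9C9, BBCE) comes from the source.

```python
def sequential_collate(candidate, dictionary_codes):
    """Compare the candidate with each stored word code in turn.

    Returns (found, comparisons) so the amount of work is visible.
    """
    comparisons = 0
    for stored in dictionary_codes:
        comparisons += 1
        if stored == candidate:
            return True, comparisons
    return False, comparisons


# Hypothetical 16-entry dictionary of two-character word codes.
dictionary_codes = [(0xC9C9, 0xBBC0 + i) for i in range(16)]

found, comparisons = sequential_collate((0xC9C9, 0xBBCE), dictionary_codes)
missing, missing_comparisons = sequential_collate((0xC9C9, 0xFFFF), dictionary_codes)
```

A word that is absent from the dictionary is the worst case: every stored code must be compared before the search can report failure.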

[Means for solving problems]

The present invention is a collation device in which the word dictionary memory stores, at the address given by a word's code, a flag whose value indicates whether the word exists, and in which, when an input word is collated, the code of the input word is used as an address to read the flag from the word dictionary memory and make the determination. The above problem is thereby solved.

[Effect]

According to the present invention, instead of the conventional method of sequentially collating the input word code with the word codes stored in the word dictionary memory, the presence or absence of each word is stored in the word dictionary memory at the address given by the word's code, using distinct symbols such as "1" and "0". At collation time, the input word code is used as an address and the "1" or "0" at that storage location is read to determine whether the word exists. Because collation requires only a lookup, high-speed processing becomes possible.
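The principle can be sketched with a directly addressed bit table. This is an illustration under stated assumptions, not the patented circuit: the class name `PresenceTable`, the 16-bit address width, and the packing of eight flags per byte are all choices made for this sketch.

```python
class PresenceTable:
    """One flag per possible word code; the word code itself is the address."""

    def __init__(self, address_bits):
        self.size = 1 << address_bits
        # Eight flags per byte; every flag starts as 0 (word absent).
        self.bits = bytearray((self.size + 7) // 8)

    def register(self, word_code):
        """Store "1" at the address given by word_code."""
        if not 0 <= word_code < self.size:
            raise ValueError("word code outside the address space")
        self.bits[word_code >> 3] |= 1 << (word_code & 7)

    def lookup(self, word_code):
        """Read the flag at the address given by word_code (1 or 0)."""
        if not 0 <= word_code < self.size:
            raise ValueError("word code outside the address space")
        return (self.bits[word_code >> 3] >> (word_code & 7)) & 1


# Assumed 16-bit word codes for the demonstration.
table = PresenceTable(address_bits=16)
table.register(0xBBCE)

present = table.lookup(0xBBCE)
absent = table.lookup(0xBBCF)
```

Each lookup is a single indexed read, so its cost does not depend on how many words are registered.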

[Example]

An embodiment of the present invention is described below with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the embodiment, and FIG. 2 is an explanatory diagram of FIG. 1. The same reference numerals denote the same items throughout the figures.

In FIG. 1, the main control unit 4a carries out recognition processing based on the features of the image data of the read characters, and also controls the encoded word dictionary memory, described below, to collate the words assembled from the recognition results.

The encoded word dictionary memory 14 is a storage means that stores, at the address given by a word's code, a flag whose value indicates whether the word exists. For example, with the word code as the address, "1" is stored at that storage location if the word exists and "0" if it does not.
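Because every possible word code needs its own storage location, the size of such a memory depends on the width of the address rather than on the number of registered words. The source does not state the address width, so the widths below are assumptions used only to illustrate the arithmetic; `table_size_bytes` is a hypothetical helper.

```python
def table_size_bytes(address_bits):
    """Bytes needed to hold one 1-bit flag for every possible address."""
    return ((1 << address_bits) + 7) // 8


# Assumed widths: one 16-bit code, or two 16-bit codes joined into 32 bits.
size_16 = table_size_bytes(16)
size_32 = table_size_bytes(32)
```

Under these assumptions a 16-bit address space needs 8,192 bytes and a 32-bit address space needs 536,870,912 bytes (512 MiB), which is the trade-off for replacing the sequential comparison with a single read.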

The comparison circuit 12 described in the conventional example is omitted.

With this configuration and these functions, word collation after recognition processing in the recognition circuit 10 proceeds as follows. First, the candidate string of candidate word codes output from the recognition circuit 10 is stored in the word candidate string memory 11.

The candidate word codes stored in the word candidate string memory 11 are sent to the encoded word dictionary memory 14, which is searched using the candidate word code as an address. The flag at that storage location is read, and "1" or "0" is output and sent to the word candidate string memory 11. For example, as shown in FIG. 2, when the input word is "富士" (Fuji), the encoded word dictionary memory 14 is searched with the addresses a and b corresponding to its code (C9C9, BBCE), and the flag is read.

If the flag is "1", the word is judged to exist; if it is "0", the word is judged not to exist. When the flag is "1", the candidate word code c is output.
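The lookup with addresses a and b and the output of candidate word code c might be sketched as below. Several points are assumptions of this sketch rather than statements from the source: that a and b are the two 16-bit character codes used as row and column addresses, that rows are allocated only when first written (a hardware memory would be fully populated), and that the first candidate whose flag is "1" is the one output. The names `EncodedWordDictionary` and `collate` are hypothetical.

```python
ROW_BYTES = 65536 // 8  # one flag per possible 16-bit value of address b


class EncodedWordDictionary:
    """Presence flags addressed by two 16-bit codes (a, b)."""

    def __init__(self):
        # Rows are indexed directly by address a; None means "all flags 0".
        self.rows = [None] * 65536

    def register(self, a, b):
        """Store "1" at the location addressed by (a, b)."""
        if self.rows[a] is None:
            self.rows[a] = bytearray(ROW_BYTES)
        self.rows[a][b >> 3] |= 1 << (b & 7)

    def flag(self, a, b):
        """Read the flag at the location addressed by (a, b)."""
        row = self.rows[a]
        if row is None:
            return 0
        return (row[b >> 3] >> (b & 7)) & 1


def collate(candidates, dictionary):
    """Return the first candidate word code whose flag is 1, else None."""
    for a, b in candidates:
        if dictionary.flag(a, b) == 1:
            return (a, b)
    return None


dictionary = EncodedWordDictionary()
dictionary.register(0xC9C9, 0xBBCE)  # code pair quoted in the source

# 0xBBCF is a made-up misrecognition of the second character.
candidates = [(0xC9C9, 0xBBCF), (0xC9C9, 0xBBCE)]
candidate_word_code_c = collate(candidates, dictionary)
```

Each candidate costs one indexed read, so the whole candidate string is checked in a number of reads equal to its length, whatever the size of the dictionary.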

In this way, high-speed collation is achieved by searching the encoded word dictionary memory 14 using, as an address, the word code of the word candidate string input to the word candidate string memory 11.

The above describes the case in which the word candidate string output from the recognition circuit 10 is immediately input to the word candidate string memory 11 and processed. The same effect is obtained if the output of the recognition circuit 10 is stored sequentially in a storage means such as a floppy disk and processed later in a batch.

[Effect of the invention]

As described above, the present invention has the effect that word candidates can be collated at high speed.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is an explanatory diagram of FIG. 1, and FIG. 3 is a block diagram showing the conventional method. In the figures, 4 and 4a are main control units, 5 is an image memory, 6 is a single-character extraction circuit, 7 is a feature extraction circuit, 8 is a dictionary memory, 9 is a format information memory, 10 is a recognition circuit, 11 is a word candidate string memory, 12 is a comparison circuit, 13 is a word dictionary memory, and 14 is an encoded word dictionary memory.

Claims

[Claims]

1. A collation device that determines whether an input word exists in a word dictionary memory, comprising a word dictionary memory that stores, at the address given by a word's code, a flag whose value indicates whether the word exists, characterized in that, when an input word is collated, the code of the input word is used as an address and the flag indicating the presence or absence of the word is read out from the word dictionary memory to make the determination.