JPH03150692A

JPH03150692A - Word collation system

Info

Publication number: JPH03150692A
Application number: JP1289716A
Authority: JP
Inventors: Tadashi Kitamura; 正北村; Akemichi Tanaka; 田中　明道; Masami Oguro; 雅己小黒
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-11-07
Filing date: 1989-11-07
Publication date: 1991-06-27

Abstract

PURPOSE: To prevent false retrieval caused by differences in character type by calculating, for each candidate character, the product of a preset reference value and the ratio between the distance value of the first-ranked candidate character and the distance value of that candidate character, treating the result as the normalized score of the character, and sorting the word scores obtained by summing the character scores. CONSTITUTION: Inter-candidate distance ratio calculation 102a successively calculates the distance ratios relative to the top candidate, such as the ratio between the first and second candidate characters and the ratio between the first and third. Normalized score calculation 102b then calculates the product of an empirically determined constant and the distance ratio obtained in the preceding stage. Next, to calculate word scores from the character candidate scores thus obtained, cumulative addition of character scores and sorting of word scores are executed, and the word candidates are determined. Thus, when retrieving a word in which plural character types are mixed, false retrieval from the word dictionary due to differences in character type can be prevented.

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a word matching method that searches a word dictionary on the basis of character recognition results to find the word with the highest likelihood, and in particular to a word matching method capable of improving the recognition accuracy of handwritten character strings.

[Conventional technology]

Handwritten character recognition has conventionally been considered for building a database by scanning into a computer documents in which kanji and kana are handwritten on paper, or for entering slips, manuscripts, and the like. However, because the structure of kanji is complex, it is difficult to obtain sufficient accuracy by character recognition alone, and it is common practice to correct the character recognition results using a term dictionary.

For example, in the method described in Matsuo et al., "Fuzzy word retrieval method using associative integration type matching," Information Society of Japan, 34th National Conference, 4E-7 (1986), the distance value of each character obtained by character recognition is subtracted from a fixed constant, as shown in FIG. 2 (201, 202), and the difference is taken as the character score indicating the confidence of that character. For each word that contains a character candidate obtained by character recognition, the score of that character is added (203). The words are scored in this way, the candidate words are sorted (204), and the words are output in descending order of score as the words of highest likelihood.
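The conventional scoring of FIG. 2 can be sketched in Python as follows. This is a minimal illustration, not the cited authors' implementation: the function names, the position-by-position matching of dictionary words against candidates, the constant 1000, the sample words, and all distance values are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One list per character position: (candidate character, distance value),
# ordered from smallest to largest distance. All values are hypothetical.
CandidateString = List[List[Tuple[str, float]]]


def conventional_char_scores(candidates: CandidateString,
                             constant: float) -> List[Dict[str, float]]:
    """Steps 201-202: character score = constant - distance value."""
    return [{ch: constant - dist for ch, dist in position}
            for position in candidates]


def rank_words(char_scores: List[Dict[str, float]],
               dictionary: List[str]) -> List[Tuple[str, float]]:
    """Steps 203-204: add character scores per word, then sort by score."""
    totals = []
    for word in dictionary:
        if len(word) != len(char_scores):
            continue  # assumption: only words of matching length compete
        # A character that is not among the candidates contributes nothing.
        total = sum(scores.get(ch, 0.0)
                    for ch, scores in zip(word, char_scores))
        totals.append((word, total))
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        [("森", 950.0), ("林", 980.0)],  # kanji position: large distances
        [("の", 300.0), ("め", 400.0)],  # kana position: small distances
    ]
    scores = conventional_char_scores(candidates, constant=1000.0)
    for word, total in rank_words(scores, ["森め", "村の"]):
        print(word, total)
```

With these made-up numbers the best kana candidate contributes 700 points while the best kanji candidate contributes only 50, so a word that matches only the kana (700) outranks a word that matches the top kanji and the second kana (650).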

Note that the distance value is the difference between the standard pattern vector in the character dictionary and the vector of the candidate character.

The constant is a value indicating the range within which that difference is tolerated; the larger the difference between the constant and the distance value, the higher the likelihood is judged to be. The category number is a serial number assigned to each kanji, separately from codes such as the JIS code, to facilitate management.

A conventional character recognition method for handwritten kanji is, for example, the outer-contour direction contribution method described in Hagita et al., "Identification of handwritten kanji using outer-contour direction contribution features," Transactions of the Institute of Electronics and Communication Engineers, Vol. J66-D, No. 10 (1983).

In this method, the distribution of the black pixels that make up a character is examined from multiple directions (vertical, horizontal, and diagonal), and the amount of consecutive black pixels in each direction is used as a feature (hereinafter called the feature vector). For character recognition, distances are calculated between a character dictionary, in which this feature vector has been obtained as the average over many character samples, and the feature vector obtained from the image data of the character to be recognized. Categories are judged to be character candidates in order of increasing distance.
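The candidate ranking just described can be sketched as a nearest-mean lookup. The sketch below assumes Euclidean distance (the text only says "distance calculation") and takes the feature vectors as given; the extraction of black-pixel features is not shown, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_vector(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average the feature vectors of many samples of one category."""
    count = len(samples)
    return [sum(dim) / count for dim in zip(*samples)]


def rank_candidates(feature: Sequence[float],
                    char_dictionary: Dict[str, Sequence[float]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return (category, distance) pairs in order of increasing distance."""
    distances = [(category, math.dist(feature, mean))
                 for category, mean in char_dictionary.items()]
    return sorted(distances, key=lambda item: item[1])[:top_k]
```

Each dictionary entry would be built once with `mean_vector` from labelled samples, and `rank_candidates` would then be called for every character image to be recognized.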

With a character recognition method based on the distribution of black pixels in this way, the distance values obtained from the distance calculation tend to be smaller for kana and alphanumeric characters, whose shapes are simple, than for kanji, whose shapes are complex. Here, a complex shape means that, seen at the image level, the character has many black pixels or its shape varies greatly from sample to sample.

[Problem to be solved by the invention]

The above prior art does not take into account that the distance value obtained from character recognition is affected by the character type. For a word in which kanji and other character types (non-kanji) are mixed, the non-kanji characters therefore carry greater weight, and the correct word cannot be retrieved.

For example, as shown in FIG. 3, when the handwritten character string "増鳥きね子" is recognized, the distance values of the candidate characters obtained by character recognition are larger for the kanji candidates.

As a result, the difference calculation gives high scores to the kana characters "ね" and "ぬ" and to the kanji "子", and the cumulative addition of character scores causes "増畠きぬ子" to be output at a higher rank than "増鳥きめ子".

An object of the present invention is to remedy this problem and to provide a word matching method that can prevent erroneous retrieval due to differences in character type when searching for a word in which multiple character types are mixed.

[Means to solve the problem]

To achieve the above object, the word matching method of the present invention applies to a word retrieval method that uses a character dictionary storing the average feature values of the characters of categories 1 to N, calculates the distance between the feature values of a handwritten character and the average feature values stored in the character dictionary, determines character candidates for the input character in order from the category with the smallest distance value, and searches a word dictionary on the basis of the resulting character candidate string to find the word with the highest likelihood. The method is characterized in that an inter-candidate distance ratio calculation obtains the ratio between the distance value of the first-ranked candidate character and the distance value of each candidate character; a normalized score calculation computes the product of that ratio and a predetermined reference value (an empirically determined constant) and takes the result as the normalized character score; and a cumulative addition of character scores sums the character scores into word scores, which are then sorted.
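One way to write this scoring down, using symbols introduced here for illustration only (none of them appear in the original): let d_{i,k} be the distance value of the k-th candidate at character position i, c_{i,k} that candidate character, w_i the i-th character of a dictionary word w, and C the reference value. The ratio is assumed to be taken as d_{i,1}/d_{i,k}, which is consistent with the first-ranked candidate receiving the highest score.

```latex
s_{i,k} = C \cdot \frac{d_{i,1}}{d_{i,k}}, \qquad
S(w) = \sum_{i} \sum_{k} s_{i,k}\,[\,w_i = c_{i,k}\,]
```

Here [·] equals 1 when the condition holds and 0 otherwise. Because d_{i,1} ≤ d_{i,k}, the first-ranked candidate always scores exactly C and every other candidate scores at most C, whatever the absolute size of the distances.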

The method is further characterized in that a plurality of such reference values are provided, the complexity of the character is judged from the feature values of the candidate character obtained by character recognition, and the reference value is selected according to that complexity.

[Operation]

In the present invention, when the scores of the candidate characters obtained by character recognition are calculated, the ratio between the distance value of each candidate character in the character candidate string and the distance value of the first-ranked candidate character is computed, and the product of this ratio and a reference value (an empirically determined constant) is taken as the normalized character score.

As a result, when searching for a word in which multiple character types are mixed, erroneous retrieval from the word dictionary due to differences in character type can be avoided.

Furthermore, multiple empirically determined constants are prepared by category, and one is selected as appropriate according to the feature vector of the candidate character when the normalized score is calculated. This works because the feature vector is computed from the amount of consecutive black pixels and therefore approximately represents the complexity of the character.

This makes it possible to improve retrieval accuracy further by taking into account not only the distinction between kanji and non-kanji but also the complexity of each character.

[Example]

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a flowchart showing the word matching method in the first embodiment of the present invention, and FIG. 4 is an explanatory diagram showing the word matching process in the first embodiment.

In this embodiment, as shown in FIG. 1, character recognition is performed on the handwritten character string to be recognized using the character dictionary (101).

Next, normalization processing is performed on the basis of the category numbers and distance values obtained at this time (102).

That is, the inter-candidate distance ratio calculation (102a) successively computes the distance ratios relative to the top candidate, such as the ratio between the first and second candidate characters and the ratio between the first and third. The normalized score calculation (102b) then computes the product of the empirically determined constant and the distance ratio obtained in the preceding step.

Next, to calculate word scores from the character candidate scores obtained in this way, cumulative addition of character scores and sorting of word scores are performed (103, 104), and the word candidates are determined.
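Steps 102a to 104 can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the patented implementation: the ratio is taken as (first candidate's distance) / (candidate's distance), dictionary words are matched position by position, characters absent from the candidate list score zero, and the function names and default reference value are chosen for the example.

```python
from typing import Dict, List, Tuple

Position = List[Tuple[str, float]]  # (candidate, distance), best first


def distance_ratios(position: Position) -> List[Tuple[str, float]]:
    """Step 102a: ratio of the first candidate's distance to each distance."""
    first = position[0][1]
    # Assumption: a zero distance is treated as a perfect match (ratio 1).
    return [(ch, first / dist if dist > 0 else 1.0) for ch, dist in position]


def normalized_scores(position: Position,
                      reference_value: float) -> Dict[str, float]:
    """Step 102b: score = reference value x distance ratio."""
    return {ch: reference_value * ratio
            for ch, ratio in distance_ratios(position)}


def match_words(candidates: List[Position], dictionary: List[str],
                reference_value: float = 2000.0) -> List[Tuple[str, float]]:
    """Steps 103-104: accumulate character scores per word and sort."""
    per_position = [normalized_scores(p, reference_value) for p in candidates]
    totals = []
    for word in dictionary:
        if len(word) != len(per_position):
            continue  # assumption: only words of matching length compete
        total = sum(scores.get(ch, 0.0)
                    for ch, scores in zip(word, per_position))
        totals.append((word, total))
    return sorted(totals, key=lambda item: item[1], reverse=True)
```

For hypothetical candidates `[[("森", 950.0), ("林", 980.0)], [("の", 300.0), ("め", 400.0)]]` and the dictionary `["森め", "村の"]`, the top kanji and top kana candidates both score 2000, so the word matching the kanji is no longer outweighed by the kana position.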

The normalized score for the first-ranked candidate is determined by calculating distances to the character dictionary for samples whose correct category is known in advance and obtaining the distribution of the resulting distance values separately for kanji and non-kanji.

A concrete example of this word matching process is shown in FIG. 4.

In this embodiment, as in FIG. 3, the handwritten character string "増鳥きね子" is recognized and the distance value of each candidate character is obtained.

When normalization is then performed, the distance ratio of the first-ranked candidate character is 1, so the constant (2000) becomes the character score as it is. That is, the score of the first-ranked candidate character equals the constant. The score of the first-ranked candidate character is therefore 2000 regardless of whether the character is kanji or non-kanji.
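A small numeric check makes this concrete. The constant 2000 comes from the embodiment; the distance values below are hypothetical, and the ratio direction (first distance divided by candidate distance) is an assumption consistent with the first candidate scoring exactly 2000.

```python
REFERENCE_VALUE = 2000.0  # the constant used in the embodiment


def normalized_scores(distances):
    """Scores for one position; distances are sorted, best candidate first."""
    first = distances[0]
    return [REFERENCE_VALUE * first / d for d in distances]


kanji_distances = [950.0, 1000.0, 1900.0]  # hypothetical, large values
kana_distances = [300.0, 400.0, 600.0]     # hypothetical, small values

print(normalized_scores(kanji_distances))  # [2000.0, 1900.0, 1000.0]
print(normalized_scores(kana_distances))   # [2000.0, 1500.0, 1000.0]
```

Both lists start at 2000 even though the raw kanji distances are roughly three times the kana distances. Under the conventional "constant minus distance" scoring with a constant of 1000, the same top candidates would have scored 50 and 700 respectively.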

By cumulatively adding the character scores obtained in this way for the corresponding characters of each word in the word dictionary, "増鳥きめ子", which is registered in the word dictionary, is output at a higher rank than "増畠きぬ子".

FIG. 5 is a flowchart showing the word matching method in the second embodiment of the present invention.

In this embodiment, multiple empirically determined constants are set up according to the feature vector, which approximately represents the complexity of a character, and normalization is performed in accordance with the complexity of the character shape.

As shown in FIG. 5, character recognition is performed in the same way as in the first embodiment (501).

Next, normalization processing is performed using, in addition to the category numbers and distance values obtained at this time, the feature vector, which is readily available as intermediate data (502).

That is, the inter-candidate distance ratio calculation (502a) obtains, from the distance values produced by character recognition, the ratio of each candidate to the first-ranked candidate character. The constant selection (502b) then computes the sum over all dimensions of the feature vector obtained in character recognition and, according to this sum, selects the most suitable constant from among the multiple constants prepared as the score of the first-ranked candidate character.

Using the constant selected in this way, the product with the distance ratio obtained in the preceding step is calculated (502c).
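Steps 502a to 502c might look like the following sketch. The source gives neither the number of constants nor the ranges of the feature-vector sum that select them, nor whether a more complex character should receive a larger or smaller constant, so the thresholds, the constants, and the function names below are all hypothetical.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

# Hypothetical thresholds on the feature-vector sum and the constants
# selected for each range; the source gives no concrete values.
COMPLEXITY_THRESHOLDS = [100.0, 300.0]
REFERENCE_VALUES = [1500.0, 2000.0, 2500.0]


def select_reference_value(feature_vector: Sequence[float]) -> float:
    """Step 502b: sum all dimensions and pick the constant for that range."""
    total = sum(feature_vector)
    return REFERENCE_VALUES[bisect.bisect_right(COMPLEXITY_THRESHOLDS, total)]


def normalized_scores(position: List[Tuple[str, float]],
                      feature_vector: Sequence[float]) -> Dict[str, float]:
    """Steps 502a and 502c: distance ratio times the selected constant."""
    reference_value = select_reference_value(feature_vector)
    first = position[0][1]  # distance of the first-ranked candidate
    return {ch: reference_value * first / dist for ch, dist in position}
```

Using a sorted threshold list with `bisect` keeps the selection table easy to retune when the constants are re-estimated from sample data.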

Thereafter, the same processing as in the first embodiment is performed (503, 504), and the word candidates are output.

[Effect of the invention]

According to the present invention, the distance values obtained as a result of character recognition can be normalized by a simple method. When searching for a word in which multiple character types are mixed, erroneous retrieval from the word dictionary caused by the differing characteristics of the distance values for different character types can therefore be avoided.

[Brief explanation of the drawing]

FIG. 1 is a flowchart showing the word matching method in the first embodiment of the present invention, FIG. 2 is a flowchart showing the conventional word matching method, FIG. 3 is an explanatory diagram showing the conventional word matching process, FIG. 4 is an explanatory diagram showing the word matching process in the first embodiment of the present invention, and FIG. 5 is a flowchart showing the word matching method in the second embodiment of the present invention.

Claims

[Claims]

(1) A word matching method for a word retrieval method that uses a character dictionary storing the average feature values of the characters of categories 1 to N, calculates the distance between the feature values of a handwritten character and the average feature values stored in the character dictionary, determines character candidates for the input character in order from the category with the smallest distance value, and searches a word dictionary on the basis of the resulting character candidate string to find the word with the highest likelihood, the word matching method being characterized in that, for the character candidates, the product of a preset reference value and the ratio between the distance value of the first-ranked candidate character and the distance value of each candidate character is calculated, the calculation result is taken as the normalized character score, and the word scores obtained by adding the character scores are sorted.

(2) A word matching method characterized in that a plurality of the above reference values are provided, the complexity of the character is judged from the feature values of the candidate character obtained by character recognition, and the reference value is selected according to that complexity.