JPH04340686A

JPH04340686A - Name dictionary for post-processing of character recognition

Info

Publication number: JPH04340686A
Application number: JP3113235A
Authority: JP
Inventors: Masaaki Nakanou; 中農　正明
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 1991-05-17
Filing date: 1991-05-17
Publication date: 1992-11-27

Abstract

PURPOSE: To make it easy to obtain the correct reading and kanji by providing a dictionary that has a surname reading section, a given-name reading section, a surname word section, and a given-name word section, each arranged in groups by number of characters. CONSTITUTION: The surname reading section 2 stores, for example, the kanji 中村 constituting a surname in association with the reading ナカムラ (Nakamura); the given-name reading section 3 stores, for example, the kanji 太郎 constituting a given name in association with the reading タロウ (Taro); the surname word section 4 stores the reading ナカムラ in association with the kanji 中村; and the given-name word section 5 stores the reading タロウ in association with the kanji 太郎. For example, when the reading ナカムフ (Nakamufu) is recognized as one candidate, the group of four-character readings in the surname reading section 2 is examined, and it is determined that no surname has the reading ナカムフ but that the reading ナカムラ exists. The kanji corresponding to ナカムラ can then be checked among its several candidates.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a name dictionary for character recognition post-processing, used in the post-processing stage of a character recognition device.

[0002]

[Prior Art] Conventionally, a character recognition device that reads and recognizes names on forms looks up a name dictionary on the basis of the plurality of recognized candidate names and outputs the name judged most plausible as the final result.

[0003] The name dictionary used for such post-processing has conventionally comprised a surname storage section, which stores information on surnames, and a given-name storage section, which stores information on given names.

[0004]

[Problems to be Solved by the Invention] In the conventional dictionary, entries were not arranged and stored in groups by number of characters, that is, surnames by the number of characters in the surname and given names by the number of characters in the given name. As a result, the final result could only be a name whose reading, for instance, matched completely. It was therefore difficult, for example, to extract several names whose readings agree only in their first character and character length. In other words, when only some of the characters of a reading had been misrecognized, it was not possible to look up the names that might have been intended.

[0005] An object of the present invention is to enable processing such as looking up, as candidates, several names whose first character and character length match.

[0006]

[Means for Solving the Problems] FIG. 1 shows the principle configuration of the present invention. In the figure, reference numeral 1 denotes the name dictionary, 2 the surname reading section, 3 the given-name reading section, 4 the surname word section, 5 the given-name word section, 6 the general reading section, and 7 the general word section.

[0007] In the surname reading section 2, the kanji constituting a surname are stored in association with its reading, as in "ナカムラ−中村", where the kanji 中村 are associated with the reading ナカムラ (Nakamura). The entries are divided into groups according to the number of kana characters in the reading: 2 characters, 3 characters, 4 characters, 5 characters, and any number other than 2 to 5. Within each group (for example, the 2-character group), the entries are arranged and stored in a-i-u-e-o (Japanese syllabary) order.

[0008] In the given-name reading section 3, the kanji constituting a given name are stored in association with its reading, as in "タロウ−太郎", where the kanji 太郎 are associated with the reading タロウ (Taro). The entries are divided into groups according to the number of kana characters in the reading: 2 characters, 3 characters, 4 characters, 5 characters, and any number other than 2 to 5. Within each group they are stored in a-i-u-e-o order.

[0009] In the surname word section 4, the reading is stored in association with the kanji, as in "中村−ナカムラ", where the reading ナカムラ is associated with the kanji 中村. The entries are divided into groups according to the number of kanji characters: 1 character, 2 characters, 3 characters, and any number other than 1 to 3. Within each group (for example, the 1-character group), the entries are arranged and stored in order of stroke count.

[0010] In the given-name word section 5, the reading is stored in association with the kanji, as in "太郎−タロウ", where the reading タロウ is associated with the kanji 太郎. These entries are stored in the same arrangement as in the surname word section 4. The general reading section 6 stores items such as company names in "reading−kanji" form, and the general word section 7 stores items such as company names in "kanji−reading" form. These sections are likewise partitioned by number of characters, in the same way as the surname reading section 2 and the surname word section 4.
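The layout described in paragraphs [0007] to [0010] can be pictured as a two-level index: first by character count, then in sorted order within each group. The following Python sketch is only an illustration under stated assumptions. The function name `build_section`, the bucket label `"other"`, the sample entries, and the use of plain code-point sorting in place of true a-i-u-e-o or stroke-count order are all hypothetical and do not come from the patent.

```python
from collections import defaultdict

def build_section(pairs, sized_buckets):
    """Group (key, value) pairs by key length, each group kept in sorted order.

    pairs: iterable of (key, value), e.g. (reading, kanji) or (kanji, reading).
    sized_buckets: key lengths that get their own group; others go to "other".
    """
    section = defaultdict(list)
    for key, value in pairs:
        bucket = len(key) if len(key) in sized_buckets else "other"
        section[bucket].append((key, value))
    for entries in section.values():
        # Stand-in ordering (assumption): code-point sort roughly follows
        # a-i-u-e-o order for katakana; a word section would instead need
        # a stroke-count table, which is not modeled here.
        entries.sort()
    return dict(section)

# Hypothetical sample data, not taken from an actual dictionary.
surnames = [("ナカムラ", "中村"), ("ナカムラ", "仲村"), ("ヤマモト", "山本"),
            ("ハヤシ", "林"), ("ナカノ", "中野")]

# Surname reading section 2: reading -> kanji, groups for 2 to 5 kana.
surname_reading = build_section(surnames, sized_buckets={2, 3, 4, 5})
# Surname word section 4: kanji -> reading, groups for 1 to 3 kanji.
surname_word = build_section([(k, r) for r, k in surnames], sized_buckets={1, 2, 3})

print(surname_reading[4])  # all 4-kana readings, homophones side by side
print(surname_word[1])     # all 1-kanji surnames
```

Keeping each length group sorted means that entries sharing a first character sit next to each other, which is what makes the "same first character, same length" lookup cheap.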

[0011]

[Operation] Suppose that the character recognition device has recognized the reading ナカムフ (Nakamufu) as one candidate for the reading of a surname. In this case, the group of four-character readings in the surname reading section 2 is examined, and it is found that no surname has the reading ナカムフ but that ナカムラ (Nakamura) exists. Once the reading ナカムラ has been found, it is also possible to check whether the corresponding kanji is 中村 or 仲村, and so on. Furthermore, it also becomes easy to look up the reading from the kanji characters of a surname.
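The lookup in this example can be sketched as follows. This is a minimal illustration, not the patent's implementation: the list `FOUR_KANA_SURNAMES`, its contents, and the function `lookup` are hypothetical, and the group is assumed to already hold only four-kana readings.

```python
# Hypothetical 4-kana group of the surname reading section, stored as
# (reading, kanji) pairs in sorted order.
FOUR_KANA_SURNAMES = [
    ("ナカガワ", "中川"),
    ("ナカムラ", "中村"),
    ("ナカムラ", "仲村"),
    ("ヤマモト", "山本"),
]

def lookup(group, recognized):
    """Return (exact, near) for a recognized reading.

    exact: entries whose reading equals the recognized reading.
    near:  other entries sharing its first character (equal length is
           implied because the group holds readings of one length).
    """
    exact = [e for e in group if e[0] == recognized]
    near = [e for e in group if e[0][0] == recognized[0] and e[0] != recognized]
    return exact, near

exact, near = lookup(FOUR_KANA_SURNAMES, "ナカムフ")
print(exact)  # empty: no surname is read ナカムフ
print(near)   # includes ナカムラ with both 中村 and 仲村
```

The `near` list is where the homophones 中村 and 仲村 surface together, so the kanji candidates can then be used to choose between them.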

[0012] Furthermore, because a general reading section and a general word section are provided, the dictionary can also be used for reading company names and the like.

[0013]

[Embodiments] FIGS. 2 and 3 are flowcharts showing the post-processing that uses the dictionary according to the present invention. FIG. 2 corresponds mainly to processing that starts from the reading, and FIG. 3 corresponds mainly to processing that starts from the kanji.

[0014] (S1): In step S1, the data (kana + kanji) recognized by the character recognition device is received.
(S2): Dictionary 1 is accessed using the combination of the first character of the reading and the number of characters in the reading as the condition, and data matching that condition is extracted.

[0015] (S3): The extracted data is compared with the input data, and a score such as a degree of similarity is calculated.
(S4): It is checked whether dictionary data satisfying the same condition as in step S2 still remains; if so, the flow returns to step S2.

[0016] (S5): It is checked whether the score of the extracted data is greater than a threshold.
In this way, several data items that can serve as candidates from the standpoint of the reading are determined.

[0017] (S6): Dictionary 1 is accessed and the kanji corresponding to the candidate data are extracted.
(S7): The kanji of the input data are compared with the kanji from the dictionary, and a score is calculated.

[0018] (S8): It is checked whether further homophones remain.
(S9): It is checked whether the score obtained is equal to or greater than the threshold.
(S10): If it is equal to or greater than the threshold, the result is output as output data.
(S11): If the result at step S5 or step S9 is NO, processing moves on to the flow that starts from the kanji.
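Steps S1 to S11 can be sketched roughly as below. The patent does not specify how the score is computed or what the threshold is, so the position-wise `similarity` function, the threshold value 0.7, and the sample entries are assumptions made only for illustration.

```python
def similarity(a, b):
    """Fraction of positions at which two equal-length strings agree
    (an assumed scoring rule; the patent only says 'a score')."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def reading_side(reading, kanji, entries, threshold=0.7):
    """Sketch of steps S1 to S11; returns (reading, kanji) or None.

    entries: list of (reading, kanji) dictionary pairs.
    None corresponds to S11, i.e. falling through to the kanji-side flow.
    """
    # S2 to S4: take entries sharing first character and length, score each.
    scored = [(similarity(reading, r), r, k) for r, k in entries
              if r[0] == reading[0] and len(r) == len(reading)]
    # S5: keep readings whose score is greater than the threshold.
    kept = [(r, k) for s, r, k in scored if s > threshold]
    if not kept:
        return None
    # S6 to S8: score the kanji of every remaining entry, homophones included.
    best = max(kept, key=lambda rk: similarity(kanji, rk[1]))
    # S9 to S10: output only if the kanji score reaches the threshold.
    return best if similarity(kanji, best[1]) >= threshold else None

# Hypothetical dictionary entries.
entries = [("ナカムラ", "中村"), ("ナカムラ", "仲村"), ("ナカガワ", "中川")]
print(reading_side("ナカムフ", "中村", entries))
```

With these sample entries, the misread ナカムフ still agrees with ナカムラ in three of four positions, so both homophones survive S5 and the recognized kanji decide between them.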

[0019] Processing that starts from the kanji enters step S12 shown in FIG. 3.
(S12): Kanji that match the combinations formed from the kanji of the input data are extracted from dictionary 1 together with their readings. For example, when 小本, 山木, and 川来 have each been put forward as kanji candidates in the input data, the kanji matching each of 小本, 小木, 小来, 山本, 山木, 山来, 川本, 川木, and 川来 are extracted from dictionary 1 together with their readings.
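The nine combinations in step S12 arise from taking one candidate character per position. A short Python sketch of that enumeration follows; the dictionary `surname_word` and its two entries are hypothetical stand-ins for the surname word section.

```python
from itertools import product

# Kanji candidates from the example in step S12.
candidates = ["小本", "山木", "川来"]

# Regroup by character position: [("小", "山", "川"), ("本", "木", "来")].
positions = list(zip(*candidates))

# One character per position gives 3 x 3 = 9 combinations.
combos = ["".join(p) for p in product(*positions)]
print(combos)

# Hypothetical surname word section entries (kanji -> reading).
surname_word = {"山本": "ヤマモト", "川本": "カワモト"}
hits = {k: surname_word[k] for k in combos if k in surname_word}
print(hits)
```

Only the combinations that actually exist in the dictionary go on to be scored in the following step.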

[0020] (S13): The (kana + kanji) of the input data is compared with the (kana + kanji) of the dictionary data, and a score is calculated. For example, when the reading candidates in the input data that correspond to the kanji candidates 小本, 山木, and 川来 shown in step S12 are カマモト, ヤヌホノ, and アメタイ, the combinations カマモト, カヌモト, カメモト, ヤマモト, ヤヌモト, ヤメモト, アマモト, アヌモト, アメモト, カマホト, カマタト, and so on are obtained, and a score is calculated for each of them in combination with the kanji candidates above.

[0021] (S14): It is checked whether dictionary 1 still contains data to be examined.
(S15): It is checked whether other combinations remain to be examined.
(S16): It is checked whether the score is equal to or greater than the threshold.

[0022] (S17): If YES, the result is output as output data.
(S18): If NO, output fails.
Processing is carried out as described above. When, for example, カマモト, ヤヌホノ, and アメタイ were obtained as candidates for the reading and 小本, 山木, and 川来 as candidates for the kanji, post-processing using the dictionary of the present invention was able to yield ヤマモト−山本 (Yamamoto).
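The outcome ヤマモト−山本 can be reproduced with a small sketch. Instead of enumerating every reading combination, it scores each dictionary entry position by position against the recognizer's candidates, which gives the same best match for this example. The scoring rule, the threshold of 0.9, the function names, and the dictionary entries are all assumptions; the patent does not define them.

```python
def position_score(entry, candidates):
    """Fraction of positions where the entry's character appears among the
    recognizer's candidates for that position (assumed scoring rule)."""
    if any(len(c) != len(entry) for c in candidates):
        return 0.0
    return sum(any(c[i] == ch for c in candidates)
               for i, ch in enumerate(entry)) / len(entry)

def best_entry(entries, readings, kanjis, threshold=0.9):
    """Score every (reading, kanji) entry on both parts; return the best
    pair if it reaches the threshold (S16, S17), otherwise None (S18)."""
    scored = [((position_score(r, readings) + position_score(k, kanjis)) / 2, r, k)
              for r, k in entries]
    if not scored:
        return None
    score, r, k = max(scored)
    return (r, k) if score >= threshold else None

reading_candidates = ["カマモト", "ヤヌホノ", "アメタイ"]
kanji_candidates = ["小本", "山木", "川来"]

# Hypothetical dictionary entries (reading, kanji).
entries = [("ヤマモト", "山本"), ("カワモト", "川本"), ("ヤマキ", "山木")]

print(best_entry(entries, reading_candidates, kanji_candidates))
```

Here ヤマモト is covered at every position by some reading candidate and 山本 by some kanji candidate, while カワモト fails at its second kana, so only ヤマモト−山本 clears the threshold.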

[0023]

[Effects of the Invention] As explained above, the present invention makes it easy to obtain, through post-processing, the correct reading and kanji from the candidate characters (readings and kanji) obtained during the recognition process in a character recognition device.

[Brief explanation of drawings]

[FIG. 1] A diagram showing the principle configuration of the present invention.

[FIG. 2] A flowchart showing a post-processing mode that uses the dictionary.

[FIG. 3] A flowchart showing a post-processing mode that uses the dictionary.

[Explanation of symbols]

1 Name dictionary
2 Surname reading section
3 Given-name reading section
4 Surname word section
5 Given-name word section
6 General reading section
7 General word section

Claims

[Claims]

[Claim 1] A name dictionary for character recognition post-processing, used in a character recognition device that reads and recognizes at least characters corresponding to a name written on a form, the name dictionary comprising at least: a surname reading section (2) in which, for surnames, kanji are stored in association with readings, readings having the same number of characters being arranged in a predetermined order for each number of characters constituting the reading; a given-name reading section (3) in which, for given names, kanji are stored in association with readings, readings having the same number of characters being arranged in a predetermined order for each number of characters constituting the reading; a surname word section (4) in which, for surnames, readings are stored in association with kanji, kanji having the same number of characters being arranged in a predetermined order for each number of characters constituting the kanji; and a given-name word section (5) in which, for given names, readings are stored in association with kanji, kanji having the same number of characters being arranged in a predetermined order for each number of characters constituting the kanji; wherein, for a group of candidate names corresponding to the name extracted by the character recognition device, the corresponding kanji and/or readings can be extracted on the basis of the reading and/or kanji of each candidate name in the group.

[Claim 2] The name dictionary for character recognition post-processing according to claim 1, further comprising, in addition to the surname reading section (2), the given-name reading section (3), the surname word section (4), and the given-name word section (5), a general reading section (6) and a general word section (7) for general words other than surnames and/or given names.