JPS61107486A

JPS61107486A - Character recognition post-processing system

Info

Publication number: JPS61107486A
Application number: JP59229113A
Authority: JP
Inventors: Yasusuke Isaki; 伊崎　保直; Michiaki Nakanishi; 道明中西
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-10-31
Filing date: 1984-10-31
Publication date: 1986-05-26
Also published as: JPH0340434B2

Abstract

PURPOSE: To improve the recognition rate for character strings written without delimiters (solid writing) by sequentially searching the candidate characters for pre-registered specific characters and segmenting the input character string at the positions of those characters. CONSTITUTION: A character input division 2 scans an input sheet 1 by OCR and sends light-intensity information to an identification division 3. The identification division 3 selects candidate characters for each input character, ranks them in ascending order of dissimilarity, and stores them in a candidate memory 4. If the input character string is an address, for example, specific characters such as those for prefecture, city, town, and village are registered in advance in a specific character registration division 5, according to levels that follow the order in which they appear in the input character string. When the candidate character string has been stored in the candidate memory 4, a specific character retrieval division 7 refers to the specific character registration division 5 and checks, starting from the beginning, whether the specific characters appear among the candidate characters. When a specific character is detected, a candidate character string extraction division 8 cuts out the character string for that part.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a character recognition post-processing method, and in particular to one that finds the boundaries within a character string by checking whether specific characters occur among the pattern-recognized candidate characters, and then performs post-processing by matching each extracted character string against a dictionary.

[Prior Art and Problems]

FIG. 3 is a diagram for explaining the problems of the conventional method.

Conventionally, in character recognition devices that recognize kanji by optical means, post-processing that checks the results against a specific knowledge dictionary, such as an address dictionary, has been used to raise the recognition rate. With the conventional method, however, when an address is written on the input sheet 1 as shown in FIG. 3, the character string has to be entered in separate sections for the prefecture, the city/county/ward, and the ward/town/village; otherwise post-processing cannot be performed. In other words, post-processing required a formatted sheet whose input area was divided in advance into a prefecture frame, a city/county/ward frame, a ward/town/village frame, and so on.

As a result, the number of input frames grows, which makes the sheet hard to lay out and hard to fill in, and makes the written characters hard to read.

[Means for solving problems]

The present invention solves the above problems by providing a character recognition post-processing method that performs post-recognition processing on a character string written in solid form (for example, an address written without separating the prefecture, city/county/ward, and so on), thereby improving the recognition rate. To this end, in a character recognition device that outputs candidate characters for each character as the recognition result of an input character string and performs post-processing to select a final candidate from those candidate characters, the character recognition post-processing method of the present invention is characterized by comprising: a specific character registration unit that registers and stores in advance specific characters that delimit the input character string; a dictionary that stores, for the character strings delimited by the specific characters, meaningful terms corresponding to the level determined by each specific character; a specific character search unit that searches the candidate characters and finds the positions of the specific characters among them; a candidate character string extraction unit that cuts out character strings from the candidate characters based on the position information of the specific characters obtained by the specific character search unit; and a dictionary matching unit that matches the character strings cut out by the candidate character string extraction unit against the dictionary.

[Operation]

The present invention exploits the fact that, after the input characters are recognized, the specific input characters are present among the candidate characters. For an address, for example, the candidate characters are searched sequentially for pre-registered specific characters such as 県 (prefecture), 市 (city), 郡 (county), 区 (ward), and 町 (town), and the input character string is divided at the positions of those characters, so that post-recognition processing can be performed on each delimited character string.

[Embodiment]

An embodiment is described below with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the present invention, and FIG. 2 is a diagram for explaining the processing performed in that embodiment.

In the figures, 1 is an input sheet; 2 is a character input unit such as an OCR; 3 is a recognition unit that analyzes the patterns of the input characters and selects candidate characters; 4 is a candidate memory that stores the selected candidate character strings; 5 is a specific character registration unit in which specific characters that delimit character strings are registered in advance; 6 is an address dictionary; 7 is a specific character search unit that searches the candidate memory 4 for specific characters; 8 is a candidate character string extraction unit that cuts out candidate character strings delimited by specific characters; 9 is a dictionary matching unit that matches the candidate character strings cut out by the candidate character string extraction unit 8 against the address dictionary 6; and 10 is a result output unit that outputs the dictionary matching result. In FIG. 2, reference numeral 20 denotes the written character string.

In the present invention, as with the written character string 20 shown in FIG. 2, an address can be written on the input sheet 1 in solid form, without dividing it into prefecture, city/county/ward, and so on. The character input unit 2 scans the input sheet 1 by optical means such as OCR and transmits light-intensity information to the recognition unit 3. The recognition unit 3 selects candidate characters for each input character by, for example, extracting topological features from the input information or performing stroke analysis. Various methods for this recognition processing are well known, so a detailed description is omitted.

As shown in FIG. 2, the recognition unit 3 ranks the selected candidate characters for each position in ascending order of their so-called degree of dissimilarity, and stores the 1st- through 20th-ranked candidates in the candidate memory 4. Needless to say, the first-ranked candidate is not necessarily the correct input character.
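The ranking step can be sketched as follows, as a minimal illustration only. It assumes, since the patent does not say, that the recognition unit yields for each character position a mapping from candidate character to its degree of dissimilarity; the function name `build_candidate_memory` and the list-of-lists layout of the candidate memory are hypothetical.

```python
from typing import Dict, List


def build_candidate_memory(dissimilarities: List[Dict[str, float]],
                           max_rank: int = 20) -> List[List[str]]:
    """Rank the candidates at each character position in ascending order of
    dissimilarity (most similar first) and keep at most `max_rank` of them.

    Assumed input format (hypothetical): one dict per input character,
    mapping candidate character -> degree of dissimilarity.
    """
    memory: List[List[str]] = []
    for scores in dissimilarities:
        # sorted() is stable, so candidates with equal scores keep input order
        ranked = sorted(scores, key=scores.get)
        memory.append(ranked[:max_rank])
    return memory
```

With invented scores such as `{"東": 0.2, "束": 0.1}` for the first position, 束 ends up ranked first, which is exactly the situation the description warns about: the first-ranked candidate need not be the correct character.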

When the input character string is an address, it normally contains specific characters such as 都, 道, 府, and 県 (the prefecture-level suffixes), 市 (city), 郡 (county), 区 (ward), 町 (town), and 村 (village). These specific characters are registered and stored in advance in the specific character registration unit 5, according to levels that follow the order in which they appear in the input character string. For an address, for example, the kanji 都, 道, 府, and 県 are registered at the prefecture level; 市, 郡, and 区 at the city/county/ward level; and 区, 町, and 村 at the ward/town/village level. In addition, for each level or each specific character, the specific character registration unit 5 holds index information pointing to the dictionary that stores the character strings that can appear at that level.
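One way to picture the specific character registration unit 5 is as an ordered table of levels. The sketch below is hypothetical: the `Level` class, its field names, and the dictionary-part labels are illustrative assumptions (the description names dictionary part B for the city/county/ward level; the labels A and C are assumed). Because 区 is registered at two levels, a character alone does not determine its level; the order of the search does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    """One level of a hypothetical specific character registration table."""
    name: str
    specific_chars: frozenset  # characters that close a segment at this level
    dictionary_part: str       # index of the dictionary part for this level


# Address levels in the order they appear in an address. The dictionary-part
# labels are illustrative; 区 is deliberately registered at two levels.
ADDRESS_LEVELS: List[Level] = [
    Level("prefecture", frozenset("都道府県"), "A"),
    Level("city/county/ward", frozenset("市郡区"), "B"),
    Level("ward/town/village", frozenset("区町村"), "C"),
]


def levels_for(ch: str, levels: List[Level]) -> List[str]:
    """Return the names of every level at which `ch` is a specific character."""
    return [lv.name for lv in levels if ch in lv.specific_chars]
```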

When a candidate character string has been stored in the candidate memory 4, the specific character search unit 7 refers to the specific character registration unit 5 and, using the specific characters of each level as keys, examines the candidate characters sequentially from the beginning to determine whether a key character appears among them. In the example shown in FIG. 2, the character 都 is found as the first-ranked candidate at the third character position. This shows that the first through third characters form the prefecture-level string. Next, examining sequentially from the fourth character, 市 appears at the sixth character, so the three characters from the fourth through the sixth are recognized as the city/county/ward level.

In the same way, the seventh through tenth characters are recognized as the word at the ward/town/village level.

These specific characters need not appear as the first-ranked candidate. At positions where a specific character is likely to appear, such as the third or the tenth character, the search results are selected in order starting from the highest candidate rank.
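The level-by-level search can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the candidate memory is taken to be a list of candidate lists ordered from first rank down, and `max_rank` (how many ranks to examine at each position) is a hypothetical parameter standing in for the unspecified rule about where specific characters are likely to appear.

```python
from typing import List, Sequence, Set


def find_boundaries(candidates: Sequence[Sequence[str]],
                    levels: Sequence[Set[str]],
                    max_rank: int = 3) -> List[int]:
    """Scan the candidate memory from the start and, for each level in turn,
    return the exclusive end index of the segment closed by one of that
    level's specific characters (searched within the top `max_rank` ranks)."""
    boundaries: List[int] = []
    pos = 0
    for keys in levels:  # levels in the order they occur in the string
        while pos < len(candidates):
            top = candidates[pos][:max_rank]  # highest-ranked candidates first
            pos += 1
            if any(ch in keys for ch in top):
                boundaries.append(pos)  # segment ends just after this position
                break
        else:
            break  # reached the end without finding this level's key
    return boundaries
```

On an invented ten-position lattice shaped like the FIG. 2 example, where 市 is only the second-ranked candidate at the sixth position, the default cutoff yields boundaries after positions 3, 6, and 10. With `max_rank=1` that 市 is missed and the search stops after the prefecture level, which is why specific characters below first rank are considered.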

When the specific character search unit 7 detects the position of a specific character, the candidate character string extraction unit 8 cuts out, from the candidate character string stored in the candidate memory 4, the partial candidate character string up to the position where that specific character appears, and passes it to the dictionary matching unit 9.
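Extraction then amounts to slicing the candidate memory at the detected positions. In this hypothetical sketch, each segment includes the specific character that closes it (as 東京都 includes 都 in the FIG. 2 example), and anything after the last detected position is simply left unprocessed.

```python
from typing import List, Sequence


def extract_segments(candidates: Sequence[Sequence[str]],
                     boundaries: Sequence[int]) -> List[List[List[str]]]:
    """Cut the candidate memory into partial candidate strings. Segment i runs
    from the previous boundary up to and including the specific character
    that closes it; `boundaries` holds exclusive end indices in ascending order."""
    segments: List[List[List[str]]] = []
    start = 0
    for end in boundaries:
        # copy each position's ranked candidate list into the segment
        segments.append([list(c) for c in candidates[start:end]])
        start = end
    return segments
```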

The dictionary matching unit 9 then checks the notified string against the dictionary part for its level. For a prefecture-level string, for example, it checks in turn whether each word registered in dictionary part A of the address dictionary 6 matches a combination of characters taken, in order of candidate rank, from the notified partial candidate character string. In the example shown in FIG. 2, this shows that the word at the prefecture level is 東京都 (Tokyo-to).

If two or more words in the address dictionary 6 match, points are calculated from the candidate ranks, and the character combination with the higher candidate ranks is selected.
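The matching and rank-point selection might look like this. The patent does not define the point calculation; this sketch assumes, purely for illustration, that a word's points are the sum of the 0-based candidate ranks of its characters and that the lowest total wins. The function name `match_segment` is hypothetical.

```python
from typing import Optional, Sequence


def match_segment(segment: Sequence[Sequence[str]],
                  words: Sequence[str]) -> Optional[str]:
    """Return the dictionary word that can be spelled by taking one candidate
    per position of `segment`, preferring higher-ranked candidates.

    Hypothetical scoring: points = sum of the 0-based ranks of the candidates
    used; the word with the fewest points wins (the first one on a tie).
    """
    best_word: Optional[str] = None
    best_points: Optional[int] = None
    for word in words:
        if len(word) != len(segment):
            continue  # a word must cover every position of the segment
        points = 0
        for ch, ranked in zip(word, segment):
            if ch not in ranked:
                break  # character absent from this position's candidates
            points += ranked.index(ch)
        else:
            if best_points is None or points < best_points:
                best_word, best_points = word, points
    return best_word
```

If two words both match, the one spelled with higher-ranked candidates has fewer points and is chosen; if none matches, `None` is returned, leaving the decision to later stages.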

Similarly, at the city/county/ward level in the example shown in FIG. 2, matching the partial candidate character string from the fourth through the sixth character against dictionary part B of the address dictionary 6 yields 町田市 (Machida-shi).

At the ward/town/village level, the partial candidate character string from the seventh through the tenth character yields 真光寺町 (Shinkōji-machi).
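Putting the steps together on data shaped like the FIG. 2 example gives the following end-to-end sketch. Everything here is illustrative: the candidate lattice, the per-level dictionary contents, the `max_rank` cutoff, and the rank-sum scoring are assumptions, not data or rules taken from the patent.

```python
from typing import List, Optional, Sequence, Set

Lattice = Sequence[Sequence[str]]  # ranked candidates for each character position

# Hypothetical registration table and dictionary parts, one per level.
LEVELS: List[Set[str]] = [set("都道府県"), set("市郡区"), set("区町村")]
DICTIONARY: List[List[str]] = [["京都府", "東京都"], ["町田市", "八王子市"], ["真光寺町"]]


def segment(lattice: Lattice, levels: Sequence[Set[str]], max_rank: int = 3) -> List[Lattice]:
    """For each level in turn, cut after the first position whose top
    `max_rank` candidates contain one of that level's specific characters."""
    pieces: List[Lattice] = []
    start = pos = 0
    for keys in levels:
        while pos < len(lattice) and not keys & set(lattice[pos][:max_rank]):
            pos += 1
        if pos == len(lattice):
            break  # no specific character left for this level
        pos += 1
        pieces.append(lattice[start:pos])
        start = pos
    return pieces


def best_match(piece: Lattice, words: Sequence[str]) -> Optional[str]:
    """Choose the spellable word with the lowest sum of candidate ranks."""
    scored = [
        (sum(ranked.index(ch) for ch, ranked in zip(word, piece)), word)
        for word in words
        if len(word) == len(piece) and all(ch in ranked for ch, ranked in zip(word, piece))
    ]
    return min(scored)[1] if scored else None


def post_process(lattice: Lattice) -> List[Optional[str]]:
    """Segment by level, then match each piece against its dictionary part."""
    pieces = segment(lattice, LEVELS)
    return [best_match(p, words) for p, words in zip(pieces, DICTIONARY)]


# Invented lattice shaped like the FIG. 2 example (10 positions, 2 ranks each).
SAMPLE = [["東", "束"], ["京", "亰"], ["都", "部"], ["町", "叮"], ["田", "由"],
          ["巾", "市"], ["真", "眞"], ["光", "先"], ["寺", "時"], ["町", "叮"]]
```

Here `post_process(SAMPLE)` returns the three words 東京都, 町田市, and 真光寺町. The second-ranked 市 at the sixth position still closes the city level, and the 町 at the fourth position does not end that level because 町 is registered only at the ward/town/village level.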

The matching result is passed to the result output unit 10, which, if necessary, asks the person who entered the data for confirmation, determines the final recognition result, and outputs it to a predetermined device.

Although the above description uses address input as an example, the present invention can be applied in the same way to any input in which certain characters usually appear at the boundaries of a character string, for example a person's affiliation within a company, with 部 (department) and 課 (section) as the specific characters. It is also applicable not only to handwritten characters but also to the recognition of printed characters.

[Effect of the Invention]

As described above, according to the present invention, an input character string written in solid form can be divided into word units that can be post-processed. Post-processing that selects the most plausible final candidate from multiple candidates can therefore be performed, and the recognition rate can be improved. Because input character strings can be written in solid form, they are easy to write, and the written strings are easy to read.

Paper waste is also reduced. The person entering the data does not need to be aware of the specific characters, so no extra burden is placed on them.

[Brief explanation of drawings]

FIG. 1 shows the configuration of an embodiment of the present invention, FIG. 2 is a diagram for explaining the processing in that embodiment, and FIG. 3 is a diagram for explaining the problems of the conventional method. In the figures, 1 is an input sheet, 2 is a character input unit, 3 is a recognition unit, 4 is a candidate memory, 5 is a specific character registration unit, 6 is an address dictionary, 7 is a specific character search unit, 8 is a candidate character string extraction unit, 9 is a dictionary matching unit, and 10 is a result output unit; 20 denotes the written character string. Patent applicant: Fujitsu Ltd. Agent: Patent Attorney Hiroshi Mori (and one other).

Claims

[Claims]

A character recognition post-processing method for a character recognition device that outputs candidate characters for each character as the recognition result of an input character string and performs post-processing to select a final candidate from those candidate characters, the method comprising: a specific character registration unit that registers and stores in advance specific characters that delimit the input character string; a dictionary that stores, for the character strings delimited by the specific characters, meaningful terms corresponding to the level determined by each specific character; a specific character search unit that searches the candidate characters and finds the positions of the specific characters among them; a candidate character string extraction unit that cuts out character strings from the candidate characters based on the position information of the specific characters obtained by the specific character search unit; and a dictionary matching unit that matches the character strings cut out by the candidate character string extraction unit against the dictionary.