JPS61107486A - Character recognition post-processing system - Google Patents

Character recognition post-processing system

Info

Publication number
JPS61107486A
Authority
JP
Japan
Prior art keywords
character
characters
candidate
specific
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP59229113A
Other languages
Japanese (ja)
Other versions
JPH0340434B2 (en)
Inventor
Yasusuke Isaki
伊崎 保直
Michiaki Nakanishi
道明 中西
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP59229113A
Publication of JPS61107486A
Publication of JPH0340434B2
Granted legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve the recognition rate for a character string written solid (without separators) by searching the candidate characters sequentially for preregistered specific characters and segmenting the input character string at the positions of those specific characters. CONSTITUTION: A character input unit 2 scans an input sheet 1 by optical means such as OCR and sends light-intensity information to a recognition unit 3. The recognition unit 3 selects candidate characters for each input character, ranks them in ascending order of dissimilarity, and stores them in a candidate memory 4. If the input character string is an address, for example, specific characters such as those for prefecture, city, town, and village are registered in advance in a specific character registration unit 5, organized by level according to the order in which they appear in the input character string. When the candidate character string has been stored in the candidate memory 4, a specific character search unit 7 refers to the specific character registration unit 5 and checks, from the beginning, whether specific characters appear among the candidate characters. When a specific character is detected, a candidate character string extraction unit 8 cuts out the character string of that part.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a character recognition post-processing system, and in particular to one that finds the breaks in a character string by checking whether specific characters appear among the pattern-recognized candidate characters, and then performs post-processing by matching each extracted character string against a dictionary.

[Prior Art and Problems]

FIG. 3 is a diagram for explaining the problems of the conventional method.

Conventionally, in character recognition devices that recognize kanji by optical means, for example, post-processing that matches the recognition result against a specific knowledge dictionary, such as an address dictionary, has been performed to raise the recognition rate. With the conventional method, however, when an address is entered on an input sheet 1 as shown in FIG. 3, post-processing is not possible unless the character string is written separately for the prefecture, the city/county/ward, and the ward/town/village. In other words, to perform post-processing, the input sheet 1 had to be a formatted form whose entry boxes were divided in advance into a prefecture box, a city/county/ward box, a ward/town/village box, and so on.

As a result, the number of entry boxes increases, which makes the form harder to lay out and harder to fill in, and makes the written characters harder to read.

[Means for Solving the Problems]

The present invention aims to solve the above problems by providing a character recognition post-processing system that performs post-recognition processing on a solid-written character string, for example an address written without separating the prefecture, city/county/ward, and so on, and thereby improves the recognition rate. To that end, the character recognition post-processing system of the present invention is used in a character recognition device in which candidate characters are output for each character as the recognition result of an input character string and post-processing selects a final candidate from among those candidate characters. It is characterized by comprising: a specific character registration unit that registers and stores in advance the specific characters that delimit the input character string; a dictionary that stores meaningful terms, for the character strings delimited by the specific characters, in correspondence with the level determined by each specific character; a specific character search unit that searches the candidate characters and finds the positions of the specific characters among them; a candidate character string extraction unit that cuts character strings out of the candidate characters based on the position information obtained by the specific character search unit; and a dictionary matching unit that matches the character strings cut out by the candidate character string extraction unit against the dictionary.

[Operation]

The present invention exploits the fact that specific input characters appear among the candidate characters produced by recognizing the input. In the case of an address, for example, it searches the candidate characters sequentially for preregistered specific characters such as those for prefecture, city, county, ward, and town, and divides the input character string at the positions of those characters, so that post-recognition processing can be performed on each of the resulting character strings.

[Embodiment]

An embodiment is described below with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the present invention, and FIG. 2 is a diagram for explaining the processing in that embodiment.

In the figures, 1 is an input sheet; 2 is a character input unit, for example an OCR; 3 is a recognition unit that analyzes the patterns of the input characters and selects candidate characters; 4 is a candidate memory in which the selected candidate character strings are stored; 5 is a specific character registration unit in which the specific characters that delimit the character string are registered in advance; 6 is an address dictionary; 7 is a specific character search unit that searches the candidate memory 4 for specific characters; 8 is a candidate character string extraction unit that cuts out the candidate character strings delimited by the specific characters; 9 is a dictionary matching unit that matches the candidate character strings cut out by the candidate character string extraction unit 8 against the address dictionary 6; and 10 is a result output unit that outputs the dictionary matching result. In FIG. 2, reference numeral 20 denotes the written character string.

In the present invention, an address can be written solid on the input sheet 1 without being divided into prefecture, city/county/ward, and so on, as in the written character string 20 shown in FIG. 2. The character input unit 2 scans the input sheet 1 by optical means such as OCR and passes light-intensity information to the recognition unit 3. The recognition unit 3 selects candidate characters for each input character by, for example, extracting topological features or performing stroke analysis on the input information. Various methods for this recognition processing are well known, so a detailed description is omitted.

For each input character, the recognition unit 3 ranks the selected candidate characters in ascending order of so-called dissimilarity, as shown in FIG. 2 for example, and stores the 1st through 20th candidates in the candidate memory 4. Needless to say, the first-ranked candidate is not necessarily the correct input character.

When the input character string is an address, it normally contains specific characters such as 都, 道, 府, 県 (the prefecture types), 市 (city), 郡 (county), 区 (ward), 町 (town), and 村 (village). These specific characters are registered and stored in advance in the specific character registration unit 5, associated with levels that follow the order in which they appear in the input character string. For an address, for example, the kanji 都, 道, 府, and 県 are registered as the prefecture level; 市, 郡, and 区 as the city/county/ward level; and 区, 町, and 村 as the ward/town/village level. The specific character registration unit 5 also holds, for each level or each specific character, index information into the dictionary that stores the character strings that can appear at that level.
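One way to picture the contents of the specific character registration unit is as an ordered table of levels. The sketch below is an assumption about layout, not the patent's data format: the names `LevelEntry`, `REGISTRY`, and `levels_for` are hypothetical, and the `dictionary_index` values merely stand in for the index information mentioned in the text.

```python
from typing import List, NamedTuple, Tuple


class LevelEntry(NamedTuple):
    level: str               # level name (labels are illustrative)
    chars: Tuple[str, ...]   # specific characters registered for the level
    dictionary_index: str    # hypothetical stand-in for the index information


# Levels are listed in the order they appear in a written address.
REGISTRY: List[LevelEntry] = [
    LevelEntry("prefecture", ("都", "道", "府", "県"), "A"),
    LevelEntry("city/county/ward", ("市", "郡", "区"), "B"),
    LevelEntry("ward/town/village", ("区", "町", "村"), "C"),
]


def levels_for(char: str) -> List[str]:
    """Return every level at which the given character is registered."""
    return [entry.level for entry in REGISTRY if char in entry.chars]


print(levels_for("区"))  # ['city/county/ward', 'ward/town/village']
```

Note that 区 is registered at two levels, which is why the search has to proceed level by level rather than treating every specific character as a generic delimiter.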

When the candidate character string has been stored in the candidate memory 4, the specific character search unit 7 refers to the specific character registration unit 5 and, using the specific characters of each level as keys, checks sequentially from the beginning whether a key character appears among the candidate characters. In the example of FIG. 2, the character 都 is found as the first-ranked candidate at the 3rd character position. This shows that the 1st through 3rd characters form the prefecture-level character string. The search then continues from the 4th character; 市 appears at the 6th character, so the three characters from the 4th through the 6th are recognized as the city/county/ward level. In the same way, the 7th through 10th characters are recognized as the ward/town/village-level word.
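The level-by-level search just described can be sketched as follows. This is a minimal illustration under stated assumptions: the names `LEVELS`, `find_delimiter`, and `segment`, the `max_rank` parameter, and the second-ranked sample candidates are all hypothetical, and the patent does not say what happens when no specific character is found, so the sketch simply stops.

```python
from typing import List, Optional, Sequence, Tuple

# Levels in address order, each with its specific (delimiter) characters.
LEVELS: List[Tuple[str, str]] = [
    ("prefecture", "都道府県"),
    ("city/county/ward", "市郡区"),
    ("ward/town/village", "区町村"),
]


def find_delimiter(candidates: Sequence[Sequence[str]], start: int,
                   chars: str, max_rank: int = 1) -> Optional[int]:
    """Return the first position at or after `start` whose top `max_rank`
    candidates include one of `chars`, or None if there is none."""
    for pos in range(start, len(candidates)):
        if any(c in chars for c in candidates[pos][:max_rank]):
            return pos
    return None


def segment(candidates: Sequence[Sequence[str]],
            max_rank: int = 1) -> List[Tuple[str, int, int]]:
    """Split the candidate positions into (level, first, last) spans, 0-based."""
    spans = []
    start = 0
    for level, chars in LEVELS:
        end = find_delimiter(candidates, start, chars, max_rank)
        if end is None:
            break  # behaviour here is not specified in the source
        spans.append((level, start, end))
        start = end + 1
    return spans


# Ranked candidates per position; the second-ranked entries are invented.
memory = [["東", "車"], ["京", "束"], ["都", "部"], ["町", "叮"], ["田", "由"],
          ["市", "布"], ["真", "直"], ["光", "米"], ["寺", "守"], ["町", "叮"]]
print(segment(memory))
# [('prefecture', 0, 2), ('city/county/ward', 3, 5), ('ward/town/village', 6, 9)]
```

The 0-based spans 0-2, 3-5, and 6-9 correspond to the 1st-3rd, 4th-6th, and 7th-10th characters in the text. The 町 at the 4th character is not treated as a break because the search is looking for city/county/ward characters at that point.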

These specific characters do not necessarily have to appear as the first-ranked candidate. At positions where a specific character is likely to appear, such as the 3rd or 10th character, search results are selected in order starting from the highest candidate rank.
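Checking one position in rank order might look like the sketch below. The function name `best_ranked_delimiter` and the sample candidates are hypothetical; the text only says that higher-ranked candidates are preferred.

```python
from typing import Optional, Sequence, Tuple


def best_ranked_delimiter(ranked: Sequence[str],
                          delimiters: str) -> Optional[Tuple[int, str]]:
    """Scan one position's candidates from rank 1 downward and return
    (rank, character) for the first specific character found."""
    for rank, char in enumerate(ranked, start=1):
        if char in delimiters:
            return rank, char
    return None


# Hypothetical candidates for the 3rd character, with 都 only at rank 3.
print(best_ranked_delimiter(["部", "郁", "都"], "都道府県"))  # (3, '都')
```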

When the specific character search unit 7 has detected the position of a specific character, the candidate character string extraction unit 8 cuts out, from the candidate character string stored in the candidate memory 4, the partial candidate character string up to the position where that specific character appears, and passes it to the dictionary matching unit 9.

If the character string passed to it is, for example, at the prefecture level, the dictionary matching unit 9 checks sequentially whether each word registered in the dictionary A section of the address dictionary 6 matches a combination of characters taken from the partial candidate character string in order of candidate rank. In the example of FIG. 2, this shows that the prefecture-level word is 東京都 (Tokyo). If two or more words match in the comparison with the address dictionary 6, a point calculation over the candidate ranks is used to select the character combination with the higher candidate ranking.
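The matching step can be sketched as below. The patent does not define the point calculation, so this example assumes the simplest one: add up the rank of each matched character and prefer the lowest total. The names `score_word` and `match_dictionary`, the sample candidates, and the dictionary contents are all hypothetical.

```python
from typing import List, Optional, Sequence


def score_word(word: str, segment: Sequence[Sequence[str]]) -> Optional[int]:
    """Return the sum of candidate ranks if every character of `word` appears
    among the candidates at its position, otherwise None."""
    if len(word) != len(segment):
        return None
    total = 0
    for char, ranked in zip(word, segment):
        if char not in ranked:
            return None
        total += ranked.index(char) + 1  # rank 1 is the best candidate
    return total


def match_dictionary(segment: Sequence[Sequence[str]],
                     dictionary: List[str]) -> Optional[str]:
    """Return the dictionary word with the lowest rank total, or None."""
    best = None
    for word in dictionary:
        score = score_word(word, segment)
        if score is not None and (best is None or score < best[0]):
            best = (score, word)
    return best[1] if best else None


# Hypothetical partial candidate string for the prefecture level.
segment = [["東", "京"], ["京", "都"], ["郡", "都", "府"]]
prefectures = ["東京都", "京都府", "大阪府"]
print(match_dictionary(segment, prefectures))  # 東京都
```

Here both 東京都 (rank total 4) and 京都府 (rank total 7) can be formed from the candidates, and the lower total wins. Other scoring schemes would fit the text equally well.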

Similarly, at the city/county/ward level in the example of FIG. 2, matching the partial candidate character string from the 4th through 6th characters against the dictionary B section of the address dictionary 6 yields 町田市 (Machida City) as the matching result. At the ward/town/village level, the partial candidate character string from the 7th through 10th characters yields 真光寺町 (Shinkojimachi) as the matching result.

The matching result is passed to the result output unit 10, which confirms it with the person who made the entry if necessary, fixes the final recognition result, and carries out processing such as output to a predetermined device.

The above description used address entry as an example, but the present invention can be applied in the same way wherever certain characters commonly appear at the breaks in a character string, for example when entering an affiliation within a company, with 部 (department) and 課 (section) as the specific characters. It is also applicable not only to handwritten characters but to the recognition of printed type.
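To illustrate how the same idea carries over to another domain, the sketch below splits an affiliation string using 部 and 課 as the specific characters. It is a simplification that works on a plain string (first-ranked candidates only); the names `AFFILIATION_LEVELS` and `split_by_levels` and the sample affiliation are hypothetical.

```python
from typing import List, Tuple

# Levels and specific characters for a company affiliation.
AFFILIATION_LEVELS: List[Tuple[str, str]] = [("department", "部"), ("section", "課")]


def split_by_levels(text: str, levels: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Cut `text` after the first specific character of each level, in order."""
    parts = []
    start = 0
    for level, delimiters in levels:
        end = next((i for i in range(start, len(text)) if text[i] in delimiters), None)
        if end is None:
            break  # no specific character for this level was found
        parts.append((level, text[start:end + 1]))
        start = end + 1
    return parts


print(split_by_levels("営業部企画課", AFFILIATION_LEVELS))
# [('department', '営業部'), ('section', '企画課')]
```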

[Effect of the Invention]

As explained above, according to the present invention a solid-written input character string can be divided into word units on which post-processing is possible, so post-processing that selects the most plausible final candidate from multiple candidates can be performed and the recognition rate can be improved. Because the input character string can be written solid, it is easy to write and the written string is easy to read. Paper waste can also be reduced. The person making the entry does not need to be conscious of the specific characters, so no burden is placed on that person.

[Brief Description of the Drawings]

FIG. 1 shows the configuration of an embodiment of the present invention; FIG. 2 is a diagram for explaining the processing in that embodiment; and FIG. 3 is a diagram for explaining the problems of the conventional method.

In the figures, 1 is an input sheet, 2 is a character input unit, 3 is a recognition unit, 4 is a candidate memory, 5 is a specific character registration unit, 6 is an address dictionary, 7 is a specific character search unit, 8 is a candidate character string extraction unit, 9 is a dictionary matching unit, and 10 is a result output unit. Reference numeral 20 denotes the written character string.

Patent applicant: Fujitsu Ltd.
Agent: Patent attorney 森1)寛 (and one other)

Claims (1)

[Claims]

A character recognition post-processing system for a character recognition device in which candidate characters are output for each character as the recognition result of an input character string and post-processing selects a final candidate from among those candidate characters, the system comprising: a specific character registration unit that registers and stores in advance specific characters that delimit the input character string; a dictionary that stores meaningful terms, for the character strings delimited by the specific characters, in correspondence with the level determined by each specific character; a specific character search unit that searches the candidate characters and finds the positions of the specific characters among them; a candidate character string extraction unit that cuts character strings out of the candidate characters based on the position information of the specific characters obtained by the specific character search unit; and a dictionary matching unit that matches the character strings cut out by the candidate character string extraction unit against the dictionary.
JP59229113A 1984-10-31 1984-10-31 Character recognition post-processing system Granted JPS61107486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59229113A JPS61107486A (en) 1984-10-31 1984-10-31 Character recognition post-processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59229113A JPS61107486A (en) 1984-10-31 1984-10-31 Character recognition post-processing system

Publications (2)

Publication Number Publication Date
JPS61107486A 1986-05-26
JPH0340434B2 1991-06-18

Family

ID=16886946

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59229113A Granted JPS61107486A (en) 1984-10-31 1984-10-31 Character recognition post-processing system

Country Status (1)

Country Link
JP (1) JPS61107486A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02101596A (en) * 1988-10-11 1990-04-13 Fujitsu Ltd Character recognizing device
JPH02268388A (en) * 1989-04-10 1990-11-02 Hitachi Ltd Character recognizing method
JPH0554021A (en) * 1991-05-10 1993-03-05 Hitachi Ltd Information processor

Also Published As

Publication number Publication date
JPH0340434B2 (en) 1991-06-18

Similar Documents

Publication Publication Date Title
JPS6262387B2 (en)
KR100288144B1 (en) Foreign language coding method in Korean and search method using the same
JPS61107486A (en) Character recognition post-processing system
WO2000036530A1 (en) Searching method, searching device, and recorded medium
JP2825072B2 (en) String recognition device
KR100473660B1 (en) Word recognition method
JP3767180B2 (en) Document structure analysis method and apparatus, and storage medium storing document structure analysis program
JP3151866B2 (en) English character recognition method
JP2618018B2 (en) Character recognition device
JPH0256086A (en) Method for postprocessing for character recognition
JP2922365B2 (en) Kanji address data processing method in OCR processing system
KR940007933B1 (en) User independent type on-line korean character recognition method
JPH11120294A (en) Character recognition device and medium
JP3245415B2 (en) Character recognition method
JPS61161588A (en) Postprocessing system of character recognition
JPS63188284A (en) Character reader
JPS63100584A (en) Character recognition processing system
JPS63268083A (en) Word recognizing device
JPS63138479A (en) Character recognizing device
JP2000215273A (en) Online handwritten character recognition device, and recording medium which computer can read
KR970049822A (en) Cursive Multi-Character Recognition Method
KR950033945A (en) Pre-registration method, character recognition method and character recognition device
JPH06176206A (en) Character recognizing device
JPS62247482A (en) Post-processing system for character recognition
JPH04318687A (en) Character recognition unit

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term