JP2985243B2 - Character recognition method - Google Patents

Character recognition method

Info

Publication number
JP2985243B2
JP2985243B2 JP2180804A JP18080490A JP2985243B2 JP 2985243 B2 JP2985243 B2 JP 2985243B2 JP 2180804 A JP2180804 A JP 2180804A JP 18080490 A JP18080490 A JP 18080490A JP 2985243 B2 JP2985243 B2 JP 2985243B2
Authority
JP
Japan
Prior art keywords
character
word
recognition
dictionary
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP2180804A
Other languages
Japanese (ja)
Other versions
JPH0468483A (en)
Inventor
良一 湯下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP2180804A
Publication of JPH0468483A
Application granted
Publication of JP2985243B2
Anticipated expiration
Expired - Lifetime

Description

[Detailed Description of the Invention]
[Field of Industrial Application] The present invention relates to a character recognition method for recognizing text written as a sequence of words, each composed of a plurality of characters, as in English.

[Prior Art] In recent years, demand has grown for character recognition devices as input devices for computers and similar systems, and a character recognition device that can efficiently produce stable recognition results has become indispensable for improving the performance of such systems. In one conventional character recognition method, text areas and then character areas are cut out in sequence from the document to be recognized, graphic features are extracted from the image data of each individual character in a character area, and a recognition result is obtained by comparing those features with a dictionary prepared in advance.

[Problems to be Solved by the Invention] The conventional method described above performs recognition character by character, so the accuracy of the character-area segmentation is one of the major factors that determine recognition performance. In ordinary documents, however, characters frequently touch one another because of noise or smearing, which makes it difficult to cut out character areas accurately and has lowered the recognition rate.

[Means for Solving the Problems] To solve the above problem, the present invention reads a character image to be recognized, cuts out words from the read character image data, and performs word-level recognition by comparing the features of the image data of each cut-out word with a word dictionary. For any word that cannot be recognized with the word dictionary, the characters constituting that word are cut out, and character-level recognition is performed by comparing the features of the image data of each cut-out character with a character dictionary.
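The two-stage flow described above (word-level recognition first, character-level recognition only for rejected words) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `recognize_words`, its callback parameters, and the toy stand-ins in which each "image" is simply the string it depicts are all assumptions made for the example.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def recognize_words(
    word_images: Sequence[Hashable],
    match_word: Callable[[Hashable], Optional[str]],
    split_into_chars: Callable[[Hashable], Sequence[Hashable]],
    match_char: Callable[[Hashable], str],
) -> List[Tuple[str, str]]:
    """Return (text, stage) for each word; stage is 'word' or 'char'."""
    results: List[Tuple[str, str]] = []
    for image in word_images:
        # Stage 1: try to recognize the whole word against the word dictionary.
        text = match_word(image)
        if text is not None:
            results.append((text, "word"))
            continue
        # Stage 2: the word was rejected, so cut it into characters and
        # recognize each character against the character dictionary.
        chars = [match_char(c) for c in split_into_chars(image)]
        results.append(("".join(chars), "char"))
    return results


# Toy stand-ins (hypothetical): an "image" is just the string it depicts.
LIMITED_WORDS = {"she", "is", "the", "of", "my"}


def toy_match_word(image: str) -> Optional[str]:
    # Recognized only if the word is in the limited word dictionary.
    return image if image in LIMITED_WORDS else None


demo = recognize_words(
    ["she", "is", "the", "daughter", "of", "my", "friend"],
    match_word=toy_match_word,
    split_into_chars=list,      # a string splits into its characters
    match_char=lambda c: c,     # each toy character "image" is its own label
)
```

In this toy run, only 'daughter' and 'friend' reach the character-level stage, so character segmentation, the step most sensitive to touching characters, is attempted for those two words alone.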

[Operation] Because the present invention combines word-level recognition with character-level recognition, character recognition can be carried out efficiently, and processing speed can therefore be improved.

[Embodiment] An embodiment of the present invention is described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the configuration of a character recognition device that uses the character recognition method of this embodiment. In FIG. 1, 1 is an image input unit that inputs the document to be recognized as a document image. 2 is a text area cutout unit that finds groups of character strings in the input document image and outputs text areas. 3 is a word cutout unit that finds word boundaries in a text area and outputs the extent of each word as a word area. 4 is a character cutout unit that finds character boundaries in a word area and outputs the extent of each character as a character area. 5 is a primary recognition processing unit that compares graphic features extracted from a word area with the limited word feature dictionary 7 and outputs the word whose features match as the recognition result. 6 is a secondary recognition processing unit that compares graphic features extracted from a character area with the character feature dictionary 8 and takes the character whose features match as the recognition result. 7 is the limited word feature dictionary, created from the graphic features of a set of words limited by frequency of appearance. 8 is the character feature dictionary, created from the graphic features of all characters to be recognized. 9 is an internal bus that connects the image input unit 1, the text area cutout unit 2, and the other units as shown in the figure. 10 is an internal bus that connects the primary recognition processing unit 5 to the limited word feature dictionary 7. 11 is an internal bus that connects the secondary recognition processing unit 6 to the character feature dictionary 8.
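The description states only that each recognition unit outputs the dictionary entry whose features "match"; it does not define the graphic features or the matching criterion. As a minimal sketch under stated assumptions (fixed-length numeric feature vectors, Euclidean distance, and a hypothetical `max_distance` reject threshold, none of which come from the patent), a dictionary lookup with rejection might look like this:

```python
import math
from typing import Dict, Optional, Sequence


def match_against_dictionary(
    features: Sequence[float],
    dictionary: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the label of the nearest dictionary entry, or None to reject.

    Assumption: 'features match' is modelled as 'nearest entry lies within
    max_distance in Euclidean distance'. The patent does not specify this.
    """
    best_label: Optional[str] = None
    best_distance = math.inf
    for label, reference in dictionary.items():
        distance = math.dist(features, reference)
        if distance < best_distance:
            best_label, best_distance = label, distance
    # Reject when even the closest entry is too far away.
    return best_label if best_distance <= max_distance else None


# Hypothetical two-dimensional features for two limited words.
word_dictionary = {"is": (0.2, 0.8), "of": (0.7, 0.3)}

accepted = match_against_dictionary((0.25, 0.75), word_dictionary, max_distance=0.2)
rejected = match_against_dictionary((0.5, 0.5), word_dictionary, max_distance=0.2)
```

The same function could serve both the primary unit (with the limited word feature dictionary) and the secondary unit (with the character feature dictionary); only the dictionary contents and the threshold would differ.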

The operation of the character recognition device configured as described above is explained below, with FIG. 2 showing the overall processing flow and FIG. 3 showing a concrete example.

The document to be recognized is input as a document image by the image input unit 1 (process 12). The document image is sent to the text area cutout unit 2, which computes histograms of black pixels in the vertical and horizontal directions of the document image and uses them to find the text areas; the position information of each text area is stored as internal data (process 13). The position information is then sent to the word cutout unit 3, which performs word segmentation within the text area. Noting that the spaces before and after a word are wider than the spaces between characters inside a word, the word cutout unit 3 cuts out as a word each character string bounded by spaces of at least a certain width, obtains the position information of every word in the text area, and stores it as internal data (process 14). Taking 24 in FIG. 3 as the text area cutout result, the word cutout result is shown at 25 in the same figure.

Processes 15 to 22, described next, are repeated for every word, starting from the upper left of the text area. The position information of a word obtained by word segmentation is sent to the primary recognition processing unit 5, which obtains the graphic features of the word area defined by that position information and compares them with the limited word feature dictionary 7, created from the graphic features of words selected in descending order of frequency of appearance. A word whose features match becomes the recognition result of the primary recognition processing unit 5; if no matching word exists, a reject is reported to the subsequent stage (process 15). In this embodiment, the limited words are special verbs, auxiliary verbs, personal pronouns, articles, and prepositions of three characters or fewer, which are generally considered to appear frequently. 26 in FIG. 3 is the result of primary recognition: 'she', 'is', 'the', 'of', and 'my' are recognized because they are among the limited words, whereas 'daughter' and 'friend' are rejected because they are not. If the word is recognized by the primary recognition processing unit 5, the recognition result is output and processing moves to process 21 (processes 16, 22). If it is rejected, the following processing is performed (process 16).

A word rejected by the primary recognition processing unit 5 is sent to the character cutout unit 4, which separates it into individual characters according to changes in the histogram of black pixels in the vertical direction and obtains the position information of every character in the word area (process 17). 27 in FIG. 3 is the result of character segmentation. The character position information is sent to the secondary recognition processing unit 6 in order from the left end, and recognition is performed character by character. The secondary recognition processing unit 6 extracts the graphic features of each character from its position information, compares them with the character feature dictionary 8, created from the graphic features of all characters to be recognized, and outputs the character whose features match as the recognition result (processes 18, 19). Processes 18 to 20 are repeated as many times as the number of characters obtained in process 17, which completes the recognition of one word. 28 in FIG. 3 shows the result of secondary recognition.

Repeating processes 15 to 22 for every word from the upper left of the text area recognizes all words in the text area, and the final recognition result is obtained by combining the results of primary and secondary recognition. 29 in FIG. 3 shows the recognition result: 'she', 'is', 'the', 'of', and 'my' are primary recognition results, and 'daughter' and 'friend' are secondary recognition results.
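The word and character segmentation steps described above both rely on gaps in a black-pixel histogram: a word is bounded by spaces of at least a certain width, and characters are separated where the column histogram changes. The following is a minimal sketch of that idea, assuming a binary image stored as a list of rows with 1 for black pixels; the function names and the `min_gap` values are hypothetical, and the patent does not state specific thresholds.

```python
from typing import List, Sequence, Tuple


def column_histogram(image: Sequence[Sequence[int]]) -> List[int]:
    """Count black pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def split_on_gaps(histogram: Sequence[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of runs containing ink.

    Two runs are kept separate only if at least min_gap consecutive empty
    columns lie between them; narrower gaps stay inside one segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None   # first column of the current segment, if any
    gap = 0        # number of consecutive empty columns seen so far
    for x, count in enumerate(histogram):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the segment just after its last inked column.
                segments.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(histogram) - gap))
    return segments


# Tiny example: two "characters" one column apart, then a wide space,
# then another word.
image = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
]
hist = column_histogram(image)
words = split_on_gaps(hist, min_gap=3)               # wide gaps separate words
first_word = hist[words[0][0]:words[0][1]]
chars = split_on_gaps(first_word, min_gap=1)         # any gap separates characters
```

Using one routine with two thresholds mirrors the observation in the text that inter-word spaces are wider than inter-character spaces. It also shows why touching characters are a problem: if noise fills the single empty column, the character-level split fails, whereas the word-level split is unaffected.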

[Effects of the Invention] As described above, the present invention reads a character image to be recognized, cuts out words from the read character image data, and performs word-level recognition by comparing the features of the image data of each cut-out word with a word dictionary. For any word that cannot be recognized with the word dictionary, the characters constituting that word are cut out, and character-level recognition is performed by comparing the features of the image data of each cut-out character with a character dictionary. Because word-level and character-level recognition are combined in this way, character recognition can be carried out efficiently, which improves processing speed and yields stable recognition results.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a character recognition device using the character recognition method according to an embodiment of the present invention; FIG. 2 is a flowchart showing the control procedure of the embodiment; and FIG. 3 is an explanatory diagram showing the character string to be recognized and the recognition results at each stage of processing.
1: image input unit, 2: text area cutout unit, 3: word cutout unit, 4: character cutout unit, 5: primary recognition processing unit, 6: secondary recognition processing unit, 7: limited word feature dictionary, 8: character feature dictionary

Continued from the front page: (58) Fields searched (Int. Cl.6, DB name): G06K 9/62, G06K 9/72

Claims (1)

(57) [Claims] 1. A character recognition method comprising: reading a character image to be recognized; cutting out words from the read character image data; performing word-level recognition by comparing features of the image data of each cut-out word with a word dictionary; and, for any word that could not be recognized with the word dictionary, cutting out the characters constituting that word and performing character-level recognition by comparing features of the image data of each cut-out character with a character dictionary.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2180804A JP2985243B2 (en) 1990-07-09 1990-07-09 Character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2180804A JP2985243B2 (en) 1990-07-09 1990-07-09 Character recognition method

Publications (2)

Publication Number Publication Date
JPH0468483A JPH0468483A (en) 1992-03-04
JP2985243B2 true JP2985243B2 (en) 1999-11-29

Family

ID=16089638

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2180804A Expired - Lifetime JP2985243B2 (en) 1990-07-09 1990-07-09 Character recognition method

Country Status (1)

Country Link
JP (1) JP2985243B2 (en)

Also Published As

Publication number Publication date
JPH0468483A (en) 1992-03-04

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081001

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091001

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101001

Year of fee payment: 11

EXPY Cancellation because of completion of term