JPH03228195A - Optical character recognizing device - Google Patents

Optical character recognizing device

Info

Publication number
JPH03228195A
Authority
JP
Japan
Prior art keywords
recognition
font
character
dictionary
outputs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2022267A
Other languages
Japanese (ja)
Inventor
Hideaki Ueda
上田 秀明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1990-02-02
Filing date
1990-02-02
Publication date
1991-10-09
Application filed by NEC Corp
Priority to JP2022267A
Publication of JPH03228195A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To shorten the subsequent recognition time for printed characters by determining the font used in an original image from the recognition category and collation information output by a recognition unit. CONSTITUTION: A recognition unit 5 collates the recognition dictionary of a first font, initially stored in a high-speed access storage unit 6, with the features of a character pattern extracted by a feature extraction unit 4, and outputs the recognition category and collation information to a recognition dictionary selection unit 7. A control unit 9 repeats this processing on the one line of character patterns stored in a character pattern storage unit 3 until collation with the dictionaries of all fonts stored in a low-speed access storage unit 8 is complete. When the processing is complete, the recognition dictionary selection unit 7 selects the recognition dictionary of the optimum font from the recognition categories and collation information obtained for each recognition dictionary, and reports it to the control unit 9. The control unit 9 stores the recognition dictionary of the reported font in the high-speed access storage unit 6. Subsequent collation time is thereby shortened.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to an optical character recognition device (hereinafter referred to as OCR), and in particular to a recognition device for printed characters in multiple fonts.

[Prior Art] Conventionally, an OCR that recognizes printed characters in multiple fonts takes one of two approaches. Either it prepares one dictionary per category, built from the features that the category has in common across all fonts, or it prepares multiple dictionaries per category, one independently for each font.

[Problems to Be Solved by the Invention] In the conventional multi-font OCR described above, the former method uses only features common to the same category across multiple fonts, so reading accuracy is low and the recognition dictionary is difficult to improve. The latter method collates against the dictionaries of all fonts, so collation time is long, and the memory needed to store the recognition dictionaries is large, which raises the cost of the device.

The drawbacks of both methods become more serious as the number of character types to be recognized increases. For example, if the recognition target is a Japanese document, at least the 3000 character types of JIS Level 1 are required, and depending on the type of document the 6000 character types of JIS Level 2 are also required, so the drawbacks described above make it difficult to realize a practical device.

[Means for Solving the Problems] An object of the present invention is to provide an OCR that solves the problems of both methods by taking advantage of the fact that printed matter is usually printed in a single font. All characters printed on the first page of the printed matter are collated against recognition dictionaries prepared independently for each font, the font of the printed characters is determined from the category recognition results, and thereafter collation is performed only against the dictionary of that font.
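The saving described above can be illustrated by counting collations, where one collation means checking one character against one font's dictionary. The following is a minimal sketch under stated assumptions: the function name, its parameters, and the example figures (10,000 characters, 8 fonts, a 40-character sample) are hypothetical and do not come from the specification.

```python
def collation_counts(total_chars: int, num_fonts: int, sample_chars: int) -> dict:
    """Count character-vs-dictionary collations under the two schemes."""
    if not 0 <= sample_chars <= total_chars:
        raise ValueError("sample_chars must be between 0 and total_chars")
    # Conventional per-font scheme: every character is checked against every font.
    all_fonts = total_chars * num_fonts
    # Font-determination scheme: only the sample is checked against every font;
    # the remaining characters are checked against the single determined font.
    font_locked = sample_chars * num_fonts + (total_chars - sample_chars)
    return {"all_fonts": all_fonts, "font_locked": font_locked}


# Hypothetical workload, chosen only for illustration.
counts = collation_counts(total_chars=10_000, num_fonts=8, sample_chars=40)
print(counts)
```

Under these assumed figures the count of collations is dominated by the number of characters rather than by the product of characters and fonts, which is the effect the invention aims for.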

According to the present invention, there is provided an optical character recognition device comprising: an observation unit that photoelectrically converts an original image to obtain a character pattern for each character and outputs it as an input character pattern; a character pattern storage unit that stores one line of the input character patterns and outputs the stored character patterns one character at a time; a feature extraction unit that extracts features of the stored character pattern and outputs the extracted features; a low-speed access storage unit that stores recognition dictionaries for a plurality of fonts; a high-speed access storage unit that stores, as a selected recognition dictionary, the recognition dictionary for one font selected by a selection signal from among the recognition dictionaries for the plurality of fonts stored in the low-speed access storage unit; a recognition unit that collates the extracted features with the selected recognition dictionary and outputs a recognition category and collation information; a recognition dictionary selection unit that determines, from the recognition category and collation information, the font of the printed matter to be recognized that corresponds to the stored character patterns, and outputs the determined font; and a control unit that outputs the selection signal based on the determined font.

[Embodiment] Next, an embodiment of the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an optical character recognition device according to an embodiment of the present invention.

In FIG. 1, the observation unit 2 photoelectrically converts the original image 1 and outputs the resulting character pattern for each character to the character pattern storage unit 3. The character pattern storage unit 3 is a pattern memory that stores one line of character patterns and, under a control signal from the control unit 9, outputs the character patterns one character at a time to the feature extraction unit 4. The feature extraction unit 4 extracts the features of the character pattern sent from the character pattern storage unit 3 and outputs them to the recognition unit 5.

The recognition unit 5 collates the recognition dictionary of the first font, initially stored in the high-speed access storage unit 6, with the features of the character pattern extracted by the feature extraction unit 4, and outputs the recognition category and the collation information to the recognition dictionary selection unit 7. Of the recognition category and collation information output from the recognition unit 5, the recognition dictionary selection unit 7 stores the recognition categories for one line.
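The specification does not state how the recognition unit 5 performs collation or what the collation information contains. As a minimal sketch, assuming nearest-template matching with Euclidean distance and treating the distance as the collation information, the step could look like the following. The names `collate` and `FontDictionary` and the toy feature vectors are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumption: a font dictionary maps each category to one reference feature vector.
FontDictionary = Dict[str, Sequence[float]]


def collate(features: Sequence[float], dictionary: FontDictionary) -> Tuple[str, float]:
    """Return (recognition category, distance) for the closest reference vector."""
    if not dictionary:
        raise ValueError("dictionary is empty")
    best_category, best_distance = "", math.inf
    for category, reference in dictionary.items():
        # Euclidean distance is an assumed metric, not one named in the source.
        distance = math.dist(features, reference)
        if distance < best_distance:
            best_category, best_distance = category, distance
    return best_category, best_distance


# Toy dictionary for a single font with two categories.
first_font_dictionary = {"A": [1.0, 0.0, 1.0], "B": [0.0, 1.0, 1.0]}
category, distance = collate([0.9, 0.1, 1.0], first_font_dictionary)
print(category, round(distance, 4))
```

A small distance suggests that the dictionary in use fits the printed font well, which is the kind of evidence the recognition dictionary selection unit 7 would need in order to compare fonts.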

The control unit 9 performs control so that the above processing is repeated for the one line of character patterns stored in the character pattern storage unit 3. When collation with the recognition dictionary initially stored in the high-speed access storage unit 6 is complete, the control unit 9 performs control to store the recognition dictionary of a second font, stored in advance in the low-speed access storage unit 8, in the high-speed access storage unit 6.

Once the recognition dictionary of the second font has been stored in the high-speed access storage unit 6, the control unit 9 repeats the same processing as above to perform collation with the recognition dictionary of the second font, and, of the recognition categories and collation information, the recognition categories for one line are stored in the recognition dictionary selection unit 7.

This processing is repeated until collation with the dictionaries of all fonts stored in the low-speed access storage unit 8 is complete.
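The repetition controlled by the control unit 9 can be sketched as a loop that places one font dictionary at a time in a single fast slot and collates the stored line against it. This is an illustrative sketch only: `nearest_reference` is a stand-in recognition step using summed absolute differences (an assumption), and the font names and feature values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
FontDictionary = Dict[str, Features]  # category -> reference features
Result = Tuple[str, float]            # (category, collation distance)


def nearest_reference(features: Features, dictionary: FontDictionary) -> Result:
    """Stand-in recognition step: closest reference by summed absolute difference."""
    scored = [
        (sum(abs(a - b) for a, b in zip(features, ref)), category)
        for category, ref in dictionary.items()
    ]
    distance, category = min(scored)
    return category, distance


def collate_line_with_all_fonts(
    line_features: List[Features],
    slow_store: Dict[str, FontDictionary],
    collate: Callable[[Features, FontDictionary], Result] = nearest_reference,
) -> Dict[str, List[Result]]:
    """Collate one stored line against every font, one dictionary at a time."""
    results: Dict[str, List[Result]] = {}
    for font_name, dictionary in slow_store.items():
        # Models copying this font's dictionary into the single fast-access slot.
        fast_slot = dictionary
        results[font_name] = [collate(f, fast_slot) for f in line_features]
    return results


slow_store = {
    "font_1": {"A": [1.0, 0.0], "B": [0.0, 1.0]},
    "font_2": {"A": [0.5, 0.5], "B": [0.0, 0.5]},
}
line = [[1.0, 0.0], [0.0, 1.0]]
results = collate_line_with_all_fonts(line, slow_store)
print(results)
```

Only one dictionary is held in the fast slot at any moment, which mirrors the single-font capacity of the high-speed access storage unit 6.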

When this processing is complete, the recognition dictionary selection unit 7 selects the recognition dictionary of the optimum font, based on the recognition categories and collation information obtained for each recognition dictionary from the first font to the last font, and notifies the control unit 9. The control unit 9 performs control to store the recognition dictionary of the notified font from the low-speed access storage unit 8 into the high-speed access storage unit 6. From then on, the control unit 9 outputs the recognition categories obtained by collation with the dictionary of the optimum font.
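The specification does not define the criterion by which the recognition dictionary selection unit 7 judges a font to be optimum. One plausible criterion, shown here purely as an assumption, is to pick the font whose dictionary gave the lowest mean collation distance over the line. The function name `select_font` and the sample results are hypothetical.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (recognition category, collation distance)


def select_font(results: Dict[str, List[Result]]) -> str:
    """Pick the font with the lowest mean collation distance (assumed criterion)."""
    if not results or any(not per_font for per_font in results.values()):
        raise ValueError("every font needs at least one collation result")

    def mean_distance(font_name: str) -> float:
        distances = [distance for _, distance in results[font_name]]
        return sum(distances) / len(distances)

    return min(results, key=mean_distance)


# Hypothetical per-font results for a two-character line.
results = {
    "font_1": [("A", 0.40), ("B", 0.35)],
    "font_2": [("A", 0.10), ("B", 0.05)],
    "font_3": [("A", 0.30), ("8", 0.50)],
}
chosen = select_font(results)
print(chosen)
```

Other criteria, such as agreement of recognition categories across fonts or rejection counts, would fit the same interface. The mean distance is used here only because it is simple to state.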

[Effects of the Invention] As described above, according to the present invention, the type of font in an original image is determined from the recognition category and collation information output from the recognition unit. This shortens the recognition time for subsequent printed characters, and the memory that stores the recognition dictionary need only have the capacity for one font.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an optical character recognition device according to an embodiment of the present invention. 1 ... original image, 2 ... observation unit, 3 ... character pattern storage unit, 4 ... feature extraction unit, 5 ... recognition unit, 6 ... high-speed access storage unit, 7 ... recognition dictionary selection unit, 8 ... low-speed access storage unit, 9 ... control unit.

Claims (1)

[Claims] 1. An optical character recognition device comprising: an observation unit that photoelectrically converts an original image to obtain a character pattern for each character and outputs it as an input character pattern; a character pattern storage unit that stores one line of the input character patterns and outputs the stored character patterns one character at a time; a feature extraction unit that extracts features of the stored character pattern and outputs the extracted features; a low-speed access storage unit that stores recognition dictionaries for a plurality of fonts; a high-speed access storage unit that stores, as a selected recognition dictionary, the recognition dictionary for one font selected by a selection signal from among the recognition dictionaries for the plurality of fonts stored in the low-speed access storage unit; a recognition unit that collates the extracted features with the selected recognition dictionary and outputs a recognition category and collation information; a recognition dictionary selection unit that determines, from the recognition category and collation information, the font of the printed matter to be recognized that corresponds to the stored character patterns, and outputs the determined font; and a control unit that outputs the selection signal based on the determined font.
JP2022267A 1990-02-02 1990-02-02 Optical character recognizing device Pending JPH03228195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022267A JPH03228195A (en) 1990-02-02 1990-02-02 Optical character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2022267A JPH03228195A (en) 1990-02-02 1990-02-02 Optical character recognizing device

Publications (1)

Publication Number Publication Date
JPH03228195A true JPH03228195A (en) 1991-10-09

Family

ID=12077997

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2022267A Pending JPH03228195A (en) 1990-02-02 1990-02-02 Optical character recognizing device

Country Status (1)

Country Link
JP (1) JPH03228195A (en)
