JPS63121991A - Dictionary forming method for character recognizing device - Google Patents

Dictionary forming method for character recognizing device

Info

Publication number
JPS63121991A
Authority
JP
Japan
Prior art keywords
dictionary
character
registered
arbitrary
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP61268042A
Other languages
Japanese (ja)
Inventor
Masahiro Nakamura
昌弘 中村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP61268042A
Publication of JPS63121991A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To make it possible to also recognize documents written in special character types other than those printed on a dictionary-creation document, by specifying an arbitrary character to be registered in the dictionary from an input character string, extracting the feature amount of the specified character, and registering it in the dictionary. CONSTITUTION: An arbitrary document on the user's side is read and an arbitrary character is specified. Its feature amount is extracted and registered in the dictionary. If a feature amount is already registered, the arithmetic mean of the new amount and the registered parameters is taken and re-registered, or the registered amount is replaced with the new one. Thus no special document needs to be prepared to generate a dictionary, and a dictionary of arbitrary characters can be generated and registered easily and quickly. Moreover, addition to a registered dictionary, replacement, and new registration can be performed easily. Further, a multi-font dictionary can be formed and registered very easily.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a method for creating a feature dictionary used in recognition processing in a character recognition device such as an optical character reader (OCR).

[Prior Art]

In recent years, with the development of personal computers and the like, it has become possible to implement character recognition processing such as OCR in software on a personal computer. FIG. 7 shows a typical system configuration. The scanner 11 optically reads the character string on the document and converts it into a binary pattern in which white pixels are "0" and black pixels are "1".

The personal computer 12 first cuts the binary pattern read by the scanner 11 into individual characters and extracts the feature amount of each. There are various methods for extracting feature amounts. One method, for example, is to assign direction codes to the contour of the character pattern, divide the character pattern into a plurality of blocks, and take a histogram per direction code for each block. In this case, if the pattern is divided into 16 blocks and 8 direction codes are used, the feature amount (feature parameters) is expressed in 128 dimensions.
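The block-wise direction histogram described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how direction codes are assigned, so the sketch quantizes a simple local gradient into 8 directions, and the name `direction_histogram` and its parameters are hypothetical.

```python
import math

def direction_histogram(pattern, grid=4, directions=8):
    """Return a grid*grid*directions feature vector for a binary pattern.

    pattern: list of equal-length rows containing 0 (white) or 1 (black).
    With grid=4 and directions=8 the result has 16 * 8 = 128 elements.
    """
    h, w = len(pattern), len(pattern[0])

    def px(y, x):
        # Pixels outside the pattern are treated as white.
        return pattern[y][x] if 0 <= y < h and 0 <= x < w else 0

    features = [0] * (grid * grid * directions)
    for y in range(h):
        for x in range(w):
            if not pattern[y][x]:
                continue
            # Assumed coding rule: central-difference gradient of the binary image.
            gx = px(y, x + 1) - px(y, x - 1)
            gy = px(y + 1, x) - px(y - 1, x)
            if gx == 0 and gy == 0:
                continue  # no local gradient (e.g. interior pixel): not coded
            angle = math.atan2(gy, gx) % (2 * math.pi)
            code = int(round(angle / (2 * math.pi / directions))) % directions
            # Block index from the pixel position on a grid x grid partition.
            block = (y * grid // h) * grid + (x * grid // w)
            features[block * directions + code] += 1
    return features
```

Calling `direction_histogram` with the default arguments yields a 128-element vector, matching the 16-block, 8-direction case mentioned in the text.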

Next, the personal computer 12 performs a distance calculation (matching) between the extracted feature amount and the corresponding feature amounts of the dictionary registered in advance in the dictionary file 13, and determines candidate characters. The recognition result is output to the display device 14 or the printer 15.
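A minimal sketch of this matching step is shown below. The distance measure is not specified in the text; a city-block distance is assumed here purely for illustration, and `rank_candidates` and the dictionary layout (character mapped to feature vector) are hypothetical.

```python
def city_block_distance(a, b):
    """Sum of absolute differences between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def rank_candidates(features, dictionary, top_k=3):
    """Return up to top_k characters ordered by increasing distance.

    dictionary: mapping of character -> registered feature vector.
    """
    scored = sorted(
        (city_block_distance(features, ref), ch) for ch, ref in dictionary.items()
    )
    return [ch for _, ch in scored[:top_k]]
```

The first element of the returned list would be the best candidate; the remaining elements could serve as lower-ranked candidates.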

Conventionally, such an OCR dictionary has been created as shown in FIG. 8. A dictionary-creation document on which the same character is printed several times in succession is specially prepared. First, one line of the document is read by the scanner and stored in the RAM of the personal computer. Next, character cutting is repeated, the feature amount is extracted for each of the identical character patterns, and their average is registered in the dictionary file. This is done for each line of the dictionary-creation document.
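The averaging over one line of identical characters can be sketched as below. The function name `average_features` is hypothetical, and the sketch assumes each sample is a feature vector of the same length.

```python
def average_features(samples):
    """Return the element-wise mean of several equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]
```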

However, a dictionary created by this method can be applied only when the character type of the document to be recognized matches that of the dictionary-creation document. Documents containing arbitrary character types, modified characters, and the like cannot be made recognition targets, which is inconvenient.

[Purpose]

An object of the present invention is to provide a dictionary creation method that also allows documents written in special character types, other than the character types printed on the dictionary-creation document, to be freely made recognition targets.

[Constitution]

The present invention reads an arbitrary document that the user has at hand, specifies an arbitrary character in it, extracts the feature amount of the specified character, and registers it in the dictionary. If a feature amount is already registered, the new one can be re-registered after taking its arithmetic mean with the registered parameters, or it can be registered in place of the registered one.

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 shows an example of the overall processing flow of the dictionary creation method of the present invention. First, the character string of an arbitrary document such as that shown in FIG. 2(A) is read by the scanner 11, converted into a binary pattern in which white pixels are "0" and black pixels are "1", and input to the personal computer 12 (step 101). The personal computer 12 stores the input character string pattern in the page memory in the main unit and displays it on the display device 14 (step 102). The user looks at the content displayed on the display device 14, selects one character to be registered in the dictionary, and enters its row number and column number from, for example, the keyboard attached to the personal computer 12 (step 103).

For example, for the document of FIG. 2(A), the user enters "1" and "6" for the "m" of "aim", and "2" and "2" for the "m" of "important". A light pen, a mouse, or the like can also be used as the input means instead of the keyboard.

The personal computer 12 cuts out one line from the character string pattern in the page memory (step 104) and checks whether the cut-out line is the specified line (step 105). If it is not, processing proceeds to cutting out the next line. When the line containing the character to be registered is found, processing proceeds to character cutting (step 106). The character string in that line is cut out one character at a time, and each cut-out character is checked to see whether it is at the specified position (column) (step 107). If it is not, processing proceeds to cutting out the next character. When the target character has been cut out in this way, its character pattern is displayed on the display device 14, and its feature amount (feature parameters) is extracted and displayed as well (step 108). FIG. 2(B) shows an example of this display. The user looks at the displayed character and feature parameters and decides whether to register them (step 109). If registration is specified, they are registered in the dictionary file 13 (step 110).
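Steps 104 to 107 can be illustrated with the sketch below. The text does not say how lines and characters are cut out, so projection profiles (blank pixel rows separate lines, blank pixel columns separate characters) are assumed here. The sketch also indexes the requested row and column directly rather than looping with a check at each step, and it does not count blank spaces as characters. The names `runs` and `cut_character` are hypothetical.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def cut_character(page, row, col):
    """Return the binary pattern at 1-based (row, col), or None if absent.

    page: list of equal-length pixel rows containing 0 (white) or 1 (black).
    """
    # Assumed line cutting: text lines are runs of pixel rows containing black.
    line_spans = runs([sum(pixel_row) for pixel_row in page])
    if not 1 <= row <= len(line_spans):
        return None
    top, bottom = line_spans[row - 1]
    line = page[top:bottom]
    # Assumed character cutting: characters are runs of non-blank pixel columns.
    char_spans = runs([sum(pixel_col) for pixel_col in zip(*line)])
    if not 1 <= col <= len(char_spans):
        return None
    left, right = char_spans[col - 1]
    return [pixel_row[left:right] for pixel_row in line]
```

The returned pattern could then be passed to a feature extractor and shown to the user for confirmation before registration.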

FIG. 3 shows an example of the dictionary structure. The dictionary entry corresponding to one character consists of a character code, a dictionary number (used to distinguish character types and the like), a count (the number of characters used to create the entry), and N (N-dimensional) feature parameters.
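One way to model such an entry in code is sketched below. The field names and types are assumptions made for illustration; the text only lists the four components.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DictionaryEntry:
    char_code: int      # character code
    dict_number: int    # distinguishes character types (for example, fonts)
    count: int          # number of characters used to create this entry
    features: List[float] = field(default_factory=list)  # N feature parameters
```

For the 16-block, 8-direction example given earlier, `features` would hold 128 values.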

Next, a few embodiments of dictionary registration will be described.

FIG. 4 shows an example of the dictionary registration processing flow for the case where the same character as the one specified for registration is already registered in the dictionary. The user specifies dictionary registration and enters the character code to be registered. The personal computer checks whether this character code is already registered. If it is not, the extracted feature parameters are registered in the dictionary together with the entered character code, and the count is set to 1. If the character code is already registered, the count in the corresponding dictionary entry is read out and, with that value as n, the feature amount is computed according to the averaging expression given in the original publication, and the result is re-registered in the dictionary. The count then becomes n+1.
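The expression itself does not survive in this text. One expression consistent with the description (an arithmetic mean in which the count goes from n to n+1) is the running mean, new = (n * old + x) / (n + 1), applied to each feature parameter. The sketch below assumes that form; it is not confirmed by the source, and the name `running_mean_update` is hypothetical.

```python
def running_mean_update(registered, count, new_features):
    """Return (updated_features, updated_count) under the assumed running mean.

    registered: feature parameters already in the dictionary (mean of `count` samples)
    new_features: feature parameters extracted from the newly specified character
    """
    updated = [
        (count * p + x) / (count + 1) for p, x in zip(registered, new_features)
    ]
    return updated, count + 1
```

With this form, registering the same character repeatedly gives the same result as averaging all of its samples at once.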

FIG. 5 shows the processing flow of another embodiment of dictionary registration. In this example, when a plurality of feature amounts exist for one character code, as in a multi-font dictionary, the target template is searched for by specifying the dictionary number and is then registered.

The registration procedure is basically the same as in FIG. 4.
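The template search by dictionary number can be sketched as a lookup on the pair of character code and dictionary number. The entry layout (a list of plain dicts) and the name `find_template` are assumptions for illustration only.

```python
def find_template(dictionary, char_code, dict_number):
    """Return the entry matching both character code and dictionary number, or None.

    dictionary: list of dicts with keys 'char_code', 'dict_number', 'count', 'features'.
    """
    for entry in dictionary:
        if entry["char_code"] == char_code and entry["dict_number"] == dict_number:
            return entry
    return None
```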

FIG. 6 shows an example of a processing flow in which, when the same character as the one to be registered is already registered, the user specifies whether to re-register by arithmetic mean as in FIG. 4, to register by replacement, or to register as a new entry, so that the registration condition can be changed freely according to the nature of the dictionary for the character being registered. The registration condition may be specified at the start of dictionary registration processing, for example by setting the condition as a number from a menu, storing it in memory, and checking it according to the flow of FIG. 6. Alternatively, it may be decided interactively with the personal computer immediately before the actual registration.
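The three registration conditions can be sketched as a single function with a mode argument. Several details are assumptions not stated in the text: the dictionary is keyed by the pair of character code and dictionary number, replacement resets the count to 1, new registration takes the next unused dictionary number, and averaging uses the running mean. The name `register` and the mode strings are hypothetical.

```python
def register(dictionary, char_code, features, mode, dict_number=1):
    """Register features and return the key that was written.

    dictionary: maps (char_code, dict_number) -> [count, features].
    mode: 'average', 'replace' or 'new'; only consulted when the key already exists.
    """
    key = (char_code, dict_number)
    if key not in dictionary:
        # Not yet registered: store as is, with a count of 1.
        dictionary[key] = [1, list(features)]
    elif mode == "average":
        n, old = dictionary[key]
        mean = [(n * p + x) / (n + 1) for p, x in zip(old, features)]
        dictionary[key] = [n + 1, mean]
    elif mode == "replace":
        # Assumption: the count restarts at 1 after replacement.
        dictionary[key] = [1, list(features)]
    elif mode == "new":
        # Assumption: keep the old template and use the next free dictionary number.
        used = [d for c, d in dictionary if c == char_code]
        key = (char_code, max(used) + 1)
        dictionary[key] = [1, list(features)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return key
```

The mode could come from a menu selection stored at the start of processing or from a prompt just before registration, as the text describes.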

[Effect]

As is clear from the above description, the present invention provides the following effects.

(1) A dictionary of arbitrary characters can be created and registered easily and quickly, without preparing a special document for dictionary creation.

(2) Addition to an already registered dictionary, replacement, and new registration can be performed easily.

(3) Creating and registering a multi-font dictionary is extremely easy.

[Brief Description of the Drawings]

FIG. 1 is a processing flowchart of an embodiment of the dictionary creation method of the present invention. FIG. 2 shows an example of a document, a character to be registered, and an example of its feature parameters. FIG. 3 shows an example of the dictionary structure. FIGS. 4 to 6 are processing flowcharts of dictionary registration. FIG. 7 shows an example of an OCR system configuration. FIG. 8 illustrates the conventional dictionary creation method.

11: scanner, 12: personal computer, 13: dictionary file, 14: display device, 15: printer.

Figures: FIG. 1, FIG. 2(A), FIG. 2(B), FIG. 3, FIG. 5, FIG. 6, FIG. 7, FIG. 8

Claims (2)

[Claims]

(1) A dictionary creation method for a character recognition device that cuts out an input character string one character at a time and recognizes each cut-out character using a dictionary prepared in advance, the method being characterized by specifying, from the input character string, an arbitrary character to be registered in the dictionary, extracting the feature amount of the specified character, and registering it in the dictionary.

(2) The dictionary creation method for a character recognition device according to claim 1, characterized in that, when the character to be registered is already registered in the dictionary, the arithmetic mean of the feature parameters of the extracted character and the already registered parameters is taken and re-registered.
JP61268042A 1986-11-11 1986-11-11 Dictionary forming method for character recognizing device Pending JPS63121991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61268042A JPS63121991A (en) 1986-11-11 1986-11-11 Dictionary forming method for character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61268042A JPS63121991A (en) 1986-11-11 1986-11-11 Dictionary forming method for character recognizing device

Publications (1)

Publication Number Publication Date
JPS63121991A true JPS63121991A (en) 1988-05-26

Family

ID=17453069

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61268042A Pending JPS63121991A (en) 1986-11-11 1986-11-11 Dictionary forming method for character recognizing device

Country Status (1)

Country Link
JP (1) JPS63121991A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07134750A (en) * 1993-11-11 1995-05-23 Nec Corp Document image recognizing device

Similar Documents

Publication Publication Date Title
EP0434930B1 (en) Editing text in an image
EP0439951B1 (en) Data processing
US6903751B2 (en) System and method for editing electronic images
JP2713622B2 (en) Tabular document reader
RU2437152C2 (en) Device to process images, method and computer programme to process images
JP3452774B2 (en) Character recognition method
US20040213458A1 (en) Image processing method and system
US7548916B2 (en) Calculating image similarity using extracted data
JPS63155386A (en) Document data reader
JP2001167131A (en) Automatic classifying method for document using document signature
JP2004252843A (en) Image processing method
CN104636428A (en) Trademark recommendation method and device
CN111401099A (en) Text recognition method, device and storage medium
US6535652B2 (en) Image retrieval apparatus and method, and computer-readable memory therefor
CN115828874A (en) Industry table digital processing method based on image recognition technology
CN116682118A (en) Ancient character recognition method, system, terminal and medium
JPS63121991A (en) Dictionary forming method for character recognizing device
Rao et al. Script identification of telugu, english and hindi document image
JP2582611B2 (en) How to create a multi-font dictionary
JP3014123U (en) Character recognition device
JP2537973B2 (en) Character recognition device
JP2662404B2 (en) Dictionary creation method for optical character reader
JP4328511B2 (en) Pattern recognition apparatus, pattern recognition method, program, and storage medium
JPS63155385A (en) Optical character reader
Ozaki Column segmentation by white space pattern matching