JPS63121991A

JPS63121991A - Dictionary forming method for character recognizing device

Info

Publication number: JPS63121991A
Application number: JP61268042A
Authority: JP
Inventors: Masahiro Nakamura; 昌弘中村
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-11-11
Filing date: 1986-11-11
Publication date: 1988-05-26

Abstract

PURPOSE: To make it possible to recognize documents written in special character types other than those printed on a dictionary-creation document, by specifying from an input character string an arbitrary character to be registered in the dictionary, extracting the feature amount of the specified character, and registering it in the dictionary. CONSTITUTION: An arbitrary document on the user's side is read, and an arbitrary character in it is specified. Its feature amount is extracted and registered in the dictionary. If a feature amount is already registered, either the arithmetic mean of the new feature amount and the registered parameters is taken and re-registered, or the registered feature amount is replaced with the new one. Thus a dictionary of arbitrary characters can be created and registered easily and quickly without preparing a special document for dictionary creation. Moreover, addition to, replacement of, and new registration in a registered dictionary are easy. Furthermore, a multi-font dictionary can be created and registered very easily.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a method for creating the feature dictionary used in recognition processing in a character recognition device such as an optical character reader (OCR).

[Prior art]

In recent years, with the development of personal computers and the like, it has become possible to implement character recognition processing such as OCR in software on a personal computer. FIG. 7 shows the general system configuration. The scanner 11 optically reads the character string on the manuscript and converts it into a binary pattern in which white pixels are "0" and black pixels are "1". The personal computer 12 first cuts the binary pattern read by the scanner 11 into individual characters and extracts the feature amount of each. There are various methods for extracting feature amounts. One method, for example, is to assign specific direction codes to the contour of a character pattern, divide the character pattern into multiple blocks, and take a histogram of the direction codes for each block. In this case, if the number of blocks is 16 and the number of direction codes is 8, the feature amount (feature parameters) is expressed in 16 × 8 = 128 dimensions.
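The block-wise direction histogram described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how direction codes are assigned, so the sketch assumes that a contour pixel is a black pixel with at least one white 4-neighbour and that a direction code 0 to 7 is counted for every 8-neighbour that is also a contour pixel. The names `DIRECTIONS`, `is_contour`, and `extract_features` are hypothetical.

```python
# Assumed direction codes 0..7, given as (dy, dx) offsets to the 8 neighbours.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def pixel(img, y, x):
    """Return the pixel value; everything outside the image counts as white (0)."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0


def is_contour(img, y, x):
    """Assumption: a black pixel is on the contour if any 4-neighbour is white."""
    if pixel(img, y, x) != 1:
        return False
    four = [(0, 1), (-1, 0), (0, -1), (1, 0)]
    return any(pixel(img, y + dy, x + dx) == 0 for dy, dx in four)


def extract_features(img, grid=4):
    """Return a grid*grid*8 histogram (128 values for the default 4 x 4 grid)."""
    h, w = len(img), len(img[0])
    features = [0] * (grid * grid * len(DIRECTIONS))
    for y in range(h):
        for x in range(w):
            if not is_contour(img, y, x):
                continue
            # Index of the block (0 .. grid*grid - 1) that contains this pixel.
            block = (y * grid // h) * grid + (x * grid // w)
            for code, (dy, dx) in enumerate(DIRECTIONS):
                if is_contour(img, y + dy, x + dx):
                    features[block * len(DIRECTIONS) + code] += 1
    return features
```

With 16 blocks and 8 codes the returned list has 128 entries, matching the dimensionality stated in the text. A real system would normalise the character size before extraction, which this sketch omits.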

Next, the personal computer 12 performs a distance calculation (matching) between the extracted feature amount and the corresponding feature amounts of the dictionary registered in advance in the dictionary file 13, and determines candidate characters. The recognition result is output to the display device 14 or the printer 15.
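The matching step can be illustrated with a short sketch. The patent does not name the distance measure, so squared Euclidean distance is assumed here, and the function names `distance` and `match` as well as the dictionary layout (character mapped to feature list) are hypothetical.

```python
def distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match(features, dictionary, top_k=3):
    """Return up to top_k candidate characters, nearest dictionary entry first."""
    ranked = sorted(dictionary.items(), key=lambda item: distance(features, item[1]))
    return [char for char, _ in ranked[:top_k]]
```

For example, with a toy two-dimensional dictionary `{"a": [0, 0], "b": [3, 4], "c": [1, 1]}`, the query `[0.9, 0]` ranks "a" ahead of "c", and "b" last.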

Conventionally, such an OCR dictionary has been created as shown in FIG. 8. A dictionary-creation manuscript, on which the same character is printed several times in succession, is specially prepared. First, one line of this manuscript is read with the scanner and stored in the RAM of the personal computer. Next, single-character segmentation is repeated, the feature amount is extracted for each of the identical character patterns, and the average of these feature amounts is registered in the dictionary file. This is done for each line of the dictionary-creation manuscript.

However, a dictionary created by this method can be applied only when the character type of the manuscript to be recognized matches that of the dictionary-creation manuscript. It is inconvenient in that manuscripts containing arbitrary character types, deformed characters, and the like cannot be made recognition targets.

[Purpose]

An object of the present invention is to provide a dictionary creation method that allows manuscripts written in special character types, other than the character types printed on a dictionary-creation manuscript, to be freely made recognition targets.

[Configuration]

The present invention reads an arbitrary manuscript that the user has at hand, has the user specify an arbitrary character in it, extracts the feature amount of the specified character, and registers it in the dictionary. In this case, if a feature amount is already registered, the new feature amount can be re-registered after taking, for example, the arithmetic mean with the registered parameters, or it can be registered in place of the existing one.

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 shows an example of the overall processing flow of the dictionary creation method of the present invention. First, a character string of an arbitrary manuscript, such as the one shown in FIG. 2(A), is read by the scanner 11, converted into a binary pattern in which white pixels are "0" and black pixels are "1", and input to the personal computer 12 (step 101). The personal computer 12 stores this input character string pattern in the page memory in its main body and also displays it on the display device 14 (step 102). The user looks at the content displayed on the display device 14, selects one character to be registered in the dictionary, and inputs its row number and column number, for example from the keyboard attached to the personal computer 12 (step 103).

For example, for the manuscript in FIG. 2(A), the user enters "1" and "6" for the "m" of "aim", or "2" and "2" for the "m" of "important". A light pen, a mouse, or the like may also be used as the input means instead of the keyboard.

The personal computer 12 cuts out one line from the character string pattern in the page memory (step 104). It then checks whether the cut-out line is the specified line (step 105); if not, it proceeds to cut out the next line. When the line containing the character to be registered is found, processing proceeds to character segmentation (step 106). Character segmentation cuts out the characters of that line one at a time and checks whether each cut-out character is at the specified position (column) (step 107); if not, it proceeds to cut out the next character. When the target character has been cut out in this way, its character pattern is displayed on the display device 14, and its feature amount (feature parameters) is extracted and displayed as well (step 108). An example of this display is shown in FIG. 2(B). The user looks at the displayed character and feature parameters and decides whether to register them (step 109). When registration is specified, they are registered in the dictionary file 13 (step 110).
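The search for the specified character in steps 104 to 107 can be sketched as below. This is a simplified illustration, not the patent's implementation: each line is assumed to be available as a text string, and skipping whitespace stands in for real character segmentation of the binary pattern. The function name `select_character` and the sample lines are hypothetical and are not the content of FIG. 2(A).

```python
def select_character(lines, row, col):
    """Return the character at the 1-based (row, col) position, or None if absent."""
    for row_no, line in enumerate(lines, start=1):    # step 104: cut out one line
        if row_no != row:                             # step 105: is it the specified line?
            continue
        # Step 106: stand-in for segmentation, one non-space character at a time.
        chars = [c for c in line if not c.isspace()]
        for col_no, ch in enumerate(chars, start=1):
            if col_no == col:                         # step 107: is it the specified column?
                return ch
    return None
```

With the hypothetical lines `["The aim of", "image input"]`, the call `select_character(lines, 1, 6)` returns the "m" of "aim", because spaces are not counted as character positions.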

FIG. 3 shows an example of the structure of the dictionary. The dictionary entry for one character consists of a character code, a dictionary number (used to distinguish character types and the like), a count (the number of characters used to create the entry), and N feature parameters (N dimensions).
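One way to picture the entry layout of FIG. 3 is as a small record type. The sketch below is only an illustration; the class name `DictionaryEntry`, the field names, and the Python types are assumptions, since the patent gives no field widths or encodings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictionaryEntry:
    char_code: int      # character code of the registered character
    dict_number: int    # dictionary number, used to tell character types (fonts) apart
    count: int          # number of character samples used to build this entry
    features: List[float] = field(default_factory=list)  # N feature parameters


# Example: an entry for "m" built from one sample with N = 128 parameters.
entry = DictionaryEntry(char_code=ord("m"), dict_number=1, count=1, features=[0.0] * 128)
```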

Next, a few embodiments of dictionary registration will be described.

FIG. 4 shows an example of the dictionary registration processing flow when the same character as the one specified for registration is already registered in the dictionary. The user specifies dictionary registration and inputs the character code to register. The personal computer checks whether this character code is already registered. If it is not, the extracted feature parameters are registered in the dictionary together with the input character code, and the count is set to 1. If it is already registered, the count in the corresponding dictionary entry is read out and, letting that value be n, the feature amount is computed according to the following formula:

[Formula not reproduced in the source text]

The result is re-registered in the dictionary. The count then becomes n+1.
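The formula itself does not survive in this text. One form consistent with the surrounding description (arithmetic averaging with the registered parameters, and the count going from n to n+1) is a running mean, new = (n × registered + extracted) / (n + 1). The sketch below assumes that form; it is not confirmed by the source. The function name `register` and the dictionary layout are hypothetical.

```python
def register(dictionary, char_code, features):
    """Register features for char_code, averaging with any existing entry.

    Assumed update rule (running mean): new = (n * old + x) / (n + 1).
    """
    entry = dictionary.get(char_code)
    if entry is None:
        # Unregistered character code: store the features as-is with count 1.
        dictionary[char_code] = {"count": 1, "features": list(features)}
        return
    n = entry["count"]
    entry["features"] = [
        (n * old + new) / (n + 1) for old, new in zip(entry["features"], features)
    ]
    entry["count"] = n + 1
```

Under this assumption the stored parameters always equal the plain mean of all samples registered so far, so the order of registration does not matter.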

FIG. 5 shows the processing flow of another embodiment of dictionary registration. In this example, when multiple feature amounts exist for one character code, as in a multi-font dictionary, the target template is found by specifying the dictionary number and is then registered.

The registration procedure is basically the same as in FIG. 4.

FIG. 6 is an example of a processing flow in which, when the same character as the one to be registered is already registered, the user specifies whether to re-register by arithmetic averaging as in FIG. 4, to register by replacement, or to register as a new entry, so that the registration condition can be changed freely according to the nature of the dictionary for the character being registered. The registration condition may be specified at the start of dictionary registration processing, for example by setting a condition number from a menu, storing it in memory, and checking it according to the flow of FIG. 6. Alternatively, it may be decided interactively with the personal computer immediately before the actual registration.
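The three registration conditions can be sketched as a simple dispatch. Several points here are assumptions rather than statements from the patent: "new registration" is taken to mean adding a further template for the same character code, averaging and replacement are applied to the first existing template, and averaging uses a running mean. The names `Mode` and `register_with_mode` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    AVERAGE = 1   # re-register by arithmetic averaging
    REPLACE = 2   # overwrite the registered template
    NEW = 3       # add a further template (assumed meaning of "new registration")


def register_with_mode(templates, features, mode):
    """Update the list of templates held for one character code and return it."""
    if mode is Mode.NEW or not templates:
        templates.append({"count": 1, "features": list(features)})
    elif mode is Mode.REPLACE:
        templates[0] = {"count": 1, "features": list(features)}
    else:
        entry = templates[0]
        n = entry["count"]
        # Running mean over the n samples seen so far plus the new one.
        entry["features"] = [
            (n * old + new) / (n + 1) for old, new in zip(entry["features"], features)
        ]
        entry["count"] = n + 1
    return templates
```

Storing the chosen mode once, as in the menu-based variant described above, would amount to passing the same `mode` value to every call.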

[Effects]

As is clear from the above description, the present invention provides the following effects.

(1) A dictionary of arbitrary characters can be created and registered easily and quickly, without preparing a special manuscript for dictionary creation.

(2) Addition to, replacement of, and new registration in an already registered dictionary can be performed easily.

(3) Creating and registering a multi-font dictionary is extremely easy.

[Brief Description of the Drawings]

FIG. 1 is a processing flowchart of one embodiment of the dictionary creation method of the present invention. FIG. 2 shows an example of a manuscript, a character to be registered, and an example of its feature parameters. FIG. 3 shows an example of the dictionary structure. FIGS. 4 to 6 are processing flowcharts of dictionary registration. FIG. 7 shows an example of an OCR system configuration. FIG. 8 illustrates the conventional dictionary creation method.

11: scanner, 12: personal computer, 13: dictionary file, 14: display device, 15: printer.

Claims

[Claims]

(1) A dictionary creation method for a character recognition device that cuts an input character string into individual characters and recognizes the cut-out characters using a dictionary prepared in advance, characterized by specifying, from the input character string, an arbitrary character to be registered in the dictionary, extracting the feature amount of the specified character, and registering it in the dictionary.

(2) The dictionary creation method for a character recognition device according to claim 1, characterized in that, when the character to be registered is already registered in the dictionary, the feature parameters of the extracted character are averaged with the already registered parameters and the result is re-registered.