JP2015069256A - Character identification system

Info

Publication number: JP2015069256A
Application number: JP2013200753A
Authority: JP
Inventors: Takeshi Nagasaki; Takashi Kawai; Motoaki Hirabayashi; Masayuki Ozawa; Junichi Matsuda; Shoichi Nakagami; Hidenori Taniguchi; Masakazu Fujio; Ryuji Mine (竜治嶺)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2013-09-27
Filing date: 2013-09-27
Publication date: 2015-04-13

Abstract

PROBLEM TO BE SOLVED: To provide an additional learning function that works from a small number of samples in an automatic recognition device and an automatic recognition service. SOLUTION: A character identification system includes: a character image input receiving unit that receives input of a sample character image; a character component extraction unit that extracts character components from the sample character image; a pseudo-character model generation unit that generates a pseudo-character model from the character components; and an identification dictionary generation unit that generates character identification patterns from the pseudo-character model and thereby generates an identification dictionary.

Description

The present invention relates to a system and related services for recognizing characters in documents, whether handwritten or printed, and in images such as videos and photographs.

As public interest in more efficient use of information grows, there is demand for technology that can accurately search for and organize useful information within the large volume of electronic documents stored on servers and on personally owned personal computers (PCs). The data stored in corporate information systems is growing by 50% to 60% per year, yet structured data held in databases is said to account for only about 20% of it; the remaining 80% is unstructured data such as document images, photographs, and videos. The vast body of unstructured documents accumulated inside companies and organizations is, in principle, an information asset of the organization, but at present it can hardly be said to be fully utilized.

To make use of this information, the character strings contained in paper documents, document images, photographs, and videos must be converted into character codes. The difficulty here is the handling of external characters (gaiji, characters outside the standard code set), which are used for variant forms of personal names, place names, and the like.

For example, under the Japanese Family Register Act and related laws and notices, about 50,000 character types are said to be usable in family registers. This is far more than the character code systems in common use, such as JIS (JIS X 0208, levels 1 and 2) and Shift JIS, so each company and local government has until now built its own external-character handling for data entry, search, and similar tasks in its own system. Even Unicode, which covers character code systems worldwide, differs in many respects from the character sets and glyph shapes used in family registers, so it cannot always be used as is. In recent years, character systems such as the Character Information Infrastructure, the family register unified characters, and the Basic Resident Register Network unified characters have been studied as attempts to standardize the kanji used in public administration, such as kanji for personal names. The character types covered differ from system to system, but these systems are designed to handle roughly 20,000 to 60,000 character types. Including character types used outside Japan increases the number further.

OCR devices convert the character strings contained in paper documents, document images, photographs, and videos into character codes. Patent Documents 1, 2, and 3 outline the general functions of an OCR device and the form-entry work that uses it. Patent Document 1 describes the basic processing flow inside an OCR device: when a form is read automatically, the character codes, character lines, ruled lines, frames, and so on that appear on the form are extracted, the specific areas of the form that require data entry are read, and the result is output to an external storage device as a text file. Patent Document 2 describes applying morphological analysis to OCR recognition results as a means of improving OCR reading accuracy. Patent Document 3 proposes character-string notation analysis using bottom-up parsing for handwritten digit strings. Each proposes a technique for improving the accuracy with which an OCR device reads data from paper documents or document images.

In short, the background of the present invention is that both the character recognition functions of OCR devices and character code systems for handling the exact glyph shapes of family register names are under development.

Patent Document 1: JP 06-52156 A
Patent Document 2: JP 05-108891 A
Patent Document 3: JP 2002-117374 A

OCR devices that discriminate and read character patterns in paper documents, images, and videos have also required careful design for handling external characters such as those in personal names. To register an external character in an OCR device, a special area of the character code space, called the user-defined area, is generally set aside, and a sample image of the external character is registered so that it can be read. The problem is that a small number of sample images does not give sufficient recognition accuracy.

The above problem is solved, for example, by a character identification system comprising: a character image input receiving unit that receives input of a sample character image; a character component extraction unit that extracts character components based on the sample character image; a pseudo-character model generation unit that generates a pseudo-character model based on the character components; and an identification dictionary generation unit that generates character identification patterns based on the pseudo-character model and generates an identification dictionary.

According to one embodiment of the present invention, preparing at least one sample image of an external character or a new character is enough for the system to learn the corresponding handwritten and printed character patterns, and to recognize the character with higher accuracy than an external-character recognition method that simply registers the sample image.

FIG. 1 is an overall view of the document recognition service constituting the present invention.
FIG. 2 shows the recognition dictionary generation device constituting the present invention.
FIG. 3 shows the recognition device that implements the recognition function of the automatic recognition device of the present invention.
FIG. 4 shows an example hardware configuration in the present invention.
FIG. 5 shows the input device portion of the hardware in the present invention.
FIG. 6 is a diagram of the document recognition process in the present invention.
FIG. 7 is a diagram of the character recognition process in the present invention.
FIG. 8 is a diagram showing the mechanism of autonomous learning in the present invention.
FIG. 9 is a diagram showing the mechanism of automatic differentiation in the present invention.
FIG. 10 shows a sample input document in the present invention.

Embodiments of the present invention are described below with reference to the drawings.

First, the overall appearance of the automatic recognition system to which the present invention is applied will be described. FIG. 1 shows a plurality of service-counter input devices 0101 connected to a recognition cloud or host computer 0103 via a network 0102.

FIG. 1 also shows the flow of work that uses a document recognition service built on a Web service, which is an application example of the present invention. A feature of this service is that the character recognition dictionary is improved, and recognition accuracy rises, as the work continues.

First, the user of the document recognition service registers the documents or characters to be read (process 1, 0110). Document images and character images can be entered with a computer (that is, shown on a display and entered with a mouse and keyboard), but this figure assumes the use of an electronic pen with special paper, or a group of documents (0101) digitized with a camera or scanner. The entered business documents are sent to the recognition cloud 0103 through the network 0102. Business documents collected from various regions and various writers are recorded in the database 0104 in the recognition cloud. Next, based on this document image data, the computer recognizes the required parts of each document (0105), using the character recognition dictionary 0106. The recognition result is presented to the user of the recognition service (process 2, 0111).

The recognition cloud has three main functional units. The first is the recognition unit 0105, which recognizes individual character patterns based on the document image data, using as a dictionary data that records the shapes and distributions of character patterns. The second is the analysis unit 0106, which identifies, among the recognition results presented to the user, patterns that were handled incorrectly and patterns that should be newly registered. The third is the learning unit, which modifies the character recognition dictionary so that the patterns identified by the analysis unit can be read, using a dictionary required for learning.

In recognition result presentation (process 2, 0111), the character recognition results are presented to the user. Depending on what is presented, the user gives feedback to the system as needed (process 3, 0112). Examples of feedback are a recognition correction, in which the user enters the correct character when the recognition result is wrong, and a character addition request, in which a character is added to the dictionary when the character to be recognized is an external character or the like that is not registered there. Based on this feedback, the learning unit (0108) identifies the parameters of the recognition dictionary that should be corrected, and the result is reflected in the recognition dictionary (0106) (process 4, 0113). When the learning result is reflected, the improvement proposals to be applied to the dictionary are fed back to the recognition cloud. In this way the learning results are fed back, the recognition dictionary is updated, and recognition becomes more accurate.

Whereas a typical OCR reads 3,000 to 4,000 character types, the present invention assumes recognition of tens of thousands of character types, including external characters. The first-rank accuracy (the rate at which the top candidate is correct) is therefore inevitably lower than that of a typical OCR. For this reason the system of the present invention considers not only the first-rank recognition rate but also the cumulative recognition rate, that is, the rate at which the correct answer lies within the top n candidates (n depends on the application; for example, 15 or 50). When the first-rank result is not correct, in the recognition correction of the feedback process (0112) the user enters the correct character by selecting it from the candidate characters ranked within the top n. This is more efficient than manually looking up the target character code among tens of thousands of character codes. To realize this feature, the present invention uses dictionaries that contribute to the first-rank accuracy and dictionaries that contribute to the cumulative recognition rate. How each dictionary is created is described below.
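The two metrics can be made concrete with a short sketch. The function name `cumulative_recognition_rate` and the toy candidate lists are illustrative assumptions, not part of the patent; the code only shows how the first-rank rate (n = 1) and the cumulative rate (larger n) relate.

```python
def cumulative_recognition_rate(ranked_candidates, truths, n):
    """Fraction of samples whose correct character is within the top n candidates."""
    if len(ranked_candidates) != len(truths):
        raise ValueError("need exactly one correct label per candidate list")
    if not truths:
        return 0.0
    hits = sum(
        1
        for candidates, truth in zip(ranked_candidates, truths)
        if truth in candidates[:n]
    )
    return hits / len(truths)


# Hypothetical recognizer output: one ranked candidate list per input character.
results = [["辺", "邊", "邉"], ["斉", "齊", "斎"], ["高", "髙", "嵩"]]
truths = ["邊", "斎", "高"]

first_rank_rate = cumulative_recognition_rate(results, truths, 1)
top3_rate = cumulative_recognition_rate(results, truths, 3)
```

In this toy data only one of three top candidates is correct, yet every correct character is within the top three, which is the situation in which selecting from a candidate list saves the user a manual code lookup.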

FIG. 2 shows a recognition dictionary generation device corresponding to the learning unit 0108 and the learning dictionary 0109 of FIG. 1. The storage device of the recognition dictionary generation device holds learning character patterns 0201, from which feature vectors are extracted (0202). When the recognition dictionary generation device and the recognition device run on the same hardware, the learning character patterns 0201 are assumed to be stored, for example, in the external auxiliary storage unit (0411) of FIG. 4. In general, the feature vectors obtained here are learned and recorded to form dictionary 1 (simple learning dictionary, 0204), which is used to identify character patterns such as newly registered external characters. Dictionary 1 (0204) contributes to the first-rank accuracy and is the basic dictionary for recognizing character patterns. It stores the feature vectors created from character patterns, their character codes, their weights, and so on. Because it records the features of the character patterns used for learning, it is used with simple identification algorithms such as the nearest neighbor method. However, when there are very many character types, as with variant forms of personal names, or when new glyph shapes may be added, it is difficult to prepare enough learning samples for every character type, so dictionary 1 alone does not give sufficient recognition accuracy.
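As a minimal sketch of how a dictionary of stored feature vectors could be used with the nearest neighbor method, the following ranks character codes by Euclidean distance. The three-dimensional feature vectors, the entry layout, and the name `nearest_neighbor_candidates` are assumptions made for illustration; the patent does not specify the feature extraction or the distance measure.

```python
import math


def nearest_neighbor_candidates(dictionary, query, k=3):
    """Rank character codes by Euclidean distance between the query and stored vectors.

    `dictionary` is a list of (character, feature_vector) entries. A character may
    have several entries; only its closest one counts.
    """
    best = {}
    for char, vector in dictionary:
        distance = math.dist(vector, query)
        if char not in best or distance < best[char]:
            best[char] = distance
    return sorted(best.items(), key=lambda item: item[1])[:k]


# Hypothetical dictionary 1 contents (toy feature vectors).
dictionary_1 = [
    ("高", (0.9, 0.1, 0.3)),
    ("髙", (0.8, 0.3, 0.3)),
    ("嵩", (0.2, 0.9, 0.6)),
]

ranked = nearest_neighbor_candidates(dictionary_1, (0.82, 0.28, 0.3), k=2)
ranked_chars = [char for char, _ in ranked]
```

With only one stored vector per variant, small differences in the query decide the ranking, which illustrates why few samples limit accuracy.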

An external character pattern A may have a glyph shape similar to that of an existing character pattern B. Similar characters are hard to tell apart, so recognition accuracy can be improved by clustering. A variant pattern clustering process (0205), which groups characters with similar patterns, reveals which similar characters are easily mistaken for one another. The result of clustering is a set of subsets of mutually similar characters. For example, given the characters {A, B, C, ...}, if clustering judges that O and Q are similar, the subset {O, Q} is obtained. This information is recorded as dictionary 2 (search complement dictionary, 0206), which contributes to the cumulative recognition rate. With this dictionary, when "O" is recognized, "Q" can be added to the recognition candidates in anticipation of misrecognition.
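The candidate expansion described above can be sketched as follows. The clustering step itself is not shown; the clusters are given as input, and the names `build_confusion_index` and `complement_candidates` are hypothetical.

```python
def build_confusion_index(clusters):
    """Map each character to the other members of its similarity cluster."""
    index = {}
    for cluster in clusters:
        for char in cluster:
            index.setdefault(char, set()).update(c for c in cluster if c != char)
    return index


def complement_candidates(candidates, index):
    """Append look-alike characters after the recognizer's own candidates."""
    expanded = list(candidates)
    seen = set(expanded)
    for char in candidates:
        # Sorted only to make the output order deterministic.
        for similar in sorted(index.get(char, ())):
            if similar not in seen:
                expanded.append(similar)
                seen.add(similar)
    return expanded


# Hypothetical clustering output, following the {O, Q} example in the text.
clusters = [{"O", "Q"}, {"I", "l", "1"}]
confusion_index = build_confusion_index(clusters)
expanded = complement_candidates(["O", "C"], confusion_index)
```

The look-alikes are appended after the original candidates, so the first-rank result is unchanged and only the cumulative rate can benefit.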

In addition, the input character pattern image is decomposed into components (0207) by operations such as taking vertical and horizontal projections, obtaining connected components of black pixels, and computing whether another character pattern B matches part of the character pattern A. The result is recorded as character pattern component decomposition data (0208). This data records, for example, that a left-side radical (hen) or right-side component (tsukuri) of character pattern A resembles part of another character pattern B, and it is recorded as dictionary 3 (search complement dictionary, 0209). Dictionary 3 contributes to the cumulative recognition rate and stores the hen and tsukuri information for each character pattern. For example, in 和 the components nogihen (禾) and kuchi (口) are placed on the left and right, respectively. The same information can also be stored in the inverse form: the list of kanji containing nogihen includes 和, with the component located on the left side of the character. This dictionary enables interactive character code search in the feedback process (0112). For example, as soon as the user writes kusakanmuri (the grass radical) with a mouse or digital pen, character code candidates containing it can be presented. Likewise, when the character 金 is entered, the system can assume that the user is searching for kanji with the kanehen radical in general and present character code candidates that contain it.
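The two storage forms mentioned for dictionary 3 (per-character decomposition, and the inverse list of characters per component) can be sketched as a forward table and an inverted index. The decomposition entries, position labels, and function names here are illustrative assumptions; the image-based decomposition step is not shown.

```python
from collections import defaultdict

# Forward form: character -> list of (component, position). Toy entries only.
decomposition = {
    "和": [("禾", "left"), ("口", "right")],
    "秋": [("禾", "left"), ("火", "right")],
    "花": [("艹", "top"), ("化", "bottom")],
    "草": [("艹", "top"), ("早", "bottom")],
}


def build_component_index(decomposition):
    """Inverse form: component -> list of (character, position)."""
    index = defaultdict(list)
    for char, parts in decomposition.items():
        for component, position in parts:
            index[component].append((char, position))
    return index


def search_by_component(index, component, position=None):
    """Return characters containing the component, optionally at a given position."""
    return [
        char
        for char, pos in index.get(component, [])
        if position is None or pos == position
    ]


component_index = build_component_index(decomposition)
nogihen_chars = search_by_component(component_index, "禾")
kusakanmuri_chars = search_by_component(component_index, "艹", position="top")
```

An interactive search could call `search_by_component` as soon as a written component is recognized and show the returned characters as candidates.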

Pseudo-character model generation (0211) is further performed on the character pattern components. Pseudo-character model generation produces, from one or a few samples, the large number of character patterns needed for statistical learning, for characters that are not registered in the dictionary or that have a low recognition rate. The handwritten character pattern components (0210) store in advance the components that make up character patterns: hen and tsukuri such as nogihen and mongamae, and element kanji such as 日 and 口 that often appear as parts of characters. In pseudo-pattern synthesis (0211), the stroke order is estimated, if necessary, from the character pattern to be learned, and for each group of strokes that forms a connected component or a run in the time series, the dictionary (0210) is searched for a handwritten component corresponding to that part. When a corresponding component is found, handwritten component patterns are composited at that location, so a large number of pseudo-patterns can be synthesized combinatorially. Even when no corresponding handwritten component exists, the stroke order has been estimated from the input pattern, so deformation can be applied stroke by stroke or to partial patterns that appear to form a group such as a hen or tsukuri. This yields non-linear deformations that cannot be obtained by applying an affine transformation or a barrel transformation to the whole character; the resulting variation in feature space increases, and more accurate character recognition becomes possible.
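The combinatorial synthesis with per-component deformation can be sketched as below. This is a simplified illustration under stated assumptions: strokes are lists of (x, y) points, each component slot already has several stored handwritten variants, and the deformation is a random scale and shift applied independently to each component. Stroke order estimation and the lookup of matching components are not shown, and all names are hypothetical.

```python
import itertools
import random


def transform(strokes, scale, dx, dy):
    """Scale and shift every point of every stroke in one component."""
    return [[(x * scale + dx, y * scale + dy) for x, y in stroke] for stroke in strokes]


def synthesize(slots, variants_per_slot, jitter_count=2, seed=0):
    """Combine stored variants of each component, deforming each one independently."""
    rng = random.Random(seed)
    patterns = []
    for combo in itertools.product(*(variants_per_slot[slot] for slot in slots)):
        for _ in range(jitter_count):
            pattern = []
            for strokes in combo:
                scale = rng.uniform(0.9, 1.1)
                dx = rng.uniform(-0.05, 0.05)
                dy = rng.uniform(-0.05, 0.05)
                pattern.extend(transform(strokes, scale, dx, dy))
            patterns.append(pattern)
    return patterns


# Toy data: three variants of a two-stroke left component, two of a one-stroke right one.
left_variants = [
    [[(0.1, 0.1 + 0.01 * i), (0.1, 0.9)], [(0.0, 0.4), (0.3, 0.4)]] for i in range(3)
]
right_variants = [[[(0.6, 0.3 + 0.02 * i), (0.9, 0.3), (0.9, 0.7)]] for i in range(2)]

patterns = synthesize(
    ["left", "right"], {"left": left_variants, "right": right_variants}
)
```

Because each component receives its own scale and shift, the combined character is deformed in a way that no single affine transformation of the whole character would produce.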

Feature vectors are extracted from the character patterns generated here (0212), and higher-order classifier learning (0213) is performed on the many resulting patterns. Support vector machines, multilayer neural networks, parametric statistical discrimination, and similar methods can be used for higher-order classifier learning. Because pseudo-character pattern generation produces characters in a wide variety of patterns, it becomes possible to train higher-order classifiers, which generally give high recognition accuracy. Dictionary 4 (higher-order identification dictionary, 0214) is generated by higher-order classifier learning and contributes to the first-rank recognition rate. To identify an added external character pattern, dictionary 4 stores the result of statistical learning on pseudo-character patterns generated from one or a few samples. On the other hand, because the character patterns are synthesized artificially, they do not necessarily reproduce the deformations a human writer would produce. For this reason the data is not added to the character identification dictionary (0204); instead, the character identification dictionary and the higher-order identification dictionary (0214) must be managed separately and the recognition results of each used.

Thus, an automatic recognition device with the additional learning function provided by dictionary 4 generates a pseudo-generation model of character patterns from a small number of sample images presented as external characters and learns from it, so that recognition accuracy can be improved even from a few sample patterns.

The recognition process of FIG. 3 is an example of the processing of the recognition unit 0105 of FIG. 1 and shows how recognition is performed using the dictionaries generated by the recognition dictionary generation device described above. First, when an unknown character pattern to be recognized (0301) is input, recognition is performed using dictionary 1 (simple learning dictionary, 0204) and dictionary 4 (higher-order identification dictionary, 0214) separately (0302). The simple learning dictionary, for example, stores the feature vectors obtained from the learning character patterns 0201 as they are, and recognition candidates are computed by nearest-neighbor distance. The higher-order identification dictionary stores either a large number of feature vectors extracted from the handwritten character patterns created automatically through pseudo-character model generation, or parameters learned from those patterns, such as quadratic discriminant function parameters or support vectors, and it identifies patterns that are difficult to discriminate with the simple learning dictionary. Pseudo-model generation allows learning from few samples and enables more accurate identification.

Because several identification dictionaries are used, multiple lists of recognition candidates are obtained. They can be merged into a single ranking by, for example, alternating between the lists by rank or sorting by recognition likelihood (computed from the distance between the unknown pattern and the center of each category in feature space). Combining the two identification dictionaries in this way raises the probability that the correct candidate appears near the top. Next, the character candidates are complemented using the recognition results obtained here. Candidates can be complemented by using dictionary 2 (0206) to add characters that are easily misrecognized, or by using dictionary 3 (0209) to add characters that are partially similar, for example in their hen or tsukuri. Adding candidates does not improve the first-rank accuracy, but it has the advantage of improving the cumulative recognition rate. This process yields a more accurate character recognition result (0304).
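The two merging strategies named above, alternating by rank and sorting by a distance-based likelihood, can be sketched as follows. The function names and the toy scores are assumptions; in particular, sorting by distance presumes that the distances reported by the two dictionaries are on a comparable scale, which the patent does not discuss.

```python
def merge_by_interleaving(list_a, list_b):
    """Alternate between two ranked candidate lists, skipping duplicates."""
    merged, seen = [], set()
    for i in range(max(len(list_a), len(list_b))):
        for candidates in (list_a, list_b):
            if i < len(candidates) and candidates[i] not in seen:
                merged.append(candidates[i])
                seen.add(candidates[i])
    return merged


def merge_by_distance(scored_a, scored_b):
    """Keep the smallest distance per character and sort ascending (closer is better)."""
    best = {}
    for char, distance in list(scored_a) + list(scored_b):
        if char not in best or distance < best[char]:
            best[char] = distance
    return sorted(best, key=lambda char: best[char])


# Hypothetical (character, distance) results from dictionary 1 and dictionary 4.
from_dictionary_1 = [("高", 0.20), ("髙", 0.35)]
from_dictionary_4 = [("髙", 0.10), ("嵩", 0.50)]

interleaved = merge_by_interleaving(
    [char for char, _ in from_dictionary_1],
    [char for char, _ in from_dictionary_4],
)
by_distance = merge_by_distance(from_dictionary_1, from_dictionary_4)
```

The two strategies can order the same candidates differently, as they do here, so the choice affects the first-rank result even though the set of candidates is the same.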

FIG. 4 shows the document recognition system implemented by a form processing computer 0400. The document recognition system here is one form of the recognition service and recognition system that uses the recognition cloud shown at 0103 in FIG. 1. The form processing computer 0400 includes an image input unit 0403, an input unit 0405, an output unit 0406, a communication unit 0407, a control unit 0408, an auxiliary storage unit 0409, a storage unit 0410, and an external auxiliary storage unit 0411, which are connected to one another via an internal bus 0412. A form image read by a scanner 0402 is input to the image input unit 0403; a form image may instead be input as electronic data 0404 via a network or the like, without the scanner 0402. The input unit 0405 accepts user input and is, for example, a keyboard and mouse. The output unit 0406 outputs the results of form processing and is, for example, a display and printer. The communication unit 0407 is an interface connected to an external network 0413. The results of form processing may be output to an external server 0414 connected to the external network 0413. The control unit 0408 executes the various processes that control the form processing computer 0400 and is, for example, a CPU. The auxiliary storage unit 0409 is a storage unit inside the form processing computer 0400 other than the storage unit 0410 and is, for example, an HDD. The storage unit 0410 is a storage unit that the control unit 0408 can access directly and is, for example, memory. The external auxiliary storage unit 0411 is a kind of auxiliary storage unit located outside the form processing computer 0400, for example a CD-R or DVD-R.

Programs, including the program for form processing (the form processing program), are stored in the auxiliary storage unit 0409 or the external auxiliary storage unit 0411 and are loaded into the storage unit 0410 when the control unit 0408 executes them. The control unit 0408 executes the programs loaded in the storage unit 0410. The control unit 0408 also stores form images input to the image input unit 0403 in the storage unit 0410, the auxiliary storage unit 0409, the external auxiliary storage unit 0411, and so on via the internal bus 0412. The form processing computer 0400 need only include at least the image input unit 0403, the control unit 0408, and the storage unit 0410; the other units are optional.

FIG. 5 shows a partial device configuration for the case where an electronic pen is used as the input device of the document recognition system to input electronic handwriting data 0404. The communication device 0503 is an interface that is connected to a network (not shown) and communicates with other devices (not shown) on that network. For example, the communication device 0503 receives what was written on a document 0501, such as an application form, from the electronic pen device 0502 over a wireless LAN or the like. By forwarding the received data, character pattern data in stroke form can be sent. Such a device provides, for example, the input to the feedback process 0112 in FIG. 1.

FIG. 6 shows the recognition processing flow of an application form recognition service, which is one example of an application that uses the document recognition system shown in FIG. 1. It corresponds to the details of the processing of the recognition unit 0105 in FIG. 1.

FIG. 10 shows an example of the application form used in the recognition processing flow of FIG. 6. Reference numeral 1001 denotes the application form (a gift order form), which contains an entry field 1002 for the delivery destination and an entry field 1003 for the customer name. To convert a name, which may contain any of numerous variant characters, into the correct character codes, the handwriting in fields 1002 and 1003 is read and converted into character codes, and delivery is arranged on that basis. At that point, the user can be asked to check the converted character codes and correct any that are wrong, or can be shown several candidate character codes for variant characters and asked to choose among them. Either way, user feedback on the recognition result is obtained, and the dictionary and parameters for character recognition can then be adjusted on the basis of that feedback.

Returning to FIG. 6, an outline of the character string recognition flow in the application form recognition service is given. In the character string recognition device according to this embodiment, an OCR device images a paper document and converts it into electronic image data (0601). This step can be omitted when the original document is already electronic image data. Next, document structure analysis, such as ruled line extraction, frame structure analysis, and estimation of the position of the frames to be read, is performed on the electronic image data (0602). Next, the character lines to be read are extracted using the result of the document structure analysis (0603). Next, character pattern candidates are segmented from the character line image, and each character pattern is identified (0604). The segmented character patterns together with their identification results are called a character string hypothesis. When the character strings that can be written in the target document are determined in advance, notation analysis is applied to the character string hypothesis (0605). This converts the character string hypothesis, which contains the ambiguity of character segmentation and character identification, into character string text, which the OCR outputs as the reading result text (0606). However, when the reliability of the reading result text is low, for example because the analysis based on notation knowledge could not be performed adequately, the character string hypothesis itself is output. Both the reading result text and the reading hypothesis data retain, where necessary, the position on the document image at which the character string was written. The above processing outputs the reading result text and the reading hypothesis data, and document processing is generally carried out on the basis of these data. A dictionary obtained by additionally learning character patterns such as external characters is reflected in the dictionary 0610. In this way, even when a passage of text is the input, highly accurate character recognition is possible using a small number of samples.
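The last two steps of this flow (0605 and 0606) can be sketched in Python. Every name below (`CharHypothesis`, `notation_analysis`, `read_field`, the score threshold, the toy lexicon) is hypothetical and chosen only for illustration; the description does not specify data structures or how reliability is measured. The sketch shows notation analysis resolving an ambiguous character string hypothesis against the set of allowed strings, and the fallback of returning the hypothesis itself when reliability is low.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class CharHypothesis:
    """One segmented character pattern with scored candidates (hypothetical structure)."""
    position: tuple      # (x, y, width, height) on the document image
    candidates: dict     # character -> identification score in [0, 1]


def notation_analysis(hypothesis, lexicon):
    """Step 0605: pick the lexicon entry best supported by the per-character scores."""
    best_text, best_score = None, 0.0
    # Enumerate every candidate combination; acceptable for a short toy example.
    for combo in product(*(h.candidates.items() for h in hypothesis)):
        text = "".join(ch for ch, _ in combo)
        if text not in lexicon:
            continue
        score = 1.0
        for _, s in combo:
            score *= s
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score


def read_field(hypothesis, lexicon, threshold=0.8):
    """Step 0606: output text when reliable, otherwise the raw hypothesis."""
    text, score = notation_analysis(hypothesis, lexicon)
    if text is not None and score >= threshold:
        return {"kind": "text", "value": text}
    return {"kind": "hypothesis", "value": hypothesis}


hyp = [
    CharHypothesis((0, 0, 10, 10), {"O": 0.55, "0": 0.45}),
    CharHypothesis((10, 0, 10, 10), {"K": 0.95, "X": 0.05}),
]
result = read_field(hyp, lexicon={"OK", "NO"}, threshold=0.5)
print(result["kind"], result["value"])
```

Multiplying per-character scores is only one possible reliability measure, assumed here for brevity. Because each `CharHypothesis` carries its position, both output kinds keep the location on the document image, as the description requires.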

FIGS. 7 to 9 illustrate one specific learning method for realizing the learning unit 0108 of FIG. 1. The learning method for realizing FIG. 1 is not, however, limited to this example.

FIG. 7 shows the character recognition process.

When an image 0710 of a single character pattern is input, feature extraction 0702 is performed. Here, features such as the directional components of the character strokes are extracted, and the character pattern image is converted into a single vector. Once the vector has been obtained from the character pattern, the character type it represents is determined. This is called category identification 0703. For category identification, the dictionary stores which character type is distributed in which region of the feature space, as learned in advance from the distribution of a large number of patterns, and the category to which an unknown input pattern belongs is decided from it. The figure shows conceptually how information for categories such as "8", "5", and "9" is stored. The vectors are in fact high-dimensional but are drawn in two dimensions for ease of visualization. The above process yields the character code 0704.
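The two stages of FIG. 7 can be illustrated with a minimal Python sketch. It is not the method of the embodiment: the four-direction histogram, the function names, and the nearest-mean decision rule are simplifications assumed here (the description later mentions a quadratic discriminant function for higher-order identification). The sketch only shows an image becoming one vector and the dictionary deciding the category of an unknown pattern.

```python
import math

# Four stroke directions: horizontal, vertical, and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def extract_features(image):
    """Convert a binary character image (rows of 0/1) into one direction histogram."""
    rows, cols = len(image), len(image[0])
    counts = [0.0] * len(DIRECTIONS)
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            for k, (dr, dc) in enumerate(DIRECTIONS):
                rr, cc = r + dr, c + dc
                # Count a pair of black pixels lying along direction k.
                if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc]:
                    counts[k] += 1.0
    total = sum(counts) or 1.0
    return [x / total for x in counts]


def build_dictionary(samples):
    """Store the mean feature vector of each category (a very small dictionary)."""
    dictionary = {}
    for label, images in samples.items():
        vectors = [extract_features(img) for img in images]
        dictionary[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return dictionary


def identify(image, dictionary):
    """Category identification: return the label whose stored mean is closest."""
    feature = extract_features(image)
    return min(dictionary, key=lambda label: math.dist(feature, dictionary[label]))


HORIZONTAL_BAR = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
VERTICAL_BAR = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
dictionary = build_dictionary({"-": [HORIZONTAL_BAR], "|": [VERTICAL_BAR]})

unknown = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
print(identify(unknown, dictionary))
```

With only one stored mean per category, the dictionary cannot represent how widely each category spreads in the feature space, which is why the description stresses learning from a large number of patterns.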

Thus, for character recognition it is important to know how the large number of feature vectors obtained from a large number of character patterns are distributed in the feature space. The spread of feature vectors in the feature space arises from deformation of the character patterns. For this reason, when a new external character is additionally registered, a pseudo-generation model of characters, which generates a large number of character patterns from a small number of character patterns, is important.
Pseudo model generation has parameters that must be learned. Automatic differentiation is one way to learn these parameters. FIGS. 8 and 9 show a learning configuration that uses automatic differentiation.
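As a rough illustration of generating many patterns from one sample, the following Python sketch deforms a character given as stroke point lists. The deformation types (rotation, scaling, slant, point jitter), their ranges, and all function names are assumptions made for this example; in the scheme described here such deformation parameters would be learned rather than fixed by hand.

```python
import math
import random


def deform(strokes, rng, max_rotation=0.15, max_scale=0.1, max_shear=0.2, jitter=0.01):
    """Return one randomly deformed copy of a character given as stroke point lists."""
    angle = rng.uniform(-max_rotation, max_rotation)
    sx = 1.0 + rng.uniform(-max_scale, max_scale)
    sy = 1.0 + rng.uniform(-max_scale, max_scale)
    shear = rng.uniform(-max_shear, max_shear)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    deformed = []
    for stroke in strokes:
        points = []
        for x, y in stroke:
            # Work around the centre (0.5, 0.5) of the unit character box.
            u, v = (x - 0.5) * sx, (y - 0.5) * sy
            u += shear * v                      # slant
            xr = cos_a * u - sin_a * v          # rotation
            yr = sin_a * u + cos_a * v
            points.append((xr + 0.5 + rng.gauss(0.0, jitter),
                           yr + 0.5 + rng.gauss(0.0, jitter)))
        deformed.append(points)
    return deformed


def generate_pseudo_patterns(strokes, count, seed=0):
    """Generate `count` pseudo patterns from a single sample character."""
    rng = random.Random(seed)
    return [deform(strokes, rng) for _ in range(count)]


# A single handwritten sample of a cross-shaped character: two strokes.
sample = [[(0.1, 0.5), (0.9, 0.5)], [(0.5, 0.1), (0.5, 0.9)]]
patterns = generate_pseudo_patterns(sample, count=100)
print(len(patterns), patterns[0][0][0])
```

Feeding the generated patterns through feature extraction gives a cloud of feature vectors for the new character, which is the distribution information a recognition dictionary needs.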

The learning unit learns the parameters needed for character recognition using mechanisms such as multiple regression analysis, structural equation modeling, and automatic differentiation.
For parameter learning, automatic differentiation can be combined with regression analysis or structural equation modeling.

FIG. 8 describes the relationship between the mechanism for learning the parameters needed for character recognition and automatic differentiation. In character recognition, the parameters are used to compute the category of an unknown pattern from the original signal (image) or the feature vector through various vector operations, matrix operations, autocorrelation operations, convolution operations, and so on. Automatic differentiation is the basic system of numerical operations that underlies this computation.
To describe the processing more concretely, the mechanism for using automatic differentiation together with regression analysis in learning is described next. Automatic differentiation defines its own "numbers" and "operations". FIG. 9 shows examples of the arithmetic functions used in automatic differentiation. Suppose that, of all the variables in a program, n variables are the targets of partial differentiation. The structure of a "number" is then expressed by the following vector.
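The vector itself ([Equation 1]) does not survive in this text. Judging only from the definitions of v and dk that follow, it presumably has the form below; the symbols $x_{\mathrm{AD}}$, $f$, and $x_k$ are notation assumed here and are not taken from the original.

```latex
x_{\mathrm{AD}} = \left( v,\; d_1,\; d_2,\; \ldots,\; d_n \right),
\qquad v = f(x_1, \ldots, x_n),
\qquad d_k = \frac{\partial f}{\partial x_k} \quad (k = 1, \ldots, n)
```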

Here, v is the slot that holds the value of the function, and dk (k = 1 to n) is the slot that holds the value of the partial derivative of the function with respect to the k-th variable. In automatic differentiation, a number with this structure is called an AD number [Equation 1], and the various operations are performed on it.
This mechanism is introduced so that parameter adjustment during learning can be configured flexibly. As described later, implicit parameters are used in rule computation. For example, suppose a quadratic discriminant function is used for higher-order identification in character recognition. The parameters stored in the higher-order identification dictionary are then the coefficients of the quadratic functions that represent the distribution of each category. When the document recognition service of FIG. 1 processes an application form, this higher-order identification dictionary is used as the recognition dictionary (0106). Character recognition is performed, the result is presented to the user, and feedback is obtained, which reveals which characters were misrecognized. In that case, the dictionary is updated in the learning process 0108. A misrecognition can be interpreted as the likelihood output by the wrong discriminant function B having exceeded the likelihood output by the correct discriminant function A, so the parameters used to compute the wrong discriminant function B need only be adjusted slightly so that its likelihood decreases. An automatic differentiation mechanism, which computes a value and its derivatives at the same time, is useful here. With it, the amount by which to adjust each parameter so that the computed value (here, the likelihood) decreases can be calculated from the derivative values. As a result, a learning dictionary (0109) storing the adjusted parameters is created and reflected in the recognition dictionary (0106).
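The adjustment described above can be written as a single update step. This is a sketch under assumed notation: $g_A$ and $g_B$ stand for the discriminant functions of the correct and the wrongly chosen category, $\theta_A$ and $\theta_B$ for their parameters, $x$ for the feature vector of the misrecognized character, and $\eta > 0$ for a small step size. None of these symbols appear in the original.

```latex
g_B(x;\theta_B) > g_A(x;\theta_A)
\quad\Longrightarrow\quad
\theta_B \leftarrow \theta_B - \eta \,\frac{\partial g_B(x;\theta_B)}{\partial \theta_B}
```

For a sufficiently small $\eta$, this step changes $g_B(x;\theta_B)$ by approximately $-\eta \,\lVert \partial g_B / \partial \theta_B \rVert^2$, so the likelihood of the wrong category decreases. The derivative on the right is what the d slots of an AD number supply.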

Adjusting parameters slightly in this way is called the gradient method. The gradient method requires the partial derivatives of the objective function. In a document recognition service or an application form reception system, the objective function is, for example, character recognition accuracy or form processing accuracy. The concrete process of the gradient method is as follows. Assume that the teacher signal, namely whether each character recognition result is correct or incorrect, is given as feedback from system users. For example, when regression analysis is used to estimate the parameters that are effective in improving character identification accuracy, [Equation 2] is the objective function.

The aim with this objective function is to make the "character identification accuracy improvement" high. Let the actual degree of improvement in character identification accuracy be denoted Y with the subscript "character identification accuracy improvement". Suppose also that there are several items thought to be related to improving character identification accuracy, namely variable 1, variable 2, variable 3, and so on, each multiplied by a weight parameter. Such a variable might be, for example, the typical blackness of the upper-left region of the character, the stroke density of the right half, or the average strength of the diagonal components of the character outline. If deformation of the character pattern is also considered as a parameter that can affect character identification accuracy, [Equation 2] above can further be regarded as a function of the character deformation parameters. That is,

[Equation 3] is obtained.

The parameters a1 onward of the regression analysis in [Equation 2] are easily obtained by solving linear equations. The parameters of [Equation 3] can be learned by the gradient method, which changes the parameters gradually.
In general, when learning by the gradient method is implemented, the partial derivative expressions are derived by hand from a fixed function definition (for example, the computation of a hidden layer of a neural network, or the parameter superposition of a polynomial discriminant function), and a learning program is implemented from them. However, rules may be added and deleted dynamically. The objective function may also change, for example between emphasizing improvement in character identification accuracy and emphasizing the number of cases. If the formula for the objective function can change dynamically and parameter learning by the gradient method must follow it, the partial derivative expressions must also be changed dynamically.
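The remark that the regression parameters follow from solving linear equations can be made concrete with a short Python sketch. Since [Equation 2] itself is not reproduced in this text, the sketch assumes the ordinary linear form Y = a1 × variable 1 + a2 × variable 2 + a3 × variable 3 with no intercept, and solves the least-squares normal equations. The data, the weights, and the function names are made up for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and nonsingular.
    """
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_regression(rows, targets):
    """Least-squares weights a for targets ~ sum_k a[k] * row[k] (normal equations)."""
    n = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(n)]
    return solve_linear_system(xtx, xty)


# Made-up observations: three variables per row and the accuracy gain Y seen for each.
rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
true_weights = [0.5, -0.2, 0.1]
targets = [sum(a * x for a, x in zip(true_weights, row)) for row in rows]

weights = fit_regression(rows, targets)
print([round(a, 6) for a in weights])
```

No derivative has to be derived here because the model is linear in its weights. Once character deformation parameters enter as in [Equation 3], that shortcut is no longer available, which is where the gradient method comes in.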

A program (function) that computes the objective function is made up of if statements, for statements, and mathematical functions and operations. Of these, the mathematical functions and operations are written using the number structure of automatic differentiation. With automatic differentiation, the value and the derivative values are obtained at the same time from the function as defined, so derivatives are easily obtained even when the formula changes. In addition, by combining this with regression analysis, parameter adjustment can be restricted to the rules considered effective in improving character identification accuracy.
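A minimal Python sketch of this idea follows. The class `ADNumber` mirrors the structure described earlier (a value v plus one partial derivative per chosen variable), but its operator set, the helper `ad_exp`, and the toy `objective` function are assumptions made for this example and do not reproduce the functions of FIG. 9. The objective is written with an ordinary for statement and if statement; only the arithmetic runs on AD numbers, so changing the formula needs no hand-derived derivatives.

```python
import math


class ADNumber:
    """A value v plus partial derivatives d[0..n-1] with respect to n chosen variables."""

    def __init__(self, v, d):
        self.v = float(v)
        self.d = [float(x) for x in d]

    @staticmethod
    def variable(value, index, n):
        """The index-th differentiation variable: its own derivative is 1."""
        return ADNumber(value, [1.0 if k == index else 0.0 for k in range(n)])

    def _lift(self, other):
        # Plain numbers are constants: all partial derivatives are zero.
        if isinstance(other, ADNumber):
            return other
        return ADNumber(other, [0.0] * len(self.d))

    def __add__(self, other):
        o = self._lift(other)
        return ADNumber(self.v + o.v, [a + b for a, b in zip(self.d, o.d)])

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule applied to every partial derivative.
        return ADNumber(self.v * o.v,
                        [a * o.v + self.v * b for a, b in zip(self.d, o.d)])

    __rmul__ = __mul__


def ad_exp(x):
    """exp for AD numbers: the derivative of exp(x) is exp(x)."""
    e = math.exp(x.v)
    return ADNumber(e, [e * a for a in x.d])


def objective(weights, features, use_penalty):
    """Ordinary Python control flow; only the arithmetic uses AD numbers."""
    score = ADNumber(0.0, [0.0] * len(weights))
    for w, x in zip(weights, features):       # for statement
        score = score + w * x
    result = ad_exp(score * -1.0)
    if use_penalty:                           # if statement changes the formula
        for w in weights:
            result = result + 0.01 * w * w
    return result


values = [0.3, -0.1]
features = [1.0, 2.0]
weights = [ADNumber.variable(v, k, len(values)) for k, v in enumerate(values)]
out = objective(weights, features, use_penalty=True)
print(out.v, out.d)

# One gradient-method step that lowers the objective.
step = 0.1
updated = [v - step * g for v, g in zip(values, out.d)]
```

Calling `objective` with `use_penalty=False` evaluates a different formula, and `out.d` still holds the matching partial derivatives without any change to the differentiation code.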

0201 ... learning character patterns, 0204 ... simple learning dictionary, 0206 ... search complement dictionary (confusion matrix information), 0209 ... search complement dictionary (radical (hen/tsukuri) information), 0211 ... handwritten pattern synthesis, 0213 ... higher-order classifier learning
0601 ... image input unit, 0602 ... document structure analysis unit, 0603 ... character line extraction unit, 0604 ... character string hypothesis creation unit, 0605 ... character string notation analysis unit, 0606 ... text output unit, 0601 ... paper document input to a conventional document processing system

Claims

A character identification system comprising:
a sample character image input receiving unit that receives input of a sample character image;
a character part extraction unit that extracts character parts based on the sample character image;
a pseudo-character model generation unit that generates a pseudo-character model based on the character parts; and
an identification dictionary generation unit that generates a character identification pattern based on the pseudo-character model and thereby generates an identification dictionary.

The character identification system according to claim 1, further comprising:
a character image input receiving unit that receives input of a character image; and
an identification unit that identifies the character image using the identification dictionary and generates an identification result.

The character identification system according to claim 2, further comprising:
an identification result output unit that outputs the identification result;
an identification result success/failure receiving unit that receives input of success/failure information on the identification result; and
a feedback unit that updates the character identification pattern of the identification dictionary based on the success/failure information.

The character identification system according to claim 1, further comprising
a parts information database that stores information on the parts constituting character patterns, wherein
the character parts include stroke order information, and
the character part extraction unit extracts the character parts based on the stroke order information.

The character identification system according to claim 2, further comprising:
a radical information database that stores radical (hen/tsukuri) information indicating the relationship between radicals and characters; and
a character candidate complementing unit that uses the radical information to extract characters related to the identification result as character candidates.

The character identification system according to claim 2, further comprising:
an OCR unit that images a document and converts it into electronic image data;
a document structure analysis unit that identifies the document structure of the document based on the electronic image data; and
a character extraction unit that extracts a character image to be read based on the document structure and inputs it to the character image input receiving unit.