JP6057112B1 - Character recognition apparatus, method and program

Info

Publication number: JP6057112B1
Application number: JP2016084081A
Authority: JP
Inventors: 択渡久地
Original assignee: Ai Inside; AI Inside Inc
Current assignee: Ai Inside; AI Inside Inc
Priority date: 2016-04-19
Filing date: 2016-04-19
Publication date: 2017-01-11
Anticipated expiration: 2036-04-19
Also published as: JP2017194806A

Abstract

Problem: To provide a character recognition apparatus, method, and program capable of recognizing characters with high accuracy in documents containing a wide variety of handwritten characters. Solution: The apparatus comprises a first image recognition unit 11 that extracts feature points from image data using a multilayer neural network and outputs a plurality of text candidates with their likelihoods; a second image recognition unit 12 that extracts feature points from the image data using a multilayer neural network and outputs text candidates, separated into prime factor texts (units smaller than the text), with their likelihoods; a natural language processing unit 13 that joins and separates adjacent prime factor texts in the text candidates output by the second image recognition unit 12, performs morphological analysis on each of the resulting combinations, and outputs a plurality of linguistically plausible text candidates with their likelihoods; and a determination unit 14 that compares the output of the first image recognition unit 11 with the output of the natural language processing unit 13. Selected drawing: FIG. 1

Description

The present invention relates to a character recognition apparatus, method, and program for recognizing characters in image data.

A widely used technique reads a document filled in by hand with an image scanner or similar device and applies OCR (Optical Character Recognition) processing to generate digital data in which the input information is converted into predetermined character codes.

For example, Patent Document 1 discloses a character identification system that performs character recognition by machine learning. The system comprises a character image input receiving unit that accepts sample character images, a character component extraction unit that extracts character components from the sample character images, a pseudo-character model generation unit that generates pseudo-character models from the character components, and an identification dictionary generation unit that generates character identification patterns from the pseudo-character models to build an identification dictionary.

Patent Document 2 discloses an information processing apparatus that performs morphological analysis after character recognition and judges plausibility using a feature vector containing both part-of-speech likelihoods and character similarity. The apparatus comprises morphological analysis means that analyzes the character recognition result; feature vector creation means that, for each character of the recognition result, creates a feature vector of P+1 elements consisting of the likelihood of each of P parts of speech (derived from how strongly the word containing the character, as found by the morphological analysis, resembles each part of speech) and the character similarity of that character; and certainty calculation means that calculates the certainty of each character of the recognition result from the feature vector.

Patent Document 1: Japanese Patent Application Laid-Open No. 2015-069256
Patent Document 2: Japanese Patent Application Laid-Open No. 2014-120059

Even with Patent Documents 1 and 2 described above, it is difficult to recognize diverse handwritten characters (for example, flowing calligraphic handwriting or faint, sloppy handwriting) with high accuracy, and there is a demand for still more accurate character recognition.

An object of the present invention is to provide a character recognition apparatus, method, and program capable of recognizing characters with high accuracy in documents containing a wide variety of handwritten characters.

To achieve the above object, a character recognition apparatus according to one aspect of the present invention comprises: a first image recognition unit that extracts feature points from image data using a multilayer neural network and outputs a plurality of text candidates with their likelihoods; a second image recognition unit that extracts feature points from the image data using a multilayer neural network and outputs text candidates, separated into prime factor texts (units smaller than the text), with their likelihoods; a natural language processing unit that joins and separates adjacent prime factor texts in the text candidates output by the second image recognition unit, performs morphological analysis on each of the resulting combinations, and outputs a plurality of linguistically plausible text candidates with their likelihoods; and a determination unit that compares the text candidates output by the first image recognition unit with the text candidates output by the natural language processing unit. The determination unit outputs text whose likelihood is at or above a predetermined level.

In the character recognition apparatus according to one aspect of the present invention, the determination unit may compare the highest-likelihood text candidate output by the first image recognition unit with the highest-likelihood text candidate output by the natural language processing unit and, if the predetermined level of plausibility is not obtained, go on to compare the other candidates with each other in descending order of likelihood until that level is obtained.

In the character recognition apparatus according to one aspect of the present invention, the determination unit may give high-likelihood text candidates output by the natural language processing unit priority over the text candidates output by the first image recognition unit.

In the character recognition apparatus according to one aspect of the present invention, the determination unit may, when the comparison shows that the difference between the two text candidates does not exceed a predetermined threshold, judge the text to have a likelihood at or above the predetermined level and output it.

The character recognition apparatus according to one aspect of the present invention may further comprise a setting unit that sets the time allowed for processing. In this configuration the determination unit compares the highest-likelihood text candidate output by the first image recognition unit with the highest-likelihood text candidate output by the natural language processing unit and, if the predetermined level of plausibility is not obtained, compares the other candidates with each other in descending order of likelihood, within the time set by the setting unit, until that level is obtained.

In the character recognition apparatus according to one aspect of the present invention, the determination unit may output text for which the predetermined level of plausibility was not obtained as masked (redacted) characters.
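The masking behavior can be illustrated with a minimal sketch. The patent does not specify the mask glyph, the threshold value, or whether masking is applied per character or per text; the per-character granularity, `MASK_CHAR`, and the `threshold` default below are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical placeholder glyph; the patent does not name one.
MASK_CHAR = "●"


def mask_low_confidence(chars: List[Tuple[str, float]], threshold: float = 0.9) -> str:
    """Replace each character whose likelihood is below `threshold` with MASK_CHAR.

    `chars` is a list of (character, likelihood) pairs. Masking per character
    is an assumption; the source only says that insufficiently plausible text
    is output as masked characters.
    """
    return "".join(ch if p >= threshold else MASK_CHAR for ch, p in chars)


if __name__ == "__main__":
    recognized = [("高", 0.99), ("校", 0.42), ("時", 0.97), ("代", 0.95)]
    print(mask_low_confidence(recognized))
```

Masking makes the uncertain positions visible to a human reviewer instead of silently emitting a likely-wrong character.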

In the character recognition apparatus according to one aspect of the present invention, the first image recognition unit may hold, through machine learning, a multilayer neural network suited to each item, search for the items contained in the image data, extract feature points from the image data using the multilayer neural network suited to each item found, and output a plurality of text candidates with their likelihoods. Likewise, the second image recognition unit may hold, through machine learning, a multilayer neural network suited to each item, search for the items contained in the image data, extract feature points from the image data using the multilayer neural network suited to each item found, and output text candidates, separated into prime factor texts (units smaller than the text), with their likelihoods.

To achieve the above object, a character recognition method according to one aspect of the present invention comprises: a first image recognition step of extracting feature points from image data using a multilayer neural network and outputting a plurality of text candidates with their likelihoods; a second image recognition step of extracting feature points from the image data using a multilayer neural network and outputting text candidates, separated into prime factor texts (units smaller than the text), with their likelihoods; a natural language processing step of joining and separating adjacent prime factor texts in the text candidates output by the second image recognition step, performing morphological analysis on each of the resulting combinations, and outputting a plurality of linguistically plausible text candidates with their likelihoods; and a determination step of comparing the text candidates output by the first image recognition step with the text candidates output by the natural language processing step. The determination step outputs text whose likelihood is at or above a predetermined level.

To achieve the above object, a character recognition program according to one aspect of the present invention causes a computer to execute: a first image recognition step of extracting feature points from image data using a multilayer neural network and outputting a plurality of text candidates with their likelihoods; a second image recognition step of extracting feature points from the image data using a multilayer neural network and outputting text candidates, separated into prime factor texts (units smaller than the text), with their likelihoods; a natural language processing step of joining and separating adjacent prime factor texts in the text candidates output by the second image recognition step, performing morphological analysis on each of the resulting combinations, and outputting a plurality of linguistically plausible text candidates with their likelihoods; and a determination step of comparing the text candidates output by the first image recognition step with the text candidates output by the natural language processing step. In this program, the determination step outputs text whose likelihood is at or above a predetermined level.

According to the present invention, character recognition can be performed with high accuracy.

FIG. 1 is a block diagram showing the configuration of the character recognition apparatus.
FIG. 2 schematically shows feature extraction and vector conversion by the image recognition unit.
FIG. 3 schematically shows how the image recognition unit determines the character type.
FIG. 4 illustrates image recognition by the first image recognition unit and the second image recognition unit.
FIG. 5 is a flowchart illustrating the flow of operation of the character recognition apparatus.

A character recognition apparatus, method, and program according to embodiments of the present invention are described below with reference to the drawings. Throughout the drawings used to describe the embodiments, common components are given the same reference numerals and repeated descriptions are omitted.

The following describes, as an example, the configuration and operation of a character recognition apparatus that recognizes characters in image data obtained by scanning a document containing handwritten characters, such as a form or questionnaire. Even for documents without handwriting, that is, documents whose characters were printed by a printer from typeface data, the recognition rate drops when the printed characters are faint or smudged. The character recognition apparatus of this embodiment may also be applied to recognizing characters in image data obtained by scanning such documents.

Conceptually, the character recognition apparatus 1 performs character recognition using an algorithm that trains, by backpropagation, a multilayer neural network composed of an input layer, one or more intermediate layers, and an output layer.

Specifically, as shown in FIG. 1, the character recognition apparatus 1 comprises an input unit 10 that inputs image data, an image recognition unit 11 that recognizes the image data and generates text, a natural language processing unit 12 that applies natural language processing to the text generated by the image recognition unit 11, and a determination unit 13 that compares the text generated by the image recognition unit 11 with the text processed by the natural language processing unit 12. The character recognition apparatus 1 also comprises an output unit 16 that outputs the text supplied by the determination unit 13. The output unit 16 may split the text into items and output it as CSV data. The image recognition unit 11, the natural language processing unit 12, and the determination unit 13 operate independently, and their results influence one another so that the apparatus outputs text whose likelihood is at or above a predetermined level.
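As a rough illustration of the optional CSV output, the sketch below writes one recognized form as a header row plus a value row using Python's standard `csv` module. The function name `to_csv` and the field names in the usage example are hypothetical; the patent only states that the text may be divided into items and output as CSV data.

```python
import csv
import io
from typing import Dict


def to_csv(fields: Dict[str, str]) -> str:
    """Serialize recognized text, keyed by form item, as a two-row CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields.keys())    # header row: item names
    writer.writerow(fields.values())  # data row: recognized text per item
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical item names for a scanned questionnaire.
    print(to_csv({"name": "高木", "note": "私の高校時代は"}))
```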

The input unit 10 is, for example, a scanner device. It scans a document to generate image data and inputs the generated image data to the storage unit 14.

The operation of the image recognition unit 11 is as follows. Based on the image data, the image recognition unit 11 first performs document structure analysis such as ruled-line extraction, frame structure analysis, and estimation of the position of the frames to be read. Using the result of the document structure analysis, it then extracts the character lines to be read. Finally, it cuts out character pattern candidates from each character line image and identifies each character pattern.

The character identification procedure is as follows. As shown in FIG. 2, the image recognition unit 11 performs feature extraction on the image data of one extracted character pattern. It extracts features such as the directional components of the character's strokes and converts the image data into a vector. The example in FIG. 2 schematically shows image data X being input to a multilayer neural network, where features such as direction and position are captured, and being converted into vectors X_1, X_2, and X_3.

As shown in FIG. 3, the image recognition unit 11 determines the character type from the converted vector. For this determination it refers to dictionary data, built in advance from the distribution of a large number of patterns, that records which region of the feature space each character type occupies, and it decides the candidates for the image data, which is the unknown input pattern. The example in FIG. 3 conceptually shows the dictionary data storing information for the character types "中", "申", and "十".

Through the above process, the image recognition unit 11 obtains a plurality of text candidates (for example, "中", "申", and "十") and the likelihood of each. The likelihood of each text candidate can be calculated from the distance, in the feature space, between the center of that candidate and the image data that is the unknown input pattern.
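The distance-based likelihood can be sketched as follows. The patent says only that likelihood can be computed from the distance between each candidate's center and the input pattern; turning distances into normalized scores with `exp(-distance)` is an assumption chosen for illustration, and the two-dimensional centers in the usage example are made-up values, not data from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_candidates(
    x: Sequence[float], centers: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank character types by closeness of feature vector `x` to each class center.

    Returns (label, likelihood) pairs sorted from most to least likely.
    Likelihoods are exp(-Euclidean distance), normalized to sum to 1
    (an assumed mapping; the source does not specify one).
    """
    weights = {label: math.exp(-math.dist(x, c)) for label, c in centers.items()}
    total = sum(weights.values())
    ranked = [(label, w / total) for label, w in weights.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Hypothetical class centers in a 2-D feature space.
    dictionary = {"中": (0.0, 0.0), "申": (1.0, 0.0), "十": (3.0, 0.0)}
    for label, likelihood in rank_candidates((0.2, 0.0), dictionary):
        print(label, round(likelihood, 3))
```

Any monotonically decreasing function of distance would preserve the ranking; only the relative spacing of the likelihoods depends on the choice.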

As shown in FIG. 1, the image recognition unit 11 consists of a first image recognition unit 11a and a second image recognition unit 11b.

The first image recognition unit 11a reads image data from the storage unit 15, extracts feature points from the image data using a multilayer neural network, and outputs a plurality of text candidates with their likelihoods.

The second image recognition unit 11b reads image data from the storage unit 15, extracts feature points from the image data using a multilayer neural network, and outputs text candidates, separated into prime factor texts (units smaller than the text), with their likelihoods.

The main difference between the first image recognition unit 11a and the second image recognition unit 11b lies in the training data used for machine learning of image recognition. The training data for the first image recognition unit 11a is prepared so that one character is output as one character, whereas the training data for the second image recognition unit 11b is prepared so that one character is output separated into prime factor texts, which are smaller components.

Specific examples of image recognition by the first image recognition unit 11a and by the second image recognition unit 11b follow. The description below covers the case where the two units perform image recognition on image data A1, shown in FIG. 4(a).

The first image recognition unit 11a divides the image data A1. In this example, the division is assumed to yield four pieces of image data, a1, a2, a3, and a4, as shown in FIG. 4(b).

The first image recognition unit 11a extracts feature points from image data a1 using a multilayer neural network, generates a plurality of candidates (for example, "高", "喬", "富", "畜"), and calculates the likelihood of each candidate.

The first image recognition unit 11a extracts feature points from image data a2 using a multilayer neural network, generates a plurality of candidates (for example, "校", "核", "梓", "検"), and calculates the likelihood of each candidate.

The first image recognition unit 11a extracts feature points from image data a3 using a multilayer neural network, generates a plurality of candidates (for example, "時", "暁", "待", "晤"), and calculates the likelihood of each candidate.

The first image recognition unit 11a extracts feature points from image data a4 using a multilayer neural network, generates a plurality of candidates (for example, "代", "付", "何", "仕"), and calculates the likelihood of each candidate.

Based on the likelihood of each candidate, the first image recognition unit 11a outputs text candidates such as "高校時代" ("high school days") and "喬核暁付" to the determination unit 13.
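One way to picture how per-character candidates become whole-text candidates is to enumerate every combination and rank by joint likelihood. The sketch below does this with `itertools.product`; scoring a text by the product of its per-character likelihoods assumes the characters are independent, which is an assumption of this sketch and not something the patent states. The likelihood values in the usage example are invented.

```python
import heapq
import itertools
import math
from typing import List, Tuple

Candidate = Tuple[str, float]  # (character, likelihood)


def top_texts(per_position: List[List[Candidate]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely texts built from per-position character candidates.

    The score of a text is the product of its characters' likelihoods
    (independence assumption made for this sketch).
    """
    scored = (
        ("".join(ch for ch, _ in combo), math.prod(p for _, p in combo))
        for combo in itertools.product(*per_position)
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical likelihoods for the pieces a1..a4 of image data A1.
    pieces = [
        [("高", 0.6), ("喬", 0.3)],
        [("校", 0.7), ("核", 0.2)],
        [("時", 0.8), ("暁", 0.1)],
        [("代", 0.5), ("付", 0.4)],
    ]
    for text, score in top_texts(pieces):
        print(text, round(score, 4))
```

Exhaustive enumeration grows exponentially with text length, so a practical system would likely prune with beam search; that refinement is outside what the source describes.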

The second image recognition unit 11b divides the image data A1 into units smaller than those used by the first image recognition unit 11a so as to generate prime factor texts. A prime factor text is text that represents one character by smaller components. For example, "木" and "交", the smaller components of the image data for "校", are prime factor texts. In this example, the division is assumed to yield six pieces of image data, b1, b2, b3, b4, b5, and b6, as shown in FIG. 4(c).

The second image recognition unit 11b extracts feature points from image data b1 using a multilayer neural network, generates a plurality of prime factor text candidates (for example, "高", "喬", "富", "畜"), and calculates the likelihood of each candidate.

The second image recognition unit 11b extracts feature points from image data b2 using a multilayer neural network, generates a plurality of prime factor text candidates (for example, "木", "不", "六", "禾"), and calculates the likelihood of each candidate.

The second image recognition unit 11b extracts feature points from image data b3 using a multilayer neural network, generates a plurality of prime factor text candidates (for example, "交", "定", "気", "充"), and calculates the likelihood of each candidate.

The second image recognition unit 11b extracts feature points from image data b4 using a multilayer neural network, generates a plurality of prime factor text candidates (for example, "日", "曰", "月"), and calculates the likelihood of each candidate.

The second image recognition unit 11b extracts feature points from image data b5 using a multilayer neural network, generates a plurality of prime factor text candidates (for example, "寺", "圭", "茉", "苦"), and calculates the likelihood of each candidate.

The second image recognition unit 11b extracts feature points from image data b6 using a multilayer neural network, generates a plurality of prime factor text candidates (for example, "代", "付", "何", "仕"), and calculates the likelihood of each candidate.

Based on the likelihood of each prime factor text, the second image recognition unit 11b outputs text candidates such as "高木交日寺代" and "喬不定曰圭付" to the determination unit 13.

The natural language processing unit 13 joins and separates adjacent prime factor texts in the text candidates output by the second image recognition unit 11b, performs morphological analysis on each of the resulting combinations, and outputs a plurality of linguistically plausible text candidates with their likelihoods.

Specifically, the natural language processing unit 13 takes "高木交日寺代" and separates it, for example, into "高木" and "交日寺代", or into "高", "木交", and "日寺代", performs morphological analysis on each segmentation, and calculates the likelihood of each.

For "高木" and "交日寺代", the natural language processing unit 13 also joins "日" and "寺" into "時" to obtain "高木" and "交時代", performs morphological analysis, and calculates the likelihood.

For "高", "木交", and "日寺代", the natural language processing unit 13 joins "木" and "交" into "校", combines it with the preceding text "高" to form "高校", obtains "高校" and "日寺代", performs morphological analysis, and calculates the likelihood.

For "高校" and "日寺代", the natural language processing unit 13 joins "日" and "寺" into "時" to obtain "高校" and "時代", performs morphological analysis, and calculates the likelihood.

In this way, the natural language processing unit 13 applies joining and separation in every possible pattern to the text candidates output by the second image recognition unit 11b. For example, morphological analysis of the text "私の高木交日寺代は" yields "subject", "case particle", "noun (surname)", "noun", "noun", "binding particle", which is an implausible sequence from a linguistic standpoint, so the text is given a low likelihood. In contrast, morphological analysis of the text "私の高校時代は" ("my high school days") yields "subject", "case particle", "noun", "binding particle", which is a plausible sequence, so the text is given a high likelihood. Separating the text into prime factor texts and evaluating every recombined candidate by morphological analysis reduces mistakes caused by image recognition errors.
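The joining step can be sketched as a recursive enumeration over adjacent components, followed by a linguistic score. In the sketch below, the composition table `COMPOSE`, the word list `LEXICON`, and the `coverage` score are all hypothetical stand-ins: the patent uses morphological analysis for scoring, and a greedy lexicon-coverage ratio is used here only so the example runs with the standard library alone.

```python
from typing import List

# Hypothetical table of adjacent components that can be joined into one character.
COMPOSE = {("木", "交"): "校", ("日", "寺"): "時"}
# Hypothetical word list standing in for a morphological analyzer's dictionary.
LEXICON = {"私", "の", "は", "高校", "時代", "高木"}


def joinings(parts: List[str]) -> List[str]:
    """Enumerate every text obtainable by joining or not joining adjacent parts."""
    if not parts:
        return [""]
    # Option 1: keep the first part as its own character.
    results = [parts[0] + rest for rest in joinings(parts[1:])]
    # Option 2: join the first two parts when the table allows it.
    if len(parts) >= 2 and (parts[0], parts[1]) in COMPOSE:
        merged = COMPOSE[(parts[0], parts[1])]
        results += [merged + rest for rest in joinings(parts[2:])]
    return results


def coverage(text: str) -> float:
    """Fraction of characters covered by greedy longest-match lexicon lookup."""
    if not text:
        return 0.0
    max_len = max(map(len, LEXICON))
    i = covered = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in LEXICON:
                covered += n
                i += n
                break
        else:
            i += 1  # no known word starts here; skip one character
    return covered / len(text)


if __name__ == "__main__":
    candidates = joinings(["高", "木", "交", "日", "寺", "代"])
    for text in sorted(candidates, key=coverage, reverse=True):
        print(text, round(coverage(text), 2))
```

With these toy tables the six components yield four candidate texts, and "高校時代" is the only one fully covered by known words, mirroring the outcome described in the text.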

The determination unit 14 compares a text candidate output by the first image recognition unit 11a (for example, "高校時代") with a text candidate output by the natural language processing unit 13 (for example, "高木交日寺代").

Based on the result of the comparison, the determination unit 14 outputs text whose likelihood is at or above a predetermined level (for example, “高校時代”).

By comparing text obtained from two different viewpoints, that of the first image recognition unit 11a, which emphasizes image recognition, and that of the second image recognition unit 11b, which emphasizes natural language, the character recognition device 1 can perform character recognition with higher accuracy.

The determination unit 14 may also be configured to compare the most likely text candidate output by the first image recognition unit 11a with the most likely text candidate output by the natural language processing unit 13 and, if a plausibility at or above a predetermined level (for example, a match rate of 98 percent or more) is not obtained, to compare the other candidates with each other in descending order of likelihood until such a plausibility is obtained.

The determination unit 14 feeds back to the image recognition unit 11 weights indicating which answer, among all the output patterns obtained by the first image recognition unit 11a and the natural language processing unit 13, is plausible. The determination unit 14 also feeds back the “possibility that the character is X” for a character X that does not appear in the output patterns. On receiving this feedback, the image recognition unit 11 performs image recognition again based on the weighting.

For example, the determination unit 14 compares “喬校時付”, the most likely text candidate output by the first image recognition unit 11a, with “高木交時代”, the most likely text candidate output by the natural language processing unit 13. If the difference is large and a plausibility at or above the predetermined level is not obtained, it then compares “高校時代”, the next most likely text candidate output by the first image recognition unit 11a, with “高校時代”, the most likely text candidate output by the natural language processing unit 13; if a plausibility at or above the predetermined level is obtained, it outputs “高校時代”.
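The candidate-by-candidate comparison can be sketched as follows. This is only an illustration under stated assumptions: the patent does not define how the match rate is computed or in what exact order the remaining candidates are paired, so `match_rate` (built on `difflib.SequenceMatcher`), the ordering by product of likelihoods, the choice to return the natural-language candidate, and the sample likelihood values are all hypothetical.

```python
from difflib import SequenceMatcher


def match_rate(a, b):
    """Similarity in [0, 1]; a stand-in for the unspecified match rate."""
    return SequenceMatcher(None, a, b).ratio()


def decide(image_candidates, nlp_candidates, threshold=0.98):
    """Compare candidate pairs, most likely first, and return the first
    text whose match rate reaches the threshold, or None if none does.

    Each candidate is a (text, likelihood) tuple. Pairs are ordered by the
    product of the two likelihoods, which is an assumption of this sketch.
    """
    pairs = sorted(
        (
            (img_text, nlp_text, img_lik * nlp_lik)
            for img_text, img_lik in image_candidates
            for nlp_text, nlp_lik in nlp_candidates
        ),
        key=lambda pair: pair[2],
        reverse=True,
    )
    for img_text, nlp_text, _ in pairs:
        if match_rate(img_text, nlp_text) >= threshold:
            return nlp_text
    return None


# Illustrative likelihood values; they are not taken from the patent.
image_candidates = [("喬校時付", 0.60), ("高校時代", 0.50)]
nlp_candidates = [("高木交時代", 0.70), ("高校時代", 0.65)]
print(decide(image_candidates, nlp_candidates))
```

In this run the top pair (“喬校時付” against “高木交時代”) falls below the threshold, and the search continues down the list until both sides agree on “高校時代”.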

Thus, for documents containing a variety of handwritten characters, the character recognition device 1 repeats the processing until a plausibility at or above the predetermined level is obtained, and can therefore perform reliable, highly accurate character recognition.

The determination unit 14 may also be configured to give text candidates with a high likelihood, among those output by the natural language processing unit 13, priority over the text candidates output by the first image recognition unit 11a.

In this configuration, the character recognition device 1 gives greater priority to the processing by the natural language processing unit 13, so it is well suited to natural language processing and is particularly effective for character recognition of sentences.

The determination unit 14 may also be configured to judge, when the comparison shows that the difference between the two text candidates does not exceed a predetermined threshold, that the text has a likelihood at or above the predetermined level, and to output it. In other words, the determination unit 14 judges the text to have a likelihood at or above the predetermined level when the text candidate output by the natural language processing unit 13 and the text candidate output by the first image recognition unit 11a substantially match.

In this configuration, the character recognition device 1 performs character recognition using the results of two processes, those of the natural language processing unit 13 and the first image recognition unit 11a, and can therefore perform character recognition with high accuracy.
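One way to picture the difference-versus-threshold test is with an edit distance. The patent does not say how the difference is measured, so the Levenshtein distance used here, the function names, and the default threshold are assumptions made purely for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def substantially_match(image_text, nlp_text, max_diff=0):
    """True when the two candidates differ by at most max_diff edits."""
    return edit_distance(image_text, nlp_text) <= max_diff


print(substantially_match("高校時代", "高校時代"))
print(substantially_match("高校時代", "高木交時代"))
```

A threshold of zero demands an exact match; a slightly larger value would tolerate a single disputed character, at the cost of having to choose which of the two texts to output.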

As shown in FIG. 1, the character recognition device 1 may include a setting unit 15 that sets the time allowed for processing. The setting unit 15 sets the processing time to, for example, 10 minutes in accordance with a user instruction.

The determination unit 14 compares the most likely text candidate output by the first image recognition unit 11a with the most likely text candidate output by the natural language processing unit 13 and, if a plausibility at or above the predetermined level is not obtained, compares the other candidates with each other in descending order of likelihood, within the time set by the setting unit 15, until such a plausibility is obtained.

Thus the character recognition device 1 repeats the processing only within the set time until a plausibility at or above the predetermined level is obtained. This has the advantage that the device does not keep retrying on hard-to-recognize image data for which no number of recognition attempts would yield such a plausibility.
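A time-limited version of the comparison loop might look like the sketch below. The function name `decide_within`, the use of `difflib.SequenceMatcher` as the match rate, and the injectable `clock` parameter are assumptions for illustration; the patent only states that comparisons continue within the time set by the setting unit.

```python
import time
from difflib import SequenceMatcher


def decide_within(pairs, time_limit_s, threshold=0.98, clock=time.monotonic):
    """Walk candidate pairs (already ordered most likely first) and return
    the first sufficiently matching text, giving up once the time budget
    is used up. Returns None if nothing matched in time.
    """
    deadline = clock() + time_limit_s
    for image_text, nlp_text in pairs:
        if clock() >= deadline:
            break  # budget exhausted: stop retrying on hard input
        if SequenceMatcher(None, image_text, nlp_text).ratio() >= threshold:
            return nlp_text
    return None


pairs = [("喬校時付", "高木交時代"), ("高校時代", "高校時代")]
print(decide_within(pairs, time_limit_s=600))  # 10 minutes, as in the example
print(decide_within(pairs, time_limit_s=0))    # no budget: gives up at once
```

Using a monotonic clock avoids surprises if the system clock is adjusted while a long recognition job is running.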

The determination unit 14 may also be configured to output, as placeholder characters (fuseji), text for which a plausibility at or above the predetermined level was not obtained.

A placeholder character marks a location where character recognition failed and is, for example, “○” or “△”. Specifically, if a plausibility at or above the predetermined level is not obtained for the “時” in “高校時代”, the determination unit 14 outputs “高校○代”.

Thus the character recognition device 1 can produce output that clearly distinguishes the locations where characters were recognized from those where they were not. The correct character for a location output as a placeholder (in this embodiment, the “○” stands for “時”) may be fed back to the character recognition device 1. With this feedback, in the next round of character recognition the device can output the image data that previously became “○” as the correct text “校”.
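The placeholder output can be sketched in a few lines. The per-character confidence values, the function name `mask_uncertain`, and the default threshold are hypothetical; the patent specifies only that unrecognized locations are replaced by a mark such as “○”.

```python
def mask_uncertain(chars, threshold=0.98, mask="○"):
    """Replace each character whose confidence is below the threshold
    with a placeholder mark, leaving confident characters untouched.

    chars is a list of (character, confidence) tuples.
    """
    return "".join(ch if conf >= threshold else mask for ch, conf in chars)


# Illustrative confidences: only "時" falls below the threshold.
recognized = [("高", 0.99), ("校", 0.99), ("時", 0.40), ("代", 0.99)]
print(mask_uncertain(recognized))  # 高校○代
```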

The second image recognition unit 11b may also be configured to hold, through machine learning, a multilayer neural network suited to each item (form field), search for the items contained in the image data, extract feature points from the image data using the multilayer neural network suited to each item, and output text candidates separated into prime factor texts, which are factors smaller than the text.

For example, an item such as “申込日” (application date) can be expected to contain application-date information such as “２０１５年５月１日” (May 1, 2015). That is, only the digits 0 to 9 and the kanji “年”, “月”, and “日” (year, month, day) are entered in such an item; no other characters are entered. Accordingly, when the item to be recognized is one such as “申込日”, the second image recognition unit 11b extracts feature points from the image data using a neural network that outputs the digits 0 to 9 and the kanji “年”, “月”, and “日”, and outputs text candidates separated into prime factor texts, which are factors smaller than the text.

Similarly, an item such as “氏名＿フリガナ” (name, phonetic reading) can be expected to contain the katakana reading of a name, such as “トッキョタロウ”. That is, katakana from “ア” to “ン” and the like are entered in such an item, and other characters (kanji, digits, and so on) are not. Accordingly, when the item to be recognized is one such as “氏名＿フリガナ”, the second image recognition unit 11b extracts feature points from the image data using a neural network that outputs katakana, and outputs text candidates separated into prime factor texts, which are factors smaller than the text.

Likewise, an item such as “電話番号（ＴＥＬ）” (telephone number) can be expected to contain telephone-number information such as “０３−３５８１−１１１１”. That is, the digits 0 to 9 and the hyphen “−” are entered in such an item, and other characters (kanji, hiragana, and so on) are not. Accordingly, when the item to be recognized is one such as “電話番号（ＴＥＬ）”, the second image recognition unit 11b extracts feature points from the image data using a neural network that outputs the digits 0 to 9 and the hyphen “−”, and outputs text candidates separated into prime factor texts, which are factors smaller than the text.

Thus the character recognition device 1 outputs text candidates from image data using a multilayer neural network suited to each item, and can therefore perform character recognition efficiently and with high accuracy.
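The effect of an item-specific output set can be approximated, for illustration, by filtering the scores of a general classifier down to the characters allowed in each item. This swaps in a simpler technique than the per-item neural networks described in the text, and the field keys, character sets (ASCII digits and hyphen rather than full-width forms), and function name are all hypothetical.

```python
DIGITS = set("0123456789")
# Katakana letters from small "ァ" (U+30A1) to "ヶ" (U+30F6), plus the long-vowel mark.
KATAKANA = {chr(c) for c in range(ord("ァ"), ord("ヶ") + 1)} | {"ー"}

# Hypothetical item names mapped to the characters each item may contain.
FIELD_CHARSETS = {
    "application_date": DIGITS | set("年月日"),
    "name_furigana": KATAKANA,
    "phone_number": DIGITS | {"-"},
}


def restrict_to_field(field, char_scores):
    """Drop characters the field cannot contain and renormalize the rest.

    char_scores maps candidate characters to classifier scores.
    Returns an empty dict when no allowed character has a positive score.
    """
    allowed = FIELD_CHARSETS[field]
    kept = {ch: score for ch, score in char_scores.items() if ch in allowed}
    total = sum(kept.values())
    return {ch: score / total for ch, score in kept.items()} if total else {}


# A stroke that could be "1", "l", or "I" is unambiguous in a phone field.
print(restrict_to_field("phone_number", {"1": 0.4, "l": 0.35, "I": 0.25}))
# "日" versus the look-alike "目" in a date field.
print(restrict_to_field("application_date", {"日": 0.5, "目": 0.3}))
```

Restricting the output set removes look-alike characters that cannot occur in the item, which is the intuition behind using a dedicated network per item.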

The flow of operation of the character recognition device 1 will now be described with reference to the flowchart shown in FIG. 5.

In step S1, the first image recognition unit 11a extracts feature points from the image data using a multilayer neural network and outputs a plurality of text candidates and likelihoods.

In step S2, the second image recognition unit 11b extracts feature points from the image data using a multilayer neural network and outputs text candidates separated into prime factor texts, which are factors smaller than the text.

In step S3, the natural language processing unit 12 joins and separates adjacent prime factor texts in the text candidates output in step S2, performs morphological analysis on the resulting plurality of text patterns, and outputs a plurality of text candidates that are plausible from a natural language standpoint, together with their likelihoods.

In step S4, the determination unit 13 compares the text candidates output in step S1 with the text candidates output in step S3. In this step, the determination unit 13 outputs text with a likelihood at or above a predetermined level (for example, a match rate of 98 percent or more).

Thus the character recognition device 1 can perform highly accurate character recognition on documents containing a variety of handwritten characters.
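The four steps S1 to S4 can be summarized as a small orchestration function. This is a structural sketch only: the callables passed in are hypothetical stubs standing in for the two neural-network recognizers, the natural language processing, and the determination logic, none of which are specified at code level in the text.

```python
def recognize(image, first_recognizer, second_recognizer, nlp, judge):
    """Run the S1 to S4 flow with pluggable components."""
    image_candidates = first_recognizer(image)   # S1: text candidates + likelihoods
    factor_candidates = second_recognizer(image) # S2: prime factor text candidates
    nlp_candidates = nlp(factor_candidates)      # S3: join/split + language scoring
    return judge(image_candidates, nlp_candidates)  # S4: compare and output


def first_agreement(image_candidates, nlp_candidates):
    """Stub judge: first image candidate that also appears among the NLP candidates."""
    nlp_texts = {text for text, _ in nlp_candidates}
    return next((text for text, _ in image_candidates if text in nlp_texts), None)


# Stub components returning fixed, illustrative values.
result = recognize(
    image=b"",
    first_recognizer=lambda img: [("喬校時付", 0.60), ("高校時代", 0.50)],
    second_recognizer=lambda img: [(["高", "木", "交", "日", "寺", "代"], 0.70)],
    nlp=lambda candidates: [("高校時代", 0.90)],
    judge=first_agreement,
)
print(result)  # 高校時代
```

Passing the components in as arguments keeps the flow itself independent of how each recognizer is trained or implemented.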

This embodiment has mainly described the configuration and operation of the character recognition device 1, which can perform highly accurate character recognition on documents containing a variety of handwritten characters. The invention is not limited to this, however, and may also be configured as a method and a program that include the respective components and perform highly accurate character recognition on such documents.

Furthermore, the invention may be realized by recording a program that implements each function of the character recognition device 1 on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium.

Specifically, the program causes a computer to execute: a first image recognition step of extracting feature points from image data using a multilayer neural network and outputting a plurality of text candidates and likelihoods; a second image recognition step of extracting feature points from the image data using a multilayer neural network and outputting text candidates separated into prime factor texts, which are factors smaller than the text, together with likelihoods; a natural language processing step of joining and separating adjacent prime factor texts in the text candidates output by the second image recognition step, performing morphological analysis on the resulting plurality of text patterns, and outputting a plurality of text candidates that are plausible from a natural language standpoint, together with their likelihoods; and a determination step of comparing the text candidates output by the first image recognition step with the text candidates output by the natural language processing step. The determination step outputs text having a likelihood at or above a predetermined level.

The “computer system” referred to here includes an OS and hardware such as peripheral devices. The “computer-readable recording medium” refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into the computer system.

The “computer-readable recording medium” may further include media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period, such as volatile memory inside a computer system that acts as the server or client in that case. The program may implement only part of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

DESCRIPTION OF SYMBOLS: 1 character recognition device, 10 input unit, 11 image recognition unit, 11a first image recognition unit, 11b second image recognition unit, 12 natural language processing unit, 13 determination unit, 14 storage unit, 15 setting unit, 16 output unit.

Claims

1. A character recognition device comprising:
a first image recognition unit that extracts feature points from image data using a multilayer neural network and outputs a plurality of text candidates and likelihoods;
a second image recognition unit that extracts feature points from the image data using a multilayer neural network and outputs text candidates separated into prime factor texts, which are factors smaller than the text, together with likelihoods;
a natural language processing unit that joins and separates adjacent prime factor texts in the text candidates output by the second image recognition unit, performs morphological analysis on each of the plurality of text patterns thus combined, and outputs a plurality of text candidates and likelihoods; and
a determination unit that compares the text candidates output by the first image recognition unit with the text candidates output by the natural language processing unit,
wherein the determination unit outputs text having a likelihood equal to or higher than a predetermined value.

2. The character recognition device according to claim 1, wherein the determination unit compares the most likely text candidate output by the first image recognition unit with the most likely text candidate output by the natural language processing unit and, if a plausibility at or above a predetermined level is not obtained, compares the other candidates with each other in descending order of likelihood until a plausibility at or above the predetermined level is obtained.

3. The character recognition device described above, wherein the determination unit gives text candidates having a high likelihood, among the text candidates output by the natural language processing unit, priority over the text candidates output by the first image recognition unit.

4. The character recognition device according to claim 1, wherein, if the comparison shows that the difference between the two text candidates does not exceed a predetermined threshold, the determination unit determines that the text has a likelihood equal to or higher than the predetermined value and outputs the text.

5. The character recognition device described above, further comprising a setting unit that sets the time allowed for processing,
wherein the determination unit compares the most likely text candidate output by the first image recognition unit with the most likely text candidate output by the natural language processing unit and, if a plausibility at or above a predetermined level is not obtained, compares the other candidates with each other in descending order of likelihood, within the time set by the setting unit, until a plausibility at or above the predetermined level is obtained.

6. The character recognition device according to claim 1, wherein the determination unit outputs, as a placeholder character, text for which a plausibility at or above the predetermined level was not obtained.

7. The character recognition device according to claim 1, wherein
the first image recognition unit has, through machine learning, a multilayer neural network suited to each item, searches for an item contained in the image data, extracts feature points from the image data using the multilayer neural network suited to that item, and outputs a plurality of text candidates and likelihoods, and
the second image recognition unit has, through machine learning, a multilayer neural network suited to each item, searches for an item contained in the image data, extracts feature points from the image data using the multilayer neural network suited to that item, and outputs text candidates separated into prime factor texts, which are factors smaller than the text, together with likelihoods.

8. A character recognition method comprising:
a first image recognition step of extracting feature points from image data using a multilayer neural network and outputting a plurality of text candidates and likelihoods;
a second image recognition step of extracting feature points from the image data using a multilayer neural network and outputting text candidates separated into prime factor texts, which are factors smaller than the text, together with likelihoods;
a natural language processing step of joining and separating adjacent prime factor texts in the text candidates output by the second image recognition step, performing morphological analysis on each of the plurality of text patterns thus combined, and outputting a plurality of text candidates and likelihoods; and
a determination step of comparing the text candidates output by the first image recognition step with the text candidates output by the natural language processing step,
wherein the determination step outputs text having a likelihood equal to or higher than a predetermined value.

9. A character recognition program for causing a computer to execute:
a first image recognition step of extracting feature points from image data using a multilayer neural network and outputting a plurality of text candidates and likelihoods;
a second image recognition step of extracting feature points from the image data using a multilayer neural network and outputting text candidates separated into prime factor texts, which are factors smaller than the text, together with likelihoods;
a natural language processing step of joining and separating adjacent prime factor texts in the text candidates output by the second image recognition step, performing morphological analysis on each of the plurality of text patterns thus combined, and outputting a plurality of text candidates and likelihoods; and
a determination step of comparing the text candidates output by the first image recognition step with the text candidates output by the natural language processing step,
wherein the determination step outputs text having a likelihood equal to or higher than a predetermined value.