JP2021047797A

JP2021047797A - Machine learning equipment, machine learning methods, and programs

Info

Publication number: JP2021047797A
Application number: JP2019171493A
Authority: JP
Inventors: 美恵大串; Mie Ogushi; 貴広馬場; Takahiro Baba; 陽太 ▲高▼岡; Yota Takaoka; 英雄寺田; Hideo Terada
Original assignee: Toppan Forms Co Ltd; Open Stream Inc
Current assignee: Open Stream Inc; Toppan Edge Inc
Priority date: 2019-09-20
Filing date: 2019-09-20
Publication date: 2021-03-25

Abstract

PROBLEM TO BE SOLVED: To provide a machine learning device, a machine learning method, and a program that can improve the accuracy with which a trained model distinguishes pixels belonging to a character region from pixels belonging to a graphic region in an image. SOLUTION: A machine learning device comprising: an estimation unit that estimates an element type for a pixel in an image containing characters and geometric figures, the element type distinguishing whether the pixel is a character element (an element constituting a character), a geometric element (an element constituting a geometric figure), or a background element (an element constituting a background that is neither a character nor a geometric figure); an evaluation unit that evaluates the estimation result of the estimation unit based on teacher data corresponding to the image; and a learning control unit that, based on the evaluation result of the evaluation unit, trains the estimation unit to determine that the element type of the pixels lying in a predetermined region of the image containing the characters is the character element. [Selected drawing] FIG. 1

Description

The present invention relates to a machine learning device, a machine learning method, and a program.

In recent years, various techniques have been proposed that apply character recognition to a document image in order to recognize the characters, geometric figures, and the like that it contains.

For example, Patent Document 1 below discloses a technique that uses machine learning to determine whether a pixel in a predetermined region of a document image is a character pixel, that is, a pixel representing a character. In this technique, a machine learning model that receives the document image determines whether the predetermined region represents a character based on a character probability, which is the probability that a pixel in the region represents a character.

Patent Document 1: JP-A-2019-57803

However, when the goal is only to determine roughly whether a region is a character region or some other region (geometric figure or background), the technique of Patent Document 1 can produce erroneous determinations. This is because it determines the type at the fine granularity of individual pixels. For example, part of a geometric figure whose shape resembles a character may be erroneously determined to be a character. Conversely, a character may be erroneously determined to be part of a geometric figure.

In view of the above problem, an object of the present invention is to provide a machine learning device, a machine learning method, and a program that can improve the accuracy with which a trained model distinguishes pixels belonging to a character region from pixels belonging to a graphic region in an image.

To solve the above problem, a machine learning device according to one aspect of the present invention includes: an estimation unit that estimates an element type for a pixel in an image containing characters and geometric figures, the element type distinguishing whether the pixel is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure; an evaluation unit that evaluates the estimation result of the estimation unit based on teacher data corresponding to the image; and a learning control unit that, based on the evaluation result of the evaluation unit, trains the estimation unit to determine that the element type of the pixels lying in a predetermined region of the image containing the characters is the character element.

A machine learning method according to one aspect of the present invention includes: estimating, by an estimation unit, an element type for a pixel in an image containing characters and geometric figures, the element type distinguishing whether the pixel is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure; evaluating, by an evaluation unit, the estimation result of the estimation unit based on teacher data corresponding to the image; and training, by a learning control unit and based on the evaluation result of the evaluation unit, the estimation unit to determine that the element type of the pixels lying in a predetermined region of the image containing the characters is the character element.

A program according to one aspect of the present invention causes a computer to function as: an estimation unit that estimates an element type for a pixel in an image containing characters and geometric figures, the element type distinguishing whether the pixel is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure; an evaluation unit that evaluates the estimation result of the estimation unit based on teacher data corresponding to the image; and a learning control unit that, based on the evaluation result of the evaluation unit, trains the estimation unit to determine that the element type of the pixels lying in a predetermined region of the image containing the characters is the character element.

According to the present invention, the accuracy with which a trained model distinguishes pixels belonging to a character region from pixels belonging to a graphic region in an image can be improved.

FIG. 1 is a block diagram showing a configuration example of a machine learning device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the input/output relationships of various data according to the embodiment.
FIG. 3 is a flowchart showing the flow of processing in the machine learning device according to the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Configuration example of machine learning device>
First, the machine learning device according to the present embodiment is described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing a configuration example of the machine learning device 10 according to the embodiment of the present invention. FIG. 2 is a diagram showing the input/output relationships of various data according to the embodiment of the present invention.

The machine learning device 10 is a device that has a function of generating a trained model for use in machine learning. The following describes an example in which the machine learning device 10 generates a trained model used to determine, by machine learning, the element type of each pixel in an image.

The element type is information indicating what kind of content each pixel in the image constitutes, and takes one of three values: character element, line segment element, or background element. A character element indicates that the pixel constitutes a character region in the image. A line segment element indicates that the pixel constitutes a line segment in the image. A background element indicates that the pixel constitutes the background of the image (neither a line segment nor a character region). Here, the line segment element is an example of a "geometric element".

A character region is a region (predetermined region) of the image that contains characters. A character region may contain background in addition to characters. One character region may contain a single character or a plurality of characters. A rectangle is one example of the shape of a character region, but the shape is not limited to this example. The size of a character region is set, for example, according to the size of the characters. A region of the image that contains no characters is hereinafter also called a graphic region. The graphic region is, for example, the region other than the character regions.

The trained model can be used in a device (hereinafter also called a "determination device") that determines which of the contents shown in an image each pixel constitutes. For example, the determination device uses the trained model to determine whether a pixel in the image belongs to a character or to some other element. Here, the other element is, for example, a geometric figure. A geometric figure is, for example, a line, a line segment, or a group of symbols arranged so as to satisfy a certain condition.

The image contains lines, characters, and a background. Lines may be combined, or part of a line may be bent (or curved), to form ruled lines, frame lines, and the like. The image is the image to be judged by the determination device. That is, the image is an example of a "target image".

When a target image (input data) is input, the trained model determines the element type of the pixels in the target image. The trained model is generated in advance by performing machine learning with training data so that it can determine the element type of the pixels of an input target image.

The trained model is generated by, for example, supervised learning. In supervised learning, a learning model is trained on a training data set. A data set is a pair consisting of input data and the teacher data corresponding to that input data.

The input data is the data given as input during training. The input data according to the present embodiment is an image containing characters and geometric figures (hereinafter also called an "input image") and carries image information. Image information associates each pixel with information about the image, for example a grayscale value per pixel or an RGB value per pixel.

The teacher data is data indicating the correct answer for the output data produced from the input data. The teacher data according to the present embodiment associates each pixel of the input image with information indicating its element type. The teacher data may also be associated with information indicating that a pixel belongs to a character region.

(Basics of DCNN)
A trained model in supervised learning is generated by training a model such as a DCNN (Deep Convolutional Neural Network) on a training data set. A DCNN is a deep neural network whose main component is the convolution layer. In image recognition, using a two-dimensional convolution layer as the input layer of a DCNN makes it possible to efficiently recognize image feature information that takes into account both the pixel of interest and its neighboring pixels. It is also known that stacking two-dimensional convolutions in multiple layers makes it possible to recognize global image feature information that takes into account not only the neighborhood of the pixel of interest but also more distant pixels.

(Learning of DCNN)
The computation of a convolution layer can be expressed as a mathematical linear transformation, y = <W, x> + b. In other words, it is a differentiable expression. A differentiable computation layer can be trained using the principle of supervised learning for neural networks known as the error back-propagation method.
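As an illustration of the expression y = <W, x> + b, the following is a minimal sketch of a two-dimensional convolution written in plain Python. The function name `conv2d_valid`, the absence of padding, and the stride of 1 are assumptions made for this example and are not taken from the embodiment. As in most deep learning libraries, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1).

    Each output value is the inner product <W, x> of the kernel W with the
    image patch x under it, plus the bias b.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 3x3 grayscale patch and 2x2 kernel.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_valid(image, kernel, bias=1.0)
```

Because each output value depends on a pixel and its neighbors, stacking such layers lets deeper layers draw on progressively wider areas of the image.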

In a DCNN, when data is passed from a unit in one layer to a unit in a deeper layer, the data is output after a weight W, corresponding to the coupling coefficient of the node connecting the units, and a bias component b have been applied. The learning model applies these inter-unit operations to the input data and outputs output data from the output layer.

During training, the input data of the training data set is fed into the learning model. The learning model is trained by adjusting its parameters (weight W and bias component b) so that the data output from the output layer for that input (the output data) approaches the expected output in the training data set (the teacher data).

For example, the error back-propagation method is used to adjust the parameters (weight W and bias component b) of the DCNN model. In the error back-propagation method, the degree of deviation between the data output from the output layer of the learning model and the expected output in the training data set is expressed as a loss function. Any index may be used for the degree of deviation, for example the squared error or the cross entropy. The values of the weight W and the bias component b are then determined (updated), proceeding from the output layer toward the input layer, so that the loss function is minimized. This trains the learning model and improves the accuracy of its determinations.
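The following minimal sketch illustrates one of the loss functions named above, the cross entropy, together with a single gradient-descent update. It is an assumption-laden toy: the only parameters are three bias values feeding a softmax, the learning rate `lr` is arbitrary, and the helper names are hypothetical. A real DCNN propagates the same kind of gradient back through every layer.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Loss is small when the probability of the correct class is high."""
    return -math.log(probs[target])


def sgd_step_on_bias(bias, target, lr=0.5):
    """One update b <- b - lr * dL/db.

    For softmax followed by cross entropy, dL/db_k = p_k - 1 if k is the
    target class, and p_k otherwise.
    """
    probs = softmax(bias)
    grad = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [b - lr * g for b, g in zip(bias, grad)]


bias = [0.0, 0.0, 0.0]      # untrained: all three classes equally likely
target = 2                  # correct class for this hypothetical pixel
loss_before = cross_entropy(softmax(bias), target)
bias = sgd_step_on_bias(bias, target)
loss_after = cross_entropy(softmax(bias), target)
```

After the update the probability assigned to the correct class rises, so the loss falls. Repeating such updates is what drives the loss toward its minimum.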

The learning model is not limited to a DCNN. Other methods such as decision trees, hierarchical Bayes, or an SVM (Support Vector Machine) may be used as the learning model.

To realize the function of generating a trained model, the machine learning device 10 includes an estimation unit 110, an evaluation unit 120, and a learning control unit 130, as shown in FIG. 1.

(Estimation unit 110)
The estimation unit 110 has a function of estimating the element type of the pixels in an image. This function is realized by machine learning such as a DCNN. That is, the internal computation of the estimation unit 110 is governed by the model parameters 112, a group of parameters that change through learning. Immediately after learning starts, the initial values of the model parameters 112 are set to random values or the like, and the parameters have not yet been adjusted to estimate the element type as the character element, so the estimation unit 110 outputs estimation results containing many errors. As learning progresses, the learning control unit 130 corrects the parameters and the estimation errors decrease. When the errors have become sufficiently small, the model parameters 112 are fixed and learning is complete. The parameter group obtained in this way is called a "trained model".

Specifically, the estimation unit 110 estimates the element type based on the model parameters 112. The model parameters 112 are set to values determined by learning so that, for example, the element type of pixels in a character region is estimated to be the character element. This allows the estimation unit 110 to estimate, region by region, the pixels of the input image whose element type is the character element.

More specifically, the estimation unit 110 inputs the image information of an input image, which is the input data of the data set, to the model parameters 112 and obtains the output (element type) from the model parameters 112. The output from the model parameters 112 expresses, as probabilities (hereinafter also called "element type probabilities"), how likely each pixel is to be each element type, for example "12% likely to be a character element, 80% likely to be a line segment element, and 8% likely to be a background element". Based on the output from the learning model, the estimation unit 110 estimates, for example, the element type with the highest probability for each pixel as the element type of that pixel in the image.

After the estimation, the estimation unit 110 outputs information indicating the element type of each pixel as the estimation result. For example, the estimation unit 110 generates an image in which an integer value indicating the estimated element type is given for each pixel. Such an image is hereinafter called a "label image". The label image generated by the estimation unit 110 is hereinafter called an "estimated label image", and the label image serving as teacher data is hereinafter called a "teacher label image".

The integer value indicating the element type is hereinafter called a "label value". Any values may be used as label values, and any definition may be assigned to each value. In the present embodiment, for example, the label values 0 to 2 are used, where 0 is defined as the label value indicating background, 1 as the label value indicating a figure, and 2 as the label value indicating a character region.
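The conversion from element type probabilities to label values described above can be sketched as follows. The dictionary representation of the per-pixel probabilities and the names `LABEL_VALUES`, `to_label`, and `to_label_image` are assumptions introduced for illustration; only the label numbering (0 background, 1 figure, 2 character region) and the 12% / 80% / 8% example come from the text.

```python
# Label values as defined in this embodiment.
LABEL_VALUES = {"background": 0, "line": 1, "character": 2}


def to_label(probabilities):
    """Return the label value of the element type with the highest probability."""
    best_type = max(probabilities, key=probabilities.get)
    return LABEL_VALUES[best_type]


def to_label_image(probability_image):
    """Apply to_label to every pixel, producing an estimated label image."""
    return [[to_label(pixel) for pixel in row] for row in probability_image]


# The example pixel from the text: most likely a line segment element.
example_pixel = {"character": 0.12, "line": 0.80, "background": 0.08}

# A hypothetical 1x3 strip of pixels.
strip = [[
    {"character": 0.70, "line": 0.20, "background": 0.10},
    example_pixel,
    {"character": 0.05, "line": 0.05, "background": 0.90},
]]
estimated_label_image = to_label_image(strip)
```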

After generating the estimated label image, the estimation unit 110 inputs it to the evaluation unit 120.

An example of the data input/output relationships of the estimation unit 110 is now described with reference to FIG. 2. As shown in FIG. 2, the data set 20 of the present embodiment consists of an input image 30, which is the input data, and a teacher label image 40, which is the teacher data corresponding to the input image 30.

The input image 30 is an image containing the characters "あいう" and two geometric figures that are rectangular frames. The teacher label image 40 is the image obtained by replacing the element type of each pixel of the input image 30 with an integer value. Pixels of the teacher label image 40 associated with the character element store the label value 2, pixels associated with the line segment element store the label value 1, and pixels associated with the background element store the label value 0.
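A teacher label image of this kind can be sketched as follows. The function name, the half-open rectangle convention `(top, left, bottom, right)`, and the rule that a character region overwrites any line pixels it overlaps are all assumptions made for this example. The point it illustrates is the one stated in the text: every pixel inside a character region receives the label value 2, including background pixels between strokes.

```python
BACKGROUND, LINE, CHARACTER = 0, 1, 2


def make_teacher_label_image(height, width, line_pixels, character_boxes):
    """Build a label image from annotated line pixels and character regions.

    line_pixels:     iterable of (row, col) positions belonging to line segments
    character_boxes: iterable of (top, left, bottom, right) rectangles,
                     with bottom and right exclusive (an assumed convention)
    """
    labels = [[BACKGROUND] * width for _ in range(height)]
    for row, col in line_pixels:
        labels[row][col] = LINE
    # Assumption: character regions take precedence over line pixels.
    for top, left, bottom, right in character_boxes:
        for row in range(top, bottom):
            for col in range(left, right):
                labels[row][col] = CHARACTER
    return labels


# Hypothetical 4x6 image: a horizontal line on row 0 and one character region.
teacher = make_teacher_label_image(
    height=4,
    width=6,
    line_pixels=[(0, col) for col in range(6)],
    character_boxes=[(1, 1, 3, 4)],
)
for row in teacher:
    print(row)
```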

As shown in FIG. 2, when the input image 30 is input to the estimation unit 110, the estimation unit 110 outputs an estimated label image 50, a label image showing the estimation result, to the evaluation unit 120. The estimated label image 50 is defined in the same way as the teacher label image 40 described above. In the example shown in FIG. 2, learning of the model parameters 112 is still insufficient, so the estimation unit 110 outputs an estimated label image 50 that contains erroneous label values.

(Evaluation unit 120)
The evaluation unit 120 has a function of evaluating the estimation result. For example, the evaluation unit 120 evaluates the estimation result input from the estimation unit 110 based on the teacher data. Specifically, the evaluation unit 120 compares the input teacher label image with the estimated label image and calculates, as an evaluation value, the degree of deviation between them expressed by a loss function. The degree of deviation here is, for example, the degree to which the element types of the pixels at corresponding positions in the teacher label image and the estimated label image differ.
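One simple way to quantify the degree of deviation between two label images is the fraction of pixels whose labels differ, sketched below. This measure and the name `mismatch_ratio` are assumptions chosen for readability. Because it is not differentiable, training by error back-propagation would presumably use a loss such as the cross entropy on the element type probabilities instead.

```python
def mismatch_ratio(teacher_labels, estimated_labels):
    """Fraction of pixel positions where the two label images disagree."""
    if len(teacher_labels) != len(estimated_labels) or any(
        len(t_row) != len(e_row)
        for t_row, e_row in zip(teacher_labels, estimated_labels)
    ):
        raise ValueError("label images must have the same shape")
    total = sum(len(row) for row in teacher_labels)
    differing = sum(
        1
        for t_row, e_row in zip(teacher_labels, estimated_labels)
        for t, e in zip(t_row, e_row)
        if t != e
    )
    return differing / total


# Hypothetical 2x3 label images (0 background, 1 line, 2 character region).
teacher_image = [[0, 2, 2],
                 [1, 1, 0]]
estimated_image = [[0, 2, 1],
                   [1, 0, 0]]
evaluation_value = mismatch_ratio(teacher_image, estimated_image)
```

Here two of the six pixels disagree, so the evaluation value is one third. A value of 0 would mean the estimated label image matches the teacher label image exactly.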

An example of the data input/output relationships of the evaluation unit 120 is now described with reference to FIG. 2. As shown in FIG. 2, the teacher label image 40 of the data set 20 and the estimated label image 50 generated by the estimation unit 110 are input to the evaluation unit 120. The evaluation unit 120 calculates an evaluation value from the teacher label image 40 and the estimated label image 50, and inputs the calculated evaluation value to the learning control unit 130 as the evaluation result.

(Learning control unit 130)
The learning control unit 130 has a function of controlling machine learning in the machine learning device 10. For example, the learning control unit 130 trains the model parameters 112 of the estimation unit 110 based on the evaluation result input from the evaluation unit 120. It does so by correcting the model parameters 112 based on the evaluation value that represents the evaluation result. Specifically, the learning control unit 130 uses the error back-propagation method to determine parameter values that minimize the evaluation value expressed by the loss function, and corrects the model parameters 112 with those values. By training the model parameters 112 in this way, the learning control unit 130 can improve the accuracy of the estimation that the estimation unit 110 performs based on the model parameters 112.

Specifically, the learning control unit 130 of the present embodiment trains the estimation unit 110 so that pixels lying in a character region are determined to be character elements. For example, if the estimation unit 110 estimates the element type probability of the character element for a pixel in a character region to be lower than the other element type probabilities, the estimation result is incorrect. The learning control unit 130 therefore repeatedly corrects the model parameters 112, and has the estimation unit 110 learn repeatedly, until the element type probability of the character element for that pixel becomes higher than the other element type probabilities.

The learning control unit 130 of the present embodiment also trains the estimation unit 110 so that pixels outside the character regions are determined to be line segment elements or background elements. For example, it trains the estimation unit 110 so that pixels lying in a predetermined region containing a dotted line are determined to be line segment elements. It also trains the estimation unit 110 so that pixels lying in a region where the pixel density changes across a boundary line running vertically or horizontally are determined to be line segment elements. Further, it trains the estimation unit 110 so that pixels that are neither character elements nor line segment elements are determined to be background elements. The training method is the same as the method described above for having pixels in a character region determined to be character elements, so its description is omitted.

The learning control unit 130 trains the estimation unit 110 by correcting the model parameters 112 so that the element type of the pixels lying in a character region of the input image 30 is determined to be the character element, and by having the estimation unit 110 repeat the estimation. As a result, when the input image 30 is input, the model parameters 112 make it possible to determine that the pixels lying in a character region containing characters in the input image are character elements.

An example of the data input/output relationships of the learning control unit 130 is now described with reference to FIG. 2. As shown in FIG. 2, the evaluation value is input from the evaluation unit 120 to the learning control unit 130. The learning control unit 130 corrects the model parameters 112 based on the input evaluation value.

The processing shown in FIG. 2, from the estimation of the element type by the estimation unit 110 to the modification of the model parameter 112 by the learning control unit 130, can be performed repeatedly. That is, the modification of the parameters in the model parameter 112 and the learning are repeated. Immediately after learning starts, the estimation unit 110 outputs estimation results containing many errors. As learning progresses, however, the model parameter 112 is modified and the errors in the estimation results decrease. In other words, repeating the learning improves the accuracy of estimation with the model parameter 112. Therefore, by repeating the learning, the estimation unit 110 can improve the accuracy of the element type estimation based on the model parameter 112.

Further, by repeating the learning, the estimation unit 110 can acquire a model parameter 112 that estimates the type of a pixel based on the arrangement pattern features of that pixel and the pixels around it.
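To make the idea of estimating a pixel's type from its surrounding arrangement pattern concrete, the following is a minimal sketch. It assumes a 3x3 neighborhood and a linear score per class; the function names `patch` and `class_scores` and the hand-picked weights are hypothetical, and the description does not state which model the estimation unit 110 actually uses.

```python
def patch(image, y, x, radius=1):
    """Return the (2*radius+1)^2 values around (y, x), zero-padded at the borders."""
    h, w = len(image), len(image[0])
    return [
        image[j][i] if 0 <= j < h and 0 <= i < w else 0
        for j in range(y - radius, y + radius + 1)
        for i in range(x - radius, x + radius + 1)
    ]


def class_scores(image, y, x, weights, biases):
    """One linear score per class, computed from the pixel's neighborhood."""
    feats = patch(image, y, x)
    return [sum(w * f for w, f in zip(ws, feats)) + b for ws, b in zip(weights, biases)]


# A 3x3 image whose middle row is a horizontal line.
image = [[0, 0, 0],
         [1, 1, 1],
         [0, 0, 0]]

# Hypothetical parameters for (background, character, line); the line
# weights respond to a horizontal run through the center pixel.
weights = [[0] * 9, [0] * 9, [0, 0, 0, 1, 1, 1, 0, 0, 0]]
biases = [0.5, 0.0, -2.0]

center = class_scores(image, 1, 1, weights, biases)
corner = class_scores(image, 0, 0, weights, biases)
print(center, corner)
```

In this sketch the center pixel scores highest for the line class, while the corner pixel scores highest for background. In practice such weights would be obtained through the repeated learning described above rather than set by hand.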

When the learning has been repeated and the errors in the estimation results have become sufficiently few, the estimation unit 110 completes the learning. The model parameter 112 obtained through this repeated learning is then used as the "trained model".

As described above, by having the estimation unit 110 repeat the learning, the learning control unit 130 can generate, from a model parameter 112 whose initial values were set randomly, a trained model that determines the element type of pixels in the character region of an image to be the character element.

Note that the model parameter 112 may be preset with initial values for estimating the element type of pixels in the character region of an image to be the character element. In this case, by having the estimation unit 110 repeat the learning, the learning control unit 130 can generate, as the trained model, a model parameter 112 whose character element estimation accuracy is higher than it was immediately after learning started.

<Processing flow>
The configuration example of the machine learning device 10 has been described above. Next, the flow of processing in the machine learning device 10 according to the present embodiment is described. FIG. 3 is a flowchart showing the flow of processing in the machine learning device 10 according to the embodiment of the present invention. The following describes the model parameter modification process based on one data set.

First, the estimation unit 110 of the machine learning device 10 estimates the element type of each pixel in the input image, which is the input data, and generates an estimated label image showing the estimation result (S102). After generating it, the estimation unit 110 inputs the estimated label image to the evaluation unit 120.
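Step S102 can be pictured with the following minimal sketch, which assumes that the model produces one score map per element type and that the estimated label image is obtained by taking the highest-scoring type at each pixel. The label codes and the function name `estimate_label_image` are hypothetical; the description does not specify how the label image is derived.

```python
BACKGROUND, CHARACTER, LINE = 0, 1, 2  # hypothetical label codes


def estimate_label_image(score_maps):
    """Build a label image from per-class score maps.

    score_maps[c][y][x] is the score of class c at pixel (y, x).
    Each output pixel receives the class with the highest score.
    """
    n_classes = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(n_classes), key=lambda c: score_maps[c][y][x]) for x in range(w)]
        for y in range(h)
    ]


# Score maps for a 1x3 image (values chosen only for illustration).
score_maps = [
    [[0.90, 0.10, 0.20]],  # background
    [[0.05, 0.80, 0.10]],  # character
    [[0.05, 0.10, 0.70]],  # line
]
estimated = estimate_label_image(score_maps)
print(estimated)
```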

Next, the evaluation unit 120 of the machine learning device 10 evaluates the estimated label image input from the estimation unit 110 against the teacher label image, which is the teacher data, and calculates an evaluation value (S104). After the calculation, the evaluation unit 120 inputs the calculated evaluation value to the learning control unit 130.
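The description does not define how the evaluation value of step S104 is computed. As one possible reading, the sketch below uses the fraction of pixels whose estimated label differs from the teacher label; the function name `evaluation_value` and this choice of metric are assumptions.

```python
def evaluation_value(estimated, teacher):
    """Fraction of pixels whose estimated label differs from the teacher label."""
    total = 0
    wrong = 0
    for est_row, teacher_row in zip(estimated, teacher):
        for e, t in zip(est_row, teacher_row):
            total += 1
            wrong += e != t
    return wrong / total


estimated = [[0, 1, 1],
             [2, 2, 0]]
teacher = [[0, 1, 1],
           [2, 0, 0]]
value = evaluation_value(estimated, teacher)
print(value)
```

A lower value means fewer errors. A differentiable loss such as per-pixel cross-entropy would be another plausible choice when the parameters are corrected by gradient descent.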

Then, the learning control unit 130 of the machine learning device 10 modifies the model parameter 112 of the estimation unit 110 based on the evaluation value input from the evaluation unit 120 (S106).
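How the model parameter 112 is modified in step S106 is not detailed in the description. The following sketch assumes a linear softmax classifier per pixel and a single gradient-descent step on the cross-entropy between its output and the teacher label. The function names, the learning rate `lr`, and the model itself are assumptions introduced for illustration.

```python
import math


def softmax(scores):
    """Convert scores to probabilities (shifted by the maximum for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(weights, biases, feats):
    """Class probabilities of a linear model for one pixel's feature vector."""
    scores = [sum(w * f for w, f in zip(ws, feats)) + b for ws, b in zip(weights, biases)]
    return softmax(scores)


def correct_parameters(weights, biases, feats, teacher_label, lr=0.5):
    """One gradient-descent step on the cross-entropy for a single pixel."""
    probs = class_probs(weights, biases, feats)
    new_weights, new_biases = [], []
    for c, (ws, b) in enumerate(zip(weights, biases)):
        # Gradient of the cross-entropy with respect to the score of class c.
        grad = probs[c] - (1.0 if c == teacher_label else 0.0)
        new_weights.append([w - lr * grad * f for w, f in zip(ws, feats)])
        new_biases.append(b - lr * grad)
    return new_weights, new_biases


weights = [[0.0, 0.0] for _ in range(3)]  # three element types, two features
biases = [0.0, 0.0, 0.0]
feats = [1.0, 0.0]

before = class_probs(weights, biases, feats)[2]
weights, biases = correct_parameters(weights, biases, feats, teacher_label=2)
after = class_probs(weights, biases, feats)[2]
print(before, after)
```

After the step, the probability assigned to the teacher label is higher than before, which is the sense in which each correction reduces the errors in later estimations.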

After modifying the model parameter 112, the learning control unit 130 determines whether the errors in the estimation result have become sufficiently few (S108).

If the errors have become sufficiently few (S108/YES), the learning control unit 130 completes the learning and ends the process. If they have not (S108/NO), the learning control unit 130 repeats the processing of S102 to S106, making the estimation unit 110 repeat the learning until the errors in the estimation result become sufficiently few.
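The S102 to S108 flow can be sketched as a generic loop. The `train` function, its `tolerance` and `max_iters` parameters, and the toy one-parameter threshold model used to exercise it are all hypothetical; they only mirror the order of the steps in FIG. 3 and are not the model used by the device.

```python
def train(params, estimate, evaluate, correct, tolerance=0.05, max_iters=100):
    """Repeat S102 to S106 until the evaluation value is small enough (S108)."""
    history = []
    for _ in range(max_iters):
        estimated = estimate(params)      # S102: estimate element types
        value = evaluate(estimated)       # S104: compare with teacher data
        params = correct(params, value)   # S106: modify the model parameter
        history.append(value)
        if value <= tolerance:            # S108: errors sufficiently few?
            break
    return params, history


# Toy data: four pixel intensities and their teacher labels.
pixels = [0.1, 0.2, 0.8, 0.9]
teacher = [0, 0, 1, 1]


def estimate(threshold):
    return [1 if v > threshold else 0 for v in pixels]


def evaluate(estimated):
    return sum(e != t for e, t in zip(estimated, teacher)) / len(teacher)


def correct(threshold, value):
    # Toy rule: the threshold starts too high, so lower it in proportion
    # to the error rate. A real system would use a gradient-based update.
    return threshold - 0.4 * value


threshold, history = train(0.95, estimate, evaluate, correct)
print(threshold, history)
```

The `max_iters` cap is a safeguard added in this sketch so that the loop ends even if the tolerance is never reached; the flowchart itself only describes the S108 check.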

As described above, the machine learning device 10 according to the present embodiment first estimates the element type of each pixel in an image containing characters and geometric figures. Next, the machine learning device 10 evaluates the estimation result based on the teacher data corresponding to the image. Then, based on the evaluation result, the machine learning device 10 learns to determine the element type of pixels in the character region of the image to be the character element. With this configuration, the machine learning device 10 comes to determine pixels whose element type is the geometric element or the background element on a pixel-by-pixel basis, and pixels whose element type is the character element on a region-by-region basis.
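The contrast between pixel-wise and region-wise labeling can be illustrated by how a teacher label image might be built. In the sketch below, character regions are given as bounding boxes and every pixel inside a box, ink or not, is labeled as a character element, while the remaining pixels are labeled individually. The label codes, the function name `make_teacher_labels`, the box format, and the assumption that all ink outside the boxes belongs to a geometric figure are choices made for this example.

```python
BACKGROUND, CHARACTER, GEOMETRIC = 0, 1, 2  # hypothetical label codes


def make_teacher_labels(ink, char_boxes):
    """Build a teacher label image.

    ink:        2D list of 0/1 values (1 = dark pixel).
    char_boxes: list of (top, left, bottom, right) boxes, inclusive.
    """
    h, w = len(ink), len(ink[0])
    # Pixel-wise labeling: ink is geometric, everything else is background.
    labels = [[GEOMETRIC if ink[y][x] else BACKGROUND for x in range(w)]
              for y in range(h)]
    # Region-wise labeling: the whole box becomes CHARACTER, including
    # the background pixels between the strokes.
    for top, left, bottom, right in char_boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                labels[y][x] = CHARACTER
    return labels


ink = [[1, 1, 1, 1, 1],   # a horizontal ruled line
       [0, 1, 0, 1, 0],   # strokes of a character
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
teacher = make_teacher_labels(ink, char_boxes=[(1, 1, 2, 3)])
for row in teacher:
    print(row)
```

Note that the pixel at row 1, column 2 carries no ink but is still labeled as a character element because it lies inside the character region.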

As a result, the machine learning device 10 can generate a trained model that clearly distinguishes pixels that are elements constituting characters in an image from pixels that are elements constituting figures. The machine learning device 10 can therefore use the trained model to improve the accuracy of distinguishing pixels belonging to a character region from pixels belonging to a graphic region in an image.

The embodiment of the present invention has been described above. The machine learning device 10 in the above-described embodiment may be realized by a computer. In that case, a program for realizing its functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The "computer system" referred to here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as a volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although the embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the gist of the present invention.

10 Machine learning device
110 Estimation unit
112 Model parameter
120 Evaluation unit
130 Learning control unit

Claims

A machine learning device comprising:
an estimation unit that estimates, for a pixel in an image containing a character and a geometric figure, an element type that distinguishes whether the pixel is a character element indicating an element constituting the character, a geometric element indicating an element constituting the geometric figure, or a background element indicating an element constituting a background that is neither the character nor the geometric figure;
an evaluation unit that evaluates the estimation result of the estimation unit based on teacher data corresponding to the image; and
a learning control unit that, based on the evaluation result of the evaluation unit, causes the estimation unit to learn to determine the element type of the pixels existing in a predetermined region of the image that contains the character to be the character element.

The machine learning device according to claim 1, wherein the estimation unit estimates the element type of the pixels in the image based on model parameters that have been learned, based on the evaluation result, so as to estimate the element type of the pixels in the predetermined region to be the character element.

The machine learning device according to claim 1 or 2, wherein the predetermined region is a region including the character and the background.

A machine learning method comprising:
estimating, by an estimation unit, for a pixel in an image containing a character and a geometric figure, an element type that distinguishes whether the pixel is a character element indicating an element constituting the character, a geometric element indicating an element constituting the geometric figure, or a background element indicating an element constituting a background that is neither the character nor the geometric figure;
evaluating, by an evaluation unit, the estimation result of the estimation unit based on teacher data corresponding to the image; and
causing, by a learning control unit and based on the evaluation result of the evaluation unit, the estimation unit to learn to determine the element type of the pixels existing in a predetermined region of the image that contains the character to be the character element.

A program that causes a computer to function as:
an estimation unit that estimates, for a pixel in an image containing a character and a geometric figure, an element type that distinguishes whether the pixel is a character element indicating an element constituting the character, a geometric element indicating an element constituting the geometric figure, or a background element indicating an element constituting a background that is neither the character nor the geometric figure;
an evaluation unit that evaluates the estimation result of the estimation unit based on teacher data corresponding to the image; and
a learning control unit that, based on the evaluation result of the evaluation unit, causes the estimation unit to learn to determine the element type of the pixels existing in a predetermined region of the image that contains the character to be the character element.