JP7316771B2

JP7316771B2 - Learning device, parameter creation method, neural network, and information processing device using the same

Info

Publication number: JP7316771B2
Application number: JP2018170893A
Authority: JP
Inventors: Koichi Tanji (晃一丹治); Atsushi Nogami (敦史野上); Yusuke Mitarai (裕輔御手洗)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-09-12
Filing date: 2018-09-12
Publication date: 2023-07-28
Anticipated expiration: 2038-09-12
Also published as: JP2020042664A

Description

The present invention relates to a learning device, a parameter creation method, a neural network, and an information processing device using the same, and more particularly to, for example, image recognition technology.

A method is known that generates a discriminator for identifying input data by training a hierarchical network with learning data. It is also known, however, that as the number of layers in a hierarchical network increases, the so-called vanishing gradient problem (the deltas required to update the weight coefficients vanish or diverge as they are backpropagated) becomes pronounced and can impede the progress of training.

As a way of addressing this problem, a method called deep supervision is known, in which error evaluation and error backpropagation are also performed at intermediate layers of the network (hereinafter referred to as side-out learning) (Non-Patent Document 1). Also known is a method that, in addition to training a hierarchical network to extract image features, trains it so that specific neurons activate when specific features are present, thereby enabling accurate feature extraction suited to those features (Patent Document 1).

Patent Document 1: JP 2016-31746 A

Non-Patent Document 1: Xie, S., Tu, Z. "Holistically-nested edge detection," ICCV, 1395-1403 (2015)

However, it has been found that with the method of Non-Patent Document 1, the error evaluation for the outputs of the intermediate layers becomes less accurate, so that a favorable final learning result may not be obtained.

An object of the present invention is to train neural networks more effectively.

To achieve the object of the present invention, the learning device of the present invention has, for example, the following configuration. That is:
A learning device that trains a neural network used for attribute determination processing of each pixel of an image, the learning device comprising:
setting means for setting teacher data for each of two or more different outputs of the neural network corresponding to a single piece of learning input data; and
learning means for training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output,
wherein the teacher data for each of the two or more different outputs represents the result of a different deformation process or filtering process applied to basic teacher data corresponding to the single piece of learning input data, and the basic teacher data is image data indicating the attribute of each pixel represented by the single piece of learning input data.

According to the present invention, neural network training can be performed more effectively.

FIG. 1 is a functional block diagram showing an example of a learning device according to an embodiment.
FIG. 2 is a flowchart showing an example of a parameter generation method according to an embodiment.
FIG. 3 is a schematic diagram showing an example of a hierarchical network that performs side-out learning.
FIG. 4 is a diagram for explaining the relationship between side-out outputs and GT.
FIG. 5 is a diagram for explaining a problem that arises when side-out learning is performed according to the conventional technique.
FIGS. 6 to 10 are diagrams for explaining methods of generating adaptive GTs according to an embodiment.
FIG. 11 is a schematic block diagram of a computer used in an embodiment.

Embodiments of the present invention are described in detail below with reference to flowcharts and drawings. The following specific examples are examples of embodiments according to the present invention, but the present invention is not limited to these specific forms. The present invention can be applied to training a hierarchical network (hereinafter sometimes referred to as a neural network or simply a network) using learning data, and is applicable to any technique in which a hierarchical network is trained.

FIG. 1 shows an example of the functional configuration of a learning device 100 according to Embodiment 1. The learning device 100 trains a hierarchical network. A basic data storage unit 101 holds the basic learning data used for training. Here, learning data refers to the teacher data used to train the network (hereinafter sometimes referred to as GT, for ground truth). In this embodiment, the network is trained using learning input data and teacher data (learning data) indicating the determination results for that input data.

For example, in one embodiment the network can be used to determine an attribute (label) for each pixel of an image. That is, when image data is input to the network as input data, attribute information (a label) for each pixel of the image data is obtained as the result of the determination processing. In a specific example in which contours are extracted from an image, the result of the determination processing is a contour pattern corresponding to the input data (an image whose pixel values are attribute information indicating whether each pixel is part of a contour). Thus, in one embodiment, the processing result for the input data may be a line drawing pattern (such as a contour pattern) corresponding to the input data.

In such a configuration, the learning input data is image data, and the learning data is data indicating a label (determination result) for each pixel of the learning input data. The learning input data can be, for example, an image containing characters, figures, or the like. In the specific example of contour extraction, the learning data is an image showing the contours in the image that serves as the learning input data, and may be created, for example, in accordance with user input. The basic data storage unit 101 can also hold such learning input data in combination with the learning data. In this specification, basic learning data (basic teacher data) refers to learning data (teacher data) before it undergoes processing, such as modification or transformation, by the setting unit 102.

The setting unit 102 sets the learning data used for training the network, and an adaptive data storage unit 103 holds the learning data set by the setting unit 102. In one embodiment, the setting unit 102 generates learning data by applying processing such as modification, transformation, or filtering to the basic learning data. By storing the learning data generated in this way in the adaptive data storage unit 103, the setting unit 102 sets the learning data used for training the network (hereinafter sometimes referred to as adaptive learning data, adaptive teacher data, or adaptive GT). The setting unit 102 may also store the original basic learning data in the adaptive data storage unit 103. As described later, the setting unit 102 sets learning data (adaptive teacher data) for each of two or more different outputs of the hierarchical network corresponding to a single piece of learning input data.

A learning unit 104 reads the learning data stored in the adaptive data storage unit 103 and performs the network training process. The learning unit 104 also stores the final learning result obtained by training (for example, the network parameters) in a learning result storage unit 105. A known method can be used to train the hierarchical network. For example, the weight coefficients and other parameters corresponding to the connections of the network can be updated iteratively by backpropagating, through the network, the error of the output values obtained by forward propagation. As described later, in this embodiment the learning unit 104 trains the hierarchical network based on the error between each of the two or more different outputs obtained by inputting the learning input data to the network and the learning data (adaptive teacher data) corresponding to that output.

A test data storage unit 106 holds test data used to evaluate the network, and an evaluation unit 107 evaluates the network using the test data. As described later, the trained hierarchical network obtained in this way provides two or more different outputs corresponding to the input data, and the result of integrating these outputs represents the result of recognition processing for the input data.

FIG. 2 is a flowchart of the learning method according to this embodiment, which is described below step by step. In step S210, the setting unit 102 reads the basic learning data from the basic data storage unit 101. In step S220, the setting unit 102 sets adaptive learning data based on the basic learning data. Here, the setting unit 102 sets learning data (adaptive teacher data) for each of the two or more different outputs based on the structure of the hierarchical network; the adaptive learning data may depend on the configuration of the units forming the hierarchical network or on how they are connected. As an example, the following describes the case of side-out learning (a technique in which training is based not only on the output error of the final layer but also on the output errors of intermediate layers; details are given later).

In step S230, the learning unit 104 trains the hierarchical network using the adaptive learning data set in step S220. The network used in this embodiment has side-outs (outputs from intermediate layers), and a determination result can be obtained based on these side-outs. The specific training method is described later.

In step S240, the learning unit 104 determines whether to end the training of step S230. For example, the learning unit 104 can decide to end training when the learning result of the network reaches a predetermined criterion. As one example, the evaluation unit 107 can evaluate the misrecognition rate of the network using the test data (evaluation data) stored in the test data storage unit 106. This test data can be, for example, a set of input data different from the data stored in the basic data storage unit 101, together with teacher data indicating the determination results for that input data. The misrecognition rate can be defined as the proportion of the test data used for evaluation for which an incorrect recognition result is obtained. When the misrecognition rate of the network is equal to or less than a predetermined threshold, the learning unit 104 can decide to end training. If training is not to end, processing returns to step S230 and the learning unit 104 trains the network again. If training is to end, processing proceeds to step S250, in which the learning unit 104 stores the final learning result (for example, the weight parameters of the network and the coupling coefficients of the intermediate-layer sub-outputs, described later) in the learning result storage unit 105.
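The loop of steps S230 to S250 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `train_one_round` and `recognize` are hypothetical callables standing in for one pass of the learning unit 104 and for inference with the current network, a test sample counts as misrecognized when its result differs from its teacher data, and the threshold and round cap are assumed values.

```python
from typing import Any, Callable, Sequence, Tuple


def misrecognition_rate(recognize: Callable[[Any], Any],
                        test_set: Sequence[Tuple[Any, Any]]) -> float:
    """Share of test samples whose recognition result differs from the teacher data."""
    wrong = sum(1 for x, teacher in test_set if recognize(x) != teacher)
    return wrong / len(test_set)


def train_until_converged(train_one_round: Callable[[], None],
                          recognize: Callable[[Any], Any],
                          test_set: Sequence[Tuple[Any, Any]],
                          threshold: float = 0.05,
                          max_rounds: int = 100) -> int:
    """Alternate step S230 (train) and step S240 (evaluate).

    Stops as soon as the misrecognition rate is <= threshold and returns the
    number of training rounds run. max_rounds is an added safety cap (an
    assumption; the flowchart itself loops until the criterion is met).
    """
    for round_index in range(1, max_rounds + 1):
        train_one_round()                                          # step S230
        if misrecognition_rate(recognize, test_set) <= threshold:  # step S240
            return round_index  # go on to step S250 and store the result
    return max_rounds
```

Keeping training and recognition as injected callables leaves the stopping rule independent of any particular framework.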

The learning device 100 according to this embodiment can be realized by a device that implements the functional configuration shown in FIG. 1. For example, the learning device 100 may have dedicated hardware implementing each processing unit. Alternatively, some or all of the processing units may be implemented by a computer.

FIG. 11 shows the basic configuration of a computer that can operate as the learning device 100 or as its processing units. In FIG. 11, a processor 1110, for example a CPU, controls the operation of the entire computer. A memory 1120, for example a RAM, temporarily stores programs, data, and the like. A computer-readable storage medium 1130, for example a hard disk or CD-ROM, stores programs, data, and the like over the long term. In this embodiment, programs stored in the storage medium 1130 that implement the functions of the respective units are read into the memory 1120, and the processor 1110 operates according to the programs in the memory 1120 to implement those functions. The memory 1120 or the storage medium 1130 can also serve as a storage unit such as the basic data storage unit 101, the adaptive data storage unit 103, the learning result storage unit 105, or the test data storage unit 106.

In FIG. 11, an input interface 1140 is an interface for acquiring information from an external device, and an output interface 1150 is an interface for outputting information to an external device. A bus 1160 connects the above units and enables them to exchange data.

(Hierarchical network configuration and learning method)
An example of a hierarchical network usable in this embodiment, and the network training performed in step S230, are described below. FIG. 3 shows an example of a hierarchical network composed of three intermediate layer groups 302, 303, and 304. The specific configuration of each intermediate layer group is not particularly limited; for example, each may consist of a combination of one or more of convolutional layers, pooling layers, and fully connected layers.

In this embodiment, the hierarchical network provides two or more different outputs for a single piece of input data. In the network of FIG. 3, for example, outputs are obtained from two or more different layers: intermediate-layer sub-outputs 307, 308, and 309 are obtained from the intermediate layer groups 302, 303, and 304, respectively. Integrating the intermediate-layer sub-outputs 307, 308, and 309 yields an integrated output 305, from which the determination result for an input 301 is obtained. That is, when learning data is given as the input 301, the integrated output 305 is obtained via the intermediate layer groups 302, 303, and 304. In this embodiment, as an example, the intermediate layer group 302 consists of two convolutional layers, the intermediate layer group 303 consists of one pooling layer followed by two convolutional layers, and the intermediate layer group 304 likewise consists of one pooling layer followed by two convolutional layers.

In this embodiment, the network is trained by side-out learning from each intermediate layer group. Ordinarily, when a hierarchical network is used, the error is evaluated only for the final output, and the network is trained by error backpropagation. In side-out learning, by contrast, the error is also evaluated for the outputs of the intermediate layer groups, and this error information can be fed into the intermediate layer groups and backpropagated. For example, HED (Holistically-nested Edge Detection) in Non-Patent Document 1 discloses a method of contour extraction (extracting the contour portions of objects contained in an input image) using a hierarchical network. Non-Patent Document 1 uses side-out learning: the error with respect to the learning data is also evaluated at the intermediate layers, and the network is trained by error backpropagation.

In this embodiment, the intermediate layer groups 302, 303, and 304 output intermediate-layer sub-outputs 307, 308, and 309, respectively, for side-out learning. Intermediate-layer errors 310, 311, and 312, which are the errors between the intermediate-layer sub-outputs 307, 308, and 309 and the learning data (GT), are then calculated; they are denoted $l_{\mathrm{side}}^{1}$, $l_{\mathrm{side}}^{2}$, and $l_{\mathrm{side}}^{3}$, respectively. Summing the intermediate-layer errors 310, 311, and 312 evaluated in this way gives an error evaluation value for the intermediate layers as a whole ($L_{\mathrm{side}}$ in Equation (1)).
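Equation (1) is referenced here but its body did not survive extraction. Read as the plain sum described in the text, it would be as follows; HED (Non-Patent Document 1) additionally allows a weight per side output, and the original equation may differ in that respect, so this is a reconstruction rather than a quotation.

```latex
L_{\mathrm{side}} = \sum_{m=1}^{3} l_{\mathrm{side}}^{m}
```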

The error evaluation method is not particularly limited. For example, when the GT label values are binary (0 and 1), cross-entropy can be used to define the error evaluation value $l_{\mathrm{side}}^{m}$ of intermediate layer m, as shown in Equation (2). In Equation (2), $y_{j}^{m}$ denotes the output value of pixel j in intermediate layer m, $Y_{+}^{m}$ denotes the positive region (label value 1) of the GT given to intermediate layer m, and $Y_{-}^{m}$ denotes the negative region (label value 0) of that GT; Σ denotes summation over all pixels. β is a coefficient that compensates for the imbalance between positive and negative pixels in the GT and can be defined, for example, as the ratio of the number of pixels in the negative region to the total number of pixels in the GT. β may be calculated and set for each GT, or the same value (for example, the average of the per-GT values of β) may be set for all GTs.
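Equation (2) is likewise missing. Following the class-balanced cross-entropy of HED, which matches the definitions of $y_{j}^{m}$, $Y_{+}^{m}$, $Y_{-}^{m}$ and β given here, it can plausibly be reconstructed as $l_{\mathrm{side}}^{m} = -\beta \sum_{j \in Y_{+}^{m}} \log y_{j}^{m} - (1-\beta) \sum_{j \in Y_{-}^{m}} \log(1 - y_{j}^{m})$. The sketch below evaluates that expression for one side output on flattened pixel lists. It assumes the outputs are already probabilities in (0, 1); the function name and the clamping constant `eps` are illustrative choices, not taken from the source.

```python
import math
from typing import Optional, Sequence


def balanced_cross_entropy(outputs: Sequence[float], labels: Sequence[int],
                           beta: Optional[float] = None,
                           eps: float = 1e-7) -> float:
    """Class-balanced cross-entropy l_side^m of one side output against its GT.

    outputs: per-pixel output values y_j^m in (0, 1), flattened.
    labels:  per-pixel GT labels (1 = positive, e.g. contour; 0 = negative).
    beta:    if None, computed per GT; pass a shared value (e.g. an average
             over all GTs) to use the same beta for every GT.
    """
    if len(outputs) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of outputs and labels")
    if beta is None:
        # beta = |Y_-| / |Y|: the share of negative pixels in this GT. Contours
        # are sparse, so beta is close to 1 and the rare positives weigh more.
        beta = sum(1 for label in labels if label == 0) / len(labels)
    loss = 0.0
    for y, label in zip(outputs, labels):
        y = min(max(y, eps), 1.0 - eps)  # clamp so that log() stays finite
        if label == 1:
            loss -= beta * math.log(y)
        else:
            loss -= (1.0 - beta) * math.log(1.0 - y)
    return loss
```

For a 4-pixel GT with one positive pixel and all outputs at 0.5, β is 0.75 and the loss is 1.5 ln 2: the single positive pixel contributes as much as the three negatives together.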

The integrated output 305 can be obtained by integrating the two or more different outputs corresponding to the input data. For example, the intermediate-layer sub-outputs 307, 308, and 309 can be superimposed by taking their linear sum, and the integrated output 305 can then be obtained by applying an activation function σ, such as a sigmoid function, to that linear sum. Denoting the intermediate-layer sub-outputs 307, 308, and 309 by $A_{\mathrm{side}}^{1}$, $A_{\mathrm{side}}^{2}$, and $A_{\mathrm{side}}^{3}$, respectively, $Y_{\mathrm{fuse}}$ given by Equation (3), for example, can be obtained as the integrated output 305. The weights of the intermediate-layer sub-outputs 307, 308, and 309 used to obtain the integrated output 305 can also be determined by learning; for example, the coupling coefficients $h_{m}$ of the linear sum in Equation (3) can be learned.
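From this description, Equation (3) can be reconstructed as $Y_{\mathrm{fuse}} = \sigma\left(\sum_{m=1}^{3} h_{m} A_{\mathrm{side}}^{m}\right)$. The sketch below applies it pixel by pixel. It assumes the side outputs have already been brought to a common resolution and flattened, and that σ is the sigmoid; the coefficients passed in are placeholders for the learned $h_{m}$.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_side_outputs(side_outputs: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Y_fuse = sigmoid(sum_m h_m * A_side^m), evaluated pixel by pixel."""
    if len(side_outputs) != len(weights):
        raise ValueError("need exactly one coefficient h_m per side output")
    num_pixels = len(side_outputs[0])
    fused = []
    for j in range(num_pixels):
        linear_sum = sum(h * a[j] for h, a in zip(weights, side_outputs))
        fused.append(sigmoid(linear_sum))
    return fused
```

Because the $h_{m}$ are learned, the network can decide how much the coarse and fine side outputs each contribute to the final contour map.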

In this embodiment, an integrated error 313, which is the error between the integrated output 305 and the GT, is also evaluated. For example, the error $L_{\mathrm{fuse}}$ between the integrated output $Y_{\mathrm{fuse}}$ and the GT label values Y can be obtained as the integrated error 313 according to Equation (4). In Equation (4), Dist() denotes a distance function used to evaluate the error between Y and $Y_{\mathrm{fuse}}$; cross-entropy, for example, can be used as this function.

The error of the network as a whole can be obtained from the integrated error 313 ($L_{\mathrm{fuse}}$) and the sum of the intermediate-layer errors 310, 311, and 312 ($L_{\mathrm{side}}$). For example, the error of the whole network can be $L_{\mathrm{total}}$ given by Equation (5). Each weight parameter in the hierarchical network and the coupling coefficients $h_{m}$ of the intermediate-layer sub-outputs can be determined by learning so as to minimize this overall error $L_{\mathrm{total}}$.
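Equations (4) and (5) can be reconstructed as $L_{\mathrm{fuse}} = \mathrm{Dist}(Y, Y_{\mathrm{fuse}})$ and, following HED, $L_{\mathrm{total}} = L_{\mathrm{side}} + L_{\mathrm{fuse}}$; the original may weight the two terms differently. The sketch assumes plain (unbalanced) binary cross-entropy as Dist and takes the per-side-output errors as already computed numbers; the function names are illustrative.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_pred: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Dist(Y, Y_fuse): binary cross-entropy summed over all pixels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def total_loss(side_errors: Sequence[float], gt: Sequence[int],
               fused: Sequence[float]) -> float:
    """L_total = L_side + L_fuse, with L_side the sum of the side-output errors."""
    l_side = sum(side_errors)                 # Equation (1)
    l_fuse = binary_cross_entropy(gt, fused)  # Equation (4)
    return l_side + l_fuse                    # Equation (5)
```

In a real network, $L_{\mathrm{total}}$ would be differentiated with respect to both the layer weights and the $h_{m}$, which is how both sets of parameters are learned together.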

The configuration and training method of the hierarchical network described above are as described, for example, in Non-Patent Document 1. In this embodiment, however, the intermediate-layer errors 310, 311, and 312 are obtained using adaptive learning data set to match the respective intermediate layer groups 302, 303, and 304 (or the intermediate-layer sub-outputs 307, 308, and 309). That is, the intermediate-layer errors 310, 311, and 312 are defined as the errors between the intermediate-layer sub-outputs 307, 308, and 309 and adaptive GTs 306-1, 306-2, and 306-3 set to match the intermediate layer groups 302, 303, and 304, respectively. This configuration is described below.

FIG. 4 illustrates side-out learning of a hierarchical network applied, for example, to contour extraction from images. It schematically shows the case in which, as in Non-Patent Document 1, the same GT (corresponding to the basic learning data) is used to evaluate the error of every intermediate-layer sub-output, depicting the relationship between the GT 306 and both the integrated output 305 and the intermediate-layer sub-outputs 307 to 309 from the intermediate layer groups 302 to 304.

In hierarchical networks such as convolutional neural networks, a pooling layer is usually placed after convolutional layers. The pooling layer reduces the positional sensitivity of the features extracted by the convolutional layers, so that the output of the pooling layer becomes robust to changes in position.

For example, when a pooling layer performs 2×2 MAX pooling with a stride of 2, only the maximum of each 2×2 block of four pixels is output. As described above, in the example of FIG. 3 the intermediate layer groups 303 and 304 each contain one pooling layer. Therefore, when a 128×128 image is input to the network as learning input data and these pooling layers perform 2×2 MAX pooling with a stride of 2, the intermediate layer group 303 produces a 64×64 output and the intermediate layer group 304 produces a 32×32 output.
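The resolution arithmetic can be checked with a small sketch of 2×2 MAX pooling with stride 2, applied once per pooling layer. It assumes the convolutional layers preserve spatial size (same padding), which the text implies but does not state, and uses list-of-lists images for self-containment; a real network would use its framework's pooling layer. The diagonal test image is an illustrative choice.

```python
from typing import List

Image = List[List[float]]


def max_pool_2x2(image: Image) -> Image:
    """2x2 MAX pooling with stride 2: each output pixel is the max of a 2x2 block."""
    h, w = len(image), len(image[0])
    return [
        [max(image[y][x], image[y][x + 1], image[y + 1][x], image[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


# A 128x128 input containing a 1-pixel-wide diagonal line.
size = 128
image = [[1.0 if x == y else 0.0 for x in range(size)] for y in range(size)]

after_group_303 = max_pool_2x2(image)            # one pooling layer      -> 64x64
after_group_304 = max_pool_2x2(after_group_303)  # a second pooling layer -> 32x32
print(len(after_group_303), len(after_group_303[0]))  # 64 64
print(len(after_group_304), len(after_group_304[0]))  # 32 32
```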

On the other hand, the GT (corresponding to the basic learning data) is usually an image of the same size as the learning input data (for example, a contour image). To compare the intermediate-layer sub-outputs with the GT for error evaluation, each sub-output is therefore enlarged to 128×128, the size of the GT. As shown in FIG. 4, one pixel of a sub-output then becomes a 2×2 block at the error-evaluation stage for the intermediate-layer sub-output 308, and a 4×4 block for the intermediate-layer sub-output 309. In contour extraction, for example, even if the contour lines in the intermediate-layer sub-output 307 and in the GT are 1 pixel wide, the contour lines in the intermediate-layer sub-output 308 become 2 pixels wide and those in the intermediate-layer sub-output 309 become 4 pixels wide. Consequently, when the errors of the intermediate-layer sub-outputs 308 and 309 are evaluated, the difference in line width may cause the error to be overestimated.
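The line-width effect can be reproduced with nearest-neighbour upsampling, which is assumed here as the enlargement method (the text does not say which interpolation is used): a 1-pixel line in a 32×32 side output becomes 4 pixels wide once enlarged to 128×128.

```python
from typing import List

Image = List[List[float]]


def upsample_nearest(image: Image, factor: int) -> Image:
    """Enlarge by an integer factor, copying each pixel into a factor x factor block."""
    return [
        [image[y // factor][x // factor] for x in range(len(image[0]) * factor)]
        for y in range(len(image) * factor)
    ]


def line_width(image: Image, row: int) -> int:
    """Number of positive pixels in one row (the width of a vertical line)."""
    return sum(1 for v in image[row] if v > 0.5)


# A 32x32 side output with a 1-pixel-wide vertical contour at column 10.
side_output = [[1.0 if x == 10 else 0.0 for x in range(32)] for _ in range(32)]
enlarged = upsample_nearest(side_output, 4)

print(len(enlarged), line_width(side_output, 0), line_width(enlarged, 0))  # 128 1 4
```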

FIG. 5 schematically shows how the difference in line width between the GT and an intermediate-layer sub-output leads to overestimation of the error. As shown in FIG. 5(A), there is no difference in line width between the intermediate-layer sub-output 307 and the GT 306, so the error evaluation measures only the difference between the contour pattern output from the intermediate layer group 302 and the pattern of the GT 306. As shown in FIG. 5(B), by contrast, the line widths of the intermediate-layer sub-output 308 and the GT 306 differ, so the error evaluation measures not only the difference between the contour pattern output from the intermediate layer group 303 and the pattern of the GT 306 but also an error caused by the difference in line width. Furthermore, as shown in FIG. 5(C), the difference in line width between the intermediate-layer sub-output 309 and the GT 306 is even larger, so the error caused by it is larger still.

FIG. 5(D) schematically shows how the error is overestimated. Suppose the contour pattern 510 output from an intermediate layer group, as shown in the intermediate layer sub-output, differs in line width from the GT 520. Then regions 530 where the error is not evaluated correctly appear on both sides of the contour line shown in the GT. In contour extraction, the quantity that should be evaluated correctly is the difference in pattern between the output and the GT. If other errors, such as those due to a line width difference, are also counted, a favorable final learning result may not be obtained.
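The overestimation can be shown numerically on one 8-pixel cross-section taken perpendicular to the contour. The sketch assumes a plain per-pixel binary cross-entropy. The loss actually used, equation (2) of this document, is not reproduced here. The prediction is a 4-pixel band, as produced by enlarging the sub-output 309, and it covers the true contour position. The only mismatch is its width.

```python
import math


def bce(pred, target, eps=1e-7):
    """Summed binary cross-entropy between per-pixel probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Cross-section of an enlarged sub-output: one low-res contour pixel became 4 pixels (indices 0-3).
pred = [0.99] * 4 + [0.01] * 4

gt_thin = [0, 0, 0, 1, 0, 0, 0, 0]  # 1-pixel GT line lying inside the predicted band
gt_wide = [1, 1, 1, 1, 0, 0, 0, 0]  # GT widened to the 4-pixel width of the sub-output

loss_thin = bce(pred, gt_thin)
loss_wide = bce(pred, gt_wide)
print(round(loss_thin, 2), round(loss_wide, 2))  # 13.87 0.08
```

Almost all of `loss_thin` comes from the three band pixels beside the 1-pixel GT line. These pixels correspond to the regions 530 of FIG. 5(D).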

Non-Patent Document 1 does not describe generating, from the basic learning data, learning data suited to learning based on the errors of the intermediate layer sub-outputs. Side-out learning for each intermediate layer sub-output may use the same learning data as the error of the final integrated output (corresponding to the basic learning data). In that case the error evaluation of the intermediate layer sub-outputs can degrade, which may reduce learning efficiency.

Therefore, in the present embodiment, the setting unit 102 sets teacher data (an adaptive GT) for each of two or more different outputs from the network that correspond to a single piece of learning input data. For example, the setting unit 102 can set an adaptive GT for each intermediate layer group (or each intermediate layer sub-output). This configuration reduces other influences, such as line width, so that the error of actual interest is evaluated more correctly. As a result, the convergence of side-out learning and the performance of the resulting hierarchical network can be improved.

To this end, the setting unit 102 can set, for each intermediate layer group, adaptive learning data obtained by processing the original basic learning data. For example, for each intermediate layer group, the setting unit 102 can generate adaptive learning data with a line width close to the line width in the intermediate layer sub-output. At a minimum, the data is generated so that the error is not overestimated. In this way, the setting unit 102 can generate learning data that allows an appropriate error evaluation for each intermediate layer sub-output.

Alternatively, the basic data storage unit 101 may store learning data (adaptive teacher data) for each of the two or more different outputs from the hierarchical network that correspond to a single piece of learning input data. In this case, the setting unit 102 may acquire the adaptive learning data from the basic data storage unit 101 and store it in the adaptive data storage unit 103.

(Method of setting adaptive learning data)
A specific example of the method of setting adaptive learning data in step S220 is described below.

FIG. 6 illustrates the method of setting adaptive learning data in the present embodiment when the hierarchical network of FIG. 3 is used. FIG. 6(A) shows the contour pattern in the intermediate layer sub-output 307. It also shows the positive region of the GT 601 used for error evaluation of the intermediate layer sub-output 307; the positive region represents the contour pattern and is hereinafter sometimes simply called the GT. Similarly, FIGS. 6(B) and 6(C) show the contour patterns in the intermediate layer sub-outputs 308 and 309 and the contour patterns in the GTs 602 and 603 used for their error evaluation. As already described, the intermediate layer sub-outputs 308 and 309 are enlarged so that their resolution matches that of the GT. Accordingly, the line width of the contour patterns in the intermediate layer sub-outputs 308 and 309 also increases.

The setting unit 102 can therefore set the teacher data for the two or more different outputs based on the resolutions of those outputs. For example, the setting unit 102 can set the GTs 601 to 603 for the intermediate layer sub-outputs 307 to 309 based on the resolutions of the intermediate layer sub-outputs 307 to 309. In the present embodiment, the teacher data for each of the two or more different outputs is a line drawing pattern whose width corresponds to that output. For example, to evaluate the intermediate layer sub-outputs 307 to 309, the setting unit 102 can set the GTs 601 to 603 as line drawing patterns whose widths correspond to the resolutions of those sub-outputs.

Specifically, the line widths of the GTs 602 and 603 for the intermediate layer sub-outputs 308 and 309 are increased. The contour line width in each intermediate layer sub-output and in its GT is then close. In the example of FIG. 6, the contour patterns in the GTs 601, 602, and 603 for the intermediate layer sub-outputs 307, 308, and 309 have line widths of 1, 2, and 4, respectively. The setting unit 102 can thus make the line width larger when the resolution of the intermediate layer sub-output is low (fewer pixels) than when it is high (more pixels). For example, the line width of the line drawing pattern in the adaptive GT can be set approximately equal to (resolution of the basic learning data) / (resolution of the intermediate layer sub-output).
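The width rule can be expressed directly. The minimal sketch below computes the width as the resolution ratio given in the text. It then thickens a 1-pixel contour by binary dilation with a square structuring element. The dilation is an assumption chosen for illustration; the text also mentions filters and vector data as ways to obtain the wider line. The function names are hypothetical.

```python
def adaptive_width(base_res, out_res):
    """Line width ~ (resolution of basic learning data) / (resolution of the sub-output)."""
    return max(1, round(base_res / out_res))


def thicken(gt, width):
    """Dilate a binary line image with a width x width square so 1-px lines become `width` px."""
    h, w = len(gt), len(gt[0])
    lo = -((width - 1) // 2)  # offsets lo .. lo+width-1, i.e. exactly `width` of them
    offsets = range(lo, lo + width)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if gt[y][x]:
                for dy in offsets:
                    for dx in offsets:
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
    return out


base_gt = [[1 if y == 8 else 0 for _ in range(16)] for y in range(16)]  # 1-px contour
for out_res in (128, 64, 32):  # resolutions of sub-outputs 307, 308, 309
    width = adaptive_width(128, out_res)
    adaptive_gt = thicken(base_gt, width)
    print(out_res, width, sum(row[5] for row in adaptive_gt))
# 128 1 1
# 64 2 2
# 32 4 4
```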

The setting unit 102 can use the basic learning data to generate adaptive learning data for error evaluation of the intermediate layer sub-outputs. In the present embodiment, the setting unit 102 generates the adaptive learning data from basic teacher data, which is a line drawing pattern corresponding to the learning input data. For example, the setting unit 102 can generate adaptive learning data (GTs 911 to 913) for error evaluation of the intermediate layer sub-outputs 307 to 309 according to the flowchart of FIG. 9(D).

In step S901, the setting unit 102 acquires the basic learning data (GT 912) stored in the basic data storage unit 101. In step S902, the setting unit 102 generates GT 911 and GT 913 by filtering GT 912. In step S903, the setting unit 102 stores the resulting GTs 911 to 913 in the adaptive data storage unit 103. This sets the GTs 911 to 913 for the intermediate layer sub-outputs 307 to 309, respectively.

In this example, the setting unit 102 generates the adaptive learning data by filtering the basic learning data. The basic learning data (GT 912) is a line drawing pattern corresponding to the learning input data. By applying a different filter to it for each intermediate layer sub-output, the setting unit 102 can obtain different adaptive learning data (GTs 911 and 913). In layers closer to the final output, the contour pattern in the intermediate layer sub-outputs changes from a fine form that reflects texture to a coarse form. The filters that transform the basic learning data model this change, so the form of the GT can be altered to match it. As one example, the setting unit 102 can choose the filter so that the line width is larger when the resolution of the intermediate layer sub-output is low (fewer pixels) than when it is high (more pixels).
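Steps S901 to S903 can be sketched as follows. This is not the filtering of FIG. 9. The high-pass and low-pass filters are replaced by morphological erosion and dilation, two simple operations that also make a line thinner or thicker. The dictionary stores and the key `"gt912"` are hypothetical stand-ins for the basic data storage unit 101 and the adaptive data storage unit 103.

```python
def erode_2x2(gt):
    """Binary erosion with a 2x2 element anchored at the top-left pixel.

    Keeps a pixel only if it and its right, lower and lower-right neighbours are all on,
    which thins 2-px horizontal or vertical lines to 1 px.
    """
    h, w = len(gt), len(gt[0])
    return [[1 if (y + 1 < h and x + 1 < w and gt[y][x] and gt[y][x + 1]
                   and gt[y + 1][x] and gt[y + 1][x + 1]) else 0
             for x in range(w)]
            for y in range(h)]


def dilate_3x3(gt):
    """Binary dilation with a 3x3 square: widens lines by one pixel on each side."""
    h, w = len(gt), len(gt[0])
    return [[1 if any(gt[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))) else 0
             for x in range(w)]
            for y in range(h)]


def set_adaptive_gts(basic_store, adaptive_store):
    """Hypothetical sketch of steps S901-S903 of FIG. 9(D)."""
    gt912 = basic_store["gt912"]  # S901: acquire the basic learning data
    gt911 = erode_2x2(gt912)      # S902: thinner GT (stand-in for the high-pass filter)
    gt913 = dilate_3x3(gt912)     # S902: thicker GT (stand-in for the low-pass filter)
    adaptive_store[307] = gt911   # S903: store one GT per intermediate layer sub-output
    adaptive_store[308] = gt912
    adaptive_store[309] = gt913
    return adaptive_store


# 12x12 basic GT with a 2-px-wide horizontal contour (rows 5 and 6), like GT 912 in FIG. 9(B).
basic_store = {"gt912": [[1 if y in (5, 6) else 0 for _ in range(12)] for y in range(12)]}
adaptive_store = set_adaptive_gts(basic_store, {})
for key in (307, 308, 309):
    print(key, sum(row[2] for row in adaptive_store[key]))  # line width measured in column 2
# 307 1
# 308 2
# 309 4
```

The erosion used here only thins axis-aligned lines cleanly. Frequency-domain filters, as in FIG. 9, treat all orientations alike.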

A specific example of such a filter is a bandpass filter that passes only a specific frequency band. FIG. 9(A) shows GT 911, obtained by applying a high-pass filter to GT 912. FIG. 9(B) shows GT 912, whose contour pattern has a line width of 2. GT 912 is used as-is for the intermediate layer sub-output 308. FIG. 9(C) shows GT 913, obtained by applying a low-pass filter to GT 912. As FIGS. 9(A) to 9(C) show, the contour pattern of GT 911 is thinner than that of GT 912, and the contour pattern of GT 913 is thicker. In the frequency-intensity graphs of FIGS. 9(A) to 9(C), the gray portions indicate the bands that the filtering passes. For short contour patterns, for example those with a maximum length of 10 pixels or less, the filtering may be skipped or the contour pattern may be erased. Such processing can be expected, for example, to suppress the influence of noise.
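The erase option for short contours can be sketched with a connected-component pass. The text defines "short" only by example, as a maximum length of 10 pixels or less. This sketch assumes 8-connectivity and measures a component's length as the longer side of its bounding box. Both choices, and the function name, are illustrative assumptions.

```python
from collections import deque


def remove_short_contours(gt, max_len=10):
    """Erase 8-connected contour components whose longest bounding-box side is <= max_len."""
    h, w = len(gt), len(gt[0])
    out = [row[:] for row in gt]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not gt[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search over one 8-connected component.
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and gt[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in comp]
            xs = [p[1] for p in comp]
            length = max(max(ys) - min(ys), max(xs) - min(xs)) + 1
            if length <= max_len:
                for y, x in comp:
                    out[y][x] = 0
    return out


gt = [[0] * 20 for _ in range(20)]
for x in range(20):
    gt[2][x] = 1   # long contour: 20 px
for x in range(3, 8):
    gt[10][x] = 1  # short contour: 5 px, treated as noise
cleaned = remove_short_contours(gt, max_len=10)
print(sum(cleaned[2]), sum(cleaned[10]))  # 20 0
```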

As another example, the basic data storage unit 101 may store vector data representing the contour patterns. In this case, the setting unit 102 can generate, for each intermediate layer group, a GT with a corresponding line width.

The adaptive learning data (GTs 601 to 603) for error evaluation of the intermediate layer sub-outputs 307 to 309 may also be stored in the basic data storage unit 101 in advance. Alternatively, the setting unit 102 may generate the GTs 601 to 603 from data stored in the basic data storage unit 101. FIG. 6(D) illustrates one way of storing, in the basic data storage unit 101, the data used to generate the GTs 601 to 603. FIG. 6(E) is an enlarged view of the vertical line portion of FIG. 6(D). As shown in FIGS. 6(D) and 6(E), the contour pattern marked "1" is used as the GT 601 for error evaluation of the integrated output 305 and the intermediate layer sub-output 307. The positive region of the GT 601 is the region marked "1". The contour pattern marked "1" and "2" is used as the GT 602 for error evaluation of the intermediate layer sub-output 308. The contour pattern marked "1", "2", and "3" is used as the GT 603 for error evaluation of the intermediate layer sub-output 309. That is, the positive region of the GT 602 is the region marked "1" and "2", and the positive region of the GT 603 is the region marked "1", "2", and "3".
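With a stored level map of this kind, every GT is a threshold on the level. The minimal sketch below builds the binary GT for each sub-output. The level layout is an illustrative assumption chosen so that the widths come out as 1, 2, and 4. The actual arrangement in FIG. 6(E) may differ. The function name and the output-to-level mapping are hypothetical.

```python
def positive_mask(levels, k):
    """Binary GT for one sub-output: a pixel is positive if its stored level is in 1..k."""
    return [[1 if 1 <= v <= k else 0 for v in row] for row in levels]


# Illustrative level map for a vertical contour (cf. FIG. 6(E)); the exact layout is assumed.
levels = [[0, 0, 3, 1, 2, 3, 0, 0] for _ in range(4)]

level_for_output = {307: 1, 308: 2, 309: 3}  # GT 601, GT 602, GT 603
for output, k in level_for_output.items():
    gt = positive_mask(levels, k)
    print(output, gt[0], sum(gt[0]))
# 307 [0, 0, 0, 1, 0, 0, 0, 0] 1
# 308 [0, 0, 0, 1, 1, 0, 0, 0] 2
# 309 [0, 0, 1, 1, 1, 1, 0, 0] 4
```

A single stored map serves all three sub-outputs, and a GT can be regenerated cheaply whenever it is needed.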

In this case, the setting unit 102 can use the data stored in the basic data storage unit 101 to generate and set the adaptive learning data (GTs 601 to 603) for error evaluation of the respective intermediate layer sub-outputs 307 to 309. The contour patterns in the GTs 601 to 603 are thickened step by step, from sub-output 307 to sub-output 309. This keeps errors that arise from causes other than pattern differences from being overestimated, which makes side-out learning more effective. For example, consider a second intermediate layer located downstream of a first intermediate layer across a pooling layer. The GT for the second layer's output can be given a thicker contour pattern than the GT for the first layer's output.

The setting unit 102 may also apply further image processing, such as blurring, to the GTs obtained above for each intermediate layer sub-output, and set the results as adaptive learning data. For example, FIGS. 8(A) to 8(C) show the GTs 601 to 603 of FIG. 6 after a Gaussian blur (blurring an image with a Gaussian function) is applied. FIG. 8(A) shows the cross section 801 of the blurred GT 601, which has a line width of 1 and is used for error evaluation of the integrated output 305 and the intermediate layer sub-output 307. A cross section is the pixel value distribution across the width of the contour pattern. Similarly, FIGS. 8(B) and 8(C) show the cross sections 802 and 803 of the blurred GTs 602 and 603, which have line widths of 2 and 4 and are used for error evaluation of the intermediate layer sub-outputs 308 and 309. The blur applied to the GTs 601 to 603 may have the same strength for all three, or different strengths matched to the characteristics of each intermediate layer sub-output.

In this way, the setting unit 102 can set blurred line drawing patterns as the teacher data for the two or more different outputs. The position of the true contour in the learning input data and the position of the contour pattern in the GT may differ slightly because of errors made when the GT was entered. Blurring the GT (for example, with a Gaussian blur) builds into the GT an input error centered on the true position, for example an error that follows a Gaussian distribution. This makes side-out learning more effective.
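The cross sections of FIG. 8 can be approximated with a 1-D Gaussian blur of each GT's width profile. This minimal sketch uses sigma = 1.0, a kernel radius of 3, and zero padding. All three are assumptions. It applies the same strength to all three GTs, which is one of the two options the text allows. The blur spreads each line over neighbouring pixels but keeps its total mass.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(weights)
    return [wt / s for wt in weights]


def blur_1d(profile, kernel):
    """Convolve a 1-D profile with a symmetric kernel, treating out-of-range pixels as 0."""
    r = len(kernel) // 2
    n = len(profile)
    return [sum(kernel[j + r] * profile[i + j] for j in range(-r, r + 1) if 0 <= i + j < n)
            for i in range(n)]


kernel = gaussian_kernel(sigma=1.0, radius=3)
for width in (1, 2, 4):  # line widths of GT 601, 602, 603
    profile = [0] * 8 + [1] * width + [0] * 8  # cross section across the contour
    blurred = blur_1d(profile, kernel)
    print(width, [round(v, 2) for v in blurred])
```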

So far, the description has mainly covered changing the line width of the contour pattern in the GT according to the characteristics of the intermediate layer sub-output. The method of setting adaptive learning data is not limited to this, however. For example, the teacher data for each of the two or more different outputs can be a line drawing pattern surrounded by an error-evaluation-excluded region, whose width corresponds to that output.

A method of setting, in the GT, an error-evaluation-excluded region where no error evaluation is performed is described with reference to FIG. 7. FIG. 7(A) shows the intermediate layer sub-output 307 and the GT 601 for its error evaluation, as in FIG. 6(A). FIG. 7(B) shows the intermediate layer sub-output 308 and the GT for its error evaluation. That GT consists of the GT 601 with a line width of 1 (the positive region of the GT) and an incidental region 702 with a line width of 2. FIG. 7(C) shows the intermediate layer sub-output 309 and the GT for its error evaluation. That GT consists of the GT 601 with a line width of 1 (the positive region of the GT) and an incidental region 703 with a line width of 4. Here, an incidental region is a region attached to both sides of the contour pattern (the positive region) and is not evaluated in the error evaluation. In the evaluation using equation (2), $Y_{+}^{m}$ then represents the positive region (for example, label value 1) of the GT given to intermediate layer $m$. $Y_{-}^{m}$ represents the negative region (for example, label value 0) of that GT, which is the entire region minus the positive region and the incidental region (for example, label value 2).
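Equation (2) appears earlier in this document and is not repeated in this section. The sketch below assumes it has the class-balanced cross-entropy form used for deeply supervised edge detection (HED), with β = |Y−| / (|Y+| + |Y−|). That form is an assumption. The point it illustrates does not depend on the exact loss: label-2 pixels belong to neither $Y_{+}^{m}$ nor $Y_{-}^{m}$, so they add nothing to the loss.

```python
import math

POS, NEG = 1, 0  # label values from the example; label 2 (incidental) is simply skipped


def balanced_bce(pred, labels, eps=1e-7):
    """Class-balanced BCE over Y+ (label 1) and Y- (label 0); other labels are ignored."""
    pos = [p for p, y in zip(pred, labels) if y == POS]
    neg = [p for p, y in zip(pred, labels) if y == NEG]
    beta = len(neg) / (len(pos) + len(neg))  # weight for the rarer positive class
    loss = 0.0
    for p in pos:
        loss -= beta * math.log(min(max(p, eps), 1 - eps))
    for p in neg:
        loss -= (1 - beta) * math.log(min(max(1 - p, eps), 1 - eps))
    return loss


pred = [0.99] * 4 + [0.01] * 4           # enlarged 4-px band, as in the sub-output 309
labels_plain = [0, 0, 0, 1, 0, 0, 0, 0]  # GT 601 alone: band pixels count as negatives
labels_incid = [2, 2, 2, 1, 0, 0, 0, 0]  # same GT with the band marked incidental
print(round(balanced_bce(pred, labels_plain), 3), round(balanced_bce(pred, labels_incid), 3))
# 1.741 0.016
```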

A GT with such an incidental region can be created, for example, from the data shown in FIGS. 6(D) and 6(E). The GT of FIG. 7(B) is created by setting the "1" region as the positive region and the "2" region as the incidental region. The GT of FIG. 7(C) is created by setting the "1" region as the positive region and the "2" and "3" regions as the incidental region. The incidental region can also be set with filtering such as that described above. The incidental regions 702 and 703 in the GTs for error evaluation of the intermediate layer sub-outputs 307 to 309 can be widened step by step in this way. This also keeps errors that arise from causes other than pattern differences from being overestimated, making side-out learning more effective.
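The same kind of level map can produce the three-valued GTs of FIGS. 7(B) and 7(C). This minimal sketch uses the illustrative level layout assumed above. It reads the widths 2 and 4 given for the incidental regions 702 and 703 as the width of the whole band (positive plus incidental pixels). That reading, and the function name, are assumptions.

```python
def gt_with_incidental(levels, k):
    """Three-valued GT: level 1 -> 1 (positive), levels 2..k -> 2 (incidental), else 0 (negative)."""
    return [[1 if v == 1 else (2 if 2 <= v <= k else 0) for v in row] for row in levels]


levels = [[0, 0, 3, 1, 2, 3, 0, 0] for _ in range(4)]  # illustrative layout, as above
print(gt_with_incidental(levels, 2)[0])  # [0, 0, 0, 1, 2, 0, 0, 0]  (FIG. 7(B)-style GT)
print(gt_with_incidental(levels, 3)[0])  # [0, 0, 2, 1, 2, 2, 0, 0]  (FIG. 7(C)-style GT)
```

The positive region stays 1 pixel wide for every sub-output. Only the ignored band around it grows.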

(Application examples for various network configurations)
So far, side-out learning has been based on the intermediate layer sub-output of each intermediate layer group, but the method of the present embodiment is not limited to this. For example, as shown in FIG. 10, side-out learning can also be based on multiple outputs from one intermediate layer group. In the configuration of FIG. 10, side-out learning uses the outputs of two or more different intermediate layers within one intermediate layer group of the network. In FIG. 10(A), one intermediate layer group 1300 includes convolution layers 1301, 1302, and 1303 and a pooling layer 1304. FIG. 10(A) also shows the outputs 1311 to 1313 of the convolution layers 1301 to 1303 and the GTs 1321 to 1323 used for their error evaluation. FIG. 10(B) shows how the contour line width changes across the GTs 1321 to 1323; the line width gradually increases.

In this case, the setting unit 102 can set teacher data for the learning input data for each output from two or more different intermediate layers in one intermediate layer group of the network. For example, the contour patterns in the GTs 1321 to 1323 for error evaluation of the outputs 1311 to 1313 can be made progressively thicker. As a specific example, consider a second intermediate layer located downstream of a first intermediate layer across a convolution layer. The GT for the second layer's output can be given a thicker contour pattern than the GT for the first layer's output. Applying filters one after another in the convolution layers widens the range over which pixels are spatially interdependent. This configuration accounts for that widening and keeps errors that arise from causes other than pattern differences from being overestimated. Side-out learning can therefore be performed more effectively.

As another example, shown in FIG. 9(E), side-out learning can also be based on multiple outputs from one intermediate layer of the network. FIG. 9(E) shows, as an example, an intermediate layer group 950 consisting of convolution layers 951 to 953 and a pooling layer 954. In this example, the setting unit 102 can set teacher data for the learning input data for each output from two or more different channel groups in one layer of the network.

For example, the setting unit 102 can split the image in the basic learning data according to a predetermined condition and generate several pieces of adaptive learning data, one for each resulting partial image. As a specific example, the contour patterns in the GT may be split by direction. Each directional contour pattern is then used to learn the weighting coefficients (convolution filters) of the corresponding part of the network. Here, the convolution layer 951, which provides the side output, is divided into a convolution layer 961 and a convolution layer 962. These correspond to different channel groups of the convolution layer 951. The setting unit 102 can give the convolution layers 961 and 962 GTs with different directional components, and the weighting coefficients of each are learned from its own GT. For example, the convolution layer 961 can be trained with a GT 971 showing contour patterns in a first direction. The convolution layer 962 can be trained with a GT 972 showing contour patterns in a second direction different from the first. Training each convolution layer intensively on a GT with a specific pattern in this way is expected to improve overall recognition performance. This configuration can be combined with the configurations described above, for example with applying further image processing such as a Gaussian blur to the GT.
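The text does not say how contours are split by direction. One hypothetical rule, sketched here, assumes two directions, horizontal and vertical. Each contour pixel is assigned by comparing the lengths of the horizontal and vertical runs of contour pixels through it, and ties go to horizontal. Assigning `horiz` to GT 971 and `vert` to GT 972 is also an illustrative assumption.

```python
def run_length(gt, y, x, dy, dx):
    """Length of the run of 'on' pixels through (y, x) along direction (dy, dx)."""
    h, w = len(gt), len(gt[0])
    n = 1
    for sign in (1, -1):
        yy, xx = y + sign * dy, x + sign * dx
        while 0 <= yy < h and 0 <= xx < w and gt[yy][xx]:
            n += 1
            yy += sign * dy
            xx += sign * dx
    return n


def split_by_direction(gt):
    """Split a contour GT into a horizontal-dominant GT and a vertical-dominant GT."""
    h, w = len(gt), len(gt[0])
    horiz = [[0] * w for _ in range(h)]
    vert = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if gt[y][x]:
                if run_length(gt, y, x, 0, 1) >= run_length(gt, y, x, 1, 0):
                    horiz[y][x] = 1
                else:
                    vert[y][x] = 1
    return horiz, vert


# Square outline: rows 1 and 6 and columns 1 and 6, spanning indices 1..6 of an 8x8 grid.
gt = [[0] * 8 for _ in range(8)]
for i in range(1, 7):
    gt[1][i] = gt[6][i] = 1  # top and bottom edges
    gt[i][1] = gt[i][6] = 1  # left and right edges
horiz, vert = split_by_direction(gt)  # e.g. horiz -> GT 971, vert -> GT 972
print(sum(map(sum, horiz)), sum(map(sum, vert)))  # 12 8
```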

So far, a GT has been set for each intermediate layer sub-output on the assumption that the sub-outputs are enlarged to the size of the GT. Alternatively, the setting unit 102 may set GTs that match the size of each intermediate layer sub-output. For example, the setting unit 102 may reduce the GT (basic learning data) showing the contour pattern to the size of the intermediate layer sub-output. One specific approach is to generate the adaptive learning data by filtering the basic learning data. Suppose the basic learning data is a binary image in which the value "1" represents contours. Then 2×2 max pooling with a 2×2 stride yields adaptive learning data at half the resolution that still preserves the contour pattern of the basic learning data. The adaptive learning data is thus produced by image processing such as filtering, rather than by simply dropping or repeating pixels of the basic learning data. This makes it possible to generate adaptive learning data suited to the intermediate layer sub-outputs.
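The contrast between pooling and simply dropping pixels is easy to see on a thin contour. In this minimal sketch, 2×2 max pooling with stride 2, the operation named in the text, keeps a 1-pixel line on an odd row. Naive decimation, which keeps every second pixel, loses it entirely. The function names are illustrative.

```python
def max_pool_2x2(gt):
    """2x2 max pooling with stride 2: an output pixel is on if any of its 4 inputs is on."""
    return [[max(gt[y][x], gt[y][x + 1], gt[y + 1][x], gt[y + 1][x + 1])
             for x in range(0, len(gt[0]) - 1, 2)]
            for y in range(0, len(gt) - 1, 2)]


def subsample_2x(gt):
    """Naive decimation: keep every second pixel in each direction."""
    return [row[::2] for row in gt[::2]]


# 8x8 binary GT with a 1-px horizontal contour on an odd row.
gt = [[1 if y == 5 else 0 for _ in range(8)] for y in range(8)]

pooled = max_pool_2x2(gt)
naive = subsample_2x(gt)
print(sum(map(sum, pooled)), sum(map(sum, naive)))  # 4 0
```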

Training a hierarchical network with the method described above creates the parameters of the hierarchical network. An information processing device according to one embodiment includes a generation unit. The generation unit generates a recognition result for input data using a hierarchical network in which the parameters created in this way are set. Such a hierarchical network can be implemented as a program. It can also be implemented by a computing device with a memory that stores the parameters and an arithmetic unit such as a GPU. In the conventional method, every output is evaluated against the same basic learning data. With the method of the present embodiment, each of the two or more different outputs from the hierarchical network is instead evaluated against adaptive learning data suited to it. The network parameters obtained by learning therefore differ from conventional ones and are better suited to recognition processing of the input data.

(Other embodiments)
The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or device via a network or a storage medium. One or more processors in a computer of that system or device then read and execute the program. The invention can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

100: learning device, 102: setting unit, 104: learning unit

Claims

1. A learning device for training a neural network used for attribute determination processing of each pixel of an image, comprising:
setting means for setting teacher data for each of two or more different outputs from the neural network that correspond to a single piece of learning input data; and
learning means for training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output, wherein
the teacher data for each of the two or more different outputs indicates the result of a different deformation process or filtering process applied to basic teacher data corresponding to the single piece of learning input data, and the basic teacher data is image data indicating the attribute of each pixel of the single piece of learning input data.

2. The learning device according to claim 1, wherein said setting means sets the teacher data for each of said two or more different outputs based on the structure of said neural network.

3. The learning device according to claim 1, wherein said setting means sets teacher data for the learning input data for each of the outputs from two or more different layers of said neural network.

4. The learning device according to claim 1, wherein said setting means sets teacher data for the learning input data for each of the outputs from two or more different channel groups in one layer of said neural network.

5. The learning device according to any one of claims 1 to 4, wherein the setting means sets the teacher data for the two or more different outputs based on the resolutions of those outputs.
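Claim 5 ties each output's teacher data to that output's resolution. A minimal sketch, assuming the basic teacher data is a full-resolution binary line mask and that a lower-resolution output receives a copy reduced by an integer factor with max-pooling (so a one-pixel line still appears at coarse scales). Max-pooling and the names `downsample_max` and `teachers_for_resolutions` are illustrative assumptions; the claim does not specify the reduction method.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = line pixel, 0 = background


def downsample_max(mask: Mask, factor: int) -> Mask:
    """Reduce a binary mask by an integer factor using max-pooling.

    A low-resolution cell is 1 if any full-resolution pixel it covers is 1.
    """
    h, w = len(mask), len(mask[0])
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    return [
        [
            max(mask[y][x]
                for y in range(by * factor, (by + 1) * factor)
                for x in range(bx * factor, (bx + 1) * factor))
            for bx in range(w // factor)
        ]
        for by in range(h // factor)
    ]


def teachers_for_resolutions(basic: Mask, factors: List[int]) -> List[Mask]:
    """One teacher map per output, each matched to that output's resolution."""
    return [downsample_max(basic, f) for f in factors]
```

Average pooling would instead yield fractional targets proportional to line coverage; which reduction suits a given network is a design choice left open here.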

6. The learning device according to any one of claims 1 to 5, wherein the setting means sets, based on basic teacher data for the output of the neural network corresponding to the single item of learning input data, an area excluded from error evaluation in the teacher data for each of the two or more different outputs.
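An area excluded from error evaluation means that wrong predictions inside it are not penalized. A minimal sketch, assuming the teacher map marks excluded pixels with a sentinel value (`IGNORE`, a hypothetical name) and that squared error averaged over the remaining pixels is the per-output error; neither choice is stated in the claim.

```python
from typing import List

IGNORE = -1.0  # hypothetical sentinel marking the area excluded from error evaluation


def masked_mse(pred: List[List[float]], teacher: List[List[float]]) -> float:
    """Mean squared error over pixels whose teacher value is not IGNORE.

    Pixels marked IGNORE add nothing to the error, so any prediction there is accepted.
    """
    total, count = 0.0, 0
    for pred_row, teacher_row in zip(pred, teacher):
        for p, t in zip(pred_row, teacher_row):
            if t == IGNORE:
                continue  # skip the non-evaluation area
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0
```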

7. The learning device according to any one of claims 1 to 6, wherein the result of the attribute determination processing indicates a line drawing pattern.

8. A learning device for training a neural network, comprising:
setting means for setting teacher data for each of two or more different outputs of the neural network that correspond to a single item of learning input data; and
learning means for training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output,
wherein, after training, the neural network provides two or more different outputs corresponding to input data, and the result of integrating the two or more different outputs represents the result of recognition processing on the input data;
the input data is image data, and the result of the recognition processing on the input data is attribute information for each pixel of the image data;
the result of the recognition processing on the input data is a line drawing pattern corresponding to the input data; and
the setting means sets, as the teacher data for the two or more different outputs, line drawing patterns whose widths correspond to the respective outputs.
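Claim 8 gives each output a line drawing pattern of its own width. One way to obtain such patterns, assuming the basic teacher data is a one-pixel-wide binary line mask, is to dilate it with a square window whose radius is chosen per output. The square window and the names `dilate` and `width_teachers` are assumptions for illustration only.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = line pixel, 0 = background


def dilate(mask: Mask, radius: int) -> Mask:
    """Thicken a binary line mask with a (2*radius+1)-pixel square window."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # mark every pixel within `radius` (chessboard distance), clipped to the image
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def width_teachers(basic: Mask, radii: List[int]) -> List[Mask]:
    """One teacher map per output; each output's line is drawn with its own width."""
    return [dilate(basic, r) for r in radii]
```

For example, radius 0 could be used for a full-resolution output and larger radii for deeper, coarser side outputs; that assignment is an illustration, not something the claim fixes.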

9. A learning device for training a neural network, comprising:
setting means for setting teacher data for each of two or more different outputs of the neural network that correspond to a single item of learning input data; and
learning means for training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output,
wherein, after training, the neural network provides two or more different outputs corresponding to input data, and the result of integrating the two or more different outputs represents the result of recognition processing on the input data;
the input data is image data, and the result of the recognition processing on the input data is attribute information for each pixel of the image data;
the result of the recognition processing on the input data is a line drawing pattern corresponding to the input data; and
the setting means sets blurred line drawing patterns as the teacher data for the two or more different outputs.
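Claim 9 uses blurred line drawing patterns as teacher data. The claim does not name the blur, so the sketch below assumes a simple mean (box) filter with a per-output radius; a Gaussian filter would be an equally plausible choice. `box_blur` and `blur_teachers` are hypothetical names.

```python
from typing import List

Map = List[List[float]]


def box_blur(mask: Map, radius: int) -> Map:
    """Blur a line pattern with a (2*radius+1) x (2*radius+1) mean filter.

    Near the border, each pixel averages only the part of the window inside the image.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [mask[yy][xx] for yy in ys for xx in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def blur_teachers(basic: Map, radii: List[int]) -> List[Map]:
    """One soft teacher map per output, blurred by that output's radius."""
    return [box_blur(basic, r) for r in radii]
```

Blurring a thin line lowers its peak value below 1; whether to rescale the result is a further design choice the claim does not address.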

10. A learning device for training a neural network, comprising:
setting means for setting teacher data for each of two or more different outputs of the neural network that correspond to a single item of learning input data; and
learning means for training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output,
wherein, after training, the neural network provides two or more different outputs corresponding to input data, and the result of integrating the two or more different outputs represents the result of recognition processing on the input data;
the input data is image data, and the result of the recognition processing on the input data is attribute information for each pixel of the image data;
the result of the recognition processing on the input data is a line drawing pattern corresponding to the input data; and
the setting means sets, as the teacher data for the two or more different outputs, data in which an area excluded from error evaluation, with a width corresponding to each of the two or more different outputs, is set around the line drawing pattern.
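Claim 10 surrounds the line with an excluded band whose width depends on the output. A minimal sketch, assuming the teacher map uses 1 for line pixels, 0 for background, and a hypothetical sentinel `IGNORE` for the band, with the band width measured as chessboard (Chebyshev) distance; a loss that skips `IGNORE` pixels would then leave the band unevaluated.

```python
from typing import List

IGNORE = -1  # hypothetical sentinel for the area excluded from error evaluation


def with_ignore_band(line: List[List[int]], band: int) -> List[List[int]]:
    """Teacher map: 1 on the line, IGNORE within `band` pixels of it, 0 elsewhere."""
    h, w = len(line), len(line[0])
    teacher = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if line[y][x]:
                # mark non-line pixels near this line pixel as excluded
                for yy in range(max(0, y - band), min(h, y + band + 1)):
                    for xx in range(max(0, x - band), min(w, x + band + 1)):
                        if not line[yy][xx]:
                            teacher[yy][xx] = IGNORE
                teacher[y][x] = 1  # line pixels are always positive targets
    return teacher
```

Calling this with a different `band` for each output gives the per-output widths the claim describes.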

11. The learning device according to any one of claims 7 to 10, wherein the setting means generates the teacher data using basic teacher data that is a line drawing pattern corresponding to the learning input data.

12. A learning device for training a neural network, comprising:
setting means for setting teacher data for each of two or more different outputs of the neural network that correspond to a single item of learning input data; and
learning means for training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output,
wherein, after training, the neural network provides two or more different outputs corresponding to input data, and the result of integrating the two or more different outputs represents the result of recognition processing on the input data;
the input data is image data, and the result of the recognition processing on the input data is attribute information for each pixel of the image data;
the result of the recognition processing on the input data is a line drawing pattern corresponding to the input data; and
the setting means generates the teacher data by filtering basic teacher data, which is a line drawing pattern corresponding to the learning input data.
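Claims 8, 9, 10 and 12 state that the trained network's outputs are integrated to give the per-pixel recognition result, without saying how. A minimal sketch, assuming every output is a probability map whose resolution is the full resolution divided by an integer factor, and that integration means nearest-neighbour upsampling, averaging, and thresholding at 0.5. All of these choices, and the names `upsample_nearest` and `integrate_outputs`, are assumptions.

```python
from typing import List, Sequence

Map = List[List[float]]


def upsample_nearest(m: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling: repeat every value factor x factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in m for _ in range(factor)]


def integrate_outputs(outputs: Sequence[Map], factors: Sequence[int],
                      threshold: float = 0.5) -> List[List[int]]:
    """Bring every output to full resolution, average them, and binarize."""
    full = [upsample_nearest(o, f) for o, f in zip(outputs, factors)]
    h, w = len(full[0]), len(full[0][0])
    if any(len(m) != h or len(m[0]) != w for m in full):
        raise ValueError("outputs do not reach a common resolution")
    return [[int(sum(m[y][x] for m in full) / len(full) >= threshold)
             for x in range(w)] for y in range(h)]
```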

13. A method for creating a trained neural network used for processing that determines an attribute of each pixel of an image, the method comprising:
a setting step of setting teacher data for each of two or more different outputs of the neural network that correspond to a single item of learning input data; and
a learning step of training the neural network based on the error between each of the two or more different outputs, obtained by inputting the learning input data to the neural network, and the teacher data corresponding to that output,
wherein the teacher data for each of the two or more different outputs represents the result of applying a different deformation process or filtering process to basic teacher data corresponding to the single item of learning input data, and the basic teacher data is image data indicating the attribute of each pixel of the image represented by that learning input data.

14. A neural network for causing an information processing device to generate the result of recognition processing corresponding to input data, wherein the neural network is set with parameters created by the method according to claim 13.

15. An information processing apparatus comprising processing means for generating a processing result of recognition processing corresponding to input data using the neural network according to claim 14.

16. A program for causing a computer to function as each means of the learning device according to any one of claims 1 to 12.