JP2020042664A

JP2020042664A - Learning device, parameter creation method, neural network, and information processor using the same

Info

Publication number: JP2020042664A
Application number: JP2018170893A
Authority: JP
Inventors: 晃一丹治; Koichi Tanji; 敦史野上; Atsushi Nogami; 裕輔御手洗; Hirosuke Mitarai
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-09-12
Filing date: 2018-09-12
Publication date: 2020-03-19
Anticipated expiration: 2038-09-12
Also published as: JP7316771B2

Abstract

PROBLEM TO BE SOLVED: To perform learning of a neural network more effectively. SOLUTION: A learning device performs learning of a neural network. The learning device sets teacher data for each of two or more different outputs from the neural network, corresponding to a single piece of learning input data. The learning device performs the learning of the neural network on the basis of an error between each of the two or more different outputs, obtained by inputting the learning input data into the neural network, and the teacher data corresponding to that output. The neural network after learning provides two or more different outputs corresponding to input data, and an integration result of the two or more different outputs indicates a result of recognition processing for the input data. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a parameter creation method, a neural network, and an information processing device using the same, and more particularly to, for example, image recognition technology.

A known method generates a classifier that identifies input data by training a hierarchical network with learning data. However, as the number of layers in the hierarchical network increases, the so-called vanishing gradient problem (the delta needed to update the weight coefficients vanishes or diverges while it is backpropagated) becomes apparent, and it is known that this can impede the progress of learning.

As a method for coping with this problem, a technique called deep supervision, which performs error evaluation and error backpropagation in the intermediate layers of a network as well (hereinafter referred to as side-out learning), is known (Non-Patent Document 1). Another known method trains a hierarchical network to extract image features and, in addition, trains it so that specific neurons are activated when specific features are present, thereby enabling accurate feature extraction suited to those features (Patent Document 1).

Patent Document 1: JP 2016-31746 A

Non-Patent Document 1: Xie, S., Tu, Z. "Holistically-nested edge detection" ICCV, 1395-1403 (2015)

However, it has been found that, in the method of Non-Patent Document 1, the accuracy of the error evaluation for the outputs from the intermediate layers can be low, so that a favorable final learning result may not be obtained.

An object of the present invention is to perform learning of a neural network more effectively.

In order to achieve the object of the present invention, a learning device of the present invention has, for example, the following configuration. That is, a learning device for performing learning of a neural network comprises:
setting means for setting teacher data for each of two or more different outputs from the neural network corresponding to a single piece of learning input data; and
learning means for performing learning of the neural network based on an error between each of the two or more different outputs, obtained by inputting the learning input data into the neural network, and the teacher data corresponding to that output,
wherein the neural network after learning provides two or more different outputs corresponding to input data, and an integration result of the two or more different outputs indicates a result of recognition processing on the input data.

Learning of a neural network can be performed more effectively.

FIG. 1 is a functional configuration diagram illustrating an example of a learning device according to an embodiment.
FIG. 2 is a flowchart illustrating an example of a parameter generation method according to an embodiment.
FIG. 3 is a schematic diagram illustrating an example of a hierarchical network that performs side-out learning.
FIG. 4 is a diagram for explaining the relationship between side-out outputs and the GT.
FIG. 5 is a diagram for explaining a problem that arises when side-out learning is performed according to the prior art.
FIG. 6 is a diagram explaining a method of generating an adaptive GT according to an embodiment.
FIG. 7 is a diagram explaining a method of generating an adaptive GT according to an embodiment.
FIG. 8 is a diagram explaining a method of generating an adaptive GT according to an embodiment.
FIG. 9 is a diagram explaining a method of generating an adaptive GT according to an embodiment.
FIG. 10 is a diagram explaining a method of generating an adaptive GT according to an embodiment.
FIG. 11 is a schematic block diagram of a computer used in an embodiment.

Hereinafter, embodiments of the present invention will be specifically described with reference to flowcharts and drawings. The following specific examples are examples of embodiments according to the present invention, but the present invention is not limited to these specific forms. The present invention can be applied to learning of a hierarchical network (hereinafter sometimes referred to as a neural network or simply a network) using learning data, and is applicable to any technique in which a hierarchical network is trained.

FIG. 1 illustrates an example of the functional configuration of a learning device 100 according to Embodiment 1. The learning device 100 performs learning of a hierarchical network. The basic data storage unit 101 holds basic learning data used for learning. The learning data is teacher data (hereinafter sometimes referred to as GT) used for learning of the network. In the present embodiment, learning of the network is performed using learning input data and teacher data (learning data) indicating the determination result for the learning input data.

For example, in one embodiment, the network can be used to perform attribute determination (labeling) for each pixel of an image. That is, when image data is input to the network as input data, attribute information (a label) for each pixel of the image data is obtained as the result of the determination process on the input data. For example, in the specific case of extracting contours from an image, the result of the determination process is a contour pattern corresponding to the input data (an image whose pixel values are attribute information indicating whether or not each pixel belongs to a contour). Thus, in one embodiment, the processing result for the input data can be a line drawing pattern (such as a contour pattern) corresponding to the input data.

In such a configuration, the learning input data is image data, and the learning data is data indicating a label (determination result) for each pixel of the learning input data. For example, the learning input data can be an image containing characters, figures, or the like. In the specific case of extracting contours from an image, the learning data is an image indicating the contours in the image serving as the learning input data, and can be, for example, created in accordance with user input. The basic data storage unit can further hold such learning input data in combination with the learning data. In this specification, basic learning data (basic teacher data) refers to learning data (teacher data) before processing such as modification or deformation is performed by the setting unit 102.

The setting unit 102 sets the learning data used for learning of the network. The adaptive data storage unit 103 holds the learning data set by the setting unit 102. In one embodiment, the setting unit 102 generates learning data by applying processing such as modification, deformation, or filtering to the basic learning data. By storing the learning data generated in this way in the adaptive data storage unit 103, the setting unit 102 sets the learning data used for learning of the network (hereinafter sometimes referred to as adaptive learning data, adaptive teacher data, or adaptive GT). The setting unit 102 may additionally store the original basic learning data in the adaptive data storage unit 103. As described later, the setting unit 102 sets learning data (adaptive teacher data) for each of two or more different outputs from the hierarchical network corresponding to a single piece of learning input data.

The learning unit 104 reads the learning data stored in the adaptive data storage unit 103 and performs learning processing for the network. The learning unit 104 also stores the final learning result obtained by the learning (for example, the network parameters) in the learning result storage unit 105. A known method can be used to train the hierarchical network. For example, by backpropagating through the network the error of the output values obtained from the forward propagation calculation, the weight coefficients and other parameters corresponding to the connection state of the network can be updated iteratively. As described later, in the present embodiment the learning unit 104 performs learning of the hierarchical network based on the error between each of two or more different outputs, obtained by inputting the learning input data into the network, and the learning data (adaptive teacher data) corresponding to that output.

The test data storage unit 106 holds test data used for evaluating the network. The evaluation unit 107 evaluates the network using the test data. As described later, the hierarchical network obtained after learning provides two or more different outputs corresponding to input data. The integration result of these two or more different outputs indicates the result of recognition processing on the input data.

FIG. 2 is a flowchart of the learning method according to the present embodiment. The description below follows this flowchart. In step S210, the setting unit 102 reads the basic learning data from the basic data storage unit 101. In step S220, the setting unit 102 sets adaptive learning data based on the basic learning data. Here, the setting unit 102 sets learning data (adaptive teacher data) for each of the two or more different outputs based on the structure of the hierarchical network. The adaptive learning data that is set can depend on the configuration of the units forming the hierarchical network or on their connection state. As an example, the following describes the case of side-out learning (a technique that performs learning based not only on the output error of the final layer but also on the output errors of intermediate layers; details are given later).

In step S230, the learning unit 104 performs learning of the hierarchical network using the adaptive learning data set in step S220. The network used in the present embodiment has side-outs (outputs from intermediate layers), and a determination result can be obtained based on these side-outs. The specific learning method is described later.

In step S240, the learning unit 104 determines whether to end the learning of step S230. For example, the learning unit 104 can determine that learning is to end when the learning result of the network reaches a predetermined criterion. As an example, the evaluation unit 107 can evaluate the misrecognition rate of the network using the test data (evaluation data) stored in the test data storage unit 106. This test data can be, for example, a set of learning input data and teacher data indicating the determination result for that learning input data, different from the data stored in the basic data storage unit 101. The misrecognition rate can be defined as the proportion of the test data used for evaluation for which an incorrect recognition result was obtained. When the misrecognition rate of the network is equal to or less than a predetermined threshold, the learning unit 104 can determine that learning is to end. If learning is not to end, the process returns to step S230, and the learning unit 104 performs learning of the network again. If learning is to end, the process proceeds to step S250, where the learning unit 104 stores the final learning result (for example, the weight parameters of the network and the coupling coefficients of the intermediate layer quasi-outputs, described later) in the learning result storage unit 105.
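The termination check of step S240 can be sketched as follows. This is a minimal illustration, assuming one recognition result per test sample; the function names and the threshold value are hypothetical and do not come from the source.

```python
import numpy as np

def misrecognition_rate(predictions, labels):
    """Proportion of test samples whose recognition result differs from the teacher data."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    return float(np.mean(predictions != labels))

def should_stop(predictions, labels, threshold=0.05):
    """Step S240 (sketch): end learning once the rate is at or below the threshold.

    The default threshold of 0.05 is an illustrative assumption.
    """
    return misrecognition_rate(predictions, labels) <= threshold
```

If `should_stop` returns False, the loop would return to step S230 for another round of learning; otherwise the parameters would be stored as in step S250.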

The learning device 100 according to the present embodiment can be realized by a device that implements the functional configuration illustrated in FIG. 1. For example, the learning device 100 may have dedicated hardware that implements each processing unit. Alternatively, some or all of the processing units may be implemented by a computer.

FIG. 11 is a diagram showing the basic configuration of a computer that can operate as the learning device 100 or as a processing unit thereof. In FIG. 11, the processor 1110 is, for example, a CPU, and controls the operation of the entire computer. The memory 1120 is, for example, a RAM, and temporarily stores programs, data, and the like. The computer-readable storage medium 1130 is, for example, a hard disk or a CD-ROM, and stores programs, data, and the like over the long term. In the present embodiment, a program that implements the function of each unit, stored in the storage medium 1130, is read into the memory 1120. The processor 1110 then operates according to the program in the memory 1120, thereby implementing the function of each unit. The memory 1120 or the storage medium 1130 can also operate as a storage unit such as the basic data storage unit 101, the adaptive data storage unit 103, the learning result storage unit 105, or the test data storage unit 106.

In FIG. 11, the input interface 1140 is an interface for acquiring information from an external device. The output interface 1150 is an interface for outputting information to an external device. The bus 1160 connects the above units and enables them to exchange data.

(Configuration of hierarchical network and learning method)
Hereinafter, an example of a hierarchical network that can be used in the present embodiment and the learning of the network performed in step S230 will be described. FIG. 3 shows an example of a hierarchical network. The network of FIG. 3 is composed of three intermediate layer groups 302, 303, and 304. The specific configuration of each intermediate layer group is not particularly limited; for example, each may be composed of a combination of one or more of a convolutional layer, a pooling layer, and a fully connected layer.

In the present embodiment, two or more different outputs corresponding to a single piece of input data are obtained from the hierarchical network. For example, in the network of FIG. 3, outputs are obtained from two or more different layers. That is, intermediate layer quasi-outputs 307, 308, and 309 are obtained from the intermediate layer groups 302, 303, and 304, respectively. An integrated output 305 is then obtained by integrating the intermediate layer quasi-outputs 307, 308, and 309. A determination result for the input 301 is obtained based on this integrated output 305. That is, when learning data is input as the input 301, the integrated output 305 is obtained via the intermediate layer group 302, the intermediate layer group 303, and the intermediate layer group 304. In the present embodiment, as an example, the intermediate layer group 302 consists of two convolutional layers, the intermediate layer group 303 consists of one pooling layer followed by two convolutional layers, and the intermediate layer group 304 likewise consists of one pooling layer followed by two convolutional layers.

In the present embodiment, learning of the network is performed by side-out learning from each of the intermediate layer groups. Ordinarily, when a hierarchical network is used, error evaluation is performed only on the final output, and the network is trained by error backpropagation. In side-out learning, by contrast, error evaluation is also performed on the outputs from the intermediate layer groups. The error information can then be input to the intermediate layer groups as well and backpropagated. For example, HED (Holistically-nested Edge Detection) in Non-Patent Document 1 discloses a method of performing contour extraction (extracting the contours of objects contained in an input image) using a hierarchical network. Non-Patent Document 1 uses side-out learning; specifically, the error with respect to the learning data is also evaluated at the intermediate layers, and the network is trained using error backpropagation.

In the present embodiment, intermediate layer quasi-outputs 307, 308, and 309 are output from the intermediate layer groups 302, 303, and 304, respectively, for side-out learning. Intermediate layer errors 310, 311, and 312, which are the errors between the intermediate layer quasi-outputs 307, 308, and 309 and the learning data (GT), are then calculated. Here, the intermediate layer error 310 is denoted l_side^1, the intermediate layer error 311 is denoted l_side^2, and the intermediate layer error 312 is denoted l_side^3. Summing the intermediate layer errors 310, 311, and 312 evaluated in this way gives the error evaluation value for the intermediate layers as a whole (L_side in Equation (1)).
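Equation (1) itself is not reproduced in this text. From the description above (the total of the three intermediate layer errors), it presumably takes a form like the following; the unweighted sum and the index m are assumptions made here for illustration.

```latex
L_{\mathrm{side}} = \sum_{m=1}^{3} l_{\mathrm{side}}^{m}
```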

The error evaluation method is not particularly limited. For example, when the GT label values are the two values 0 and 1, the error evaluation value L_side^m of intermediate layer m can be defined using cross entropy as shown in Equation (2). In Equation (2), y_j^m represents the output value of each pixel of intermediate layer m. Y_+^m denotes the positive region (label value 1) of the GT given to intermediate layer m, and Y_-^m denotes the negative region (label value 0) of the GT given to intermediate layer m. Σ denotes the sum over all pixels. β is a coefficient that corrects the imbalance between the positive and negative portions of the GT, and can be defined, for example, as the ratio of the number of pixels in the negative region to the total number of pixels in the GT. This value β may be calculated and set for each GT, or the same value (for example, the average of the values β over the GTs) may be set for all GTs.
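Equation (2) is not reproduced in this text. The sketch below assumes the class-balanced cross entropy commonly used for this purpose, in which positive pixels are weighted by β and negative pixels by 1 − β, with β computed per GT as described above. The function name, the clipping constant `eps`, and the exact weighting are assumptions for illustration.

```python
import numpy as np

def balanced_cross_entropy(y, gt, eps=1e-7):
    """Class-balanced cross entropy between a side output y and a binary GT.

    y  : per-pixel output values in [0, 1] (y_j^m in the text)
    gt : per-pixel labels, 1 for the positive region Y_+ and 0 for Y_-
    """
    y = np.clip(np.asarray(y, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    gt = np.asarray(gt)
    pos = gt == 1
    neg = gt == 0
    beta = neg.sum() / gt.size  # ratio of negative pixels to all pixels
    loss_pos = -beta * np.log(y[pos]).sum()
    loss_neg = -(1.0 - beta) * np.log(1.0 - y[neg]).sum()
    return float(loss_pos + loss_neg)
```

Because contour pixels are usually far fewer than background pixels, β is close to 1, so each of the rare positive pixels contributes more to the loss than each negative pixel.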

The integrated output 305 can be obtained by integrating the two or more different outputs corresponding to the input data. For example, the intermediate layer quasi-outputs 307, 308, and 309 can be superimposed by taking their linear sum. The integrated output 305 can then be obtained by applying an activation function σ, such as a sigmoid function, to the linear sum. Here, the intermediate layer quasi-output 307 can be denoted A_side^1, the intermediate layer quasi-output 308 can be denoted A_side^2, and the intermediate layer quasi-output 309 can be denoted A_side^3. In this case, for example, Y_fuse according to Equation (3) can be obtained as the integrated output 305. The weights of the intermediate layer quasi-outputs 307, 308, and 309 used in obtaining the integrated output 305 can also be determined by learning. For example, the coupling coefficients h_m of the linear sum shown in Equation (3) can also be determined by learning.
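The integration described here (a linear sum of the side outputs followed by a sigmoid) can be sketched as below. Equation (3) is not reproduced in this text, so the form σ(Σ_m h_m · A_side^m) is inferred from the description; the function names are hypothetical, and the side outputs are assumed to have already been brought to a common size.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, used here as the activation function sigma."""
    return 1.0 / (1.0 + np.exp(-x))

def fuse(side_outputs, h):
    """Integrated output: sigmoid of the weighted linear sum of the side outputs.

    side_outputs : list of M arrays of identical shape (H, W)
    h            : M coupling coefficients (learned in the source; fixed here)
    """
    stacked = np.stack(side_outputs)                # shape (M, H, W)
    h = np.asarray(h, dtype=float)                  # shape (M,)
    linear_sum = np.tensordot(h, stacked, axes=1)   # shape (H, W)
    return sigmoid(linear_sum)
```

In training, `h` would be updated by backpropagation together with the network weights rather than being passed in as a constant.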

In the present embodiment, the integrated error 313, which is the error between the integrated output 305 and the GT, is also evaluated. For example, according to Equation (4), L_fuse, the error between the integrated output Y_fuse and the GT label values Y, can be obtained as the integrated error 313. In Equation (4), Dist() denotes a distance function used to evaluate the error between Y and Y_fuse; cross entropy, for example, can be used as this function.

The error of the entire network can be obtained from the integrated error 313 (L_fuse) and the sum (L_side) of the intermediate layer errors 310, 311, and 312. For example, the error of the entire network can be L_total given by Equation (5). The weight parameters in the hierarchical network and the coupling coefficients (h_m) of the intermediate layer quasi-outputs can be determined by learning so as to minimize this error of the entire network (L_total).
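Equation (5) is not reproduced in this text. Assuming the overall error is the simple unweighted sum of the two terms (the source states only that it is obtained from them), and writing W for the network weight parameters (a symbol introduced here for illustration), the learning objective can be sketched as:

```latex
L_{\mathrm{total}} = L_{\mathrm{side}} + L_{\mathrm{fuse}}, \qquad
(W, h)^{*} = \operatorname*{arg\,min}_{W,\,h} \; L_{\mathrm{total}}
```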

The configuration and learning method of the hierarchical network described above are as described in, for example, Non-Patent Document 1. In the present embodiment, however, adaptive learning data set to suit each of the intermediate layer groups 302, 303, and 304 (or each of the intermediate layer quasi-outputs 307, 308, and 309) is used when obtaining the intermediate layer errors 310, 311, and 312. That is, the intermediate layer errors 310, 311, and 312 are defined as the errors between the adaptive GTs 306-1, 306-2, and 306-3, which are set to suit the respective intermediate layer groups 302, 303, and 304, and the intermediate layer quasi-outputs 307, 308, and 309. This configuration is described below.

FIG. 4 is a diagram explaining side-out learning of a network when, for example, a hierarchical network is applied to contour extraction from an image. FIG. 4 schematically shows the case where, as in Non-Patent Document 1, the same GT (corresponding to the basic learning data) is used to evaluate the error of each intermediate layer quasi-output. FIG. 4 shows the relationship between the GT 306 and the integrated output 305 and the intermediate layer quasi-outputs 307 to 309 from the intermediate layer groups 302 to 304.

In a hierarchical network such as a convolutional neural network, a pooling layer is usually placed after a convolutional layer. Placing a pooling layer reduces the positional sensitivity of the features extracted by the convolutional layer, so that the output of the pooling layer becomes robust to changes in position.

For example, when 2×2 max pooling with stride 2 is performed in a pooling layer, only the maximum value among each 2×2 block of four pixels is output. As described above, in the example of FIG. 3 the intermediate layer groups 303 and 304 each have one pooling layer. Therefore, when an image of size 128×128 is input to the network as learning input data and these pooling layers perform 2×2 max pooling with stride 2, an output of size 64×64 is obtained from the intermediate layer group 303, and an output of size 32×32 is obtained from the intermediate layer group 304.
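The size reduction described here can be checked with a small NumPy sketch of 2×2 max pooling with stride 2. The function name is hypothetical, and the convolutional layers are omitted (they are assumed to preserve spatial size).

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    # Split into non-overlapping 2x2 blocks and keep the maximum of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.zeros((128, 128))   # size of the learning input data in the example
after_303 = max_pool_2x2(image)      # pooling layer of intermediate layer group 303
after_304 = max_pool_2x2(after_303)  # pooling layer of intermediate layer group 304
print(after_303.shape, after_304.shape)  # (64, 64) (32, 32)
```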

The GT (corresponding to the basic learning data), on the other hand, is usually an image (for example, a contour image) of the same size as the learning input data. Therefore, in order to evaluate the error by comparing an intermediate layer quasi-output with the GT, the intermediate layer quasi-output is enlarged to 128×128, the same size as the GT. As shown in FIG. 4, one pixel of an intermediate layer quasi-output is thereby enlarged, at the error evaluation stage, to a size of 2×2 in the case of the intermediate layer quasi-output 308 and to a size of 4×4 in the case of the intermediate layer quasi-output 309. Consequently, in contour extraction for example, even if the contour line width in the intermediate layer quasi-output 307 and in the GT is 1 pixel, the contour line width of the intermediate layer quasi-output 308 becomes 2 pixels and that of the intermediate layer quasi-output 309 becomes 4 pixels. When the error is evaluated, the error for the intermediate layer quasi-outputs 308 and 309 may therefore be overestimated because of the difference in line width.
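The line-width effect can be illustrated with a small NumPy sketch. It assumes nearest-neighbor enlargement, which matches the statement that one pixel becomes a 2×2 or 4×4 block; the source does not name the interpolation method, and the function names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Enlarge a 2-D array by repeating each pixel factor x factor times."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def line_width(img):
    """Width in pixels of a vertical line, measured along the first row."""
    return int(img[0].sum())

# A 1-pixel-wide vertical contour in a 32x32 side output (like quasi-output 309).
side = np.zeros((32, 32), dtype=int)
side[:, 10] = 1
enlarged = upsample_nearest(side, 4)      # back to 128x128 for comparison with the GT

# A GT of the same contour at full resolution, 1 pixel wide.
gt = np.zeros((128, 128), dtype=int)
gt[:, 40] = 1

extra_error_pixels = int(np.sum(enlarged != gt))
print(line_width(enlarged), extra_error_pixels)
```

Even though the side output traces the same contour as the GT, the pixels that differ purely because of the thicker line are counted as error.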

FIG. 5 schematically illustrates how the error is overestimated because of the difference in line width between the GT and an intermediate layer quasi-output. As shown in FIG. 5A, there is no difference in line width between the intermediate layer quasi-output 307 and the GT 306, so the error evaluation measures the difference between the contour pattern output from the intermediate layer group 302 and the pattern of the GT 306. On the other hand, as shown in FIG. 5B, there is a difference in line width between the intermediate layer quasi-output 308 and the GT 306, so the error evaluation measures not only the difference between the contour pattern output from the intermediate layer group 303 and the pattern of the GT 306 but also the error caused by the difference in line width. Further, as shown in FIG. 5C, the difference in line width between the intermediate layer quasi-output 309 and the GT 306 is larger, so the error caused by the difference in line width is also larger.

FIG. 5D schematically shows how the error is overestimated. When there is a difference in line width between the contour pattern 510 output from an intermediate layer group, as shown in the intermediate layer quasi-output, and the GT 520, regions 530 in which the error is not evaluated correctly exist on both sides of the contour line shown in the GT. What should be evaluated in contour extraction is the difference in pattern between the output and the GT. If other errors, such as a difference in line width, are also evaluated, a favorable final learning result may not be obtained.

Non-Patent Document 1 does not describe processing that generates, from the basic learning data, learning data suited to learning based on the error of an intermediate layer quasi-output. When side-out learning based on each intermediate layer quasi-output is performed using the same learning data as that used for learning based on the error of the final integrated output (corresponding to the basic learning data), the error evaluation for the intermediate layer quasi-outputs may become less accurate and learning may become less efficient.

For this reason, in the present embodiment, the setting unit 102 sets teacher data (adaptive GT) for each of two or more different outputs from the network that correspond to a single piece of learning input data. For example, the setting unit 102 can set an adaptive GT for each intermediate layer group (or each intermediate layer quasi-output). With such a configuration, other influences such as the line width are reduced, and the error that is actually of interest can be evaluated more accurately. As a result, the convergence of the side-out learning and the performance of the resulting hierarchical network can be improved.

To this end, the setting unit 102 can set, for each intermediate layer group, adaptive learning data obtained by processing the original basic learning data. For example, the setting unit 102 can generate the adaptive learning data for each intermediate layer group so that the line width in the intermediate layer quasi-output and the line width of the adaptive GT used for error evaluation are close to each other, or at least so that the error is not overestimated. In this way, the setting unit 102 can generate learning data so that an appropriate error evaluation is performed for each intermediate layer quasi-output.

Alternatively, the basic data storage unit 101 may store learning data (adaptive teacher data) for each of two or more different outputs from the hierarchical network that correspond to a single piece of learning input data. In this case, the setting unit 102 may acquire the adaptive learning data from the basic data storage unit 101 and store it in the adaptive data storage unit 103.

(Method of setting adaptive learning data)
Hereinafter, a specific example of the method of setting the adaptive learning data in step S220 will be described.

FIG. 6 illustrates the method of setting adaptive learning data according to the present embodiment for the case where the hierarchical network of FIG. 3 is used. FIG. 6A shows the contour pattern in the intermediate layer quasi-output 307 and the positive region in the GT 601 used for error evaluation of the intermediate layer quasi-output 307 (the positive region represents the contour pattern and is sometimes simply called the GT below). Similarly, FIGS. 6B and 6C show the contour patterns in the intermediate layer quasi-outputs 308 and 309 and the contour patterns in the GTs 602 and 603 used for error evaluation of the intermediate layer quasi-outputs 308 and 309. As already described, the intermediate layer quasi-outputs 308 and 309 are enlarged so that their resolution matches that of the GT. As a result, the line width of the contour pattern in the intermediate layer quasi-outputs 308 and 309 also increases.

Therefore, the setting unit 102 can set the teacher data for the two or more different outputs based on the resolutions of those outputs. For example, the setting unit 102 can set the GTs 601 to 603 for the intermediate layer quasi-outputs 307 to 309 based on the resolutions of the intermediate layer quasi-outputs 307 to 309. In the present embodiment, the setting unit 102 sets, as the teacher data for the two or more different outputs, line drawing patterns whose widths correspond to the respective outputs. For example, the setting unit 102 can set the GTs 601 to 603, which show line drawing patterns whose widths correspond to the resolutions of the intermediate layer quasi-outputs 307 to 309, for evaluation of the intermediate layer quasi-outputs 307 to 309.

Specifically, the line widths of the GTs 602 and 603 for the intermediate layer quasi-outputs 308 and 309 are increased so that the line widths of the line drawing patterns representing the contour in the intermediate layer quasi-output and in the GT are close to each other. More specifically, in the example of FIG. 6, the line widths of the contour patterns in the GTs 601, 602, and 603 for the intermediate layer quasi-outputs 307, 308, and 309 are 1, 2, and 4, respectively. In this way, the setting unit 102 can set the adaptive GT so that the line width of the line drawing pattern is larger when the resolution of the intermediate layer quasi-output is low (few pixels) than when it is high (many pixels). For example, the setting unit 102 can set the adaptive GT so that the line width of its line drawing pattern approximately equals (resolution of the basic learning data) / (resolution of the intermediate layer quasi-output).
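The width rule above can be sketched as follows. The text does not say how the lines are widened, so the square-window dilation used here is an assumption, and `adaptive_line_width`, `widen_lines`, and `vertical_line_width` are hypothetical names.

```python
import numpy as np

def adaptive_line_width(base_resolution: int, output_resolution: int) -> int:
    """Line width ~ (resolution of basic learning data) / (resolution of quasi-output)."""
    return max(1, round(base_resolution / output_resolution))

def widen_lines(gt: np.ndarray, width: int) -> np.ndarray:
    """Dilate a binary line image with a width x width window (assumed widening method)."""
    if width <= 1:
        return gt.copy()
    before = (width - 1) // 2
    after = width - 1 - before
    padded = np.pad(gt, ((before, after), (before, after)))
    h, w = gt.shape
    out = np.zeros_like(gt)
    # A pixel becomes positive if any pixel inside its window is positive.
    for dy in range(width):
        for dx in range(width):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def vertical_line_width(img: np.ndarray) -> int:
    """Count the columns that contain at least one nonzero pixel."""
    return int(np.count_nonzero(img.any(axis=0)))

# Basic GT: a 1-pixel-wide vertical contour in a 128x128 image.
base_gt = np.zeros((128, 128), dtype=np.uint8)
base_gt[:, 60] = 1

# Quasi-output resolutions of the FIG. 3 example: 128, 64, and 32.
widths = [adaptive_line_width(128, r) for r in (128, 64, 32)]
adaptive_gts = [widen_lines(base_gt, w) for w in widths]
print(widths)
```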

The setting unit 102 can use the basic learning data to generate adaptive learning data for error evaluation of the intermediate layer quasi-outputs. In the present embodiment, the setting unit 102 can generate the adaptive learning data using basic teacher data, which is a line drawing pattern corresponding to the learning input data. The setting unit 102 can generate the adaptive learning data (GTs 911 to 913) for error evaluation of the intermediate layer quasi-outputs 307 to 309 according to, for example, the flowchart of FIG. 9D.

In step S901, the setting unit 102 acquires the basic learning data (GT 912) stored in the basic data storage unit 101. In step S902, the setting unit 102 generates the GT 911 and the GT 913 by filtering the GT 912. In step S903, the setting unit 102 stores the GTs 911 to 913 thus obtained in the adaptive data storage unit 103, thereby setting the GTs 911 to 913 for the intermediate layer quasi-outputs 307 to 309.

In this example, the setting unit 102 generates the adaptive learning data by filtering the basic learning data. That is, the setting unit 102 can obtain different adaptive learning data (GTs 911 and 913) by applying a different filter for each intermediate layer quasi-output to the basic learning data (GT 912), which is a line drawing pattern corresponding to the learning input data. The contour pattern in an intermediate layer quasi-output changes from a fine form that reflects texture to a coarse form as the layer approaches the final output side. A filter that transforms the basic learning data can model this change, so that the form of the GT is changed to follow it. As one example, the setting unit 102 can select the filter so that the line width of the line drawing pattern is larger when the resolution of the intermediate layer quasi-output is low (few pixels) than when it is high (many pixels).

A specific example of the filter is a band-pass filter, which passes only a specific frequency band. FIG. 9A shows the GT 911 obtained by applying a high-pass filter to the GT 912. FIG. 9B shows the GT 912, in which the line width of the contour pattern is 2; the GT 912 is used as it is for the intermediate layer quasi-output 308. FIG. 9C shows the GT 913 obtained by applying a low-pass filter to the GT 912. As can be seen from FIGS. 9A to 9C, the contour pattern of the GT 911 is thinner than that of the GT 912, and the contour pattern of the GT 913 is thicker than that of the GT 912. In the frequency versus intensity graphs shown in FIGS. 9A to 9C, the gray portions indicate the bands passed by the filtering. For short contour patterns (for example, those with a maximum length of 10 pixels or less), the filtering may be omitted, or the contour pattern may be erased. Such processing can be expected, for example, to suppress the influence of noise.

As another example, the basic data storage unit 101 may store vector data representing the contour pattern. In this case, the setting unit 102 can generate, from the vector data, a GT having a line width that corresponds to each intermediate layer group.

The adaptive learning data (GTs 601 to 603) for error evaluation of the intermediate layer quasi-outputs 307 to 309 may also be stored in the basic data storage unit 101 in advance. Further, the setting unit 102 may generate the GTs 601 to 603 based on the data stored in the basic data storage unit 101. FIG. 6D illustrates an example of how data for generating the GTs 601 to 603 is stored in the basic data storage unit 101, and FIG. 6E is an enlarged view of the vertical line portion of FIG. 6D. As shown in FIGS. 6D and 6E, the contour pattern indicated by "1" is used as the GT 601 for error evaluation of the integrated output 305 and the intermediate layer quasi-output 307. The contour pattern indicated by "1" and "2" is used as the GT 602 for error evaluation of the intermediate layer quasi-output 308, and the contour pattern indicated by "1", "2", and "3" is used as the GT 603 for error evaluation of the intermediate layer quasi-output 309. That is, the positive region of the GT 601 is the region labeled "1", the positive region of the GT 602 is the region labeled "1" or "2", and the positive region of the GT 603 is the region labeled "1", "2", or "3".
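A single label map of this kind is enough to derive all three GTs. The sketch below is illustrative only: the exact pixel layout of FIG. 6E is not reproduced in the text, so the label row is an assumed layout chosen to give line widths 1, 2, and 4, and `gt_from_label_map` is a hypothetical helper.

```python
import numpy as np

def gt_from_label_map(label_map: np.ndarray, max_label: int) -> np.ndarray:
    """Positive region = pixels whose label lies in 1..max_label; everything else is 0."""
    return ((label_map >= 1) & (label_map <= max_label)).astype(np.uint8)

# Assumed cross-section of a stored label map (0 = background).
label_row = np.array([0, 0, 3, 2, 1, 3, 0, 0])

gt_601 = gt_from_label_map(label_row, 1)  # label "1" only
gt_602 = gt_from_label_map(label_row, 2)  # labels "1" and "2"
gt_603 = gt_from_label_map(label_row, 3)  # labels "1", "2", and "3"

print(gt_601.sum(), gt_602.sum(), gt_603.sum())
```

Storing one label map instead of three separate images keeps the GTs for all quasi-outputs consistent with each other by construction.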

In this case, the setting unit 102 can use the data stored in the basic data storage unit 101 to generate and set the adaptive learning data (GTs 601 to 603) for error evaluation of the intermediate layer quasi-outputs 307 to 309. By making the contour pattern progressively thicker across the GTs 601 to 603 for error evaluation of the intermediate layer quasi-outputs 307 to 309, errors caused by anything other than pattern differences are kept from being overestimated, and side-out learning can be performed more effectively. For example, the GTs can be set so that the contour pattern in the GT for error evaluation of the output from a second intermediate layer, which lies downstream of a first intermediate layer across a pooling layer, is thicker than that in the GT for error evaluation of the output from the first intermediate layer.

The setting unit 102 may also apply further image processing, such as blurring, to the GT obtained as described above for each intermediate layer quasi-output, and set the result as the adaptive learning data. For example, FIGS. 8A to 8C show the results of additionally applying a Gaussian blur (processing that blurs an image using a Gaussian function) to the GTs 601 to 603 shown in FIG. 6. FIG. 8A shows a cross section 801 (the pixel value distribution in the width direction of the contour pattern) after a Gaussian blur is applied to the GT 601 with line width 1, which is used for error evaluation of the integrated output 305 and the intermediate layer quasi-output 307. Similarly, FIGS. 8B and 8C show cross sections 802 and 803 after a Gaussian blur is applied to the GTs 602 and 603 with line widths 2 and 4, which are used for error evaluation of the intermediate layer quasi-outputs 308 and 309. The processing applied to the GTs 601 to 603 may have the same strength for all of them, or different strengths matched to the characteristics of each intermediate layer quasi-output.
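The blurred cross sections can be approximated in one dimension as follows. This is a sketch under assumptions: the blur strength (`sigma = 1.0`) and the kernel radius are illustrative values not taken from the text, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_cross_section(profile: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Blur a 1-D cross section of a contour with a Gaussian kernel."""
    radius = int(3 * sigma + 0.5)
    return np.convolve(profile, gaussian_kernel(sigma, radius), mode="same")

def line_profile(length: int, start: int, width: int) -> np.ndarray:
    """Cross section of a binary line: `width` ones starting at index `start`."""
    p = np.zeros(length)
    p[start:start + width] = 1.0
    return p

section_801 = blur_cross_section(line_profile(21, 10, 1))  # line width 1
section_802 = blur_cross_section(line_profile(21, 9, 2))   # line width 2
section_803 = blur_cross_section(line_profile(21, 9, 4))   # line width 4
```

Passing a different `sigma` per GT corresponds to the option of using different blur strengths for different quasi-outputs.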

In this way, the setting unit 102 can set a blurred line drawing pattern as the teacher data for the two or more different outputs. The position of the true contour pattern in the learning input data and the position of the contour pattern in the GT may be slightly shifted from each other because of errors made when the GT was entered. Blurring the GT (for example, with a Gaussian blur) reflects in the GT an input error centered on the true position (for example, an input error that follows a Gaussian distribution), so that side-out learning can be performed more effectively.

So far, the description has focused on changing the line width of the contour pattern in the GT according to the characteristics of the intermediate layer quasi-output, but the method of setting the adaptive learning data is not limited to this. For example, the setting unit 102 can set teacher data for the two or more different outputs in which a region excluded from error evaluation, having a width corresponding to each output, is set around the line drawing pattern.

A method of setting, in the GT, a region that is excluded from error evaluation will be described with reference to FIG. 7. FIG. 7A shows the intermediate layer quasi-output 307 and the GT 601 for error evaluation, as in FIG. 6A. FIG. 7B shows the intermediate layer quasi-output 308 and the GT for its error evaluation, which consists of the GT 601 with line width 1 (the positive region of the GT) and an ancillary region 702 with line width 2. FIG. 7C shows the intermediate layer quasi-output 309 and the GT for its error evaluation, which consists of the GT 601 with line width 1 (the positive region of the GT) and an ancillary region 703 with line width 4. Here, an ancillary region is a region attached to both sides of the contour pattern (the positive region) in which no error evaluation is performed. In this case, in the evaluation using Expression (2), $Y_+^m$ denotes the positive region (for example, label value 1) of the GT given to the intermediate layer m, and $Y_-^m$ denotes the negative region (for example, label value 0) of the GT given to the intermediate layer m. The negative region is the whole region minus the positive region and the ancillary region (for example, label value 2).
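The effect of an ancillary region on the error can be sketched as a masked loss. Expression (2) is not reproduced in this section, so the plain, unweighted binary cross-entropy below is only a stand-in for it. The label values follow the text (1 = positive, 0 = negative, 2 = ancillary); `masked_bce` is a hypothetical name.

```python
import numpy as np

def masked_bce(pred: np.ndarray, gt_labels: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy over positive (1) and negative (0) pixels only.

    Pixels labelled 2 (ancillary region) contribute nothing to the loss.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = gt_labels == 1
    neg = gt_labels == 0
    loss = -(np.log(pred[pos]).sum() + np.log(1.0 - pred[neg]).sum())
    return float(loss / max(int(pos.sum() + neg.sum()), 1))

# Cross section of a GT: contour of width 1 with one ancillary pixel on each side.
gt_labels = np.array([0, 0, 2, 1, 2, 0, 0])

# Two predictions that differ only inside the ancillary region.
thin_pred = np.array([0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1])
thick_pred = np.array([0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1])

loss_thin = masked_bce(thin_pred, gt_labels)
loss_thick = masked_bce(thick_pred, gt_labels)
```

Because the ancillary pixels are masked out, a thicker predicted contour is not penalised for its extra width.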

A GT having such an ancillary region can be created, for example, from the data shown in FIGS. 6D and 6E. For example, the GT shown in FIG. 7B can be created by setting the region labeled "1" as the positive region and the region labeled "2" as the ancillary region. The GT shown in FIG. 7C can be created by setting the region labeled "1" as the positive region and the regions labeled "2" and "3" as the ancillary region. The ancillary region can also be set using filtering as described above. Making the ancillary regions 702 and 703 attached to the GT 601 progressively wider for error evaluation of the intermediate layer quasi-outputs 307 to 309 likewise keeps errors caused by anything other than pattern differences from being overestimated, so that side-out learning can be performed more effectively.

(Examples of application to various network configurations)
So far, the case where side-out learning is performed based on the intermediate layer quasi-output from each intermediate layer group has been described, but the method according to the present embodiment is not limited to this. For example, as shown in FIG. 10, side-out learning can also be performed based on a plurality of outputs from one intermediate layer group. In the configuration shown in FIG. 10, side-out learning is performed based on outputs from two or more different intermediate layers in one intermediate layer group of the network. In FIG. 10A, one intermediate layer group 1300 includes convolution layers 1301, 1302, and 1303 and a pooling layer 1304. FIG. 10A also shows the outputs 1311 to 1313 of the convolution layers 1301 to 1303 and the GTs 1321 to 1323 used for their error evaluation. FIG. 10B shows how the line width of the contour pattern changes across the GTs 1321 to 1323; the line width increases gradually.

In this case, the setting unit 102 can set teacher data for the learning input data for each of the outputs from two or more different intermediate layers in one intermediate layer group of the network. For example, the contour pattern can be made progressively thicker across the GTs 1321 to 1323 for error evaluation of the outputs 1311 to 1313. As a specific example, the setting unit 102 can set the GTs so that the contour pattern in the GT for error evaluation of the output from a second intermediate layer, which lies downstream of a first intermediate layer across a convolution layer, is thicker than that in the GT for error evaluation of the output from the first intermediate layer. Such a configuration accounts for the widening of the spatial range over which pixels depend on each other as the convolution layers apply their filters in sequence, and keeps errors caused by anything other than pattern differences from being overestimated. Side-out learning can therefore be performed more effectively.

As another example, as shown in FIG. 9E, side-out learning can also be performed based on a plurality of outputs from one intermediate layer of the network. FIG. 9E shows, as an example, a case where an intermediate layer group 950 consists of convolution layers 951 to 953 and a pooling layer 954. In the example of FIG. 9E, the setting unit 102 can set teacher data for the learning input data for each of the outputs from two or more different channel groups in one layer of the network.

For example, the setting unit 102 can separate the image in the basic learning data according to a predetermined condition and generate a plurality of pieces of adaptive learning data, each showing one of the partial images. As a specific example, the contour pattern in the GT may be separated by direction, and each resulting contour pattern may be used to learn the weight coefficients (convolution filters) of the corresponding part of the network. Here, the convolution layer 951, which provides the side-out, is divided into a convolution layer 961 and a convolution layer 962, which correspond to different channel groups in the convolution layer 951. The setting unit 102 can set GTs having different direction components for the convolution layers 961 and 962. In this case, the weight coefficients of the convolution layers 961 and 962 are each learned using a GT with a different direction component. For example, the convolution layer 961 can be trained using a GT 971 showing contour patterns in a first direction, and the convolution layer 962 can be trained using a GT 972 showing contour patterns in a second direction different from the first. Concentrating the learning of each convolution layer on a GT with a specific pattern in this way is expected to improve the overall recognition performance. Such a configuration can be combined with the various configurations described above, for example with the application of further image processing such as a Gaussian blur to the GT.

So far, the case where a GT is set for each intermediate layer quasi-output has been described on the assumption that the intermediate layer quasi-output is enlarged to match the GT. Alternatively, the setting unit 102 may set a GT that matches the size of each intermediate layer quasi-output. For example, the setting unit 102 may reduce the GT (basic learning data) showing the contour pattern to the size of the intermediate layer quasi-output. A specific example is to generate the adaptive learning data by filtering the basic learning data. For example, when the basic learning data is a binary image (in which the value "1" represents a contour), performing 2 × 2 max pooling with a stride of 2 × 2 yields adaptive learning data with half the resolution while preserving the contour pattern in the basic learning data. In this way, the adaptive learning data is generated not by simply thinning out or repeating pixels of the basic learning data but by applying image processing such as filtering to it. Such a method makes it possible to generate adaptive learning data suited to each intermediate layer quasi-output.
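The difference between max pooling and simple pixel thinning can be seen on a small binary GT. This is a minimal sketch; `max_pool_2x2` is a hypothetical helper, and the 8 × 8 image is an illustrative stand-in for the basic learning data.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Binary GT with a 1-pixel-wide vertical contour on an odd column.
gt = np.zeros((8, 8), dtype=np.uint8)
gt[:, 3] = 1

pooled = max_pool_2x2(gt)  # contour survives at half resolution (column 1)
decimated = gt[::2, ::2]   # keeps only even rows and columns: contour is lost

print(pooled.sum(), decimated.sum())
```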

The parameters of a hierarchical network can be created by training the network with the method described above. An information processing apparatus according to one embodiment includes a generation unit that generates the result of recognition processing for input data using a hierarchical network in which the parameters created in this way are set. Such a hierarchical network can be realized by a program, or by an arithmetic device that includes a memory for storing the parameters and an arithmetic unit such as a GPU. With the method according to the present embodiment, each of the two or more different outputs from the hierarchical network is evaluated using adaptive learning data suited to that output, rather than using the same basic learning data for all outputs as in the conventional approach. The network parameters obtained by learning therefore differ from those obtained conventionally and are better suited to recognition processing of input data.

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100: learning device, 102: setting unit, 104: learning unit

Claims

A learning device that performs learning of a neural network, comprising:
setting means for setting teacher data for each of two or more different outputs from the neural network corresponding to a single piece of learning input data; and
learning means for performing learning of the neural network based on an error between each of the two or more different outputs, obtained by inputting the learning input data into the neural network, and the teacher data corresponding to that output,
wherein the neural network after learning provides two or more different outputs corresponding to input data, and an integrated result of the two or more different outputs indicates a result of a recognition process on the input data.

The learning device according to claim 1, wherein the setting means sets the teacher data for each of the two or more different outputs based on a structure of the neural network.

The learning device according to claim 1, wherein the setting means sets the teacher data, for the learning input data, for each of outputs from two or more different layers of the neural network.

The learning device, wherein the setting means sets the teacher data, for the learning input data, for each of outputs from two or more different channel groups in one layer of the neural network.

The learning device according to any one of claims 1 to 4, wherein the input data is image data, and the result of the recognition process on the input data is attribute information of each pixel of the image data.

The learning device according to claim 5, wherein the setting means sets the teacher data for the two or more different outputs based on the resolutions of the two or more different outputs.

The learning device according to claim 5, wherein a result of the recognition processing on the input data is a line drawing pattern corresponding to the input data.

The learning device according to claim 7, wherein the setting means sets, as the teacher data for the two or more different outputs, line drawing patterns each having a width corresponding to the respective output.

The learning device according to claim 7, wherein the setting means sets a line drawing pattern that has undergone blurring processing as the teacher data for the two or more different outputs.

The learning device according to claim 7, wherein the setting means sets, as the teacher data for the two or more different outputs, teacher data in which an error-evaluation non-target area having a width corresponding to each of the two or more different outputs is set around a line drawing pattern.

The learning device according to claim 7, wherein the setting means generates the teacher data using basic teacher data that is a line drawing pattern corresponding to the learning input data.

The learning device according to claim 7, wherein the setting means generates the teacher data by performing a filtering process on basic teacher data that is a line drawing pattern corresponding to the learning input data.

A method of creating a learned neural network, comprising:
a setting step of setting teacher data for each of two or more different outputs from the neural network corresponding to a single piece of learning input data; and
a learning step of performing learning of the neural network based on an error between each of the two or more different outputs, obtained by inputting the learning input data into the neural network, and the teacher data corresponding to that output,
wherein the learning step generates parameters of the learned neural network, the learned neural network provides two or more different outputs corresponding to input data, and an integrated result of the two or more different outputs indicates a result of a recognition process on the input data.

A neural network in which parameters created by the creation method according to claim 13 are set.

An information processing apparatus comprising: a processing unit configured to generate a processing result of a recognition process corresponding to input data using the neural network according to claim 14.

A program for causing a computer to function as each unit of the learning device according to any one of claims 1 to 12.