JP2023124376A

JP2023124376A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2023124376A
Application number: JP2022028107A
Authority: JP
Inventors: 聡志川村; Satoshi Kawamura
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2022-02-25
Filing date: 2022-02-25
Publication date: 2023-09-06

Abstract

PROBLEM TO BE SOLVED: To establish a high-accuracy neural network. SOLUTION: A method includes the steps of: outputting a first inference result of input data on the basis of the input data and a first neural network; outputting a first evaluation result based on the first inference result; updating a first weight parameter of the first neural network on the basis of the first evaluation result; outputting a second inference result of the input data on the basis of the input data, the first neural network set with the updated first weight parameter, a second neural network, and a weight coefficient of the second neural network; outputting a second evaluation result based on the second inference result; updating the weight coefficient on the basis of the second evaluation result; and updating a second weight parameter of the second neural network on the basis of the updated first weight parameter, the second weight parameter, and the updated weight coefficient. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device, an information processing method, and a program.

A neural network (hereinafter also referred to as "NN") achieves high performance on problems such as image recognition. In fields that use NNs, an NN is trained using learning data, and its performance is verified using evaluation data. However, the parameters of the NN change drastically during training because they depend on the learning data, and such drastic changes may lead to overfitting. Overfitting is excessive adaptation to the learning data and refers to a phenomenon in which accuracy on the evaluation data deteriorates.

In recent years, to address this problem, a robustness-improving technique that applies an exponential moving average (EMA) to the NN parameters when saving the NN has often been adopted. In EMA, a verification NN is prepared separately from the learning NN, and a weighted sum of the parameters of the learning NN and the parameters of the verification NN is taken. This technique improves generalization performance, so highly accurate identification can be expected on the evaluation data as well. The EMA processing is expressed by Equation (1).
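Equation (1) itself is not reproduced in this text. As a hedged reconstruction, the conventional EMA update that the paragraph above describes can be written as follows. Here θ_t denotes the parameters of the learning NN after the t-th update, θ'_t denotes the parameters of the verification NN, and α (between 0 and 1) is the EMA coefficient. These symbols, and the choice of which term α multiplies, are assumptions and are not taken from the original equation.

```latex
\theta'_{t} = \alpha \, \theta'_{t-1} + (1 - \alpha) \, \theta_{t}, \qquad 0 \le \alpha \le 1
```

Under this convention, a value of α close to 1 makes the verification NN change slowly, while a value close to 0 makes it track the learning NN almost directly.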

The method described in Non-Patent Document 1 is a semi-supervised learning method using EMA, particularly for image recognition, and shows that the NN saved by EMA achieves higher accuracy on verification data than the trained NN itself. Here, semi-supervised learning is a method of training in a state in which teacher labels exist for only part of the enormous amount of input data required for NN training, which reduces the burden of manually assigning teacher labels.

The method described in Non-Patent Document 2 is a training method that uses EMA for training a generative adversarial network (GAN), and shows that training progresses more stably with the NN saved by EMA than with the trained NN itself. Here, a GAN is a technique for learning the characteristics of the data input to the NN; it can generate data that does not actually exist but is of high quality, and can convert data in line with the characteristics of the input data.

Non-Patent Document 1: Zhaowei Cai and 5 others, "Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning", [online], [searched on February 9, 2022], Internet <https://arxiv.org/abs/2101.08482>
Non-Patent Document 2: Yasin Yazici and 5 others, "THE UNUSUAL EFFECTIVENESS OF AVERAGING IN GAN TRAINING", [online], [searched on February 9, 2022], Internet <https://arxiv.org/abs/1806.04498>

In the methods described in Non-Patent Documents 1 and 2, the coefficient α in Equation (1) is always constant and is a parameter set manually at the start of training. However, the appropriate coefficient α may differ depending on the NN to be trained or the data set used. The user therefore has to train the NN multiple times while changing the coefficient α, or analyze the NN after training, which increases the burden on the user.

The present invention has been proposed to solve these problems, and it is desirable to provide a technique that makes it possible to construct a highly accurate neural network while reducing the burden on the user.

In order to solve the above problems, according to one aspect of the present invention, there is provided an information processing device comprising: a first calculation unit that outputs a first inference result of first input data based on the first input data and a first neural network; a first evaluation unit that outputs a first evaluation result based on the first inference result; a first update unit that updates a first weight parameter of the first neural network based on the first evaluation result; a second calculation unit that outputs a second inference result of second input data based on the second input data, the first neural network in which the updated first weight parameter is set, a second neural network, and a weighting factor of the second neural network; a second evaluation unit that outputs a second evaluation result based on the second inference result; a weighting factor update unit that updates the weighting factor based on the second evaluation result; and a second update unit that updates a second weight parameter of the second neural network based on the updated first weight parameter, the second weight parameter, and the updated weighting factor.

The second calculation unit may input the second input data to the first layer of each of the first neural network, in which the updated first weight parameter is set, and the second neural network.

The second calculation unit may input, to the second and subsequent layers of each of the first neural network, in which the updated first weight parameter is set, and the second neural network, a weighted sum, using the weighting factor, of the outputs from the immediately preceding layers.
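The layer-wise mixing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: layers are modeled as plain Python callables on lists of floats, the mix is assumed to be `alpha * y_ema + (1 - alpha) * y_train`, both networks are assumed to receive the same mixed input, and the final outputs are assumed to be mixed in the same way. The names `blend` and `dual_forward` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def blend(y_ema: Vector, y_train: Vector, alpha: float) -> Vector:
    """Weighted sum of two layer outputs (assumed form of the mix)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(y_ema, y_train)]


def dual_forward(
    x: Vector,
    train_layers: Sequence[Layer],
    ema_layers: Sequence[Layer],
    alpha: float,
) -> Vector:
    """Run both networks side by side, mixing outputs between layers."""
    if not train_layers or len(train_layers) != len(ema_layers):
        raise ValueError("both networks need the same, non-zero number of layers")
    # First layer: both networks receive the raw input data.
    y_train = train_layers[0](x)
    y_ema = ema_layers[0](x)
    # Second and later layers: both receive the weighted sum of the
    # outputs of the immediately preceding layers.
    for f, f_ema in zip(train_layers[1:], ema_layers[1:]):
        mixed = blend(y_ema, y_train, alpha)
        y_train = f(mixed)
        y_ema = f_ema(mixed)
    # Assumption: the final outputs are mixed in the same way.
    return blend(y_ema, y_train, alpha)
```

With `alpha = 0` this sketch reduces to a plain forward pass through the first network, which is a convenient sanity check.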

The second evaluation unit may calculate a loss based on the second evaluation result and the teacher label, and output the second evaluation result based on the loss.

The weighting factor update unit may update the weighting factor by error backpropagation based on the second evaluation result.

The second update unit may update the second weight parameter by a weighted sum, using the updated weighting factor, of the updated first weight parameter and the second weight parameter of the second neural network.

The weighting factor may be provided for each layer included in the first neural network and the second neural network.
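The weighted-sum update of the second weight parameter, with one weighting factor per layer, might look like the following sketch. The function name `ema_update`, the representation of each layer's parameters as a flat list of floats, and the convention that the factor multiplies the second (verification) network's parameters are all assumptions made for illustration.

```python
from typing import List, Sequence

LayerParams = List[float]


def ema_update(
    ema_layers: Sequence[LayerParams],
    train_layers: Sequence[LayerParams],
    alphas: Sequence[float],
) -> List[LayerParams]:
    """Return new second-network parameters, layer by layer.

    For layer i: new = alphas[i] * ema + (1 - alphas[i]) * train.
    """
    if not (len(ema_layers) == len(train_layers) == len(alphas)):
        raise ValueError("one weighting factor is required per layer")
    return [
        [a * p_ema + (1.0 - a) * p_train for p_ema, p_train in zip(ema, train)]
        for ema, train, a in zip(ema_layers, train_layers, alphas)
    ]
```

For example, `ema_update([[0.0], [2.0]], [[1.0], [4.0]], [0.75, 0.5])` keeps the first layer close to its previous value and moves the second layer halfway toward the first network.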

The first input data and the second input data may be the same data.

The first input data and the second input data may be different data.

According to another aspect of the present invention, there is provided an information processing method comprising: outputting a first inference result of first input data based on the first input data and a first neural network; outputting a first evaluation result based on the first inference result; updating a first weight parameter of the first neural network based on the first evaluation result; outputting a second inference result of second input data based on the second input data, the first neural network in which the updated first weight parameter is set, a second neural network, and a weighting factor of the second neural network; outputting a second evaluation result based on the second inference result; updating the weighting factor based on the second evaluation result; and updating a second weight parameter of the second neural network based on the updated first weight parameter, the second weight parameter, and the updated weighting factor.

According to another aspect of the present invention, there is provided a program that causes a computer to function as an information processing device comprising: a first calculation unit that outputs a first inference result of first input data based on the first input data and a first neural network; a first evaluation unit that outputs a first evaluation result based on the first inference result; a first update unit that updates a first weight parameter of the first neural network based on the first evaluation result; a second calculation unit that outputs a second inference result of second input data based on the second input data, the first neural network in which the updated first weight parameter is set, a second neural network, and a weighting factor of the second neural network; a second evaluation unit that outputs a second evaluation result based on the second inference result; a weighting factor update unit that updates the weighting factor based on the second evaluation result; and a second update unit that updates a second weight parameter of the second neural network based on the updated first weight parameter, the second weight parameter, and the updated weighting factor.

As described above, according to the present invention, a technique is provided that makes it possible to construct a highly accurate neural network while reducing the burden on the user.

FIG. 1 is a diagram showing a functional configuration example of a learning device according to a first embodiment of the present invention.
FIG. 2 is a diagram for explaining the processing of each of the calculation unit and the double calculation unit according to the embodiment.
FIG. 3 is a flowchart showing an operation example of the learning stage executed by the learning device according to the embodiment.
FIG. 4 is a diagram for explaining the processing of the double calculation unit according to the embodiment.
FIG. 5 is a diagram showing a hardware configuration of an information processing device as an example of the learning device according to the first embodiment of the present invention.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, constituent elements having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

In this specification and the drawings, a plurality of constituent elements having substantially the same functional configuration may be distinguished by appending different numerals to the same reference numeral. However, when there is no particular need to distinguish between such constituent elements, only the same reference numeral is used. Similar constituent elements of different embodiments may be distinguished by appending different letters to the same reference numeral. However, when there is no particular need to distinguish between similar constituent elements of different embodiments, only the same reference numeral is used.

(0. Outline of embodiment)
An outline of an embodiment of the present invention will be described. The embodiment describes an information processing device that trains a neural network (hereinafter also referred to as a "learning device"). In the learning device, the neural network is trained based on learning data (learning stage). After that, an identification device outputs an identification result based on the trained neural network and identification data (test data) (inference stage).

The embodiments of the present invention mainly assume that the learning device and the identification device are realized by the same computer. However, the learning device and the identification device may be realized by separate computers. In that case, the trained neural network generated by the learning device is provided to the identification device. For example, the trained neural network may be provided from the learning device to the identification device via a recording medium or via communication.

The "learning stage" executed in the learning device is described below. In the following, a neural network is also written as "NN".

(1. First embodiment)
First, a first embodiment of the present invention will be described.

(Configuration of learning device)
A configuration example of the learning device according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a diagram showing a functional configuration example of a learning device 10 according to the first embodiment of the present invention. As shown in FIG. 1, the learning device 10 includes a learning data set 101, an input unit 102, a learning NN 110, an evaluation unit 121, an update unit 122, a verification NN 130, an EMA evaluation unit 141, an EMA update unit 142, and a storage unit 143.

The learning NN 110 corresponds to an example of the first neural network and includes weight parameters 111. The weight parameters 111 may correspond to an example of the first weight parameter. The learning device 10 also includes a calculation unit 112 that performs calculation using the learning NN 110 in which the weight parameters 111 are set. The verification NN 130 corresponds to an example of the second neural network and includes EMA weight parameters 131 and an EMA coefficient 132. The EMA weight parameters 131 may correspond to an example of the second weight parameter. The learning device 10 further includes a double calculation unit 133 that performs calculation using the EMA weight parameters 131 and the EMA coefficient 132.

The input unit 102, the calculation unit 112, the evaluation unit 121, the update unit 122, the double calculation unit 133, the EMA evaluation unit 141, the EMA update unit 142, the storage unit 143, and the like include an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). Their functions can be realized by the arithmetic device loading a program stored in a ROM (Read Only Memory) into a RAM and executing it.

In this case, a computer-readable recording medium on which the program is recorded may also be provided. Alternatively, these blocks may be configured by dedicated hardware or by a combination of multiple pieces of hardware. Data required for calculation by the arithmetic device is stored as appropriate in a storage unit (not shown).

The learning data set 101, the weight parameters 111, the EMA weight parameters 131, and the EMA coefficient 132 are stored in a storage unit (not shown). The storage unit may be configured by memory such as a RAM (Random Access Memory), a hard disk drive, or a flash memory.

In the initial state, initial values are set for each of the weight parameters 111, the EMA weight parameters 131, and the EMA coefficient 132. These initial values may be random values, but any values may be used. For example, they may be trained values obtained in advance by training.

(Learning data set 101)
The learning data set 101 includes a plurality of pieces of learning data (hereinafter also referred to as "input data"). A teacher label may be associated with each piece of input data. The embodiments of the present invention mainly assume that the input data is image data (in particular, still image data). However, the type of input data is not particularly limited, and data other than still image data may be used. For example, the input data may be moving image data including a plurality of frames, time-series data, or audio data.

(Input unit 102)
The input unit 102 sequentially acquires input data from the learning data set 101, creates a mini-batch from the acquired input data, and outputs the created mini-batch to each of the calculation unit 112 of the learning NN 110 and the double calculation unit 133 of the verification NN 130. The size of the mini-batch is not particularly limited. Here, it is mainly assumed that the same input data is input to the learning NN 110 and the verification NN 130. However, the input data input to the learning NN 110 and to the verification NN 130 may be different data.
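The behavior of the input unit 102 can be illustrated with a short sketch. It assumes the data set is an in-memory sequence of (input data, teacher label) pairs and that the same mini-batch is handed to both networks; the names `make_minibatches` and `feed`, and the two callback parameters, are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# (input data, teacher label); this layout is an assumption for illustration.
Sample = Tuple[List[float], int]
Batch = List[Sample]


def make_minibatches(dataset: Sequence[Sample], batch_size: int) -> Iterator[Batch]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(dataset), batch_size):
        yield list(dataset[start:start + batch_size])


def feed(
    dataset: Sequence[Sample],
    batch_size: int,
    learning_step: Callable[[Batch], None],
    verification_step: Callable[[Batch], None],
) -> None:
    """Pass each mini-batch to the learning side, then the verification side."""
    for batch in make_minibatches(dataset, batch_size):
        learning_step(batch)       # stands in for calculation unit 112 onward
        verification_step(batch)   # stands in for double calculation unit 133 onward
```

Supplying different data to the two networks, which the text also allows, would only require a second iterator for `verification_step`.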

Each of the calculation unit 112 and the double calculation unit 133 is configured by connecting computation graphs built from neurons in processing order, and can be regarded as one neural network as a whole. More specifically, each of the calculation unit 112 and the double calculation unit 133 may mainly include convolutional layers, pooling layers, and activation functions. In the following, it is mainly assumed that two-dimensional convolutional layers are used, but three-dimensional convolutional layers may be used instead.

(Calculation unit 112)
The calculation unit 112 obtains an inference result based on the input data included in the mini-batch output from the input unit 102 and on the learning NN 110. The calculation unit 112 may correspond to an example of the first calculation unit, and the inference result it obtains may correspond to an example of the first inference result. More specifically, the calculation unit 112 inputs the input data included in the mini-batch to the learning NN 110 in which the weight parameters 111 are set, and obtains the data output from the learning NN 110 as the inference result.

The format of the inference result output from the calculation unit 112 is not particularly limited. However, it is preferably set to match the format of the teacher label. For example, if the teacher label indicates the class in a classification problem and is a one-hot vector whose length equals the number of classes, the inference result output from the calculation unit 112 may also be a vector whose length equals the number of classes. In this case, the inference result output from the calculation unit 112 may include a value for each class (hereinafter also referred to as an "inference value").

As an example, when the calculation unit 112 adjusts the inference values so that their sum over all classes is 1, the inference value for each class can correspond to the probability of that class. However, the sum of the inference values over all classes need not be adjusted to 1 by the calculation unit 112. In either case, the inference value output from the calculation unit 112 can be larger the more likely the class is.
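One common way to adjust per-class inference values so that they sum to 1 is the softmax function. The text does not name a specific normalization, so the use of softmax here is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map raw per-class scores to values that are positive and sum to 1."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Subtracting the maximum keeps exp() from overflowing.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The ordering of the classes is preserved, so the class with the largest raw score also has the largest normalized inference value.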

The processing of the calculation unit 112 will now be described with reference to FIG. 2.

FIG. 2 is a diagram for explaining the processing of each of the calculation unit 112 and the double calculation unit 133 according to the first embodiment of the present invention. FIG. 2 shows a configuration example of the calculation unit 112. In the example shown in FIG. 2, the i-th layer (where i = 1, 2, ..., n) among the layers of the learning NN 110 used by the calculation unit 112 is assumed to be a convolutional layer, and that convolutional layer is denoted "f_i". The weight parameter corresponding to the i-th layer among the weight parameters 111 is set in the convolutional layer f_i.

The input to the convolutional layer f_i is denoted x_i, and the output from the convolutional layer f_i is denoted y_i. However, the i-th processing layer of the learning NN 110 need not be a convolutional layer. The input x_1 to the first layer is the input data output from the input unit 102. The input x_i (i ≥ 2) to the second and subsequent layers is the output y_{i-1} from the immediately preceding layer. The calculation unit 112 outputs the output y_n of the final layer of the learning NN 110 to the evaluation unit 121 as the inference result.
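The chain x_1 → f_1 → y_1 = x_2 → ... → f_n → y_n described above can be sketched in a few lines. Layers are modeled here as plain callables on lists of floats rather than real convolutional layers; that simplification and the function name `forward` are assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward(x_1: Vector, layers: Sequence[Layer]) -> List[Vector]:
    """Apply f_1..f_n in order and return every output [y_1, ..., y_n]."""
    outputs: List[Vector] = []
    x_i = x_1
    for f_i in layers:
        y_i = f_i(x_i)
        outputs.append(y_i)
        x_i = y_i  # the next layer's input is this layer's output
    return outputs
```

The last element of the returned list plays the role of y_n, the inference result passed to the evaluation unit 121.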

Returning to FIG. 1, the description continues.

(Evaluation unit 121)
The evaluation unit 121 obtains an evaluation result based on the inference result for each mini-batch. The evaluation unit 121 may correspond to an example of the first evaluation unit. More specifically, for each mini-batch, the evaluation unit 121 calculates a loss based on the inference result and a target, and obtains the evaluation result based on the loss. The evaluation unit 121 outputs the evaluation result to the update unit 122.

Here, the target is not limited to a specific target, and a target similar to those used in general neural networks may be used. For example, when supervised learning is performed, the target may be the teacher label corresponding to the input data.

The loss function used to calculate the loss is not limited to a specific function, and a loss function similar to those used in general neural networks may be used. For example, the loss function may be the mean squared error based on the difference between the teacher label and the inference result, or the cross-entropy error based on the difference between the teacher label and the inference result.
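The two loss functions mentioned above can be written out for a single sample as follows. This is a minimal sketch that assumes the teacher label is a one-hot vector and, for the cross-entropy case, that the inference result has already been normalized to probabilities; the function names and the `eps` clamp are illustrative choices.

```python
import math
from typing import Sequence


def mean_squared_error(label: Sequence[float], inference: Sequence[float]) -> float:
    """Average of squared differences between label and inference result."""
    if len(label) != len(inference) or not label:
        raise ValueError("label and inference must have the same non-zero length")
    return sum((t - y) ** 2 for t, y in zip(label, inference)) / len(label)


def cross_entropy(label: Sequence[float], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a one-hot label and predicted probabilities."""
    if len(label) != len(probs) or not label:
        raise ValueError("label and probs must have the same non-zero length")
    # Clamp probabilities so that log(0) is never evaluated.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(label, probs))
```

A per-mini-batch loss would typically be the mean of these per-sample values.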

(Update unit 122)
The update unit 122 updates the weight parameters 111 based on the evaluation result output from the evaluation unit 121. In this way, the weight parameters 111 can be trained so that the inference result approaches the teacher label. The update unit 122 may correspond to an example of the first update unit. More specifically, the update unit 122 may update the weight parameters 111 by error backpropagation based on the evaluation result.

Each time the update of the weight parameters 111 is completed, the update unit 122 determines whether the end condition for training the learning NN 110 is satisfied. If it is determined that the end condition is not satisfied, the input unit 102 acquires the next input data, and the calculation unit 112, the evaluation unit 121, and the update unit 122 each execute their processing again based on that input data. If it is determined that the end condition is satisfied, training ends.

The end condition for training the learning NN 110 is not particularly limited, and may be any condition indicating that the learning NN 110 has been trained to some extent.

Specifically, the end condition for training the learning NN 110 may include a condition that the loss is smaller than a threshold. Alternatively, it may include a condition that the change in the loss is smaller than a threshold (that is, that the loss has converged). Alternatively, it may include a condition that the weight parameters 111 have been updated a predetermined number of times. Alternatively, when the accuracy of the learning NN 110 (for example, the correct answer rate) is calculated, the end condition may include a condition that the accuracy exceeds a predetermined ratio (for example, 90%).
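The end conditions listed above can be combined into a single check, as in the sketch below. The text only says that the end condition "may include" each item, so combining them with a logical OR is an assumption, as are the function name `should_stop` and every default threshold value.

```python
from typing import Optional, Sequence


def should_stop(
    loss_history: Sequence[float],
    num_updates: int,
    accuracy: Optional[float] = None,
    *,
    loss_threshold: float = 1e-3,
    delta_threshold: float = 1e-5,
    max_updates: int = 10_000,
    accuracy_threshold: float = 0.9,
) -> bool:
    """Return True when any of the illustrative end conditions holds."""
    # Condition 1: the latest loss is smaller than a threshold.
    if loss_history and loss_history[-1] < loss_threshold:
        return True
    # Condition 2: the change in the loss is smaller than a threshold.
    if len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < delta_threshold:
        return True
    # Condition 3: the parameters were updated a predetermined number of times.
    if num_updates >= max_updates:
        return True
    # Condition 4: the accuracy exceeds a predetermined ratio.
    if accuracy is not None and accuracy > accuracy_threshold:
        return True
    return False
```

A training loop would call this after each weight update and break out of the loop when it returns True.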

（二重演算部１３３）
二重演算部１３３は、入力部１０２から出力されたミニバッチに含まれる入力データと、更新部１２２による更新後の重みパラメータ１１１が設定された学習用ＮＮ１１０と、検証用ＮＮ１３０と、ＥＭＡ係数１３２とに基づいて、推論結果（以下、「ＥＭＡ推論結果」とも言う。）を得る。ＥＭＡ推論結果は、第２の推論結果の例に該当し得る。二重演算部１３３は、第２の演算部の例に該当し得る。ＥＭＡ係数１３２は、スカラーの係数であり、検証用ＮＮ１３０の重み係数に該当する。以下では、ＥＭＡ係数１３２は、時刻ｔごとに変化するため、時刻ｔにおけるＥＭＡ係数１３２を重み係数α_ｔとも表現する。 (Double operation unit 133)
The double operation unit 133 obtains an inference result (hereinafter also referred to as an "EMA inference result") based on the input data included in the mini-batch output from the input unit 102, the learning NN 110 in which the weight parameter 111 updated by the update unit 122 is set, the verification NN 130, and the EMA coefficient 132. The EMA inference result may correspond to an example of a second inference result. The double operation unit 133 may correspond to an example of a second calculation unit. The EMA coefficient 132 is a scalar coefficient and corresponds to the weighting coefficient of the verification NN 130. Because the EMA coefficient 132 changes at each time t, the EMA coefficient 132 at time t is also written below as the weighting coefficient α_t.

なお、時刻ｔは、重みパラメータ１１１、ＥＭＡ重みパラメータ１３１およびＥＭＡ係数１３２それぞれに対応する更新回数を示す。ここでは一例として、時刻ｔの初期値が０であり、重みパラメータ１１１、ＥＭＡ重みパラメータ１３１またはＥＭＡ係数１３２それぞれの更新のたびに、その重みパラメータまたは係数に対応する時刻ｔが１増加する場合を想定する。 Note that time t indicates the number of updates for each of the weight parameter 111, the EMA weight parameter 131, and the EMA coefficient 132. Here, as an example, it is assumed that the initial value of time t is 0 and that, each time the weight parameter 111, the EMA weight parameter 131, or the EMA coefficient 132 is updated, the time t corresponding to that weight parameter or coefficient increases by 1.

ここで、図２を再度参照しながら、二重演算部１３３の処理について説明する。 Now, referring to FIG. 2 again, the processing of the double operation unit 133 will be described.

図２を参照すると、二重演算部１３３の構成例が示されている。図２に示された例では、二重演算部１３３によって用いられる検証用ＮＮ１３０の複数の層のうち、第ｉ層（ただし、ｉ＝１，２，・・・，ｎ）が畳み込み層であるとし、その畳み込み層が「ｆ’_ｉ」と表現されている。畳み込み層ｆ’_ｉには、ＥＭＡ重みパラメータ１３１のうち第ｉ層に対応する重みパラメータが設定される。畳み込み層ｆ’_ｉと畳み込み層ｆ_ｉとは、設定される重みパラメータは異なるが、実行する処理は同じであってよい。 Referring to FIG. 2, a configuration example of the double operation unit 133 is shown. In the example shown in FIG. 2, among the multiple layers of the verification NN 130 used by the double operation unit 133, the i-th layer (where i = 1, 2, ..., n) is assumed to be a convolutional layer, and that convolutional layer is written as "f'_i". The weight parameter corresponding to the i-th layer among the EMA weight parameters 131 is set in the convolutional layer f'_i. The convolutional layer f'_i and the convolutional layer f_i have different weight parameters set in them, but may perform the same processing.

また、畳み込み層ｆ’_ｉへの入力がＸ_ｉと表現され、その畳み込み層ｆ’_ｉからの出力がＺ’_ｉと表現されている。ただし、検証用ＮＮ１３０の第ｉ層は、畳み込み層でなくてもよい。なお、ここでは、検証用ＮＮ１３０の第１層への入力Ｘ_１が、学習用ＮＮ１１０の第１層への入力ｘ_１と同じである場合を想定する。しかし、検証用ＮＮ１３０の第１層への入力Ｘ_１は、学習用ＮＮ１１０の第１層への入力ｘ_１と異なってもよい。すなわち、学習用ＮＮ１１０への入力データと、検証用ＮＮ１３０への入力データとは同じでなくてもよい。 Also, the input to the convolutional layer f'_i is written as X_i, and the output from the convolutional layer f'_i is written as Z'_i. However, the i-th layer of the verification NN 130 does not have to be a convolutional layer. Here, it is assumed that the input X_1 to the first layer of the verification NN 130 is the same as the input x_1 to the first layer of the learning NN 110. However, the input X_1 to the first layer of the verification NN 130 may differ from the input x_1 to the first layer of the learning NN 110. That is, the input data to the learning NN 110 and the input data to the verification NN 130 do not have to be the same.

また、二重演算部１３３によって、学習用ＮＮ１１０に含まれる畳み込み層ｆ_ｉも用いられる。図２を参照すると、二重演算部１３３によって用いられる畳み込み層ｆ_ｉへの入力がＸ_ｉと表現され、その畳み込み層ｆ_ｉからの出力がＺ_ｉと表現されている。二重演算部１３３は、畳み込み層ｆ’_ｉからの出力Ｚ’_ｉ＝ｆ’_ｉ（Ｘ_ｉ）と、畳み込み層ｆ_ｉからの出力Ｚ_ｉ＝ｆ_ｉ（Ｘ_ｉ）と、ＥＭＡ係数１３２であるα_ｔとに基づいて、出力Ｙ_ｉを算出する。 In addition, the double operation unit 133 also uses the convolutional layer f_i included in the learning NN 110. Referring to FIG. 2, the input to the convolutional layer f_i used by the double operation unit 133 is written as X_i, and the output from that convolutional layer f_i is written as Z_i. The double operation unit 133 calculates the output Y_i based on the output Z'_i = f'_i(X_i) from the convolutional layer f'_i, the output Z_i = f_i(X_i) from the convolutional layer f_i, and α_t, which is the EMA coefficient 132.

より詳細に、二重演算部１３３は、出力Ｚ_ｉ＝ｆ_ｉ（Ｘ_ｉ）と出力Ｚ’_ｉ＝ｆ’_ｉ（Ｘ_ｉ）とに対する、ＥＭＡ係数１３２であるα_ｔによる重み付き和によって、出力Ｙ_ｉを算出する。すなわち、二重演算部１３３は、Ｚ_ｉ＝ｆ_ｉ（Ｘ_ｉ）に（１－α_ｔ）を乗算し、出力Ｚ’_ｉ＝ｆ’_ｉ（Ｘ_ｉ）にα_ｔを乗算し、乗算結果同士を足し合わせることによって、出力Ｙ_ｉを算出する。かかる出力Ｙ_ｉの算出式は、以下の式（２）によって表現される。 More specifically, the double operation unit 133 calculates the output Y_i as the weighted sum of the output Z_i = f_i(X_i) and the output Z'_i = f'_i(X_i), weighted by α_t, which is the EMA coefficient 132. That is, the double operation unit 133 multiplies Z_i = f_i(X_i) by (1−α_t), multiplies the output Z'_i = f'_i(X_i) by α_t, and adds the multiplication results together to calculate the output Y_i. The formula for calculating the output Y_i is expressed by the following formula (2).
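
The formula itself does not appear in this text. Written out from the wording of the preceding paragraph (multiply f_i(X_i) by 1−α_t, multiply f'_i(X_i) by α_t, and add), formula (2) would presumably read:

```latex
Y_i = (1 - \alpha_t)\, f_i(X_i) + \alpha_t\, f'_i(X_i) \tag{2}
```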

式（２）は、ｆ_ｉとｆ’_ｉとにおいて、対応する重みパラメータごとに計算される。第１層への入力Ｘ_１は、ミニバッチに含まれる入力データであり、第２層以降への入力Ｘ_ｉ（ｉ≧２）は、直前の層からの出力Ｙ_ｉ－１である。二重演算部１３３は、最終層に対応する出力Ｙ_ｎをＥＭＡ推論結果としてＥＭＡ評価部１４１に出力する。なお、二重演算部１３３から出力されるＥＭＡ推論結果の形式は、演算部１１２から出力される推論結果の形式と同様であってよい。 Formula (2) is calculated for each pair of corresponding weight parameters in f_i and f'_i. The input X_1 to the first layer is the input data included in the mini-batch, and the input X_i (i ≥ 2) to the second and subsequent layers is the output Y_{i−1} from the immediately preceding layer. The double operation unit 133 outputs the output Y_n corresponding to the final layer to the EMA evaluation unit 141 as the EMA inference result. Note that the format of the EMA inference result output from the double operation unit 133 may be the same as the format of the inference result output from the calculation unit 112.
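
The layer-by-layer processing of the double operation unit 133 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: layers are modeled as plain Python callables on lists of floats, and `scale_layer` is a hypothetical stand-in for a convolutional layer.

```python
def blended_forward(x, learn_layers, ema_layers, alpha):
    """Propagate x through paired layers, mixing each pair of outputs with alpha."""
    if len(learn_layers) != len(ema_layers):
        raise ValueError("both networks need the same number of layers")
    for f, f_ema in zip(learn_layers, ema_layers):
        z = f(x)          # Z_i  = f_i(X_i), layer of the learning NN
        z_ema = f_ema(x)  # Z'_i = f'_i(X_i), layer of the verification NN
        # Y_i = (1 - alpha) * Z_i + alpha * Z'_i; Y_i becomes the next input X_{i+1}.
        x = [(1.0 - alpha) * a + alpha * b for a, b in zip(z, z_ema)]
    return x  # Y_n, used as the EMA inference result


def scale_layer(w):
    """Hypothetical stand-in for a layer: multiply every element by w."""
    return lambda xs: [w * v for v in xs]


learn = [scale_layer(2.0), scale_layer(3.0)]
ema = [scale_layer(1.0), scale_layer(1.0)]
y = blended_forward([1.0, 2.0], learn, ema, alpha=0.5)
```

With α = 0 the result equals the output of the learning NN alone, and with α = 1 it equals the output of the verification NN alone, which is what makes α meaningful as a trainable mixing weight.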

（ＥＭＡ評価部１４１）
ＥＭＡ評価部１４１は、ミニバッチごとのＥＭＡ推論結果に基づいて評価結果（以下、「ＥＭＡ評価結果」とも言う。）を得る。ＥＭＡ評価部１４１は、第２の評価部の例に該当し得る。ＥＭＡ評価結果は、第２の評価結果の例に該当し得る。より詳細に、ＥＭＡ評価部１４１は、ミニバッチごとに、ＥＭＡ推論結果と目標とに基づいて損失を算出し、ミニバッチごとに、損失に基づいてＥＭＡ評価結果を得る。ＥＭＡ評価部１４１は、ＥＭＡ評価結果をＥＭＡ更新部１４２に出力する。 (EMA evaluation unit 141)
The EMA evaluation unit 141 obtains evaluation results (hereinafter also referred to as “EMA evaluation results”) based on the EMA inference results for each mini-batch. The EMA evaluator 141 can correspond to an example of a second evaluator. An EMA evaluation result may correspond to an example of a second evaluation result. More specifically, the EMA evaluation unit 141 calculates the loss based on the EMA inference result and the target for each mini-batch, and obtains the EMA evaluation result based on the loss for each mini-batch. The EMA evaluation unit 141 outputs the EMA evaluation result to the EMA updating unit 142 .

損失の算出に用いられる損失関数は特定の関数に限定されず、一般的なニューラルネットワークにおいて用いられる損失関数と同様の損失関数が用いられてよい。例えば、損失関数は、教師ラベルとＥＭＡ推論結果との差分に基づく平均二乗誤差であってもよいし、教師ラベルとＥＭＡ推論結果との差分に基づく交差エントロピー誤差であってもよい。 The loss function used to calculate the loss is not limited to a specific function, and loss functions similar to loss functions used in general neural networks may be used. For example, the loss function may be the mean squared error based on the difference between the teacher label and the EMA inference result, or the cross entropy error based on the difference between the teacher label and the EMA inference result.

あるいは、損失関数は、以下の式（３）に示されるように、ＬａｂｅｌＳｍｏｏｔｈｉｎｇが適用された教師ラベルとＥＭＡ推論結果との差分に基づく交差エントロピー誤差であってもよい。 Alternatively, the loss function may be the cross-entropy error based on the difference between the teacher label to which Label Smoothing is applied and the EMA inference result, as shown in Equation (3) below.

ここで、ｌ_ｅｍａは、ＥＭＡ推論結果に基づく損失関数であり、Ｃは分類問題におけるクラス数であり、δはノイズの量であり、ｑは、正解クラスを１、それ以外を０とするｏｎｅ－ｈｏｔベクトルによって表現される教師ラベルである。ｑ_ｌｓは、ＬａｂｅｌＳｍｏｏｔｈｉｎｇが適用された教師ラベルである。 Here, l_ema is the loss function based on the EMA inference result, C is the number of classes in the classification problem, δ is the amount of noise, and q is the teacher label represented by a one-hot vector in which the correct class is 1 and all other classes are 0. q_ls is the teacher label to which Label Smoothing has been applied.
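
Formula (3) does not appear in this text, so its exact form cannot be confirmed here. The sketch below uses the commonly used Label Smoothing formulation, q_ls = (1 − δ)·q + δ/C, together with a cross-entropy loss; this choice, the function names, and the example probabilities are assumptions for illustration.

```python
import math


def smooth_labels(q, delta):
    """Apply the common label-smoothing rule q_ls = (1 - delta) * q + delta / C."""
    c = len(q)  # C: number of classes
    return [(1.0 - delta) * qi + delta / c for qi in q]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    # eps guards against log(0); it is an implementation detail of this sketch.
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


q = [0.0, 1.0, 0.0, 0.0]                 # one-hot teacher label, correct class = 1
q_ls = smooth_labels(q, delta=0.1)       # smoothed teacher label
ema_probs = [0.1, 0.7, 0.1, 0.1]         # hypothetical EMA inference result
l_ema = cross_entropy(q_ls, ema_probs)   # loss based on the EMA inference result
```

The smoothed label still sums to 1, so it remains a valid target distribution for the cross-entropy.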

（ＥＭＡ更新部１４２）
ＥＭＡ更新部１４２は、ＥＭＡ評価部１４１から出力されたＥＭＡ評価結果に基づいて、ＥＭＡ係数１３２を更新する。これによって、ＥＭＡ推論結果が教師ラベルに近づくように、ＥＭＡ重みパラメータ１３１が訓練され得る。ＥＭＡ更新部１４２は、重み係数更新部の例に該当し得る。より詳細に、ＥＭＡ更新部１４２は、ＥＭＡ評価結果に基づく誤差逆伝播法（バックプロパゲーション）によってＥＭＡ係数１３２である重み係数α_ｔを更新してよい。一例として、重み係数α_ｔの更新式は、以下の式（４）によって表現され得る。 (EMA update unit 142)
The EMA updating unit 142 updates the EMA coefficient 132 based on the EMA evaluation result output from the EMA evaluation unit 141. This allows the EMA weight parameter 131 to be trained such that the EMA inference result approaches the teacher label. The EMA updating unit 142 may correspond to an example of a weighting coefficient update unit. More specifically, the EMA updating unit 142 may update the weighting coefficient α_t, which is the EMA coefficient 132, by error backpropagation based on the EMA evaluation result. As an example, the update formula for the weighting coefficient α_t can be expressed by the following formula (4).

ここで、ｌ_ｅｍａは、ＥＭＡ推論結果に基づく損失関数であり、μは学習率である。ＥＭＡ係数１３２である重み係数α_ｔの更新条件が設定されており、更新条件が満たされた場合に重み係数α_ｔが更新されてもよい。例えば、更新条件は、重みパラメータ１１１が更新されたという条件であってもよいし、ユーザによってあらかじめ指定された更新タイミングが到来したという条件であってもよい。 Here, l_ema is the loss function based on the EMA inference result, and μ is the learning rate. An update condition may be set for the weighting coefficient α_t, which is the EMA coefficient 132, and the weighting coefficient α_t may be updated when the update condition is satisfied. For example, the update condition may be a condition that the weight parameter 111 has been updated, or a condition that an update timing specified in advance by the user has arrived.
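
Formula (4) does not appear in this text. Given that α_t is updated by backpropagation with learning rate μ, a plain gradient-descent step, α_{t+1} = α_t − μ·∂l_ema/∂α_t, is a plausible reading and is what the sketch below assumes. The single scalar output and the squared-error loss are simplifications chosen so that the gradient can be written by hand.

```python
def alpha_gradient_step(alpha, z, z_ema, target, lr):
    """One assumed gradient-descent step on alpha for a single blended output.

    z and z_ema are the outputs of the learning and verification layers;
    the squared-error loss is an illustrative choice, not from the source.
    """
    y = (1.0 - alpha) * z + alpha * z_ema    # blended output, as in formula (2)
    grad = 2.0 * (y - target) * (z_ema - z)  # d/d(alpha) of (y - target) ** 2
    return alpha - lr * grad                 # lr plays the role of mu


alpha = 0.5
new_alpha = alpha_gradient_step(alpha, z=2.0, z_ema=1.0, target=1.0, lr=0.1)
```

In this example the verification output already matches the target, so the step moves α upward, giving the verification NN more weight in the blend. The description does not say whether α is constrained to a range such as [0, 1], so no clamping is applied here.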

更新タイミングとしては、更新間隔（例えば、所定の回数に１回など）が指定されていてもよい。あるいは、重み係数α_ｔの更新間隔は、重みパラメータ１１１の更新開始から常に等間隔であってもよいが、等間隔でなくてもよい。例えば、重み係数α_ｔは、重みパラメータ１１１の更新開始直後の初期段階（例えば、重みパラメータ１１１が所定の更新回数だけ更新されるまでの段階など）において更新されなくてもよい。 As the update timing, an update interval (for example, once every predetermined number of updates) may be specified. The update interval of the weighting coefficient α_t may be constant from the start of updating of the weight parameter 111, but does not have to be constant. For example, the weighting coefficient α_t does not have to be updated in the initial stage immediately after updating of the weight parameter 111 starts (for example, the stage until the weight parameter 111 has been updated a predetermined number of times).
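
One way to express such an update timing is a small predicate that skips an initial stage and then fires at a fixed interval. The function name and both default values are hypothetical; the description leaves the concrete numbers open.

```python
def alpha_update_due(num_weight_updates, warmup_updates=100, interval=10):
    """Return True when the EMA coefficient should be updated.

    num_weight_updates: how many times the weight parameter has been updated so far.
    warmup_updates: length of the initial stage in which alpha is left unchanged.
    interval: after the initial stage, update alpha once every `interval` updates.
    """
    if num_weight_updates <= warmup_updates:
        return False  # still in the initial stage
    return (num_weight_updates - warmup_updates) % interval == 0
```

Setting `warmup_updates=0` and `interval=1` reproduces the simplest condition mentioned above, in which α_t is updated every time the weight parameter 111 is updated.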

（保存部１４３）
保存部１４３は、更新後の重みパラメータ１１１と、ＥＭＡ重みパラメータ１３１と、更新後のＥＭＡ係数１３２とに基づいて、ＥＭＡ重みパラメータ１３１を更新する。保存部１４３は、第２の更新部の例に該当し得る。 (storage unit 143)
The storage unit 143 updates the EMA weighting parameter 131 based on the updated weighting parameter 111 , the EMA weighting parameter 131 , and the updated EMA coefficient 132 . The storage unit 143 can correspond to an example of a second updating unit.

より詳細に、保存部１４３は、更新後の重みパラメータ１１１とＥＭＡ重みパラメータ１３１とに対する、更新後のＥＭＡ係数１３２による重み付き和によって、ＥＭＡ重みパラメータ１３１を更新する。すなわち、保存部１４３は、更新後の重みパラメータ１１１に（１－α_ｔ＋１）を乗算し、ＥＭＡ重みパラメータ１３１にα_ｔ＋１を乗算し、乗算結果同士を足し合わせることによって、更新後のＥＭＡ重みパラメータ１３１を算出する。かかる更新後のＥＭＡ重みパラメータ１３１の算出式は、以下の式（５）によって表現される。 More specifically, the storage unit 143 updates the EMA weight parameter 131 by the weighted sum of the updated weight parameter 111 and the EMA weight parameter 131 by the updated EMA coefficient 132 . That is, the storage unit 143 multiplies the updated weight parameter 111 by (1−α _t+1 ), multiplies the EMA weight parameter 131 by α _t+1 , and adds the multiplication results together to obtain the updated EMA weight parameter 131 is calculated. A formula for calculating the EMA weighting parameter 131 after such updating is expressed by the following formula (5).
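
Formula (5) does not appear in this text. Written out from the wording of the preceding paragraph, and using W_{t+1} for the updated weight parameter 111 and W'_t for the EMA weight parameter 131 (both symbols are introduced here for illustration only), it would presumably read:

```latex
W'_{t+1} = (1 - \alpha_{t+1})\, W_{t+1} + \alpha_{t+1}\, W'_t \tag{5}
```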

以上に説明したように、学習装置１０による学習によって、重みパラメータ１１１が更新されるとともに、ＥＭＡ重みパラメータ１３１が更新される。 As described above, learning by the learning device 10 updates the weighting parameter 111 and the EMA weighting parameter 131 .

以上、本発明の第１の実施形態に係る学習装置の構成例について説明した。 The configuration example of the learning device according to the first embodiment of the present invention has been described above.

（学習段階の動作）
続いて、図３を参照しながら、本発明の第１の実施形態に係る学習装置１０によって実行される「学習段階」の動作の流れについて説明する。図３は、本発明の第１の実施形態に係る学習装置１０によって実行される学習段階の動作例を示すフローチャートである。 (Operation in learning stage)
Next, with reference to FIG. 3, the flow of operations in the "learning stage" performed by the learning device 10 according to the first embodiment of the present invention will be described. FIG. 3 is a flow chart showing an operation example of the learning stage performed by the learning device 10 according to the first embodiment of the present invention.

まず、入力部１０２は、学習用データセット１０１からバッチサイズの入力データを取得することによってミニバッチを作成し、作成したミニバッチを学習用ＮＮ１１０の演算部１１２に出力する（Ｓ１０１）。 First, the input unit 102 creates a mini-batch by obtaining batch size input data from the learning data set 101, and outputs the created mini-batch to the calculation unit 112 of the learning NN 110 (S101).

続いて、演算部１１２は、入力部１０２によって作成されたミニバッチと重みパラメータ１１１が設定された学習用ＮＮ１１０とに基づいて、ミニバッチに対応する推論結果を得る（Ｓ１０２）。演算部１１２は、ミニバッチに対応する推論結果を評価部１２１に出力する。 Subsequently, the calculation unit 112 obtains an inference result corresponding to the mini-batch based on the mini-batch created by the input unit 102 and the learning NN 110 to which the weight parameter 111 is set (S102). The calculation unit 112 outputs the inference result corresponding to the mini-batch to the evaluation unit 121 .

評価部１２１は、演算部１１２から出力される推論結果を評価して評価結果を得る。評価部１２１は、評価結果を更新部１２２に出力する（Ｓ１０３）。更新部１２２は、評価部１２１から出力される評価結果に基づいて、学習用ＮＮ１１０の重みパラメータ１１１を更新する（Ｓ１０４）。検証用ＮＮ１３０を更新しない場合（Ｓ１０５において「ＮＯ」）、すなわち、ＥＭＡ係数１３２（重み係数α_ｔ）の更新条件が満たされない場合、Ｓ１１０に動作が移行される。 The evaluation unit 121 obtains an evaluation result by evaluating the inference result output from the calculation unit 112. The evaluation unit 121 outputs the evaluation result to the update unit 122 (S103). The update unit 122 updates the weight parameter 111 of the learning NN 110 based on the evaluation result output from the evaluation unit 121 (S104). If the verification NN 130 is not to be updated ("NO" in S105), that is, if the update condition for the EMA coefficient 132 (weighting coefficient α_t) is not satisfied, the operation proceeds to S110.

一方、二重演算部１３３は、検証用ＮＮ１３０を更新する場合（Ｓ１０５において「ＹＥＳ」）、すなわち、ＥＭＡ係数１３２（重み係数α_ｔ）の更新条件が満たされた場合、入力部１０２からミニバッチを入力する。そして、二重演算部１３３は、入力部１０２から入力したミニバッチと、更新後の重みパラメータ１１１が設定された学習用ＮＮ１１０と、ＥＭＡ重みパラメータ１３１が設定された検証用ＮＮ１３０と、ＥＭＡ係数１３２とに基づいて、式（２）を用いてＥＭＡ推論結果を得る。二重演算部１３３は、ＥＭＡ推論結果をＥＭＡ評価部１４１に出力する（Ｓ１０６）。 On the other hand, if the verification NN 130 is to be updated ("YES" in S105), that is, if the update condition for the EMA coefficient 132 (weighting coefficient α_t) is satisfied, the double operation unit 133 receives the mini-batch from the input unit 102. The double operation unit 133 then obtains the EMA inference result using formula (2), based on the mini-batch received from the input unit 102, the learning NN 110 in which the updated weight parameter 111 is set, the verification NN 130 in which the EMA weight parameter 131 is set, and the EMA coefficient 132. The double operation unit 133 outputs the EMA inference result to the EMA evaluation unit 141 (S106).

ＥＭＡ評価部１４１は、二重演算部１３３から出力されるＥＭＡ推論結果を評価してＥＭＡ評価結果を得る（Ｓ１０７）。ＥＭＡ評価部１４１は、ＥＭＡ評価結果をＥＭＡ更新部１４２に出力する。ＥＭＡ更新部１４２は、ＥＭＡ評価部１４１から出力されるＥＭＡ評価結果に基づいてＥＭＡ係数１３２である重み係数α_ｔを更新する（Ｓ１０８）。 The EMA evaluation unit 141 obtains an EMA evaluation result by evaluating the EMA inference result output from the double operation unit 133 (S107). The EMA evaluation unit 141 outputs the EMA evaluation result to the EMA updating unit 142. The EMA updating unit 142 updates the weighting coefficient α_t, which is the EMA coefficient 132, based on the EMA evaluation result output from the EMA evaluation unit 141 (S108).

その後、保存部１４３は、更新後の重みパラメータ１１１と、ＥＭＡ重みパラメータ１３１と、更新後のＥＭＡ係数１３２とに基づいて、式（５）を用いて、ＥＭＡ重みパラメータ１３１を更新する（Ｓ１０９）。 Thereafter, the storage unit 143 updates the EMA weight parameter 131 using formula (5), based on the updated weight parameter 111, the EMA weight parameter 131, and the updated EMA coefficient 132 (S109).

学習用ＮＮ１１０の学習の終了条件が満たされない場合（Ｓ１１０において「ＮＯ」）、Ｓ１０１に動作が移行される。一方、学習用ＮＮ１１０の学習の終了条件が満たされた場合（Ｓ１１０において「ＹＥＳ」）は、学習が終了される。 If the learning end condition of the learning NN 110 is not satisfied ("NO" in S110), the operation proceeds to S101. On the other hand, if the learning termination condition of learning NN 110 is satisfied ("YES" in S110), learning is terminated.
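
The flow S101 to S110 can be traced end to end on a toy problem. The sketch below is an illustration only: it replaces the neural networks with a one-weight model y = w·x, uses a mean squared error, writes the gradients by hand instead of using backpropagation, and clamps α to [0, 1]. All of these choices, and every name and default value, are assumptions that do not come from the description.

```python
def train(dataset, batch_size=2, lr=0.05, alpha_lr=0.001,
          alpha_interval=1, max_updates=200, loss_threshold=1e-6):
    """Toy version of S101-S110 for a one-weight model y = w * x."""
    # Weight parameter, EMA weight parameter, and EMA coefficient (initial values assumed).
    w, w_ema, alpha = 0.0, 0.0, 0.9
    for step in range(max_updates):
        # S101: take the next mini-batch of (input, target) pairs.
        start = (step * batch_size) % len(dataset)
        batch = dataset[start:start + batch_size]
        n = len(batch)
        # S102-S103: inference with the learning model and its mean squared error.
        loss = sum((w * x - t) ** 2 for x, t in batch) / n
        # S104: gradient step on the weight parameter.
        w -= lr * sum(2.0 * (w * x - t) * x for x, t in batch) / n
        # S105: run S106-S109 only when the update condition holds.
        if (step + 1) % alpha_interval == 0:
            # S106: blended inference (formula (2) with a single layer).
            preds = [(1.0 - alpha) * w * x + alpha * w_ema * x for x, _ in batch]
            # S107-S108: gradient of the blended loss with respect to alpha.
            grad = sum(2.0 * (p - t) * (w_ema - w) * x
                       for p, (x, t) in zip(preds, batch)) / n
            # Clamping to [0, 1] is an assumption, not stated in the source.
            alpha = min(1.0, max(0.0, alpha - alpha_lr * grad))
            # S109: EMA weight update (formula (5)) with the updated alpha.
            w_ema = (1.0 - alpha) * w + alpha * w_ema
        # S110: stop once the training loss is small enough.
        if loss < loss_threshold:
            break
    return w, w_ema, alpha


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # targets follow t = 2 * x
w, w_ema, alpha = train(data)
```

Note the ordering that the flowchart prescribes: the weight parameter is updated first (S104), the blended inference uses that updated weight (S106), and the EMA weight is refreshed only after α has been updated (S108 before S109).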

以上、本発明の第１の実施形態に係る学習装置１０によって実行される「学習段階」の動作の流れについて説明した。 The flow of operations in the "learning stage" performed by the learning device 10 according to the first embodiment of the present invention has been described above.

（第１の実施形態のまとめ）
以上に説明したように、本発明の第１の実施形態に係る学習装置１０は、検証用ＮＮの構築におけるＥＭＡ処理において、検証用ＮＮの重みパラメータと学習用ＮＮの重みパラメータとの重み付き和を計算するときに用いられる係数を学習可能とする。これにより、最適な係数が自動的に決定されるため、安定した学習が可能になるという効果が享受される。また、これによって、係数を調整するユーザの負担が軽減され得るという効果が享受される。 (Summary of the first embodiment)
As described above, the learning device 10 according to the first embodiment of the present invention makes learnable the coefficient used to calculate the weighted sum of the weight parameters of the verification NN and the weight parameters of the learning NN in the EMA processing for constructing the verification NN. As a result, the optimum coefficient is determined automatically, which enables stable learning. This also reduces the user's burden of adjusting the coefficient.

以上、本発明の第１の実施形態について説明した。 The first embodiment of the present invention has been described above.

（２．第２の実施形態）
続いて、本発明の第２の実施形態について説明する。本発明の第１の実施形態においては、ＥＭＡ係数１３２がスカラーの係数α_ｔである場合について説明した。これは、ＮＮを構築する全ての層が同じ値によって重み付けされることを意味する。本発明の第２の実施形態においては、ＥＭＡ係数がＮＮに含まれる層ごとに設けられる場合について説明する。 (2. Second embodiment)
Next, a second embodiment of the invention will be described. In the first embodiment of the present invention, the case where the EMA coefficient 132 is the scalar coefficient α _t has been described. This means that all layers building the NN are weighted by the same value. In the second embodiment of the present invention, a case will be described in which an EMA coefficient is provided for each layer included in the NN.

なお、本発明の第２の実施形態に係る学習装置１０の機能構成図は、本発明の第１の実施形態に係る学習装置１０の機能構成図（図１）と同様である。また、本発明の第２の実施形態に係る学習装置１０は、本発明の第１の実施形態に係る学習装置１０と比較して、ＥＭＡ係数１３２、二重演算部１３３、ＥＭＡ更新部１４２および保存部１４３が有する機能が異なり、その他の構成要素が有する機能は同様である。 The functional configuration diagram of the learning device 10 according to the second embodiment of the present invention is the same as the functional configuration diagram (FIG. 1) of the learning device 10 according to the first embodiment of the present invention. Compared with the learning device 10 according to the first embodiment, the learning device 10 according to the second embodiment differs in the functions of the EMA coefficient 132, the double operation unit 133, the EMA updating unit 142, and the storage unit 143, while the functions of the other components are the same.

したがって、以下では、本発明の第２の実施形態に係る学習装置１０が有する、ＥＭＡ係数１３２、二重演算部１３３、ＥＭＡ更新部１４２および保存部１４３について主に説明し、その他の構成要素が有する機能の詳細な説明は省略する。 Therefore, the following mainly describes the EMA coefficient 132, the double operation unit 133, the EMA updating unit 142, and the storage unit 143 of the learning device 10 according to the second embodiment of the present invention, and omits a detailed description of the functions of the other components.

本発明の第２の実施形態において、ＥＭＡ係数１３２は、重み係数α_ｔ，ｉ（ｉ＝１，２，・・・，ｎ）によって表現される。すなわち、本発明の第２の実施形態において、ＥＭＡ係数１３２である重み係数α_ｔは、学習用ＮＮ１１０または検証用ＮＮ１３０を構成する層の総数ｎと同じ数設けられている。 In the second embodiment of the present invention, the EMA coefficients 132 are represented by weighting coefficients α _t,i (i=1, 2, . . . , n). That is, in the second embodiment of the present invention, the number of weighting coefficients α _t that are the EMA coefficients 132 is the same as the total number n of layers that constitute the learning NN 110 or the verification NN 130 .

（二重演算部１３３）
二重演算部１３３は、入力部１０２から出力されたミニバッチに含まれる入力データと、更新部１２２による更新後の重みパラメータ１１１が設定された学習用ＮＮ１１０と、検証用ＮＮ１３０と、層ごとに設けられたＥＭＡ係数１３２とに基づいて、ＥＭＡ推論結果を得る。 (Double operation unit 133)
The double operation unit 133 obtains the EMA inference result based on the input data included in the mini-batch output from the input unit 102, the learning NN 110 in which the weight parameter 111 updated by the update unit 122 is set, the verification NN 130, and the EMA coefficients 132 provided for each layer.

図４は、本発明の第２の実施形態に係る二重演算部１３３の処理を説明するための図である。図４を参照すると、二重演算部１３３の構成例が示されている。図４に示された例では、第ｋ層と第（ｋ＋１）層（ただし、ｋ＝１，２，・・・，ｎ－１）に着目する。なお、畳み込み層ｆ_ｋは、本発明の第１の実施形態に係る畳み込み層ｆ_ｉと同様の構成を有し、畳み込み層ｆ’_ｋは、本発明の第１の実施形態に係る畳み込み層ｆ’_ｉと同様の構成を有する。 FIG. 4 is a diagram for explaining the processing of the double operation unit 133 according to the second embodiment of the present invention. Referring to FIG. 4, a configuration example of the double operation unit 133 is shown. The example shown in FIG. 4 focuses on the k-th layer and the (k+1)-th layer (where k = 1, 2, ..., n−1). Note that the convolutional layer f_k has the same configuration as the convolutional layer f_i according to the first embodiment of the present invention, and the convolutional layer f'_k has the same configuration as the convolutional layer f'_i according to the first embodiment of the present invention.

そして、畳み込み層ｆ’_ｋへの入力がＸ_ｋと表現され、その畳み込み層ｆ’_ｋからの出力がＺ’_ｋと表現されている。また、畳み込み層ｆ_ｋへの入力がＸ_ｋと表現され、その畳み込み層ｆ_ｋからの出力がＺ_ｋと表現されている。二重演算部１３３は、畳み込み層ｆ’_ｋからの出力Ｚ’_ｋ＝ｆ’_ｋ（Ｘ_ｋ）と、畳み込み層ｆ_ｋからの出力Ｚ_ｋ＝ｆ_ｋ（Ｘ_ｋ）と、第ｋ層に対応するＥＭＡ係数１３２であるα_ｔ，ｋとに基づいて、出力Ｙ_ｋを算出する。 The input to the convolutional layer f'_k is written as X_k, and the output from the convolutional layer f'_k is written as Z'_k. Likewise, the input to the convolutional layer f_k is written as X_k, and the output from that convolutional layer f_k is written as Z_k. The double operation unit 133 calculates the output Y_k based on the output Z'_k = f'_k(X_k) from the convolutional layer f'_k, the output Z_k = f_k(X_k) from the convolutional layer f_k, and α_{t,k}, which is the EMA coefficient 132 corresponding to the k-th layer.

より詳細に、二重演算部１３３は、出力Ｚ_ｋ＝ｆ_ｋ（Ｘ_ｋ）と出力Ｚ’_ｋ＝ｆ’_ｋ（Ｘ_ｋ）とに対する、第ｋ層に対応するＥＭＡ係数１３２であるα_ｔ，ｋによる重み付き和によって、出力Ｙ_ｋを算出する。すなわち、二重演算部１３３は、Ｚ_ｋ＝ｆ_ｋ（Ｘ_ｋ）に（１－α_ｔ，ｋ）を乗算し、出力Ｚ’_ｋ＝ｆ’_ｋ（Ｘ_ｋ）にα_ｔ，ｋを乗算し、乗算結果同士を足し合わせることによって、出力Ｙ_ｋを算出する。かかる出力Ｙ_ｋの算出式は、以下の式（６）によって表現される。 More specifically, the double operation unit 133 calculates the output Y_k as the weighted sum of the output Z_k = f_k(X_k) and the output Z'_k = f'_k(X_k), weighted by α_{t,k}, which is the EMA coefficient 132 corresponding to the k-th layer. That is, the double operation unit 133 multiplies Z_k = f_k(X_k) by (1−α_{t,k}), multiplies the output Z'_k = f'_k(X_k) by α_{t,k}, and adds the multiplication results together to calculate the output Y_k. The formula for calculating the output Y_k is expressed by the following formula (6).
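
The formula itself does not appear in this text. Written out from the wording of the preceding paragraph, formula (6) would presumably read:

```latex
Y_k = (1 - \alpha_{t,k})\, f_k(X_k) + \alpha_{t,k}\, f'_k(X_k) \tag{6}
```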

なお、出力Ｙ_ｋは、第（ｋ＋１）層への入力Ｘ_ｋ＋１となる。そして、第（ｋ＋１）層における処理も、第ｋ層と同様にして実行される。すなわち、二重演算部１３３は、畳み込み層ｆ’_ｋ＋１からの出力Ｚ’_ｋ＋１＝ｆ’_ｋ＋１（Ｘ_ｋ＋１）と、畳み込み層ｆ_ｋ＋１からの出力Ｚ_ｋ＋１＝ｆ_ｋ＋１（Ｘ_ｋ＋１）と、ＥＭＡ係数１３２であるα_{ｔ，ｋ＋１}とに基づいて、出力Ｙ_ｋ＋１を算出する。 Note that the output Y_k becomes the input X_{k+1} to the (k+1)-th layer. Processing in the (k+1)-th layer is performed in the same manner as in the k-th layer. That is, the double operation unit 133 calculates the output Y_{k+1} based on the output Z'_{k+1} = f'_{k+1}(X_{k+1}) from the convolutional layer f'_{k+1}, the output Z_{k+1} = f_{k+1}(X_{k+1}) from the convolutional layer f_{k+1}, and α_{t,k+1}, which is the EMA coefficient 132.

より詳細に、二重演算部１３３は、出力Ｚ’_ｋ＋１＝ｆ’_ｋ＋１（Ｘ_ｋ＋１）と出力Ｚ_ｋ＋１＝ｆ_ｋ＋１（Ｘ_ｋ＋１）とに対する、ＥＭＡ係数１３２であるα_{ｔ，ｋ＋１}による重み付き和によって、出力Ｙ_ｋ＋１を算出する。すなわち、二重演算部１３３は、Ｚ_ｋ＋１＝ｆ_ｋ＋１（Ｘ_ｋ＋１）に（１－α_{ｔ，ｋ＋１}）を乗算し、出力Ｚ’_ｋ＋１＝ｆ’_ｋ＋１（Ｘ_ｋ＋１）にα_{ｔ，ｋ＋１}を乗算し、乗算結果同士を足し合わせることによって、出力Ｙ_ｋ＋１を算出する。 More specifically, the double operation unit 133 calculates the output Y_{k+1} as the weighted sum of the output Z'_{k+1} = f'_{k+1}(X_{k+1}) and the output Z_{k+1} = f_{k+1}(X_{k+1}), weighted by α_{t,k+1}, which is the EMA coefficient 132. That is, the double operation unit 133 multiplies Z_{k+1} = f_{k+1}(X_{k+1}) by (1−α_{t,k+1}), multiplies the output Z'_{k+1} = f'_{k+1}(X_{k+1}) by α_{t,k+1}, and adds the multiplication results together to calculate the output Y_{k+1}.

なお、出力Ｙ_ｋ＋１は、第（ｋ＋２）層への入力Ｘ_ｋ＋２となる。このようにして、第１層から最終層までの各層による処理が実行され、最終層に対応する出力Ｙ_ｎが算出される。二重演算部１３３は、最終層に対応する出力Ｙ_ｎをＥＭＡ推論結果としてＥＭＡ評価部１４１に出力する。 Note that the output Y_{k+1} becomes the input X_{k+2} to the (k+2)-th layer. In this way, processing by each layer from the first layer to the final layer is executed, and the output Y_n corresponding to the final layer is calculated. The double operation unit 133 outputs the output Y_n corresponding to the final layer to the EMA evaluation unit 141 as the EMA inference result.
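
The per-layer variant differs from the scalar case only in that each layer pair uses its own coefficient. A minimal sketch, assuming layers are plain Python callables on lists of floats and using a hypothetical `scale_layer` stand-in for a convolutional layer:

```python
def blended_forward_per_layer(x, learn_layers, ema_layers, alphas):
    """Propagate x through paired layers, mixing layer k's outputs with alphas[k]."""
    if not (len(learn_layers) == len(ema_layers) == len(alphas)):
        raise ValueError("need one coefficient per layer pair")
    for f, f_ema, alpha in zip(learn_layers, ema_layers, alphas):
        z = f(x)          # Z_k  = f_k(X_k)
        z_ema = f_ema(x)  # Z'_k = f'_k(X_k)
        # Y_k = (1 - alpha_k) * Z_k + alpha_k * Z'_k; Y_k becomes X_{k+1}.
        x = [(1.0 - alpha) * a + alpha * b for a, b in zip(z, z_ema)]
    return x  # Y_n


def scale_layer(w):
    """Hypothetical stand-in for a layer: multiply every element by w."""
    return lambda xs: [w * v for v in xs]


learn = [scale_layer(2.0), scale_layer(3.0)]
ema = [scale_layer(1.0), scale_layer(1.0)]
# Layer 1 uses only the learning NN (alpha = 0); layer 2 uses only the verification NN (alpha = 1).
y = blended_forward_per_layer([1.0, 2.0], learn, ema, alphas=[0.0, 1.0])
```

Passing the same value for every entry of `alphas` reduces this to the first embodiment, where one scalar weights all layers.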

（ＥＭＡ更新部１４２）
ＥＭＡ更新部１４２は、ＥＭＡ評価部１４１から出力されたＥＭＡ評価結果に基づいて、ＥＭＡ係数１３２である重み係数α_ｔ，ｋ（ｋ＝１，２，・・・，ｎ）を更新する。これによって、ＥＭＡ推論結果が教師ラベルに近づくように、ＥＭＡ重みパラメータ１３１が訓練され得る。より詳細に、ＥＭＡ更新部１４２は、ＥＭＡ評価結果に基づく誤差逆伝播法（バックプロパゲーション）によってＥＭＡ係数１３２である重み係数α_ｔ，ｋ（ｋ＝ｎ，ｎ－１，・・・，１）を順に更新してよい。一例として、重み係数α_ｔ，ｋの更新式は、以下の式（７）によって表現され得る。 (EMA update unit 142)
The EMA updating unit 142 updates the weighting coefficients α_{t,k} (k = 1, 2, ..., n), which are the EMA coefficients 132, based on the EMA evaluation result output from the EMA evaluation unit 141. This allows the EMA weight parameter 131 to be trained such that the EMA inference result approaches the teacher label. More specifically, the EMA updating unit 142 may update the weighting coefficients α_{t,k}, which are the EMA coefficients 132, in the order k = n, n−1, ..., 1 by error backpropagation based on the EMA evaluation result. As an example, the update formula for the weighting coefficient α_{t,k} can be expressed by the following formula (7).
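
Formula (7) does not appear in this text. By analogy with the scalar case, a per-layer gradient-descent step α_{t+1,k} = α_{t,k} − μ·∂l_ema/∂α_{t,k} is a plausible reading, and the sketch below assumes it. To stay self-contained, the sketch estimates the gradients with central finite differences instead of the error backpropagation named in the description; `loss_fn` and the quadratic example loss are hypothetical.

```python
def numeric_alpha_gradients(loss_fn, alphas, eps=1e-6):
    """Estimate d(loss)/d(alpha_k) for every layer by central differences."""
    grads = []
    for k in range(len(alphas)):
        up = list(alphas)
        down = list(alphas)
        up[k] += eps
        down[k] -= eps
        grads.append((loss_fn(up) - loss_fn(down)) / (2.0 * eps))
    return grads


def update_alphas(loss_fn, alphas, lr):
    """Assumed update: alpha_k <- alpha_k - lr * d(loss)/d(alpha_k) for each layer k."""
    grads = numeric_alpha_gradients(loss_fn, alphas)
    return [a - lr * g for a, g in zip(alphas, grads)]


def example_loss(a):
    """Hypothetical loss with its minimum at alphas = [0.2, 0.8]."""
    return (a[0] - 0.2) ** 2 + (a[1] - 0.8) ** 2


new_alphas = update_alphas(example_loss, [0.5, 0.5], lr=0.1)
```

Backpropagation yields the same gradients in a single backward pass from layer n down to layer 1, which matches the update order k = n, n−1, ..., 1 given above; finite differences are used here only to keep the example dependency-free.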

（保存部１４３）
保存部１４３は、第１層から最終層までの各層について、更新後の重みパラメータ１１１に（１－α_{ｔ＋１，ｋ}）を乗算し、ＥＭＡ重みパラメータ１３１にα_{ｔ＋１，ｋ}を乗算し、乗算結果同士を足し合わせることによって、更新後のＥＭＡ重みパラメータ１３１を算出する。かかる更新後のＥＭＡ重みパラメータ１３１の算出式は、以下の式（８）によって表現される。 (storage unit 143)
For each layer from the first layer to the final layer, the storage unit 143 multiplies the updated weight parameter 111 by (1−α_{t+1,k}), multiplies the EMA weight parameter 131 by α_{t+1,k}, and adds the multiplication results together to calculate the updated EMA weight parameter 131. The formula for calculating the updated EMA weight parameter 131 is expressed by the following formula (8).
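
Formula (8) does not appear in this text. Written out from the wording of the preceding paragraph, and using W_{t+1,k} for the updated weight parameter 111 of the k-th layer and W'_{t,k} for the EMA weight parameter 131 of the k-th layer (both symbols are introduced here for illustration only), it would presumably read:

```latex
W'_{t+1,k} = (1 - \alpha_{t+1,k})\, W_{t+1,k} + \alpha_{t+1,k}\, W'_{t,k}, \qquad k = 1, 2, \dots, n \tag{8}
```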

以上、本発明の第２の実施形態に係る学習装置の構成例について説明した。 The configuration example of the learning device according to the second embodiment of the present invention has been described above.

（学習段階の動作）
続いて、本発明の第２の実施形態に係る学習装置１０によって実行される「学習段階」の動作の流れについて説明する。本発明の第２の実施形態に係る学習装置１０によって実行される「学習段階」の動作例を示すフローチャートは、本発明の第１の実施形態に係る学習装置１０によって実行される「学習段階」の動作例を示すフローチャート（図３）と同様である。 (Operation in learning stage)
Next, the flow of operations in the "learning stage" performed by the learning device 10 according to the second embodiment of the present invention will be described. The flowchart showing an operation example of the "learning stage" performed by the learning device 10 according to the second embodiment of the present invention is the same as the flowchart (FIG. 3) showing an operation example of the "learning stage" performed by the learning device 10 according to the first embodiment of the present invention.

また、本発明の第２の実施形態に係る学習装置１０は、本発明の第１の実施形態に係る学習装置１０と比較して、二重演算部１３３、ＥＭＡ更新部１４２および保存部１４３の動作の詳細が異なり、その他の動作の詳細は同様である。したがって、以下では、本発明の第２の実施形態に係る学習装置１０が有する二重演算部１３３、ＥＭＡ更新部１４２および保存部１４３の動作について主に説明を行い、他の動作についての詳細な説明は省略する。 Compared with the learning device 10 according to the first embodiment of the present invention, the learning device 10 according to the second embodiment of the present invention differs in the details of the operations of the double operation unit 133, the EMA updating unit 142, and the storage unit 143, while the details of the other operations are the same. Therefore, the following mainly describes the operations of the double operation unit 133, the EMA updating unit 142, and the storage unit 143 of the learning device 10 according to the second embodiment of the present invention, and omits a detailed description of the other operations.

本発明の第２の実施形態においても、本発明の第１の実施形態と同様に、Ｓ１０１～Ｓ１０５が実行される。 Also in the second embodiment of the present invention, S101 to S105 are executed as in the first embodiment of the present invention.

続いて、二重演算部１３３は、入力部１０２から入力したミニバッチと、更新後の重みパラメータ１１１が設定された学習用ＮＮ１１０と、ＥＭＡ重みパラメータ１３１が設定された検証用ＮＮ１３０と、第１層から最終層までの各層に対応するＥＭＡ係数１３２とに基づいて、式（６）を用いてＥＭＡ推論結果を得る。二重演算部１３３は、ＥＭＡ推論結果をＥＭＡ評価部１４１に出力する（Ｓ１０６）。 Next, the double operation unit 133 obtains the EMA inference result using formula (6), based on the mini-batch received from the input unit 102, the learning NN 110 in which the updated weight parameter 111 is set, the verification NN 130 in which the EMA weight parameter 131 is set, and the EMA coefficients 132 corresponding to each layer from the first layer to the final layer. The double operation unit 133 outputs the EMA inference result to the EMA evaluation unit 141 (S106).

ＥＭＡ評価部１４１は、二重演算部１３３から出力されるＥＭＡ推論結果を評価してＥＭＡ評価結果を得る（Ｓ１０７）。ＥＭＡ評価部１４１は、ＥＭＡ評価結果をＥＭＡ更新部１４２に出力する。ＥＭＡ更新部１４２は、ＥＭＡ評価部１４１から出力されるＥＭＡ評価結果に基づいて、式（７）を用いて第１層から最終層までの各層に対応するＥＭＡ係数１３２を更新する（Ｓ１０８）。 The EMA evaluation unit 141 obtains an EMA evaluation result by evaluating the EMA inference result output from the double operation unit 133 (S107). The EMA evaluation unit 141 outputs the EMA evaluation result to the EMA updating unit 142 . The EMA update unit 142 updates the EMA coefficients 132 corresponding to each layer from the first layer to the final layer using Equation (7) based on the EMA evaluation results output from the EMA evaluation unit 141 (S108).

その後、保存部１４３は、更新後の重みパラメータ１１１と、ＥＭＡ重みパラメータ１３１と、第１層から最終層までの各層に対応する更新後のＥＭＡ係数１３２とに基づいて、式（８）を用いて、ＥＭＡ重みパラメータ１３１を更新する（Ｓ１０９）。 After that, the storage unit 143 uses Equation (8) based on the updated weight parameter 111, the EMA weight parameter 131, and the updated EMA coefficient 132 corresponding to each layer from the first layer to the final layer. Then, the EMA weight parameter 131 is updated (S109).
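
Step S109 in the per-layer case can be sketched as follows. The nested-list representation (one list of floats per layer) and the function name are assumptions made for illustration; the arithmetic follows the weighted sum described for the storage unit 143.

```python
def update_ema_weights(weights, ema_weights, alphas):
    """Blend each layer's updated weights into its EMA weights.

    weights[k], ema_weights[k]: lists of floats for layer k.
    alphas[k]: the already-updated EMA coefficient of layer k.
    """
    if not (len(weights) == len(ema_weights) == len(alphas)):
        raise ValueError("need weights, EMA weights, and a coefficient for every layer")
    return [
        # new EMA weight = (1 - alpha_k) * updated weight + alpha_k * old EMA weight
        [(1.0 - a) * w + a * w_ema for w, w_ema in zip(layer_w, layer_ema)]
        for layer_w, layer_ema, a in zip(weights, ema_weights, alphas)
    ]


weights = [[1.0, 2.0], [4.0]]      # updated weight parameters, two layers
ema_weights = [[0.0, 0.0], [2.0]]  # current EMA weight parameters
new_ema = update_ema_weights(weights, ema_weights, alphas=[0.5, 0.25])
```

A layer whose coefficient is close to 1 keeps its EMA weights almost unchanged, while a layer whose coefficient is close to 0 tracks the learning NN closely; this per-layer freedom is the difference from the first embodiment.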

本発明の第２の実施形態においても、本発明の第１の実施形態と同様に、Ｓ１１０が実行される。 Also in the second embodiment of the present invention, S110 is executed as in the first embodiment of the present invention.

以上、本発明の第２の実施形態に係る学習装置１０によって実行される「学習段階」の動作の流れについて説明した。 The flow of operations in the "learning stage" performed by the learning device 10 according to the second embodiment of the present invention has been described above.

（第２の実施形態のまとめ）
以上に説明したように、本発明の第２の実施形態によれば、検証用ＮＮの構築におけるＥＭＡ処理において、検証用ＮＮの重みパラメータと学習用ＮＮの重みパラメータとの重み付き和を計算するときに用いられる係数が学習可能であるだけでなく、その係数が学習用ＮＮの層ごとに設けられる。これにより、最適な係数が層ごとに自動的に決定されるため、本発明の第１の実施形態と同様の効果が享受され得るだけでなく、さらに自由度の高い検証用ＮＮが構築され得る。 (Summary of the second embodiment)
As described above, according to the second embodiment of the present invention, in the EMA processing for constructing the verification NN, the coefficient used to calculate the weighted sum of the weight parameters of the verification NN and the weight parameters of the learning NN is not only learnable but is also provided for each layer of the learning NN. As a result, the optimum coefficient is determined automatically for each layer, so that not only can the same effects as in the first embodiment of the present invention be obtained, but a verification NN with a higher degree of freedom can also be constructed.

以上、本発明の第２の実施形態について説明した。 The second embodiment of the present invention has been described above.

（３．ハードウェア構成例）
続いて、本発明の第１の実施形態に係る学習装置１０のハードウェア構成例について説明する。なお、本発明の第２の実施形態に係る学習装置１０のハードウェア構成も、本発明の第１の実施形態に係る学習装置１０のハードウェア構成と同様に実現され得る。 (3. Hardware configuration example)
Next, a hardware configuration example of the learning device 10 according to the first embodiment of the present invention will be described. Note that the hardware configuration of the learning device 10 according to the second embodiment of the present invention can also be realized in the same manner as the hardware configuration of the learning device 10 according to the first embodiment of the present invention.

以下では、本発明の第１の実施形態に係る学習装置１０のハードウェア構成例として、情報処理装置９００のハードウェア構成例について説明する。なお、以下に説明する情報処理装置９００のハードウェア構成例は、学習装置１０のハードウェア構成の一例に過ぎない。したがって、学習装置１０のハードウェア構成は、以下に説明する情報処理装置９００のハードウェア構成から不要な構成が削除されてもよいし、新たな構成が追加されてもよい。 A hardware configuration example of the information processing device 900 will be described below as a hardware configuration example of the learning device 10 according to the first embodiment of the present invention. Note that the hardware configuration example of the information processing device 900 described below is merely an example of the hardware configuration of the learning device 10 . Therefore, as for the hardware configuration of the learning device 10, unnecessary configurations may be deleted from the hardware configuration of the information processing device 900 described below, or a new configuration may be added.

FIG. 5 is a diagram showing the hardware configuration of the information processing device 900 as an example of the learning device 10 according to the first embodiment of the present invention. The information processing device 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, a host bus 904, a bridge 905, an external bus 906, an interface 907, an input device 908, an output device 909, a storage device 910, and a communication device 911.

The CPU 901 functions as an arithmetic processing device and a control device, and controls overall operation within the information processing device 900 in accordance with various programs. The CPU 901 may also be a microprocessor. The ROM 902 stores programs, calculation parameters, and the like used by the CPU 901. The RAM 903 temporarily stores programs used in execution by the CPU 901, parameters that change as appropriate during that execution, and the like. These components are interconnected by the host bus 904, which is composed of a CPU bus or the like.

The host bus 904 is connected via the bridge 905 to the external bus 906, such as a PCI (Peripheral Component Interconnect/Interface) bus. Note that the host bus 904, the bridge 905, and the external bus 906 do not necessarily have to be configured separately, and their functions may be implemented on a single bus.

The input device 908 is composed of input means with which a user inputs information, such as a mouse, a keyboard, a touch panel, buttons, a microphone, switches, and levers, and an input control circuit or the like that generates an input signal based on the user's input and outputs it to the CPU 901. By operating the input device 908, a user of the information processing device 900 can input various data to the information processing device 900 and instruct it to perform processing operations.

The output device 909 includes, for example, display devices such as a CRT (Cathode Ray Tube) display device, a liquid crystal display (LCD) device, an OLED (Organic Light Emitting Diode) device, and a lamp, and audio output devices such as a speaker.

The storage device 910 is a device for data storage. The storage device 910 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 910 is configured by, for example, an HDD (Hard Disk Drive). The storage device 910 drives a hard disk and stores programs executed by the CPU 901 and various data.

The communication device 911 is, for example, a communication interface composed of a communication device or the like for connecting to a network. The communication device 911 may support either wireless communication or wired communication.

The hardware configuration example of the learning device 10 according to the first embodiment of the present invention has been described above.

(4. Summary)
Although the preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs could conceive of various changes or modifications within the scope of the technical idea described in the claims, and it is understood that these also naturally belong to the technical scope of the present invention.

The first embodiment of the present invention and the second embodiment of the present invention have mainly been described for the case where the learning data is image data (in particular, still image data). However, the type of learning data is not particularly limited. For example, data other than still image data can be used as learning data as long as features suited to the type of learning data are extracted. For example, the learning data may be moving image data including a plurality of frames, audio data, or other time-series data.

When the learning data is still image data, two-dimensional convolutional layers are generally used as the convolutional layers included in each of the learning NN 110 and the verification NN 130. On the other hand, if three-dimensional convolutional layers are used as the convolutional layers included in each of the learning NN 110 and the verification NN 130, moving image data can be used as the learning data.

In the first embodiment of the present invention and the second embodiment of the present invention, no restriction is placed on the EMA coefficient 132, i.e., the weighting factor α. However, a function that maps to the range (0, 1) may be applied to the weighting factor α. For example, the function that maps to the range (0, 1) may be the sigmoid function sgm. This can be expected to make the learning of the weighting factor α more stable. When the sigmoid function is applied to α, equations (2) and (5) are replaced by equations (9) and (10) below.
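The effect of passing the weighting factor through a sigmoid can be sketched as follows. This is an illustration only and does not reproduce equations (9) and (10). The names `sgm_stable` and `ema_update_sigmoid` are hypothetical, and the form of the weighted sum (sgm(α) on the verification NN weights, 1 − sgm(α) on the learning NN weights) is an assumption.

```python
import math
from typing import List


def sgm_stable(x: float) -> float:
    """Logistic sigmoid; mathematically maps any real x into (0, 1).

    The two branches avoid overflow in math.exp for large |x|.
    """
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ema_update_sigmoid(w_ema: List[float], w_train: List[float],
                       alpha_raw: float) -> List[float]:
    """Weighted sum using sgm(alpha_raw) instead of the raw coefficient.

    alpha_raw is the unconstrained learnable value; the assumed convention is
    that sgm(alpha_raw) scales the verification (EMA) weights.
    """
    a = sgm_stable(alpha_raw)
    return [a * e + (1.0 - a) * t for e, t in zip(w_ema, w_train)]


# With alpha_raw = 0 the effective coefficient is exactly 0.5.
mixed = ema_update_sigmoid([0.0, 2.0], [1.0, 0.0], 0.0)
```

Under this assumption, whatever value the learnable parameter takes during training, the effective coefficient stays between 0 and 1, so the updated weights remain a convex combination of the two networks' weights.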

10 learning device
101 learning data set
102 input unit
111 weight parameter
112 calculation unit
121 evaluation unit
122 update unit
131 EMA weight parameter
132 EMA coefficient
133 double calculation unit
141 EMA evaluation unit
142 EMA update unit
143 storage unit

Claims

An information processing device comprising:
a first computing unit that outputs a first inference result for first input data based on the first input data and a first neural network;
a first evaluation unit that outputs a first evaluation result based on the first inference result;
a first updating unit that updates a first weight parameter of the first neural network based on the first evaluation result;
a second computing unit that outputs a second inference result for second input data based on the second input data, the first neural network in which the updated first weight parameter is set, a second neural network, and a weighting factor of the second neural network;
a second evaluation unit that outputs a second evaluation result based on the second inference result;
a weighting factor updating unit that updates the weighting factor based on the second evaluation result; and
a second updating unit that updates a second weight parameter of the second neural network based on the updated first weight parameter, the second weight parameter, and the updated weighting factor.

The information processing device according to claim 1, wherein
the second computing unit inputs the second input data to the first layer of each of the first neural network in which the updated first weight parameter is set and the second neural network.

The information processing device according to claim 1 or 2, wherein
the second computing unit inputs, to the second and subsequent layers of each of the first neural network in which the updated first weight parameter is set and the second neural network, a sum of the outputs from the immediately preceding layers weighted by the weighting factor.

The information processing device according to any one of claims 1 to 3, wherein
the second evaluation unit calculates a loss based on the second inference result and a teacher label, and outputs the second evaluation result based on the loss.

The information processing device according to any one of claims 1 to 4, wherein
the weighting factor updating unit updates the weighting factor by error backpropagation based on the second evaluation result.

The information processing device according to any one of claims 1 to 5, wherein
the second updating unit updates the second weight parameter by a sum of the updated first weight parameter and the second weight parameter of the second neural network weighted by the updated weighting factor.

The information processing device according to any one of claims 1 to 6, wherein
the weighting factor is provided for each layer included in the first neural network and the second neural network.

The information processing device according to any one of claims 1 to 7, wherein
the first input data and the second input data are the same data.

The information processing device according to any one of claims 1 to 7, wherein
the first input data and the second input data are different data.

An information processing method comprising:
outputting a first inference result for first input data based on the first input data and a first neural network;
outputting a first evaluation result based on the first inference result;
updating a first weight parameter of the first neural network based on the first evaluation result;
outputting a second inference result for second input data based on the second input data, the first neural network in which the updated first weight parameter is set, a second neural network, and a weighting factor of the second neural network;
outputting a second evaluation result based on the second inference result;
updating the weighting factor based on the second evaluation result; and
updating a second weight parameter of the second neural network based on the updated first weight parameter, the second weight parameter, and the updated weighting factor.

A program for causing a computer to function as an information processing device comprising:
a first computing unit that outputs a first inference result for first input data based on the first input data and a first neural network;
a first evaluation unit that outputs a first evaluation result based on the first inference result;
a first updating unit that updates a first weight parameter of the first neural network based on the first evaluation result;
a second computing unit that outputs a second inference result for second input data based on the second input data, the first neural network in which the updated first weight parameter is set, a second neural network, and a weighting factor of the second neural network;
a second evaluation unit that outputs a second evaluation result based on the second inference result;
a weighting factor updating unit that updates the weighting factor based on the second evaluation result; and
a second updating unit that updates a second weight parameter of the second neural network based on the updated first weight parameter, the second weight parameter, and the updated weighting factor.