WO2022162839A1 - Learning device, learning method, and recording medium - Google Patents

Learning device, learning method, and recording medium Download PDF

Info

Publication number
WO2022162839A1
Authority
WO
WIPO (PCT)
Prior art keywords
teacher
data
learning
label
models
Prior art date
Application number
PCT/JP2021/003058
Other languages
French (fr)
Japanese (ja)
Inventor
学 中野
裕一 中谷
遊哉 石井
哲夫 井下
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2022577920A priority Critical patent/JPWO2022162839A5/en
Priority to PCT/JP2021/003058 priority patent/WO2022162839A1/en
Publication of WO2022162839A1 publication Critical patent/WO2022162839A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present invention relates to a neural network learning method using distillation.
  • In machine learning, a highly accurate learning model can be constructed by building a neural network with deep layers. Such a model is called deep learning and consists of several million to hundreds of millions of neural network units. In deep learning, it is known that the more complex the model and the deeper its layers, that is, the larger the number of units, the higher the accuracy. On the other hand, bloated models require more computer memory, so methods have been proposed for building smaller models while maintaining the performance of huge models.
  • Non-Patent Document 1 and Patent Document 1 describe a learning method called Knowledge Distillation (hereinafter "distillation"), in which a trained huge model (the "teacher model") is imitated by a small model (the "student model").
  • In this method, the data used to train the teacher model is input to both the teacher model and the student model, and the student model is trained so that its output approaches the weighted average of the predicted label output by the teacher model and the true label given in the training data. Because the learning method of Non-Patent Document 1 uses this weighted-average label, the same data used to train the teacher model is required when training the student model. However, deep learning requires a large amount of training data, and it can be difficult to retain the training data itself because of storage capacity limits, protection of privacy information contained in the data, and data copyright.
  • Non-Patent Document 2 describes distillation learning that, instead of the data used to train the teacher model, uses data unknown to the teacher model, that is, data whose true label is unknown. This learning method trains the student model so that its output approaches the predicted label of the teacher model for the unknown data.
  • In the learning method of Non-Patent Document 2, images generated using a GAN (Generative Adversarial Network) are used to perform distillation learning from the teacher model to the student model. However, if the generated images are far from the images of the target domain, no improvement in the performance of the student model can be expected.
  • One purpose of the present invention is to realize distillation learning that generates high-performance student models using unknown data.
  • In one aspect, the learning device comprises: a plurality of trained teacher models; data generation means for generating generated data based on an input pseudo-correct label, the generated data being data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label; and learning means for performing distillation learning of a student model using the generated data as input and using the plurality of teacher models.
  • In another aspect, a learning method comprises: acquiring a plurality of trained teacher models; generating generated data based on an input pseudo-correct label; and performing distillation learning of a student model using the generated data as input and using the plurality of teacher models, wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
  • In still another aspect, the recording medium records a program that causes a computer to execute processing of: acquiring a plurality of trained teacher models; generating generated data based on an input pseudo-correct label; and performing distillation learning of a student model using the generated data as input and using the plurality of teacher models, wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
  • FIG. 1 shows the hardware configuration of a learning device according to the first embodiment.
  • FIG. 2 is a flowchart showing the overall flow of the learning process.
  • FIG. 3 is a diagram illustrating an example of discrimination boundaries of teacher models.
  • FIG. 4 schematically shows the method of training the teacher models.
  • FIG. 5 shows the functional configuration of the learning device when the data generation unit is trained.
  • FIG. 6 shows a configuration example of the label distribution determination unit.
  • FIG. 7 shows the functional configuration of the learning device when the student model is trained.
  • FIG. 8 is a flowchart of the student model learning process.
  • FIG. 9 shows the functional configuration of a learning device according to the second embodiment.
  • FIG. 10 is a flowchart of the learning process according to the second embodiment.
  • FIG. 1 is a block diagram showing the hardware configuration of the learning device according to the first embodiment.
  • the learning device 10 includes an interface (I/F) 12, a processor 13, a memory 14, a recording medium 15, and a database (DB) 16.
  • the interface 12 inputs and outputs data with an external device. Specifically, the interface 12 acquires learning data and unknown data used by the learning device 10 from an external device.
  • the processor 13 is a computer such as a CPU (Central Processing Unit), and controls the learning device 10 as a whole by executing a program prepared in advance.
  • the processor 13 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array).
  • the processor 13 executes learning processing, which will be described later.
  • the memory 14 is composed of ROM (Read Only Memory), RAM (Random Access Memory), and the like.
  • the memory 14 stores neural network models used by the learning apparatus 10, specifically, teacher models, student models, and the like.
  • the memory 14 is also used as a working memory while the processor 13 is executing various processes.
  • the recording medium 15 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the learning device 10 .
  • the recording medium 15 records various programs executed by the processor 13 .
  • programs recorded in the recording medium 15 are loaded into the memory 14 and executed by the processor 13 .
  • Database 16 stores data entered via interface 12 .
  • FIG. 2 is a flowchart showing the overall flow of learning processing.
  • the learning process is roughly divided into teacher model learning (step S10), data generator learning (step S20), and student model learning (step S30).
  • Training of a teacher model is to learn multiple teacher models using data obtained from multiple sites (domains). As a result, a plurality of trained teacher models are obtained.
  • the learning of the data generation unit is to learn the data generation unit that generates data used for learning the student model using a plurality of trained teacher models. Note that the data generation unit generates an image using a GAN.
  • the student model is learned by distillation using a plurality of trained teacher models and a trained data generator. A detailed description will be given below in order.
  • Objective A: Achieve high performance on images of the target domain. This is the same as ordinary training.
  • Objective B: For images outside the target domain, make the output distributions of the teacher models differ from one another as much as possible. That is, each teacher model is trained to deliberately increase the degree of disagreement of its outputs for images outside the target domain.
  • FIG. 3 shows an example of a distribution map of feature quantities.
  • class X and class Y are to be classified in a certain target domain. It is assumed that feature amounts belonging to area 1 on the distribution map are classified into class X, and feature amounts belonging to area 2 are classified into class Y.
  • the discrimination boundaries of teacher models 1 and 2 that simultaneously satisfy the above objectives A and B are indicated by F1 and F2, respectively.
  • Since the discrimination boundaries F1 and F2 both divide area 1 and area 2 into different regions, classes X and Y can be classified correctly. Therefore, teacher models 1 and 2 both satisfy objective A described above.
  • Furthermore, the discrimination boundaries F1 and F2 classify most of the regions other than areas 1 and 2 (the white regions) into different classes. Therefore, teacher models 1 and 2 also satisfy objective B above. That is, the discrimination boundaries F1 and F2 correctly classify classes X and Y of the target domain and classify most of the remaining regions into different classes; teacher models 1 and 2 therefore satisfy objectives A and B simultaneously.
  • If another teacher model 3 were generated in addition to teacher models 1 and 2, its discrimination boundary F3 would, as shown in FIG. 3, divide areas 1 and 2 into different regions in the same way as discrimination boundaries F1 and F2, while dividing the regions other than areas 1 and 2 into two regions different from those of F1 and F2.
  • a plurality of teacher models learned in this way are used in the learning of the data generation unit and in the learning of the student models, which will be described later.
  • FIG. 4 schematically shows the learning method of the teacher model.
  • N teacher models 20-1 to 20-N are learned.
  • Each teacher model 20-1 to 20-N is a model using a neural network.
  • In the following description, when the individual teacher models 20-1 to 20-N need not be distinguished, they may simply be referred to as the "teacher model 20".
  • elements to be learned are shown in gray.
  • learning data is input to each teacher model 20-1 to 20-N.
  • This learning data is learning data of the target domain, and a correct label is prepared. That is, this training data includes images obtained in the target domain and correct labels for the images.
  • Each teacher model 20-1 to 20-N outputs predicted labels 1 to N for the input image, respectively.
  • the learning device 10 learns the teacher model 20-1 so that the error between the predicted label 1 output by the teacher model 20-1 and the correct label prepared as learning data is minimized.
  • the learning device 10 also performs the same processing for the other teacher models 20-2 to 20-N to learn each of the teacher models 20-2 to 20-N.
  • each of the teacher models 20-1 to 20-N is learned to correctly predict the image data of the target domain.
  • Unknown data is data that is unknown to the teacher model, that is, data that is not used for learning of the teacher model.
  • the unknown data is an image other than the image of the target domain, and no correct label is prepared.
  • Each teacher model 20-1 to 20-N outputs prediction labels 1 to N, respectively, for the input unknown data.
  • the learning device 10 learns the teacher model 20-1 so that the degree of mismatch between the predicted label 1 and the other predicted labels 2 to N is maximized.
  • the learning device 10 also performs the same processing for the other teacher models 20-2 to 20-N to learn each of the teacher models 20-2 to 20-N.
  • As a result, each teacher model 20-1 to 20-N is trained so that its predicted labels for unknown data, that is, images of domains other than the target domain (hereinafter also referred to as "non-target domains"), disagree with those of the other teacher models as much as possible. This satisfies objective B above.
  • a data generator generates an image using a GAN.
  • the GAN is trained so that the image generated by the data generation unit is close to the image of the target domain.
  • Specifically, a consistency loss is added as a loss function when training the GAN. That is, when an image generated by the GAN is input to the plurality of teacher models, a loss is added that becomes smaller as the output distributions of the teacher models agree more closely.
  • Through the teacher model training described above, each teacher model outputs prediction labels that agree closely for images of the target domain and disagree for images of non-target domains.
  • the learning device 10 inputs images generated by the GAN to each teacher model, and calculates consistency loss based on the predicted label output by each teacher model. Then, the learning device 10 learns the GAN so as to generate an image with a small consistency loss. As a result, the GAN is trained to output an image close to the image of the target domain.
  • FIG. 5 shows the functional configuration of the learning device 10 when learning the data generator.
  • the learning device 10 includes a random number generator 31, a data generation unit 32, teacher models 20-1 to 20-N, a label error minimization unit 33, and a label distribution determination unit 34.
  • the data generation unit 32 shown in gray is the object of learning.
  • the teacher models 20-1 to 20-N have already been trained by the method described above.
  • the random number generator 31 generates random number vectors and outputs them to the data generator 32 .
  • By using random number vectors, the data generator 32 can generate images with various variations.
  • the data generator 32 is implemented as a GAN. Unknown data is input to the data generator 32; as described above, this is image data of non-target domains. The unknown data is used to make the GAN learn what natural images look like, and images from a general-purpose image dataset such as ImageNet can be used. By using images from such a dataset as the unknown data, the data generator 32 can generate natural-looking images. In the sense that the GAN learns what natural images look like, the unknown data can also be regarded as auxiliary data or proxy data.
  • a pseudo-correct label D3 is input to the data generation unit 32.
  • the pseudo-correct label D3 is data specifying the class of the image generated by the data generation unit 32, and can be, for example, a class number.
  • the data generator 32 generates an image D1 of the class indicated by the pseudo-correct label D3 based on the input random number vector and the pseudo-correct label D3, and outputs it to the teacher models 20-1 to 20-N.
  • the data generator (GAN) 32 includes a generator and a discriminator. As its basic operation, the generator receives a random number vector and the pseudo-correct label D3 and generates the image D1. The image D1 or the unknown data is input to the discriminator. The discriminator is trained with the goal of distinguishing the images D1 generated by the generator from the unknown data, and the generator is trained with the goal of generating images D1 that the discriminator cannot distinguish. In this embodiment, in addition to this training, the generator is also trained using the label error minimizing unit 33 as described later.
  • the teacher models 20-1 to 20-N each perform prediction on the image D1 and output the predicted label D2 to the label error minimizing unit 33 and the label distribution determining unit 34.
  • the predicted label output by the teacher model 20 is hereinafter referred to as a "teacher predicted label”.
  • the label distribution determination unit 34 calculates the label distribution from the teacher prediction labels D2 input from the teacher models 20-1 to 20-N, determines the pseudo-correct label D3 so that the calculated distribution becomes uniform, and outputs it to the data generator 32. For example, when the teacher model 20 performs 10-class classification, each of the teacher models 20-1 to 20-N outputs a 10-class classification result as its teacher prediction label D2.
  • the label distribution determination unit 34 aggregates the teacher prediction labels D2 output by the teacher models 20-1 to 20-N and generates a pseudo-correct label D3 indicating the class of the image that the data generation unit 32 should generate next so that the distribution becomes uniform, and outputs it to the data generator 32.
  • the data generation unit 32 generates images so that the teacher prediction labels D2 output from the teacher models 20-1 to 20-N are evenly distributed.
  • the label distribution determining unit 34 outputs the pseudo-correct label D3 to the label error minimizing unit 33.
  • the label error minimizing unit 33 trains the data generating unit 32 using the teacher prediction labels D2 input from the teacher models 20-1 to 20-N and the pseudo-correct label D3. Specifically, the label error minimizing unit 33 calculates the error between the teacher prediction label D2 output by each of the teacher models 20-1 to 20-N and the pseudo-correct label D3, and optimizes the parameters of the neural network that constitutes the data generator 32 so that the sum of these errors is minimized.
  • In addition, the label error minimization unit 33 trains the data generation unit 32 based on the consistency loss described above. Specifically, the label error minimizing unit 33 calculates the consistency loss from the teacher prediction labels D2 output by the teacher models 20-1 to 20-N.
  • the consistency loss becomes smaller as the distributions of the teacher prediction labels D2 output by the teacher models 20 agree more closely. Therefore, the label error minimizing unit 33 trains the generator of the data generation unit 32 so that the consistency loss becomes small, that is, so that the distributions of the teacher prediction labels D2 become closer.
  • As a result, the data generating unit 32 is trained to generate images for which the distributions of the teacher prediction labels D2 output by the teacher models 20-1 to 20-N agree, that is, images close to those of the target domain.
  • FIG. 6 shows a configuration example of the label distribution determination unit 34.
  • the label distribution determining section 34 includes a cumulative probability density calculating section 35 , a weight calculating section 36 and a multiplier 37 .
  • a teacher prediction label D2 output from each teacher model 20-1 to 20-N is input to a cumulative probability density calculator 35 and a multiplier 37.
  • The cumulative probability density calculation unit 35 calculates the cumulative probability distribution of each class from the teacher prediction labels D2, obtains the cumulative probability density, and inputs it to the weight calculation unit 36.
  • the weight calculator 36 calculates a weight for each class so that the cumulative probability density of each class is uniform. For example, the weight calculator 36 may use the reciprocal of the cumulative probability density as the weight, or the user may arbitrarily determine the weight for some classes.
  • the multiplier 37 then multiplies the teacher prediction label D2 by a weight to determine a pseudo-correct label D3 for each piece of unknown data.
  • FIG. 7 shows the functional configuration of the learning device 10 when learning a student model.
  • the learning device 10 includes a random number generator 31 , a data generator 32 , teacher models 20 - 1 to 20 -N, a label distribution determiner 34 , a student model 40 and a distillation learning section 41 .
  • the student model 40 is the object of learning.
  • the teacher models 20-1 to 20-N and the data generator 32 have been trained by the methods described above. The random number generator 31 and the label distribution determining unit 34 are the same as those used when training the data generating unit shown in FIG. 5.
  • the data generation unit 32 uses the pseudo-correct label D3 and the random number vector from the random number generator 31 to generate the image D1, and outputs it to the teacher models 20-1 to 20-N and to the student model 40.
  • the student model 40 is constructed using a neural network like the teacher model.
  • Each of the teacher models 20-1 to 20-N outputs a teacher prediction label D2 for the image D1 to the distillation learning unit 41.
  • the student model 40 also outputs a predicted label (hereinafter also referred to as a “student predicted label”) D5 for the image D1 to the distillation learning unit 41 .
  • the distillation learning unit 41 trains the student model 40 so that the student model 40 approaches the teacher models 20. Specifically, the distillation learning unit 41 optimizes the parameters of the neural network that constitutes the student model 40 so that the sum of the errors between the student prediction label D5 and each teacher prediction label D2 and the pseudo-correct label D3 is minimized, as sketched below. In this way, the student model is trained by distillation.
  • the data generation unit 32 has been trained so that it can generate an image D1 close to the images of the target domain from unknown data. Therefore, even if the training data of the teacher models cannot be obtained, the student model 40 undergoes distillation learning using images D1 close to those of the target domain generated from unknown data, so the performance of the teacher models can be appropriately inherited.
  • the data generation unit 32 is an example of data generation means
  • the image D1 is an example of generated data.
  • the distillation learning unit 41 is an example of learning means
  • the label distribution determination unit 34 is an example of label distribution determination means.
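A minimal sketch of the distillation loss used by the distillation learning unit 41 is given below, assuming PyTorch logits. KL divergence to each teacher and cross entropy to the pseudo-correct label are illustrative choices of the error measure; the patent does not fix the specific error functions.

```python
# Hedged sketch of the distillation loss of the distillation learning unit 41:
# the sum of the errors between the student prediction label D5 and each
# teacher prediction label D2, plus the error to the pseudo-correct label D3.
import torch
import torch.nn.functional as F

def student_distillation_loss(student_logits, teacher_logits_list, pseudo_label):
    log_student = F.log_softmax(student_logits, dim=-1)
    # Error between the student prediction D5 and each teacher prediction D2.
    teacher_term = sum(
        F.kl_div(log_student, F.softmax(z, dim=-1), reduction="batchmean")
        for z in teacher_logits_list
    )
    # Error between the student prediction D5 and the pseudo-correct label D3.
    label_term = F.cross_entropy(student_logits, pseudo_label)
    return teacher_term + label_term
```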
  • FIG. 8 is a flowchart of the student model learning process performed by the learning device 10 shown in FIG. 7. This processing is realized by the processor 13 shown in FIG. 1 executing a program prepared in advance.
  • the label distribution determination unit 34 generates a pseudo-correct label D3 and outputs it to the data generation unit 32 (step S31).
  • the data generator 32 uses the random number vector to generate the image D1 of the class indicated by the input pseudo-correct label D3, and outputs it to the teacher model 20 and the student model 40 (step S32).
  • each teacher model 20 and student model 40 predict the image D1 and output a teacher prediction label D2 and a student prediction label D5 to the distillation learning unit 41 (step S33).
  • the distillation learning unit 41 learns the student model so that the error between the student prediction label D5 and each teacher prediction label D2 and pseudo-correct label D3 is minimized (step S34).
  • the processing of steps S31 to S34 is repeatedly executed until a predetermined end condition is satisfied, and when the predetermined end condition is satisfied (step S35: Yes), the processing ends.
  • Distillation learning is thus performed using images close to those of the target domain generated by the trained data generation unit 32. Therefore, even when unknown data is used, it is possible to obtain a student model that appropriately inherits the performance of the teacher models.
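The flow of FIG. 8 (steps S31 to S35) could be realized as a loop like the sketch below, reusing `student_distillation_loss` from the sketch above. The label distribution determination is represented by a hypothetical callable `sample_pseudo_labels`, and a fixed iteration count stands in for the end condition; both are assumptions for illustration.

```python
# Sketch of the student training loop of FIG. 8; the generator, teachers,
# student, and sample_pseudo_labels are assumed to exist as in the earlier
# sketches, and the end condition is simplified to a fixed iteration count.
import torch

def train_student(generator, teachers, student, sample_pseudo_labels, optimizer,
                  n_iters=10_000, z_dim=128, batch_size=64, device="cpu"):
    for _ in range(n_iters):                                   # end condition (S35)
        pseudo_label = sample_pseudo_labels(batch_size).to(device)  # S31: pseudo-correct labels D3
        z = torch.randn(batch_size, z_dim, device=device)
        images = generator(z, pseudo_label)                    # S32: generated images D1
        with torch.no_grad():
            teacher_logits = [t(images) for t in teachers]     # S33: teacher prediction labels D2
        student_logits = student(images)                       # S33: student prediction label D5
        loss = student_distillation_loss(student_logits, teacher_logits, pseudo_label)  # S34
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```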
  • FIG. 9 shows the functional configuration of a learning device 50 according to the second embodiment.
  • the hardware configuration of the learning device 50 is the same as that shown in FIG. 1.
  • the learning device 50 performs distillation learning using unknown data that the teacher models have not learned, and includes data generating means 52, learning means 53, and a student model 54 in addition to a plurality of teacher models 51.
  • a plurality of teacher models have already been trained, and the student model 54 is the subject of learning.
  • the data generating means 52 generates generated data based on the input pseudo-correct label. Specifically, the data generating means 52 generates data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label as the generated data.
  • the learning means 53 receives the generated data and performs distillation learning of the student model 54 using a plurality of teacher models 51 . Distillation learning can thus be performed using unknown data.
  • FIG. 10 is a flowchart of learning processing according to the second embodiment.
  • a plurality of trained teacher models are acquired (step S51).
  • generation data is generated based on the input pseudo-correct label (step S52).
  • the generated data is data such that each of a plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
  • distillation learning of the student model is performed using a plurality of teacher models (step S53).
  • The learning device described above, wherein the learning means inputs the generated data to the plurality of teacher models and the student model, and trains the student model using the teacher prediction labels output by the plurality of teacher models as correct labels.
  • Appendix 4 The learning device according to appendix 3, wherein the known input data is data used for learning the teacher model, and the unknown input data is data not used for learning the teacher model.
  • (Appendix 7) The learning device according to any one of appendices 1 to 6, wherein the learning means trains the student model so as to minimize the sum of the errors between the student prediction label output by the student model and the teacher prediction labels output by the plurality of teacher models and the error between the student prediction label and the pseudo-correct label.
  • (Appendix 8) The learning device according to any one of appendices 1 to 7, further comprising label distribution determination means for adjusting the values of the pseudo-correct labels so that the teacher prediction labels output by the plurality of teacher models are evenly distributed among the classes.
  • (Appendix 9) A learning method comprising: acquiring a plurality of trained teacher models; generating generated data based on an input pseudo-correct label; and performing distillation learning of a student model using the generated data as input and using the plurality of teacher models, wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
  • (Appendix 10) A recording medium recording a program that causes a computer to execute processing of: acquiring a plurality of trained teacher models; generating generated data based on an input pseudo-correct label; and performing distillation learning of a student model using the generated data as input and using the plurality of teacher models, wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.

Abstract

This learning device performs distillation learning using unknown data that a teacher model has not learned. A data generation means generates generated data on the basis of an input pseudo-correct label. Specifically, the data generation means generates, as the generated data, data such that each of a plurality of teacher models outputs a teacher prediction label close to the pseudo-correct label when the data is input into the teacher model. A learning means uses the plurality of teacher models to perform distillation learning of a student model using the generated data as input.

Description

LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM
The present invention relates to a neural network learning method using distillation.
In machine learning, a highly accurate learning model can be constructed by building a neural network with deep layers. Such a model is called deep learning and consists of several million to hundreds of millions of neural network units. In deep learning, it is known that the more complex the model and the deeper its layers, that is, the larger the number of units, the higher the accuracy. On the other hand, bloated models require more computer memory, so methods have been proposed for building smaller models while maintaining the performance of huge models.
Non-Patent Document 1 and Patent Document 1 describe a learning method called Knowledge Distillation (hereinafter "distillation"), in which a trained huge model (hereinafter the "teacher model") is imitated by a small model (hereinafter the "student model"). In this method, the data used to train the teacher model is input to both the teacher model and the student model, and the student model is trained so that its output approaches the weighted average of the predicted label output by the teacher model and the true label given in the training data. Because the learning method of Non-Patent Document 1 uses this weighted-average label, the same data used to train the teacher model is required when training the student model. However, deep learning requires a large amount of training data, and it can be difficult to retain the training data itself because of storage capacity limits, protection of privacy information contained in the data, and data copyright.
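As a concrete illustration, the weighted-average target described above can be written as a short loss function. The sketch below assumes PyTorch tensors of class logits; the mixing weight `alpha` is an illustrative parameter, not taken from the cited documents.

```python
# Hedged sketch of distillation with a weighted-average label: the student is
# pulled toward a mix of the teacher's predicted label distribution and the
# true (one-hot) label. `alpha` is illustrative.
import torch
import torch.nn.functional as F

def weighted_average_distillation_loss(student_logits, teacher_logits, true_label, alpha=0.5):
    teacher_prob = F.softmax(teacher_logits, dim=-1)               # teacher's predicted label
    true_prob = F.one_hot(true_label, teacher_prob.size(-1)).float()
    target = alpha * teacher_prob + (1.0 - alpha) * true_prob      # weighted-average label
    log_student = F.log_softmax(student_logits, dim=-1)
    return -(target * log_student).sum(dim=-1).mean()              # cross-entropy to the soft target
```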
Non-Patent Document 2 describes distillation learning that, instead of the data used to train the teacher model, uses data unknown to the teacher model, that is, data whose true label associated with the input data is unknown. This learning method trains the student model so that its output approaches the predicted label of the teacher model for the unknown data.
JP 2019-046380 A
In the learning method described in Non-Patent Document 2, images generated using a GAN (Generative Adversarial Network) are used to perform distillation learning from the teacher model to the student model. However, if the images generated with the GAN are far from the images of the target domain, no improvement in the performance of the student model can be expected.
One object of the present invention is to realize distillation learning that produces a high-performance student model using unknown data.
In one aspect of the present invention, a learning device comprises:
a plurality of trained teacher models;
data generation means for generating generated data based on an input pseudo-correct label, the generated data being data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label; and
learning means for performing distillation learning of a student model using the generated data as input and using the plurality of teacher models.
In another aspect of the present invention, a learning method comprises:
acquiring a plurality of trained teacher models;
generating generated data based on an input pseudo-correct label; and
performing distillation learning of a student model using the generated data as input and using the plurality of teacher models,
wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
In still another aspect of the present invention, a recording medium records a program that causes a computer to execute processing of:
acquiring a plurality of trained teacher models;
generating generated data based on an input pseudo-correct label; and
performing distillation learning of a student model using the generated data as input and using the plurality of teacher models,
wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
According to the present invention, it is possible to realize distillation learning that produces a high-performance student model using unknown data.
FIG. 1 shows the hardware configuration of a learning device according to the first embodiment.
FIG. 2 is a flowchart showing the overall flow of the learning process.
FIG. 3 is a diagram illustrating an example of discrimination boundaries of teacher models.
FIG. 4 schematically shows the method of training the teacher models.
FIG. 5 shows the functional configuration of the learning device when the data generation unit is trained.
FIG. 6 shows a configuration example of the label distribution determination unit.
FIG. 7 shows the functional configuration of the learning device when the student model is trained.
FIG. 8 is a flowchart of the student model learning process.
FIG. 9 shows the functional configuration of a learning device according to the second embodiment.
FIG. 10 is a flowchart of the learning process according to the second embodiment.
Preferred embodiments of the present invention are described below with reference to the drawings.
<First Embodiment>
[Basic Concept]
In general, when a student model is trained with the distillation technique (hereinafter also called "distillation learning"), the training data used to train the teacher model is used to train the student model. When that training data is not available, the student model is trained with images generated using a GAN or the like. However, if the images generated with the GAN are far from the images of the target domain, no performance improvement of the student model by distillation learning can be expected. In this embodiment, therefore, the performance of the student model obtained by distillation learning is improved by bringing the images generated by the GAN close to the domain in which the teacher models were trained, that is, the target domain.
[Hardware Configuration]
FIG. 1 is a block diagram showing the hardware configuration of the learning device according to the first embodiment. As illustrated, the learning device 10 includes an interface (I/F) 12, a processor 13, a memory 14, a recording medium 15, and a database (DB) 16.
The interface 12 inputs and outputs data to and from external devices. Specifically, the interface 12 acquires the training data and the unknown data used by the learning device 10 from an external device.
The processor 13 is a computer such as a CPU (Central Processing Unit) and controls the learning device 10 as a whole by executing a program prepared in advance. The processor 13 may instead be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The processor 13 executes the learning process described later.
The memory 14 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 14 stores the neural network models used by the learning device 10, specifically the teacher models and the student model. The memory 14 is also used as working memory while the processor 13 executes various processes.
The recording medium 15 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the learning device 10. The recording medium 15 records the various programs executed by the processor 13. When the learning device 10 executes various processes, a program recorded in the recording medium 15 is loaded into the memory 14 and executed by the processor 13. The database 16 stores the data input via the interface 12.
[Overview of the Learning Process]
Next, an overview of the learning process performed by the learning device 10 is given. FIG. 2 is a flowchart showing the overall flow of the learning process. The learning process consists broadly of teacher model training (step S10), data generation unit training (step S20), and student model training (step S30).
In teacher model training, a plurality of teacher models are trained using data obtained at a plurality of sites (domains), which yields a plurality of trained teacher models. In data generation unit training, the trained teacher models are used to train the data generation unit, which generates the data used to train the student model; the data generation unit generates images using a GAN. In student model training, the student model is trained by distillation using the trained teacher models and the trained data generation unit. Each step is described in detail below.
[Teacher Model Training]
First, the training of the teacher models is described.
(Basic concept)
In teacher model training, a teacher model is trained at each site (target domain) using the images obtained at that site. That is, a teacher model is trained for each target domain, giving a plurality of teacher models corresponding to a plurality of target domains. Each teacher model is trained to satisfy the following two objectives simultaneously.
Objective A: Achieve high performance on images of the target domain. This is the same as ordinary training.
Objective B: For images outside the target domain, make the output distributions of the teacher models differ from one another as much as possible. That is, each teacher model is trained to deliberately increase the degree of disagreement of its outputs for images outside the target domain.
An example of teacher models that satisfy objectives A and B simultaneously is described next. FIG. 3 shows an example of a feature distribution map. In this example, class X and class Y are to be classified in a certain target domain. Features belonging to area 1 on the distribution map are classified into class X, and features belonging to area 2 are classified into class Y.
Here, the discrimination boundaries of teacher models 1 and 2 that simultaneously satisfy objectives A and B are denoted F1 and F2, respectively. First, since discrimination boundaries F1 and F2 both separate area 1 and area 2 into different regions, classes X and Y are classified correctly, so teacher models 1 and 2 both satisfy objective A. Furthermore, discrimination boundaries F1 and F2 classify most of the regions other than areas 1 and 2 (the white regions) into different classes, so teacher models 1 and 2 also satisfy objective B. That is, discrimination boundaries F1 and F2 correctly classify classes X and Y of the target domain and classify most of the remaining regions into different classes; teacher models 1 and 2 therefore satisfy objectives A and B simultaneously.
If another teacher model 3 were generated in addition to teacher models 1 and 2, its discrimination boundary F3 would, as shown in FIG. 3, divide areas 1 and 2 into different regions in the same way as discrimination boundaries F1 and F2, while dividing the regions other than areas 1 and 2 into two regions different from those of F1 and F2. The plurality of teacher models trained in this way are used in the training of the data generation unit and of the student model described later.
(Teacher model training method)
FIG. 4 schematically shows the method of training the teacher models. In this example, N teacher models 20-1 to 20-N are trained. Each teacher model 20-1 to 20-N is a neural network model. In the following description, when the individual teacher models 20-1 to 20-N need not be distinguished, they may simply be referred to as the "teacher model 20". In the drawings, the elements being trained are shown in gray.
First, as shown in FIG. 4(A), training data is input to each teacher model 20-1 to 20-N. This training data is training data of the target domain, and correct labels are provided; that is, it consists of images obtained in the target domain and the correct labels for those images. Each teacher model 20-1 to 20-N outputs a predicted label 1 to N for the input image.
The learning device 10 trains the teacher model 20-1 so that the error between the predicted label 1 output by the teacher model 20-1 and the correct label provided in the training data is minimized. The learning device 10 performs the same processing for the other teacher models 20-2 to 20-N. As a result, each teacher model 20-1 to 20-N is trained to make correct predictions for image data of the target domain, satisfying objective A above.
Next, as shown in FIG. 4(B), unknown data is input to each teacher model 20-1 to 20-N. Unknown data is data that is unknown to the teacher models, that is, data not used for training them. Specifically, the unknown data consists of images other than those of the target domain, and no correct labels are provided. Each teacher model 20-1 to 20-N outputs a predicted label 1 to N for the input unknown data. The learning device 10 trains the teacher model 20-1 so that the degree of disagreement between the predicted label 1 and the other predicted labels 2 to N is maximized, and performs the same processing for the other teacher models 20-2 to 20-N. As a result, each teacher model 20-1 to 20-N is trained so that its predicted labels for unknown data, that is, images of domains other than the target domain (hereinafter also called "non-target domains"), disagree with those of the other teacher models as much as possible. This satisfies objective B above.
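As an illustration of how objectives A and B might be combined in one update, the sketch below trains N teacher models with a cross-entropy term on target-domain data and a disagreement term on unknown data. The mean pairwise L1 discrepancy between predicted distributions and the weight `lam` are assumptions for illustration; the patent leaves the exact discrepancy measure open (cf. the Saito et al. reference cited below).

```python
# Hedged sketch of two-objective teacher training (objectives A and B),
# assuming PyTorch classifier models that map images to class logits.
import itertools
import torch
import torch.nn.functional as F

def teacher_training_loss(teachers, labeled_x, labels, unknown_x, lam=1.0):
    # Objective A: each teacher should predict the target-domain labels correctly.
    loss_a = sum(F.cross_entropy(t(labeled_x), labels) for t in teachers)
    # Objective B: maximize disagreement of the predictions on unknown data,
    # implemented by subtracting the pairwise discrepancy from the loss.
    probs = [F.softmax(t(unknown_x), dim=-1) for t in teachers]
    pairs = [(p - q).abs().mean() for p, q in itertools.combinations(probs, 2)]
    discrepancy = torch.stack(pairs).mean()
    return loss_a - lam * discrepancy
```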
As a method of training with unknown data in this way, for example, the technique described in the following document can be used:
"Maximum Classifier Discrepancy for Unsupervised Domain Adaptation", Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3723-3732.
Alternatively, a loss function expressing the degree of disagreement among the predicted labels output by the teacher models may be defined and added to the loss function used in ordinary training with labeled data.
In the description above, two kinds of training are performed in sequence: objective A is satisfied by training with the labeled data of the target domain, and objective B is then satisfied by training with the unknown data of the non-target domains. Alternatively, the labeled data and the unknown data may be mixed and input to each teacher model 20, and each teacher model 20 may be trained to satisfy objectives A and B at the same time.
[Training of the Data Generation Unit]
Next, the training of the data generation unit is described.
(Basic concept)
The data generation unit generates images using a GAN. In this embodiment, the GAN is trained so that the images it generates become close to the images of the target domain. Specifically, a consistency loss is added as a loss function when training the GAN: when an image generated by the GAN is input to the plurality of teacher models, a loss is added that becomes smaller as the output distributions of the teacher models agree more closely. Through the teacher model training described above, each teacher model outputs prediction labels that agree closely for images of the target domain and disagree for images of non-target domains. Therefore, when the prediction labels output by the teacher models for a given image agree closely (the consistency loss is small), the image is considered close to the target domain; conversely, when the prediction labels disagree (the consistency loss is large), the image is considered far from the target domain.
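One possible form of the consistency loss described above is sketched below: the mean pairwise squared distance between the teachers' output distributions for a generated image, which is small when the distributions agree. The exact functional form is not specified in the patent and is an assumption here.

```python
# Hedged sketch of a consistency loss over the teacher models' outputs.
import itertools
import torch
import torch.nn.functional as F

def consistency_loss(teacher_logits_list):
    probs = [F.softmax(z, dim=-1) for z in teacher_logits_list]
    pair_losses = [((p - q) ** 2).sum(dim=-1).mean()
                   for p, q in itertools.combinations(probs, 2)]
    return torch.stack(pair_losses).mean()   # small when the teachers agree
```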
Accordingly, the learning device 10 inputs the images generated by the GAN to the teacher models and computes the consistency loss from the prediction labels they output. The learning device 10 then trains the GAN to generate images for which the consistency loss is small. As a result, the GAN is trained to output images close to those of the target domain.
(Functional configuration)
FIG. 5 shows the functional configuration of the learning device 10 when the data generation unit is trained. The learning device 10 includes a random number generator 31, a data generation unit 32, the teacher models 20-1 to 20-N, a label error minimization unit 33, and a label distribution determination unit 34. Here, the data generation unit 32, shown in gray, is the object of training; the teacher models 20-1 to 20-N have already been trained by the method described above.
The random number generator 31 generates random number vectors and outputs them to the data generation unit 32. By using random number vectors, the data generation unit 32 can generate images with various variations. The data generation unit 32 is implemented as a GAN. Unknown data is input to the data generation unit 32; as described above, this is image data of non-target domains. The unknown data is used to make the GAN learn what natural images look like, and images from a general-purpose image dataset such as ImageNet can be used. By using images from such a dataset as the unknown data, the data generation unit 32 can generate natural-looking images. In the sense that the GAN learns what natural images look like, the unknown data can also be regarded as auxiliary data or proxy data.
A pseudo-correct label D3 is also input to the data generation unit 32. The pseudo-correct label D3 is data specifying the class of the image to be generated by the data generation unit 32, for example a class number. Based on the input random number vector and the pseudo-correct label D3, the data generation unit 32 generates an image D1 of the class indicated by the pseudo-correct label D3 and outputs it to the teacher models 20-1 to 20-N.
The data generation unit (GAN) 32 includes a generator and a discriminator. As its basic operation, the generator receives a random number vector and the pseudo-correct label D3 and generates the image D1. The image D1 or the unknown data is input to the discriminator. The discriminator is trained with the goal of distinguishing the images D1 generated by the generator from the unknown data, and the generator is trained with the goal of generating images D1 that the discriminator cannot distinguish. In this embodiment, in addition to this training, the generator is also trained using the label error minimization unit 33 as described later.
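A minimal sketch of the generator of the data generation unit 32 as a class-conditional GAN generator in PyTorch follows; the layer sizes and the label embedding are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of a conditional generator: it takes a random number vector
# and the pseudo-correct label D3 and produces an image D1 of that class.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=128, n_classes=10, img_dim=3 * 32 * 32):
        super().__init__()
        self.embed = nn.Embedding(n_classes, z_dim)  # embeds the pseudo-correct label D3
        self.net = nn.Sequential(
            nn.Linear(z_dim * 2, 512),
            nn.ReLU(),
            nn.Linear(512, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, pseudo_label):
        # Concatenate the random number vector with the label embedding and
        # generate an image D1 of the class indicated by the pseudo-correct label.
        h = torch.cat([z, self.embed(pseudo_label)], dim=-1)
        return self.net(h)
```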
 教師モデル20-1~20-Nは、それぞれ画像D1に対して予測を行い、予測ラベルD2をラベル誤差最小化部33及びラベル分布決定部34へ出力する。以下、教師モデル20が出力する予測ラベルを「教師予測ラベル」と呼ぶ。ラベル分布決定部34は、教師モデル20-1~20-Nから入力される教師予測ラベルD2に基づいてラベルの分布を算出し、算出された分布が均等となるように疑似正解ラベルD3を決定してデータ生成部32へ出力する。例えば、教師モデル20が10クラスの分類を行う場合、各教師モデル20-1~20-Nは10クラスの分類結果を教師予測ラベルD2として出力する。ラベル分布決定部34は、教師モデル20-1~20-Nが出力した教師予測ラベルD2を集計し、その分布が均等となるように、次にデータ生成部32が生成すべき画像のクラスを示す疑似正解ラベルD3を生成してデータ生成部32へ出力する。これにより、データ生成部32は、教師モデル20-1~20-Nが出力する教師予測ラベルD2の分布が均等となるように画像を生成するようになる。 The teacher models 20-1 to 20-N each perform prediction on the image D1 and output the predicted label D2 to the label error minimizing unit 33 and the label distribution determining unit 34. The predicted label output by the teacher model 20 is hereinafter referred to as a "teacher predicted label". The label distribution determination unit 34 calculates the label distribution based on the teacher prediction labels D2 input from the teacher models 20-1 to 20-N, and determines the pseudo-correct label D3 so that the calculated distribution is uniform. and output to the data generator 32 . For example, when the teacher model 20 performs 10-class classification, each of the teacher models 20-1 to 20-N outputs the 10-class classification result as the teacher prediction label D2. The label distribution determination unit 34 aggregates the teacher prediction labels D2 output by the teacher models 20-1 to 20-N, and determines the class of the image to be generated by the data generation unit 32 next so that the distribution is uniform. A pseudo-correct label D3 shown is generated and output to the data generator 32 . As a result, the data generation unit 32 generates images so that the teacher prediction labels D2 output from the teacher models 20-1 to 20-N are evenly distributed.
The label distribution determination unit 34 also outputs the pseudo-correct label D3 to the label error minimization unit 33. The label error minimization unit 33 trains the data generator 32 using the teacher prediction labels D2 input from the teacher models 20-1 to 20-N and the pseudo-correct label D3. Specifically, the label error minimization unit 33 calculates the error between the teacher prediction label D2 output by each of the teacher models 20-1 to 20-N and the pseudo-correct label D3, and optimizes the parameters of the neural network constituting the data generator 32 so that the sum of these errors is minimized.
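The sum-of-errors objective described here can be sketched as follows, assuming the teacher outputs are logits, the pseudo-correct label D3 is a class index, and cross-entropy is used as the per-teacher error; the disclosure does not fix the error measure.

```python
import torch.nn.functional as F

def label_error_loss(teacher_logits_list, pseudo_label):
    # Sum over teachers of the error between the teacher prediction label D2 and the
    # pseudo-correct label D3; minimized with respect to the generator's parameters only.
    return sum(F.cross_entropy(logits, pseudo_label) for logits in teacher_logits_list)
```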
In addition, the label error minimization unit 33 trains the data generator 32 based on the consistency loss described above. Specifically, the label error minimization unit 33 calculates the consistency loss based on the teacher prediction labels D2 output by the teacher models 20-1 to 20-N. The consistency loss is a loss that becomes smaller as the distributions of the teacher prediction labels D2 output by the plurality of teacher models 20 agree with each other. The label error minimization unit 33 therefore trains the generator of the data generator 32 so that the consistency loss becomes small, that is, so that the distributions of the teacher prediction labels D2 output by the teacher models 20-1 to 20-N become closer to each other. As a result, the data generator 32 is trained to generate images for which the distributions of the teacher prediction labels D2 output by the teacher models 20-1 to 20-N agree, that is, images close to those of the target domain.
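One possible form of such a consistency loss is sketched below; measuring agreement as the variance of the teachers' softmax outputs is an assumption, since the disclosure only requires a loss that shrinks as the distributions coincide.

```python
import torch
import torch.nn.functional as F

def consistency_loss(teacher_logits_list):
    # Stack the N teachers' predicted distributions and penalize their spread:
    # the loss is zero when all teachers output identical distributions.
    probs = torch.stack([F.softmax(l, dim=1) for l in teacher_logits_list])  # (N, batch, classes)
    return probs.var(dim=0, unbiased=False).mean()
```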
FIG. 6 shows a configuration example of the label distribution determination unit 34. The label distribution determination unit 34 includes a cumulative probability density calculation unit 35, a weight calculation unit 36, and a multiplier 37. The teacher prediction labels D2 output from the teacher models 20-1 to 20-N are input to the cumulative probability density calculation unit 35 and the multiplier 37. The cumulative probability density calculation unit 35 calculates the cumulative probability distribution of each class from the teacher prediction labels D2, obtains the cumulative probability density, and inputs it to the weight calculation unit 36. The weight calculation unit 36 calculates a weight for each class so that the cumulative probability densities of the classes become uniform. For example, the weight calculation unit 36 may use the reciprocal of the cumulative probability density as the weight, or the user may arbitrarily determine the weights for some classes. The multiplier 37 then multiplies the teacher prediction labels D2 by the weights to determine the pseudo-correct label D3 for each piece of unknown data.
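A minimal sketch of this unit is given below, assuming the teacher outputs are logits and the cumulative probability density is accumulated over past batches; the exact accumulation scheme and the reciprocal weighting are assumptions consistent with, but not mandated by, the description above.

```python
import torch
import torch.nn.functional as F

class LabelDistributionDeterminer:
    """Sketch of units 35-37: cumulative density -> per-class weights -> weighted pseudo-correct label D3."""
    def __init__(self, num_classes=10):
        self.cumulative = torch.full((num_classes,), 1e-6)  # cumulative probability density per class

    def pseudo_label(self, teacher_logits_list):
        # unit 35: average the teachers' predicted distributions and accumulate per-class probability mass
        probs = torch.stack([F.softmax(l, dim=1) for l in teacher_logits_list]).mean(dim=0)
        self.cumulative += probs.mean(dim=0)
        # unit 36: reciprocal of the cumulative density, so under-represented classes receive larger weights
        weight = 1.0 / self.cumulative
        # unit 37 (multiplier): weight the teacher predictions and renormalize into a pseudo-correct label D3
        weighted = probs * weight
        return weighted / weighted.sum(dim=1, keepdim=True)
```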
[Student model learning]
Next, the learning of the student model will be described.
(Functional configuration)
FIG. 7 shows the functional configuration of the learning device 10 when training the student model. The learning device 10 includes the random number generator 31, the data generator 32, the teacher models 20-1 to 20-N, the label distribution determination unit 34, a student model 40, and a distillation learning unit 41. Here, the student model 40 is the object of learning. The teacher models 20-1 to 20-N and the data generator 32 have already been trained by the learning methods described above. The random number generator 31 and the label distribution determination unit 34 are the same as those used when training the data generator shown in FIG. 5.
When the pseudo-correct label D3 is input from the label distribution determination unit 34, the data generator 32 generates an image D1 using the pseudo-correct label D3 and the random number vector from the random number generator 31, and outputs it to the teacher models 20-1 to 20-N and the student model 40. Like the teacher models, the student model 40 is constructed using a neural network.
Each of the teacher models 20-1 to 20-N outputs a teacher prediction label D2 for the image D1 to the distillation learning unit 41. The student model 40 outputs a predicted label (hereinafter also referred to as a "student prediction label") D5 for the image D1 to the distillation learning unit 41. The distillation learning unit 41 trains the student model 40 so that it approaches the teacher models 20. Specifically, the distillation learning unit 41 optimizes the parameters of the neural network constituting the student model 40 so that the sum of the errors between the student prediction label D5 and each teacher prediction label D2 and the error between the student prediction label D5 and the pseudo-correct label D3 is minimized. In this way, the student model is trained by distillation.
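This distillation objective can be sketched as follows; the use of KL divergence for the teacher terms, cross-entropy for the pseudo-correct-label term, and a temperature parameter are assumptions, since the disclosure only requires minimizing the sum of both kinds of error (here the pseudo-correct label D3 is assumed to be a class index).

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, pseudo_label, temperature=1.0):
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    # error between the student prediction label D5 and each teacher prediction label D2
    teacher_term = sum(
        F.kl_div(log_p_student, F.softmax(t / temperature, dim=1), reduction="batchmean")
        for t in teacher_logits_list
    )
    # error between the student prediction label D5 and the pseudo-correct label D3
    pseudo_term = F.cross_entropy(student_logits, pseudo_label)
    return teacher_term + pseudo_term
```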
As described above, the data generator 32 has been trained so that it can generate images D1 close to the target-domain images based on the unknown data. Therefore, even when the training data of the teacher models is not available, the student model 40 is trained by distillation using images D1 close to the target-domain images generated from the unknown data, and can thus appropriately inherit the performance of each teacher model 20.
In the above configuration, the data generator 32 is an example of data generation means, and the image D1 is an example of generated data. The distillation learning unit 41 is an example of learning means, and the label distribution determination unit 34 is an example of label distribution determination means.
(Student model learning process)
FIG. 8 is a flowchart of the student model learning process performed by the learning device 10 shown in FIG. 7. This process is realized by the processor 13 shown in FIG. 1 executing a program prepared in advance.
First, the label distribution determination unit 34 generates a pseudo-correct label D3 and outputs it to the data generator 32 (step S31). Using a random number vector, the data generator 32 generates an image D1 of the class indicated by the input pseudo-correct label D3 and outputs it to the teacher models 20 and the student model 40 (step S32). Next, each teacher model 20 and the student model 40 perform prediction on the image D1 and output the teacher prediction labels D2 and the student prediction label D5 to the distillation learning unit 41 (step S33).
Next, the distillation learning unit 41 trains the student model so that the errors between the student prediction label D5 and each teacher prediction label D2 and the pseudo-correct label D3 are minimized (step S34). The processing of steps S31 to S34 is repeated until a predetermined end condition is satisfied; when the end condition is satisfied (step S35: Yes), the process ends.
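The loop of steps S31 to S35 can be sketched as follows; `distillation_loss` refers to the sketch above, `label_determiner.sample` is a hypothetical helper returning pseudo-correct class indices, and a fixed number of iterations stands in for the unspecified end condition.

```python
import torch

def train_student(generator, teachers, student, label_determiner, optimizer,
                  max_steps=10000, noise_dim=128, batch_size=64):
    for _ in range(max_steps):                                    # S35: repeat until the end condition holds
        pseudo_label = label_determiner.sample(batch_size)        # S31: pseudo-correct label D3
        z = torch.randn(batch_size, noise_dim)
        with torch.no_grad():
            image = generator(z, pseudo_label)                    # S32: generated image D1
            teacher_logits = [t(image) for t in teachers]         # S33: teacher prediction labels D2
        student_logits = student(image)                           # S33: student prediction label D5
        loss = distillation_loss(student_logits, teacher_logits, pseudo_label)  # S34
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```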
As described above, in the student model learning process, distillation learning is performed using images close to the target-domain images generated by the trained data generator 32. Therefore, even when unknown data is used, a student model that appropriately inherits the performance of the teacher models can be obtained.
[Second embodiment]
Next, a second embodiment of the present invention will be described. FIG. 9 shows the functional configuration of a learning device 50 according to the second embodiment. The hardware configuration of the learning device 50 is the same as that shown in FIG. 1.
The learning device 50 performs distillation learning using unknown data that the teacher models have not learned and, as illustrated, includes a plurality of teacher models 51, a data generation means 52, a learning means 53, and a student model 54. The plurality of teacher models have already been trained, and the student model 54 is the object of learning. The data generation means 52 generates generated data based on an input pseudo-correct label. Specifically, the data generation means 52 generates, as the generated data, data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label. The learning means 53 receives the generated data as input and performs distillation learning of the student model 54 using the plurality of teacher models 51. In this way, distillation learning can be performed using unknown data.
FIG. 10 is a flowchart of the learning process according to the second embodiment. First, a plurality of trained teacher models are acquired (step S51). Next, generated data is generated based on an input pseudo-correct label (step S52). Here, the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label. Then, with the generated data as input, distillation learning of the student model is performed using the plurality of teacher models (step S53).
Some or all of the above embodiments can also be described as in the following appendices, but are not limited to the following.
(Appendix 1)
A learning device comprising:
a plurality of trained teacher models;
data generation means for generating generated data based on an input pseudo-correct label, the generated data being such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label; and
learning means for performing distillation learning of a student model using the generated data as input and using the plurality of teacher models.
(Appendix 2)
The learning device according to Appendix 1, wherein the learning means inputs the generated data to the plurality of teacher models and the student model, and trains the student model using the teacher prediction labels output by the plurality of teacher models as correct labels.
(Appendix 3)
The learning device according to Appendix 1 or 2, wherein the plurality of teacher models have been trained so that the teacher prediction label each of them outputs for known input data becomes close to the correct label and the degree of disagreement among the teacher prediction labels each of them outputs for unknown input data is maximized.
(Appendix 4)
The learning device according to Appendix 3, wherein the known input data is data used for training the teacher models, and the unknown input data is data not used for training the teacher models.
(Appendix 5)
The learning device according to Appendix 3 or 4, wherein the known input data is data of a target domain, and the unknown input data is data other than the data of the target domain.
(Appendix 6)
The learning device according to any one of Appendices 1 to 5, wherein the data generation means has been trained to minimize a loss function that becomes smaller as the distributions of the teacher prediction labels output by the plurality of teacher models agree with each other when the generated data is input to the plurality of teacher models.
(Appendix 7)
The learning device according to any one of Appendices 1 to 6, wherein the learning means trains the student model so as to minimize the sum of the error between the student prediction label output by the student model and the teacher prediction labels output by the plurality of teacher models and the error between the student prediction label and the pseudo-correct label.
(Appendix 8)
The learning device according to any one of Appendices 1 to 7, further comprising label distribution determination means for adjusting the value of the pseudo-correct label so that the teacher prediction labels output by the plurality of teacher models are evenly distributed over the classes.
(Appendix 9)
A learning method comprising:
acquiring a plurality of trained teacher models;
generating generated data based on an input pseudo-correct label; and
performing distillation learning of a student model using the generated data as input and using the plurality of teacher models,
wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
(Appendix 10)
A recording medium recording a program for causing a computer to execute a process comprising:
acquiring a plurality of trained teacher models;
generating generated data based on an input pseudo-correct label; and
performing distillation learning of a student model using the generated data as input and using the plurality of teacher models,
wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
REFERENCE SIGNS LIST
10 Learning device
20 Teacher model
31 Random number generator
32 Data generator
33 Label error minimization unit
34 Label distribution determination unit
40 Student model
41 Distillation learning unit

Claims (10)

  1.  A learning device comprising:
     a plurality of trained teacher models;
     data generation means for generating generated data based on an input pseudo-correct label, the generated data being such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label; and
     learning means for performing distillation learning of a student model using the generated data as input and using the plurality of teacher models.
  2.  The learning device according to claim 1, wherein the learning means inputs the generated data to the plurality of teacher models and the student model, and trains the student model using the teacher prediction labels output by the plurality of teacher models as correct labels.
  3.  The learning device according to claim 1 or 2, wherein the plurality of teacher models have been trained so that the teacher prediction label each of them outputs for known input data becomes close to the correct label and the degree of disagreement among the teacher prediction labels each of them outputs for unknown input data is maximized.
  4.  The learning device according to claim 3, wherein the known input data is data used for training the teacher models, and the unknown input data is data not used for training the teacher models.
  5.  The learning device according to claim 3 or 4, wherein the known input data is data of a target domain, and the unknown input data is data other than the data of the target domain.
  6.  The learning device according to any one of claims 1 to 5, wherein the data generation means has been trained to minimize a loss function that becomes smaller as the distributions of the teacher prediction labels output by the plurality of teacher models agree with each other when the generated data is input to the plurality of teacher models.
  7.  The learning device according to any one of claims 1 to 6, wherein the learning means trains the student model so as to minimize the sum of the error between the student prediction label output by the student model and the teacher prediction labels output by the plurality of teacher models and the error between the student prediction label and the pseudo-correct label.
  8.  The learning device according to any one of claims 1 to 7, further comprising label distribution determination means for adjusting the value of the pseudo-correct label so that the teacher prediction labels output by the plurality of teacher models are evenly distributed over the classes.
  9.  A learning method comprising:
     acquiring a plurality of trained teacher models;
     generating generated data based on an input pseudo-correct label; and
     performing distillation learning of a student model using the generated data as input and using the plurality of teacher models,
     wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
  10.  A recording medium recording a program for causing a computer to execute a process comprising:
     acquiring a plurality of trained teacher models;
     generating generated data based on an input pseudo-correct label; and
     performing distillation learning of a student model using the generated data as input and using the plurality of teacher models,
     wherein the generated data is data such that each of the plurality of teacher models to which the generated data is input outputs a teacher prediction label close to the pseudo-correct label.
PCT/JP2021/003058 2021-01-28 2021-01-28 Learning device, learning method, and recording medium WO2022162839A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022577920A JPWO2022162839A5 (en) 2021-01-28 Learning devices, learning methods, and programs
PCT/JP2021/003058 WO2022162839A1 (en) 2021-01-28 2021-01-28 Learning device, learning method, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/003058 WO2022162839A1 (en) 2021-01-28 2021-01-28 Learning device, learning method, and recording medium

Publications (1)

Publication Number Publication Date
WO2022162839A1 true WO2022162839A1 (en) 2022-08-04

Family

ID=82652722

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/003058 WO2022162839A1 (en) 2021-01-28 2021-01-28 Learning device, learning method, and recording medium

Country Status (1)

Country Link
WO (1) WO2022162839A1 (en)
