WO2021064787A1 - Learning system, learning device, and learning method

Learning system, learning device, and learning method

Info

Publication number
WO2021064787A1
Authority
WO
WIPO (PCT)
Prior art keywords
dnn
label
student
training data
feature amount
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2019/038498
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
亮 高本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to US17/762,418 (US20220343163A1)
Priority to PCT/JP2019/038498 (WO2021064787A1)
Priority to JP2021550747A (JP7468540B2)
Publication of WO2021064787A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning

Definitions

  • the present invention relates to a learning system and a learning device including a deep neural network, and a learning method using the deep neural network.
  • DNN Deep Neural Network
  • CNN Convolutional Neural Network
  • A DNN uses many parameters, so the amount of calculation required of the computer that realizes the DNN is large. As a result, it is difficult to apply a DNN to mobile terminals and other devices whose computing power (calculation speed and storage capacity) is relatively low.
  • Distillation (model compression) is a method of reducing the calculation cost while suppressing the decrease in accuracy.
  • In distillation, a model is first trained, for example by a supervised learning method, to create a teacher model.
  • A student model, which is a smaller model separate from the teacher model, is then trained using the output of the teacher model instead of the correct label (see, e.g., Patent Document 1).
  • the label may contain noise.
  • the noisy teacher data affects the accuracy of the DNN.
  • Patent Document 1 describes a student model trained using the output of the teacher model instead of the correct label, but Patent Document 1 does not consider teacher data containing noise.
  • Non-Patent Document 1 also describes a student model trained using the output of the teacher model instead of the correct label. However, in Non-Patent Document 1, no countermeasures for teacher data containing noise have been examined.
  • An object of the present invention is to provide a learning system, a learning device, and a learning method capable of efficiently causing a student DNN to learn information learned by a teacher DNN.
  • the learning system is a learning system using a teacher DNN and a student DNN having a size smaller than the size of the teacher DNN, and includes a teacher DNN feature amount extraction means for extracting the feature amount of each of a plurality of training data,
  • a teacher DNN estimation value calculation means for calculating the first estimation value of each label corresponding to each of the plurality of training data
  • a student DNN feature amount extraction means for extracting the feature amount of each of the plurality of training data.
  • a noise label correction means for determining whether or not the label corresponding to the training data contains noise, and
  • an update means that updates the weights in the student DNN so that, while reducing the influence of labels containing noise, there is no difference between the feature amount extracted by the teacher DNN feature amount extraction means and the feature amount extracted by the student DNN feature amount extraction means.
  • the learning device is a learning device that uses a student DNN, and is a student DNN feature amount extracting means for extracting a feature amount of input data and a student DNN for calculating a plurality of estimated values of labels corresponding to the input data.
  • a teacher DNN feature extraction means that includes an estimate calculation means and an output integration means that integrates a plurality of estimated values, and the weight of the student DNN feature extraction means extracts each feature of a plurality of training data.
  • the teacher DNN estimate calculation means that calculates the first estimate of each label corresponding to each of the training data, the label corresponding to the training data and the label corresponding to the training data make noise based on the label corresponding to the training data and the first estimate.
  • the learning method according to the present invention is a learning method using a teacher DNN and a student DNN having a size smaller than the size of the teacher DNN, in which the feature amount of each of a plurality of training data is extracted as a teacher DNN feature amount,
  • the first estimated value of each label corresponding to each of the plurality of training data is calculated, the feature amount of each of the plurality of training data is extracted as a student DNN feature amount, and the second estimated value of each label corresponding to each of the plurality of training data is calculated,
  • based on the label corresponding to the training data and the first estimated value, it is determined whether or not the label corresponding to the training data is a label containing noise, and
  • the weight in the student DNN is updated so that there is no difference between the extracted teacher DNN feature amount and the student DNN feature amount.
  • the recording medium is a computer-readable recording medium in which a learning program is stored, and the learning program includes a process of extracting each feature amount of a plurality of training data as a teacher DNN feature amount, and a plurality of trainings.
  • the processor is made to execute the process of updating the weight in the student DNN so that there is no difference between the teacher DNN feature amount and the student DNN feature amount.
  • the information learned by the teacher DNN can be efficiently learned by the student DNN.
  • the learning system of the first embodiment is a learning system to which the distillation method is applied.
  • FIG. 1 is a block diagram showing a configuration example of a learning system.
  • the learning system 200 of the present embodiment includes a data reading unit 201, a label reading unit 202, a teacher DNN feature amount extraction unit 203, a teacher DNN estimated value calculation unit 204, a student DNN feature amount extraction unit 205, and a student DNN. It includes an estimated value calculation unit 206, a student DNN feature amount learning unit 207, a noise label correction unit 208, a student DNN learning unit 209, an output integration unit 210, and an output unit 211.
  • Data such as images, sounds, and sentences are input to the data reading unit 201.
  • the input data is temporarily saved in the memory.
  • the data reading unit 201 outputs the input data to the teacher DNN feature amount extraction unit 203 and the student DNN feature amount extraction unit 205.
  • the label corresponding to the data input to the data reading unit 201 is input to the label reading unit 202.
  • the input label is temporarily saved in the memory.
  • the label reading unit 202 outputs the input label to the noise label correction unit 208 and the student DNN learning unit 209.
  • the teacher DNN feature amount extraction unit 203 converts the data input from the data reading unit 201 into the feature amount of the scalar array.
  • the teacher DNN estimated value calculation unit 204 calculates the label estimated value using the feature amount of the scalar array input from the teacher DNN feature amount extraction unit 203.
  • the student DNN feature amount extraction unit 205 converts the data input from the data reading unit 201 into the feature amount of the scalar array in the same manner as the teacher DNN feature amount extraction unit 203.
  • the student DNN estimated value calculation unit 206 calculates the label estimated value using the feature amount of the scalar array input from the student DNN feature amount extraction unit 205. Student DNN estimate calculation unit 206 outputs a plurality of estimates for statistical averaging. The student DNN estimated value calculation unit 206 outputs an estimated value of the output from the noise label correction unit 208, an estimated value of the output from the teacher DNN estimated value calculation unit 204, and the like.
  • the student DNN feature amount learning unit 207 receives the feature amount from each of the teacher DNN feature amount extraction unit 203 and the student DNN feature amount extraction unit 205, and calculates a function of the difference between them. Then, the student DNN feature learning unit 207 calculates a gradient that reduces the value of the function. Gradients are used to update the weights of the student DNN.
  • the noise label correction unit 208 compares the label value input from the label reading unit 202 with the label estimated value input from the teacher DNN estimation value calculation unit 204.
  • the noise label correction unit 208 regards a label having a large difference between the label value and the estimated label value as an erroneous label (label containing noise).
  • the noise label correction unit 208 corrects an incorrect label.
  • As a correction method, for example, the label estimation value input from the teacher DNN estimation value calculation unit 204 may be used as the correction label as it is.
  • the correction method is not limited to the method of using the label estimation value from the teacher DNN estimation value calculation unit 204 as the correction label as it is, and other methods may be used.
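  • The comparison and correction performed by the noise label correction unit 208 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `correct_noisy_labels`, the absolute-difference measure, and the threshold value are assumptions made for the example, and the correction simply substitutes the teacher estimate, which is the one method the text names.

```python
import numpy as np

def correct_noisy_labels(labels, teacher_estimates, threshold):
    """Flag labels that differ strongly from the teacher estimate and replace them.

    Hypothetical helper: labels and teacher_estimates have shape (N,) or (N, D).
    Returns (corrected_labels, is_noisy).
    """
    labels = np.asarray(labels, dtype=float)
    teacher_estimates = np.asarray(teacher_estimates, dtype=float)
    # Per-sample difference between the given label and the teacher estimate.
    diff = np.abs(labels - teacher_estimates)
    if diff.ndim > 1:
        # For vector-valued labels, use the largest component difference.
        diff = diff.reshape(len(diff), -1).max(axis=1)
    # A large difference marks the label as erroneous (noise label).
    is_noisy = diff > threshold
    corrected = labels.copy()
    # Use the teacher estimate as the correction label as it is.
    corrected[is_noisy] = teacher_estimates[is_noisy]
    return corrected, is_noisy

labels = np.array([1.0, 2.0, 9.0])
teacher = np.array([1.1, 1.9, 3.0])
corrected, is_noisy = correct_noisy_labels(labels, teacher, threshold=0.5)
```

  • In this example only the third label differs from the teacher estimate by more than the threshold, so only that label is replaced.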
  • the student DNN learning unit 209 inputs a label from the label reading unit 202, inputs a label estimated value from the teacher DNN estimated value calculation unit 204, and inputs a correction label from the noise label correction unit 208. Further, the student DNN learning unit 209 inputs a label estimated value from the student DNN estimated value calculation unit 206.
  • While referring to the correction label, the student DNN learning unit 209 calculates, for example, a function of the difference between the label estimated value from the teacher DNN estimated value calculation unit 204 and the label estimated value from the student DNN estimated value calculation unit 206 (the estimated value of the output from the teacher DNN estimated value calculation unit 204).
  • the student DNN learning unit 209 calculates a gradient that reduces the value of the function and uses it to update the weight of the student DNN. As functions, for example, mean square error, mean absolute value error, and Wing-Loss can be used.
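  • The three functions named above can be written down as follows. This is a sketch under assumptions: the text gives no formulas, so the Wing loss here follows the commonly published piecewise definition (logarithmic for small errors, linear for large ones), and the parameter values `w=10.0` and `eps=2.0` are illustrative defaults, not values from this document.

```python
import numpy as np

def mean_square_error(pred, target):
    d = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(d ** 2))

def mean_absolute_error(pred, target):
    d = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(np.abs(d)))

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss: w * ln(1 + |x| / eps) for |x| < w, otherwise |x| - c."""
    x = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    # The constant c joins the logarithmic and linear parts continuously at |x| = w.
    c = w - w * np.log(1.0 + w / eps)
    loss = np.where(x < w, w * np.log(1.0 + x / eps), x - c)
    return float(np.mean(loss))
```

  • Compared with the mean square error, the Wing loss gives relatively more weight to small and medium errors, which is one reason it is used for regression tasks such as landmark localization.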
  • the output integration unit 210 receives the output from the student DNN estimate value calculation unit 206 and integrates the values.
  • As an integration method, for example, a statistical average can be used.
  • the output unit 211 inputs the output from the output integration unit 210 at the time of operation (application phase) after the training phase (learning phase) is completed, and outputs it as an estimated value of the student DNN.
  • the teacher DNN (including the teacher DNN feature extraction unit 203 and the teacher DNN estimate calculation unit 204) is a relatively large DNN model with a sufficient number of parameters to achieve the required accuracy in learning.
  • For example, ResNet and Wider ResNet, which have a large number of channels, can be used.
  • the size of the DNN model corresponds to, for example, the number of parameters, but may also correspond to the number of layers, the feature map size, and the kernel size.
  • the size of the student DNN (including the student DNN feature amount extraction unit 205, the student DNN estimated value calculation unit 206, the student DNN feature amount learning unit 207, and the student DNN learning unit 209) is smaller than the size of the teacher DNN.
  • the number of parameters in student DNN is relatively small.
  • the number of parameters in the student DNN is less than the number of parameters in the teacher DNN.
  • the student DNN is, for example, a DNN model small enough to actually be mounted on the device on which it is to be mounted.
  • As the student DNN 501, for example, MobileNet, or ResNet or Wider ResNet with a sufficiently reduced number of channels, can be considered.
  • FIG. 2 is an explanatory diagram showing an example of learning a student DNN from a teacher DNN.
  • An example of training (learning) a student DNN with a small number of parameters by using the output of the teacher DNN with a large number of parameters instead of the correct label will be described with reference to FIG. 2.
  • the student DNN301 inputs data from the data reading unit 310.
  • the feature amount extraction unit 321 converts the data into a feature amount.
  • the estimated value calculation unit 331 converts the feature amount into the estimated value 341.
  • the data reading unit 310, the feature amount extraction unit 321, and the estimated value calculation unit 331 correspond to the data reading unit 201, the student DNN feature amount extraction unit 205, and the student DNN estimated value calculation unit 206, respectively, in the learning system 200 shown in FIG. 1. That is, the learning system 300 is the same system as the learning system 200 shown in FIG. 1, although it is drawn differently.
  • the teacher DNN 302 inputs data from the data reading unit 310.
  • the feature amount extraction unit 322 converts the data into a feature amount.
  • the estimated value calculation unit 332 converts the feature amount into the estimated value 342.
  • the data reading unit 310, the feature amount extraction unit 322, and the estimated value calculation unit 332 correspond to the data reading unit 201, the teacher DNN feature amount extraction unit 203, and the teacher DNN estimated value calculation unit 204, respectively, in the learning system 200 shown in FIG. 1.
  • the error signal calculation unit 350 calculates an error signal from each of the obtained feature quantities and each converted estimated value. Then, the learning system 300 updates the weights by backpropagation for updating the network parameters of the student DNN301.
  • the processing of the error signal calculation unit 350 is executed by the student DNN learning unit 209.
  • FIG. 3 is an explanatory diagram showing an example of the teacher DNN model.
  • the teacher DNN 401 in the teacher DNN model 400 includes a feature amount extraction unit 406 and an estimated value calculation unit 407.
  • the feature amount extraction unit 406 includes a plurality of hidden layers 404.
  • the hidden layer consists of a plurality of nodes 403. Each node has a corresponding weight parameter.
  • the weight parameters are updated by learning.
  • the data is supplied from the data reading unit 402.
  • the feature amount extracted by the feature amount extraction unit 406 is output to the estimated value calculation unit 407 from the final layer of the feature amount extraction unit 406.
  • the estimated value calculation unit 407 converts the input feature amount into the label estimated value 405.
  • the data reading unit 402, the feature amount extraction unit 406, and the estimated value calculation unit 407 correspond to the data reading unit 201, the teacher DNN feature amount extraction unit 203, and the teacher DNN estimated value calculation unit 204, respectively, in the learning system 200 shown in FIG. 1.
  • FIG. 4 is an explanatory diagram showing an example of a student DNN model.
  • the student DNN 501 in the student DNN model 500 includes a feature amount extraction unit 506 and an estimated value calculation unit 507.
  • the feature amount extraction unit 506 includes a plurality of hidden layers 504.
  • the hidden layer consists of a plurality of nodes 503. Each node has a corresponding weight parameter.
  • the weight parameters are updated by learning.
  • the feature amount extracted by the feature amount extraction unit 506 is output to the estimated value calculation unit 507 from the final layer of the feature amount extraction unit 506.
  • the estimated value calculation unit 507 converts the input feature amount into a plurality of label estimated values 505.
  • the data reading unit 502, the feature amount extraction unit 506, and the estimated value calculation unit 507 correspond to the data reading unit 201, the student DNN feature amount extraction unit 205, and the student DNN estimated value calculation unit 206, respectively, in the learning system 200 shown in FIG. 1.
  • the learning system 300 determines the first DNN model as the teacher DNN model (step S110).
  • the teacher DNN includes a teacher DNN feature amount extraction unit 203 and a teacher DNN estimate value calculation unit 204.
  • the learning system 300 initializes the second DNN model as the student DNN model (step S120).
  • an initial value is given using a normally distributed random number having an average of 0 and a variance of 1.
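  • The initialization described here can be illustrated with a short sketch. The function name `init_weights`, the fixed seed, and the layer shape are hypothetical and chosen only for the example; the text specifies only the distribution (mean 0, variance 1).

```python
import numpy as np

def init_weights(shape, seed=0):
    """Draw initial weights from a normal distribution with mean 0 and variance 1."""
    rng = np.random.default_rng(seed)
    # scale is the standard deviation; a standard deviation of 1 gives a variance of 1.
    return rng.normal(loc=0.0, scale=1.0, size=shape)

# Hypothetical layer with 256 inputs and 128 outputs.
w = init_weights((256, 128))
```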
  • the student DNN model includes a student DNN feature amount extraction unit 205, a student DNN estimated value calculation unit 206, a student DNN feature amount learning unit 207, and a student DNN learning unit 209.
  • the learning system 300 receives a set of labeled training data as inputs of the teacher DNN model and the student DNN model (step S130).
  • the data reading unit 201 and the label reading unit 202 input the labeled training data.
  • the data reading unit 201 and the label reading unit 202 may be integrated.
  • training data means labeled training data.
  • the teacher DNN401 and the student DNN501 calculate the output using the subset of the received training data (step S140).
  • the output of the teacher DNN estimation value calculation unit 204 corresponds to the output of the teacher DNN 401. Further, the output of the student DNN estimation value calculation unit 206 corresponds to the output of the student DNN 501.
  • erroneous label data (noise label) of the training data is determined using the output of the teacher DNN401 (step S150).
  • the noise label correction unit 208 determines whether or not the label in the training data is incorrect.
  • the output of the student DNN 501 is evaluated by being compared with the output of the teacher DNN 401 and the modified training data label (corrected label) (step S160).
  • the student DNN learning unit 209 evaluates.
  • In step S165, it is determined, using a certain criterion, whether or not to repeat the processes of steps S140 to S160.
  • As a criterion, for example, the mean square error between the output of the student DNN 501 and the label may be calculated and checked against a certain threshold value (whether it exceeds or falls below the threshold).
  • the student DNN learning unit 209 executes the determination process of step S165.
  • When it is determined in step S165 to repeat, the learning system 300 updates the weight parameter of the student DNN 501 (specifically, the weight of the node in the layer constituting the student DNN feature amount extraction unit 205) based on the evaluation (step S170). When it is not determined in step S165 to repeat, i.e., when it is determined to end the training, the learning system 300 provides the trained student DNN 501 (step S180).
  • the student DNN model 500 is the target of implementation. Providing a trained student DNN 501 means that a viable student DNN 501 has been determined for the device.
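  • The flow of steps S110 to S180 can be sketched end to end on a toy problem. Everything concrete here is an assumption made for illustration: the teacher and student are single linear layers rather than DNNs, they share one fixed linear head `v`, the noise is injected by hand, and the learning rate, thresholds, and stopping value are arbitrary. Only the order of the steps follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3
X = rng.normal(size=(n, d))            # training data (S130)
W_teacher = rng.normal(size=(d, k))    # stand-in for an already trained teacher (S110)
v = rng.normal(size=k)                 # fixed linear head shared by both models (assumption)
labels = X @ W_teacher @ v             # clean labels
labels[:20] += 10.0                    # hand-injected label noise
W_student = rng.normal(size=(d, k))    # student initialized with normal random numbers (S120)

def train_step(W_student, lr=0.02, noise_threshold=1.0):
    # S140: compute features and estimates of both models.
    z_t, z_s = X @ W_teacher, X @ W_student
    y_t, y_s = z_t @ v, z_s @ v
    # S150: labels far from the teacher estimate are replaced by that estimate.
    corrected = np.where(np.abs(labels - y_t) > noise_threshold, y_t, labels)
    # S160: evaluate the student against teacher features and corrected labels.
    feat_loss = np.mean((z_s - z_t) ** 2)
    label_loss = np.mean((y_s - corrected) ** 2)
    # Gradients of both mean square errors with respect to W_student.
    grad = 2.0 * X.T @ (z_s - z_t) / z_s.size
    grad += 2.0 * X.T @ np.outer(y_s - corrected, v) / n
    return W_student - lr * grad, float(feat_loss + label_loss)

history = []
for _ in range(2000):
    W_next, loss = train_step(W_student)
    history.append(loss)
    if loss < 1e-6:        # S165: stop when the criterion is met (S180)
        break
    W_student = W_next     # S170: update the student weights
```

  • Because the noisy labels are replaced before the loss is computed, they no longer pull the student away from the teacher, and the student weights approach the teacher weights in this toy setting.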
  • the first DNN model which is large enough to train the data set, is adopted as the teacher model and trained.
  • a weight learned by using a random number or some data set is set as an initial value.
  • a subset of the data set is given to the teacher DNN feature extraction unit 203.
  • the output value $y_{\mathrm{output}}$ output from the teacher DNN estimation value calculation unit 204 is compared with the label value $y_{\mathrm{label}}$.
  • a function of the difference between the output value $y_{\mathrm{output}}$ and the label value $y_{\mathrm{label}}$, for example, the mean square error $\sum (y_{\mathrm{output}} - y_{\mathrm{label}})^2 / N$, is calculated.
  • the comparison process and the calculation process are executed by, for example, the teacher feature amount learning unit (not shown in FIG. 1).
  • the gradient is calculated using error back propagation, etc. in the direction of decreasing the value of the function, and the weight parameter is updated by the stochastic gradient descent method, etc.
  • the process of calculating the gradient and updating the weight parameters continues until certain criteria, such as the output and label mean square error, fall below a certain threshold.
  • the teacher DNN401 is obtained by the above processing.
  • the gradient calculation and the weight parameter update process are executed by, for example, the teacher feature amount learning unit (not shown in FIG. 1).
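  • The teacher training loop described above (compute the mean square error, take a gradient step, repeat until the error falls below a threshold) can be sketched as follows. The linear model, the synthetic data, the learning rate, the batch size, and the threshold are all assumptions for illustration; a real teacher DNN would compute the gradient by error backpropagation through many layers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                  # synthetic training data
true_w = np.array([1.0, -2.0, 0.5, 3.0])       # hypothetical ground-truth weights
y_label = X @ true_w                           # label values

w = rng.normal(size=4)                         # random initial weights
lr, threshold, batch = 0.1, 1e-6, 32
mse = float(np.mean((X @ w - y_label) ** 2))
epochs = 0
while mse >= threshold and epochs < 1000:
    # Stochastic gradient descent: visit the data in shuffled mini-batches.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]
        err = X[idx] @ w - y_label[idx]        # y_output - y_label on the subset
        grad = 2.0 * X[idx].T @ err / len(idx) # gradient of the mean square error
        w -= lr * grad                         # step in the direction that reduces it
    # Criterion: mean square error between output and label over the data set.
    mse = float(np.mean((X @ w - y_label) ** 2))
    epochs += 1
```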
  • the student DNN501 is also set with a weight that has been learned using a random number or some data set as an initial value.
  • a subset of the data set is given to the teacher DNN feature extraction unit 203 and the student DNN feature extraction unit 205.
  • The output values $y_{\mathrm{teacher}}$ and $y_{\mathrm{student},i}$ are calculated. Since the student DNN estimation value calculation unit 206 outputs a plurality of data, the subscript $i$ is added to its output value.
  • the student DNN feature learning unit 207 calculates a function of the difference between $z_{\mathrm{teacher}}$ and $z_{\mathrm{student}}$, for example, the mean square error $\sum (z_{\mathrm{student}} - z_{\mathrm{teacher}})^2 / N$. If the dimensions of the feature amount outputs $z_{\mathrm{teacher}}$ and $z_{\mathrm{student}}$ of the teacher DNN 401 and the student DNN 501 differ, the student DNN feature amount learning unit 207 aligns the dimensions of both. For example, the student DNN feature amount learning unit 207 applies an appropriate CNN to the feature output of the teacher DNN: the output of the intermediate layer whose dimensions are to be aligned is supplied to a convolutional layer, and the dimensions are adjusted by the convolution operation.
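  • One way to align the dimensions is a 1x1 convolution that maps the teacher's channel count to the student's. The sketch below assumes this choice; the text only says that "an appropriate CNN" is applied. The function names, the feature map shapes, and the randomly drawn kernel are hypothetical.

```python
import numpy as np

def conv1x1(feature_map, kernel):
    """1x1 convolution: (C_in, H, W) feature map, (C_out, C_in) kernel -> (C_out, H, W)."""
    # At every spatial position, the channel vector is multiplied by the kernel matrix.
    return np.einsum("oc,chw->ohw", kernel, feature_map)

def feature_mse(z_student, z_teacher, kernel=None):
    """Mean square error between student and (optionally projected) teacher features."""
    if kernel is not None:
        z_teacher = conv1x1(z_teacher, kernel)
    if z_student.shape != z_teacher.shape:
        raise ValueError("feature dimensions must match after alignment")
    return float(np.mean((z_student - z_teacher) ** 2))

rng = np.random.default_rng(0)
z_teacher = rng.normal(size=(64, 4, 4))     # teacher features: 64 channels
z_student = rng.normal(size=(16, 4, 4))     # student features: 16 channels
kernel = rng.normal(size=(16, 64)) / 8.0    # hypothetical projection weights
loss = feature_mse(z_student, z_teacher, kernel)
```

  • In practice the projection weights would themselves be learned or chosen; here they are random only so that the example runs on its own.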
  • the output of the teacher DNN estimation value calculation unit 204 is used by the noise label correction unit 208 for label correction.
  • When determining whether or not a label is a noise label, the noise label correction unit 208, for example, compares the estimated value of the teacher DNN 401 with the label value; the label is considered correct if the difference is smaller than a certain threshold value, and is considered an incorrect label (noise label) if the difference is larger than that threshold value.
  • the student DNN learning unit 209 calculates the gradient by using error back propagation or the like in the direction of decreasing the value of the calculated function of the plurality of differences.
  • the student DNN learning unit 209 updates the weight parameter by a stochastic gradient descent method or the like.
  • the student DNN learning unit 209 updates the weights in the student DNN so that, while reducing the influence of labels containing noise, there is no difference between the feature amount extracted by the teacher DNN feature amount extraction unit 203 and the feature amount extracted by the student DNN feature amount extraction unit 205.
  • Student DNN501 is obtained by the above processing.
  • the output integration unit 210 calculates, for example, the statistical average of the output.
  • the output unit 211 outputs it as a final estimated value.
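  • The integration step can be sketched as a plain average over the plurality of student outputs. The function name `integrate_outputs` and the sample values are assumptions for the example.

```python
import numpy as np

def integrate_outputs(estimates):
    """Average a plurality of estimates; estimates has shape (num_outputs, ...)."""
    return np.mean(np.asarray(estimates, dtype=float), axis=0)

# Three hypothetical student estimates of the same two-dimensional label.
heads = np.array([[0.9, 2.1],
                  [1.1, 1.9],
                  [1.0, 2.0]])
final = integrate_outputs(heads)
```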
  • the student DNN 501 uses the student DNN feature amount learning unit 207 to learn the output of the student DNN feature amount extraction unit 205 so as to reproduce the output of the teacher DNN feature amount extraction unit 203.
  • the learning system can efficiently make the student DNN learn the information learned by the teacher DNN.
  • the output of the final layer of the DNN feature extraction unit corresponds to the basis vector in the case of the linear regression apparatus. Being able to reproduce the basis vector means that the feature extractor of the teacher DNN401 has been completely reproduced. Learning is generally easy if the basis vector can be reproduced.
  • the teacher DNN 401 implicitly learns the correctness of the training data labels in the learning process. The noise label correction unit 208 compares the output of the teacher DNN estimation value calculation unit 204 with the label data supplied from the label reading unit 202, thereby estimating whether or not the input label is an erroneous label, and corrects erroneous labels.
  • the output of a DNN includes a random statistical error. In the present embodiment, the student DNN 501 outputs a plurality of results, and the output integration unit 210 takes the statistical average of those outputs, which reduces that error.
  • Embodiment 2. In the learning system of the second embodiment, the student DNN 501 receives the output from any layer other than the final layer in the teacher DNN 401.
  • FIG. 6 is a block diagram showing a configuration example of the learning system.
  • the learning system 600 of the second embodiment includes a data reading unit 201, a label reading unit 202, a teacher DNN feature amount extraction unit 203, a teacher DNN estimation value calculation unit 204, a student DNN feature amount extraction unit 205, and the like. It includes a student DNN estimation value calculation unit 206, a student DNN feature amount learning unit 207, a noise label correction unit 208, a student DNN learning unit 209, an output integration unit 210, and an output unit 211.
  • the learning system 600 further includes a student DNN intermediate feature learning unit 612.
  • the student DNN intermediate feature amount learning unit 612 inputs the output from any layer other than the final layer from the teacher DNN feature amount extraction unit 203 and the student DNN feature amount extraction unit 205.
  • the student DNN intermediate feature learning unit 612 calculates a function of their difference.
  • the student DNN intermediate feature learning unit 612 calculates a gradient that reduces the function of the difference and uses it to update the weight of the student DNN.
  • the configuration other than the student DNN intermediate feature amount learning unit 612 is the same as the configuration of the learning system 200 of the first embodiment.
  • FIG. 7 is an explanatory diagram showing an example of the DNN learning system of the second embodiment.
  • the learning system 700 of the present invention includes the student DNN 701 and the teacher DNN 702, similarly to the learning system 300 shown in FIG. 2.
  • the learning system 700 is the same system as the learning system 600 shown in FIG. 6, although the expression method is different.
  • Student DNN701 inputs data (training data) from the data reading unit 310.
  • the feature amount extraction unit 321 converts the data into a feature amount.
  • the estimated value calculation unit 331 converts the feature amount into the estimated value 341.
  • the teacher DNN702 inputs data (training data) from the data reading unit 310.
  • the feature amount extraction unit 322 converts the data into a feature amount.
  • the estimated value calculation unit 332 converts the feature amount into the estimated value 342.
  • the error signal calculation unit 750 calculates the error signal from the obtained feature amount of the final layer, the feature amount of the intermediate layer, and each estimated value.
  • the learning system 700 then updates the weights by backpropagation to update the network parameters of the student DNN701.
  • the learning system 600 performs the same processing as the processing of the learning system 200 of the first embodiment shown in the flowchart of FIG. However, in the present embodiment, the processes of steps S140 and S160 are different from the processes in the first embodiment.
  • In step S140, the student DNN 501 (specifically, the student DNN estimation value calculation unit 206) also executes a process of inputting a feature amount (intermediate feature amount) from the intermediate layer in the teacher DNN 401.
  • the student DNN 501 inputs a feature amount from one or a plurality of predetermined intermediate layers.
  • In step S160, the student DNN 501 (specifically, the student DNN learning unit 209) also executes a process of comparing the feature amount from the intermediate layer in the teacher DNN 401 with the feature amount from the intermediate layer in the student DNN 501.
  • more knowledge of the teacher DNN401 can be transmitted to the student DNN501 by having the student DNN501 learn the intermediate feature amount of the teacher DNN401.
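  • The additional comparison in the second embodiment can be sketched as a sum of feature differences over selected layer pairs. This is an illustration under assumptions: the function name, the use of the mean square error, the pairing of layers by index, and the requirement that paired features already have the same shape are all choices made for the example.

```python
import numpy as np

def intermediate_feature_loss(student_feats, teacher_feats, layer_pairs):
    """Sum the mean square errors between paired intermediate features.

    student_feats, teacher_feats: lists of arrays, one per layer.
    layer_pairs: list of (student_layer_index, teacher_layer_index).
    """
    total = 0.0
    for s_idx, t_idx in layer_pairs:
        z_s, z_t = student_feats[s_idx], teacher_feats[t_idx]
        if z_s.shape != z_t.shape:
            raise ValueError("paired features must have the same shape")
        total += float(np.mean((z_s - z_t) ** 2))
    return total

# Hypothetical features: a 2-layer student matched against layers 1 and 3 of a 4-layer teacher.
teacher_feats = [np.ones((8,)), np.ones((4,)), np.ones((8,)), np.ones((2,))]
student_feats = [np.zeros((4,)), np.ones((2,))]
loss = intermediate_feature_loss(student_feats, teacher_feats, [(0, 1), (1, 3)])
```

  • Here the first pair differs by 1 in every element and the second pair is identical, so the total is 1.0. This term would be added to the final-layer feature loss and the label loss when computing the gradient for the student.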
  • the learning systems 200 and 600 of each of the above embodiments can be applied to devices that handle regression problems.
  • For example, when an object detector is constructed with a DNN, the position of the object can be treated as a regression problem.
  • the postures of the human body and objects can also be treated as regression problems.
  • Each function (each processing) in each of the above embodiments can be realized by a computer having a processor such as a CPU (Central Processing Unit) or a memory.
  • a program for carrying out the method (processing) in the above embodiments may be stored in a storage device (storage medium), and each function may be realized by executing the program stored in the storage device on the CPU.
  • FIG. 8 is a block diagram showing an example of a computer having a CPU.
  • the computer is implemented in the learning system.
  • the CPU 1000 realizes each function in each of the above embodiments by executing processing according to the program stored in the storage device 1001. That is, the computer realizes the functions of the teacher DNN feature amount extraction unit 203, the teacher DNN estimated value calculation unit 204, the student DNN feature amount extraction unit 205, the student DNN estimated value calculation unit 206, the student DNN feature amount learning unit 207, the noise label correction unit 208, the student DNN learning unit 209, and the output integration unit 210 shown in FIGS.
  • the storage device 1001 is, for example, a non-transitory computer readable medium.
  • Non-transitory computer-readable media include various types of tangible storage media. Specific examples include magnetic recording media (for example, hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Compact Disc-Read Only Memory), CD-R (Compact Disc-Recordable), CD-R/W (Compact Disc-ReWritable), and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), and flash ROM).
  • the program may also be stored on various types of transitory computer-readable media.
  • the program is supplied to a transitory computer-readable medium via, for example, a wired or wireless communication path, that is, via an electrical signal, an optical signal, or an electromagnetic wave.
  • the memory 1002 is realized by, for example, a RAM (Random Access Memory), and is a storage means for temporarily storing data when the CPU 1000 executes a process.
  • a mode can be assumed in which a program held by the storage device 1001 or a transitory computer-readable medium is transferred to the memory 1002 and the CPU 1000 executes processing based on the program in the memory 1002.
  • FIG. 9 is a block diagram showing a main part of the learning system according to the present invention.
  • the learning system 800 includes the following means:
  • teacher DNN feature amount extraction means 801 (for example, the teacher DNN feature amount extraction unit 203) that extracts the feature amount of each of a plurality of training data;
  • teacher DNN estimate calculation means 802 (for example, the teacher DNN estimated value calculation unit 204) that calculates a first estimated value of the label corresponding to each of the plurality of training data;
  • student DNN feature amount extraction means 803 (for example, the student DNN feature amount extraction unit 205) that extracts the feature amount of each of the plurality of training data;
  • student DNN estimate calculation means 804 (for example, the student DNN estimated value calculation unit 206) that calculates a second estimated value of the label corresponding to each of the plurality of training data;
  • noise label correction means 805 (for example, the noise label correction unit 208) that determines, based on the label corresponding to the training data and the first estimated value, whether or not that label is a label containing noise; and
  • update means 806 (for example, the student DNN learning unit 209) that updates the weights in the student DNN, while reducing the influence of the label containing noise, so that there is no difference between the feature amount extracted by the teacher DNN feature amount extraction means 801 and the feature amount extracted by the student DNN feature amount extraction means 803.
  • FIG. 10 is a block diagram showing a main part of the learning device according to the present invention.
  • the learning device 900 includes student DNN feature amount extraction means 803 (for example, the student DNN feature amount extraction unit 205) that extracts a feature amount of input data, student DNN estimate calculation means 804 (for example, the student DNN estimated value calculation unit 206) that calculates a plurality of estimated values of the label corresponding to the input data, and output integration means 807 (for example, the output integration unit 210) that integrates the plurality of estimated values.
  • the weights of the student DNN feature amount extraction means 803 are updated by a teacher DNN 910 that includes the following means:
  • teacher DNN feature amount extraction means 801 (for example, the teacher DNN feature amount extraction unit 203) that extracts the feature amount of each of a plurality of training data;
  • teacher DNN estimate calculation means 802 (for example, the teacher DNN estimated value calculation unit 204) that calculates a first estimated value of the label corresponding to each of the plurality of training data;
  • noise label correction means 805 (for example, the noise label correction unit 208) that determines, based on the label corresponding to the training data and the first estimated value, whether or not that label is a label containing noise; and
  • update means 806 (for example, the student DNN learning unit 209) that updates the weights in the student DNN, while reducing the influence of the label containing noise, so that there is no difference between the feature amount extracted by the teacher DNN feature amount extraction means 801 and the feature amount extracted by the student DNN feature amount extraction means 803.
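The text says only that the output integration means integrates a plurality of estimated values; it does not say how. As a minimal sketch, assuming integration by an (optionally weighted) average, the step could look like the following. The function name `integrate_estimates` and its `weights` parameter are hypothetical.

```python
def integrate_estimates(estimates, weights=None):
    """Combine several estimated values of one label into a single value.

    Assumption: integration is a weighted mean; with no weights given,
    every estimate counts equally.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("weights and estimates must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * e for w, e in zip(weights, estimates)) / total


plain = integrate_estimates([1.0, 2.0, 3.0])
weighted = integrate_estimates([0.0, 10.0], weights=[3.0, 1.0])
```

Other integration rules (for example, a median) would fit the same interface; the average is chosen here only because it is the simplest option for a regression output.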
  • a learning system that uses a teacher DNN (Deep Neural Network) and a student DNN smaller in size than the teacher DNN, the learning system including:
  • a teacher DNN feature extraction means for extracting the feature amount of each of a plurality of training data;
  • a teacher DNN estimate calculation means for calculating a first estimated value of the label corresponding to each of the plurality of training data;
  • a student DNN feature extraction means for extracting the feature amount of each of the plurality of training data;
  • a student DNN estimate calculation means for calculating a second estimated value of the label corresponding to each of the plurality of training data;
  • a noise label correcting means for determining, based on the label corresponding to the training data and the first estimated value, whether or not that label is a label containing noise; and
  • an update means for updating the weights in the student DNN, while reducing the influence of the label containing noise, so that there is no difference between the feature amount extracted by the teacher DNN feature extraction means and the feature amount extracted by the student DNN feature extraction means.
  • the learning system of Appendix 1, in which the updating means calculates the value of a function representing the difference between the plurality of first estimated values and the plurality of second estimated values while reducing the influence of the label containing noise, and updates the weights of the nodes in the layers constituting the student DNN according to the calculation result.
  • the learning system of Appendix 2, in which the updating means calculates a gradient that reduces the value of the function and updates the weights by the gradient descent method.
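The update described in Appendix 2 and Appendix 3 can be sketched for a toy student with a single weight and bias. Everything specific here is an assumption made for illustration: the student model `s = w * x + b`, the noise rule (a label far from the teacher estimate is treated as noisy), the down-weighting factor `noisy_weight`, and the exact form of the function, which adds a label term scaled by that factor to the squared difference between teacher and student estimates.

```python
def noise_weights(labels, teacher_estimates, threshold=1.0, noisy_weight=0.1):
    """Assumed rule: a label far from the teacher estimate is treated as noisy."""
    return [noisy_weight if abs(y - t) > threshold else 1.0
            for y, t in zip(labels, teacher_estimates)]


def loss_and_grad(w, b, xs, labels, teacher_estimates, weights):
    """Loss and its gradient for the toy student s = w * x + b."""
    n = len(xs)
    loss = grad_w = grad_b = 0.0
    for x, y, t, a in zip(xs, labels, teacher_estimates, weights):
        s = w * x + b
        # Teacher-student difference, plus a label term scaled by the noise weight.
        loss += ((s - t) ** 2 + a * (s - y) ** 2) / n
        d = (2.0 * (s - t) + 2.0 * a * (s - y)) / n
        grad_w += d * x
        grad_b += d
    return loss, grad_w, grad_b


def train(xs, labels, teacher_estimates, lr=0.05, steps=500):
    """Plain gradient descent on the noise-weighted function."""
    weights = noise_weights(labels, teacher_estimates)
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        loss, gw, gb = loss_and_grad(w, b, xs, labels, teacher_estimates, weights)
        history.append(loss)
        w -= lr * gw
        b -= lr * gb
    return w, b, weights, history


xs = [0.0, 1.0, 2.0, 3.0]
teacher = [0.0, 2.0, 4.0, 6.0]   # teacher estimates (first estimated values)
labels = [0.1, 1.9, 10.0, 6.1]   # third label is deliberately corrupted
w, b, weights, history = train(xs, labels, teacher)
```

In this toy run the corrupted third label receives the small weight, so the fitted slope stays close to the teacher's slope of 2 instead of being pulled toward the outlier.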
  • a learning device that uses a student DNN, the learning device including:
  • a student DNN feature extraction means for extracting a feature amount of input data;
  • a student DNN estimate calculation means for calculating a plurality of estimated values of the label corresponding to the input data; and
  • an output integration means for integrating the plurality of estimated values,
  • in which the weights of the student DNN feature extraction means are updated by a teacher DNN that includes: a teacher DNN feature extraction means for extracting the feature amount of each of a plurality of training data;
  • a teacher DNN estimate calculation means for calculating a first estimated value of the label corresponding to each of the plurality of training data;
  • a noise label correcting means for determining, based on the label corresponding to the training data and the first estimated value, whether or not that label is a label containing noise; and
  • an updating means for updating the weights in the student DNN, while reducing the influence of the label containing noise, so that there is no difference between the feature amount extracted by the teacher DNN feature extraction means and the feature amount extracted by the student DNN feature extraction means.
  • a learning method using a teacher DNN and a student DNN smaller in size than the teacher DNN, in which: the feature amount of each of a plurality of training data is extracted as a teacher DNN feature amount; a first estimated value of the label corresponding to each of the plurality of training data is calculated; the feature amount of each of the plurality of training data is extracted as a student DNN feature amount; a second estimated value of the label corresponding to each of the plurality of training data is calculated; whether or not the label corresponding to the training data is a label containing noise is determined based on that label and the first estimated value; and
  • the weights in the student DNN are updated so that the difference between the extracted teacher DNN feature amount and the student DNN feature amount disappears.
  • Appendix 7 The learning method in which, in a function representing the difference between the plurality of first estimated values and the plurality of second estimated values, the influence of the label containing noise is reduced and the value of the function is calculated, and the weights of the nodes in the layers constituting the student DNN are updated according to the calculation result.
  • Appendix 8 The learning method of Appendix 7 in which a gradient for reducing the value of the function is calculated and the weight is updated by the gradient descent method.
  • a recording medium storing a learning program that causes a processor to execute: a process of extracting the feature amount of each of a plurality of training data as a teacher DNN feature amount; a process of calculating a first estimated value of the label corresponding to each of the plurality of training data; a process of extracting the feature amount of each of the plurality of training data as a student DNN feature amount; a process of calculating a second estimated value of the label corresponding to each of the plurality of training data; a process of determining, based on the label corresponding to the training data and the first estimated value, whether or not that label is a label containing noise; and
  • a process of updating the weights in the student DNN so that there is no difference between the extracted teacher DNN feature amount and the student DNN feature amount.
  • the recording medium of Appendix 10, in which the learning program causes a processor to execute a process of calculating the value of a function representing the difference between the plurality of first estimated values and the plurality of second estimated values while reducing the influence of the label containing noise, and updating the weights of the nodes in the layers constituting the student DNN according to the calculation result.
  • the recording medium of Appendix 11, in which the learning program causes the processor to execute a process of calculating a gradient that reduces the value of the function and updating the weights by the gradient descent method.
  • the recording medium according to any one of Appendix 10 to Appendix 12, in which the learning program causes a processor to execute a process of correcting the label when it is determined that the label corresponding to the training data is a label containing noise.
  • Appendix 15 The learning program of Appendix 14 that causes the computer to execute a process of calculating the value of a function representing the difference between the plurality of first estimated values and the plurality of second estimated values while reducing the influence of the label containing noise, and updating the weights of the nodes in the layers constituting the student DNN according to the calculation result.
  • Appendix 16 The learning program of Appendix 15 that causes the computer to execute a process of calculating a gradient that reduces the value of the function and updating the weights by the gradient descent method.
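The appendices mention correcting a label once it has been judged to contain noise, without stating the correction rule in this excerpt. The sketch below assumes one simple possibility: a label whose distance from the teacher's first estimated value exceeds a threshold is replaced by that estimate. The names `correct_noisy_labels` and `threshold` are hypothetical.

```python
def correct_noisy_labels(labels, teacher_estimates, threshold=1.0):
    """Return (corrected_labels, noisy_flags).

    Assumed rule: a label is noisy when it differs from the teacher's
    first estimated value by more than `threshold`; a noisy label is
    replaced by the teacher estimate, other labels are kept unchanged.
    """
    if len(labels) != len(teacher_estimates):
        raise ValueError("labels and teacher_estimates must have the same length")
    corrected, flags = [], []
    for y, t in zip(labels, teacher_estimates):
        noisy = abs(y - t) > threshold
        flags.append(noisy)
        corrected.append(t if noisy else y)
    return corrected, flags


corrected, flags = correct_noisy_labels(
    labels=[0.1, 1.9, 10.0, 6.1],
    teacher_estimates=[0.0, 2.0, 4.0, 6.0],
)
```

A softer variant would blend the label and the teacher estimate rather than replace the label outright; which behavior matches the described noise label correction unit cannot be determined from this excerpt.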

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/JP2019/038498 2019-09-30 2019-09-30 学習システム、学習装置、および学習方法 Ceased WO2021064787A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/762,418 US20220343163A1 (en) 2019-09-30 2019-09-30 Learning system, learning device, and learning method
PCT/JP2019/038498 WO2021064787A1 (ja) 2019-09-30 2019-09-30 学習システム、学習装置、および学習方法
JP2021550747A JP7468540B2 (ja) 2019-09-30 2019-09-30 学習システム、学習装置、および学習方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2019/038498 WO2021064787A1 (ja) 2019-09-30 2019-09-30 学習システム、学習装置、および学習方法

Publications (1)

Publication Number Publication Date
WO2021064787A1 true WO2021064787A1 (ja) 2021-04-08

Family

ID=75337760

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/038498 Ceased WO2021064787A1 (ja) 2019-09-30 2019-09-30 学習システム、学習装置、および学習方法

Country Status (3)

Country Link
US (1) US20220343163A1
JP (1) JP7468540B2
WO (1) WO2021064787A1

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283578A (zh) * 2021-04-14 2021-08-20 南京大学 一种基于标记风险控制的数据去噪方法
JP2023034529A (ja) * 2021-08-31 2023-03-13 株式会社Jvcケンウッド 画像処理装置、画像処理方法、および画像処理プログラム
JP2024080116A (ja) * 2022-12-01 2024-06-13 株式会社東芝 学習装置、方法、プログラム及び推論装置
JP2024537971A (ja) * 2021-09-25 2024-10-18 メディカル・エーアイ・カンパニー・リミテッド 医療データに基づくディープラーニングモデルの学習及び推論方法、プログラム及び装置

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114981820A (zh) * 2019-12-20 2022-08-30 谷歌有限责任公司 用于在边缘设备上评估和选择性蒸馏机器学习模型的系统和方法
WO2021182748A1 (ko) * 2020-03-10 2021-09-16 삼성전자주식회사 전자 장치 및 그 제어 방법
US12210585B2 (en) * 2021-03-10 2025-01-28 Qualcomm Incorporated Efficient test-time adaptation for improved temporal consistency in video processing
CN117751380A (zh) * 2021-08-05 2024-03-22 富士通株式会社 生成方法、信息处理装置以及生成程序
US11853392B2 (en) * 2021-11-30 2023-12-26 International Business Machines Corporation Providing reduced training data for training a machine learning model
CN116030323B (zh) * 2023-03-27 2023-08-29 阿里巴巴(中国)有限公司 图像处理方法以及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132528A1 (en) * 2015-11-06 2017-05-11 Microsoft Technology Licensing, Llc Joint model training

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "You can see the mistake in labeling! ?", THE LABELLIO BLOG, KYOCERA COMMUNICATION SYSTEMS (KCCS), 4 July 2019 (2019-07-04), pages 1 - 11, XP055811668, Retrieved from the Internet <URL:https://www.kccs.co.jp/labellio_blog_ja/2019/07/rps.html> [retrieved on 20191210] *
ROMERO ADRIANA, BALLAS NICOLAS, KAHOU SAMIRA EBRAHIMI, CHASSANG ANTOINE, GATTA CARLO, BENGIO YOSHUA: "FitNets: Hints for Thin Deep Nets", ARXIV, 27 March 2015 (2015-03-27), pages 1 - 13, XP055349753, Retrieved from the Internet <URL:https://arxiv.org/pdf/1412.6550v4> [retrieved on 20191210] *
YUZUPEPPER: "Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach", A LOSS CORRECTION APPROACH,, 20 August 2017 (2017-08-20), Qiita, pages 2233 - 2241, XP033249566, Retrieved from the Internet <URL:https://qiita.com/yuzupepper/items/bl55b3487118626f62f2> [retrieved on 20191210] *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283578A (zh) * 2021-04-14 2021-08-20 南京大学 一种基于标记风险控制的数据去噪方法
JP2023034529A (ja) * 2021-08-31 2023-03-13 株式会社Jvcケンウッド 画像処理装置、画像処理方法、および画像処理プログラム
JP7683426B2 (ja) 2021-08-31 2025-05-27 株式会社Jvcケンウッド 画像処理装置、画像処理方法、および画像処理プログラム
JP2024537971A (ja) * 2021-09-25 2024-10-18 メディカル・エーアイ・カンパニー・リミテッド 医療データに基づくディープラーニングモデルの学習及び推論方法、プログラム及び装置
JP7847205B2 (ja) 2021-09-25 2026-04-16 メディカル・エーアイ・カンパニー・リミテッド 医療データに基づくディープラーニングモデルの学習及び推論方法、プログラム及び装置
JP2024080116A (ja) * 2022-12-01 2024-06-13 株式会社東芝 学習装置、方法、プログラム及び推論装置
JP7840834B2 (ja) 2022-12-01 2026-04-06 株式会社東芝 学習装置、方法、プログラム及び推論装置

Also Published As

Publication number Publication date
US20220343163A1 (en) 2022-10-27
JPWO2021064787A1 2021-04-08
JP7468540B2 (ja) 2024-04-16

Similar Documents

Publication Publication Date Title
WO2021064787A1 (ja) 学習システム、学習装置、および学習方法
US11264044B2 (en) Acoustic model training method, speech recognition method, acoustic model training apparatus, speech recognition apparatus, acoustic model training program, and speech recognition program
US9870768B2 (en) Subject estimation system for estimating subject of dialog
US10395646B2 (en) Two-stage training of a spoken dialogue system
US20220405682A1 (en) Inverse reinforcement learning-based delivery means detection apparatus and method
US20190130212A1 (en) Deep Network Embedding with Adversarial Regularization
US20190267023A1 (en) Speech recognition using connectionist temporal classification
CN112465138A (zh) 模型蒸馏方法、装置、存储介质及设备
US12518168B2 (en) Training and application method apparatus system and storage medium of neural network model
US12437214B2 (en) Machine-learning system and method for identifying same person in genealogical databases
CN114861917A (zh) 贝叶斯小样本学习的知识图谱推理模型、系统及推理方法
US11456003B2 (en) Estimation device, learning device, estimation method, learning method, and recording medium
KR20200128938A (ko) 모델 학습 방법 및 장치
JP2018106216A (ja) 学習データ生成装置、開発データ生成装置、モデル学習装置、それらの方法、及びプログラム
CN112819050A (zh) 知识蒸馏和图像处理方法、装置、电子设备和存储介质
US12033658B2 (en) Acoustic model learning apparatus, acoustic model learning method, and program
US10741184B2 (en) Arithmetic operation apparatus, arithmetic operation method, and computer program product
Nakamachi et al. Text simplification with reinforcement learning using supervised rewards on grammaticality, meaning preservation, and simplicity
CN113763928B (zh) 音频类别预测方法、装置、存储介质及电子设备
Petelin et al. Evolving Gaussian process models for predicting chaotic time-series
CN111737439A (zh) 一种问题生成方法及装置
WO2021064931A1 (ja) ナレッジトレース装置、方法、および、プログラム
CN110414845A (zh) 针对目标交易的风险评估方法及装置
US12499896B2 (en) Learning device, method, and program
KR20200120987A (ko) 상관관계 점수 행렬 생성 알고리즘을 이용한 인공 신경망 기반의 장소 인식 장치 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19947550

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021550747

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19947550

Country of ref document: EP

Kind code of ref document: A1