WO2021100818A1 - Learning method and learning device employing augmentation - Google Patents

Learning method and learning device employing augmentation

Info

Publication number
WO2021100818A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
padding
error
multidimensional
Prior art date
Application number
PCT/JP2020/043248
Other languages
French (fr)
Japanese (ja)
Inventor
剛 岡留
敦也 井手
Original Assignee
学校法人関西学院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 学校法人関西学院
Priority to JP2021558450A (granted as JP7160416B2)
Publication of WO2021100818A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Definitions

  • The present invention relates to a technique that enables highly accurate recognition in machine learning, particularly in deep neural networks (DNNs), even when little training data is available.
  • Abbreviations: DNN (Deep Neural Network, a deepened neural network), NN (Neural Network), AI (Artificial Intelligence)
  • The DNN classifier is required to have generalization performance; that is, it must predict accurately even on unknown data. Therefore, the error (cross-entropy) between the probability distribution of the predicted label and the true probability distribution is computed, and the parameters are optimized so that this error becomes small.
  • To improve prediction accuracy, data augmentation is generally performed: the data at hand are processed at training time, by scaling, flipping, rotation, contrast adjustment, and the like, to increase the amount of training data (a minimal sketch follows).
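  • A minimal sketch of such label-preserving augmentations, assuming images as NumPy arrays; the function name and parameter ranges are illustrative, not taken from the patent:

    import numpy as np

    def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        """Apply one random label-preserving transform to an HxWxC image."""
        choice = rng.integers(3)
        if choice == 0:                              # horizontal flip
            return img[:, ::-1].copy()
        if choice == 1:                              # rotate by 90/180/270 degrees
            return np.rot90(img, k=int(rng.integers(1, 4))).copy()
        factor = rng.uniform(0.8, 1.2)               # contrast adjustment
        mean = img.mean()
        return np.clip((img - mean) * factor + mean, 0, 255).astype(img.dtype)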
  • A method called ensemble learning, in which multiple predictors are trained and their prediction results are integrated at inference time, is also known.
  • Non-Patent Document 1 discloses a technique relating to a convolutional neural network for image recognition that focuses on ensemble learning.
  • Non-Patent Document 2 discloses a technique relating to a neural network with a residual structure.
  • Patent Document 1 discloses a learning processing method for identifying whether a disease is present based on skin image data.
  • However, all of the above documents utilize output results predicted by multiple learners; none utilizes the output obtained by feeding multiple augmented training data into a single learner.
  • Non-Patent Document 1 and Patent Document 1 disclose techniques for augmenting training data by rotating and flipping images. However, some of the augmented training data can degrade recognition accuracy when used for learning, so if they are used as-is as post-augmentation training data, recognition accuracy cannot be improved sufficiently.
  • Non-Patent Document 1: Atsushi Takeda, "GridNet: Convolutional Neural Network for Image Recognition Focusing on Ensemble Learning", FIT2017 (16th Information Science and Technology Forum), 2017. Non-Patent Document 2: Andreas Veit et al., "Residual Networks Behave Like Ensembles of Relatively Shallow Networks", Advances in Neural Information Processing Systems, pp. 550-558, 2016.
  • In view of this situation, an object of the present invention is to provide a learning method and a learning device that can learn efficiently by excluding, from the augmented training data, data that degrade recognition accuracy, thereby improving the generalization ability to predict unknown data accurately and improving recognition accuracy in machine learning.
  • According to a first aspect of the present invention, the learning method feeds a plurality of augmented training data, generated from a multidimensional training datum before augmentation, into a single classifier. From among the probability distributions of the prediction labels output for each of the plurality of training data, at least one probability distribution is selected using the error from the correct answer of the pre-augmentation training data as a measure, and learning is performed based on the error between the selected distribution and that correct answer.
  • Here, the augmentation may be augmentation without information degradation, such as flipping, rotation, and translation of the pre-augmentation training data; augmentation that degrades the information of the pre-augmentation training data; or a combination of both.
  • The correct answer of the pre-augmentation training data is the same as the correct answer of the plurality of augmented training data derived from it: degrading the information does not change the correct label. That is, the same correct label as the pre-augmentation training data is assigned to the plurality of information-degraded training data, and the probability distribution of the correct answer is also the same.
  • As a measure of the error from the correct answer of the pre-augmentation training data, cross-entropy, an index of how far apart two probability distributions are, is preferably used; using cross-entropy as the loss function enables efficient learning.
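  • In standard notation (the patent text itself states no formula), the cross-entropy between the correct distribution p and the predicted distribution q over labels x is:

    H(p, q) = -\sum_{x} p(x) \log q(x)

  With a one-hot correct distribution concentrated on the correct label y, this reduces to H(p, q) = -\log q(y).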
  • Further, by selecting from the probability distributions of the prediction labels output for the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure, data unsuited to learning can be avoided, or used only sparingly, which improves generalization ability. In the present invention, data unsuited to learning are those whose error from the correct answer is small.
  • According to a second aspect of the present invention, the learning method feeds a plurality of augmented training data, generated from a multidimensional training datum before augmentation, into a single classifier, and performs learning using at least one augmented training datum selected, based on the probability distributions of the prediction labels output for each of the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure.
  • According to a third aspect of the present invention, the learning method feeds a plurality of augmented training data, generated from a multidimensional training datum before augmentation, into a single classifier, and, based on the probability distributions of the prediction labels output for each of the plurality of training data, replaces the pre-augmentation training datum with one augmented training datum selected using the error from the correct answer as a measure, and learns.
  • When mini-batch learning is used, the multidimensional pre-augmentation training datum is one original datum in the mini-batch; the original datum is augmented into multiple extended data, each extended datum is input to the classifier, and, based on the output probability distributions of the prediction labels, one extended datum selected using the error from the correct answer of the original datum as a measure replaces the original datum for learning.
  • Degrading information means rewriting part of the data to lose information: adding noise to the multidimensional original data, or deleting part of the multidimensional quantity or setting it to a predetermined value, thereby reducing the amount of information the original data carries. For example, information can be degraded by filling a specific subregion of the pre-augmentation image with black (setting the pixel values of that subregion to 0) or by setting the pixel values of the subregion to some specific value (see the sketch below).
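  • A minimal sketch of such information-degrading augmentation (a cutout-style blanked patch), assuming a NumPy image array and a patch size no larger than the image's height and width; names are illustrative:

    import numpy as np

    def degrade(img: np.ndarray, size: int,
                rng: np.random.Generator, fill: float = 0.0) -> np.ndarray:
        """Blank out a random size x size square (fill = 0 paints it black),
        reducing the information the image carries."""
        h, w = img.shape[:2]
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        out = img.copy()
        out[top:top + size, left:left + size] = fill
        return out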
  • In the present invention, when the training data is an image, one image is represented by a multidimensional quantity of the pixel dimensions constituting that image, or by a multidimensional quantity representing the features of the image. When the training data is one-dimensional time-series data, it is represented by a multidimensional quantity in which the data values at each time are arranged over the duration, or by a multidimensional quantity representing the features of the one-dimensional time series. When the training data is multidimensional time-series data, it is represented by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration, or by a multidimensional quantity representing the features of the multidimensional time series.
  • In the learning method of the present invention, the data selected using the error from the correct answer as a measure are preferably data sampled with a selection probability that increases with the magnitude of the error.
  • The data selected using the error from the correct answer as a measure refers to probability-distribution data or training data. The error between the probability-distribution data (categorical distribution) of the prediction label output for each of the augmented training data and the probability-distribution data (one-hot representation) of the correct label is calculated, and each output prediction's distribution is sampled with a selection probability increased according to the magnitude of the calculated error: the larger the error from the correct answer, the more likely the distribution is to be selected.
  • As shown in the examples described later, data with a larger error from the correct answer yield higher learning efficiency. Such data are difficult for the classifier, and it is presumed that learning from difficult data improves learning efficiency more than learning from easy data (data with a small error from the correct answer).
  • On the other hand, learning from a mixture of different data tends to be more efficient than learning only from difficult data. Therefore, the larger the error from the correct answer, the higher the selection probability, and the smaller the error, the lower the selection probability.
  • The number of sampled data may be one, or two or more.
  • The data selected using the error from the correct answer as a measure may be the data whose error from the correct answer is the maximum.
  • The number of data with the maximum error may be one, or two or more; when two or more exist, any one may be selected at random, or multiple data may be used.
  • The data selected using the error from the correct answer as a measure may be the data whose difference from the mean error is the minimum.
  • The probability-distribution data and training data selected using the error as a measure can be chosen based on the mean, the median, the quartile (1/4 and 3/4) values, or the weighted mean of the errors; it is preferable to select in order starting from the probability distribution closest to the mean error.
  • The number of data minimizing the difference from the mean error may be one, or two or more; when two or more exist, any one may be selected at random, or multiple data may be used. (The three selection rules are sketched below.)
  • The number of probability distributions selected using the above error as a measure is preferably two or more; selecting two or more further improves recognition accuracy. Likewise, when two or more training data are selected using the error as a measure, recognition accuracy can be further improved.
  • Time-series data are data that change with time; an example is time-series acceleration data from an accelerometer worn by a person, from which the person's daily activities can be predicted and classified. Dividing time-series data into smaller series means splitting the obtained acceleration data into segments of a fixed duration; augmented data are then produced by adding random noise to the acceleration data of some segments, or by setting the acceleration data of some segments to zero, thereby degrading them, and these are used as augmented data to train the classifier model of daily human behavior (see the sketch below).
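  • A minimal sketch of such time-series augmentation, assuming a (time, axes) NumPy array of acceleration samples; the segment count, noise scale, and names are illustrative:

    import numpy as np

    def augment_series(acc: np.ndarray, n_seg: int,
                       rng: np.random.Generator) -> np.ndarray:
        """Split a (time, axes) acceleration series into n_seg segments and
        degrade one randomly chosen segment by noise or by zeroing it."""
        out = acc.astype(float)                 # copy; adding noise needs floats
        seg = len(acc) // n_seg
        i = int(rng.integers(n_seg))
        sl = slice(i * seg, (i + 1) * seg)
        if rng.random() < 0.5:
            out[sl] += rng.normal(0.0, 0.1, size=out[sl].shape)  # random noise
        else:
            out[sl] = 0.0                                        # zero the segment
        return out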
  • When the training data is image data, at least one of rotation, flipping, zooming, translation, cropping, division, blurring, and noise may be applied to the image data to obtain a plurality of augmented training data.
  • When the training data is time-series data, at least one of inversion, shifting, cutting, division, and noise may be applied to the time-series data to obtain a plurality of augmented training data.
  • According to another aspect of the present invention, the learning device includes an augmentation processing unit that generates a plurality of augmented training data from a multidimensional training datum before augmentation, a label prediction unit that inputs the plurality of augmented training data into one classifier and predicts labels, a selection unit that selects at least one probability distribution, from the probability distributions of the prediction labels output for each of the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure, and a learning unit that learns based on the error between the selected probability distribution and that correct answer.
  • Alternatively, the learning device includes an augmentation processing unit that generates a plurality of augmented training data from a multidimensional training datum before augmentation, a label prediction unit that inputs the plurality of augmented training data into one classifier and predicts labels, a selection unit that selects at least one augmented training datum, based on the probability distributions of the prediction labels output for each of the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure, and a learning unit that learns using the selected augmented training data.
  • Alternatively, the learning device includes an augmentation processing unit that generates a plurality of augmented training data from a multidimensional training datum before augmentation, a label prediction unit that inputs the plurality of augmented training data into one classifier and predicts labels, a selection unit that selects one augmented training datum, based on the probability distributions of the prediction labels output for each of the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure, and a learning unit that learns by replacing the pre-augmentation training datum with the selected augmented training datum.
  • In the learning device of the present invention, the data selected using the error from the correct answer as a measure are preferably data sampled with a selection probability that increases with the magnitude of the error.
  • The data selected using the error from the correct answer as a measure refers to probability-distribution data or training data; the error between the probability-distribution data of the prediction label output for each of the augmented training data and the probability-distribution data of the correct label is calculated, and each output prediction's distribution is sampled with a selection probability that increases with the magnitude of the calculated error.
  • The data selected using the error from the correct answer as a measure may be the data whose error from the correct answer is the maximum, or the data whose difference from the mean error is the minimum.
  • The augmentation processing unit degrades the multidimensional pre-augmentation training data by rewriting part of the data. When the training data is an image, one image is represented by a multidimensional quantity of the pixel dimensions constituting the image or by a multidimensional quantity representing the image's features; when the training data is one-dimensional time-series data, by a multidimensional quantity in which the data values at each time are arranged over the duration or by a multidimensional quantity representing the features of the series; and when the training data is multidimensional time-series data, by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration or by a multidimensional quantity representing its features.
  • The information degradation may be performed by adding noise to the multidimensional original data, or by deleting part of the multidimensional quantity or setting it to a predetermined value, thereby reducing the amount of information contained in the original data.
  • According to the learning method and learning device of the present invention, data that degrade recognition accuracy can be excluded from the augmented training data, learning can be performed efficiently, and generalization ability and recognition accuracy can be improved.
  • Brief description of the drawings: FIG. 1 is a functional block diagram of the learning device of Example 1; FIG. 2 is a schematic flow chart of the learning method of Example 1; FIG. 3 shows an image before augmentation; FIG. 4 is an explanatory drawing of cutting out from the image before augmentation; FIG. 5 shows the images after cutting out; FIG. 6 shows output probability distributions; FIG. 7 is an explanatory diagram of the harmonization process when M items around the mean are selected; FIG. 8 is an explanatory diagram of how the training data are augmented; FIGS. 9 and 10 are graphs (1) and (2) showing the relationship between prediction accuracy and model size; FIG. 11 is a schematic flow chart of the learning method of Example 2; FIG. 12 is a schematic flow chart of the learning method of Example 3.
  • FIG. 13 is a functional block diagram of the learning device of Example 4; FIG. 14 illustrates the expansion and selection of training data in a mini-batch; FIG. 15 is an explanatory diagram of typical data extensions; FIG. 16 is an explanatory diagram of how to select the extended image whose probability distribution is closest to the mean error; FIG. 17 is an explanatory diagram of how to select the extended image whose probability distribution has the maximum error; FIGS. 18 and 19 are explanatory drawings (parts 1 and 2) of the method of sampling with a selection probability that increases with the magnitude of the error.
  • FIG. 1 shows a functional block diagram of an embodiment of the learning device of the present invention.
  • The learning device 10 includes an augmentation processing unit 1, a label prediction unit 2, a selection unit 3, and a learning unit 4.
  • The augmentation processing unit 1 receives the pre-augmentation training data 11, performs augmentation without information degradation, such as flipping, rotation, and translation, or augmentation that degrades the information held by the training data 11, and outputs the augmented training data 21.
  • The label prediction unit 2 inputs the augmented training data 21 into the classifier 22 and outputs a prediction label for each of the plurality of training data.
  • The selection unit 3 receives the probability distributions 15 of the output prediction labels, calculates the errors 31 from the correct answer of the pre-augmentation training data, and selects one of the probability distributions 15 using the error as a measure: specifically, the probability distribution closest to the mean of the errors 31, the probability distribution with the largest error 31, or one sampled with a selection probability that increases with the magnitude of the error 31.
  • The learning unit 4 receives the probability distribution 32 selected using the error from the correct answer as a measure, further performs the error calculation 41 against the correct answer, and adjusts the weight parameters 42 of the classifier 22.
  • The present invention is also useful for augmentation without information degradation, such as flipping, rotation, and translation.
  • FIG. 2 shows a processing flow diagram of one embodiment of the learning method of the present invention.
  • First, the training data is augmented (step S01).
  • FIG. 3 shows the image before augmentation.
  • The image 5 shown in FIG. 3 is used as one pre-augmentation training datum.
  • Image 5 has a black background, with the number "7" displayed in white.
  • FIG. 4 shows an example of cutting out, from one original pre-augmentation image, a plurality of augmented images whose information has been degraded.
  • Here, an example of cutting out four augmented images 51 to 54 from the pre-augmentation image 5 is shown.
  • The augmentation is not limited to cutting out four images; even more images can be cut out.
  • Each of the four augmented images 51 to 54 is an image whose information is degraded by cutting out only part of the "7".
  • The image 51 is a cutout of the upper-right part of the "7", the image 52 of the upper-left part, the image 53 of a position slightly to the lower right of the center, and the image 54 of a lower part of the "7".
  • FIGS. 5(1) to 5(4) show the images after cutting out.
  • The image 51 shown in FIG. 5(1) can easily be identified by the human eye as part of the number "7", whereas for the image 52 shown in FIG. 5(2), the image 53 shown in FIG. 5(3), and the image 54 shown in FIG. 5(4), it is difficult for the human eye to determine whether the content of the image is part of the number "7".
  • FIG. 6 shows the probability distributions output when the classifier classifies the images by image recognition.
  • FIG. 6(1) shows the probability distributions of the prediction labels output for each of the plurality of training data; here the number of augmented training data is N = 4, corresponding to the four cut-out images.
  • FIG. 6(2) shows the probability distribution representing the correct answer of the pre-augmentation training data: among the prediction labels "0" to "9", the probability of "7" is 1, and the probabilities of "0" to "6", "8", and "9" are 0 (one-hot representation).
  • FIG. 7 is an explanatory diagram of the harmonization process when M items (M a natural number) around the mean are selected. The input image 6 is denoted x_i, and its true label 7 is denoted y_i; here, "Frog" is set as the correct label 7.
  • The augmented image data group 60 contains M image data; four of them (6a to 6d) are shown in FIG. 7.
  • In the leftmost image data 6a the frog's face is hidden, so it does not look like a frog, and its point on the one-dimensional line lies far from the other image data points.
  • The horizontal axis of the graph 9 shows the errors calculated from the augmented image data group 60 and its label 7.
  • The leftmost image data 6a, with the frog's face hidden, is hard to classify correctly and therefore has a large error; its point is plotted at the rightmost position.
  • The broken line 14 in the graph 9 shows the mean of the outputs of the cross-entropy error function (loss function). If most of the augmented image data group 60 holds enough information for correct classification, their loss outputs lie close to this mean, and data far from the mean can be judged to be unclassifiable and not useful for learning.
  • Image data close to the mean are useful for learning and can improve the accuracy of the trained classifier, so they are actively used for learning.
  • The two image data (6b, 6c) close to the mean loss output indicated by the broken line 14 in the graph 9 are extracted as data useful for learning and will be used for learning.
  • The image data group 60 also includes image data other than (6a to 6d); here, two further images (6e, 6f) are used for learning.
  • In this way, probability distributions whose error is close to the mean of the cross-entropy loss outputs are selected (step S04). Then the cross-entropy error function (loss function) is optimized so that the error between the probability distributions of the prediction labels selected in step S04 and the correct answer becomes small, and the weight parameters of the classifier are adjusted by the error backpropagation method (step S05).
  • Example 1 using image data sets: the effectiveness of the learning device and learning method of this example was verified using the image data sets CIFAR-10 and CIFAR-100 as training data.
  • CIFAR-10 and CIFAR-100 are composed of 32 × 32-pixel RGB images.
  • Dropout, a method of disabling neurons in the hidden layers with a certain probability during training, is useful for improving generalization performance.
  • FIG. 8 illustrates methods of augmenting the training data: (1) the image data before augmentation, (2) zero padding, (3) random horizontal flip, (4) random crop, and (5) cutout.
  • The image 61 in FIG. 8(2) is obtained by applying zero padding to the image 6 in FIG. 8(1): the border of the image is filled with zeros over a range of 4 pixels in all directions.
  • The image 62 in FIG. 8(3) is the image 6 flipped horizontally with a predetermined probability.
  • The image 63 in FIG. 8(4) is a randomly cropped version of image 6; the crop size is 32 × 32 pixels, the same as image 61.
  • The image 64 in FIG. 8(5) has a cutout portion 64a placed in image 6; even in a large image, an important part of the image may disappear depending on where the cutout portion 64a is placed.
  • In this example, the batch size is 64 and the number of epochs is 100.
  • For optimization, stochastic gradient descent (SGD) was used, with a learning rate of 0.1, a weight decay of 0.0005, and a momentum of 0.9; the learning rate was multiplied by 0.2 at epochs 40, 60, and 80. (A sketch of this setup follows.)
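  • A minimal sketch of this optimization setup in PyTorch, assuming some classifier model; the placeholder model and the training-loop body are illustrative, not from the patent:

    import torch

    model = torch.nn.Linear(32 * 32 * 3, 10)   # placeholder classifier

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=0.0005)
    # multiply the learning rate by 0.2 at epochs 40, 60 and 80
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[40, 60, 80], gamma=0.2)

    for epoch in range(100):
        ...  # one epoch over mini-batches of size 64 goes here
        scheduler.step()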
  • For each sample, 32 augmented image data were prepared, their errors were calculated, and M samples close to the mean of the errors were selected and used for learning. The specific harmonization process is as described above.
  • The experimental results in Tables 1 and 2 below represent the top-1 error rate (%).
  • As shown in Table 1, with CIFAR-10 the error rate was 5.43% ± 0.174 in Comparative Example 1 and 4.97% ± 0.095 in Example A, in which the sample closest to the mean error was selected; the error rate was lower in Example A. With CIFAR-100, it was 24.63% ± 0.280 in Comparative Example 1 and 23.93% ± 0.225 in Example A; again the error rate was lower in Example A.
  • As shown in Table 2, with CIFAR-10 the error rate was 4.34% ± 0.108 in Comparative Example 2, 3.85% ± 0.039 in Example B, in which four samples closest to the mean error were selected, and 4.02% ± 0.158 in Example C, in which eight such samples were selected; the error rates of Examples B and C were lower than that of Comparative Example 2. With CIFAR-100, the rates were 20.64% ± 0.095 in Comparative Example 2, 20.05% ± 0.140 in Example B, and 19.32% ± 0.295 in Example C, so the error rates of Examples B and C were again lower than that of Comparative Example 2.
  • FIGS. 9 and 10 are graphs showing the relationship between prediction accuracy and model size.
  • The graph in FIG. 9 plots Example D, in which the 32 × 32-pixel pre-augmentation image data were degraded to 27 × 27 pixels after augmentation, against Comparative Example 3a (augmentation without information degradation) and Comparative Example 4a (no augmentation).
  • FIG. 9(1) shows the top-1 prediction accuracy with CIFAR-10, FIG. 9(2) the top-1 prediction accuracy with CIFAR-100, and FIG. 9(3) the top-5 prediction accuracy with CIFAR-100.
  • Compared with Comparative Examples 3a and 4a, Example D showed that not only the top-1 but also the top-5 prediction accuracy is less likely to decrease even when the learning model is made smaller.
  • FIG. 10 plots Example E, in which the 32 × 32-pixel pre-augmentation image data were degraded to 23 × 23 pixels after augmentation, against Comparative Example 3b (with augmentation) and Comparative Example 4b (without augmentation).
  • FIG. 10(1) shows the top-1 prediction accuracy with CIFAR-10, FIG. 10(2) the top-1 prediction accuracy with CIFAR-100, and FIG. 10(3) the top-5 prediction accuracy with CIFAR-100. In all of FIGS. 10(1) to 10(3), the prediction accuracy of Example E was higher than those of Comparative Examples 3b and 4b. As in Example D, neither the top-1 nor the top-5 prediction accuracy of Example E decreased easily as the model size became small.
  • FIG. 11 shows a processing flow diagram of another embodiment of the learning method of the present invention.
  • The training-data images are augmented in the same manner as in Example 1 (step S11).
  • The augmented images may be images whose information is degraded relative to the original pre-augmentation image, or images without information degradation relative to the original, such as rotations and translations.
  • The plurality of augmented images are input into one classifier (step S12).
  • The probability distributions of the prediction labels output by the classifier through image recognition, and the probability distribution representing the correct answer of the pre-augmentation training data, are the same as those shown in FIG. 6.
  • The cross-entropy errors between the probability distributions of the prediction labels output for each of the augmented images and the correct answer of the original pre-augmentation image are calculated (step S13).
  • The probability distribution whose cross-entropy loss output is the maximum is selected (step S14).
  • Then the cross-entropy error function (loss function) is optimized so that the error between the probability distribution of the prediction label selected in step S14 and the correct answer becomes small, and the weight parameters of the classifier are adjusted by the error backpropagation method (step S15).
  • FIG. 12 shows a processing flow diagram of another embodiment of the learning method of the present invention.
  • The training-data images are augmented in the same manner as in Example 1 (step S21).
  • The augmented images may be images whose information is degraded relative to the original pre-augmentation image, or images without information degradation relative to the original, such as rotations and translations.
  • The plurality of augmented images are input into one classifier (step S22).
  • The probability distributions of the prediction labels output by the classifier, and the probability distribution representing the correct answer of the pre-augmentation training data, are the same as those shown in FIG. 6.
  • The cross-entropy errors between the probability distributions of the prediction labels output for each of the augmented images and the correct answer of the original image are calculated (step S23). From these probability distributions, one is selected by sampling, with the selection probability increased according to the magnitude of the error (step S24). Then the cross-entropy error function (loss function) is optimized so that the error between the sampled probability distribution and the correct answer becomes small, and the weight parameters of the classifier are adjusted by the error backpropagation method (step S25).
  • FIG. 13 shows a functional block diagram of another embodiment of the learning device of the present invention.
  • The learning device 10a includes an augmentation processing unit 1, a label prediction unit 2, a selection unit 3, and a learning unit 4a.
  • The augmentation processing unit 1 receives the pre-augmentation training data 11 and outputs the augmented training data 21.
  • The label prediction unit 2 inputs the augmented training data 21 into the classifier 22 and outputs a prediction label for each of the plurality of training data.
  • The selection unit 3 receives the probability distributions 15 of the output prediction labels, calculates the errors 31 from the correct answer of the pre-augmentation training data, and selects training data using the error as a measure: the augmented training datum whose probability distribution has the largest error 31, or one augmented training datum chosen by sampling with a selection probability that increases with the magnitude of the error 31.
  • The classifier 22 receives the augmented training datum selected using the error from the correct answer as a measure, the error calculation 41 between the probability distribution of the prediction label output by the classifier 22 and the correct answer is performed, and the weight-parameter adjustment 42 of the classifier 22 is carried out.
  • FIG. 14 shows a conceptual diagram of a learning method using mini-batches. For each original datum in the mini-batch, (a) the original datum is expanded into multiple extended data; (b) each extended datum is input to the classifier and the probability distribution of its prediction label is output; (c) one extended datum is selected using the error from the correct answer of the original datum as a measure (see the "selection" arrow in FIG. 14), by one of three methods: selecting the extended datum whose probability distribution is closest to the mean error, selecting the one whose error is largest, or sampling with a selection probability that increases with the magnitude of the error; and (d) the selected extended datum replaces the original datum as training data of the mini-batch, on which the classifier is trained. A sketch of one such training step follows.
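  • A minimal PyTorch sketch of one such mini-batch step under the max-error selection rule; "expand" (the augmentation function) and "m" (the number of variants) are assumed, illustrative names:

    import torch
    import torch.nn.functional as F

    def train_step(model, images, labels, expand, m, optimizer):
        """One mini-batch step: expand each original image into m variants,
        replace the original with the variant whose cross-entropy error is
        largest, then train on the replaced batch."""
        model.eval()
        replaced = []
        with torch.no_grad():
            for img, y in zip(images, labels):
                variants = torch.stack([expand(img) for _ in range(m)])
                logits = model(variants)                       # (m, n_classes)
                errors = F.cross_entropy(logits, y.repeat(m), reduction="none")
                replaced.append(variants[errors.argmax()])     # max-error rule
        model.train()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(torch.stack(replaced)), labels)
        loss.backward()
        optimizer.step()
        return loss.item()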
  • FIG. 15 shows typical data extensions: flipping, cropping, cutout, and mixup.
  • Flipping, like rotation and translation, extends the data while largely preserving the information of the original.
  • Cropping cuts away unnecessary parts such as margins while keeping the important parts of the original image; flipping, rotation, translation, and cropping do not degrade the original data.
  • Cutout cuts away (hides) part of the important region, and mixup fuses two images; both degrade the information of the original data.
  • Random extended data can be created by randomly choosing the size and position of the portion to cut from the original data.
  • The extension methods are not limited to the above; other methods can also be applied. (A minimal mixup sketch follows.)
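  • A minimal mixup sketch, assuming float NumPy images of equal shape; note that, as described above, the patent keeps the original image's correct label for the fused image, so only the pixels are mixed here (canonical mixup also mixes labels):

    import numpy as np

    def mixup(img_a: np.ndarray, img_b: np.ndarray,
              rng: np.random.Generator, alpha: float = 1.0) -> np.ndarray:
        """Fuse two images by a convex combination of their pixels."""
        lam = rng.beta(alpha, alpha)
        return lam * img_a + (1.0 - lam) * img_b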
  • FIG. 16 is an explanatory diagram of the method of selecting the extended datum whose probability distribution is closest to the mean error, FIG. 17 of the method of selecting the extended datum whose probability distribution has the maximum error, and FIGS. 18 and 19 of the method of selecting extended data by sampling with a selection probability that increases with the magnitude of the error.
  • First, as shown in FIG. 16, in the method of selecting the extended datum whose probability distribution is closest to the mean error from the correct answer, each original image in the mini-batch is expanded into M images during training; the M extended images are input to the classifier (DNN), and the probability distribution of the prediction label of each of the M extended images is output. The correct label of an extended image is the same as that of its original image. The cross-entropy error between each predicted distribution and the correct answer is calculated, and the extended image closest to the mean error replaces the original image as training data of the mini-batch.
  • As shown in FIG. 17, in the method of selecting the extended datum whose probability distribution has the largest error from the correct answer, each original image in the mini-batch is likewise expanded into M images during training; the M extended images are input to the classifier (DNN), and the probability distribution of the prediction label of each is output. The correct label of an extended image is the same as that of its original image. The cross-entropy error between each predicted distribution and the correct answer is calculated, and the extended image with the largest error replaces the original image as training data of the mini-batch.
  • As shown in FIGS. 18 and 19, in the method of selecting extended data by sampling with a selection probability that increases with the magnitude of the error, each original image in the mini-batch is expanded into M images; the M extended images are input to the classifier (DNN) during training, and the probability distribution of the prediction label of each is output. The correct label of an extended image is the same as that of its original image. The cross-entropy error between each predicted distribution and the correct answer is calculated, and an extended image selected by sampling, with a selection probability that increases with the magnitude of the error, replaces the original image as training data of the mini-batch.
  • When increasing the selection probability according to the magnitude of the error, as shown in FIG. 19, the errors may be normalized and the selection probabilities set using relative errors measured from the minimum error.
  • In that case, the selection probability of the extended image with the minimum error becomes 0, and it can be excluded from sampling.
  • Data with a small error from the correct answer easily reach the correct answer and are not suited to learning; therefore the selection probability is increased according to the magnitude of the relative error, data unsuited to learning are excluded from sampling, and the extended data are selected (see the sketch below).
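  • A minimal sketch of this relative-error sampling, assuming a NumPy array of per-variant errors; the values and names are illustrative:

    import numpy as np

    def sampling_probs(errors: np.ndarray) -> np.ndarray:
        """Selection probabilities from relative errors measured from the
        minimum: the minimum-error variant gets probability 0."""
        rel = errors - errors.min()
        if rel.sum() == 0.0:                 # all errors equal: uniform fallback
            return np.full(len(errors), 1.0 / len(errors))
        return rel / rel.sum()

    rng = np.random.default_rng(0)
    errors = np.array([0.2, 1.5, 0.9, 2.4])  # hypothetical per-variant errors
    idx = rng.choice(len(errors), p=sampling_probs(errors))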
  • In this example as well, the batch size is 64 and the number of epochs is 100. For optimization, stochastic gradient descent (SGD: Stochastic Gradient Descent) was used, with a learning rate of 0.1, a weight decay of 0.0005, and a momentum of 0.9; the learning rate was multiplied by 0.2 at epochs 40, 60, and 80.
  • Augmentation 1 augments the original image to 8 images using only ordinary augmentation without information degradation, such as flipping.
  • Augmentation 2 augments the original image to 8 images using ordinary augmentation plus cutout, which degrades information.
  • Augmentation 3 augments the original image to 8 images using ordinary augmentation plus mixup, which degrades information.
  • As shown in the experimental results in Table 4, with CIFAR-10, Examples F and G showed a higher accuracy rate than Comparative Example 5, a conventional learning method, in all cases: augmentation 1 without information degradation as well as augmentations 2 and 3 with information degradation. In Example E, however, the accuracy rate was higher than that of Comparative Example 5 for augmentation 1, but roughly the same or lower for augmentations 2 and 3. Further, as shown in the experimental results in Table 5, with CIFAR-100, Examples E, F, and G, corresponding to the three methods of selecting images after augmentation, all showed a higher accuracy rate than Comparative Example 5, regardless of whether the augmentation involved information degradation. Comparing the three augmentations, those with information degradation (augmentations 2 and 3) showed a higher accuracy rate than augmentation 1 without information degradation in almost all cases.
  • The present invention is useful as a technique that enables highly accurate recognition in machine learning, particularly in DNNs.
  • Reference numerals: 1 augmentation processing unit; 2 label prediction unit; 3 selection unit; 4, 4a learning unit; 5, 6, 6a to 6f, 51 to 54, 61 to 64 image; 7 label; 8, 9 graph; 10, 10a learning device; 11 pre-augmentation training data; 15 probability distribution of output prediction labels; 21 augmented training data; 22 classifier; 31 error from the correct answer of the pre-augmentation training data; 32 probability distribution selected based on the error from the correct answer; 41 error calculation against the correct answer; 42 weight-parameter adjustment of the classifier; 60 image data group; 64a cutout portion

Abstract

In relation to machine learning, the objective of the present invention is to provide a learning method and device with which it is possible to improve generalization capability and recognition accuracy. A plurality of items of augmented learning data, generated from multidimensional learning data before augmentation, are input into one classifier, and from among the probability distributions of the prediction labels output for each of the plurality of items of learning data, at least one probability distribution, selected using the error from the correct answer of the pre-augmentation learning data as a measure, is used to perform learning on the basis of that error. The data selected using the error as a measure are data sampled with a selection probability increased in accordance with the magnitude of the error, data for which the error is greatest, or data for which the difference from the average error is smallest.

Description

Learning method and learning device using augmentation
The present invention relates to a technique that enables highly accurate recognition in machine learning, particularly in deep neural networks (DNN; Deep Neural Network), even when little training data is available.
In recent years, AI (Artificial Intelligence) has evolved remarkably thanks to technological advances in computing. Its central technology is the neural network (NN; Neural Network), and in particular DNNs, deepened neural networks, dominate the field. In general, the deeper a DNN's layers, the greater its expressive power; accordingly, much recent research has addressed the design of deep DNNs and methods for training them successfully.
However, deepening the layers requires a large amount of training data, and in the real world the cost of preparing data is high. Deep layers also tend to cause overfitting: the model is so expressive that it fits the data too closely. Furthermore, deep layers make the number of parameters enormous, so the model becomes large, which makes it difficult to use on mobile devices and increases the amount of computation at prediction time.
On the other hand, a DNN classifier is required to have generalization performance; that is, it must predict accurately even on unknown data. Therefore, the error (cross-entropy) between the probability distribution of the predicted label and the true probability distribution is computed, and the parameters are optimized so that this error becomes small.
To improve prediction accuracy, data augmentation is generally performed: the data at hand are processed at training time by scaling, flipping, rotation, contrast adjustment, and so on, to increase the amount of data. A method called ensemble learning, in which multiple predictors are trained and their prediction results are integrated at inference time, is also known.
As techniques for performing an ensemble during learning, a technique relating to a convolutional neural network for image recognition focusing on ensemble learning (see Non-Patent Document 1), a technique relating to a neural network with a residual structure (see Non-Patent Document 2), and a learning processing method for identifying whether a disease is present based on skin image data (see Patent Document 1) have been disclosed.
However, all of the above documents utilize output results predicted by multiple learners; none utilizes the output obtained by feeding multiple augmented training data into a single learner.
Further, Non-Patent Document 1 and Patent Document 1 disclose techniques for augmenting training data by rotating and flipping images, but some of the augmented training data can degrade recognition accuracy when used for learning; if they are used as-is as post-augmentation training data, recognition accuracy cannot be improved sufficiently.
Patent Document 1: Japanese Unexamined Patent Publication No. 2017-45341
In view of this situation, an object of the present invention is to provide a learning method and a learning device that can learn efficiently by excluding, from the augmented training data, data that degrade recognition accuracy, in order to improve the generalization ability to predict unknown data accurately and to improve recognition accuracy in machine learning.
To solve the above problem, according to a first aspect of the present invention, the learning method feeds a plurality of augmented training data, generated from a multidimensional training datum before augmentation, into a single classifier; from among the probability distributions of the prediction labels output for each of the plurality of training data, at least one probability distribution selected using the error from the correct answer of the pre-augmentation training data as a measure is used, and learning is performed based on the error between it and that correct answer.
Here, the augmentation may be augmentation without information degradation, such as flipping, rotation, and translation of the pre-augmentation training data, or augmentation that degrades the information of the pre-augmentation training data, or both. The correct answer of the pre-augmentation training data is the same as that of the plurality of augmented training data derived from it; degrading the information does not change the correct label. That is, the same correct label as the pre-augmentation training data is assigned to the information-degraded augmented training data, and the probability distribution of the correct answer is also the same.
As a measure of the error from the correct answer of the pre-augmentation training data, cross-entropy, an index of how far apart two probability distributions are, is preferably used; using cross-entropy as the loss function enables efficient learning.
Further, by selecting, from the probability distributions of the prediction labels output for the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure, data unsuited to learning can be avoided or used only sparingly, which improves generalization ability. In the present invention, data unsuited to learning are those whose error from the correct answer is small.
According to a second aspect of the present invention, the learning method feeds a plurality of augmented training data, generated from a multidimensional training datum before augmentation, into a single classifier, and performs learning using at least one augmented training datum selected, based on the probability distributions of the prediction labels output for each of the plurality of training data, using the error from the correct answer of the pre-augmentation training data as a measure.
According to a third aspect of the present invention, the learning method feeds a plurality of augmented training data, generated from a multidimensional training datum before augmentation, into a single classifier, and, based on the probability distributions of the prediction labels output for each of the plurality of training data, replaces the pre-augmentation training datum with one augmented training datum selected using the error from the correct answer as a measure, and learns.
When mini-batch learning is used, the multidimensional pre-augmentation training datum is one original datum in the mini-batch; the original datum is augmented into multiple extended data, each extended datum is input to the classifier, and, based on the output probability distributions of the prediction labels, one extended datum selected using the error from the correct answer of the original datum as a measure replaces the original datum for learning.
In the learning method of the present invention described above, using training data augmented with information degradation further improves generalization ability and recognition accuracy. Here, degrading information means rewriting part of the data to lose information: adding noise to the multidimensional original data, or deleting part of the multidimensional quantity or setting it to a predetermined value, thereby reducing the amount of information the original data carries. For example, information can be degraded by filling a specific subregion of the pre-augmentation image with black (setting the pixel values of that subregion to 0), or by setting the pixel values of the subregion to some specific value.
In the present invention, when the training data is an image, one image is represented by a multidimensional quantity of the pixel dimensions constituting that image, or by a multidimensional quantity representing the features of the image. When the training data is one-dimensional time-series data, it is represented by a multidimensional quantity in which the data values at each time are arranged over the duration, or by a multidimensional quantity representing the features of the one-dimensional time series. When the training data is multidimensional time-series data, it is represented by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration, or by a multidimensional quantity representing the features of the multidimensional time series.
In the learning method of the present invention, the data selected with the error from the correct answer as the measure is preferably data sampled with a selection probability that increases with the magnitude of the error from the correct answer. The data selected by this measure refers to probability distribution data or to training data. The error between the probability distribution (categorical distribution) of the prediction label output for each padded training datum and the probability distribution of the correct label (one-hot representation) is calculated, and each output prediction-label distribution is sampled with a selection probability raised according to the magnitude of the calculated error. Among the probability distributions selected by sampling, those with a larger error from the correct answer have a higher selection probability and are thus more likely to be chosen.
As shown in the examples described later, data with a larger error from the correct answer is known to raise learning efficiency. Data with a large error from the correct answer is difficult data for the classifier, and it is presumed that training on difficult data improves learning efficiency more than training on easy data (data with a small error from the correct answer).
On the other hand, learning efficiency tends to be higher when different kinds of data are mixed in than when only difficult data is learned. Therefore, the selection probability is raised for data with a larger error from the correct answer and lowered for data with a smaller error, and the padded, information-degraded training data are sampled and selected on that basis.
The number of sampled data items may be one, or two or more.
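One plausible realization of this error-weighted sampling, assuming the per-example cross-entropy errors are already collected in a PyTorch tensor (a sketch, not the patent's reference implementation):

```python
import torch

def sample_by_error(losses, num_selected=1):
    """Sample indices of padded examples with probability proportional
    to their error, so harder (larger-error) examples are chosen more
    often while easier ones are still occasionally mixed in."""
    probs = losses / losses.sum()   # normalize errors into probabilities
    return torch.multinomial(probs, num_selected, replacement=False)
```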
In the learning method of the present invention, the data selected with the error from the correct answer as the measure may be the data whose error from the correct answer is largest.
As described above, data with a larger error from the correct answer is known to raise learning efficiency. Such data is difficult data for the classifier, and training on difficult data improves learning efficiency more than training on easy data (data with a small error from the correct answer).
The number of data items with the maximum error may be one, or two or more; two or more items may share the maximum. In that case, any one of them may be selected at random, or a plurality of them may be used.
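A one-line sketch of this maximum-error selection over the same loss tensor (note that torch.argmax returns the first index on ties, so the random tie-breaking mentioned above would need an explicit check):

```python
import torch

def select_max_error(losses):
    """Index of the padded example whose prediction deviates most
    from the correct label."""
    return torch.argmax(losses).item()
```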
In the learning method of the present invention, the data selected with the error from the correct answer as the measure may be the data whose difference from the mean of the errors from the correct answer is smallest.
The probability distribution data or training data selected with the error from the correct answer as the measure can be chosen on the basis of the mean, the median, the first quartile, the third quartile, or a weighted mean of the errors, but it is preferable to select distributions in order of proximity to the mean error.
The number of data items whose difference from the mean error is smallest may be one, or two or more; two or more items may share the minimum difference. In that case, any one of them may be selected at random, or a plurality of them may be used.
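A sketch of selecting the M examples closest to the mean error (the tensor layout and the parameter m are assumptions for illustration):

```python
import torch

def select_closest_to_mean(losses, m=1):
    """Indices of the m padded examples whose errors lie closest to
    the mean error over all padded examples."""
    distance = (losses - losses.mean()).abs()
    return torch.topk(distance, m, largest=False).indices
```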
In the learning method of the present invention, the number of probability distributions selected with the above error as the measure is preferably two or more; selecting two or more distributions further improves recognition accuracy.
Similarly, selecting two or more training data items with the above error as the measure further improves recognition accuracy.
In the learning method of the present invention, time-series data is data that changes with time, for example time-series acceleration data obtained from an acceleration sensor worn by a person. Based on such acceleration data, the person's daily activities can be predicted and classified. Dividing the time series into smaller time series means dividing the obtained acceleration data into intervals of a fixed length and then modifying some of the divided segments into degraded acceleration data, for example by adding random noise to them or by setting their acceleration values to zero; these modified copies are used as padded data to train a classifier model of human daily activities.
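A sketch of this segment-wise degradation for one-dimensional acceleration data (the segment length, segment-selection probabilities, and noise scale are illustrative assumptions; `signal` is assumed to be a 1-D float numpy array):

```python
import numpy as np

def degrade_segments(signal, segment_len=50, p_zero=0.2, p_noise=0.2,
                     noise_std=0.1):
    """Split a 1-D time series into fixed-length segments and, per
    segment, either zero it out, add Gaussian noise, or keep it,
    producing one degraded (padded) copy of the signal."""
    out = signal.copy()
    for start in range(0, len(signal), segment_len):
        seg = slice(start, start + segment_len)
        r = np.random.rand()
        if r < p_zero:
            out[seg] = 0.0                                   # drop the segment
        elif r < p_zero + p_noise:
            out[seg] += np.random.normal(0.0, noise_std, out[seg].shape)
    return out
```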
In the learning method of the present invention, when the training data is image data, at least one of rotation, flipping, zooming, translation, cropping, division, blurring, and noise may be applied to the image data; when the training data is time-series data, at least one of inversion, shifting, cropping, division, and noise may be applied to the time-series data, to obtain the plurality of training data after padding.
Next, the learning device of the present invention will be described.
According to a fourth aspect of the present invention, a learning device comprises: a padding processing unit that pads multidimensional training data before padding into a plurality of training data; a label prediction unit that inputs the plurality of training data after padding and predicts labels using a single classifier; a selection unit that selects, from the probability distributions of the prediction labels output for the respective training data, at least one probability distribution with the error from the correct answer of the training data before padding as the measure; and a learning unit that learns on the basis of the error between the selected probability distribution and the correct answer of the training data before padding.
According to a fifth aspect of the present invention, a learning device comprises: a padding processing unit that pads multidimensional training data before padding into a plurality of training data; a label prediction unit that inputs the plurality of training data after padding and predicts labels using a single classifier; a selection unit that selects, on the basis of the probability distributions of the prediction labels output for the respective training data, at least one item of training data after padding with the error from the correct answer of the training data before padding as the measure; and a learning unit that learns using the selected training data after padding.
According to a sixth aspect of the present invention, a learning device comprises: a padding processing unit that pads multidimensional training data before padding into a plurality of training data; a label prediction unit that inputs the plurality of training data after padding and predicts labels using a single classifier; a selection unit that selects, on the basis of the probability distributions of the prediction labels output for the respective training data, one item of training data after padding with the error from the correct answer of the training data before padding as the measure; and a learning unit that learns by replacing the training data before padding with the selected training data after padding.
In the selection unit of the learning device of the present invention, the data selected with the error from the correct answer as the measure is preferably data sampled with a selection probability that increases with the magnitude of the error. The data selected by this measure refers to probability distribution data or to training data: the error between the prediction-label distribution output for each padded training datum and the correct-label distribution is calculated, and each output distribution is sampled with a selection probability raised according to the magnitude of the calculated error.
Alternatively, in the selection unit of the learning device of the present invention, the data selected with the error from the correct answer as the measure may be the data whose error from the correct answer is largest, or the data whose difference from the mean of the errors from the correct answer is smallest.
Further, in the padding processing unit of the learning device of the present invention, it is preferable to rewrite part of the data so that the multidimensional training data before padding loses information. Using training data padded with information degradation further improves generalization ability and recognition accuracy.
In the learning device of the present invention, the padding processing unit rewrites part of the data to degrade the information of the multidimensional training data before padding. When the training data is an image, one image is represented by a multidimensional quantity of the pixel dimensions constituting it, or by a multidimensional quantity representing its features; when the training data is one-dimensional time-series data, it is represented by a multidimensional quantity in which the data values at each time are arranged over the duration, or by a multidimensional quantity representing the features of that time series; and when the training data is multidimensional time-series data, it is represented by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration, or by a multidimensional quantity representing the features of that multidimensional time series. The information degradation may also be performed by adding noise to the multidimensional original data, or by deleting part of the multidimensional quantity or setting it to a predetermined value, thereby reducing the amount of information the original data carries.
According to the learning method and learning device of the present invention, data that degrades recognition accuracy can be excluded from the training data after padding, enabling efficient learning and improving generalization ability and recognition accuracy.
Brief description of the drawings:
FIG. 1: Functional block diagram of the learning device of Example 1
FIG. 2: Schematic flow chart of the learning method of Example 1
FIG. 3: Image before padding
FIG. 4: Explanatory drawing of cropping the image before padding
FIG. 5: Images after cropping
FIG. 6: Output of the probability distributions
FIG. 7: Explanatory drawing of the harmonization process when selecting the M samples around the mean
FIG. 8: Explanatory drawing of methods of padding training data
FIG. 9: Graph (1) showing the relation between prediction accuracy and model size
FIG. 10: Graph (2) showing the relation between prediction accuracy and model size
FIG. 11: Schematic flow chart of the learning method of Example 2
FIG. 12: Schematic flow chart of the learning method of Example 3
FIG. 13: Functional block diagram of the learning device of Example 4
FIG. 14: Explanatory drawing of expansion and selection of the training data in a mini-batch
FIG. 15: Explanatory drawing of typical data extensions
FIG. 16: Explanatory drawing of selecting the extended data image whose probability distribution is closest to the mean error
FIG. 17: Explanatory drawing of selecting the extended data image whose probability distribution has the maximum error
FIG. 18: Explanatory drawing (1) of sampling with a selection probability raised according to the magnitude of the error
FIG. 19: Explanatory drawing (2) of sampling with a selection probability raised according to the magnitude of the error
Hereinafter, an example of an embodiment of the present invention will be described in detail with reference to the drawings. The scope of the present invention is not limited to the following examples and illustrated examples, and many changes and modifications are possible.
FIG. 1 shows a functional block diagram of one embodiment of the learning device of the present invention. As shown in FIG. 1, the learning device 10 comprises a padding processing unit 1, a label prediction unit 2, a selection unit 3, and a learning unit 4. The padding processing unit 1 receives the training data 11 before padding, performs padding either without information degradation (such as flipping, rotation, or translation) or while degrading the information the training data 11 carries, and outputs the training data 21 after padding. The label prediction unit 2 inputs the padded training data 21 to the classifier 22 and outputs a prediction label for each of the plurality of training data. The selection unit 3 receives the probability distributions 15 of the output prediction labels, computes the error 31 from the correct answer of the training data before padding, and selects one of the probability distributions 15 with the error as the measure: specifically, the distribution closest to the mean of the errors 31, the distribution with the largest error 31, or a distribution selected by sampling with a selection probability raised according to the magnitude of the error 31. The learning unit 4 receives the probability distribution 32 selected with the error from the correct answer as the measure, the error calculation 41 against the correct answer is performed, and the weight parameters of the classifier 22 are adjusted (42).
In this example, padding with information degradation of the training data before padding is described, but the present invention is also useful for padding without information degradation, such as flipping, rotation, and translation.
FIG. 2 shows a processing flow diagram of one embodiment of the learning method of the present invention. As shown in FIG. 2, the training data is first padded (step S01). FIG. 3 shows the image before padding. In this example, the image 5 shown in FIG. 3 is used as one item of training data before padding. Image 5 has a black background on which the digit "7" is displayed in white.
FIG. 4 shows an example of extracting a plurality of information-degraded padded images from one original image before padding: specifically, four padded images 51 to 54 are cropped from the image 5 before padding. Padding is not limited to cropping four images; many more images can be cropped. As shown in FIG. 4, each of the four padded images 51 to 54 has part of the "7" cut away and is thus an information-degraded image. For example, image 51 is cropped from the upper right of the "7" and image 52 from its upper left, while image 53 is cropped from slightly below and to the right of the center of the "7" and image 54 from below it. By cropping parts of the image in this way, the training data before padding is information-degraded and a plurality of training data after padding can be created.
FIGS. 5(1) to 5(4) show the images after cropping. The image 51 shown in FIG. 5(1) can easily be recognized by the human eye as part of the digit "7", whereas for the image 52 shown in FIG. 5(2), the image 53 shown in FIG. 5(3), and the image 54 shown in FIG. 5(4) it is difficult for the human eye to tell whether the content is part of the digit "7".
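The extraction of such partial views could be sketched as follows (the crop size is an assumption, and random positions are used for illustration, whereas the figures show specific fixed regions):

```python
import numpy as np

def crop_partial_views(image, crop_size, n_crops=4):
    """Extract n_crops sub-images of crop_size x crop_size at random
    positions; each sub-image keeps only part of the original digit,
    i.e. it is an information-degraded padded example."""
    h, w = image.shape[:2]
    views = []
    for _ in range(n_crops):
        top = np.random.randint(0, h - crop_size + 1)
        left = np.random.randint(0, w - crop_size + 1)
        views.append(image[top:top + crop_size, left:left + crop_size].copy())
    return views
```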
These four padded images 51 to 54 are input as training data to a single classifier (step S02). FIG. 6 shows the probability distributions output by the classifier's image recognition. FIG. 6(1) shows the probability distributions of the prediction labels output for the respective training data, and FIG. 6(2) shows the probability distribution representing the correct answer of the training data before padding. As shown in FIG. 6(1), there are ten prediction labels, the digits "0" to "9", and the distributions for n = 1 to n = N are displayed; here, since the four images 51 to 54 are input to one classifier as training data, N = 4. Because the image of the training data before padding is the digit "7", the probability distribution representing the correct answer assigns 1 to the label "7" and 0 to the labels "0" to "6", "8", and "9" (a one-hot representation).
Next, the cross-entropy error between each of the four prediction-label distributions output for the four padded training data and the correct answer of the training data before padding is calculated, and the mean of the resulting errors is computed (step S03). Since the training data after padding are merely information-degraded versions of the training data before padding, they inherit the correct label of the training data before padding.
FIG. 7 illustrates the harmonization process when selecting the M samples (M a natural number) around the mean. The input image 6 is denoted x_i and the correct label 7 is denoted y_i; here, "Frog" is set as the correct label 7. As an example of the padded image data group 60 there are M image data, of which four (6a to 6d) are shown in FIG. 7. In the graph 9 of FIG. 7, each image datum is drawn as a single point on a one-dimensional line for clarity, but actual image data is extremely high-dimensional; for example, 28 × 28-pixel image data has 784 (= 28 × 28) dimensions. Of the four image data (6a to 6d), the leftmost image 6a hides the frog's face and does not look like a frog, so its point on the one-dimensional line is far from the points of the other image data.
The horizontal axis of the graph 9 shows the error computed from the padded image data group 60 and its label 7. For example, the leftmost image 6a, in which the frog's face is hidden, is hard to classify correctly, so its error is large and its point is drawn furthest to the right. The broken line 14 in the graph 9 indicates the mean of the outputs of the cross-entropy error function (loss function). If a padded image carries enough information to be classified correctly, as most of the padded image data group 60 does, it lies near the mean of the loss-function output; data that cannot be classified correctly lies far from the mean and can be judged not useful for learning. In other words, image data close to the mean is useful for learning and can improve the accuracy of the trained classifier, so it is actively used for learning.
Of the four image data (6a to 6d), the two image data (6b, 6c) close to the mean loss indicated by the broken line 14 in the graph 9 are extracted as data useful for learning and are used for training. The image data group 60 also contains image data other than (6a to 6d); here, two further images (6e, 6f) are used for learning.
From the probability distributions of the prediction labels output for the four training data, the distributions whose error is close to the mean of the outputs of the cross-entropy error function (loss function) are selected (step S04). Then, so that the error between the probability distribution of the prediction label selected in step S04 and the correct answer becomes small, the cross-entropy error function (loss function) is optimized and the weight parameters of the classifier are adjusted using error backpropagation (step S05).
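Steps S01 to S05 could be sketched as a single PyTorch training step as below; `classifier`, `augment`, and the hyperparameters `n_aug` and `m` are assumptions for illustration, not the reference implementation, and `label` is assumed to be a scalar LongTensor:

```python
import torch
import torch.nn.functional as F

def training_step(classifier, optimizer, image, label, augment, n_aug=4, m=1):
    """One update: pad the image into n_aug degraded copies (S01),
    classify them (S02), compute per-copy cross-entropy errors (S03),
    keep the m copies whose error is closest to the mean (S04), and
    backpropagate the loss of the kept copies (S05)."""
    batch = torch.stack([augment(image) for _ in range(n_aug)])   # S01
    logits = classifier(batch)                                    # S02
    targets = label.repeat(n_aug)
    losses = F.cross_entropy(logits, targets, reduction='none')   # S03
    distance = (losses.detach() - losses.detach().mean()).abs()
    keep = torch.topk(distance, m, largest=False).indices         # S04
    loss = losses[keep].mean()                                    # S05
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Replacing the selection line with a maximum-error or error-weighted selector yields the variants of Examples 2 and 3 described later.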
(Experiment 1 using image datasets)
Using the image datasets CIFAR-10 and CIFAR-100 as training data, the effectiveness of the learning device and learning method of this example was verified. CIFAR-10 and CIFAR-100 consist of 32 × 32-pixel RGB images. The classifier model was WideResNet28-4 with Dropout (p = 0.3). Dropout is a technique that disables neurons in the hidden layers of the network with a fixed probability during training, and it helps improve generalization performance.
The training data were padded by zero-padding, random horizontal flip, random crop, and cutout. FIG. 8 illustrates these padding methods: (1) the image data before padding, (2) zero-padding, (3) random horizontal flip, (4) random crop, and (5) cutout.
The image 61 shown in FIG. 8(2) is the image 6 of FIG. 8(1) with zero-padding applied; the surroundings of the image are filled with zeros over a range of 4 pixels on each of the top, bottom, left, and right. The image 62 shown in FIG. 8(3) is the image 6 horizontally flipped with a given probability. The image 63 shown in FIG. 8(4) is a random crop of the image 6; the crop size is 32 × 32 pixels, the same as image 61. The image 64 shown in FIG. 8(5) has a cutout region 64a placed in the image 6. Even for a large image, an important part of it may disappear depending on where the cutout region 64a is placed.
The batch size was 64 and the number of epochs was 100. Stochastic gradient descent (SGD) was used as the optimization method, with a learning rate of 0.1, a weight decay of 0.0005, and a momentum of 0.9; the learning rate was multiplied by 0.2 at epochs 40, 60, and 80.
As the harmonization method, 32 padded image data were prepared, their errors were computed, and the M samples closest to the mean error were selected and used for learning. The specific harmonization process is as described above.
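In PyTorch terms, the optimizer schedule described above corresponds to a configuration like the following sketch (`model` is a placeholder standing in for WideResNet28-4, and the epoch body is elided):

```python
import torch
import torch.nn as nn

model = nn.Linear(32 * 32 * 3, 10)  # placeholder for WideResNet28-4
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.0005)
# Multiply the learning rate by 0.2 at epochs 40, 60, and 80.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 60, 80], gamma=0.2)

for epoch in range(100):
    # ... one epoch of mini-batch training with batch size 64 ...
    scheduler.step()
```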
Table 1 below compares, for CIFAR-10 and CIFAR-100, Comparative Example 1, in which one image was selected at random from the 32 padded image data, with Example A, in which the image closest to the mean error was selected (M = 1) from the 32 padded image data.
Table 2 below compares, for CIFAR-10 and CIFAR-100, Comparative Example 2, in which the data were padded to 8 images and all 8 were used for learning without selection, with Example B, in which the 4 images closest to the mean error were selected (M = 4) from the 32 padded image data, and Example C, in which the 8 images closest to the mean error were selected (M = 8). The experimental results in Tables 1 and 2 are top-1 error rates (%).
Table 1: Top-1 error rate (%)
                                            CIFAR-10        CIFAR-100
Comparative Example 1 (random selection)    5.43 ± 0.174    24.63 ± 0.280
Example A (closest to mean error, M = 1)    4.97 ± 0.095    23.93 ± 0.225
Table 2: Top-1 error rate (%)
                                              CIFAR-10        CIFAR-100
Comparative Example 2 (all 8 padded images)   4.34 ± 0.108    20.64 ± 0.095
Example B (4 closest to mean error, M = 4)    3.85 ± 0.039    20.05 ± 0.140
Example C (8 closest to mean error, M = 8)    4.02 ± 0.158    19.32 ± 0.295
As shown in Table 1 above, with CIFAR-10 the error rate was 5.43% ± 0.174 for Comparative Example 1 and 4.97% ± 0.095 for Example A, which selects the image closest to the mean error, so Example A had the lower error rate. Likewise with CIFAR-100, Comparative Example 1 gave 24.63% ± 0.280 and Example A gave 23.93% ± 0.225, so Example A again had the lower error rate.
As shown in Table 2 above, with CIFAR-10 the error rate was 4.34% ± 0.108 for Comparative Example 2, 3.85% ± 0.039 for Example B (the 4 images closest to the mean error), and 4.02% ± 0.158 for Example C (the 8 closest), so both Example B and Example C had lower error rates than Comparative Example 2. With CIFAR-100 the rates were 20.64% ± 0.095 for Comparative Example 2, 20.05% ± 0.140 for Example B, and 19.32% ± 0.295 for Example C, so again both Example B and Example C outperformed Comparative Example 2.
(Experiment 2 using image datasets)
As in Experiment 1, the image datasets CIFAR-10 and CIFAR-100 were used as training data to verify the effectiveness of the learning device and learning method of this example. Four convolutional neural networks (CNNs) were used: VGG6, VGG9, VGG13, and VGG16. Table 3 below outlines the classifier models used in this experiment.
[Table 3: outline of the classifier models (VGG6, VGG9, VGG13, VGG16) used in this experiment]
50,000 teacher images were used for training and 10,000 evaluation images for testing. For comparison with this example, experiments were also run with padding but without information degradation (Comparative Example 3) and without padding (Comparative Example 4). Padding without information degradation means ordinary padding, for example flipping the original image.
FIGS. 9 and 10 show graphs of the relation between prediction accuracy and model size. The graph in FIG. 9 plots Example D, in which the 32 × 32-pixel image data before padding were information-degraded to 27 × 27 pixels after padding, against Comparative Example 3a (padding without information degradation) and Comparative Example 4a (no padding). FIG. 9(1) shows the top-1 prediction accuracy with CIFAR-10, FIG. 9(2) the top-1 prediction accuracy with CIFAR-100, and FIG. 9(3) the top-5 prediction accuracy with CIFAR-100. In all of FIGS. 9(1) to 9(3), the prediction accuracy of Example D was confirmed to be equal to or higher than that of Comparative Example 3a or Comparative Example 4a. Compared with Comparative Examples 3a and 4a, the accuracy of Example D, not only for the top candidate but also for the top five candidates, degraded little even as the learning model was made smaller.
The graph in FIG. 10 plots Example E, in which the 32 × 32-pixel image data before padding were information-degraded to 23 × 23 pixels after padding, against Comparative Example 3b (with padding) and Comparative Example 4b (without padding). FIG. 10(1) shows the top-1 prediction accuracy with CIFAR-10, FIG. 10(2) the top-1 prediction accuracy with CIFAR-100, and FIG. 10(3) the top-5 prediction accuracy with CIFAR-100. In all of FIGS. 10(1) to 10(3), the prediction accuracy of Example E was confirmed to be higher than that of Comparative Examples 3b and 4b. As with Example D, the accuracy of Example E, both for the top candidate and for the top five candidates, degraded little even as the model size was reduced.
FIG. 11 shows a processing flow diagram of another embodiment of the learning method of the present invention. As in Example 1, the training data (images) are padded (step S11). Each padded image may be an information-degraded version of the original image before padding, or an image without information degradation relative to the original, such as a rotation or translation. The plurality of padded images are input to a single classifier (step S12). The prediction-label probability distributions output by the classifier's image recognition, and the probability distribution representing the correct answer of the training data before padding, are as shown in FIG. 6.
Next, the cross-entropy error between each of the prediction-label distributions output for the padded images and the correct answer of the original image before padding is calculated (step S13). Among these distributions, the one for which the output of the cross-entropy error function (loss function) is largest is selected (step S14). Then, so that the error between the probability distribution of the prediction label selected in step S14 and the correct answer becomes small, the cross-entropy error function (loss function) is optimized and the weight parameters of the classifier are adjusted using error backpropagation (step S15).
FIG. 12 shows a processing flow diagram of another embodiment of the learning method of the present invention. As in Example 1, the training data (images) are padded (step S21). Each padded image may be an information-degraded version of the original image before padding, or an image without information degradation relative to the original, such as a rotation or translation. The plurality of padded images are input to a single classifier (step S22). The prediction-label probability distributions output by the classifier's image recognition, and the probability distribution representing the correct answer of the training data before padding, are as shown in FIG. 6.
Next, the cross-entropy error between each of the prediction-label distributions output for the padded images and the correct answer of the original image before padding is calculated (step S23). From these distributions, one is selected by sampling with a selection probability raised according to the magnitude of the error (step S24). Then, so that the error between the probability distribution of the prediction label selected by sampling in step S24 and the correct answer becomes small, the cross-entropy error function (loss function) is optimized and the weight parameters of the classifier are adjusted using error backpropagation (step S25).
FIG. 13 shows a functional block diagram of another embodiment of the learning device of the present invention. As shown in FIG. 13, the learning device 10a comprises a padding processing unit 1, a label prediction unit 2, a selection unit 3, and a learning unit 4a. As in the learning device 10 of Example 1, the padding processing unit 1 receives the training data 11 before padding and outputs the training data 21 after padding. The label prediction unit 2 inputs the padded training data 21 to the classifier 22 and outputs a prediction label for each of the plurality of training data. The selection unit 3 receives the probability distributions 15 of the output prediction labels, computes the error 31 from the correct answer of the training data before padding, and selects training data with the error as the measure: specifically, the padded training data whose distribution is closest to the mean of the errors 31, the padded training data whose distribution has the largest error 31, or padded training data selected by sampling with a selection probability raised according to the magnitude of the error 31.
In the learning unit 4a, the classifier 22 receives the padded training data selected with the error from the correct answer as the measure, the error calculation 41 between the probability distribution of the prediction label output by the classifier 22 and the correct answer is performed, and the weight parameters of the classifier 22 are adjusted (42).
Depending on how the data for training a classifier are handled, learning techniques known as batch learning, mini-batch learning, and online learning exist. In deep learning (DNN), to suppress overfitting and make learning feasible on huge amounts of training data, the full dataset is usually divided into units called mini-batches; the data contained in one mini-batch are input to the classifier, and the classifier's parameters are updated so as to reduce the error between the output prediction-label probability distributions and the correct answers. One parameter update is performed per batch-size of training data. Learning is carried out over all mini-batches and then repeated.
When mini-batch learning is used in the learning method of the present invention, the following processes a) to d) are performed. FIG. 14 shows a conceptual diagram of the learning method using mini-batches; a code sketch follows the list.
a) First, for one item of original data in the mini-batch (the training data before padding), a plurality of extended data (the training data after padding) are created by information-degrading the original data (see the "expand" arrow in FIG. 14).
b) Next, each extended datum is input to the classifier, which outputs the probability distribution of the prediction label.
c) On the basis of the output prediction-label distributions, one extended datum is selected with the error from the correct answer of the original data as the measure (see the "select" arrow in FIG. 14). As described later, the selection takes either the extended datum whose distribution is closest to the mean error, the extended datum whose error is largest, or an extended datum sampled with a selection probability raised according to the magnitude of the error.
d) The selected extended datum replaces the original datum, and the classifier is trained on it as the mini-batch training data.
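A minimal sketch of steps a) to d) for one mini-batch, assuming a PyTorch classifier, an `augment` function, and an index-returning selector (none of these names are from the source; the defaults are illustrative):

```python
import torch
import torch.nn.functional as F

def minibatch_step(classifier, optimizer, images, labels, augment,
                   n_aug=8, select=torch.argmax):
    """For each original example in the mini-batch: expand it into n_aug
    degraded copies (a), obtain their prediction distributions (b),
    select one copy by its error from the correct label (c), then train
    on the selected copies in place of the originals (d)."""
    replaced = []
    with torch.no_grad():                                              # selection pass only
        for img, y in zip(images, labels):
            cand = torch.stack([augment(img) for _ in range(n_aug)])   # a)
            losses = F.cross_entropy(classifier(cand),                 # b)
                                     y.repeat(n_aug), reduction='none')
            replaced.append(cand[select(losses)])                      # c)
    batch = torch.stack(replaced)                                      # d)
    loss = F.cross_entropy(classifier(batch), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping `select` for the mean-proximity or error-weighted selectors sketched earlier switches among the three selection rules described next.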
Here, the extension techniques that create a plurality of extended data by information-degrading the original data are described. FIG. 15 shows typical data extensions: flipping, cropping, Cutout, and Mixup. Flipping, like rotation and translation, extends the data while largely preserving the information of the original. Cropping cuts away unneeded parts such as margins while keeping the important part of the original image. Flipping, rotation, translation, and cropping do not information-degrade the original data.
Cutout, by contrast, cuts away (hides) part of the important region, unlike cropping, and Mixup fuses two images; both information-degrade the original data. With Cutout, for example, random extended data can be created by randomly choosing the size and position of the part cut from the original data.
The extension techniques are not limited to the above; other techniques are also applicable.
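Cutout corresponds to the region-masking sketch given earlier; Mixup could be sketched as follows (the Beta parameter alpha is an illustrative assumption):

```python
import numpy as np

def mixup(image_a, image_b, alpha=1.0):
    """Fuse two images with a Beta(alpha, alpha)-distributed weight;
    the labels of the two images are usually blended with the same
    weight, so the information of both originals is degraded."""
    lam = np.random.beta(alpha, alpha)
    mixed = lam * image_a + (1.0 - lam) * image_b
    return mixed, lam
```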
The three ways of selecting extended data with the error from the correct answer as the measure are described with reference to FIGS. 16 to 18. FIG. 16 illustrates selecting the extended datum whose probability distribution is closest to the mean error, FIG. 17 selecting the extended datum whose distribution has the maximum error, and FIG. 18 selecting extended data by sampling with a selection probability raised according to the magnitude of the error.
First, to select the extended datum whose distribution is closest to the mean error, as shown in FIG. 16, each original image in the mini-batch is extended into M images, the M extended data images are input to the classifier (DNN) partway through training, and the prediction-label probability distribution for each of the M images is output. The correct label of an extended data image is the same as that of the original image. The cross-entropy error between each prediction-label distribution output by the classifier and the correct answer is calculated, and the extended data image closest to the mean error replaces the original image as the mini-batch training data.
Next, to select the extended datum whose distribution has the maximum error from the correct answer, as shown in FIG. 17, each original image in the mini-batch is extended into M images, the M extended data images are input to the classifier (DNN) partway through training, and the prediction-label distribution for each is output. The correct label of an extended data image is the same as that of the original image. The cross-entropy error between each output distribution and the correct answer is calculated, and the extended data image with the largest error replaces the original image as the mini-batch training data.
Next, to select extended data by sampling with a selection probability raised according to the magnitude of the error, as shown in FIGS. 18 and 19, each original image in the mini-batch is extended into M images, the M extended data images are input to the classifier (DNN) partway through training, and the prediction-label distribution for each is output. The correct label of an extended data image is the same as that of the original image. The cross-entropy error between each output distribution and the correct answer is calculated, and an extended data image selected by sampling, with a selection probability raised according to the magnitude of the error, replaces the original image as the mini-batch training data.
When raising the selection probability according to the magnitude of the error, as shown in FIG. 19, the errors may be normalized and the selection probabilities set using the relative error based on the minimum error. Using the relative error with the minimum as the baseline gives the extended data image with the minimum error a selection probability of 0, so it can be excluded from sampling. Data with a small error from the correct answer is data for which the correct answer is easily reached and is not suited to learning; therefore the selection probability is raised according to the magnitude of the relative error, data unsuited to learning are excluded from the sampling targets, and the extended data are selected.
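A sketch of this relative-error sampling, assuming the per-example errors are in a tensor (the uniform fallback for all-equal errors is an added safeguard, not from the source):

```python
import torch

def sample_by_relative_error(losses, num_selected=1):
    """Sample in proportion to each example's error above the minimum;
    the minimum-error (easiest) example gets probability 0 and is
    thereby excluded from sampling."""
    rel = losses - losses.min()
    if rel.sum() == 0:          # all errors equal: fall back to uniform
        rel = torch.ones_like(losses)
    probs = rel / rel.sum()
    return torch.multinomial(probs, num_selected)
```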
(Experiment 3 using image datasets)
Using the image datasets CIFAR-10 and CIFAR-100 as training data, the effectiveness of the learning method of this example was verified. CIFAR-10 and CIFAR-100 consist of 32 × 32-pixel RGB images. PreAct ResNet18 was used as the classifier model.
The training data were padded using zero-padding, random horizontal flip, and random crop as ordinary padding, and Cutout and Mixup as information-degrading data extensions.
The batch size was 64 and the number of epochs was 100. Stochastic gradient descent (SGD) was used as the optimization method, with a learning rate of 0.1, a weight decay of 0.0005, and a momentum of 0.9; the learning rate was multiplied by 0.2 at epochs 40, 60, and 80.
Tables 4 and 5 below show, for CIFAR-10 and CIFAR-100 respectively, the accuracy obtained when each original image in the mini-batch was padded to 8 images and learning was then performed with the following four methods. All results are top-1 accuracy rates (%).
- The original image was used for training the classifier without selecting from the padded images (Comparative Example 5).
- The one image closest to the mean error was selected and replaced the original image for training (Example E).
- The one image with the largest error was selected and replaced the original image for training (Example F).
- One image was selected by sampling with a selection probability raised according to the magnitude of the error and replaced the original image for training (Example G).
Here, padding 1 applies ordinary padding without information degradation, such as flipping, to the original image to produce 8 images; padding 2 uses ordinary padding together with information-degrading Cutout to pad the original image to 8 images; and padding 3 uses ordinary padding together with information-degrading Mixup to pad the original image to 8 images.
[Table 4]

[Table 5]
 As shown in Table 4, when CIFAR-10 was used, Examples F and G achieved a higher accuracy than Comparative Example 5, the conventional learning method, in all cases: Augmentation 1 without information degradation as well as Augmentations 2 and 3 with information degradation. Example E, however, exceeded Comparative Example 5 only with Augmentation 1; with Augmentations 2 and 3 its accuracy was roughly equal to or lower than that of Comparative Example 5.
 As shown in Table 5, when CIFAR-100 was used, all three selection methods (Examples E, F, and G) achieved a higher accuracy than Comparative Example 5, regardless of whether the augmentation involved information degradation. Comparing the three augmentations, those involving information degradation (Augmentations 2 and 3) yielded a higher accuracy than Augmentation 1, which does not degrade information, in almost all cases.
 The present invention is useful as a technique that enables highly accurate recognition in machine learning, in particular in deep neural networks (DNNs).
1 Augmentation processing unit
2 Label prediction unit
3 Selection unit
4, 4a Learning unit
5, 6, 6a–6f, 51–54, 61–64 Image
7 Label
8, 9 Graph
10, 10a Learning device
11 Training data before augmentation
15 Probability distribution of the output prediction label
21 Training data after augmentation
22 Classifier
31 Error from the correct answer of the training data before augmentation
32 Probability distribution selected using the error from the correct answer as a measure
41 Error calculation against the correct answer
42 Adjustment of the classifier's weight parameters
60 Image data group
64a Cutout portion

Claims (21)

1.  A learning method characterized in that a plurality of post-augmentation training data generated from multidimensional pre-augmentation training data are input to one classifier, and learning is performed based on the error from the correct answer of the pre-augmentation training data, using at least one probability distribution selected, with the error from the correct answer of the pre-augmentation training data as a measure, from among the probability distributions of the prediction labels output for each of the plurality of training data.
2.  A learning method characterized in that a plurality of post-augmentation training data generated from multidimensional pre-augmentation training data are input to one classifier, and learning is performed using at least one item of post-augmentation training data selected, based on the probability distributions of the prediction labels output for each of the plurality of training data, with the error from the correct answer of the pre-augmentation training data as a measure.
3.  A learning method characterized in that a plurality of post-augmentation training data generated from multidimensional pre-augmentation training data are input to one classifier, and one item of post-augmentation training data, selected based on the probability distributions of the prediction labels output for each of the plurality of training data with the error from the correct answer of the pre-augmentation training data as a measure, is substituted for the pre-augmentation training data for learning.
4.  The learning method according to claim 3, wherein, when mini-batch learning is used, the multidimensional pre-augmentation training data is one item of original data in the mini-batch; the original data is augmented into a plurality of extended data; each item of extended data is input to the classifier; and, based on the probability distributions of the output prediction labels, one item of extended data selected with the error from the correct answer of the original data as a measure is substituted for the original data for learning.
5.  The learning method according to any one of claims 1 to 4, wherein the data selected with the error from the correct answer as a measure is data sampled with a selection probability that increases with the magnitude of the error from the correct answer.
6.  The learning method according to any one of claims 1 to 4, wherein the data selected with the error from the correct answer as a measure is the data whose error from the correct answer is largest.
7.  The learning method according to any one of claims 1 to 4, wherein the data selected with the error from the correct answer as a measure is the data whose difference from the mean of the errors from the correct answer is smallest.
8.  The learning method according to claim 1, wherein two or more probability distributions are selected with the error as a measure.
9.  The learning method according to any one of claims 1 to 8, wherein, when the training data is an image, each image is represented by a multidimensional quantity in the pixel dimensions constituting that image, or by a multidimensional quantity representing features of the image; when the training data is one-dimensional time-series data, the data is represented by a multidimensional quantity in which the data values at each time are arranged over the duration, or by a multidimensional quantity representing features of the one-dimensional time-series data; and when the training data is multidimensional time-series data, the data is represented by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration, or by a multidimensional quantity representing features of the multidimensional time-series data.
10.  The learning method according to any one of claims 1 to 9, wherein the plurality of post-augmentation training data are data whose information has been degraded relative to the multidimensional pre-augmentation training data.
11.  The learning method according to claim 10, wherein the information degradation rewrites part of the data so that information is lost.
12.  The learning method according to any one of claims 1 to 9, wherein the plurality of post-augmentation training data are data whose information has been degraded relative to the multidimensional pre-augmentation training data; when the training data is an image, each image is represented by a multidimensional quantity in the pixel dimensions constituting that image, or by a multidimensional quantity representing features of the image; when the training data is one-dimensional time-series data, the data is represented by a multidimensional quantity in which the data values at each time are arranged over the duration, or by a multidimensional quantity representing features of the one-dimensional time-series data; when the training data is multidimensional time-series data, the data is represented by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration, or by a multidimensional quantity representing features of the multidimensional time-series data; and the information degradation adds noise to the multidimensional original data, or deletes part of the multidimensional quantity or sets it to a predetermined value, thereby reducing the amount of information contained in the original data.
13.  The learning method according to any one of claims 1 to 12, wherein, when the training data is image data, at least one of rotation, flipping, zooming, translation, cropping, division, blurring, and noise is applied to the image data, and when the training data is time-series data, at least one of inversion, shifting, clipping, division, and noise is applied to the time-series data, to yield the plurality of post-augmentation training data.
14.  A learning device comprising: an augmentation processing unit that augments multidimensional pre-augmentation training data into a plurality of training data; a label prediction unit that receives the plurality of post-augmentation training data and predicts labels using one classifier; a selection unit that selects at least one probability distribution, from among the probability distributions of the prediction labels output for each of the plurality of training data, with the error from the correct answer of the pre-augmentation training data as a measure; and a learning unit that learns based on the error between the selected probability distribution and the correct answer of the pre-augmentation training data.
15.  A learning device comprising: an augmentation processing unit that augments multidimensional pre-augmentation training data into a plurality of training data; a label prediction unit that receives the plurality of post-augmentation training data and predicts labels using one classifier; a selection unit that selects at least one item of post-augmentation training data, based on the probability distributions of the prediction labels output for each of the plurality of training data, with the error from the correct answer of the pre-augmentation training data as a measure; and a learning unit that learns using the selected post-augmentation training data.
16.  A learning device comprising: an augmentation processing unit that augments multidimensional pre-augmentation training data into a plurality of training data; a label prediction unit that receives the plurality of post-augmentation training data and predicts labels using one classifier; a selection unit that selects one item of post-augmentation training data, based on the probability distributions of the prediction labels output for each of the plurality of training data, with the error from the correct answer of the pre-augmentation training data as a measure; and a learning unit that substitutes the selected post-augmentation training data for the pre-augmentation training data for learning.
17.  The learning device according to any one of claims 14 to 16, wherein the data selected by the selection unit with the error from the correct answer as a measure is data sampled with a selection probability that increases with the magnitude of the error from the correct answer.
18.  The learning device according to any one of claims 14 to 16, wherein the data selected by the selection unit with the error from the correct answer as a measure is the data whose error from the correct answer is largest.
19.  The learning device according to any one of claims 14 to 16, wherein the data selected by the selection unit with the error from the correct answer as a measure is the data whose difference from the mean of the errors from the correct answer is smallest.
20.  The learning device according to any one of claims 14 to 19, wherein the augmentation processing unit rewrites part of the data so that information is lost from the multidimensional pre-augmentation training data.
21.  The learning device according to claim 20, wherein the augmentation processing unit rewrites part of the data to degrade the information of the multidimensional pre-augmentation training data; when the training data is an image, each image is represented by a multidimensional quantity in the pixel dimensions constituting that image, or by a multidimensional quantity representing features of the image; when the training data is one-dimensional time-series data, the data is represented by a multidimensional quantity in which the data values at each time are arranged over the duration, or by a multidimensional quantity representing features of the one-dimensional time-series data; when the training data is multidimensional time-series data, the data is represented by a multidimensional quantity in which the multidimensional data values at each time are arranged over the duration, or by a multidimensional quantity representing features of the multidimensional time-series data; and the information degradation adds noise to the multidimensional original data, or deletes part of the multidimensional quantity or sets it to a predetermined value, thereby reducing the amount of information contained in the original data.
PCT/JP2020/043248 2019-11-19 2020-11-19 Learning method and learning device employing augmentation WO2021100818A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021558450A JP7160416B2 (en) 2019-11-19 2020-11-19 LEARNING METHOD AND LEARNING DEVICE USING PADDING

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019209179 2019-11-19
JP2019-209179 2019-11-19

Publications (1)

Publication Number Publication Date
WO2021100818A1 true WO2021100818A1 (en) 2021-05-27

Family

ID=75980143

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/043248 WO2021100818A1 (en) 2019-11-19 2020-11-19 Learning method and learning device employing augmentation

Country Status (2)

Country Link
JP (1) JP7160416B2 (en)
WO (1) WO2021100818A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7346767B1 (en) 2023-07-21 2023-09-19 修二 奥野 Learning device and reasoning device
WO2023248948A1 (en) * 2022-06-24 2023-12-28 株式会社東京ウエルズ Learning device, learning method, and learning program
JP7468472B2 (en) 2021-07-08 2024-04-16 Jfeスチール株式会社 Trained model generation method, recognition method, and information processing device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018018350A (en) * 2016-07-28 2018-02-01 富士通株式会社 Image recognition device, image recognition program, image recognition method and recognition device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7124404B2 (en) 2018-04-12 2022-08-24 富士通株式会社 Machine learning program, machine learning method and machine learning apparatus

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018018350A (en) * 2016-07-28 2018-02-01 富士通株式会社 Image recognition device, image recognition program, image recognition method and recognition device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HIRUKAWA, SHOYA ET AL.: "Implementation and Evaluation of Optical Character Recognition Considering Color of Papers and Distortion of Spread Books", LECTURE PROCEEDINGS OF 18TH FORUM ON INFORMATION TECHNOLOGY FIT2019, vol. 3, 20 August 2019 (2019-08-20), pages 25 - 30, XP055825762 *
IDE, ATSUYA ET AL.: "Learning method by harmonizing predictive output for multiple samples of learning input", IBIS2019 THE 22ND INFORMATION-BASED INDUCTION SCIENCES WORKSHOP: POSTER SESSION PREVIEW SLIDE, 20 November 2019 (2019-11-20), Retrieved from the Internet <URL:https://drive.google.com/file/d/1f_FRq182b-RdRC2SRAla3D6fEctqYvyR/view?usp=sharing> [retrieved on 20210125] *
KONO, YOHEI ET AL.: "Data expansion using GAN", IPSJ SIG TECHNICAL REPORTS, vol. 2017 -CV, no. 14, 3 May 2017 (2017-05-03), pages 2 - 5, ISSN: 2188-8701, Retrieved from the Internet <URL:https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_uri&item_id=178747&file_id=1&file_no=1> [retrieved on 20210115] *
SEKIZAWA, AKIRA ET AL.: "Training of Traffic Sign Detector and Classifier Using Synthetic Road Scenes", IEICE TECHNICAL REPORT, vol. 118, no. 362, 6 December 2018 (2018-12-06), pages 73 - 78, ISSN: 2432-6380, Retrieved from the Internet <URL:https://www.ieice.org/ken/user/index.php?cmd=download&p=fEOj&t=IEICE-PRMU&1=35412853852cf393d627f8785553370589207eb8ba1197ddb29472a33c3a30d8&lang> [retrieved on 20210125] *

Also Published As

Publication number Publication date
JPWO2021100818A1 (en) 2021-05-27
JP7160416B2 (en) 2022-10-25

Similar Documents

Publication Publication Date Title
WO2021100818A1 (en) Learning method and learning device employing augmentation
US11468262B2 (en) Deep network embedding with adversarial regularization
CN109840531B (en) Method and device for training multi-label classification model
CN107836000A (en) For Language Modeling and the improved artificial neural network of prediction
CN110570433B (en) Image semantic segmentation model construction method and device based on generation countermeasure network
US20200134463A1 (en) Latent Space and Text-Based Generative Adversarial Networks (LATEXT-GANs) for Text Generation
US20230222353A1 (en) Method and system for training a neural network model using adversarial learning and knowledge distillation
CN110969086B (en) Handwritten image recognition method based on multi-scale CNN (CNN) features and quantum flora optimization KELM
JP6965206B2 (en) Clustering device, clustering method and program
CN115335830A (en) Neural architecture search with weight sharing
WO2002054757A1 (en) Data coding method and device, and data coding program
EP4092555A1 (en) Control method, information processing device, and control program
CN113963165A (en) Small sample image classification method and system based on self-supervision learning
CN111292349B (en) Data enhancement method for target detection based on fusion of recommendation candidate boxes
CN113886626A (en) Visual question-answering method of dynamic memory network model based on multiple attention mechanism
Calder et al. Use and misuse of machine learning in anthropology
JP6988995B2 (en) Image generator, image generator and image generator
CN114091597A (en) Countermeasure training method, device and equipment based on adaptive group sample disturbance constraint
WO2020230777A1 (en) Training method for machine learning model, data generation device, and trained machine learning model
Lima et al. Automatic design of deep neural networks applied to image segmentation problems
Chen et al. Mixing high-dimensional features for JPEG steganalysis with ensemble classifier
KR20240034804A (en) Evaluating output sequences using an autoregressive language model neural network
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
KR102305981B1 (en) Method for Training to Compress Neural Network and Method for Using Compressed Neural Network
Wang Two-dimensional entropy method based on genetic algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20891335

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021558450

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20891335

Country of ref document: EP

Kind code of ref document: A1