WO2021215261A1 - Information processing method, information processing device, and program - Google Patents

Information processing method, information processing device, and program

Info

Publication number
WO2021215261A1
WO2021215261A1 (PCT/JP2021/014910)
Authority
WO
WIPO (PCT)
Prior art keywords
data
reliability
teacher
correct answer
prediction
Prior art date
Application number
PCT/JP2021/014910
Other languages
French (fr)
Japanese (ja)
Inventor
Masato Ishii (雅人 石井)
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021215261A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • This technology relates to information processing methods, information processing devices, and programs applicable to machine learning.
  • Patent Document 1 discloses an evaluation device capable of evaluating the reliability of an estimated value obtained by using a neural network.
  • Non-Patent Document 1 describes experimental results to the effect that the output of a neural network can be used as reliability by executing mixup, a data augmentation method that interpolates the data and the labels used as teacher data at the same ratio.
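As a concrete illustration of mixup (a minimal sketch of the general technique; the arrays and the ratio 0.7 are hypothetical, and standard mixup draws the ratio from a Beta distribution):

```python
import numpy as np

def mixup(x0, y0, x1, y1, lam):
    """Interpolate two training samples and their one-hot labels
    at the same ratio lam, as in standard mixup."""
    x = lam * x0 + (1.0 - lam) * x1
    y = lam * y0 + (1.0 - lam) * y1
    return x, y

# Hypothetical example: blend a "dog" image with a "cat" image at ratio 0.7.
rng = np.random.default_rng(0)
x_dog, x_cat = rng.random((8, 8)), rng.random((8, 8))
y_dog, y_cat = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup(x_dog, y_dog, x_cat, y_cat, lam=0.7)
print(y_mix)
```

When a model is later trained on (x_mix, y_mix), the soft label [0.7, 0.3] is what allows the output scores to be read as a form of reliability.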
  • The purpose of the present technology is to provide an information processing method, an information processing device, and a program capable of outputting the reliability of a prediction result with high accuracy.
  • the information processing method is an information processing method executed by a computer system and includes an extraction step and a generation step.
  • the extraction step extracts two or more teacher data from a plurality of teacher data for training a machine learning model that predicts a correct answer for an input.
  • a kernel function is used to generate teacher data for reliability prediction in which the learning data and the reliability for the prediction of the correct answer are associated with each other from the extracted two or more teacher data.
  • Each of the plurality of teacher data may be data in which the correct answer is associated with the learning data for predicting the correct answer as a teacher label.
  • The generation step may generate the learning data for reliability prediction by synthesizing the learning data for predicting the correct answer included in each of the two or more extracted teacher data. Further, the generation step may generate, as the reliability for the prediction of the correct answer, the reliability with which the teacher label included in each of the two or more extracted teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
  • In the generation step, the kernel function may be used to generate a probability distribution that the learning data for reliability prediction follows, and the learning data for reliability prediction may be generated based on the generated probability distribution. Further, in the generation step, the conditional probability that the teacher label included in each of the extracted two or more teacher data is correct, under the condition that the generated learning data for reliability prediction is input, may be generated as the reliability for the prediction of the correct answer.
  • the extraction step may extract two teacher data from the plurality of teacher data.
  • The generation step may generate the learning data for reliability prediction by performing interpolation, at a predetermined interpolation ratio, on the learning data for predicting the correct answer included in each of the two extracted teacher data. Further, the generation step may generate, as the reliability for the prediction of the correct answer, the reliability with which the teacher label included in each of the two extracted teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
  • In the generation step, the kernel function may be used to generate a probability distribution that the interpolation ratio follows, the interpolation ratio may be determined based on the generated probability distribution, and the learning data for reliability prediction may be generated by performing interpolation at the determined interpolation ratio.
  • Further, the conditional probability that the teacher label included in each of the two extracted teacher data is correct, under the condition that the generated learning data for reliability prediction is input, may be generated as the reliability for the prediction of the correct answer.
  • the kernel function may be a Gaussian kernel.
  • the information processing device includes an extraction unit and a generation unit.
  • the extraction unit executes the extraction step.
  • the generation unit executes the generation step.
  • the program according to one form of the present technology causes a computer system to execute the extraction step and the generation step.
  • FIG. 1 is a schematic diagram showing a configuration example of a data generation system according to an embodiment of the present technology.
  • FIG. 2 is a schematic diagram showing an example of a machine learning model.
  • FIG. 3 is a schematic diagram for explaining learning of a machine learning model using teacher data.
  • the data generation system 100 corresponds to an embodiment of an information processing system according to the present technology.
  • the data generation system 100 includes a teacher data DB (database) 10 and an information processing device 20.
  • a plurality of teacher data are stored in the teacher data DB 10.
  • The teacher data is data for training the machine learning model 15, illustrated in FIG. 2, which predicts the correct answer for an input. It should be noted that what kind of data is input to the machine learning model 15 and what kind of data is predicted as the correct answer are not limited, and this technique can be applied to any machine learning model.
  • the teacher data is data in which the teacher label 12 is associated with the learning data 11.
  • Examples of the learning data 11 include image data, audio data, and the like.
  • arbitrary data to be input to the machine learning model 15 may be used.
  • the teacher label 12 is a correct answer (correct answer data) to be predicted by the machine learning model 15.
  • the teacher label 12 is stored in the label DB 13.
  • the label DB 13 is constructed in, for example, the teacher data DB 10 shown in FIG.
  • the configuration and method for storing the teacher data are not limited, and any configuration and method may be adopted.
  • the learning data 11 and the teacher label 12 are associated with each other and are input to the learning unit 14 as teacher data.
  • the learning unit 14 uses the teacher data and performs learning based on the machine learning algorithm.
  • Through the learning, the parameters (coefficients) for calculating the correct answer (teacher label) from the learning data are updated and output as learned parameters.
  • a program incorporating the generated learned parameters is generated as the machine learning model 15.
  • an error back propagation method that is generally often used for learning a neural network can be used.
  • A neural network is a model that imitates the neural circuits of the human brain, and has a layered structure consisting of three types of layers: an input layer, intermediate layers (hidden layers), and an output layer.
  • a neural network having a large number of intermediate layers is particularly called a deep neural network, and a deep learning technique for learning this is known as a model capable of learning a complicated pattern hidden in a large amount of data.
  • the error back propagation method is one of such learning methods, and is often used for learning, for example, a convolutional neural network (CNN) used for recognizing images and moving images.
  • a neurochip / neuromorphic chip incorporating the concept of a neural network can be used as a hardware structure for realizing such machine learning.
  • any machine learning algorithm may be used.
  • the information processing device 20 shown in FIG. 1 has hardware necessary for configuring a computer, such as a processor such as a CPU, GPU, or DSP, a memory such as a ROM or RAM, and a storage device such as an HDD (see FIG. 12).
  • the information processing method according to the present technology is executed when the CPU loads and executes the program according to the present technology recorded in advance in the ROM or the like into the RAM.
  • the information processing device 20 can be realized by an arbitrary computer such as a PC (Personal Computer).
  • hardware such as FPGA and ASIC may be used.
  • The extraction unit 21 and the generation unit 22 are configured as functional blocks by the CPU or the like executing a predetermined program.
  • the program is installed in the information processing apparatus 20 via, for example, various recording media. Alternatively, the program may be installed via the Internet or the like.
  • the type of recording medium on which the program is recorded is not limited, and any computer-readable recording medium may be used. For example, any non-transient storage medium that can be read by a computer may be used.
  • the teacher data DB 10 shown in FIG. 1 may be constructed in the information processing apparatus 20.
  • FIG. 4 is a flowchart showing an example of an information processing method executed by the information processing apparatus 20.
  • the extraction step is executed by the extraction unit 21 shown in FIG. 1 (step 101).
  • the extraction step is a step of extracting two or more teacher data from a plurality of teacher data for training the machine learning model 15 that predicts the correct answer for the input.
  • two or more teacher data may be arbitrarily extracted from a plurality of teacher data.
  • two or more teacher data may be extracted after processing such as classification is executed for a plurality of teacher data.
  • the generation step is executed by the generation unit 22 shown in FIG. 1 (step 102).
  • the generation step is a step of generating teacher data for reliability prediction from two or more extracted teacher data using a kernel function.
  • the teacher data for predicting the reliability is data in which the learning data and the reliability for predicting the correct answer by the machine learning model 15 are associated with each other. That is, in the teacher data for predicting the reliability, the reliability for the prediction of the correct answer is used as the teacher label.
  • That is, the teacher data for reliability prediction (learning data, reliability) is generated as new teacher data.
  • As the kernel function, any kernel function such as a Gaussian kernel, a square kernel, or a circular kernel may be used.
  • the original teacher data shown in FIG. 1 and the like may be described as teacher data for predicting the correct answer.
  • the learning data 11 and the teacher label 12 included in the original teacher data may be described as the learning data 11 for predicting the correct answer and the correct answer label 12 by using the same reference numerals.
  • the learning data and the reliability included in the newly generated teacher data for the reliability prediction may be described as the learning data for the reliability prediction and the reliability label.
  • the reliability of the prediction of the correct answer by the machine learning model 15 corresponds to the reliability label.
  • In the generation step shown in FIG. 4, the learning data for reliability prediction is generated by synthesizing the learning data 11 for correct answer prediction included in each of the two or more extracted teacher data for correct answer prediction.
  • As the data synthesis method, an arbitrary data augmentation method such as mixup may be used.
  • data interpolation may be performed with a predetermined interpolation ratio.
  • extrapolation of data and the like may be used.
  • The reliability with which the teacher label (correct answer label) 12 included in each of the two or more extracted teacher data for correct answer prediction is predicted as the correct answer for the learning data for reliability prediction generated by the synthesis is generated as the reliability (reliability label). For example, the conditional probability that the teacher label (correct answer label) 12 included in each of the extracted two or more teacher data for correct answer prediction is correct, under the condition that the generated learning data for reliability prediction is input, can be generated as the reliability (reliability label). By such processing, it becomes possible to newly generate teacher data for reliability prediction.
  • the present technology will be described by taking a more detailed embodiment as an example.
  • FIG. 5 is a schematic diagram for explaining an example of object recognition using the machine learning model 15.
  • the machine learning model 15 recognizes the object shown in the image (image data) 17. That is, with the image 17 as an input, the type of the object shown in the image is predicted as the correct answer. For example, as shown in FIG. 5, five classes of "dog”, “cat”, “horse”, “sheep”, and “monkey” are set, and the score of each class is calculated for the input image 17. The class with the highest score is output as the prediction result.
  • The identification of the five classes "dog", "cat", "horse", "sheep", and "monkey" is executed by assigning index values such as 1 to 5, for example.
  • In FIGS. 5A and 5B, it is assumed that an image 17 showing a cat is input.
  • In the example shown in FIG. 5A, the score of the "cat" class is sufficiently higher than the scores of the other classes.
  • In this case, the machine learning model 15 outputs the index value of the "cat" class as the prediction result.
  • In the example shown in FIG. 5B, the score of the "cat" class is higher than the scores of the other classes, but not by a large margin. Even in such a case, the machine learning model 15 outputs the index value of the "cat" class as the prediction result.
  • the following teacher data for correct answer prediction (learning data 11 for correct answer prediction, correct answer label 12) is used.
  • (Image showing a dog, index value corresponding to "dog" (1))
    (Image showing a cat, index value corresponding to "cat" (2))
    (Image showing a horse, index value corresponding to "horse" (3))
    (Image showing a sheep, index value corresponding to "sheep" (4))
    (Image showing a monkey, index value corresponding to "monkey" (5))
  • Highly accurate object recognition is realized by creating a lot of these teacher data and training the machine learning model 15.
  • FIG. 6 is a schematic diagram showing another example of teacher data for predicting the correct answer. As shown in FIG. 6, it is also possible to add information on the size of the score of each class as the correct answer label 12. That is, the following data can be used as the teacher data for predicting the correct answer (learning data 11 for predicting the correct answer, the correct answer label 12). The maximum score is set to 1.0.
  • any method may be adopted as a method for creating the correct answer label. For example, teacher data is created with the index value of each class as the correct label. Then, the machine learning model 15 is trained so that the score of the class corresponding to the correct answer label is 1.0 and the score of the other class is 0.0. Such learning is also possible.
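The labeling convention above (score 1.0 for the class of the correct answer label, 0.0 for the other classes) can be sketched as follows; the class list and function name are illustrative, not from the document:

```python
import numpy as np

CLASSES = ["dog", "cat", "horse", "sheep", "monkey"]  # index values 1 to 5

def one_hot(index_value, n_classes=len(CLASSES)):
    """Convert an index value (1 to 5) into a score vector with 1.0 for
    the class of the correct answer label and 0.0 for the other classes."""
    y = np.zeros(n_classes)
    y[index_value - 1] = 1.0
    return y

print(one_hot(2))  # the "cat" class
```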
  • FIG. 7 is a block diagram showing a functional configuration example of the information processing device 20.
  • the information processing apparatus 20 has a data distribution creation unit 24, an interpolation ratio determination unit 25, an interpolation data generation unit 26, and a reliability label generation unit 27.
  • Each of these functional blocks is configured, for example, by a processor executing a predetermined program.
  • dedicated hardware such as an IC (integrated circuit) may be used.
  • the data distribution creation unit 24 realizes the extraction unit 21 shown in FIG.
  • the data distribution creation unit 24, the interpolation ratio determination unit 25, the interpolation data generation unit 26, and the reliability label generation unit 27 realize the generation unit 22 shown in FIG. Therefore, in the present embodiment, the data distribution creation unit 24 also functions as the extraction unit 21 and also as the generation unit 22.
  • FIG. 8 is a flowchart showing an example of the information processing method according to the present embodiment.
  • the data distribution creation unit 24 extracts two teacher data for predicting the correct answer from the teacher data DB 10 (step 201).
  • an image showing any of "dog”, “cat”, “horse”, “sheep”, and “monkey” is used as learning data 11 for predicting the correct answer.
  • the type (index value) of the object shown in the image is used as the correct label 12.
  • Hereinafter, the image which is the learning data 11 for predicting the correct answer is referred to as (x).
  • Let (y) be the type (index value) of the object shown in the image.
  • The teacher data for predicting the correct answer stored in the teacher data DB 10 is a set of pairs (x, y).
  • The two teacher data extracted by the data distribution creation unit 24 will be referred to as (x0, y0) and (x1, y1).
  • The data distribution creation unit 24 uses the kernel function to create a probability distribution that the learning data for reliability prediction (hereinafter referred to as (x')) generated from the teacher data for predicting the correct answer follows.
  • kernel functions are predetermined by the user.
  • Parameters for creating the probability distribution that the learning data (x') for reliability prediction follows are generated and input to the data distribution creation unit 24.
  • a kernel function is used to set the shape of the probability distribution that the learning data (x') for predicting reliability follows.
  • the kernel function itself is also included in the parameters for creating the probability distribution that the training data (x') for predicting reliability follows.
  • FIG. 9 is a schematic diagram for explaining an example of creating a probability distribution.
  • a Gaussian kernel is used as a kernel function. That is, it is assumed that the Gaussian distribution is set as the form of the probability distribution.
  • In the present embodiment, the probability distribution that the learning data (x') for predicting the reliability follows is created by the following formula:
  • p(x') = (1/2) N(x'; x0, s) + (1/2) N(x'; x1, s) ... (Equation 1)
  • N(x'; xi, s) in Equation 1 is a Gaussian distribution with a width s centered on xi.
  • In the present embodiment, as shown in FIG. 9, by setting a Gaussian distribution based on each of the two learning data (x0) and (x1) for predicting the correct answer and adding them with appropriate coefficients, it is possible to create the probability distribution that the learning data (x') for predicting the reliability follows.
  • In the example shown in FIG. 9, the coefficients of the two Gaussian distributions are both 1/2. Not limited to this, for example, arbitrary coefficients may be set so that the sum of the coefficients is 1. Any kernel distribution other than the Gaussian distribution may also be set.
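Equation 1 can be sketched numerically as below, assuming isotropic Gaussian kernels of width s and equal coefficients of 1/2 (the function names are illustrative, not from the document):

```python
import numpy as np

def gaussian(x, mu, s):
    """Isotropic Gaussian density N(x; mu, s) with width s centered on mu."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    d = x - mu
    return np.exp(-(d @ d) / (2.0 * s * s)) / ((2.0 * np.pi * s * s) ** (x.size / 2.0))

def mixture_pdf(x_prime, x0, x1, s):
    """Equation 1: p(x') = 1/2 N(x'; x0, s) + 1/2 N(x'; x1, s)."""
    return 0.5 * gaussian(x_prime, x0, s) + 0.5 * gaussian(x_prime, x1, s)
```

By symmetry the density takes the same value at the two centers; replacing the 1/2 coefficients with any pair summing to 1 realizes the generalization mentioned above.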
  • In the present embodiment, the two learning data (x0) and (x1) for predicting the correct answer are interpolated at a predetermined interpolation ratio (hereinafter referred to as λ).
  • The interpolation ratio determination unit 25 determines the interpolation ratio (λ).
  • The interpolation ratio (λ) is determined based on the probability distribution created in step 202. Specifically, the interpolation ratio (λ) is determined so that the learning data (x') for predicting the reliability generated by interpolation follows the probability distribution created in step 202. For example, the probability distribution of the interpolation ratio (λ) is calculated so that the learning data (x') for predicting the reliability generated by interpolation follows the probability distribution created in step 202.
  • The probability distribution of the interpolation ratio (λ) can be calculated, for example, by using a well-known change-of-variables method.
  • The interpolation ratio (λ) is determined based on the calculated probability distribution. In the present embodiment, since data interpolation is executed, λ is determined in the range of 0 ≤ λ ≤ 1. Not limited to this, extrapolation of data may be performed by expanding the range that λ can take.
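One way to obtain a distribution over the interpolation ratio (λ) is to restrict the mixture of Equation 1 to the line segment between (x0) and (x1). The grid-based sampler below is an illustrative sketch of this idea, not necessarily the change-of-variables computation used in the embodiment:

```python
import numpy as np

def sample_interpolation_ratio(x0, x1, s, rng, n_grid=1001):
    """Sample lam in [0, 1] so that x' = lam*x0 + (1-lam)*x1 approximately
    follows the two-Gaussian mixture of Equation 1 along the segment."""
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    d = np.linalg.norm(x0 - x1)  # distance between the two training samples
    lam = np.linspace(0.0, 1.0, n_grid)
    # Along the segment, |x' - x0| = (1 - lam) * d and |x' - x1| = lam * d.
    p = (np.exp(-(((1.0 - lam) * d) ** 2) / (2.0 * s * s))
         + np.exp(-((lam * d) ** 2) / (2.0 * s * s)))
    p = p / p.sum()  # normalize over the grid
    return rng.choice(lam, p=p)
```

Extending the grid beyond [0, 1] would correspond to the extrapolation mentioned above.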
  • Interpolation data and reliability labels are generated based on the determined interpolation ratio (λ) (step 204).
  • the interpolated data is learning data (x') for predicting reliability. That is, in step 204, teacher data for predicting reliability is generated.
  • FIG. 10 is a schematic diagram showing an example of teacher data for predicting reliability.
  • the interpolation data (x') is generated by the interpolation data generation unit 26.
  • The interpolation data generation unit 26 generates the interpolated data (x') by executing interpolation at the determined interpolation ratio (λ). For example, the two learning images (x0) and (x1) for predicting the correct answer are combined based on the following formula to generate the interpolated data (x'):
  • x' = λ x0 + (1 - λ) x1 ... (Equation 2)
  • In the example shown in FIG. 10, an image showing a "dog (1)" and an image showing a "cat (2)" are extracted as the two learning data (x0) and (x1) for predicting the correct answer.
  • Interpolation is executed for these images, and the interpolated data (x') is generated.
  • the learning data (x') for predicting the reliability is generated based on the probability distribution created in step 202.
  • the reliability label is generated by the reliability label generation unit 27.
  • The reliability with which the teacher labels (y0) and (y1) included in each of the two extracted teacher data for predicting the correct answer are predicted as the correct answer for the generated interpolated data (x') is generated as the reliability label.
  • Specifically, as shown in FIG. 10, the conditional probabilities (p(y = yi | x')) that the teacher labels (y0) and (y1) included in each of the two extracted teacher data for predicting the correct answer are correct, under the condition that the generated interpolated data (x') is input, are generated as the reliability labels.
  • In the example shown in FIG. 10, the conditional probability (p(1 | x')) that "dog (1)" is correct and the conditional probability (p(2 | x')) that "cat (2)" is correct are generated under the condition that the interpolated data (x') is input.
  • The conditional probability (p(y = yi | x')) can also be said to be a posterior probability, and is calculated by the following formula:
  • p(y = yi | x') = p(x' | yi) p(yi) / p(x') = N(x'; xi, s) / (N(x'; x0, s) + N(x'; x1, s)) ... (Equation 3)
  • In Equation 3, the left side to the middle side is calculated based on Bayes' theorem, and the middle side to the right side is calculated by substituting the creation result (Equation 1) by the data distribution creation unit 24.
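With the Gaussian kernels of Equation 1 and equal class priors p(y0) = p(y1) = 1/2, the posterior of Equation 3 reduces to a ratio of the two kernels, since the 1/2 coefficients cancel. A minimal sketch (function names are illustrative):

```python
import numpy as np

def reliability_labels(x_prime, x0, x1, s):
    """Equation 3: p(y = yi | x') = N(x'; xi, s) / (N(x'; x0, s) + N(x'; x1, s)).
    Unnormalized Gaussians suffice because the normalizing constants cancel."""
    def g(x, mu):
        d = np.asarray(x, float) - np.asarray(mu, float)
        return np.exp(-(d @ d) / (2.0 * s * s))
    g0, g1 = g(x_prime, x0), g(x_prime, x1)
    return g0 / (g0 + g1), g1 / (g0 + g1)

# At the midpoint between x0 and x1, both labels are equally reliable.
p0, p1 = reliability_labels([0.5], [0.0], [1.0], s=0.3)
print(p0, p1)
```

Closer to one of the original samples, the corresponding label receives proportionally higher reliability.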
  • step 201 in FIG. 8 corresponds to the extraction step (step 101) shown in FIG.
  • Steps 202 to 204 are steps included in the generation step (step 102) shown in FIG. 4.
  • By using the generated teacher data for reliability prediction, a machine learning model is trained based on a machine learning algorithm as described with reference to FIG. 3. This makes it possible to realize a machine learning model that takes the image (x) as an input and outputs the reliability (p(y | x)) of each class. That is, it is possible to output the identification result and the reliability of the identification result at the same time. By creating a large amount of teacher data for reliability prediction and training the machine learning model sufficiently, it is possible to achieve a high prediction accuracy rate while outputting highly accurate reliability.
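Training on the generated pairs of interpolated data (x') and reliability labels is ordinary supervised learning with soft targets. The sketch below uses a linear softmax model and cross-entropy purely for illustration; the document does not specify the model architecture or the loss:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_reliability_model(X, Y, lr=0.5, epochs=500):
    """Fit a linear softmax model to soft reliability labels Y by gradient
    descent on the cross-entropy -sum(Y * log p)."""
    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.01, (X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)  # gradient of the cross-entropy
    return W

# Toy 1-D features with a bias term; the third sample carries a soft label.
X = np.array([[0.0, 1.0], [1.0, 1.0], [0.5, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
W = train_reliability_model(X, Y)
P = softmax(X @ W)
print(np.round(P, 2))
```

On this toy data the model learns to output roughly equal scores on the interpolated midpoint, which is the behavior the reliability labels are meant to teach.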
  • In the above, the case where two teacher data for correct answer prediction are extracted has been taken as an example, but three or more teacher data for correct answer prediction may be extracted.
  • In that case, a kernel function is used to create the probability distribution that the learning data for reliability prediction, generated by synthesizing the learning data for predicting the correct answer included in each of the three or more teacher data for predicting the correct answer, follows. Further, it is possible to generate the learning data for reliability prediction and the reliability label based on the created probability distribution. That is, it is possible to generate the teacher data for reliability prediction from the three or more extracted teacher data for correct answer prediction by using the kernel function.
  • As described above, the kernel function is used to generate, from the two or more extracted teacher data for correct answer prediction, the teacher data for reliability prediction in which the learning data and the reliability regarding the prediction of the correct answer are associated with each other.
  • The value most commonly used as the reliability of a prediction is the probability that the prediction of the model is correct. That is, when the reliability of the prediction for certain data is 0.5, the probability that the prediction is correct is 50%.
  • the magnitude of the score of each class is used as the reliability. For example, in the case of the prediction result shown in FIG. 5A, it can be determined that the reliability of the prediction result that the object in the image is a cat is high.
  • the prediction result shown in FIG. 5B it can be determined that the reliability of the prediction result that the object shown in the image is a cat is low.
  • As a countermeasure, a method of correcting the reliability output from the model by post-processing can be considered. However, in order to correct the reliability by post-processing, training data and calculations for the correction are separately required.
  • a method of separately learning a model for estimating reliability in parallel can be considered.
  • However, this approach has a problem in that the calculation cost at the time of learning and prediction becomes high because the number of models increases.
  • In addition, a method of improving the learning method so that the model outputs the correct reliability together with the prediction is conceivable. For example, there is a method of imposing restrictions or penalties that lower the output reliability so that the model does not output an excessively high reliability, but there is a problem in that there is no guarantee that the correct reliability will be output.
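As an illustrative example of such a penalty (not the method of the present technology), a confidence penalty adds an entropy bonus to the loss so that over-confident, low-entropy outputs are discouraged:

```python
import numpy as np

def penalized_loss(p, y_onehot, beta=0.1):
    """Cross-entropy minus beta times the output entropy H(p):
    sharp (low-entropy) predictions receive a smaller entropy bonus."""
    eps = 1e-12
    ce = -np.sum(y_onehot * np.log(p + eps))
    entropy = -np.sum(p * np.log(p + eps))
    return ce - beta * entropy

sharp = np.array([0.98, 0.01, 0.01])  # over-confident prediction
soft = np.array([0.70, 0.15, 0.15])   # more cautious prediction
y = np.array([1.0, 0.0, 0.0])
print(penalized_loss(sharp, y), penalized_loss(soft, y))
```

The penalty narrows the loss gap between the over-confident and cautious predictions, but, as the text notes, nothing guarantees that the resulting confidence is correct.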
  • In Non-Patent Document 1, teacher data for predicting the correct answer is newly generated by data augmentation in which both the learning data and the correct answer label are interpolated at the same interpolation ratio (λ). It is described that, by training a machine learning model using this teacher data, the identification accuracy can be improved, and the accuracy of the reliability when the magnitude of the score is used as the reliability is also improved.
  • FIG. 12 is a block diagram showing a hardware configuration example of the information processing device 20.
  • the information processing device 20 includes a CPU 61, a ROM (Read Only Memory) 62, a RAM 63, an input / output interface 65, and a bus 64 that connects them to each other.
  • a display unit 66, an input unit 67, a storage unit 68, a communication unit 69, a drive unit 70, and the like are connected to the input / output interface 65.
  • The display unit 66 is a display device using, for example, a liquid crystal display, an EL display, or the like.
  • the input unit 67 is, for example, a keyboard, a pointing device, a touch panel, or other operating device.
  • When the input unit 67 includes a touch panel, the touch panel can be integrated with the display unit 66.
  • the storage unit 68 is a non-volatile storage device, for example, an HDD, a flash memory, or other solid-state memory.
  • the drive unit 70 is a device capable of driving a removable recording medium 71 such as an optical recording medium or a magnetic recording tape.
  • the communication unit 69 is a modem, router, or other communication device for communicating with another device that can be connected to a LAN, WAN, or the like.
  • the communication unit 69 may communicate using either wire or wireless.
  • The communication unit 69 may be provided separately from the information processing device 20.
  • Information processing by the information processing device 20 having the hardware configuration as described above is realized by the cooperation between the software stored in the storage unit 68 or the ROM 62 or the like and the hardware resources of the information processing device 20.
  • the information processing method according to the present technology is realized by loading the program constituting the software stored in the ROM 62 or the like into the RAM 63 and executing the program.
  • The program is installed in the information processing apparatus 20 via, for example, the recording medium 71.
  • the program may be installed in the information processing apparatus 20 via a global network or the like.
  • any non-transient storage medium that can be read by a computer may be used.
  • the information processing method and program according to the present technology may be executed and the information processing device according to the present technology may be constructed by the cooperation of a plurality of computers connected so as to be communicable via a network or the like. That is, the information processing method and the program according to the present technology can be executed not only in a computer system composed of a single computer but also in a computer system in which a plurality of computers operate in conjunction with each other.
  • the system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether or not all the components are in the same housing.
  • a plurality of devices housed in separate housings and connected via a network, and one device in which a plurality of modules are housed in one housing are both systems.
  • The execution of the information processing method and the program according to the present technology by a computer system includes, for example, both cases where the extraction of teacher data, the creation of the probability distribution, the determination of the interpolation ratio, the generation of the interpolation data, the generation of the reliability label, and the like are executed by a single computer, and cases where each process is executed by a different computer. Further, the execution of each process by a predetermined computer includes causing another computer to execute a part or all of the process and acquiring the result. That is, the information processing method and the program according to the present technology can also be applied to a cloud computing configuration in which one function is shared by a plurality of devices via a network and processed jointly.
  • expressions using "twist” such as “greater than A” and “less than A” include both the concept including the case equivalent to A and the concept not including the case equivalent to A. It is an expression that includes the concept. For example, “greater than A” is not limited to the case where the equivalent of A is not included, and “greater than or equal to A” is also included. Further, “less than A” is not limited to “less than A”, but also includes “less than or equal to A”. When implementing the present technology, specific settings and the like may be appropriately adopted from the concepts included in “greater than A” and “less than A” so that the effects described above can be exhibited.
  • the present technology can also adopt the following configurations.
  • (1) An information processing method executed by a computer system, including: an extraction step of extracting two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and a generation step of generating, from the extracted two or more pieces of teacher data and using a kernel function, teacher data for reliability prediction in which learning data and a reliability of the prediction of the correct answer are associated with each other.
  • (2) The information processing method according to (1), in which: each of the plurality of pieces of teacher data is data in which the correct answer is associated, as a teacher label, with learning data for correct-answer prediction; the generation step generates learning data for reliability prediction by synthesizing the learning data for correct-answer prediction included in each of the extracted two or more pieces of teacher data; and the generation step generates, as the reliability of the prediction of the correct answer, the reliability with which the teacher label included in each of the extracted two or more pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
  • (3) The information processing method according to (2), in which: the generation step generates, using the kernel function, a probability distribution that the learning data for reliability prediction follows, and generates the learning data for reliability prediction based on the generated probability distribution; and the generation step generates, as the reliability of the prediction of the correct answer, the conditional probability that the teacher label included in each of the extracted two or more pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
  • (4) The information processing method in which: the extraction step extracts two pieces of teacher data from the plurality of pieces of teacher data; the generation step generates the learning data for reliability prediction by performing interpolation, at a predetermined interpolation ratio, on the learning data for correct-answer prediction included in each of the two extracted pieces of teacher data; and the generation step generates, as the reliability of the prediction of the correct answer, the reliability with which the teacher label included in each of the two extracted pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
  • (5) The information processing method in which: the generation step generates, using the kernel function, a probability distribution that the interpolation ratio follows, determines the interpolation ratio based on the generated probability distribution, and generates the learning data for reliability prediction by executing interpolation at the determined interpolation ratio; and the generation step generates, as the reliability of the prediction of the correct answer, the conditional probability that the teacher label included in each of the two extracted pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
  • (6) The information processing method according to any one of (1) to (5), in which the kernel function is a Gaussian kernel.
  • (7) An information processing device including: an extraction unit that extracts two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and a generation unit that generates, from the extracted two or more pieces of teacher data and using a kernel function, teacher data for reliability prediction in which learning data and a reliability of the prediction of the correct answer are associated with each other.
  • (8) A program that causes a computer system to execute: an extraction step of extracting two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and a generation step of generating, from the extracted two or more pieces of teacher data and using a kernel function, teacher data for reliability prediction in which learning data and a reliability of the prediction of the correct answer are associated with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

An information processing method according to an embodiment of the present technology is executed by a computer system, and includes an extraction step and a generation step. In the extraction step, two or more sets of teacher data are extracted from a plurality of sets of teacher data for training a machine learning model which predicts a correct answer with respect to an input. In the generation step, teacher data for reliability prediction, in which learning data and the reliability of the correct answer prediction are associated with one another, are generated from the two or more extracted sets of teacher data, using a kernel function.

Description

Information processing method, information processing device, and program
The present technology relates to an information processing method, an information processing device, and a program applicable to machine learning.
Patent Document 1 discloses an evaluation device capable of evaluating the reliability of an estimated value obtained by using a neural network.
Non-Patent Document 1 describes experimental results showing that, by executing mixup, which augments data by interpolating the data and the labels used as teacher data at the same ratio, the output of the neural network can also be used as a reliability.
Japanese Patent No. 5494034
Regarding prediction using machine learning, there is a need for technology that can output the reliability of prediction results with high accuracy.
In view of the above circumstances, the purpose of the present technology is to provide an information processing method, an information processing device, and a program capable of outputting the reliability of the prediction result with high accuracy.
In order to achieve the above object, the information processing method according to one embodiment of the present technology is an information processing method executed by a computer system and includes an extraction step and a generation step.
The extraction step extracts two or more teacher data from a plurality of teacher data for training a machine learning model that predicts a correct answer for an input.
In the generation step, a kernel function is used to generate teacher data for reliability prediction in which the learning data and the reliability for the prediction of the correct answer are associated with each other from the extracted two or more teacher data.
In this information processing method, a kernel function is used to generate, from two or more pieces of teacher data, teacher data for reliability prediction in which learning data and the reliability of the prediction of the correct answer are associated with each other. By training the machine learning model using the generated teacher data for reliability prediction, it becomes possible to output the reliability of the prediction result with high accuracy.
Each of the plurality of pieces of teacher data may be data in which the correct answer is associated, as a teacher label, with learning data for correct-answer prediction. In this case, taking the learning data included in the teacher data for reliability prediction as learning data for reliability prediction, the generation step may generate the learning data for reliability prediction by synthesizing the learning data for correct-answer prediction included in each of the two or more extracted pieces of teacher data. Further, the generation step may generate, as the reliability of the prediction of the correct answer, the reliability with which the teacher label included in each of the two or more extracted pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
The generation step may generate, using the kernel function, a probability distribution that the learning data for reliability prediction follows, and may generate the learning data for reliability prediction based on the generated probability distribution. Further, the generation step may generate, as the reliability of the prediction of the correct answer, the conditional probability that the teacher label included in each of the two or more extracted pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
The extraction step may extract two pieces of teacher data from the plurality of pieces of teacher data. In this case, the generation step may generate the learning data for reliability prediction by performing interpolation, at a predetermined interpolation ratio, on the learning data for correct-answer prediction included in each of the two extracted pieces of teacher data. Further, the generation step may generate, as the reliability of the prediction of the correct answer, the reliability with which the teacher label included in each of the two extracted pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
The generation step may generate, using the kernel function, a probability distribution that the interpolation ratio follows, determine the interpolation ratio based on the generated probability distribution, and generate the learning data for reliability prediction by executing interpolation at the determined interpolation ratio. Further, the generation step may generate, as the reliability of the prediction of the correct answer, the conditional probability that the teacher label included in each of the two extracted pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
The kernel function may be a Gaussian kernel.
The information processing device according to one form of the present technology includes an extraction unit and a generation unit.
The extraction unit executes the extraction step.
The generation unit executes the generation step.
The program according to one form of the present technology causes a computer system to execute the extraction step and the generation step.
  • A schematic diagram showing a configuration example of a data generation system according to an embodiment of the present technology.
  • A schematic diagram showing an example of a machine learning model.
  • A schematic diagram for explaining learning of a machine learning model using teacher data.
  • A flowchart showing an example of an information processing method executed by an information processing device.
  • A schematic diagram for explaining an example of object recognition using a machine learning model.
  • A schematic diagram showing another example of teacher data for correct-answer prediction.
  • A block diagram showing a functional configuration example of an information processing device.
  • A flowchart showing an example of an information processing method according to the present embodiment.
  • A schematic diagram for explaining an example of creating a probability distribution.
  • A schematic diagram showing an example of teacher data for reliability prediction.
  • A schematic diagram for explaining the technique described in Non-Patent Document 1.
  • A block diagram showing a hardware configuration example of an information processing device.
Hereinafter, embodiments according to the present technology will be described with reference to the drawings.
[Data generation system]
FIG. 1 is a schematic diagram showing a configuration example of a data generation system according to an embodiment of the present technology.
FIG. 2 is a schematic diagram showing an example of a machine learning model.
FIG. 3 is a schematic diagram for explaining learning of a machine learning model using teacher data.
The data generation system 100 corresponds to an embodiment of an information processing system according to the present technology.
As shown in FIG. 1, the data generation system 100 includes a teacher data DB (database) 10 and an information processing device 20.
A plurality of teacher data are stored in the teacher data DB 10.
The teacher data is data for training the machine learning model 15 that predicts the correct answer for the input, which is illustrated in FIG.
It should be noted that what kind of data is input to the machine learning model 15 and what kind of data is predicted as the correct answer is not limited, and this technique can be applied to any machine learning model.
As shown in FIG. 3, the teacher data is data in which the teacher label 12 is associated with the learning data 11.
Examples of the learning data 11 include image data, audio data, and the like. In addition, arbitrary data to be input to the machine learning model 15 may be used.
The teacher label 12 is a correct answer (correct answer data) to be predicted by the machine learning model 15. In the example shown in FIG. 3, the teacher label 12 is stored in the label DB 13. The label DB 13 is constructed in, for example, the teacher data DB 10 shown in FIG.
Of course, the configuration and method for storing the teacher data (learning data 11 and the teacher label 12) are not limited, and any configuration and method may be adopted.
As shown in FIG. 3, the learning data 11 and the teacher label 12 are associated with each other and are input to the learning unit 14 as teacher data.
The learning unit 14 uses the teacher data and performs learning based on the machine learning algorithm. By learning, the parameters (coefficients) for calculating the correct answer (teacher label) are updated and generated as learned parameters. A program incorporating the generated learned parameters is generated as the machine learning model 15.
As the learning method in the learning unit 14, for example, an error back propagation method that is generally often used for learning a neural network can be used. A neural network is a model that originally imitates a human brain neural circuit, and has a layered structure consisting of three types of layers: an input layer, an intermediate layer (hidden layer), and an output layer. A neural network having a large number of intermediate layers is particularly called a deep neural network, and a deep learning technique for learning this is known as a model capable of learning a complicated pattern hidden in a large amount of data. The error back propagation method is one of such learning methods, and is often used for learning, for example, a convolutional neural network (CNN) used for recognizing images and moving images.
Further, as a hardware structure for realizing such machine learning, a neurochip / neuromorphic chip incorporating the concept of a neural network can be used.
In addition, any machine learning algorithm may be used.
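The error back propagation method mentioned above can be illustrated with a minimal sketch. The following trains a single-layer softmax classifier by gradient descent on toy two-class data in NumPy; all names, data, and settings here are illustrative assumptions, not part of the present disclosure.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train(x, y, n_classes, lr=0.5, epochs=200):
    """Fit a linear softmax classifier by gradient descent (backpropagation)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=(x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]          # teacher labels as one-hot vectors
    for _ in range(epochs):
        p = softmax(x @ w + b)             # forward pass: class scores
        grad = (p - onehot) / len(x)       # gradient of cross-entropy loss
        w -= lr * (x.T @ grad)             # backward pass: update parameters
        b -= lr * grad.sum(axis=0)
    return w, b

# Toy data: two well-separated 2-D clusters standing in for two classes.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = train(x, y, n_classes=2)
accuracy = (softmax(x @ w + b).argmax(axis=1) == y).mean()
```

The learned parameters (w, b) play the role of the "learned parameters" described above, and the trained predictor corresponds to the machine learning model 15 in this sketch.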
The information processing device 20 shown in FIG. 1 has hardware necessary for configuring a computer, such as a processor such as a CPU, GPU, or DSP, a memory such as a ROM or RAM, and a storage device such as an HDD (see FIG. 12).
For example, the information processing method according to the present technology is executed when the CPU loads and executes the program according to the present technology recorded in advance in the ROM or the like into the RAM.
For example, the information processing device 20 can be realized by an arbitrary computer such as a PC (Personal Computer). Of course, hardware such as FPGA and ASIC may be used.
In the present embodiment, the extraction unit 21 as a functional block and the generation unit 22 are configured by the CPU or the like executing a predetermined program. Of course, dedicated hardware such as an IC (integrated circuit) may be used to realize the functional block.
The program is installed in the information processing apparatus 20 via, for example, various recording media. Alternatively, the program may be installed via the Internet or the like.
The type of recording medium on which the program is recorded is not limited, and any computer-readable recording medium may be used. For example, any non-transient storage medium that can be read by a computer may be used.
The teacher data DB 10 shown in FIG. 1 may be constructed in the information processing apparatus 20.
FIG. 4 is a flowchart showing an example of an information processing method executed by the information processing apparatus 20.
The extraction step is executed by the extraction unit 21 shown in FIG. 1 (step 101).
The extraction step is a step of extracting two or more teacher data from a plurality of teacher data for training the machine learning model 15 that predicts the correct answer for the input.
For example, two or more teacher data may be arbitrarily extracted from a plurality of teacher data. Not limited to this, for example, two or more teacher data may be extracted after processing such as classification is executed for a plurality of teacher data.
The generation step is executed by the generation unit 22 shown in FIG. 1 (step 102).
The generation step is a step of generating teacher data for reliability prediction from two or more extracted teacher data using a kernel function.
The teacher data for predicting the reliability is data in which the learning data and the reliability for predicting the correct answer by the machine learning model 15 are associated with each other. That is, in the teacher data for predicting the reliability, the reliability for the prediction of the correct answer is used as the teacher label.
As described above, in the present embodiment, a kernel function is applied to the teacher data (learning data 11 and teacher label 12) for training the machine learning model 15, and teacher data for reliability prediction (learning data and reliability) is thereby generated as new teacher data.
As the kernel function, any kernel function such as a Gaussian Kernel, a Square Kernel, or a Circular Kernel may be used.
Hereinafter, the original teacher data shown in FIG. 1 and the like may be described as teacher data for predicting the correct answer. Further, the learning data 11 and the teacher label 12 included in the original teacher data may be described as the learning data 11 for predicting the correct answer and the correct answer label 12 by using the same reference numerals.
In addition, the learning data and the reliability included in the newly generated teacher data for the reliability prediction may be described as the learning data for the reliability prediction and the reliability label. In this case, the reliability of the prediction of the correct answer by the machine learning model 15 corresponds to the reliability label.
For example, in the generation step shown in FIG. 4, the learning data for reliability prediction is generated by synthesizing the learning data 11 for correct-answer prediction included in each of the two or more extracted pieces of teacher data for correct-answer prediction.
As a data synthesis method, an arbitrary data expansion method such as mixup may be used. For example, data interpolation may be performed with a predetermined interpolation ratio. In addition, extrapolation of data and the like may be used. Of course, it is also possible to use data interpolation and extrapolation together.
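The interpolation of learning data mentioned above can be sketched as follows. This is a minimal mixup-style example; the function name and the toy pixel vectors are hypothetical.

```python
import numpy as np

def interpolate(x0, x1, ratio):
    """Mix two pieces of learning data at the given interpolation ratio.

    ratio = 1.0 returns x0 unchanged; ratio = 0.0 returns x1.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    return ratio * x0 + (1.0 - ratio) * x1

# Two toy "images" (flattened pixel vectors) from two pieces of teacher data.
x0 = np.array([1.0, 0.0, 0.0, 1.0])   # e.g. learning data whose label is "cat"
x1 = np.array([0.0, 1.0, 1.0, 0.0])   # e.g. learning data whose label is "dog"
x_mid = interpolate(x0, x1, 0.5)       # synthesized data halfway between the two
```

Extrapolation would correspond to a ratio outside the range [0, 1], which the same function also accepts.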
The reliability with which the teacher label (correct answer label) 12 included in each of the two or more extracted pieces of teacher data for correct-answer prediction is predicted as the correct answer for the learning data for reliability prediction generated by the synthesis is generated as the reliability (reliability label) for reliability prediction.
For example, the conditional probability that the teacher label (correct answer label) 12 included in each of the two or more extracted pieces of teacher data for correct-answer prediction is the correct answer, under the condition that the generated learning data for reliability prediction is input, can be generated as the reliability (reliability label) for reliability prediction.
By such processing, it becomes possible to newly generate teacher data for reliability prediction. Hereinafter, the present technology will be described by taking a more detailed embodiment as an example.
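One way to picture the conditional-probability reliability label described above is the following sketch: each extracted piece of teacher data is modeled by a Gaussian kernel centered at its learning data, and the kernel responses at the synthesized data point are normalized into conditional probabilities. The density model, kernel bandwidth, and all names are assumptions for illustration, not the computation prescribed by the present disclosure.

```python
import numpy as np

def gaussian_kernel(x, center, sigma=1.0):
    """Unnormalized Gaussian kernel K(x, center)."""
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def reliability_labels(x_new, x0, x1, sigma=1.0):
    """Conditional probabilities that the label of x0 (resp. x1) is the
    correct answer, given that the synthesized data x_new is input."""
    k0 = gaussian_kernel(x_new, x0, sigma)
    k1 = gaussian_kernel(x_new, x1, sigma)
    total = k0 + k1
    return k0 / total, k1 / total

x0 = np.array([0.0, 0.0])   # learning data whose teacher label is, say, "cat"
x1 = np.array([4.0, 0.0])   # learning data whose teacher label is, say, "dog"
x_new = 0.75 * x0 + 0.25 * x1           # interpolated data, closer to x0
r0, r1 = reliability_labels(x_new, x0, x1)
```

Because x_new lies closer to x0, the reliability r0 for x0's label dominates, while r0 and r1 still sum to 1, as conditional probabilities should.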
[Object recognition by machine learning model]
FIG. 5 is a schematic diagram for explaining an example of object recognition using the machine learning model 15.
Here, the machine learning model 15 recognizes the object shown in the image (image data) 17. That is, with the image 17 as an input, the type of the object shown in the image is predicted as the correct answer.
For example, as shown in FIG. 5, five classes of "dog", "cat", "horse", "sheep", and "monkey" are set, and the score of each class is calculated for the input image 17. The class with the highest score is output as the prediction result. The identification of the five classes of "dog", "cat", "horse", "sheep", and "monkey" is executed by applying an index value such as 1 to 5, for example.
As shown in FIGS. 5A and 5B, it is assumed that an image 17 showing a cat is input. As shown in FIG. 5A, it is assumed that the score of the "cat" class is sufficiently higher than the scores of the other classes. The machine learning model 15 outputs the index value of the "cat" class as a prediction result.
As shown in FIG. 5B, there may also be cases where the score of the "cat" class is higher than the scores of the other classes, but not by a large margin. Even in such a case, the machine learning model 15 outputs the index value of the "cat" class as the prediction result.
When the identification task shown in FIG. 5 is executed, for example, the following teacher data for correct answer prediction (learning data 11 for correct answer prediction, correct answer label 12) is used.
(Image showing a dog, index value corresponding to "dog" (1))
(Image showing a cat, index value corresponding to "cat" (2))
(Image showing a horse, index value corresponding to "horse" (3))
(Image showing sheep, index value corresponding to "sheep" (4))
(Image showing a monkey, index value corresponding to "monkey" (5))
Highly accurate object recognition is realized by creating a lot of these teacher data and training the machine learning model 15.
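The (image, index value) pairs listed above can be assembled programmatically, for example as follows; the file names, class-to-index mapping, and helper function are hypothetical illustrations.

```python
# Class names mapped to the index values used in the identification task.
CLASSES = {"dog": 1, "cat": 2, "horse": 3, "sheep": 4, "monkey": 5}

def make_teacher_data(samples):
    """Turn (image, class_name) pairs into (image, index value) teacher data."""
    return [(image, CLASSES[name]) for image, name in samples]

teacher_data = make_teacher_data([
    ("dog_001.png", "dog"),
    ("cat_007.png", "cat"),
    ("horse_003.png", "horse"),
])
```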
FIG. 6 is a schematic diagram showing another example of teacher data for predicting the correct answer.
As shown in FIG. 6, it is also possible to add information on the size of the score of each class as the correct answer label 12. That is, the following data can be used as the teacher data for predicting the correct answer (learning data 11 for predicting the correct answer, the correct answer label 12). The maximum score is set to 1.0.
(Image showing a dog, score 1.0 for "dog (1)" and score 0.0 for other classes)
(Image showing a cat, score 1.0 for "cat (2)" and score 0.0 for other classes)
(Image showing a horse, score 1.0 for "horse (3)" and score 0.0 for other classes)
(Image showing sheep, score 1.0 for "sheep (4)" and score 0.0 for other classes)
(Image showing a monkey, score 1.0 for "monkey (5)" and score 0.0 for other classes)
In addition, any method may be adopted as a method for creating the correct answer label. For example, teacher data is created with the index value of each class as the correct label. Then, the machine learning model 15 is trained so that the score of the class corresponding to the correct answer label is 1.0 and the score of the other class is 0.0. Such learning is also possible.
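The score-style correct answer labels above (score 1.0 for the labeled class, 0.0 for every other class) correspond to one-hot score vectors, which can be produced from an index value as follows; the helper name is illustrative.

```python
def score_vector(index, n_classes=5):
    """Correct answer label as per-class scores: 1.0 for the labeled class
    (index values run 1..n_classes), 0.0 for every other class."""
    scores = [0.0] * n_classes
    scores[index - 1] = 1.0
    return scores

cat_label = score_vector(2)   # "cat" has index value 2
```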
FIG. 7 is a block diagram showing a functional configuration example of the information processing device 20.
In the present embodiment, the information processing apparatus 20 has a data distribution creation unit 24, an interpolation ratio determination unit 25, an interpolation data generation unit 26, and a reliability label generation unit 27.
Each of these functional blocks is configured, for example, by a processor executing a predetermined program. Of course, in order to realize these functional blocks, dedicated hardware such as an IC (integrated circuit) may be used.
In the present embodiment, the data distribution creation unit 24 realizes the extraction unit 21 shown in FIG.
Further, the data distribution creation unit 24, the interpolation ratio determination unit 25, the interpolation data generation unit 26, and the reliability label generation unit 27 realize the generation unit 22 shown in FIG. Therefore, in the present embodiment, the data distribution creation unit 24 also functions as the extraction unit 21 and also as the generation unit 22.
FIG. 8 is a flowchart showing an example of the information processing method according to the present embodiment.
The data distribution creation unit 24 extracts two teacher data for predicting the correct answer from the teacher data DB 10 (step 201).
In the present embodiment, an image showing any of "dog", "cat", "horse", "sheep", and "monkey" is used as learning data 11 for predicting the correct answer. Further, it is assumed that the type (index value) of the object shown in the image is used as the correct label 12.
Hereinafter, the image which is the learning data 11 for predicting the correct answer is referred to as (x). Further, let (y) be the type (index value) of the object shown in the image. The teacher data for predicting the correct answer stored in the teacher data DB 10 is (a set of (x, y)).
Hereinafter, the two teacher data extracted by the data distribution creation unit 24 will be referred to as (x0, y0) and (x1, y1).
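Step 201 can be sketched as a random draw of two distinct pairs from the teacher data DB (the toy database below is hypothetical; real entries would pair pixel arrays with index values):

```python
import random

# Toy stand-in for the teacher data DB 10: (image, index-value label) pairs
teacher_db = [("img_dog", 1), ("img_cat", 2), ("img_horse", 3),
              ("img_sheep", 4), ("img_monkey", 5)]

# Extract two distinct teacher data (x0, y0) and (x1, y1)
(x0, y0), (x1, y1) = random.sample(teacher_db, 2)
```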
Using a kernel function, the data distribution creation unit 24 creates the probability distribution followed by the learning data for reliability prediction (hereinafter, (x')) that is generated from the two teacher data for correct answer prediction (step 202).
For example, the kernel function is predetermined by the user. Based on the defined kernel function, parameters for creating the probability distribution followed by the learning data (x') for reliability prediction are generated and input to the data distribution creation unit 24.
For example, the kernel function sets the shape and other properties of the probability distribution followed by the learning data (x') for reliability prediction. Of course, the kernel function itself is also included in the parameters for creating that probability distribution.
FIG. 9 is a schematic diagram for explaining an example of creating a probability distribution.
For example, suppose a Gaussian kernel is used as a kernel function. That is, it is assumed that the Gaussian distribution is set as the form of the probability distribution.
In this case, the probability distribution followed by the learning data (x') for reliability prediction can be created by the following formula.
p(x') = (1/2) N(x' | x0, s) + (1/2) N(x' | x1, s)   (Equation 1)
N(x' | xi, s) in Equation (1) is a Gaussian distribution of width s centered on xi.
As shown in FIG. 9, by setting a Gaussian distribution centered on each of the two learning data for correct answer prediction, (x0) and (x1), and adding appropriate coefficients, the probability distribution followed by the learning data (x') for reliability prediction can be created.
In Equation (1), the coefficients of the two Gaussian distributions are both set to 1/2. The coefficients are not limited to this; for example, arbitrary coefficients may be set as long as their sum is 1.
Any kernel distribution other than the Gaussian distribution may be set.
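In one dimension, the Equation (1) mixture can be sketched as follows (the learning data are actually high-dimensional images, so the scalar version below is purely illustrative):

```python
import math

def gaussian(x: float, mu: float, s: float) -> float:
    # Density of a Gaussian N(x | mu, s) with center mu and width s
    return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def p_x_prime(x: float, x0: float, x1: float, s: float) -> float:
    # Equation (1): equal-weight (1/2 each) mixture of Gaussians at x0 and x1
    return 0.5 * gaussian(x, x0, s) + 0.5 * gaussian(x, x1, s)
```

Because the two coefficients are both 1/2 and each Gaussian integrates to 1, the mixture is itself a valid probability density.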
In the present embodiment, as shown in the following equation, the learning data (x') for reliability prediction is generated by interpolating between the two learning data for correct answer prediction, (x0) and (x1), at a predetermined interpolation ratio (hereinafter, λ).
x' = λx0 + (1 − λ)x1   (Equation 2)
As shown in FIG. 8, the interpolation ratio determination unit 25 determines the interpolation ratio (λ).
The interpolation ratio (λ) is determined based on the probability distribution created in step 202. Specifically, the interpolation ratio (λ) is determined so that the learning data (x') for predicting the reliability generated by interpolation follows the probability distribution created in step 202.
For example, the probability distribution of the interpolation ratio (λ) is calculated so that the learning data (x') for predicting the reliability generated by interpolation follows the probability distribution created in step 202. The probability distribution of the interpolation ratio (λ) can be calculated, for example, by using a well-known change of variable method.
The interpolation ratio (λ) is determined based on the calculated probability distribution. In the present embodiment, since data interpolation is executed, λ is determined in the range of 0 ≦ λ ≦ 1. Not limited to this, extrapolation of data may be performed by expanding the range that λ can take.
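One way to realize this step in one dimension, under the assumptions of an illustrative sketch rather than an analytic change of variables, is to draw x' from the Equation (1) mixture directly and then invert Equation (2) to recover λ (the clipping to [0, 1] reflects the interpolation-only setting):

```python
import random

def sample_interpolation_ratio(x0: float, x1: float, s: float) -> float:
    # Draw x' from the equal-weight Gaussian mixture of Equation (1):
    # pick one of the two centers at random, then sample around it
    center = random.choice([x0, x1])
    x_prime = random.gauss(center, s)
    # Invert x' = lam * x0 + (1 - lam) * x1 (Equation (2)) for lam,
    # clipping to [0, 1] since interpolation, not extrapolation, is performed
    lam = (x_prime - x1) / (x0 - x1)
    return min(max(lam, 0.0), 1.0)
```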
Interpolation data and confidence labels are generated based on the determined interpolation ratio (λ) (step 204). The interpolated data is learning data (x') for predicting reliability.
That is, in step 204, teacher data for predicting reliability is generated.
FIG. 10 is a schematic diagram showing an example of teacher data for predicting reliability.
The interpolation data (x') is generated by the interpolation data generation unit 26.
The interpolation data generation unit 26 generates the interpolation data (x') by executing the interpolation at the determined interpolation ratio (λ).
For example, the two images that are the learning data (x0) and (x1) for correct answer prediction are combined based on Equation (2) to generate the interpolated data (x').
In the example shown in FIG. 10, an image showing "dog (1)" and an image showing "cat (2)" are extracted as the two learning data (x0) and (x1) for correct answer prediction. Interpolation is then executed on these images to generate the interpolated data (x').
As described above, in the present embodiment, the learning data (x') for predicting the reliability is generated based on the probability distribution created in step 202.
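The blending itself is a pixel-wise weighted sum; a sketch with tiny constant arrays standing in for the dog and cat photos:

```python
import numpy as np

def interpolate_images(x0: np.ndarray, x1: np.ndarray, lam: float) -> np.ndarray:
    # Equation (2): pixel-wise blend x' = lam * x0 + (1 - lam) * x1
    return lam * x0 + (1.0 - lam) * x1

dog = np.full((2, 2), 0.8)  # stand-in for the "dog (1)" image
cat = np.full((2, 2), 0.2)  # stand-in for the "cat (2)" image
x_prime = interpolate_images(dog, cat, 0.6)  # each pixel: 0.6*0.8 + 0.4*0.2 = 0.56
```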
The reliability label is generated by the reliability label generation unit 27.
In the present embodiment, the reliability in the case where each of the teacher labels (y0) and (y1), contained in the two extracted teacher data, is predicted as the correct answer for the generated interpolated data (x') is generated as a reliability label.
Specifically, as shown in FIG. 10, the conditional probabilities (p(y = yi | x')) that the teacher labels (y0) and (y1) contained in the two extracted teacher data for correct answer prediction are the correct answer, under the condition that the generated interpolated data (x') is input, are generated as reliability labels.
In the example shown in FIG. 10, the conditional probability p(1 | x') that "dog (1)" is the correct answer and the conditional probability p(2 | x') that "cat (2)" is the correct answer, under the condition that the interpolated data (x') is input, are generated as reliability labels.
The conditional probability (p (y = y i | x')) can also be said to be a posterior probability.
In the present embodiment, the data distribution creation unit 24 explicitly creates a probability distribution to which the newly generated learning data (x') for predicting the reliability follows.
Therefore, the reliability for the newly generated learning data (x') for reliability prediction, that is, the conditional probability (p(y = yi | x')), can be calculated by the following formula.
p(y = yi | x') = p(x' | y = yi) p(y = yi) / p(x') = N(x' | xi, s) / (N(x' | x0, s) + N(x' | x1, s))   (Equation 3)
In Equation (3), the left side to the middle side is calculated based on Bayes' theorem. The middle side to the right side is calculated by substituting the creation result of the data distribution creation unit 24 (Equation (1)).
By associating the interpolated data (x') with the conditional probability (p (y = y i | x')), teacher data for predicting reliability is generated.
Note that step 201 in FIG. 8 corresponds to the extraction step (step 101) shown in FIG.
Steps 202 to 204 are steps included in the generation step (step 202) shown in FIG.
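Continuing a 1-D illustrative sketch, the Equation (3) reliability label simplifies nicely: with equal priors and equal kernel widths, the Gaussian normalizers cancel, leaving a ratio of exponentials (this scalar version is an assumption for illustration; real data are images):

```python
import math

def reliability_labels(x_prime: float, x0: float, x1: float, s: float):
    # Equation (3): posterior p(y = yi | x') from Bayes' theorem, with the
    # Equation (1) mixture as p(x'). The common 1/2 coefficients and Gaussian
    # normalization constants cancel out of the ratio.
    n0 = math.exp(-((x_prime - x0) ** 2) / (2 * s * s))
    n1 = math.exp(-((x_prime - x1) ** 2) / (2 * s * s))
    return n0 / (n0 + n1), n1 / (n0 + n1)
```

At the midpoint between x0 and x1 both labels are 0.5, matching the intuition that such an interpolated image is maximally ambiguous.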
Using the generated teacher data for predicting reliability, a machine learning model is trained based on a machine learning algorithm as described with reference to FIG.
This makes it possible to realize a machine learning model that takes the image (x) as an input and outputs the reliability (p (y | x)) of each class with respect to the image (x).
For example, in the example shown in FIG. 5, the following reliability can be output for the input image.
Reliability when the object in the image is a "dog" (p (y = 1 | x))
Reliability when the object in the image is a "cat" (p (y = 2 | x))
Reliability when the object in the image is a "horse" (p (y = 3 | x))
Reliability when the object in the image is a "sheep" (p (y = 4 | x))
Reliability when the object in the image is a "monkey" (p (y = 5 | x))
By selecting the class with the highest reliability (p(y | x)), the object shown in the image can be identified. Further, by outputting the reliability (p(y | x)) of that class, the confidence in the prediction can be output. That is, the identification result and the reliability of that identification result can be output at the same time.
By creating a lot of teacher data for predicting reliability and training the machine learning model sufficiently, it is possible to output highly accurate reliability at the same time while achieving a high prediction accuracy rate.
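The selection step reduces to an argmax over the per-class reliabilities (a sketch; the function name and example values are illustrative):

```python
def predict_with_confidence(scores: list[float]) -> tuple[int, float]:
    # Return the 1-based index of the class with the highest reliability
    # p(y|x), together with that reliability as the prediction's confidence
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best + 1, scores[best]

# Per-class reliabilities for [dog, cat, horse, sheep, monkey]
label, confidence = predict_with_confidence([0.05, 0.80, 0.05, 0.05, 0.05])
```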
In the present embodiment, the case where two teacher data for correct answer prediction is extracted is taken as an example, but three or more teacher data for correct answer prediction may be extracted.
For example, using a kernel function, it is possible to create the probability distribution followed by the learning data for reliability prediction that is generated by synthesizing the learning data for correct answer prediction contained in each of the three or more teacher data for correct answer prediction.
Further, the learning data for reliability prediction and the reliability label can be generated based on the created probability distribution. That is, using a kernel function, teacher data for reliability prediction can be generated from the three or more extracted teacher data for correct answer prediction.
As described above, in the information processing apparatus and information processing method according to the present embodiment, a kernel function is used to generate, from two or more teacher data, teacher data for reliability prediction in which learning data and the reliability regarding the prediction of the correct answer are associated with each other. By training a machine learning model with the generated teacher data for reliability prediction, the reliability of the prediction result can be output with high accuracy.
In recent years, the accuracy of intellectual processing based on machine learning has improved, and various tasks can be automated. As a result, there is an increasing need to automate important tasks that have a large social impact (for example, medical diagnosis, buying and selling financial products, legal interpretation, etc.).
It is difficult to fully automate such tasks because they have a large adverse effect when making incorrect predictions and judgments. It is realistic to adopt a process in which a human makes a judgment when the prediction is difficult or impossible (unlearned) after making an automatic prediction by a machine.
At this time, in order to know the degree of difficulty of prediction and the degree of impossible prediction, it is necessary to estimate the reliability of the prediction together with the prediction result by machine learning.
Here, the most commonly used value as the reliability of the prediction is the probability that the prediction of the model is a correct prediction. That is, when the reliability of the prediction for a certain data is 0.5, the probability that the prediction is correct is 50%. By extracting data whose reliability is below a certain level, it is possible to extract data that is difficult to predict or data that cannot be predicted.
For example, in the machine learning model 15 illustrated in FIG. 5, it is conceivable to use the magnitude of the score of each class as the reliability. For example, in the case of the prediction result shown in FIG. 5A, it can be determined that the reliability of the prediction result that the object in the image is a cat is high. On the other hand, with the prediction result shown in FIG. 5B, it can be determined that the reliability of the prediction result that the object shown in the image is a cat is low.
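The human-in-the-loop routing described above can be sketched as a simple confidence threshold (the 0.7 cutoff and the data are hypothetical):

```python
# Hypothetical (sample name, prediction confidence) pairs from a trained model
predictions = [("img_a", 0.95), ("img_b", 0.40), ("img_c", 0.72)]

THRESHOLD = 0.7  # assumed cutoff below which a human should judge instead

# Extract hard-to-predict or unlearned cases for human review
hard_cases = [name for name, conf in predictions if conf < THRESHOLD]
```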
However, it is known that machine learning models, especially models trained by deep learning, which has recently come into frequent use, tend to overestimate the reliability (that is, the reliability becomes a value larger than the actual correct answer rate). For example, even when the reliability of a prediction for certain data is 0.9, the actual correct answer rate often falls far below 90%.
For example, in the example shown in FIG. 5, even if the image is difficult to identify, many prediction results (prediction results exemplified in FIG. 5A) in which the score of a specific class is sufficiently higher than that of other classes are output. That is, the prediction result in which the magnitude (reliability) of the score is overestimated is output.
When such overestimated reliability is used, it is difficult to correctly extract data that is difficult to predict or data that cannot be predicted and have humans judge it.
As a first countermeasure to such a problem, a method of correcting the reliability output from the model by post-processing can be considered. However, since separate training data and correction calculations are required to perform the post-processing correction of the reliability, there is a problem that the data collection cost and the calculation cost at prediction time become high.
As a second countermeasure, a method of separately learning a model for estimating reliability in parallel can be considered. However, this coping method has a problem that the calculation cost at the time of learning / prediction becomes high because the number of models increases.
As a third countermeasure, a method of improving the learning method is conceivable so that the model outputs the correct reliability together with the prediction. For example, there is a method of imposing restrictions and penalties that lower the output reliability so as not to output too high reliability, but there is a problem that there is no guarantee that the correct reliability will be output.
The present inventor paid attention to the third coping method, and repeated consideration in order to realize a technique capable of outputting the reliability of the prediction with high accuracy.
For example, as shown in FIG. 11, in Non-Patent Document 1 described above, new teacher data for correct answer prediction is generated by data augmentation in which both the learning data and the correct answer label are interpolated at the same interpolation ratio (λ). It is described that training a machine learning model with this teacher data improves the identification accuracy and also improves the accuracy of the reliability when the score magnitude is used as the reliability.
However, in the mixup described in Non-Patent Document 1, there is no basis for the claim that the correct answer label interpolated at the same interpolation ratio (λ) represents the reliability of each class for the new learning data interpolated at that ratio.
For example, in the example shown in FIG. 11, an image showing "dog (1)" and an image showing "cat (2)" are interpolated at an interpolation ratio of λ = 0.6 to generate interpolated data.
The correct label for the interpolated data is generated as follows by interpolating with the same interpolation ratio (λ = 0.6).
(Image showing a dog, score of "dog (1)" 0.6)
(Image showing a cat, score 0.4 for "cat (2)")
(Image showing a horse, score 0.0 for "horse (3)")
(Image showing sheep, score 0.0 for "sheep (4)")
(Image showing a monkey, score 0.0 for "monkey (5)")
There is no evidence that the magnitude of each class's score is the confidence of each class in the interpolated data.
Therefore, even if the machine learning model is trained using the teacher data generated in this way, there is no guarantee that the correct reliability will be output.
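For contrast, the mixup label construction criticized here can be sketched as follows (1-based class indices as in FIG. 11); the point of the passage is precisely that this vector has no guaranteed interpretation as per-class reliability:

```python
def mixup_label(y0: int, y1: int, lam: float, num_classes: int = 5) -> list[float]:
    # mixup: interpolate the two one-hot correct-answer labels at the SAME
    # ratio lambda that was used to blend the two images
    label = [0.0] * num_classes
    label[y0 - 1] += lam
    label[y1 - 1] += 1.0 - lam
    return label
```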
In this technique, a pair of learning data to be learned and reliability is newly generated from a plurality of given teacher data. That is, this technology makes it possible to realize data expansion specialized for learning of correct reliability.
By presetting the data distribution using a kernel function, the determination of the interpolation ratio (λ) for generating new learning data and the generation of the reliability corresponding to that interpolation ratio can both be performed automatically and appropriately.
In the newly generated teacher data for reliability prediction, the learning data and the reliability correspond correctly, so a machine learning model that outputs the correct reliability can be realized simply by training on this data with an ordinary learning method.
Further, in this technology, since it is not necessary to perform separate learning for predicting the reliability, it is possible to sufficiently suppress the data collection cost and the calculation cost.
<Other Embodiments>
The present technology is not limited to the embodiments described above, and various other embodiments can be realized.
FIG. 12 is a block diagram showing a hardware configuration example of the information processing device 20.
The information processing device 20 includes a CPU 61, a ROM (Read Only Memory) 62, a RAM 63, an input / output interface 65, and a bus 64 that connects them to each other. A display unit 66, an input unit 67, a storage unit 68, a communication unit 69, a drive unit 70, and the like are connected to the input / output interface 65.
The display unit 66 is a display device using, for example, a liquid crystal display, an EL, or the like. The input unit 67 is, for example, a keyboard, a pointing device, a touch panel, or other operating device. When the input unit 67 includes a touch panel, the touch panel can be integrated with the display unit 66.
The storage unit 68 is a non-volatile storage device, for example, an HDD, a flash memory, or other solid-state memory. The drive unit 70 is a device capable of driving a removable recording medium 71 such as an optical recording medium or a magnetic recording tape.
The communication unit 69 is a modem, router, or other communication device for communicating with another device that can be connected to a LAN, WAN, or the like. The communication unit 69 may communicate using either wire or wireless. The communication unit 69 is often used separately from the information processing device 20.
Information processing by the information processing device 20 having the hardware configuration as described above is realized by the cooperation between the software stored in the storage unit 68 or the ROM 62 or the like and the hardware resources of the information processing device 20. Specifically, the information processing method according to the present technology is realized by loading the program constituting the software stored in the ROM 62 or the like into the RAM 63 and executing the program.
The program is installed in the information processing apparatus 20 via, for example, the recording medium 71. Alternatively, the program may be installed in the information processing apparatus 20 via a global network or the like. In addition, any computer-readable non-transient storage medium may be used.
The information processing method and program according to the present technology may be executed and the information processing device according to the present technology may be constructed by the cooperation of a plurality of computers connected so as to be communicable via a network or the like.
That is, the information processing method and the program according to the present technology can be executed not only in a computer system composed of a single computer but also in a computer system in which a plurality of computers operate in conjunction with each other.
In the present disclosure, the system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether or not all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and one device in which a plurality of modules are housed in one housing are both systems.
The execution of the information processing method and program according to the present technology by a computer system includes both the case where, for example, the extraction of teacher data, the creation of the probability distribution, the determination of the interpolation ratio, the generation of interpolated data, and the generation of the reliability label are executed by a single computer, and the case where each process is executed by a different computer. Further, the execution of each process by a predetermined computer includes causing another computer to execute a part or all of the process and acquiring the result.
That is, the information processing method and program according to the present technology can be applied to a cloud computing configuration in which one function is shared by a plurality of devices via a network and jointly processed.
Each configuration of the data generation system, the teacher data DB, the information processing device, etc., and each processing flow described with reference to the drawings are merely embodiments, and can be arbitrarily modified without departing from the spirit of the present technology. That is, any other configuration, algorithm, or the like for implementing the present technology may be adopted.
When the word "approximately" is used in the present disclosure, it is used only to facilitate understanding of the description, and no special meaning attaches to its use or non-use.
That is, in the present disclosure, concepts that define shape, size, positional relationship, state, and the like, such as "center", "middle", "uniform", "equal", "same", "orthogonal", "parallel", "symmetric", "extending", "axial", "columnar", "cylindrical", "ring-shaped", and "annular", include "substantially center", "substantially middle", "substantially uniform", "substantially equal", "substantially the same", "substantially orthogonal", "substantially parallel", "substantially symmetric", "substantially extending", "substantially axial", "substantially columnar", "substantially cylindrical", "substantially ring-shaped", "substantially annular", and the like.
For example, states included within a predetermined range (for example, a ±10% range) based on "perfectly center", "perfectly middle", "perfectly uniform", "perfectly equal", "perfectly the same", "perfectly orthogonal", "perfectly parallel", "perfectly symmetric", "perfectly extending", "perfectly axial", "perfectly columnar", "perfectly cylindrical", "perfectly ring-shaped", "perfectly annular", and the like are also included.
Therefore, even when the word "approximately" is not added, concepts that could be expressed by adding it may be included. Conversely, a perfect state is not excluded from states expressed with "approximately".
In the present disclosure, expressions using "than", such as "greater than A" and "less than A", are expressions that comprehensively include both the concept that includes being equal to A and the concept that does not include being equal to A. For example, "greater than A" is not limited to the case that excludes being equal to A, and also includes "A or more". Similarly, "less than A" is not limited to "below A", and also includes "A or less".
When implementing the present technology, specific settings and the like may be appropriately adopted from the concepts included in "greater than A" and "less than A" so that the effects described above are exhibited.
It is also possible to combine at least two of the feature portions according to the present technology described above. That is, the various feature portions described in the respective embodiments may be arbitrarily combined without distinction between the embodiments. Further, the various effects described above are merely examples and are not limited thereto, and other effects may be exhibited.
The present technology can also adopt the following configurations.
(1)
An information processing method executed by a computer system, the method comprising:
an extraction step of extracting two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and
a generation step of generating, by using a kernel function, teacher data for reliability prediction in which learning data and a reliability for the prediction of the correct answer are associated with each other, from the extracted two or more pieces of teacher data.
(2) The information processing method according to (1), in which
each of the plurality of pieces of teacher data is data in which a correct answer is associated, as a teacher label, with learning data for correct-answer prediction, and
where the learning data included in the teacher data for reliability prediction is referred to as learning data for reliability prediction,
the generation step
generates the learning data for reliability prediction by synthesizing the learning data for correct-answer prediction included in each of the extracted two or more pieces of teacher data, and
generates, as the reliability for the prediction of the correct answer, a reliability in a case where the teacher label included in each of the extracted two or more pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
(3) The information processing method according to (2), in which
the generation step
generates, by using the kernel function, a probability distribution that the learning data for reliability prediction follows, and generates the learning data for reliability prediction on the basis of the generated probability distribution, and
generates, as the reliability for the prediction of the correct answer, a conditional probability that the teacher label included in each of the extracted two or more pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
(4) The information processing method according to (3), in which
the extraction step extracts two pieces of teacher data from the plurality of pieces of teacher data, and
the generation step
generates the learning data for reliability prediction by executing interpolation at a predetermined interpolation ratio on the learning data for correct-answer prediction included in each of the two extracted pieces of teacher data, and
generates, as the reliability for the prediction of the correct answer, a reliability in a case where the teacher label included in each of the two extracted pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
(5) The information processing method according to (4), in which
the generation step
generates, by using the kernel function, a probability distribution that the interpolation ratio follows, determines the interpolation ratio on the basis of the generated probability distribution, and generates the learning data for reliability prediction by executing interpolation at the determined interpolation ratio, and
generates, as the reliability for the prediction of the correct answer, a conditional probability that the teacher label included in each of the two extracted pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
(6) The information processing method according to any one of (1) to (5), in which
the kernel function is a Gaussian kernel.
(7)
An information processing device comprising:
an extraction unit that extracts two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and
a generation unit that generates, by using a kernel function, teacher data for reliability prediction in which learning data and a reliability for the prediction of the correct answer are associated with each other, from the extracted two or more pieces of teacher data.
(8)
A program that causes a computer system to execute:
an extraction step of extracting two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and
a generation step of generating, by using a kernel function, teacher data for reliability prediction in which learning data and a reliability for the prediction of the correct answer are associated with each other, from the extracted two or more pieces of teacher data.
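As a minimal sketch of the flow laid out in configurations (1) to (6) above — extract two pieces of teacher data, draw an interpolation ratio from a kernel-derived probability distribution, interpolate the learning data, and attach conditional-probability reliabilities as the reliability label — the following Python fragment can be considered. The discretized ratio distribution, the `bandwidth` value, and all function names are illustrative assumptions, not details taken from the specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(u, bandwidth=0.25):
    # Gaussian kernel: K(u) = exp(-u^2 / (2 * h^2))
    return np.exp(-(np.asarray(u, dtype=float) ** 2) / (2.0 * bandwidth ** 2))

def make_reliability_sample(x_a, y_a, x_b, y_b, bandwidth=0.25):
    # Discretize candidate interpolation ratios in [0, 1] and weight them
    # with Gaussian-kernel mass around the two endpoints, so that the
    # ratio follows a kernel-derived probability distribution
    # (one illustrative choice of such a distribution).
    grid = np.linspace(0.0, 1.0, 101)
    weights = gaussian_kernel(grid, bandwidth) + gaussian_kernel(grid - 1.0, bandwidth)
    weights = weights / weights.sum()
    lam = rng.choice(grid, p=weights)

    # Learning data for reliability prediction: interpolation of the two
    # pieces of learning data for correct-answer prediction.
    x_mix = lam * x_a + (1.0 - lam) * x_b

    # Conditional probabilities that each teacher label is the correct
    # answer given x_mix, estimated from kernel closeness to each endpoint.
    k_a = gaussian_kernel(1.0 - lam, bandwidth)  # x_mix is close to x_a when lam ~ 1
    k_b = gaussian_kernel(lam, bandwidth)        # x_mix is close to x_b when lam ~ 0
    p_a = float(k_a / (k_a + k_b))
    reliability = {y_a: p_a, y_b: 1.0 - p_a}
    return x_mix, reliability

x_mix, reliability = make_reliability_sample(np.array([0.0, 0.0]), "cat",
                                             np.array([1.0, 1.0]), "dog")
```

The returned dictionary plays the role of the reliability label: each teacher label is mapped to the conditional probability that it is the correct answer given the interpolated input, rather than the fixed-ratio soft label used by plain mixup.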
11 … learning data for correct-answer prediction
12 … teacher label (correct-answer label)
15 … machine learning model
20 … information processing device
21 … extraction unit
22 … generation unit
24 … data distribution creation unit
25 … interpolation ratio determination unit
26 … interpolated data generation unit
27 … reliability label generation unit
100 … data generation system

Claims (8)

  1.  An information processing method executed by a computer system, the method comprising:
      an extraction step of extracting two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and
      a generation step of generating, by using a kernel function, teacher data for reliability prediction in which learning data and a reliability for the prediction of the correct answer are associated with each other, from the extracted two or more pieces of teacher data.
  2.  The information processing method according to claim 1, wherein
      each of the plurality of pieces of teacher data is data in which a correct answer is associated, as a teacher label, with learning data for correct-answer prediction, and
      where the learning data included in the teacher data for reliability prediction is referred to as learning data for reliability prediction,
      the generation step
      generates the learning data for reliability prediction by synthesizing the learning data for correct-answer prediction included in each of the extracted two or more pieces of teacher data, and
      generates, as the reliability for the prediction of the correct answer, a reliability in a case where the teacher label included in each of the extracted two or more pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
  3.  The information processing method according to claim 2, wherein
      the generation step
      generates, by using the kernel function, a probability distribution that the learning data for reliability prediction follows, and generates the learning data for reliability prediction on the basis of the generated probability distribution, and
      generates, as the reliability for the prediction of the correct answer, a conditional probability that the teacher label included in each of the extracted two or more pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
  4.  The information processing method according to claim 3, wherein
      the extraction step extracts two pieces of teacher data from the plurality of pieces of teacher data, and
      the generation step
      generates the learning data for reliability prediction by executing interpolation at a predetermined interpolation ratio on the learning data for correct-answer prediction included in each of the two extracted pieces of teacher data, and
      generates, as the reliability for the prediction of the correct answer, a reliability in a case where the teacher label included in each of the two extracted pieces of teacher data is predicted as the correct answer for the generated learning data for reliability prediction.
  5.  The information processing method according to claim 4, wherein
      the generation step
      generates, by using the kernel function, a probability distribution that the interpolation ratio follows, determines the interpolation ratio on the basis of the generated probability distribution, and generates the learning data for reliability prediction by executing interpolation at the determined interpolation ratio, and
      generates, as the reliability for the prediction of the correct answer, a conditional probability that the teacher label included in each of the two extracted pieces of teacher data is the correct answer under the condition that the generated learning data for reliability prediction is input.
  6.  The information processing method according to claim 1, wherein
      the kernel function is a Gaussian kernel.
  7.  An information processing device comprising:
      an extraction unit that extracts two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and
      a generation unit that generates, by using a kernel function, teacher data for reliability prediction in which learning data and a reliability for the prediction of the correct answer are associated with each other, from the extracted two or more pieces of teacher data.
  8.  A program that causes a computer system to execute:
      an extraction step of extracting two or more pieces of teacher data from a plurality of pieces of teacher data for training a machine learning model that predicts a correct answer for an input; and
      a generation step of generating, by using a kernel function, teacher data for reliability prediction in which learning data and a reliability for the prediction of the correct answer are associated with each other, from the extracted two or more pieces of teacher data.
PCT/JP2021/014910 2020-04-20 2021-04-08 Information processing method, information processing device, and program WO2021215261A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020074610 2020-04-20
JP2020-074610 2020-04-20

Publications (1)

Publication Number Publication Date
WO2021215261A1 true WO2021215261A1 (en) 2021-10-28

Family

ID=78269095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/014910 WO2021215261A1 (en) 2020-04-20 2021-04-08 Information processing method, information processing device, and program

Country Status (1)

Country Link
WO (1) WO2021215261A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGYI ZHANG; MOUSTAPHA CISSE; YANN N. DAUPHIN; DAVID LOPEZ-PAZ: "mixup: Beyond Empirical Risk Minimization", ARXIV.ORG, 25 October 2017 (2017-10-25), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081313428 *
HONGYU GUO; YONGYI MAO; RICHONG ZHANG: "MixUp as Locally Linear Out-Of-Manifold Regularization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 September 2018 (2018-09-07), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081427422 *
MAEDA, IWAO: "4I2-J-2-01: Importance of uncertainty assessment in deep learning", PROCEEDINGS OF THE 33RD ANNUAL CONFERENCE OF THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE; JUNE 4-7, 2019, no. 33, 2019, pages 1 - 4, XP009531486, DOI: 10.11517/pjsai.JSAI2019.0_4I2J201 *

Similar Documents

Publication Publication Date Title
Murphy Probabilistic machine learning: an introduction
US11640518B2 (en) Method and apparatus for training a neural network using modality signals of different domains
CN109643383B (en) Domain split neural network
KR102102772B1 (en) Electronic apparatus and method for generating trained model thereof
JP2020091922A (en) Structure learning in convolutional neural networks
CN110766044B (en) Neural network training method based on Gaussian process prior guidance
US11651214B2 (en) Multimodal data learning method and device
US20200327409A1 (en) Method and device for hierarchical learning of neural network, based on weakly supervised learning
JP7047498B2 (en) Learning programs, learning methods and learning devices
CN113139664B (en) Cross-modal migration learning method
CN112633310A (en) Method and system for classifying sensor data with improved training robustness
CN113254927B (en) Model processing method and device based on network defense and storage medium
KR20230141683A (en) Method, apparatus and computer program for buildding knowledge graph using qa model
Valle Hands-On Generative Adversarial Networks with Keras: Your guide to implementing next-generation generative adversarial networks
US20200118027A1 (en) Learning method, learning apparatus, and recording medium having stored therein learning program
CN114399025A (en) Graph neural network interpretation method, system, terminal and storage medium
CN114648103A (en) Automatic multi-objective hardware optimization for processing deep learning networks
CN114118259A (en) Target detection method and device
CN111461862B (en) Method and device for determining target characteristics for service data
WO2021215261A1 (en) Information processing method, information processing device, and program
US11568303B2 (en) Electronic apparatus and control method thereof
JP4773680B2 (en) Information processing apparatus and method, program recording medium, and program
Wang et al. Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation
WO2021111832A1 (en) Information processing method, information processing system, and information processing device
JP6947460B1 (en) Programs, information processing equipment, and methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21792850

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21792850

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP