WO2022195763A1 - Learning device, learning method, and recording medium - Google Patents

Learning device, learning method, and recording medium

Info

Publication number
WO2022195763A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
inference
loss
weak
correct
Prior art date
Application number
PCT/JP2021/010828
Other languages
French (fr)
Japanese (ja)
Inventor
周平 吉田
Original Assignee
日本電気株式会社
Priority date
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to PCT/JP2021/010828
Priority to JP2023506589A
Publication of WO2022195763A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning

Definitions

  • FIG. 6 is a block diagram showing the functional configuration of the learning device of the second embodiment.
  • the learning device 70 includes a first inference means 71, a first loss calculation means 72, a second inference means 73, a third inference means 74, a pseudo label generation means 75, and a second loss calculation means. Means 76 and updating means 77 are provided.
  • FIG. 7 is a flowchart of learning processing by the learning device 70 of the second embodiment.
  • The first inference means 71 performs first data augmentation on the data with weak correct answers, and performs first inference from the obtained data (step S41).
  • The first loss calculation means 72 calculates the first loss from the result of the first inference and the weak correct answer given to the data (step S42).
  • The second inference means 73 performs second data augmentation on the data with weak correct answers, and performs second inference from the obtained data (step S43).
  • The third inference means 74 performs third data augmentation on the data with weak correct answers, and performs third inference from the obtained data (step S44).
  • The pseudo-label generation means 75 generates a pseudo-label from the result of the third inference (step S45).
  • The second loss calculation means 76 calculates a second loss based on the result of the second inference and the pseudo-label (step S46).
  • The updating means 77 updates the parameters of the first inference means, the second inference means, and the third inference means based on the first loss and the second loss (step S47).
  • According to the learning device of the second embodiment, it is possible to generate a highly accurate machine learning model using data with weak correct answers.
  • (Appendix 1) A learning device comprising: first inference means for performing first data augmentation on data with a weak correct answer and performing first inference from the obtained data; first loss calculation means for calculating a first loss from the result of the first inference and the weak correct answer given to the data; second inference means for performing second data augmentation on the data and performing second inference from the obtained data; third inference means for performing third data augmentation on the data and performing third inference from the obtained data; pseudo-label generation means for generating a pseudo-label from the result of the third inference; second loss calculation means for calculating a second loss based on the result of the second inference and the pseudo-label; and updating means for updating parameters of the first inference means, the second inference means, and the third inference means based on the first loss and the second loss.
  • (Appendix 2) The learning device according to Appendix 1, further comprising mask generation means for generating a mask indicating whether the reliability of the result of the third inference is equal to or greater than a predetermined value, wherein the second loss calculation means calculates the second loss, based on the mask, when the reliability of the result of the third inference is equal to or greater than the predetermined value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

In the present invention, a first inference means performs first data augmentation on data with weak correct labels and performs first inference using the obtained data. A first loss calculation means calculates a first loss using the result of the first inference and the weak correct answer assigned to the data. A second inference means performs second data augmentation on the data and performs second inference using the obtained data. A third inference means performs third data augmentation on the data and performs third inference using the obtained data. A pseudo-label generation means generates a pseudo-label using the result of the third inference. A second loss calculation means calculates a second loss on the basis of the result of the second inference and the pseudo-label. An update means updates the parameters of the first inference means, the second inference means, and the third inference means on the basis of the first loss and the second loss.

Description

LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM
This disclosure relates to a learning method for a machine learning model.
In recent years, recognition technology based on machine learning has shown extremely high performance, mainly in the field of image recognition. The high accuracy of such recognition technology is supported by large amounts of data with correct answers; that is, high accuracy is achieved by preparing a large amount of correctly labeled data and training on it.
However, preparing a large amount of data with correct answers requires cost and time. From this point of view, Patent Document 1 discloses a method for classifying classes accurately even when few training images are available, using the probability that an input image belongs to each class and an estimated authenticity probability representing how likely the input image is to be an artificial image.
Patent Document 1: JP 2020-16935 A
One object of this disclosure is to generate a highly accurate machine learning model while keeping data collection costs low.
In one aspect of the present disclosure, a learning device includes:
a first inference means for performing first data augmentation on data with a weak correct answer and performing first inference from the obtained data;
a first loss calculation means for calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
a second inference means for performing second data augmentation on the data with the weak correct answer and performing second inference from the obtained data;
a third inference means for performing third data augmentation on the data with the weak correct answer and performing third inference from the obtained data;
a pseudo-label generation means for generating a pseudo-label from the result of the third inference;
a second loss calculation means for calculating a second loss based on the result of the second inference and the pseudo-label; and
an updating means for updating parameters of the first inference means, the second inference means, and the third inference means based on the first loss and the second loss.
In another aspect of the present disclosure, a learning method includes:
performing first data augmentation on data with a weak correct answer, and performing first inference from the obtained data using a first model;
calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
performing second data augmentation on the data with the weak correct answer, and performing second inference from the obtained data using a second model;
performing third data augmentation on the data with the weak correct answer, and performing third inference from the obtained data using a third model;
generating a pseudo-label from the result of the third inference;
calculating a second loss based on the result of the second inference and the pseudo-label; and
updating parameters of the first model, the second model, and the third model based on the first loss and the second loss.
In yet another aspect of the present disclosure, a recording medium records a program that causes a computer to execute processing of:
performing first data augmentation on data with a weak correct answer, and performing first inference from the obtained data using a first model;
calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
performing second data augmentation on the data with the weak correct answer, and performing second inference from the obtained data using a second model;
performing third data augmentation on the data with the weak correct answer, and performing third inference from the obtained data using a third model;
generating a pseudo-label from the result of the third inference;
calculating a second loss based on the result of the second inference and the pseudo-label; and
updating parameters of the first model, the second model, and the third model based on the first loss and the second loss.
According to the present disclosure, it is possible to generate a highly accurate machine learning model while keeping data collection costs low.
FIG. 1 shows examples of datasets for a multi-class classification problem.
FIG. 2 is a block diagram showing the hardware configuration of the learning device of the first embodiment.
FIG. 3 is a block diagram showing the functional configuration of the learning device of the first embodiment.
FIG. 4 is a flowchart of the learning processing by the learning device of the first embodiment.
FIG. 5 shows the configuration of the inference device of the first embodiment.
FIG. 6 is a block diagram showing the functional configuration of the learning device of the second embodiment.
FIG. 7 is a flowchart of the learning processing by the learning device of the second embodiment.
Preferred embodiments of the present disclosure will be described below with reference to the drawings.
<First Embodiment>
[Concept explanation]
(Weak labels)
In this embodiment, a machine learning model is trained using a dataset with weak labels (hereinafter also called "weak correct labels"). Whereas a normal "correct answer" specifies exactly one correct class to which the data belongs, a "weak correct answer" is a correct answer that contains ambiguity, noise, and the like.
Now, consider a multi-class classification problem in which an element x of a data space X is classified into a correct category (class) y, an element of the correct-candidate set Y.
(1) Ordinary dataset for a multi-class classification problem
An ordinary dataset is a set D of pairs (x, y) of data x, an element of the data space X, and a correct category y, an element of the correct-candidate set Y:

$D = \{(x_i, y_i)\}_{i=1}^{N}, \quad x_i \in X, \; y_i \in Y.$

(2) Weak-correct dataset for a multi-class classification problem
When the true correct answer y is fixed, a weak correct label z ∈ Z is determined according to p(z | y). A dataset with weak correct labels therefore has the form

$\{(x_i, z_i)\}_{i=1}^{N}, \quad z_i \sim p(z \mid y_i).$
Examples of learning using weak correct labels include PU (Positive and Unlabeled) learning, complementary-label learning, partial-label learning, and expert-dataset learning.
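For concreteness, the following is a minimal Python sketch of how such weak labels can be derived from true labels, assuming a uniform complementary-label distribution as one simple choice of p(z | y) (the function name and the 10-class setting are illustrative only):

```python
import numpy as np

def complementary_labels(y: np.ndarray, num_classes: int,
                         rng: np.random.Generator) -> np.ndarray:
    """Draw a weak label z uniformly from the classes other than the true
    class, so the label only says "the true class is NOT z"."""
    z = rng.integers(num_classes - 1, size=y.shape)
    z[z >= y] += 1  # shift past the true class so that z != y always holds
    return z

rng = np.random.default_rng(0)
y_true = rng.integers(10, size=5)               # hypothetical true labels, 10 classes
z_weak = complementary_labels(y_true, 10, rng)  # weak labels drawn from p(z | y)
```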
(Partial-label learning)
In multi-class classification, learning based on an expert dataset can be viewed as a special case of learning based on partial labels. A "partial label" gives, for the data x to be classified, a subset Z of Y as a set of correct candidates, instead of a single correct category y that is an element of the set Y of all correct candidates. Here, the subset Z contains the true correct category y. Hereinafter, for convenience of description, the complement of a set A is written $\bar{A}$.
In an expert dataset, the label $\bar{Z}$ can be viewed as the partial label $\bar{Z}$ in the above sense, and a label z that is an element of the subset Z can be regarded as the partial label {z} in the above sense.
Note that partial labels in the above sense are also called weak labels, ambiguous labels, and so on, depending on the literature. Although the terms partial label, weak label, and ambiguous label are sometimes used with different meanings, in this specification the term "partial label" is used as a concept that includes all of the above.
(Expert dataset)
Next, a concrete example of an expert dataset is described. An "expert dataset" is a training dataset that can be used when learning a multi-class classification model, and is composed of a plurality of partial datasets. Specifically, an expert dataset is configured to satisfy the following conditions (a minimal labeling sketch follows the list):
(A) Each of the partial datasets is assigned, as its scope of responsibility, at least a part of all the categories to be recognized.
(B) Every category to be recognized is assigned to at least one of the partial datasets.
(C) Each data item contained in a partial dataset is given a correct label indicating either one of the categories belonging to the scope of responsibility assigned to that partial dataset, or that the item's true category does not belong to that scope of responsibility.
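As referenced above, here is a minimal sketch of the labeling rule of condition (C); the out-of-scope marker value and the function name are hypothetical:

```python
OUT_OF_SCOPE = "out-of-scope"  # hypothetical marker: "true category is outside this scope"

def expert_label(true_category: str, responsibility: set) -> str:
    """Weak label for one sample in a partial dataset, per condition (C)."""
    return true_category if true_category in responsibility else OUT_OF_SCOPE

aquatic_mammals = {"beaver", "dolphin", "otter", "seal", "whale"}
print(expert_label("whale", aquatic_mammals))  # "whale": in-scope exact category
print(expert_label("baby", aquatic_mammals))   # "out-of-scope"
```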
FIG. 1 shows an example of an ordinary dataset and an expert dataset for a multi-class classification problem. FIG. 1(A) shows an ordinary dataset used for training. Suppose an object recognition model that performs 100-class classification based on image data is to be trained. In an ordinary training dataset, each prepared image is assigned one of the 100 classes, i.e., 100 categories, as its correct label.
FIG. 1(B) shows an example of an expert dataset. This expert dataset also performs multi-class classification of 100 classes in total, as in the example of FIG. 1(A). For an expert dataset, a plurality of partial datasets are prepared. In the example of FIG. 1(B), partial datasets such as "aquatic mammals" and "people" are prepared, and a scope of responsibility is set for each partial dataset. The "aquatic mammals" partial dataset is assigned five kinds of aquatic mammals, "beaver", "dolphin", "otter", "seal", and "whale", as its scope of responsibility. The "people" partial dataset is assigned five kinds of people, "baby", "boy", "girl", "male", and "female". Here, the scopes of responsibility are determined so that every class (category) to be recognized is assigned to at least one of the partial datasets; that is, the 100 classes are distributed over the partial datasets so that no class remains unassigned to any partial dataset. In other words, the scopes of responsibility are determined so that the plurality of partial datasets together cover all 100 recognition targets. As a result, an expert dataset also enables learning of 100-class multi-class classification in the same way as the ordinary dataset shown in FIG. 1(A). Such an expert dataset is an example of data with weak correct answers.
(Use of weak correct labels)
Weak correct labels as described above are easier to assign than normal correct labels and can therefore be prepared at low cost. By using weak correct labels together with a weak-correct loss, correct classification can be learned from data with weak correct answers. However, a weak correct answer carries little information, and training with the weak-correct loss alone tends to overfit. Therefore, in the embodiments of the present disclosure, in addition to the weak-correct loss, a no-correct-answer loss that requires the model output not to change significantly under data augmentation is introduced as regularization, thereby preventing overfitting.
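The specification does not fix a formula for the weak-correct loss; a common choice for partial labels, shown here as a hedged PyTorch sketch, is the negative log of the total probability the model assigns to the candidate set Z:

```python
import torch
import torch.nn.functional as F

def weak_correct_loss(logits: torch.Tensor, candidate_mask: torch.Tensor) -> torch.Tensor:
    """-log sum_{y in Z} p(y | x), a standard partial-label loss.

    logits:         (batch, num_classes) raw model outputs
    candidate_mask: (batch, num_classes), 1.0 where the class is in Z, else 0.0
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Sum the candidate probabilities stably in log space
    masked = log_probs.masked_fill(candidate_mask == 0, float("-inf"))
    return -torch.logsumexp(masked, dim=-1).mean()
```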
[Learning device]
(Hardware configuration)
FIG. 2 is a block diagram showing the hardware configuration of the learning device 100 of the first embodiment. As illustrated, the learning device 100 includes an interface (I/F) 11, a processor 12, a memory 13, a recording medium 14, and a database (DB) 15.
The interface 11 inputs and outputs data to and from external devices. Specifically, the data with weak correct answers used for learning is input through the interface 11.
The processor 12 is a computer such as a CPU (Central Processing Unit) and controls the entire learning device 100 by executing a program prepared in advance. The processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The processor 12 executes the learning processing described later.
The memory 13 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 13 is also used as working memory while the processor 12 executes various processes.
The recording medium 14 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the learning device 100. The recording medium 14 records the various programs executed by the processor 12. When the learning device 100 executes various processes, a program recorded on the recording medium 14 is loaded into the memory 13 and executed by the processor 12. The DB 15 stores the data with weak correct answers for learning as needed.
(Functional configuration)
FIG. 3 is a block diagram showing the functional configuration of the learning device 100 of the first embodiment. The learning device 100 includes data augmentation units 21a to 21c, inference units 22a to 22c, a weak-correct loss calculation unit 23, a no-correct loss calculation unit 24, a pseudo-label generation unit 25, a mask generation unit 26, a gradient calculation unit 27, an update unit 28, and parameter holding units 29a and 29b.
Data with a weak correct answer includes input data and a weak correct label corresponding to the input data. For example, when training an image recognition model, the input data is an image used for training, and a weak correct label is attached to the image. Of the data with the weak correct answer, the input data is fed to the data augmentation units 21a to 21c, and the weak correct label is fed to the weak-correct loss calculation unit 23.
The data augmentation unit 21a applies a random transformation to the input data and outputs the transformed data to the inference unit 22a. The inference unit 22a performs inference on the input data using a machine learning model. For example, when training an image recognition model, the inference unit 22a infers the class of the object contained in the input data and outputs the result to the weak-correct loss calculation unit 23.
The weak-correct loss calculation unit 23 calculates a weak-correct loss from the inference result input from the inference unit 22a and the weak correct label, and outputs the calculated weak-correct loss to the gradient calculation unit 27.
The data augmentation unit 21b applies a random transformation to the input data and outputs the transformed data to the inference unit 22b. Similarly, the data augmentation unit 21c applies a random transformation to the input data and outputs the transformed data to the inference unit 22c. The three data augmentation units 21a to 21c each apply an independent random transformation to the input data; the types of transformation may be the same or different. In a preferred example, the transformation by the data augmentation unit 21b is stronger than the transformations by the data augmentation units 21a and 21c. A strong transformation is one that changes the input data to a greater degree; for example, when the input data is an image, it changes the content of the image more substantially.
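As a hedged sketch of one possible weak/strong pair (the specific torchvision transforms and the 32-pixel image size are assumptions; the text does not name concrete transformations):

```python
from torchvision import transforms

# Weaker transformation, e.g., for the data augmentation units 21a and 21c
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

# Stronger transformation, e.g., for the data augmentation unit 21b:
# RandAugment changes the image content much more aggressively
strong_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.RandAugment(),
    transforms.ToTensor(),
])
```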
The inference unit 22b uses a machine learning model to perform inference on the input data transformed by the data augmentation unit 21b, and outputs the inference result to the no-correct loss calculation unit 24. Likewise, the inference unit 22c uses a machine learning model to perform inference on the input data transformed by the data augmentation unit 21c, and outputs the inference result to the pseudo-label generation unit 25.
The pseudo-label generation unit 25 generates a pseudo-label based on the inference result of the inference unit 22c. A "pseudo-label" is a label generated from the inference result of a model during or after training. Specifically, the pseudo-label generation unit 25 may convert the inference result of the inference unit 22c into a one-hot vector, i.e., a vector whose value is "1" only for the class inferred to be correct and "0" for the other classes. Alternatively, the pseudo-label generation unit 25 may convert the inference result of the inference unit 22c into a soft label, as opposed to a so-called hard label that uses only "0" and "1". The pseudo-label generation unit 25 outputs the generated pseudo-label to the no-correct loss calculation unit 24.
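A minimal sketch of both pseudo-label variants described above; the temperature value for the soft variant is an assumption:

```python
import torch
import torch.nn.functional as F

def make_pseudo_label(logits: torch.Tensor, soft: bool = False,
                      temperature: float = 0.5) -> torch.Tensor:
    """Hard variant: one-hot vector on the class inferred as correct.
    Soft variant: temperature-sharpened distribution instead of 0/1 values."""
    if soft:
        return F.softmax(logits / temperature, dim=-1)
    return F.one_hot(logits.argmax(dim=-1), logits.shape[-1]).float()
```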
The mask generation unit 26 compares the confidence score of the inference result output by the inference unit 22c, i.e., the maximum of the per-class scores, with a predetermined threshold, and generates a mask indicating whether the confidence of the inference result of the inference unit 22c is at least the threshold. Specifically, the mask generation unit 26 generates a mask of "1" when the maximum score of the inference result is greater than the threshold, generates a mask of "0" when it is less than or equal to the threshold, and outputs the mask to the no-correct loss calculation unit 24. This mask serves as an indicator of whether the pseudo-label generated by the pseudo-label generation unit 25 should be used in the loss calculation by the no-correct loss calculation unit 24.
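A minimal sketch of the mask generation, assuming softmax scores and a hypothetical threshold of 0.95:

```python
import torch
import torch.nn.functional as F

def confidence_mask(logits: torch.Tensor, threshold: float = 0.95) -> torch.Tensor:
    """Mask "1" where the maximum class score exceeds the threshold, else "0"."""
    max_score = F.softmax(logits, dim=-1).max(dim=-1).values
    return (max_score > threshold).float()
```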
The no-correct loss calculation unit 24 calculates the no-correct loss using the inference result input from the inference unit 22b and the pseudo-label generated by the pseudo-label generation unit 25. Based on the mask input from the mask generation unit 26, the no-correct loss calculation unit 24 decides whether to perform the loss calculation with that pseudo-label. Specifically, when the mask input from the mask generation unit 26 is "1", the no-correct loss calculation unit 24 treats the pseudo-label as reliable and calculates the no-correct loss; when the mask is "0", it treats the pseudo-label as unreliable and does not calculate the no-correct loss. When the no-correct loss has been calculated, the no-correct loss calculation unit 24 outputs it to the gradient calculation unit 27.
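A hedged sketch of the no-correct loss, assuming cross-entropy between the prediction for the strongly augmented input and the pseudo-label, counted only where the mask is "1":

```python
import torch
import torch.nn.functional as F

def no_correct_loss(logits_b: torch.Tensor, pseudo_label: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """Consistency term: samples whose mask is 0 contribute nothing."""
    per_sample = -(pseudo_label * F.log_softmax(logits_b, dim=-1)).sum(dim=-1)
    return (per_sample * mask).sum() / mask.sum().clamp(min=1.0)
```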
The gradient calculation unit 27 calculates the gradient of the input weak-correct loss and no-correct loss and outputs it to the update unit 28. For example, the gradient calculation unit 27 calculates the gradient of the sum, or of a weighted sum, of the weak-correct loss and the no-correct loss and outputs it to the update unit 28.
The update unit 28 uses the input gradient to update the parameters of the inference units 22a and 22b (hereinafter, "parameter P1") and outputs the result to the parameter holding unit 29a. The parameter holding unit 29a sets the updated parameter P1 in the inference units 22a and 22b; thus the same parameter P1 is set in the inference unit 22a and the inference unit 22b.
The update unit 28 also uses the input gradient to update the parameter of the inference unit 22c (hereinafter, "parameter P2") and outputs it to the parameter holding unit 29b. The parameter holding unit 29b sets the updated parameter P2 in the inference unit 22c. Here, the parameter P2 held by the parameter holding unit 29b may be the same as the parameter P1 held by the parameter holding unit 29a, or may be an exponential moving average of the parameter P1 taken each time P1 is updated.
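A minimal sketch of the exponential-moving-average option for the parameter P2; the decay value is an assumption:

```python
import torch

@torch.no_grad()
def ema_update(model_p2: torch.nn.Module, model_p1: torch.nn.Module,
               decay: float = 0.999) -> None:
    """P2 <- decay * P2 + (1 - decay) * P1, run each time P1 is updated."""
    for p2, p1 in zip(model_p2.parameters(), model_p1.parameters()):
        p2.mul_(decay).add_(p1, alpha=1.0 - decay)
```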
In the above configuration, the data augmentation unit 21a and the inference unit 22a are an example of the first inference means, and the weak-correct loss calculation unit 23 is an example of the first loss calculation means. The data augmentation unit 21b and the inference unit 22b are an example of the second inference means, and the data augmentation unit 21c and the inference unit 22c are an example of the third inference means. The pseudo-label generation unit 25 is an example of the pseudo-label generation means, and the no-correct loss calculation unit 24 is an example of the second loss calculation means. The gradient calculation unit 27, the update unit 28, and the parameter holding units 29a and 29b are an example of the updating means.
(Learning processing)
FIG. 4 is a flowchart of the learning processing by the learning device 100 of the first embodiment. This processing is realized by the processor 12 shown in FIG. 2 executing a program prepared in advance and operating as each element shown in FIG. 3. This processing is executed repeatedly each time data with a weak correct answer is input.
First, the input data contained in the data with a weak correct answer is fed to the data augmentation units 21a to 21c. The data augmentation unit 21a transforms the input data and outputs it to the inference unit 22a (step S11). The inference unit 22a performs inference from the transformed input data and outputs the inference result to the weak-correct loss calculation unit 23 (step S12). The weak-correct loss calculation unit 23 calculates the weak-correct loss from the inference result and the weak correct label, and outputs it to the gradient calculation unit 27 (step S13).
In parallel with steps S11 to S13, the data augmentation unit 21c transforms the input data and outputs it to the inference unit 22c (step S14). The inference unit 22c performs inference from the transformed input data and outputs the inference result to the pseudo-label generation unit 25 (step S15). The pseudo-label generation unit 25 generates a pseudo-label from the inference result and outputs it to the no-correct loss calculation unit 24 (step S16). The mask generation unit 26 also generates a mask based on the inference result of the inference unit 22c and outputs it to the no-correct loss calculation unit 24 (step S17).
 The data augmentation unit 21b transforms the input data of the data with a weak correct answer and outputs the result to the inference unit 22b (step S18). The inference unit 22b performs inference from the transformed input data and outputs the inference result to the no-correct loss calculation unit 24 (step S19). When the mask input from the mask generation unit 26 is "1", the no-correct loss calculation unit 24 calculates a no-correct loss from the inference result input from the inference unit 22b and the pseudo-label input from the pseudo-label generation unit 25, and outputs it to the gradient calculation unit 27 (step S20).
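 Under the same assumptions, the no-correct loss can be computed as a cross-entropy against the pseudo-label, counted only for samples whose mask is 1, as sketched below; cross-entropy is an assumed choice, and other classification losses could be substituted.

```python
import torch
import torch.nn.functional as F

def no_correct_loss(logits_b: torch.Tensor,
                    pseudo_label: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """No-correct loss (step S20): per-sample cross-entropy between the
    inference result of inference unit 22b and the pseudo-label, zeroed
    out wherever the mask is 0."""
    per_sample = F.cross_entropy(logits_b, pseudo_label, reduction="none")
    return (per_sample * mask).mean()
```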
 The gradient calculation unit 27 calculates the gradients of the input weak-correct loss and no-correct loss and outputs them to the update unit 28 (step S21). The update unit 28 updates the parameter P1 of the inference units 22a and 22b based on the input gradients and outputs it to the parameter holding unit 29a, and also updates the parameter P2 of the inference unit 22c and outputs it to the parameter holding unit 29b (step S22). The parameter holding unit 29a then sets the parameter P1 in the inference units 22a and 22b, and the parameter holding unit 29b sets the parameter P2 in the inference unit 22c (step S23). In this way, the parameters of the inference units 22a to 22c are updated.
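 Putting the pieces together, one iteration of steps S11 to S23 might look like the sketch below. It reuses the helper functions from the preceding sketches, with a shared student model standing in for inference units 22a and 22b (parameter P1), an EMA teacher standing in for inference unit 22c (parameter P2), and an assumed weighting coefficient lambda_u between the two losses; all of these are illustrative assumptions rather than requirements of the embodiment.

```python
def training_step(student, teacher, optimizer, x, candidate_mask,
                  augment_a, augment_b, augment_c, lambda_u=1.0):
    """One iteration of the learning process (steps S11 to S23), as a sketch."""
    # Steps S11-S13: weak-correct branch.
    loss_weak = weak_correct_loss(student(augment_a(x)), candidate_mask)

    # Steps S14-S17: branch producing the pseudo-label and the mask.
    pseudo_label, mask = pseudo_label_and_mask(teacher(augment_c(x)))

    # Steps S18-S20: no-correct branch.
    loss_pseudo = no_correct_loss(student(augment_b(x)), pseudo_label, mask)

    # Steps S21-S22: gradient computation and update of parameter P1.
    loss = loss_weak + lambda_u * loss_pseudo
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step S22 (continued) and step S23: derive parameter P2 from P1 via EMA.
    update_ema(student, teacher)
    return loss.item()
```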
 (Modification)
 In the above embodiment, the same input data of the data with a weak correct answer, i.e., the same input image, is input to the data augmentation units 21a to 21c in each step of the learning process. Instead, in each step of the learning process, the image input to the data augmentation unit 21a may differ from the images input to the data augmentation units 21b and 21c. That is, the image used for inference by the inference unit 22b must be the same as the image used for inference by the inference unit 22c and for pseudo-label generation by the pseudo-label generation unit 25, but the image used for inference by the inference unit 22a may differ from that image.
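 Under this modification, the weak-correct branch and the pseudo-label branches can draw from separate mini-batches, for example from two data loaders, as in the hypothetical loop below, which reuses the helpers and models from the earlier sketches.

```python
# loader_a feeds the weak-correct branch (data augmentation unit 21a);
# loader_bc feeds the pseudo-label branches (data augmentation units 21b, 21c).
for (x_weak, candidate_mask), (x_pseudo, _) in zip(loader_a, loader_bc):
    loss_weak = weak_correct_loss(student(augment_a(x_weak)), candidate_mask)

    pseudo_label, mask = pseudo_label_and_mask(teacher(augment_c(x_pseudo)))
    loss_pseudo = no_correct_loss(student(augment_b(x_pseudo)), pseudo_label, mask)

    loss = loss_weak + loss_pseudo
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_ema(student, teacher)
```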
 [Inference device]
 FIG. 5 shows the configuration of the inference device of the first embodiment. The inference device 200 includes an inference unit 201. The inference unit 201 uses the machine learning model trained by the learning process described above. That is, the parameter P1 obtained by the learning process described above is set in the inference unit 201.
 At the time of inference, input data to be inferred is input to the inference unit 201. This input data is data such as a captured image acquired in the environment in which the inference device 200 is actually operated, and is the target of actual image recognition or the like. The inference unit 201 performs inference from the input data and outputs an inference result. For example, in the case of image recognition that performs multi-class classification, the inference unit 201 outputs the probability value of each class as the inference result based on the input image.
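 As an illustration of deployment, the learned parameter P1 can be loaded into the same model architecture and used to emit per-class probability values, as in the sketch below; the architecture (a torchvision resnet18), the class count, and the file name "p1.pt" are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Hypothetical inference device 200: load parameter P1 and output the
# probability value of each class for an input image.
model = resnet18(num_classes=10)              # assumed architecture and class count
model.load_state_dict(torch.load("p1.pt"))    # "p1.pt" is an assumed file name
model.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)           # stand-in for a captured image
    class_probs = F.softmax(model(x), dim=1)  # probability value of each class
    print(class_probs.argmax(dim=1))          # predicted class
```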
 <Second embodiment>
 FIG. 6 is a block diagram showing the functional configuration of the learning device of the second embodiment. The learning device 70 includes a first inference means 71, a first loss calculation means 72, a second inference means 73, a third inference means 74, a pseudo-label generation means 75, a second loss calculation means 76, and an update means 77.
 FIG. 7 is a flowchart of the learning process performed by the learning device 70 of the second embodiment. The first inference means 71 performs a first data augmentation on data with a weak correct answer and performs a first inference from the obtained data (step S41). The first loss calculation means 72 calculates a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer (step S42). The second inference means 73 performs a second data augmentation on the weak correct data and performs a second inference from the obtained data (step S43). The third inference means 74 performs a third data augmentation on the weak correct data and performs a third inference from the obtained data (step S44). The pseudo-label generation means 75 generates a pseudo-label from the result of the third inference (step S45). The second loss calculation means 76 calculates a second loss based on the result of the second inference and the pseudo-label (step S46). The update means 77 updates the parameters of the first inference means, the second inference means, and the third inference means based on the first loss and the second loss (step S47).
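 In compact form, one plausible instantiation of steps S41 to S47 minimizes a combined objective of the form L_total = L1(f_P1(A1(x)), y_weak) + λ · m · L2(f_P1(A2(x)), pseudo(f_P2(A3(x)))), where A1, A2, and A3 are the three data augmentations, f_P1 and f_P2 are the models with parameters P1 and P2, y_weak is the weak correct answer, m is the confidence mask of the first embodiment (taken as 1 when no mask is used), and λ is an assumed weighting coefficient.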
 According to the learning device of the second embodiment, it is possible to generate a highly accurate machine learning model using data with weak correct answers.
 Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to the following.
 (Appendix 1)
 A learning device comprising:
 a first inference means for performing a first data augmentation on data with a weak correct answer and performing a first inference from the obtained data;
 a first loss calculation means for calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
 a second inference means for performing a second data augmentation on the weak correct data and performing a second inference from the obtained data;
 a third inference means for performing a third data augmentation on the weak correct data and performing a third inference from the obtained data;
 a pseudo-label generation means for generating a pseudo-label from the result of the third inference;
 a second loss calculation means for calculating a second loss based on the result of the second inference and the pseudo-label; and
 an update means for updating parameters of the first inference means, the second inference means, and the third inference means based on the first loss and the second loss.
 (Appendix 2)
 The learning device according to appendix 1, further comprising a mask generation means for generating a mask indicating whether the reliability of the result of the third inference is equal to or greater than a predetermined value,
 wherein the second loss calculation means calculates the second loss based on the mask when the reliability of the result of the third inference is equal to or greater than the predetermined value.
 (Appendix 3)
 The learning device according to appendix 1 or 2, wherein the first data augmentation, the second data augmentation, and the third data augmentation are performed on the same data with a weak correct answer.
 (Appendix 4)
 The learning device according to appendix 1 or 2, wherein the second data augmentation and the third data augmentation are performed on the same data with a weak correct answer, and the first data augmentation is performed on data with a weak correct answer different from that used for the second data augmentation and the third data augmentation.
 (Appendix 5)
 The learning device according to any one of appendices 1 to 4, wherein the update means sets the same parameter for the first inference means and the second inference means.
 (Appendix 6)
 The learning device according to appendix 5, wherein the update means generates another parameter based on the parameter set in the first inference means and the second inference means, and sets it in the third inference means.
 (Appendix 7)
 A learning method comprising:
 performing a first data augmentation on data with a weak correct answer and performing a first inference from the obtained data using a first model;
 calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
 performing a second data augmentation on the weak correct data and performing a second inference from the obtained data using a second model;
 performing a third data augmentation on the weak correct data and performing a third inference from the obtained data using a third model;
 generating a pseudo-label from the result of the third inference;
 calculating a second loss based on the result of the second inference and the pseudo-label; and
 updating parameters of the first model, the second model, and the third model based on the first loss and the second loss.
 (Appendix 8)
 A recording medium recording a program for causing a computer to execute a process comprising:
 performing a first data augmentation on data with a weak correct answer and performing a first inference from the obtained data using a first model;
 calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
 performing a second data augmentation on the weak correct data and performing a second inference from the obtained data using a second model;
 performing a third data augmentation on the weak correct data and performing a third inference from the obtained data using a third model;
 generating a pseudo-label from the result of the third inference;
 calculating a second loss based on the result of the second inference and the pseudo-label; and
 updating parameters of the first model, the second model, and the third model based on the first loss and the second loss.
 Although the present disclosure has been described above with reference to the embodiments and examples, the present disclosure is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present disclosure within the scope of the present disclosure.
 21a to 21c  Data augmentation unit
 22a to 22c  Inference unit
 23  Weak-correct loss calculation unit
 24  No-correct loss calculation unit
 25  Pseudo-label generation unit
 26  Mask generation unit
 27  Gradient calculation unit
 28  Update unit
 29a, 29b  Parameter holding unit
 100  Learning device
 200  Inference device

Claims (8)

  1.  A learning device comprising:
      a first inference means for performing a first data augmentation on data with a weak correct answer and performing a first inference from the obtained data;
      a first loss calculation means for calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
      a second inference means for performing a second data augmentation on the weak correct data and performing a second inference from the obtained data;
      a third inference means for performing a third data augmentation on the weak correct data and performing a third inference from the obtained data;
      a pseudo-label generation means for generating a pseudo-label from the result of the third inference;
      a second loss calculation means for calculating a second loss based on the result of the second inference and the pseudo-label; and
      an update means for updating parameters of the first inference means, the second inference means, and the third inference means based on the first loss and the second loss.
  2.  The learning device according to claim 1, further comprising a mask generation means for generating a mask indicating whether the reliability of the result of the third inference is equal to or greater than a predetermined value,
      wherein the second loss calculation means calculates the second loss based on the mask when the reliability of the result of the third inference is equal to or greater than the predetermined value.
  3.  The learning device according to claim 1 or 2, wherein the first data augmentation, the second data augmentation, and the third data augmentation are performed on the same data with a weak correct answer.
  4.  The learning device according to claim 1 or 2, wherein the second data augmentation and the third data augmentation are performed on the same data with a weak correct answer, and the first data augmentation is performed on data with a weak correct answer different from that used for the second data augmentation and the third data augmentation.
  5.  The learning device according to any one of claims 1 to 4, wherein the update means sets the same parameter for the first inference means and the second inference means.
  6.  The learning device according to claim 5, wherein the update means generates another parameter based on the parameter set in the first inference means and the second inference means, and sets it in the third inference means.
  7.  A learning method comprising:
      performing a first data augmentation on data with a weak correct answer and performing a first inference from the obtained data using a first model;
      calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
      performing a second data augmentation on the weak correct data and performing a second inference from the obtained data using a second model;
      performing a third data augmentation on the weak correct data and performing a third inference from the obtained data using a third model;
      generating a pseudo-label from the result of the third inference;
      calculating a second loss based on the result of the second inference and the pseudo-label; and
      updating parameters of the first model, the second model, and the third model based on the first loss and the second loss.
  8.  A recording medium recording a program for causing a computer to execute a process comprising:
      performing a first data augmentation on data with a weak correct answer and performing a first inference from the obtained data using a first model;
      calculating a first loss from the result of the first inference and the weak correct answer given to the data with the weak correct answer;
      performing a second data augmentation on the weak correct data and performing a second inference from the obtained data using a second model;
      performing a third data augmentation on the weak correct data and performing a third inference from the obtained data using a third model;
      generating a pseudo-label from the result of the third inference;
      calculating a second loss based on the result of the second inference and the pseudo-label; and
      updating parameters of the first model, the second model, and the third model based on the first loss and the second loss.
PCT/JP2021/010828 2021-03-17 2021-03-17 Learning device, learning method, and recording medium WO2022195763A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/010828 WO2022195763A1 (en) 2021-03-17 2021-03-17 Learning device, learning method, and recording medium
JP2023506589A JPWO2022195763A5 (en) 2021-03-17 Learning devices, learning methods, and programs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/010828 WO2022195763A1 (en) 2021-03-17 2021-03-17 Learning device, learning method, and recording medium

Publications (1)

Publication Number Publication Date
WO2022195763A1 true WO2022195763A1 (en) 2022-09-22

Family

ID=83322004

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/010828 WO2022195763A1 (en) 2021-03-17 2021-03-17 Learning device, learning method, and recording medium

Country Status (1)

Country Link
WO (1) WO2022195763A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004326200A (en) * 2003-04-21 2004-11-18 Mitsubishi Heavy Ind Ltd Automatic machine, and system and method for automatically operating same
JP2018200685A (en) * 2017-05-05 2018-12-20 ダッソー システムズDassault Systemes Forming of data set for fully supervised learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PAPANDREOU GEORGE; CHEN LIANG-CHIEH; MURPHY KEVIN P.; YUILLE ALAN L.: "Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation", 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 7 December 2015 (2015-12-07), pages 1742-1750, XP032866476, DOI: 10.1109/ICCV.2015.203 *
SHUHEI M. YOSHIDA; TAKASHI TAKENOUCHI; MASASHI SUGIYAMA: "Lower-bounded proper losses for weakly supervised classification", ARXIV.ORG, Cornell University Library, 201 Olin Library, Ithaca, NY 14853, 4 March 2021 (2021-03-04), XP081905416 *
TAKAHIRO OHGA; MASAKI YANO; MASATERU ONISHI: "Effect verification of ensemble learning by Data and Test Time Augmentation", IEICE TECHNICAL REPORT, PRMU, vol. 117, no. 391 (PRMU2017-128), 11 January 2018 (2018-01-11), JP, pages 135-140, XP009539926 *

Also Published As

Publication number Publication date
JPWO2022195763A1 (en) 2022-09-22

Similar Documents

Publication Publication Date Title
WO2021023202A1 (en) Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method
US11200483B2 (en) Machine learning method and apparatus based on weakly supervised learning
Mao et al. Imagegcn: Multi-relational image graph convolutional networks for disease identification with chest x-rays
Khadangi et al. EM-net: Deep learning for electron microscopy image segmentation
Viaene et al. Cost-sensitive learning and decision making revisited
CN110046707B (en) Evaluation optimization method and system of neural network model
CN116644755B (en) Multi-task learning-based few-sample named entity recognition method, device and medium
Liu et al. Co-correcting: noise-tolerant medical image classification via mutual label correction
JP2022548187A (en) Target re-identification method and device, terminal and storage medium
CN114722892A (en) Continuous learning method and device based on machine learning
Zhang et al. Triplet attention and dual-pool contrastive learning for clinic-driven multi-label medical image classification
Zhao et al. Deeply supervised active learning for finger bones segmentation
Ghosh et al. Dividing and conquering a blackbox to a mixture of interpretable models: route, interpret, repeat
Platanios et al. Learning from imperfect annotations
WO2022195763A1 (en) Learning device, learning method, and recording medium
CN115908947A (en) Multi-modal periodontal image identification method and system based on inverse deductive learning
Bhuvana et al. Efficient generative transfer learning framework for the detection of COVID-19
CN116306681A (en) Method and system for constructing interpretive visual question-answer model based on fact scene
WO2022195762A1 (en) Learning device, learning method, and recording medium
Breitholtz et al. Unsupervised domain adaptation by learning using privileged information
WO2021059527A1 (en) Learning device, learning method, and recording medium
CN112489012A (en) Neural network architecture method for CT image recognition
WO2021220341A1 (en) Learning device, learning method, recording medium for learning device, inference device, inference method, and recording medium for inference device
Qiu et al. The Diagnosis of Alzheimer's Disease: An Ensemble Approach.
Sherif et al. STG-MTL: Scalable Task Grouping For Multi-Task Learning Using Data Maps

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21931515

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023506589

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21931515

Country of ref document: EP

Kind code of ref document: A1