WO2024042707A1 - Meta-learning method, meta-learning device, and program - Google Patents

Meta-learning method, meta-learning device, and program

Info

Publication number
WO2024042707A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning
label
feature
data
meta
Prior art date
Application number
PCT/JP2022/032222
Other languages
French (fr)
Japanese (ja)
Inventor
具治 岩田 (Tomoharu Iwata)
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority to PCT/JP2022/032222
Publication of WO2024042707A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • the present disclosure relates to a meta-learning method, a meta-learning device, and a program.
  • Machine learning methods typically use task-specific training data to learn model parameters.
  • In order to achieve high performance on a target task (hereinafter also simply called the target task), a large amount of training data specific to that task is required, but for some tasks, preparing a sufficient amount of training data is costly.
  • To solve this problem, a meta-learning method has been proposed that utilizes training data from different tasks to achieve high performance on the target task even with a small amount of training data (see, for example, Non-Patent Document 1).
  • However, the meta-learning method proposed in Non-Patent Document 1 cannot utilize training data with different feature spaces.
  • Although meta-learning methods that can utilize training data with different feature spaces have been proposed (see, for example, Non-Patent Document 2), the method proposed in Non-Patent Document 2 cannot utilize training data to which no labels have been assigned.
  • The present disclosure has been made in view of the above points, and aims to provide a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
  • In a meta-learning method according to one aspect of the present disclosure, a computer executes: an input procedure of inputting a plurality of training datasets, each composed of training data that includes at least a feature of a case, where a dataset may include training data without a label for the feature and the feature spaces of the features may differ; a first selection procedure of selecting one training dataset from the plurality of training datasets; a second selection procedure of selecting, from the selected training dataset, a first feature to serve as labeled data, a first label for the first feature, a second feature to serve as unlabeled data, and a second label for the second feature; a generation procedure of generating a latent vector for each case represented by the first or second feature, using a learning target parameter, the labeled data, and the unlabeled data; a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and a learning procedure of learning the learning target parameter using the prediction result for the unlabeled data and the second label.
  • This provides a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
  • FIG. 1 is a diagram illustrating an example of the hardware configuration of a meta-learning device according to the present embodiment.
  • FIG. 2 is a diagram showing an example of the functional configuration of the meta-learning device according to the present embodiment. FIG. 3 is a flowchart showing an example of the meta-learning processing according to the present embodiment. FIG. 4 is a flowchart showing an example of the label prediction processing according to the present embodiment.
  • The following embodiment describes a meta-learning device 10 that meta-learns the model parameters of a target task when given a set of multiple training datasets with different feature spaces, including unlabeled training data. A case is also described in which the meta-learning device 10 uses the model parameters learned through this meta-learning to predict the labels of unlabeled data in the target task.
  • The meta-learning device 10 operates in two phases: "meta-learning time," in which the model parameters of the target task are meta-learned, and "label prediction time," in which the labels of unlabeled data in the target task are predicted using the model parameters learned at meta-learning time.
  • “during meta-learning” may be simply referred to as “during learning”, for example.
  • “at the time of label prediction” may be called, for example, “at the time of inference” or “at the time of testing”.
  • At meta-learning time, the meta-learning device 10 is given a set of T training datasets D = {D^(t) | t = 1, ..., T}, where the t-th training dataset is D^(t) = {(x_tn, y_tn) | n = 1, ..., N_t}.
  • Here, x_tn is the feature included in the n-th training example of the t-th training dataset D^(t), and y_tn is its label; the dataset D^(t) may contain features to which no label is assigned.
  • N_t is the number of training examples included in the t-th training dataset, and T is the number of training datasets.
  • The training data (that is, for example, data represented by a pair of a feature and its label, data represented by a feature without a label, and so on) may also be called a "case."
  • In the following, for simplicity, t represents a task, and the t-th training dataset D^(t) is assumed to be a dataset specific to task t. However, this is only an example; for instance, the training dataset D^(t) and the training dataset D^(t') may be task-specific datasets for the same task.
  • the meta-learning device 10 at the time of label prediction is given a labeled feature set XL of the target task, its label set YL , and an unlabeled feature set XU .
  • the target task is assumed to be a task different from any of the tasks targeted by the learning data set given during meta-learning.
  • the purpose is to predict the label of each feature included in the unlabeled feature set XU (that is, predict the label of unlabeled data).
  • the feature amounts are assumed to be in vector format, but if the feature amounts are not in vector format (for example, when the feature amounts are expressed as images, graphs, etc.), the feature amounts may be converted into vectors.
  • the task is classification or regression, but the following embodiment can be similarly applied to other machine learning problems such as density estimation and clustering, for example.
  • the following embodiments can be applied in the same way even when some information related to the case, such as an explanatory text of the feature amounts, is provided.
  • In the following embodiment, the same meta-learning device 10 implements both meta-learning and label prediction; however, meta-learning and label prediction may be implemented by different devices.
  • the device that realizes label prediction may be called a "label prediction device,” a “prediction device,” an “inference device,” or the like, for example.
  • FIG. 1 shows an example of the hardware configuration of a meta-learning device 10 according to this embodiment.
  • The meta-learning device 10 according to the present embodiment is realized by the hardware configuration of a general computer or computer system and includes, for example, an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a RAM (Random Access Memory) 105, a ROM (Read Only Memory) 106, an auxiliary storage device 107, and a processor 108. These pieces of hardware are communicably connected to each other via a bus 109.
  • the input device 101 is, for example, a keyboard, a mouse, a touch panel, a physical button, or the like.
  • the display device 102 is, for example, a display, a display panel, or the like. Note that the meta-learning device 10 may not include at least one of the input device 101 and the display device 102, for example.
  • the external I/F 103 is an interface with an external device such as the recording medium 103a.
  • The meta-learning device 10 can read from and write to the recording medium 103a via the external I/F 103.
  • Examples of the recording medium 103a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
  • the communication I/F 104 is an interface for connecting the meta-learning device 10 to a communication network or the like.
  • the RAM 105 is a volatile semiconductor memory (storage device) that temporarily holds programs and data.
  • the ROM 106 is a nonvolatile semiconductor memory (storage device) that can retain programs and data even when the power is turned off.
  • the auxiliary storage device 107 is, for example, a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory.
  • the processor 108 is an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
  • By having the hardware configuration shown in FIG. 1, the meta-learning device 10 according to the present embodiment can realize the various processes described below.
  • the hardware configuration shown in FIG. 1 is an example, and the hardware configuration of the meta-learning device 10 is not limited to this.
  • For example, the meta-learning device 10 may include multiple auxiliary storage devices 107 and multiple processors 108, may not include some of the illustrated hardware, or may include various hardware other than that illustrated.
  • FIG. 2 shows an example of the functional configuration of the meta-learning device 10 according to this embodiment.
  • the meta-learning device 10 includes an input section 201, a latent vector generation section 202, a prediction section 203, a meta-learning section 204, and an output section 205. Each of these units is realized, for example, by one or more programs installed in the meta-learning device 10 causing the processor 108 or the like to execute the process.
  • the meta-learning device 10 according to this embodiment includes a storage unit 206.
  • the storage unit 206 is realized by, for example, the auxiliary storage device 107 or the like.
  • the storage unit 206 may be realized by, for example, a storage device such as a database server connected to the meta-learning device 10 via a communication network or the like.
  • At meta-learning time, the input unit 201 inputs the given set D of training datasets. In addition, the input unit 201 randomly selects the t-th training dataset D^(t) from the set D and then randomly samples, from this dataset D^(t), a labeled feature set X_L and its label set Y_L, and an unlabeled feature set X_U and its label set Y_U. The label set Y_U is used as the ground truth (supervision) for the prediction results when the labels of the features included in the unlabeled feature set X_U are predicted.
  • At label prediction time, the input unit 201 inputs the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U.
  • The latent vector generation unit 202 takes the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input and, considering all of this input information, generates a latent vector representing the properties of each case using operations that remain applicable even when the feature space or the label space changes.
  • The latent vector generation unit 202 includes a variable-feature self-attention mechanism unit 210 that is used as one of its modules. The variable-feature self-attention mechanism unit 210 is explained first, followed by the method by which the latent vector generation unit 202 generates latent vectors.
  • The variable-feature self-attention mechanism unit 210 is realized by a neural network; it receives a tensor as input and transforms it. Let the input tensor be Z ∈ ℝ^{D1×D2×D3}. The number of dimensions D1 of the first axis and the number of dimensions D2 of the second axis may vary with the input tensor Z (in other words, the sizes of the first and second axes depend on Z), while the number of dimensions D3 of the third axis is fixed. That is, the transformation is equivariant with respect to the first and second axes.
  • First, the variable-feature self-attention mechanism unit 210 converts the input tensor Z into three tensors: a query Q, a key K, and a value V. This can be done, for example, by mode products of Z with W_Q, W_K, and W_V as in Equation (1), where W_Q, W_K, and W_V are learning target parameters (that is, parameters of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210) and ×_d denotes the d-th mode product. Note that although Equation (1) converts the tensor Z into the query Q, key K, and value V by linear transformations, Z may instead be converted into Q, K, and V by transformations other than linear ones.
  • Next, the variable-feature self-attention mechanism unit 210 computes a weight matrix A representing the similarity between slices along the first axis, taking higher values the closer the query Q and the key K are. Here, Q^(1) denotes a slice of Q along the first axis and K^(1) denotes a slice of K along the first axis. The variable-feature self-attention mechanism unit 210 then calculates the output O by aggregating the first axis of the value V using the weight matrix A.
  • The variable-feature self-attention mechanism unit 210 may also take, as its final output, a joint transformation of a plurality of outputs O. For example, when R outputs O are combined, they are transformed using W_O, where W_O is a learning target parameter (that is, a parameter of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210).
  • The latent vector generation unit 202 first creates, from the input X_L, Y_L, and X_U, a tensor in which features can be distinguished from labels and labeled features can be distinguished from unlabeled features. For example, a tensor whose slices along the first axis are as in Equation (5) can be created.
  • Here, the first slice contains the feature and label values; the second slice contains information on whether each feature or label is observed (1 if the feature or label is observed, 0 otherwise); the third slice contains information on whether each entry is a feature (1 if it is a feature, 0 otherwise); and the fourth slice contains information on whether each entry is a label (1 if it is a label, 0 otherwise). In the first slice, the cases are arranged vertically (that is, each row corresponds to one case), and horizontally the features are arranged first, followed by the labels.
  • In addition to the features and labels, if some information about the case, such as an explanatory text of the features, is provided, that information can be concatenated with the tensor input to the latent vector generation unit 202. If such information is not in vector form, it may first be converted into a vector using, for example, a neural network. This makes it possible to generate latent vectors that also take such case information into account.
  • Next, the latent vector generation unit 202 transforms the tensor using operations that remain applicable even when the feature space or the label space changes. For example, the tensor is transformed by using the variable-feature self-attention mechanism unit 210 to alternately repeat self-attention over cases and self-attention over features and labels. Specifically, as the (2b-1)-th transformation (where b is a natural number), self-attention over the cases is performed using a function f^(b)(Z^(b)) that includes the variable-feature self-attention mechanism unit 210; f^(b) can be realized, for example, with a feed-forward neural network FF^(b), layer normalization LN^(b), the variable-feature self-attention mechanism unit MVSA^(b), and a learning target parameter W_R(b). As the 2b-th transformation, self-attention over the features and labels is performed in the same manner.
  • Finally, the latent vector generation unit 202 converts the slice corresponding to each case of the finally obtained B-th tensor (where B is a predetermined natural number) into a vector and uses it as that case's latent vector. That is, the latent vector generation unit 202 converts the matrix representing the slice for each case of Z^(B) into a vector (for example, if the matrix has a rows and b columns, it is converted into an ab-dimensional vector) and uses the result as the latent vector.
  • The prediction unit 203 uses the latent vector of each case and the labels assigned to the labeled features to predict the labels of the unlabeled features. For example, if the labels are discrete, the probability that the label is y can be calculated with a Gaussian mixture model, where z_n^U (n = 1, ..., N_U) is the latent vector of the case containing the n-th unlabeled feature, N_U is the number of unlabeled features, μ_c is the mean latent vector of label c (that is, the mean of the latent vectors of the cases with label c), and C is the number of labels appearing in the labeled features.
  • Θ denotes the collection of learning target parameters (that is, the collection of parameters such as W_Q, W_K, W_V, W_O, W_R(b), and the parameters of FF^(b)). Hereinafter, Θ is referred to as the model parameters.
  • Although a Gaussian mixture model is used above, any other classification or regression model, such as linear regression or a Gaussian process, may be used instead. When such a classification or regression model is used, the model parameters Θ may also include the parameters of that model.
  • The meta-learning unit 204 learns the model parameters Θ so that the performance on the target task becomes high; that is, it learns Θ so that the expected test performance on the target task becomes high. This can be done, for example, by the optimization in Equation (10), where y_n^U represents the label of the n-th unlabeled feature. This optimization can be realized by computing the expected value with the Monte Carlo method.
  • the output unit 205 outputs the learned model parameters ⁇ to a predetermined output destination during meta-learning. Furthermore, during label prediction, the output unit 205 outputs the predicted label to a predetermined output destination.
  • the output unit 205 outputs the learned model parameters ⁇ and predicted labels to the storage unit 206.
  • Alternatively, the learned model parameters Θ and the predicted labels may be output to the display device 102, such as a display, or to a terminal device or the like connected via a communication network.
  • At meta-learning time, the storage unit 206 stores the given set D of training datasets. At label prediction time, it stores the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U. In addition, the storage unit 206 stores the model parameters Θ, hyperparameters (for example, the number of labeled features N_L and the number of unlabeled features N_U for each task), temporary information, and so on.
  • the input unit 201 inputs the set D of learning data sets, the number of labeled features NL and the number of unlabeled features NU for each task from the storage unit 206 (step S101). Note that N L and N U may be different for each task.
  • the meta-learning unit 204 initializes the model parameter ⁇ using any known method (step S102).
  • the input unit 201 randomly selects one learning data set D (t) from ⁇ D (1) , . . . , D (T) ⁇ (step S103).
  • Next, the input unit 201 randomly samples N_L features and their labels from the training dataset D^(t) selected in step S103 to obtain a labeled feature set X_L and its label set Y_L (step S104). If a feature sampled from the training dataset D^(t) has no label assigned to it, a specific value indicating that no label is assigned (for example, 0) is treated as its label and becomes an element of the label set Y_L.
  • Next, the input unit 201 randomly samples N_U features and their labels from the training dataset D^(t) selected in step S103, so as not to overlap with the labeled feature set X_L and its label set Y_L obtained in step S104, and obtains an unlabeled feature set X_U and its label set Y_U (step S105).
  • Although the training dataset D^(t) may also contain features to which no labels are assigned, such features are not sampled in this step.
  • Next, the latent vector generation unit 202 generates a latent vector for each case by taking the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input (step S106).
  • If features with no assigned label were sampled in step S104, then in the second slice of Equation (5) above, all the elements indicating whether a label is observed are 0 (that is, the value representing "unobserved") in the rows corresponding to those features.
  • Next, the prediction unit 203 uses the latent vector of each case generated in step S106 and the labels included in the label set Y_L to predict the labels of the features included in the unlabeled feature set X_U (step S107).
  • Next, the meta-learning unit 204 calculates the test performance by comparing the predicted labels with the label set Y_U of the unlabeled feature set X_U (step S108).
  • the meta-learning unit 204 learns (updates) the model parameter ⁇ so that the performance of the target task becomes high (step S109). For example, when evaluating the performance of a target task using expected test performance (expected value of test performance), the model parameter ⁇ is learned using the above equation (10).
  • the meta-learning unit 204 determines whether a predetermined termination condition is satisfied (step S110). If the termination condition is satisfied, the process advances to step S111; otherwise, the process advances to step S103. As a result, steps S103 to S109 are repeatedly executed until the termination condition is satisfied.
  • the termination conditions include, for example, that the number of repetitions of steps S103 to S109 exceeds a predetermined threshold, that the model parameter ⁇ has converged, and so on.
  • If it is determined in step S110 that the predetermined termination condition is satisfied, the output unit 205 outputs the learned model parameters Θ to a predetermined output destination (step S111).
  • the input unit 201 inputs the labeled feature set XL , its label set YL , and the unlabeled feature set XU from the storage unit 206 (step S201).
  • Next, the latent vector generation unit 202 generates a latent vector for each case by taking the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input (step S202).
  • Next, the prediction unit 203 uses the latent vector of each case generated in step S202 and the labels included in the label set Y_L to predict the labels of the features included in the unlabeled feature set X_U (step S203). Specifically, for example, the labels of the unlabeled features may be sampled according to Equation (9) above.
  • the output unit 205 outputs the label predicted in step S203 above to a predetermined output destination (step S204).
  • The test accuracy rate was used as the evaluation metric.
  • the evaluation results (average and standard error) are shown in Table 1 below.
  • In Table 1, the proposed method denotes the meta-learning device 10 according to the present embodiment, and "Shot" denotes the number of labeled features for each task.
  • As shown in Table 1 above, it can be seen that the meta-learning device 10 according to the present embodiment achieves a higher test accuracy rate than the existing methods.
  • As described above, when given a set of multiple training datasets with different feature spaces, including training data to which no labels are assigned, the meta-learning device 10 according to the present embodiment can learn model parameters while also utilizing the training data that has no labels. Therefore, according to the meta-learning device 10 of the present embodiment, high performance can be achieved on the target task even when only a small amount of training data for the target task is given.
  • 10 Meta-learning device, 101 Input device, 102 Display device, 103 External I/F, 103a Recording medium, 104 Communication I/F, 105 RAM, 106 ROM, 107 Auxiliary storage device, 108 Processor, 109 Bus, 201 Input unit, 202 Latent vector generation unit, 203 Prediction unit, 204 Meta-learning unit, 205 Output unit, 206 Storage unit, 210 Variable-feature self-attention mechanism unit

Abstract

In a meta-learning method according to an aspect of the present disclosure, a computer executes: an input procedure of inputting a plurality of training datasets made up of training data in which at least a feature of a case example is included, in which the training datasets can include training data not including a label with respect to the feature, and also can have different feature spaces for the feature; a first selecting procedure of selecting one training dataset from the plurality of training datasets; a second selecting procedure of selecting, from the one training dataset, a first feature to take as labelled data and a first label with respect to the first feature, a second feature to take as unlabeled data, and a second label with respect to the second feature; a generating procedure of generating a latent vector for each case example that the first feature or the second feature represents, using a training target parameter, the labeled data, and the unlabeled data; a predicting procedure of predicting a label for the unlabeled data, using the latent vector; and a learning procedure of learning the training target parameter, using prediction results of the label for the unlabeled data, and the second label.

Description

Meta-learning method, meta-learning device, and program
The present disclosure relates to a meta-learning method, a meta-learning device, and a program.
Machine learning methods typically use task-specific training data to learn model parameters. In order to achieve high performance on a target task (hereinafter also simply called the target task), a large amount of training data specific to that task is required, but for some tasks, preparing a sufficient amount of training data is costly.
To solve this problem, a meta-learning method has been proposed that utilizes training data from different tasks to achieve high performance on the target task even with a small amount of training data (see, for example, Non-Patent Document 1). However, the meta-learning method proposed in Non-Patent Document 1 cannot utilize training data with different feature spaces. Meta-learning methods that can utilize training data with different feature spaces have also been proposed (see, for example, Non-Patent Document 2), but the method proposed in Non-Patent Document 2 cannot utilize training data to which no labels have been assigned.
The present disclosure has been made in view of the above points, and aims to provide a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
In a meta-learning method according to one aspect of the present disclosure, a computer executes: an input procedure of inputting a plurality of training datasets, each composed of training data that includes at least a feature of a case, where a dataset may include training data without a label for the feature and the feature spaces of the features may differ; a first selection procedure of selecting one training dataset from the plurality of training datasets; a second selection procedure of selecting, from the selected training dataset, a first feature to serve as labeled data, a first label for the first feature, a second feature to serve as unlabeled data, and a second label for the second feature; a generation procedure of generating a latent vector for each case represented by the first or second feature, using a learning target parameter, the labeled data, and the unlabeled data; a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and a learning procedure of learning the learning target parameter using the prediction result for the unlabeled data and the second label.
This provides a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
FIG. 1 is a diagram showing an example of the hardware configuration of the meta-learning device according to the present embodiment. FIG. 2 is a diagram showing an example of the functional configuration of the meta-learning device according to the present embodiment. FIG. 3 is a flowchart showing an example of the meta-learning processing according to the present embodiment. FIG. 4 is a flowchart showing an example of the label prediction processing according to the present embodiment.
An embodiment of the present invention is described below. The following embodiment describes a meta-learning device 10 that meta-learns the model parameters of a target task when given a set of multiple training datasets with different feature spaces, including unlabeled training data. A case is also described in which the meta-learning device 10 uses the model parameters learned through this meta-learning to predict the labels of unlabeled data in the target task.
The meta-learning device 10 operates in two phases: "meta-learning time," in which the model parameters of the target task are meta-learned, and "label prediction time," in which the labels of unlabeled data in the target task are predicted using the model parameters learned at meta-learning time. "Meta-learning time" may also simply be called "learning time," and "label prediction time" may also be called "inference time," "test time," and so on.
At meta-learning time, the meta-learning device 10 is given a set of T training datasets D = {D^(t) | t = 1, ..., T}, where the t-th training dataset is D^(t) = {(x_tn, y_tn) | n = 1, ..., N_t}. Here, x_tn is the feature included in the n-th training example of the t-th training dataset D^(t), and y_tn is its label. The training dataset D^(t) (t = 1, ..., T) may contain features to which no label is assigned. N_t is the number of training examples included in the t-th training dataset, and T is the number of training datasets. A training example (that is, for example, data represented by a pair of a feature and its label, data represented by a feature without a label, and so on) may also be called a "case."
In the following, for simplicity, t represents a task, and the t-th training dataset D^(t) is assumed to be a dataset specific to task t. However, this is only an example; for instance, the training dataset D^(t) and the training dataset D^(t') may be task-specific datasets for the same task.
At label prediction time, the meta-learning device 10 is given a labeled feature set X_L of the target task, its label set Y_L, and an unlabeled feature set X_U. The target task is assumed to differ from all of the tasks covered by the training datasets given at meta-learning time. The purpose at label prediction time is to predict the label of each feature included in the unlabeled feature set X_U (that is, to predict the labels of the unlabeled data).
In the present embodiment, the features are assumed to be in vector form; if they are not (for example, when features are expressed as images, graphs, and so on), the following embodiment can be applied in the same way after converting the features into vectors. Also, although the task is assumed to be classification or regression in the present embodiment, the following embodiment can be applied in the same way to other machine learning problems such as density estimation and clustering. Furthermore, the following embodiment can be applied in the same way even when, in addition to the features and labels, some information about the case, such as an explanatory text of the features, is provided.
In the following embodiment, the same meta-learning device 10 implements both meta-learning and label prediction; however, meta-learning and label prediction may be implemented by different devices. In that case, the device that realizes label prediction may be called, for example, a "label prediction device," a "prediction device," an "inference device," and so on.
<Example of hardware configuration of meta-learning device 10>
FIG. 1 shows an example of the hardware configuration of the meta-learning device 10 according to the present embodiment. As shown in FIG. 1, the meta-learning device 10 according to the present embodiment is realized by the hardware configuration of a general computer or computer system and includes, for example, an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a RAM (Random Access Memory) 105, a ROM (Read Only Memory) 106, an auxiliary storage device 107, and a processor 108. These pieces of hardware are communicably connected to each other via a bus 109.
The input device 101 is, for example, a keyboard, a mouse, a touch panel, physical buttons, and the like. The display device 102 is, for example, a display, a display panel, and the like. The meta-learning device 10 does not necessarily have to include at least one of the input device 101 and the display device 102.
The external I/F 103 is an interface with an external device such as a recording medium 103a. The meta-learning device 10 can read from and write to the recording medium 103a via the external I/F 103. Examples of the recording medium 103a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
The communication I/F 104 is an interface for connecting the meta-learning device 10 to a communication network or the like. The RAM 105 is a volatile semiconductor memory (storage device) that temporarily holds programs and data. The ROM 106 is a nonvolatile semiconductor memory (storage device) that can retain programs and data even when the power is turned off. The auxiliary storage device 107 is, for example, a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory. The processor 108 is an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
By having the hardware configuration shown in FIG. 1, the meta-learning device 10 according to the present embodiment can realize the various processes described below. Note that the hardware configuration shown in FIG. 1 is an example, and the hardware configuration of the meta-learning device 10 is not limited to it. For example, the meta-learning device 10 may include multiple auxiliary storage devices 107 and multiple processors 108, may not include some of the illustrated hardware, or may include various hardware other than that illustrated.
<Example of functional configuration of meta-learning device 10>
FIG. 2 shows an example of the functional configuration of the meta-learning device 10 according to the present embodiment. As shown in FIG. 2, the meta-learning device 10 according to the present embodiment includes an input unit 201, a latent vector generation unit 202, a prediction unit 203, a meta-learning unit 204, and an output unit 205. Each of these units is realized, for example, by processing that one or more programs installed in the meta-learning device 10 cause the processor 108 or the like to execute. The meta-learning device 10 according to the present embodiment also includes a storage unit 206, which is realized by, for example, the auxiliary storage device 107 or the like. The storage unit 206 may instead be realized by a storage device such as a database server connected to the meta-learning device 10 via a communication network or the like.
At meta-learning time, the input unit 201 inputs the given set D of training datasets. In addition, the input unit 201 randomly selects the t-th training dataset D^(t) from the set D and then randomly samples, from this dataset D^(t), a labeled feature set X_L and its label set Y_L, and an unlabeled feature set X_U and its label set Y_U. The label set Y_U is used as the ground truth (supervision) for the prediction results when the labels of the features included in the unlabeled feature set X_U are predicted.
At label prediction time, the input unit 201 inputs the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U.
The latent vector generation unit 202 takes the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input and, considering all of this input information, generates a latent vector representing the properties of each case using operations that remain applicable even when the feature space or the label space changes. The latent vector generation unit 202 includes a variable-feature self-attention mechanism unit 210 that is used as one of its modules. The variable-feature self-attention mechanism unit 210 is explained first, followed by the method by which the latent vector generation unit 202 generates latent vectors.
The variable-feature self-attention mechanism unit 210 is realized by a neural network; it receives a tensor as input and transforms it. Let the input tensor be Z ∈ ℝ^{D1×D2×D3} (Math. 1). The number of dimensions D1 of the first axis and the number of dimensions D2 of the second axis may vary with the input tensor Z (in other words, the sizes of the first and second axes depend on Z), while the number of dimensions D3 of the third axis is fixed. That is, the transformation is equivariant with respect to the first and second axes.
First, the variable-feature self-attention mechanism unit 210 converts the input tensor Z into three tensors: a query Q, a key K, and a value V (Math. 2). This can be done, for example, by the mode products with W_Q, W_K, and W_V shown in Equation (1) (Math. 3), where W_Q, W_K, and W_V are learning target parameters (that is, parameters of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210) and ×_d denotes the d-th mode product. Note that although Equation (1) converts the tensor Z into the query Q, key K, and value V by linear transformations, Z may instead be converted into Q, K, and V by transformations other than linear ones.
Next, the variable-feature self-attention mechanism unit 210 computes a weight matrix A (Math. 4) representing the similarity between slices along the first axis, taking higher values the closer the query Q and the key K are. This can be computed, for example, as in Math. 5, where Q^(1) denotes a slice of Q along the first axis, K^(1) denotes a slice of K along the first axis, and the symbol in Math. 6 denotes transposition.
Next, the variable-feature self-attention mechanism unit 210 calculates the output O by aggregating the first axis of the value V using the weight matrix A. This can be computed, for example, as in Math. 7.
The variable-feature self-attention mechanism unit 210 may also take, as its final output, a joint transformation of a plurality of outputs O. For example, when R outputs O are combined, the result can be computed as in Math. 8, where W_O is a learning target parameter (that is, a parameter of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210).
Although the above describes the use of an attention mechanism over the tensor itself (self-attention), an attention mechanism over another tensor can also be used, for example.
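The computation above can be illustrated with a small sketch. The following Python (NumPy) code is only a minimal illustration of the variable-feature self-attention described here, not the implementation of the embodiment: the use of a mode-3 product to form the query, key, and value, the flattening of each first-axis slice before the similarity computation, and the softmax scaling are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def variable_feature_self_attention(Z, W_q, W_k, W_v):
    """Minimal sketch of the variable-feature self-attention unit 210.

    Z:              input tensor of shape (D1, D2, D3); D1 and D2 may vary per
                    input, D3 is fixed.
    W_q, W_k, W_v:  learned matrices of shape (D3, Dk) applied by a mode-3
                    product (an assumption; any transformation producing
                    Q, K, V would do).
    """
    # Mode-3 product: contract the fixed third axis with the weight matrices.
    Q = np.einsum('abc,cd->abd', Z, W_q)   # (D1, D2, Dk)
    K = np.einsum('abc,cd->abd', Z, W_k)   # (D1, D2, Dk)
    V = np.einsum('abc,cd->abd', Z, W_v)   # (D1, D2, Dk)

    D1 = Z.shape[0]
    # Similarity between slices along the first axis: flatten each slice of
    # Q and K and take inner products (the scaling is an assumption).
    Qf = Q.reshape(D1, -1)
    Kf = K.reshape(D1, -1)
    A = softmax(Qf @ Kf.T / np.sqrt(Qf.shape[1]))        # (D1, D1)

    # Aggregate the first axis of V with the weights A.
    return np.einsum('ij,jbd->ibd', A, V)                # (D1, D2, Dk)

# Toy usage: 5 cases x 7 feature/label columns x 4 channels.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 7, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 3)) for _ in range(3))
print(variable_feature_self_attention(Z, W_q, W_k, W_v).shape)  # (5, 7, 3)
```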
Next, the method by which the latent vector generation unit 202 generates latent vectors is described. First, the latent vector generation unit 202 creates, from the input X_L, Y_L, and X_U, a tensor in which features can be distinguished from labels and labeled features can be distinguished from unlabeled features. For example, a tensor whose slices along the first axis are as shown in Equation (5) (Math. 9) can be created. Here, the first slice contains the feature and label values; the second slice contains information on whether each feature or label is observed (1 if the feature or label is observed, 0 otherwise); the third slice contains information on whether each entry is a feature (1 if it is a feature, 0 otherwise); and the fourth slice contains information on whether each entry is a label (1 if it is a label, 0 otherwise). In the first slice, the cases are arranged vertically (that is, each row corresponds to one case), and horizontally the features are arranged first, followed by the labels.
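As a concrete illustration of this construction, the sketch below builds the four slices from a labeled feature set X_L, its label set Y_L, and an unlabeled feature set X_U. It is only a minimal sketch: the one-hot label encoding, the use of -1 as the marker for a missing label, and the placement of the four information slices along the leading tensor axis are assumptions, and the exact encoding used in the embodiment may differ.

```python
import numpy as np

def build_input_tensor(X_L, Y_L, X_U, num_classes):
    """Build a tensor whose slices distinguish features from labels and
    labeled from unlabeled cases (cf. Equation (5)); a sketch only.

    X_L: (N_L, F) labeled features; Y_L: (N_L,) integer labels, -1 = missing;
    X_U: (N_U, F) unlabeled features.
    """
    N_L, F = X_L.shape
    N_U = X_U.shape[0]
    N, W = N_L + N_U, F + num_classes

    values = np.zeros((N, W))            # slice 1: feature and label values
    observed = np.zeros((N, W))          # slice 2: 1 where observed
    is_feature = np.zeros((N, W))        # slice 3: 1 on feature columns
    is_label = np.zeros((N, W))          # slice 4: 1 on label columns

    is_feature[:, :F] = 1.0
    is_label[:, F:] = 1.0

    values[:, :F] = np.vstack([X_L, X_U])
    observed[:, :F] = 1.0                # features are observed for all cases

    for n, y in enumerate(Y_L):
        if y >= 0:                       # -1 marks "no label assigned"
            values[n, F + y] = 1.0       # one-hot label (an assumption)
            observed[n, F:] = 1.0        # label columns observed for this case
    # Unlabeled cases, and labeled cases without a label, keep 0 in slice 2.

    return np.stack([values, observed, is_feature, is_label])  # (4, N, W)

# Toy usage: 3 labeled cases (one without a label) and 2 unlabeled cases.
X_L = np.arange(9, dtype=float).reshape(3, 3)
Y_L = np.array([0, 1, -1])
X_U = np.ones((2, 3))
print(build_input_tensor(X_L, Y_L, X_U, num_classes=2).shape)  # (4, 5, 5)
```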
In addition to the features and labels, if some information about the case, such as an explanatory text of the features, is provided, that information can be concatenated with the tensor input to the latent vector generation unit 202. If such information is not in vector form, it may first be converted into a vector using, for example, a neural network. This makes it possible to generate latent vectors that also take such case information into account.
Next, the latent vector generation unit 202 transforms the tensor using operations that remain applicable even when the feature space or the label space changes. For example, the tensor is transformed by using the variable-feature self-attention mechanism unit 210 to alternately repeat self-attention over cases and self-attention over features and labels. Specifically, as the (2b-1)-th transformation (where b is a natural number), self-attention over the cases is performed as in Math. 10. Here, f^(b)(Z^(b)) is a function that includes the variable-feature self-attention mechanism unit 210 and can be realized, for example, as in Math. 11, where FF^(b) denotes a feed-forward neural network, LN^(b) denotes layer normalization, MVSA^(b) denotes the variable-feature self-attention mechanism unit 210, and W_R(b) is a learning target parameter.
Then, as the 2b-th transformation, self-attention over the features and labels is performed as in Math. 12.
Finally, the latent vector generation unit 202 converts the slice corresponding to each case of the finally obtained B-th tensor (where B is a predetermined natural number) into a vector and uses it as that case's latent vector. That is, the latent vector generation unit 202 converts the matrix representing the slice for each case of Z^(B) into a vector (for example, if the matrix has a rows and b columns, it is converted into an ab-dimensional vector) and uses the result as the latent vector.
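The alternation between self-attention over cases and self-attention over features and labels could be sketched as follows, reusing the variable_feature_self_attention function from the earlier sketch. Swapping the first two tensor axes between the odd and even steps is an assumption about how the alternation is realized, and the feed-forward network, layer normalization, and residual terms of f^(b) are omitted for brevity.

```python
import numpy as np
# requires variable_feature_self_attention from the earlier sketch

def generate_latent_vectors(Z, layers):
    """Sketch of the latent vector generation unit 202.

    Z:      tensor of shape (N_cases, N_columns, channels).
    layers: list of (W_q, W_k, W_v) triples; odd steps attend over cases,
            even steps attend over feature/label columns (an assumption).
    Returns one latent vector per case.
    """
    for b, (W_q, W_k, W_v) in enumerate(layers):
        if b % 2 == 0:
            # Self-attention over cases: the first axis is the case axis.
            Z = variable_feature_self_attention(Z, W_q, W_k, W_v)
        else:
            # Self-attention over features and labels: make the column axis
            # the first (attended) axis, then swap back.
            Zt = np.swapaxes(Z, 0, 1)
            Zt = variable_feature_self_attention(Zt, W_q, W_k, W_v)
            Z = np.swapaxes(Zt, 0, 1)
    # Each case's slice of the final tensor is flattened into its latent vector.
    return Z.reshape(Z.shape[0], -1)

# Toy usage (channels kept at 4 so successive layers compose).
rng = np.random.default_rng(1)
Z0 = rng.normal(size=(5, 7, 4))
layers = [tuple(rng.normal(size=(4, 4)) for _ in range(3)) for _ in range(4)]
print(generate_latent_vectors(Z0, layers).shape)  # (5, 28)
```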
The prediction unit 203 uses the latent vector of each case and the labels assigned to the labeled features to predict the labels of the unlabeled features. For example, if the labels are discrete, the probability that the label is y can be calculated with a Gaussian mixture model as in Equation (9) (Math. 13). Here, z_n^U (n = 1, ..., N_U) is the latent vector of the case containing the n-th unlabeled feature, N_U is the number of unlabeled features, μ_c is the mean latent vector of label c (that is, the mean of the latent vectors of the cases with label c), and C is the number of labels appearing in the labeled features. Θ denotes the collection of learning target parameters (that is, the collection of parameters such as W_Q, W_K, W_V, W_O, W_R(b), and the parameters of FF^(b)); hereinafter, Θ is referred to as the model parameters.
Although a Gaussian mixture model is used above, any other classification or regression model, such as linear regression or a Gaussian process, may be used instead. When such a classification or regression model is used, the model parameters Θ may also include the parameters of that model.
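A minimal sketch of this prediction step for discrete labels is shown below. The equal-weight, equal-variance squared-Euclidean form of the mixture is an assumption; the description only specifies that a Gaussian mixture over the latent vectors, with per-label mean vectors μ_c, is one possible choice.

```python
import numpy as np

def predict_label_probs(z_unlabeled, z_labeled, y_labeled, num_classes):
    """Sketch of the prediction unit 203 for discrete labels.

    z_unlabeled: (N_U, d) latent vectors of unlabeled cases.
    z_labeled:   (N_L, d) latent vectors of labeled cases.
    y_labeled:   (N_L,) integer labels in [0, num_classes); assumes every
                 class appears at least once among the labeled cases.
    Returns an (N_U, num_classes) matrix of class probabilities.
    """
    # mu_c: mean latent vector of the labeled cases with label c.
    mu = np.stack([z_labeled[y_labeled == c].mean(axis=0)
                   for c in range(num_classes)])            # (C, d)
    # Negative squared distance to each class mean, normalized with softmax
    # (equal-weight, equal-variance Gaussian mixture; an assumption).
    d2 = ((z_unlabeled[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    logits = -0.5 * d2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy usage.
rng = np.random.default_rng(2)
zL = rng.normal(size=(6, 4)); yL = np.array([0, 0, 1, 1, 2, 2])
zU = rng.normal(size=(3, 4))
print(predict_label_probs(zU, zL, yL, num_classes=3).sum(axis=1))  # ~[1. 1. 1.]
```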
The meta-learning unit 204 learns the model parameters Θ so that the performance on the target task becomes high; that is, it learns Θ so that the expected test performance on the target task becomes high. This can be done, for example, by the optimization in Equation (10) (Math. 14), where y_n^U represents the label of the n-th unlabeled feature. This optimization can be realized by computing the expected value with the Monte Carlo method.
At meta-learning time, the output unit 205 outputs the learned model parameters Θ to a predetermined output destination. At label prediction time, the output unit 205 outputs the predicted labels to a predetermined output destination.
For example, the output unit 205 outputs the learned model parameters Θ and the predicted labels to the storage unit 206. Alternatively, the learned model parameters Θ and the predicted labels may be output to the display device 102, such as a display, or to a terminal device or the like connected via a communication network.
At meta-learning time, the storage unit 206 stores the given set D of training datasets. At label prediction time, it stores the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U. In addition, the storage unit 206 stores the model parameters Θ, hyperparameters (for example, the number of labeled features N_L and the number of unlabeled features N_U for each task), temporary information, and so on.
<Meta-learning processing>
An example of the meta-learning processing according to the present embodiment is described below with reference to FIG. 3.
First, the input unit 201 inputs the set D of training datasets and, for each task, the number of labeled features N_L and the number of unlabeled features N_U from the storage unit 206 (step S101). N_L and N_U may differ for each task.
Next, the meta-learning unit 204 initializes the model parameters Θ by any known method (step S102).
Next, the input unit 201 randomly selects one training dataset D^(t) from {D^(1), ..., D^(T)} (step S103).
Next, the input unit 201 randomly samples N_L features and their labels from the learning data set D^(t) selected in step S103, obtaining a labeled feature set X^L and its label set Y^L (step S104). If a sampled feature has no label, a specific value indicating the absence of a label (for example, 0) is treated as its label and included as an element of the label set Y^L.
Next, the input unit 201 randomly samples N_U features and their labels from the learning data set D^(t) selected in step S103, without overlapping the labeled feature set X^L and label set Y^L obtained in step S104, thereby obtaining an unlabeled feature set X^U and its label set Y^U (step S105). Although the learning data set D^(t) may also contain features to which no label has been assigned, such features are not sampled in this step.
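A minimal sketch of the episode sampling in steps S103 to S105 follows; the data-set layout, the function name, and the use of 0 as the missing-label placeholder on the support side are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

def sample_episode(datasets, n_labeled, n_unlabeled, rng):
    """Illustrative sketch of steps S103-S105."""
    d = datasets[rng.integers(len(datasets))]        # step S103: pick one learning data set at random
    X, y, observed = d["X"], d["y"], d["observed"]   # observed[i] is False if case i has no label

    idx = rng.permutation(len(X))
    support = idx[:n_labeled]                        # step S104: "labeled" set X^L, Y^L
    # step S105: query set must not overlap X^L and must have ground-truth labels
    rest = [i for i in idx[n_labeled:] if observed[i]]
    query = np.array(rest[:n_unlabeled], dtype=int)

    y_support = np.where(observed[support], y[support], 0)   # 0 marks "no label given"
    return X[support], y_support, observed[support], X[query], y[query]
```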
Next, the latent vector generation unit 202 takes the labeled feature set X^L, its label set Y^L, and the unlabeled feature set X^U as input and generates a latent vector for each case (step S106). If a feature with no label was sampled in step S104, then in the second slice of equation (5), for example, all the elements of the row corresponding to that case that indicate whether a label is observed are set to 0 (that is, the value representing "unobserved").
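The tensor of equation (5) is described in the claims as four matrices stacked along a first axis (values, observed flags, feature flags, label flags). A hedged sketch of such a construction, assuming a single label column rather than a one-hot block (the original layout may differ), could look like this:

```python
import numpy as np

def build_episode_tensor(X_support, y_support, y_observed, X_query):
    """Sketch of the stacked tensor fed to the attention blocks."""
    X = np.vstack([X_support, X_query]).astype(float)
    n_sup, d = len(X_support), X.shape[1]
    n = len(X)

    labels = np.zeros(n)
    labels[:n_sup] = y_support                         # query labels stay 0 (hidden)
    label_obs = np.zeros(n)
    label_obs[:n_sup] = y_observed.astype(float)       # also 0 for unlabeled support cases

    values   = np.concatenate([X, labels[:, None]], axis=1)               # 1st matrix: values
    observed = np.concatenate([np.ones((n, d)), label_obs[:, None]], 1)   # 2nd: observed or not
    is_feat  = np.concatenate([np.ones((n, d)), np.zeros((n, 1))], 1)     # 3rd: is a feature
    is_label = 1.0 - is_feat                                              # 4th: is a label
    return np.stack([values, observed, is_feat, is_label])                # shape (4, n, d + 1)
```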
Next, the prediction unit 203 predicts the labels of the features included in the unlabeled feature set X^U, using the latent vector of each case generated in step S106 and the labels included in the label set Y^L (step S107).
Next, the meta-learning unit 204 calculates the test performance by comparing the prediction results with the label set Y^U of the unlabeled feature set X^U (step S108). For example, the following is computed as the test performance:
(Equation; rendered as image JPOXMLDOC01-appb-M000015 in the original.)
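Because this expression is also an image in the original, and equation (10) maximizes its expectation, a natural candidate (again an assumption, not the original formula) is the mean log-likelihood of the held-out labels:

$$ \frac{1}{N_U} \sum_{n=1}^{N_U} \log p\!\left(y_n^U \mid z_n^U, X^L, Y^L, X^U; \Theta\right) $$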
Next, the meta-learning unit 204 learns (updates) the model parameters Θ so that the performance on the target task becomes high (step S109). For example, when the performance on the target task is evaluated by the expected test performance (the expected value of the test performance), the model parameters Θ are learned according to equation (10) above.
Next, the meta-learning unit 204 determines whether a predetermined termination condition is satisfied (step S110). If the termination condition is satisfied, the process proceeds to step S111; otherwise, it returns to step S103. Steps S103 to S109 are thus repeated until the termination condition is satisfied. Examples of the termination condition include the number of repetitions of steps S103 to S109 exceeding a predetermined threshold, or the model parameters Θ having converged.
If it is determined in step S110 that the predetermined termination condition is satisfied, the output unit 205 outputs the learned model parameters Θ to a predetermined output destination (step S111).
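Putting steps S101 to S111 together, a compact sketch of the meta-training loop might look as follows. It assumes a PyTorch-style model bundling the latent vector generation unit 202 and the prediction unit 203 (whose parameters correspond to Θ), plus the sample_episode sketch above; all interface names are illustrative assumptions:

```python
import numpy as np
import torch

def meta_train(datasets, model, optimizer, n_labeled, n_unlabeled,
               max_iters=10000, seed=0):
    """Illustrative sketch of the meta-learning loop (steps S101-S111)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_iters):                                   # S110: stop by iteration count
        Xs, ys, obs, Xq, yq = sample_episode(datasets, n_labeled,
                                             n_unlabeled, rng)   # S103-S105
        log_p = model(Xs, ys, obs, Xq)                           # S106-S107: log p(y = c) per query case
        yq_t = torch.as_tensor(yq, dtype=torch.long)
        test_perf = log_p[torch.arange(len(yq_t)), yq_t].mean()  # S108: mean log-likelihood
        loss = -test_perf                                        # S109: ascend the expected performance
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model                                                 # S111: learned parameters live in the model
```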
<Label prediction process>
An example of the label prediction process according to this embodiment will be described below with reference to FIG. 4. In the following, the model parameters Θ are assumed to have been learned.
First, the input unit 201 reads the labeled feature set X^L, its label set Y^L, and the unlabeled feature set X^U from the storage unit 206 (step S201).
Next, the latent vector generation unit 202 takes the labeled feature set X^L, its label set Y^L, and the unlabeled feature set X^U as input and generates a latent vector for each case (step S202).
Next, the prediction unit 203 predicts the labels of the features included in the unlabeled feature set X^U, using the latent vector of each case generated in step S202 and the labels included in the label set Y^L (step S203). Specifically, the labels of the unlabeled features may be sampled, for example, according to equation (9) above.
Then, the output unit 205 outputs the labels predicted in step S203 to a predetermined output destination (step S204).
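At label prediction time the same components are reused without gradient updates. A usage-style sketch with the same illustrative interfaces as above (the patent samples labels according to equation (9); the arg-max is shown here purely for simplicity):

```python
import torch

@torch.no_grad()
def predict_labels(model, X_labeled, y_labeled, X_unlabeled):
    """Illustrative sketch of steps S201-S204 with learned parameters."""
    observed = torch.ones(len(y_labeled), dtype=torch.bool)      # every support label is observed here
    log_p = model(X_labeled, y_labeled, observed, X_unlabeled)   # S202-S203
    return log_p.argmax(dim=-1)                                  # predicted label for each unlabeled case
```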
<Evaluation>
To evaluate the meta-learning device 10 according to this embodiment, it was compared with existing methods on artificial data. The existing methods were a Gaussian mixture model (GMM), a Gaussian process (GP), label propagation (LP), model-agnostic meta-learning (MAML), prototypical networks (Proto), heterogeneous meta-learning (HML), and meta label propagation (MetaLP).
The test accuracy (rate of correct answers) was used as the evaluation metric. The evaluation results (mean and standard error) are shown in Table 1 below.
(Table 1; rendered as image JPOXMLDOC01-appb-T000016 in the original.)
Here, "proposed method" denotes the meta-learning device 10 according to this embodiment, and "Shot" denotes the number of labeled features per task.
As shown in Table 1 above, the meta-learning device 10 according to this embodiment achieves a higher test accuracy than the existing methods.
<Summary>
As described above, when given a collection of multiple learning data sets with different feature spaces that may include unlabeled learning data, the meta-learning device 10 according to this embodiment can learn the model parameters while also making use of the unlabeled learning data. Therefore, with the meta-learning device 10 according to this embodiment, high performance can be achieved on the target task even when only a small amount of learning data is available for that task.
The present invention is not limited to the specifically disclosed embodiments described above; various modifications, changes, and combinations with known techniques are possible without departing from the scope of the claims.
10 Meta-learning device
101 Input device
102 Display device
103 External I/F
103a Recording medium
104 Communication I/F
105 RAM
106 ROM
107 Auxiliary storage device
108 Processor
109 Bus
201 Input unit
202 Latent vector generation unit
203 Prediction unit
204 Meta-learning unit
205 Output unit
206 Storage unit
210 Variable-feature self-attention mechanism unit

Claims (8)

1. A meta-learning method executed by a computer, the method comprising:
an input procedure of inputting a plurality of learning data sets, each composed of learning data that includes at least a feature amount of a case, wherein learning data that does not include a label for the feature amount may be included and the feature spaces of the feature amounts may differ;
a first selection procedure of selecting one learning data set from the plurality of learning data sets;
a second selection procedure of selecting, from the one learning data set, a first feature amount to be labeled data and a first label for the first feature amount, and a second feature amount to be unlabeled data and a second label for the second feature amount;
a generation procedure of generating, using a learning target parameter, the labeled data, and the unlabeled data, a latent vector of each case represented by the first feature amount or the second feature amount;
a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and
a learning procedure of learning the learning target parameter using a result of predicting the label for the unlabeled data and the second label.
2. The meta-learning method according to claim 1, wherein the generation procedure:
creates, using the labeled data and the unlabeled data, a tensor that distinguishes between feature amounts and labels and also distinguishes between the labeled data and the unlabeled data; and
after alternately applying attention over the cases and attention over the feature amounts and labels to the tensor a predetermined number of times, generates the slice of the resulting tensor for each case as the latent vector of that case.
3. The meta-learning method according to claim 2, wherein the generation procedure creates the tensor whose slices along a first axis are a first matrix representing information on the feature amounts or labels, a second matrix representing information on whether the feature amounts or labels are observed, a third matrix representing information on whether each element corresponds to a feature amount, and a fourth matrix representing information on whether each element corresponds to a label.
4. The meta-learning method according to claim 3, wherein:
when no label exists for the first feature amount, the second selection procedure sets, as the first label for the first feature amount, a label having a predetermined value indicating that no label exists for the first feature amount; and
when no label exists for the first feature amount, the generation procedure sets, in the element of the second matrix that indicates whether the label of the first feature amount is observed, information indicating that the label is not observed.
5. The meta-learning method according to any one of claims 2 to 4, wherein the generation procedure includes an attention mechanism procedure of taking as input a tensor whose dimension along a predetermined axis is variable and computing attention over the tensor with a neural network, and
the attention over the cases and the attention over the feature amounts and labels are computed by the attention mechanism procedure.
6. The meta-learning method according to claim 5, wherein the learning target parameter includes parameters of the neural network.
7. A meta-learning device comprising:
an input unit configured to input a plurality of learning data sets, each composed of learning data that includes at least a feature amount of a case, wherein learning data that does not include a label for the feature amount may be included and the feature spaces of the feature amounts may differ;
a first selection unit configured to select one learning data set from the plurality of learning data sets;
a second selection unit configured to select, from the one learning data set, a first feature amount to be labeled data and a first label for the first feature amount, and a second feature amount to be unlabeled data and a second label for the second feature amount;
a generation unit configured to generate, using a learning target parameter, the labeled data, and the unlabeled data, a latent vector of each case represented by the first feature amount or the second feature amount;
a prediction unit configured to predict a label for the unlabeled data using the latent vectors; and
a learning unit configured to learn the learning target parameter using a result of predicting the label for the unlabeled data and the second label.
8. A program that causes a computer to execute:
an input procedure of inputting a plurality of learning data sets, each composed of learning data that includes at least a feature amount of a case, wherein learning data that does not include a label for the feature amount may be included and the feature spaces of the feature amounts may differ;
a first selection procedure of selecting one learning data set from the plurality of learning data sets;
a second selection procedure of selecting, from the one learning data set, a first feature amount to be labeled data and a first label for the first feature amount, and a second feature amount to be unlabeled data and a second label for the second feature amount;
a generation procedure of generating, using a learning target parameter, the labeled data, and the unlabeled data, a latent vector of each case represented by the first feature amount or the second feature amount;
a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and
a learning procedure of learning the learning target parameter using a result of predicting the label for the unlabeled data and the second label.
Application PCT/JP2022/032222 (Meta-learning method, meta-learning device, and program), filed 2022-08-26 with priority date 2022-08-26, was published as WO2024042707A1 on 2024-02-29. Family ID: 90012941. Country status: WO (1).
Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019185748A * 2018-04-12 2019-10-24 Baidu USA LLC System and method for learning interactive language
JP2020071694A * 2018-10-31 2020-05-07 Hitachi, Ltd. Computer system
US20200364302A1 * 2019-05-15 2020-11-19 Captricity, Inc. Few-shot language model training and implementation
WO2021250751A1 * 2020-06-08 2021-12-16 Nippon Telegraph and Telephone Corporation Learning method, learning device, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ren, Mengye; Triantafillou, Eleni; Ravi, Sachin; Snell, Jake; Swersky, Kevin; Tenenbaum, Joshua B.; Larochelle, Hugo; Zemel, Richard S.: "Meta-Learning for Semi-Supervised Few-Shot Classification", ICLR 2018, arXiv.org, 1 March 2018, pages 1-15, XP093142296, retrieved from <https://openreview.net/pdf?id=HJcSzz-CZ> [retrieved on 2024-03-18], DOI: 10.48550/arxiv.1803.00676 *
Sun, Shengli; Sun, Qingfeng; Zhou, Kevin; Lv, Tengchao: "Hierarchical Attention Prototypical Networks for Few-Shot Text Classification", Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Stroudsburg, PA, USA, 2019, pages 476-485, XP093142298, DOI: 10.18653/v1/D19-1045 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956533

Country of ref document: EP

Kind code of ref document: A1