WO2024042707A1 - Meta-learning method, meta-learning device, and program - Google Patents

Meta-learning method, meta-learning device, and program

Info

Publication number
WO2024042707A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning
label
feature
data
meta
Prior art date
Application number
PCT/JP2022/032222
Other languages
French (fr)
Japanese (ja)
Inventor
具治 岩田 (Tomoharu Iwata)
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority to PCT/JP2022/032222
Publication of WO2024042707A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • the present disclosure relates to a meta-learning method, a meta-learning device, and a program.
  • Machine learning methods typically use task-specific training data to learn model parameters.
  • In order to achieve high performance on a target task (hereinafter also simply called the target task), a large amount of training data specific to that task is required, but for some tasks, preparing a sufficient amount of training data is costly.
  • To solve this problem, a meta-learning method has been proposed that utilizes training data from different tasks to achieve high performance on the target task even with a small amount of training data (see, for example, Non-Patent Document 1).
  • However, the meta-learning method proposed in Non-Patent Document 1 cannot utilize training data with different feature spaces.
  • Although meta-learning methods that can utilize training data with different feature spaces have been proposed (see, for example, Non-Patent Document 2), the method proposed in Non-Patent Document 2 cannot utilize training data to which no labels have been assigned.
  • The present disclosure has been made in view of the above points, and aims to provide a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
  • In a meta-learning method according to one aspect of the present disclosure, a computer executes: an input procedure of inputting a plurality of training datasets, each composed of training data that includes at least a feature of a case, where a dataset may include training data without a label for the feature and the feature spaces of the features may differ; a first selection procedure of selecting one training dataset from the plurality of training datasets; a second selection procedure of selecting, from the selected training dataset, a first feature to serve as labeled data, a first label for the first feature, a second feature to serve as unlabeled data, and a second label for the second feature; a generation procedure of generating a latent vector for each case represented by the first or second feature, using a learning target parameter, the labeled data, and the unlabeled data; a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and a learning procedure of learning the learning target parameter using the prediction result for the unlabeled data and the second label.
  • This provides a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
  • FIG. 1 is a diagram illustrating an example of the hardware configuration of a meta-learning device according to the present embodiment.
  • FIG. 2 is a diagram showing an example of the functional configuration of the meta-learning device according to the present embodiment. FIG. 3 is a flowchart showing an example of the meta-learning processing according to the present embodiment. FIG. 4 is a flowchart showing an example of the label prediction processing according to the present embodiment.
  • The following embodiment describes a meta-learning device 10 that meta-learns the model parameters of a target task when given a set of multiple training datasets with different feature spaces, including unlabeled training data. A case is also described in which the meta-learning device 10 uses the model parameters learned through this meta-learning to predict the labels of unlabeled data in the target task.
  • The meta-learning device 10 operates in two phases: "meta-learning time," in which the model parameters of the target task are meta-learned, and "label prediction time," in which the labels of unlabeled data in the target task are predicted using the model parameters learned at meta-learning time.
  • “during meta-learning” may be simply referred to as “during learning”, for example.
  • “at the time of label prediction” may be called, for example, “at the time of inference” or “at the time of testing”.
  • At meta-learning time, the meta-learning device 10 is given a set of T training datasets D = {D^(t) | t = 1, ..., T}, where the t-th training dataset is D^(t) = {(x_tn, y_tn) | n = 1, ..., N_t}.
  • Here, x_tn is the feature included in the n-th training example of the t-th training dataset D^(t), and y_tn is its label; the dataset D^(t) may contain features to which no label is assigned.
  • N_t is the number of training examples included in the t-th training dataset, and T is the number of training datasets.
  • The training data (that is, for example, data represented by a pair of a feature and its label, data represented by a feature without a label, and so on) may also be called a "case."
  • In the following, for simplicity, t represents a task, and the t-th training dataset D^(t) is assumed to be a dataset specific to task t. However, this is only an example; for instance, the training dataset D^(t) and the training dataset D^(t') may be task-specific datasets for the same task.
  • the meta-learning device 10 at the time of label prediction is given a labeled feature set XL of the target task, its label set YL , and an unlabeled feature set XU .
  • the target task is assumed to be a task different from any of the tasks targeted by the learning data set given during meta-learning.
  • the purpose is to predict the label of each feature included in the unlabeled feature set XU (that is, predict the label of unlabeled data).
  • the feature amounts are assumed to be in vector format, but if the feature amounts are not in vector format (for example, when the feature amounts are expressed as images, graphs, etc.), the feature amounts may be converted into vectors.
  • the task is classification or regression, but the following embodiment can be similarly applied to other machine learning problems such as density estimation and clustering, for example.
  • the following embodiments can be applied in the same way even when some information related to the case, such as an explanatory text of the feature amounts, is provided.
  • In the following embodiment, the same meta-learning device 10 implements both meta-learning and label prediction; however, meta-learning and label prediction may be implemented by different devices.
  • the device that realizes label prediction may be called a "label prediction device,” a “prediction device,” an “inference device,” or the like, for example.
  • FIG. 1 shows an example of the hardware configuration of a meta-learning device 10 according to this embodiment.
  • The meta-learning device 10 according to the present embodiment is realized by the hardware configuration of a general computer or computer system and includes, for example, an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a RAM (Random Access Memory) 105, a ROM (Read Only Memory) 106, an auxiliary storage device 107, and a processor 108. These pieces of hardware are communicably connected to each other via a bus 109.
  • the input device 101 is, for example, a keyboard, a mouse, a touch panel, a physical button, or the like.
  • the display device 102 is, for example, a display, a display panel, or the like. Note that the meta-learning device 10 may not include at least one of the input device 101 and the display device 102, for example.
  • the external I/F 103 is an interface with an external device such as the recording medium 103a.
  • The meta-learning device 10 can read from and write to the recording medium 103a via the external I/F 103.
  • Examples of the recording medium 103a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
  • the communication I/F 104 is an interface for connecting the meta-learning device 10 to a communication network or the like.
  • the RAM 105 is a volatile semiconductor memory (storage device) that temporarily holds programs and data.
  • the ROM 106 is a nonvolatile semiconductor memory (storage device) that can retain programs and data even when the power is turned off.
  • the auxiliary storage device 107 is, for example, a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory.
  • the processor 108 is an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
  • By having the hardware configuration shown in FIG. 1, the meta-learning device 10 according to the present embodiment can realize the various processes described below.
  • the hardware configuration shown in FIG. 1 is an example, and the hardware configuration of the meta-learning device 10 is not limited to this.
  • For example, the meta-learning device 10 may include multiple auxiliary storage devices 107 and multiple processors 108, may not include some of the illustrated hardware, or may include various hardware other than that illustrated.
  • FIG. 2 shows an example of the functional configuration of the meta-learning device 10 according to this embodiment.
  • the meta-learning device 10 includes an input section 201, a latent vector generation section 202, a prediction section 203, a meta-learning section 204, and an output section 205. Each of these units is realized, for example, by one or more programs installed in the meta-learning device 10 causing the processor 108 or the like to execute the process.
  • the meta-learning device 10 according to this embodiment includes a storage unit 206.
  • the storage unit 206 is realized by, for example, the auxiliary storage device 107 or the like.
  • the storage unit 206 may be realized by, for example, a storage device such as a database server connected to the meta-learning device 10 via a communication network or the like.
  • At meta-learning time, the input unit 201 inputs the given set D of training datasets. In addition, the input unit 201 randomly selects the t-th training dataset D^(t) from the set D and then randomly samples, from this dataset D^(t), a labeled feature set X_L and its label set Y_L, and an unlabeled feature set X_U and its label set Y_U. The label set Y_U is used as the ground truth (supervision) for the prediction results when the labels of the features included in the unlabeled feature set X_U are predicted.
  • At label prediction time, the input unit 201 inputs the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U.
  • The latent vector generation unit 202 takes the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input and, considering all of this input information, generates a latent vector representing the properties of each case using operations that remain applicable even when the feature space or the label space changes.
  • The latent vector generation unit 202 includes a variable-feature self-attention mechanism unit 210 that is used as one of its modules. The variable-feature self-attention mechanism unit 210 is explained first, followed by the method by which the latent vector generation unit 202 generates latent vectors.
  • The variable-feature self-attention mechanism unit 210 is realized by a neural network; it receives a tensor as input and transforms it. Let the input tensor be Z ∈ ℝ^{D1×D2×D3}. The number of dimensions D1 of the first axis and the number of dimensions D2 of the second axis may vary with the input tensor Z (in other words, the sizes of the first and second axes depend on Z), while the number of dimensions D3 of the third axis is fixed. That is, the transformation is equivariant with respect to the first and second axes.
  • First, the variable-feature self-attention mechanism unit 210 converts the input tensor Z into three tensors: a query Q, a key K, and a value V. This can be done, for example, by mode products of Z with W_Q, W_K, and W_V as in Equation (1), where W_Q, W_K, and W_V are learning target parameters (that is, parameters of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210) and ×_d denotes the d-th mode product. Note that although Equation (1) converts the tensor Z into the query Q, key K, and value V by linear transformations, Z may instead be converted into Q, K, and V by transformations other than linear ones.
  • Next, the variable-feature self-attention mechanism unit 210 computes a weight matrix A representing the similarity between slices along the first axis, taking higher values the closer the query Q and the key K are. Here, Q^(1) denotes a slice of Q along the first axis and K^(1) denotes a slice of K along the first axis. The variable-feature self-attention mechanism unit 210 then calculates the output O by aggregating the first axis of the value V using the weight matrix A.
  • The variable-feature self-attention mechanism unit 210 may also take, as its final output, a joint transformation of a plurality of outputs O. For example, when R outputs O are combined, they are transformed using W_O, where W_O is a learning target parameter (that is, a parameter of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210).
  • The latent vector generation unit 202 first creates, from the input X_L, Y_L, and X_U, a tensor in which features can be distinguished from labels and labeled features can be distinguished from unlabeled features. For example, a tensor whose slices along the first axis are as in Equation (5) can be created.
  • Here, the first slice contains the feature and label values; the second slice contains information on whether each feature or label is observed (1 if the feature or label is observed, 0 otherwise); the third slice contains information on whether each entry is a feature (1 if it is a feature, 0 otherwise); and the fourth slice contains information on whether each entry is a label (1 if it is a label, 0 otherwise). In the first slice, the cases are arranged vertically (that is, each row corresponds to one case), and horizontally the features are arranged first, followed by the labels.
  • In addition to the features and labels, if some information about the case, such as an explanatory text of the features, is provided, that information can be concatenated with the tensor input to the latent vector generation unit 202. If such information is not in vector form, it may first be converted into a vector using, for example, a neural network. This makes it possible to generate latent vectors that also take such case information into account.
  • Next, the latent vector generation unit 202 transforms the tensor using operations that remain applicable even when the feature space or the label space changes. For example, the tensor is transformed by using the variable-feature self-attention mechanism unit 210 to alternately repeat self-attention over cases and self-attention over features and labels. Specifically, as the (2b-1)-th transformation (where b is a natural number), self-attention over the cases is performed using a function f^(b)(Z^(b)) that includes the variable-feature self-attention mechanism unit 210; f^(b) can be realized, for example, with a feed-forward neural network FF^(b), layer normalization LN^(b), the variable-feature self-attention mechanism unit MVSA^(b), and a learning target parameter W_R(b). As the 2b-th transformation, self-attention over the features and labels is performed in the same manner.
  • Finally, the latent vector generation unit 202 converts the slice corresponding to each case of the finally obtained B-th tensor (where B is a predetermined natural number) into a vector and uses it as that case's latent vector. That is, the latent vector generation unit 202 converts the matrix representing the slice for each case of Z^(B) into a vector (for example, if the matrix has a rows and b columns, it is converted into an ab-dimensional vector) and uses the result as the latent vector.
  • The prediction unit 203 uses the latent vector of each case and the labels assigned to the labeled features to predict the labels of the unlabeled features. For example, if the labels are discrete, the probability that the label is y can be calculated with a Gaussian mixture model, where z_n^U (n = 1, ..., N_U) is the latent vector of the case containing the n-th unlabeled feature, N_U is the number of unlabeled features, μ_c is the mean latent vector of label c (that is, the mean of the latent vectors of the cases with label c), and C is the number of labels appearing in the labeled features.
  • Θ denotes the collection of learning target parameters (that is, the collection of parameters such as W_Q, W_K, W_V, W_O, W_R(b), and the parameters of FF^(b)). Hereinafter, Θ is referred to as the model parameters.
  • Although a Gaussian mixture model is used above, any other classification or regression model, such as linear regression or a Gaussian process, may be used instead. When such a classification or regression model is used, the model parameters Θ may also include the parameters of that model.
  • The meta-learning unit 204 learns the model parameters Θ so that the performance on the target task becomes high; that is, it learns Θ so that the expected test performance on the target task becomes high. This can be done, for example, by the optimization in Equation (10), where y_n^U represents the label of the n-th unlabeled feature. This optimization can be realized by computing the expected value with the Monte Carlo method.
  • the output unit 205 outputs the learned model parameters ⁇ to a predetermined output destination during meta-learning. Furthermore, during label prediction, the output unit 205 outputs the predicted label to a predetermined output destination.
  • the output unit 205 outputs the learned model parameters ⁇ and predicted labels to the storage unit 206.
  • Alternatively, the learned model parameters Θ and the predicted labels may be output to the display device 102, such as a display, or to a terminal device or the like connected via a communication network.
  • At meta-learning time, the storage unit 206 stores the given set D of training datasets. At label prediction time, it stores the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U. In addition, the storage unit 206 stores the model parameters Θ, hyperparameters (for example, the number of labeled features N_L and the number of unlabeled features N_U for each task), temporary information, and so on.
  • the input unit 201 inputs the set D of learning data sets, the number of labeled features NL and the number of unlabeled features NU for each task from the storage unit 206 (step S101). Note that N L and N U may be different for each task.
  • the meta-learning unit 204 initializes the model parameter ⁇ using any known method (step S102).
  • the input unit 201 randomly selects one learning data set D (t) from ⁇ D (1) , . . . , D (T) ⁇ (step S103).
  • Next, the input unit 201 randomly samples N_L features and their labels from the training dataset D^(t) selected in step S103 to obtain a labeled feature set X_L and its label set Y_L (step S104). If a feature sampled from the training dataset D^(t) has no label assigned to it, a specific value indicating that no label is assigned (for example, 0) is treated as its label and becomes an element of the label set Y_L.
  • Next, the input unit 201 randomly samples N_U features and their labels from the training dataset D^(t) selected in step S103, so as not to overlap with the labeled feature set X_L and its label set Y_L obtained in step S104, and obtains an unlabeled feature set X_U and its label set Y_U (step S105).
  • Although the training dataset D^(t) may also contain features to which no labels are assigned, such features are not sampled in this step.
  • Next, the latent vector generation unit 202 generates a latent vector for each case by taking the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input (step S106).
  • If features with no assigned label were sampled in step S104, then in the second slice of Equation (5) above, all the elements indicating whether a label is observed are 0 (that is, the value representing "unobserved") in the rows corresponding to those features.
  • Next, the prediction unit 203 uses the latent vector of each case generated in step S106 and the labels included in the label set Y_L to predict the labels of the features included in the unlabeled feature set X_U (step S107).
  • Next, the meta-learning unit 204 calculates the test performance by comparing the predicted labels with the label set Y_U of the unlabeled feature set X_U (step S108).
  • the meta-learning unit 204 learns (updates) the model parameter ⁇ so that the performance of the target task becomes high (step S109). For example, when evaluating the performance of a target task using expected test performance (expected value of test performance), the model parameter ⁇ is learned using the above equation (10).
  • the meta-learning unit 204 determines whether a predetermined termination condition is satisfied (step S110). If the termination condition is satisfied, the process advances to step S111; otherwise, the process advances to step S103. As a result, steps S103 to S109 are repeatedly executed until the termination condition is satisfied.
  • the termination conditions include, for example, that the number of repetitions of steps S103 to S109 exceeds a predetermined threshold, that the model parameter ⁇ has converged, and so on.
  • If it is determined in step S110 that the predetermined termination condition is satisfied, the output unit 205 outputs the learned model parameters Θ to a predetermined output destination (step S111).
  • the input unit 201 inputs the labeled feature set XL , its label set YL , and the unlabeled feature set XU from the storage unit 206 (step S201).
  • Next, the latent vector generation unit 202 generates a latent vector for each case by taking the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input (step S202).
  • Next, the prediction unit 203 uses the latent vector of each case generated in step S202 and the labels included in the label set Y_L to predict the labels of the features included in the unlabeled feature set X_U (step S203). Specifically, for example, the labels of the unlabeled features may be sampled according to Equation (9) above.
  • the output unit 205 outputs the label predicted in step S203 above to a predetermined output destination (step S204).
  • The test accuracy rate was used as the evaluation metric.
  • the evaluation results (average and standard error) are shown in Table 1 below.
  • In Table 1, the proposed method denotes the meta-learning device 10 according to the present embodiment, and "Shot" denotes the number of labeled features for each task.
  • As shown in Table 1 above, it can be seen that the meta-learning device 10 according to the present embodiment achieves a higher test accuracy rate than the existing methods.
  • As described above, when given a set of multiple training datasets with different feature spaces, including training data to which no labels are assigned, the meta-learning device 10 according to the present embodiment can learn model parameters while also utilizing the training data that has no labels. Therefore, according to the meta-learning device 10 of the present embodiment, high performance can be achieved on the target task even when only a small amount of training data for the target task is given.
  • 10 Meta-learning device, 101 Input device, 102 Display device, 103 External I/F, 103a Recording medium, 104 Communication I/F, 105 RAM, 106 ROM, 107 Auxiliary storage device, 108 Processor, 109 Bus, 201 Input unit, 202 Latent vector generation unit, 203 Prediction unit, 204 Meta-learning unit, 205 Output unit, 206 Storage unit, 210 Variable-feature self-attention mechanism unit

Abstract

In a meta-learning method according to an aspect of the present disclosure, a computer executes: an input procedure of inputting a plurality of training datasets made up of training data in which at least a feature of a case example is included, in which the training datasets can include training data not including a label with respect to the feature, and also can have different feature spaces for the feature; a first selecting procedure of selecting one training dataset from the plurality of training datasets; a second selecting procedure of selecting, from the one training dataset, a first feature to take as labelled data and a first label with respect to the first feature, a second feature to take as unlabeled data, and a second label with respect to the second feature; a generating procedure of generating a latent vector for each case example that the first feature or the second feature represents, using a training target parameter, the labeled data, and the unlabeled data; a predicting procedure of predicting a label for the unlabeled data, using the latent vector; and a learning procedure of learning the training target parameter, using prediction results of the label for the unlabeled data, and the second label.

Description

Meta-learning method, meta-learning device, and program
The present disclosure relates to a meta-learning method, a meta-learning device, and a program.
Machine learning methods typically use task-specific training data to learn model parameters. In order to achieve high performance on a target task (hereinafter also simply called the target task), a large amount of training data specific to that task is required, but for some tasks, preparing a sufficient amount of training data is costly.
To solve this problem, a meta-learning method has been proposed that utilizes training data from different tasks to achieve high performance on the target task even with a small amount of training data (see, for example, Non-Patent Document 1). However, the meta-learning method proposed in Non-Patent Document 1 cannot utilize training data with different feature spaces. Meta-learning methods that can utilize training data with different feature spaces have also been proposed (see, for example, Non-Patent Document 2), but the method proposed in Non-Patent Document 2 cannot utilize training data to which no labels have been assigned.
The present disclosure has been made in view of the above points, and aims to provide a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
In a meta-learning method according to one aspect of the present disclosure, a computer executes: an input procedure of inputting a plurality of training datasets, each composed of training data that includes at least a feature of a case, where a dataset may include training data without a label for the feature and the feature spaces of the features may differ; a first selection procedure of selecting one training dataset from the plurality of training datasets; a second selection procedure of selecting, from the selected training dataset, a first feature to serve as labeled data, a first label for the first feature, a second feature to serve as unlabeled data, and a second label for the second feature; a generation procedure of generating a latent vector for each case represented by the first or second feature, using a learning target parameter, the labeled data, and the unlabeled data; a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and a learning procedure of learning the learning target parameter using the prediction result for the unlabeled data and the second label.
This provides a technology that allows model parameters for a target task to be meta-learned from a collection of multiple training datasets with different feature spaces, including unlabeled training data.
FIG. 1 is a diagram showing an example of the hardware configuration of the meta-learning device according to the present embodiment. FIG. 2 is a diagram showing an example of the functional configuration of the meta-learning device according to the present embodiment. FIG. 3 is a flowchart showing an example of the meta-learning processing according to the present embodiment. FIG. 4 is a flowchart showing an example of the label prediction processing according to the present embodiment.
An embodiment of the present invention is described below. The following embodiment describes a meta-learning device 10 that meta-learns the model parameters of a target task when given a set of multiple training datasets with different feature spaces, including unlabeled training data. A case is also described in which the meta-learning device 10 uses the model parameters learned through this meta-learning to predict the labels of unlabeled data in the target task.
The meta-learning device 10 operates in two phases: "meta-learning time," in which the model parameters of the target task are meta-learned, and "label prediction time," in which the labels of unlabeled data in the target task are predicted using the model parameters learned at meta-learning time. "Meta-learning time" may also simply be called "learning time," and "label prediction time" may also be called "inference time," "test time," and so on.
At meta-learning time, the meta-learning device 10 is given a set of T training datasets D = {D^(t) | t = 1, ..., T}, where the t-th training dataset is D^(t) = {(x_tn, y_tn) | n = 1, ..., N_t}. Here, x_tn is the feature included in the n-th training example of the t-th training dataset D^(t), and y_tn is its label. The training dataset D^(t) (t = 1, ..., T) may contain features to which no label is assigned. N_t is the number of training examples included in the t-th training dataset, and T is the number of training datasets. A training example (that is, for example, data represented by a pair of a feature and its label, data represented by a feature without a label, and so on) may also be called a "case."
In the following, for simplicity, t represents a task, and the t-th training dataset D^(t) is assumed to be a dataset specific to task t. However, this is only an example; for instance, the training dataset D^(t) and the training dataset D^(t') may be task-specific datasets for the same task.
At label prediction time, the meta-learning device 10 is given a labeled feature set X_L of the target task, its label set Y_L, and an unlabeled feature set X_U. The target task is assumed to differ from all of the tasks covered by the training datasets given at meta-learning time. The purpose at label prediction time is to predict the label of each feature included in the unlabeled feature set X_U (that is, to predict the labels of the unlabeled data).
In the present embodiment, the features are assumed to be in vector form; if they are not (for example, when features are expressed as images, graphs, and so on), the following embodiment can be applied in the same way after converting the features into vectors. Also, although the task is assumed to be classification or regression in the present embodiment, the following embodiment can be applied in the same way to other machine learning problems such as density estimation and clustering. Furthermore, the following embodiment can be applied in the same way even when, in addition to the features and labels, some information about the case, such as an explanatory text of the features, is provided.
In the following embodiment, the same meta-learning device 10 implements both meta-learning and label prediction; however, meta-learning and label prediction may be implemented by different devices. In that case, the device that realizes label prediction may be called, for example, a "label prediction device," a "prediction device," an "inference device," and so on.
<Example of hardware configuration of meta-learning device 10>
FIG. 1 shows an example of the hardware configuration of the meta-learning device 10 according to the present embodiment. As shown in FIG. 1, the meta-learning device 10 according to the present embodiment is realized by the hardware configuration of a general computer or computer system and includes, for example, an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a RAM (Random Access Memory) 105, a ROM (Read Only Memory) 106, an auxiliary storage device 107, and a processor 108. These pieces of hardware are communicably connected to each other via a bus 109.
The input device 101 is, for example, a keyboard, a mouse, a touch panel, physical buttons, and the like. The display device 102 is, for example, a display, a display panel, and the like. The meta-learning device 10 does not necessarily have to include at least one of the input device 101 and the display device 102.
The external I/F 103 is an interface with an external device such as a recording medium 103a. The meta-learning device 10 can read from and write to the recording medium 103a via the external I/F 103. Examples of the recording medium 103a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
The communication I/F 104 is an interface for connecting the meta-learning device 10 to a communication network or the like. The RAM 105 is a volatile semiconductor memory (storage device) that temporarily holds programs and data. The ROM 106 is a nonvolatile semiconductor memory (storage device) that can retain programs and data even when the power is turned off. The auxiliary storage device 107 is, for example, a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory. The processor 108 is an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
By having the hardware configuration shown in FIG. 1, the meta-learning device 10 according to the present embodiment can realize the various processes described below. Note that the hardware configuration shown in FIG. 1 is an example, and the hardware configuration of the meta-learning device 10 is not limited to it. For example, the meta-learning device 10 may include multiple auxiliary storage devices 107 and multiple processors 108, may not include some of the illustrated hardware, or may include various hardware other than that illustrated.
<Example of functional configuration of meta-learning device 10>
FIG. 2 shows an example of the functional configuration of the meta-learning device 10 according to the present embodiment. As shown in FIG. 2, the meta-learning device 10 according to the present embodiment includes an input unit 201, a latent vector generation unit 202, a prediction unit 203, a meta-learning unit 204, and an output unit 205. Each of these units is realized, for example, by processing that one or more programs installed in the meta-learning device 10 cause the processor 108 or the like to execute. The meta-learning device 10 according to the present embodiment also includes a storage unit 206, which is realized by, for example, the auxiliary storage device 107 or the like. The storage unit 206 may instead be realized by a storage device such as a database server connected to the meta-learning device 10 via a communication network or the like.
At meta-learning time, the input unit 201 inputs the given set D of training datasets. In addition, the input unit 201 randomly selects the t-th training dataset D^(t) from the set D and then randomly samples, from this dataset D^(t), a labeled feature set X_L and its label set Y_L, and an unlabeled feature set X_U and its label set Y_U. The label set Y_U is used as the ground truth (supervision) for the prediction results when the labels of the features included in the unlabeled feature set X_U are predicted.
At label prediction time, the input unit 201 inputs the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U.
The latent vector generation unit 202 takes the labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U as input and, considering all of this input information, generates a latent vector representing the properties of each case using operations that remain applicable even when the feature space or the label space changes. The latent vector generation unit 202 includes a variable-feature self-attention mechanism unit 210 that is used as one of its modules. The variable-feature self-attention mechanism unit 210 is explained first, followed by the method by which the latent vector generation unit 202 generates latent vectors.
The variable-feature self-attention mechanism unit 210 is realized by a neural network; it receives a tensor as input and transforms it. Let the input tensor be Z ∈ ℝ^{D1×D2×D3} (Math. 1). The number of dimensions D1 of the first axis and the number of dimensions D2 of the second axis may vary with the input tensor Z (in other words, the sizes of the first and second axes depend on Z), while the number of dimensions D3 of the third axis is fixed. That is, the transformation is equivariant with respect to the first and second axes.
First, the variable-feature self-attention mechanism unit 210 converts the input tensor Z into three tensors: a query Q, a key K, and a value V (Math. 2). This can be done, for example, by the mode products with W_Q, W_K, and W_V shown in Equation (1) (Math. 3), where W_Q, W_K, and W_V are learning target parameters (that is, parameters of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210) and ×_d denotes the d-th mode product. Note that although Equation (1) converts the tensor Z into the query Q, key K, and value V by linear transformations, Z may instead be converted into Q, K, and V by transformations other than linear ones.
Next, the variable-feature self-attention mechanism unit 210 computes a weight matrix A (Math. 4) representing the similarity between slices along the first axis, taking higher values the closer the query Q and the key K are. This can be computed, for example, as in Math. 5, where Q^(1) denotes a slice of Q along the first axis, K^(1) denotes a slice of K along the first axis, and the symbol in Math. 6 denotes transposition.
Next, the variable-feature self-attention mechanism unit 210 calculates the output O by aggregating the first axis of the value V using the weight matrix A. This can be computed, for example, as in Math. 7.
The variable-feature self-attention mechanism unit 210 may also take, as its final output, a joint transformation of a plurality of outputs O. For example, when R outputs O are combined, the result can be computed as in Math. 8, where W_O is a learning target parameter (that is, a parameter of a linear transformation layer of the neural network that implements the variable-feature self-attention mechanism unit 210).
Although the above describes the use of an attention mechanism over the tensor itself (self-attention), an attention mechanism over another tensor can also be used, for example.
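The computation above can be illustrated with a small sketch. The following Python (NumPy) code is only a minimal illustration of the variable-feature self-attention described here, not the implementation of the embodiment: the use of a mode-3 product to form the query, key, and value, the flattening of each first-axis slice before the similarity computation, and the softmax scaling are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def variable_feature_self_attention(Z, W_q, W_k, W_v):
    """Minimal sketch of the variable-feature self-attention unit 210.

    Z:              input tensor of shape (D1, D2, D3); D1 and D2 may vary per
                    input, D3 is fixed.
    W_q, W_k, W_v:  learned matrices of shape (D3, Dk) applied by a mode-3
                    product (an assumption; any transformation producing
                    Q, K, V would do).
    """
    # Mode-3 product: contract the fixed third axis with the weight matrices.
    Q = np.einsum('abc,cd->abd', Z, W_q)   # (D1, D2, Dk)
    K = np.einsum('abc,cd->abd', Z, W_k)   # (D1, D2, Dk)
    V = np.einsum('abc,cd->abd', Z, W_v)   # (D1, D2, Dk)

    D1 = Z.shape[0]
    # Similarity between slices along the first axis: flatten each slice of
    # Q and K and take inner products (the scaling is an assumption).
    Qf = Q.reshape(D1, -1)
    Kf = K.reshape(D1, -1)
    A = softmax(Qf @ Kf.T / np.sqrt(Qf.shape[1]))        # (D1, D1)

    # Aggregate the first axis of V with the weights A.
    return np.einsum('ij,jbd->ibd', A, V)                # (D1, D2, Dk)

# Toy usage: 5 cases x 7 feature/label columns x 4 channels.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 7, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 3)) for _ in range(3))
print(variable_feature_self_attention(Z, W_q, W_k, W_v).shape)  # (5, 7, 3)
```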
Next, the method by which the latent vector generation unit 202 generates latent vectors is described. First, the latent vector generation unit 202 creates, from the input X_L, Y_L, and X_U, a tensor in which features can be distinguished from labels and labeled features can be distinguished from unlabeled features. For example, a tensor whose slices along the first axis are as shown in Equation (5) (Math. 9) can be created. Here, the first slice contains the feature and label values; the second slice contains information on whether each feature or label is observed (1 if the feature or label is observed, 0 otherwise); the third slice contains information on whether each entry is a feature (1 if it is a feature, 0 otherwise); and the fourth slice contains information on whether each entry is a label (1 if it is a label, 0 otherwise). In the first slice, the cases are arranged vertically (that is, each row corresponds to one case), and horizontally the features are arranged first, followed by the labels.
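As a concrete illustration of this construction, the sketch below builds the four slices from a labeled feature set X_L, its label set Y_L, and an unlabeled feature set X_U. It is only a minimal sketch: the one-hot label encoding, the use of -1 as the marker for a missing label, and the placement of the four information slices along the leading tensor axis are assumptions, and the exact encoding used in the embodiment may differ.

```python
import numpy as np

def build_input_tensor(X_L, Y_L, X_U, num_classes):
    """Build a tensor whose slices distinguish features from labels and
    labeled from unlabeled cases (cf. Equation (5)); a sketch only.

    X_L: (N_L, F) labeled features; Y_L: (N_L,) integer labels, -1 = missing;
    X_U: (N_U, F) unlabeled features.
    """
    N_L, F = X_L.shape
    N_U = X_U.shape[0]
    N, W = N_L + N_U, F + num_classes

    values = np.zeros((N, W))            # slice 1: feature and label values
    observed = np.zeros((N, W))          # slice 2: 1 where observed
    is_feature = np.zeros((N, W))        # slice 3: 1 on feature columns
    is_label = np.zeros((N, W))          # slice 4: 1 on label columns

    is_feature[:, :F] = 1.0
    is_label[:, F:] = 1.0

    values[:, :F] = np.vstack([X_L, X_U])
    observed[:, :F] = 1.0                # features are observed for all cases

    for n, y in enumerate(Y_L):
        if y >= 0:                       # -1 marks "no label assigned"
            values[n, F + y] = 1.0       # one-hot label (an assumption)
            observed[n, F:] = 1.0        # label columns observed for this case
    # Unlabeled cases, and labeled cases without a label, keep 0 in slice 2.

    return np.stack([values, observed, is_feature, is_label])  # (4, N, W)

# Toy usage: 3 labeled cases (one without a label) and 2 unlabeled cases.
X_L = np.arange(9, dtype=float).reshape(3, 3)
Y_L = np.array([0, 1, -1])
X_U = np.ones((2, 3))
print(build_input_tensor(X_L, Y_L, X_U, num_classes=2).shape)  # (4, 5, 5)
```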
In addition to the features and labels, if some information about the case, such as an explanatory text of the features, is provided, that information can be concatenated with the tensor input to the latent vector generation unit 202. If such information is not in vector form, it may first be converted into a vector using, for example, a neural network. This makes it possible to generate latent vectors that also take such case information into account.
Next, the latent vector generation unit 202 transforms the tensor using operations that remain applicable even when the feature space or the label space changes. For example, the tensor is transformed by using the variable-feature self-attention mechanism unit 210 to alternately repeat self-attention over cases and self-attention over features and labels. Specifically, as the (2b-1)-th transformation (where b is a natural number), self-attention over the cases is performed as in Math. 10. Here, f^(b)(Z^(b)) is a function that includes the variable-feature self-attention mechanism unit 210 and can be realized, for example, as in Math. 11, where FF^(b) denotes a feed-forward neural network, LN^(b) denotes layer normalization, MVSA^(b) denotes the variable-feature self-attention mechanism unit 210, and W_R(b) is a learning target parameter.
Then, as the 2b-th transformation, self-attention over the features and labels is performed as in Math. 12.
Finally, the latent vector generation unit 202 converts the slice corresponding to each case of the finally obtained B-th tensor (where B is a predetermined natural number) into a vector and uses it as that case's latent vector. That is, the latent vector generation unit 202 converts the matrix representing the slice for each case of Z^(B) into a vector (for example, if the matrix has a rows and b columns, it is converted into an ab-dimensional vector) and uses the result as the latent vector.
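The alternation between self-attention over cases and self-attention over features and labels could be sketched as follows, reusing the variable_feature_self_attention function from the earlier sketch. Swapping the first two tensor axes between the odd and even steps is an assumption about how the alternation is realized, and the feed-forward network, layer normalization, and residual terms of f^(b) are omitted for brevity.

```python
import numpy as np
# requires variable_feature_self_attention from the earlier sketch

def generate_latent_vectors(Z, layers):
    """Sketch of the latent vector generation unit 202.

    Z:      tensor of shape (N_cases, N_columns, channels).
    layers: list of (W_q, W_k, W_v) triples; odd steps attend over cases,
            even steps attend over feature/label columns (an assumption).
    Returns one latent vector per case.
    """
    for b, (W_q, W_k, W_v) in enumerate(layers):
        if b % 2 == 0:
            # Self-attention over cases: the first axis is the case axis.
            Z = variable_feature_self_attention(Z, W_q, W_k, W_v)
        else:
            # Self-attention over features and labels: make the column axis
            # the first (attended) axis, then swap back.
            Zt = np.swapaxes(Z, 0, 1)
            Zt = variable_feature_self_attention(Zt, W_q, W_k, W_v)
            Z = np.swapaxes(Zt, 0, 1)
    # Each case's slice of the final tensor is flattened into its latent vector.
    return Z.reshape(Z.shape[0], -1)

# Toy usage (channels kept at 4 so successive layers compose).
rng = np.random.default_rng(1)
Z0 = rng.normal(size=(5, 7, 4))
layers = [tuple(rng.normal(size=(4, 4)) for _ in range(3)) for _ in range(4)]
print(generate_latent_vectors(Z0, layers).shape)  # (5, 28)
```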
The prediction unit 203 uses the latent vector of each case and the labels assigned to the labeled features to predict the labels of the unlabeled features. For example, if the labels are discrete, the probability that the label is y can be calculated with a Gaussian mixture model as in Equation (9) (Math. 13). Here, z_n^U (n = 1, ..., N_U) is the latent vector of the case containing the n-th unlabeled feature, N_U is the number of unlabeled features, μ_c is the mean latent vector of label c (that is, the mean of the latent vectors of the cases with label c), and C is the number of labels appearing in the labeled features. Θ denotes the collection of learning target parameters (that is, the collection of parameters such as W_Q, W_K, W_V, W_O, W_R(b), and the parameters of FF^(b)); hereinafter, Θ is referred to as the model parameters.
Although a Gaussian mixture model is used above, any other classification or regression model, such as linear regression or a Gaussian process, may be used instead. When such a classification or regression model is used, the model parameters Θ may also include the parameters of that model.
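A minimal sketch of this prediction step for discrete labels is shown below. The equal-weight, equal-variance squared-Euclidean form of the mixture is an assumption; the description only specifies that a Gaussian mixture over the latent vectors, with per-label mean vectors μ_c, is one possible choice.

```python
import numpy as np

def predict_label_probs(z_unlabeled, z_labeled, y_labeled, num_classes):
    """Sketch of the prediction unit 203 for discrete labels.

    z_unlabeled: (N_U, d) latent vectors of unlabeled cases.
    z_labeled:   (N_L, d) latent vectors of labeled cases.
    y_labeled:   (N_L,) integer labels in [0, num_classes); assumes every
                 class appears at least once among the labeled cases.
    Returns an (N_U, num_classes) matrix of class probabilities.
    """
    # mu_c: mean latent vector of the labeled cases with label c.
    mu = np.stack([z_labeled[y_labeled == c].mean(axis=0)
                   for c in range(num_classes)])            # (C, d)
    # Negative squared distance to each class mean, normalized with softmax
    # (equal-weight, equal-variance Gaussian mixture; an assumption).
    d2 = ((z_unlabeled[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    logits = -0.5 * d2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy usage.
rng = np.random.default_rng(2)
zL = rng.normal(size=(6, 4)); yL = np.array([0, 0, 1, 1, 2, 2])
zU = rng.normal(size=(3, 4))
print(predict_label_probs(zU, zL, yL, num_classes=3).sum(axis=1))  # ~[1. 1. 1.]
```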
The meta-learning unit 204 learns the model parameters Θ so that the performance on the target task becomes high; that is, it learns Θ so that the expected test performance on the target task becomes high. This can be done, for example, by the optimization in Equation (10) (Math. 14), where y_n^U represents the label of the n-th unlabeled feature. This optimization can be realized by computing the expected value with the Monte Carlo method.
At meta-learning time, the output unit 205 outputs the learned model parameters Θ to a predetermined output destination. At label prediction time, the output unit 205 outputs the predicted labels to a predetermined output destination.
For example, the output unit 205 outputs the learned model parameters Θ and the predicted labels to the storage unit 206. Alternatively, the learned model parameters Θ and the predicted labels may be output to the display device 102, such as a display, or to a terminal device or the like connected via a communication network.
At meta-learning time, the storage unit 206 stores the given set D of training datasets. At label prediction time, it stores the given labeled feature set X_L, its label set Y_L, and the unlabeled feature set X_U. In addition, the storage unit 206 stores the model parameters Θ, hyperparameters (for example, the number of labeled features N_L and the number of unlabeled features N_U for each task), temporary information, and so on.
<Meta-learning processing>
An example of the meta-learning processing according to the present embodiment is described below with reference to FIG. 3.
First, the input unit 201 inputs the set D of training datasets and, for each task, the number of labeled features N_L and the number of unlabeled features N_U from the storage unit 206 (step S101). N_L and N_U may differ for each task.
Next, the meta-learning unit 204 initializes the model parameters Θ by any known method (step S102).
Next, the input unit 201 randomly selects one training dataset D^(t) from {D^(1), ..., D^(T)} (step S103).
Next, the input unit 201 randomly samples N_L features and their labels from the learning data set D^(t) selected in step S103, obtaining a labeled feature set X^L and its label set Y^L (step S104). If a sampled feature has no label, a specific value indicating the absence of a label (for example, 0) is treated as its label and included as an element of the label set Y^L.
Next, the input unit 201 randomly samples N_U features and their labels from the learning data set D^(t) selected in step S103, without overlapping the labeled feature set X^L and label set Y^L obtained in step S104, thereby obtaining an unlabeled feature set X^U and its label set Y^U (step S105). Although the learning data set D^(t) may also contain features to which no label has been assigned, such features are not sampled in this step.
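A minimal sketch of the episode sampling in steps S103 to S105 follows; the data-set layout, the function name, and the use of 0 as the missing-label placeholder on the support side are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

def sample_episode(datasets, n_labeled, n_unlabeled, rng):
    """Illustrative sketch of steps S103-S105."""
    d = datasets[rng.integers(len(datasets))]        # step S103: pick one learning data set at random
    X, y, observed = d["X"], d["y"], d["observed"]   # observed[i] is False if case i has no label

    idx = rng.permutation(len(X))
    support = idx[:n_labeled]                        # step S104: "labeled" set X^L, Y^L
    # step S105: query set must not overlap X^L and must have ground-truth labels
    rest = [i for i in idx[n_labeled:] if observed[i]]
    query = np.array(rest[:n_unlabeled], dtype=int)

    y_support = np.where(observed[support], y[support], 0)   # 0 marks "no label given"
    return X[support], y_support, observed[support], X[query], y[query]
```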
Next, the latent vector generation unit 202 takes the labeled feature set X^L, its label set Y^L, and the unlabeled feature set X^U as input and generates a latent vector for each case (step S106). If a feature with no label was sampled in step S104, then in the second slice of equation (5), for example, all the elements of the row corresponding to that case that indicate whether a label is observed are set to 0 (that is, the value representing "unobserved").
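The tensor of equation (5) is described in the claims as four matrices stacked along a first axis (values, observed flags, feature flags, label flags). A hedged sketch of such a construction, assuming a single label column rather than a one-hot block (the original layout may differ), could look like this:

```python
import numpy as np

def build_episode_tensor(X_support, y_support, y_observed, X_query):
    """Sketch of the stacked tensor fed to the attention blocks."""
    X = np.vstack([X_support, X_query]).astype(float)
    n_sup, d = len(X_support), X.shape[1]
    n = len(X)

    labels = np.zeros(n)
    labels[:n_sup] = y_support                         # query labels stay 0 (hidden)
    label_obs = np.zeros(n)
    label_obs[:n_sup] = y_observed.astype(float)       # also 0 for unlabeled support cases

    values   = np.concatenate([X, labels[:, None]], axis=1)               # 1st matrix: values
    observed = np.concatenate([np.ones((n, d)), label_obs[:, None]], 1)   # 2nd: observed or not
    is_feat  = np.concatenate([np.ones((n, d)), np.zeros((n, 1))], 1)     # 3rd: is a feature
    is_label = 1.0 - is_feat                                              # 4th: is a label
    return np.stack([values, observed, is_feat, is_label])                # shape (4, n, d + 1)
```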
Next, the prediction unit 203 predicts the labels of the features included in the unlabeled feature set X^U, using the latent vector of each case generated in step S106 and the labels included in the label set Y^L (step S107).
Next, the meta-learning unit 204 calculates the test performance by comparing the prediction results with the label set Y^U of the unlabeled feature set X^U (step S108). For example, the following is computed as the test performance:
(Equation; rendered as image JPOXMLDOC01-appb-M000015 in the original.)
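Because this expression is also an image in the original, and equation (10) maximizes its expectation, a natural candidate (again an assumption, not the original formula) is the mean log-likelihood of the held-out labels:

$$ \frac{1}{N_U} \sum_{n=1}^{N_U} \log p\!\left(y_n^U \mid z_n^U, X^L, Y^L, X^U; \Theta\right) $$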
Next, the meta-learning unit 204 learns (updates) the model parameters Θ so that the performance on the target task becomes high (step S109). For example, when the performance on the target task is evaluated by the expected test performance (the expected value of the test performance), the model parameters Θ are learned according to equation (10) above.
Next, the meta-learning unit 204 determines whether a predetermined termination condition is satisfied (step S110). If the termination condition is satisfied, the process proceeds to step S111; otherwise, it returns to step S103. Steps S103 to S109 are thus repeated until the termination condition is satisfied. Examples of the termination condition include the number of repetitions of steps S103 to S109 exceeding a predetermined threshold, or the model parameters Θ having converged.
If it is determined in step S110 that the predetermined termination condition is satisfied, the output unit 205 outputs the learned model parameters Θ to a predetermined output destination (step S111).
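Putting steps S101 to S111 together, a compact sketch of the meta-training loop might look as follows. It assumes a PyTorch-style model bundling the latent vector generation unit 202 and the prediction unit 203 (whose parameters correspond to Θ), plus the sample_episode sketch above; all interface names are illustrative assumptions:

```python
import numpy as np
import torch

def meta_train(datasets, model, optimizer, n_labeled, n_unlabeled,
               max_iters=10000, seed=0):
    """Illustrative sketch of the meta-learning loop (steps S101-S111)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_iters):                                   # S110: stop by iteration count
        Xs, ys, obs, Xq, yq = sample_episode(datasets, n_labeled,
                                             n_unlabeled, rng)   # S103-S105
        log_p = model(Xs, ys, obs, Xq)                           # S106-S107: log p(y = c) per query case
        yq_t = torch.as_tensor(yq, dtype=torch.long)
        test_perf = log_p[torch.arange(len(yq_t)), yq_t].mean()  # S108: mean log-likelihood
        loss = -test_perf                                        # S109: ascend the expected performance
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model                                                 # S111: learned parameters live in the model
```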
<Label prediction process>
An example of the label prediction process according to this embodiment will be described below with reference to FIG. 4. In the following, the model parameters Θ are assumed to have been learned.
First, the input unit 201 reads the labeled feature set X^L, its label set Y^L, and the unlabeled feature set X^U from the storage unit 206 (step S201).
Next, the latent vector generation unit 202 takes the labeled feature set X^L, its label set Y^L, and the unlabeled feature set X^U as input and generates a latent vector for each case (step S202).
Next, the prediction unit 203 predicts the labels of the features included in the unlabeled feature set X^U, using the latent vector of each case generated in step S202 and the labels included in the label set Y^L (step S203). Specifically, the labels of the unlabeled features may be sampled, for example, according to equation (9) above.
Then, the output unit 205 outputs the labels predicted in step S203 to a predetermined output destination (step S204).
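At label prediction time the same components are reused without gradient updates. A usage-style sketch with the same illustrative interfaces as above (the patent samples labels according to equation (9); the arg-max is shown here purely for simplicity):

```python
import torch

@torch.no_grad()
def predict_labels(model, X_labeled, y_labeled, X_unlabeled):
    """Illustrative sketch of steps S201-S204 with learned parameters."""
    observed = torch.ones(len(y_labeled), dtype=torch.bool)      # every support label is observed here
    log_p = model(X_labeled, y_labeled, observed, X_unlabeled)   # S202-S203
    return log_p.argmax(dim=-1)                                  # predicted label for each unlabeled case
```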
<Evaluation>
To evaluate the meta-learning device 10 according to this embodiment, it was compared with existing methods on artificial data. The existing methods were a Gaussian mixture model (GMM), a Gaussian process (GP), label propagation (LP), model-agnostic meta-learning (MAML), prototypical networks (Proto), heterogeneous meta-learning (HML), and meta label propagation (MetaLP).
The test accuracy (rate of correct answers) was used as the evaluation metric. The evaluation results (mean and standard error) are shown in Table 1 below.
(Table 1; rendered as image JPOXMLDOC01-appb-T000016 in the original.)
Here, "proposed method" denotes the meta-learning device 10 according to this embodiment, and "Shot" denotes the number of labeled features per task.
As shown in Table 1 above, the meta-learning device 10 according to this embodiment achieves a higher test accuracy than the existing methods.
<Summary>
As described above, when given a collection of multiple learning data sets with different feature spaces that may include unlabeled learning data, the meta-learning device 10 according to this embodiment can learn the model parameters while also making use of the unlabeled learning data. Therefore, with the meta-learning device 10 according to this embodiment, high performance can be achieved on the target task even when only a small amount of learning data is available for that task.
The present invention is not limited to the specifically disclosed embodiments described above; various modifications, changes, and combinations with known techniques are possible without departing from the scope of the claims.
10 Meta-learning device
101 Input device
102 Display device
103 External I/F
103a Recording medium
104 Communication I/F
105 RAM
106 ROM
107 Auxiliary storage device
108 Processor
109 Bus
201 Input unit
202 Latent vector generation unit
203 Prediction unit
204 Meta-learning unit
205 Output unit
206 Storage unit
210 Variable-feature self-attention mechanism unit

Claims (8)

1. A meta-learning method executed by a computer, the method comprising:
an input procedure of inputting a plurality of learning data sets, each composed of learning data that includes at least a feature amount of a case, wherein learning data that does not include a label for the feature amount may be included and the feature spaces of the feature amounts may differ;
a first selection procedure of selecting one learning data set from the plurality of learning data sets;
a second selection procedure of selecting, from the one learning data set, a first feature amount to be labeled data and a first label for the first feature amount, and a second feature amount to be unlabeled data and a second label for the second feature amount;
a generation procedure of generating, using a learning target parameter, the labeled data, and the unlabeled data, a latent vector of each case represented by the first feature amount or the second feature amount;
a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and
a learning procedure of learning the learning target parameter using a result of predicting the label for the unlabeled data and the second label.
2. The meta-learning method according to claim 1, wherein the generation procedure:
creates, using the labeled data and the unlabeled data, a tensor that distinguishes between feature amounts and labels and also distinguishes between the labeled data and the unlabeled data; and
after alternately applying attention over the cases and attention over the feature amounts and labels to the tensor a predetermined number of times, generates the slice of the resulting tensor for each case as the latent vector of that case.
3. The meta-learning method according to claim 2, wherein the generation procedure creates the tensor whose slices along a first axis are a first matrix representing information on the feature amounts or labels, a second matrix representing information on whether the feature amounts or labels are observed, a third matrix representing information on whether each element corresponds to a feature amount, and a fourth matrix representing information on whether each element corresponds to a label.
4. The meta-learning method according to claim 3, wherein:
when no label exists for the first feature amount, the second selection procedure sets, as the first label for the first feature amount, a label having a predetermined value indicating that no label exists for the first feature amount; and
when no label exists for the first feature amount, the generation procedure sets, in the element of the second matrix that indicates whether the label of the first feature amount is observed, information indicating that the label is not observed.
5. The meta-learning method according to any one of claims 2 to 4, wherein the generation procedure includes an attention mechanism procedure of taking as input a tensor whose dimension along a predetermined axis is variable and computing attention over the tensor with a neural network, and
the attention over the cases and the attention over the feature amounts and labels are computed by the attention mechanism procedure.
6. The meta-learning method according to claim 5, wherein the learning target parameter includes parameters of the neural network.
7. A meta-learning device comprising:
an input unit configured to input a plurality of learning data sets, each composed of learning data that includes at least a feature amount of a case, wherein learning data that does not include a label for the feature amount may be included and the feature spaces of the feature amounts may differ;
a first selection unit configured to select one learning data set from the plurality of learning data sets;
a second selection unit configured to select, from the one learning data set, a first feature amount to be labeled data and a first label for the first feature amount, and a second feature amount to be unlabeled data and a second label for the second feature amount;
a generation unit configured to generate, using a learning target parameter, the labeled data, and the unlabeled data, a latent vector of each case represented by the first feature amount or the second feature amount;
a prediction unit configured to predict a label for the unlabeled data using the latent vectors; and
a learning unit configured to learn the learning target parameter using a result of predicting the label for the unlabeled data and the second label.
8. A program that causes a computer to execute:
an input procedure of inputting a plurality of learning data sets, each composed of learning data that includes at least a feature amount of a case, wherein learning data that does not include a label for the feature amount may be included and the feature spaces of the feature amounts may differ;
a first selection procedure of selecting one learning data set from the plurality of learning data sets;
a second selection procedure of selecting, from the one learning data set, a first feature amount to be labeled data and a first label for the first feature amount, and a second feature amount to be unlabeled data and a second label for the second feature amount;
a generation procedure of generating, using a learning target parameter, the labeled data, and the unlabeled data, a latent vector of each case represented by the first feature amount or the second feature amount;
a prediction procedure of predicting a label for the unlabeled data using the latent vectors; and
a learning procedure of learning the learning target parameter using a result of predicting the label for the unlabeled data and the second label.
Application PCT/JP2022/032222 (Meta-learning method, meta-learning device, and program), filed 2022-08-26 with priority date 2022-08-26, was published as WO2024042707A1 on 2024-02-29. Family ID: 90012941. Country status: WO (1).
Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019185748A * 2018-04-12 2019-10-24 Baidu USA LLC System and method for learning interactive language
JP2020071694A * 2018-10-31 2020-05-07 Hitachi, Ltd. Computer system
US20200364302A1 * 2019-05-15 2020-11-19 Captricity, Inc. Few-shot language model training and implementation
WO2021250751A1 * 2020-06-08 2021-12-16 Nippon Telegraph and Telephone Corporation Learning method, learning device, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ren, Mengye; Triantafillou, Eleni; Ravi, Sachin; Snell, Jake; Swersky, Kevin; Tenenbaum, Joshua B.; Larochelle, Hugo; Zemel, Richard S.: "Meta-Learning for Semi-Supervised Few-Shot Classification", ICLR 2018, arXiv.org, 1 March 2018, pages 1-15, XP093142296, retrieved from <https://openreview.net/pdf?id=HJcSzz-CZ> [retrieved on 2024-03-18], DOI: 10.48550/arxiv.1803.00676 *
Sun, Shengli; Sun, Qingfeng; Zhou, Kevin; Lv, Tengchao: "Hierarchical Attention Prototypical Networks for Few-Shot Text Classification", Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Stroudsburg, PA, USA, 2019, pages 476-485, XP093142298, DOI: 10.18653/v1/D19-1045 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956533

Country of ref document: EP

Kind code of ref document: A1