WO2023223509A1 - Learning device, learning method, and learning program - Google Patents
Learning device, learning method, and learning program
- Publication number
- WO2023223509A1 (PCT/JP2022/020859; JP2022020859W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature selection
- data
- learning
- feature
- labeled
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 47
- 238000012549 training Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 abstract description 20
- 230000008569 process Effects 0.000 description 18
- 238000000605 extraction Methods 0.000 description 15
- 230000006870 function Effects 0.000 description 15
- 239000013598 vector Substances 0.000 description 15
- 239000011159 matrix material Substances 0.000 description 11
- 238000010586 diagram Methods 0.000 description 10
- 238000003860 storage Methods 0.000 description 10
- 238000012360 testing method Methods 0.000 description 9
- 238000013528 artificial neural network Methods 0.000 description 8
- 238000004891 communication Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 230000006399 behavior Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000000877 morphologic effect Effects 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the present invention relates to a learning device, a learning method, and a learning program.
- FIG. 7 is a diagram explaining supervised feature selection.
- supervised feature selection is a technique that uses supervised machine learning to extract important features from the features of labeled data (see Non-Patent Documents 1 and 2).
- in labeled data, for example in the case of text classification, each sample is a sentence, and the label represents the content of the sentence (politics, sports, etc.)
- Non-Patent Documents 1 and 2 require a large amount of labeled data in order to perform accurate feature selection.
- the technique described in Non-Patent Document 2 uses NN (Neural Networks). Since NN generally requires a large amount of data, the technique described in Non-Patent Document 2 requires even more data.
- the present invention has been made in view of the above, and its purpose is to provide a learning device, a learning method, and a learning program that enable accurate feature selection even when the amount of labeled data is small.
- the learning device includes an acquisition unit that acquires related data having the same feature configuration as the data to be processed for feature selection but different feature values, and a learning unit that trains the feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- FIG. 1 is a diagram illustrating an overview of a feature selection device according to an embodiment.
- FIG. 2 is a diagram schematically showing an example of the configuration of the feature selection device according to the embodiment.
- FIG. 3 is a flowchart showing the processing procedure of the learning process.
- FIG. 4 is a flowchart showing the procedure of selection processing.
- FIG. 5 is a diagram for explaining the processing of the learning section.
- FIG. 6 is a diagram illustrating an example of a computer that implements a feature selection device by executing a program.
- FIG. 7 is a diagram illustrating supervised feature selection.
- the feature selection device utilizes a plurality of related, sufficiently labeled datasets to accurately select features from a small amount of labeled data, that is, the target dataset from which features are to be selected.
- FIG. 1 is a diagram illustrating an overview of a feature selection device according to an embodiment.
- in the learning phase, the feature selection device uses only data from the related datasets to learn a model that accurately selects features from the small amount of labeled data (third labeled data) to be processed for feature selection.
- the feature selection device also inputs a sufficient amount of labeled data (second labeled data) of the related dataset t to the feature selection model and performs feature selection ((2) in FIG. 1).
- the sufficient amount of labeled data is substantially larger than the small amount of labeled data.
- the feature selection device trains the feature selection model so that the result of feature selection from the small amount of labeled data (feature F1) matches the result of feature selection from the sufficient amount of labeled data (feature F2) of the related dataset t ((3) in FIG. 1).
- the feature selection device selects appropriate features by inputting the target data set (a small amount of labeled data) to the learned feature selection model.
- the feature selection device uses a plurality of related data sets to teach the feature selection model how to select features well from a small amount of labeled data.
- the feature selection device can appropriately select features even for a small amount of labeled data.
- a related dataset is a dataset that has the same feature quantities (names) as the target dataset, such as images of the same subject with different colors, but whose conditions differ so that the distribution of the values of each feature is different.
- FIG. 2 is a diagram schematically showing an example of the configuration of the feature selection device according to the embodiment.
- the feature selection device 1 (learning device) according to the embodiment is realized, for example, by loading a predetermined program into a computer including a ROM (Read Only Memory), a RAM (Random Access Memory), and a CPU (Central Processing Unit), and having the CPU execute the program. Further, the feature selection device 1 has a communication interface for transmitting and receiving various information to and from other devices connected via a network or the like.
- the feature selection device 1 is realized by a general-purpose computer such as a workstation or a personal computer. As shown in FIG. 2, the feature selection device 1 includes a learning section 10 that performs learning processing and a selection section 20 that performs feature selection processing.
- the learning unit 10 trains the feature selection model 141 using a plurality of related data sets (labeled data).
- the selection unit 20 uses the obtained feature selection model 141 to select appropriate features from the target data set (a small amount of labeled data).
- the selection unit 20 may be implemented in the same hardware as the learning unit 10, or may be implemented in different hardware.
- the learning section 10 includes a learning data input section 11 (acquisition section), a feature extraction section 12, a feature selection model learning section 13 (learning section), and a storage section 14.
- the learning data input unit 11 is realized using an input device such as a keyboard or a mouse, and inputs various instruction information to the control unit in response to input operations by an operator.
- the learning data input unit 11 functions as an acquisition unit, and receives as input related datasets (labeled samples) that have the same feature configuration as the target dataset to be processed for feature selection but have different feature values.
- the related data set may be input to the learning unit 10 from an external server device or the like via a communication control unit (not shown) implemented by a NIC (Network Interface Card) or the like.
- the feature extraction unit 12 converts each labeled sample of the acquired related data set into a feature vector.
- the feature vector is a representation of the features of necessary data as an n-dimensional numerical vector.
- the feature extraction unit 12 performs conversion into a feature vector using a method commonly used in machine learning. For example, when the data is text, the feature extraction unit 12 can apply a method using morphological analysis, a method using n-grams, a method using delimiters, etc.
- the feature selection model learning unit 13 learns a feature selection model 141 that executes feature selection suitable for each data set using the data after feature extraction.
- the feature selection model 141 is a model that selects important features from labeled data.
- the feature selection model learning unit 13 performs pseudo learning of the feature selection model using the related data set from which the feature extraction unit 12 has extracted the features.
- the feature selection model learning unit 13 randomly selects a small amount of labeled data (sample for pseudo learning) and a sufficient amount of labeled data (sample for pseudo test) from the related data set. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using a small amount of labeled data matches the result of performing feature selection using a sufficient amount of labeled data.
- a kernel method-based feature selection model or a NN (Neural Networks)-based model is applied as the feature selection model 141.
- the storage unit 14 is realized by a semiconductor memory device such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk.
- the learned feature selection model 141 is stored in the storage unit 14 .
- the selection section 20 includes a data input section 21, a feature extraction section 22, a feature selection section 23, and a result output section 24.
- the data input unit 21 is realized using an input device such as a keyboard or a mouse, and inputs various instruction information to the control unit and receives target data sets in response to input operations by an operator.
- the data input unit 21 outputs the input target data set to the feature extraction unit 22.
- the target data set is a data set to be processed for feature quantity selection, and consists of a small amount of labeled data (third labeled data).
- the target data set may be input to the selection unit 20 from an external server device or the like via a communication control unit (not shown) implemented by a NIC or the like.
- the data input section 21 may be the same hardware as the learning data input section 11.
- the feature extraction unit 22 converts each labeled sample of the acquired target data set into a feature vector in preparation for processing in the feature selection unit 23.
- the feature selection unit 23 functions as a selection unit, and uses the learned feature selection model 141 to select important feature quantities from the target data set, which is the data to be processed for feature selection.
- the result output unit 24 is realized by a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, etc., and outputs the result of the feature selection process to the operator. For example, the result output unit 24 outputs important features selected from the input target data set.
- the feature selection process of the feature selection device 1 includes a learning process by the learning unit 10 and a selection process by the selection unit 20.
- FIG. 3 is a flowchart showing the processing procedure of the learning process.
- the flowchart in FIG. 3 is started, for example, at the timing when the user inputs an operation instructing to start the learning process.
- the learning data input unit 11 receives a plurality of related data sets (labeled data) as input (step S1).
- the feature extraction unit 12 converts each labeled sample of the input related data set into a feature vector (step S2).
- the feature selection model learning unit 13 learns a feature selection model 141 that executes feature selection suitable for each data set using the data after feature extraction (step S3).
- the feature selection model learning unit 13 randomly selects a small amount of labeled data and a sufficient amount of labeled data from each related dataset, using the data after the feature extraction unit 12 has extracted the features. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using the small amount of labeled data matches the result of performing feature selection using the sufficient amount of labeled data.
- the feature selection model learning unit 13 stores the learned feature selection model 141 in the storage unit 14.
- FIG. 4 is a flowchart showing the procedure of selection processing.
- the flowchart in FIG. 4 is started, for example, at the timing when the user inputs an operation instructing to start the selection process.
- the data input unit 21 receives input of the target dataset to be processed (a small amount of labeled data) (step S11), and the feature extraction unit 22 converts each sample of the received target dataset into a feature vector (step S12).
- the feature selection unit 23 executes feature selection from the target data set using the feature selection model 141 (step S13). Then, the result output unit 24 outputs the feature selection result by the feature selection unit 23 (step S14).
- the feature selection device 1 acquires related data that has the same feature quantity structure as the data to be processed for feature selection, but has different feature quantity values.
- the feature selection device 1 trains the feature selection model so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the feature selection model.
- by learning from the related datasets instead of the target dataset (a small amount of labeled data), the feature selection device 1 can select important features with high accuracy without performing, for each target dataset, re-learning that requires expensive computation.
- the feature selection device 1 is able to select important features with high precision even for a small target data set by utilizing useful information from related data sets.
- likewise, the feature selection device 1 trains the feature selection model 141 so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the model matches the result obtained by inputting a sufficient amount of labeled data selected from the related data.
- the feature selection device 1 causes the feature selection model 141 to learn how to select a good feature from a small amount of labeled data. Therefore, by using the feature selection model 141, the feature selection device 1 can select important features accurately and at low cost even when only a small amount of labeled data sets to be processed are obtained.
- y_n represents a real value in the case of a regression problem and a discrete value in the case of a classification problem.
- for the sake of simplicity, we will focus on regression problems.
- the dimension M of the feature vector is the same for all datasets.
- the problem is the same for all datasets. In other words, you don't have a regression problem in one dataset and a classification problem in another.
- the objective here is to select at most K features suitable for a target dataset S that is not included in the related datasets and is given in the feature selection phase.
-  ̄K_S^(d) (Equation (5)) is the centered Gram matrix for the d-th feature; its (i,j) component is determined by the S-dependent kernel K^(d)(S)(x_i^(d), x_j^(d)).
- the first term of the objective function (Equation (3)) can be decomposed as shown in Equation (6).
- equation (7) represents the estimator of the HSIC (Hilbert-Schmidt Independence Criterion) of the random variables X and Y computed from S.
- in other words, if HSIC > 0, then X and Y have a (nonlinear) correlation. Based on this property, Equation (6) is examined.
- Equation (8) represents the correlation between the d-th feature and the label.
- when Equation (8) is positive, α_d takes a large value in order to minimize Equation (6). Conversely, in the case of Equation (9), that is, when the two are unrelated, α_d tends to become 0 due to the effect of the l1 regularization described above.
- α_d can therefore be interpreted as the importance of the d-th feature in predicting y.
- if the second term of Equation (6) (Equation (10)) is positive, that is, if the d-th and d'-th features are correlated, at least one of α_d and α_d' is likely to become 0. This means that features that are redundant in explaining y can be automatically eliminated.
- f and g are arbitrary neural networks, and [,] represents the concatenation of two vectors. Since the sum over f does not depend on the order of the samples in S, Equation (12) defines a single vector z for the set S. Any other permutation-invariant aggregation (for example, a maximum, or a set transformer) may be used instead.
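- As an illustration only, and not the patent's exact architecture, a permutation-invariant set embedding of the kind described by Equation (12) can be sketched as follows; the layer sizes and the class name SetEmbedding are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class SetEmbedding(nn.Module):
    """Permutation-invariant embedding z of a labeled set S = {(x_n, y_n)}.

    Computes z = g( sum_n f([x_n, y_n]) ): because the per-sample outputs of f
    are summed, z does not depend on the order of the samples in S.
    """

    def __init__(self, dim_x: int, dim_hidden: int = 64, dim_z: int = 32):
        super().__init__()
        # f acts on the concatenation [x_n, y_n] of a single sample.
        self.f = nn.Sequential(nn.Linear(dim_x + 1, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_hidden))
        # g acts on the aggregated representation.
        self.g = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_z))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (N, D) features, y: (N, 1) labels -> z: (dim_z,)
        h = self.f(torch.cat([x, y], dim=1))  # per-sample representations
        return self.g(h.sum(dim=0))           # summing makes the result order-invariant
```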
- ⁇ (z) (Equation (14)) and ⁇ (z) (Equation (15)) are each modeled by a neural network that takes z as an input.
- equation (9) is immediately derived from the definition of equation (8).
- ⁇ d becomes 0 due to equation (3) and l1 regularization.
- ⁇ y is a hyperparameter.
- ISTA (Iterative Shrinkage-Thresholding Algorithm)
- each optimization update equation can be written in closed form and is differentiable. Being writable in closed form has the advantage that learning can be carried out efficiently, and differentiability is a necessary condition for learning the feature selection model by stochastic gradient descent, as in the training of a general NN. Any optimization method other than ISTA can be used as long as its update formula can be written in a differentiable form.
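- The following is a hedged sketch of a generic ISTA iteration for an l1-regularized least-squares problem. It only illustrates why each update is a closed-form, differentiable expression that stochastic gradient descent can back-propagate through; the function names and the simplified objective are assumptions, not the patent's Equations (16) and (17).

```python
import torch

def soft_threshold(v: torch.Tensor, thr: float) -> torch.Tensor:
    """Proximal operator of the l1 norm: closed form, differentiable almost everywhere."""
    return torch.sign(v) * torch.clamp(torch.abs(v) - thr, min=0.0)

def ista(A: torch.Tensor, b: torch.Tensor, lam: float, step: float,
         alpha0: torch.Tensor, n_iter: int = 10) -> torch.Tensor:
    """Minimize 0.5 * ||A @ alpha - b||^2 + lam * ||alpha||_1 by ISTA.

    Every iteration is an explicit expression, so when ISTA is unrolled inside
    a larger model the whole loop remains differentiable.
    """
    alpha = alpha0
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - b)            # gradient of the smooth term
        alpha = soft_threshold(alpha - step * grad, step * lam)
        alpha = torch.clamp(alpha, min=0.0)     # keep importances non-negative, as in the text
    return alpha
```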
- [Learning phase] We describe the method for learning the model using the related datasets.
- let S denote the sample set drawn from a related dataset.
- the same symbols as the feature selection phase are used, but these are different data.
- the learning parameters of the model are the parameters of the neural networks f, g, ⁇ , and ⁇ , the initial parameter ⁇ 0 of ISTA, and the regularization parameter ⁇ .
- ⁇ can be calculated efficiently when selecting features.
- the objective functions are equations (18) and (19).
- ⁇ * d represents the importance of the d-th feature obtained after running ISTA from S for I iterations.
- the pseudo small amount of training data (a small amount of labeled data) obtained by randomly sampling from the related dataset D_t and the pseudo test data (a sufficient amount of labeled data) are denoted by S and Q, respectively.
-  ̄L_Q (Equation (20)) is the centered Gram matrix on the pseudo test data Q with respect to the label y, and the (i,j) component of the matrix L_Q is determined by the kernel function L(y_i, y_j).
- Γ is the centering matrix.
-  ̄K_Q^(d) (Equation (21)) is the centered Gram matrix on the pseudo test data Q for the d-th feature, and the (i,j) component of the matrix K_Q^(d) is determined by the kernel K^(d)(x_i^(d), x_j^(d)).
- the same kernel as in Equation (3) is used as the kernel function L for the label y.
- as the kernel for feature d, we use the RBF kernel of Equation (22), which is often used with HSIC.
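- Equation (22) is an image in the published document; based on the surrounding description it presumably has the standard RBF form below (a reconstruction for readability, not the verbatim published equation).

```latex
k^{(d)}\bigl(x_i^{(d)}, x_j^{(d)}\bigr)
  = \exp\!\Bigl(-\gamma_x \bigl(x_i^{(d)} - x_j^{(d)}\bigr)^{2}\Bigr)
  \qquad \text{(22)}
```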
- ⁇ x is a hyperparameter. Unlike Equation (3), by using the same kernel for all features, all features can be evaluated equally without bias.
- Equation (17) takes a small value when α*_d obtained from the pseudo small amount of training data (the small amount of labeled data S) matches that obtained from the pseudo test data Q (the sufficient amount of labeled data).
- by applying the present learning method, it becomes possible to accurately select features from the pseudo small amount of training data S, which has only a small amount of labels.
- FIG. 5 is a diagram for explaining the processing of the learning section 10.
- FIG. 5 exemplifies the pseudo code of the processing of the learning unit 10.
- the learning unit 10 takes the related datasets D as input and obtains a small amount of labeled data (pseudo training data) S (number of samples N_S) and a sufficient amount of labeled data (pseudo test data) Q (number of samples N_Q) (Algorithm 1).
- the learning unit 10 randomly selects a sample task t, a small amount of labeled data S, and a sufficient amount of labeled data Q (lines 2-4 of Algorithm 1).
- the learning unit 10 calculates a vector z from a small amount of labeled data S using equation (12) (fifth line of Algorithm 1).
- the learning unit 10 calculates the kernel of Equation (13) using the vector representation z of the small amount of labeled data S (line 6 of Algorithm 1).
- the learning unit 10 obtains the global optimal solution of the objective function (Equation (3)) by alternately repeating the update formulas of Equation (16) and Equation (17) described using ISTA (lines 7 to 9 of Algorithm 1).
- the learning unit 10 calculates the loss for the sufficient amount of labeled data Q based on Equations (18) and (19) (line 10 of Algorithm 1).
- the learning unit 10 updates the parameters of the feature selection model based on the calculated loss (line 11 of Algorithm 1).
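- For orientation, the loop that Algorithm 1 describes can be sketched as below. This is a simplified illustration under assumptions: the helper names (set_embedding, kernels, ista_feature_selection, hsic_loss) are hypothetical, and the actual computations follow the patent's Equations (12) to (20).

```python
import random
import torch

def train_feature_selection_model(datasets, model, optimizer,
                                  n_steps: int = 1000,
                                  n_small: int = 10, n_large: int = 200):
    """Meta-training over related datasets (sketch of Algorithm 1).

    `model` is assumed to bundle the set-embedding networks, the kernel and
    importance parameters, and the differentiable ISTA updates.
    """
    for _ in range(n_steps):
        # Lines 2-4: sample a task t, pseudo training data S, pseudo test data Q.
        task = random.choice(datasets)
        S = task.sample(n_small)    # small amount of labeled data
        Q = task.sample(n_large)    # sufficient amount of labeled data

        # Line 5: vector representation z of S (Equation (12)).
        z = model.set_embedding(S.x, S.y)

        # Line 6: S-dependent kernels (Equation (13)).
        kernels = model.kernels(S, z)

        # Lines 7-9: differentiable ISTA gives the feature importances alpha*.
        alpha = model.ista_feature_selection(S, kernels)

        # Line 10: loss on Q (Equations (18)-(19)); it is small when the features
        # selected from S also explain the labels of Q.
        loss = model.hsic_loss(alpha, Q)

        # Line 11: update the model parameters.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```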
- the feature selection device provides specific improvements over conventional feature selection methods such as those described in Non-Patent Documents 1 and 2, and shows improvement in the technical field of feature selection and its performance evaluation.
- Each component of the feature selection device 1 is functionally conceptual, and does not necessarily need to be physically configured as shown.
- the specific form of distributing and integrating the functions of the feature selection device 1 is not limited to what is shown in the figure; all or part of them can be functionally or physically distributed or integrated in arbitrary units depending on various loads and usage conditions.
- each process performed in the feature selection device 1 may be realized by a CPU, a GPU (Graphics Processing Unit), or a program that is analyzed and executed by the CPU and GPU.
- each process performed in the feature selection device 1 may be realized as hardware using wired logic.
- FIG. 6 is a diagram showing an example of a computer that implements the feature selection device 1 by executing a program.
- Computer 1000 includes, for example, memory 1010 and CPU 1020.
- the computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.
- the memory 1010 includes a ROM 1011 and a RAM 1012.
- the ROM 1011 stores, for example, a boot program such as BIOS (Basic Input Output System).
- Hard disk drive interface 1030 is connected to hard disk drive 1090.
- Disk drive interface 1040 is connected to disk drive 1100.
- Serial port interface 1050 is connected to, for example, mouse 1110 and keyboard 1120.
- Video adapter 1060 is connected to display 1130, for example.
- the hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process of the feature selection device 1 is implemented as a program module 1093 in which code executable by the computer 1000 is written.
- Program module 1093 is stored in hard disk drive 1090, for example.
- a program module 1093 for executing processing similar to the functional configuration of the feature selection device 1 is stored in the hard disk drive 1090.
- the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- the setting data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 and executes them as necessary.
- program module 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like.
- the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.).
- the program module 1093 and program data 1094 may then be read by the CPU 1020 from another computer via the network interface 1070.
- (Supplementary Note 1) A learning device comprising a processor configured to execute: acquiring related data that has the same feature configuration as the data to be processed for feature selection but has different feature values; and training the feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- (Supplementary Note 2) The learning device according to Supplementary Note 1, wherein the second labeled data has a larger amount of data than the first labeled data.
- (Supplementary Note 3) A non-transitory storage medium storing a program executable by a computer to perform a learning process, the learning process comprising: acquiring related data that has the same feature configuration as the data to be processed for feature selection but has different feature values; and training the feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- 1 Feature selection device, 10 Learning unit, 11 Learning data input unit, 12, 22 Feature extraction unit, 13 Feature selection model learning unit, 14 Storage unit, 20 Selection unit, 21 Data input unit, 23 Feature selection unit, 24 Result output unit, 141 Feature selection model
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
In the present invention, a feature selection device (1) includes: a learning data input unit (11) that acquires related data which has the same feature configuration as the data targeted for feature selection processing but different feature values; and a feature selection model learning unit (13) that trains a feature selection model such that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the feature selection model.
Description
The present invention relates to a learning device, a learning method, and a learning program.
FIG. 7 is a diagram explaining supervised feature selection. As shown in FIG. 7, supervised feature selection is a technique that uses supervised machine learning to extract important features from the features of labeled data (see Non-Patent Documents 1 and 2). In labeled data, for example in the case of text classification, each sample is a sentence and the label represents the content of the sentence (politics, sports, etc.). The feature value of each sample is, for example, the frequency of appearance of each word in the sentence. As an example, sentence = {"Kubo": 5, "Japan national team": 3, "Prime Minister": 0, ...} with label = "soccer".
By extracting only important features, interpretability in data analysis is improved, and post-processing such as clustering can be sped up by targeting only the extracted features.
The techniques described in Non-Patent Documents 1 and 2 require a large amount of labeled data in order to perform accurate feature selection. In particular, the technique described in Non-Patent Document 2 uses neural networks (NN). Since NNs generally require a large amount of data, the technique described in Non-Patent Document 2 requires even more data.
However, in real problems it often happens that a large amount of data cannot be prepared. For example, when analyzing purchasing behavior from user data, little data can be obtained from new users or from users who use the service infrequently. Similarly, when analyzing the characteristics of a new device from its data, the analysis cannot be performed immediately because there is not yet enough data for the new device.
In such cases, conventional techniques cannot appropriately select features that are useful for data analysis, which makes it difficult to apply feature selection.
The present invention has been made in view of the above, and its purpose is to provide a learning device, a learning method, and a learning program that enable accurate feature selection even when the amount of labeled data is small.
In order to solve the above problems and achieve this purpose, the learning device according to the present invention includes: an acquisition unit that acquires related data having the same feature configuration as the data to be processed for feature selection but different feature values; and a learning unit that trains a feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
According to the present invention, accurate feature selection is possible even when only a small amount of labeled data is available.
Hereinafter, one embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this embodiment. In the description of the drawings, the same parts are denoted by the same reference numerals. In the following, for a matrix A, the notation " ̄A" is used with the same meaning as the symbol " ̄" written directly above "A".
[Embodiment]
[Overview of the feature selection device]
The feature selection device according to the present embodiment utilizes a plurality of related, sufficiently labeled datasets to accurately select features from a small amount of labeled data, which is the target dataset from which features are to be selected.
FIG. 1 is a diagram illustrating an overview of the feature selection device according to the embodiment. As shown in FIG. 1, in the learning phase the feature selection device uses only data from the related datasets to learn a model that accurately selects features from the small amount of labeled data (third labeled data) to be processed for feature selection.
Specifically, in the learning phase, the feature selection device randomly extracts a small amount of labeled data (first labeled data) from a related dataset t (t = 1, ..., T) ((1) in FIG. 1), inputs it to the feature selection model, and performs feature selection ((2) in FIG. 1).
In the learning phase, the feature selection device also inputs a sufficient amount of labeled data (second labeled data) of the related dataset t to the feature selection model and performs feature selection ((2) in FIG. 1). The sufficient amount of labeled data is substantially larger than the small amount of labeled data.
In the learning phase, the feature selection device trains the feature selection model so that the result of feature selection from the small amount of labeled data (feature F1) matches the result of feature selection from the sufficient amount of labeled data (feature F2) of the related dataset t ((3) in FIG. 1).
Then, in the test (feature selection) phase, the feature selection device selects appropriate features by inputting the target dataset (a small amount of labeled data) to the learned feature selection model.
In this way, the feature selection device uses a plurality of related datasets to teach the feature selection model how to select features well from a small amount of labeled data. By using the feature selection model learned in this way, the feature selection device can appropriately select features even for a small amount of labeled data.
A related dataset is a dataset that has the same feature quantities (names) as the target dataset, such as images of the same subject with different colors, but whose conditions differ so that the distribution of the values of each feature is different.
[Feature selection device]
FIG. 2 is a diagram schematically showing an example of the configuration of the feature selection device according to the embodiment. The feature selection device 1 (learning device) according to the embodiment is realized, for example, by loading a predetermined program into a computer including a ROM (Read Only Memory), a RAM (Random Access Memory), and a CPU (Central Processing Unit), and having the CPU execute the program. The feature selection device 1 also has a communication interface for transmitting and receiving various information to and from other devices connected via a network or the like. The feature selection device 1 is realized by a general-purpose computer such as a workstation or a personal computer. As shown in FIG. 2, the feature selection device 1 includes a learning section 10 that performs learning processing and a selection section 20 that performs feature selection processing.
The learning unit 10 trains the feature selection model 141 using a plurality of related datasets (labeled data).
When a target dataset (a small amount of labeled data) is given, the selection unit 20 uses the obtained feature selection model 141 to select appropriate features from the target dataset. The selection unit 20 may be implemented in the same hardware as the learning unit 10, or in different hardware.
[Learning unit]
The learning section 10 includes a learning data input section 11 (acquisition section), a feature extraction section 12, a feature selection model learning section 13 (learning section), and a storage section 14.
The learning data input unit 11 is realized using an input device such as a keyboard or a mouse, and inputs various instruction information to the control unit in response to input operations by an operator. The learning data input unit 11 functions as an acquisition unit, and receives as input related datasets (labeled samples) that have the same feature configuration as the target dataset to be processed for feature selection but have different feature values.
The related datasets may be input to the learning unit 10 from an external server device or the like via a communication control unit (not shown) implemented by a NIC (Network Interface Card) or the like.
The feature extraction unit 12 converts each labeled sample of the acquired related datasets into a feature vector. Here, a feature vector is a representation of the features of the data as an n-dimensional numerical vector. The feature extraction unit 12 performs the conversion into a feature vector using a method commonly used in machine learning. For example, when the data is text, the feature extraction unit 12 can apply a method using morphological analysis, a method using n-grams, a method using delimiters, and the like.
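As a simple, hedged illustration of the kind of conversion meant here, and not the patent's specific implementation, text samples can be turned into word-frequency feature vectors as follows; the whitespace split stands in for morphological analysis or n-gram extraction, and the vocabulary shown is hypothetical.

```python
from collections import Counter

def text_to_feature_vector(text: str, vocabulary: list[str]) -> list[float]:
    """Convert a sentence into an n-dimensional vector of word frequencies.

    The vocabulary fixes both the dimension and the order of the features.
    """
    counts = Counter(text.split())
    return [float(counts[word]) for word in vocabulary]

# A labeled sample in the spirit of the text's example
# {"Kubo": 5, "Japan national team": 3, "Prime Minister": 0, ...}, label = "soccer".
vocabulary = ["Kubo", "Japan", "soccer", "politics"]
vector = text_to_feature_vector("Kubo scored again for Japan", vocabulary)
print(vector)  # [1.0, 1.0, 0.0, 0.0]
```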
The feature selection model learning unit 13 learns a feature selection model 141 that executes feature selection suited to each dataset, using the data after feature extraction. The feature selection model 141 is a model that selects important features from labeled data.
Specifically, the feature selection model learning unit 13 performs pseudo learning of the feature selection model using the related datasets from which the feature extraction unit 12 has extracted the features. The feature selection model learning unit 13 randomly selects a small amount of labeled data (pseudo training samples) and a sufficient amount of labeled data (pseudo test samples) from a related dataset. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using the small amount of labeled data matches the result of performing feature selection using the sufficient amount of labeled data.
As the feature selection model 141, a kernel method-based feature selection model or an NN (Neural Networks)-based model is applied.
The storage unit 14 is realized by a semiconductor memory device such as a RAM or a flash memory, or by a storage device such as a hard disk or an optical disk. The learned feature selection model 141 is stored in the storage unit 14.
[Selection unit]
The selection section 20 includes a data input section 21, a feature extraction section 22, a feature selection section 23, and a result output section 24.
The data input unit 21 is realized using an input device such as a keyboard or a mouse; in response to input operations by an operator, it inputs various instruction information to the control unit and receives the target dataset. The data input unit 21 outputs the input target dataset to the feature extraction unit 22. The target dataset is the dataset to be processed for feature selection, and consists of a small amount of labeled data (third labeled data).
Note that the target dataset may be input to the selection unit 20 from an external server device or the like via a communication control unit (not shown) implemented by a NIC or the like. The data input section 21 may also be the same hardware as the learning data input section 11.
Similar to the feature extraction unit 12 of the learning unit 10, the feature extraction unit 22 converts each labeled sample of the acquired target dataset into a feature vector in preparation for the processing in the feature selection unit 23.
The feature selection unit 23 functions as a selection unit, and uses the learned feature selection model 141 to select important features from the target dataset, which is the data to be processed for feature selection.
The result output unit 24 is realized by a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like, and outputs the result of the feature selection process to the operator. For example, the result output unit 24 outputs the important features selected from the input target dataset.
[Processing procedure of the feature selection process]
Next, with reference to FIGS. 3 and 4, the procedure of the feature selection process performed by the feature selection device 1 will be described. The feature selection process of the feature selection device 1 includes a learning process performed by the learning unit 10 and a selection process performed by the selection unit 20.
[Learning process]
FIG. 3 is a flowchart showing the processing procedure of the learning process. The flowchart in FIG. 3 is started, for example, when the user inputs an operation instructing the start of the learning process.
As shown in FIG. 3, the learning data input unit 11 receives a plurality of related datasets (labeled data) as input (step S1). The feature extraction unit 12 converts each labeled sample of the input related datasets into a feature vector (step S2).
The feature selection model learning unit 13 learns the feature selection model 141, which executes feature selection suited to each dataset, using the data after feature extraction (step S3). The feature selection model learning unit 13 randomly selects a small amount of labeled data and a sufficient amount of labeled data from each related dataset, using the data after the feature extraction unit 12 has extracted the features. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using the small amount of labeled data matches the result of performing feature selection using the sufficient amount of labeled data.
The feature selection model learning unit 13 stores the learned feature selection model 141 in the storage unit 14.
[Selection process]
FIG. 4 is a flowchart showing the processing procedure of the selection process. The flowchart in FIG. 4 is started, for example, when the user inputs an operation instructing the start of the selection process.
The data input unit 21 receives input of the target dataset to be processed (a small amount of labeled data) (step S11), and the feature extraction unit 22 converts each sample of the received target dataset into a feature vector (step S12).
The feature selection unit 23 executes feature selection from the target dataset using the feature selection model 141 (step S13). Then, the result output unit 24 outputs the feature selection result obtained by the feature selection unit 23 (step S14).
[Effects of the embodiment]
The feature selection device 1 according to the embodiment acquires related data that has the same feature configuration as the data to be processed for feature selection but has different feature values. The feature selection device 1 trains the feature selection model so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the feature selection model.
In this way, by learning from related datasets instead of the target dataset (a small amount of labeled data), the feature selection device 1 can select important features with high accuracy without performing, for each target dataset, re-learning that requires expensive computation.
In other words, the feature selection device 1 can select important features with high precision even for a small target dataset, by utilizing the useful information in the related datasets.
The feature selection device 1 trains the feature selection model 141 so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the model.
With this, the feature selection device 1 causes the feature selection model 141 to learn how to select good features from a small amount of labeled data. Therefore, by using the feature selection model 141, the feature selection device 1 can select important features accurately and at low cost even when only a small labeled dataset is available for processing.
[Application example]
An application example of this embodiment will now be described concretely. First, let S (Equation (1)) be the target dataset (a small amount of labeled data).
Here, x_n = (x_n^(1), ..., x_n^(D))^T represents the D-dimensional feature vector of the n-th sample. y_n represents a real value in the case of a regression problem and a discrete value in the case of a classification problem. For the sake of simplicity, the following explanation focuses on regression problems.
Now, suppose that T related datasets (Equation (2)) are given in the learning phase.
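Equations (1) and (2) are images in the published document; based on the surrounding text they presumably denote the labeled target sample set and the collection of related datasets, roughly as follows (a reconstruction for readability, not the verbatim published equations):

```latex
S = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N},
\quad \mathbf{x}_n = (x_n^{(1)}, \ldots, x_n^{(D)})^{\mathsf{T}}
\qquad \text{(1)}

\mathcal{D} = \{ D_t \}_{t=1}^{T},
\quad D_t = \{(\mathbf{x}_n^{t}, y_n^{t})\}_{n=1}^{N_t}
\qquad \text{(2)}
```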
We assume that the dimension M of the feature vector is the same for all datasets. We also assume that the problem type is the same for all datasets; that is, it does not happen that one dataset poses a regression problem while another poses a classification problem. The objective here is, when a target dataset S that is not included in the related datasets is given in the feature selection phase, to select at most K features suitable for that dataset.
First, the feature selection model that performs feature selection from S is described (feature selection phase). The method for training this model is then described (learning phase).
[Feature selection phase]
In the feature selection phase, a non-negative real number α_d representing the importance of each feature is introduced and estimated from S. Feature selection is then performed by picking the K features with the largest α_d. To estimate α = (α_1, ..., α_D), consider the optimization problem of Equation (3).
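The selection step itself is straightforward once the importances have been estimated. The following minimal Python sketch assumes an importance vector alpha has already been computed; the function and variable names are illustrative and do not come from the specification.

```python
import numpy as np

def select_top_k(alpha: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the K features with the largest importance alpha_d."""
    return np.argsort(-alpha)[:k]  # sort by descending importance, keep the first K

# Example: with D = 5 features and K = 2, features 3 and 0 are selected.
alpha = np.array([0.40, 0.00, 0.05, 0.90, 0.10])
print(select_top_k(alpha, k=2))  # -> [3 0]
```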
Here, ‖·‖_F denotes the Frobenius norm, ‖·‖_1 denotes the ℓ1 norm, and λ is a positive real number. L̄_S (Equation (4)) is the centered Gram matrix for the label y; the (i, j) entry of the matrix L_S is given by the S-dependent kernel function L(S)(y_i, y_j). Γ is the centering matrix.
Similarly, K̄_S^(d) (Equation (5)) is the centered Gram matrix for the d-th feature; the (i, j) entry of the matrix K_S^(d) is given by the S-dependent kernel K^(d)(S)(x_i^(d), x_j^(d)). Concrete examples of the kernel functions are given later.
We now explain the objective function. The first term of the objective function (Equation (3)) can be decomposed as in Equation (6).
Here, Equation (7) denotes the estimate, computed from S, of the HSIC (Hilbert-Schmidt Independence Criterion) between the random variables X and Y.
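For reference, an empirical HSIC between one feature and the label can be computed from centered Gram matrices. The sketch below assumes RBF kernels and the common plug-in estimator HSIC ≈ tr(K̄ L̄)/(N−1)²; the exact kernels and normalization used in the embodiment are the ones defined by Equations (4), (5), and (7), so this is only an assumed stand-in.

```python
import numpy as np

def rbf_gram(v: np.ndarray, sigma: float) -> np.ndarray:
    """Gram matrix of an RBF kernel over a 1-D sample vector v of shape (N,)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(gram: np.ndarray) -> np.ndarray:
    """Centered Gram matrix: Gamma @ gram @ Gamma, with Gamma = I - (1/N) * 1 1^T."""
    n = gram.shape[0]
    gamma = np.eye(n) - np.ones((n, n)) / n
    return gamma @ gram @ gamma

def hsic_estimate(x_d: np.ndarray, y: np.ndarray, sigma_x: float = 1.0, sigma_y: float = 1.0) -> float:
    """Plug-in HSIC estimate between the d-th feature x_d and the label y."""
    k_bar = center(rbf_gram(x_d, sigma_x))
    l_bar = center(rbf_gram(y, sigma_y))
    n = len(y)
    return float(np.trace(k_bar @ l_bar)) / (n - 1) ** 2
```

Estimates of this kind are the building blocks of the decomposition in Equation (6).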
HSIC has the following property: X and Y are independent ⇔ HSIC = 0.
In other words, if HSIC > 0, then X and Y have a (possibly nonlinear) correlation. With this in mind, Equation (6) is examined below.
Equation (8) represents the correlation between the d-th feature and the label.
When Equation (8) is positive, α_d takes a large value in order to minimize Equation (6). Conversely, in the case of Equation (9), that is, when the feature and the label are unrelated, α_d tends to become 0 because of the ℓ1 regularization term.
That is, α_d can be interpreted as the importance of the d-th feature for predicting y. Furthermore, when the second term of Equation (6) (Equation (10)) is positive, that is, when the d-th and d′-th features are correlated, at least one of α_d and α_d′ tends to become 0 so as to reduce the objective function. This means that features that are redundant for explaining y are eliminated automatically.
As explained above, the desired result is obtained by minimizing the objective function. This, however, presupposes that the HSIC estimate (Equation (11)) is accurate.
In this embodiment, S is a small amount of data, so the HSIC estimate (Equation (11)) is usually inaccurate. To resolve this, a kernel function suited to feature selection from a small amount of data is constructed below. First, a vector representation of S is extracted by Equation (12).
Here, f and g are arbitrary neural networks, and [·,·] denotes the concatenation of two vectors. Because the sum over f does not depend on the order of the samples in S, Equation (12) assigns a single vector z to the set S. Any permutation-invariant neural network other than this form (for example, one based on a max operation, or a set transformer) may also be used.
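A permutation-invariant encoder of this kind (a "deep sets" style architecture) might look like the following sketch, in which f and g are small multilayer perceptrons; the layer sizes and class name are placeholders rather than values from the specification.

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """z = g( sum_n f([x_n, y_n]) ): a permutation-invariant representation of a dataset S."""

    def __init__(self, dim_x: int, dim_hidden: int = 64, dim_z: int = 32):
        super().__init__()
        # f acts on each (x_n, y_n) pair independently
        self.f = nn.Sequential(nn.Linear(dim_x + 1, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_hidden))
        # g acts on the pooled (summed) representation
        self.g = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_z))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (N, D) feature matrix, y: (N,) label vector
        pairs = torch.cat([x, y.unsqueeze(-1)], dim=-1)  # (N, D + 1)
        pooled = self.f(pairs).sum(dim=0)                # sum over samples -> order-invariant
        return self.g(pooled)                            # single vector z for the whole set S
```

Replacing the sum with a max, or using a set transformer, preserves the permutation invariance, as noted above.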
Using the vector representation z of S, a kernel function of the form of Equation (13) is constructed.
Here, β(z) (Equation (14)) and σ(z) (Equation (15)) are each modeled by a neural network that takes z as input.
Through the learning method described below, these quantities are expected to acquire biases indicating that each feature is more or less likely to be selected.
For example, if β^(d)(z) = 0 is learned, Equation (9) follows immediately from the definition in Equation (8). In that case, α_d becomes 0 because of Equation (3) and the ℓ1 regularization. Thus, even when S is a small amount of data, feature selection accuracy can be expected to improve by having β(z) and σ(z) learn biases that compensate for the lack of data.
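One plausible instantiation of Equations (13)-(15) is a scaled RBF kernel whose amplitude β_d(z) and bandwidth σ_d(z) are produced by small networks applied to z. The exact functional form is the one given by the equations in the specification, so the following is an assumed sketch only; the Softplus activations are likewise an assumption used to keep the outputs non-negative.

```python
import torch
import torch.nn as nn

class LearnedFeatureKernel(nn.Module):
    """Assumed form: k_d(x, x') = beta_d(z) * exp(-(x - x')^2 / (2 * sigma_d(z)^2))."""

    def __init__(self, dim_z: int, num_features: int):
        super().__init__()
        self.beta_net = nn.Sequential(nn.Linear(dim_z, num_features), nn.Softplus())
        self.sigma_net = nn.Sequential(nn.Linear(dim_z, num_features), nn.Softplus())

    def gram(self, x_d: torch.Tensor, d: int, z: torch.Tensor) -> torch.Tensor:
        # x_d: (N,) values of the d-th feature; z: set representation of S
        beta = self.beta_net(z)[d]            # amplitude bias for feature d
        sigma = self.sigma_net(z)[d] + 1e-6   # bandwidth bias for feature d
        diff2 = (x_d[:, None] - x_d[None, :]) ** 2
        return beta * torch.exp(-diff2 / (2.0 * sigma ** 2))
```

If the network drives β_d(z) toward 0 for a feature, the corresponding Gram matrix vanishes and the ℓ1 term pushes α_d to 0, which is exactly the data-compensating bias described above.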
For simplicity, an S-independent kernel is used for L here, although it may also be made S-dependent. σ_y is a hyperparameter.
Next, the optimization method for the objective function of Equation (3) is described. In this example, ISTA (Iterative Shrinkage Thresholding Algorithm) is used. With ISTA, each update equation of the optimization can be written in closed form and is differentiable. The closed-form updates allow the learning to be carried out efficiently, and differentiability is a necessary condition for training the feature selection model by stochastic gradient descent, just as in ordinary neural network training. Any optimization method other than ISTA may be used as long as its update equations can be obtained in a differentiable form.
The update equations based on ISTA are Equations (16) and (17), and the global optimum is obtained by alternately repeating Equations (16) and (17). Here, μ > 0 is a hyperparameter representing the step size.
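The concrete update formulas are those of Equations (16) and (17); they are not reproduced in this text, so the sketch below shows a generic ISTA loop for an objective of the form ‖vec(L̄_S) − Mα‖² + λ‖α‖_1 with α ≥ 0, which is the structure Equation (3) takes once the centered Gram matrices are vectorized. The construction of M and the fixed step size are assumptions made for illustration.

```python
import numpy as np

def ista_nonneg_lasso(M: np.ndarray, target: np.ndarray, lam: float,
                      mu: float, num_iter: int, alpha0: np.ndarray) -> np.ndarray:
    """Generic ISTA for min_{alpha >= 0} ||target - M @ alpha||^2 + lam * ||alpha||_1.

    M: (P, D) matrix whose d-th column is the vectorized centered Gram matrix of feature d.
    target: (P,) vectorized centered Gram matrix of the label.
    """
    alpha = alpha0.copy()
    for _ in range(num_iter):
        grad = 2.0 * M.T @ (M @ alpha - target)   # gradient step (in the spirit of Eq. (16))
        v = alpha - mu * grad
        alpha = np.maximum(v - mu * lam, 0.0)     # soft-threshold + non-negativity (cf. Eq. (17))
    return alpha
```

Because every step is an explicit, differentiable formula, gradients can flow through the whole loop, which is what allows the parameters shaping the kernels to be trained with stochastic gradient descent.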
[Learning phase]
This section describes how the model is trained using the related datasets. A sample set drawn from a related dataset is denoted by the symbol S below. For convenience, the same symbol as in the feature selection phase is used, but the data are different.
The learning parameters of the model are the parameters of the neural networks f, g, β, and σ, the initial parameter α_0 of ISTA, and the regularization parameter λ. Making the ISTA initial parameter α_0 a learning target as well allows α to be obtained efficiently at feature selection time. The objective functions are Equations (18) and (19).
Here, α*_d denotes the importance of the d-th feature obtained after running ISTA from S for I iterations. The pseudo small training data (a small amount of labeled data) and the pseudo test data (a sufficient amount of labeled data), each obtained by random sampling from the related dataset D_t, are denoted by S and Q, respectively.
L̄_Q (Equation (20)) is the centered Gram matrix on the pseudo test data Q for the label y; the (i, j) entry of the matrix L_Q is given by the kernel function L(y_i, y_j). Γ is the centering matrix.
Similarly, K̄_Q^(d) (Equation (21)) is the centered Gram matrix on the pseudo test data Q for the d-th feature; the (i, j) entry of the matrix K_Q^(d) is given by the kernel K^(d)(x_i^(d), x_j^(d)).
The same kernel as in Equation (3) is used as the kernel function L for the label y. As the kernel for feature d, the RBF kernel of Equation (22), which is commonly used with HSIC, is adopted.
Here, σ_x is a hyperparameter. Unlike Equation (3), using the same kernel for all features allows every feature to be evaluated equally, without bias.
Unlike the small amount of labeled data (pseudo small training data) S, the pseudo test data Q is assumed to contain a sufficient amount of data. The HSIC estimate can therefore be computed accurately from the pseudo test data Q using the kernel of Equation (22).
Equation (17) takes a small value when α*_d obtained from the pseudo small training data, that is, the small amount of labeled data S, matches the value obtained from the pseudo test data Q, that is, a sufficient amount of labeled data.
Therefore, in the embodiment, applying this learning method makes it possible to perform accurate feature selection from the pseudo small training data S, which is a small amount of labeled data.
FIG. 5 is a diagram for explaining the processing of the learning unit 10, and shows example pseudocode for that processing.
First, with D denoting the set of related datasets, the learning unit 10 obtains a small amount of labeled data (pseudo small training data) S (with N_S samples) and a sufficient amount of labeled data (pseudo test data) Q (with N_Q samples) (Algorithm 1).
In the learning phase, the learning unit 10 randomly samples a task t, a small amount of labeled data S, and a sufficient amount of labeled data Q (lines 2-4 of Algorithm 1).
The learning unit 10 computes the vector z from the small amount of labeled data S using Equation (12) (line 5 of Algorithm 1).
The learning unit 10 computes the kernel of Equation (13) using the vector representation z of the small amount of labeled data S (line 6 of Algorithm 1).
The learning unit 10 obtains the global optimum of the objective function (Equation (3)) by alternately repeating the ISTA update equations (16) and (17) (lines 7-9 of Algorithm 1).
The learning unit 10 computes the loss on the sufficient amount of labeled data Q based on Equations (18) and (19) (line 10 of Algorithm 1).
The learning unit 10 updates the parameters of the feature selection model based on the computed loss (line 11 of Algorithm 1).
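Putting the pieces together, the outer loop of Algorithm 1 might be sketched as below. This is a compact, assumption-laden reading: the encoder architecture, the scaled-RBF kernel form, the use of Q-side HSIC values as the training target, the Adam optimizer, and the helper sample_episode are all illustrative choices, and FIG. 5 together with Equations (12)-(22) remains the authoritative description.

```python
import torch
import torch.nn as nn

class EpisodicFeatureSelector(nn.Module):
    """Learnable pieces: set encoder (f, g), kernel heads beta(z), sigma(z), alpha0, lambda."""

    def __init__(self, dim_x: int, dim_hidden: int = 64, dim_z: int = 32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim_x + 1, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_hidden))
        self.g = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_z))
        self.beta_head = nn.Sequential(nn.Linear(dim_z, dim_x), nn.Softplus())
        self.sigma_head = nn.Sequential(nn.Linear(dim_z, dim_x), nn.Softplus())
        self.alpha0 = nn.Parameter(torch.zeros(dim_x))    # learned ISTA initialization
        self.log_lam = nn.Parameter(torch.tensor(0.0))    # learned regularization strength

    def encode(self, x, y):
        pooled = self.f(torch.cat([x, y.unsqueeze(-1)], dim=-1)).sum(0)
        return self.g(pooled)

def centered(gram):
    n = gram.shape[0]
    gamma = torch.eye(n) - torch.ones(n, n) / n
    return gamma @ gram @ gamma

def rbf(v, sigma):
    return torch.exp(-(v[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))

def episode_loss(model, xs, ys, xq, yq, n_ista=10, mu=0.01, sigma_y=1.0, sigma_x=1.0):
    """One episode: unrolled ISTA on the small set S, loss against HSIC targets from Q."""
    z = model.encode(xs, ys)
    beta, sigma = model.beta_head(z), model.sigma_head(z) + 1e-6
    l_bar_s = centered(rbf(ys, sigma_y))
    k_bars = torch.stack([centered(beta[d] * rbf(xs[:, d], sigma[d]))
                          for d in range(xs.shape[1])])      # (D, Ns, Ns)
    M = k_bars.reshape(xs.shape[1], -1).T                    # columns = vec(K_bar_d)
    t = l_bar_s.reshape(-1)
    lam = torch.exp(model.log_lam)
    alpha = torch.clamp(model.alpha0, min=0.0)
    for _ in range(n_ista):                                  # unrolled, differentiable ISTA
        grad = 2.0 * (M.T @ (M @ alpha - t))
        alpha = torch.clamp(alpha - mu * grad - mu * lam, min=0.0)
    l_bar_q = centered(rbf(yq, sigma_y))
    nq = yq.shape[0]
    hsic_q = torch.stack([torch.trace(centered(rbf(xq[:, d], sigma_x)) @ l_bar_q)
                          for d in range(xq.shape[1])]) / (nq - 1) ** 2
    return ((alpha - hsic_q) ** 2).sum()                     # assumed reading of Eqs. (18)-(19)

# Outer loop corresponding to lines 2-11 of Algorithm 1 (sample_episode is hypothetical):
# model = EpisodicFeatureSelector(dim_x=D)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# for _ in range(num_episodes):
#     xs, ys, xq, yq = sample_episode()
#     opt.zero_grad(); loss = episode_loss(model, xs, ys, xq, yq); loss.backward(); opt.step()
```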
The feature selection device according to the present embodiment provides a specific improvement over conventional feature selection methods such as those described in Non-Patent Documents 1 and 2, and represents an improvement in the technical field relating to feature selection performance.
[About the system configuration of the embodiment]
Each component of the feature selection device 1 is functionally conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the feature selection device 1 is not limited to the illustrated one; all or part of the functions may be distributed or integrated, functionally or physically, in arbitrary units according to various loads, usage conditions, and the like.
All or any part of the processes performed in the feature selection device 1 may be realized by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU or GPU. Each process performed in the feature selection device 1 may also be realized as hardware using wired logic.
Of the processes described in the embodiment, all or part of the processes described as being performed automatically may instead be performed manually, and all or part of the processes described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings may be changed as appropriate unless otherwise specified.
[Program]
FIG. 6 is a diagram showing an example of a computer that realizes the feature selection device 1 by executing a program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020, as well as a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the feature selection device 1 is implemented as the program module 1093, in which code executable by the computer 1000 is written. The program module 1093 is stored, for example, in the hard disk drive 1090; for example, the program module 1093 for executing processes equivalent to the functional configuration of the feature selection device 1 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
The setting data used in the processing of the embodiment described above is stored as the program data 1094, for example, in the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.
The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like) and read by the CPU 1020 via the network interface 1070.
Regarding the above embodiment, the following additional notes are further disclosed.
(Additional note 1)
A learning device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor:
acquires related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
trains a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
(Additional note 2)
The learning device according to Additional note 1, wherein the second labeled data has a larger amount of data than the first labeled data.
(Additional note 3)
The learning device according to Additional note 1, wherein the processor selects features from third labeled data to be processed for feature selection, using the trained feature selection model.
(Additional note 5)
A non-transitory storage medium storing a program executable by a computer to perform a learning process, the learning process comprising:
acquiring related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
training a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
Although embodiments to which the invention made by the present inventors is applied have been described above, the present invention is not limited by the descriptions and drawings that form part of this disclosure based on the embodiments. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art based on the embodiments are all included within the scope of the present invention.
1 Feature selection device
10 Learning unit
11 Learning data input unit
12, 22 Feature extraction unit
13 Feature selection model learning unit
14 Storage unit
20 Selection unit
21 Data input unit
23 Feature selection unit
24 Result output unit
141 Feature selection model
Claims (5)
- A learning device comprising:
an acquisition unit that acquires related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
a learning unit that trains a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- The learning device according to claim 1, wherein the second labeled data has a larger amount of data than the first labeled data.
- The learning device according to claim 1, further comprising a selection unit that selects features from third labeled data to be processed for feature selection, using the feature selection model trained by the learning unit.
- A learning method executed by a learning device, the learning method comprising:
a step of acquiring related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
a step of training a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- A learning program for causing a computer to execute:
a step of acquiring related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
a step of training a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/020859 WO2023223509A1 (en) | 2022-05-19 | 2022-05-19 | Learning device, learning method, and learning program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/020859 WO2023223509A1 (en) | 2022-05-19 | 2022-05-19 | Learning device, learning method, and learning program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023223509A1 true WO2023223509A1 (en) | 2023-11-23 |
Family
ID=88834940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/020859 WO2023223509A1 (en) | 2022-05-19 | 2022-05-19 | Learning device, learning method, and learning program |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023223509A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170213153A1 (en) * | 2016-01-22 | 2017-07-27 | Arizona Board Of Regents On Behalf Of Arizona State University | Systems and methods for embedded unsupervised feature selection |
JP2018151892A (en) * | 2017-03-14 | 2018-09-27 | 日本放送協会 | Model learning apparatus, information determination apparatus, and program therefor |
-
2022
- 2022-05-19 WO PCT/JP2022/020859 patent/WO2023223509A1/en unknown
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170213153A1 (en) * | 2016-01-22 | 2017-07-27 | Arizona Board Of Regents On Behalf Of Arizona State University | Systems and methods for embedded unsupervised feature selection |
JP2018151892A (en) * | 2017-03-14 | 2018-09-27 | 日本放送協会 | Model learning apparatus, information determination apparatus, and program therefor |
Non-Patent Citations (1)
Title |
---|
ATSUTOSHI KUMAGAI; TOMOHARU IWATA; YASUHIRO FUJIWARA: "Few-shot Learning for Unsupervised Feature Selection", ARXIV.ORG, 2 July 2021 (2021-07-02), XP091006548 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guo et al. | Online continual learning through mutual information maximization | |
JP7178513B2 (en) | Chinese word segmentation method, device, storage medium and computer equipment based on deep learning | |
CN112307337B (en) | Associated recommendation method and device based on tag knowledge graph and computer equipment | |
JP6535134B2 (en) | Creation device, creation program, and creation method | |
CN113011191A (en) | Knowledge joint extraction model training method | |
WO2020179378A1 (en) | Information processing system, information processing method, and recording medium | |
CN111310930B (en) | Optimizing apparatus, optimizing method, and non-transitory computer-readable storage medium | |
Teisseyre | Feature ranking for multi-label classification using Markov networks | |
Muschalik et al. | isage: An incremental version of SAGE for online explanation on data streams | |
JP6662754B2 (en) | L1 graph calculation device, L1 graph calculation method, and L1 graph calculation program | |
WO2023223509A1 (en) | Learning device, learning method, and learning program | |
Lücke et al. | Truncated variational sampling for ‘black box’optimization of generative models | |
CN108364067B (en) | Deep learning method based on data segmentation and robot system | |
Borovec et al. | Binary pattern dictionary learning for gene expression representation in drosophila imaginal discs | |
Stippinger et al. | BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space | |
CN115017321A (en) | Knowledge point prediction method and device, storage medium and computer equipment | |
JP7533773B2 (en) | Feature selection device, feature selection method, and feature selection program | |
Zhang et al. | Maximum likelihood inference for the band-read error model for capture-recapture data with misidentification | |
JP7556457B2 (en) | DENSITY RATIO ESTIMATION DEVICE, DENSITY RATIO ESTIMATION METHOD, AND DENSITY RATIO ESTIMATION PROGRAM | |
Noè | Bayesian nonparametric inference in mechanistic models of complex biological systems | |
Turner et al. | A tutorial on joint modeling | |
JP7439923B2 (en) | Learning methods, learning devices and programs | |
WO2023223510A1 (en) | Learning device, learning method, and learning program | |
Pantazis et al. | Enumerating multiple equivalent lasso solutions | |
Michaelides et al. | Property-driven state-space coarsening for continuous time Markov chains |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22942713 Country of ref document: EP Kind code of ref document: A1 |