WO2023223509A1 - Learning device, learning method, and learning program - Google Patents
Learning device, learning method, and learning program
- Publication number
- WO2023223509A1 (PCT/JP2022/020859; JP2022020859W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature selection
- data
- learning
- feature
- labeled
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 47
- 238000012549 training Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 abstract description 20
- 230000008569 process Effects 0.000 description 18
- 238000000605 extraction Methods 0.000 description 15
- 230000006870 function Effects 0.000 description 15
- 239000013598 vector Substances 0.000 description 15
- 239000011159 matrix material Substances 0.000 description 11
- 238000010586 diagram Methods 0.000 description 10
- 238000003860 storage Methods 0.000 description 10
- 238000012360 testing method Methods 0.000 description 9
- 238000013528 artificial neural network Methods 0.000 description 8
- 238000004891 communication Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 230000006399 behavior Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000000877 morphologic effect Effects 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the present invention relates to a learning device, a learning method, and a learning program.
- FIG. 7 is a diagram explaining supervised feature selection.
- supervised feature selection is a technique that uses supervised machine learning to extract important features from the features of labeled data (see Non-Patent Documents 1 and 2).
- in labeled data, for example in the case of text classification, each sample is a sentence, and the label represents the content of the sentence (politics, sports, etc.)
- Non-Patent Documents 1 and 2 require a large amount of labeled data in order to perform accurate feature selection.
- the technique described in Non-Patent Document 2 uses NN (Neural Networks). Since NN generally requires a large amount of data, the technique described in Non-Patent Document 2 requires even more data.
- the present invention has been made in view of the above, and its purpose is to provide a learning device, a learning method, and a learning program that enable accurate feature selection even when the amount of labeled data is small.
- the learning device includes an acquisition unit that acquires related data having the same feature configuration as the data to be processed for feature selection but different feature values, and a learning unit that trains the feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- FIG. 1 is a diagram illustrating an overview of a feature selection device according to an embodiment.
- FIG. 2 is a diagram schematically showing an example of the configuration of the feature selection device according to the embodiment.
- FIG. 3 is a flowchart showing the processing procedure of the learning process.
- FIG. 4 is a flowchart showing the procedure of selection processing.
- FIG. 5 is a diagram for explaining the processing of the learning section.
- FIG. 6 is a diagram illustrating an example of a computer that implements a feature selection device by executing a program.
- FIG. 7 is a diagram illustrating supervised feature selection.
- the feature selection device utilizes a plurality of related, sufficiently labeled datasets to accurately select features from a small amount of labeled data, that is, the target dataset from which features are to be selected.
- FIG. 1 is a diagram illustrating an overview of a feature selection device according to an embodiment.
- in the learning phase, the feature selection device uses only data from the related datasets to learn a model that accurately selects features from the small amount of labeled data (third labeled data) to be processed for feature selection.
- the feature selection device also inputs a sufficient amount of labeled data (second labeled data) of the related dataset t to the feature selection model and performs feature selection ((2) in FIG. 1).
- the sufficient amount of labeled data is substantially larger than the small amount of labeled data.
- the feature selection device trains the feature selection model so that the result of feature selection from the small amount of labeled data (feature F1) matches the result of feature selection from the sufficient amount of labeled data (feature F2) of the related dataset t ((3) in FIG. 1).
- the feature selection device selects appropriate features by inputting the target data set (a small amount of labeled data) to the learned feature selection model.
- the feature selection device uses a plurality of related data sets to teach the feature selection model how to select features well from a small amount of labeled data.
- the feature selection device can appropriately select features even for a small amount of labeled data.
- a related dataset is a dataset that has the same feature quantities (names) as the target dataset, such as images of the same subject with different colors, but whose conditions differ so that the distribution of the values of each feature is different.
- FIG. 2 is a diagram schematically showing an example of the configuration of the feature selection device according to the embodiment.
- the feature selection device 1 (learning device) according to the embodiment is realized, for example, by loading a predetermined program into a computer including a ROM (Read Only Memory), a RAM (Random Access Memory), and a CPU (Central Processing Unit), and having the CPU execute the program. Further, the feature selection device 1 has a communication interface for transmitting and receiving various information to and from other devices connected via a network or the like.
- the feature selection device 1 is realized by a general-purpose computer such as a workstation or a personal computer. As shown in FIG. 2, the feature selection device 1 includes a learning section 10 that performs learning processing and a selection section 20 that performs feature selection processing.
- the learning unit 10 trains the feature selection model 141 using a plurality of related data sets (labeled data).
- the selection unit 20 uses the obtained feature selection model 141 to select appropriate features from the target data set (a small amount of labeled data).
- the selection unit 20 may be implemented in the same hardware as the learning unit 10, or may be implemented in different hardware.
- the learning section 10 includes a learning data input section 11 (acquisition section), a feature extraction section 12, a feature selection model learning section 13 (learning section), and a storage section 14.
- the learning data input unit 11 is realized using an input device such as a keyboard or a mouse, and inputs various instruction information to the control unit in response to input operations by an operator.
- the learning data input unit 11 functions as an acquisition unit, and receives as input related datasets (labeled samples) that have the same feature configuration as the target dataset to be processed for feature selection but have different feature values.
- the related data set may be input to the learning unit 10 from an external server device or the like via a communication control unit (not shown) implemented by a NIC (Network Interface Card) or the like.
- the feature extraction unit 12 converts each labeled sample of the acquired related data set into a feature vector.
- the feature vector is a representation of the features of necessary data as an n-dimensional numerical vector.
- the feature extraction unit 12 performs conversion into a feature vector using a method commonly used in machine learning. For example, when the data is text, the feature extraction unit 12 can apply a method using morphological analysis, a method using n-grams, a method using delimiters, etc.
- the feature selection model learning unit 13 learns a feature selection model 141 that executes feature selection suitable for each data set using the data after feature extraction.
- the feature selection model 141 is a model that selects important features from labeled data.
- the feature selection model learning unit 13 performs pseudo learning of the feature selection model using the related data set from which the feature extraction unit 12 has extracted the features.
- the feature selection model learning unit 13 randomly selects a small amount of labeled data (sample for pseudo learning) and a sufficient amount of labeled data (sample for pseudo test) from the related data set. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using a small amount of labeled data matches the result of performing feature selection using a sufficient amount of labeled data.
- a kernel method-based feature selection model or a NN (Neural Networks)-based model is applied as the feature selection model 141.
- the storage unit 14 is realized by a semiconductor memory device such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk.
- the learned feature selection model 141 is stored in the storage unit 14 .
- the selection section 20 includes a data input section 21, a feature extraction section 22, a feature selection section 23, and a result output section 24.
- the data input unit 21 is realized using an input device such as a keyboard or a mouse, and inputs various instruction information to the control unit and receives target data sets in response to input operations by an operator.
- the data input unit 21 outputs the input target data set to the feature extraction unit 22.
- the target data set is a data set to be processed for feature quantity selection, and consists of a small amount of labeled data (third labeled data).
- the target data set may be input to the selection unit 20 from an external server device or the like via a communication control unit (not shown) implemented by a NIC or the like.
- the data input section 21 may be the same hardware as the learning data input section 11.
- the feature extraction unit 22 converts each labeled sample of the acquired target data set into a feature vector in preparation for processing in the feature selection unit 23.
- the feature selection unit 23 functions as a selection unit, and uses the learned feature selection model 141 to select important feature quantities from the target data set, which is the data to be processed for feature selection.
- the result output unit 24 is realized by a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, etc., and outputs the result of the feature selection process to the operator. For example, the result output unit 24 outputs important features selected from the input target data set.
- the feature selection process of the feature selection device 1 includes a learning process by the learning unit 10 and a selection process by the selection unit 20.
- FIG. 3 is a flowchart showing the processing procedure of the learning process.
- the flowchart in FIG. 3 is started, for example, at the timing when the user inputs an operation instructing to start the learning process.
- the learning data input unit 11 receives a plurality of related data sets (labeled data) as input (step S1).
- the feature extraction unit 12 converts each labeled sample of the input related data set into a feature vector (step S2).
- the feature selection model learning unit 13 learns a feature selection model 141 that executes feature selection suitable for each data set using the data after feature extraction (step S3).
- the feature selection model learning unit 13 randomly selects a small amount of labeled data and a sufficient amount of labeled data from each related dataset, using the data after the feature extraction unit 12 has extracted the features. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using the small amount of labeled data matches the result of performing feature selection using the sufficient amount of labeled data.
- the feature selection model learning unit 13 stores the learned feature selection model 141 in the storage unit 14.
- FIG. 4 is a flowchart showing the procedure of selection processing.
- the flowchart in FIG. 4 is started, for example, at the timing when the user inputs an operation instructing to start the selection process.
- the data input unit 21 receives input of the target dataset to be processed (a small amount of labeled data) (step S11), and the feature extraction unit 22 converts each sample of the received target dataset into a feature vector (step S12).
- the feature selection unit 23 executes feature selection from the target data set using the feature selection model 141 (step S13). Then, the result output unit 24 outputs the feature selection result by the feature selection unit 23 (step S14).
- the feature selection device 1 acquires related data that has the same feature quantity structure as the data to be processed for feature selection, but has different feature quantity values.
- the feature selection device 1 trains the feature selection model so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the feature selection model.
- by learning from the related datasets instead of the target dataset (a small amount of labeled data), the feature selection device 1 can select important features with high accuracy without performing, for each target dataset, re-learning that requires expensive computation.
- the feature selection device 1 is able to select important features with high precision even for a small target data set by utilizing useful information from related data sets.
- likewise, the feature selection device 1 trains the feature selection model 141 so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the model matches the result obtained by inputting a sufficient amount of labeled data selected from the related data.
- the feature selection device 1 causes the feature selection model 141 to learn how to select a good feature from a small amount of labeled data. Therefore, by using the feature selection model 141, the feature selection device 1 can select important features accurately and at low cost even when only a small amount of labeled data sets to be processed are obtained.
- y_n represents a real value in the case of a regression problem and a discrete value in the case of a classification problem.
- for the sake of simplicity, we will focus on regression problems.
- the dimension M of the feature vector is the same for all datasets.
- the problem is the same for all datasets. In other words, you don't have a regression problem in one dataset and a classification problem in another.
- the objective here is to select at most K features suitable for a target dataset S that is not included in the related datasets and is given in the feature selection phase.
-  ̄K_S^(d) (Equation (5)) is the centered Gram matrix for the d-th feature; its (i,j) component is determined by the S-dependent kernel K^(d)(S)(x_i^(d), x_j^(d)).
- the first term of the objective function (Equation (3)) can be decomposed as shown in Equation (6).
- equation (7) represents the estimator of the HSIC (Hilbert-Schmidt Independence Criterion) of the random variables X and Y computed from S.
- in other words, if HSIC > 0, then X and Y have a (nonlinear) correlation. Based on this property, Equation (6) is examined.
- Equation (8) represents the correlation between the d-th feature and the label.
- when Equation (8) is positive, α_d takes a large value in order to minimize Equation (6). Conversely, in the case of Equation (9), that is, when the two are unrelated, α_d tends to become 0 due to the effect of the l1 regularization described above.
- α_d can therefore be interpreted as the importance of the d-th feature in predicting y.
- if the second term of Equation (6) (Equation (10)) is positive, that is, if the d-th and d'-th features are correlated, at least one of α_d and α_d' is likely to become 0. This means that features that are redundant in explaining y can be automatically eliminated.
- f and g are arbitrary neural networks, and [,] represents the concatenation of two vectors. Since the sum over f does not depend on the order of the samples in S, Equation (12) defines a single vector z for the set S. Any other permutation-invariant aggregation (for example, a maximum, or a set transformer) may be used instead.
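- As an illustration only, and not the patent's exact architecture, a permutation-invariant set embedding of the kind described by Equation (12) can be sketched as follows; the layer sizes and the class name SetEmbedding are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class SetEmbedding(nn.Module):
    """Permutation-invariant embedding z of a labeled set S = {(x_n, y_n)}.

    Computes z = g( sum_n f([x_n, y_n]) ): because the per-sample outputs of f
    are summed, z does not depend on the order of the samples in S.
    """

    def __init__(self, dim_x: int, dim_hidden: int = 64, dim_z: int = 32):
        super().__init__()
        # f acts on the concatenation [x_n, y_n] of a single sample.
        self.f = nn.Sequential(nn.Linear(dim_x + 1, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_hidden))
        # g acts on the aggregated representation.
        self.g = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_z))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (N, D) features, y: (N, 1) labels -> z: (dim_z,)
        h = self.f(torch.cat([x, y], dim=1))  # per-sample representations
        return self.g(h.sum(dim=0))           # summing makes the result order-invariant
```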
- ⁇ (z) (Equation (14)) and ⁇ (z) (Equation (15)) are each modeled by a neural network that takes z as an input.
- equation (9) is immediately derived from the definition of equation (8).
- ⁇ d becomes 0 due to equation (3) and l1 regularization.
- ⁇ y is a hyperparameter.
- ISTA (Iterative Shrinkage-Thresholding Algorithm)
- each optimization update equation can be written in closed form and is differentiable. Being writable in closed form has the advantage that learning can be carried out efficiently, and differentiability is a necessary condition for learning the feature selection model by stochastic gradient descent, as in the training of a general NN. Any optimization method other than ISTA can be used as long as its update formula can be written in a differentiable form.
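- The following is a hedged sketch of a generic ISTA iteration for an l1-regularized least-squares problem. It only illustrates why each update is a closed-form, differentiable expression that stochastic gradient descent can back-propagate through; the function names and the simplified objective are assumptions, not the patent's Equations (16) and (17).

```python
import torch

def soft_threshold(v: torch.Tensor, thr: float) -> torch.Tensor:
    """Proximal operator of the l1 norm: closed form, differentiable almost everywhere."""
    return torch.sign(v) * torch.clamp(torch.abs(v) - thr, min=0.0)

def ista(A: torch.Tensor, b: torch.Tensor, lam: float, step: float,
         alpha0: torch.Tensor, n_iter: int = 10) -> torch.Tensor:
    """Minimize 0.5 * ||A @ alpha - b||^2 + lam * ||alpha||_1 by ISTA.

    Every iteration is an explicit expression, so when ISTA is unrolled inside
    a larger model the whole loop remains differentiable.
    """
    alpha = alpha0
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - b)            # gradient of the smooth term
        alpha = soft_threshold(alpha - step * grad, step * lam)
        alpha = torch.clamp(alpha, min=0.0)     # keep importances non-negative, as in the text
    return alpha
```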
- [Learning phase] We describe the method for learning the model using the related datasets.
- let S denote the sample set drawn from a related dataset.
- the same symbols as the feature selection phase are used, but these are different data.
- the learning parameters of the model are the parameters of the neural networks f, g, ⁇ , and ⁇ , the initial parameter ⁇ 0 of ISTA, and the regularization parameter ⁇ .
- ⁇ can be calculated efficiently when selecting features.
- the objective functions are equations (18) and (19).
- ⁇ * d represents the importance of the d-th feature obtained after running ISTA from S for I iterations.
- the pseudo small amount of training data (a small amount of labeled data) obtained by randomly sampling from the related dataset D_t and the pseudo test data (a sufficient amount of labeled data) are denoted by S and Q, respectively.
-  ̄L_Q (Equation (20)) is the centered Gram matrix on the pseudo test data Q with respect to the label y, and the (i,j) component of the matrix L_Q is determined by the kernel function L(y_i, y_j).
- Γ is the centering matrix.
-  ̄K_Q^(d) (Equation (21)) is the centered Gram matrix on the pseudo test data Q for the d-th feature, and the (i,j) component of the matrix K_Q^(d) is determined by the kernel K^(d)(x_i^(d), x_j^(d)).
- the same kernel as in Equation (3) is used as the kernel function L for the label y.
- as the kernel for feature d, we use the RBF kernel of Equation (22), which is often used with HSIC.
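- Equation (22) is an image in the published document; based on the surrounding description it presumably has the standard RBF form below (a reconstruction for readability, not the verbatim published equation).

```latex
k^{(d)}\bigl(x_i^{(d)}, x_j^{(d)}\bigr)
  = \exp\!\Bigl(-\gamma_x \bigl(x_i^{(d)} - x_j^{(d)}\bigr)^{2}\Bigr)
  \qquad \text{(22)}
```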
- ⁇ x is a hyperparameter. Unlike Equation (3), by using the same kernel for all features, all features can be evaluated equally without bias.
- Equation (17) takes a small value when α*_d obtained from the pseudo small amount of training data (the small amount of labeled data S) matches that obtained from the pseudo test data Q (the sufficient amount of labeled data).
- by applying the present learning method, it becomes possible to accurately select features from the pseudo small amount of training data S, which has only a small amount of labels.
- FIG. 5 is a diagram for explaining the processing of the learning section 10.
- FIG. 5 exemplifies the pseudo code of the processing of the learning unit 10.
- the learning unit 10 takes the related datasets D as input and obtains a small amount of labeled data (pseudo training data) S (number of samples N_S) and a sufficient amount of labeled data (pseudo test data) Q (number of samples N_Q) (Algorithm 1).
- the learning unit 10 randomly selects a sample task t, a small amount of labeled data S, and a sufficient amount of labeled data Q (lines 2-4 of Algorithm 1).
- the learning unit 10 calculates a vector z from a small amount of labeled data S using equation (12) (fifth line of Algorithm 1).
- the learning unit 10 calculates the kernel of Equation (13) using the vector representation z of the small amount of labeled data S (line 6 of Algorithm 1).
- the learning unit 10 obtains the global optimal solution of the objective function (Equation (3)) by alternately repeating the update formulas of Equation (16) and Equation (17) described using ISTA (lines 7 to 9 of Algorithm 1).
- the learning unit 10 calculates the loss for the sufficient amount of labeled data Q based on Equations (18) and (19) (line 10 of Algorithm 1).
- the learning unit 10 updates the parameters of the feature selection model based on the calculated loss (line 11 of Algorithm 1).
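- For orientation, the loop that Algorithm 1 describes can be sketched as below. This is a simplified illustration under assumptions: the helper names (set_embedding, kernels, ista_feature_selection, hsic_loss) are hypothetical, and the actual computations follow the patent's Equations (12) to (20).

```python
import random
import torch

def train_feature_selection_model(datasets, model, optimizer,
                                  n_steps: int = 1000,
                                  n_small: int = 10, n_large: int = 200):
    """Meta-training over related datasets (sketch of Algorithm 1).

    `model` is assumed to bundle the set-embedding networks, the kernel and
    importance parameters, and the differentiable ISTA updates.
    """
    for _ in range(n_steps):
        # Lines 2-4: sample a task t, pseudo training data S, pseudo test data Q.
        task = random.choice(datasets)
        S = task.sample(n_small)    # small amount of labeled data
        Q = task.sample(n_large)    # sufficient amount of labeled data

        # Line 5: vector representation z of S (Equation (12)).
        z = model.set_embedding(S.x, S.y)

        # Line 6: S-dependent kernels (Equation (13)).
        kernels = model.kernels(S, z)

        # Lines 7-9: differentiable ISTA gives the feature importances alpha*.
        alpha = model.ista_feature_selection(S, kernels)

        # Line 10: loss on Q (Equations (18)-(19)); it is small when the features
        # selected from S also explain the labels of Q.
        loss = model.hsic_loss(alpha, Q)

        # Line 11: update the model parameters.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```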
- the feature selection device provides specific improvements over conventional feature selection methods such as those described in Non-Patent Documents 1 and 2, and shows improvement in the technical field of feature selection and its performance evaluation.
- Each component of the feature selection device 1 is functionally conceptual, and does not necessarily need to be physically configured as shown.
- the specific form of distributing and integrating the functions of the feature selection device 1 is not limited to what is shown in the figure; all or part of them can be functionally or physically distributed or integrated in arbitrary units depending on various loads and usage conditions.
- each process performed in the feature selection device 1 may be realized by a CPU, a GPU (Graphics Processing Unit), or a program that is analyzed and executed by the CPU and GPU.
- each process performed in the feature selection device 1 may be realized as hardware using wired logic.
- FIG. 6 is a diagram showing an example of a computer that implements the feature selection device 1 by executing a program.
- Computer 1000 includes, for example, memory 1010 and CPU 1020.
- the computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.
- the memory 1010 includes a ROM 1011 and a RAM 1012.
- the ROM 1011 stores, for example, a boot program such as BIOS (Basic Input Output System).
- Hard disk drive interface 1030 is connected to hard disk drive 1090.
- Disk drive interface 1040 is connected to disk drive 1100.
- Serial port interface 1050 is connected to, for example, mouse 1110 and keyboard 1120.
- Video adapter 1060 is connected to display 1130, for example.
- the hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process of the feature selection device 1 is implemented as a program module 1093 in which code executable by the computer 1000 is written.
- Program module 1093 is stored in hard disk drive 1090, for example.
- a program module 1093 for executing processing similar to the functional configuration of the feature selection device 1 is stored in the hard disk drive 1090.
- the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- the setting data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 and executes them as necessary.
- program module 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like.
- the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.).
- the program module 1093 and program data 1094 may then be read by the CPU 1020 from another computer via the network interface 1070.
- (Supplementary Note 1) A learning device comprising a processor configured to execute: acquiring related data that has the same feature configuration as the data to be processed for feature selection but has different feature values; and training the feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- (Supplementary Note 2) The learning device according to Supplementary Note 1, wherein the second labeled data has a larger amount of data than the first labeled data.
- (Supplementary Note 3) A non-transitory storage medium storing a program executable by a computer to perform a learning process, the learning process comprising: acquiring related data that has the same feature configuration as the data to be processed for feature selection but has different feature values; and training the feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- 1 Feature selection device, 10 Learning unit, 11 Learning data input unit, 12, 22 Feature extraction unit, 13 Feature selection model learning unit, 14 Storage unit, 20 Selection unit, 21 Data input unit, 23 Feature selection unit, 24 Result output unit, 141 Feature selection model
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
In the present invention, a feature selection device (1) includes: a learning data input unit (11) that acquires related data which has the same feature configuration as the data targeted for feature selection processing but different feature values; and a feature selection model learning unit (13) that trains a feature selection model such that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the feature selection model.
Description
The present invention relates to a learning device, a learning method, and a learning program.
FIG. 7 is a diagram explaining supervised feature selection. As shown in FIG. 7, supervised feature selection is a technique that uses supervised machine learning to extract important features from the features of labeled data (see Non-Patent Documents 1 and 2). In labeled data, for example in the case of text classification, each sample is a sentence and the label represents the content of the sentence (politics, sports, etc.). The feature value of each sample is, for example, the frequency of appearance of each word in the sentence. As an example, sentence = {"Kubo": 5, "Japan national team": 3, "Prime Minister": 0, ...} with label = "soccer".
By extracting only important features, interpretability in data analysis is improved, and post-processing such as clustering can be sped up by targeting only the extracted features.
The techniques described in Non-Patent Documents 1 and 2 require a large amount of labeled data in order to perform accurate feature selection. In particular, the technique described in Non-Patent Document 2 uses neural networks (NN). Since NNs generally require a large amount of data, the technique described in Non-Patent Document 2 requires even more data.
However, in real problems it often happens that a large amount of data cannot be prepared. For example, when analyzing purchasing behavior from user data, little data can be obtained from new users or from users who use the service infrequently. Similarly, when analyzing the characteristics of a new device from its data, the analysis cannot be performed immediately because there is not yet enough data for the new device.
In such cases, conventional techniques cannot appropriately select features that are useful for data analysis, which makes it difficult to apply feature selection.
The present invention has been made in view of the above, and its purpose is to provide a learning device, a learning method, and a learning program that enable accurate feature selection even when the amount of labeled data is small.
In order to solve the above problems and achieve this purpose, the learning device according to the present invention includes: an acquisition unit that acquires related data having the same feature configuration as the data to be processed for feature selection but different feature values; and a learning unit that trains a feature selection model so that the result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
According to the present invention, accurate feature selection is possible even when only a small amount of labeled data is available.
Hereinafter, one embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this embodiment. In the description of the drawings, the same parts are denoted by the same reference numerals. In the following, for a matrix A, the notation " ̄A" is used with the same meaning as the symbol " ̄" written directly above "A".
[Embodiment]
[Overview of the feature selection device]
The feature selection device according to the present embodiment utilizes a plurality of related, sufficiently labeled datasets to accurately select features from a small amount of labeled data, which is the target dataset from which features are to be selected.
FIG. 1 is a diagram illustrating an overview of the feature selection device according to the embodiment. As shown in FIG. 1, in the learning phase the feature selection device uses only data from the related datasets to learn a model that accurately selects features from the small amount of labeled data (third labeled data) to be processed for feature selection.
Specifically, in the learning phase, the feature selection device randomly extracts a small amount of labeled data (first labeled data) from a related dataset t (t = 1, ..., T) ((1) in FIG. 1), inputs it to the feature selection model, and performs feature selection ((2) in FIG. 1).
In the learning phase, the feature selection device also inputs a sufficient amount of labeled data (second labeled data) of the related dataset t to the feature selection model and performs feature selection ((2) in FIG. 1). The sufficient amount of labeled data is substantially larger than the small amount of labeled data.
In the learning phase, the feature selection device trains the feature selection model so that the result of feature selection from the small amount of labeled data (feature F1) matches the result of feature selection from the sufficient amount of labeled data (feature F2) of the related dataset t ((3) in FIG. 1).
Then, in the test (feature selection) phase, the feature selection device selects appropriate features by inputting the target dataset (a small amount of labeled data) to the learned feature selection model.
In this way, the feature selection device uses a plurality of related datasets to teach the feature selection model how to select features well from a small amount of labeled data. By using the feature selection model learned in this way, the feature selection device can appropriately select features even for a small amount of labeled data.
A related dataset is a dataset that has the same feature quantities (names) as the target dataset, such as images of the same subject with different colors, but whose conditions differ so that the distribution of the values of each feature is different.
[Feature selection device]
FIG. 2 is a diagram schematically showing an example of the configuration of the feature selection device according to the embodiment. The feature selection device 1 (learning device) according to the embodiment is realized, for example, by loading a predetermined program into a computer including a ROM (Read Only Memory), a RAM (Random Access Memory), and a CPU (Central Processing Unit), and having the CPU execute the program. The feature selection device 1 also has a communication interface for transmitting and receiving various information to and from other devices connected via a network or the like. The feature selection device 1 is realized by a general-purpose computer such as a workstation or a personal computer. As shown in FIG. 2, the feature selection device 1 includes a learning section 10 that performs learning processing and a selection section 20 that performs feature selection processing.
The learning unit 10 trains the feature selection model 141 using a plurality of related datasets (labeled data).
When a target dataset (a small amount of labeled data) is given, the selection unit 20 uses the obtained feature selection model 141 to select appropriate features from the target dataset. The selection unit 20 may be implemented in the same hardware as the learning unit 10, or in different hardware.
[Learning unit]
The learning section 10 includes a learning data input section 11 (acquisition section), a feature extraction section 12, a feature selection model learning section 13 (learning section), and a storage section 14.
The learning data input unit 11 is realized using an input device such as a keyboard or a mouse, and inputs various instruction information to the control unit in response to input operations by an operator. The learning data input unit 11 functions as an acquisition unit, and receives as input related datasets (labeled samples) that have the same feature configuration as the target dataset to be processed for feature selection but have different feature values.
The related datasets may be input to the learning unit 10 from an external server device or the like via a communication control unit (not shown) implemented by a NIC (Network Interface Card) or the like.
The feature extraction unit 12 converts each labeled sample of the acquired related datasets into a feature vector. Here, a feature vector is a representation of the features of the data as an n-dimensional numerical vector. The feature extraction unit 12 performs the conversion into a feature vector using a method commonly used in machine learning. For example, when the data is text, the feature extraction unit 12 can apply a method using morphological analysis, a method using n-grams, a method using delimiters, and the like.
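As a simple, hedged illustration of the kind of conversion meant here, and not the patent's specific implementation, text samples can be turned into word-frequency feature vectors as follows; the whitespace split stands in for morphological analysis or n-gram extraction, and the vocabulary shown is hypothetical.

```python
from collections import Counter

def text_to_feature_vector(text: str, vocabulary: list[str]) -> list[float]:
    """Convert a sentence into an n-dimensional vector of word frequencies.

    The vocabulary fixes both the dimension and the order of the features.
    """
    counts = Counter(text.split())
    return [float(counts[word]) for word in vocabulary]

# A labeled sample in the spirit of the text's example
# {"Kubo": 5, "Japan national team": 3, "Prime Minister": 0, ...}, label = "soccer".
vocabulary = ["Kubo", "Japan", "soccer", "politics"]
vector = text_to_feature_vector("Kubo scored again for Japan", vocabulary)
print(vector)  # [1.0, 1.0, 0.0, 0.0]
```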
The feature selection model learning unit 13 learns a feature selection model 141 that executes feature selection suited to each dataset, using the data after feature extraction. The feature selection model 141 is a model that selects important features from labeled data.
Specifically, the feature selection model learning unit 13 performs pseudo learning of the feature selection model using the related datasets from which the feature extraction unit 12 has extracted the features. The feature selection model learning unit 13 randomly selects a small amount of labeled data (pseudo training samples) and a sufficient amount of labeled data (pseudo test samples) from a related dataset. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using the small amount of labeled data matches the result of performing feature selection using the sufficient amount of labeled data.
As the feature selection model 141, a kernel method-based feature selection model or an NN (Neural Networks)-based model is applied.
The storage unit 14 is realized by a semiconductor memory device such as a RAM or a flash memory, or by a storage device such as a hard disk or an optical disk. The learned feature selection model 141 is stored in the storage unit 14.
[Selection unit]
The selection section 20 includes a data input section 21, a feature extraction section 22, a feature selection section 23, and a result output section 24.
The data input unit 21 is realized using an input device such as a keyboard or a mouse; in response to input operations by an operator, it inputs various instruction information to the control unit and receives the target dataset. The data input unit 21 outputs the input target dataset to the feature extraction unit 22. The target dataset is the dataset to be processed for feature selection, and consists of a small amount of labeled data (third labeled data).
Note that the target dataset may be input to the selection unit 20 from an external server device or the like via a communication control unit (not shown) implemented by a NIC or the like. The data input section 21 may also be the same hardware as the learning data input section 11.
Similar to the feature extraction unit 12 of the learning unit 10, the feature extraction unit 22 converts each labeled sample of the acquired target dataset into a feature vector in preparation for the processing in the feature selection unit 23.
The feature selection unit 23 functions as a selection unit, and uses the learned feature selection model 141 to select important features from the target dataset, which is the data to be processed for feature selection.
The result output unit 24 is realized by a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like, and outputs the result of the feature selection process to the operator. For example, the result output unit 24 outputs the important features selected from the input target dataset.
[Processing procedure of the feature selection process]
Next, with reference to FIGS. 3 and 4, the procedure of the feature selection process performed by the feature selection device 1 will be described. The feature selection process of the feature selection device 1 includes a learning process performed by the learning unit 10 and a selection process performed by the selection unit 20.
[Learning process]
FIG. 3 is a flowchart showing the processing procedure of the learning process. The flowchart in FIG. 3 is started, for example, when the user inputs an operation instructing the start of the learning process.
As shown in FIG. 3, the learning data input unit 11 receives a plurality of related datasets (labeled data) as input (step S1). The feature extraction unit 12 converts each labeled sample of the input related datasets into a feature vector (step S2).
The feature selection model learning unit 13 learns the feature selection model 141, which executes feature selection suited to each dataset, using the data after feature extraction (step S3). The feature selection model learning unit 13 randomly selects a small amount of labeled data and a sufficient amount of labeled data from each related dataset, using the data after the feature extraction unit 12 has extracted the features. Then, the feature selection model learning unit 13 explicitly performs learning so that the result of performing feature selection using the small amount of labeled data matches the result of performing feature selection using the sufficient amount of labeled data.
The feature selection model learning unit 13 stores the learned feature selection model 141 in the storage unit 14.
[Selection process]
FIG. 4 is a flowchart showing the processing procedure of the selection process. The flowchart in FIG. 4 is started, for example, when the user inputs an operation instructing the start of the selection process.
The data input unit 21 receives input of the target dataset to be processed (a small amount of labeled data) (step S11), and the feature extraction unit 22 converts each sample of the received target dataset into a feature vector (step S12).
The feature selection unit 23 executes feature selection from the target dataset using the feature selection model 141 (step S13). Then, the result output unit 24 outputs the feature selection result obtained by the feature selection unit 23 (step S14).
[Effects of the embodiment]
The feature selection device 1 according to the embodiment acquires related data that has the same feature configuration as the data to be processed for feature selection but has different feature values. The feature selection device 1 trains the feature selection model so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the feature selection model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the feature selection model.
In this way, by learning from related datasets instead of the target dataset (a small amount of labeled data), the feature selection device 1 can select important features with high accuracy without performing, for each target dataset, re-learning that requires expensive computation.
In other words, the feature selection device 1 can select important features with high precision even for a small target dataset, by utilizing the useful information in the related datasets.
The feature selection device 1 trains the feature selection model 141 so that the result of feature selection obtained by inputting a small amount of labeled data selected from the related data into the model matches the result of feature selection obtained by inputting a sufficient amount of labeled data selected from the related data into the model.
With this, the feature selection device 1 causes the feature selection model 141 to learn how to select good features from a small amount of labeled data. Therefore, by using the feature selection model 141, the feature selection device 1 can select important features accurately and at low cost even when only a small labeled dataset is available for processing.
[Application example]
An application example of this embodiment will now be described concretely. First, let S (Equation (1)) be the target dataset (a small amount of labeled data).
Here, x_n = (x_n^(1), ..., x_n^(D))^T represents the D-dimensional feature vector of the n-th sample. y_n represents a real value in the case of a regression problem and a discrete value in the case of a classification problem. For the sake of simplicity, the following explanation focuses on regression problems.
Now, suppose that T related datasets (Equation (2)) are given in the learning phase.
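Equations (1) and (2) are images in the published document; based on the surrounding text they presumably denote the labeled target sample set and the collection of related datasets, roughly as follows (a reconstruction for readability, not the verbatim published equations):

```latex
S = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N},
\quad \mathbf{x}_n = (x_n^{(1)}, \ldots, x_n^{(D)})^{\mathsf{T}}
\qquad \text{(1)}

\mathcal{D} = \{ D_t \}_{t=1}^{T},
\quad D_t = \{(\mathbf{x}_n^{t}, y_n^{t})\}_{n=1}^{N_t}
\qquad \text{(2)}
```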
We assume that the dimension M of the feature vector is the same for all datasets. We also assume that the problem type is the same for all datasets; that is, it does not happen that one dataset poses a regression problem while another poses a classification problem. The objective here is, when a target dataset S that is not included in the related datasets is given in the feature selection phase, to select at most K features suitable for that dataset.
First, the feature selection model that performs feature selection from S is described (feature selection phase). The method for training this model is then described (learning phase).
[Feature selection phase]
In the feature selection phase, a non-negative real number α_d representing the importance of each feature is introduced and estimated from S. Feature selection is then performed by picking the K features with the largest α_d. To estimate α = (α_1, ..., α_D), consider the optimization problem of Equation (3).
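The selection step itself is straightforward once the importances have been estimated. The following minimal Python sketch assumes an importance vector alpha has already been computed; the function and variable names are illustrative and do not come from the specification.

```python
import numpy as np

def select_top_k(alpha: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the K features with the largest importance alpha_d."""
    return np.argsort(-alpha)[:k]  # sort by descending importance, keep the first K

# Example: with D = 5 features and K = 2, features 3 and 0 are selected.
alpha = np.array([0.40, 0.00, 0.05, 0.90, 0.10])
print(select_top_k(alpha, k=2))  # -> [3 0]
```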
Here, ‖·‖_F denotes the Frobenius norm, ‖·‖_1 denotes the ℓ1 norm, and λ is a positive real number. L̄_S (Equation (4)) is the centered Gram matrix for the label y; the (i, j) entry of the matrix L_S is given by the S-dependent kernel function L(S)(y_i, y_j). Γ is the centering matrix.
Similarly, K̄_S^(d) (Equation (5)) is the centered Gram matrix for the d-th feature; the (i, j) entry of the matrix K_S^(d) is given by the S-dependent kernel K^(d)(S)(x_i^(d), x_j^(d)). Concrete examples of the kernel functions are given later.
We now explain the objective function. The first term of the objective function (Equation (3)) can be decomposed as in Equation (6).
Here, Equation (7) denotes the estimate, computed from S, of the HSIC (Hilbert-Schmidt Independence Criterion) between the random variables X and Y.
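For reference, an empirical HSIC between one feature and the label can be computed from centered Gram matrices. The sketch below assumes RBF kernels and the common plug-in estimator HSIC ≈ tr(K̄ L̄)/(N−1)²; the exact kernels and normalization used in the embodiment are the ones defined by Equations (4), (5), and (7), so this is only an assumed stand-in.

```python
import numpy as np

def rbf_gram(v: np.ndarray, sigma: float) -> np.ndarray:
    """Gram matrix of an RBF kernel over a 1-D sample vector v of shape (N,)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(gram: np.ndarray) -> np.ndarray:
    """Centered Gram matrix: Gamma @ gram @ Gamma, with Gamma = I - (1/N) * 1 1^T."""
    n = gram.shape[0]
    gamma = np.eye(n) - np.ones((n, n)) / n
    return gamma @ gram @ gamma

def hsic_estimate(x_d: np.ndarray, y: np.ndarray, sigma_x: float = 1.0, sigma_y: float = 1.0) -> float:
    """Plug-in HSIC estimate between the d-th feature x_d and the label y."""
    k_bar = center(rbf_gram(x_d, sigma_x))
    l_bar = center(rbf_gram(y, sigma_y))
    n = len(y)
    return float(np.trace(k_bar @ l_bar)) / (n - 1) ** 2
```

Estimates of this kind are the building blocks of the decomposition in Equation (6).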
HSIC has the following property: X and Y are independent ⇔ HSIC = 0.
In other words, if HSIC > 0, then X and Y have a (possibly nonlinear) correlation. With this in mind, Equation (6) is examined below.
Equation (8) represents the correlation between the d-th feature and the label.
When Equation (8) is positive, α_d takes a large value in order to minimize Equation (6). Conversely, in the case of Equation (9), that is, when the feature and the label are unrelated, α_d tends to become 0 because of the ℓ1 regularization term.
That is, α_d can be interpreted as the importance of the d-th feature for predicting y. Furthermore, when the second term of Equation (6) (Equation (10)) is positive, that is, when the d-th and d′-th features are correlated, at least one of α_d and α_d′ tends to become 0 so as to reduce the objective function. This means that features that are redundant for explaining y are eliminated automatically.
As explained above, the desired result is obtained by minimizing the objective function. This, however, presupposes that the HSIC estimate (Equation (11)) is accurate.
In this embodiment, S is a small amount of data, so the HSIC estimate (Equation (11)) is usually inaccurate. To resolve this, a kernel function suited to feature selection from a small amount of data is constructed below. First, a vector representation of S is extracted by Equation (12).
Here, f and g are arbitrary neural networks, and [·,·] denotes the concatenation of two vectors. Because the sum over f does not depend on the order of the samples in S, Equation (12) assigns a single vector z to the set S. Any permutation-invariant neural network other than this form (for example, one based on a max operation, or a set transformer) may also be used.
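A permutation-invariant encoder of this kind (a "deep sets" style architecture) might look like the following sketch, in which f and g are small multilayer perceptrons; the layer sizes and class name are placeholders rather than values from the specification.

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """z = g( sum_n f([x_n, y_n]) ): a permutation-invariant representation of a dataset S."""

    def __init__(self, dim_x: int, dim_hidden: int = 64, dim_z: int = 32):
        super().__init__()
        # f acts on each (x_n, y_n) pair independently
        self.f = nn.Sequential(nn.Linear(dim_x + 1, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_hidden))
        # g acts on the pooled (summed) representation
        self.g = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_z))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (N, D) feature matrix, y: (N,) label vector
        pairs = torch.cat([x, y.unsqueeze(-1)], dim=-1)  # (N, D + 1)
        pooled = self.f(pairs).sum(dim=0)                # sum over samples -> order-invariant
        return self.g(pooled)                            # single vector z for the whole set S
```

Replacing the sum with a max, or using a set transformer, preserves the permutation invariance, as noted above.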
Using the vector representation z of S, a kernel function of the form of Equation (13) is constructed.
Here, β(z) (Equation (14)) and σ(z) (Equation (15)) are each modeled by a neural network that takes z as input.
Through the learning method described below, these quantities are expected to acquire biases indicating that each feature is more or less likely to be selected.
For example, if β^(d)(z) = 0 is learned, Equation (9) follows immediately from the definition in Equation (8). In that case, α_d becomes 0 because of Equation (3) and the ℓ1 regularization. Thus, even when S is a small amount of data, feature selection accuracy can be expected to improve by having β(z) and σ(z) learn biases that compensate for the lack of data.
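One plausible instantiation of Equations (13)-(15) is a scaled RBF kernel whose amplitude β_d(z) and bandwidth σ_d(z) are produced by small networks applied to z. The exact functional form is the one given by the equations in the specification, so the following is an assumed sketch only; the Softplus activations are likewise an assumption used to keep the outputs non-negative.

```python
import torch
import torch.nn as nn

class LearnedFeatureKernel(nn.Module):
    """Assumed form: k_d(x, x') = beta_d(z) * exp(-(x - x')^2 / (2 * sigma_d(z)^2))."""

    def __init__(self, dim_z: int, num_features: int):
        super().__init__()
        self.beta_net = nn.Sequential(nn.Linear(dim_z, num_features), nn.Softplus())
        self.sigma_net = nn.Sequential(nn.Linear(dim_z, num_features), nn.Softplus())

    def gram(self, x_d: torch.Tensor, d: int, z: torch.Tensor) -> torch.Tensor:
        # x_d: (N,) values of the d-th feature; z: set representation of S
        beta = self.beta_net(z)[d]            # amplitude bias for feature d
        sigma = self.sigma_net(z)[d] + 1e-6   # bandwidth bias for feature d
        diff2 = (x_d[:, None] - x_d[None, :]) ** 2
        return beta * torch.exp(-diff2 / (2.0 * sigma ** 2))
```

If the network drives β_d(z) toward 0 for a feature, the corresponding Gram matrix vanishes and the ℓ1 term pushes α_d to 0, which is exactly the data-compensating bias described above.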
For simplicity, an S-independent kernel is used for L here, although it may also be made S-dependent. σ_y is a hyperparameter.
Next, the optimization method for the objective function of Equation (3) is described. In this example, ISTA (Iterative Shrinkage Thresholding Algorithm) is used. With ISTA, each update equation of the optimization can be written in closed form and is differentiable. The closed-form updates allow the learning to be carried out efficiently, and differentiability is a necessary condition for training the feature selection model by stochastic gradient descent, just as in ordinary neural network training. Any optimization method other than ISTA may be used as long as its update equations can be obtained in a differentiable form.
The update equations based on ISTA are Equations (16) and (17), and the global optimum is obtained by alternately repeating Equations (16) and (17). Here, μ > 0 is a hyperparameter representing the step size.
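The concrete update formulas are those of Equations (16) and (17); they are not reproduced in this text, so the sketch below shows a generic ISTA loop for an objective of the form ‖vec(L̄_S) − Mα‖² + λ‖α‖_1 with α ≥ 0, which is the structure Equation (3) takes once the centered Gram matrices are vectorized. The construction of M and the fixed step size are assumptions made for illustration.

```python
import numpy as np

def ista_nonneg_lasso(M: np.ndarray, target: np.ndarray, lam: float,
                      mu: float, num_iter: int, alpha0: np.ndarray) -> np.ndarray:
    """Generic ISTA for min_{alpha >= 0} ||target - M @ alpha||^2 + lam * ||alpha||_1.

    M: (P, D) matrix whose d-th column is the vectorized centered Gram matrix of feature d.
    target: (P,) vectorized centered Gram matrix of the label.
    """
    alpha = alpha0.copy()
    for _ in range(num_iter):
        grad = 2.0 * M.T @ (M @ alpha - target)   # gradient step (in the spirit of Eq. (16))
        v = alpha - mu * grad
        alpha = np.maximum(v - mu * lam, 0.0)     # soft-threshold + non-negativity (cf. Eq. (17))
    return alpha
```

Because every step is an explicit, differentiable formula, gradients can flow through the whole loop, which is what allows the parameters shaping the kernels to be trained with stochastic gradient descent.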
[Learning phase]
This section describes how the model is trained using the related datasets. A sample set drawn from a related dataset is denoted by the symbol S below. For convenience, the same symbol as in the feature selection phase is used, but the data are different.
The learning parameters of the model are the parameters of the neural networks f, g, β, and σ, the initial parameter α_0 of ISTA, and the regularization parameter λ. Making the ISTA initial parameter α_0 a learning target as well allows α to be obtained efficiently at feature selection time. The objective functions are Equations (18) and (19).
Here, α*_d denotes the importance of the d-th feature obtained after running ISTA from S for I iterations. The pseudo small training data (a small amount of labeled data) and the pseudo test data (a sufficient amount of labeled data), each obtained by random sampling from the related dataset D_t, are denoted by S and Q, respectively.
L̄_Q (Equation (20)) is the centered Gram matrix on the pseudo test data Q for the label y; the (i, j) entry of the matrix L_Q is given by the kernel function L(y_i, y_j). Γ is the centering matrix.
Similarly, K̄_Q^(d) (Equation (21)) is the centered Gram matrix on the pseudo test data Q for the d-th feature; the (i, j) entry of the matrix K_Q^(d) is given by the kernel K^(d)(x_i^(d), x_j^(d)).
The same kernel as in Equation (3) is used as the kernel function L for the label y. As the kernel for feature d, the RBF kernel of Equation (22), which is commonly used with HSIC, is adopted.
Here, σ_x is a hyperparameter. Unlike Equation (3), using the same kernel for all features allows every feature to be evaluated equally, without bias.
Unlike the small amount of labeled data (pseudo small training data) S, the pseudo test data Q is assumed to contain a sufficient amount of data. The HSIC estimate can therefore be computed accurately from the pseudo test data Q using the kernel of Equation (22).
Equation (17) takes a small value when α*_d obtained from the pseudo small training data, that is, the small amount of labeled data S, matches the value obtained from the pseudo test data Q, that is, a sufficient amount of labeled data.
Therefore, in the embodiment, applying this learning method makes it possible to perform accurate feature selection from the pseudo small training data S, which is a small amount of labeled data.
FIG. 5 is a diagram for explaining the processing of the learning unit 10, and shows example pseudocode for that processing.
First, with D denoting the set of related datasets, the learning unit 10 obtains a small amount of labeled data (pseudo small training data) S (with N_S samples) and a sufficient amount of labeled data (pseudo test data) Q (with N_Q samples) (Algorithm 1).
In the learning phase, the learning unit 10 randomly samples a task t, a small amount of labeled data S, and a sufficient amount of labeled data Q (lines 2-4 of Algorithm 1).
The learning unit 10 computes the vector z from the small amount of labeled data S using Equation (12) (line 5 of Algorithm 1).
The learning unit 10 computes the kernel of Equation (13) using the vector representation z of the small amount of labeled data S (line 6 of Algorithm 1).
The learning unit 10 obtains the global optimum of the objective function (Equation (3)) by alternately repeating the ISTA update equations (16) and (17) (lines 7-9 of Algorithm 1).
The learning unit 10 computes the loss on the sufficient amount of labeled data Q based on Equations (18) and (19) (line 10 of Algorithm 1).
The learning unit 10 updates the parameters of the feature selection model based on the computed loss (line 11 of Algorithm 1).
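Putting the pieces together, the outer loop of Algorithm 1 might be sketched as below. This is a compact, assumption-laden reading: the encoder architecture, the scaled-RBF kernel form, the use of Q-side HSIC values as the training target, the Adam optimizer, and the helper sample_episode are all illustrative choices, and FIG. 5 together with Equations (12)-(22) remains the authoritative description.

```python
import torch
import torch.nn as nn

class EpisodicFeatureSelector(nn.Module):
    """Learnable pieces: set encoder (f, g), kernel heads beta(z), sigma(z), alpha0, lambda."""

    def __init__(self, dim_x: int, dim_hidden: int = 64, dim_z: int = 32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim_x + 1, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_hidden))
        self.g = nn.Sequential(nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
                               nn.Linear(dim_hidden, dim_z))
        self.beta_head = nn.Sequential(nn.Linear(dim_z, dim_x), nn.Softplus())
        self.sigma_head = nn.Sequential(nn.Linear(dim_z, dim_x), nn.Softplus())
        self.alpha0 = nn.Parameter(torch.zeros(dim_x))    # learned ISTA initialization
        self.log_lam = nn.Parameter(torch.tensor(0.0))    # learned regularization strength

    def encode(self, x, y):
        pooled = self.f(torch.cat([x, y.unsqueeze(-1)], dim=-1)).sum(0)
        return self.g(pooled)

def centered(gram):
    n = gram.shape[0]
    gamma = torch.eye(n) - torch.ones(n, n) / n
    return gamma @ gram @ gamma

def rbf(v, sigma):
    return torch.exp(-(v[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))

def episode_loss(model, xs, ys, xq, yq, n_ista=10, mu=0.01, sigma_y=1.0, sigma_x=1.0):
    """One episode: unrolled ISTA on the small set S, loss against HSIC targets from Q."""
    z = model.encode(xs, ys)
    beta, sigma = model.beta_head(z), model.sigma_head(z) + 1e-6
    l_bar_s = centered(rbf(ys, sigma_y))
    k_bars = torch.stack([centered(beta[d] * rbf(xs[:, d], sigma[d]))
                          for d in range(xs.shape[1])])      # (D, Ns, Ns)
    M = k_bars.reshape(xs.shape[1], -1).T                    # columns = vec(K_bar_d)
    t = l_bar_s.reshape(-1)
    lam = torch.exp(model.log_lam)
    alpha = torch.clamp(model.alpha0, min=0.0)
    for _ in range(n_ista):                                  # unrolled, differentiable ISTA
        grad = 2.0 * (M.T @ (M @ alpha - t))
        alpha = torch.clamp(alpha - mu * grad - mu * lam, min=0.0)
    l_bar_q = centered(rbf(yq, sigma_y))
    nq = yq.shape[0]
    hsic_q = torch.stack([torch.trace(centered(rbf(xq[:, d], sigma_x)) @ l_bar_q)
                          for d in range(xq.shape[1])]) / (nq - 1) ** 2
    return ((alpha - hsic_q) ** 2).sum()                     # assumed reading of Eqs. (18)-(19)

# Outer loop corresponding to lines 2-11 of Algorithm 1 (sample_episode is hypothetical):
# model = EpisodicFeatureSelector(dim_x=D)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# for _ in range(num_episodes):
#     xs, ys, xq, yq = sample_episode()
#     opt.zero_grad(); loss = episode_loss(model, xs, ys, xq, yq); loss.backward(); opt.step()
```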
The feature selection device according to the present embodiment provides a specific improvement over conventional feature selection methods such as those described in Non-Patent Documents 1 and 2, and represents an improvement in the technical field relating to feature selection performance.
[About the system configuration of the embodiment]
Each component of the feature selection device 1 is functionally conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the feature selection device 1 is not limited to the illustrated one; all or part of the functions may be distributed or integrated, functionally or physically, in arbitrary units according to various loads, usage conditions, and the like.
All or any part of the processes performed in the feature selection device 1 may be realized by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU or GPU. Each process performed in the feature selection device 1 may also be realized as hardware using wired logic.
Of the processes described in the embodiment, all or part of the processes described as being performed automatically may instead be performed manually, and all or part of the processes described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings may be changed as appropriate unless otherwise specified.
[Program]
FIG. 6 is a diagram showing an example of a computer that realizes the feature selection device 1 by executing a program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020, as well as a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the feature selection device 1 is implemented as the program module 1093, in which code executable by the computer 1000 is written. The program module 1093 is stored, for example, in the hard disk drive 1090; for example, the program module 1093 for executing processes equivalent to the functional configuration of the feature selection device 1 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
The setting data used in the processing of the embodiment described above is stored as the program data 1094, for example, in the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.
The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like) and read by the CPU 1020 via the network interface 1070.
Regarding the above embodiment, the following additional notes are further disclosed.
(Additional note 1)
A learning device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor:
acquires related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
trains a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
(Additional note 2)
The learning device according to Additional note 1, wherein the second labeled data has a larger amount of data than the first labeled data.
(Additional note 3)
The learning device according to Additional note 1, wherein the processor selects features from third labeled data to be processed for feature selection, using the trained feature selection model.
(Additional note 5)
A non-transitory storage medium storing a program executable by a computer to perform a learning process, the learning process comprising:
acquiring related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
training a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
Although embodiments to which the invention made by the present inventors is applied have been described above, the present invention is not limited by the descriptions and drawings that form part of this disclosure based on the embodiments. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art based on the embodiments are all included within the scope of the present invention.
1 Feature selection device
10 Learning unit
11 Learning data input unit
12, 22 Feature extraction unit
13 Feature selection model learning unit
14 Storage unit
20 Selection unit
21 Data input unit
23 Feature selection unit
24 Result output unit
141 Feature selection model
Claims (5)
- A learning device comprising:
an acquisition unit that acquires related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
a learning unit that trains a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- The learning device according to claim 1, wherein the second labeled data has a larger amount of data than the first labeled data.
- The learning device according to claim 1, further comprising a selection unit that selects features from third labeled data to be processed for feature selection, using the feature selection model trained by the learning unit.
- A learning method executed by a learning device, the learning method comprising:
a step of acquiring related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
a step of training a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
- A learning program for causing a computer to execute:
a step of acquiring related data that has the same feature configuration as data to be processed for feature selection but different feature values; and
a step of training a feature selection model so that a result of feature selection obtained by inputting first labeled data selected from the related data into the feature selection model matches a result of feature selection obtained by inputting second labeled data selected from the related data into the feature selection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/020859 WO2023223509A1 (en) | 2022-05-19 | 2022-05-19 | Learning device, learning method, and learning program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/020859 WO2023223509A1 (en) | 2022-05-19 | 2022-05-19 | Learning device, learning method, and learning program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023223509A1 true WO2023223509A1 (en) | 2023-11-23 |
Family
ID=88834940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/020859 WO2023223509A1 (en) | 2022-05-19 | 2022-05-19 | Learning device, learning method, and learning program |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023223509A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170213153A1 (en) * | 2016-01-22 | 2017-07-27 | Arizona Board Of Regents On Behalf Of Arizona State University | Systems and methods for embedded unsupervised feature selection |
JP2018151892A (en) * | 2017-03-14 | 2018-09-27 | 日本放送協会 | Model learning apparatus, information determination apparatus, and program therefor |
-
2022
- 2022-05-19 WO PCT/JP2022/020859 patent/WO2023223509A1/en unknown
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170213153A1 (en) * | 2016-01-22 | 2017-07-27 | Arizona Board Of Regents On Behalf Of Arizona State University | Systems and methods for embedded unsupervised feature selection |
JP2018151892A (en) * | 2017-03-14 | 2018-09-27 | 日本放送協会 | Model learning apparatus, information determination apparatus, and program therefor |
Non-Patent Citations (1)
Title |
---|
ATSUTOSHI KUMAGAI; TOMOHARU IWATA; YASUHIRO FUJIWARA: "Few-shot Learning for Unsupervised Feature Selection", ARXIV.ORG, 2 July 2021 (2021-07-02), XP091006548 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guo et al. | Online continual learning through mutual information maximization | |
JP7178513B2 (en) | Chinese word segmentation method, device, storage medium and computer equipment based on deep learning | |
CN112307337B (en) | Associated recommendation method and device based on tag knowledge graph and computer equipment | |
JP6535134B2 (en) | Creation device, creation program, and creation method | |
CN113011191A (en) | Knowledge joint extraction model training method | |
WO2020179378A1 (en) | Information processing system, information processing method, and recording medium | |
CN111310930B (en) | Optimizing apparatus, optimizing method, and non-transitory computer-readable storage medium | |
Teisseyre | Feature ranking for multi-label classification using Markov networks | |
Muschalik et al. | isage: An incremental version of SAGE for online explanation on data streams | |
JP6662754B2 (en) | L1 graph calculation device, L1 graph calculation method, and L1 graph calculation program | |
WO2023223509A1 (en) | Learning device, learning method, and learning program | |
Lücke et al. | Truncated variational sampling for ‘black box’optimization of generative models | |
CN108364067B (en) | Deep learning method based on data segmentation and robot system | |
Borovec et al. | Binary pattern dictionary learning for gene expression representation in drosophila imaginal discs | |
Stippinger et al. | BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space | |
CN115017321A (en) | Knowledge point prediction method and device, storage medium and computer equipment | |
JP7533773B2 (en) | Feature selection device, feature selection method, and feature selection program | |
Zhang et al. | Maximum likelihood inference for the band-read error model for capture-recapture data with misidentification | |
JP7556457B2 (en) | DENSITY RATIO ESTIMATION DEVICE, DENSITY RATIO ESTIMATION METHOD, AND DENSITY RATIO ESTIMATION PROGRAM | |
Noè | Bayesian nonparametric inference in mechanistic models of complex biological systems | |
Turner et al. | A tutorial on joint modeling | |
JP7439923B2 (en) | Learning methods, learning devices and programs | |
WO2023223510A1 (en) | Learning device, learning method, and learning program | |
Pantazis et al. | Enumerating multiple equivalent lasso solutions | |
Michaelides et al. | Property-driven state-space coarsening for continuous time Markov chains |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22942713 Country of ref document: EP Kind code of ref document: A1 |