WO2022239245A1 - Training method, inference method, training device, inference device, and program - Google Patents
- Publication number
- WO2022239245A1 (PCT/JP2021/018484)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- multidimensional data
- inference
- attribute
- binning
- attributes
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the present invention relates to a learning method, an inference method, a learning device, an inference device, and a program.
- CMOS complementary metal-oxide-semiconductor
- N-dimensional convolution (see Non-Patent Document 1)
- In N-dimensional convolution, inference is performed using an attribute value set and its adjacent attribute value sets.
- attribute value sets such as (year/month, gender, age)
- Specific information (for example, the number of contracts)
- N-dimensional convolution is difficult to apply to categorical attributes that have no definition of adjacency (e.g., occupation, country name).
- the number of dimensions is N
- the number of attribute value sets adjacent to a given attribute value set is 3^N - 1, so the number of adjacencies increases exponentially as the number of dimensions N increases.
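The growth of the adjacency count can be checked with a short sketch. It assumes each attribute is laid out on a grid where a neighbour differs by -1, 0, or +1 per dimension; the function names `adjacent_count` and `enumerate_offsets` are illustrative and do not come from the patent.

```python
from itertools import product


def adjacent_count(n_dims: int) -> int:
    """Number of neighbours of one cell in an n-dimensional grid: 3^N - 1."""
    return 3 ** n_dims - 1


def enumerate_offsets(n_dims: int):
    """All offset vectors in {-1, 0, 1}^N except the all-zero vector."""
    return [o for o in product((-1, 0, 1), repeat=n_dims) if any(o)]


# The closed form and the brute-force enumeration agree, and the count
# grows exponentially with the number of dimensions.
for n in (1, 2, 3, 5, 10):
    print(n, adjacent_count(n))
```

For three attributes such as (year/month, gender, age) this gives 26 adjacent attribute value sets, and for ten attributes it is already 59048.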
- An embodiment of the present invention has been made in view of the above points, and aims at estimating specific information from multidimensional data with high accuracy and low computational complexity.
- a learning method takes as input multidimensional data having an estimation target attribute, which is the attribute to be estimated, and two or more non-estimation target attributes, which are attributes other than the estimation target attribute.
- a dimensionality reduction procedure for reducing the number of dimensions of the non-estimation target attributes of the multidimensional data
- a binning procedure for binning the values of the non-estimation target attributes of the multidimensional data after the dimensionality reduction
- an information addition procedure for adding predetermined additional information to the multidimensional data after binning
- a learning procedure, executed by a computer, for learning the parameters of an inference model that estimates the value of the estimation target attribute using the multidimensional data to which the additional information has been added.
- FIG. 5 is a diagram showing, in tabular form, an example of a dimension-reduced multidimensional data set with correct answers.
- FIG. 6 is a diagram showing, in tabular form, an example of a binned multidimensional data set with correct answers.
- FIG. 7 is a diagram showing, in tabular form, an example of a multidimensional data set for learning.
- FIG. 8 is a diagram showing an example of an inference model.
- FIG. 9 is a diagram showing another example of an inference model.
- FIG. 10 is a flowchart showing an example of inference processing according to the embodiment.
- FIG. 11 is a diagram showing, in tabular form, an example of a non-correct answer multidimensional data set.
- FIG. 12 is a diagram showing, in tabular form, an example of a dimension-reduced non-correct answer multidimensional data set.
- FIG. 13 is a diagram showing, in tabular form, an example of a binned non-correct answer multidimensional data set.
- FIG. 14 is a diagram showing, in tabular form, an example of a multidimensional data set for inference.
- An embodiment of the present invention will be described below.
- An inference device 10 capable of estimating specific information from multidimensional data with high accuracy and low computational complexity will be described.
- FIG. 1 is a diagram showing an example of the hardware configuration of an inference device 10 according to this embodiment.
- the inference device 10 is realized by the hardware configuration of a general computer or computer system, and includes an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. These pieces of hardware are communicably connected to one another via a bus 107.
- the input device 101 is, for example, a keyboard, mouse, touch panel, or the like.
- the display device 102 is, for example, a display. Note that the inference device 10 may not have at least one of the input device 101 and the display device 102, for example.
- the external I/F 103 is an interface with an external device such as the recording medium 103a.
- the inference device 10 can read from and write to the recording medium 103a via the external I/F 103.
- Examples of the recording medium 103a include a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
- the communication I/F 104 is an interface for connecting the inference device 10 to a communication network.
- the processor 105 is, for example, various arithmetic units such as a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit).
- the memory device 106 is, for example, various storage devices such as HDD (Hard Disk Drive), SSD (Solid State Drive), RAM (Random Access Memory), ROM (Read Only Memory), and flash memory.
- the inference device 10 has the hardware configuration shown in FIG. 1, so that learning processing and inference processing, which will be described later, can be realized.
- the hardware configuration shown in FIG. 1 is merely an example, and the inference device 10 may have, for example, multiple processors 105 or multiple memory devices 106 .
- FIG. 2 is a diagram showing an example of the functional configuration of the inference device 10 according to this embodiment.
- the inference device 10 has a learning unit 201 and an inference unit 202. These units are implemented by, for example, processing that one or more programs installed in the inference device 10 cause the processor 105 to execute.
- the inference device 10 also includes a multidimensional data with correct answer storage unit 203, a learning dimensionality reduction model storage unit 204, a learning inference model storage unit 205, a learned dimensionality reduction model storage unit 206, a learned inference model storage unit 207, a non-correct answer multidimensional data storage unit 208, and an estimation result storage unit 209.
- Each of these units is realized by the memory device 106, for example. At least one of these units may be realized by a storage device or the like connected to the inference device 10 via a communication network.
- the learning unit 201 learns a dimensionality reduction model for reducing the number of dimensions of multidimensional data, and an inference model for estimating specific information from the multidimensional data whose dimensions have been reduced by the dimensionality reduction model.
- the inference unit 202 uses the trained dimensionality reduction model and the trained inference model to estimate specific information from the multidimensional data.
- the multidimensional data with correct answers storage unit 203 stores a set of multidimensional data with correct answers (hereinafter also referred to as a multidimensional data set with correct answers) used when learning the dimensionality reduction model and the inference model.
- a multidimensional data set with correct answers is a set of multidimensional data to which correct answers (that is, so-called teacher data) of specific information to be estimated by an inference model are added.
- an attribute that can take the specific information to be estimated as a value is also referred to as an "estimation target attribute", and the other attributes are also referred to as "non-estimation target attributes".
- A specific example of the multidimensional data set with correct answers will be described later.
- the learning dimensionality reduction model storage unit 204 stores a dimensionality reduction model to be learned by the learning unit 201 (hereinafter also referred to as a learning dimensionality reduction model).
- a dimensionality reduction model is a model for reducing the number of dimensions of multidimensional data, such as principal component analysis (PCA).
- In this embodiment the dimensionality reduction model is PCA, but this is only an example and other dimensionality reduction models may be used.
- the learning inference model storage unit 205 stores an inference model to be learned by the learning unit 201 (hereinafter also referred to as a learning inference model).
- An inference model is a model for estimating an attribute value to be estimated from multidimensional data that has undergone dimensionality reduction or the like, and is, for example, a neural network. A specific example of the inference model will be described later.
- the learned dimensionality reduction model storage unit 206 stores the learned dimensionality reduction model learned by the learning unit 201 .
- the learned inference model storage unit 207 stores the learned inference model learned by the learning unit 201.
- the non-correct answer multidimensional data storage unit 208 stores a set made up of non-correct answer multidimensional data, to which no estimation target attribute value is assigned, and multidimensional data with correct answers (hereinafter also referred to as a non-correct answer multidimensional data set).
- a multidimensional data set without correct answer is a multidimensional data set that includes multidimensional data to which an estimation target attribute value is not assigned.
- a specific example of the non-correct answer multidimensional data set will be described later.
- the estimation result storage unit 209 stores the estimation result by the inference unit 202 (that is, the estimation target attribute value estimated from the non-correct multidimensional data).
- the learning unit 201 includes a dimension reduction unit 211, a binning unit 212, an information addition unit 213, and a learning processing unit 214.
- the dimension reduction unit 211 receives the multidimensional data set with correct answer and the dimensionality reduction model for learning as input, and reduces the number of dimensions of non-estimation target attributes of each multidimensional data included in the multidimensional data set with correct answer.
- the dimensionality reduction unit 211 outputs a set of multidimensional data after this dimensionality reduction (hereinafter also referred to as a dimensionality-reduced multidimensional data set with correct answers) and a trained dimensionality reduction model.
- A specific example of the dimension-reduced multidimensional data set with correct answers will be described later.
- the binning unit 212 receives the dimension-reduced multidimensional data set with correct answers as input and performs binning on the non-estimation target attributes of each multidimensional data included in it. Then, the binning unit 212 outputs the set of multidimensional data after this binning (hereinafter also referred to as a binned multidimensional data set with correct answers). Binning, also called discretization, divides the possible range of a non-estimation target attribute contained in the multidimensional data into bins of a fixed interval and replaces each value of that attribute with a value indicating which bin it belongs to.
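Fixed-width binning as described above can be sketched in a few lines of NumPy. The function name `bin_values`, the range [-1, 1], and the bin width 0.5 are assumptions chosen for illustration; the patent does not specify them.

```python
import numpy as np


def bin_values(values, low, high, width):
    """Replace each value by the index of the fixed-width bin it falls into."""
    values = np.asarray(values, dtype=float)
    n_bins = int(np.ceil((high - low) / width))
    idx = np.floor((values - low) / width).astype(int)
    # Clamp so that values on or beyond the range edges land in the end bins.
    return np.clip(idx, 0, n_bins - 1)


# Example: a principal-component score in [-1, 1] split into four bins.
binned = bin_values([-0.9, -0.1, 0.0, 0.49, 0.5, 1.0], low=-1.0, high=1.0, width=0.5)
print(binned)
```

A wider bin gives fewer distinct attribute values and therefore less additional information to compute later, at the cost of coarser inputs.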
- the information addition unit 213 receives the binned multidimensional data set with correct answers as input, and creates a learning multidimensional data set by adding additional information to each multidimensional data included in the binned multidimensional data set with correct answers.
- the additional information is, for each non-estimation target attribute of each multidimensional data included in the binned multidimensional data set with correct answers, information composed of the values of the estimation target attribute obtained when the value of that non-estimation target attribute is varied while the values of the other non-estimation target attributes are held fixed.
- the information addition unit 213 outputs the learning multidimensional data set.
- a specific example of the learning multidimensional data set will be described later.
- the learning processing unit 214 receives the multidimensional data set for learning and the inference model for learning as input, learns the inference model for learning, and creates a trained inference model. Then, the learning processing unit 214 outputs a learned inference model.
- the inference unit 202 also includes a dimension reduction unit 221 , a binning unit 222 , an information addition unit 223 , and an estimation processing unit 224 .
- the dimension reduction unit 221 receives the non-correct multidimensional data set and the trained dimensionality reduction model as input, and reduces the dimension of the non-estimation target attribute of each multidimensional data included in the non-correct multidimensional data set. Then, the dimension reduction unit 221 outputs a set of multidimensional data after the dimension reduction (hereinafter also referred to as a dimension-reduced non-correct multidimensional data set).
- the binning unit 222 receives the dimension-reduced non-correct multidimensional data set as input, and performs binning on non-estimation target attributes of each multidimensional data included in the dimension-reduced non-correct multidimensional data set. Then, the binning unit 222 outputs a set of multidimensional data after this binning (hereinafter also referred to as a binned non-correct multidimensional data set).
- the information addition unit 223 receives the binned non-correct answer multidimensional data set as input, and creates an inference multidimensional data set by adding additional information to each multidimensional data included in the binned non-correct answer multidimensional data set.
- the estimation processing unit 224 receives the inference multidimensional data set and the learned inference model as input, and estimates the estimation target attribute value using the learned inference model. Then, the estimation processing unit 224 outputs the estimation target attribute value as an estimation result.
- each unit shown in FIG. 2 may be distributed among a plurality of devices.
- the learning unit 201 and the inference unit 202 may be included in different devices.
- the device having the learning unit 201 may be called a “learning device”.
- FIG. 3 is a flowchart showing an example of learning processing according to this embodiment.
- Step S101 First, the dimension reduction unit 211 of the learning unit 201 inputs the multidimensional data set with correct answer stored in the multidimensional data with correct answer storage unit 203 .
- FIG. 4 is a diagram showing an example of a multidimensional data set with correct answers in tabular form.
- FIG. 4 is a tabular representation of a set of multidimensional data having "year/month", "gender", "age", etc. as non-estimation target attributes and "number of contracts" as the estimation target attribute.
- “Year and month” can take values indicating the year and month, such as “2019/4" and “2019/5”.
- “Gender” can take the value of either “male” or “female”.
- "Age” can take values indicating ages such as “teens”, “twenties”, and “thirties”.
- the "number of contracts” takes as a value the number of contracts in the relevant year/month, gender, and age group.
- the multidimensional data set with correct answer shown in FIG. 4 is an example, and the present embodiment can be applied to multidimensional data sets having various attributes.
- the values that each attribute can take can be defined in various ways. For example, in the example shown in FIG. 4, "year and month" takes a value for each month of a certain year, but it may instead take a value for every other month of a certain year.
- the "age" is not limited to values separated by 10 years; for example, it may take values such as "young age group" or "old age group", a range such as "20 to 25 years old", or a specific age such as "20 years old".
- the "number of contracts" may be the number of contracted customers or the total number of contracts.
- each multidimensional data contained in the multidimensional data set with correct answers also has various non-estimation target attributes other than "year/month", "gender", and "age", for a total of N non-estimation target attributes. That is, each multidimensional data included in the multidimensional data set with correct answers has N non-estimation target attributes and one estimation target attribute.
- Step S102 Next, the dimension reduction unit 211 of the learning unit 201 receives as input the multidimensional data set with correct answers and the learning dimensionality reduction model stored in the learning dimensionality reduction model storage unit 204, and reduces the number of dimensions of the non-estimation target attributes of each multidimensional data included in the multidimensional data set with correct answers. As a result, the learning dimensionality reduction model is learned, and a learned dimensionality reduction model is obtained.
- In the following, as an example, the N dimensions of the non-estimation target attributes are reduced by principal component analysis to three dimensions: the first principal component, the second principal component, and the third principal component. Since principal component analysis is a well-known method, a detailed description is omitted.
- the dimension reduction unit 211 of the learning unit 201 outputs the dimension-reduced multidimensional data set with correct answer to the binning unit 212 and outputs the trained dimension reduction model to the trained dimension reduction model storage unit 206 .
- the learned dimensionality reduction model is, for example, information representing a mapping that converts N-dimensional data having N non-estimation target attributes into three-dimensional data having the first principal component, the second principal component, and the third principal component as attributes, together with the parameters of that mapping.
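A learned PCA mapping and its parameters can be sketched as a mean vector plus a matrix of principal axes. This is a minimal illustration under stated assumptions: the attributes are already encoded as numbers (how categorical values such as gender are encoded is not covered here), the data are random stand-ins, and the names `fit_pca` and `apply_pca` are hypothetical.

```python
import numpy as np


def fit_pca(data, n_components=3):
    """Learn the PCA mapping: the mean vector and the top principal axes."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return {"mean": mean, "components": vt[:n_components]}


def apply_pca(model, data):
    """Project N-dimensional rows onto the learned principal axes."""
    return (np.asarray(data, dtype=float) - model["mean"]) @ model["components"].T


rng = np.random.default_rng(0)
train = rng.normal(size=(200, 6))         # 200 records, N = 6 numeric attributes
model = fit_pca(train, n_components=3)    # parameters to keep after learning
reduced = apply_pca(model, train)         # shape (200, 3)
new_rows = rng.normal(size=(5, 6))
reduced_new = apply_pca(model, new_rows)  # the same mapping reused on new data
```

Keeping `mean` and `components` is enough to apply the identical mapping later, which is what allows the inference side to reuse the model learned here.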
- FIG. 5 is a diagram showing an example of a multidimensional data set with correct answers after dimensionality reduction in tabular form.
- FIG. 5 is a tabular representation of a set of multidimensional data in which the non-estimation target attributes of each multidimensional data included in the multidimensional data set with correct answers shown in FIG. 4 are reduced to three dimensions.
- "first principal component", "second principal component", and "third principal component" take the values of the first, second, and third principal components, respectively, obtained by principal component analysis. These first to third principal components may be non-estimation target attributes of each multidimensional data contained in the multidimensional data set with correct answers shown in FIG. 4, or may be newly defined attributes.
- the dimension-reduced multidimensional data set with correct answers shown in FIG. 5 is an example, and the number of dimensions after reduction is not limited to three. In general, the smaller the number of dimensions after dimensionality reduction, the more the amount of calculation can be reduced, but the lower the estimation accuracy at inference time. Therefore, the number of dimensions after dimensionality reduction is determined appropriately in consideration of the type of target task, the calculation time required for the task, and the like.
- Step S103 Next, the binning unit 212 of the learning unit 201 receives the dimension-reduced multidimensional data set with correct answers as input and bins the non-estimation target attributes of each multidimensional data included in it. Then, the binning unit 212 of the learning unit 201 outputs the binned multidimensional data set with correct answers to the information addition unit 213.
- FIG. 6 is a diagram showing an example of a binned multidimensional data set with correct answers in tabular form.
- the example shown in FIG. 6 is a tabular representation of a set of multidimensional data obtained by binning the non-estimation target attributes of each multidimensional data contained in the dimension-reduced multidimensional data set with correct answers shown in FIG. 5.
- Each multidimensional data included in the binned multidimensional data set with correct answers shown in FIG. 6 has "first principal component", "second principal component", and "third principal component" as non-estimation target attributes and "number of contracts" as the estimation target attribute, and the value of each non-estimation target attribute is binned.
- the binned multidimensional data set with correct answer shown in FIG. 6 is an example, and for example, the interval width of the bins during binning can be appropriately set to any value.
- Step S104 Next, the information addition unit 213 of the learning unit 201 receives the binned multidimensional data set with correct answers as input, and creates a learning multidimensional data set by adding additional information to each multidimensional data included in the binned multidimensional data set with correct answers. Then, the information addition unit 213 of the learning unit 201 outputs the learning multidimensional data set to the learning processing unit 214.
- the additional information is, as described above, for each non-estimation target attribute of each multidimensional data included in the binned multidimensional data set with correct answers, information composed of the values of the estimation target attribute obtained when the value of that non-estimation target attribute is varied (within the range of values the attribute can take) while the values of the other non-estimation target attributes are held fixed.
- these estimation target attribute values are obtained by searching the binned multidimensional data set with correct answers for the multidimensional data in which the value of that non-estimation target attribute varies while the values of the other non-estimation target attributes are fixed, and aggregating the estimation target attribute values found.
- FIG. 7 is a diagram showing an example of a learning multidimensional data set in tabular form.
- FIG. 7 is a tabular representation of a set of multidimensional data in which additional information is added to each piece of multidimensional data included in the binned multidimensional data set with correct answers shown in FIG. 6. Each piece of multidimensional data included in the multidimensional data set for learning shown in FIG. 7 has "fixed other than 1st principal component", "fixed other than 2nd principal component", and "fixed other than 3rd principal component" added as additional information.
- "Fixed other than 1st principal component" tabulates the value of the estimation target attribute (number of contracts) when the value of the 1st principal component is varied while the values of the 2nd and 3rd principal components are fixed. In the example shown in FIG. 7, the number of contracts is tabulated for each case in which the value of the first principal component is changed to "0", "1", and so on.
- "Fixed other than 2nd principal component" tabulates the value of the estimation target attribute (number of contracts) when the value of the 2nd principal component is varied while the values of the 1st and 3rd principal components are fixed. In the example shown in FIG. 7, the number of contracts is tabulated for each case in which the value of the second principal component is changed to "0", "1", and so on.
- "Fixed other than 3rd principal component" tabulates the value of the estimation target attribute (number of contracts) when the value of the 3rd principal component is varied while the values of the 1st and 2nd principal components are fixed. In the example shown in FIG. 7, the number of contracts is tabulated for each case in which the value of the third principal component is changed to "0", "1", and so on.
- Hereinafter, the set of tuples of values of "first principal component", "second principal component", and "third principal component" of each multidimensional data included in the multidimensional data set for learning is also called the non-estimation target attribute value set W, and the set of additional information is denoted by X.
- the multidimensional data set for learning shown in FIG. 7 is an example.
- Multidimensional data may be created with some values left unknown (for example, set to "-" or a null value).
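The construction of the additional information can be sketched in plain Python. Several choices below are assumptions, not taken from the patent: records are represented as (binned attribute tuple, target) pairs, duplicates are aggregated by summing, cells with no known target are left as `None`, and a row's own target is included in its profile. The name `add_information` is hypothetical.

```python
from collections import defaultdict


def add_information(rows, n_bins):
    """rows: list of (binned attribute tuple, target value or None).

    For every row and every attribute k, collect the target values observed
    when attribute k takes each bin value while the other attributes keep
    the row's own values. Cells with no known target are left as None.
    """
    totals = defaultdict(float)
    seen = set()
    for attrs, y in rows:
        if y is not None:
            totals[attrs] += y  # aggregate duplicates by summing (assumption)
            seen.add(attrs)
    out = []
    for attrs, y in rows:
        extra = []
        for k in range(len(attrs)):
            profile = []
            for b in range(n_bins):
                key = attrs[:k] + (b,) + attrs[k + 1:]
                profile.append(totals[key] if key in seen else None)
            extra.append(profile)
        out.append((attrs, extra, y))
    return out


# Four binned records over three principal components, two bins each.
rows = [((0, 0, 0), 10.0), ((1, 0, 0), 20.0), ((0, 1, 0), 5.0), ((1, 1, 0), None)]
result = add_information(rows, n_bins=2)
print(result[3][1])  # profiles for the record whose target is unknown
```

Each record gains one profile per attribute, so the work per record grows with the number of attributes times the number of bins rather than with the number of adjacent cells.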
- Step S105 Next, the learning processing unit 214 of the learning unit 201 receives as input the learning multidimensional data set and the learning inference model stored in the learning inference model storage unit 205, learns the learning inference model, and creates a learned inference model. That is, the learning processing unit 214 of the learning unit 201 takes as input a vector w ∈ W representing a set of non-estimation target attribute values and the additional information x ∈ X corresponding to w, and learns the parameters of the learning inference model by a known method such as error backpropagation so that the estimation target attribute value y is estimated correctly (that is, so that the error between the estimation target attribute value y estimated by the learning inference model and its correct answer is minimized). Note that x corresponding to w is the x in the same row of the learning multidimensional data set as the set of non-estimation target attribute values represented by w.
- the inference model is not particularly limited; for example, a neural network as shown in FIG. 8 may be used as the inference model.
- the neural network shown in FIG. 8 is a model that receives as input a vector w representing a set of non-estimation target attribute values and its additional information x, and outputs an estimation target attribute value y.
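A model that takes w and x and outputs y can be illustrated with a small two-layer network trained by error backpropagation. This is only a sketch: the architecture of FIG. 8 is not reproduced here, the layer sizes, learning rate, and synthetic data are all assumptions, and w and x are simply concatenated into one input vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: w has 3 binned attributes, x has 3 profiles of 4 bins each.
n, dim_w, dim_x, hidden = 256, 3, 12, 16
W_in = rng.normal(size=(n, dim_w))
X_in = rng.normal(size=(n, dim_x))
y = (W_in.sum(axis=1) + 0.5 * X_in.mean(axis=1)).reshape(-1, 1)  # synthetic target

inputs = np.hstack([W_in, X_in])  # concatenate w and x
A = rng.normal(scale=0.1, size=(dim_w + dim_x, hidden))
a = np.zeros(hidden)
B = rng.normal(scale=0.1, size=(hidden, 1))
b = np.zeros(1)


def forward(inp):
    """One tanh hidden layer followed by a linear output for y."""
    h = np.tanh(inp @ A + a)
    return h, h @ B + b


losses = []
lr = 0.05
for step in range(500):
    h, pred = forward(inputs)
    err = pred - y
    losses.append(float((err ** 2).mean()))
    # Backpropagate the mean squared error through both layers.
    g_pred = 2.0 * err / n
    g_B = h.T @ g_pred
    g_b = g_pred.sum(axis=0)
    g_h = g_pred @ B.T * (1.0 - h ** 2)
    g_A = inputs.T @ g_h
    g_a = g_h.sum(axis=0)
    A -= lr * g_A
    a -= lr * g_a
    B -= lr * g_B
    b -= lr * g_b

print(losses[0], losses[-1])
```

In practice any unknown entries in x would first need a numeric encoding, which this sketch does not address.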
- Alternatively, a neural network as shown in FIG. 9 may be used as the inference model.
- the neural network shown in FIG. 9 receives a vector w representing a set of non-estimation target attribute values and its additional information x, and outputs the estimation target attribute value y and a vector w′ representing a set of non-estimation target attribute values.
- By using a neural network as shown in FIG. 9 as the inference model, features can be extracted appropriately from the additional information x, so high estimation accuracy can be expected.
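A network with two outputs, y and w′, can be sketched as a shared hidden layer with two heads. The details here are assumptions made for illustration only: FIG. 9 is not reproduced, w′ is treated as a reconstruction of w, and the combined loss with a `weight` factor is a hypothetical choice rather than the patent's stated objective.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_w, dim_x, hidden = 3, 12, 8

params = {
    "enc": rng.normal(scale=0.1, size=(dim_w + dim_x, hidden)),
    "head_y": rng.normal(scale=0.1, size=(hidden, 1)),
    "head_w": rng.normal(scale=0.1, size=(hidden, dim_w)),
}


def forward(params, w, x):
    """Shared hidden layer followed by two output heads: y and w'."""
    h = np.tanh(np.hstack([w, x]) @ params["enc"])
    return h @ params["head_y"], h @ params["head_w"]


def combined_loss(params, w, x, y_true, weight=1.0):
    """Squared error on y plus a weighted squared error between w' and w."""
    y_hat, w_hat = forward(params, w, x)
    return float(((y_hat - y_true) ** 2).mean() + weight * ((w_hat - w) ** 2).mean())


w = rng.normal(size=(4, dim_w))
x = rng.normal(size=(4, dim_x))
y_true = rng.normal(size=(4, 1))
y_hat, w_hat = forward(params, w, x)
loss = combined_loss(params, w, x, y_true)
```

Under this reading, the second head pushes the hidden layer to retain information about w while it processes x, which is one plausible way the additional information could be exploited.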
- Step S106 Finally, the learning processing unit 214 of the learning unit 201 outputs the learned inference model to the learned inference model storage unit 207.
- FIG. 10 is a flowchart showing an example of inference processing according to this embodiment.
- Step S201 First, the dimension reduction unit 221 of the inference unit 202 inputs the non-correct answer multidimensional data set stored in the non-correct answer multidimensional data storage unit 208.
- FIG. 11 is a diagram showing an example of a non-correct multidimensional data set in tabular form.
- the example shown in FIG. 11 is a tabular representation of a set of multidimensional data having "year/month", "sex", "age", etc. as non-estimation target attributes and "number of contracts" as the estimation target attribute.
- For example, the multidimensional data in the first row of the non-correct answer multidimensional data set shown in FIG. 11 has a known number of contracts (200).
- In contrast, the multidimensional data in the second row is (year/month, sex, age, ..., number of contracts) = (2020/5, female, twenties, ..., -). That is, in the multidimensional data in the second row, the number of contracts, which is the estimation target attribute value, is unknown.
- the non-correct multidimensional data set includes at least multidimensional data whose attribute values to be estimated are unknown.
- Step S202 Next, the dimension reduction unit 221 of the inference unit 202 receives as input the non-correct answer multidimensional data set and the learned dimensionality reduction model stored in the learned dimensionality reduction model storage unit 206, and uses the learned dimensionality reduction model to reduce the number of dimensions of the non-estimation target attributes of each multidimensional data included in the non-correct answer multidimensional data set. Then, the dimension reduction unit 221 of the inference unit 202 outputs the dimension-reduced non-correct answer multidimensional data set to the binning unit 222.
- FIG. 12 is a diagram showing an example of a dimension-reduced non-correct multidimensional data set in tabular form.
- the example shown in FIG. 12 is a tabular representation of a set of multidimensional data obtained by reducing the non-estimation target attributes of each multidimensional data contained in the non-correct answer multidimensional data set shown in FIG. 11 to three dimensions.
- Step S203 Next, the binning unit 222 of the inference unit 202 receives the dimension-reduced non-correct answer multidimensional data set as input and bins the non-estimation target attributes of each multidimensional data included in it. Then, the binning unit 222 of the inference unit 202 outputs the binned non-correct answer multidimensional data set to the information addition unit 223.
- FIG. 13 is a diagram showing an example of a binned non-correct multidimensional data set in tabular form.
- the example shown in FIG. 13 is a tabular representation of a set of multidimensional data obtained by binning the non-estimation target attributes of each multidimensional data included in the dimension-reduced non-correct answer multidimensional data set shown in FIG. 12.
- Each multidimensional data included in the binned non-correct answer multidimensional data set shown in FIG. 13 has "first principal component", "second principal component", and "third principal component" as non-estimation target attributes and "number of contracts" as the estimation target attribute, and the value of each non-estimation target attribute is binned.
- Step S204 Next, the information addition unit 223 of the inference unit 202 receives the binned non-correct answer multidimensional data set as input, and creates an inference multidimensional data set by adding additional information to each multidimensional data included in the binned non-correct answer multidimensional data set. Then, the information addition unit 223 of the inference unit 202 outputs the inference multidimensional data set to the estimation processing unit 224.
- FIG. 14 is a diagram showing an example of a multidimensional data set for inference in tabular form.
- FIG. 14 is a tabular representation of a set of multidimensional data in which additional information is added to each piece of multidimensional data included in the binned non-correct answer multidimensional data set shown in FIG. Each multidimensional data included in the multidimensional data set for inference shown in FIG. additional information is added.
- a set of sets of values of “first principal component”, “second principal component” and “third principal component” of each multidimensional data included in the inference multidimensional data set is It is also called non-estimation target attribute value set W, and the set of additional information is represented by X.
- Step S205: Next, the estimation processing unit 224 of the inference unit 202 receives as input the multidimensional data set for inference and the trained inference model stored in the trained inference model storage unit 207, and estimates the estimation target attribute values with the trained inference model. That is, the estimation processing unit 224 of the inference unit 202 receives as input a vector w ∈ W representing a set of non-estimation target attribute values and the additional information x ∈ X corresponding to w, and estimates the estimation target attribute value y with the trained inference model.
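The input/output relationship of step S205 can be sketched as follows. This is only an illustration, not the inference model of this embodiment: it assumes that w and x are simply concatenated into one input vector and passed through a small fully connected network with one ReLU hidden layer. The function name `estimate` and the parameter layout (`w1`, `b1`, `w2`, `b2`) are hypothetical, and the toy parameters are hand-picked rather than learned.

```python
def estimate(w, x, params):
    """Estimate y from attribute values w and additional information x.

    Assumption: w and x are concatenated and fed to a one-hidden-layer
    network with ReLU activation. `params` uses a hypothetical layout:
    w1 (one weight row per hidden unit), b1 (hidden biases),
    w2 (output weights), b2 (output bias).
    """
    features = list(w) + list(x)
    hidden = []
    for weights, bias in zip(params["w1"], params["b1"]):
        activation = sum(wt * f for wt, f in zip(weights, features)) + bias
        hidden.append(max(0.0, activation))  # ReLU
    return sum(wt * h for wt, h in zip(params["w2"], hidden)) + params["b2"]


# Toy parameters chosen by hand, only to make the call concrete.
params = {
    "w1": [[1.0, 0.0, 0.0, 0.5, 0.0], [0.0, -1.0, 0.0, 0.0, 0.5]],
    "b1": [0.0, 1.0],
    "w2": [2.0, 1.0],
    "b2": 0.5,
}
# w holds three binned principal-component values, x two additional-information values.
y = estimate(w=[1.0, 2.0, 0.0], x=[4.0, 6.0], params=params)
print(y)
```

In practice x would be much longer than in this toy call, since it collects estimation target values along every non-estimation target attribute.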
- Step S206: Finally, the estimation processing unit 224 of the inference unit 202 outputs the estimation result of the trained inference model (that is, the estimation target attribute value y) to the estimation result storage unit 209.
- As described above, for each non-estimation target attribute, the inference device 10 creates additional information consisting of the values of the estimation target attribute obtained when the value of that attribute is varied while the values of the other non-estimation target attributes are held fixed, and then performs learning and inference using this additional information as well. As a result, global features can be extracted, enabling more accurate inference.
- In addition, before creating the additional information, the inference device 10 reduces the number of dimensions of the non-estimation target attributes and, by binning, reduces the number of values each non-estimation target attribute can take. This makes it possible to reduce the amount of calculation even when the number of non-estimation target attributes and the number of attribute values are large. For example, if the number of non-estimation target attributes before dimensionality reduction is N and the average number of values per attribute is M, calculation must be performed for N × M attribute values when neither dimensionality reduction nor binning is applied.
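As a rough illustration of the N × M count above, the following sketch compares it with the corresponding count after dimensionality reduction and binning. The concrete numbers (N = 50, M = 100, three dimensions, 10 bins) are illustrative assumptions, not values from this embodiment, and treating the reduced count as (reduced dimensions) × (bins) is an inference by analogy rather than a statement from the text.

```python
def attribute_values_to_scan(n_attributes: int, values_per_attribute: int) -> int:
    """Number of attribute values visited when each attribute is varied in turn."""
    return n_attributes * values_per_attribute


before = attribute_values_to_scan(50, 100)  # N x M, no reduction or binning
after = attribute_values_to_scan(3, 10)     # 3 principal components, 10 bins each
print(before, after)
```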
Abstract
A training method according to one embodiment of the present invention is such that a computer executes: an input step for inputting multidimensional data having an estimation object attribute indicating the attribute of an estimation object, and at least two non-estimation object attributes indicating attributes other than the estimation object attribute; a dimension reduction step for reducing the number of dimensions of the non-estimation object attributes in the multidimensional data; a binning step for performing binning on the values of the non-estimation object attributes in the multidimensional data after the dimensional reduction; an information addition step for adding prescribed additional information to the multidimensional data after the binning; and a training step for training a parameter of an inference model for estimating the value of the estimation object attribute by using the multidimensional data having the additional information added thereto.
Description
The present invention relates to a learning method, an inference method, a learning device, an inference device, and a program.
Estimating a specific piece of information from multidimensional data having multiple attributes, such as year and month, gender, and age group, is a widely performed task.
Known methods for estimating information from multidimensional data include, for example, the multilayer perceptron (MLP) and N-dimensional convolution (see Non-Patent Document 1).
In the MLP approach, inference uses only a single attribute value set. For example, specific information (for example, the number of contracts) is estimated from only an attribute value set such as (year/month, gender, age group) = (April 2020, male, 30s). However, an MLP has difficulty capturing broad trends (for example, the periodicity of a time series), so its estimation accuracy may not be high.
In N-dimensional convolution, inference uses an attribute value set together with its adjacent attribute value sets. For example, specific information (for example, the number of contracts) is estimated from a collection of attribute value sets such as {(year/month, gender, age group) | year/month = March 2020, April 2020, May 2020; gender = male, female; age group = 20s, 30s, 40s}. However, N-dimensional convolution is difficult to apply to categorical attributes for which adjacency is not defined (for example, occupation or country name). In addition, when the number of dimensions is N, the number of attribute value sets adjacent to a given attribute value set is 3^N − 1, so the number of neighbors grows exponentially as N increases.
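The 3^N − 1 neighbor count can be checked with a short enumeration. The sketch below assumes the usual grid notion of adjacency, in which each of the N coordinates may shift by −1, 0, or +1 and the all-zero shift (the cell itself) is excluded; the function name `count_neighbors` is illustrative.

```python
from itertools import product


def count_neighbors(n_dims: int) -> int:
    """Count the cells adjacent to one cell of an n_dims-dimensional grid.

    Each coordinate may shift by -1, 0, or +1. The all-zero offset is the
    cell itself and is excluded, leaving 3**n_dims - 1 neighbors.
    """
    offsets = product((-1, 0, 1), repeat=n_dims)
    return sum(1 for offset in offsets if any(offset))


for n in (1, 2, 3, 5, 10):
    print(n, count_neighbors(n))
```

Three attributes already give 26 neighbors, and ten attributes give 59,048, which is the exponential growth the text refers to.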
One embodiment of the present invention has been made in view of the above points, and aims to estimate specific information from multidimensional data with high accuracy and a low amount of calculation.
To achieve the above object, in a learning method according to one embodiment, a computer executes: an input procedure of inputting multidimensional data having an estimation target attribute, which indicates the attribute to be estimated, and two or more non-estimation target attributes, which indicate attributes other than the estimation target attribute; a dimension reduction procedure of reducing the number of dimensions of the non-estimation target attributes of the multidimensional data; a binning procedure of binning the values of the non-estimation target attributes of the multidimensional data after the dimension reduction; an information addition procedure of adding predetermined additional information to the multidimensional data after the binning; and a learning procedure of learning the parameters of an inference model for estimating the value of the estimation target attribute, using the multidimensional data to which the additional information has been added.
Specific information can be estimated from multidimensional data with high accuracy and a low amount of calculation.
An embodiment of the present invention is described below. This embodiment describes an inference device 10 capable of estimating specific information from multidimensional data with high accuracy and a low amount of calculation.
<Hardware configuration>
First, the hardware configuration of the inference device 10 according to this embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the hardware configuration of the inference device 10 according to this embodiment.
As shown in FIG. 1, the inference device 10 according to this embodiment is realized by the hardware configuration of a general computer or computer system, and has an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. These pieces of hardware are communicably connected to one another via a bus 107.
The input device 101 is, for example, a keyboard, a mouse, or a touch panel. The display device 102 is, for example, a display. Note that the inference device 10 need not have at least one of the input device 101 and the display device 102.
The external I/F 103 is an interface with an external device such as a recording medium 103a. The inference device 10 can read from and write to the recording medium 103a via the external I/F 103. Examples of the recording medium 103a include a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
The communication I/F 104 is an interface for connecting the inference device 10 to a communication network. The processor 105 is, for example, any of various arithmetic units such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The memory device 106 is, for example, any of various storage devices such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory.
With the hardware configuration shown in FIG. 1, the inference device 10 according to this embodiment can realize the learning processing and inference processing described later. Note that the hardware configuration shown in FIG. 1 is an example; the inference device 10 may, for example, have multiple processors 105 or multiple memory devices 106.
<Functional configuration>
Next, the functional configuration of the inference device 10 according to this embodiment is described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the functional configuration of the inference device 10 according to this embodiment.
As shown in FIG. 2, the inference device 10 according to this embodiment has a learning unit 201 and an inference unit 202. These units are realized, for example, by processing that one or more programs installed in the inference device 10 cause the processor 105 to execute.
The inference device 10 according to this embodiment also has a storage unit 203 for multidimensional data with correct answers, a learning dimensionality reduction model storage unit 204, a learning inference model storage unit 205, a trained dimensionality reduction model storage unit 206, a trained inference model storage unit 207, a storage unit 208 for multidimensional data without correct answers, and an estimation result storage unit 209. These units are realized, for example, by the memory device 106. Note that at least one of these units may be realized by a storage device or the like connected to the inference device 10 via a communication network.
The learning unit 201 learns a dimensionality reduction model for reducing the number of dimensions of multidimensional data, and an inference model for estimating specific information from multidimensional data to which dimensionality reduction by that model and other processing have been applied.
The inference unit 202 estimates specific information from multidimensional data using the trained dimensionality reduction model and the trained inference model.
The storage unit 203 for multidimensional data with correct answers stores a set of multidimensional data with correct answers (hereinafter also called a multidimensional data set with correct answers) used when learning the dimensionality reduction model and the inference model. A multidimensional data set with correct answers is a set of multidimensional data to which the correct answer (that is, so-called teacher data) for the specific information to be estimated by the inference model has been assigned. Hereinafter, among the attributes of multidimensional data, the attribute that takes the specific information to be estimated as its value is called the "estimation target attribute", and the other attributes are called "non-estimation target attributes". A specific example of a multidimensional data set with correct answers is described later.
The learning dimensionality reduction model storage unit 204 stores the dimensionality reduction model to be learned by the learning unit 201 (hereinafter also called the dimensionality reduction model for learning). A dimensionality reduction model is a model for reducing the number of dimensions of multidimensional data, for example, principal component analysis (PCA). In this embodiment the dimensionality reduction model is assumed to be PCA, but this is only an example, and another dimensionality reduction model may be used.
The learning inference model storage unit 205 stores the inference model to be learned by the learning unit 201 (hereinafter also called the inference model for learning). An inference model is a model for estimating the estimation target attribute value from multidimensional data that has undergone dimensionality reduction and other processing, for example, a neural network. A specific example of the inference model is described later.
The trained dimensionality reduction model storage unit 206 stores the trained dimensionality reduction model learned by the learning unit 201.
The trained inference model storage unit 207 stores the trained inference model learned by the learning unit 201.
The storage unit 208 for multidimensional data without correct answers stores a set consisting of multidimensional data without correct answers, to which no estimation target attribute value has been assigned, and multidimensional data with correct answers (hereinafter also called a multidimensional data set without correct answers). In other words, a multidimensional data set without correct answers is a multidimensional data set that includes multidimensional data to which no estimation target attribute value has been assigned. A specific example of a multidimensional data set without correct answers is described later.
The estimation result storage unit 209 stores the estimation result produced by the inference unit 202 (that is, the estimation target attribute value estimated from the multidimensional data without correct answers).
Here, the learning unit 201 includes a dimension reduction unit 211, a binning unit 212, an information addition unit 213, and a learning processing unit 214.
The dimension reduction unit 211 receives the multidimensional data set with correct answers and the dimensionality reduction model for learning as input, and reduces the number of dimensions of the non-estimation target attributes of each piece of multidimensional data in that data set.
The dimension reduction unit 211 then outputs the set of multidimensional data after the dimension reduction (hereinafter also called a dimension-reduced multidimensional data set with correct answers) and the trained dimensionality reduction model. A specific example of a dimension-reduced multidimensional data set with correct answers is described later.
The binning unit 212 receives the dimension-reduced multidimensional data set with correct answers as input and bins the non-estimation target attributes of each piece of multidimensional data in it. The binning unit 212 then outputs the set of multidimensional data after binning (hereinafter also called a binned multidimensional data set with correct answers). Binning, also called discretization, is a technique that divides the range a non-estimation target attribute can take into bins of a fixed interval and replaces the value of that attribute with a value indicating which bin it belongs to.
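The binning described above can be sketched with equal-width bins. The helper names `make_bin_edges` and `bin_index`, the value range 0 to 120, the choice of four bins, and the sample values are all assumptions for illustration; this section does not specify the bin width or how out-of-range values are handled (here they fall into the first or last bin).

```python
from bisect import bisect_right


def make_bin_edges(lo: float, hi: float, n_bins: int) -> list:
    """Return the n_bins - 1 interior edges of equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bin_index(value: float, edges: list) -> int:
    """Replace a value with the index of the bin it falls into.

    A value exactly on an edge goes to the upper bin. Values below the first
    edge map to bin 0, values at or above the last edge to the last bin.
    """
    return bisect_right(edges, value)


edges = make_bin_edges(0.0, 120.0, 4)  # interior edges at 30, 60, 90
binned = [bin_index(v, edges) for v in (53, 28, 103)]
print(edges, binned)
```

The same edges would have to be reused at inference time so that a value is mapped to the same bin in both phases.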
The information addition unit 213 receives the binned multidimensional data set with correct answers as input and creates a multidimensional data set for learning by adding additional information to each piece of multidimensional data in it. Here, the additional information is, for each non-estimation target attribute of each piece of multidimensional data in the binned multidimensional data set with correct answers, the information consisting of the values of the estimation target attribute obtained when the value of that non-estimation target attribute is varied while the values of the other non-estimation target attributes are held fixed.
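One way to read this definition is sketched below. It is a hedged illustration, not the procedure of this embodiment: it assumes that each row is a pair (tuple of bin indices, estimation target value), that every combination of bin indices occurs at most once (duplicates would have to be aggregated beforehand), and that combinations with no known value are filled with a placeholder `fill`. The function name `build_additional_info` is hypothetical. Whether a row's own target value should be masked during learning is not stated in this section; the sketch leaves it in.

```python
def build_additional_info(rows, n_bins, fill=0.0):
    """For each row, collect target values along each attribute axis.

    rows: list of (bins, target), where bins is a tuple of bin indices.
    For every attribute, the other attributes stay fixed while this one
    runs over all n_bins bins; the known target value at each position is
    looked up (or `fill` if that combination is absent).
    """
    table = {bins: target for bins, target in rows}
    result = []
    for bins, _ in rows:
        info = []
        for axis in range(len(bins)):
            for b in range(n_bins):
                key = bins[:axis] + (b,) + bins[axis + 1:]
                info.append(table.get(key, fill))
        result.append(info)
    return result


rows = [((0, 0), 10.0), ((1, 0), 20.0), ((0, 1), 30.0)]
for (bins, target), info in zip(rows, build_additional_info(rows, n_bins=2)):
    print(bins, target, info)
```

With d attributes and B bins per attribute, each row receives d × B additional values, so the cost grows linearly in the number of attributes rather than exponentially.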
The information addition unit 213 then outputs the multidimensional data set for learning. A specific example of the multidimensional data set for learning is described later.
The learning processing unit 214 receives the multidimensional data set for learning and the inference model for learning as input, learns the inference model for learning, and creates a trained inference model. The learning processing unit 214 then outputs the trained inference model.
The inference unit 202 includes a dimension reduction unit 221, a binning unit 222, an information addition unit 223, and an estimation processing unit 224.
The dimension reduction unit 221 receives the multidimensional data set without correct answers and the trained dimensionality reduction model as input, and reduces the number of dimensions of the non-estimation target attributes of each piece of multidimensional data in that data set. The dimension reduction unit 221 then outputs the set of multidimensional data after the dimension reduction (hereinafter also called a dimension-reduced multidimensional data set without correct answers).
The binning unit 222 receives the dimension-reduced multidimensional data set without correct answers as input and bins the non-estimation target attributes of each piece of multidimensional data in it. The binning unit 222 then outputs the set of multidimensional data after binning (hereinafter also called a binned multidimensional data set without correct answers).
The information addition unit 223 receives the binned multidimensional data set without correct answers as input and creates a multidimensional data set for inference by adding additional information to each piece of multidimensional data in it.
The estimation processing unit 224 receives the multidimensional data set for inference and the trained inference model as input, and estimates the estimation target attribute value with the trained inference model. The estimation processing unit 224 then outputs the estimation target attribute value as the estimation result.
Note that the units shown in FIG. 1 may be distributed among multiple devices. In particular, for example, the learning unit 201 and the inference unit 202 may belong to different devices. In that case, the device having the learning unit 201 may be called a "learning device".
<Learning processing>
Next, the learning processing according to this embodiment is described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the learning processing according to this embodiment.
Step S101: First, the dimension reduction unit 211 of the learning unit 201 inputs the multidimensional data set with correct answers stored in the storage unit 203 for multidimensional data with correct answers.
Here, a specific example of a multidimensional data set with correct answers is described with reference to FIG. 4. FIG. 4 is a diagram showing an example of a multidimensional data set with correct answers in tabular form.
The example shown in FIG. 4 is a tabular representation of a set of multidimensional data having "year/month", "gender", "age group", and so on as non-estimation target attributes and "number of contracts" as the estimation target attribute.
"Year/month" takes values indicating a year and month, such as "2019/4" and "2019/5". "Gender" takes the value "male" or "female". "Age group" takes values indicating an age group, such as "10s", "20s", and "30s". "Number of contracts" takes as its value the number of contracts for that year/month, gender, and age group.
For example, in the multidimensional data in the first row of the multidimensional data set with correct answers shown in FIG. 4, (year/month, gender, age group, ..., number of contracts) = (2019/4, male, 30s, ..., 200). Similarly, in the multidimensional data in the second row, (year/month, gender, age group, ..., number of contracts) = (2019/5, female, 20s, ..., 100).
Note that the multidimensional data set with correct answers shown in FIG. 4 is an example, and this embodiment is applicable to sets of multidimensional data having various attributes. The values each attribute can take may also be defined in various ways. For example, in the example shown in FIG. 4, "year/month" takes a value for every month of a given year, but it may instead take a value for every other month of a given year, or for every month or every other month of every other year. Similarly, "age group" is not limited to values in 10-year steps; it may take values such as "young" or "elderly", a range such as "20 to 25 years old", or a specific age such as "20 years old". Likewise, "number of contracts" may be the number of customers who signed a contract or the cumulative number of contracts.
In the following, each piece of multidimensional data in the multidimensional data set with correct answers is assumed to have various non-estimation target attributes besides "year/month", "gender", and "age group", for a total of N non-estimation target attributes. That is, each piece of multidimensional data in the multidimensional data set with correct answers has N non-estimation target attributes and one estimation target attribute.
Step S102: Next, the dimension reduction unit 211 of the learning unit 201 receives as input the multidimensional data set with correct answers and the dimensionality reduction model for learning stored in the learning dimensionality reduction model storage unit 204, and reduces the number of dimensions of the non-estimation target attributes of each piece of multidimensional data in that data set. In this way the dimensionality reduction model for learning is learned and a trained dimensionality reduction model is obtained. In the following, as an example, the number of dimensions of the non-estimation target attributes (N dimensions) is assumed to have been reduced by principal component analysis to three dimensions: the first, second, and third principal components. Because principal component analysis is a well-known technique, a detailed description is omitted.
The dimension reduction unit 211 of the learning unit 201 then outputs the dimension-reduced multidimensional data set with correct answers to the binning unit 212 and outputs the trained dimensionality reduction model to the trained dimensionality reduction model storage unit 206. The trained dimensionality reduction model is, for example, information representing the mapping that converts N-dimensional data having N non-estimation target attributes into three-dimensional data having the first, second, and third principal components as attributes, together with the parameters of that mapping.
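A trained dimensionality reduction model in this sense can be sketched with NumPy as a stored mean vector and a matrix of principal axes. The sketch assumes that all N non-estimation target attributes have already been encoded as numbers (categorical attributes such as gender would need an encoding step that this section does not describe), and it uses random data with N = 6 purely for illustration. The function names `fit_pca` and `apply_pca` are hypothetical.

```python
import numpy as np


def fit_pca(data, n_components):
    """Fit PCA on the rows of `data`; return the mean and the principal axes."""
    mean = data.mean(axis=0)
    # Rows of vt are unit-length principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(data, mean, components):
    """Project the rows of `data` onto the stored principal axes."""
    return (data - mean) @ components.T


rng = np.random.default_rng(0)
train = rng.normal(size=(20, 6))                 # 20 rows, N = 6 numeric attributes
mean, components = fit_pca(train, n_components=3)
reduced = apply_pca(train, mean, components)     # shape (20, 3)

# At inference time the stored (mean, components) are reused, not refitted.
new_rows = rng.normal(size=(4, 6))
reduced_new = apply_pca(new_rows, mean, components)
print(reduced.shape, reduced_new.shape)
```

Storing only `mean` and `components` is enough to apply the same mapping to data without correct answers later.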
ここで、次元削減済み正解付き多次元データセットの具体例について、図5を参照しながら説明する。図5は、次元削減済み正解付き多次元データセットの一例を表形式で示す図である。
Here, a specific example of the dimension-reduced correct answer multidimensional data set will be described with reference to FIG. FIG. 5 is a diagram showing an example of a multidimensional data set with correct answers after dimensionality reduction in tabular form.
図5に示す例は、図4に示す正解付き多次元データセットに含まれる各多次元データの非推定対象属性を3次元に次元削減した多次元データの集合を表形式で表したものである。図5に示す次元削減済み正解付き多次元データセットに含まれる各多次元データは、「第1主成分」、「第2主成分」、「第3主成分」を非推定対象属性、「契約数」を推定対象属性として持つ。
The example shown in FIG. 5 is a tabular representation of a set of multidimensional data in which non-estimable attributes of each multidimensional data included in the multidimensional data set with correct answer shown in FIG. 4 are reduced to three dimensions. . Each multidimensional data included in the dimensionally reduced multidimensional data set with correct answer shown in FIG. number” as an inference target attribute.
「第1主成分」、「第2主成分」及び「第3主成分」は、主成分分析でそれぞれ第1主成分、第2主成分及び第3主成分とした値を取る。なお、これらの第1主成分~第3主成分は、図4に示す正解付き多次元データセットに含まれる各多次元データが持つ非推定対象属性であってもよいし、新たに定義した属性であってもよい。
"First principal component", "second principal component" and "third principal component" take the values of the first principal component, second principal component and third principal component, respectively, in principal component analysis. These first to third principal components may be non-estimable attributes of each multidimensional data contained in the multidimensional data set with correct answer shown in FIG. 4, or newly defined attributes may be
For example, the multidimensional data in the first row of the dimension-reduced multidimensional data set with correct answers shown in FIG. 5 is obtained by reducing the dimensions of the multidimensional data in the first row of the multidimensional data set with correct answers shown in FIG. 4, and (first principal component, second principal component, third principal component, number of contracts) = (53, 28, 103, 200). Similarly, the multidimensional data in the second row is obtained by reducing the dimensions of the multidimensional data in the second row of the multidimensional data set with correct answers shown in FIG. 4, and (first principal component, second principal component, third principal component, number of contracts) = (24, 80, 9, 100).
Note that the dimension-reduced multidimensional data set with correct answers shown in FIG. 5 is an example; for instance, the number of dimensions after dimension reduction can be set to any value as appropriate (provided that it is an integer of 2 or more that is smaller than N). In general, a smaller number of dimensions after dimension reduction reduces the amount of calculation further, but lowers the estimation accuracy at inference time. For this reason, the number of dimensions after dimension reduction is determined as appropriate in consideration of the type of the target task, the calculation time required for that task, and the like.
Step S103: Next, the binning unit 212 of the learning unit 201 receives the dimension-reduced multidimensional data set with correct answers as input, and performs binning on the non-estimation target attributes of each piece of multidimensional data included in that data set. Then, the binning unit 212 of the learning unit 201 outputs the binned multidimensional data set with correct answers to the information addition unit 213.
Here, a specific example of the binned multidimensional data set with correct answers will be described with reference to FIG. 6. FIG. 6 is a diagram showing an example of the binned multidimensional data set with correct answers in tabular form.
The example shown in FIG. 6 is a tabular representation of a set of multidimensional data obtained by binning the non-estimation target attributes of each piece of multidimensional data included in the dimension-reduced multidimensional data set with correct answers shown in FIG. 5. Each piece of multidimensional data included in the binned multidimensional data set with correct answers shown in FIG. 6 has "first principal component", "second principal component", and "third principal component" as non-estimation target attributes and "number of contracts" as the estimation target attribute, and the value of each non-estimation target attribute has been binned.
For example, the multidimensional data in the first row of the binned multidimensional data set with correct answers shown in FIG. 6 is obtained by binning the values of the non-estimation target attributes of the multidimensional data in the first row of the dimension-reduced multidimensional data set with correct answers shown in FIG. 5, and (first principal component, second principal component, third principal component, number of contracts) = (5, 3, 10, 200). Similarly, the multidimensional data in the second row is obtained by binning the values of the non-estimation target attributes of the multidimensional data in the second row of the dimension-reduced multidimensional data set with correct answers shown in FIG. 5, and (first principal component, second principal component, third principal component, number of contracts) = (2, 8, 1, 100).
Note that the binned multidimensional data set with correct answers shown in FIG. 6 is an example; for instance, the interval width of the bins used for binning can be set to any value as appropriate.
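The binning step can be sketched as follows. The specification does not state the bin width or the binning rule, so the width of 10 and the "nearest bin centre" rule used here are assumptions; they were chosen only because they happen to reproduce the FIG. 5 to FIG. 6 values quoted in the text. The names `bin_value` and `bin_row` are hypothetical.

```python
def bin_value(value, width=10):
    """Map an integer value to the index of the nearest bin centre (0, width, 2*width, ...)."""
    return (value + width // 2) // width

def bin_row(row, width=10):
    """Bin every non-estimation target attribute; the estimation target is left untouched."""
    *attributes, target = row
    return tuple(bin_value(v, width) for v in attributes) + (target,)

reduced_rows = [(53, 28, 103, 200), (24, 80, 9, 100)]  # first two rows quoted for FIG. 5
binned_rows = [bin_row(r) for r in reduced_rows]
print(binned_rows)  # [(5, 3, 10, 200), (2, 8, 1, 100)]
```

A wider bin gives fewer distinct attribute values (and therefore less computation later) at the cost of coarser attribute resolution.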
Step S104: Next, the information addition unit 213 of the learning unit 201 receives the binned multidimensional data set with correct answers as input, and creates a multidimensional data set for learning in which additional information is added to each piece of multidimensional data included in that data set. Then, the information addition unit 213 of the learning unit 201 outputs the multidimensional data set for learning to the learning processing unit 214.
Note that, as described above, the additional information is information composed of the values of the estimation target attribute obtained when, for each non-estimation target attribute of each piece of multidimensional data included in the binned multidimensional data set with correct answers, the value of that non-estimation target attribute is varied (within the range of values that the attribute can take) while the values of the other non-estimation target attributes are kept fixed. These values are obtained by searching the binned multidimensional data set with correct answers for the values of the estimation target attribute that correspond to each varied value of the non-estimation target attribute, with the values of the other non-estimation target attributes kept fixed, and aggregating the estimation target attribute values found.
Here, a specific example of the multidimensional data set for learning will be described with reference to FIG. 7. FIG. 7 is a diagram showing an example of the multidimensional data set for learning in tabular form.
The example shown in FIG. 7 is a tabular representation of a set of multidimensional data obtained by adding additional information to each piece of multidimensional data included in the binned multidimensional data set with correct answers shown in FIG. 6. Each piece of multidimensional data included in the multidimensional data set for learning shown in FIG. 7 has additional information composed of "fixed other than first principal component", "fixed other than second principal component", and "fixed other than third principal component".
"Fixed other than first principal component" is an aggregate of the values of the estimation target attribute (number of contracts) obtained when the value of the first principal component is varied while the values of the second and third principal components are kept fixed. In the example shown in FIG. 7, the numbers of contracts obtained when the value of the first principal component is changed to "0", "1", and so on are aggregated.
"Fixed other than second principal component" is an aggregate of the values of the estimation target attribute (number of contracts) obtained when the value of the second principal component is varied while the values of the first and third principal components are kept fixed. In the example shown in FIG. 7, the numbers of contracts obtained when the value of the second principal component is changed to "0", "1", and so on are aggregated.
"Fixed other than third principal component" is an aggregate of the values of the estimation target attribute (number of contracts) obtained when the value of the third principal component is varied while the values of the first and second principal components are kept fixed. In the example shown in FIG. 7, the numbers of contracts obtained when the value of the third principal component is changed to "0", "1", and so on are aggregated.
For example, "0" under "fixed other than first principal component" for the multidimensional data in the first row of the multidimensional data set for learning shown in FIG. 7 is the aggregate of the numbers of contracts obtained when the value of the first principal component of the multidimensional data in the first row of the binned multidimensional data set with correct answers shown in FIG. 6 is changed to "0". That is, it is the value "400" obtained by aggregating the numbers of contracts for which (first principal component, second principal component, third principal component) = (0, 3, 10) in the binned multidimensional data set with correct answers shown in FIG. 6.
Similarly, "1" under "fixed other than first principal component" for the multidimensional data in the first row of the multidimensional data set for learning shown in FIG. 7 is the aggregate of the numbers of contracts obtained when the value of the first principal component of the multidimensional data in the first row of the binned multidimensional data set with correct answers shown in FIG. 6 is changed to "1". That is, it is the value "700" obtained by aggregating the numbers of contracts for which (first principal component, second principal component, third principal component) = (1, 3, 10) in the binned multidimensional data set with correct answers shown in FIG. 6.
Similarly, "0" under "fixed other than second principal component" for the multidimensional data in the first row of the multidimensional data set for learning shown in FIG. 7 is the aggregate of the numbers of contracts obtained when the value of the second principal component of the multidimensional data in the first row of the binned multidimensional data set with correct answers shown in FIG. 6 is changed to "0". That is, it is the value "400" obtained by aggregating the numbers of contracts for which (first principal component, second principal component, third principal component) = (5, 0, 10) in the binned multidimensional data set with correct answers shown in FIG. 6.
Similarly, "1" under "fixed other than second principal component" for the multidimensional data in the first row of the multidimensional data set for learning shown in FIG. 7 is the aggregate of the numbers of contracts obtained when the value of the second principal component of the multidimensional data in the first row of the binned multidimensional data set with correct answers shown in FIG. 6 is changed to "1". That is, it is the value "500" obtained by aggregating the numbers of contracts for which (first principal component, second principal component, third principal component) = (5, 1, 10) in the binned multidimensional data set with correct answers shown in FIG. 6.
The same applies to the other attributes of the additional information. In the following, the set of tuples of the values of "first principal component", "second principal component", and "third principal component" of the pieces of multidimensional data included in the multidimensional data set for learning is also referred to as the non-estimation target attribute value set W, and the set of their additional information is denoted by X.
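The construction of the additional information described above can be sketched as a lookup over the binned data set. This is only an illustration under stated assumptions: "aggregating" is taken to mean summing, each attribute is assumed to range over bin indices 0 to 10, and the rows other than the two quoted from FIG. 6 are hypothetical rows chosen so that the results match the values quoted for FIG. 7. The names `build_index` and `additional_info` are also hypothetical.

```python
from collections import defaultdict

def build_index(rows):
    """Sum the estimation target value for every tuple of binned attribute values."""
    totals = defaultdict(int)
    for *attributes, target in rows:
        if target is not None:  # rows with an unknown target contribute nothing
            totals[tuple(attributes)] += target
    return dict(totals)

def additional_info(attributes, index, value_ranges):
    """For each attribute, vary it over its range while keeping the others fixed."""
    info = []
    for axis, values in enumerate(value_ranges):
        per_value = {}
        for v in values:
            key = attributes[:axis] + (v,) + attributes[axis + 1:]
            per_value[v] = index.get(key)  # None when no matching row exists
        info.append(per_value)
    return info

binned_rows = [
    (5, 3, 10, 200), (2, 8, 1, 100),   # rows quoted for FIG. 6
    (0, 3, 10, 400), (1, 3, 10, 700),  # hypothetical rows chosen to match
    (5, 0, 10, 400), (5, 1, 10, 500),  # the aggregates quoted for FIG. 7
]
index = build_index(binned_rows)
value_ranges = [range(0, 11)] * 3      # assumed bin indices 0..10 per attribute
info = additional_info((5, 3, 10), index, value_ranges)
print(info[0][0], info[0][1], info[1][0], info[1][1])  # 400 700 400 500
```

Here `info[0]` plays the role of "fixed other than first principal component" for the first row, and `info[1]` that of "fixed other than second principal component".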
Note that the multidimensional data set for learning shown in FIG. 7 is an example; for instance, a technique such as data augmentation may be used to create, from the multidimensional data included in the multidimensional data set for learning, multidimensional data in which part of the additional information is made unknown (for example, set to "-" or a null value).
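One possible way to realize this kind of augmentation is to mask entries of the additional information at random. The sketch below is an assumption about how that could be done, not the method of the specification: the drop probability, the use of `None` for unknown entries, the function name `mask_additional_info`, and the third dictionary in `info` are all hypothetical.

```python
import random

def mask_additional_info(info, drop_prob, rng):
    """Return a copy of the additional information with some entries set to None (unknown)."""
    return [
        {v: (None if rng.random() < drop_prob else y) for v, y in per_value.items()}
        for per_value in info
    ]

# One dict per "fixed other than k-th principal component"; the third dict is hypothetical.
info = [{0: 400, 1: 700}, {0: 400, 1: 500}, {0: 300, 1: 600}]
rng = random.Random(0)
augmented = [mask_additional_info(info, drop_prob=0.3, rng=rng) for _ in range(4)]
```

Training on such masked copies may help the model cope with the unknown entries that appear in the additional information at inference time.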
Step S105: Next, the learning processing unit 214 of the learning unit 201 receives as input the multidimensional data set for learning and the inference model for learning stored in the learning inference model storage unit 205, trains that inference model, and creates a trained inference model. That is, the learning processing unit 214 of the learning unit 201 takes as input a vector w∈W representing a tuple of non-estimation target attribute values and the additional information x∈X corresponding to w, and learns the parameters of the inference model for learning using a known technique such as error backpropagation so that the estimation target attribute value y is estimated correctly (that is, so as to minimize the error between the estimation target attribute value y estimated by the inference model for learning and its correct answer). Note that the x corresponding to w is the x in the same row of the multidimensional data set for learning as the tuple of non-estimation target attribute values represented by w.
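The learning step can be illustrated with a very small network trained by backpropagation. Everything concrete here is an assumption made for the sketch: the single hidden layer, the tanh activation, the mean squared error, the learning rate, the synthetic data, and the choice to feed w and x to the model as one concatenated vector. The specification only requires that the error between the estimated y and its correct answer be minimized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: w has 3 binned attributes, x has 3 * 11 aggregated values.
num_rows, w_dim, x_dim, hidden = 64, 3, 33, 16
w = rng.uniform(0, 1, size=(num_rows, w_dim))
x = rng.uniform(0, 1, size=(num_rows, x_dim))
y = w.sum(axis=1, keepdims=True) + x.mean(axis=1, keepdims=True)  # synthetic correct answers

inputs = np.concatenate([w, x], axis=1)  # the model sees w and x together
params = {
    "W1": rng.normal(0, 0.1, size=(w_dim + x_dim, hidden)), "b1": np.zeros(hidden),
    "W2": rng.normal(0, 0.1, size=(hidden, 1)), "b2": np.zeros(1),
}

def forward(params, inputs):
    """Return the hidden activations and the estimated target values."""
    h = np.tanh(inputs @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]

def train_step(params, inputs, y, lr=0.01):
    """One gradient-descent step on the mean squared error, via manual backpropagation."""
    h, pred = forward(params, inputs)
    err = pred - y
    loss = float(np.mean(err ** 2))
    grad_pred = 2.0 * err / len(y)
    grad_h = grad_pred @ params["W2"].T
    grad_pre = grad_h * (1.0 - h ** 2)  # derivative of tanh
    grads = {
        "W2": h.T @ grad_pred, "b2": grad_pred.sum(axis=0),
        "W1": inputs.T @ grad_pre, "b1": grad_pre.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    return loss

losses = [train_step(params, inputs, y) for _ in range(500)]
```

In practice an automatic differentiation framework would replace the hand-written gradients; they are spelled out here only to show what "error backpropagation" computes.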
Here, the inference model is not particularly limited; for example, a neural network as shown in FIG. 8 may be used as the inference model. The neural network shown in FIG. 8 is a model that takes as input a vector w representing a tuple of non-estimation target attribute values and its additional information x, and outputs the estimation target attribute value y.
Alternatively, for example, a neural network as shown in FIG. 9 may be used as the inference model. The neural network shown in FIG. 9 is a model that takes as input a vector w representing a tuple of non-estimation target attribute values and its additional information x, and outputs the estimation target attribute value y and a vector w' representing a tuple of non-estimation target attribute values. When this model is used, the parameters of the inference model for learning are learned so that, in addition to the estimation target attribute value y being estimated correctly, w' correctly reproduces w. That is, the parameters of the inference model for learning are learned so that w', which reproduces the original w, is estimated together with the estimation target attribute value y in a multitask manner. Using a neural network as shown in FIG. 9 as the inference model allows features to be extracted appropriately from the additional information x, so that these features can be sufficiently reflected in the estimation target attribute value y, and higher estimation accuracy can be expected.
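A two-output model of this kind could be sketched as a shared hidden layer with one head for y and one head for w', trained on the sum of an estimation error and a reconstruction error. The layer sizes, the tanh activation, the squared-error terms, and the weighting factor `recon_weight` are assumptions for illustration; the actual structure of the network in FIG. 9 is not reproduced here.

```python
import numpy as np

def init_params(w_dim, x_dim, hidden, rng):
    """Randomly initialise a shared layer and two output heads (hypothetical sizes)."""
    return {
        "W_shared": rng.normal(0, 0.1, size=(w_dim + x_dim, hidden)),
        "W_y": rng.normal(0, 0.1, size=(hidden, 1)),      # head for the estimate y
        "W_w": rng.normal(0, 0.1, size=(hidden, w_dim)),  # head for the reconstruction w'
    }

def forward(params, w, x):
    """Shared hidden layer followed by the y head and the w' head."""
    h = np.tanh(np.concatenate([w, x], axis=1) @ params["W_shared"])
    return h @ params["W_y"], h @ params["W_w"]

def multitask_loss(y_pred, y_true, w_pred, w_true, recon_weight=1.0):
    """Estimation error on y plus a weighted reconstruction error on w."""
    estimation = np.mean((y_pred - y_true) ** 2)
    reconstruction = np.mean((w_pred - w_true) ** 2)
    return float(estimation + recon_weight * reconstruction)

rng = np.random.default_rng(0)
w = rng.uniform(0, 1, size=(8, 3))   # hypothetical batch of attribute tuples
x = rng.uniform(0, 1, size=(8, 33))  # hypothetical additional information
y = rng.uniform(0, 1, size=(8, 1))   # hypothetical correct answers
params = init_params(3, 33, 16, rng)
y_pred, w_pred = forward(params, w, x)
loss = multitask_loss(y_pred, y, w_pred, w)
```

Minimizing this combined loss by backpropagation trains both heads at once, which is one common reading of "multitask" learning.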
Step S106: Finally, the learning processing unit 214 of the learning unit 201 outputs the trained inference model to the trained inference model storage unit 207.
<Inference processing>
Next, inference processing according to this embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the inference processing according to this embodiment.
Step S201: First, the dimension reduction unit 221 of the inference unit 202 inputs the multidimensional data set without correct answers stored in the storage unit 208 for multidimensional data without correct answers.
Here, a specific example of the multidimensional data set without correct answers will be described with reference to FIG. 11. FIG. 11 is a diagram showing an example of the multidimensional data set without correct answers in tabular form.
The example shown in FIG. 11 is a tabular representation of a set of multidimensional data having "year/month", "sex", "age group", and so on as non-estimation target attributes and "number of contracts" as the estimation target attribute. In the multidimensional data in the first row of the multidimensional data set without correct answers shown in FIG. 11, (year/month, sex, age group, ..., number of contracts) = (2020/4, male, 30s, ..., 200). In the multidimensional data in the second row, on the other hand, (year/month, sex, age group, ..., number of contracts) = (2020/5, female, 20s, ..., -). That is, in the multidimensional data in the second row, the number of contracts, which is the estimation target attribute value, is unknown. In this way, the multidimensional data set without correct answers includes at least some multidimensional data whose estimation target attribute value is unknown.
Step S202: Next, the dimension reduction unit 221 of the inference unit 202 receives as input the multidimensional data set without correct answers and the trained dimension reduction model stored in the trained dimension reduction model storage unit 206, and uses the trained dimension reduction model to reduce the number of dimensions of the non-estimation target attributes of each piece of multidimensional data included in that data set. Then, the dimension reduction unit 221 of the inference unit 202 outputs the dimension-reduced multidimensional data set without correct answers to the binning unit 222.
Here, a specific example of the dimension-reduced multidimensional data set without correct answers will be described with reference to FIG. 12. FIG. 12 is a diagram showing an example of the dimension-reduced multidimensional data set without correct answers in tabular form.
The example shown in FIG. 12 is a tabular representation of a set of multidimensional data obtained by reducing the non-estimation target attributes of each piece of multidimensional data included in the multidimensional data set without correct answers shown in FIG. 11 to three dimensions. Each piece of multidimensional data included in the dimension-reduced multidimensional data set without correct answers shown in FIG. 12 has "first principal component", "second principal component", and "third principal component" as non-estimation target attributes and "number of contracts" as the estimation target attribute.
For example, the multidimensional data in the first row of the dimension-reduced multidimensional data set without correct answers shown in FIG. 12 is obtained by reducing the dimensions of the multidimensional data in the first row of the multidimensional data set without correct answers shown in FIG. 11, and (first principal component, second principal component, third principal component, number of contracts) = (58, 21, 109, 200). Similarly, the multidimensional data in the second row is obtained by reducing the dimensions of the multidimensional data in the second row of the multidimensional data set without correct answers shown in FIG. 11, and (first principal component, second principal component, third principal component, number of contracts) = (20, 81, 6, -).
Step S203: Next, the binning unit 222 of the inference unit 202 receives the dimension-reduced multidimensional data set without correct answers as input, and performs binning on the non-estimation target attributes of each piece of multidimensional data included in that data set. Then, the binning unit 222 of the inference unit 202 outputs the binned multidimensional data set without correct answers to the information addition unit 223.
Here, a specific example of the binned multidimensional data set without correct answers will be described with reference to FIG. 13. FIG. 13 is a diagram showing an example of the binned multidimensional data set without correct answers in tabular form.
The example shown in FIG. 13 is a tabular representation of a set of multidimensional data obtained by binning the non-estimation target attributes of each piece of multidimensional data included in the dimension-reduced multidimensional data set without correct answers shown in FIG. 12. Each piece of multidimensional data included in the binned multidimensional data set without correct answers shown in FIG. 13 has "first principal component", "second principal component", and "third principal component" as non-estimation target attributes and "number of contracts" as the estimation target attribute, and the value of each non-estimation target attribute has been binned.
For example, the multidimensional data in the first row of the binned multidimensional data set without correct answers shown in FIG. 13 is obtained by binning the values of the non-estimation target attributes of the multidimensional data in the first row of the dimension-reduced multidimensional data set without correct answers shown in FIG. 12, and (first principal component, second principal component, third principal component, number of contracts) = (5, 3, 10, 200). Similarly, the multidimensional data in the second row is obtained by binning the values of the non-estimation target attributes of the multidimensional data in the second row of the dimension-reduced multidimensional data set without correct answers shown in FIG. 12, and (first principal component, second principal component, third principal component, number of contracts) = (2, 8, 1, -).
Step S204: Next, the information addition unit 223 of the inference unit 202 receives the binned multidimensional data set without correct answers as input, and creates a multidimensional data set for inference in which additional information is added to each piece of multidimensional data included in that data set. Then, the information addition unit 223 of the inference unit 202 outputs the multidimensional data set for inference to the estimation processing unit 224.
Here, a specific example of the multidimensional data set for inference will be described with reference to FIG. 14. FIG. 14 is a diagram showing an example of the multidimensional data set for inference in tabular form.
The example shown in FIG. 14 is a tabular representation of a set of multidimensional data obtained by adding additional information to each piece of multidimensional data included in the binned multidimensional data set without correct answers shown in FIG. 13. Each piece of multidimensional data included in the multidimensional data set for inference shown in FIG. 14 has additional information composed of "fixed other than first principal component", "fixed other than second principal component", and "fixed other than third principal component".
For example, "0" under "fixed other than first principal component" for the multidimensional data in the first row of the multidimensional data set for inference shown in FIG. 14 is the aggregate of the numbers of contracts obtained when the value of the first principal component of the multidimensional data in the first row of the binned multidimensional data set without correct answers shown in FIG. 13 is changed to "0". That is, it is the value "400" obtained by aggregating the numbers of contracts for which (first principal component, second principal component, third principal component) = (0, 3, 10) in the binned multidimensional data set without correct answers shown in FIG. 13.
Similarly, "0" under "fixed other than first principal component" for the multidimensional data in the second row of the multidimensional data set for inference shown in FIG. 14 is the aggregate of the numbers of contracts obtained when the value of the first principal component of the multidimensional data in the second row of the binned multidimensional data set without correct answers shown in FIG. 13 is changed to "0". That is, it is the value "500" obtained by aggregating the numbers of contracts for which (first principal component, second principal component, third principal component) = (0, 8, 1) in the binned multidimensional data set without correct answers shown in FIG. 13.
Here, when no value of the estimation target attribute exists for the case in which the value of a certain non-estimation target attribute is changed to a certain value while the values of the other non-estimation target attributes are kept fixed, "-", a null value, or the like is set in the additional information. For example, "-" is set for "1" under "fixed other than first principal component" for the multidimensional data in the first row of the multidimensional data set for inference shown in FIG. 14. This is because the binned multidimensional data set without correct answers shown in FIG. 13 contains no number of contracts for (first principal component, second principal component, third principal component) = (1, 3, 10).
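Before such additional information can be fed to a numerical model, the unknown entries have to be represented somehow. The specification only says that "-" or a null value is set; the value-plus-mask encoding below is one common convention and is an assumption of this sketch, as are the function name `encode_additional_info` and the fill value 0.0.

```python
def encode_additional_info(info, fill_value=0.0):
    """Flatten additional information into (values, mask); mask is 0.0 where unknown."""
    values, mask = [], []
    for per_value in info:
        for v in sorted(per_value):
            y = per_value[v]
            values.append(fill_value if y is None else float(y))
            mask.append(0.0 if y is None else 1.0)
    return values, mask

# "Fixed other than first principal component" for row 1 of FIG. 14: 0 -> 400, 1 -> "-".
info = [{0: 400, 1: None}]
values, mask = encode_additional_info(info)
print(values, mask)  # [400.0, 0.0] [1.0, 0.0]
```

Passing the mask alongside the values lets the model distinguish "no data" from a genuine aggregate of zero.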
The same applies to the other attributes of the additional information. In the following, as in the case of learning, the set of tuples of the values of "first principal component", "second principal component", and "third principal component" of the pieces of multidimensional data included in the multidimensional data set for inference is also referred to as the non-estimation target attribute value set W, and the set of their additional information is denoted by X.
Step S205: Next, the estimation processing unit 224 of the inference unit 202 receives as input the multidimensional data set for inference and the trained inference model stored in the trained inference model storage unit 207, and estimates the estimation target attribute value using the trained inference model. That is, the estimation processing unit 224 of the inference unit 202 takes as input a vector w∈W representing a tuple of non-estimation target attribute values and the additional information x∈X corresponding to w, and estimates the estimation target attribute value y using the trained inference model.
Step S206: Finally, the estimation processing unit 224 of the inference unit 202 outputs the estimation result of the trained inference model (that is, the estimation target attribute value y) to the estimation result storage unit 209.
<Summary>
As described above, the inference device 10 according to this embodiment creates, for each non-estimation target attribute, additional information composed of the values of the estimation target attribute obtained when the value of that non-estimation target attribute is varied while the values of the other non-estimation target attributes are kept fixed, and performs learning and inference using this additional information as well. This allows global features to be extracted, which enables more accurate inference.
Further, before creating the additional information, the inference device 10 according to this embodiment reduces the number of dimensions of the non-estimation target attributes and, by binning, reduces the number of attribute values of each non-estimation target attribute. This makes it possible to reduce the amount of calculation even when the number of non-estimation target attributes or the number of their attribute values is large. For example, let N be the number of non-estimation target attributes before dimension reduction and M be the average number of attribute values per attribute; without dimension reduction and binning, calculation must be performed for N×M attribute values. In contrast, if the number of non-estimation target attributes after dimension reduction is n<N and the average number of attribute values after binning is m<M, this embodiment only needs to perform calculation for n×m attribute values, so learning and inference can be performed with a smaller amount of calculation.
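To make the N×M versus n×m comparison concrete, the following small calculation uses made-up numbers; N = 20, M = 50, n = 3, and m = 11 are hypothetical values chosen only for illustration, and the function name is likewise hypothetical.

```python
def num_attribute_values(num_attributes, avg_values_per_attribute):
    """Number of attribute values to process: attributes times values per attribute."""
    return num_attributes * avg_values_per_attribute

before = num_attribute_values(20, 50)  # hypothetical N = 20, M = 50
after = num_attribute_values(3, 11)    # hypothetical n = 3, m = 11
print(before, after)                   # 1000 33
```

With these numbers, dimension reduction and binning cut the number of attribute values to be processed by a factor of about 30.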
The present invention is not limited to the specifically disclosed embodiment described above, and various modifications, changes, combinations with known techniques, and the like are possible without departing from the scope of the claims.
10 Inference device
101 Input device
102 Display device
103 External I/F
103a Recording medium
104 Communication I/F
105 Processor
106 Memory device
107 Bus
201 Learning unit
202 Inference unit
203 Multidimensional data storage unit with correct answers
204 Dimension reduction model storage unit for learning
205 Inference model storage unit for learning
206 Trained dimension reduction model storage unit
207 Trained inference model storage unit
208 Multidimensional data storage unit without correct answers
209 Estimation result storage unit
211 Dimension reduction unit
212 Binning unit
213 Information addition unit
214 Learning processing unit
221 Dimension reduction unit
222 Binning unit
223 Information addition unit
224 Estimation processing unit
Claims (7)
- A learning method executed by a computer, comprising:
an input procedure of inputting multidimensional data having an estimation target attribute, which indicates an attribute to be estimated, and two or more non-estimation target attributes, which indicate attributes other than the estimation target attribute;
a dimension reduction procedure of reducing the number of dimensions of the non-estimation target attributes of the multidimensional data;
a binning procedure of binning the values of the non-estimation target attributes of the multidimensional data after the dimension reduction;
an information addition procedure of adding predetermined additional information to the multidimensional data after the binning; and
a learning procedure of learning, using the multidimensional data to which the additional information has been added, parameters of an inference model for estimating the value of the estimation target attribute.
- The learning method according to claim 1, wherein the information addition procedure adds, as the additional information, for each non-estimation target attribute of the multidimensional data after the binning, an aggregate value of the values of the estimation target attribute obtained when the values of the non-estimation target attributes other than that non-estimation target attribute are varied.
- The learning method according to claim 1 or 2, wherein the learning procedure learns the parameters of the inference model so as to minimize an error between the value of the estimation target attribute estimated by the inference model with the values of the non-estimation target attributes as input and the correct value of the estimation target attribute, and an error between the values of the non-estimation target attributes reproduced by the inference model and the values of the non-estimation target attributes input to the inference model.
- An inference method executed by a computer, comprising:
an input procedure of inputting multidimensional data having an estimation target attribute, which indicates an attribute to be estimated, and two or more non-estimation target attributes, which indicate attributes other than the estimation target attribute;
a dimension reduction procedure of reducing the number of dimensions of the non-estimation target attributes of the multidimensional data;
a binning procedure of binning the values of the non-estimation target attributes of the multidimensional data after the dimension reduction;
an information addition procedure of adding predetermined additional information to the multidimensional data after the binning; and
an estimation procedure of estimating the value of the estimation target attribute with a pre-trained inference model, using the multidimensional data to which the additional information has been added.
- A learning device comprising:
an input unit that inputs multidimensional data having an estimation target attribute, which indicates an attribute to be estimated, and two or more non-estimation target attributes, which indicate attributes other than the estimation target attribute;
a dimension reduction unit that reduces the number of dimensions of the non-estimation target attributes of the multidimensional data;
a binning unit that bins the values of the non-estimation target attributes of the multidimensional data after the dimension reduction;
an information addition unit that adds predetermined additional information to the multidimensional data after the binning; and
a learning unit that learns, using the multidimensional data to which the additional information has been added, parameters of an inference model for estimating the value of the estimation target attribute.
- An inference device comprising:
an input unit that inputs multidimensional data having an estimation target attribute, which indicates an attribute to be estimated, and two or more non-estimation target attributes, which indicate attributes other than the estimation target attribute;
a dimension reduction unit that reduces the number of dimensions of the non-estimation target attributes of the multidimensional data;
a binning unit that bins the values of the non-estimation target attributes of the multidimensional data after the dimension reduction;
an information addition unit that adds predetermined additional information to the multidimensional data after the binning; and
an estimation unit that estimates the value of the estimation target attribute with a pre-trained inference model, using the multidimensional data to which the additional information has been added.
- A program that causes a computer to execute the learning method according to any one of claims 1 to 3 or the inference method according to claim 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023520731A JPWO2022239245A1 (en) | 2021-05-14 | 2021-05-14 | |
PCT/JP2021/018484 WO2022239245A1 (en) | 2021-05-14 | 2021-05-14 | Training method, inference method, training device, inference device, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/018484 WO2022239245A1 (en) | 2021-05-14 | 2021-05-14 | Training method, inference method, training device, inference device, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022239245A1 (en) | 2022-11-17 |
Family
ID=84028075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/018484 WO2022239245A1 (en) | 2021-05-14 | 2021-05-14 | Training method, inference method, training device, inference device, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2022239245A1 (en) |
WO (1) | WO2022239245A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200311944A1 (en) * | 2019-03-28 | 2020-10-01 | Canon Virginia, Inc. | Devices, systems, and methods for topological normalization for anomaly detection |
JP2021002315A (en) * | 2019-06-19 | 2021-01-07 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Method and apparatus for generating information |
- 2021
  - 2021-05-14 JP JP2023520731A patent/JPWO2022239245A1/ja active Pending
  - 2021-05-14 WO PCT/JP2021/018484 patent/WO2022239245A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
OKADA ASAMI, MORIGUCHI YUSUKE, UKITA NORIMICHI, HAGITA NORIHIRO: "People Groping by Spatio-Temporal Features of Trajectories", vol. J96-D, no. 11, 1 November 2013 (2013-11-01), XP093007105 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022239245A1 (en) | 2022-11-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21941975; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 2023520731; Country of ref document: JP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21941975; Country of ref document: EP; Kind code of ref document: A1 |