WO2023166564A1

WO2023166564A1 - Estimation device

Info

Publication number: WO2023166564A1
Application number: PCT/JP2022/008617
Authority: WO
Inventors: バトニヤマエンケタイワン; 光土田; 邦大伊東; 勇寺西
Original assignee: 日本電気株式会社
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2023-09-07

Abstract

An estimation device 400 has: an acquisition unit 421 that acquires a plurality of inference results that are respectively inferred as a result of inputting, to a learning model, multiple pieces of candidate data created on the basis of information indicating an unknown attribute candidate; a calculation unit 422 that for each inference result calculates the distance between the inference result acquired by the acquisition unit 421 and a label corresponding to the candidate data; and an estimation unit 423 that estimates an unknown attribute value in accordance with the result calculated by the calculation unit 422.

Description

Estimation device

The present invention relates to an estimation device, an estimation method, and a recording medium.

A known technique estimates the data used during training from the output of a learning model, for the purpose of assessing the risk of a learning model trained using machine learning.

For example, Non-Patent Document 1 describes a method of outputting a plausible attribute value by executing a predetermined process with known attributes and true labels of target data as inputs. For example, according to Non-Patent Document 1, an estimated label to be output from a decision tree is calculated by fixing an unknown attribute to be estimated at a certain value. After that, the error function assumed is used to calculate the deviation between the true label and the estimated label, and the marginal probability is evaluated using the calculated deviation as a weight. According to Non-Patent Document 1, for example, a likely attribute value is identified as a result of the above processing. As a technology related to Non-Patent Document 1, for example, there is also a technique such as Non-Patent Document 2.

Also, as a document describing machine learning, there is, for example, Patent Document 1. Patent Document 1 describes giving acquired data to a trained machine learning model, causing the trained machine learning model to perform predetermined inference, and thereby obtaining an inference result for the data.

Patent Document 1: International Publication No. 2021/014878

In the techniques described in Non-Patent Document 1 and Non-Patent Document 2, various kinds of prior knowledge and assumptions are required for estimation, such as assuming the shape of the error function and knowing the marginal probability. Therefore, there is a problem that data cannot be estimated accurately when such knowledge is not available or such assumptions cannot be made.

Therefore, an object of the present invention is to provide an estimating device, an estimating method, and a recording medium that can solve the above-described problems.

In order to achieve such an object, an estimation device according to one aspect of the present disclosure includes:
an acquisition unit that acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of candidate data created based on information indicating candidates for an unknown attribute;
a calculation unit that calculates, for each inference result, a distance between the inference result acquired by the acquisition unit and a label corresponding to the candidate data; and
an estimation unit that estimates the value of the unknown attribute according to the result calculated by the calculation unit.

In addition, in an estimation method according to another aspect of the present disclosure, an information processing device:
acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of candidate data created based on information indicating candidates for an unknown attribute;
calculates, for each inference result, a distance between the acquired inference result and a label corresponding to the candidate data; and
estimates the value of the unknown attribute according to the calculated result.

In addition, a recording medium according to another aspect of the present disclosure is a computer-readable recording medium recording a program for causing an information processing device to realize processes of:
acquiring a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of candidate data created based on information indicating candidates for an unknown attribute;
calculating, for each inference result, a distance between the acquired inference result and a label corresponding to the candidate data; and
estimating the value of the unknown attribute according to the calculated result.

According to each configuration as described above, it is possible to provide an estimation device, an estimation method, and a recording medium capable of accurately estimating data.

FIG. 1 is a diagram showing a configuration example of the risk evaluation system in the first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the model storage device.
FIG. 3 is a block diagram showing a configuration example of the risk evaluation device.
FIG. 4 is a diagram showing an example of the prior information.
FIG. 5 is a diagram showing another example of the prior information.
FIG. 6 is a flowchart showing an operation example of the risk evaluation device during attribute estimation.
FIG. 7 is a flowchart showing an operation example of the risk evaluation device during risk evaluation.
FIG. 8 is a diagram showing another example of the prior information.
FIG. 9 is a diagram showing a hardware configuration example of the estimation device in the second embodiment of the present disclosure.
FIG. 10 is a block diagram showing a configuration example of the estimation device.

[First embodiment]
A first embodiment of the present disclosure will be described with reference to FIGS. 1 to 8. FIG. 1 is a diagram showing a configuration example of a risk evaluation system 100. FIG. 2 is a block diagram showing a configuration example of the model storage device 200. FIG. 3 is a block diagram showing a configuration example of the risk evaluation device 300. FIG. 4 is a diagram showing an example of the prior information 341. FIG. 5 is a diagram showing another example of the prior information 341. FIG. 6 is a flowchart showing an operation example of the risk evaluation device 300 during attribute estimation. FIG. 7 is a flowchart showing an operation example of the risk evaluation device 300 during risk evaluation. FIG. 8 is a diagram showing another example of the prior information 341.

In the first embodiment of the present disclosure, a risk evaluation system 100 is described that, when some of the attributes making up the training data used to train the learning model 241 are missing for reasons such as being concealed, estimates the values of the missing attributes using the known attributes. For example, suppose that the risk evaluation system 100 knows the values $(x_2, \ldots, x_d)$ of some of the attributes $(x_1, x_2, \ldots, x_d)$ that make up the training data, and knows that the unknown attribute $x_1$ can take any of $k$ values $(v_{11}, \ldots, v_{1k})$. In such a case, the risk evaluation system 100 assumes in turn that the unknown attribute $x_1$ takes each value of $(v_{11}, \ldots, v_{1k})$ and creates candidate data corresponding to each value. The risk evaluation system 100 then inputs each created candidate data to the learning model 241 and obtains an inference result corresponding to each candidate data. Next, the risk evaluation system 100 calculates the distance (for example, the residual) between each acquired inference result and the known label and, based on the calculation result, estimates which of $(v_{11}, \ldots, v_{1k})$ the unknown attribute is. In this way, the risk evaluation system 100 described in the present embodiment estimates the value of the unknown attribute by calculating the distance between the known label and the inference result for candidate data created based on known knowledge. Moreover, the risk evaluation system 100 can evaluate the risk of leakage of training data based on the result of the attribute value estimation.

In this embodiment, the learning model 241 is generated by supervised learning using a plurality of training data. For example, the learning model 241 is trained using a plurality of training data including a plurality of attributes and labels, so that it outputs a label indicating whether or not a person is ill in response to the input of a plurality of attributes such as gender, age, height, and weight. Note that specific examples of attributes and labels are not limited to the above examples, and may be set arbitrarily. Any model such as a decision tree or a neural network may be used as the model trained using the training data. An attribute can also be called an explanatory variable or a feature. A label can also be called an objective variable.

Also, the risk evaluation system 100 described in the present embodiment estimates unknown attributes when, for example, the learning model 241 is provided in a black-box setting. A model generated by machine learning may be provided in a black-box setting, in which only the output for a given input is disclosed to the user, or in a white-box setting, in which model information such as the model structure and branching conditions is also disclosed. As will be described later, the risk evaluation system 100 in this embodiment can estimate unknown attributes without using the information disclosed in the white-box setting.

FIG. 1 shows a configuration example of the risk evaluation system 100 in this embodiment. Referring to FIG. 1, the risk evaluation system 100 has, for example, a risk evaluation device 300 and a model storage device 200. As shown in FIG. 1, the risk evaluation device 300 and the model storage device 200 are connected, for example, via a network so that they can communicate with each other.

The model storage device 200 is an information processing device that stores a learning model 241 learned using training data. FIG. 2 shows a configuration example of the model storage device 200 . For example, referring to FIG. 2, the model storage device 200 has a storage unit 240 in which a learning model 241 is stored, a receiving unit 210 , an inference unit 220 and an output unit 230 . For example, the model storage device 200 has an arithmetic device such as a CPU (Central Processing Unit) and a storage device. The model storage device 200 can realize each of the above-described processing units by executing the program stored in the storage device by the arithmetic device.

Note that, as shown in FIG. 2, the learning model 241 stored in the storage unit 240 is learned in advance using a plurality of training data including a plurality of attributes and labels. The learning model 241 may be learned within the model storage device 200 or may be learned outside the model storage device 200 .

The receiving unit 210 receives candidate data, which will be described later, from the risk evaluation device 300. For example, the receiving unit 210 receives candidate data such as $(v_{11}, x_2, \ldots, x_d)$ and $(v_{12}, x_2, \ldots, x_d)$, each of which includes the values of the attributes known to the risk evaluation device 300 and one candidate for the unknown attribute. As an example, the receiving unit 210 receives from the risk evaluation device 300 as many pieces of candidate data as there are candidates for the attribute unknown to the risk evaluation device 300. The receiving unit 210 may receive information other than the above examples, such as identification information, together with the candidate data.

The inference unit 220 inputs each candidate data received by the reception unit 210 to the learning model 241 . As a result of the input, the inference unit 220 acquires an inference label, which is an inference result corresponding to each candidate data.

The output unit 230 transmits the inference label acquired by the inference unit 220 to the risk evaluation device 300. For example, the output unit 230 may transmit the inference label to the risk evaluation device 300 together with the identification information of the candidate data, so that it can be determined which candidate data each inference label is based on.

For example, as described above, the model storage device 200 has a learning model 241 learned using training data. Also, upon receiving candidate data from the risk evaluation device 300, the model storage device 200 obtains an inference label corresponding to the candidate data by performing inference using the learning model 241 based on the received candidate data. The model storage device 200 then transmits the acquired inference label to the risk evaluation device 300 .
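The request and response behavior of the model storage device 200 described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `infer_candidates`, the use of string identifiers, and the toy model standing in for the learning model 241 are all hypothetical and do not come from the disclosure.

```python
from typing import Callable, Dict, Tuple

Record = Tuple[float, ...]

def infer_candidates(model: Callable[[Record], float],
                     candidates_by_id: Dict[str, Record]) -> Dict[str, float]:
    """Run the model on every candidate and key each inference label by candidate ID."""
    return {cid: model(record) for cid, record in candidates_by_id.items()}

# Hypothetical stand-in for the learning model 241 (black box to the caller).
def toy_model(record: Record) -> float:
    return 1.0 if sum(record) > 100 else 0.0

labels = infer_candidates(toy_model, {"c1": (0, 60, 30), "c2": (1, 60, 45)})
```

Returning the labels keyed by identifier lets the caller match each inference label to the candidate data it was computed from, which is the purpose of the identification information mentioned above.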

The risk evaluation device 300 is an information processing device that estimates the values of hidden attributes using known knowledge such as information about known attributes. Also, the risk assessment device 300 can perform risk assessment based on the estimation results.

FIG. 3 shows a configuration example of the risk evaluation device 300. Referring to FIG. 3, the risk evaluation device 300 has, as main components, for example, an operation input unit 310, a screen display unit 320, a communication I/F unit 330, a storage unit 340, and an arithmetic processing unit 350.

Note that FIG. 3 illustrates a case where the functions of the risk evaluation device 300 are realized using one information processing device. However, the risk evaluation device 300 may be implemented using a plurality of information processing devices, for example by being implemented on a cloud. For example, the functions of the risk evaluation device 300 may be realized by two information processing devices: an estimation device having functions as the candidate data creation unit 351, the candidate data transmission unit 352, the inference result acquisition unit 353, the distance calculation unit 354, and the estimation unit 355, and an evaluation device having functions as the evaluation unit 356 and the output unit 357. Moreover, the risk evaluation device 300 may omit part of the configuration exemplified above, for example by having no operation input unit 310 or screen display unit 320, or may have a configuration other than that exemplified above.

The operation input unit 310 consists of operation input devices such as a keyboard and a mouse. The operation input unit 310 detects the operation of the operator who operates the risk evaluation device 300 and outputs it to the arithmetic processing unit 350 .

The screen display unit 320 consists of a screen display device such as an LCD (Liquid Crystal Display). The screen display unit 320 can display various information stored in the storage unit 340 on the screen in accordance with instructions from the arithmetic processing unit 350 .

The communication I/F unit 330 consists of a data communication circuit and the like. The communication I/F unit 330 performs data communication with an external device such as the model storage device 200 connected via a communication line.

The storage unit 340 is a storage device such as a hard disk or memory. The storage unit 340 stores processing information and a program 345 necessary for various processes in the arithmetic processing unit 350. The program 345 realizes various processing units by being read and executed by the arithmetic processing unit 350. The program 345 is read in advance from an external device or recording medium via a data input/output function such as the communication I/F unit 330 and stored in the storage unit 340. Main information stored in the storage unit 340 includes, for example, prior information 341, inference result information 342, distance information 343, and estimation information 344.

The prior information 341 includes previously known information about the training data used during training of the learning model 241 stored in the model storage device 200. For example, the prior information 341 is acquired in advance, for instance from an external device via the communication I/F unit 330 or by input using the operation input unit 310, and is stored in the storage unit 340.

FIG. 4 shows an example of the prior information 341. Referring to FIG. 4, the prior information 341 includes partial training data information and missing attribute information. For example, as shown in FIG. 4, the prior information 341 can include a plurality of pieces of information in which partial training data information and missing attribute information are associated with each other.

Here, the partial training data information indicates the known attribute values and the corresponding label in a state in which some attributes of the training data used for training the learning model 241 are concealed (deleted). For example, FIG. 4 illustrates a case where the attributes $(x_2, \ldots, x_d)$ and the label $y$ are known and the attribute $x_1$ is missing. The missing attribute information indicates information about the value of the missing attribute. For example, FIG. 4 shows that the missing attribute $x_1$ takes one of $k$ values $(v_{11}, \ldots, v_{1k})$. Note that in the present embodiment, missing attributes are, for example, categorical variables (discrete variables).

Also, the prior information 341 can include information other than the information illustrated in FIG. 4. For example, FIG. 5 shows another example of the prior information 341. Referring to FIG. 5, the prior information 341 can include, in addition to the information exemplified above, information indicating the marginal probability that the missing attribute takes the value of each candidate. For example, as shown in FIG. 5, the prior information 341 can include information indicating the marginal probability corresponding to each candidate $(v_{11}, \ldots, v_{1k})$ for the unknown attribute $x_1$. The prior information 341 may include information other than the above examples.

The inference result information 342 includes information indicating the inference labels obtained by inputting, to the learning model 241, the candidate data created based on the prior information 341 by the candidate data creation unit 351, which will be described later. For example, the inference result information 342 may include information indicating as many inference labels as there are candidates for the missing attribute. For example, the inference result information 342 is generated and updated when the inference result acquisition unit 353, which will be described later, acquires an inference label from the model storage device 200.

The distance information 343 includes information indicating the result of the distance calculation unit 354, which will be described later, calculating the distance between each inference label included in the inference result information 342 and the label used as training data. For example, the distance information 343 may include information indicating as many distances as there are candidates for the missing attribute. For example, the distance information 343 is generated and updated when the distance calculation unit 354 calculates the distance between an inference label and the label.

The estimation information 344 includes information indicating the result of estimation based on the distance information 343 by the estimation unit 355, which will be described later. For example, the estimation information 344 may include information indicating the attribute value estimated by the estimation unit 355 from among the candidates for the unknown attribute. For example, the estimation information 344 is generated and updated when the estimation unit 355 estimates, from among the candidates, a plausible value for the unknown attribute based on the distance between each inference label and the label.

The arithmetic processing unit 350 has an arithmetic device such as a CPU and its peripheral circuits. The arithmetic processing unit 350 reads the program 345 from the storage unit 340 and executes it, so that the hardware and the program 345 work together to realize various processing units. Main processing units realized by the arithmetic processing unit 350 include, for example, a candidate data creation unit 351, a candidate data transmission unit 352, an inference result acquisition unit 353, a distance calculation unit 354, an estimation unit 355, an evaluation unit 356, and an output unit 357.

The candidate data creation unit 351 creates candidate data based on the prior information 341. For example, the candidate data creation unit 351 creates candidate data according to the number of candidates indicated by the missing attribute information. The candidate data creation unit 351 may create candidate data at any timing.

Specifically, for example, assume that partial training data information $(x_2, \ldots, x_d, y)$ and missing attribute information indicating that $x_1$ takes one of $(v_{11}, \ldots, v_{1k})$ are stored as the prior information 341. In this case, the candidate data creation unit 351 assumes in turn that the unknown attribute $x_1$ takes each value of $(v_{11}, \ldots, v_{1k})$, and creates candidate data corresponding to each value. That is, the candidate data creation unit 351 creates the candidate data $(v_{11}, x_2, \ldots, x_d), \ldots, (v_{1k}, x_2, \ldots, x_d)$.
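The candidate creation described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure; the function name `create_candidates`, the sample values, and the convention that the unknown attribute occupies the first position of each record are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

def create_candidates(known_attrs: Sequence[float],
                      candidate_values: Sequence[float]) -> List[Tuple[float, ...]]:
    """Return one candidate record per candidate value of the unknown attribute.

    known_attrs      -- the known attribute values (x_2, ..., x_d)
    candidate_values -- the k candidate values (v_11, ..., v_1k) for x_1
    """
    # Each record places one candidate value in the x_1 slot, followed by the known values.
    return [(v, *known_attrs) for v in candidate_values]

# Illustrative values only: x_1 is binary, and three other attributes are known.
candidates = create_candidates(known_attrs=(30, 170.0, 65.0), candidate_values=(0, 1))
```

The number of records returned equals the number of candidates k, so the model is queried k times per partial training record.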

It should be noted that, as described above, the prior information 341 can include a plurality of pieces of information in which partial training data information and missing attribute information are associated with each other. The candidate data creation unit 351 may create candidate data using the method described above for each of the associated information.

The candidate data transmission unit 352 transmits the candidate data created by the candidate data creation unit 351 to the model storage device 200 . The candidate data transmission unit 352 may transmit, together with the candidate data, identification information of the candidate data according to the partial training data information used when creating the candidate data.

The inference result acquisition unit 353 receives and acquires an inference label from the model storage device 200 as a result of inference based on candidate data. For example, the inference result acquisition unit 353 acquires the inference label from the model storage device 200 together with the identification information so that the inference target candidate data can be identified. The inference result acquisition unit 353 also stores the received inference label as the inference result information 342 in the storage unit 340 . The inference result acquisition unit 353 may store the inference label in the storage unit 340 together with the identification information of the corresponding candidate data.

Based on the prior information 341 and the inference result information 342, the distance calculation unit 354 calculates the distance between the inference label and the label included in the partial training data information from which the inferred candidate data was created. That is, the distance calculation unit 354 calculates the distance between the inference label and the label corresponding to the inferred candidate data. Further, the distance calculation unit 354 stores the calculated distance in the storage unit 340 as the distance information 343. The distance calculation unit 354 may store the calculation result in the storage unit 340 together with the identification information of the corresponding candidate data.

Specifically, for example, the distance calculation unit 354 calculates a residual between the inference label and the label as the distance between the inference label and the label. For example, let the label be denoted as $y$ and the inference label obtained for the $i$-th candidate data be denoted as $\hat{y}_i$ (Equation 1). In this case, the distance calculation unit 354 calculates the residual between the inference label and the label by calculating Equation 2:
$$|y - \hat{y}_i| \quad \text{(Equation 2)}$$
Note that $i$ takes any value from 1 to $k$.

For example, as described above, the distance calculation unit 354 calculates the residual between the inference label and the label as the distance between them. The distance calculation unit 354 may be configured to calculate the distance between the inference label and the label using a known method other than the exemplified one, such as calculating a value that is twice the value of Equation 2 as the distance.
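The distance calculation can be sketched as follows. This sketch takes the residual to be the absolute difference between the known label and each inference label, which is an assumption of the example; the function name `residual_distances` and the sample values are likewise hypothetical.

```python
from typing import List, Sequence

def residual_distances(label: float, inference_labels: Sequence[float]) -> List[float]:
    """Absolute residual between the known label y and each inference label.

    The i-th entry of the result is the distance for the i-th candidate data.
    """
    return [abs(label - y_hat) for y_hat in inference_labels]

# Illustrative values: known label 1.0 and three inference labels from the model.
distances = residual_distances(1.0, [0.2, 0.9, 1.0])
```

Any other distance could be substituted by changing the expression inside the list comprehension, since the later estimation step only compares the distances with one another.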

Based on the distance information 343, the estimation unit 355 estimates the value of an attribute that is likely to be an unknown attribute among the candidates. The estimation unit 355 also stores the result of estimation in the storage unit 340 as estimation information 344 .

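For example, the estimation unit 355 identifies the candidate with the smallest distance based on the distance information 343, and estimates a value according to the identified result. Specifically, for example, the estimation unit 355 identifies $i'$ by solving Equation 3 below:
$$i' = \underset{i \in \{1, \ldots, k\}}{\arg\min}\ |y - \hat{y}_i| \quad \text{(Equation 3)}$$
Here, $y$ is the label and $\hat{y}_i$ is the inference label obtained for the $i$-th candidate data. Then, $v_{1i'}$ corresponding to the identified $i'$ is output as a plausible attribute value. Note that $i'$ takes any value from 1 to $k$.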

It should be noted that there may be a plurality of $i'$, for example when the residual is 0 for more than one candidate. In this case, for example, the estimation unit 355 can select one of the plurality of $i'$ at random, and output $v_{1i'}$ according to the selected result. Further, as described above, the prior information 341 may include information indicating marginal probabilities. In this case, the estimation unit 355 may select one of the plurality of $i'$ based on the marginal probabilities. For example, the estimation unit 355 can select the $i'$ having the maximum marginal probability among the plurality of $i'$. Also, the estimation unit 355 may be configured to select $i'$ with a probability corresponding to the marginal probability. In this way, when there are a plurality of $i'$, the estimation unit 355 may be configured to select one of them by an arbitrary method and to output $v_{1i'}$ according to the selected result. When there are a plurality of $i'$, the estimation unit 355 may also be configured to output the plurality of $v_{1i'}$ corresponding to each of the plurality of $i'$.
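The selection of the smallest distance, together with two of the tie-breaking options mentioned above (maximum marginal probability, or random choice), can be sketched as follows. The function name `estimate_attribute`, its parameters, and the sample values are assumptions for illustration; the disclosure leaves the tie-breaking method open.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def estimate_attribute(candidate_values: Sequence[T],
                       distances: Sequence[float],
                       marginals: Optional[Sequence[float]] = None,
                       rng: Optional[random.Random] = None) -> T:
    """Return the candidate value whose distance is smallest.

    Ties are broken by the largest marginal probability when marginals are
    given, and otherwise by a uniformly random choice among the tied indices.
    """
    smallest = min(distances)
    tied: List[int] = [i for i, d in enumerate(distances) if d == smallest]
    if len(tied) > 1 and marginals is not None:
        chosen = max(tied, key=lambda i: marginals[i])
    elif len(tied) > 1:
        chosen = (rng or random.Random()).choice(tied)
    else:
        chosen = tied[0]
    return candidate_values[chosen]

unique = estimate_attribute(["a", "b", "c"], [0.5, 0.0, 0.3])
by_marginal = estimate_attribute(["a", "b", "c"], [0.0, 0.0, 1.0], marginals=[0.2, 0.7, 0.1])
```

Passing a seeded `random.Random` as `rng` makes the random tie-break reproducible, which can be convenient when the estimation is repeated for evaluation purposes.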

The evaluation unit 356 performs evaluation based on the estimation information 344. In other words, the evaluation unit 356 performs risk evaluation based on the result of estimation by the estimation unit 355 .

For example, the evaluation unit 356 has correct answer information, which is information indicating the value that the unknown attribute indicated by the prior information 341 actually had. For example, in the case of FIG. 4, the evaluation unit 356 has correct answer information indicating which value of $(v_{11}, \ldots, v_{1k})$ the attribute $x_1$ actually takes. The evaluation unit 356 can compare the result of estimation by the estimation unit 355 with the actual value indicated by the correct answer information, and perform risk evaluation based on the comparison result. For example, the evaluation unit 356 can evaluate that the risk is high when the result of estimation by the estimation unit 355 and the actual value indicated by the correct answer information match. On the other hand, when the result of estimation by the estimation unit 355 and the actual value indicated by the correct answer information do not match, the evaluation unit 356 can evaluate that the risk is low.

As described above, the prior information 341 includes a plurality of pieces of information in which partial training data information and missing attribute information are associated with each other. Therefore, the estimating unit 355 can estimate a candidate for each of the associated information. Therefore, for example, the evaluation unit 356 may perform risk evaluation based on a comparison result between a plurality of estimation results by the estimation unit 355 and correct information corresponding to each estimation. Specifically, for example, the evaluation unit 356 calculates the percentage of correct answers indicating the percentage of matches between the estimation results and the correct information, according to the results of a plurality of comparisons. Then, the evaluation unit 356 can output, for example, the calculated percentage of correct answers as the information indicating the risk. The evaluation unit 356 may be configured to evaluate risk according to whether or not the calculated percentage of correct answers exceeds a predetermined threshold, and output the evaluation result.
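The percentage-of-correct-answers evaluation described above can be sketched as follows. The function names, the sample values, and the threshold of 0.5 are assumptions chosen for illustration; the disclosure only states that a predetermined threshold may be used.

```python
from typing import Sequence

def correct_answer_rate(estimates: Sequence[object], correct_values: Sequence[object]) -> float:
    """Fraction of estimates that match the corresponding correct answer."""
    if len(estimates) != len(correct_values):
        raise ValueError("estimates and correct_values must have the same length")
    if not estimates:
        raise ValueError("at least one estimate is required")
    matches = sum(e == c for e, c in zip(estimates, correct_values))
    return matches / len(estimates)

def is_high_risk(rate: float, threshold: float = 0.5) -> bool:
    """Judge the risk as high when the rate exceeds the threshold (0.5 is an assumed value)."""
    return rate > threshold

rate = correct_answer_rate(["a", "b", "a", "c"], ["a", "b", "c", "c"])
high = is_high_risk(rate)
```

A higher rate means that more concealed attribute values were recovered from the model output, which corresponds to a higher risk of training data leakage.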

The output unit 357 outputs information indicating candidates estimated by the estimation unit 355, information indicating evaluation results by the evaluation unit 356, and the like. For example, the output unit 357 displays each of the above information on the screen display unit 320 or transmits the information to an external device via the communication I/F unit 330 .

The above is a configuration example of the risk evaluation device 300. Next, an operation example of the risk evaluation device 300 will be described with reference to FIGS. 6 and 7.

First, an operation example of the risk evaluation device 300 when estimating an unknown attribute will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an operation example of the risk evaluation device 300 when estimating an unknown attribute. Referring to FIG. 6, the candidate data creation unit 351 creates candidate data based on the prior information 341 (step S101). For example, the candidate data creation unit 351 creates candidate data according to the number of candidates indicated by the missing attribute information.

The candidate data transmission unit 352 transmits each candidate data created by the candidate data creation unit 351 to the model storage device 200 (step S102).

The inference result acquisition unit 353 acquires an inference label for each candidate data from the model storage device 200 as an inference result based on the candidate data (step S103).

Based on the inference label acquired by the inference result acquisition unit 353, the distance calculation unit 354 calculates the distance between the inference label and the training label indicated by the corresponding partial training data information (step S104). For example, the distance calculation unit 354 calculates the residual between each received inference label and the label as the distance.

Based on the result calculated by the distance calculation unit 354, the estimation unit 355 estimates a plausible value for the unknown attribute among the candidates (step S105). For example, the estimation unit 355 identifies a candidate with the smallest distance based on the distance information 343, and estimates a value according to the identified result.

The above is an operation example of the risk evaluation device 300 at the time of attribute estimation. For example, the risk evaluation device 300 can perform the processing from step S101 to step S105 for each target to be estimated.
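Steps S101 to S105 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the model is represented by a callable `query_model` that hides its internals (the black-box setting), the residual is taken as an absolute difference, ties are resolved by taking the first smallest distance, and `toy_model` is a hypothetical stand-in for the learning model 241.

```python
from typing import Callable, Sequence, Tuple

Record = Tuple[float, ...]

def estimate_unknown_attribute(query_model: Callable[[Record], float],
                               known_attrs: Sequence[float],
                               label: float,
                               candidate_values: Sequence[float]) -> float:
    # S101: create one candidate record per candidate value of the unknown attribute.
    candidates = [(v, *known_attrs) for v in candidate_values]
    # S102, S103: send each candidate to the model and collect the inference labels.
    inference_labels = [query_model(c) for c in candidates]
    # S104: distance between each inference label and the known label.
    distances = [abs(label - y_hat) for y_hat in inference_labels]
    # S105: pick the candidate with the smallest distance (first one on ties).
    best = min(range(len(distances)), key=distances.__getitem__)
    return candidate_values[best]

# Hypothetical black-box model: outputs 1.0 only for x_1 == 2 and x_2 > 50.
def toy_model(record: Record) -> float:
    x1, x2 = record
    return 1.0 if x1 == 2 and x2 > 50 else 0.0

estimated = estimate_unknown_attribute(toy_model, known_attrs=(60,), label=1.0,
                                       candidate_values=(1, 2, 3))
```

Only calls to `query_model` are needed, so no model structure, error function shape, or marginal probability has to be supplied for the basic procedure.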

Next, an operation example of the risk evaluation device 300 during risk evaluation will be described with reference to FIG. 7. FIG. 7 is a flowchart showing an operation example of the risk evaluation device 300 during risk evaluation. Referring to FIG. 7, the risk evaluation device 300 performs the process of estimating unknown attributes described with reference to FIG. 6 (step S201).

When the estimation target remains in the prior information 341 (step S202, No), the risk evaluation device 300 returns to the process of step S201 and performs the estimation process. On the other hand, when there is no estimation target in the prior information 341 (step S202, Yes), the risk evaluation device 300 performs risk evaluation according to each estimation result (step S203). For example, the risk assessment device 300 can calculate the percentage of correct answers based on the results of comparison between the result of each estimation and the correct answer information corresponding to each estimation, and output according to the calculated percentage of correct answers.

The above is an example of the operation of the risk evaluation device 300 during risk evaluation. Note that the process of step S203 does not necessarily have to be performed continuously after the processes of steps S201 and S202. For example, the process of step S203 may be performed at any timing after the processes of steps S201 and S202.
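The correct answer rate mentioned for step S203 can be sketched as follows. This is only an illustration under the assumption that estimation results and correct answers are given as two lists of equal length; the function name and sample data are hypothetical.

```python
def correct_answer_rate(estimates, correct_answers):
    """Step S203 (sketch): percentage of estimates that match the correct answers."""
    if len(estimates) != len(correct_answers):
        raise ValueError("estimates and correct_answers must have the same length")
    if not estimates:
        return 0.0  # no estimation targets, so nothing was estimated correctly
    hits = sum(1 for e, c in zip(estimates, correct_answers) if e == c)
    return 100.0 * hits / len(estimates)


# Hypothetical results for four estimation targets (steps S201 and S202 repeated).
estimates = ["A", "B", "A", "C"]
correct_answers = ["A", "C", "A", "B"]
rate = correct_answer_rate(estimates, correct_answers)
```

A higher rate means that more unknown attribute values were recovered from the learning model, which the device can reflect in its output.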

Thus, the risk evaluation device 300 has the distance calculation unit 354 and the estimation unit 355. According to such a configuration, the estimation unit 355 can estimate, from among the candidates, a plausible value for the unknown attribute based on the distance calculated by the distance calculation unit 354. That is, according to the above configuration, unknown attribute values can be estimated without assuming an error function or knowledge of marginal probabilities. As a result, the data can be estimated more accurately even when the user does not have the above knowledge or makes no such assumptions, that is, even when no prior knowledge is assumed.

Note that the present embodiment has exemplified the case where there is one unknown attribute x_1. However, the present invention can be applied without problems even when there are multiple unknown attributes.

For example, FIG. 8 shows an example of the prior information 341 when there are multiple unknown attributes from x_1 to x_n. Specifically, FIG. 8 illustrates a case where the attributes (x_{n+1}, ..., x_d) and the label y are known and the attributes (x_1, ..., x_n) are missing. In this case, the missing attribute information indicates information about the value of each missing attribute. Note that, as illustrated in FIG. 5, the prior information 341 may include information indicating the marginal probability of each candidate even when there are a plurality of unknown attributes.

When there are a plurality of unknown attributes as shown in FIG. 8, the candidate data creation unit 351 assumes that each unknown attribute takes one of its candidates, and creates as many pieces of candidate data as there are combinations of the unknown attribute candidates. From the candidate data transmission unit 352 onward, processing can be performed in the same manner as when there is one unknown attribute. For example, when there are a plurality of i', the estimation unit 355 may, as in the case where there is one unknown attribute, select one of the plurality of i' at random or select one according to the marginal probability. As described above, even when there are a plurality of unknown attributes, the unknown attribute values can be estimated by the same processing as in the case where there is one unknown attribute, except that the number of pieces of candidate data created by the candidate data creation unit 351 increases.
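The creation of candidate data for every combination of unknown attribute candidates can be sketched with a Cartesian product. The dictionary layout, the attribute names, and the function name below are assumptions made for illustration only.

```python
from itertools import product


def create_candidate_data(known_attributes, unknown_candidates):
    """Create one piece of candidate data per combination of candidates.

    known_attributes:   dict mapping attribute name -> known value
    unknown_candidates: dict mapping attribute name -> list of candidate values
    """
    names = list(unknown_candidates)
    candidate_data = []
    for combination in product(*(unknown_candidates[name] for name in names)):
        row = dict(known_attributes)          # copy the known attributes
        row.update(zip(names, combination))   # fill in one candidate per unknown attribute
        candidate_data.append(row)
    return candidate_data


# Hypothetical example: x1 and x2 are unknown, x3 is known.
known = {"x3": 5.0}
unknown = {"x1": [0, 1], "x2": ["a", "b", "c"]}
candidate_data = create_candidate_data(known, unknown)
```

With two candidates for x1 and three for x2, six pieces of candidate data are created, which shows how the number of pieces grows with the number of unknown attributes.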

In addition, this embodiment has exemplified the case where the risk evaluation system 100 has the model storage device 200 and the risk evaluation device 300. However, the risk evaluation system 100 may be composed of, for example, one information processing device having the functions of both the model storage device 200 and the risk evaluation device 300 described in this embodiment. The risk evaluation system 100 may also employ other known variations.

[Second embodiment]
Next, a second embodiment of the present disclosure will be described with reference to FIGS. 9 and 10. FIG. 9 is a diagram illustrating a hardware configuration example of the estimation device 400. FIG. 10 is a block diagram showing a configuration example of the estimation device 400.

In the second embodiment of the present disclosure, a configuration example of an estimation device 400, which is an information processing device that estimates unknown attribute values based on information about known attributes, will be described. FIG. 9 shows a hardware configuration example of the estimation device 400. Referring to FIG. 9, the estimation device 400 has the following hardware configuration as an example.
- CPU (Central Processing Unit) 401 (arithmetic unit)
- ROM (Read Only Memory) 402 (storage device)
- RAM (Random Access Memory) 403 (storage device)
- Program group 404 loaded into the RAM 403
- Storage device 405 for storing the program group 404
- Drive device 406 that reads from and writes to a recording medium 410 outside the information processing device
- Communication interface 407 that connects to a communication network 411 outside the information processing device
- Input/output interface 408 for inputting and outputting data
- Bus 409 connecting the components

Also, the estimation device 400 can realize the functions of the acquisition unit 421, the calculation unit 422, and the estimation unit 423 shown in FIG. 10 when the CPU 401 executes the program group 404. The program group 404 is stored in the storage device 405 or the ROM 402 in advance, for example, and is loaded into the RAM 403 or the like by the CPU 401 as necessary and executed. The program group 404 may be supplied to the CPU 401 via the communication network 411, or may be stored in the recording medium 410 in advance so that the drive device 406 reads the program and supplies it to the CPU 401.

Note that FIG. 9 shows only one example of the hardware configuration of the estimation device 400, and the hardware configuration of the estimation device 400 is not limited to the case described above. For example, the estimation device 400 may be configured from part of the configuration described above, such as a configuration without the drive device 406.

The acquisition unit 421 acquires a plurality of inference results, each inferred as a result of inputting to the learning model one of a plurality of pieces of candidate data created based on information indicating unknown attribute candidates.

The calculation unit 422 calculates, for each inference result, the distance between the inference result acquired by the acquisition unit 421 and the label corresponding to the candidate data. For example, the calculation unit 422 calculates the residual as the distance.

The estimation unit 423 estimates the unknown attribute value according to the result calculated by the calculation unit 422.

In this way, the estimation device 400 has the calculation unit 422 and the estimation unit 423. According to such a configuration, the estimation unit 423 can estimate the value of the unknown attribute based on the distance calculated by the calculation unit 422. That is, according to the above configuration, unknown attribute values can be estimated without assuming an error function or knowledge of marginal probabilities. As a result, the data can be estimated more accurately even when the user does not have the above knowledge or makes no such assumptions, that is, even when no prior knowledge is assumed.
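The flow through the acquisition unit 421, the calculation unit 422, and the estimation unit 423 can be sketched end to end as below. The learning model is replaced by a hypothetical stand-in function `toy_model`, the distance is taken as the absolute residual, and all names and values are assumptions for illustration rather than the device's actual implementation.

```python
def acquire(model, candidate_data):
    """Acquisition unit (sketch): one inference result per piece of candidate data."""
    return [model(row) for row in candidate_data]


def calculate(inference_results, label):
    """Calculation unit (sketch): absolute residual between each result and the label."""
    return [abs(result - label) for result in inference_results]


def estimate(candidate_values, distances):
    """Estimation unit (sketch): candidate value whose distance is smallest
    (the first one is taken if several share the smallest distance)."""
    best = min(range(len(distances)), key=distances.__getitem__)
    return candidate_values[best]


def toy_model(row):
    """Hypothetical stand-in for the learning model."""
    return 2.0 * row["x1"] + row["x2"]


known = {"x2": 1.0}                    # known attribute of the target data
label = 5.0                            # known label of the target data
candidates = [0.0, 1.0, 2.0, 3.0]      # candidates for the unknown attribute x1

candidate_data = [{**known, "x1": value} for value in candidates]
inference_results = acquire(toy_model, candidate_data)
distances = calculate(inference_results, label)
estimated_x1 = estimate(candidates, distances)
```

In this toy setting the candidate 2.0 reproduces the known label exactly, so it is selected without any error function or marginal probability being supplied.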

Note that the estimation device 400 described above can be realized by installing a predetermined program in an information processing device such as the estimation device 400. Specifically, a program that is another aspect of the present invention is a program for causing an information processing device such as the estimation device 400 to realize processes of: acquiring a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates; calculating, for each inference result, the distance between the acquired inference result and the label corresponding to the candidate data; and estimating the value of the unknown attribute according to the calculated result.

In addition, the estimation method executed by an information processing device such as the estimation device 400 described above is a method in which the information processing device acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates, calculates, for each inference result, the distance between the acquired inference result and the label corresponding to the candidate data, and estimates the value of the unknown attribute according to the calculated result.

The program, the computer-readable recording medium recording the program, and the estimation method having the configurations described above also have the same operations and effects as the estimation device 400 described above, and can therefore achieve the objects of the present invention described above.

<Appendix>
Some or all of the above embodiments may also be described as the following appendices. An outline of the estimation device and the like according to the present invention will be described below. However, the present invention is not limited to the following configurations.

(Appendix 1)
An estimation device comprising:
an acquisition unit that acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates;
a calculation unit that calculates, for each inference result, a distance between the inference result acquired by the acquisition unit and a label corresponding to the candidate data; and
an estimation unit that estimates the value of an unknown attribute according to the result calculated by the calculation unit.
(Appendix 2)
The estimation device according to Appendix 1, wherein
the calculation unit calculates a residual between the inference result and the label as the distance between the inference result and the label, and
the estimation unit estimates the value of the unknown attribute according to the residual calculated by the calculation unit.
(Appendix 3)
The estimation device according to Appendix 1 or Appendix 2, wherein
the estimation unit identifies the candidate with the smallest distance according to the result calculated by the calculation unit, and estimates a value according to the identified result.
(Appendix 4)
The estimation device according to any one of Appendices 1 to 3, wherein
the estimation unit estimates the value of the unknown attribute using information indicating the marginal probability of the unknown attribute candidates.
(Appendix 5)
The estimation device according to any one of Appendices 1 to 4, further comprising
a creation unit that creates candidate data corresponding to each unknown attribute candidate based on information about known attributes and information indicating unknown attribute candidates, wherein
the acquisition unit acquires inference results inferred as a result of inputting the plurality of pieces of candidate data created by the creation unit to the learning model.
(Appendix 6)
The estimation device according to Appendix 5, wherein,
when there are a plurality of unknown attributes, the creation unit creates candidate data according to combinations of the candidates of the plurality of unknown attributes.
(Appendix 7)
The estimation device according to any one of Appendices 1 to 6, further comprising
an evaluation unit that performs a predetermined evaluation based on the result of estimation by the estimation unit.
(Appendix 8)
An estimation method in which an information processing device:
acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates;
calculates, for each inference result, a distance between the acquired inference result and a label corresponding to the candidate data; and
estimates the value of an unknown attribute according to the calculated result.
(Appendix 9)
The estimation method according to Appendix 8, comprising:
calculating a residual between the inference result and the label as the distance between the inference result and the label; and
estimating the value of the unknown attribute according to the calculated residual.
(Appendix 10)
A computer-readable recording medium recording a program for causing an information processing device to realize processes of:
acquiring a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates;
calculating, for each inference result, a distance between the acquired inference result and a label corresponding to the candidate data; and
estimating the value of an unknown attribute according to the calculated result.

Although the present invention has been described with reference to the above-described embodiments, the present invention is not limited to the above-described embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

100 Risk evaluation system
200 Model storage device
210 Reception unit
220 Inference unit
230 Output unit
240 Storage unit
241 Learning model
300 Risk evaluation device
310 Operation input unit
320 Screen display unit
330 Communication I/F unit
340 Storage unit
341 Prior information
342 Inference result information
343 Distance information
344 Estimation information
350 Calculation processing unit
351 Candidate data creation unit
352 Candidate data transmission unit
353 Inference result acquisition unit
354 Distance calculation unit
355 Estimation unit
356 Evaluation unit
357 Output unit
400 Estimation device
401 CPU
402 ROM
403 RAM
404 Program group
405 Storage device
406 Drive device
407 Communication interface
408 Input/output interface
409 Bus
410 Recording medium
411 Communication network
421 Acquisition unit
422 Calculation unit
423 Estimation unit

Claims

[Claim 1]
An estimation device comprising:
an acquisition unit that acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates;
a calculation unit that calculates, for each inference result, a distance between the inference result acquired by the acquisition unit and a label corresponding to the candidate data; and
an estimation unit that estimates the value of an unknown attribute according to the result calculated by the calculation unit.
[Claim 2]
The estimation device according to claim 1, wherein
the calculation unit calculates a residual between the inference result and the label as the distance between the inference result and the label, and
the estimation unit estimates the value of the unknown attribute according to the residual calculated by the calculation unit.
[Claim 3]
The estimation device according to claim 1 or claim 2, wherein
the estimation unit identifies the candidate with the smallest distance according to the result calculated by the calculation unit, and estimates a value according to the identified result.
[Claim 4]
The estimation device according to any one of claims 1 to 3, wherein
the estimation unit estimates the value of the unknown attribute using information indicating the marginal probability of the unknown attribute candidates.
[Claim 5]
The estimation device according to any one of claims 1 to 4, further comprising
a creation unit that creates candidate data corresponding to each unknown attribute candidate based on information about known attributes and information indicating unknown attribute candidates, wherein
the acquisition unit acquires inference results inferred as a result of inputting the plurality of pieces of candidate data created by the creation unit to the learning model.
[Claim 6]
The estimation device according to claim 5, wherein,
when there are a plurality of unknown attributes, the creation unit creates candidate data according to combinations of the candidates of the plurality of unknown attributes.
[Claim 7]
The estimation device according to any one of claims 1 to 6, further comprising
an evaluation unit that performs a predetermined evaluation based on the result of estimation by the estimation unit.
[Claim 8]
An estimation method in which an information processing device:
acquires a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates;
calculates, for each inference result, a distance between the acquired inference result and a label corresponding to the candidate data; and
estimates the value of an unknown attribute according to the calculated result.
[Claim 9]
The estimation method according to claim 8, comprising:
calculating a residual between the inference result and the label as the distance between the inference result and the label; and
estimating the value of the unknown attribute according to the calculated residual.
[Claim 10]
A computer-readable recording medium recording a program for causing an information processing device to realize processes of:
acquiring a plurality of inference results respectively inferred as a result of inputting, to a learning model, a plurality of pieces of candidate data created based on information indicating unknown attribute candidates;
calculating, for each inference result, a distance between the acquired inference result and a label corresponding to the candidate data; and
estimating the value of an unknown attribute according to the calculated result.