WO2020044814A1 - Model updating device, model updating method, and model updating program - Google Patents

Model updating device, model updating method, and model updating program

Info

Publication number
WO2020044814A1
Authority
WO
WIPO (PCT)
Prior art keywords: model, data, condition, unit, node
Prior art date
Application number
PCT/JP2019/027687
Other languages
French (fr)
Japanese (ja)
Inventor
智之 西山
江藤 力
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Publication of WO2020044814A1 publication Critical patent/WO2020044814A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • the present invention relates to a model updating device, a model updating method, and a model updating program for updating a discrimination model.
  • a hierarchical mixed model is known as a model whose determination conditions are clear and easy to interpret.
  • the hierarchical mixed model is a model having a tree structure in which branch conditions (sometimes referred to as discrimination conditions or gate tree conditions) are set for the root node and the internal nodes, and discriminants are set for the leaf nodes.
  • since the conditions for discriminating the target data are generally explicit, the data can be discriminated only by the branch conditions indicated by the nodes.
  • FIG. 28 is an explanatory diagram showing an example of a discrimination model for performing a binary decision.
  • the discrimination model shown in FIG. 28 has a tree structure, and is a model for discriminating target data at leaf nodes. For example, when performing a binary judgment (for example, true / false judgment) on input information in a certain task, it is necessary to return the same result (object variable) for the same input information. For example, in the business of assessing approval or rejection represented by credit or the like, it is desired to determine the same result (acceptance of financing) for information on customers having the same conditions.
  • the target data can be completely determined.
  • the discrimination model shown in FIG. 28 can be said to be a model that can completely discriminate the target data.
  • FIG. 29 is an explanatory diagram showing another example of the discrimination model for performing the binary determination.
  • the model shown in FIG. 29 is also a model that makes a determination based on the conditions 1 to n similarly to the model shown in FIG.
  • under conditions 1 and 3 illustrated in FIG. 29, the input information can be completely separated by determination 1.
  • however, if a condition based on information that is not in the teacher data is hidden under condition n, the result will differ even for data that satisfies the same condition n.
  • Information that is not in the teacher data is generally determined from knowledge and other information.
  • a method of determining target information using a score is also known. On the assumption that the analysis results are normally distributed over the population (that is, by the law of large numbers), methods of discriminating input information into binary values include a method of discriminating based on the degree of contribution to a classification value (score) and a method of assigning an occurrence probability (score) to each classification based on a reference value.
  • Patent Document 1 describes a system state determination support device that determines the state of a system.
  • the apparatus described in Patent Document 1 generates a discrimination model for discriminating whether a system is in a predetermined state, and calculates the reliability of the monitoring information of the system (model reliability) and the reliability of the monitoring information of the monitoring target to be discriminated (target reliability). Then, based on the model reliability and the target reliability, the threshold used by the discrimination model for determination is corrected.
  • FIG. 30 is an explanatory diagram showing experimental results of analysis by binary discrimination. For example, it is assumed that the score of the above-described discriminant is output as the discrimination result. Before the analysis, the positive and negative examples were assumed to have the distribution shown in FIG. 30(a). However, when the analysis was actually performed, the distribution was close to the one illustrated in FIG. 30(b). It can be seen that, in order to properly remove the negative examples in this state, it is necessary to shift the boundary score for determining a positive example toward higher scores (that is, to raise the boundary score from score S1 to score S2) (see FIG. 30(c)).
  • when gray data (data that cannot be clearly discriminated by the discrimination model) occurs, manual confirmation is required. Therefore, when operating with an existing discrimination model, it is desired that the influence of such gray data be suppressed and the accuracy of the discrimination model be ensured.
  • it is also desired that discrimination using the discrimination model be completed as much as possible without human intervention.
  • however, the determination accuracy of the discrimination model may decrease due to external factors such as the presence of an unknown explanatory variable, and the amount of gray data may increase.
  • an object of the present invention is to provide a model updating apparatus, a model updating method, and a model updating program that can update a discrimination model so as to improve the discrimination accuracy while maintaining the discrimination conditions of an existing discrimination model.
  • a model updating apparatus according to the present invention is a model updating apparatus for updating a hierarchical mixed model, and includes: a data extraction unit that extracts data classified under a target condition in the hierarchical mixed model; a data replenishment unit that receives replenishment for the extracted data; a model generation unit that generates a discrimination model using the replenished data; and a model updating unit that generates a model in which an internal node that classifies data satisfying the target condition is arranged at the top of the hierarchical mixed model. The model updating unit generates a model indicating that, at that internal node, data that does not satisfy the condition is applied to the hierarchical mixed model corresponding to one leaf node and data that satisfies the condition is applied to the discrimination model corresponding to the other leaf node.
  • a model updating method according to the present invention is a model updating method for updating a hierarchical mixed model, in which data classified under a target condition in the hierarchical mixed model is extracted, replenishment for the extracted data is received, a discrimination model is generated using the replenished data, and a model is generated in which an internal node that classifies data satisfying the target condition is arranged at the top of the hierarchical mixed model, the generated model indicating that data that does not satisfy the condition is applied to the hierarchical mixed model corresponding to one leaf node of that internal node and data that satisfies the condition is applied to the discrimination model corresponding to the other leaf node.
  • a model updating program according to the present invention is a model updating program applied to a computer that updates a hierarchical mixed model. The program causes the computer to execute: a data extraction process of extracting data classified under a target condition in the hierarchical mixed model; a data replenishment process of accepting replenishment for the extracted data; a model generation process of generating a discrimination model using the replenished data; and a model updating process of generating a model in which an internal node that classifies data satisfying the target condition is arranged at the top of the hierarchical mixed model. In the model updating process, a model is generated indicating that data that does not satisfy the condition is applied to the hierarchical mixed model corresponding to one leaf node of that internal node and data that satisfies the condition is applied to the discrimination model corresponding to the other leaf node.
  • the discrimination model can be updated so as to improve the discrimination accuracy while maintaining the discrimination conditions of the existing discrimination model.
  • FIG. 1 is a block diagram illustrating a configuration example of a first embodiment of a model updating device according to the present invention.
  • FIG. 2 is an explanatory diagram showing an example of a discrimination model based on a hierarchical mixed model.
  • FIG. 3 is an explanatory diagram illustrating an example of a data determination result.
  • FIG. 4 is an explanatory diagram illustrating an example of a process in which data is excluded.
  • FIG. 5 is a flowchart illustrating an operation example of the model updating device.
  • FIG. 6 is an explanatory diagram illustrating another example of a data determination result.
  • FIG. 7 is a flowchart illustrating another operation example of the model updating device.
  • FIG. 8 is an explanatory diagram illustrating an example of a process of determining gray data.
  • FIG. 9 is an explanatory diagram illustrating another example of the process of determining gray data.
  • FIG. 10 is a block diagram showing a configuration example of a second embodiment of the model updating device according to the present invention.
  • FIG. 11 is an explanatory diagram illustrating an example of a discrimination model serving as a reference.
  • FIG. 12 is an explanatory diagram illustrating an example of a hierarchical mixed model generated using first learning data.
  • FIG. 13 is an explanatory diagram illustrating an example of a determination result.
  • FIG. 14 is an explanatory diagram illustrating another example of a determination result.
  • FIG. 15 is an explanatory diagram illustrating an example of a process of generating a new hierarchical mixed model.
  • FIG. 16 is an explanatory diagram illustrating an example of a generated hierarchical mixed model.
  • an explanatory diagram showing an example of displaying the properties of each discriminant.
  • an explanatory diagram illustrating an example of a result of classifying data using a discrimination model.
  • an explanatory diagram illustrating an example of a result of classifying data using an updated discrimination model.
  • a block diagram showing the outline of the model updating device according to the present invention.
  • a schematic block diagram illustrating a configuration of a computer according to at least one embodiment.
  • FIG. 28 is an explanatory diagram illustrating an example of a discrimination model for performing a binary determination.
  • FIG. 29 is an explanatory diagram illustrating another example of a discrimination model for performing a binary determination.
  • FIG. 30 is an explanatory diagram showing experimental results of analysis by binary discrimination.
  • FIG. 1 is a block diagram showing a configuration example of a first embodiment of a model updating device according to the present invention.
  • the model updating apparatus according to the present invention has a function of updating a hierarchical mixed model among the discrimination models.
  • the hierarchical mixture model is represented by a tree structure, and has a structure in which components are arranged at the leaf nodes and a gate function (branch function) indicating a branch condition is arranged at each of the other, upper nodes.
  • the branch condition of the gate function is described using explanatory variables.
  • the input data is branched by the gate functions from the root node through each subsequent node and is assigned to one of the plurality of components.
  • FIG. 2 is an explanatory diagram illustrating an example of a discrimination model based on a hierarchical mixed model.
  • in the discrimination model illustrated in FIG. 2, the input data is classified into one of four leaf nodes based on conditions 1 to 3, and is determined based on the discriminants Y1 to Y4 allocated to the leaf nodes.
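  • as a minimal illustration of this structure (not taken from the patent; the conditions, coefficients, and variable names below are arbitrary), a hierarchical mixed model can be sketched as internal nodes holding gate conditions and leaf nodes holding discriminants:

      # A minimal sketch of a hierarchical mixed model: internal nodes hold a
      # branch condition (gate), leaf nodes hold a discriminant. The conditions
      # and coefficients below are illustrative only.

      class Leaf:
          def __init__(self, name, discriminant):
              self.name = name                  # e.g. "Y1"
              self.discriminant = discriminant  # callable: record -> prediction

          def predict(self, record):
              return self.discriminant(record)

      class Node:
          def __init__(self, condition, if_true, if_false):
              self.condition = condition        # callable: record -> bool (gate)
              self.if_true = if_true
              self.if_false = if_false

          def predict(self, record):
              # Route the record down the tree until a leaf discriminant is reached.
              branch = self.if_true if self.condition(record) else self.if_false
              return branch.predict(record)

      # Example loosely corresponding to FIG. 2: conditions 1-3 route data to Y1..Y4.
      model = Node(lambda r: r["x1"] > 10,                      # condition 1
                   Leaf("Y1", lambda r: 0.5 * r["x2"] + 1.0),
                   Node(lambda r: r["x2"] > 3,                  # condition 2
                        Leaf("Y2", lambda r: 2.0 * r["x3"]),
                        Node(lambda r: r["x3"] > 0,             # condition 3
                             Leaf("Y3", lambda r: r["x1"] - r["x2"]),
                             Leaf("Y4", lambda r: 0.1 * r["x1"]))))

      print(model.predict({"x1": 3, "x2": 5, "x3": 2}))  # routed to Y2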
  • in the present invention, it is assumed that a discrimination model suitable for operation has been prepared as a reference. Specifically, it is preferable to select the reference discrimination model from a plurality of generated discrimination models not only from the viewpoint of accuracy but also from the viewpoint of ease of understanding for the user. That is, a model constituted by the variables and coefficients with which the user is most satisfied can be said to be a model that is easy for the user to understand.
  • the model updating apparatus 100 of the present embodiment includes a storage unit 10, an input unit 20, a data extraction unit 30, a data replenishment unit 40, a model generation unit 50, a model updating unit 60, and an output unit 70.
  • the storage unit 10 stores data to be determined. Further, the storage unit 10 may store various parameters necessary for the model generation unit 50 described later to generate a model. The storage unit 10 may store supplementary data received by the data supplementing unit 40 described later.
  • the storage unit 10 is realized by, for example, a magnetic disk or the like.
  • the input unit 20 inputs data to be determined.
  • the input unit 20 may read, for example, data to be determined stored in the storage unit 10 and input the data to the data extraction unit 30. Further, the input unit 20 may receive an instruction to select a branch condition to be extracted by the data extracting unit 30 described later.
  • the data extraction unit 30 determines the data classified into each leaf node by using the discriminant allocated to that leaf node.
  • the data extraction unit 30 performs the same determination on the data classified into the other leaf nodes, and totals the determination results of the data classified into each leaf node.
  • in other words, the data extraction unit 30 can be said to total the determination results of the data classified under the conditions traced from each node. That is, the data extraction unit 30 can also be referred to as a determination result totaling unit.
  • the data extraction unit 30 extracts data classified under the target condition in the hierarchical mixed model. Specifically, the data extraction unit 30 may extract data classified under a condition that the data determination result does not satisfy the criteria.
  • the data extraction unit 30 calculates, for each leaf node (that is, a condition under which the data is classified), the ratio of the correctness of the prediction result of the classified data. Specifically, the data extraction unit 30 totals the correct answer ratio, which is the ratio of data that is truly positive and the determination result is also positive among the data classified under each condition.
  • the condition for classifying the leaf node can be explained by a known explanatory variable X. This is because, for example, as shown in FIG. 28 described above, if all the conditions for performing the binary determination on the input information are known, the target data can be completely determined.
  • the Gini coefficient may be used as the criterion.
  • for example, a Gini coefficient of 99% may be set as the criterion.
  • FIG. 3 is an explanatory diagram illustrating an example of a data determination result.
  • the table illustrated in FIG. 3 shows the result of discriminating the data classified under the conditions (leaf nodes) determined by each of the discriminants Y1 to Y4.
  • TP (True Positive) illustrated in FIG. 3 is the number of cases in which positive example data is determined to be a positive example, and TN (True Negative) is the number of cases in which negative example data is determined to be a negative example.
  • FP (False Positive) is the number of cases in which negative example data is determined to be a positive example, and FN (False Negative) is the number of cases in which positive example data is determined to be a negative example.
  • a subscript may be added to each determination result.
  • for example, when TP is 100 and TN is 1, the ratio of TP is at least 99%.
  • in the result of the determination by discriminant Y4 illustrated in FIG. 3, only TP is present (50 cases), so it can be said that the determination result can be explained by the explanatory variable X alone. Therefore, for example, a criterion of "the ratio of TP is 99% or more" may be provided.
  • on the other hand, in another determination result, TP is 20, FP is 30, FN is 15, and TN is 20.
  • in still another determination result, TP is 0, FP is 20, FN is 40, and TN is 10. From these results, it can be determined that the data is difficult to explain or categorize using the explanatory variable X alone, and that an unknown explanatory variable X' may exist.
  • the data extraction unit 30 may extract data classified under the condition that the correct answer ratio is equal to or less than the predetermined threshold.
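  • as a concrete sketch of this totaling and extraction step, the correct answer ratio can be computed per condition and the conditions that do not reach the criterion can be flagged; the counts below are illustrative stand-ins loosely matching the values discussed above, and their assignment to specific discriminants is for illustration only:

      # Sketch of the totaling performed by the data extraction unit: per-leaf
      # TP/FP/TN/FN counts, the correct answer ratio, and extraction of the
      # conditions whose ratio does not reach the criterion. Counts are illustrative.

      counts = {
          "Y1": {"TP": 100, "FP": 0,  "FN": 0,  "TN": 1},
          "Y2": {"TP": 20,  "FP": 30, "FN": 15, "TN": 20},
          "Y3": {"TP": 0,   "FP": 20, "FN": 40, "TN": 10},
          "Y4": {"TP": 50,  "FP": 0,  "FN": 0,  "TN": 0},
      }

      def correct_answer_ratio(c):
          total = sum(c.values())
          return c["TP"] / total if total else 0.0

      CRITERION = 0.99  # e.g. "the ratio of TP is 99% or more"

      # Conditions whose classified data should be extracted for replenishment.
      target_conditions = [leaf for leaf, c in counts.items()
                           if correct_answer_ratio(c) < CRITERION]
      print(target_conditions)  # ['Y2', 'Y3'] with these counts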
  • the data extracted in this way is used as learning data for performing more appropriate discrimination.
  • in the present embodiment, the data extraction unit 30 extracts, as gray data, the data classified under conditions whose determination results do not satisfy the criterion.
  • here, gray data does not mean data that can never be discriminated, but data that is difficult to discriminate using only the given explanatory variables.
  • in the example shown in FIG. 3, the data extraction unit 30 extracts the data classified to the discrimination target of discriminant Y2 and the data classified to the discrimination target of discriminant Y3, respectively, because they satisfy the extraction condition.
  • the data extraction unit 30 may also extract data classified under conditions specified by the user via the input unit 20. Specifically, the data extraction unit 30 may output the totaled determination results to the output unit 70 described later, and extract the data classified under the conditions pointed out by the user in view of the determination results.
  • the data extraction unit 30 may extract data classified based on more detailed conditions. A method of classifying the subordinates of the target condition in more detail (that is, performing deep excavation of the condition) will be described in more detail in an embodiment described later.
  • the data replenishment unit 40 receives replenishment for the data extracted by the data extraction unit 30.
  • here, replenishment of data means so-called machine teaching, which includes adding values based on a new explanatory variable to the data, updating the teacher labels of the data, and the like.
  • the data supplementing unit 40 may receive information using the work location as an explanatory variable.
  • the data supplementing unit 40 may receive the teacher label to be changed.
  • the data supplementing unit 40 may output the data group extracted by the data extracting unit 30 in, for example, a file format, and accept a data group in which the user supplements the output data group with information.
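  • a minimal sketch of such replenishment, with all field names and values invented for illustration (adding a column for a new explanatory variable and correcting teacher labels on the extracted records):

      # Sketch of "machine teaching" style replenishment on the extracted data:
      # add a value based on a new explanatory variable and update teacher labels.
      # All field names and values here are invented for illustration.

      extracted = [
          {"id": 1, "x1": 3, "x2": 5, "label": 1},
          {"id": 2, "x1": 7, "x2": 2, "label": 0},
      ]

      # Replenishment supplied by the user, e.g. via an exported and edited file.
      new_variable = {1: "site_A", 2: "site_B"}   # e.g. work location
      label_updates = {2: 1}                      # corrected teacher labels

      def replenish(records, new_variable, label_updates, column="work_location"):
          replenished = []
          for r in records:
              r = dict(r)                              # keep the original record intact
              r[column] = new_variable.get(r["id"])    # add the new explanatory variable
              r["label"] = label_updates.get(r["id"], r["label"])
              replenished.append(r)
          return replenished

      print(replenish(extracted, new_variable, label_updates))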
  • the model generation unit 50 generates a discrimination model using the supplemented data.
  • the mode of the model generated by the model generation unit 50 is arbitrary.
  • the model generation unit 50 may generate, for example, a simple linear regression model as a discrimination model, or may learn a discrimination model represented by a hierarchical mixed model. By learning the discriminant model represented by the hierarchical mixed model, it becomes possible to clarify the conditions for classifying gray data while maintaining the clarity and interpretability of the discriminant conditions.
  • the model generation unit 50 may generate a plurality of types of discrimination models.
  • in that case, the model generation unit 50 may cause the output unit 70 described later to output the generated plural types of discrimination models, and allow the user to select a desired discrimination model from among them (for example, via the input unit 20).
  • the model updating unit 60 updates the existing discrimination model using the generated new discrimination model. Specifically, the model updating unit 60 generates a model in which an internal node that classifies data satisfying the condition under which the data extraction unit 30 extracted data is arranged at the top of the hierarchical mixed model. More specifically, the model updating unit 60 generates a model indicating that, at that internal node, data that does not satisfy the condition is applied to the hierarchical mixed model corresponding to one leaf node and data that satisfies the condition is applied to the discrimination model corresponding to the other leaf node.
  • the model updating unit 60 may generate a discriminant model that uses the discriminant model generated by the model generating unit 50 as a model for filtering data input to the existing discriminant model in advance. Further, the model updating unit 60 may generate a discrimination model in which the generated discrimination model is directly combined with the existing discrimination model. In this way, the model updating unit 60 arranges the generated discrimination model at the top of the hierarchical mixed model.
  • FIG. 4 is an explanatory diagram showing an example of a process in which data is excluded by a new determination model.
  • the new discriminant model M1 illustrated in FIG. 4 is a discriminant model generated by the model generating unit 50, and the existing discriminant model M2 is a discriminant model created in advance as a reference.
  • as illustrated in FIG. 4, the model updating unit 60 generates a model in which an internal node is arranged at the top level so that the new discrimination model M1 and the existing discrimination model M2 each become a leaf node. Specifically, the data satisfying the condition under which the data extraction unit 30 extracted data is applied to the new discrimination model M1.
  • here, assume that the model generation unit 50 has generated the new discrimination model based on the data extracted by the data extraction unit 30, that is, the data satisfying the conditions under which it is classified to the discrimination target of discriminant Y2 or discriminant Y3.
  • in this case, the data to be determined by the new discrimination model M1 is the data satisfying (not (condition 1) and (condition 2)) or (not (condition 1) and not (condition 2) and (condition 3)). Since the data to be discriminated by the new discrimination model M1 is excluded from the discrimination processing of the existing discrimination model M2, the above condition can be called an exclusion condition (exclusion rule) of the existing discrimination model.
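  • a sketch of how the updated model routes records according to this exclusion rule, assuming boolean condition functions for conditions 1 to 3 and placeholder callables for the new model M1 and the existing model M2:

      # Sketch of the updated model: a top-level internal node applies the
      # exclusion rule. Records that satisfy it go to the new model M1; all
      # others go to the existing model M2, whose structure is left untouched.
      # cond1..cond3, new_model_m1 and existing_model_m2 are placeholders.

      def exclusion_rule(r, cond1, cond2, cond3):
          # (not condition 1 and condition 2) or
          # (not condition 1 and not condition 2 and condition 3)
          return ((not cond1(r)) and cond2(r)) or \
                 ((not cond1(r)) and (not cond2(r)) and cond3(r))

      def updated_model(r, cond1, cond2, cond3, new_model_m1, existing_model_m2):
          if exclusion_rule(r, cond1, cond2, cond3):
              return new_model_m1(r)      # data the existing model handled poorly
          return existing_model_m2(r)     # existing discrimination model, unchanged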
  • in this way, the data satisfying the target condition is sorted to the new discrimination model M1 and determined using the new discrimination model M1.
  • when the new discrimination model M1 is a hierarchical mixed model, the data satisfying the target condition is further classified according to each branch condition, and discrimination processing is performed at each leaf node.
  • as described above, the data extraction unit 30 extracts the data classified under conditions whose determination results do not satisfy the criterion. In other words, the data extraction unit 30 extracts data that is difficult to explain using the known explanatory variable X, and it can be said that data that is difficult to discriminate with the existing discrimination model M2 is excluded as preprocessing of the discrimination processing using the discrimination model M2.
  • the data extraction unit 30 determines data other than the extracted data using the determination model M2.
  • the data to be subjected to the discrimination processing by the discrimination model M2 can be said to be data that can be explained by the explanatory variable X. Therefore, the result determined by the determination model M2 can be said to be almost 100% reliable, and this result can be applied to various cases.
  • the output unit 70 outputs the information of the discrimination model to an output device such as a display device (not shown).
  • the output unit 70 may output, for example, an updated model.
  • the output unit 70 may output the determination result by the data extraction unit 30 or may output the determination conditions and the discriminant in a plurality of types of the determination models. The specific output mode will be described later.
  • the input unit 20, the data extraction unit 30, the data replenishment unit 40, the model generation unit 50, the model updating unit 60, and the output unit 70 are realized by a processor of a computer that operates according to a program (model updating program), for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (field-programmable gate array).
  • for example, the program may be stored in the storage unit 10, and the processor may read the program and operate as the input unit 20, the data extraction unit 30, the data replenishment unit 40, the model generation unit 50, the model updating unit 60, and the output unit 70 according to the program. Further, the functions of the model updating device may be provided in SaaS (Software as a Service) format.
  • the input unit 20, the data extraction unit 30, the data replenishment unit 40, the model generation unit 50, the model update unit 60, and the output unit 70 may each be realized by dedicated hardware.
  • some or all of the components of each device may be realized by general-purpose or dedicated circuitry, a processor, or a combination thereof. These may be configured by a single chip, or may be configured by a plurality of chips connected via a bus. Some or all of the components of each device may be realized by a combination of the above-described circuitry and a program.
  • when some or all of the components are realized by a plurality of information processing devices, circuits, or the like, the plurality of information processing devices, circuits, or the like may be arranged in a centralized manner or in a distributed manner.
  • the information processing device, the circuit, and the like may be realized as a form in which each is connected via a communication network, such as a client server system and a cloud computing system.
  • FIG. 5 is a flowchart illustrating an operation example of the model updating device according to the present embodiment.
  • the input unit 20 inputs data to be determined (step S11).
  • the data extraction unit 30 extracts data classified under the target condition (step S12).
  • the data extracting unit 30 may extract, for example, data classified under the condition that the correct answer ratio is equal to or less than a predetermined threshold.
  • the data replenishing unit 40 receives replenishment for the extracted data (step S13).
  • next, the model generation unit 50 generates a discrimination model using the replenished data (step S14). Then, the model updating unit 60 generates a model in which an internal node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model (step S15). Specifically, the model updating unit 60 generates a model indicating that data that does not satisfy the condition is applied to the hierarchical mixed model and data that satisfies the condition is applied to the generated discrimination model.
  • the data extraction unit 30 extracts data classified under the target conditions in the hierarchical mixed model, and the data replenishment unit 40 receives replenishment for the extracted data.
  • the model generation unit 50 generates a discrimination model using the supplemented data.
  • then, the model updating unit 60 generates a model in which an internal node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model. Specifically, the model updating unit 60 generates a model indicating that, at that internal node, data that does not satisfy the condition is applied to the hierarchical mixed model corresponding to one leaf node and data that satisfies the condition is applied to the discrimination model corresponding to the other leaf node. Therefore, the discrimination model can be updated so as to improve the discrimination accuracy while maintaining the discrimination conditions of the existing discrimination model.
  • for example, if the discrimination model is simply updated (reconstructed), the structure, branch conditions, discriminants, and the like of the discrimination model selected by the user may change. In that case, the model becomes difficult for the user to use.
  • on the other hand, in the present embodiment, the model updating unit 60 generates a model in which a node indicating that the new discrimination model is applied to the data that reduces the accuracy of the existing discrimination model (that is, the data satisfying the target condition) is arranged at the top of the hierarchical mixed model. Therefore, a model to which the newly generated discrimination model is added can be generated without changing the structure of the existing discrimination model. Accordingly, the discrimination accuracy of the discrimination model as a whole can be improved while satisfying the user's desire to keep using the existing discrimination model.
  • in the above description, the case where the model updating unit 60 updates the existing discrimination model using the new discrimination model generated by the model generation unit 50 has been described.
  • the data extraction unit 30 may further extract data classified under the condition that the data discrimination result does not satisfy the criteria, using the discrimination model generated by the model generation unit 50.
  • the data replenishing unit 40 may receive replenishment for the data further extracted by the data extracting unit 30, and the model generating unit 50 may further generate a discriminant model using the replenished data.
  • the data extraction unit 30 may repeat the above processing until the number of extracted data no longer decreases (in other words, until only data that cannot be determined by the known explanatory variables remains).
  • FIG. 6 is an explanatory diagram illustrating another example of the data determination result.
  • the table illustrated in FIG. 6 shows the determination results of the data classified under the conditions (leaf nodes) determined by each of the discriminants Y11 to Y14, similarly to the table illustrated in FIG. 3.
  • the discrimination result illustrated in FIG. 6 has a low ratio of being discriminated as TP, and thus the discrimination result has low validity and is considered to be an unreliable result.
  • the determination result illustrated in FIG. 6 is difficult to determine using the known explanatory variable X, and can be determined to be data affected by the unknown explanatory variable X ′.
  • the data extraction unit 30 may repeat the above processing until only the data as illustrated in FIG. 6 remains.
  • in this case, the condition applied to the newly generated discrimination model is the AND combination of the conditions specified in each round of processing.
  • for example, if the determination illustrated in FIG. 6 is performed on data satisfying "condition 4" in addition to the conditions illustrated in FIG. 2 and FIG. 3, the exclusion condition (exclusion rule) becomes ((not (condition 1) and (condition 2)) or (not (condition 1) and not (condition 2) and (condition 3))) and (condition 4).
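  • a small sketch of this AND combination across iterations (all condition functions are placeholders):

      # Sketch: the exclusion rule after a further iteration is the AND of the
      # previous exclusion rule and the condition specified in that iteration
      # (here "condition 4"). All condition functions are placeholders.

      def and_combine(previous_rule, new_condition):
          return lambda record: previous_rule(record) and new_condition(record)

      # e.g. combined_rule = and_combine(first_exclusion_rule, condition4)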
  • FIG. 7 is a flowchart illustrating an operation example of the model updating device of the present modified example.
  • in the following, it is assumed that the process of updating the reference discrimination model using the generated discrimination model (specifically, the processing of steps S11 to S15 illustrated in FIG. 5) has been performed in advance.
  • the input unit 20 inputs the data to be determined again (step S16).
  • the data extracting unit 30 extracts data classified under the target condition in the model generated by the model updating unit 60 (Step S17).
  • the data replenishing unit 40 further receives replenishment for the extracted data (step S18).
  • the model generation unit 50 generates another discrimination model using the supplemented data (Step S19).
  • the model updating unit 60 generates a model indicating that data satisfying the target condition is applied to another discrimination model (step S20).
  • hereinafter, the processing from step S16 to step S20 is repeated until only data that cannot be determined by the known explanatory variables is extracted.
  • Embodiment 2. Next, a second embodiment of the model updating device according to the present invention will be described.
  • in the first embodiment, the method of updating the model by replenishing the data extracted as gray data has been described.
  • in the second embodiment, the conditions for extracting gray data are dug deeper and refined, and it is thereby determined whether the explanatory variables are sufficient for prediction. Further, by using the conditions refined in this way to extract gray data from the input data in advance, the discrimination accuracy of the reference discrimination model is improved.
  • FIG. 8 is an explanatory diagram illustrating an example of a process of determining gray data.
  • Each rectangle in FIG. 8 represents data to be determined.
  • a threshold value for determining a positive example and a negative example is provided, and data having a score smaller than the threshold value is not automatically determined to be a positive example.
  • a rectangle above the horizontal axis represents data of a positive example
  • a rectangle below the horizontal axis represents data of a negative example.
  • in the example illustrated in FIG. 8, even if the score of the data is greater than 0, the data group S4 whose score is smaller than the threshold S3 is not automatically determined to be a positive example. That is, although the data group S4 is statistically correct, it is excluded from automatic determination by the AI because of the threshold setting.
  • FIG. 9 is an explanatory diagram showing another example of the process of determining gray data.
  • the determination results of the data classified into each leaf node are totaled.
  • hereinafter, the unit of data to be aggregated (that is, the data group under a condition) is sometimes referred to as a zone.
  • a zone for which the determination result is uniquely determined (that is, a data group under a condition for which all results can be predicted from the known explanatory variables) is referred to as a clean zone, and data belonging to a clean zone is referred to as clean data. A zone for which the determination result is not uniquely determined is referred to as a gray zone, and data belonging to a gray zone is referred to as gray data.
  • a zone in which positive examples and negative examples are mixed is a zone for which the determination result is not uniquely determined, and thus can be called a gray zone.
  • in FIG. 9, each area surrounded by a dotted line indicates a gray zone, and each area surrounded by a solid ellipse indicates a clean zone.
  • FIG. 10 is a block diagram showing a configuration example of the second embodiment of the model updating device according to the present invention.
  • the model updating device 200 of the present embodiment includes a storage unit 10, an input unit 20, a learning data generation unit 31, a model learning unit 32, a score calculation unit 33, a condition extraction unit 34, a condition generation unit 35, a filter generation unit 61, and an output unit 70. That is, the model updating device 200 of the present embodiment differs from the model updating device 100 of the first embodiment in that it includes the learning data generation unit 31, the model learning unit 32, the score calculation unit 33, the condition extraction unit 34, the condition generation unit 35, and the filter generation unit 61 instead of the data extraction unit 30, the data replenishment unit 40, the model generation unit 50, and the model updating unit 60. The other configurations are the same as in the first embodiment.
  • the storage unit 10 stores data to be determined and various parameters, as in the first embodiment.
  • the input unit 20 inputs the data to be determined as in the first embodiment.
  • the learning data generation unit 31 generates learning data used when the model learning unit 32 described later learns the hierarchical mixed model.
  • the model learning unit 32 generates a hierarchical mixed model by heterogeneous machine learning using the generated learning data. More specifically, for the heterogeneous machine learning, the model learning unit 32 preferably uses FAB (Factorized Asymptotic Bayesian) inference, which maximizes a lower bound of the information criterion FIC (Factorized Information Criterion).
  • however, the method by which the model learning unit 32 learns the hierarchical mixture model is not limited to heterogeneous mixture machine learning.
  • the score calculation unit 33 calculates a data determination result for each leaf node in the hierarchical mixture model.
  • the condition extraction unit 34 extracts a branch condition for each leaf node based on a predetermined criterion.
  • the condition generator 35 generates a condition combining the extracted conditions.
  • the learning data generation unit 31, the model learning unit 32, the score calculation unit 33, the condition extraction unit 34, and the condition generation unit 35 in the present embodiment operate while changing the target data and the criteria to be used according to the progress of the processing.
  • each of these components may also be implemented as a separate component according to the content of each process.
  • the operation of each component will be described along the processing flow.
  • FIG. 11 is an explanatory diagram illustrating an example of a discrimination model serving as a reference.
  • in the discrimination model M20 illustrated in FIG. 11, input data is classified into one of three leaf nodes based on conditions C1 and C2, and is determined based on the discriminants Y21 to Y23 arranged at each leaf node.
  • teacher data including a label indicating a determination result is stored in the storage unit 10 in advance.
  • the input unit 20 reads the teacher data stored in the storage unit 10 and inputs the data to the learning data generation unit 31.
  • the learning data generation unit 31 applies the input teacher data including the labels indicating the determination results to the reference discrimination model. Then, the learning data generation unit 31 generates learning data (hereinafter referred to as first learning data) in which data whose discrimination result by the discrimination model matches the label is treated as a positive example and data whose discrimination result differs from the label is treated as a negative example.
  • specifically, the learning data generation unit 31 generates learning data in which the teacher data determined as TP1 or TN1 are treated as positive examples and, similarly, the teacher data determined as FP1 or FN1 are treated as negative examples.
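  • a sketch of this first-learning-data construction, assuming the reference model and teacher data are available in the simple forms below (names and values are invented):

      # Sketch of first learning data generation: apply the reference model to the
      # teacher data and relabel each record as positive (reference model was
      # correct: TP1/TN1) or negative (reference model was wrong: FP1/FN1).

      def make_first_learning_data(teacher_data, reference_model):
          first_learning_data = []
          for record in teacher_data:
              predicted = reference_model(record)          # discrimination result
              correct = (predicted == record["label"])     # matches the teacher label?
              sample = dict(record)
              sample["first_label"] = 1 if correct else 0  # positive / negative example
              first_learning_data.append(sample)
          return first_learning_data

      # Example with a trivial stand-in reference model:
      teacher_data = [{"x1": 1, "label": 1}, {"x1": -2, "label": 1}]
      reference_model = lambda r: 1 if r["x1"] > 0 else 0
      print(make_first_learning_data(teacher_data, reference_model))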
  • the model learning unit 32 generates a hierarchical mixed model (hereinafter, referred to as a first hierarchical mixed model) by heterogeneous machine learning using the generated first learning data.
  • the first hierarchical mixed model generated here is a model different from the reference discriminant model.
  • heterogeneous machine learning is a technique whose generated models can be analyzed from various angles, and the generated model can be analyzed using the explanatory variables of the discrimination model used as the reference.
  • FIG. 12 is an explanatory diagram illustrating an example of a hierarchical mixture model generated using the first learning data.
  • the model learning unit 32 generates, for example, a hierarchical mixture model illustrated in FIG.
  • FIG. 12 shows that the first learning data is classified into one of three leaf nodes based on conditions C3 and C4, and is determined based on the discriminants Y31 to Y33 arranged at each leaf node.
  • next, for each leaf node in the generated first hierarchical mixed model, the score calculation unit 33 calculates the ratio at which, among the first learning data classified into that leaf node, the data set as positive examples is correctly determined (that is, the ratio of TP). Hereinafter, the calculated ratio is referred to as a first score.
  • FIG. 13 is an explanatory diagram illustrating an example of the determination result.
  • the example shown in FIG. 13 shows that the determination results by the discriminants Y31 to Y33 are classified into TP2, FP2, TN2, and FN2, respectively.
  • for example, since the five items of first learning data that are positive examples are all correctly determined by discriminant Y31, it can be said that the explanatory variables are sufficient for predicting these five items of first learning data.
  • next, the condition extraction unit 34 extracts the branch condition to each leaf node for which the calculated first score satisfies a predetermined criterion.
  • the criterion defined here is a criterion for determining whether the leaf node is one into which data that can be determined using the explanatory variables used in the first hierarchical mixed model is classified.
  • hereinafter, this criterion is referred to as the first criterion. That is, the first criterion can be said to be a criterion for determining whether or not a zone is one for which the determination result is uniquely determined (that is, a clean zone), as described above. For example, a criterion of "the first score is 100%" may be set as the first criterion.
  • however, the first criterion is not limited to 100%, and a predetermined value less than 100% (for example, 0.995) may be set as the first criterion.
  • although FIG. 13 illustrates a case where only one branch condition is extracted, the number of branch conditions to be extracted is not limited to one and may be two or more.
  • the model learning unit 32, the score calculation unit 33, and the condition extraction unit 34 repeat the above processing.
  • the number of repetitions depends on the machine resources and the like, but is preferably repeated in units of, for example, several hundred to several thousand.
  • in the present embodiment, the model learning unit 32 generates a hierarchical mixed model (first hierarchical mixed model) whose whole can be expressed as rules. Therefore, the data can be classified by performing the learning several hundred to several thousand times.
  • specifically, the model learning unit 32 generates a plurality of types of first hierarchical mixed models using the same generated first learning data, for example, by changing the initial parameters of the heterogeneous machine learning.
  • then, the score calculation unit 33 calculates the ratio of TP for each leaf node in each generated first hierarchical mixed model, and the condition extraction unit 34 extracts, for each generated first hierarchical mixed model, the branch condition to each leaf node for which a first score satisfying the first criterion is calculated.
  • the condition generation unit 35 generates a condition (hereinafter referred to as a discriminable condition) in which the branch conditions satisfying the first criterion are combined. Specifically, the condition generation unit 35 generates the discriminable condition by combining all the extracted branch conditions. Since the discriminable condition combines zones for which the determination result is uniquely determined, it can be said to be a condition specifying the clean zones. For example, when branch conditions to Z leaf nodes have been extracted, it can be said that Z zones (segments) that can be completely expressed as rules (that is, predicted using the known explanatory variables) have been extracted.
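  • a sketch of the discriminable-condition construction, with branch conditions represented as callables and the per-leaf first scores assumed to have been computed as above; combining the kept conditions with OR (that is, "the record falls in any extracted clean zone") is an assumption about how the combination is realized:

      # Sketch: collect branch conditions whose leaf satisfies the first criterion
      # (e.g. a TP ratio of 1.0) across repeated learning runs, and OR-combine
      # them into the discriminable condition (clean-zone specifying condition).

      FIRST_CRITERION = 1.0   # e.g. "the first score is 100%"

      def collect_clean_conditions(runs):
          # runs: per learning run, a list of {"condition": callable, "first_score": float}
          # entries, one per leaf node, aggregated over many runs.
          return [leaf["condition"] for run in runs for leaf in run
                  if leaf["first_score"] >= FIRST_CRITERION]

      def discriminable_condition(clean_conditions):
          # A record is in a clean zone if it satisfies any extracted branch condition.
          return lambda record: any(cond(record) for cond in clean_conditions)

      # Example: two runs, one clean leaf each (conditions are placeholders).
      runs = [
          [{"condition": lambda r: r["x1"] > 10, "first_score": 1.0},
           {"condition": lambda r: r["x1"] <= 10, "first_score": 0.4}],
          [{"condition": lambda r: r["x2"] < 0, "first_score": 1.0}],
      ]
      is_clean = discriminable_condition(collect_clean_conditions(runs))
      print(is_clean({"x1": 3, "x2": -1}))  # True: matched the second run's clean zone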
  • next, the gray zone extraction processing will be described. The gray zone extraction processing is an auxiliary processing for efficiently proceeding with the above-described first processing (that is, the clean zone extraction processing). Since the clean zones have been extracted in the first processing, the learning data generation unit 31 generates learning data (hereinafter referred to as second learning data) obtained by excluding the learning data corresponding to the discriminable condition from the first learning data. For example, suppose that 40,000 of 100,000 items of first learning data correspond to the discriminable condition. In this case, the learning data generation unit 31 excludes those 40,000 items from the 100,000 items and generates 60,000 items of second learning data.
  • as a result, the proportion of remaining learning data that is difficult to determine increases. For example, if this processing leaves 54,000 positive examples and 6,000 negative examples, the ratio of positive examples to negative examples becomes 9:1.
  • the model learning unit 32 generates a hierarchical mixed model (hereinafter referred to as a second hierarchical mixed model) by heterogeneous machine learning using the generated second learning data.
  • for each leaf node in the generated second hierarchical mixed model, the score calculation unit 33 calculates the sum of the ratio at which the second learning data set as positive examples is correctly determined and the ratio at which the second learning data set as positive examples is not correctly determined (that is, the ratio of (TP + FN)). Hereinafter, the calculated ratio is referred to as a second score.
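  • under the standard definitions of TP and FN used above, (TP + FN) is the share of a leaf's data that was set as a positive example, so the second score can be sketched as follows (the counts and criterion value are illustrative):

      # Sketch of the second score: for each leaf of the second hierarchical
      # mixed model, (TP + FN) / (TP + FP + TN + FN), i.e. the share of the
      # leaf's data that was set as a positive example. A low value means the
      # leaf mixes many negative examples (data the reference model got wrong),
      # which is the mark of a gray zone. Counts are illustrative.

      def second_score(c):
          total = c["TP"] + c["FP"] + c["TN"] + c["FN"]
          return (c["TP"] + c["FN"]) / total if total else 0.0

      SECOND_CRITERION = 0.5   # e.g. "less than 0.5" indicates a gray zone

      counts = {"leaf_a": {"TP": 1, "FN": 0, "FP": 3, "TN": 2},
                "leaf_b": {"TP": 9, "FN": 1, "FP": 0, "TN": 0}}
      gray_leaves = [leaf for leaf, c in counts.items()
                     if second_score(c) < SECOND_CRITERION]
      print(gray_leaves)  # ['leaf_a'] with these counts (1/6 < 0.5; leaf_b is 10/10)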
  • FIG. 14 is an explanatory diagram illustrating an example of the determination result.
  • the example shown in FIG. 14 shows that the discrimination results by the discriminants Y41 to Y43 are classified into TP3, FP3, TN3, and FN3, respectively.
  • next, the condition extraction unit 34 extracts the branch condition to each leaf node for which the calculated second score satisfies a predetermined criterion.
  • the criterion determined here is a criterion for judging whether or not the data is a leaf node into which data that is difficult to determine only by using an explanatory variable used in the second hierarchical mixed model is classified.
  • this criterion is referred to as a second criterion. That is, the second criterion can be said to be a criterion for determining whether or not the determination result is a zone (gray zone) for which the determination result is not uniquely determined, as described above.
  • as the second criterion, for example, a criterion of "less than 0.5" can be set. With such a setting, branch conditions indicating zones with a large proportion of gray data can be extracted, and if the value of the second criterion is increased, more branch conditions can be extracted.
  • the value to be set is not limited to 0.5 and may be set to, for example, a value of 0.7 to 0.8.
  • the model learning unit 32, the score calculation unit 33, and the condition extraction unit 34 repeat the above-described processing as in the case of the first processing.
  • the number of repetitions depends on the machine resources and the like, but is preferably repeated in units of, for example, several hundred to several thousand.
  • similarly to the generation of the first hierarchical mixed model, in the present embodiment the model learning unit 32 generates a hierarchical mixed model (second hierarchical mixed model) whose whole can be expressed as rules. Therefore, the data can be classified by performing the learning several hundred to several thousand times.
  • the model learning unit 32 generates a plurality of types of second hierarchical mixed models using the same generated second learning data.
  • then, the score calculation unit 33 calculates the ratio of (TP + FN) for each leaf node in each generated second hierarchical mixed model, and the condition extraction unit 34 extracts, for each generated second hierarchical mixed model, the branch condition to each leaf node for which a second score satisfying the second criterion is calculated.
  • the condition generator 35 generates a condition (hereinafter, referred to as a difficult-to-discriminate condition) obtained by combining conditions satisfying the second criterion. Specifically, the condition generating unit 35 generates a difficult-to-discriminate condition by combining all the extracted branch conditions. Since the difficult-to-discriminate condition is a condition combining zones that are difficult to determine only with the given explanatory variables, it can be said that it is a specific condition of the gray zone.
  • the learning data generation unit 31 generates data (hereinafter, referred to as third learning data) excluding the learning data corresponding to the difficult-to-discriminate condition from the second learning data.
  • for example, suppose that the ratio of positive examples to negative examples in the data determined by discriminant Y41 is 1:5, and the ratio of positive examples to negative examples in the data determined by discriminant Y42 is 1:1.
  • in this case, by extracting the data determined by discriminant Y41 and the data determined by discriminant Y42 as gray data, it becomes possible to exclude six negative examples while excluding only two positive examples. This makes it possible to increase the ratio of clean data in the learning data.
  • through the above (1) clean zone extraction processing, (2) gray zone extraction processing, and (3) gray data exclusion processing, the data determined to be clean data or gray data is excluded from the learning data.
  • the processing of (1) clean zone extraction, (2) gray zone extraction, and (3) gray data exclusion may then be repeated. That is, the model learning unit 32 may generate the first hierarchical mixed model using the generated third learning data.
  • the score calculation unit 33 calculates the first score for each leaf node in the first hierarchical mixed model generated using the third learning data. Further, the condition extracting unit 34 extracts a branch condition to a leaf node for which a first score that does not satisfy the first criterion is calculated. In other words, the condition extracting unit 34 extracts a branch condition to a leaf node that is not determined as a clean zone.
  • then, the model learning unit 32 uses the learning data classified into the leaf nodes whose branch conditions have been extracted to generate a hierarchical mixed model (hereinafter referred to as a third hierarchical mixed model) that branches further under those leaf nodes.
  • FIG. 15 is an explanatory diagram showing an example of processing for generating a new hierarchical mixed model under a leaf node.
  • the hierarchical mixed model M21 illustrated in FIG. 15 is the same as the hierarchical mixed model illustrated in FIG. In FIG. 15, each balloon indicates the classification results D51 to D53 of the classified data at each leaf node.
  • in each balloon, the number of "○" marks indicates the ratio of TP data, and the number of "×" marks indicates the ratio of the other (that is, TN, FP, and FN) data.
  • the condition extracting unit 34 extracts a branch condition to the leaf node C7.
  • the model learning unit 32 uses the data of the determination result D51 to generate a third hierarchical mixed model that branches conditionally under the leaf node C7.
  • FIG. 16 is an explanatory diagram illustrating an example of the generated third hierarchical mixed model.
  • the model learning unit 32 generates a third hierarchical mixed model M23 that branches conditionally under the leaf node C7 illustrated in FIG.
  • the discrimination results D61 to D63 of each leaf node are calculated.
  • here, the determination result D63 of the learning data classified to discriminant Y63 indicates 100% TP.
  • therefore, the condition extraction unit 34 specifies this leaf node as a leaf node satisfying the first criterion (that is, a clean zone) and extracts its branch condition.
  • the same processing may be further performed on each leaf node of the generated third hierarchical mixed model. That is, the score calculation unit 33 may calculate the first score for each leaf node in the third hierarchical mixed model, the condition extraction unit 34 may extract the branch condition to each leaf node for which a first score satisfying the first criterion is calculated, and the model learning unit 32 may use the learning data classified into those leaf nodes to generate a further hierarchical mixed model that branches under them.
  • in the above example, the branch condition is dug deeper only for one leaf node.
  • the target for which the branch condition is to be excavated is not limited to one leaf node, but may be two or more leaf nodes.
  • the condition extracting unit 34 extracts a branch condition to the leaf node C9.
  • then, the condition generation unit 35 generates a discriminable condition in which the branch conditions extracted for the leaf nodes satisfying the first criterion (that is, the clean zones) are further combined.
  • in this way, by generating a model including deeper branch conditions (that is, the third hierarchical mixed model), the model learning unit 32 makes it possible to separate the clean zones and the gray zones under more detailed conditions. By performing more detailed segmentation in this way, it is possible to further specify the nodes for which the explanatory variables are sufficient for prediction and the nodes for which they are not.
  • the filter generation unit 61 generates a condition (hereinafter referred to as a filter condition) for removing gray zone data (that is, gray data), in other words, data satisfying conditions that cannot be predicted by the known explanatory variables.
  • specifically, the filter generation unit 61 generates the filter condition by combining the branch conditions to the leaf nodes for which a first score not satisfying the first criterion is calculated and the branch conditions satisfying the second criterion (that is, the difficult-to-discriminate condition).
  • note that the condition extraction unit 34 may extract the branch conditions to the leaf nodes for which a first score not satisfying the first criterion is calculated, and the condition generation unit 35 may generate the difficult-to-discriminate condition by combining the branch conditions satisfying the second criterion.
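  • a sketch of this filter-condition generation, with branch conditions represented as callables; combining the two groups with OR (that is, "the record matches any of these conditions") is an assumption consistent with the condition generation sketched above:

      # Sketch of filter-condition generation: combine (a) branch conditions to
      # leaf nodes whose first score does not satisfy the first criterion with
      # (b) branch conditions satisfying the second criterion (the
      # difficult-to-discriminate condition). Combining them with OR ("matches
      # any of these conditions") is an assumption; conditions are placeholders.

      def make_filter_condition(non_clean_conditions, hard_to_discriminate_conditions):
          conditions = list(non_clean_conditions) + list(hard_to_discriminate_conditions)
          return lambda record: any(cond(record) for cond in conditions)

      # Records for which the filter condition holds are treated as gray data and
      # removed before they reach the reference discrimination model.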
  • the output unit 70 outputs the filter condition generated by the filter generation unit 61.
  • FIG. 17 is an explanatory diagram showing an example of the determination system.
  • the discrimination system 500 illustrated in FIG. 17 includes a discrimination device 510 and a gray zone removal device 520.
  • the discrimination device 510 discriminates the input data 521 based on the discrimination model M20 used as a reference.
  • the gray zone removal device 520 removes the gray data 522 from the input data based on the filter condition generated by the filter generation unit 61, and inputs the clean data 523 to the determination device 510. As described above, the gray zone removal device 520 removes the gray data 522 in advance, so that the determination result of the clean data input to the determination device 510 is guaranteed.
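  • a sketch of this flow, with the filter condition and the reference discrimination model passed in as placeholders:

      # Sketch of the discrimination system: gray data is removed in advance by
      # the filter condition, and only the remaining clean data is passed to the
      # reference discrimination model. filter_condition and reference_model are
      # placeholders for the artifacts produced earlier.

      def discrimination_system(input_data, filter_condition, reference_model):
          clean_data = [r for r in input_data if not filter_condition(r)]   # gray zone removal device
          gray_data = [r for r in input_data if filter_condition(r)]        # e.g. routed to manual review
          results = [(r, reference_model(r)) for r in clean_data]           # discrimination device
          return results, gray_data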
  • the discrimination system illustrated in FIG. 17 can be said to be a device that selects whether data can be discriminated only by conditions defined with the known explanatory variables. Therefore, the discrimination system illustrated in FIG. 17 can also be called a discriminable data selection system. Further, since the discrimination system illustrated in FIG. 17 can be realized by the model updating device of the present embodiment, the model updating device of the second embodiment can also be called a discriminable data selection system (a simplified sketch of this flow follows below).
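A minimal sketch of the two-stage flow of FIG. 17, assuming the filter condition and the reference discrimination model are available as plain Python callables; `is_gray` and `reference_model` are placeholder names, not components defined in this document.

```python
# Sketch of the discrimination system of FIG. 17: a gray zone removal stage
# followed by the discrimination device that uses the reference model.
# `is_gray` (the filter condition) and `reference_model` are assumed to be
# provided elsewhere as callables.

def run_discrimination_system(input_data, is_gray, reference_model):
    gray_data = [record for record in input_data if is_gray(record)]
    clean_data = [record for record in input_data if not is_gray(record)]
    # Only the clean data is judged automatically; the gray data is set aside,
    # for example for manual confirmation.
    results = [(record, reference_model(record)) for record in clean_data]
    return results, gray_data
```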
  • the input unit 20, the learning data generation unit 31, the model learning unit 32, the score calculation unit 33, the condition extraction unit 34, the condition generation unit 35, the filter generation unit 61, and the output unit 70 are realized by a computer operating according to a program (a discriminable data selection program).
  • FIG. 18 is a flowchart illustrating an operation example of the clean zone extraction process performed by the model updating device 200 according to the present embodiment.
  • the input unit 20 inputs the teacher data to the learning data generation unit 31 (Step S21).
  • the learning data generation unit 31 applies the teacher data including the label indicating the input discrimination result to the discrimination model serving as a reference (step S22).
  • the learning data generating unit 31 generates first learning data in which the teacher data whose discrimination result and the label match each other is set as a positive example, and the teacher data whose discrimination result is different from the label is set as a negative example (step S23).
  • the model learning unit 32 generates a first hierarchical mixed model by heterogeneous machine learning using the generated first learning data (step S24).
  • the score calculation unit 33 calculates the ratio of TP for each leaf node in the generated first hierarchical mixture model (Step S25).
  • the condition extraction unit 34 extracts a branch condition to the leaf node for which the first score has been calculated so as to satisfy the first criterion (Step S26).
  • if the number of repetitions of the processing from step S24 to step S26 (that is, the processing from generation of the model to extraction of the branch condition) has not reached the predetermined number (No in step S27), the processing from step S24 to step S26 is repeated. On the other hand, when the number of repetitions has reached the predetermined number (Yes in step S27), the condition generating unit 35 generates a discriminable condition in which the branch conditions satisfying the first criterion are combined (step S28). A sketch of this flow is shown below.
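The following sketch summarizes steps S21 to S28 as a single loop. The helper names (`learn_hierarchical_mixture`, `leaf_tp_ratio`, and the model methods) are hypothetical stand-ins for the heterogeneous mixture learning and scoring described above; only the control flow is meant to be accurate.

```python
# Sketch of the clean zone extraction flow (steps S21 to S28).
# All helper names are hypothetical stand-ins.

def extract_discriminable_condition(teacher_data, reference_model,
                                    first_criterion=0.99, repetitions=3):
    # S22-S23: a teacher datum becomes a positive example when the reference
    # model agrees with its label, and a negative example otherwise.
    learning_data = [(x, 1 if reference_model(x) == label else 0)
                     for x, label in teacher_data]
    clean_branch_conditions = []
    for _ in range(repetitions):                                       # S27
        model = learn_hierarchical_mixture(learning_data)              # S24
        for leaf in model.leaves():
            if leaf_tp_ratio(leaf, learning_data) >= first_criterion:  # S25, S26
                clean_branch_conditions.append(model.branch_condition_to(leaf))
    # S28: the discriminable condition combines the collected branch conditions.
    return lambda x: any(cond(x) for cond in clean_branch_conditions)
```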
  • FIG. 19 is a flowchart illustrating an operation example of a gray zone extraction process performed by the model updating apparatus 200 according to the present embodiment.
  • the learning data generation unit 31 generates second learning data excluding the learning data corresponding to the discriminable condition from the first learning data (step S31).
  • the model learning unit 32 generates a second hierarchical mixed model by heterogeneous machine learning using the generated second learning data (step S32).
  • the score calculation unit 33 calculates the ratio of (TP + FN) for each leaf node in the generated second hierarchical mixed model (step S33).
  • the condition extraction unit 34 extracts a branch condition to the leaf node for which the second score has been calculated so as to satisfy the second criterion (step S34).
  • if the number of repetitions of the processing from step S32 to step S34 (that is, the processing from generation of the model to extraction of the branch condition) has not reached the predetermined number (No in step S35), the processing from step S32 to step S34 is repeated. On the other hand, when the number of repetitions has reached the predetermined number (Yes in step S35), the condition generating unit 35 generates a difficult-to-discriminate condition combining the conditions satisfying the second criterion (step S36). A sketch of this flow is shown below.
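A sketch of steps S31 to S36, mirroring the clean zone sketch above; the same hypothetical helpers are assumed, plus a `leaf_tp_fn_ratio` helper standing in for the second score ((TP + FN) ratio).

```python
# Sketch of the gray zone extraction flow (steps S31 to S36).
# All helper names are hypothetical stand-ins.

def extract_difficult_condition(first_learning_data, discriminable_condition,
                                second_criterion=0.99, repetitions=3):
    # S31: the second learning data excludes data covered by the discriminable condition.
    learning_data = [(x, y) for x, y in first_learning_data
                     if not discriminable_condition(x)]
    difficult_branch_conditions = []
    for _ in range(repetitions):                                          # S35
        model = learn_hierarchical_mixture(learning_data)                 # S32
        for leaf in model.leaves():
            if leaf_tp_fn_ratio(leaf, learning_data) >= second_criterion: # S33, S34
                difficult_branch_conditions.append(model.branch_condition_to(leaf))
    # S36: the difficult-to-discriminate condition combines the collected conditions.
    return lambda x: any(cond(x) for cond in difficult_branch_conditions)
```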
  • FIG. 20 is a flowchart illustrating an operation example of gray data exclusion processing performed by the model updating apparatus 200 of the present embodiment.
  • the learning data generation unit 31 generates third learning data from which the learning data corresponding to the difficult-to-discriminate condition is excluded from the second learning data (step S41).
  • FIG. 21 is a flowchart showing an operation example of the gray zone deep excavation processing performed by the model updating apparatus 200 of the present embodiment.
  • the model learning unit 32 generates a first hierarchical mixture model using the generated third learning data (Step S51).
  • the score calculation unit 33 calculates a first score (percentage of TP) for each leaf node in the first hierarchical mixed model generated using the third learning data (step S52).
  • the condition extracting unit 34 extracts a branch condition to the leaf node for which the first score that does not satisfy the first criterion has been calculated (step S53).
  • the model learning unit 32 uses the learning data classified into the leaf nodes from which the branch condition has been extracted, and generates a third hierarchical mixed model that branches conditionally under the leaf nodes (step S54).
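The gray zone deep excavation flow (steps S51 to S54) can be sketched as follows; the helper names are hypothetical stand-ins for the processing described above.

```python
# Sketch of the gray zone deep excavation flow (steps S51 to S54).

def dig_deeper_into_gray_zone(third_learning_data, first_criterion=0.99):
    model = learn_hierarchical_mixture(third_learning_data)                # S51
    for leaf in model.leaves():
        if leaf_tp_ratio(leaf, third_learning_data) < first_criterion:     # S52, S53
            subset = [d for d in third_learning_data
                      if model.classifies_into(d, leaf)]
            # S54: learn a further hierarchical mixture model that branches
            # conditionally under this leaf node.
            model.attach_under(leaf, learn_hierarchical_mixture(subset))
    return model
```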
  • as described above, the learning data generation unit 31 applies the reference teacher data to the discrimination model, compares the discrimination result with the label, and generates the first learning data with the teacher data whose discrimination result matches the label as positive examples and the teacher data whose discrimination result differs from the label as negative examples.
  • the model learning unit 32 generates a first hierarchical mixed model by heterogeneous machine learning using the generated first learning data, and the score calculation unit 33 calculates the ratio of TP (the first score) for each leaf node.
  • the condition extraction unit 34 extracts the branch condition to each leaf node for which a first score satisfying the first criterion has been calculated, and the condition generation unit 35 generates a discriminable condition obtained by combining the branch conditions that satisfy the first criterion.
  • therefore, by selecting data according to the filter condition, it can be said that the discrimination model has been updated so as to improve the discrimination accuracy while maintaining the discrimination conditions of the existing discrimination model.
  • the model updating device 200 of the present embodiment may include the data replenishing unit 40 and the model generating unit 50 of the first embodiment. That is, the model updating apparatus 200 may perform supplementation of data or learning of a new discrimination model. According to such a configuration, the discrimination accuracy of the existing discrimination model can be further improved.
  • in the present embodiment, the model learning unit 32 learns a hierarchical mixture model by heterogeneous mixture machine learning. Therefore, it is possible to extract both conditions on data that can be predicted almost completely from the learning data alone and conditions on data that cannot be predicted completely from the learning data. More specifically, data corresponding to the former conditions can be determined automatically by relying on the determination result of the discrimination model, while data corresponding to the latter conditions, which are difficult for the discrimination model to determine, can be determined individually. For example, in the case of a pre-examination of a mortgage loan, cases that can be completely distinguished from the application information based on rules and cases that are preferably judged by a person can be separated.
  • Embodiment 3. Next, a third embodiment of the model updating device according to the present invention will be described.
  • in the above embodiments, the case where the model updating apparatus of the present invention is applied to a general discrimination problem including multi-value discrimination has been described.
  • in the present embodiment, a description will be given focusing on a binary discrimination problem in which a clear rule exists in the background and 0/1 can be specified only by branch conditions.
  • in the present embodiment, the data (TP, TN) that the device performing determination using the discrimination model (hereinafter referred to as AI (artificial intelligence)) was able to determine correctly and the data (FP, FN) that the AI could not determine are utilized. If the data can be separated into data that the AI can determine from the input information alone and data that requires an external factor, then by excluding the data that requires an external factor from the AI determination, the AI determination result becomes almost 100% reliable.
  • the determination by the AI is sometimes referred to as AI prediction.
  • the following points summarize the relationship between the explanatory variables and the prediction:
  1. the explanatory variables include "known variables" used in learning and "unknown variables" not used in learning.
  2. unknown variables may or may not be present at the time of analysis.
  3. if the prediction target can be determined by the known variables alone, the prediction target is answered completely correctly (it is determined).
  4. when the prediction target is affected by an unknown variable, (1) the prediction target is not answered completely correctly, and (2) even if learning is performed a plurality of times, the accuracy deteriorates in the part of the discrimination governed by the unknown variable. That is, even if branching is performed correctly with the known variables, if an unknown variable appears at the end, the model created from the learning data will not match completely.
  5. by repeatedly dividing and predicting the prediction target from various angles with the known information a plurality of times, the data that can be discriminated from the known data is separated.
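Item 4 above can be illustrated with a small self-contained simulation: the label depends on a known variable x everywhere and additionally on an unknown variable u in part of the space, so a rule that may use only x cannot exceed a certain accuracy in the region where u matters. The numbers below are made up purely for illustration.

```python
import random

random.seed(0)

# Synthetic data: the label depends on the known variable x everywhere, and
# additionally on an unknown variable u (not available to the model) in the
# region x >= 0.7. All numbers here are made up for illustration.
data = []
for _ in range(10_000):
    x, u = random.random(), random.random()
    label = (x >= 0.5) if x < 0.7 else (u >= 0.5)
    data.append((x, u, label))

# Best rule that uses only the known variable x: predict positive when x >= 0.5.
outside = [(x, y) for x, u, y in data if x < 0.7]   # u does not matter here
inside = [(x, y) for x, u, y in data if x >= 0.7]   # u governs the label here

acc_outside = sum((x >= 0.5) == y for x, y in outside) / len(outside)
acc_inside = sum((x >= 0.5) == y for x, y in inside) / len(inside)

print("accuracy where only known variables matter:", acc_outside)  # 1.0
print("accuracy where an unknown variable matters:", acc_inside)   # roughly 0.5
```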
  • model updating apparatus of the present embodiment is used as an apparatus for updating a discriminating model for crediting a business partner.
  • discrimination model is updated using the model updating device of the first embodiment.
  • the storage unit 10 stores, as the existing discrimination model, a hierarchical mixed model in which a discriminant for discriminating whether or not to grant credit to a trading partner is set in each leaf node, and a binary branching condition based on explanatory variables representing information on the credit counterparty is set in each node.
  • the user determines the model that he / she wants to use for work from among a plurality of existing discrimination models generated in advance.
  • specifically, the user selects, from among the plurality of discrimination models, a model whose branch conditions and prediction formulas, and not only its discrimination accuracy, suit the operation.
  • FIG. 22 is an explanatory diagram showing an example of a discrimination model based on a hierarchical mixed model.
  • a double-frame rectangle is a root node and a node indicating a branch condition
  • a normal rectangle is a leaf node indicating a discriminant (prediction formula).
  • a transaction partner whose age is less than 30 and whose loan balance is 10,000,000 or more is determined by the prediction formula of prediction formula number 1.
  • FIG. 22 illustrates the number of samples classified into each leaf node at the time of learning, evaluation, and prediction.
  • FIG. 23 is an explanatory diagram showing an example of displaying the properties of each discriminant.
  • the graph illustrated in FIG. 23 shows, for each prediction formula, the result of accumulating (adding) the weights (coefficients) of the explanatory variables when each discriminant (prediction formula) is represented in a linear form.
  • the output unit 70 may display the hierarchical mixed model in the format illustrated in FIG. 22, or may display the prediction formulas in the format illustrated in FIG. 23 (a sketch of this kind of aggregation is shown below).
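The kind of display described for FIG. 23 could be produced, for example, by summing the coefficients of each explanatory variable over the linear prediction formulas. The formulas and coefficient values below are invented only to show the aggregation and do not come from this document.

```python
# Sketch of aggregating the weights (coefficients) of the explanatory variables
# for each linear prediction formula, as in the FIG. 23 style of display.
# The formulas and coefficient values below are invented for illustration only.

prediction_formulas = {
    1: {"age": -0.8, "loan_balance": 1.5, "annual_income": -0.3},
    2: {"age": 0.2, "loan_balance": 0.9, "years_of_service": -1.1},
}

for number, coefficients in prediction_formulas.items():
    print(f"prediction formula {number}:")
    cumulative = 0.0
    # Accumulate (add) the coefficients in order of decreasing magnitude so the
    # dominant explanatory variables appear first.
    for name, weight in sorted(coefficients.items(), key=lambda kv: -abs(kv[1])):
        cumulative += weight
        print(f"  {name:<16} weight={weight:+.2f} cumulative={cumulative:+.2f}")
```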
  • the input unit 20 inputs data to be used for determining whether credit is permitted.
  • the data extraction unit 30 tallies, among the business partner data classified into each leaf node, the correct answer ratio in which the value of the attached correct answer label is positive and the credit determination result is also positive. Then, the data extraction unit 30 extracts the partner data classified under a condition for which the correct answer ratio is equal to or less than a predetermined threshold (a small sketch of this tally follows).
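A minimal sketch of this tally, assuming each business partner record already carries the leaf node it was classified into, its correct answer label, and the credit determination result; all field names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical business partner records: (leaf_node, correct_label, credit_result),
# where True means that credit is approved. The values are invented for illustration.
records = [
    ("leaf_1", True, True), ("leaf_1", True, True), ("leaf_1", False, True),
    ("leaf_2", True, False), ("leaf_2", False, False), ("leaf_2", True, True),
]

counts = defaultdict(lambda: [0, 0])   # leaf -> [positive and judged positive, total]
for leaf, label, predicted in records:
    counts[leaf][1] += 1
    if label and predicted:            # correct answer label positive and result positive
        counts[leaf][0] += 1

threshold = 0.8
for leaf, (hits, total) in counts.items():
    ratio = hits / total
    action = "extract for replenishment" if ratio <= threshold else "keep"
    print(f"{leaf}: correct answer ratio = {ratio:.2f} -> {action}")
```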
  • the data replenishment unit 40 accepts replenishment of at least one of adding an explanatory variable and updating a correct answer label to the extracted business partner data.
  • the model generation unit 50 generates a discrimination model by using the supplemented business partner data. As in the first embodiment, the model generation unit 50 may generate an arbitrary discrimination model.
  • the model updating unit 60 applies data that does not satisfy the conditions under which the extracted trading partner data is classified to the hierarchical mixed model, and generates a model indicating that data that satisfies the above conditions is applied to the discrimination model.
  • the output unit 70 outputs the generated discrimination model.
  • the output unit 70 may output the generated discrimination model in a format exemplified in FIGS. 22 and 23 described above.
  • as described above, the model updating device updates the discrimination model for crediting trading partners, so that, while maintaining the discrimination conditions of the existing discrimination model, the data (gray data) indicating trading partners that should be confirmed manually can be extracted.
  • FIG. 24 is an explanatory diagram showing an example of the result of classifying data using the discrimination model.
  • the data group D11 is classified into the leaf node of prediction formula number 5 via the classification processing indicated by the thick arrow when the existing discrimination model M12 illustrated in FIG. 22 is used.
  • the discrimination accuracy at the leaf node is reduced.
  • FIG. 25 is an explanatory diagram showing an example of a result of classifying data by the updated discrimination model.
  • a new discrimination model M11 is generated, and the discrimination model is updated as a whole including the new branch condition M13.
  • the data group D12, which includes the data determined incorrectly at the leaf node of prediction formula number 5, is classified into the new discrimination model M11, so that it becomes possible to improve the discrimination accuracy of the existing discrimination model M12.
  • FIG. 26 is a block diagram showing an outline of a model updating device according to the present invention.
  • the model updating apparatus 80 according to the present invention is a model updating apparatus (for example, the model updating apparatus 100) that updates a hierarchical mixed model, and includes: a data extraction unit 81 (for example, the data extraction unit 30) that extracts data classified under a target condition in the hierarchical mixed model; a data replenishment unit 82 (for example, the data replenishment unit 40) that accepts replenishment for the extracted data; a model generation unit 83 (for example, the model generation unit 50) that generates a discrimination model using the replenished data; and a model updating unit 84 (for example, the model updating unit 60) that generates a model in which a node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model.
  • the model updating unit 84 generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to the leaf node for that node, and data satisfying the condition is applied to the discrimination model corresponding to the leaf node for that node.
  • the discrimination model can be updated so as to improve the discrimination accuracy while maintaining the discrimination conditions of the existing discrimination model.
  • the model updating device 80 may include a determination result totalizing unit (for example, the data extracting unit 30) that totalizes the determination results of the data classified under each condition. Then, the data extracting unit 81 may extract data classified under the condition that the result of the data determination does not satisfy the criterion. With such a configuration, it is possible to specify and extract a condition portion where the explanatory variables are assumed to be insufficient.
  • a condition of binary discrimination (for example, a clear rule) defined based on the explanatory variable may be set for each node of the hierarchical mixed model.
  • the determination result totaling unit may total the correct answer ratio, which is the ratio of data that is truly positive and the determination result is also positive among the data classified under each condition.
  • then, the data extraction unit 81 may extract the data classified under a condition for which the correct answer ratio is equal to or less than a predetermined threshold, and a model may be generated that applies data not satisfying the condition to the hierarchical mixed model and applies data satisfying the condition to the discrimination model.
  • a discriminant for determining whether or not to grant credit to a counterparty may be set in each leaf node of the hierarchical mixed model, and a binary branching condition based on explanatory variables representing information on the credit counterparty may be set in each node of the hierarchical mixed model.
  • the discrimination result tallying unit tallies the correct answer ratio in which the value of the labeled correct answer label is positive and the credit discrimination result is also positive among the partner data classified into each leaf node.
  • the data extraction unit may extract partner data classified under the condition that the correct answer ratio is equal to or less than a predetermined threshold.
  • the data replenishment unit 82 may accept replenishment of at least one of addition of an explanatory variable and updating of the correct answer label for the extracted business partner data, and the model generation unit 83 may generate the discrimination model using the replenished business partner data.
  • the model generation unit 83 may learn a discrimination model represented by a hierarchical mixed model. With such a configuration, it is possible to deeply determine the condition of the portion where the explanatory variable is assumed to be insufficient.
  • the model updating device 80 may include an output unit (for example, the output unit 70) that outputs information on the discrimination model. Then, the model generation unit 83 may generate a plurality of types of discrimination models, and the output unit may output the discrimination conditions and the discriminants of the plurality of types of discrimination models (for example, the results illustrated in FIGS. 22 and 23). With such a configuration, it is possible to present the content under the deeply dug conditions to the user for selection.
  • the data supplementing unit 82 may receive addition of an explanatory variable or update of a teacher label to the extracted data.
  • the data extraction unit 81 may extract data classified under a target condition in the model generated by the model updating unit, and the data replenishment unit 82 may accept replenishment for the extracted data. Further, the model generation unit 83 may generate another discrimination model using the replenished data, and the model updating unit 84 may generate a model indicating that data satisfying the target condition is applied to the other discrimination model. By generating discrimination models repeatedly in this way, it is possible to further improve the accuracy of the existing discrimination model.
  • FIG. 27 is a schematic block diagram showing a configuration of a computer according to at least one embodiment.
  • the computer 1000 includes a processor 1001, a main storage device 1002, an auxiliary storage device 1003, and an interface 1004.
  • model updating device is implemented in the computer 1000.
  • the operation of each processing unit described above is stored in the auxiliary storage device 1003 in the form of a program (model update program).
  • the processor 1001 reads out the program from the auxiliary storage device 1003, expands the program in the main storage device 1002, and executes the above processing according to the program.
  • the auxiliary storage device 1003 is an example of a non-transitory tangible medium.
  • Other examples of the non-transitory tangible medium include a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read-Only Memory), a DVD-ROM (Digital Versatile Disc Read-Only Memory), a semiconductor memory, and the like.
  • When the program is distributed to the computer 1000 via a communication line, the computer 1000 that has received the program may load the program into the main storage device 1002 and execute the above-described processing.
  • the program may be for realizing a part of the functions described above. Further, the program may be a program that realizes the above-described function in combination with another program already stored in the auxiliary storage device 1003, that is, a so-called difference file (difference program).
  • A model updating apparatus that updates a hierarchical mixed model, including: a data extraction unit that extracts data classified under a target condition in the hierarchical mixed model; a data replenishment unit that accepts replenishment for the extracted data; a model generation unit that generates a discrimination model using the replenished data; and a model updating unit that generates a model in which a node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model, wherein the model updating unit generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model serving as the leaf node for that node, and data satisfying the condition is applied to the discrimination model serving as the leaf node for that node.
  • The model updating device according to Supplementary note 2 or Supplementary note 3, wherein a discriminant for judging whether or not to grant credit to a trading partner is set in each leaf node of the hierarchical mixed model, a binary branching condition based on explanatory variables representing information on the credit counterparty is set in each node of the hierarchical mixed model, the discrimination result totalization unit totals, among the partner data classified into each leaf node, the correct answer ratio in which the value of the attached correct answer label is positive and the credit discrimination result is also positive, the data extraction unit extracts the partner data classified under a condition for which the correct answer ratio is equal to or less than a predetermined threshold, the data replenishment unit accepts replenishment of at least one of addition of an explanatory variable and updating of the correct answer label for the extracted partner data, the model generation unit generates a discrimination model using the replenished partner data, and the model updating unit generates a model indicating that data not satisfying the condition under which the extracted partner data is classified is applied to the hierarchical mixed model and data satisfying the condition is applied to the discrimination model.
  • The model updating device according to any one of Supplementary notes 1 to 5, further including an output unit that outputs information on the discrimination model, wherein the model generation unit generates a plurality of types of discrimination models, and the output unit outputs the discrimination conditions and the discriminants of the plurality of types of discrimination models.
  • The model updating device according to any one of Supplementary notes 1 to 7, wherein the data extraction unit extracts data classified under the target condition in the model generated by the model updating unit, the data replenishment unit accepts replenishment for the extracted data, the model generation unit generates another discrimination model using the replenished data, and the model updating unit generates a model indicating that data satisfying the target condition is applied to the other discrimination model.
  • A model updating method for updating a hierarchical mixed model, including: extracting data classified under a target condition in the hierarchical mixed model; accepting replenishment for the extracted data; generating a discrimination model using the replenished data; and generating a model in which a node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model, wherein, when the model is generated, data not satisfying the condition is applied to the hierarchical mixed model serving as the leaf node for that node, and a model is generated indicating that data satisfying the condition is applied to the discrimination model serving as the leaf node for that node.
  • The computer is caused to execute a discrimination result totalization process of totalizing the discrimination results of the data classified under each condition, and, in the data extraction process, data classified under a condition for which the discrimination results do not satisfy the criterion is extracted.

Abstract

A data extraction unit 81 extracts data classified under a target condition in a hierarchical mixed model. A data supplementation unit 82 accepts supplementation of the extracted data. A model generation unit 83 uses the supplemented data and generates a discriminant model. A model updating unit 84 generates a model having a joint node arranged at the top of the hierarchical mixed model, said joint node classifying data fulfilling the target condition. The model updating unit 84 also: applies data not satisfying the condition, to the hierarchical mixed model corresponding to a leaf node for the joint node; and generates a model indicating that data satisfying the condition is applied to a discriminant model corresponding to a leaf node for the joint node.

Description

Model updating device, model updating method, and model updating program
The present invention relates to a model updating device, a model updating method, and a model updating program for updating a discrimination model.
In order to process data efficiently, the task of discriminating target data using a discrimination model is commonly performed. Although there are various types of discrimination models, a model whose discrimination conditions are clear and highly interpretable can be said to be a model that is easy to operate.
A hierarchical mixed model is known as a model whose discrimination conditions are clear and easy to interpret. The hierarchical mixed model is a model having a tree structure in which branch conditions (sometimes referred to as discrimination conditions or gate tree conditions) are set for the root node and the internal nodes, and discriminants are set for the leaf nodes. For example, in a discrimination problem with a clear rule in the background, the conditions for discriminating the target data are generally clear, so that the data can be discriminated only by the branch conditions indicated by the nodes.
FIG. 28 is an explanatory diagram showing an example of a discrimination model for performing a binary determination. The discrimination model shown in FIG. 28 has a tree structure, and is a model for discriminating target data at the leaf nodes. For example, when performing a binary judgment (for example, a true/false judgment) on input information in a certain task, it is necessary to return the same result (objective variable) for the same input information. For example, in a business of assessing approval or rejection, as represented by credit screening, it is desired to determine the same result (whether or not to grant financing) for information on customers having the same conditions.
If all the conditions for performing the binary judgment on the input information are known, the target data can be completely discriminated. For example, if the conditions 1 to n shown in FIG. 28 are all the conditions used for discrimination, the discrimination model shown in FIG. 28 can be said to be a model that can completely discriminate the target data.
However, it is usually rare that all the conditions used for discrimination are known, and other information is required to perform a complete discrimination. FIG. 29 is an explanatory diagram showing another example of the discrimination model for performing the binary determination. The model shown in FIG. 29 is also a model that makes a determination based on the conditions 1 to n, similarly to the model shown in FIG. 28. By using the conditions 1 and 3 illustrated in FIG. 29, it is possible to completely separate the input information in the judgment 1. On the other hand, if a condition based on information that is not in the teacher data is hidden under the condition n, the result will differ even for data satisfying the same condition n. Information that is not in the teacher data is generally determined from knowledge and other information.
In order to perform the prediction of the judgment 2 shown in FIG. 29, it is necessary to perform re-learning based on learning data that includes the "information not in the teacher data". In practice, however, the learning data often does not include this "information not in the teacher data", that is, the other explanatory variables. In other words, if it is known that "information not in the teacher data" is required for the prediction, complete prediction cannot be performed with the conditions 1 to n alone.
On the other hand, a method of discriminating target information using a score is also known. Based on the premise that the analysis results are normally distributed with respect to the population (that is, the law of large numbers), methods of discriminating input information into two values include, for example, a method of discriminating based on the degree of contribution to the classification value (a score), and a method of assigning an occurrence probability (a score) to each class with a reference value as the boundary. As the occurrence probability, for example, the logit (the logarithm of the odds (= P / (1 - P))) is used.
Patent Literature 1 describes a system state determination support device that determines the state of a system. The device described in Patent Literature 1 calculates the reliability of the monitoring information of the system that was used to generate a discrimination model for determining whether the system is in a predetermined state (the model reliability), and the reliability of the monitoring information of the system to be determined (the target reliability). Then, based on the model reliability and the target reliability, the threshold used by the discrimination model for the determination is corrected.
FIG. 30 is an explanatory diagram showing experimental results of an analysis by binary discrimination. For example, it is assumed that the score of the above-described discriminant is output as the discrimination result. Before the analysis, the positive examples and the negative examples were assumed to follow the distribution shown in FIG. 30(a); however, when the analysis was actually performed, the distribution was close to the one illustrated in FIG. 30(b). In order to properly remove the negative examples in this state, it is necessary to shift the boundary score for judging a positive example toward higher scores (that is, to raise the boundary score from the score S1 to the score S2) (see FIG. 30(c)).
Patent Literature 1: International Publication No. WO 2014/020908
When data that cannot be clearly discriminated by the discrimination model (hereinafter sometimes referred to as gray data) increases, manual confirmation becomes necessary. Therefore, when operating with an existing discrimination model, it is desired that the influence of such gray data be suppressed and the accuracy of the discrimination model be ensured.
For example, in a discrimination problem with a clear rule in the background, it is preferable that processing be completed for the results discriminated by the discrimination model with as little manual intervention as possible. On the other hand, even in a discrimination problem with a clear rule in the background, the discrimination accuracy of the discrimination model may decrease due to some external factor, such as the presence of an unknown explanatory variable, and the amount of gray data may increase.
In order to raise the discrimination accuracy, it is conceivable to correct the threshold used by the discrimination model for discrimination, as in the method described in Patent Literature 1. For example, as shown in FIG. 30 described above, it is possible to exclude negative example data from the discrimination results of the discrimination model by shifting the boundary score toward higher scores. However, when the boundary score is raised, positive example data with low scores (positive example data close to the negative example side) is also excluded from the prediction target (that is, the discrimination result of the model cannot be used for it), so that more manpower is required to confirm the excluded data.
It is also conceivable to combine a large number of discrimination conditions so that more data can be discriminated. However, when the discrimination conditions become complicated, the clarity and interpretability of the discrimination conditions decrease, so there is also the problem that it becomes difficult for the user to judge, based on the discrimination model, whether the discrimination is appropriate.
Therefore, an object of the present invention is to provide a model updating device, a model updating method, and a model updating program that can update a discrimination model so as to improve the discrimination accuracy while maintaining the discrimination conditions of an existing discrimination model.
A model updating device according to the present invention is a model updating device that updates a hierarchical mixed model, and includes: a data extraction unit that extracts data classified under a target condition in the hierarchical mixed model; a data replenishment unit that accepts replenishment for the extracted data; a model generation unit that generates a discrimination model using the replenished data; and a model updating unit that generates a model in which a node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model, wherein the model updating unit generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to the leaf node for that node, and data satisfying the condition is applied to the discrimination model corresponding to the leaf node for that node.
A model updating method according to the present invention is a model updating method for updating a hierarchical mixed model, and includes: extracting data classified under a target condition in the hierarchical mixed model; accepting replenishment for the extracted data; generating a discrimination model using the replenished data; and generating a model in which a node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model, wherein, when the model is generated, a model is generated indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to the leaf node for that node, and data satisfying the condition is applied to the discrimination model corresponding to the leaf node for that node.
A model updating program according to the present invention is a model updating program applied to a computer that updates a hierarchical mixed model, and causes the computer to execute: a data extraction process of extracting data classified under a target condition in the hierarchical mixed model; a data replenishment process of accepting replenishment for the extracted data; a model generation process of generating a discrimination model using the replenished data; and a model updating process of generating a model in which a node for classifying data satisfying the target condition is arranged at the top of the hierarchical mixed model, wherein, in the model updating process, the computer is caused to generate a model indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to the leaf node for that node, and data satisfying the condition is applied to the discrimination model corresponding to the leaf node for that node.
According to the present invention, the discrimination model can be updated so as to improve the discrimination accuracy while maintaining the discrimination conditions of the existing discrimination model.
FIG. 1 is a block diagram showing a configuration example of a first embodiment of a model updating device according to the present invention.
FIG. 2 is an explanatory diagram showing an example of a discrimination model based on a hierarchical mixed model.
FIG. 3 is an explanatory diagram showing an example of a data discrimination result.
FIG. 4 is an explanatory diagram showing an example of a process in which data is excluded.
FIG. 5 is a flowchart showing an operation example of the model updating device.
FIG. 6 is an explanatory diagram showing another example of a data discrimination result.
FIG. 7 is a flowchart showing another operation example of the model updating device.
FIG. 8 is an explanatory diagram showing an example of a process of discriminating gray data.
FIG. 9 is an explanatory diagram showing another example of the process of discriminating gray data.
FIG. 10 is a block diagram showing a configuration example of a second embodiment of the model updating device according to the present invention.
FIG. 11 is an explanatory diagram showing an example of a discrimination model serving as a reference.
FIG. 12 is an explanatory diagram showing an example of a hierarchical mixed model generated using first learning data.
FIG. 13 is an explanatory diagram showing an example of a discrimination result.
FIG. 14 is an explanatory diagram showing an example of a discrimination result.
FIG. 15 is an explanatory diagram showing an example of a process of generating a new hierarchical mixed model.
FIG. 16 is an explanatory diagram showing an example of a generated hierarchical mixed model.
FIG. 17 is an explanatory diagram showing an example of a discrimination system.
FIG. 18 is a flowchart showing an operation example of the clean zone extraction process.
FIG. 19 is a flowchart showing an operation example of the gray zone extraction process.
FIG. 20 is a flowchart showing an operation example of the gray data exclusion process.
FIG. 21 is a flowchart showing an operation example of the gray zone deep excavation process.
FIG. 22 is an explanatory diagram showing an example of a discrimination model based on a hierarchical mixed model.
FIG. 23 is an explanatory diagram showing an example of displaying the properties of each discriminant.
FIG. 24 is an explanatory diagram showing an example of a result of classifying data using the discrimination model.
FIG. 25 is an explanatory diagram showing an example of a result of classifying data using the updated discrimination model.
FIG. 26 is a block diagram showing an outline of the model updating device according to the present invention.
FIG. 27 is a schematic block diagram showing a configuration of a computer according to at least one embodiment.
FIG. 28 is an explanatory diagram showing an example of a discrimination model for performing a binary determination.
FIG. 29 is an explanatory diagram showing another example of a discrimination model for performing a binary determination.
FIG. 30 is an explanatory diagram showing experimental results of an analysis by binary discrimination.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
Embodiment 1.
FIG. 1 is a block diagram showing a configuration example of the first embodiment of the model updating device according to the present invention. The model updating device according to the present invention has a function of updating, among discrimination models, a hierarchical mixed model in particular. The hierarchical mixed model is represented by a tree structure, and has a structure in which components are arranged at the leaf nodes and gate functions (gate tree functions) indicating branch conditions are arranged at the other, upper nodes. The branch condition of a gate function is described using explanatory variables. When data is input to the discrimination model, the input data is branched by the gate functions and is assigned to one of the plurality of components by following the root node and the internal nodes.
In the present embodiment, it is assumed that discrimination models have been created in advance and a discrimination model to be used as a reference has been determined. That is, by generating a hierarchical mixed model based on teacher data, discrimination becomes possible using the discriminants arranged at the leaf nodes. In the following description, the case of performing binary discrimination with a hierarchical mixed model is taken as an example, but the processing in the case of performing multi-value discrimination is similar.
FIG. 2 is an explanatory diagram showing an example of a discrimination model based on a hierarchical mixed model. The example shown in FIG. 2 indicates that input data is classified into one of four leaf nodes based on the conditions 1 to 3, and is discriminated based on the discriminants Y1 to Y4 assigned to the leaf nodes. For example, when data satisfying the condition 1 (data satisfying condition 1 = true) is input, the data is classified into the leaf node to which the discriminant Y1 is assigned, and discrimination is performed based on the discriminant Y1 = F1(X).
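A minimal sketch of such a tree, assuming the branch conditions and the leaf discriminants F1 to F4 are simple Python callables; the concrete conditions and formulas are placeholders chosen only for illustration.

```python
# Sketch of a hierarchical mixture model like the one in FIG. 2: internal
# nodes hold a branch condition (gate function) and leaves hold a discriminant.
# The concrete conditions and discriminants below are hypothetical placeholders.

class Leaf:
    def __init__(self, discriminant):
        self.discriminant = discriminant

    def predict(self, x):
        return self.discriminant(x)

class Node:
    def __init__(self, condition, if_true, if_false):
        self.condition, self.if_true, self.if_false = condition, if_true, if_false

    def predict(self, x):
        # Follow the gate functions from the root down to a leaf node.
        branch = self.if_true if self.condition(x) else self.if_false
        return branch.predict(x)

# Four leaf nodes reached through conditions 1 to 3, mirroring FIG. 2.
model = Node(lambda x: x["a"] > 0,                     # condition 1
             Leaf(lambda x: 2 * x["a"] + x["b"]),      # Y1 = F1(X)
             Node(lambda x: x["b"] > 5,                # condition 2
                  Leaf(lambda x: x["b"] - 1),          # Y2 = F2(X)
                  Node(lambda x: x["c"] == "yes",      # condition 3
                       Leaf(lambda x: 0.5 * x["a"]),   # Y3 = F3(X)
                       Leaf(lambda x: -x["b"]))))      # Y4 = F4(X)

print(model.predict({"a": 1, "b": 3, "c": "no"}))      # routed to Y1: 2*1 + 3 = 5
```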
It is preferable to select, as the reference discrimination model, one that is suitable for the operation. Specifically, it is preferable to select a discrimination model from among the plurality of generated discrimination models not only from the viewpoint of accuracy but also from the viewpoint of ease of understanding for the user. That is, a model composed of the variables and coefficients that the user is most convinced by can be said to be a model that is easy for the user to understand.
Referring to FIG. 1, the model updating device 100 of the present embodiment includes a storage unit 10, an input unit 20, a data extraction unit 30, a data replenishment unit 40, a model generation unit 50, a model updating unit 60, and an output unit 70.
The storage unit 10 stores the data to be discriminated. The storage unit 10 may also store various parameters necessary for the model generation unit 50, which will be described later, to generate a model. The storage unit 10 may also store the replenishment data accepted by the data replenishment unit 40, which will be described later. The storage unit 10 is realized by, for example, a magnetic disk or the like.
The input unit 20 inputs the data to be discriminated. The input unit 20 may, for example, read the data to be discriminated stored in the storage unit 10 and input it to the data extraction unit 30. The input unit 20 may also accept an instruction to select the branch condition to be targeted for extraction by the data extraction unit 30, which will be described later.
The data extraction unit 30 discriminates the data classified into each leaf node using the discriminant assigned to that leaf node. The data extraction unit 30 similarly discriminates the data classified into the other leaf nodes, and totals the discrimination results of the data classified into each leaf node. In the present embodiment, since the data classified into each leaf node is data classified by following the branch conditions, the data extraction unit 30 can be said to total the discrimination results of the data classified under the conditions traced from the nodes. That is, the data extraction unit 30 can also be called a discrimination result totalization unit.
The data extraction unit 30 extracts the data classified under the target condition in the hierarchical mixed model. Specifically, the data extraction unit 30 may extract the data classified under a condition for which the discrimination results of the data do not satisfy a criterion.
The data extraction unit 30 calculates, for each leaf node (that is, for each condition under which data is classified), the ratio of correct and incorrect prediction results of the classified data. Specifically, the data extraction unit 30 totals the correct answer ratio, which is the ratio of data that is truly positive and whose discrimination result is also positive, among the data classified under each condition. By totaling the prediction results for each leaf node, it becomes possible to judge whether the condition for being classified into that leaf node can be explained by the known explanatory variables X. This is because, for example, as shown in FIG. 28 described above, if all the conditions for performing the binary judgment on the input information are known, the target data can be completely discriminated.
The criterion for deciding whether or not to extract data is set appropriately according to the operation. The Gini coefficient may be used as the criterion. In cases with high rule dependence (cases with a clear rule in the background), for example, a Gini coefficient of 99% is set as the criterion.
FIG. 3 is an explanatory diagram showing an example of a data discrimination result. The table illustrated in FIG. 3 shows the discrimination results of the data classified into the conditions (leaf nodes) discriminated by the discriminants Y1 to Y4. TP (True Positive) illustrated in FIG. 3 is the number of cases in which positive example data was judged to be a positive example, and TN (True Negative) is the number of cases in which negative example data was judged to be a negative example. FP (False Positive) is the number of cases in which positive example data was judged to be a negative example, and FN is the number of cases in which negative example data was judged to be a positive example. In the following description, a subscript may be added to each discrimination result.
For example, in the result discriminated by the discriminant Y1 illustrated in FIG. 3, TP is 100 and TN is 1, so the ratio of TP is 99% or more. Similarly, in the result discriminated by the discriminant Y4 illustrated in FIG. 3, only TP, which is 50, is counted, so it can be said that the discrimination result can be explained by the explanatory variables X alone. Therefore, for example, a criterion such as "the ratio of TP is 99% or more" may be provided.
On the other hand, in the result discriminated by the discriminant Y2 illustrated in FIG. 3, TP is 20, FP is 30, FN is 15, and TN is 20. In the result discriminated by the discriminant Y3 illustrated in FIG. 3, TP is 0, FP is 20, FN is 40, and TN is 10. From these results, it can be judged that explanation or classification with the explanatory variables X is difficult, and that an unknown explanatory variable X' may exist.
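Using the counts quoted above from FIG. 3 (counts that are not mentioned in the text are taken as zero here), the per-leaf TP ratio compared against the "99% or more" criterion can be computed as in the following sketch.

```python
# TP ratio for each leaf node, using the counts quoted above from FIG. 3.
fig3_counts = {
    "Y1": {"TP": 100, "FP": 0, "FN": 0, "TN": 1},
    "Y2": {"TP": 20, "FP": 30, "FN": 15, "TN": 20},
    "Y3": {"TP": 0, "FP": 20, "FN": 40, "TN": 10},
    "Y4": {"TP": 50, "FP": 0, "FN": 0, "TN": 0},
}

criterion = 0.99  # "the ratio of TP is 99% or more"
for leaf, counts in fig3_counts.items():
    tp_ratio = counts["TP"] / sum(counts.values())
    verdict = "explainable by X alone" if tp_ratio >= criterion else "extract as gray data"
    print(f"{leaf}: TP ratio = {tp_ratio:.3f} -> {verdict}")
```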
In this way, the data extraction unit 30 may extract the data classified under a condition for which the correct answer ratio is equal to or less than a predetermined threshold. The data extracted in this way is used as learning data for performing more appropriate discrimination. In other words, it can be said that the data extraction unit 30 extracts, as gray data, the data classified under a condition for which the discrimination results of the data satisfy the criterion for extraction. In the following description, "gray data" does not mean data for which it is unclear whether discrimination is possible, but data that is difficult to discriminate with the given explanatory variables alone. In the example shown in FIG. 3, the data extraction unit 30 extracts the data satisfying the condition for being classified as the discrimination target of the discriminant Y2 and the data satisfying the condition for being classified as the discrimination target of the discriminant Y3, respectively.
The data extraction unit 30 may extract the data classified under a condition designated by the user via the input unit 20. Specifically, the data extraction unit 30 may cause the output unit 70, which will be described later, to output the totaled discrimination results, and may extract the data classified under a condition pointed out by the user in response to those discrimination results.
When the number of targets that satisfy the criterion with the known explanatory variables is limited relative to the total number of cases, it is suspected that the explanatory variables used for learning are insufficient. On the other hand, the data under a target condition can also be classified in more detail by adding further conditions. Therefore, the data extraction unit 30 may extract data classified based on more detailed conditions. A method of classifying the data under a target condition in more detail (that is, digging deeper into the conditions) will be described in more detail in an embodiment described later.
 データ補充部40は、データ抽出部30によって抽出されたデータに対する補充を受け付ける。ここで、データに対する補充とは、いわゆる、マシンティーチングを行うことであり、そのデータに新たな説明変数に基づく値を追加することや、そのデータの教師ラベルを更新することなどが挙げられる。例えば、あるユーザに関するデータに、勤務地を示す情報が含まれていない場合、データ補充部40は、勤務地を説明変数とする情報を受け付けてもよい。また、あるユーザに関する教師ラベルを変更する場合、データ補充部40は、変更する教師ラベルを受け付けてもよい。 The data replenishment unit 40 receives replenishment for the data extracted by the data extraction unit 30. Here, replenishment of data means so-called machine teaching, which includes adding a value based on a new explanatory variable to the data, updating a teacher label of the data, and the like. For example, when data relating to a certain user does not include information indicating a work location, the data supplementing unit 40 may receive information using the work location as an explanatory variable. When changing the teacher label for a certain user, the data supplementing unit 40 may receive the teacher label to be changed.
 データ補充部40は、データ抽出部30が抽出したデータ群を、例えば、ファイル形式で出力し、出力したデータ群に対してユーザが情報を補充したデータ群を受け付けてもよい。 The data supplementing unit 40 may output the data group extracted by the data extracting unit 30 in, for example, a file format, and accept a data group in which the user supplements the output data group with information.
The model generation unit 50 generates a discrimination model using the supplemented data. The form of the model generated by the model generation unit 50 is arbitrary. The model generation unit 50 may, for example, generate a simple linear regression model as the discrimination model, or may learn a discrimination model represented by a hierarchical mixed model. By learning a discrimination model represented by a hierarchical mixed model, it becomes possible to clarify the conditions for classifying the gray data while maintaining the clarity and interpretability of the discrimination conditions.
The model generation unit 50 may also generate a plurality of types of discrimination models. In this case, the model generation unit 50 may cause the output unit 70 (described later) to output the generated plurality of types of discrimination models and allow the user to select a desired discrimination model from among them (for example, via the input unit 20).
The model updating unit 60 updates the existing discrimination model using the generated new discrimination model. Specifically, the model updating unit 60 generates a model in which a node node that classifies data satisfying the condition targeted by the data extraction unit 30 for extraction is arranged at the top of the hierarchical mixed model. More specifically, the model updating unit 60 generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to one leaf node of that node node and that data satisfying the condition is applied to the discrimination model corresponding to the other leaf node of that node node.
By using the model generated in this way, an effect is obtained in which data that is difficult to discriminate with the existing discrimination model is excluded in advance by the newly generated discrimination model. The model updating unit 60 may generate a discrimination model that uses the discrimination model generated by the model generation unit 50 as a model for filtering, in advance, the data input to the existing discrimination model. The model updating unit 60 may also generate a discrimination model in which the generated discrimination model is directly coupled to the existing discrimination model. In this way, the model updating unit 60 arranges the generated discrimination model at the top of the hierarchical mixed model.
FIG. 4 is an explanatory diagram showing an example of processing in which data is excluded by the new discrimination model. The new discrimination model M1 illustrated in FIG. 4 is the discrimination model generated by the model generation unit 50, and the existing discrimination model M2 is the discrimination model created in advance as the reference. The model updating unit 60 generates a model in which a node node whose leaf nodes are the new discrimination model M1 and the existing discrimination model M2, respectively, is arranged at the top. Specifically, data satisfying the condition targeted by the data extraction unit 30 for extraction is applied to the new discrimination model M1.
For example, in the example shown in FIGS. 2 and 3, assume that the model generation unit 50 generates a new discrimination model based on the data extracted by the data extraction unit 30, that is, the data satisfying the conditions classified as the determination targets of discriminant Y2 or discriminant Y3. In this case, the data to be determined by the new discrimination model M1 is data satisfying (not(condition 1) and (condition 2)) or (not(condition 1) and not(condition 2) and (condition 3)). Since the data to be determined by the new discrimination model M1 is excluded from the discrimination processing of the discrimination model M2, the above condition can be called an exclusion condition (exclusion rule) of the existing discrimination model.
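As a purely illustrative sketch of this routing, the following Python fragment places the exclusion rule above the two models so that matching data is handled by the new discrimination model M1 and all other data by the existing discrimination model M2. The predicate names (condition1 to condition3), the model objects, and their predict interface are assumptions introduced for the example.

```python
def exclusion_rule(row):
    # (not(condition 1) and (condition 2)) or
    # (not(condition 1) and not(condition 2) and (condition 3))
    c1, c2, c3 = row["condition1"], row["condition2"], row["condition3"]
    return (not c1 and c2) or (not c1 and not c2 and c3)

def routed_predict(row, new_model_m1, existing_model_m2):
    """Top-level node node: rows matching the exclusion rule go to the
    newly generated model M1, all other rows go to the existing model M2."""
    model = new_model_m1 if exclusion_rule(row) else existing_model_m2
    return model.predict(row)
```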
When the teacher data D1 is input to the model illustrated in FIG. 4 (specifically, the new discrimination model M1 and the discrimination model M2), the data extraction unit 30 determines the data satisfying the target condition using the new discrimination model M1. For example, when the new discrimination model M1 is a hierarchical mixed model, the data satisfying the target condition is further classified according to the respective branch conditions, and discrimination processing is performed at each leaf node.
The data extraction unit 30 extracts data classified under a condition whose determination result does not satisfy the criterion. In other words, the data extraction unit 30 extracts data that is difficult to explain with the known explanatory variable X, and it can be said that, as preprocessing for the discrimination processing by the existing discrimination model M2, it excludes data that is difficult to discriminate with the discrimination model M2.
The data extraction unit 30 determines the data other than the extracted data using the discrimination model M2. The data subjected to the discrimination processing by the discrimination model M2 can be said to be data that can be explained by the explanatory variable X. Therefore, the result determined by the discrimination model M2 can be regarded as an almost 100% reliable result, and this result can be applied to various cases.
The output unit 70 outputs information on the discrimination model to an output device such as a display device (not shown). The output unit 70 may output, for example, the updated model. As described above, the output unit 70 may also output the determination results obtained by the data extraction unit 30, or may output the discrimination conditions and discriminants of the plurality of types of discrimination models. Specific output forms will be described later.
The input unit 20, the data extraction unit 30, the data supplementing unit 40, the model generation unit 50, the model updating unit 60, and the output unit 70 are realized by a processor of a computer that operates according to a program (a model updating program), for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field-Programmable Gate Array).
For example, the program may be stored in the storage unit 10, and the processor may read the program and operate as the input unit 20, the data extraction unit 30, the data supplementing unit 40, the model generation unit 50, the model updating unit 60, and the output unit 70 according to the program. The functions of the model updating device may also be provided in a SaaS (Software as a Service) format.
The input unit 20, the data extraction unit 30, the data supplementing unit 40, the model generation unit 50, the model updating unit 60, and the output unit 70 may each be realized by dedicated hardware. Some or all of the components of each device may be realized by general-purpose or dedicated circuitry, a processor, or a combination thereof. These may be configured by a single chip or by a plurality of chips connected via a bus. Some or all of the components of each device may be realized by a combination of the above-described circuitry and the like and a program.
When some or all of the components of the model updating device are realized by a plurality of information processing devices, circuits, or the like, the plurality of information processing devices, circuits, or the like may be arranged in a centralized manner or in a distributed manner. For example, the information processing devices, circuits, and the like may be realized in a form in which they are connected via a communication network, such as a client-server system or a cloud computing system.
Next, the operation of the model updating device of the present embodiment will be described. FIG. 5 is a flowchart illustrating an operation example of the model updating device of the present embodiment. The input unit 20 inputs the data to be determined (step S11). The data extraction unit 30 extracts the data classified under the target condition (step S12). The data extraction unit 30 may extract, for example, data classified under a condition whose correct-answer ratio is equal to or less than a predetermined threshold. The data supplementing unit 40 receives supplements to the extracted data (step S13).
The model generation unit 50 generates a discrimination model using the supplemented data (step S14). Then, the model updating unit 60 generates a model in which a node node that classifies data satisfying the target condition is arranged at the top of the hierarchical mixed model (step S15). Specifically, the model updating unit 60 generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model and that data satisfying the condition is applied to the discrimination model.
As described above, in the present embodiment, the data extraction unit 30 extracts data classified under the target condition in the hierarchical mixed model, the data supplementing unit 40 receives supplements to the extracted data, and the model generation unit 50 generates a discrimination model using the supplemented data. Then, the model updating unit 60 generates a model in which a node node that classifies data satisfying the target condition is arranged at the top of the hierarchical mixed model. Specifically, the model updating unit 60 generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to one leaf node of the node node and that data satisfying the condition is applied to the discrimination model corresponding to the other leaf node of the node node. Therefore, the discrimination model can be updated so as to improve the discrimination accuracy while maintaining the discrimination conditions of the existing discrimination model.
For example, if the discrimination model were simply updated (rebuilt), the structure, branch conditions, discriminants, and the like of the discrimination model selected by the user might change. Even if such an update could improve the accuracy of the discrimination model, the resulting model would be difficult for the user to use.
In contrast, in the present embodiment, the model updating unit 60 generates a model in which a node node indicating that the discrimination model is applied to data that would reduce the accuracy of the existing discrimination model (that is, data satisfying the target condition) is arranged at the top of the hierarchical mixed model. Therefore, a model to which the newly generated discrimination model is added can be generated without changing the structure of the existing discrimination model. Accordingly, the discrimination accuracy of the discrimination model as a whole can be improved while satisfying the user's desire to keep using the existing discrimination model.
Next, a modification of the model updating device of the present embodiment will be described. In the above embodiment, the case where the model updating unit 60 updates the existing discrimination model using the new discrimination model generated by the model generation unit 50 has been described. At that time, the data extraction unit 30 may further extract, using the discrimination model generated by the model generation unit 50, data classified under a condition whose determination result does not satisfy the criterion.
In this case, the data supplementing unit 40 may receive supplements to the data further extracted by the data extraction unit 30, and the model generation unit 50 may further generate a discrimination model using the supplemented data. The data extraction unit 30 may repeat the above processing until the amount of extracted data no longer decreases (in other words, until only data that cannot be determined with the known explanatory variables remains).
FIG. 6 is an explanatory diagram showing another example of data determination results. The table illustrated in FIG. 6, like the table illustrated in FIG. 3, shows the determination results of the data classified into the conditions (leaf nodes) determined by the respective discriminants Y11 to Y14. In the determination results illustrated in FIG. 6, the ratio of data determined as TP is low, so the validity of the determination results is low and the results are considered unreliable. In other words, the determination results illustrated in FIG. 6 are difficult to determine with the known explanatory variable X and can be judged to be data affected by an unknown explanatory variable X'. The data extraction unit 30 may repeat the above processing until only data such as that illustrated in FIG. 6 remains.
The condition applied to a newly generated discrimination model is the AND combination of the conditions specified in each round of processing. For example, when the discriminants illustrated in FIG. 6 are applied to data satisfying "condition 4" in addition to the conditions illustrated in FIGS. 2 and 3, the exclusion condition (exclusion rule) is ((not(condition 1) and (condition 2)) or (not(condition 1) and not(condition 2) and (condition 3))) and (condition 4).
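The AND combination across processing rounds can be sketched as follows; this is illustrative only, and the predicates for condition 1 to condition 4 are hypothetical stand-ins for the conditions in the figures.

```python
def combine_exclusion_rules(rules):
    """Combine the exclusion conditions specified in each round of
    processing with AND; a row is routed to the newly generated model
    only if it satisfies every round's condition.

    rules : list of predicates, each mapping a data row to True/False.
    """
    def combined(row):
        return all(rule(row) for rule in rules)
    return combined

# Hypothetical predicates mirroring the conditions named in the text:
def rule_round1(row):
    c1, c2, c3 = row["condition1"], row["condition2"], row["condition3"]
    return (not c1 and c2) or (not c1 and not c2 and c3)

def rule_round2(row):
    return row["condition4"]

combined_exclusion_rule = combine_exclusion_rules([rule_round1, rule_round2])
```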
Next, the operation of this modification will be described. FIG. 7 is a flowchart illustrating an operation example of the model updating device of this modification. Note that the processing of updating the reference discrimination model using the generated discrimination model (specifically, the processing of steps S11 to S15 illustrated in FIG. 5) is performed in advance.
The input unit 20 inputs the data to be determined again (step S16). The data extraction unit 30 extracts the data classified under the target condition in the model generated by the model updating unit 60 (step S17). The data supplementing unit 40 further receives supplements to the extracted data (step S18). The model generation unit 50 generates another discrimination model using the supplemented data (step S19). The model updating unit 60 generates a model indicating that data satisfying the target condition is applied to the other discrimination model (step S20).
Thereafter, the processing from step S16 to step S20 is repeated until only data that cannot be determined with the known explanatory variables is extracted. By repeating the processing in this way, a discrimination model that more appropriately excludes data that cannot be determined with the known explanatory variables can be generated.
Embodiment 2.
Next, a second embodiment of the model updating device according to the present invention will be described. The first embodiment described a method of updating the model by supplementing data extracted as gray data. In the present embodiment, the conditions for extracting gray data are further drilled down and refined, and it is thereby determined whether the explanatory variables for prediction are sufficient. In addition, by using the conditions refined in this way to extract gray data from the input data in advance, the discrimination accuracy of the reference discrimination model is improved.
First, the gray data used in the present embodiment will be described. FIG. 8 is an explanatory diagram showing an example of processing for determining gray data. Each rectangle in FIG. 8 represents data to be determined. As described above, in a general method, a threshold for discriminating between positive examples and negative examples is provided, and data with a score smaller than the threshold is not automatically determined to be a positive example. In the example shown in FIG. 8, the rectangles above the horizontal axis represent positive-example data, and the rectangles below the horizontal axis represent negative-example data. In the example shown in FIG. 8, it is assumed that the threshold S3 is set to a value larger than a score of 0 and larger than the negative-example data with the highest score.
In the case of the method of determination based on a threshold, even data with a score larger than 0 is not automatically determined to be a positive example if its score is smaller than the threshold S3, as in the data group S4, even when it is in fact positive-example data. That is, although the data group S4 is statistically correct, in the AI judgment it becomes data that is not subject to automatic determination because of how the threshold is set.
FIG. 9 is an explanatory diagram showing another example of processing for determining gray data. In the present embodiment, as in the first embodiment, the determination results of the data classified into each leaf node are aggregated. The unit of the aggregated data is sometimes referred to as a zone. In the present embodiment, a zone containing data whose determination results differ even though the data and the prediction formula are the same or similar (that is, such a data group under a condition) is referred to as a gray zone. Data belonging to a gray zone is referred to as gray data. On the other hand, a zone whose determination result is uniquely determined when the data and the prediction formula are the same or similar (that is, a data group under a condition for which all results can be predicted with the known explanatory variables) is referred to as a clean zone. Data belonging to a clean zone is referred to as clean data. For example, a zone in which positive examples and negative examples are mixed is a zone whose determination result is not uniquely determined, and can therefore be called a gray zone.
In the example shown in FIG. 9, each region surrounded by a dotted line indicates a gray zone, and each region surrounded by an elliptical solid line indicates a clean zone. That is, even if the scores calculated by the general method are the same, whether data belongs to a gray zone or a clean zone differs for each data item depending on the process by which the score is calculated (the difference in nodes). By handling data in units of zones in this way, setting a threshold becomes unnecessary. In addition, some negative examples with high scores can also be treated as belonging to a gray zone. However, for operational convenience, this does not preclude using the score to discriminate between positive examples and negative examples.
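As an illustration of the zone concept only, the following Python sketch labels each leaf (zone) as clean when its determination result is uniquely determined and as gray otherwise. The record layout and the leaf_of helper are assumptions for the example.

```python
from collections import defaultdict

def classify_zones(records, leaf_of):
    """Group records by the leaf node (zone) they fall under and label
    each zone: 'clean' if all labels in the zone agree (the result is
    uniquely determined), 'gray' if positive and negative examples mix."""
    zones = defaultdict(list)
    for rec in records:
        zones[leaf_of(rec)].append(rec["label"])
    return {
        zone: "clean" if len(set(labels)) == 1 else "gray"
        for zone, labels in zones.items()
    }
```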
FIG. 10 is a block diagram showing a configuration example of the second embodiment of the model updating device according to the present invention. The model updating device 200 of the present embodiment includes a storage unit 10, an input unit 20, a learning data generation unit 31, a model learning unit 32, a score calculation unit 33, a condition extraction unit 34, a condition generation unit 35, a filter generation unit 61, and an output unit 70. That is, the model updating device 200 of the present embodiment differs from the model updating device 100 of the first embodiment in that it includes the learning data generation unit 31, the model learning unit 32, the score calculation unit 33, the condition extraction unit 34, the condition generation unit 35, and the filter generation unit 61 instead of the data extraction unit 30, the data supplementing unit 40, the model generation unit 50, and the model updating unit 60. The other configurations are the same as in the first embodiment.
The storage unit 10 stores the data to be determined and various parameters, as in the first embodiment. The input unit 20 inputs the data to be determined, as in the first embodiment.
The learning data generation unit 31 generates learning data used when the model learning unit 32 (described later) learns a hierarchical mixed model. The model learning unit 32 generates a hierarchical mixed model by heterogeneous mixture machine learning using the generated learning data. More specifically, for the heterogeneous mixture machine learning, the model learning unit 32 preferably uses FAB (Factorized Asymptotic Bayesian inference), which maximizes the lower bound of the information criterion FIC (Factorized Information Criterion). However, as long as a similar technique is used, the method by which the model learning unit 32 learns the hierarchical mixed model is not limited to heterogeneous mixture machine learning.
The score calculation unit 33 calculates a data determination result for each leaf node in the hierarchical mixed model. The condition extraction unit 34 extracts branch conditions to the respective leaf nodes based on predetermined criteria. The condition generation unit 35 generates a condition combining the extracted conditions.
For the learning data generation unit 31, the model learning unit 32, the score calculation unit 33, the condition extraction unit 34, and the condition generation unit 35 of the present embodiment, the target data and the criteria used differ depending on the progress of processing. In the present embodiment, a case will be described in which each component operates while changing the target data and the criteria used according to the progress of processing. However, each component may be realized as a separate component according to the content of each process. Hereinafter, the operation of each component will be described along the flow of processing.
[(1) Clean zone extraction process]
First, as the first process, a process of extracting clean zones in the learning data will be described. As described above, a clean zone is a zone whose determination result is uniquely determined, and can be said to be a zone in which the explanatory variables for predicting the data are sufficient. As a premise, as in the first embodiment, it is assumed that the reference discrimination model has been determined. FIG. 11 is an explanatory diagram showing an example of the reference discrimination model. The discrimination model M20 illustrated in FIG. 11 indicates that input data is classified into one of three leaf nodes based on condition C1 and condition C2 and is determined based on the discriminants Y21 to Y23 arranged at the respective leaf nodes.
It is also assumed that teacher data including labels indicating determination results is stored in the storage unit 10 in advance. Here, it is assumed that there are 100,000 items of teacher data and that the ratio of positive examples to negative examples in the teacher data is 19:1. The input unit 20 reads the teacher data stored in the storage unit 10 and inputs it to the learning data generation unit 31.
The learning data generation unit 31 applies the input teacher data, which includes labels indicating determination results, to the reference discrimination model. Then, the learning data generation unit 31 generates learning data (hereinafter referred to as first learning data) in which teacher data whose determination result by the discrimination model matches its label is treated as a positive example and teacher data whose determination result differs from its label is treated as a negative example.
For example, assume that, as a result of applying the teacher data to the discrimination model, the data is determined as TP1=93.5K, TN1=0.3K, FP1=5.6K, and FN1=0.6K. In this case, the learning data generation unit 31 generates learning data in which the teacher data determined as TP1 and TN1 are treated as positive examples. Similarly, the learning data generation unit 31 generates learning data in which the teacher data determined as FP1 and FN1 are treated as negative examples.
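A minimal sketch of how the first learning data could be constructed is shown below; the teacher-data layout and the reference model's predict interface are assumptions for the example.

```python
def make_first_learning_data(teacher_data, reference_model):
    """Relabel the teacher data against the reference discrimination model:
    rows whose model output matches the teacher label (TP, TN) become
    positive examples of the first learning data, and rows where they
    differ (FP, FN) become negative examples."""
    first_learning_data = []
    for row in teacher_data:
        predicted = reference_model.predict(row["features"])
        relabel = 1 if predicted == row["label"] else 0
        first_learning_data.append({"features": row["features"], "label": relabel})
    return first_learning_data
```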
The model learning unit 32 generates a hierarchical mixed model (hereinafter referred to as a first hierarchical mixed model) by heterogeneous mixture machine learning using the generated first learning data. Note that the first hierarchical mixed model generated here is a model different from the reference discrimination model. By generating a hierarchical mixed model by heterogeneous mixture machine learning, patterns and regularities that were mixed in the original data can be separated and extracted from the automatically divided data. In addition, by using the generated hierarchical mixed model, the data groups classified by the gate tree can be appropriately determined by the regression equations arranged at the leaves. Furthermore, a model generated by heterogeneous mixture machine learning can be analyzed from various angles and can be analyzed with the explanatory variables of the reference discrimination model.
FIG. 12 is an explanatory diagram showing an example of a hierarchical mixed model generated using the first learning data. The model learning unit 32 generates, for example, the hierarchical mixed model illustrated in FIG. 12. The discrimination model illustrated in FIG. 12 indicates that the first learning data is classified into one of three leaf nodes based on condition C3 and condition C4 and is determined based on the discriminants Y31 to Y33 arranged at the respective leaf nodes.
Specifically, in the example shown in FIG. 12, the data determined by discriminant Y31 is data satisfying the conditions "C=0" and "D≠0", the data determined by discriminant Y32 is data satisfying the conditions "C=0" and "D=0", and the data determined by discriminant Y33 is data satisfying the condition "C≠0".
The score calculation unit 33 calculates, for each leaf node in the generated first hierarchical mixed model, the ratio of the first learning data classified into that leaf node for which the data treated as positive examples was correctly determined (that is, the ratio of TP). Hereinafter, the ratio calculated here is referred to as a first score.
FIG. 13 is an explanatory diagram showing an example of determination results. The example shown in FIG. 13 indicates that the determination results by the discriminants Y31 to Y33 are classified into TP2, FP2, TN2, and FN2, respectively. For example, since all five items of first learning data treated as positive examples were correctly determined by discriminant Y31, it can be said that the explanatory variables for predicting these five items of first learning data are sufficient.
The condition extraction unit 34 extracts branch conditions to leaf nodes for which a first score satisfying a predetermined criterion has been calculated. The criterion defined here is a criterion for judging whether a leaf node is one into which data that can be determined using the explanatory variables of the first hierarchical mixed model is classified. In the following description, this criterion is referred to as a first criterion. That is, the first criterion can also be said to be a criterion for determining whether a zone is one whose determination result is uniquely determined (a clean zone), as described above. For example, a criterion of "satisfying 100%" may be set as the first criterion. This indicates that the explanatory variables for predicting the data classified into the target leaf node are sufficient. However, the first criterion is not limited to "satisfying 100%", and a predetermined value less than 100% (for example, 0.995) may be set as the first criterion.
For example, when the first criterion is "satisfying 100%", in the example shown in FIG. 13, the determination result by discriminant Y31 satisfies TP/CNT = 5/5 = 1.0 = 100%. Therefore, the condition extraction unit 34 extracts the branch condition to the leaf node at which discriminant Y31 is set ("C=0" and "D≠0"). Although FIG. 13 illustrates a case where one branch condition is extracted, the number of extracted branch conditions is not limited to one and may be two or more.
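For illustration, the first score and the extraction against the first criterion might be computed as in the following sketch; the per-leaf statistics layout is an assumption, and the denominator follows the TP/CNT reading of the FIG. 13 example.

```python
def extract_clean_zone_conditions(leaf_stats, first_criterion=1.0):
    """Compute the first score for each leaf of the first hierarchical
    mixed model and return the branch conditions of leaves that meet the
    first criterion (candidate clean zones).

    leaf_stats : iterable of dicts with counts 'tp', 'fp', 'tn', 'fn' and
                 the leaf's branch 'condition' (assumed layout).
    """
    extracted = []
    for stats in leaf_stats:
        total = stats["tp"] + stats["fp"] + stats["tn"] + stats["fn"]
        if total == 0:
            continue
        # First score: ratio of TP among the data classified into the leaf
        # (TP / CNT, as in the FIG. 13 example).
        first_score = stats["tp"] / total
        if first_score >= first_criterion:
            extracted.append(stats["condition"])
    return extracted
```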
The model learning unit 32, the score calculation unit 33, and the condition extraction unit 34 repeat the above processing. The number of repetitions depends on machine resources and the like, but is preferably on the order of several hundred to several thousand. For example, when one attempts to classify data using a general binary tree and a machine learning algorithm that learns the discriminants arranged at the leaf nodes of the binary tree, it is difficult to classify the data appropriately with only several hundred to several thousand executions. In contrast, in the present embodiment, the model learning unit 32 generates a hierarchical mixed model that allows overall regularization (the first hierarchical mixed model). Therefore, the data can be classified by executing the learning several hundred to several thousand times.
Specifically, the model learning unit 32 generates a plurality of types of first hierarchical mixed models using the same generated first learning data, for example, by changing the initial parameters of the heterogeneous mixture machine learning. The score calculation unit 33 calculates the ratio of TP for each leaf node in each generated first hierarchical mixed model, and the condition extraction unit 34 extracts, for each generated first hierarchical mixed model, the branch conditions to the leaf nodes for which a first score satisfying the first criterion has been calculated.
The condition generation unit 35 generates a condition combining the branch conditions that satisfy the first criterion (hereinafter referred to as a discriminable condition). Specifically, the condition generation unit 35 combines all the extracted branch conditions to generate the discriminable condition. Since the discriminable condition is a condition combining zones whose determination results are uniquely determined, it can be called a clean-zone specifying condition. For example, when branch conditions to Z leaf nodes have been extracted, it can be said that Z zones (segments) that can be completely rule-based (that is, predictable using the known explanatory variables) have been extracted.
[(2) Gray zone extraction process]
Next, as the second process, a process of extracting gray zones in the learning data will be described. The gray zone extraction process is an auxiliary process for efficiently advancing the first process described above (that is, the clean zone extraction process). Since the clean zones have been extracted in the first process, the learning data generation unit 31 generates learning data (hereinafter referred to as second learning data) obtained by excluding, from the first learning data, the learning data corresponding to the discriminable condition. For example, suppose that 40,000 of the 100,000 items of first learning data correspond to the discriminable condition. In this case, the learning data generation unit 31 excludes those 40,000 items from the 100,000 items and generates 60,000 items of second learning data.
By excluding the clean data from the learning data in this way, the proportion of data that is difficult to determine increases in the remaining learning data. For example, if this processing results in 54,000 positive examples and 6,000 negative examples, the ratio of positive examples to negative examples becomes 9:1.
The model learning unit 32 generates a hierarchical mixed model (hereinafter referred to as a second hierarchical mixed model) by heterogeneous mixture machine learning using the generated second learning data.
The score calculation unit 33 calculates, for each leaf node in the generated second hierarchical mixed model, the sum of the ratio at which the second learning data classified into that leaf node and treated as positive examples was correctly determined and the ratio at which the second learning data treated as negative examples was not correctly determined (that is, the ratio of (TP+FN)). Hereinafter, the ratio calculated here is referred to as a second score.
FIG. 14 is an explanatory diagram showing an example of determination results. The example shown in FIG. 14 indicates that the determination results by the discriminants Y41 to Y43 are classified into TP3, FP3, TN3, and FN3, respectively. For example, for discriminant Y41, the score calculation unit 33 calculates the sum of the ratio at which the second learning data treated as positive examples was correctly determined, 0 (= 0/6), and the ratio at which the second learning data treated as negative examples was not correctly determined, 0.16 (= 1/6), as 0.16.
The condition extraction unit 34 extracts branch conditions to leaf nodes for which a second score satisfying a predetermined criterion has been calculated. The criterion defined here is a criterion for judging whether a leaf node is one into which data that is difficult to determine using only the explanatory variables of the second hierarchical mixed model is classified. In the following description, this criterion is referred to as a second criterion. That is, the second criterion can also be said to be a criterion for determining whether a zone is one whose determination result is not uniquely determined (a gray zone), as described above. If the value of the second criterion is made smaller, branch conditions indicating zones with a high proportion of gray data can be extracted, and if the value of the second criterion is made larger, more branch conditions can be extracted. As the second criterion, for example, a criterion of "smaller than 0.5" can be set. The value to be set is not limited to 0.5, and a value of, for example, 0.7 to 0.8 may be set.
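A corresponding sketch for the second score and the second criterion is shown below for illustration only; the per-leaf statistics layout is again an assumption.

```python
def extract_gray_zone_conditions(leaf_stats, second_criterion=0.5):
    """Compute the second score for each leaf of the second hierarchical
    mixed model and return the branch conditions of leaves that meet the
    second criterion (candidate gray zones).

    leaf_stats : iterable of dicts with counts 'tp', 'fp', 'tn', 'fn' and
                 the leaf's branch 'condition' (assumed layout).
    """
    extracted = []
    for stats in leaf_stats:
        total = stats["tp"] + stats["fp"] + stats["tn"] + stats["fn"]
        if total == 0:
            continue
        # Second score: ratio of (TP + FN) among the data in the leaf,
        # per the description above.
        second_score = (stats["tp"] + stats["fn"]) / total
        if second_score < second_criterion:
            extracted.append(stats["condition"])
    return extracted
```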
The model learning unit 32, the score calculation unit 33, and the condition extraction unit 34 repeat the above processing, as in the first process. The number of repetitions depends on machine resources and the like, but is preferably on the order of several hundred to several thousand. As with the generation of the first hierarchical mixed model, in the present embodiment the model learning unit 32 generates a hierarchical mixed model that allows overall regularization (the second hierarchical mixed model). Therefore, the data can be classified by executing the learning several hundred to several thousand times.
Specifically, the model learning unit 32 generates a plurality of types of second hierarchical mixed models using the same generated second learning data. The score calculation unit 33 calculates the ratio of TP+FN for each leaf node in each generated second hierarchical mixed model, and the condition extraction unit 34 extracts, for each generated second hierarchical mixed model, the branch conditions to the leaf nodes for which a second score satisfying the second criterion has been calculated.
The condition generation unit 35 generates a condition combining the conditions that satisfy the second criterion (hereinafter referred to as a difficult-to-discriminate condition). Specifically, the condition generation unit 35 combines all the extracted branch conditions to generate the difficult-to-discriminate condition. Since the difficult-to-discriminate condition is a condition combining zones that are difficult to determine using only the given explanatory variables, it can be called a gray-zone specifying condition.
[(3) Gray data exclusion process]
Next, as the third process, a process of excluding gray data from the second learning data will be described. The learning data generation unit 31 generates data (hereinafter referred to as third learning data) obtained by excluding, from the second learning data, the learning data corresponding to the difficult-to-discriminate condition.
For example, in FIG. 14, the ratio of positive examples to negative examples in the data determined by discriminant Y41 is 1:5, and the ratio of positive examples to negative examples in the data determined by discriminant Y42 is 1:1. Here, if the data determined by discriminant Y41 and the data determined by discriminant Y42 are extracted as gray data, six negative examples can be excluded at the cost of excluding only two positive examples. This makes it possible to increase the proportion of clean data in the learning data.
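For illustration, the exclusion of gray data from the second learning data can be sketched as follows, assuming the difficult-to-discriminate condition is available as a predicate over data rows.

```python
def exclude_gray_data(second_learning_data, difficult_to_discriminate):
    """Build the third learning data by removing rows that match the
    difficult-to-discriminate condition (i.e., rows falling in gray zones).

    difficult_to_discriminate : predicate combining the extracted gray-zone
                                branch conditions (row -> True/False).
    """
    return [row for row in second_learning_data
            if not difficult_to_discriminate(row)]
```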
Through the above-described (1) clean zone extraction process, (2) gray zone extraction process, and (3) gray data exclusion process, data judged to be clean data or gray data is excluded from the learning data. In order to exclude more clean data or gray data, the (1) clean zone extraction process, (2) gray zone extraction process, and (3) gray data exclusion process may be repeated. That is, the model learning unit 32 may generate a first hierarchical mixed model using the generated third learning data.
[(4) Gray zone drill-down process]
Next, as the fourth process, a process of further refining the conditions that specify gray zones (a gray zone drill-down process) will be described. Here, a case will be described in which the (4) gray zone drill-down process is performed on the learning data that has undergone the (3) gray data exclusion process (that is, the third learning data). However, the (4) gray zone drill-down process may also be performed on the first learning data.
As in the process described for the (1) clean zone extraction process, the score calculation unit 33 calculates the first score for each leaf node in the first hierarchical mixed model generated using the third learning data. The condition extraction unit 34 then extracts branch conditions to leaf nodes for which a first score that does not satisfy the first criterion has been calculated. In other words, the condition extraction unit 34 extracts branch conditions to leaf nodes that are not judged to be clean zones.
The model learning unit 32 uses the learning data classified into a leaf node from which a branch condition has been extracted to generate a hierarchical mixed model that branches conditionally under that leaf node (hereinafter referred to as a third hierarchical mixed model).
FIG. 15 is an explanatory diagram showing an example of a process of generating a new hierarchical mixed model under a leaf node. The hierarchical mixed model M21 illustrated in FIG. 15 is the same as the hierarchical mixed model illustrated in FIG. 11. In FIG. 15, the determination results D51 to D53 of the data classified at each leaf node are shown in balloons. In the determination results, the number of "○" marks indicates the proportion of TP data, and the number of "×" marks indicates the proportion of the other data (that is, TN, FP, and FN).
For example, assume that, for the leaf node C7 illustrated in FIG. 15, the ratio of TP is approximately 50%, as indicated by the determination result D51. In this case, the first score calculated at this leaf node does not satisfy the first criterion, so the condition extraction unit 34 extracts the branch condition to the leaf node C7. Then, the model learning unit 32 uses the data of the determination result D51 to generate a third hierarchical mixed model that branches conditionally under the leaf node C7.
FIG. 16 is an explanatory diagram showing an example of the generated third hierarchical mixed model. As illustrated in FIG. 16, the model learning unit 32 generates a third hierarchical mixed model M23 that branches conditionally under the leaf node C7 illustrated in FIG. 15. By setting the new third hierarchical mixed model under the leaf node, more detailed branch conditions to each leaf node can be defined, and the determination results D61 to D63 of the respective leaf nodes are calculated. As a result, it becomes possible to further narrow down the conditions of data groups that can be predicted with the known explanatory variables and the conditions of data groups that cannot be predicted with the known explanatory variables. For example, the determination result D63 of the learning data classified by discriminant Y63 shows 100% TP. Therefore, it can be said that the data classified into this node can be predicted with the known explanatory variables. Accordingly, the condition extraction unit 34 specifies this leaf node as a leaf node satisfying the first criterion (that is, a clean zone) and extracts its branch condition.
Further, the same processing may be performed on each leaf node of the generated third hierarchical mixed model. That is, the score calculation unit 33 may calculate the first score for each leaf node in the third hierarchical mixed model, the condition extraction unit 34 may extract branch conditions to leaf nodes for which a first score satisfying the first criterion has been calculated, and the model learning unit 32 may use the learning data classified into a leaf node to generate a further third hierarchical mixed model that branches conditionally under that leaf node.
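The drill-down described above can be pictured with the following recursive sketch. It is illustrative only: the learn_hmm and first_score_of helpers and the leaf interface (leaves(), condition, data) are assumed placeholders rather than part of the embodiment.

```python
def drill_down(leaf_data, learn_hmm, first_score_of, first_criterion=1.0,
               max_depth=3):
    """Recursively refine leaves that do not meet the first criterion by
    learning a further hierarchical mixed model under each such leaf.
    Returns the branch conditions of all leaves identified as clean zones.

    learn_hmm      : callable(data) -> model exposing .leaves(), where each
                     leaf exposes .condition and .data (assumed interface).
    first_score_of : callable(leaf) -> first score of that leaf.
    """
    if max_depth == 0 or not leaf_data:
        return []
    model = learn_hmm(leaf_data)
    clean_conditions = []
    for leaf in model.leaves():
        if first_score_of(leaf) >= first_criterion:
            clean_conditions.append(leaf.condition)   # clean zone found
        else:
            # Drill down further under this leaf using its own data.
            clean_conditions.extend(
                drill_down(leaf.data, learn_hmm, first_score_of,
                           first_criterion, max_depth - 1))
    return clean_conditions
```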
 In the example shown in FIG. 16, the branch condition is drilled down for only one leaf node. However, the target of this drill-down is not limited to a single leaf node and may be two or more leaf nodes. For example, the first score calculated for the leaf node C9 illustrated in FIG. 16 also fails to satisfy the first criterion, so the condition extraction unit 34 extracts the branch condition leading to the leaf node C9 as well.
 The condition generation unit 35 then generates a discriminable condition by further combining the branch conditions extracted for the leaf nodes that satisfy the first criterion (that is, the clean zones).
 In this way, even when the branch condition leading to a certain leaf node alone cannot classify the learning data sufficiently, the model learning unit 32 generates a model containing deeper branch conditions (that is, a third hierarchical mixed model), which makes it possible to separate the clean zones from the gray zones under more detailed conditions. Performing this finer separation makes it possible to further identify the nodes for which the explanatory variables available for prediction are sufficient and the nodes for which they are not.
 The filter generation unit 61 generates a condition for removing gray-zone data (that is, gray data; this condition is hereinafter referred to as a filter condition). In other words, the filter generation unit 61 generates a filter condition for removing data that satisfies conditions which cannot be predicted with the known explanatory variables. Specifically, the filter generation unit 61 generates the filter condition by combining the branch conditions leading to leaf nodes for which a first score not satisfying the first criterion has been calculated with the branch conditions satisfying the second criterion (that is, the difficult-to-discriminate conditions). Note that the condition extraction unit 34 may extract the branch conditions leading to leaf nodes for which a first score not satisfying the first criterion has been calculated, and the condition generation unit 35 may generate the difficult-to-discriminate condition by combining the branch conditions satisfying the second criterion.
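 A minimal sketch of how such a filter condition could be assembled is shown below, assuming each extracted branch condition is available as a predicate over a single record (a dictionary of explanatory variables). The concrete age/income/loan-balance conditions are invented for illustration and are not taken from the figures.

```python
from typing import Callable, Dict, List

Condition = Callable[[Dict[str, float]], bool]

def make_filter(unresolved_leaf_conditions: List[Condition],
                hard_to_discriminate_conditions: List[Condition]) -> Condition:
    """Gray data = data matching any of the combined conditions."""
    combined = unresolved_leaf_conditions + hard_to_discriminate_conditions
    return lambda record: any(cond(record) for cond in combined)

# Illustrative branch conditions (assumed, for the sketch only).
is_gray = make_filter(
    unresolved_leaf_conditions=[lambda r: r["age"] < 30 and r["income"] < 300],
    hard_to_discriminate_conditions=[lambda r: r["loan_balance"] >= 1000],
)
print(is_gray({"age": 25, "income": 250, "loan_balance": 0}))   # True -> gray data
```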
 The output unit 70 outputs the filter condition generated by the filter generation unit 61.
 FIG. 17 is an explanatory diagram showing an example of a discrimination system. The discrimination system 500 illustrated in FIG. 17 includes a discrimination device 510 and a gray zone removal device 520.
 The discrimination device 510 discriminates the input data 521 on the basis of the reference discrimination model M20. The gray zone removal device 520 removes the gray data 522 from the input data on the basis of the filter condition generated by the filter generation unit 61, and inputs the clean data 523 to the discrimination device 510. Because the gray zone removal device 520 removes the gray data 522 in advance, the discrimination results for the clean data input to the discrimination device 510 are guaranteed.
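 A minimal sketch of this pipeline is shown below; is_gray stands for the filter condition sketched above, model for any fitted reference discrimination model, and vectorize for an assumed conversion of records into the model's input format. All three are assumptions for illustration.

```python
def discriminate_with_gray_removal(records, is_gray, model, vectorize):
    """Remove gray data first, then let the reference model judge the rest."""
    clean, gray = [], []
    for record in records:
        (gray if is_gray(record) else clean).append(record)
    # Only the clean data is judged automatically; gray data is set aside
    # (for example, for manual review).
    predictions = model.predict(vectorize(clean)) if clean else []
    return list(zip(clean, predictions)), gray
```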
 Note that the discrimination system illustrated in FIG. 17 can be regarded as a device that selects whether or not data can be discriminated solely by conditions set using the known explanatory variables, so the discrimination system illustrated in FIG. 17 can be called a discriminable data selection system. Further, since the model updating device of the present embodiment can also realize the discrimination system illustrated in FIG. 17, the model updating device of the second embodiment can likewise be called a discriminable data selection system.
 The input unit 20, the learning data generation unit 31, the model learning unit 32, the score calculation unit 33, the condition extraction unit 34, the condition generation unit 35, the filter generation unit 61, and the output unit 70 are realized by a processor of a computer that operates in accordance with a program (a discriminable data selection program).
 Next, the operation of the model updating device of the present embodiment will be described. FIG. 18 is a flowchart showing an operation example of the clean zone extraction process performed by the model updating device 200 of the present embodiment. The input unit 20 inputs the teacher data to the learning data generation unit 31 (step S21). The learning data generation unit 31 applies the input teacher data, which includes labels indicating the determination results, to the reference discrimination model (step S22). The learning data generation unit 31 then generates first learning data in which teacher data whose determination result matches its label is treated as a positive example and teacher data whose determination result differs from its label is treated as a negative example (step S23).
 The model learning unit 32 generates a first hierarchical mixed model by heterogeneous mixture machine learning using the generated first learning data (step S24). The score calculation unit 33 calculates the TP ratio for each leaf node of the generated first hierarchical mixed model (step S25). The condition extraction unit 34 extracts the branch conditions leading to leaf nodes for which a first score satisfying the first criterion has been calculated (step S26).
 If the number of repetitions of the processing from step S24 to step S26 (that is, the processing from generating a model to extracting the branch conditions) has not reached a predetermined number (No in step S27), the processing from step S24 to step S26 is repeated. On the other hand, if the number of repetitions has reached the predetermined number (Yes in step S27), the condition generation unit 35 generates the discriminable condition by combining the branch conditions that satisfy the first criterion (step S28).
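 Continuing the earlier sketch (DecisionTreeClassifier, first_scores, and FIRST_CRITERION as defined there), the FIG. 18 flow could look roughly as follows. base_model stands for the existing reference discrimination model, X and labels are assumed NumPy arrays, and refitting with a different random seed per round is only a crude stand-in for learning the model from a different angle in each repetition.

```python
def clean_zone_extraction(base_model, X, labels, rounds=3):
    # S22-S23: positive example = the base model reproduces the label.
    y = (base_model.predict(X) == labels).astype(int)
    discriminable_conditions = []
    for r in range(rounds):                                        # S27: fixed repetitions
        model = DecisionTreeClassifier(max_depth=3, random_state=r).fit(X, y)   # S24
        scores = first_scores(model, X, y)                         # S25
        clean = [leaf for leaf, s in scores.items()
                 if s >= FIRST_CRITERION]                          # S26: clean-zone leaves
        discriminable_conditions.append((model, clean))
    return discriminable_conditions                                # S28: combined conditions
```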
 FIG. 19 is a flowchart showing an operation example of the gray zone extraction process performed by the model updating device 200 of the present embodiment. The learning data generation unit 31 generates second learning data by excluding, from the first learning data, the learning data that matches the discriminable condition (step S31). The model learning unit 32 generates a second hierarchical mixed model by heterogeneous mixture machine learning using the generated second learning data (step S32).
 The score calculation unit 33 calculates the (TP + FN) ratio for each leaf node of the generated second hierarchical mixed model (step S33). The condition extraction unit 34 extracts the branch conditions leading to leaf nodes for which a second score satisfying the second criterion has been calculated (step S34).
 If the number of repetitions of the processing from step S32 to step S34 (that is, the processing from generating a model to extracting the branch conditions) has not reached a predetermined number (No in step S35), the processing from step S32 to step S34 is repeated. On the other hand, if the number of repetitions has reached the predetermined number (Yes in step S35), the condition generation unit 35 generates the difficult-to-discriminate condition by combining the conditions that satisfy the second criterion (step S36).
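 For the second score of step S33, a sketch under the same stand-in assumptions as before (numpy and the decision-tree stand-in from the earlier sketches) is given below. How the second criterion is evaluated is defined elsewhere in the description, so SECOND_CRITERION here is purely an assumed placeholder.

```python
def second_scores(model, X, y):
    """Return {leaf_id: (TP + FN) ratio}, i.e. the share of positive-label samples."""
    leaves = model.apply(X)
    preds = model.predict(X)
    scores = {}
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        tp = np.sum((preds[mask] == 1) & (y[mask] == 1))
        fn = np.sum((preds[mask] == 0) & (y[mask] == 1))
        scores[leaf] = (tp + fn) / mask.sum()
    return scores

SECOND_CRITERION = lambda s: 0.4 <= s <= 0.6   # assumed placeholder for the second criterion
```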
 FIG. 20 is a flowchart showing an operation example of the gray data exclusion process performed by the model updating device 200 of the present embodiment. The learning data generation unit 31 generates third learning data by excluding, from the second learning data, the learning data that matches the difficult-to-discriminate condition (step S41).
 FIG. 21 is a flowchart showing an operation example of the gray zone drill-down process performed by the model updating device 200 of the present embodiment. The model learning unit 32 generates a first hierarchical mixed model using the generated third learning data (step S51). The score calculation unit 33 calculates the first score (the TP ratio) for each leaf node of the first hierarchical mixed model generated using the third learning data (step S52). The condition extraction unit 34 extracts the branch conditions leading to leaf nodes for which a first score not satisfying the first criterion has been calculated (step S53). The model learning unit 32 uses the learning data classified into a leaf node for which a branch condition was extracted to generate a third hierarchical mixed model that performs conditional branching under that leaf node (step S54).
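 The four flows of FIGS. 18 to 21 can be chained as sketched below. extract_discriminable, extract_hard, drill_down_step, and matches_any are assumed callables supplied by the caller (for example, along the lines of the sketches above); they are not components defined in the patent.

```python
def update_pipeline(base_model, X, labels, matches_any,
                    extract_discriminable, extract_hard, drill_down_step):
    """Chain the flows of FIGS. 18-21; each step is supplied as a callable."""
    y = (base_model.predict(X) == labels).astype(int)    # first learning data (S21-S23)
    discriminable = extract_discriminable(X, y)          # FIG. 18: clean zones (S24-S28)
    keep1 = ~matches_any(X, discriminable)               # S31: drop discriminable data
    X2, y2 = X[keep1], y[keep1]
    hard = extract_hard(X2, y2)                          # FIG. 19: gray zones (S32-S36)
    keep2 = ~matches_any(X2, hard)                       # S41: drop gray data
    X3, y3 = X2[keep2], y2[keep2]
    return drill_down_step(X3, y3)                       # FIG. 21: drill down (S51-S54)
```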
 As described above, in the present embodiment the learning data generation unit 31 applies the teacher data to the reference discrimination model, compares the determination results with the labels, and generates first learning data in which matching teacher data are positive examples and differing teacher data are negative examples. The model learning unit 32 generates a first hierarchical mixed model by heterogeneous mixture machine learning using the generated first learning data, and the score calculation unit 33 calculates the TP ratio (the first score) for each leaf node. The condition extraction unit 34 extracts the branch conditions leading to leaf nodes for which a first score satisfying the first criterion has been calculated, and the condition generation unit 35 generates the discriminable condition by combining the branch conditions that satisfy the first criterion.
 Accordingly, it is possible to select whether or not data can be discriminated solely by conditions set using the known explanatory variables. Moreover, even without replenishing data or learning a new discrimination model as in the first embodiment, selecting data according to the filter condition can be regarded as updating the discrimination model so that discrimination accuracy is improved while the discrimination conditions of the existing discrimination model are maintained.
 Note that the model updating device 200 of the present embodiment may also include the data replenishment unit 40 and the model generation unit 50 of the first embodiment. That is, the model updating device 200 may replenish data and learn a new discrimination model. With such a configuration, the discrimination accuracy of the existing discrimination model can be improved further.
 In the present embodiment, the model learning unit 32 learns hierarchical mixed models by heterogeneous mixture machine learning. It is therefore possible to extract both the conditions of data that can be predicted almost completely from the learning data alone and the conditions of data that cannot be predicted completely from the learning data. Specifically, data matching the former conditions can be handled by automatic determination, trusting the results of the discrimination model, while data matching the latter conditions can be handled as cases that are difficult for the discrimination model to determine and judged individually. For example, in the preliminary screening of a mortgage loan, cases that can be completely determined from the application information on the basis of rules can be separated from cases that are better judged by a person.
Embodiment 3.
 Next, a third embodiment of the model updating device according to the present invention will be described. In the first embodiment, the model updating device of the present invention was applied to general discrimination problems, including multi-value discrimination. The present embodiment focuses on binary discrimination problems in which clear rules exist behind the data and 0/1 can be specified by branch conditions alone.
 When discrimination is performed by machine learning, the input information (the explanatory variables) is all of the information used for discrimination. In practice, however, the results are sometimes derived from other external factors. When a determination depends on external factors, it is difficult to make a complete determination from the input information alone.
 Therefore, the data that a device performing discrimination using a discrimination model (hereinafter referred to as an AI (artificial intelligence)) was able to judge correctly (TP, TN) and the data that the AI could not judge correctly (FP, FN) are utilized. If the data can be separated into data that the AI can judge from the input information alone and data that requires external factors, then by excluding the data requiring external factors from the AI's judgment, the AI's judgment results become almost 100% reliable.
 The following hypotheses are defined for discrimination using an AI (sometimes referred to as AI prediction).
 1. The explanatory variables include "known variables" that are used in learning and "unknown variables" that are not used in learning.
 2. Whether an unknown variable exists or not is itself unknown at the time of analysis.
 3. If the prediction target can be determined from the known variables alone, the prediction target is answered completely correctly (discriminated).
 4. If the prediction target is affected by unknown variables, (1) the prediction target is never answered completely correctly, and (2) even if learning is repeated multiple times, accuracy deteriorates in the parts where the unknown variables determine the result. That is, even if the data is branched correctly with the known variables, when an unknown variable appears at the end, the model created from the learning data never matches completely.
 5. By repeatedly splitting and learning the prediction target from various angles using the known information, the data that can be discriminated from the known data is separated out.
 In other words, clear rules should exist for assessment-type tasks such as the credit screening described above; that is, so-called 0/1 discrimination should be possible from the conditions used to classify the target data alone. On the other hand, so-called gray data, for which 0/1 discrimination is not possible, can be regarded as data in which arbitrariness remains, so it is preferable for the user to perform the 0/1 judgment separately rather than having the discrimination model output some result. Since the model updated by the model updating device of the present embodiment makes it possible to set judgment criteria for data that cannot be clearly classified (gray data), setting such criteria makes it possible to automate the judgment of gray data.
 A specific example is described below in which the model updating device of the present embodiment is used as a device that updates a discrimination model for granting credit to trading partners. Here, the case where the discrimination model is updated using the model updating device of the first embodiment is described.
 The storage unit 10 stores, as the existing discrimination model, a model in which discriminants for determining whether or not to grant credit to a trading partner are set in the leaf nodes of a hierarchical mixed model, and binary branch conditions based on explanatory variables representing information about the credit trading partner are set in the nodes of the hierarchical mixed model.
 The user selects, from among a plurality of existing discrimination models generated in advance, the model to be used for the business task. In general, the user chooses from the candidate discrimination models a model whose branch conditions and prediction formulas, not just its discrimination accuracy, suit the operation.
 FIG. 22 is an explanatory diagram showing an example of a discrimination model based on a hierarchical mixed model. In the discrimination model illustrated in FIG. 22, the double-framed rectangles are the root node and intermediate nodes indicating branch conditions, and the plain rectangles are leaf nodes indicating discriminants (prediction formulas). For example, a trading partner who is under 30 years old and has a loan balance of 10 million or more is determined by the prediction formula with prediction formula number 1.
 FIG. 22 also illustrates the number of samples classified into each leaf node at the time of learning, at the time of evaluation, and at the time of prediction.
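 A minimal sketch of a FIG. 22-style model as a data structure is shown below: intermediate nodes hold branch conditions and leaf nodes hold linear prediction formulas. Only the age and loan-balance conditions quoted above come from the text; the rest of the tree and all coefficients are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Node:
    condition: Optional[Callable[[Dict], bool]] = None   # None for a leaf node
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    formula_no: Optional[int] = None
    weights: Optional[Dict[str, float]] = None            # linear prediction formula

def route(node: Node, record: Dict) -> Node:
    """Follow the branch conditions down to the leaf that judges the record."""
    while node.condition is not None:
        node = node.yes if node.condition(record) else node.no
    return node

def predict(leaf: Node, record: Dict) -> float:
    """Evaluate the leaf's linear prediction formula on the record."""
    return sum(w * record.get(k, 0.0) for k, w in leaf.weights.items())

model = Node(
    condition=lambda r: r["age"] < 30,
    yes=Node(condition=lambda r: r["loan_balance"] >= 10_000_000,
             yes=Node(formula_no=1, weights={"income": 0.4, "age": -0.2}),
             no=Node(formula_no=2, weights={"income": 0.7})),
    no=Node(formula_no=3, weights={"age": 0.2, "income": 0.5}),
)

record = {"age": 25, "loan_balance": 12_000_000, "income": 4.0}
leaf = route(model, record)
print(leaf.formula_no, predict(leaf, record))   # routed to prediction formula 1
```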
 FIG. 23 is an explanatory diagram showing an example of displaying the properties of each discriminant. The graph illustrated in FIG. 23 shows the result of stacking (adding up) the weights (coefficients) of the explanatory variables for each prediction formula, in the case where each discriminant (prediction formula) is expressed in linear form. For example, the output unit 70 may display the hierarchical mixed model in the format illustrated in FIG. 22, and may display the prediction formulas in the format illustrated in FIG. 23.
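 A FIG. 23-style summary can be derived from the same structure by collecting each leaf's coefficients, as sketched below (Node and model are from the previous sketch); drawing the stacked bar chart itself is omitted.

```python
def collect_formulas(node):
    """Yield (formula number, coefficient dict) for every leaf of the tree."""
    if node.condition is None:
        yield node.formula_no, node.weights
    else:
        yield from collect_formulas(node.yes)
        yield from collect_formulas(node.no)

for formula_no, weights in sorted(collect_formulas(model), key=lambda t: t[0]):
    stacked = sum(abs(w) for w in weights.values())
    print(f"prediction formula {formula_no}: {weights} (stacked magnitude {stacked:.2f})")
```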
 The input unit 20 inputs the data for which the propriety of credit is to be determined.
 The data extraction unit 30 tallies, for the trading partner data classified into each leaf node, the correct-answer ratio, that is, the proportion of data whose attached correct label is positive and whose credit determination result is also positive. The data extraction unit 30 then extracts the trading partner data classified under conditions whose correct-answer ratio is equal to or below a predetermined threshold.
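 A minimal sketch of this aggregation is shown below, assuming each trading-partner record has already been routed to a leaf (for example with route() from the FIG. 22 sketch) and carries its correct label and credit decision; THRESHOLD is an assumed value.

```python
from collections import defaultdict

THRESHOLD = 0.9   # assumed correct-answer ratio threshold

def extract_low_accuracy_partners(records):
    """records: iterable of (leaf_id, label, decision, partner_data) tuples."""
    per_leaf = defaultdict(list)
    for leaf_id, label, decision, partner in records:
        per_leaf[leaf_id].append((label, decision, partner))
    extracted = []
    for leaf_id, rows in per_leaf.items():
        correct = sum(1 for label, decision, _ in rows if label == 1 and decision == 1)
        ratio = correct / len(rows)
        if ratio <= THRESHOLD:                     # correct-answer ratio too low
            extracted.extend(partner for _, _, partner in rows)
    return extracted
```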
 The data replenishment unit 40 accepts, for the extracted trading partner data, replenishment in the form of at least one of adding explanatory variables and updating the correct labels.
 The model generation unit 50 generates a discrimination model using the replenished trading partner data. As in the first embodiment, the model generation unit 50 may generate any type of discrimination model.
 The model update unit 60 generates a model indicating that data which does not satisfy the conditions under which the extracted trading partner data is classified is applied to the hierarchical mixed model, and data which satisfies those conditions is applied to the newly generated discrimination model.
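 A minimal sketch of the resulting updated model: a new top-level test routes records matching the extracted condition to the newly generated model and all other records to the existing hierarchical mixed model. extracted_condition, new_model, and existing_model stand for the objects built in the preceding steps, and a shared predict_one interface is assumed for illustration.

```python
class UpdatedModel:
    def __init__(self, extracted_condition, new_model, existing_model):
        self.cond = extracted_condition        # condition of the extracted data
        self.new_model = new_model             # model learned from replenished data
        self.existing_model = existing_model   # original hierarchical mixed model

    def predict_one(self, record):
        # Matching records go to the new model, all others to the existing model.
        target = self.new_model if self.cond(record) else self.existing_model
        return target.predict_one(record)
```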
 The output unit 70 outputs the generated discrimination model. The output unit 70 may output the generated discrimination model in the formats illustrated in FIGS. 22 and 23 described above.
 As described above, in the present embodiment the model updating device updates a discrimination model that grants credit to trading partners, so data indicating trading partners that should be checked manually (gray data) can be extracted while the discrimination conditions of the existing discrimination model are maintained.
 More specifically, data that cannot be judged with the explanatory variables in use (that is, gray data) can be extracted while the interpretability and other properties of the discrimination model are maintained. Making such data explicit also makes it possible, for example, to examine which explanatory variables are needed.
 FIG. 24 is an explanatory diagram showing an example of the result of classifying data with the discrimination model. In the example shown in FIG. 24, when the existing discrimination model M12 illustrated in FIG. 22 is used, the data group D11 is classified, via the classification processing indicated by the thick arrows, into the leaf node of prediction formula number 5. Because the data group D11 contains four incorrectly determined data items, the discrimination accuracy at this leaf node is degraded.
 FIG. 25 is an explanatory diagram showing an example of the result of classifying data with the updated discrimination model. In the example shown in FIG. 25, updating the discrimination model illustrated in FIG. 24 produces a new discrimination model M11, and the discrimination model as a whole, including a new branch condition M13, is updated. Because this new branch condition M13 routes the data group D12, which contains the data incorrectly determined at the leaf node of prediction formula number 5, to the new discrimination model M11, the discrimination accuracy of the existing discrimination model M12 can be improved.
 Next, an overview of the present invention will be described. FIG. 26 is a block diagram showing the overview of the model updating device according to the present invention. A model updating device 80 according to the present invention is a model updating device (for example, the model updating device 100) that updates a hierarchical mixed model, and includes a data extraction unit 81 (for example, the data extraction unit 30) that extracts data classified under a target condition in the hierarchical mixed model, a data replenishment unit 82 (for example, the data replenishment unit 40) that accepts replenishment for the extracted data, a model generation unit 83 (for example, the model generation unit 50) that generates a discrimination model using the replenished data, and a model update unit 84 (for example, the model update unit 60) that generates a model in which an intermediate node classifying the data that satisfies the target condition is placed at the top of the hierarchical mixed model.
 The model update unit 84 generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model, which corresponds to a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model, which corresponds to a leaf node of the intermediate node.
 With such a configuration, the discrimination model can be updated so that discrimination accuracy is improved while the discrimination conditions of the existing discrimination model are maintained.
 The model updating device 80 may further include a determination result aggregation unit (for example, the data extraction unit 30) that aggregates the determination results of the data classified under each condition. The data extraction unit 81 may then extract the data classified under conditions for which the determination results do not satisfy a criterion. With such a configuration, the portions of the conditions for which the explanatory variables are presumed to be insufficient can be identified and extracted.
 As a specific aspect, binary discrimination conditions defined on the basis of the explanatory variables (for example, clear rules) may be set in each node of the hierarchical mixed model. In this case, the determination result aggregation unit may aggregate the correct-answer ratio, that is, the proportion of the data classified under each condition that is truly positive and whose determination result is also positive. The data extraction unit 81 may then extract the data classified under conditions whose correct-answer ratio is equal to or below a predetermined threshold, and the model update unit 84 may generate a model indicating that data not satisfying the condition is applied to the hierarchical mixed model and data satisfying the condition is applied to the discrimination model.
 As a further specific aspect, discriminants for determining whether or not to grant credit to a trading partner may be set in the leaf nodes of the hierarchical mixed model, and binary branch conditions based on explanatory variables representing information about the credit trading partner may be set in the nodes of the hierarchical mixed model. In this case, the determination result aggregation unit may aggregate, for the trading partner data classified into each leaf node, the correct-answer ratio of data whose attached correct label is positive and whose credit determination result is also positive, and the data extraction unit may extract the trading partner data classified under conditions whose correct-answer ratio is equal to or below a predetermined threshold. The data replenishment unit 82 may then accept, for the extracted trading partner data, replenishment in the form of at least one of adding explanatory variables and updating the correct labels; the model generation unit 83 may generate a discrimination model using the replenished trading partner data; and the model update unit 84 may generate a model indicating that data not satisfying the conditions under which the extracted trading partner data is classified is applied to the hierarchical mixed model, and data satisfying those conditions is applied to the discrimination model.
 The model generation unit 83 may learn a discrimination model expressed as a hierarchical mixed model. With such a configuration, the conditions of the portions for which the explanatory variables are presumed to be insufficient can be drilled down and discriminated.
 The model updating device 80 may also include an output unit (for example, the output unit 70) that outputs information on the discrimination models. The model generation unit 83 may generate a plurality of types of discrimination models, and the output unit may output the discrimination conditions and discriminants of the plurality of types of discrimination models (for example, the results illustrated in FIGS. 22 and 23). With such a configuration, the content under the drilled-down conditions can be presented to the user for selection.
 Specifically, the data replenishment unit 82 may accept, for the extracted data, the addition of explanatory variables or the updating of teacher labels.
 Further, the data extraction unit 81 may extract data classified under a target condition in the model generated by the model update unit, and the data replenishment unit 82 may accept replenishment for the extracted data. In addition, the model generation unit 83 may generate another discrimination model using the replenished data, and the model update unit 84 may generate a model indicating that data satisfying the target condition is applied to the other discrimination model. Generating discrimination models repeatedly in this way makes it possible to further improve the accuracy of the existing discrimination model.
 FIG. 27 is a schematic block diagram showing the configuration of a computer according to at least one embodiment. The computer 1000 includes a processor 1001, a main storage device 1002, an auxiliary storage device 1003, and an interface 1004.
 The model updating device described above is implemented in the computer 1000. The operation of each processing unit described above is stored in the auxiliary storage device 1003 in the form of a program (a model updating program). The processor 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and executes the above processing in accordance with the program.
 In at least one embodiment, the auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs (Compact Disc Read-Only Memory), DVD-ROMs (Read-Only Memory), and semiconductor memories connected via the interface 1004. When the program is distributed to the computer 1000 over a communication line, the computer 1000 that has received the distribution may load the program into the main storage device 1002 and execute the above processing.
 The program may be one that realizes part of the functions described above. Furthermore, the program may be a so-called differential file (differential program) that realizes the functions described above in combination with another program already stored in the auxiliary storage device 1003.
 Some or all of the above embodiments can also be described as in the following supplementary notes, but are not limited to the following.
(Supplementary note 1) A model updating device that updates a hierarchical mixed model, the model updating device including: a data extraction unit that extracts data classified under a target condition in the hierarchical mixed model; a data replenishment unit that accepts replenishment for the extracted data; a model generation unit that generates a discrimination model using the replenished data; and a model update unit that generates a model in which an intermediate node classifying data that satisfies the target condition is placed at the top of the hierarchical mixed model, wherein the model update unit generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model, which is a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model, which is a leaf node of the intermediate node.
(Supplementary note 2) The model updating device according to supplementary note 1, further including a determination result aggregation unit that aggregates determination results of the data classified under each condition, wherein the data extraction unit extracts data classified under a condition for which the determination results of the data do not satisfy a criterion.
(Supplementary note 3) The model updating device according to supplementary note 2, wherein binary discrimination conditions defined on the basis of explanatory variables are set in each node of the hierarchical mixed model, the determination result aggregation unit aggregates a correct-answer ratio that is the proportion of the data classified under each condition that is truly positive and whose determination result is also positive, the data extraction unit extracts data classified under a condition whose correct-answer ratio is equal to or below a predetermined threshold, and the model update unit generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model and data satisfying the condition is applied to the discrimination model.
(Supplementary note 4) The model updating device according to supplementary note 2 or 3, wherein discriminants for determining whether or not to grant credit to a trading partner are set in leaf nodes of the hierarchical mixed model and binary branch conditions based on explanatory variables representing information about the credit trading partner are set in each node of the hierarchical mixed model, the determination result aggregation unit aggregates, for the trading partner data classified into each leaf node, a correct-answer ratio of data whose attached correct label is positive and whose credit determination result is also positive, the data extraction unit extracts trading partner data classified under a condition whose correct-answer ratio is equal to or below a predetermined threshold, the data replenishment unit accepts, for the extracted trading partner data, replenishment in the form of at least one of adding explanatory variables and updating correct labels, the model generation unit generates a discrimination model using the replenished trading partner data, and the model update unit generates a model indicating that data not satisfying the conditions under which the extracted trading partner data is classified is applied to the hierarchical mixed model and data satisfying the conditions is applied to the discrimination model.
(Supplementary note 5) The model updating device according to any one of supplementary notes 1 to 4, wherein the model generation unit learns a discrimination model expressed as a hierarchical mixed model.
(Supplementary note 6) The model updating device according to any one of supplementary notes 1 to 5, further including an output unit that outputs information on the discrimination model, wherein the model generation unit generates a plurality of types of discrimination models and the output unit outputs discrimination conditions and discriminants of the plurality of types of discrimination models.
(Supplementary note 7) The model updating device according to any one of supplementary notes 1 to 6, wherein the data replenishment unit accepts, for the extracted data, addition of explanatory variables or updating of teacher labels.
(Supplementary note 8) The model updating device according to any one of supplementary notes 1 to 7, wherein the data extraction unit extracts data classified under a target condition in the model generated by the model update unit, the data replenishment unit accepts replenishment for the extracted data, the model generation unit generates another discrimination model using the replenished data, and the model update unit generates a model indicating that data satisfying the target condition is applied to the other discrimination model.
(Supplementary note 9) A model updating method for updating a hierarchical mixed model, the method including: extracting data classified under a target condition in the hierarchical mixed model; accepting replenishment for the extracted data; generating a discrimination model using the replenished data; and generating a model in which an intermediate node classifying data that satisfies the target condition is placed at the top of the hierarchical mixed model, wherein, when generating the model, a model is generated indicating that data not satisfying the condition is applied to the hierarchical mixed model, which is a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model, which is a leaf node of the intermediate node.
(Supplementary note 10) The model updating method according to supplementary note 9, further including aggregating determination results of the data classified under each condition and extracting data classified under a condition for which the determination results of the data do not satisfy a criterion.
(Supplementary note 11) A model updating program applied to a computer that updates a hierarchical mixed model, the program causing the computer to execute: a data extraction process of extracting data classified under a target condition in the hierarchical mixed model; a data replenishment process of accepting replenishment for the extracted data; a model generation process of generating a discrimination model using the replenished data; and a model update process of generating a model in which an intermediate node classifying data that satisfies the target condition is placed at the top of the hierarchical mixed model, wherein, in the model update process, a model is generated indicating that data not satisfying the condition is applied to the hierarchical mixed model, which is a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model, which is a leaf node of the intermediate node.
(Supplementary note 12) The model updating program according to supplementary note 11, causing the computer to execute a determination result aggregation process of aggregating determination results of the data classified under each condition, and, in the data extraction process, to extract data classified under a condition for which the determination results of the data do not satisfy a criterion.
 While the invention of the present application has been described above with reference to the embodiments and examples, the invention of the present application is not limited to the above embodiments and examples. Various changes that those skilled in the art can understand may be made to the configuration and details of the invention of the present application within the scope of the invention of the present application.
 This application claims priority based on Japanese Patent Application No. 2018-158155 filed on August 27, 2018, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
 10 Storage unit
 20 Input unit
 30 Data extraction unit
 40 Data replenishment unit
 50 Model generation unit
 60 Model update unit
 70 Output unit
 100 Model updating device

Claims (12)

  1.  A model updating device that updates a hierarchical mixed model, the model updating device comprising:
     a data extraction unit that extracts data classified under a target condition in the hierarchical mixed model;
     a data replenishment unit that accepts replenishment for the extracted data;
     a model generation unit that generates a discrimination model using the replenished data; and
     a model update unit that generates a model in which an intermediate node classifying data that satisfies the target condition is placed at the top of the hierarchical mixed model,
     wherein the model update unit generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model corresponding to a leaf node of the intermediate node.
  2.  The model updating device according to claim 1, further comprising a determination result aggregation unit that aggregates determination results of the data classified under each condition,
     wherein the data extraction unit extracts data classified under a condition for which the determination results of the data do not satisfy a criterion.
  3.  The model updating device according to claim 2, wherein
     binary discrimination conditions defined on the basis of explanatory variables are set in each node of the hierarchical mixed model,
     the determination result aggregation unit aggregates a correct-answer ratio that is the proportion of the data classified under each condition that is truly positive and whose determination result is also positive,
     the data extraction unit extracts data classified under a condition whose correct-answer ratio is equal to or below a predetermined threshold, and
     the model update unit generates a model indicating that data not satisfying the condition is applied to the hierarchical mixed model and data satisfying the condition is applied to the discrimination model.
  4.  The model updating device according to claim 2 or 3, wherein
     discriminants for determining whether or not to grant credit to a trading partner are set in leaf nodes of the hierarchical mixed model, and binary branch conditions based on explanatory variables representing information about the credit trading partner are set in each node of the hierarchical mixed model,
     the determination result aggregation unit aggregates, for the trading partner data classified into each leaf node, a correct-answer ratio of data whose attached correct label is positive and whose credit determination result is also positive,
     the data extraction unit extracts trading partner data classified under a condition whose correct-answer ratio is equal to or below a predetermined threshold,
     the data replenishment unit accepts, for the extracted trading partner data, replenishment in the form of at least one of adding explanatory variables and updating correct labels,
     the model generation unit generates a discrimination model using the replenished trading partner data, and
     the model update unit generates a model indicating that data not satisfying the conditions under which the extracted trading partner data is classified is applied to the hierarchical mixed model and data satisfying the conditions is applied to the discrimination model.
  5.  The model updating device according to any one of claims 1 to 4, wherein the model generation unit learns a discrimination model expressed as a hierarchical mixed model.
  6.  The model updating device according to any one of claims 1 to 5, further comprising an output unit that outputs information on the discrimination model,
     wherein the model generation unit generates a plurality of types of discrimination models, and
     the output unit outputs discrimination conditions and discriminants of the plurality of types of discrimination models.
  7.  The model updating device according to any one of claims 1 to 6, wherein the data replenishment unit accepts, for the extracted data, addition of explanatory variables or updating of teacher labels.
  8.  The model updating device according to any one of claims 1 to 7, wherein
     the data extraction unit extracts data classified under a target condition in the model generated by the model update unit,
     the data replenishment unit accepts replenishment for the extracted data,
     the model generation unit generates another discrimination model using the replenished data, and
     the model update unit generates a model indicating that data satisfying the target condition is applied to the other discrimination model.
  9.  A model updating method for updating a hierarchical mixed model, the method comprising:
     extracting data classified under a target condition in the hierarchical mixed model;
     accepting replenishment for the extracted data;
     generating a discrimination model using the replenished data; and
     generating a model in which an intermediate node classifying data that satisfies the target condition is placed at the top of the hierarchical mixed model,
     wherein, when generating the model, a model is generated indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model corresponding to a leaf node of the intermediate node.
  10.  The model updating method according to claim 9, further comprising aggregating determination results of the data classified under each condition, and extracting data classified under a condition for which the determination results of the data do not satisfy a criterion.
  11.  A model updating program applied to a computer that updates a hierarchical mixed model, the program causing the computer to execute:
     a data extraction process of extracting data classified under a target condition in the hierarchical mixed model;
     a data replenishment process of accepting replenishment for the extracted data;
     a model generation process of generating a discrimination model using the replenished data; and
     a model update process of generating a model in which an intermediate node classifying data that satisfies the target condition is placed at the top of the hierarchical mixed model,
     wherein, in the model update process, a model is generated indicating that data not satisfying the condition is applied to the hierarchical mixed model corresponding to a leaf node of the intermediate node, and data satisfying the condition is applied to the discrimination model corresponding to a leaf node of the intermediate node.
  12.  The model updating program according to claim 11, causing the computer to execute a determination result aggregation process of aggregating determination results of the data classified under each condition, and, in the data extraction process, to extract data classified under a condition for which the determination results of the data do not satisfy a criterion.
PCT/JP2019/027687 2018-08-27 2019-07-12 Model updating device, model updating method, and model updating program WO2020044814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018158155 2018-08-27
JP2018-158155 2018-08-27

Publications (1)

Publication Number Publication Date
WO2020044814A1 (en) 2020-03-05

Family

ID=69643270

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/027687 WO2020044814A1 (en) 2018-08-27 2019-07-12 Model updating device, model updating method, and model updating program

Country Status (1)

Country Link
WO (1) WO2020044814A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017058848A (en) * 2015-09-15 2017-03-23 日本電気株式会社 Information processing system, information processing method, and program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021197089A (en) * 2020-06-18 2021-12-27 ヤフー株式会社 Output device, output method, and output program
JP7170689B2 (en) 2020-06-18 2022-11-14 ヤフー株式会社 Output device, output method and output program

Similar Documents

Publication Publication Date Title
Fernandes et al. Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning
CN109492945A (en) Business risk identifies monitoring method, device, equipment and storage medium
JP7173332B2 (en) Fraud detection device, fraud detection method, and fraud detection program
Alsubaie et al. Cost-sensitive prediction of stock price direction: Selection of technical indicators
CN112348519A (en) Method and device for identifying fraudulent user and electronic equipment
CN109034201B (en) Model training and rule mining method and system
WO2021174760A1 (en) Voiceprint data generation method and device, computer device, and storage medium
CN112070138A (en) Multi-label mixed classification model construction method, news classification method and system
CN112270546A (en) Risk prediction method and device based on stacking algorithm and electronic equipment
WO2021111540A1 (en) Evaluation method, evaluation program, and information processing device
CN111160959B (en) User click conversion prediction method and device
CN116186611A (en) Unbalanced data classification method, device, terminal equipment and medium
CN111582315A (en) Sample data processing method and device and electronic equipment
WO2020044814A1 (en) Model updating device, model updating method, and model updating program
US20210342707A1 (en) Data-driven techniques for model ensembles
CN112598405A (en) Business project data management method and system based on big data
CN112508684A (en) Joint convolutional neural network-based collection risk rating method and system
WO2020044815A1 (en) Discriminable data sorting system, method, and program
CN112685374A (en) Log classification method and device and electronic equipment
JP2005222445A (en) Information processing method and analysis device in data mining
CN110781293A (en) Validating training data for a classifier
Khidmat et al. Machine learning in the boardroom: gender diversity prediction using boosting and undersampling methods
CN115994331A (en) Message sorting method and device based on decision tree
CN112015861A (en) Intelligent test paper algorithm based on user historical behavior analysis
JP5946949B1 (en) DATA ANALYSIS SYSTEM, ITS CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19856156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19856156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP