WO2022254607A1

WO2022254607A1 - Information processing device, difference extraction method, and non-temporary computer-readable medium

Info

Publication number: WO2022254607A1
Application number: PCT/JP2021/020987
Authority: WO
Inventors: 瑞蒋
Original assignee: 日本電気株式会社
Priority date: 2021-06-02
Filing date: 2021-06-02
Publication date: 2022-12-08
Also published as: JPWO2022254607A1

Abstract

Provided is an information processing device (1) capable of efficiently evaluating a learning model. The information processing device (1) is provided with: an analysis unit (2) for extracting the difference between a first explanatory variable included in first case information indicating information that pertains to a design pattern of a first learning model and a second explanatory variable included in second case information indicating information that pertains to a design pattern of a second learning model; a calculation unit (3) that, when the difference is extracted, calculates a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculates a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and, an output unit (4) for outputting the extraction result of the analysis unit (2) and the calculation result of the calculation unit (3).

Description

Information processing device, difference extraction method, and non-transitory computer-readable medium

The present disclosure relates to an information processing device, a difference extraction method, and a non-transitory computer-readable medium.

AI (Artificial Intelligence) utilization is required in many fields and systems. In AI utilization, a learning model is evaluated based on past design information and the differences between learning models, and whether its prediction accuracy is good or poor is determined. For example, Patent Literature 1 discloses a system that evaluates performance by comparing candidate algorithms used for machine learning in order to create a learning model.

JP 2017-004509

In general, in order to create a good learning model, it is necessary to consider new design patterns by analyzing the learning model and checking the information used in already tried design patterns. The design pattern information includes, for example, algorithms, objective variables, explanatory variables, hyperparameters, data used for learning and verification, and the like. Therefore, the person in charge of analysis determines whether or not there is a difference in each type of information, and analyzes how the information with the difference affects the prediction accuracy of the learning model. However, depending on the skill level of the person in charge of analysis, the information with the difference may be overlooked, or the degree of influence of the information with the difference on the learning model may not be determined. Therefore, in order to make up for the difference in the skill level of the person in charge of analysis, for example, measures such as final confirmation by an expert are taken. However, if such a measure is taken, the work time and cost of the expert will be generated, and the burden on the expert will be increased.

One object of the present disclosure is to solve the above problems and to provide an information processing device, a difference extraction method, and a non-transitory computer-readable medium capable of efficiently evaluating a learning model.

The information processing device according to the present disclosure includes:
analysis means for extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
calculation means for calculating, when the difference is extracted, a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
output means for outputting the extraction result of the analysis means and the calculation result of the calculation means.

The difference extraction method according to the present disclosure includes:
extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
calculating, when the difference is extracted, a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
outputting the extraction result and the calculation result.

A non-transitory computer-readable medium according to the present disclosure stores a program that causes an information processing device to execute a difference extraction method, the difference extraction method including:
extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
calculating, when the difference is extracted, a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
outputting the extraction result and the calculation result.

According to the present disclosure, it is possible to provide an information processing device, a difference extraction method, and a non-transitory computer-readable medium capable of efficiently evaluating a learning model.

FIG. 1 is a block diagram showing a configuration example of the information processing device according to the first embodiment.
FIG. 2 is a diagram showing a configuration example of the information processing device according to the second embodiment.
FIG. 3 is a diagram showing information held by an information analysis unit.
The remaining drawings are, in order: a table showing the processing target data relevant to the analysis process; a diagram showing an example of a determination table; two flowcharts showing an operation example of the information processing device according to the second embodiment; seven diagrams, each showing an example of an extraction result; two diagrams for explaining the processing for narrowing down the difference display; and a diagram showing a hardware configuration example of the information processing device and the like.

Embodiments of the present disclosure will be described below with reference to the drawings. Note that the following descriptions and drawings are appropriately omitted and simplified for clarity of explanation. Further, in each drawing below, the same elements are denoted by the same reference numerals, and redundant description is omitted as necessary.

First, the terms used in this disclosure will be explained. In the present disclosure, a design pattern, that is, a pattern for designing a learning model, is referred to as a "case." A "case" is also defined as a term that can include design information for creating, validating, and evaluating analysis models. Design information can include the specification of the AI engine, the specification of the data for learning, the data for verification, and the data for evaluation, the specification of hyperparameters and data division conditions, and the specification of parameters other than hyperparameters used to execute the AI engine. Furthermore, the design information may include the source code of the AI engine execution program, and the like. For example, when the first learning model is created based on the first design pattern, the first design pattern is referred to as the first case, and the information about the first design pattern (the information used for the first design pattern) is referred to as the first case information. Also, in the present disclosure, a learning model may be referred to as an analysis model.

(First embodiment)
A configuration example of the information processing device 1 according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram illustrating a configuration example of the information processing device according to the first embodiment. The information processing device 1 may be a personal computer or a server. The information processing device 1 includes an analysis unit 2, a calculation unit 3, and an output unit 4.

The analysis unit 2 extracts the difference between the first explanatory variable included in the first case information, which indicates information regarding the design pattern of the first learning model, and the second explanatory variable included in the second case information, which indicates information regarding the design pattern of the second learning model. The analysis unit 2 may acquire the first case information and the second case information from a storage device (not shown) that holds them. Alternatively, the analysis unit 2 may acquire the first case information and the second case information as input from an input device (not shown). The storage device and the input device may each be provided inside or outside the information processing device 1. The analysis unit 2 extracts the difference between the first explanatory variable and the second explanatory variable from the acquired first case information and second case information.
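The difference extraction performed by the analysis unit 2 can be pictured as a comparison of two lists of explanatory variable names. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the dictionary layout of the case information, the function name `extract_explanatory_diff`, and the sample variable names are all hypothetical.

```python
def extract_explanatory_diff(first_case, second_case):
    """Return the explanatory variable names that appear in only one case.

    Each case is assumed to be a dict holding an "explanatory_variables"
    list; this layout is an illustration, not taken from the disclosure.
    """
    first = first_case["explanatory_variables"]
    second = second_case["explanatory_variables"]
    return {
        "only_in_first": [v for v in first if v not in second],
        "only_in_second": [v for v in second if v not in first],
    }


# Hypothetical case information for two design patterns.
case_1 = {"explanatory_variables": ["temperature", "precipitation"]}
case_2 = {"explanatory_variables": ["temperature", "actual_2_days_ago"]}

diff = extract_explanatory_diff(case_1, case_2)
print(diff)
```

An empty result in both lists would mean that no difference was extracted, in which case the correlation calculation need not be triggered.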

When the analysis unit 2 extracts the difference, the calculation unit 3 calculates a first correlation coefficient between the first explanatory variable and the first objective variable included in the first case information, and calculates a second correlation coefficient between the second explanatory variable and the second objective variable included in the second case information. The first correlation coefficient is an index value indicating the relationship between the first explanatory variable and the first objective variable. The calculation unit 3 may calculate the first correlation coefficient by dividing the covariance between the first explanatory variable and the first objective variable by the product of the standard deviation of the first explanatory variable and the standard deviation of the first objective variable. Likewise, the second correlation coefficient is an index value indicating the relationship between the second explanatory variable and the second objective variable. As with the first correlation coefficient, the calculation unit 3 may calculate the second correlation coefficient by dividing the covariance between the second explanatory variable and the second objective variable by the product of the standard deviation of the second explanatory variable and the standard deviation of the second objective variable.
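The calculation described here, covariance divided by the product of the two standard deviations, is the usual Pearson correlation coefficient. A minimal sketch follows; the function name `correlation_coefficient` and the sample values are assumptions made for illustration, and population (divide-by-n) statistics are used, which cancel out in the ratio.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation: covariance / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (std_x * std_y)


# Hypothetical explanatory variable and objective variable values.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
demand = [300.0, 320.0, 350.0, 390.0, 440.0]
r = correlation_coefficient(temperature, demand)
print(round(r, 3))
```

A value close to 1 or -1 indicates a strong linear relationship between the explanatory variable and the objective variable, and a value near 0 indicates a weak one.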

The output unit 4 outputs the extraction result of the analysis unit 2 and the calculation result of the calculation unit 3. The output unit 4 may output the extraction result of the analysis unit 2 and the calculation result of the calculation unit 3 to an output device (not shown) provided inside or outside the information processing apparatus 1 .

As described above, the information processing device 1 uses the analysis unit 2 to extract the difference between the explanatory variables included in the two pieces of case information. When the difference between the explanatory variables is extracted, the information processing device 1 uses the calculation unit 3 to calculate, for each piece of case information, the correlation coefficient between the explanatory variable and the objective variable. The information processing device 1 outputs the extraction result of the analysis unit 2 and the calculation result of the calculation unit 3. A correlation coefficient is an index value that indicates the relationship between an explanatory variable and an objective variable. Therefore, by checking the results output from the information processing device 1, the person in charge of analysis who analyzes each piece of case information can grasp the difference between the explanatory variables and the explanatory variables that affect the objective variable. Since the learning model is a model that predicts the objective variable based on the explanatory variables, the correlation coefficient can also be regarded as an index value of the influence of the explanatory variables on the learning model. Therefore, by checking the results output from the information processing device 1, the person in charge of analysis can grasp the difference between the explanatory variables and also grasp the explanatory variables that affect the learning model. Consequently, the information processing device 1 according to the first embodiment makes it possible to evaluate the learning model efficiently regardless of the skill level of the person in charge of analysis.

(Second embodiment)
Next, a second embodiment will be described. The second embodiment is a concrete embodiment of the first embodiment.
<Configuration example of information processing device>
A configuration example of the information processing device 100 according to the second embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating a configuration example of the information processing device according to the second embodiment. The information processing device 100 corresponds to the information processing device 1 according to the first embodiment. The information processing device 100 is a device that analyzes an analysis model, which is a machine-learned learning model. The information processing device 100 may be a personal computer or a server. The information processing device 100 includes a repository 10, a processing device 20, an input device 30, and an output device 40. Note that in the following description, the learning model analyzed by the information processing device 100 is described as an analysis model.

The repository 10 is a storage device that stores (holds) case information analyzed by the information processing device 100 and various types of information related to the case information. The repository 10 may be, for example, the NEC Advanced Analytics Platform Modeler (AAPF Modeler). The repository 10 has an information holding unit 11 .

The information holding unit 11 receives, from the information input unit 21 provided in the processing device 20, the various types of information accepted by the information input unit 21, and holds that information. The information holding unit 11 may be called a storage unit.

Various types of information held (accumulated) by the information holding unit 11 will now be described with reference to FIG. 3. FIG. 3 is a diagram showing information held by an information analysis unit. As shown in FIG. 3, the information holding unit 11 holds analysis summary information, case information, analysis model information, evaluation record information, and assignment information.

Analysis summary information is created for each analysis objective that is to be analyzed using an analysis model, which is a learning model. For example, if a user (person in charge of analysis) who uses an analysis model wants to perform power demand forecasting and sets power demand forecasting as the analysis objective, analysis summary information with "power demand forecast" as the analysis objective is created. Similarly, when a user who uses an analysis model wants to make a sales forecast, as distinct from a power demand forecast, and sets the sales forecast as the analysis objective, analysis summary information with "sales forecast" as the analysis objective is created. Analysis summary information includes an analysis summary name, an analysis objective, a prediction objective, and a target accuracy index value.

The name of the analysis summary is set in the analysis summary name.
The analysis objective is set to the purpose for which the analysis model is created. In the above example, the analysis objective is set to, for example, "power demand forecast" or "sales forecast."

The prediction objective is set to the type of analysis performed by machine learning. Types of analysis performed in machine learning include, for example, regression analysis and classification in supervised learning. Therefore, information that can specify the type of analysis, such as regression analysis or classification in supervised learning, is set as the prediction objective.

The target accuracy index value is set to an accuracy index value that serves as the target for the prediction accuracy of the analysis model created based on the analysis summary information. In other words, it indicates the target index value for the prediction accuracy of the analysis models created from the plurality of cases based on the analysis summary information. As the target accuracy index value, an item related to the accuracy index and a numerical value for that item are set; for example, the mean absolute percentage error is set to XX%. Items related to the accuracy index include, for example, the following.
<Items related to accuracy index>
・Mean Absolute Error (MAE)
・Mean Squared Error (MSE)
・Root Mean Squared Error (RMSE)
・Mean Absolute Percentage Error (MAPE)
・Root Mean Squared Percentage Error (RMSPE)
・CoD: Coefficient of Determination
・AUC (Area Under Curve): index value indicating the area under the ROC (Receiver Operating Characteristic) curve
・PR-AUC (PR-Area Under Curve): index value indicating the area under the PR (Precision Recall) curve
・TP (true positive): number of correct positive predictions
・FP (false positive): number of incorrect positive predictions
・TN (true negative): number of correct negative predictions
・FN (false negative): number of incorrect negative predictions
・Accuracy
・Precision
・Recall rate
・Specificity
・False positive rate (FPR)
・False Negative Rate (FNR)
・F-measure
・Matthews Correlation Coefficient (MCC)
・Logloss (Logarithmic Loss)
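Several of the classification items above can be derived directly from the four counts TP, FP, TN, and FN. The sketch below uses the standard textbook definitions, which the source does not spell out; the function name `classification_metrics` and the sample counts are assumptions, and the code assumes every denominator is non-zero.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Derive common accuracy index items from confusion-matrix counts.

    Assumes all denominators are non-zero (no empty class or prediction).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),  # false positive rate
        "fnr": fn / (fn + tp),  # false negative rate
        "f_measure": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts for a binary classifier.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for item, value in m.items():
    print(f"{item}: {value:.3f}")
```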

The case information is information about cases (design patterns) for creating an analysis model based on the analysis summary information. When one piece of analysis summary information is created, an analysis model with high prediction accuracy is created according to the analysis objective, prediction objective, and target accuracy index value included in the analysis summary information. It is generally difficult to create an analysis model with high prediction accuracy from a single design, so multiple designs are tried in order to arrive at one. Therefore, a plurality of pieces of case information are created from one piece of analysis summary information. In other words, the analysis summary information bundles a plurality of pieces of case information, and the information holding unit 11 holds, for example, the analysis summary information and the case information hierarchically, with the case information stored one level below the analysis summary information. The corresponding information can therefore be identified by tracing the hierarchy in which the analysis summary information and the case information are held. Case information includes a case name, learning candidate data, an AI engine algorithm, hyperparameters, an objective variable, explanatory variables, and an addressed issue.

The case name is set to a name that identifies the case for designing an analysis model.
The learning candidate data is set to the set of data that may be used to create an analysis model. Specifically, it holds the names of a plurality of variables that can be used as objective variables and explanatory variables, together with data such as the numerical values of each variable. Note that the learning candidate data may include variables that are used as neither objective variables nor explanatory variables.

The AI engine algorithm is set with the AI engine name and the name of the algorithm used by the AI engine. AI engine is a general term for AI that performs analysis based on a specific algorithm classification. An AI engine refers to a system that realizes analysis processing such as prediction and discrimination by generating an analysis model using machine learning technology according to a predetermined data analysis method. The AI engine is, for example, a commercial software program or a software program provided as open source. AI engines include, for example, scikit-learn and PyTorch.

For the objective variable, the variable name (objective variable name) of the information to be predicted by the analysis model (data to be predicted) and its data type are set. The data type of the objective variable is a label that indicates the type of value of the objective variable and is used for classification. Examples of data types include categorical types and numeric types. For example, if the analysis objective is "power demand forecast", the objective variable is set to "result (10,000 kW)", which is the objective variable name for the actual power value related to power demand, together with the data type of the objective variable.

The explanatory variables are the variables that the analysis model uses when making predictions; the names of the variables assumed to affect the objective variable (explanatory variable names) are set. All explanatory variable names are set, for example, in the form of a variable list. For example, if the analysis objective is "power demand forecast", variable names such as "temperature", "precipitation", and "Actual (10,000 kW)_2 days ago" (the actual power value two days earlier), which are used to forecast the power demand that is the objective variable, are set in list format as a variable list.

The addressed issue, the last item of the case information, is information related to the assignment information described later, and is set to the issue that the case addresses. For example, suppose that evaluating an analysis model created from a certain case reveals that the data on "temperature" included in the learning candidate data is insufficient; the issue "data on 'temperature' is insufficient" is then set in the assignment information. If a newly considered case is based on learning candidate data to which data on "temperature" has been added, the addressed issue in the case information for that case is set to "data on 'temperature' is insufficient".

Information about the analysis model created from one piece of case information is set in the analysis model information. Since a plurality of analysis models may be created from one case, at least one piece of analysis model information is associated with one piece of case information. The information holding unit 11 holds the analysis summary information, case information, and analysis model information hierarchically. Specifically, the information holding unit 11 holds them so that the case information is stored one level below the analysis summary information and the analysis model information is stored one level below the case information. The corresponding information can therefore be identified by tracing the hierarchy in which the analysis summary information, the case information, and the analysis model information are held. The analysis model information includes an analysis model name, a forecast/actual log, an explanatory variable column correspondence map, learning data, evaluation data, model qualitative information, and an accuracy index value.
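The three-level hierarchy (analysis summary information, then case information, then analysis model information) can be modeled as nested records. This is a minimal sketch, assuming simplified records with only a few of the fields named in the text; the class names, field names, and the lookup helper `find_summary_and_case` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisModelInfo:
    name: str


@dataclass
class CaseInfo:
    name: str
    explanatory_variables: list
    models: list = field(default_factory=list)  # one level below the case


@dataclass
class AnalysisSummary:
    name: str
    analysis_objective: str
    cases: list = field(default_factory=list)  # one level below the summary


def find_summary_and_case(summaries, model_name):
    """Trace the hierarchy to find which summary and case own a model."""
    for summary in summaries:
        for case in summary.cases:
            for model in case.models:
                if model.name == model_name:
                    return summary.name, case.name
    return None


# Hypothetical contents of the information holding unit.
summary = AnalysisSummary("summary_1", "power demand forecast")
case_a = CaseInfo("case_A", ["temperature"], [AnalysisModelInfo("model_A1")])
case_b = CaseInfo(
    "case_B",
    ["temperature", "precipitation"],
    [AnalysisModelInfo("model_B1"), AnalysisModelInfo("model_B2")],
)
summary.cases.extend([case_a, case_b])

owner = find_summary_and_case([summary], "model_B2")
print(owner)
```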

The name of the analytical model is set in the analytical model name.
A value predicted by the analysis model and an actual value are set in the forecast/actual log. The forecast/actual log may be held in the information holding unit 11 in a file format.

Information that determines how to process the data used in the analysis model is set in the explanatory variable column correspondence map. Specifically, the correspondence between columns before and after data processing is set. More specifically, the explanatory variable column correspondence map contains information on the input column in which the input explanatory variable is set, information indicating what kind of processing is to be performed on the input column, and information on the output column in which the processed variable is set. Examples of processing include binary expansion, which expands one column into a plurality of columns, and standardization, which standardizes one column and outputs the result.
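The relationship between input columns, processing content, and output columns can be illustrated with a small table-driven sketch. Everything here is an assumption for illustration: the map layout, the function names, the output column naming scheme, and the sample data are hypothetical, and binary expansion is interpreted as expanding a categorical column into one 0/1 column per category.

```python
import statistics


def binary_expand(values, column):
    """Expand one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    return {
        f"{column}_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }


def standardize(values, column):
    """Rescale one numeric column to zero mean and unit population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return {f"{column}_std": [(v - mean) / std for v in values]}


PROCESSES = {"binary_expansion": binary_expand, "standardization": standardize}

# Hypothetical column correspondence map: input column and processing content.
column_map = [
    {"input": "weather", "process": "binary_expansion"},
    {"input": "temperature", "process": "standardization"},
]


def apply_column_map(data, column_map):
    """Return the processed columns and the input-to-output correspondence."""
    processed, correspondence = {}, {}
    for entry in column_map:
        column = entry["input"]
        output = PROCESSES[entry["process"]](data[column], column)
        processed.update(output)
        correspondence[column] = list(output)
    return processed, correspondence


data = {
    "weather": ["sunny", "rain", "sunny"],
    "temperature": [10.0, 20.0, 30.0],
}
processed, correspondence = apply_column_map(data, column_map)
print(correspondence)
```

The `correspondence` result records which output columns each input column produced, which is the before-and-after column correspondence the paragraph describes.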

A set of multiple data used to create an analysis model is set in the learning data. Variable names and numerical data of each variable are set in the learning data. All variable names are set in the learning data, for example, in the form of a variable list. Since the explanatory variables may be processed by the explanatory variable column correspondence map when creating the analysis model, the variable names of the learning data are the variable names after the explanatory variables have been processed. Note that when the explanatory variables are not processed, the variable names of the learning data are the explanatory variable names.

A set of data used to evaluate the analysis model is set in the evaluation data. A variable name and numerical data of each variable are set in the evaluation data. All variable names are set in the evaluation data, for example, in the form of a variable list. Since the explanatory variables may be processed by the explanatory variable column correspondence map when creating the analysis model, the variable names of the evaluation data are the variable names after the explanatory variables have been processed. Note that when the explanatory variables are not processed, the variable names of the evaluation data are the explanatory variable names.

The model qualitative information contains information on the grounds for the analysis model to derive the predicted value. For example, when the analysis model is represented by a regression equation, the model qualitative information includes the regression equation and the regression coefficients included in the regression equation.

Also, in the model qualitative information, for example, when the analysis model is represented by a decision tree, hierarchical information of the decision tree is set. A decision tree represents a predictive model for deriving a conclusion about the target value of a certain item from the observed results for that item. The hierarchy of decision trees represents the hierarchy of the tree structure of decision trees. In the branches of the decision tree, internal nodes correspond to variables, and branches to child nodes indicate possible values of the variable. A leaf of the decision tree represents the predicted value of the objective variable for the variable value represented by the path from the root. The number of data samples in the leaves of the decision tree is the number of data records expected in each leaf. Decision tree hierarchy information includes the number of decision tree layers, relationship information between layers, decision conditions for each branch of the decision tree, the number of learning data samples for each leaf of the decision tree, and the number of evaluation data samples for each leaf of the decision tree. including. The relationship information between hierarchies is information indicating the connection of each leaf. The number of learning data samples for each leaf of the decision tree is the number of learning data records predicted for each leaf of the decision tree. The number of evaluation data samples for each leaf of the decision tree is the number of records of evaluation data predicted for each leaf of the decision tree.
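The decision tree hierarchy information can be pictured with a small nested structure. The sketch below is an illustration only: the dictionary-based tree layout, the threshold-style decision conditions, the leaf labels, and the sample records are all assumptions. It derives the number of layers and the number of learning data samples that fall into each leaf.

```python
def leaf_for(node, record):
    """Follow the decision conditions from the root until a leaf is reached."""
    while "leaf" not in node:
        goes_left = record[node["variable"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["leaf"]


def depth(node):
    """Number of layers in the tree, counting the root as layer 1."""
    if "leaf" in node:
        return 1
    return 1 + max(depth(node["left"]), depth(node["right"]))


def samples_per_leaf(tree, records):
    """Count how many records are predicted at each leaf."""
    counts = {}
    for record in records:
        leaf = leaf_for(tree, record)
        counts[leaf] = counts.get(leaf, 0) + 1
    return counts


# Hypothetical tree: internal nodes hold a variable and a decision condition.
tree = {
    "variable": "temperature",
    "threshold": 25.0,
    "left": {"leaf": "low_demand"},
    "right": {
        "variable": "precipitation",
        "threshold": 1.0,
        "left": {"leaf": "high_demand"},
        "right": {"leaf": "mid_demand"},
    },
}

learning_records = [
    {"temperature": 18.0, "precipitation": 0.0},
    {"temperature": 31.0, "precipitation": 0.0},
    {"temperature": 29.0, "precipitation": 5.0},
    {"temperature": 33.0, "precipitation": 0.5},
]

hierarchy_info = {
    "layers": depth(tree),
    "learning_samples_per_leaf": samples_per_leaf(tree, learning_records),
}
print(hierarchy_info)
```

Running `samples_per_leaf` on evaluation records instead would give the number of evaluation data samples for each leaf.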

The accuracy index value is set to the accuracy of the learning result, which is the output of the analysis model when the learning data is input to it, and the accuracy of the prediction result, which is the output of the analysis model when the evaluation data is input to it. As the accuracy index value, an item related to the accuracy index and the value of each item are set. The items related to the accuracy index are the items listed in the description of the target accuracy index value. The value of each item is calculated from the forecast/actual log. In the following description, the accuracy index value indicating the accuracy of the learning result may be described as the accuracy index value based on the learning data, and the accuracy index value indicating the accuracy of the prediction result may be described as the accuracy index value based on the evaluation data. In this embodiment, both the accuracy index value based on the learning data and the accuracy index value based on the evaluation data are set as the accuracy index value, but only one of them may be set.
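For a regression-type analysis model, the accuracy index values can be computed from pairs of predicted and actual values such as those recorded in the forecast/actual log. The sketch below uses the standard definitions of MAE, MSE, RMSE, MAPE, and the coefficient of determination; the log layout, the function name `regression_accuracy`, and the sample numbers are assumptions, and MAPE assumes no actual value is zero.

```python
import math


def regression_accuracy(actual, predicted):
    """Compute regression accuracy index items from actual/predicted pairs."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE in percent; assumes every actual value is non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": mape,
        "CoD": 1.0 - (mse * n) / ss_total,
    }


# Hypothetical forecast/actual log, split by learning and evaluation data.
log = {
    "learning": {
        "actual": [100.0, 200.0, 300.0],
        "predicted": [110.0, 190.0, 300.0],
    },
    "evaluation": {
        "actual": [100.0, 200.0, 400.0],
        "predicted": [120.0, 180.0, 360.0],
    },
}

accuracy = {
    name: regression_accuracy(entry["actual"], entry["predicted"])
    for name, entry in log.items()
}
for name, items in accuracy.items():
    print(name, {k: round(v, 3) for k, v in items.items()})
```

Comparing the two result sets side by side shows how much accuracy degrades from the learning data to the evaluation data.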

The evaluation record information is information related to records when evaluation target case information and analysis model information are evaluated. The evaluation record information includes an evaluation record name, an evaluation target, an accuracy index, and an evaluation/opinion.
The name of the evaluation record is set in the evaluation record name.
Information specifying a case related to the analysis model to be evaluated is set in the evaluation target.

The accuracy index is set to an accuracy index value that expresses how closely the predicted values of the analysis model match the actual values. The accuracy index can be chosen freely by the user who performs the evaluation, and may be set based on the forecast/actual log.
In the evaluation/opinion, the opinion of the user who evaluates the analysis model and case to be evaluated is set.

The assignment information (task information) holds information on tasks identified from the evaluation record information. For example, if evaluating an analysis model created from a certain case reveals that the data related to "temperature" included in the learning candidate data is insufficient, information such as "Data related to 'temperature' is insufficient" is set in the task information. The task information includes the task name, task content, occurrence evaluation result name, source case, task response case, and presence/absence of case effect.

The name of the task is set in the task name. If the task information concerns the task "the data about temperature is insufficient", information such as "Insufficient data about temperature" is set in the task name, for example.

The specific content of the task is set in the task content. If the task information concerns the task "the data about 'temperature' is insufficient", information such as "the data about 'temperature' included in the learning candidate data is insufficient" is set in the task content, for example.

The occurrence evaluation result name is set to the evaluation record name included in the evaluation record information in which the issue was found.
Information specifying the case in which the task was identified is set in the source case. Specifically, this is the information specifying the case set as the evaluation target in the evaluation record information in which the task was found.

Information identifying the case that addresses the task is set in the task response case. For example, when a new case is created to address a task, that case is set as the task response case.

In the presence/absence of case effect, the result of judging whether each new case created for the task solves it is set. Assume that two new cases are created for the task, that the first case does not solve it, and that the second case does. In this case, as the presence/absence of case effect, information indicating that the task has not been solved is set for the first case, and information indicating that the task has been solved is set for the second case.

Returning to FIG. 2, a configuration example of the processing device 20 will be described. The processing device 20 functions as a control unit that performs various controls on data input from the input device 30. The processing device 20 also analyzes the analysis summary information, the case information, and the analysis model information using the various information held by the repository 10, and outputs the analysis results to the output device 40. The processing device 20 also performs operations on external systems. The processing device 20 includes an information input unit 21, an information analysis unit 22, a calculation unit 23, an output unit 24, and an external system control unit 25.

The information input unit 21 receives from the input device 30 the various information to be held by the information holding unit 11 of the repository 10, and inputs the received information to the information holding unit 11. The information input unit 21 also receives, via the input device 30, the information that the user entered to specify the analysis model to be analyzed and the analysis model to be compared with it. The information input unit 21 outputs the information on the analysis model to be analyzed and the analysis model to be compared to the information analysis unit 22.

Note that the information input unit 21 may instead receive, via the input device 30, user-entered information on the case to be analyzed and the case to be compared with it. Alternatively, if the user wants to compare all the analysis models, the information input unit 21 does not need to receive information specifying the analysis models to be analyzed and compared.

In addition, the information input unit 21 receives, from the user via the input device 30, information on whether to stop midway the analysis processing that is performed by the information analysis unit 22 and the calculation unit 23 and is described later, and outputs that information to the information analysis unit 22. The information input unit 21 also receives from the user, via the input device 30, the output conditions for outputting the extraction results extracted by the information analysis unit 22 and the calculation results calculated by the calculation unit 23 in the analysis processing, and outputs the output conditions to the output unit 24. The information input unit 21 further receives from the user, via the input device 30, whether each of the extraction results and calculation results is to be output to the output device 40, and outputs this to the output unit 24. In other words, the information input unit 21 receives the output items to be output to the output device 40 from the user via the input device 30 and outputs them to the output unit 24.

The information analysis unit 22 corresponds to the analysis unit 2 in the first embodiment. Using, from among the various information held by the information holding unit 11 of the repository 10, the information on the analysis model to be analyzed and the analysis model to be compared that was input to the information input unit 21, the information analysis unit 22 executes analysis processing that compares the two analysis models. Specifically, the information analysis unit 22 performs the analysis processing by comparing the learning candidate data, AI engine algorithms, hyperparameters, objective variables, explanatory variables, learning data, evaluation data, model qualitative information, and accuracy index values, which are the items surrounded by dotted lines in the drawing. Details of the analysis processing will be described later.

The calculation unit 23 corresponds to the calculation unit 3 in the first embodiment. The calculation unit 23 calculates the correlation coefficient between the explanatory variable and the objective variable in the analysis processing. A correlation coefficient is an index value that indicates the strength of the relationship between an explanatory variable and an objective variable. The calculation unit 23 may calculate the correlation coefficient by dividing the covariance between the explanatory variable and the objective variable by the product of the standard deviation of the explanatory variable and the standard deviation of the objective variable. The calculation unit 23 outputs the calculated correlation coefficient to the information analysis unit 22. The calculation unit 23 also calculates the hash values of the learning candidate data in the analysis processing and outputs them to the information analysis unit 22.
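The covariance-over-standard-deviations calculation described above can be sketched as follows. This is only an illustrative sketch: the function name `correlation_coefficient` and the sample values are assumptions introduced here, not names or data from the disclosure, and population (not sample) statistics are assumed.

```python
import math


def correlation_coefficient(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations.

    Hypothetical helper for illustration; population statistics are assumed.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 records")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (std_x * std_y)


# Made-up records: an explanatory variable and an objective variable.
temperature = [10.0, 15.0, 20.0, 25.0]
sales = [120.0, 150.0, 180.0, 210.0]
r = correlation_coefficient(temperature, sales)
```

Because the made-up `sales` values are an exact linear function of `temperature`, `r` comes out at 1 up to floating-point rounding.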

The calculation unit 23 calculates basic statistics of the learning candidate data, learning data, and evaluation data in the analysis processing. The calculation unit 23 calculates the basic statistics according to the data type of the objective variable. Basic statistics include, for example, the number of elements, arithmetic mean, standard deviation, minimum value, 1/4 quantile, median, and 3/4 quantile. The basic statistics are not limited to these, and the calculation unit 23 may be configured to calculate any statistic that has been set. The calculation unit 23 outputs the calculated basic statistics to the information analysis unit 22.
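As a rough illustration of the basic statistics listed above, the following sketch computes them for one numeric variable with Python's standard `statistics` module. The function name `basic_statistics`, the dictionary keys, and the choices of population standard deviation and the "inclusive" quantile method are assumptions made for this example; the disclosure does not specify them.

```python
import statistics


def basic_statistics(values):
    """Return the basic statistics of one numeric variable as a dict.

    Hypothetical helper; key names and quantile method are assumptions.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),                # number of elements
        "mean": statistics.fmean(values),    # arithmetic mean
        "std": statistics.pstdev(values),    # population standard deviation
        "min": min(values),                  # minimum value
        "q1": q1,                            # 1/4 quantile
        "median": median,                    # median
        "q3": q3,                            # 3/4 quantile
    }


stats = basic_statistics([1, 2, 3, 4, 5])
```

Applying the same function to every variable of the learning candidate data, learning data, and evaluation data would yield one such dictionary per variable.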

<Analysis processing>
Now, with reference to FIG. 4, analysis processing performed by the information analysis unit 22 and the calculation unit 23 will be described. FIG. 4 is a table showing processing target data related to analysis processing.

First, FIG. 4 will be explained. FIG. 4 is a table showing the relationship among the processing target data that the information analysis unit 22 handles in the analysis processing, the extraction information, which is the information from which a difference is extracted for each item of processing target data, and the additional extraction/calculation information, which is additionally extracted or calculated when the extraction information differs. The information analysis unit 22 processes the processing target data shown in FIG. 4 in order from the top.

Next, the details of the analysis processing will be described.
The information analysis unit 22 identifies the analysis model information to be analyzed based on the information about the analysis model to be analyzed, which is input from the information input unit 21. The information analysis unit 22 likewise identifies the analysis model information to be compared based on the information about the analysis model to be compared, which is input from the information input unit 21.

The information analysis unit 22 extracts the difference between the objective variable included in the case information corresponding to the analysis model information to be analyzed and the objective variable included in the case information corresponding to the analysis model information to be compared. The information analysis unit 22 identifies these two pieces of case information based on the hierarchical relationship between the analysis model information and the case information in the information holding unit 11. The information analysis unit 22 then extracts the difference between the objective variable name set in the objective variable of the case information corresponding to the analysis model information to be analyzed and the objective variable name set in the objective variable of the case information corresponding to the analysis model information to be compared. In the following description, the case information corresponding to the analysis model information to be analyzed may be referred to as the case information to be analyzed, and the case information corresponding to the analysis model information to be compared may be referred to as the case information to be compared.

When a difference is extracted between the objective variable included in the case information to be analyzed and the objective variable included in the case information to be compared, the information analysis unit 22 extracts the difference in data type, which is the additional extraction/calculation information.

The information analysis unit 22 extracts the difference between the AI engine algorithm included in the case information to be analyzed and the AI engine algorithm included in the case information to be compared. Specifically, the information analysis unit 22 extracts the difference between the AI engine name included in the case information to be analyzed and the AI engine name included in the case information to be compared. The information analysis unit 22 also extracts the difference between the algorithm name included in the case information to be analyzed and the algorithm name included in the case information to be compared.

The information analysis unit 22 extracts the difference between the hyperparameters included in the case information to be analyzed and the hyperparameters included in the case information to be compared. Specifically, when the AI engine name and the algorithm name match, the information analysis unit 22 extracts the difference between the hyperparameters included in the case information to be analyzed and the hyperparameters included in the case information to be compared.
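The conditional comparison above can be sketched as follows. The dictionary keys (`ai_engine`, `algorithm`, `hyperparameters`), the function name, and the choice to return `None` when the engines or algorithms differ are all assumptions for illustration; the text only states that the hyperparameter difference is extracted when the AI engine name and algorithm name match.

```python
def diff_hyperparameters(analysis_case, comparison_case):
    """Return {name: (analysis value, comparison value)} for differing hyperparameters.

    Returns None when the AI engine name or algorithm name differs
    (an assumed convention for "no hyperparameter comparison").
    """
    same_engine = analysis_case["ai_engine"] == comparison_case["ai_engine"]
    same_algorithm = analysis_case["algorithm"] == comparison_case["algorithm"]
    if not (same_engine and same_algorithm):
        return None
    a = analysis_case["hyperparameters"]
    b = comparison_case["hyperparameters"]
    # A hyperparameter set on only one side is reported with None on the other.
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Made-up case information for illustration.
case_a = {"ai_engine": "EngineA", "algorithm": "linear_regression",
          "hyperparameters": {"alpha": 0.1, "max_iter": 100}}
case_b = {"ai_engine": "EngineA", "algorithm": "linear_regression",
          "hyperparameters": {"alpha": 0.5, "max_iter": 100}}
difference = diff_hyperparameters(case_a, case_b)
```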

The calculation unit 23 calculates the hash value of the learning candidate data included in the case information to be analyzed and the hash value of the learning candidate data included in the case information to be compared. The information analysis unit 22 extracts the difference between the two hash values. A hash value is a fixed-length value obtained from the learning candidate data by, for example, a hash function. A difference between the hash values shows that the learning candidate data included in the case information to be analyzed differs from the learning candidate data included in the case information to be compared, which is why the information analysis unit 22 extracts this difference.
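A minimal sketch of this hash comparison is shown below. SHA-256 from Python's standard `hashlib` is an assumed choice, since the text names no particular hash function, and the function names and sample bytes are hypothetical.

```python
import hashlib


def data_hash(raw_bytes):
    """Fixed-length hex digest of the raw learning candidate data (SHA-256 assumed)."""
    return hashlib.sha256(raw_bytes).hexdigest()


def learning_candidate_data_differs(raw_a, raw_b):
    """True when the two data sets hash to different values."""
    return data_hash(raw_a) != data_hash(raw_b)


# Made-up CSV contents standing in for learning candidate data.
data_a = b"temperature,sales\n10,120\n15,150\n"
data_b = b"temperature,sales\n10,120\n15,155\n"
changed = learning_candidate_data_differs(data_a, data_b)
```

Comparing digests avoids comparing the full data sets record by record; the more detailed per-variable comparison is only needed once the digests are found to differ.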

When a difference is extracted between the hash value of the learning candidate data included in the case information to be analyzed and the hash value of the learning candidate data included in the case information to be compared, the information analysis unit 22 extracts the additional extraction/calculation information associated with the learning candidate data. That is, the information analysis unit 22 extracts the difference between the basic statistics according to the data type of the objective variable, which are the additional extraction/calculation information associated with the learning candidate data in the table of processing target data. To this end, the calculation unit 23 calculates the basic statistics of the learning candidate data included in the case information to be analyzed and the basic statistics of the learning candidate data included in the case information to be compared. The calculation unit 23 calculates the basic statistics for each variable included in the learning candidate data. Basic statistics include, for example, the number of elements, arithmetic mean, standard deviation, minimum value, 1/4 quantile, median, and 3/4 quantile. The calculation unit 23 calculates the value of each item included in the basic statistics for each variable.

The information analysis unit 22 extracts the difference between the basic statistics of the learning candidate data included in the case information to be analyzed and the basic statistics of the learning candidate data included in the case information to be compared. The information analysis unit 22 extracts this difference for each variable included in the learning candidate data and for each item included in the basic statistics.
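The per-variable, per-item comparison can be sketched as below. The input layout (a dictionary of variable name to a dictionary of statistic item to value), the function name, and the convention of reporting "analysis value minus comparison value" only for items that differ are assumptions for illustration.

```python
def diff_basic_statistics(stats_a, stats_b):
    """Return {variable: {item: analysis value - comparison value}} for differing items.

    Only variables and items present on both sides are compared (assumed convention).
    """
    diffs = {}
    for variable in sorted(stats_a.keys() & stats_b.keys()):
        item_diffs = {
            item: stats_a[variable][item] - stats_b[variable][item]
            for item in stats_a[variable]
            if item in stats_b[variable]
            and stats_a[variable][item] != stats_b[variable][item]
        }
        if item_diffs:
            diffs[variable] = item_diffs
    return diffs


# Made-up statistics for two learning candidate data sets.
analysis_stats = {
    "temperature": {"count": 100, "mean": 20.0},
    "humidity": {"count": 100, "mean": 55.0},
}
comparison_stats = {
    "temperature": {"count": 120, "mean": 20.0},
    "humidity": {"count": 100, "mean": 55.0},
}
result = diff_basic_statistics(analysis_stats, comparison_stats)
```

In this made-up example only the record count of `temperature` differs, so the result contains a single entry.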

The information analysis unit 22 extracts the difference between the explanatory variables included in the case information to be analyzed and the explanatory variables included in the case information to be compared. It does so by comparing the variable lists in which the explanatory variable names are set. In the variable lists, the information analysis unit 22 determines which explanatory variable names match (overlap) and which do not match between the case information to be analyzed and the case information to be compared, and extracts the difference.
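Comparing two variable lists in this way amounts to set operations on the explanatory variable names. The following sketch assumes the lists are plain lists of names; the function name and result keys are hypothetical.

```python
def diff_explanatory_variables(analysis_vars, comparison_vars):
    """Split explanatory variable names into matching and non-matching groups."""
    a, b = set(analysis_vars), set(comparison_vars)
    return {
        "matching": sorted(a & b),            # names present in both lists
        "only_in_analysis": sorted(a - b),    # names only in the analysis target
        "only_in_comparison": sorted(b - a),  # names only in the comparison target
    }


# Made-up variable lists for illustration.
result = diff_explanatory_variables(
    ["temperature", "humidity", "weekday"],
    ["temperature", "weekday", "rainfall"],
)
```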

When there is a difference in the explanatory variables, the correlation coefficient with the objective variable and the weighting, which FIG. 4 lists as the additional extraction/calculation information associated with the explanatory variables, are extracted. Specifically, the calculation unit 23 calculates the correlation coefficient between the explanatory variable and the objective variable using the learning candidate data included in the case information to be analyzed. The calculation unit 23 also calculates the correlation coefficient between the explanatory variable and the objective variable using the learning candidate data included in the case information to be compared. The calculation unit 23 calculates the correlation coefficient with the objective variable at least for the explanatory variables that differ between the case information to be analyzed and the case information to be compared.

The information analysis unit 22 extracts the weighting that indicates the degree of weighting of the explanatory variables in the analysis model from the model qualitative information included in the analysis model information to be analyzed. The information analysis unit 22 extracts the weighting indicating the degree of weighting of the explanatory variables in the analysis model from the model qualitative information included in the analysis model information to be compared. The information analysis unit 22 extracts at least the weighting of explanatory variables having a difference among the explanatory variables included in the case information to be analyzed and the case information to be compared. Note that the weighting is a quantification of the importance of the input value, so it may also be referred to as a weighting factor.

Specifically, the information analysis unit 22 extracts, as the weighting, the regression coefficient of each explanatory variable set in the model qualitative information included in the analysis model information to be analyzed and the regression coefficient of each explanatory variable set in the model qualitative information included in the analysis model information to be compared. In other words, when the analysis model to be analyzed and the analysis model to be compared are represented by regression equations, the information analysis unit 22 extracts the regression coefficients corresponding to the explanatory variables of the regression equations as the weighting (weighting coefficients). If the analysis model is created by heterogeneous mixture learning using multiple prediction formulas, each prediction formula may be regarded as a regression equation and the coefficient of each explanatory variable in each prediction formula may be regarded as a regression coefficient, and these regression coefficients may be extracted as the weighting.

The calculation unit 23 calculates the basic statistics of the learning data included in the analysis model information to be analyzed and of the learning data included in the analysis model information to be compared. Specifically, the calculation unit 23 calculates the basic statistics of each variable set in the learning data included in the analysis model information to be analyzed, and the basic statistics of each variable set in the learning data included in the analysis model information to be compared. Since the basic statistics include, for example, the number of elements, the arithmetic mean, the standard deviation, the minimum value, the 1/4 quantile, the median, and the 3/4 quantile, the calculation unit 23 calculates a value for each variable and for each basic statistic item.

The information analysis unit 22 extracts the difference between the variables set in the learning data included in the analysis model information to be analyzed and the variables set in the learning data included in the analysis model information to be compared. Further, based on the results calculated by the calculation unit 23, the information analysis unit 22 calculates, for each variable and for each basic statistic item, the difference in basic statistics between the learning data included in the analysis model information to be analyzed and the learning data included in the analysis model information to be compared.

The calculation unit 23 calculates the basic statistics of the evaluation data included in the analysis model information to be analyzed and of the evaluation data included in the analysis model information to be compared. Specifically, the calculation unit 23 calculates the basic statistics of each variable set in the evaluation data included in the analysis model information to be analyzed, and the basic statistics of each variable set in the evaluation data included in the analysis model information to be compared. Since the basic statistics include, for example, the number of elements, the arithmetic mean, the standard deviation, the minimum value, the 1/4 quantile, the median, and the 3/4 quantile, the calculation unit 23 calculates a value for each variable and for each basic statistic item.

The information analysis unit 22 extracts the difference between the variables set in the evaluation data included in the analysis model information to be analyzed and the variables set in the evaluation data included in the analysis model information to be compared. Further, based on the results calculated by the calculation unit 23, the information analysis unit 22 calculates, for each variable and for each basic statistic item, the difference in basic statistics between the evaluation data included in the analysis model information to be analyzed and the evaluation data included in the analysis model information to be compared.

The information analysis unit 22 extracts the difference between the model qualitative information included in the analysis model information to be analyzed and the model qualitative information included in the analysis model information to be compared. When the analysis model is represented by a regression equation, the information analysis unit 22 extracts the difference in the regression coefficients, which are the weighting included in the regression equation. Specifically, when regression coefficients of a regression equation are set in both the analysis model information to be analyzed and the analysis model information to be compared, the information analysis unit 22 treats the regression coefficients as the weighting and extracts the difference in weighting. In other words, even when there is no difference between the explanatory variables included in the case information to be analyzed and the case information to be compared, the information analysis unit 22 extracts the difference in weighting for explanatory variables whose weighting differs.

If there is a difference in weighting, the calculation unit 23 calculates the correlation coefficient between each explanatory variable whose weighting (weighting coefficient) differs and the objective variable. The calculation unit 23 calculates this correlation coefficient once using the learning candidate data included in the case information to be analyzed and once using the learning candidate data included in the case information to be compared.
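The selection of explanatory variables whose weighting differs can be sketched as follows. The mapping of variable name to regression coefficient, the function name, and the use of a relative tolerance when comparing floating-point coefficients are assumptions for illustration; the text only says that a difference in weighting is extracted.

```python
import math


def variables_with_weight_difference(weights_a, weights_b, rel_tol=1e-9):
    """List explanatory variables present in both models whose weighting differs.

    weights_a / weights_b map variable name -> regression coefficient.
    rel_tol is an assumed tolerance for comparing floating-point coefficients.
    """
    shared = sorted(weights_a.keys() & weights_b.keys())
    return [
        name for name in shared
        if not math.isclose(weights_a[name], weights_b[name], rel_tol=rel_tol)
    ]


# Made-up regression coefficients of two analysis models.
analysis_weights = {"temperature": 2.0, "humidity": -0.5}
comparison_weights = {"temperature": 2.0, "humidity": -0.8}
targets = variables_with_weight_difference(analysis_weights, comparison_weights)
```

The variables returned here are the ones for which the correlation coefficient with the objective variable would then be calculated on both sets of learning candidate data.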

When the analysis model is represented by a decision tree, the information analysis unit 22 extracts differences in the hierarchical information of the decision tree. Specifically, when decision tree hierarchy information is set in the analysis model information to be analyzed and the analysis model information to be compared, the information analysis unit 22 extracts the difference in the hierarchy information of the decision trees. The decision tree hierarchy information includes the number of levels of the decision tree, decision conditions for each branch of the decision tree, the number of learning data samples for each leaf of the decision tree, and the number of evaluation data samples for each leaf of the decision tree. Therefore, the information analysis unit 22 extracts the difference for each piece of hierarchical information of the decision tree.
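One possible way to hold the decision tree hierarchy information and extract its differences field by field is sketched below. The class name, field names, and identifiers such as `n0` and `leaf1` are hypothetical; the disclosure does not prescribe a data layout.

```python
from dataclasses import dataclass, asdict


@dataclass
class TreeHierarchyInfo:
    """Assumed container for the hierarchy information of one decision tree."""
    num_layers: int
    branch_conditions: dict   # branch id -> decision condition
    learning_samples: dict    # leaf id -> number of learning data records
    evaluation_samples: dict  # leaf id -> number of evaluation data records


def diff_tree_hierarchy(info_a, info_b):
    """Return {field: (analysis value, comparison value)} for fields that differ."""
    a, b = asdict(info_a), asdict(info_b)
    return {field: (a[field], b[field]) for field in a if a[field] != b[field]}


# Made-up hierarchy information for two decision trees.
tree_a = TreeHierarchyInfo(3, {"n0": "temperature <= 20"},
                           {"leaf1": 40, "leaf2": 60}, {"leaf1": 10, "leaf2": 15})
tree_b = TreeHierarchyInfo(4, {"n0": "temperature <= 20"},
                           {"leaf1": 40, "leaf2": 60}, {"leaf1": 12, "leaf2": 13})
difference = diff_tree_hierarchy(tree_a, tree_b)
```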

The information analysis unit 22 extracts the difference between the accuracy index value included in the analysis model information to be analyzed and the accuracy index value included in the analysis model information to be compared. Specifically, the information analysis unit 22 extracts the differences in the accuracy index value based on the learning data and in the accuracy index value based on the evaluation data, which are set in the accuracy index values included in the analysis model information to be analyzed and the analysis model information to be compared. Since at least one item related to the accuracy index is set in the accuracy index value, the information analysis unit 22 extracts, for each item related to the accuracy index, the difference in the accuracy index value based on the learning data and the difference in the accuracy index value based on the evaluation data.

The information analysis unit 22 judges the superiority or inferiority of the analysis models based on the differences in the accuracy index value based on the learning data and in the accuracy index value based on the evaluation data, the target accuracy index value related to the analysis model information to be analyzed and the analysis model information to be compared, and predetermined determination conditions. Specifically, the information analysis unit 22 judges the superiority or inferiority of the analysis models based on these differences, the target accuracy index value included in the analysis summary information, and the determination table. That is, the information analysis unit 22 determines whether the prediction accuracy of the analysis model to be compared has improved or deteriorated based on the analysis model to be analyzed.

Here, an example of the determination table will be described using FIG. 5, and the prediction accuracy improvement and deterioration determination performed by the information analysis unit 22 will be described. FIG. 5 is a diagram showing an example of a determination table. In the determination table, a list of items of accuracy index values, a determination condition for performance improvement, and a determination condition for performance deterioration are set in order from the left.

The information analysis unit 22 acquires, from the information holding unit 11 of the repository 10, the item indicating the accuracy index that is set as the target accuracy index value in the analysis summary information containing the analysis model information to be analyzed and the analysis model information to be compared. From the differences in the accuracy index value based on the learning data and in the accuracy index value based on the evaluation data, the information analysis unit 22 extracts the difference for the item that matches the acquired item. The information analysis unit 22 searches the determination table for the extracted item, and compares the determination condition for performance improvement and the determination condition for performance deterioration set in the determination table with the difference for the extracted item. The information analysis unit 22 determines that the accuracy of the analysis model to be analyzed has improved compared to the analysis model to be compared when the difference for the extracted item satisfies the determination condition for performance improvement. The information analysis unit 22 determines that the accuracy of the analysis model to be analyzed has deteriorated compared to the analysis model to be compared when the difference for the extracted item satisfies the determination condition for performance deterioration.
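The table lookup described above can be sketched as follows. The contents of the determination table in FIG. 5 are not reproduced in this text, so the items (`rmse`, `accuracy`) and their conditions below are purely hypothetical placeholders, as are the function name and the convention that the difference is "analysis value minus comparison value".

```python
# Hypothetical determination table: item -> conditions on the difference.
# The real table (FIG. 5) may use different items and conditions.
DETERMINATION_TABLE = {
    "rmse": {"improved": lambda d: d < 0, "deteriorated": lambda d: d > 0},
    "accuracy": {"improved": lambda d: d > 0, "deteriorated": lambda d: d < 0},
}


def judge(target_items, accuracy_a, accuracy_b, table=DETERMINATION_TABLE):
    """Judge each target accuracy item of the analysis target against the comparison target."""
    results = {}
    for item in target_items:
        # Skip items that are not in the table or not measured on both sides.
        if item not in table or item not in accuracy_a or item not in accuracy_b:
            continue
        difference = accuracy_a[item] - accuracy_b[item]
        if table[item]["improved"](difference):
            results[item] = "improved"
        elif table[item]["deteriorated"](difference):
            results[item] = "deteriorated"
        else:
            results[item] = "unchanged"
    return results


# Made-up accuracy index values based on the evaluation data.
verdict = judge(["rmse"],
                {"rmse": 0.8, "accuracy": 0.90},
                {"rmse": 1.0, "accuracy": 0.85})
```

Only the items named as target accuracy indexes are judged, which mirrors the step of matching the extracted differences against the item set in the analysis summary information.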

Returning to FIG. 2, the output unit 24 will be described. The output unit 24 corresponds to the output unit 4 in the first embodiment. The output unit 24 outputs the extraction results of the information analysis unit 22 and the calculation results of the calculation unit 23 to the output device 40. The output unit 24 receives the output conditions and output items from the information input unit 21. Of the extraction results of the information analysis unit 22 and the calculation results of the calculation unit 23, the output unit 24 outputs to the output device 40 the output items together with the extraction results and calculation results that satisfy the output conditions.

The external system control unit 25 controls the execution of the AI engine provided outside the information processing device 100 .

The input device 30 functions as an input unit. The input device 30 may be, for example, a keyboard, mouse, touch panel, or the like. When the user inputs to the input device 30 the various information to be held by the information holding unit 11 of the repository 10, the input device 30 outputs the input information to the information input unit 21. When the user inputs the analysis model to be analyzed and the analysis model to be compared by the information analysis unit 22, the input device 30 outputs that information to the information input unit 21.

The input device 30 receives information from the user as to whether or not to stop the analysis processing performed by the information analysis unit 22 and the calculation unit 23, and outputs the information to the information input unit 21. The input device 30 receives from the user the output conditions for outputting the extraction results extracted by the information analysis unit 22 and the calculation results calculated by the calculation unit 23, and outputs them to the information input unit 21. The input device 30 receives from the user the output items to be output to the output device 40 and outputs them to the information input unit 21.

The output device 40 functions as an output unit. The output device 40 is configured to include, for example, a display. The output device 40 displays the result calculated by the processing device 20 to the user. The output device 40 displays the output items, extraction results, and calculation results output from the output unit 24 on the display. Note that the output device 40 may output the output items, extraction results, and calculation results output from the output unit 24 to a file.

<Example of operation of information processing device>
Next, an operation example of the information processing device 100 will be described with reference to FIGS. 6 and 7, together with specific examples of the extraction results extracted by the information analysis unit 22. FIGS. 6 and 7 are flowcharts showing an operation example of the information processing device according to the second embodiment. As a premise, it is assumed that the user designates the analysis model to be analyzed and the analysis model to be compared, and that the information analysis unit 22 has identified the analysis model information to be analyzed and the analysis model information to be compared. It is also assumed that the information analysis unit 22 has identified the case information to be analyzed, which corresponds to the analysis model information to be analyzed, and the case information to be compared, which corresponds to the analysis model information to be compared.

The information analysis unit 22 extracts the difference between the objective variable name set in the objective variable included in the case information to be analyzed and the objective variable name set in the objective variable included in the case information, other than the case information to be analyzed, acquired from the information holding unit 11 (step S1).

The information analysis unit 22 determines whether or not there is a difference in objective variable names (step S2). In other words, the information analysis unit 22 determines whether the difference in the objective variable name has been extracted.
If there is a difference (YES in step S2), the information analysis unit 22 executes step S3.
If there is no difference (NO in step S2), the information analysis unit 22 executes step S5.

In step S3, the information analysis unit 22 determines whether there is a difference in the data types of the objective variables included in the case information to be analyzed and the case information to be compared (step S3).
If there is a difference in the data types (YES in step S3), the information input unit 21 asks the user via the input device 30 whether to stop extracting differences, in order to determine whether to stop the subsequent processing (step S4). If the data types differ, the purposes of the analyses may differ and a meaningful comparison may not be possible. Therefore, the information input unit 21 confirms with the user whether to execute the subsequent processing.

On the other hand, if there is no difference in the data types (NO in step S3), the information analysis unit 22 executes step S5. Even if the objective variable names differ, the analysis purposes can be judged to match when the data types are the same, so the information analysis unit 22 executes the subsequent processing.

In step S4, when the information input unit 21 receives, via the input device 30, information indicating that the user will stop extracting the difference (YES in step S4), the information processing device 100 executes step S8.
On the other hand, when the information input unit 21 receives information indicating that the user continues the difference extraction via the input device 30 (NO in step S4), the information analysis unit 22 executes step S5.
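The branching of steps S1 to S4 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the class `ObjectiveVariable`, the function `check_objective_variables`, the callback `ask_user_to_stop`, and the data type labels are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ObjectiveVariable:
    name: str
    data_type: str  # hypothetical labels such as "float" or "category"


def check_objective_variables(analyzed, compared, ask_user_to_stop):
    """Return (differences, next_step) following steps S1 to S4."""
    differences = {}
    # S1/S2: extract the difference in objective variable names.
    if analyzed.name != compared.name:
        differences["objective_variable_name"] = (analyzed.name, compared.name)
        # S3: the data types are checked only when the names differ.
        if analyzed.data_type != compared.data_type:
            differences["objective_variable_type"] = (
                analyzed.data_type,
                compared.data_type,
            )
            # S4: ask the user whether to stop the difference extraction.
            if ask_user_to_stop():
                return differences, "S8"
    return differences, "S5"
```

For example, two objective variables that differ in both name and data type lead to step S8 only when the callback reports that the user chose to stop; in every other case the flow continues to step S5.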

In step S5, the information analysis unit 22 extracts the difference between the AI engine algorithm included in the case information to be analyzed and the AI engine algorithm included in the case information to be compared (step S5). The information analysis unit 22 extracts the difference between the AI engine name included in the case information to be analyzed and the AI engine name included in the case information to be compared. The information analysis unit 22 also extracts the difference between the algorithm name included in the case information to be analyzed and the algorithm name included in the case information to be compared.

The information analysis unit 22 determines whether there is a difference between the AI engine algorithm included in the case information to be analyzed and the AI engine algorithm included in the case information to be compared (step S6). In other words, the information analysis unit 22 determines whether the AI engine algorithm difference has been extracted.
If there is a difference (YES in step S6), the information input unit 21 confirms with the user via the input device 30 whether to stop extracting the difference in order to determine whether to stop subsequent processing (step S7).
If there is no difference (NO in step S6), the information analysis unit 22 executes step S9.

In step S7, when the information input unit 21 receives, via the input device 30, information indicating that the user will stop extracting the difference (YES in step S7), the information processing apparatus 100 executes step S8.
On the other hand, when the information input unit 21 receives information indicating that the user continues the difference extraction via the input device 30 (NO in step S7), the information analysis unit 22 executes step S10.

In step S8, the output unit 24 outputs the differences extracted in steps S1 and S5 to the output device 40 and displays them on the screen of the output device 40 (step S8). After executing step S8, the information processing apparatus 100 ends the process.

In step S9, the information analysis unit 22 extracts the difference between the hyperparameters included in the case information to be analyzed and the hyperparameters included in the case information to be compared (step S9). Since the same AI engine is used for the case information to be analyzed and the case information to be compared, the information analysis unit 22 extracts differences in hyperparameters.
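Steps S5 to S9 might be sketched as below. The dictionary keys (`ai_engine`, `algorithm`, `hyperparameters`) and the function name are assumptions made for illustration, and the sketch assumes that step S10 follows step S9.

```python
def compare_engine_and_hyperparameters(analyzed, compared, ask_user_to_stop):
    """Return (engine_diff, hyperparameter_diff, next_step) for steps S5 to S9."""
    # S5: compare the AI engine name and the algorithm name.
    engine_diff = {
        key: (analyzed[key], compared[key])
        for key in ("ai_engine", "algorithm")
        if analyzed[key] != compared[key]
    }
    if engine_diff:
        # S6 YES -> S7: let the user decide whether to stop.
        if ask_user_to_stop():
            return engine_diff, {}, "S8"
        # Hyperparameters are not compared when the engines differ.
        return engine_diff, {}, "S10"
    # S6 NO -> S9: same engine and algorithm, so compare hyperparameters.
    a_hp = analyzed["hyperparameters"]
    c_hp = compared["hyperparameters"]
    hp_diff = {
        key: (a_hp.get(key), c_hp.get(key))
        for key in sorted(set(a_hp) | set(c_hp))
        if a_hp.get(key) != c_hp.get(key)
    }
    return engine_diff, hp_diff, "S10"
```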

Here, using FIG. 8, a specific example of the extraction results extracted by the information analysis unit 22 in steps S1 to S9 will be shown. FIG. 8 is a diagram showing an example of an extraction result. As shown in FIG. 8, the information analysis unit 22 holds the extraction results in, for example, a table format. Difference extraction items, cases to be analyzed, cases to be compared, and differences are set in the table in which the extraction results are set.

In the columns of difference extraction items, items for which differences are extracted in steps S1, S5 and S9 are set in each row.
In the column indicating the case to be analyzed, the case name included in the case information to be analyzed corresponding to the analysis model to be analyzed is set. FIG. 8 shows that the case information to be analyzed is case 1. In each row of the analysis target case, the values included in the analysis target case information are set for the items for which differences are extracted in steps S1, S5, and S9.

In the column indicating the case to be compared, the case name included in the case information to be compared corresponding to the analysis model to be compared is set. FIG. 8 shows that the case information to be compared is case 2. In each row of the case to be compared, the values included in the case information to be compared are set for the items for which differences are extracted in steps S1, S5, and S9.

Information indicating whether or not there was a difference in each difference extraction item for which a difference was extracted in steps S1, S5, and S9 is set in the difference column. In addition, for an item in which a numerical value is set, such as a hyperparameter, if there is a difference, the value obtained by subtracting the value of the case to be analyzed from the value of the case to be compared is set in the column indicating the difference.
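A table of the kind described for FIG. 8 could be assembled along the following lines. The function name, the item names, and the markers "none" and "different" are hypothetical; only the rule that a numerical difference is the value of the case to be compared minus the value of the case to be analyzed comes from the description.

```python
def build_difference_rows(analyzed, compared):
    """Build rows of (item, analyzed value, compared value, difference)."""
    rows = []
    for item in sorted(set(analyzed) | set(compared)):
        a = analyzed.get(item)
        c = compared.get(item)
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in (a, c)
        )
        if a == c:
            diff = "none"        # no difference for this item
        elif numeric:
            diff = c - a         # compared value minus analyzed value
        else:
            diff = "different"   # non-numeric items only record that they differ
        rows.append((item, a, c, diff))
    return rows
```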

Returning to FIG. 6, the description of the operation example of the information processing apparatus 100 is continued.
In step S10, the calculation unit 23 calculates hash values of the learning candidate data included in the case information to be analyzed and in the case information to be compared, and the information analysis unit 22 extracts the difference between the calculated hash values (step S10).

The information analysis unit 22 determines whether there is a difference in hash values (step S11).
If there is a difference (YES in step S11), the calculation unit 23 executes step S12. If the hash values differ, the learning candidate data can be judged to be different, so the information processing apparatus 100 analyzes the learning candidate data.
If there is no difference (NO in step S11), the calculation unit 23 executes step S20.
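The hash comparison of steps S10 and S11 can be illustrated as follows. The description does not name a hash function, so SHA-256 from the Python standard library is used here purely as an assumption, and the row-based data layout and function names are hypothetical.

```python
import hashlib


def data_hash(rows):
    """Hash a dataset given as an iterable of rows (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(",".join(map(str, row)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def next_step_after_hash(analyzed_rows, compared_rows):
    """S11: go to S12 when the hashes differ, otherwise to S20."""
    if data_hash(analyzed_rows) != data_hash(compared_rows):
        return "S12"
    return "S20"
```

Comparing hashes first is a cheap way to skip the per-variable analysis of steps S12 to S19 when both cases used identical learning candidate data.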

In step S12, the calculation unit 23 calculates the basic statistics of the learning candidate data included in the case information to be analyzed and the basic statistics of the learning candidate data included in the case information to be compared (step S12). The calculation unit 23 calculates the basic statistics for each variable included in the learning candidate data. The basic statistics include, for example, the number of elements, the arithmetic mean, the standard deviation, the minimum value, the first quartile, the median, and the third quartile. The calculation unit 23 calculates the value of each item included in the basic statistics for each variable.

The information analysis unit 22 extracts the difference between the basic statistics of the learning candidate data included in the case information to be analyzed and the basic statistics of the learning candidate data included in the case information to be compared (step S13). The information analysis unit 22 extracts this difference for each variable included in the learning candidate data and for each item included in the basic statistics.
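Steps S12 and S13 could look like the sketch below, which uses the Python standard `statistics` module. The function names and the column-oriented data layout are hypothetical, the sample standard deviation and the "inclusive" quantile method are assumptions, and the difference is taken as compared minus analyzed, matching the convention used for the difference columns.

```python
import statistics


def basic_statistics(values):
    """Basic statistics of one variable (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumption)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def statistics_difference(analyzed, compared):
    """Per-variable, per-item difference (compared minus analyzed)."""
    result = {}
    # Only variables present in both datasets are compared in this sketch.
    for variable in sorted(set(analyzed) & set(compared)):
        a = basic_statistics(analyzed[variable])
        c = basic_statistics(compared[variable])
        result[variable] = {item: c[item] - a[item] for item in a}
    return result
```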

The calculation unit 23 calculates the correlation coefficient between the explanatory variable and the objective variable using the learning candidate data included in the case information to be analyzed, and calculates the correlation coefficient between the explanatory variable and the objective variable using the learning candidate data included in the case information to be compared (step S14). Note that after the calculation unit 23 calculates the correlation coefficients, the information analysis unit 22 may extract the difference between the correlation coefficient calculated using the learning candidate data included in the case information to be analyzed and the correlation coefficient calculated using the learning candidate data included in the case information to be compared.

The information analysis unit 22 extracts the weighting for each explanatory variable from the model qualitative information included in the analysis model information to be analyzed, and extracts the weighting for each explanatory variable from the model qualitative information included in the analysis model information to be compared (step S15). The information analysis unit 22 extracts, as the weighting, the regression coefficient of each explanatory variable set in the model qualitative information included in the analysis model information to be analyzed, and likewise extracts, as the weighting, the regression coefficient of each explanatory variable set in the model qualitative information included in the analysis model information to be compared. Note that when the analysis model is created by heterogeneous mixture learning using a plurality of prediction formulas, the information analysis unit 22 may regard each of the plurality of prediction formulas as a regression formula, regard the coefficient of each variable of the prediction formula as a regression coefficient, and extract that coefficient as the weighting. After extracting the weighting, the information analysis unit 22 may extract the difference between the weighting extracted from the analysis model information to be analyzed and the weighting extracted from the analysis model information to be compared.

The weighting and correlation coefficient values are the basis for judging whether changes in the explanatory variables have an impact on the prediction accuracy of the analysis model. Therefore, in step S14, the calculation unit 23 calculates the correlation coefficient, and in step S15, the information analysis unit 22 extracts weighting.
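As an illustration of step S14, the sketch below computes a correlation coefficient between each explanatory variable and the objective variable. The description does not specify the type of correlation coefficient, so the Pearson coefficient is assumed here; the function names and the column-oriented data layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; None when either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return None  # the coefficient is undefined for a constant variable
    return cov / (norm_x * norm_y)


def correlations_with_objective(data, objective_name):
    """Correlation of every explanatory variable with the objective variable."""
    target = data[objective_name]
    return {
        name: pearson(values, target)
        for name, values in data.items()
        if name != objective_name
    }
```

Running `correlations_with_objective` once on the learning candidate data of the case to be analyzed and once on that of the case to be compared yields the two sets of coefficients whose difference may then be extracted.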

Here, an example of the extraction results extracted by the information analysis unit 22 in steps S12 to S15 is shown using FIGS. 9A and 9B. FIGS. 9A and 9B are diagrams showing examples of extraction results, obtained by dividing the extraction results extracted by the information analysis unit 22 into two parts. The information analysis unit 22 holds the contents of FIGS. 9A and 9B as the extraction results of steps S12 to S15. As shown in FIGS. 9A and 9B, similarly to FIG. 8, the information analysis unit 22 holds the extraction results in, for example, a table format. Difference extraction items, objective variable difference results, and explanatory variable difference results are set in the table in which the extraction results are set.

In the column of difference extraction items, for example, the following are set in order from the top: the result of checking whether the explanatory variables exist in the case information to be analyzed and the case information to be compared, followed by each item for which the information analysis unit 22 extracted a difference in steps S12 to S15. For example, since the basic statistics of the learning candidate data include a plurality of items, each item is set in its own row so that the difference for each item can be understood. Also, regarding the weighting extracted from the model qualitative information, if the analysis model is created by heterogeneous mixture learning using a plurality of prediction formulas, each prediction formula is set in one row.

In the area where the difference result of the objective variable is set, for example, the objective variable name indicating which variable is the objective variable, the case name indicating the case information to be analyzed, the case name indicating the case information to be compared, and the difference are set. If there is a difference in an item for which a numerical value is set, a value obtained by subtracting the value of the case to be analyzed from the value of the case to be compared is set in the column indicating the difference.

The area in which the difference results of the explanatory variables are set includes a separate area for each explanatory variable so that the difference result of each explanatory variable can be understood. In the area for each explanatory variable, for example, the explanatory variable name indicating which explanatory variable it is, the case name indicating the case information to be analyzed, the case name indicating the case information to be compared, and the difference are set. If there is a difference in an item for which a numerical value is set, a value obtained by subtracting the value of the case to be analyzed from the value of the case to be compared is set in the column indicating the difference.

Returning to FIG. 6, the description of the operation example of the information processing apparatus 100 is continued.
In step S16, the information analysis unit 22 extracts the difference between the explanatory variables included in the case information to be analyzed and the explanatory variables included in the case information to be compared (step S16). The information analysis unit 22 extracts this difference by comparing the variable lists in which the explanatory variable names are set. Specifically, the information analysis unit 22 identifies, in the variable lists, the explanatory variable names that match (overlap) and those that do not match between the case information to be analyzed and the case information to be compared, and extracts the difference.

The information analysis unit 22 determines whether there is a difference in explanatory variables (step S17). In other words, the information analysis unit 22 determines whether the difference of explanatory variables has been extracted.
If there is a difference (YES in step S17), the calculation unit 23 executes step S18.
If there is no difference (NO in step S17), the calculation unit 23 executes step S20.
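The variable list comparison of steps S16 and S17 amounts to a set comparison, as in the following sketch. The function names and result keys are hypothetical.

```python
def diff_explanatory_variables(analyzed_names, compared_names):
    """S16: split explanatory variable names into common and one-sided groups."""
    analyzed = set(analyzed_names)
    compared = set(compared_names)
    return {
        "common": sorted(analyzed & compared),
        "only_in_analyzed": sorted(analyzed - compared),
        "only_in_compared": sorted(compared - analyzed),
    }


def next_step_after_variable_diff(diff):
    """S17: go to S18 when any variable exists on one side only, else S20."""
    if diff["only_in_analyzed"] or diff["only_in_compared"]:
        return "S18"
    return "S20"
```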

In step S18, the calculation unit 23 uses the learning candidate data included in the case information to be analyzed and the case information to be compared to calculate the correlation coefficient between the explanatory variable and the objective variable (step S18). The calculation unit 23 calculates the correlation coefficient between the explanatory variable and the objective variable using learning candidate data included in the case information to be analyzed. The calculation unit 23 calculates the correlation coefficient between the explanatory variable and the objective variable using the learning candidate data included in the case information to be compared. In addition, since the calculation unit 23 calculates the correlation coefficient between the explanatory variable and the objective variable in step S14, the correlation coefficient calculated in step S14 may be used.

The information analysis unit 22 extracts the weighting for each explanatory variable from the model qualitative information included in the analysis model information to be analyzed, and extracts the weighting for each explanatory variable from the model qualitative information included in the analysis model information to be compared (step S19). By executing steps S18 and S19, the user can grasp the correlation coefficients and weighting values of the deleted or added explanatory variables and can judge whether those explanatory variables have affected the prediction accuracy of the analysis model. In addition, since the information analysis unit 22 extracts the weighting in step S15, the weighting extracted in step S15 may be used.

Here, using FIG. 10, an example of the extraction results extracted by the information analysis unit 22 in steps S16 to S19 is shown. FIG. 10 is a diagram showing an example of the extraction result. As shown in FIG. 10, similarly to FIG. 8, the information analysis unit 22 holds the extraction results in, for example, a table format. In the table in which the extraction results are set, explanatory variable names, information about the explanatory variables included in the case information to be analyzed, and information about the explanatory variables included in the case information to be compared are set.

The explanatory variable names of the explanatory variables included in the case information to be analyzed and the explanatory variables included in the case information to be compared are set in the explanatory variable name column. Each explanatory variable name is set for each row in the explanatory variable name.

A case name that indicates the case information to be analyzed is set in the area of information related to the explanatory variables included in the case information to be analyzed. This area also contains a column indicating whether the explanatory variable set in each row exists in the case information to be analyzed, a column in which the correlation coefficient with the objective variable is set, and a column in which the weighting is set.

In the column indicating whether each explanatory variable exists in the case information to be analyzed, information is set so that it is also possible to ascertain whether the variable exists in the case information to be compared. In the example shown in FIG. 10, a hatched circle indicates that the explanatory variable is included in both the case information to be analyzed and the case information to be compared. A circle without hatching indicates that the explanatory variable is included in only one of the case information to be analyzed and the case information to be compared, that is, that there is a difference between them. In other words, if a circle without hatching is set in the "presence" column of the information related to the explanatory variables included in the case information to be analyzed, the explanatory variable exists only in the case information to be analyzed. Note that when the analysis model is created by heterogeneous mixture learning using a plurality of prediction formulas, the coefficients of each prediction formula may be set as one column.

A case name indicating the case information to be compared is set in the area of information related to the explanatory variables included in the case information to be compared. This area also contains a column indicating whether the explanatory variable set in each row exists in the case information to be compared, a column in which the correlation coefficient with the objective variable is set, and a column in which the weighting is set.

In the column indicating whether each explanatory variable exists in the case information to be compared, information is set so that it is also possible to ascertain whether the variable exists in the case information to be analyzed. If a circle without hatching is set in the "presence" column of the information related to the explanatory variables included in the case information to be compared, the explanatory variable exists only in the case information to be compared. Note that when the analysis model is created by heterogeneous mixture learning using a plurality of prediction formulas, the coefficients of each prediction formula may be set as one column.

Next, the description of the operation example of the information processing apparatus 100 will be continued with reference to FIG.
In step S20, the calculation unit 23 calculates the basic statistics of the learning data and the evaluation data included in the analysis model information to be analyzed, and calculates the basic statistics of the learning data and the evaluation data included in the analysis model information to be compared (step S20). The calculation unit 23 calculates the basic statistics of each variable set in the learning data included in the analysis model information to be analyzed, and the basic statistics of each variable set in the learning data included in the analysis model information to be compared. Likewise, the calculation unit 23 calculates the basic statistics of each variable set in the evaluation data included in the analysis model information to be analyzed, and the basic statistics of each variable set in the evaluation data included in the analysis model information to be compared. In both cases, the calculation unit 23 calculates the basic statistics for each variable and for each basic statistic item.

Based on the calculation results of the calculation unit 23, the information analysis unit 22 extracts the differences in the basic statistics of the learning data and the evaluation data, and the differences in their variables (step S21). For each variable and each basic statistic item, the information analysis unit 22 calculates the difference in basic statistics between the learning data included in the analysis model information to be analyzed and the learning data included in the analysis model information to be compared. The information analysis unit 22 also extracts the difference between the variables set in the learning data included in the analysis model information to be analyzed and the variables set in the learning data included in the analysis model information to be compared. Similarly, for each variable and each basic statistic item, the information analysis unit 22 calculates the difference in basic statistics between the evaluation data included in the analysis model information to be analyzed and the evaluation data included in the analysis model information to be compared. The information analysis unit 22 also extracts the difference between the variables set in the evaluation data included in the analysis model information to be analyzed and the variables set in the evaluation data included in the analysis model information to be compared.

Next, the information analysis unit 22 determines whether there is model qualitative information in the analysis model information to be analyzed, and determines whether there is model qualitative information in the analysis model information to be compared (step S22).

If there is no model qualitative information (YES in step S22), the information analysis unit 22 executes step S27.
If there is model qualitative information (NO in step S22), the information analysis unit 22 extracts the difference in weighting of each explanatory variable (step S23). When regression coefficients of regression equations are set in the analysis model information to be analyzed and the analysis model information to be compared, the information analysis unit 22 treats the regression coefficients as the weighting and extracts the weighting differences.

In step S24, the information analysis unit 22 determines whether there is a weighting difference (step S24).
If there is a weighting difference (YES in step S24), the calculator 23 calculates a correlation coefficient between the explanatory variable and the objective variable (step S25).
On the other hand, if there is no weighting difference (NO in step S24), the information analysis unit 22 executes step S26.
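Steps S23 and S24 might be sketched as follows, treating regression coefficients as the weighting. The function names are hypothetical, only explanatory variables present in both models are compared in this sketch, and the difference is taken as compared minus analyzed, which is an assumption carried over from the difference columns described earlier.

```python
def weighting_differences(analyzed_coefficients, compared_coefficients):
    """S23: per-variable weighting difference (compared minus analyzed)."""
    shared = sorted(set(analyzed_coefficients) & set(compared_coefficients))
    return {
        name: compared_coefficients[name] - analyzed_coefficients[name]
        for name in shared
    }


def next_step_after_weighting(differences):
    """S24: go to S25 when any weighting differs, otherwise to S26."""
    if any(value != 0 for value in differences.values()):
        return "S25"
    return "S26"
```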

At step S26, the information analysis unit 22 extracts the difference in the hierarchical information of the decision tree (step S26). When decision tree hierarchy information is set in the analysis model information to be analyzed and the analysis model information to be compared, the information analysis unit 22 extracts the difference in the hierarchy information of the decision trees. The decision tree hierarchy information includes the number of levels of the decision tree, decision conditions for each branch of the decision tree, the number of learning data samples for each leaf of the decision tree, and the number of evaluation data samples for each leaf of the decision tree. The information analysis unit 22 extracts a difference for each piece of hierarchical information of the decision tree.

Here, examples of the extraction results extracted by the information analysis unit 22 in step S26 are shown using FIGS. 11 and 12. FIGS. 11 and 12 are diagrams showing examples of extraction results.

FIG. 11 is a diagram for explaining differences in the decision conditions of each branch of the decision tree among the hierarchical information of the decision tree. As shown in FIG. 11, for each of the case to be analyzed and the case to be compared, the information analysis unit 22 sets the decision conditions of each branch of the decision tree in the table that holds the extraction results, using arrows so that the relationships between the branches can be understood. The information analysis unit 22 finds the differing parts based on the relation lines between the decision conditions of the branches. The information analysis unit 22 holds the decision conditions of each branch of the decision tree in such a way that the differences can be identified.

FIG. 12 is a diagram for explaining the difference between the number of learning data samples for each leaf of the decision tree and the number of evaluation data samples for each leaf of the decision tree among the hierarchical information of the decision tree. As shown in FIG. 12, the information analysis unit 22 holds the extraction results in, for example, a table format. Information indicating the leaves of the decision tree and the number of samples are set in the table in which the extraction results are set.

In the column of information indicating the leaves of the decision tree, each row contains the predicted value of the final objective variable, which is the leaf of the decision tree.

The area where the number of samples is set includes an area in which information about the number of learning data samples classified into each leaf is set and an area in which information about the number of evaluation data samples classified into each leaf is set.

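In the area in which information about the number of learning data samples classified into each leaf is set, the number of samples for the case to be analyzed, the number of samples for the case to be compared, and the difference are set.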

In the area in which information about the number of evaluation data samples classified into each leaf is set, the number of samples for the case to be analyzed, the number of samples for the case to be compared, and the difference are set. In the difference column, the difference calculated by the information analysis unit 22 for the number of learning data samples and the number of evaluation data samples is set for each leaf of the decision tree. The information analysis unit 22 calculates the difference by subtracting the number of learning data samples specified from the case information to be analyzed from the number of learning data samples specified from the case information to be compared, and sets the calculated value in the difference column.
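The per-leaf sample count differences of FIG. 12 can be illustrated with the sketch below. It assumes the same convention as the other difference columns (value of the case to be compared minus value of the case to be analyzed); the function name, the leaf labels, and the treatment of a leaf missing from one tree as a count of zero are assumptions made for illustration.

```python
def leaf_sample_differences(analyzed_counts, compared_counts):
    """Per-leaf sample count difference (compared minus analyzed)."""
    leaves = sorted(set(analyzed_counts) | set(compared_counts))
    return {
        # A leaf absent from one tree is treated as zero samples (assumption).
        leaf: compared_counts.get(leaf, 0) - analyzed_counts.get(leaf, 0)
        for leaf in leaves
    }
```

The same function can be applied once to the learning data counts and once to the evaluation data counts to fill both difference columns.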

Returning to FIG. 7, the description of the operation example of the information processing apparatus 100 is continued.
In step S27, the information analysis unit 22 extracts the difference between the accuracy index value included in the analysis model information to be analyzed and the accuracy index value included in the analysis model information to be compared (step S27). The accuracy index values included in the analysis model information to be analyzed and the analysis model information to be compared each contain an accuracy index value based on the learning data and an accuracy index value based on the evaluation data, and the information analysis unit 22 extracts the difference for each of them. In addition, since at least one item related to the accuracy index is set as the accuracy index value, the information analysis unit 22 extracts the difference in the accuracy index value based on the learning data and the difference in the accuracy index value based on the evaluation data for each item related to the accuracy index.

Here, using FIG. 13, an example of the extraction result extracted by the information analysis unit 22 in step S27 is shown. FIG. 13 is a diagram illustrating an example of an extraction result. As shown in FIG. 13, the information analysis unit 22 holds the extraction results in, for example, a table format. Difference extraction items, cases to be analyzed, cases to be compared, and differences are set in the table in which the extraction results are set.

In the difference extraction item column, an indication that the difference extraction item is an accuracy index value and each item related to the accuracy index are set. Each item related to the accuracy index includes a row in which an accuracy index value based on learning data is set and a row in which an accuracy index value based on evaluation data is set.

In the column indicating the case to be analyzed, the case name included in the case information to be analyzed corresponding to the analysis model to be analyzed is set. FIG. 13 shows that the case information to be analyzed is case 1. Also, in the column of the case to be analyzed, an accuracy index value based on the learning data and an accuracy index value based on the evaluation data are set for each item related to the accuracy index.

In the column indicating the case to be compared, the case name included in the case information to be compared corresponding to the analysis model to be compared is set. FIG. 13 shows that the case information to be compared is case 2. Also, in the column of the case to be compared, an accuracy index value based on the learning data and an accuracy index value based on the evaluation data are set for each item related to the accuracy index.
A value obtained by subtracting the value of the case to be analyzed from the value of the case to be compared is set in the column indicating the difference.

Returning to FIG. 7, the description of the operation example of the information processing apparatus 100 is continued.
In step S28, the information analysis unit 22 determines the superiority or inferiority of the performance of the analysis model based on the difference in the accuracy index value corresponding to the target accuracy index value (step S28). Based on the difference in the accuracy index value based on the learning data, the difference in the accuracy index value based on the evaluation data, the target accuracy index value included in the analysis summary information, and the determination table shown in FIG., the information analysis unit 22 determines the superiority or inferiority of the analysis model.

The information analysis unit 22 acquires from the information holding unit 11 of the repository 10 an item indicating the accuracy index set as the target accuracy index value from the analysis outline information including the model information to be analyzed and the analysis model information to be compared. The information analysis unit 22 extracts the difference of the item that matches the item indicating the acquired accuracy index, with respect to the difference of the accuracy index value based on the learning data and the difference of the accuracy index value based on the evaluation data. The information analysis unit 22 searches the determination table for the extracted items, and compares the determination conditions for performance improvement and the determination conditions for performance deterioration set in the determination table with the difference of the extracted items.

The information analysis unit 22 determines that the accuracy of the analysis model to be analyzed has improved compared to the analysis model to be compared when the difference for the extracted item satisfies the determination condition for performance improvement. The information analysis unit 22 determines that the accuracy of the analysis model to be analyzed has deteriorated compared to the analysis model to be compared when the difference for the extracted item satisfies the determination condition for performance deterioration. If the difference for an item of the learning accuracy index value or the prediction accuracy index value is 0 (zero), that is, if there is no difference, no determination is made for that item.
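As a non-limiting illustration, the lookup in the determination table may be sketched as follows. The contents of the table are assumptions: the index names "RMSE" and "R2" and their conditions are hypothetical and are not taken from the determination table of the disclosure. The difference is assumed to be the value of the case to be compared minus the value of the case to be analyzed.

```python
from typing import Callable, Dict, Optional

# Hypothetical determination table: for each accuracy index, one condition on
# the difference (compared - analyzed) for improvement and one for deterioration.
DETERMINATION_TABLE: Dict[str, Dict[str, Callable[[float], bool]]] = {
    # Error-type index (assumed): a smaller value for the analyzed case is better.
    "RMSE": {"improved": lambda d: d > 0, "deteriorated": lambda d: d < 0},
    # Score-type index (assumed): a larger value for the analyzed case is better.
    "R2": {"improved": lambda d: d < 0, "deteriorated": lambda d: d > 0},
}


def judge(item: str, difference: float) -> Optional[str]:
    """Return "improved", "deteriorated", or None when no determination is made."""
    if difference == 0:
        return None  # no difference, so no determination for this item
    conditions = DETERMINATION_TABLE[item]
    if conditions["improved"](difference):
        return "improved"
    if conditions["deteriorated"](difference):
        return "deteriorated"
    return None


print(judge("RMSE", 2.0))
print(judge("R2", 0.0))
```

Keeping the conditions in a table, rather than in branching code, lets the same routine serve any accuracy index set as the target accuracy index value.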

The information analysis unit 22 outputs the difference extracted up to step S28 to the output device 40 via the output unit 24 (step S29).
The information input unit 21 confirms with the user via the input device 30 whether or not to narrow down the difference display output to the output device 40 (step S30).

When the information input unit 21 receives information indicating that the user narrows down the difference table via the input device 30 (YES in step S30), the information processing apparatus 100 executes step S31.
On the other hand, when the information input unit 21 receives information indicating that the user does not narrow down the difference table via the input device 30 (NO in step S30), the output unit 24 executes step S34.

In step S31, the information input unit 21 inputs display item selection information selected by the user via the input device 30. In other words, the information input unit 21 inputs the output items to be finally output to the output device 40.

The information input unit 21 inputs, via the input device 30, output conditions for narrowing down the difference display output to the output device 40 (step S32). The information input unit 21 inputs output conditions for determining items to be finally output to the output device 40 for each of the objective variable and the explanatory variable via the input device 30. Specifically, the information input unit 21 sets the determination condition for whether or not to display the basic statistics of the learning candidate data, the correlation coefficient with the objective variable, and the weighting difference for each explanatory variable of the model qualitative information. The content input by the user is input to the input device 30 for each explanatory variable.

The output unit 24 determines output items, extraction results, and calculation results that satisfy the display item selection information and output conditions (step S33). In other words, the output unit 24 determines the output items based on the display item selection information, the output items satisfying the output conditions, the extraction results, and the calculation results. The output unit 24 determines whether or not to display the difference between the learning candidate data, explanatory variables, and model qualitative information on the screen for each of the objective variable and the explanatory variable.

Here, the contents of steps S31 to S33 will be explained again using FIGS. 14A and 14B. FIGS. 14A and 14B are diagrams for explaining the process of narrowing down the difference display.
FIGS. 14A and 14B correspond to FIGS. 9A and 9B, respectively, with a screen display column added as the rightmost column. In FIG. 14B, a screen display area is also added at the bottom, and a display availability determination condition row and a display row are added in that screen display area.

In step S31, the output unit 24 displays, on the output device 40, the screen display columns and the screen display rows of FIGS. 14A and 14B in a blank state. The user inputs display selection information on the output item to the input device 30 by selecting the output item to be finally displayed on the screen.

Also, in step S32, the user inputs into the input device 30 the output conditions for determining the extraction results and calculation results to be finally displayed on the screen. The output condition is determined by the user inputting the display availability determination condition in the screen display area shown in FIG. 14B. The information input unit 21 inputs the display availability determination condition input by the user via the input device 30 as an output condition.

In step S33, the output unit 24 determines the output items, extraction results, and calculation results that satisfy the display item selection information and the output conditions. The output unit 24 determines to display the output items included in the display item selection information on the screen. The output unit 24 also determines whether the output conditions are satisfied for each of the objective variable and the explanatory variables, and determines the extraction results and calculation results to be displayed on the screen. In the example shown in FIG. 14B, the output condition input as the display availability determination condition is whether the absolute value of the difference in the arithmetic mean is 100 or more, or whether data exists in only one of case 1 and case 2. The output unit 24 determines whether or not the input output conditions are satisfied for each of the objective variable and the explanatory variables. In the example shown in FIGS. 14A and 14B, the explanatory variable "result (10,000 kW)_7 days ago" satisfies the condition, so the output unit 24 determines to display that explanatory variable on the screen. The output unit 24 sets the determined contents in the display row of the screen display area.
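By way of example, and not limitation, the display availability determination condition of FIG. 14B may be sketched as follows. The function name, the use of None to mean "no data in this case", the variable "temperature", and all numeric values are assumptions; the description states only the condition itself and that "result (10,000 kW)_7 days ago" satisfies it, not which part of the condition it satisfies.

```python
from typing import Dict, List, Optional, Tuple


def should_display(
    mean_case1: Optional[float],
    mean_case2: Optional[float],
    threshold: float = 100.0,
) -> bool:
    """Apply the narrowing-down condition to one variable.

    None is used here (an assumption) to mean the variable has no data in
    that case.
    """
    if (mean_case1 is None) != (mean_case2 is None):
        return True  # data exists in only one of case 1 and case 2
    if mean_case1 is None or mean_case2 is None:
        return False  # no data in either case, nothing to compare
    return abs(mean_case1 - mean_case2) >= threshold


# Hypothetical arithmetic means of each variable in case 1 and case 2.
means: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "result (10,000 kW)_7 days ago": (3200.0, None),
    "temperature": (18.2, 18.9),
}

displayed: List[str] = [
    name for name, (m1, m2) in means.items() if should_display(m1, m2)
]
print(displayed)
```

The result corresponds to the contents set in the display row: only variables that meet the condition remain in the narrowed-down difference display.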

Returning to FIG. 7, the description of the operation example of the information processing apparatus 100 is continued.
In step S34, the output unit 24 outputs the difference by displaying it on the screen of the output device 40. If the difference display is not narrowed down in step S30, the output unit 24 keeps the difference displayed in step S29 on the output device 40. If the difference display is narrowed down in step S30, the output unit 24 outputs to the output device 40 the output items and the difference determined to be displayed on the screen in step S33.

As described above, the information processing apparatus 100 extracts the difference between various types of information regarding the analysis model to be analyzed and the analysis model to be compared. Therefore, by using the information processing apparatus 100, the extraction of case differences can be standardized, and by clarifying the points to be evaluated, a standardized evaluation of the prediction accuracy of the analysis model can be realized in a short time and regardless of skill level. Consequently, by using the information processing apparatus 100, it is possible to efficiently create an analysis model and improve the prediction accuracy.

Specifically, the person in charge of analysis can confirm the overall improvement status based on the difference in the accuracy index values extracted by the information processing apparatus 100, and can immediately check the factors behind it from the difference in the explanatory variables and the difference in the AI engine and algorithm. Also, regarding the degree of impact on improvement, the person in charge of analysis can decide whether or not to adopt a change based on the basic statistics of the learning candidate data, changes in data trends such as the correlation coefficients between the explanatory variables and the objective variable, the weighting in the regression formula, and the conditional expressions used in the decision tree. In this way, even if the person in charge of analysis is inexperienced, the information processing apparatus 100 outputs information corresponding to the points to be confirmed, so that the evaluation of the prediction accuracy of the analysis model can be made more efficient. Therefore, according to the information processing apparatus 100 according to the second embodiment, it is possible to efficiently evaluate the learning model regardless of the skill level of the person in charge of analysis.
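As a non-limiting illustration, the extraction of the difference in explanatory variables followed by the calculation of correlation coefficients with the objective variable may be sketched as follows. The case layout (a dictionary with "explanatory" and "objective" entries), the variable names, and the data are assumptions, and the Pearson correlation coefficient is assumed here because the description does not specify which correlation coefficient is used.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_cases(case1: dict, case2: dict) -> Tuple[dict, dict]:
    """Extract the explanatory-variable difference, then correlate if one exists."""
    vars1, vars2 = set(case1["explanatory"]), set(case2["explanatory"])
    diff = {
        "only_in_case1": sorted(vars1 - vars2),
        "only_in_case2": sorted(vars2 - vars1),
    }
    correlations: Dict[str, Dict[str, float]] = {}
    if diff["only_in_case1"] or diff["only_in_case2"]:
        # Correlations are calculated only when a difference was extracted.
        for label, case in (("case1", case1), ("case2", case2)):
            correlations[label] = {
                name: pearson(values, case["objective"])
                for name, values in case["explanatory"].items()
            }
    return diff, correlations


# Hypothetical case information.
case1 = {"explanatory": {"temperature": [1.0, 2.0, 3.0, 4.0]},
         "objective": [2.0, 4.0, 6.0, 8.0]}
case2 = {"explanatory": {"temperature": [1.0, 2.0, 3.0, 4.0],
                         "holiday_flag": [0.0, 1.0, 0.0, 1.0]},
         "objective": [8.0, 6.0, 4.0, 2.0]}

diff, corr = compare_cases(case1, case2)
print(diff)
print(corr)
```

This sketch assumes no variable is constant; a constant sequence would make the denominator zero and would need separate handling.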

Further, the technique disclosed in Patent Literature 1 described above extracts the difference in accuracy index values when the AI engine and algorithm are changed, for comparison targets having the same objective variable, learning candidate data, explanatory variables, learning data, and evaluation data. In other words, the technique disclosed in Patent Literature 1 determines the influence that changes in the AI engine and algorithm have on the prediction accuracy of the analysis model. On the other hand, the information processing apparatus 100 extracts the difference even when the objective variable, learning candidate data, explanatory variables, learning data, and evaluation data are not the same. Therefore, according to the information processing apparatus 100 according to the second embodiment, it is possible to grasp information that affects the prediction accuracy of the analysis model, which contributes to creating an analysis model with high prediction accuracy.

(Other embodiments)

The information processing apparatuses 1 and 100 (hereinafter referred to as the information processing apparatus 1 and the like) described in the above embodiments may have the following hardware configuration. FIG. 15 is a diagram illustrating a hardware configuration example of an information processing apparatus according to the present disclosure.

With reference to FIG. 15, the information processing apparatus 1 and the like include a processor 1201 and a memory 1202. The processor 1201 reads software (computer program) from the memory 1202 and executes it to perform the processing of the information processing apparatus 1 and the like described using the flowcharts in the above-described embodiments. The processor 1201 may be, for example, a microprocessor, MPU (Micro Processing Unit), or CPU (Central Processing Unit). The processor 1201 may include multiple processors.

The memory 1202 is composed of a combination of volatile memory and non-volatile memory. The memory 1202 may include storage remotely located from the processor 1201. In this case, the processor 1201 may access the memory 1202 via an I/O (Input/Output) interface (not shown).

In the example of FIG. 15, memory 1202 is used to store software modules. The processor 1201 reads these software modules from the memory 1202 and executes them, thereby performing the processing of the information processing apparatus 1 and the like described in the above embodiments.

As described with reference to FIG. 15, each of the processors included in the information processing apparatus 1 and the like executes one or more programs containing instructions for causing a computer to execute the algorithms described with reference to the drawings.

In the above examples, the program includes instructions (or software code) that, when read into a computer, cause the computer to perform one or more of the functions described in the embodiments. The program may be stored in a non-transitory computer-readable medium or tangible storage medium. By way of example, and not limitation, computer-readable media or tangible storage media may include random-access memory (RAM), read-only memory (ROM), flash memory, solid-state drives (SSD) or other memory technology, CD-ROM, digital versatile disc (DVD), Blu-ray disc or other optical disc storage, magnetic cassette, magnetic tape, magnetic disc storage or other magnetic storage devices. The program may be transmitted on a transitory computer-readable medium or communication medium. By way of example, and not limitation, transitory computer-readable media or communication media include electrical, optical, acoustic, or other forms of propagated signals.

It should be noted that the present disclosure is not limited to the above embodiments, and can be modified as appropriate without departing from the scope. In addition, the present disclosure may be implemented by appropriately combining each embodiment.

In addition, part or all of the above-described embodiments can be described as the following additional remarks, but are not limited to the following.
(Appendix 1)
An information processing device comprising:
analysis means for extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
calculation means for, when the difference is extracted, calculating a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculating a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
output means for outputting an extraction result of the analysis means and a calculation result of the calculation means.
(Appendix 2)
The information processing device according to appendix 1, wherein the analysis means, when extracting a difference between the first explanatory variable and the second explanatory variable, extracts a difference between a first weighting coefficient indicating the degree of weighting of the first explanatory variable in the first learning model and a second weighting coefficient indicating the degree of weighting of the second explanatory variable in the second learning model.
(Appendix 3)
The information processing device according to appendix 2, wherein the analysis means extracts a regression coefficient of a first regression formula as the first weighting coefficient when the first learning model is represented by the first regression formula, and extracts a regression coefficient of a second regression formula as the second weighting coefficient when the second learning model is represented by the second regression formula.
(Appendix 4)
The information processing device according to appendix 2 or 3, wherein the calculation means calculates, for a first explanatory variable having a difference between the first weighting coefficient and the second weighting coefficient, a third correlation coefficient with the first objective variable, and calculates, for a second explanatory variable having a difference between the first weighting coefficient and the second weighting coefficient, a fourth correlation coefficient with the second objective variable.
(Appendix 5)
The information processing device according to any one of appendices 1 to 4, wherein the calculation means calculates a first basic statistic of first learning candidate data included in the first case information and a second basic statistic of second learning candidate data included in the second case information, and
the analysis means extracts a difference between the first basic statistic and the second basic statistic.
(Appendix 6)
The information processing device according to any one of appendices 1 to 5, wherein the calculation means calculates a third basic statistic of first learning data used to create the first learning model and a fourth basic statistic of second learning data used to create the second learning model, and
the analysis means extracts a difference between the third basic statistic and the fourth basic statistic, and a difference between the variables included in the first learning data and the variables included in the second learning data.
(Appendix 7)
The information processing device according to any one of appendices 1 to 6, wherein the calculation means calculates a fifth basic statistic of first evaluation data used to evaluate the first learning model and a sixth basic statistic of second evaluation data used to evaluate the second learning model, and
the analysis means extracts a difference between the fifth basic statistic and the sixth basic statistic, and a difference between the variables included in the first evaluation data and the variables included in the second evaluation data.
(Appendix 8)
The information processing device according to any one of appendices 1 to 7, wherein, when the first learning model is represented by a first decision tree and the second learning model is represented by a second decision tree, the analysis means extracts a difference between hierarchy information of the first decision tree and hierarchy information of the second decision tree.
(Appendix 9)
The information processing device according to appendix 8, wherein the hierarchy information includes the number of layers of the decision tree, relation information between the layers, the decision condition of each branch of the decision tree, the number of learning data samples of each leaf of the decision tree, and the number of evaluation data samples of each leaf of the decision tree.
(Appendix 10)
The information processing device according to any one of appendices 1 to 9, wherein the analysis means extracts a difference between a first accuracy index value indicating the accuracy of at least one of a learning result and a prediction result of the first learning model and a second accuracy index value indicating the accuracy of at least one of a learning result and a prediction result of the second learning model.
(Appendix 11)
The information processing device according to appendix 10, wherein the analysis means determines whether or not the prediction accuracy of the first learning model has improved over the prediction accuracy of the second learning model based on the difference between the first accuracy index value and the second accuracy index value, a target accuracy index value related to the first learning model and the second learning model, and a predetermined determination condition.
(Appendix 12)
The information processing device according to any one of appendices 1 to 11, further comprising input means for inputting output items and output conditions,
wherein the output means outputs, out of the extraction result and the calculation result, the output items, extraction results, and calculation results that satisfy the output items and the output conditions.
(Appendix 13)
The information processing device according to any one of appendices 1 to 12, wherein the analysis means extracts a difference between a first AI (Artificial Intelligence) engine included in the first case information and a second AI engine included in the second case information.
(Appendix 14)
The information processing device according to appendix 13, wherein the analysis means extracts a difference between a first learning algorithm included in the first case information and a second learning algorithm included in the second case information.
(Appendix 15)
The information processing device according to appendix 14, wherein, when the first AI engine matches the second AI engine and the first learning algorithm matches the second learning algorithm, the analysis means extracts a difference between a first hyperparameter included in the first case information and a second hyperparameter included in the second case information.
(Appendix 16)
A difference extraction method including:
extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
when the difference is extracted, calculating a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculating a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
outputting the extracted extraction result and the calculated calculation result.
(Appendix 17)
A non-transitory computer-readable medium storing a program that causes an information processing device to execute a difference extraction method, the difference extraction method including:
extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
when the difference is extracted, calculating a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculating a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
outputting the extracted extraction result and the calculated calculation result.
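By way of example, and not limitation, the cascade described in appendices 13 to 15 (AI engine, then learning algorithm, then hyperparameters only when both match) may be sketched as follows. The dictionary keys, the engine and algorithm names, and the hyperparameter names are assumptions introduced for illustration only.

```python
from typing import Any, Dict


def diff_engine_settings(case1: Dict[str, Any], case2: Dict[str, Any]) -> Dict[str, Any]:
    """Compare AI engine and learning algorithm; compare hyperparameters only
    when both the engine and the algorithm match."""
    result: Dict[str, Any] = {
        "engine_differs": case1["engine"] != case2["engine"],
        "algorithm_differs": case1["algorithm"] != case2["algorithm"],
        "hyperparameter_diff": None,  # stays None when the comparison is skipped
    }
    if not result["engine_differs"] and not result["algorithm_differs"]:
        p1, p2 = case1["hyperparameters"], case2["hyperparameters"]
        result["hyperparameter_diff"] = {
            name: (p1.get(name), p2.get(name))
            for name in sorted(set(p1) | set(p2))
            if p1.get(name) != p2.get(name)
        }
    return result


# Hypothetical case information.
case_a = {"engine": "engine_x", "algorithm": "gradient_boosting",
          "hyperparameters": {"max_depth": 3, "learning_rate": 0.1}}
case_b = {"engine": "engine_x", "algorithm": "gradient_boosting",
          "hyperparameters": {"max_depth": 5, "learning_rate": 0.1}}
case_c = {"engine": "engine_x", "algorithm": "linear_regression",
          "hyperparameters": {}}

same_setup = diff_engine_settings(case_a, case_b)
other_algorithm = diff_engine_settings(case_a, case_c)
print(same_setup)
print(other_algorithm)
```

Skipping the hyperparameter comparison when the engine or algorithm differs reflects that hyperparameters of different algorithms are generally not comparable item by item.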

Reference Signs List

1, 100 information processing device
2 analysis unit
3 calculation unit
4 output unit
10 repository
11 information holding unit
20 processing unit
21 information input unit
22 information analysis unit
23 calculation unit
24 output unit
25 external system control unit
30 input device
40 output device
1201 processor
1202 memory

Claims

1. An information processing device comprising:
analysis means for extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
calculation means for, when the difference is extracted, calculating a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculating a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
output means for outputting an extraction result of the analysis means and a calculation result of the calculation means.

2. The information processing device according to claim 1, wherein the analysis means, when extracting a difference between the first explanatory variable and the second explanatory variable, extracts a difference between a first weighting coefficient indicating the degree of weighting of the first explanatory variable in the first learning model and a second weighting coefficient indicating the degree of weighting of the second explanatory variable in the second learning model.

3. The information processing device according to claim 2, wherein the analysis means extracts a regression coefficient of a first regression formula as the first weighting coefficient when the first learning model is represented by the first regression formula, and extracts a regression coefficient of a second regression formula as the second weighting coefficient when the second learning model is represented by the second regression formula.

4. The information processing device according to claim 2 or 3, wherein the calculation means calculates, for a first explanatory variable having a difference between the first weighting coefficient and the second weighting coefficient, a third correlation coefficient with the first objective variable, and calculates, for a second explanatory variable having a difference between the first weighting coefficient and the second weighting coefficient, a fourth correlation coefficient with the second objective variable.

5. The information processing device according to claim 1, wherein the calculation means calculates a first basic statistic of first learning candidate data included in the first case information and a second basic statistic of second learning candidate data included in the second case information, and
the analysis means extracts a difference between the first basic statistic and the second basic statistic.

6. The information processing device according to any one of claims 1 to 5, wherein the calculation means calculates a third basic statistic of first learning data used to create the first learning model and a fourth basic statistic of second learning data used to create the second learning model, and
the analysis means extracts a difference between the third basic statistic and the fourth basic statistic, and a difference between the variables included in the first learning data and the variables included in the second learning data.

7. The information processing device according to any one of claims 1 to 6, wherein the calculation means calculates a fifth basic statistic of first evaluation data used to evaluate the first learning model and a sixth basic statistic of second evaluation data used to evaluate the second learning model, and
the analysis means extracts a difference between the fifth basic statistic and the sixth basic statistic, and a difference between the variables included in the first evaluation data and the variables included in the second evaluation data.

8. The information processing device according to claim 1, wherein, when the first learning model is represented by a first decision tree and the second learning model is represented by a second decision tree, the analysis means extracts a difference between hierarchy information of the first decision tree and hierarchy information of the second decision tree.

9. The information processing device according to claim 8, wherein the hierarchy information includes the number of layers of the decision tree, relation information between the layers, the decision condition of each branch of the decision tree, the number of learning data samples of each leaf of the decision tree, and the number of evaluation data samples of each leaf of the decision tree.

10. The information processing device according to any one of claims 1 to 9, wherein the analysis means extracts a difference between a first accuracy index value indicating the accuracy of at least one of a learning result and a prediction result of the first learning model and a second accuracy index value indicating the accuracy of at least one of a learning result and a prediction result of the second learning model.

11. The information processing device according to claim 10, wherein the analysis means determines whether or not the prediction accuracy of the first learning model has improved over the prediction accuracy of the second learning model based on the difference between the first accuracy index value and the second accuracy index value, a target accuracy index value related to the first learning model and the second learning model, and a predetermined determination condition.

12. The information processing device according to any one of claims 1 to 11, further comprising input means for inputting output items and output conditions,
wherein the output means outputs, out of the extraction result and the calculation result, the output items, extraction results, and calculation results that satisfy the output items and the output conditions.

13. The information processing device according to any one of claims 1 to 12, wherein the analysis means extracts a difference between a first AI (Artificial Intelligence) engine included in the first case information and a second AI engine included in the second case information.

14. The information processing device according to claim 13, wherein the analysis means extracts a difference between a first learning algorithm included in the first case information and a second learning algorithm included in the second case information.

15. The information processing device according to claim 14, wherein, when the first AI engine matches the second AI engine and the first learning algorithm matches the second learning algorithm, the analysis means extracts a difference between a first hyperparameter included in the first case information and a second hyperparameter included in the second case information.

16. A difference extraction method including:
extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
when the difference is extracted, calculating a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculating a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
outputting the extracted extraction result and the calculated calculation result.

17. A non-transitory computer-readable medium storing a program that causes an information processing device to execute a difference extraction method, the difference extraction method including:
extracting a difference between a first explanatory variable included in first case information indicating information about a design pattern of a first learning model and a second explanatory variable included in second case information indicating information about a design pattern of a second learning model;
when the difference is extracted, calculating a first correlation coefficient between the first explanatory variable and a first objective variable included in the first case information, and calculating a second correlation coefficient between the second explanatory variable and a second objective variable included in the second case information; and
outputting the extracted extraction result and the calculated calculation result.