CN112989621B - Model performance evaluation method, device, equipment and storage medium


Info

Publication number
CN112989621B
CN112989621B (application CN202110347813.8A)
Authority
CN
China
Prior art keywords
evaluated
model
evaluation
index
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110347813.8A
Other languages
Chinese (zh)
Other versions
CN112989621A (en)
Inventor
郑晓华
陈青山
许国良
康祖荫
Current Assignee
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202110347813.8A
Publication of CN112989621A
Application granted
Publication of CN112989621B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The embodiment of the invention discloses a model performance evaluation method, a device, equipment and a storage medium. The method comprises the following steps: obtaining a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model; determining indexes to be evaluated in a preset evaluation index set according to factors to be evaluated, and acquiring weights corresponding to the indexes to be evaluated; and determining index scores corresponding to the indexes to be evaluated according to the model data, and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights. According to the technical scheme, the problems of incomplete evaluation dimension and imperfect model evaluation system in the existing model evaluation are solved, the pertinence of the performance evaluation of the target model is improved, and the accuracy and the credibility of the obtained model evaluation result are improved.

Description

Model performance evaluation method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of artificial intelligence, in particular to a model performance evaluation method, device, equipment and storage medium.
Background
In recent years, with the wide application of models in the banking industry, bank models have evolved from risk measurement models to artificial intelligence and machine learning models. Machine learning models are widely applied in fields such as data analysis, credit approval, decision making and client management. The classification model is the most typical and most widely applied type of machine learning model, and performance evaluation is required to clarify the usability of a model.
The existing classification model evaluation method generally comprises the following steps: 1) before evaluating the performance of the model, selecting a few measurement indexes, such as the confusion matrix, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC); 2) after the measurement indexes are selected, calculating each index value of the model using the test set and validation set data, and drawing a comprehensive conclusion on whether the performance of the model meets the standard according to the calculated index values and existing evaluation standards.
However, the existing classification model performance evaluation method does not consider the performance indexes that need to be evaluated when the model is at different stages and in different dimensions, so the dimensions of model evaluation are incomplete, the model evaluation system is imperfect, and the finally obtained model evaluation result is inaccurate and of low reliability.
Disclosure of Invention
The invention provides a model performance evaluation method, device, equipment and storage medium, which are used for realizing performance evaluation of a target model to be evaluated through more targeted evaluation indexes in more dimensions, improving the pertinence of model performance evaluation, and improving the accuracy and credibility of the model evaluation result.
In a first aspect, an embodiment of the present invention provides a method for evaluating model performance, including:
obtaining a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model;
determining indexes to be evaluated in a preset evaluation index set according to factors to be evaluated, and acquiring weights corresponding to the indexes to be evaluated;
and determining index scores corresponding to the indexes to be evaluated according to the model data, and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights.
Further, the factors to be evaluated of the target model include: sample evaluation factors, feature evaluation factors, model evaluation factors, front-end monitoring evaluation factors and back-end monitoring evaluation factors.
Further, determining the indexes to be evaluated in the preset evaluation index set according to the factors to be evaluated includes:
if the factor to be evaluated is the sample evaluation factor, the indexes to be evaluated comprise the number of samples, the positive-negative sample ratio, the number of features and the feature missing rate;
if the factor to be evaluated is the feature evaluation factor, the indexes to be evaluated comprise the monotonic feature ratio, the predictive capability feature ratio, the stable feature ratio and the weak correlation feature ratio;
if the factor to be evaluated is the model evaluation factor, the indexes to be evaluated comprise accuracy evaluation indexes, model discrimination evaluation indexes, a lift chart, model stability evaluation indexes and model mobility;
if the factor to be evaluated is the front-end monitoring evaluation factor, the indexes to be evaluated comprise the stable feature ratio and model stability evaluation indexes;
and if the factor to be evaluated is the back-end monitoring evaluation factor, the indexes to be evaluated comprise model discrimination evaluation indexes and model mobility.
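The factor-to-index correspondence above can be sketched as a simple lookup table. The Python names below are illustrative placeholders, not identifiers from the patent:

```python
# Hypothetical mapping from evaluation factor to its indexes to be evaluated,
# following the correspondence listed in the claims above.
FACTOR_TO_INDICES = {
    "sample":        ["sample_count", "pos_neg_ratio", "feature_count", "feature_missing_rate"],
    "feature":       ["monotonic_ratio", "predictive_ratio", "stable_ratio", "weak_corr_ratio"],
    "model":         ["accuracy_indices", "discrimination_indices", "lift_chart",
                      "stability_indices", "model_mobility"],
    "front_monitor": ["stable_ratio", "stability_indices"],
    "back_monitor":  ["discrimination_indices", "model_mobility"],
}

def indices_to_evaluate(factors):
    """Collect the deduplicated indexes for the requested evaluation factors."""
    seen, result = set(), []
    for factor in factors:
        for idx in FACTOR_TO_INDICES[factor]:
            if idx not in seen:
                seen.add(idx)
                result.append(idx)
    return result
```

Because some indexes (e.g. the stable feature ratio) appear under several factors, deduplication keeps each index from being scored twice when multiple factors are evaluated together.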
Further, before obtaining the weight corresponding to each index to be evaluated, the method further includes:
determining the hierarchical total ordering of each evaluation index in the preset evaluation index set based on the analytic hierarchy process (AHP);
and determining the weight of each evaluation index according to the hierarchical total ordering.
Further, determining the hierarchical total ordering of each evaluation index in the preset evaluation index set based on the analytic hierarchy process comprises the following steps:
constructing a hierarchy chart according to the interrelations among a preset decision target, preset decision criteria and decision objects, wherein the preset decision criteria comprise the factors to be evaluated, and the decision objects comprise all evaluation indexes in the preset evaluation index set;
constructing a judgment matrix according to the importance scales among the elements in each level of the hierarchy chart;
performing a consistency check on the judgment matrix, and when the check succeeds, performing a hierarchical single ordering on each level in the hierarchy chart according to the judgment matrix;
and determining the hierarchical total ordering of each evaluation index in the preset evaluation index set according to the hierarchical single ordering corresponding to each level.
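Deriving priority weights from a pairwise judgment matrix can be sketched as follows. The row geometric-mean method used here is a standard approximation of the principal eigenvector; it is an assumption for illustration, since the patent does not fix a particular computation:

```python
import math

def ahp_weights(matrix):
    """Approximate the principal eigenvector of an AHP judgment matrix by the
    row geometric-mean method, then normalise so the weights sum to 1."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]  # geometric mean per row
    total = sum(gm)
    return [g / total for g in gm]
```

For example, a 2x2 matrix stating "criterion A is 3 times as important as criterion B" yields weights of roughly 0.75 and 0.25.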
Further, performing the consistency check on the judgment matrix includes:
determining the maximum eigenvalue and the order of the judgment matrix;
determining the difference between the maximum eigenvalue and the matrix order as a first difference, determining the matrix order minus one as a second difference, and determining the ratio of the first difference to the second difference as the consistency index of the judgment matrix;
determining the random consistency index of the judgment matrix according to the matrix order, and determining the ratio of the consistency index to the corresponding random consistency index as the consistency ratio;
and if the consistency ratio is within a preset ratio threshold, determining that the consistency check succeeds.
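A minimal sketch of the consistency check described above. The random consistency index table and the 0.1 threshold are conventional AHP choices (Saaty's values), assumed here rather than taken from the patent:

```python
import math

# Saaty's random consistency index (RI) for matrix orders 1..9
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_check(matrix, threshold=0.1):
    """Return (consistency_ratio, passed) for a pairwise judgment matrix."""
    n = len(matrix)
    # priority weights via row geometric mean (eigenvector approximation)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    w = [g / total for g in gm]
    if n <= 2:
        return 0.0, True  # 1x1 and 2x2 judgment matrices are always consistent
    # estimate the maximum eigenvalue from A.w: average of (A w)_i / w_i
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam_max - n) / (n - 1)  # first difference / second difference
    cr = ci / RI[n]               # ratio to the random consistency index
    return cr, cr < threshold
```

A perfectly consistent matrix has a maximum eigenvalue equal to its order, giving a consistency ratio of zero.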
Further, determining the hierarchical total ordering of each evaluation index in the preset evaluation index set according to the hierarchical single ordering corresponding to each level includes:
determining, according to the hierarchical single ordering corresponding to each level, the ordering weight of the relative importance of each evaluation index in the preset evaluation index set with respect to the preset decision target;
and determining the hierarchical total ordering of the evaluation indexes according to the ordering weights;
wherein the ordering is performed sequentially from the highest level to the lowest level in the hierarchy chart.
Further, after determining the hierarchical total ordering of each evaluation index in the preset evaluation index set according to the hierarchical single ordering corresponding to each level, the method further comprises:
performing a consistency check on the hierarchical total ordering, and determining the hierarchical total ordering that passes the check as the hierarchical total ordering of each evaluation index.
Further, the accuracy evaluation index includes at least one of precision, recall, false-positive rate and F1 score.
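All four accuracy indexes can be computed directly from confusion-matrix counts; a minimal sketch (variable names are illustrative):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, false-positive rate and F1 from confusion-matrix
    counts (true positives, false positives, false negatives, true negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)  # share of negatives wrongly flagged as positive
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}
```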
Further, the model discrimination evaluation index includes at least one of the area under the Receiver Operating Characteristic curve (AUC) and the Kolmogorov-Smirnov (K-S) statistic.
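Both discrimination indexes can be computed from binary labels and predicted scores. The sketch below uses the rank-sum (Mann-Whitney) form of AUC and takes the K-S statistic as the maximum of TPR minus FPR over score thresholds; ties between scores are ignored for brevity:

```python
def auc_and_ks(labels, scores):
    """AUC via the rank-sum formula and the K-S statistic max(TPR - FPR)
    swept over score thresholds, for 0/1 labels and real-valued scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # AUC: sum of ranks of positives, minus the minimum possible rank sum
    pairs = sorted(zip(scores, labels))
    rank_sum = sum(rank for rank, (_, y) in enumerate(pairs, start=1) if y == 1)
    auc = (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    # K-S: sweep thresholds from high score to low, tracking TPR - FPR
    tp = fp = 0
    ks = 0.0
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        ks = max(ks, tp / n_pos - fp / n_neg)
    return auc, ks
```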
Further, the model data includes training set data, test set data, on-line scoring set data and on-line operation set data, and determining an index score corresponding to each index to be evaluated according to the model data includes:
Determining preset evaluation standards corresponding to the indexes to be evaluated, wherein the numerical values of the indexes to be evaluated in the preset evaluation standards are in positive correlation with the performance of the model;
and determining an index score corresponding to the index to be evaluated according to the model data corresponding to the index to be evaluated and a preset evaluation standard.
Further, determining a model comprehensive score corresponding to the factor to be evaluated according to each index score and the corresponding weight, including:
determining the product of each index score and the corresponding weight;
and determining the sum of the products as a model comprehensive score corresponding to the factors to be evaluated.
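The product-and-sum rule above is a plain weighted sum; a minimal sketch:

```python
def comprehensive_score(index_scores, weights):
    """Model comprehensive score as described above: the sum over all indexes
    of index score times the corresponding weight."""
    assert len(index_scores) == len(weights)
    return sum(s * w for s, w in zip(index_scores, weights))
```

With normalised weights (summing to 1), the comprehensive score stays on the same scale as the individual index scores.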
In a second aspect, an embodiment of the present invention provides a model performance evaluation apparatus, including:
the model acquisition module is used for acquiring a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model;
the index determining module is used for determining indexes to be evaluated in a preset evaluation index set according to factors to be evaluated and obtaining weights corresponding to the indexes to be evaluated;
and the score determining module is used for determining index scores corresponding to the indexes to be evaluated according to the model data and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:
a memory and one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the model performance evaluation method of the first aspect as described above.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions for performing the model performance evaluation method as in the first aspect when executed by a computer processor.
According to the model performance evaluation method, device, equipment and storage medium provided by the embodiment of the invention, a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model are obtained; indexes to be evaluated are determined in a preset evaluation index set according to the factors to be evaluated, and weights corresponding to the indexes to be evaluated are acquired; index scores corresponding to the indexes to be evaluated are determined according to the model data, and a model comprehensive score corresponding to the factors to be evaluated is determined according to each index score and the corresponding weight. According to the technical scheme, indexes for evaluating the performance of the target model are selected in the preset evaluation index set according to the obtained target model and the factors on which the model needs to be evaluated. The weight corresponding to each index to be evaluated is determined according to the importance differences among the indexes. The model data corresponding to the target model is evaluated according to the different evaluation contents required by each index to be evaluated, obtaining the corresponding index scores. Finally, the index scores are combined with the corresponding weights to determine the comprehensive score of the model for the factors to be evaluated. Therefore, performance evaluation can be performed on the target model in different dimensions according to its evaluation factors, the problems of incomplete evaluation dimensions and an imperfect model evaluation system in existing model evaluation are solved, the pertinence of the performance evaluation of the target model is improved, and the accuracy and credibility of the obtained model evaluation result are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a model performance evaluation method in accordance with a first embodiment of the present invention;
FIG. 2 is a flow chart of a model performance evaluation method in a second embodiment of the invention;
FIG. 3 is a flow chart of determining the hierarchical total ordering of each evaluation index in the preset evaluation index set based on the analytic hierarchy process in the second embodiment of the present invention;
FIG. 4 is a diagram illustrating a hierarchical structure of model performance evaluation in accordance with a second embodiment of the present invention;
FIG. 5 is a diagram illustrating a hierarchical structure according to a second embodiment of the present invention;
FIG. 6 is a graph showing correspondence between an index to be evaluated and model data in a second embodiment of the present invention;
fig. 7 is a schematic structural diagram of a model performance evaluation apparatus in a third embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer device in a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention will be given with reference to the accompanying drawings. It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention as detailed in the accompanying claims.
In the description of the present invention, it should be understood that the terms "first," "second," "third," and the like are used merely to distinguish between similar objects and are not necessarily used to describe a particular order or sequence, nor should they be construed to indicate or imply relative importance. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances. Furthermore, in the description of the present invention, unless otherwise indicated, "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Example 1
Fig. 1 is a flowchart of a model performance evaluation method according to a first embodiment of the present invention, where the method may be applied to a case where performance tests are performed on a classification model before or after the model is put on line, and the method may be performed by a model performance evaluation device, where the model performance evaluation device may be implemented by software and/or hardware, and the model performance evaluation device may be configured on a computer device, where the computer device may be configured by two or more physical entities or may be configured by one physical entity. In general, the computer device may be a notebook, desktop, smart tablet, or the like.
As shown in fig. 1, the method for evaluating model performance provided in the first embodiment specifically includes the following steps:
s101, acquiring a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model.
In this embodiment, the target model may be understood as a trained classification model. The target model may be in an online state or an offline state, and the model data acquired for the target model may differ according to its online state. The factors to be evaluated may be understood as the dimensions in which the target model is to be evaluated, i.e. which aspect of the target model's performance needs to be evaluated.
Specifically, one or more classification models which need to be evaluated at this time are obtained as target models, the evaluation dimension of each target model is determined according to the requirement of the performance evaluation at this time, and the factors to be evaluated of the target models are determined according to the evaluation dimension, wherein the factors to be evaluated can be one or more, that is, the performance evaluation under a single dimension can be performed on the target models to be evaluated in the performance evaluation process of one model, and the comprehensive performance evaluation under multiple dimensions can also be performed. Meanwhile, as the factors to be evaluated correspond to different online states of the target model, model data suitable for performance evaluation of the current factors to be evaluated can be obtained according to the online states corresponding to the currently determined factors to be evaluated.
S102, determining indexes to be evaluated in a preset evaluation index set according to factors to be evaluated, and acquiring weights corresponding to the indexes to be evaluated.
In this embodiment, the evaluation index may be understood as a parameter for measuring a certain performance of the classification model, the preset evaluation index set may be understood as a set of evaluation indexes with the finest granularity in each evaluation dimension for evaluating the target model, and the index to be evaluated may be understood as an evaluation index related to the dimension represented by the factor to be evaluated.
Specifically, a plurality of evaluation indexes related to the evaluation dimension corresponding to the factor to be evaluated are selected in a preset evaluation index set, the selected evaluation indexes are used as the indexes to be evaluated, and meanwhile, the weight corresponding to each index to be evaluated is obtained by determining the importance of each index to be evaluated relative to the total target.
According to the embodiment of the invention, different factors to be evaluated are matched with different indexes to be evaluated, so that the target model can be subjected to targeted performance evaluation aiming at the factors to be evaluated, meanwhile, the indexes to be evaluated with great influence on the model performance are highlighted according to the importance of the different indexes to be evaluated as the indexes to be evaluated are matched with different weights, and the accuracy of the model evaluation result obtained by evaluating according to the indexes to be evaluated is improved.
And S103, determining index scores corresponding to the indexes to be evaluated according to the model data, and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights.
Specifically, according to the index data of each index to be evaluated, determining the correlation between the index data and the model performance, and further determining the evaluation standard corresponding to each index to be evaluated, wherein the positive correlation exists between the numerical value of the index to be evaluated in the evaluation standard and the model performance to be evaluated. And matching the model data corresponding to the indexes to be evaluated with the determined evaluation standard, determining index scores corresponding to the indexes to be evaluated, weighting and summing the index scores according to weights corresponding to the indexes to be evaluated, and determining the obtained sum as a model comprehensive score corresponding to the factors to be evaluated.
According to the technical scheme of this embodiment, a target model to be evaluated, the factors to be evaluated of the target model and the model data corresponding to the target model are obtained; indexes to be evaluated are determined in a preset evaluation index set according to the factors to be evaluated, and the weights corresponding to the indexes to be evaluated are acquired; index scores corresponding to the indexes to be evaluated are determined according to the model data, and a model comprehensive score corresponding to the factors to be evaluated is determined according to each index score and the corresponding weight. Indexes for evaluating the performance of the target model are thus selected in the preset evaluation index set according to the obtained target model and the factors on which the model needs to be evaluated. The weight corresponding to each index to be evaluated is determined according to the importance differences among the indexes. The model data corresponding to the target model is evaluated according to the different evaluation contents required by each index to be evaluated, obtaining the corresponding index scores. Finally, the index scores are combined with the corresponding weights to determine the comprehensive score of the model for the factors to be evaluated. Therefore, performance evaluation can be performed on the target model in different dimensions according to its evaluation factors, the problems of incomplete evaluation dimensions and an imperfect model evaluation system in existing model evaluation are solved, the pertinence of the performance evaluation of the target model is improved, and the accuracy and credibility of the obtained model evaluation result are improved.
Example two
Fig. 2 is a flowchart of a model performance evaluation method provided by the second embodiment of the present invention. The technical solution of the second embodiment is further optimized based on the foregoing alternative technical solutions: a correspondence between factors to be evaluated and indexes to be evaluated is provided, together with a method for determining the weights of the indexes to be evaluated. The hierarchical total ordering of each evaluation index in the preset evaluation index set is determined by the analytic hierarchy process, and the weight value of each evaluation index is then determined according to the hierarchical total ordering. Different data in the model data are substituted into the corresponding indexes to be evaluated to determine the corresponding index scores, and each index score is combined with its weight value to determine the model comprehensive score corresponding to the factors to be evaluated, so that the performance evaluation result represented by the determined model comprehensive score is more accurate and more reliable.
As shown in fig. 2, the method for evaluating model performance provided in the second embodiment specifically includes the following steps:
s201, acquiring a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model.
In this embodiment, the model data includes training set data, test set data, on-line score set data, and on-line run set data. The training set data and the test set data are data when the model is not in an online state, the online grading set data and the online operation set data are data when the model is in an online state, and according to different determined factors to be evaluated, performance evaluation of the target model is achieved through different model data.
Further, the training set data comprises feature data, label data and prediction probability data; the test set data comprises feature data, label data and prediction probability data; the online scoring set data comprises characteristic data and prediction probability data; the online run-set data includes reflow tag data.
S202, determining an index to be evaluated in a preset evaluation index set according to the factor to be evaluated.
In this embodiment, the factors to be evaluated include a sample evaluation factor, a feature evaluation factor, a model evaluation factor, a front-end monitoring evaluation factor, and a back-end monitoring evaluation factor. The sample evaluation factors, the characteristic evaluation factors and the model evaluation factors are evaluation factors of the target model in an offline state, and the front-end monitoring evaluation factors and the rear-end monitoring evaluation factors are evaluation factors of the target model in an online state. When performance evaluation of the target model is performed, one or more factors to be evaluated can be selected for evaluation, and each factor to be evaluated corresponds to different indexes to be evaluated in a preset evaluation index set.
Correspondingly, determining the to-be-evaluated index in the preset evaluation index set according to the to-be-evaluated factor specifically comprises:
A. When the factor to be evaluated is a sample evaluation factor, the indexes to be evaluated include the number of samples, the positive-negative sample ratio, the feature quantity, and the feature missing rate.
In this embodiment, the number of samples may be understood as the number of samples in the training set; positive samples may be understood as the samples corresponding to the category to be correctly classified, and negative samples as the samples corresponding to the other categories; the positive-negative sample ratio may be understood as the ratio of the number of positive samples to the number of negative samples in the training set; the feature quantity may be understood as the number of distinct features corresponding to each sample in the training set; and the feature missing rate may be understood as the proportion of missing content of a feature in the training set relative to the total content of that feature.
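The four sample evaluation indexes above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the embodiment's implementation; the row layout (a list of feature dicts with `None` marking a missing value) and the feature names are hypothetical.

```python
# Sketch of the sample-evaluation indexes: number of samples, positive-negative
# sample ratio, feature quantity, and per-feature missing rate.

def sample_indexes(rows, labels):
    """rows: list of feature dicts (None marks a missing value); labels: 0/1."""
    n_samples = len(rows)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n_samples - n_pos
    pos_neg_ratio = n_pos / n_neg if n_neg else float("inf")
    features = sorted({k for r in rows for k in r})
    # missing rate of one feature = missing cells / total cells for that feature
    missing_rate = {
        f: sum(1 for r in rows if r.get(f) is None) / n_samples for f in features
    }
    return n_samples, pos_neg_ratio, len(features), missing_rate

rows = [{"f1": 1.0, "f2": None}, {"f1": 2.0, "f2": 3.0},
        {"f1": None, "f2": 4.0}, {"f1": 0.5, "f2": 1.5}]
labels = [1, 0, 0, 1]
n, ratio, k, miss = sample_indexes(rows, labels)
```

In a real pipeline these quantities would of course be read from the training set data rather than hand-written literals.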
B. When the factor to be evaluated is a feature evaluation factor, the index to be evaluated includes a monotonic feature ratio, a predictive capability feature ratio, a stable feature ratio, and a weak correlation feature ratio.
In this embodiment, the monotonic feature ratio may be understood as a ratio of features having monotonicity in the training set to the total feature number in the training set; predictive capability feature ratio can be understood as the ratio of features with predictive capability to the total feature number determined according to the predictive probability in the training set; the stability characteristic ratio can be understood as the ratio of the number of the characteristics meeting the preset stability requirement in the training set and the testing set to the total characteristic number; the weak correlation feature ratio is defined as the ratio of features with Pearson coefficients within a preset coefficient threshold to the total feature number, which is determined according to Pearson coefficients of feature variables.
C. When the factors to be evaluated are model evaluation factors, the indexes to be evaluated comprise accuracy evaluation indexes, model discrimination evaluation indexes, lifting graphs, model stability evaluation indexes and model mobility.
The accuracy evaluation index includes at least one of accuracy, precision, recall, false positive rate, and F1 score; the model discrimination evaluation index includes at least one of the area under the receiver operating characteristic curve and the K-S test index.
In this embodiment, the receiver operating characteristic (ROC) curve is understood as the curve obtained by plotting the hit probability (ordinate) against the false alarm probability (abscissa), and the area under the ROC curve (AUC) is understood as the area enclosed by the ROC curve and the coordinate axis, which is used to determine the discrimination capability of the model; the K-S test (Kolmogorov-Smirnov) is understood as a test standard measuring the difference between the cumulative distributions of good and bad samples; a lift chart (Lift Chart) can be understood as an image used to evaluate the effect of the model.
D. when the factor to be evaluated is a front-end monitoring evaluation factor, the index to be evaluated comprises a stability characteristic ratio and a model stability evaluation index.
In this embodiment, the evaluation modes of the stability feature ratio and the model stability evaluation index are the same as those described above, but the evaluated data are derived not only from training set data but also from on-line scoring set data.
E. When the factors to be evaluated are the back-end monitoring evaluation factors, the indexes to be evaluated comprise model discrimination evaluation indexes and model mobility.
In this embodiment, the evaluation modes of the model discrimination evaluation index and the model mobility are the same as those described above, but the evaluated data are derived from the on-line scoring set data and the on-line running set data.
S203, determining the hierarchical total sequence of all the evaluation indexes in the preset evaluation index set based on a hierarchical analysis method.
In this embodiment, the analytic hierarchy process (Analytic Hierarchy Process, AHP) is understood as a decision-making method that decomposes the elements related to a decision into levels such as targets, criteria, and schemes, and performs qualitative and quantitative analysis on that basis. According to the nature of the problem and the overall target to be achieved, the problem is decomposed into an aggregated combination of different layers, forming a multi-layer analysis structure model, and the elements are compared and ordered. The hierarchical total ordering can be understood as the process of calculating and ranking, within the determined multi-layer analysis structure model, the weights of all the factors of a given layer with respect to the highest layer, i.e., their relative importance to the total target.
Specifically, a hierarchical analysis structure model for model performance evaluation is constructed based on the analytic hierarchy process, and the factors of each layer are ordered according to the constructed model. After the layer containing the evaluation indexes is ordered, the relative importance of the factors in that layer, i.e., of the evaluation indexes, to the total target is determined, forming the hierarchical total ordering of the evaluation indexes in the preset evaluation index set.
Further, fig. 3 is a schematic flow chart of determining a hierarchical total ordering of each evaluation index in the preset evaluation index set based on a hierarchical analysis method according to the second embodiment of the present invention, which specifically includes the following steps:
S2031, constructing a hierarchical structure diagram according to a preset decision target, a preset decision criterion, and the interrelations among the decision objects.
The preset decision criterion comprises factors to be evaluated, and the decision object comprises a preset evaluation index set.
Specifically, the preset decision target, the preset decision criterion (i.e., the factors to be considered), and the decision objects are divided into the highest layer, the middle layer, and the lowest layer according to their interrelations, and the corresponding hierarchical structure diagram is determined. The highest layer is the purpose of the decision, i.e., the problem to be solved; the lowest layer contains the alternative schemes of the decision; and the middle layer contains the factors to be considered, i.e., the decision criteria.
Fig. 4 is an exemplary diagram of a hierarchical structure of model performance evaluation according to the second embodiment of the present invention, where layer A is defined as the target layer, B1-B2 as the criterion layer, C1-C5 as the sub-criterion layer, D1-D14 as the sub-criterion layer of layer C, and E1-E19 as the lowest layer, i.e., the index layer. C1-C5 can be understood as the factors to be evaluated in the present application, and E1-E19 as the preset evaluation index set in the present application.
S2032, constructing a judgment matrix according to importance scales among elements in each level in the hierarchy chart.
Specifically, a judgment matrix is constructed for the elements in each level of the hierarchical structure by the consistent matrix method: the elements of a level are not all compared together at once, but are compared in pairs. Optionally, the constructed judgment matrix is as follows:

A = (a_ij)_{n×n}, with a_ii = 1 and a_ji = 1/a_ij,

where a_ij denotes the result of comparing the ith element with the jth element.
Optionally, the importance scale between different elements can be represented by a preset relative scale relationship, so that the difficulty in comparing different elements with each other is reduced as much as possible, and the accuracy of the judgment matrix is improved. Illustratively, the scale of the decision matrix is shown in Table 1 below.
TABLE 1
Scale Meaning
1 The two elements are equally important
3 One element is slightly more important than the other
5 One element is significantly more important than the other
7 One element is strongly more important than the other
9 One element is extremely more important than the other
2, 4, 6, 8 Intermediate values between two adjacent judgments
As shown in the above table, if a_ij equals 1, the ith element is as important as the jth element; if a_ij equals 5, the ith element is significantly more important than the jth element, and so on, so that the judgment matrices among the elements in each hierarchy are obtained.
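Building such a matrix can be sketched as follows, assuming only the upper-triangle judgments are supplied and the reciprocal property a_ji = 1/a_ij fills in the rest; the element indices and scale values are illustrative, not taken from the embodiment.

```python
import numpy as np

# Sketch of constructing a judgment matrix from pairwise importance scales
# (Table 1): a_ii = 1 and a_ji = 1/a_ij follow from the reciprocal property.

def judgment_matrix(n, upper):
    """upper: dict {(i, j): scale} for i < j, using the 1-9 scale of Table 1."""
    A = np.eye(n)
    for (i, j), s in upper.items():
        A[i, j] = s
        A[j, i] = 1.0 / s
    return A

# three elements: the first slightly more important than the second (scale 3),
# significantly more important than the third (scale 5); second vs. third: 3
A = judgment_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): 3})
```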
S2033, consistency test is conducted on the judgment matrix, and when the consistency test is successful, hierarchical single ordering is conducted on all layers in the hierarchical structure chart according to the judgment matrix.
Further, the consistency check of the judgment matrix may include the steps of:
a. and determining the maximum eigenvalue and the matrix order of the judgment matrix.
Specifically, the number of indexes constituting the judgment matrix is determined as the matrix order of the judgment matrix, and the maximum eigenvalue of the judgment matrix is determined by calculation.
b. And determining the difference between the maximum characteristic value and the matrix order as a first difference value, determining the difference between the matrix order and one as a second difference value, and determining the ratio of the first difference value to the second difference value as a consistency index of the judgment matrix.
Exemplarily, assume that the maximum eigenvalue is denoted as λ_max and the matrix order is denoted as n; then the consistency index CI of the judgment matrix can be represented by the following formula:

CI = (λ_max − n) / (n − 1)
When CI = 0, the judgment matrix can be considered to have complete consistency; when CI is close to 0, the judgment matrix can be considered to have satisfactory consistency; the larger the CI, the more serious the inconsistency.
c. And determining a random consistency index of the judgment matrix according to the matrix order, and determining the ratio of the consistency index to the corresponding random consistency index as a consistency ratio.
Specifically, to measure the magnitude of CI, the random consistency index RI is introduced, where the value of RI is related to the order of the judgment matrix; the values of RI are shown in Table 2 below.
TABLE 2
n 1 2 3 4 5 6 7 8 9 10 11
RI 0 0 0.58 0.90 1.12 1.24 1.32 1.41 1.45 1.49 1.51
Further, the consistency ratio defined in terms of CI and RI can be expressed as:

CR = CI / RI

which is used to determine whether the judgment matrix has satisfactory consistency, i.e., whether it passes the consistency test.
d. And determining a consistency test result according to the consistency ratio and a preset ratio threshold range.
Specifically, if the consistency ratio is within the preset ratio threshold range, the degree of consistency of the elements in the judgment matrix is considered to be within the allowable range, satisfactory consistency exists, and the consistency test is determined to be successful; otherwise, the judgment matrix needs to be readjusted and reconstructed until it passes the consistency test. Optionally, the preset ratio threshold range may be [0,0.1], or may be set according to practical situations, which is not limited in the embodiment of the present invention.
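Steps a-d can be sketched as follows, assuming NumPy for the eigenvalue computation; the RI values come from Table 2 and the example matrix is illustrative, not one of the embodiment's matrices.

```python
import numpy as np

# Sketch of the consistency test: CI = (lambda_max - n) / (n - 1), CR = CI / RI,
# accepted when CR falls within the preset ratio threshold range [0, 0.1].

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
      7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}

def consistency_check(A, threshold=0.1):
    n = A.shape[0]
    lam_max = max(np.linalg.eigvals(A).real)   # Perron root of the matrix
    ci = (lam_max - n) / (n - 1)
    cr = ci / RI[n] if RI[n] else 0.0
    return ci, cr, cr <= threshold

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
ci, cr, ok = consistency_check(A)
```

For this slightly inconsistent 3x3 matrix, CR comes out well below 0.1, so it would pass the test.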
Further, after the judgment matrix passes the consistency test, the eigenvector corresponding to the maximum eigenvalue of the judgment matrix is normalized, and the normalized values are determined as the weights of the relative importance of the elements of one layer with respect to an element of the layer above, thereby completing the hierarchical single ordering of each layer in the hierarchical structure diagram.
S2034, determining the total hierarchical order of all the evaluation indexes in the preset evaluation index set according to the hierarchical order corresponding to all the hierarchies.
For example, Fig. 5 is a diagram of an exemplary hierarchical structure provided in an embodiment of the present invention. Assume that the hierarchical structure contains 3 levels in total and, as shown in Fig. 5, layer B has m elements B_1, B_2, ..., B_m whose ordering relative to the total target A, i.e., whose weights relative to A according to the hierarchical single ordering results, are b_1, b_2, ..., b_m. The n elements of layer C have hierarchical single ordering weights c_1j, c_2j, ..., c_nj with respect to each element B_j of the upper layer, where j = 1, 2, ..., m. The hierarchical total ordering of layer C, i.e., the weight of the ith element of layer C with respect to the total target A, can then be expressed as

c_i = Σ_{j=1}^{m} b_j · c_ij

where c_ij is taken as 0 when the ith element of layer C is not associated with B_j. The elements of layer C are then sorted from high to low according to the obtained weights, yielding the hierarchical total ordering of layer C. In the same way, the weight of each evaluation index in the preset evaluation index set relative to the total target in the present application can be determined, and the corresponding hierarchical total ordering obtained.
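The total-ordering computation just described reduces to a weighted sum per element. A minimal sketch follows; all weights below are illustrative and are not taken from the embodiment's hierarchy.

```python
# Sketch of the hierarchical total ordering: layer B carries weights b_1..b_m
# relative to target A, each layer-C element i carries single-ordering weights
# c_i1..c_im under the B elements, and its total weight is c_i = sum_j b_j * c_ij.

def total_ordering(b, c):
    """b: m weights of layer B; c: n x m single-ordering weights of layer C."""
    return [sum(bj * cij for bj, cij in zip(b, row)) for row in c]

b = [0.6, 0.4]                      # weights of B1, B2 relative to A
c = [[0.5, 0.2],                    # single-ordering weights of C1 under B1, B2
     [0.3, 0.3],                    # C2
     [0.2, 0.5]]                    # C3
w = total_ordering(b, c)            # total weights of C1, C2, C3
```

Since each column of c and the vector b are normalized, the total weights again sum to 1.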
S2035, carrying out consistency test on the total hierarchical sequence, and determining the total hierarchical sequence which is successfully tested as the total hierarchical sequence of each evaluation index.
Exemplarily, assume that the consistency index corresponding to element B_j is CI_j and the corresponding random consistency index is RI_j, with j = 1, 2, ..., m; then the consistency ratio of the hierarchical total ordering may be determined as:

CR = (Σ_{j=1}^{m} b_j · CI_j) / (Σ_{j=1}^{m} b_j · RI_j)
further, judging whether the consistency ratio of the total hierarchical ranking is within a preset ratio threshold, if so, considering the total hierarchical ranking to be successfully checked, and taking the total hierarchical ranking as the total ranking of each evaluation index.
S204, determining the weight of each evaluation index according to the hierarchical total sequence.
Specifically, the weight of each evaluation index relative to the total target in the hierarchical total sequence is determined as the weight of the corresponding evaluation index.
S205, obtaining weights corresponding to the indexes to be evaluated.
S206, determining preset evaluation standards corresponding to the indexes to be evaluated, wherein the numerical values of the indexes to be evaluated in the preset evaluation standards are in positive correlation with the performance of the model.
Specifically, each index to be evaluated is preprocessed: non-quantitative indexes are quantified and all indexes are normalized, so that the index value used for evaluating model performance in each index to be evaluated is positively correlated with the final model performance score, i.e., the larger the index value, the better the performance and the higher the score.
For example, for each index to be evaluated determined in step S202 and each index to be evaluated in the index layer shown in fig. 4, the determined preset evaluation criteria are as follows:
a) Sample number: when the number of samples is less than 5000, score=0; when the number of samples is greater than 5000, score=1.
b) Positive and negative sample ratio: positive number of samples/negative number of samples >0.1, score = 1; otherwise, score = 0.
c) Feature quantity: when the feature quantity is less than 20, score=0; when the feature quantity is between [20,100], the score=0.005×feature quantity+0.5; when the feature quantity is greater than 100, score=1.
d) Feature missing rate: score = number of features with a missing rate less than 0.8 / total number of features.
e) Monotonic feature duty cycle: score = monotonic feature number/total feature number.
f) Predictive capability feature ratio: predictive features are determined by the IV (Information Value) of the feature variables, where the IV evaluation criteria are shown in Table 3 below. Score = number of predictive features / total number of features, a predictive feature being a feature whose IV value lies in the interval [0.02,1].
TABLE 3 Table 3
(Table 3, the IV value evaluation criteria, appears as an image in the original document.)
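The embodiment does not spell out its IV formula, so the following is a sketch of the usual binned IV definition, IV = Σ (p_pos − p_neg) · ln(p_pos / p_neg), as one way the criterion in item f) could be computed; the bin counts are illustrative.

```python
import math

# Sketch of an Information Value (IV) computation: p_pos / p_neg are the
# shares of positive / negative samples falling in each bin of a feature.

def information_value(pos_counts, neg_counts):
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    iv = 0.0
    for p, n in zip(pos_counts, neg_counts):
        if p == 0 or n == 0:          # skip degenerate bins in this sketch
            continue
        pp, pn = p / total_pos, n / total_neg
        iv += (pp - pn) * math.log(pp / pn)
    return iv

iv = information_value([40, 30, 30], [10, 30, 60])
is_predictive = 0.02 <= iv <= 1      # interval used by the evaluation criterion
```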
g) Stable feature ratio: features satisfying the preset stability requirement are determined by the PSI (Population Stability Index) values of the feature variables, where the PSI evaluation criteria are shown in Table 4 below. Score = number of features with a PSI value no greater than 0.1 / total number of features.
TABLE 4 Table 4
PSI value Variable stability
<=0.1 The variable distribution has slight change and good stability
0.1-0.25 The variable distribution has small variation and needs to be monitored with emphasis
>0.25 The variable distribution has large change and is suggested to be deleted
h) Weak correlation feature ratio: features satisfying weak correlation are determined by the Pearson coefficients (Pearson Correlation Coefficient) of the feature variables, which measure the strength of the linear relationship between two variables. The Pearson coefficient evaluation criteria are shown in Table 5 below. Score = number of features with Pearson coefficients in the range [0,0.4] / total number of features.
TABLE 5
Pearson coefficient Correlation of variables
0.8-1.0 Extremely strong correlation
0.6-0.8 Strong correlation
0.4-0.6 Moderate correlation
0.2-0.4 Weak correlation
0.0-0.2 Very weak correlation or no correlation
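A minimal sketch of the Pearson coefficient behind the weak-correlation criterion, written from its standard definition (covariance over the product of standard deviations); the data vectors are illustrative.

```python
import math

# Sketch: Pearson coefficient of two variables; a coefficient in [0, 0.4]
# counts as weak correlation under the criterion above.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0]
r_strong = pearson(x, [2.0, 4.0, 6.0, 8.0])   # perfectly linear pair
r_weak = pearson(x, [2.0, 1.0, 3.0, 2.0])     # weakly correlated pair
```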
i) Accuracy evaluation index: score = the calculated accuracy / precision / recall / false positive rate / F1 value.
j) Model discrimination evaluation index: when the model discrimination evaluation index is the area under the receiver operating characteristic curve (AUC), the score is determined according to the value range of the model AUC: when the AUC lies in [0.95,1], score = −2×AUC + 2.9; when the AUC lies in [0.8,0.95), score = 1.33×AUC − 0.2635; when the AUC lies in [0.7,0.8), score = 2×AUC − 0.8; when the AUC lies in [0.5,0.7), score = 2×AUC − 0.8; otherwise, score = 0. The evaluation criteria for AUC values are shown in Table 6 below.
TABLE 6
AUC value Model discrimination capability
0.9-1.0 The model may be overfitted
0.8-0.9 The model has strong discrimination capability
0.7-0.8 Acceptable discrimination capability
0.5-0.7 Weak discrimination capability
<0.5 No discrimination capability; the model is invalid
When the model discrimination evaluation index is the K-S test value (Kolmogorov-Smirnov), the score is determined according to the value range of the model KS value: when KS > 0.8, score = 0; when KS lies in [0.75,0.8), score = −20×KS + 16; when KS lies in [0.5,0.75), score = 0.8×KS + 0.4; when KS lies in [0.3,0.5), score = KS + 0.3; when KS lies in [0.2,0.3), score = 6×KS − 1.2. The KS value evaluation criteria are shown in Table 7 below.
TABLE 7
(Table 7, the KS value evaluation criteria, appears as an image in the original document.)
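The piecewise scoring rules for the AUC and KS values in item j) translate directly into code. Scoring values outside the listed ranges as 0 is an assumption where the text leaves the case implicit.

```python
# Sketch of the piecewise AUC and KS scoring rules from item j). AUC values
# near 1 are slightly penalized as possible overfitting; AUC below 0.5 and
# KS outside the listed ranges score 0 (assumed).

def auc_score(auc):
    if 0.95 <= auc <= 1:
        return -2 * auc + 2.9
    if 0.8 <= auc < 0.95:
        return 1.33 * auc - 0.2635
    if 0.5 <= auc < 0.8:       # [0.5,0.7) and [0.7,0.8) share the same formula
        return 2 * auc - 0.8
    return 0.0

def ks_score(ks):
    if 0.75 <= ks < 0.8:
        return -20 * ks + 16
    if 0.5 <= ks < 0.75:
        return 0.8 * ks + 0.4
    if 0.3 <= ks < 0.5:
        return ks + 0.3
    if 0.2 <= ks < 0.3:
        return 6 * ks - 1.2
    return 0.0                 # KS > 0.8 or KS < 0.2

s_top = auc_score(1.0)
s_mid = auc_score(0.85)
```

Note the two AUC branches join continuously at 0.95 (both give 1.0) and at 0.8 (approximately 0.8).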
k) Lift chart: if the negative-sample ratio of each interval changes monotonically, score = 1; otherwise, score = 0.
l) Model stability evaluation index: the score is determined according to the value range of the model stability evaluation index (Population Stability Index, PSI): when the PSI value lies in [0.25,1], score = −0.8×PSI + 0.8; when the PSI value lies in [0.1,0.25), score = −2×PSI + 1.1; otherwise, score = −1×PSI + 1. The PSI value evaluation criteria are shown in Table 8 below.
TABLE 8
PSI value Model stability
<=0.1 The stability of the model is good
0.1-0.25 Model stability is acceptable
>0.25 The model is extremely unstable
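The embodiment does not give its exact PSI formula, so the following is a sketch of the standard PSI definition as one way the stability criteria above could be evaluated; the bin shares are illustrative.

```python
import math

# Sketch of a Population Stability Index (PSI) computation: distribute the
# predicted scores of a base set (e.g. the training set) and a comparison set
# (e.g. the on-line scoring set) into the same bins, then
# PSI = sum (p_actual - p_expected) * ln(p_actual / p_expected).

def psi(expected_shares, actual_shares):
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected_shares, actual_shares))

value = psi([0.25, 0.25, 0.25, 0.25], [0.30, 0.25, 0.25, 0.20])
stable = value <= 0.1                 # Table 8: good stability
```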
m) Model mobility: if the negative-sample ratio of each interval changes monotonically, score = 1; otherwise, score = 0.
S207, determining an index score corresponding to the index to be evaluated according to the model data corresponding to the index to be evaluated and a preset evaluation standard.
Specifically, corresponding model data are selected according to different indexes to be evaluated, and the model data are evaluated according to preset evaluation criteria, so that index scores corresponding to the indexes to be evaluated are obtained.
For example, if the index to be evaluated is the model stability evaluation index under the model evaluation factor, the model data used for evaluation are the prediction probability data in the test set data and the training set data; if the index to be evaluated is the model stability evaluation index under the front-end monitoring evaluation factor, the model data used for evaluation are the prediction probability data in the on-line scoring set data and the training set data. Further, fig. 6 is a correspondence diagram between the indexes to be evaluated and the model data provided in the second embodiment of the present invention.
S208, determining products of the index scores and the corresponding weights.
Specifically, the determined index scores of the indexes to be evaluated are multiplied by weights corresponding to the indexes to be evaluated in sequence, so that weighted index scores of the indexes to be evaluated are obtained.
S209, determining the sum of products as a model comprehensive score corresponding to the factors to be evaluated.
Specifically, the determined weighted index scores of the indexes to be evaluated are added, and the sum of the weighted index scores is the model comprehensive score of the factors to be evaluated corresponding to the indexes to be evaluated.
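Steps S208-S209 reduce to a weighted sum over the determined index scores. A minimal sketch follows; the index names, scores, and weights are illustrative, not the embodiment's values.

```python
# Sketch of S208-S209: the model comprehensive score of a factor to be
# evaluated is the sum of each index score multiplied by its weight.

def composite_score(scores, weights):
    return sum(s * w for s, w in zip(scores, weights))

scores = {"AUC score": 0.87, "KS score": 0.9, "PSI score": 0.95}
weights = {"AUC score": 0.5, "KS score": 0.3, "PSI score": 0.2}
total = composite_score(scores.values(), weights.values())
```

With weights summing to 1, the composite score stays on the same [0,1] scale as the individual index scores.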
According to the technical scheme of this embodiment, different indexes to be evaluated are configured for different factors to be evaluated. Meanwhile, the hierarchical total ordering of the evaluation indexes is determined using the analytic hierarchy process, and the weight of each index to be evaluated relative to the total target is determined accordingly, so that the more important indexes are highlighted. Different model data are substituted into the indexes to be evaluated according to the index in question, the corresponding index scores are determined, and the model comprehensive score corresponding to the factors to be evaluated is determined from the determined weight values. The determined model comprehensive score can therefore evaluate model performance in a more targeted way, improving the accuracy and reliability of the evaluation result.
Example III
Fig. 7 is a schematic structural diagram of a model performance evaluation device according to a third embodiment of the present invention, where the model performance evaluation device includes: a model acquisition module 31, an index determination module 32 and a score determination module 33.
The model obtaining module 31 is configured to obtain a target model to be evaluated, a factor to be evaluated of the target model, and model data corresponding to the target model; the index determining module 32 is configured to determine indexes to be evaluated in a preset evaluation index set according to factors to be evaluated, and obtain weights corresponding to the indexes to be evaluated; the score determining module 33 is configured to determine an index score corresponding to each index to be evaluated according to the model data, and determine a model composite score corresponding to the factor to be evaluated according to each index score and the corresponding weight.
According to the technical scheme, the problems of incomplete evaluation dimension and imperfect model evaluation system in the existing model evaluation are solved, the pertinence of the performance evaluation of the target model is improved, and the accuracy and the reliability of the obtained model evaluation result are improved.
Optionally, the factors to be evaluated of the target model include: sample evaluation factors, feature evaluation factors, model evaluation factors, front-end monitoring evaluation factors and back-end monitoring evaluation factors.
Optionally, the index determination module 32 includes:
the index determining unit is used for determining the index to be evaluated, if the factor to be evaluated is a sample evaluation factor, the index to be evaluated comprises the number of samples, the positive and negative sample duty ratio, the feature number and the feature deletion rate; if the factor to be evaluated is a characteristic evaluation factor, the index to be evaluated comprises a monotonic characteristic duty ratio, a predictive capability characteristic duty ratio, a stable characteristic duty ratio and a weak correlation characteristic duty ratio; if the factors to be evaluated are model evaluation factors, the indexes to be evaluated comprise accuracy evaluation indexes, model discrimination evaluation indexes, lifting graphs, model stability evaluation indexes and model mobility; if the factor to be evaluated is a front-end monitoring evaluation factor, the index to be evaluated comprises a stable characteristic duty ratio and a model stability evaluation index; and if the factors to be evaluated are the back-end monitoring evaluation factors, the indexes to be evaluated comprise model discrimination evaluation indexes and model mobility.
Further, the accuracy evaluation index includes at least one of accuracy, precision, recall, false positive rate, and F1 score.
Further, the model discrimination evaluation index includes at least one of the area under the receiver operating characteristic curve and the K-S test index.
And the weight acquisition unit is used for acquiring the weight corresponding to each index to be evaluated.
Optionally, the model performance evaluation device further includes:
the hierarchical ranking determining module is used for determining the hierarchical total ranking of all the evaluation indexes in the preset evaluation index set based on a hierarchical analysis method.
And the index weight determining module is used for determining the weight of each evaluation index according to the hierarchical total sequence.
Optionally, the hierarchical ordering determining module includes:
the structure diagram construction unit is used for constructing a hierarchy diagram according to a preset decision target, a preset decision criterion and the interrelationship among the decision objects; the preset decision criteria comprise factors to be evaluated, and the decision objects comprise evaluation indexes in a preset evaluation index set.
And the judging matrix constructing unit is used for constructing a judging matrix according to the importance scale among the elements in each level in the hierarchy chart.
And the single sequencing unit is used for carrying out consistency test on the judgment matrix and carrying out hierarchical single sequencing on each level in the hierarchical structure chart according to the judgment matrix when the test is successful.
And the total sorting unit is used for determining the hierarchical total sorting of each evaluation index in the preset evaluation index set according to the hierarchical single sorting corresponding to each hierarchy.
Optionally, the single sequencing unit is specifically configured to determine a maximum eigenvalue and a matrix order of the judgment matrix; determining the difference between the maximum characteristic value and the matrix order as a first difference value, determining the difference between the matrix order and one as a second difference value, and determining the ratio of the first difference value to the second difference value as a consistency index of the judgment matrix; determining a random consistency index of the judgment matrix according to the matrix order, and determining the ratio of the consistency index to the corresponding random consistency index as a consistency ratio; if the consistency ratio is within the preset ratio threshold, the consistency check is determined to be successful.
Optionally, the total sorting unit is specifically configured to determine, according to the hierarchical single ordering corresponding to each hierarchy, the ordering weight of the relative importance of each evaluation index in the preset evaluation index set with respect to the preset decision target; and to determine the hierarchical total ordering of the evaluation indexes according to the ordering weights, the ordering being performed sequentially from the highest layer to the lowest layer in the hierarchical structure diagram.
Optionally, the total ranking unit is further configured to perform consistency check on the total ranking of the layers, and determine the total ranking of the layers that is successfully checked as the total ranking of the layers of each evaluation index.
Further, the model data includes training set data, test set data, on-line scoring set data, and on-line running set data.
Optionally, the score determining module 33 includes:
the evaluation standard determining unit is used for determining preset evaluation standards corresponding to the indexes to be evaluated, and the numerical values of the indexes to be evaluated in the preset evaluation standards and the model performance are in positive correlation.
And the index score determining unit is used for determining an index score corresponding to the index to be evaluated according to the model data corresponding to the index to be evaluated and a preset evaluation standard.
A comprehensive score determining unit, configured to determine a product of each index score and a corresponding weight; and determining the sum of the products as a model comprehensive score corresponding to the factors to be evaluated.
The model performance evaluation device provided by the embodiment of the invention can execute the model performance evaluation method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 8 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention, where the model performance evaluation device according to the embodiment of the present invention may be integrated. As shown in fig. 8, the computer apparatus 400 includes a storage device 401, a processor 402, and a computer program stored on the storage device 401 and executable on the processor 402, where the processor 402 implements the model performance evaluation method provided by the embodiment of the present invention when executing the computer program.
The storage device 401 is a computer-readable storage medium, and may be used to store a software program, a computer-executable program, and modules, such as program instructions/modules (e.g., the model acquisition module 31, the index determination module 32, and the score determination module 33) corresponding to the model performance evaluation method in the embodiment of the present invention. The processor 402 executes various functional applications of the apparatus and data processing by executing software programs, instructions, and modules stored in the storage device 401, that is, implements the above-described model performance evaluation method.
The storage device 401 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal, etc. Furthermore, the storage device 401 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the storage device 401 may further include memory remotely located with respect to the processor 402, and such remote memory may be connected to the computer device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Example five
A fifth embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a model performance evaluation method, the method comprising:
obtaining a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model;
determining indexes to be evaluated in a preset evaluation index set according to factors to be evaluated, and acquiring weights corresponding to the indexes to be evaluated;
and determining index scores corresponding to the indexes to be evaluated according to the model data, and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the method operations described above, and may also perform the related operations in the model performance evaluation method provided in any embodiment of the present invention.
From the above description of the embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software plus necessary general-purpose hardware, or by hardware alone, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that the units and modules included in the above embodiment of the model performance evaluation apparatus are divided only according to functional logic; the division is not limited thereto, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, while the invention has been described in detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from its concept, and its scope is determined by the appended claims.

Claims (11)

1. A model performance evaluation method, characterized by comprising:
obtaining a target model to be evaluated, factors to be evaluated of the target model, and model data corresponding to the target model;
determining indexes to be evaluated in a preset evaluation index set according to the factors to be evaluated, and acquiring weights corresponding to the indexes to be evaluated;
determining index scores corresponding to the indexes to be evaluated according to the model data, and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights;
the target model is a classification model;
the factors to be evaluated of the target model comprise: a sample evaluation factor, a feature evaluation factor, a model evaluation factor, a front-end monitoring evaluation factor and a back-end monitoring evaluation factor;
the model data comprises training set data, test set data, online scoring set data and online running set data;
the training set data and the test set data are data when the model is not in an online state, and the online scoring set data and the online operation set data are data when the model is in an online state;
the training set data comprises feature data, label data and prediction probability data; the test set data comprises feature data, label data and prediction probability data; the online scoring set data comprises feature data and prediction probability data; the online running set data comprises backflow label data;
the determining of the indexes to be evaluated in the preset evaluation index set according to the factors to be evaluated comprises:
if the factor to be evaluated is the sample evaluation factor, the indexes to be evaluated comprise the number of samples, the positive and negative sample proportion, the number of features and the feature missing rate;
if the factor to be evaluated is the feature evaluation factor, the indexes to be evaluated comprise the monotonic feature proportion, the predictive-ability feature proportion, the stable feature proportion and the weakly correlated feature proportion;
if the factor to be evaluated is the model evaluation factor, the indexes to be evaluated comprise the accuracy evaluation index, the model discrimination evaluation index, the lift chart, the model stability evaluation index and the model mobility;
if the factor to be evaluated is the front-end monitoring evaluation factor, the indexes to be evaluated comprise the stable feature proportion and the model stability evaluation index;
if the factor to be evaluated is the back-end monitoring evaluation factor, the indexes to be evaluated comprise the model discrimination evaluation index and the model mobility;
before the acquiring of the weights corresponding to the indexes to be evaluated, the method further comprises:
determining a hierarchical total ranking of each evaluation index in the preset evaluation index set based on an analytic hierarchy process;
determining the weight of each evaluation index according to the hierarchical total ranking;
wherein the determining the hierarchical total ranking of each evaluation index in the preset evaluation index set based on the analytic hierarchy process comprises:
constructing a hierarchical structure diagram according to a preset decision target, preset decision criteria and the interrelationships among decision objects, wherein the decision objects comprise all evaluation indexes in the preset evaluation index set;
constructing a judgment matrix according to the importance scales among the elements in each hierarchy of the hierarchical structure diagram;
performing a consistency test on the judgment matrix and, when the test succeeds, performing hierarchical single ranking on each hierarchy in the hierarchical structure diagram according to the judgment matrix; and
determining the hierarchical total ranking of each evaluation index in the preset evaluation index set according to the hierarchical single ranking corresponding to each hierarchy.
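The hierarchical single ranking step above amounts, in the standard analytic hierarchy process, to extracting priority weights from the judgment matrix. A minimal sketch follows; the eigenvector method is assumed here because it is the conventional AHP choice, though the claim itself does not name a specific extraction method:

```python
import numpy as np

def single_ranking_weights(judgment_matrix):
    """Priority weights of one hierarchy level: the normalized principal
    eigenvector of the pairwise judgment matrix (conventional AHP)."""
    A = np.asarray(judgment_matrix, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    principal = int(np.argmax(eigvals.real))      # index of the maximum eigenvalue
    w = np.abs(eigvecs[:, principal].real)        # principal eigenvector
    return w / w.sum()                            # normalize to sum to 1
```

For instance, the 2x2 judgment matrix [[1, 2], [1/2, 1]] (criterion 1 judged twice as important as criterion 2) yields weights of 2/3 and 1/3.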
2. The method of claim 1, wherein the performing a consistency test on the judgment matrix comprises:
determining the maximum eigenvalue and the matrix order of the judgment matrix;
determining the difference between the maximum eigenvalue and the matrix order as a first difference, determining the difference between the matrix order and one as a second difference, and determining the ratio of the first difference to the second difference as the consistency index of the judgment matrix;
determining a random consistency index of the judgment matrix according to the matrix order, and determining the ratio of the consistency index to the corresponding random consistency index as a consistency ratio;
and if the consistency ratio is within the preset ratio threshold range, determining that the consistency test is successful.
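The test of claim 2 is the standard AHP consistency check: CI = (lambda_max - n)/(n - 1) and CR = CI/RI, where RI is Saaty's published random consistency index for a matrix of order n. A sketch under those assumptions (the CR < 0.1 threshold is the commonly used default, not taken from the patent):

```python
import numpy as np

# Saaty's random consistency index, keyed by matrix order (standard published values)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_test(judgment_matrix, threshold=0.1):
    """Return (consistency_ratio, passed) for a pairwise judgment matrix."""
    A = np.asarray(judgment_matrix, dtype=float)
    n = A.shape[0]
    lambda_max = float(np.max(np.linalg.eigvals(A).real))  # maximum eigenvalue
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0      # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0                  # consistency ratio
    return cr, cr < threshold
```

A perfectly consistent matrix such as [[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]] has lambda_max equal to its order, hence CI = CR = 0 and the test passes.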
3. The method according to claim 1, wherein the determining the hierarchical total ranking of each evaluation index in the preset evaluation index set according to the hierarchical single ranking corresponding to each hierarchy comprises:
determining, according to the hierarchical single ranking corresponding to each hierarchy, a ranking weight of the relative importance of each evaluation index in the preset evaluation index set with respect to the preset decision target; and
determining the hierarchical total ranking after sorting the evaluation indexes according to the ranking weights;
wherein the sorting is performed sequentially from the highest layer to the lowest layer of the hierarchical structure diagram.
4. The method according to claim 1, wherein after the determining the hierarchical total ranking of each evaluation index in the preset evaluation index set according to the hierarchical single ranking corresponding to each hierarchy, the method further comprises:
performing a consistency test on the hierarchical total ranking, and determining the hierarchical total ranking that passes the test as the hierarchical total ranking of each evaluation index.
5. The method of claim 1, wherein the accuracy evaluation index comprises at least one of an accuracy rate, a precision rate, a recall rate, a false positive rate, and an F1 score.
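All of the accuracy-type indexes in claim 5 derive from the confusion matrix. A self-contained sketch (the tp/fp/tn/fn argument names and zero-division guards are illustrative, not from the patent):

```python
def accuracy_indexes(tp, fp, tn, fn):
    """Confusion-matrix metrics: accuracy, precision, recall,
    false positive rate, and F1 score."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "false_positive_rate": fpr, "f1": f1}
```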
6. The method of claim 1, wherein the model discrimination evaluation index comprises at least one of an area under the receiver operating characteristic (ROC) curve and a Kolmogorov-Smirnov (K-S) test index.
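The K-S discrimination index of claim 6 is the maximum gap between the cumulative score distributions of positive and negative samples. A sketch over raw scores and binary labels (pure Python, names illustrative; ties in scores are not treated specially here):

```python
def ks_statistic(scores, labels):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute gap
    between the empirical CDFs of positive- and negative-sample scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    # Sweep the samples in score order, tracking both empirical CDFs.
    for score, y in sorted(zip(scores, labels)):
        if y == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best
```

Perfect separation (all negatives scored below all positives) yields K-S = 1; a model with no discrimination yields a value near 0.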
7. The method according to claim 1, wherein determining an index score corresponding to each of the indexes to be evaluated according to the model data includes:
determining a preset evaluation standard corresponding to each index to be evaluated, wherein in the preset evaluation standard the value of the index to be evaluated is positively correlated with the performance of the model;
and determining an index score corresponding to the index to be evaluated according to the model data corresponding to the index to be evaluated and the preset evaluation standard.
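One way to realize the preset evaluation standard of claim 7 is as monotone score bands: since the index value is positively correlated with model performance, higher values map to equal or higher scores. A sketch with hypothetical cut points and band scores (the patent does not specify the banding scheme):

```python
import bisect

def index_score(value, cut_points, band_scores):
    """Map a raw index value to a band score.

    cut_points: ascending thresholds; band_scores: one score per band,
    with len(band_scores) == len(cut_points) + 1, ascending so that a
    higher index value never receives a lower score (positive correlation).
    """
    return band_scores[bisect.bisect_right(cut_points, value)]
```

With cut points (0.6, 0.7, 0.8) and band scores (25, 50, 75, 100), an index value of 0.75 falls in the third band and scores 75.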
8. The method of claim 1, wherein determining a model composite score corresponding to the factor to be evaluated based on each of the index scores and the corresponding weights comprises:
determining the product of each index score and the corresponding weight;
and determining the sum of the products as a model comprehensive score corresponding to the factors to be evaluated.
9. A model performance evaluation device, characterized by comprising:
the model acquisition module is used for acquiring a target model to be evaluated, factors to be evaluated of the target model and model data corresponding to the target model;
the target model is a classification model;
the factors to be evaluated of the target model comprise: a sample evaluation factor, a feature evaluation factor, a model evaluation factor, a front-end monitoring evaluation factor and a back-end monitoring evaluation factor;
the model data comprises training set data, test set data, online scoring set data and online running set data;
the training set data and the test set data are data when the model is not in an online state, and the online scoring set data and the online operation set data are data when the model is in an online state;
the training set data comprises feature data, label data and prediction probability data; the test set data comprises feature data, label data and prediction probability data; the online scoring set data comprises feature data and prediction probability data; the online running set data comprises backflow label data;
the index determining module is used for determining indexes to be evaluated in a preset evaluation index set according to the factors to be evaluated and obtaining weights corresponding to the indexes to be evaluated;
the index determination module comprises:
an index determining unit configured to determine the indexes to be evaluated, wherein: if the factor to be evaluated is the sample evaluation factor, the indexes to be evaluated comprise the number of samples, the positive and negative sample proportion, the number of features and the feature missing rate; if the factor to be evaluated is the feature evaluation factor, the indexes to be evaluated comprise the monotonic feature proportion, the predictive-ability feature proportion, the stable feature proportion and the weakly correlated feature proportion; if the factor to be evaluated is the model evaluation factor, the indexes to be evaluated comprise the accuracy evaluation index, the model discrimination evaluation index, the lift chart, the model stability evaluation index and the model mobility; if the factor to be evaluated is the front-end monitoring evaluation factor, the indexes to be evaluated comprise the stable feature proportion and the model stability evaluation index; and if the factor to be evaluated is the back-end monitoring evaluation factor, the indexes to be evaluated comprise the model discrimination evaluation index and the model mobility;
a hierarchical ranking determining module configured to determine the hierarchical total ranking of each evaluation index in the preset evaluation index set based on an analytic hierarchy process;
the hierarchical ranking determining module specifically comprises:
a structure diagram constructing unit configured to construct a hierarchical structure diagram according to a preset decision target, preset decision criteria and the interrelationships among decision objects, wherein the preset decision criteria comprise the factors to be evaluated, and the decision objects comprise all evaluation indexes in the preset evaluation index set;
a judgment matrix constructing unit configured to construct a judgment matrix according to the importance scales among the elements in each hierarchy of the hierarchical structure diagram;
a single ranking unit configured to perform a consistency test on the judgment matrix and, when the test succeeds, perform hierarchical single ranking on each hierarchy in the hierarchical structure diagram according to the judgment matrix;
a total ranking unit configured to determine the hierarchical total ranking of each evaluation index in the preset evaluation index set according to the hierarchical single ranking corresponding to each hierarchy;
an index weight determining module configured to determine the weight of each evaluation index according to the hierarchical total ranking; and
and the score determining module is used for determining index scores corresponding to the indexes to be evaluated according to the model data, and determining model comprehensive scores corresponding to the factors to be evaluated according to the index scores and the corresponding weights.
10. A computer device comprising a memory and one or more processors;
the memory is used for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the model performance evaluation method of any of claims 1-8.
11. A storage medium containing computer executable instructions which, when executed by a computer processor, are for performing the model performance evaluation method of any one of claims 1-8.
CN202110347813.8A 2021-03-31 2021-03-31 Model performance evaluation method, device, equipment and storage medium Active CN112989621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110347813.8A CN112989621B (en) 2021-03-31 2021-03-31 Model performance evaluation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112989621A CN112989621A (en) 2021-06-18
CN112989621B (en) 2023-06-23

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103235877A (en) * 2013-04-12 2013-08-07 北京工业大学 Robot control software module partitioning method
CN108734409A (en) * 2018-05-25 2018-11-02 重庆大学 A kind of Mountainous City River side landscape suitability evaluation methods
CN109033497A (en) * 2018-06-04 2018-12-18 南瑞集团有限公司 A kind of multistage data mining algorithm intelligent selecting method towards high concurrent



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant