CN116975621A - Model stability monitoring method and device and computer equipment - Google Patents


Info

Publication number
CN116975621A
CN116975621A (application number CN202310258052.8A)
Authority
CN
China
Prior art keywords
data sets
stability
consistency
model
dimension
Prior art date
Legal status
Pending
Application number
CN202310258052.8A
Other languages
Chinese (zh)
Inventor
曾炜
郭潇阳
刘玉凤
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310258052.8A
Publication of CN116975621A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q 10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The application relates to a model stability monitoring method, apparatus, computer device, storage medium, and computer program product. The method comprises the following steps: acquiring a target monitoring model; acquiring at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets; obtaining, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and deriving the degree of consistency of the consistency evaluation dimension from those sequences; performing a stability evaluation of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, the evaluation result of the at least one stability evaluation dimension including the degree of consistency of the consistency evaluation dimension; and, when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, issuing a stability anomaly prompt for the target monitoring model. The method enriches the evaluation indexes of the model and realizes automatic detection of model stability.

Description

Model stability monitoring method and device and computer equipment
Technical Field
The present application relates to the field of machine learning technology, and in particular, to a model stability monitoring method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of artificial intelligence technology, machine learning is being applied ever more widely. Machine learning obtains a model by learning from a large number of samples, and the model can then be used to analyze data and produce prediction results.
In the model training stage, a model is generally evaluated through performance indexes such as error rate and recall in order to judge its quality. However, conventional model evaluation usually focuses only on the accuracy of the model, so the evaluation dimension is single.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a model stability monitoring method, apparatus, computer device, computer-readable storage medium, and computer program product that can expand the model evaluation dimension.
In a first aspect, the present application provides a method for model stability monitoring. The method comprises the following steps:
acquiring a target monitoring model;
acquiring at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets;
obtaining, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and deriving the degree of consistency of the consistency evaluation dimension from the predicted-value distribution proportion sequences corresponding to the at least two different data sets;
performing a stability evaluation of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, the evaluation result of the at least one stability evaluation dimension comprising the degree of consistency of the consistency evaluation dimension; and
when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, issuing a stability anomaly prompt for the target monitoring model.
In a second aspect, the application further provides a model stability monitoring device. The device comprises:
a target acquisition module, configured to acquire a target monitoring model;
a data acquisition module, configured to acquire at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets;
a consistency dimension evaluation module, configured to obtain, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and to derive the degree of consistency of the consistency evaluation dimension from the predicted-value distribution proportion sequences corresponding to the at least two different data sets;
an overall evaluation module, configured to evaluate the stability of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, the evaluation result of the at least one stability evaluation dimension comprising the degree of consistency of the consistency evaluation dimension; and
a monitoring module, configured to issue a stability anomaly prompt for the target monitoring model when the stability evaluation result indicates that the stability of the target monitoring model is abnormal.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the following steps:
acquiring a target monitoring model;
acquiring at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets;
obtaining, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and deriving the degree of consistency of the consistency evaluation dimension from the predicted-value distribution proportion sequences corresponding to the at least two different data sets;
performing a stability evaluation of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, the evaluation result of the at least one stability evaluation dimension comprising the degree of consistency of the consistency evaluation dimension; and
when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, issuing a stability anomaly prompt for the target monitoring model.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the following steps:
acquiring a target monitoring model;
acquiring at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets;
obtaining, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and deriving the degree of consistency of the consistency evaluation dimension from the predicted-value distribution proportion sequences corresponding to the at least two different data sets;
performing a stability evaluation of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, the evaluation result of the at least one stability evaluation dimension comprising the degree of consistency of the consistency evaluation dimension; and
when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, issuing a stability anomaly prompt for the target monitoring model.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a target monitoring model;
acquiring at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets;
obtaining, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and deriving the degree of consistency of the consistency evaluation dimension from the predicted-value distribution proportion sequences corresponding to the at least two different data sets;
performing a stability evaluation of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, the evaluation result of the at least one stability evaluation dimension comprising the degree of consistency of the consistency evaluation dimension; and
when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, issuing a stability anomaly prompt for the target monitoring model.
According to the model stability monitoring method, apparatus, computer device, storage medium, and computer program product described above, predicted-value distribution proportion sequences corresponding to at least two different data sets are obtained from at least two different application-related data sets, and the degree of consistency of the consistency evaluation dimension is derived from those sequences. The degree of consistency characterizes how consistent the prediction distributions of the target monitoring model are across different data sets, and thus characterizes the stability of the target monitoring model when it is applied to different application data sets. Compared with conventional evaluation indexes such as accuracy, introducing this stability index expands the model evaluation dimensions and enriches the evaluation indexes of the model. Moreover, when a stability anomaly of the model is detected, a stability anomaly prompt is output, so that the stability of the model is detected automatically, stability anomalies are discovered in time, and the stability of model applications is further improved.
Drawings
FIG. 1 is a diagram of an application environment for a model stability monitoring method in one embodiment;
FIG. 2 is a flow chart of a method of model stability monitoring in one embodiment;
FIG. 3 is a flow chart illustrating steps for cross-sample consistency assessment in one embodiment;
FIG. 4 is a schematic diagram illustrating an application of a model stability monitoring method in one embodiment;
FIG. 5 is a schematic interface diagram of user rating input in one embodiment;
FIG. 6 is a schematic diagram of an interface for user rating output in one embodiment;
FIG. 7 is a schematic diagram of an application of a rating model application model stability monitoring method in one embodiment;
FIG. 8 is a block diagram of a model stability monitoring device in one embodiment;
FIG. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application will be described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Artificial Intelligence (AI) comprises the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Among them, machine Learning (ML) is a multi-domain interdisciplinary, and involves multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, confidence networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like.
With research and advancement of artificial intelligence technology, artificial intelligence is being researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and so on. It is believed that, as the technology develops, artificial intelligence will be applied in more fields and play an increasingly important role.
The solution provided by the embodiments of the present application relates to artificial intelligence technologies such as machine learning, and is specifically described through the following embodiments:
the model stability monitoring method provided by the embodiment of the application can be applied to an application environment shown in figure 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on the cloud or other servers.
The terminal may trigger a model application request to apply a model on the server 104. The server acquires a target monitoring model; acquires at least two different application-related data sets obtained by applying the target monitoring model to at least two different data sets; obtains, from the at least two different application-related data sets, predicted-value distribution proportion sequences corresponding to the at least two different data sets, and derives the degree of consistency of the consistency evaluation dimension from those sequences; performs a stability evaluation of the target monitoring model according to the evaluation result of at least one stability evaluation dimension, which includes the degree of consistency of the consistency evaluation dimension; and, when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, issues a stability anomaly prompt for the target monitoring model.
The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smart phone, tablet computer, Internet-of-Things device, or portable wearable device. Internet-of-Things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle-mounted devices, and the like; portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a method for monitoring stability of a model is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:
step 202, a target monitoring model is obtained.
The model is a machine-learning model, including but not limited to a decision tree model, a random forest model, an artificial neural network model, and a Bayesian learning model. The target monitoring model can be a model still in training, or a trained model already in use.
The monitored model can be deployed on the same server as the model stability monitoring framework, or on a different server. When the monitored model and the model stability monitoring framework are deployed on the same server, the framework acquires the relevant data of the target monitoring model through interface calls and monitors the stability of the monitored model.
When the model stability monitoring framework is deployed on an independent server, the framework can communicate with multiple servers on which monitored models are deployed, acquire the relevant data of the monitored models, and monitor their stability. Model stability refers to the ability of a machine learning algorithm to maintain its capacity to produce output, the accuracy of its results, and the consistency of its output distribution when faced with inputs from different real-world scenarios.
A monitoring period can be set for a monitored model so that its stability is monitored at regular intervals. Specifically, a timer is set for each monitored model; when the timer reaches the set monitoring time, that monitored model is taken as the target monitoring model. The monitoring period of each monitored model may be personalized based on the business needs of the model application; for example, the monitoring period may be set to one day, one week, or one month.
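By way of illustration only (the registry names and the timestamp-based timer below are assumptions, not the patent's implementation), a per-model monitoring period could be realized with simple timestamps:

```python
import time

# Assumed registry: monitored model name -> monitoring period in seconds.
MONITOR_PERIODS = {"rating_model": 24 * 3600, "risk_model": 7 * 24 * 3600}

# Time each model was last monitored; 0.0 means "never".
last_checked = {name: 0.0 for name in MONITOR_PERIODS}

def due_models(now=None):
    """Return the monitored models whose timers have expired, i.e. the
    models that should next be taken as the target monitoring model."""
    now = time.time() if now is None else now
    return [name for name, period in MONITOR_PERIODS.items()
            if now - last_checked[name] >= period]
```

A scheduler would call `due_models()` periodically and run the stability evaluation for each returned model, then update its `last_checked` entry.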
Step 204, obtaining at least two different application related data sets obtained by applying the target monitoring model to at least two different data sets.
Here, the different data sets are different input data sets of the model. In general, an input data set may include feature data of multiple objects under test, and there may be multiple kinds of feature data. Taking a target-object classification model as an example, the input data set includes the age, region, interests, and so on of each object. Taking an image recognition model as an example, the input data set includes image features. The different data sets may be input data sets from different monitoring periods, or input data sets with different sample distributions within the same monitoring period or across different monitoring periods. For each data set, the corresponding application-related data is obtained when the target monitoring model is applied to it. Applying a data set to the target monitoring model may mean using the data set to train the model, or using the data set as input when the model is used for prediction.
The application-related data set obtained by applying the target monitoring model to a data set can include the predicted output value corresponding to the input data and the true value corresponding to the input data. The predicted output value is predicted by the model from the input data; the true value corresponding to the input data can be the labeled data of the sample, or the user's feedback on the predicted output value. Taking a purchase-intention classification model as an example, the predicted output value is the purchase intention, and advertisements are pushed to objects with purchase intention according to the predicted output value. When an object makes a purchase based on the pushed advertisement, the fed-back true value is "purchased"; when the object does not make a purchase based on the pushed advertisement, the fed-back true value is "not purchased".
Taking two data sets from different monitoring periods as an example, the data sets include a current-period data set and a historical data set, and the application-related data sets include current-period application-related data corresponding to the current-period data set and historical application-related data corresponding to the historical data set. It should be appreciated that the at least two different data sets may be more than two data sets.
Step 206, obtaining predicted value distribution proportion sequences corresponding to at least two different data sets from at least two different application related data sets, and obtaining the consistency degree of the consistency dimension according to the predicted value distribution proportion sequences corresponding to at least two different data sets.
The stability evaluation dimensions may include multiple evaluation dimensions, and include at least a consistency evaluation dimension. The consistency evaluation dimension is used to evaluate the consistency of the outputs of the target monitoring model under different data sets. Besides the consistency evaluation dimension, the stability evaluation dimensions can include a number of others; examples include accuracy and coverage.
In one embodiment, a stability evaluation dimension and a calculation parameter of the stability evaluation dimension may be preset, a target parameter corresponding to at least one stability evaluation dimension is obtained from at least two different application related data sets, and an evaluation result of the stability evaluation dimension is obtained according to the target parameter.
The stability evaluation dimension includes the consistency dimension. The calculation parameters corresponding to the consistency dimension are the predicted-value distribution proportion sequences corresponding to at least two different data sets; if these sequences are similar, the target monitoring model maintains consistency in its predicted-value distribution across the different data sets and has higher stability.
The predicted-value distribution proportion sequence refers to the distribution of the proportions of each predicted value output by the model. Taking a binary classification model as an example, where the model output is either "yes" or "no", the predicted-value distribution proportion sequence comprises the ratio of the number of "yes" outputs to the total number of outputs and the ratio of the number of "no" outputs to the total number of outputs. For a model that outputs multiple classification categories, the predicted-value distribution proportion sequence refers to the proportion of model outputs falling into each classification category.
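As an illustrative sketch (not taken from the patent text), the predicted-value distribution proportion sequence can be computed by counting each predicted category and dividing by the total number of outputs:

```python
from collections import Counter

def distribution_proportion_sequence(predictions, categories):
    """Ratio of each predicted category to the total number of predictions.

    `predictions` is the list of category labels output by the model;
    `categories` fixes the order of the sequence so that the sequences
    of two different data sets can be compared position by position.
    """
    counts = Counter(predictions)
    total = len(predictions)
    return [counts.get(c, 0) / total for c in categories]

# Example: a binary classifier whose outputs are "yes" / "no".
seq = distribution_proportion_sequence(
    ["yes", "no", "yes", "yes"], categories=["yes", "no"])
# seq == [0.75, 0.25]
```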
In business applications, some models output probability values, which are scattered and inconvenient to count directly. For such a model, a binning module can be provided: its input is the probability value output by the model, it maps the probability value onto a classification category, and that category is used as the model's predicted output value, which makes it convenient to count the proportion of each output category.
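A minimal sketch of such a binning step (the function name and the evenly spaced bin edges are assumptions, not values from the patent):

```python
import bisect

def bin_probability(p, edges=(0.25, 0.5, 0.75)):
    """Map a model's probability output onto a discrete bin index.

    `edges` are assumed cut points; bisect_right returns the index of
    the bin that `p` falls into (0 .. len(edges)), which then serves as
    the discrete predicted output value whose proportion can be counted.
    """
    return bisect.bisect_right(edges, p)

binned = [bin_probability(p) for p in (0.1, 0.3, 0.6, 0.9)]
# binned == [0, 1, 2, 3]
```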
In this embodiment, the degree of consistency in the consistency evaluation dimension is obtained from the predicted-value distribution proportion sequences corresponding to the at least two different data sets. The more similar those sequences are, the higher the degree of consistency and the higher the stability of the target monitoring model. The degree of consistency can be evaluated from the difference between the predicted-value distribution proportion sequences of the at least two different data sets: the smaller the difference, the higher the consistency. It can also be evaluated from the ratio between the sequences: the closer the ratio is to 1, the higher the consistency. It can also be evaluated from both the difference and the ratio of the predicted-value distribution proportion sequences.
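The difference-based criterion just described could be sketched as follows (an illustrative assumption; the patent does not fix a concrete formula at this point):

```python
def consistency_degree(seq_a, seq_b):
    """Consistency of two predicted-value distribution proportion sequences.

    Uses the difference criterion: the smaller the total absolute
    difference between the sequences, the higher the degree of
    consistency. Returns a value in [0, 1], where 1 means identical.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same categories")
    total_diff = sum(abs(a - b) for a, b in zip(seq_a, seq_b))
    # Each proportion sequence sums to 1, so total_diff is at most 2.
    return 1.0 - total_diff / 2.0

degree = consistency_degree([0.75, 0.25], [0.7, 0.3])  # approximately 0.95
```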
Step 208, performing stability evaluation on the target monitoring model according to the evaluation result of at least one stability dimension; the evaluation result of the at least one stability evaluation dimension includes a degree of consistency of the consistency evaluation dimension.
If the stability evaluation dimension comprises only the consistency dimension, the stability evaluation of the target monitoring model can be performed based on the evaluation result corresponding to the consistency dimension. If the stability evaluation comprises multiple dimensions, at least one of which is the consistency dimension, the stability evaluation of the target monitoring model is performed based on the evaluation results of the multiple evaluation dimensions. For example, a weight can be set for each evaluation dimension, and the stability value of the target monitoring model can be obtained as the weighted sum of the evaluation results of the evaluation dimensions.
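The weighted-sum aggregation described above can be sketched as follows (the dimension names, weights, and scores are illustrative assumptions, not values from the patent):

```python
def stability_value(dimension_scores, weights):
    """Weighted sum of the evaluation results of each stability dimension.

    `dimension_scores` maps dimension name -> score in [0, 1];
    `weights` maps dimension name -> weight (assumed to sum to 1).
    """
    return sum(weights[d] * s for d, s in dimension_scores.items())

scores = {"consistency": 0.95, "accuracy": 0.90, "coverage": 0.80}
weights = {"consistency": 0.5, "accuracy": 0.3, "coverage": 0.2}
value = stability_value(scores, weights)  # 0.905
```

The resulting value can then be compared against a stability threshold to decide whether an anomaly prompt should be issued.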
And 210, when the stability evaluation result is that the stability of the target monitoring model is abnormal, performing stability abnormality prompt on the target monitoring model.
The stability evaluation result is a degree of stability. When the degree of stability is below a threshold, the stability of the target monitoring model is abnormal, and a stability anomaly prompt is issued for the target monitoring model, which helps developers discover model anomalies in time.
The stability anomaly prompt may include an improvement recommendation; for example, when the stability of a visual detection model is abnormal, the prompt "It is recommended to retrain and update the model" may be output. In this way, the stability of the model can be monitored automatically, and when a stability anomaly occurs, an improvement suggestion is prompted, assisting algorithm engineers in updating the algorithm and improving the performance of the model.
According to the model stability monitoring method above, predicted-value distribution proportion sequences corresponding to at least two different data sets are obtained from at least two different application-related data sets, and the degree of consistency of the consistency evaluation dimension is derived from those sequences. The degree of consistency characterizes how consistent the prediction distributions of the target monitoring model are across different data sets, and thus characterizes the stability of the target monitoring model when it is applied to different application data sets. Compared with conventional evaluation indexes such as accuracy, introducing this stability index expands the model evaluation dimensions and enriches the evaluation indexes of the model. Moreover, when a stability anomaly of the model is detected, a stability anomaly prompt is output, so that the stability of the model is detected automatically, stability anomalies are discovered in time, and the stability of model applications is further improved.
In another embodiment, the different data sets are data sets of different application times; the consistency dimension includes a cross-time consistency dimension.
The data sets of different application times may be input data sets at different training times, or input data sets at application times in different detection periods, for example, the input data set applied in the current detection period and the input data set applied in the previous detection period.
For cross-time consistency dimension, acquiring predicted value distribution proportion sequences corresponding to data sets of at least two different application times from at least two different application related data sets; and obtaining the cross-time consistency degree of the target monitoring model in the cross-time consistency evaluation dimension according to the predicted value distribution proportion sequences corresponding to the data sets of at least two different application times.
Wherein cross-time consistency is used to evaluate the consistency of the model output at different times. The evaluation parameter of the cross-time consistency dimension is the predicted value distribution proportion sequence corresponding to the data sets of different application times. The predicted value distribution proportion sequence refers to the distribution of the number ratio of each predicted value output by the model.
Specifically, the predicted value distribution proportion sequences corresponding to at least two data sets of different application times are obtained from the at least two data sets. If the predicted value distribution proportion sequences corresponding to the data sets of different application times are similar or identical, the output of the model on the data sets of different application times is stable, that is, the model application is stable in the time dimension.
Specifically, a cross-time consistency degree value may be calculated according to the predicted value distribution proportion sequences corresponding to at least two data sets of different application times; this value measures the consistency of the predicted value distribution proportion sequences output on the data sets of different application times. The cross-time consistency degree may be estimated according to the difference of the predicted value distribution proportion sequences corresponding to the data sets of at least two different application times: the smaller the difference, the higher the consistency degree. The cross-time consistency degree may also be estimated according to the ratio of the two sequences: the closer the ratio is to 1, the higher the consistency degree. The cross-time consistency degree may further be estimated according to both the difference and the ratio of the predicted value distribution proportion sequences, with the specific calculation formula:

cross-time consistency degree = Σ_i (A_i - B_i) × ln(A_i / B_i)

wherein A_i represents the distribution proportion of the i-th predicted value corresponding to the data set of the first application time, namely the proportion of the i-th predicted value among all predicted values of that data set, and B_i represents the distribution proportion of the i-th predicted value corresponding to the data set of the second application time, defined likewise. A smaller value of this divergence indicates a higher degree of cross-time consistency.
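As a hedged sketch (all names illustrative), the difference-and-ratio formula above can be computed as a PSI-style divergence over two predicted value distribution proportion sequences; a smaller value indicates a higher degree of consistency:

```python
import math

def consistency_degree(p, q):
    """Divergence between two predicted-value distribution proportion
    sequences: sum of (p_i - q_i) * ln(p_i / q_i). Smaller = more consistent."""
    assert len(p) == len(q)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

# Proportions of each predicted class at two application times (illustrative).
a = [0.1021, 0.1998, 0.4006, 0.1986, 0.0987]  # first application time
b = [0.0998, 0.2002, 0.4001, 0.1997, 0.1000]  # second application time
print(consistency_degree(a, b))
```

Identical sequences give exactly 0; the small proportion shifts above give a value on the order of 1e-4 or less, which would fall in the "excellent" band of the consistency thresholds used later in this document.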
In this embodiment, the output stability of the model on data sets of different application times is evaluated by the cross-time consistency degree, thereby evaluating the stability of the model in the time dimension.
In another embodiment, the different data sets are data sets of different sample distributions; the consistency dimension includes a cross-sample consistency dimension.
The data sets of different sample distributions may be training data sets of different sample distributions, or application data sets of different sample distributions, for example, application data sets of samples from different regions: an application data set derived from a first province and an application data set derived from a second province may be data sets of different sample distributions.
For cross-sample consistency dimension, acquiring predicted value distribution proportion sequences corresponding to data sets with at least two different sample distributions from at least two different application related data sets; and obtaining the cross-sample consistency degree in the cross-sample consistency evaluation dimension according to the predicted value distribution proportion sequence corresponding to the data sets of at least two different sample distributions.
Wherein cross-sample consistency is used to evaluate the consistency of the model output when facing data sets of different sample distributions. The evaluation parameter of the cross-sample consistency dimension is the predicted value distribution proportion sequence corresponding to the data sets of different sample distributions. The predicted value distribution proportion sequences corresponding to at least two data sets of different sample distributions are obtained from the at least two different application related data sets. If these sequences are similar or identical, the output of the model on the data sets of different sample distributions is stable, that is, the model application is stable in the cross-sample dimension.
Specifically, a cross-sample consistency degree value may be calculated according to the predicted value distribution proportion sequences corresponding to at least two data sets of different sample distributions; this value measures the consistency of the predicted value distribution proportion sequences of the data sets of different sample distributions. The cross-sample consistency degree may be estimated according to the difference of the two sequences: the smaller the difference, the higher the consistency degree. It may also be estimated according to the ratio of the two sequences: the closer the ratio is to 1, the higher the consistency degree. It may further be estimated according to both the difference and the ratio, with the specific calculation formula:

cross-sample consistency degree = Σ_i (T_i - V_i) × ln(T_i / V_i)

wherein T_i represents the distribution proportion of the i-th predicted value corresponding to the data set of the first sample distribution, namely the proportion of the i-th predicted value among all predicted values of that data set, and V_i represents the distribution proportion of the i-th predicted value corresponding to the data set of the second sample distribution, defined likewise. A smaller value of this divergence indicates a higher degree of cross-sample consistency.
In this embodiment, the output stability of the model on data sets of different sample distributions is evaluated by the cross-sample consistency degree, thereby evaluating the stability of the model in the cross-sample dimension.
In another embodiment, the different data sets are data sets of different application times, and the consistency dimension includes a cross-time consistency dimension and a cross-sample consistency dimension.
For cross-time consistency dimension, acquiring predicted value distribution proportion sequences corresponding to data sets of at least two different application times from at least two different application related data sets; and obtaining the cross-time consistency degree of the target monitoring model in the cross-time consistency evaluation dimension according to the predicted value distribution proportion sequences corresponding to the data sets of at least two different application times.
The manner of evaluating the cross-time consistency dimension is described above and is not repeated here.
For the cross-sample consistency dimension under data sets of different application times, as shown in fig. 3, the method comprises the following steps:
step 302, sample distribution analysis is performed on at least two data sets with different application times, so as to obtain sample distribution labels of each data in each data set.
Sample distribution analysis may be performed on the data sets of different application times according to different attributes of the samples, and a sample distribution label is set for each piece of data in the data sets. Taking education level as the sample attribute for example, sample distribution analysis may be performed on education level, and an education-level label is set for each piece of data. Taking region as the sample attribute for example, sample analysis may be performed on the sample region, and a region label is set for each piece of data.
And step 304, analyzing the predicted output values of different data sets according to the sample distribution labels to determine the predicted output value distribution of different sample distributions.
The data of the same sample distribution can be taken as one data set to obtain the predicted output value distribution under that sample distribution. The predicted output value is the output value obtained by inputting the data of one sample in the data set into the model, and the distribution of predicted output values is counted according to the predicted output value of each sample in the data set. The predicted output value distribution may represent a predicted category distribution.
Taking education level as the sample attribute for example, the labels may include undergraduate, graduate student and doctor, and the model output may be a purchase intention category; the predicted value distribution (such as the purchase intention category distribution) corresponding to the undergraduate data set, the graduate student data set and the doctor data set are then obtained respectively.
Step 306, determining a predicted value distribution proportion sequence of different sample distributions according to the predicted value distributions of the different sample distributions, and obtaining the cross-sample consistency degree in the cross-sample consistency evaluation dimension according to the predicted value distribution proportion sequence of the different sample distributions.
Specifically, the predicted value distribution proportion sequence of each sample distribution is calculated according to the predicted value distributions of the different sample distributions. The manner of calculating the cross-sample consistency degree in the cross-sample consistency evaluation dimension from the predicted value distribution proportion sequences of different sample distributions is described above and is not repeated here.
Taking education level as the sample attribute for example, the differences among the predicted value distribution proportion sequences corresponding to the undergraduate data set, the graduate student data set and the doctor data set may be examined to estimate the cross-sample consistency degree.
In this embodiment, the output stability of the model on data sets of different application times and different sample distributions is evaluated by the cross-time and cross-sample consistency degrees, thereby evaluating the stability of the model application in both the cross-time dimension and the cross-sample dimension.
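A hedged sketch of steps 302 to 306: group predictions by a sample-distribution label, then build a per-group predicted-value proportion sequence that can be fed into the cross-sample consistency calculation. All names and values are illustrative:

```python
from collections import Counter

def proportion_sequence(predictions, classes):
    """Proportion of each predicted class among all predictions of one group."""
    counts = Counter(predictions)
    total = len(predictions)
    return [counts.get(c, 0) / total for c in classes]

# (education_label, predicted_class) pairs from an application data set.
records = [("undergraduate", 1), ("undergraduate", 2), ("doctor", 1),
           ("doctor", 2), ("undergraduate", 2), ("doctor", 2)]
classes = [1, 2]

# Step 302/304: group prediction outputs by sample-distribution label.
by_label = {}
for label, pred in records:
    by_label.setdefault(label, []).append(pred)

# Step 306: one proportion sequence per sample distribution.
seqs = {label: proportion_sequence(preds, classes)
        for label, preds in by_label.items()}
print(seqs)
```

The resulting sequences (one per label) are the inputs compared by the divergence formula described above.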
In another embodiment, the at least two different application-related data sets comprise predicted input data and predicted output values corresponding to the different data sets. The stability assessment dimension includes a coverage dimension.
The coverage represents the degree to which the model responds to the input and is used to evaluate the degree to which the model can output a prediction result; it may be the ratio of the number of predicted output values in a data set to the number of predicted input data. The higher the coverage, the more stably the model can predict on the data set.
Wherein, for the coverage assessment dimension, obtaining predicted input data and predicted output values for at least one of the data sets from at least two different application-related data sets; and obtaining the coverage of the target monitoring model according to the ratio of the number of the predicted output values to the number of the predicted input data.
In this embodiment, the coverage of the model may be calculated according to the predicted input data and the predicted output value of one of the data sets, specifically, according to the ratio of the number of predicted output values and the number of predicted input data of any one data set, to obtain the coverage of the model. For example, in a regularly monitored application scenario, the coverage of the model may be obtained from the ratio of the number of predicted output values and the number of predicted input data of the current application dataset.
The stability of the model may also be evaluated based on the stability of the coverage. For the coverage evaluation dimension, the predicted input data and predicted output values of each data set are obtained from the at least two different application related data sets; the coverage corresponding to each data set is obtained according to the ratio of the number of predicted output values to the number of predicted input data, and the coverage stability is obtained according to the coverage differences between the data sets. If the coverage stability is high, the model can output stably in the face of different data sets.
In this embodiment, the coverage evaluates the degree to which the model stably outputs predicted values, thereby evaluating the stability of the model output.
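A minimal sketch of the coverage ratio and the coverage-stability check across data sets, under the assumption that coverage stability is measured by the spread of per-data-set coverages (the counts below are made up):

```python
def coverage(n_outputs, n_inputs):
    """Ratio of predicted output values to predicted input records."""
    return n_outputs / n_inputs if n_inputs else 0.0

# (number of predicted outputs, number of predicted inputs) per data set.
datasets = {"period_1": (10000, 10000), "period_2": (9950, 10000)}

covs = {name: coverage(n_out, n_in) for name, (n_out, n_in) in datasets.items()}
spread = max(covs.values()) - min(covs.values())  # smaller spread = more stable
print(covs, spread)
```

Here the coverage difference between the two periods is 0.5%, which under this assumed measure would count as highly stable coverage.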
In another embodiment, the stability assessment dimension comprises an accuracy dimension, and the application-related data comprises a true value corresponding to the predicted input data and a predicted output value.
In a training scene, the real value corresponding to the predicted input data is the labeled data of the predicted input data; in a prediction scene, the real value is the feedback result of the predicted output value. Taking the predicted output value as the purchase intention for example, the real value is whether the user actually purchases.
The evaluation index of accuracy may be Precision, Recall, F1 value, AUC, or the like. Specifically, the model accuracy indices are calculated based on the confusion matrix:

| | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actually positive | TP | FN |
| Actually negative | FP | TN |

The specific meanings of the four values are:
TP (True Positive): correctly predicted positive examples, i.e. the true value of the data is positive and the predicted value is also positive;
TN (True Negative): correctly predicted negative examples, i.e. the true value of the data is negative and the predicted value is also negative;
FP (False Positive): incorrectly predicted positive examples, i.e. the true value of the data is negative but it is mispredicted as positive;
FN (False Negative): incorrectly predicted negative examples, i.e. the true value of the data is positive but it is mispredicted as negative.
The following model accuracy index values may be derived based on the confusion matrix:
1) Accuracy calculation
Accuracy represents the proportion of correctly classified samples in the total number of samples. The calculation formula is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
2) Precision calculation
Precision, also known as the precision rate, represents the proportion of the samples predicted as positive that are actually positive. The calculation formula is:

Precision = TP / (TP + FP)
3) Recall rate recovery calculation
Recall, also known as the recall rate, represents the proportion of actual positive samples that are correctly predicted as positive. The calculation formula is:

Recall = TP / (TP + FN)
4) F1 value calculation
The F1 score is the harmonic mean of precision and recall. The calculation formula is:

F1 = 2 × Precision × Recall / (Precision + Recall)
5) AUC calculation
AUC (Area Under Curve) is defined as the area under the ROC curve enclosed with the coordinate axes. It is typically used as a performance index measuring the quality of a classification model, and can be obtained by summing the areas of the parts under the ROC curve.
The abscissa of the ROC curve is the false positive rate, and the ordinate is the true positive rate, and the calculation method is as follows:
False Positive Rate (FPR): the probability that an actually negative example is determined to be positive. The calculation formula is:

FPR = FP / (FP + TN)

True Positive Rate (TPR): the probability that an actually positive example is determined to be positive (i.e. the positive-example recall rate). The calculation formula is:

TPR = TP / (TP + FN)
The ROC curve is derived from FPR and TPR. The threshold of a two-class model may be set higher or lower; each threshold setting yields a different FPR and TPR, and plotting the (FPR, TPR) coordinates of each threshold of the same model in ROC space forms the ROC curve of that model.
The AUC is the area under this curve. When comparing different classification models, the ROC curve of each model may be drawn and the areas under the curves compared as an index of model quality. Since the area lies within a 1×1 square, the AUC is always between 0 and 1. Assume that samples above the threshold are judged positive and those below negative; then if one positive sample and one negative sample are drawn at random, the probability that the classifier assigns the positive sample a higher value than the negative sample equals the AUC value. In short, the higher the AUC value of a classifier, the higher its accuracy.
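The probabilistic definition just given (a randomly drawn positive sample outscoring a randomly drawn negative one) can be sketched directly, with ties counted as half a win; names and scores are illustrative:

```python
def auc(scores, labels):
    """AUC by the probabilistic definition: P(score of random positive
    > score of random negative), ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # perfectly separated classes -> 1.0
```

This pairwise form is quadratic in sample count; production implementations typically use a rank-based equivalent, but the value computed is the same.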
In this embodiment, the stability evaluation dimension includes accuracy, so that the stability evaluation dimension can be expanded, and accuracy of stability evaluation is improved.
In one embodiment, performing stability evaluation on the target monitoring model according to the evaluation result of the at least one stability dimension comprises: weighting the cross-time consistency degree, the cross-sample consistency degree, the coverage, and the accuracy of the target monitoring model to obtain a stability evaluation result of the target monitoring model.
The accuracy, coverage, cross-time consistency and cross-sample consistency weights of the target monitoring model can be adjusted according to business scenes. For business scenes with higher coverage requirements, the coverage weight can be improved.
Specifically, the stability evaluation formula is:

stability = std(a × accuracy × b × coverage) × (1 - std(c × cross-time consistency + d × cross-sample consistency))

wherein std is a normalization function, and a, b, c, d are the weights of accuracy, coverage, cross-time consistency and cross-sample consistency, respectively. For example, in some business scenarios where the accuracy and coverage of the model are equally important and the emphasis on cross-time consistency is one order of magnitude higher than on cross-sample consistency, the weights may be a=1, b=1, c=10, d=1. In some business scenarios the coverage requirement is higher than the accuracy requirement and the cross-time consistency requirement is higher than the cross-sample consistency requirement, such as a private-domain scenario; the weights may be a=1, b=10, c=10, d=1, and the stability evaluation formula of the corresponding model is:

stability = std(accuracy × coverage) × (1 - std(10 × cross-time consistency + cross-sample consistency))
Traditional stability evaluation considers only a single aspect, such as precision, is mainly used for the offline training process, and cannot directly evaluate the real-time online inference process of an algorithm. Evaluation from the aspect of algorithm precision alone cannot address stability comparison between different samples and different models; in addition, such index calculation is greatly affected by extreme values, outliers and the like, and its comparability is poor. In this embodiment, the stability performance in terms of algorithm coverage, accuracy, cross-sample consistency and cross-time consistency can all be evaluated, so the stability of the model is evaluated comprehensively.
In one embodiment, when the stability evaluation result is that the stability of the target monitoring model is abnormal, a stability abnormality prompt is output for the target monitoring model. The stability abnormality prompt may include the stability index value and an improvement prompt.
Wherein, stability anomaly prompt can be visually displayed. In other embodiments, stability monitoring results are displayed for each stability assessment.
In one embodiment, the stability monitoring results corresponding to the different stability index values are shown in table 1:
table 1 stability monitoring results corresponding to different stability index values
| Stability index | Algorithm stability evaluation | Algorithm suggestion |
| --- | --- | --- |
| 90%~100% | Excellent | The algorithm is very stable and can be applied long-term |
| 70%~90% | Good | The algorithm is stable; the effect needs continued observation |
| Below 70% | Unstable | Retraining and updating the model is suggested |
By outputting a stability evaluation and a corresponding prompt for different stability index values, for example prompting when the stability evaluation result indicates that the stability of the target monitoring model is abnormal, the method can guide developers (such as algorithm engineers) in algorithm optimization.
In another embodiment, when the stability assessment result is that the stability of the target monitoring model is abnormal, determining an abnormality assessment dimension; and outputting a corresponding improvement prompt according to the index value of the abnormal evaluation dimension.
Specifically, a threshold value is set for each evaluation dimension, and the index value of each evaluation dimension is compared with its threshold, so that the model is also evaluated in each single dimension. It can be appreciated that the stability of an algorithm is usually influenced by all dimensions together: a good result in one dimension tends to lift the others, while a poor result in one dimension tends to drag the others down. Therefore, only when the stability evaluation result indicates that the stability of the target monitoring model is abnormal is the index value of each evaluation dimension compared with its corresponding threshold to obtain the evaluation of each dimension; when an evaluation dimension is abnormal, a corresponding improvement prompt is output according to the index value of that abnormal dimension.
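A hedged sketch of the per-dimension threshold comparison. The band boundaries below mirror the per-dimension evaluation tables in this section; the dispatch structure and labels are illustrative. Note that for the consistency dimensions a larger value is worse, so their bands are ordered the other way round:

```python
def band(value, thresholds):
    """thresholds: (lower_bound, label) pairs, highest bound first."""
    for bound, label in thresholds:
        if value >= bound:
            return label
    return thresholds[-1][1]

ACCURACY_BANDS = [(0.85, "excellent"), (0.70, "good"), (0.0, "inaccurate")]
COVERAGE_BANDS = [(0.90, "excellent"), (0.70, "good"), (0.0, "incomplete")]
# Consistency indices are divergences: smaller is better.
CROSS_TIME_BANDS = [(1e-1, "inconsistent"), (1e-3, "good"), (0.0, "excellent")]

print(band(0.8037, ACCURACY_BANDS),
      band(1.0, COVERAGE_BANDS),
      band(7.58e-5, CROSS_TIME_BANDS))
```

Each returned label would then select the corresponding improvement prompt, e.g. "retraining and updating the model is suggested" for an inaccurate accuracy band.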
The accuracy evaluation results corresponding to the accuracy index values are shown in table 2:
table 2 accuracy evaluation results corresponding to accuracy index values
| Accuracy index | Algorithm evaluation | Algorithm suggestion |
| --- | --- | --- |
| 0.85~1.0 | Excellent | The algorithm is very accurate and can be applied long-term |
| 0.7~0.85 | Good | The algorithm is accurate; the effect needs continued observation |
| Below 0.7 | Inaccurate | Retraining and updating the model is suggested |
The coverage evaluation results corresponding to the coverage index values are shown in table 3:
Table 3 coverage evaluation results corresponding to coverage index values
| Coverage index | Algorithm evaluation | Algorithm suggestion |
| --- | --- | --- |
| 90%~100% | Excellent | Coverage is complete and can be applied long-term |
| 70%~90% | Good | Coverage has room for improvement and needs continued updating |
| Below 70% | Incomplete | Rebuilding the model or replacing the data source is suggested |
The cross-time consistency evaluation results corresponding to the cross-time consistency are shown in table 4:
TABLE 4 Cross-time consistency evaluation results corresponding to cross-time consistency
| Cross-time consistency index | Algorithm evaluation | Algorithm suggestion |
| --- | --- | --- |
| 0~1e-3 | Excellent | Cross-time consistency is good and can be applied long-term |
| 1e-3~1e-1 | Good | Cross-time consistency is acceptable; the effect needs continued observation |
| Above 1e-1 | Inconsistent | Pay attention to feature offset |
The cross-sample consistency evaluation results corresponding to the cross-sample consistency are shown in table 5:
table 5 cross-sample consistency evaluation results corresponding to cross-sample consistency
| Cross-sample consistency index | Algorithm evaluation | Algorithm suggestion |
| --- | --- | --- |
| 0~1e-5 | Excellent | Cross-sample consistency is good and can be applied long-term |
| 1e-5~1e-3 | Good | Cross-sample consistency is acceptable; the effect needs continued observation |
| Above 1e-3 | Inconsistent | Pay attention to the influence of missing data values |
In this embodiment, when the stability evaluation result is that the stability of the target monitoring model is abnormal, a corresponding improvement prompt is given for the abnormal evaluation dimension, which helps developers understand the factors affecting stability and the direction of improvement, improving development efficiency.
In practical application, for a business scene where the model output is a discrete value (such as a classification category), the predicted value distribution proportion sequence can be calculated directly from the model output, while for a business scene where the model output is a continuous value (such as a probability), it is inconvenient to statistically calculate the predicted value distribution proportion sequence.
In this case, in this embodiment, when the direct output of the model is a probability value, a mapping model is connected after the model output, and the probability value directly output by the model is mapped to a classification category. Therefore, the model stability monitoring method provided by the application is compatible with model outputs whether they are categories or probabilities.
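The source does not specify the mapping model, so as an assumed minimal sketch, a continuous probability can be binned into one of five score categories before the proportion sequence is computed, matching the five-score layout of the experiment tables later in this document:

```python
def map_probability(p, n_bins=5):
    """Assumed mapping step: bin a probability in [0, 1] into score 1..n_bins."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability out of range")
    return min(int(p * n_bins) + 1, n_bins)

probs = [0.05, 0.33, 0.5, 0.91, 1.0]
print([map_probability(p) for p in probs])  # -> [1, 2, 3, 5, 5]
```

After this step, the binned scores are discrete categories, so the same proportion-sequence statistics apply to probability-output models.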
In one embodiment, an application of a model stability monitoring method is shown in fig. 4:
At least two different application related data sets obtained by applying the target monitoring model to at least two different data sets are obtained. The application related data sets comprise predicted input data, predicted output values corresponding to the predicted input data, and true values. In some application scenarios, the true value may be obtained from feedback on the result of applying the predicted input data.
And acquiring target parameters corresponding to each evaluation dimension from at least two different application related data sets, and calculating index values of each evaluation dimension according to each target parameter.
As shown in FIG. 4, the evaluation dimensions include accuracy, coverage, cross-time consistency, and cross-sample consistency. And weighting according to the accuracy, coverage, cross-time consistency degree and cross-sample consistency of the target monitoring model to obtain a stability evaluation result of the target monitoring model.
And when the stability evaluation result is that the stability of the target monitoring model is abnormal, carrying out stability abnormality prompt on the target monitoring model, thereby realizing automatic monitoring on the stability of the model.
Taking a rating model as an example, the rating model can rate a target object, such as a purchase intention rating or a consumption level rating of a user. The application of rating model stability detection may comprise the following steps:
1. A list of target objects to be rated is input, each represented by a unique identification id, as shown in fig. 5.
2. Measurement and calculation are performed with the rating model, and the ratings and intentions are ranked.
3. The rating model outputs the rating result of each target object, which may be the rating score of each target object, or a target object list sorted in descending order of intent, as shown in fig. 6.
For this rating model, as shown in fig. 7, an algorithm stability monitoring module is provided, which implements the model stability monitoring of the present application. The application process is as follows:
1. The user opens a client page and inputs the target object list to be rated.
2. The client transmits the target object list back to the back-end server.
3. The back-end server matches the corresponding target object features from the feature database using the target object list.
4. The target object features are input into the rating model.
5. The rating model outputs the intent score of each target object and returns the intent scores to the back-end server.
6. The back-end server returns the intent scores to the client.
7. The algorithm stability monitoring link monitors and calculates the model output and client feedback data to obtain the actual effect of the rating model, which is used to guide the optimization and updating of the model.
After experiments are carried out and the stability monitoring method is applied, the following data can be obtained:
1. The rating model coverage is calculated to be 100%.
2. The accuracy is 0.8037.
3. The cross-time consistency results are shown below; the calculated cross-time consistency degree is 7.58e-5.
| Score | T | V | T% | V% | ln(T%/V%) | (T%-V%)ln(T%/V%) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 18079 | 9987 | 0.1021 | 0.0998 | 0.0227 | 5.22e-5 |
| 2 | 35366 | 20024 | 0.1998 | 0.2002 | -0.0019 | 7.31e-7 |
| 3 | 70893 | 40016 | 0.4006 | 0.4001 | 0.0011 | 5.40e-7 |
| 4 | 35146 | 19973 | 0.1986 | 0.1997 | -0.0056 | 6.24e-6 |
| 5 | 17472 | 10000 | 0.0987 | 0.1000 | -0.0127 | 1.60e-5 |
4. Cross-sample consistency results the degree of calculated cross-sample consistency is 0.0062 as follows.
| Score | A | B | A% | B% | ln(A%/B%) | (A%-B%)ln(A%/B%) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 468 | 515 | 0.0738 | 0.0807 | -0.0892 | 0.0006 |
| 2 | 1279 | 1358 | 0.2017 | 0.2128 | -0.0534 | 0.0005 |
| 3 | 2837 | 2661 | 0.4475 | 0.4170 | 0.0704 | 0.0021 |
| 4 | 1297 | 1288 | 0.2046 | 0.2018 | 0.0134 | 0.00003 |
| 5 | 458 | 558 | 0.0722 | 0.0874 | -0.1910 | 0.0029 |
According to coverage, accuracy, cross-time consistency and cross-sample consistency, the comprehensive calculation stability index is 77.87%, the obtained stability evaluation is good, and the corresponding stability prompt is as follows: the model is stable, and the effect needs to be continuously observed.
The model stability monitoring method can comprehensively evaluate stability performance across algorithm coverage, accuracy, cross-sample consistency and cross-time consistency, and is compatible with both discrete output scores and continuous output probabilities. Through this scheme, the effect of the offline training process can be evaluated, and real-time monitoring of the online effect of the algorithm can be realized.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turns or alternately with at least some of the other steps or sub-steps.
Based on the same inventive concept, the embodiment of the application also provides a model stability monitoring device for implementing the above-mentioned model stability monitoring method. The implementation of the solution provided by the device is similar to that described in the above method, so for the specific limitations in the one or more embodiments of the model stability monitoring device provided below, reference may be made to the limitations of the model stability monitoring method above, which are not repeated here.
In one embodiment, as shown in fig. 8, there is provided a model stability monitoring apparatus comprising:
the target acquisition module 802 is configured to acquire a target monitoring model.
The data obtaining module 804 is configured to obtain at least two different application related data sets obtained by applying the target monitoring model to at least two different data sets.
The consistency dimension evaluation module 806 is configured to obtain a predicted value distribution proportion sequence corresponding to at least two different data sets from at least two different application related data sets, and obtain a consistency degree of a consistency dimension according to the predicted value distribution proportion sequence corresponding to at least two different data sets.
The total evaluation module 808 is configured to perform stability evaluation on the target monitoring model according to an evaluation result of the at least one stability dimension; the evaluation result of the at least one stability evaluation dimension includes a degree of consistency of the consistency evaluation dimension.
And the monitoring module 810 is configured to prompt the stability anomaly of the target monitoring model when the stability evaluation result is that the stability of the target monitoring model is abnormal.
According to the model stability monitoring device, the predicted value distribution proportion sequences corresponding to at least two different data sets are obtained from at least two different application-related data sets, and the consistency degree of the consistency evaluation dimension is obtained from these sequences. The consistency degree characterizes the consistency of the predicted distribution of the target monitoring model across different data sets, and therefore the stability of the target monitoring model when applied to different application data sets. Compared with traditional evaluation indexes such as accuracy, introducing the stability index expands the model evaluation dimensions and enriches the evaluation indexes of the model. When a stability anomaly of the model is detected, a stability anomaly prompt is output, which realizes automatic detection of model stability, finds stability anomalies in time, and further improves the stability of the model's application.
In another embodiment, the different data sets are data sets of different application times; the consistency dimension includes a cross-time consistency dimension.
The consistency dimension evaluation module is used for acquiring predicted value distribution proportion sequences corresponding to the data sets of at least two different application times from at least two different application related data sets; and obtaining the cross-time consistency degree of the target monitoring model in the cross-time consistency evaluation dimension according to the predicted value distribution proportion sequences corresponding to the data sets of at least two different application times.
In another embodiment, the different data sets are data sets of different sample distributions; the consistency dimension includes a cross-sample consistency dimension.
The consistency dimension evaluation module is used for acquiring predicted value distribution proportion sequences corresponding to data sets with at least two different sample distributions from at least two different application related data sets, and obtaining cross-sample consistency degree in cross-sample consistency evaluation dimension according to the predicted value distribution proportion sequences corresponding to the data sets with at least two different sample distributions.
In another embodiment, the at least two different application-related data sets include predicted output values corresponding to the respective different data sets; the consistency dimension also includes a cross-sample consistency dimension.
The consistency dimension evaluation module is used for carrying out sample distribution analysis on at least two data sets with different application times to obtain sample distribution labels of all data in all the data sets; analyzing the predicted output values of different data sets according to the sample distribution labels to determine the predicted output value distribution of different sample distributions; and determining a predicted value distribution proportion sequence of different sample distributions according to the predicted value distributions of the different sample distributions, and obtaining the cross-sample consistency degree in the cross-sample consistency evaluation dimension according to the predicted value distribution proportion sequence of the different sample distributions.
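The grouping described above, labeling each prediction by its sample distribution and then comparing the score-bucket proportions of the groups, can be sketched as follows. This is a minimal illustration under assumed inputs (the function name and the pairing of labeled records are not from the patent):

```python
import math
from collections import Counter

def cross_sample_psi(records):
    """records: iterable of (sample_label, predicted_score) pairs.

    Groups predicted output values by their sample-distribution label,
    builds each group's score-bucket proportion sequence, then compares
    the first two groups with a PSI-style sum.
    """
    buckets = {}
    for label, score in records:
        buckets.setdefault(label, Counter())[score] += 1
    (_, ca), (_, cb) = sorted(buckets.items())[:2]  # first two sample labels
    na, nb = sum(ca.values()), sum(cb.values())
    total = 0.0
    for s in sorted(set(ca) | set(cb)):
        pa, pb = ca[s] / na, cb[s] / nb
        if pa > 0 and pb > 0:  # skip buckets empty in either group
            total += (pa - pb) * math.log(pa / pb)
    return total
```

Identical score distributions across the two sample groups yield a consistency degree of zero; larger values signal distribution drift between sample populations.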
In another embodiment, the at least two different application-related data sets include predicted input data and predicted output values corresponding to the different data sets; the stability assessment dimension includes a coverage dimension.
The model stability monitoring device further comprises a coverage evaluation module for acquiring predicted input data and predicted output values of at least one of the data sets from at least two different application-related data sets; and obtaining the coverage of the target monitoring model according to the ratio of the number of the predicted output values to the number of the predicted input data.
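The coverage computation is a simple ratio: the number of predicted output values divided by the number of predicted input records. A one-function sketch (the function name is illustrative):

```python
def coverage(n_inputs: int, n_outputs: int) -> float:
    """Fraction of prediction inputs for which the model produced an output."""
    if n_inputs == 0:
        return 0.0  # guard against division by zero on an empty data set
    return n_outputs / n_inputs

print(coverage(100000, 100000))  # prints 1.0, i.e. the 100% coverage reported above
```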
In another embodiment, the stability assessment dimension further comprises: accuracy assessment dimension, coverage assessment dimension; the consistency dimension includes a cross-time consistency dimension and a cross-sample consistency dimension.
The overall evaluation module is used for weighting the cross-time consistency degree, the cross-sample consistency degree, the coverage and the accuracy of the target monitoring model to obtain a stability evaluation result of the target monitoring model.
In another embodiment, the monitoring module is further configured to determine an anomaly evaluation dimension when the stability evaluation result is a stability anomaly of the target monitoring model; and outputting a corresponding improvement prompt according to the index value of the abnormal evaluation dimension.
In another embodiment, the model stability monitoring apparatus further comprises a preprocessing module for mapping the probability value directly output by the model to the classification category when the direct output of the model is the probability value.
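When the model's direct output is a continuous probability, the preprocessing module maps it to a discrete classification category before the bucketed consistency calculation. A minimal sketch of one such mapping, where the five equal-width bucket edges are an illustrative assumption (the patent does not specify them):

```python
def to_category(prob: float, thresholds=(0.2, 0.4, 0.6, 0.8)) -> int:
    """Map a probability in [0, 1] to a discrete score in 1..5.

    The bucket edges are hypothetical; any monotone binning that matches
    the rating scale would serve the same preprocessing role.
    """
    for i, t in enumerate(thresholds):
        if prob < t:
            return i + 1
    return len(thresholds) + 1

print(to_category(0.83))  # prints 5
```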
The modules in the model stability monitoring device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, an Input/Output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and the computer programs in the non-volatile storage medium. The database of the computer device is used for storing model input data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a model stability monitoring method.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory having a computer program stored therein and a processor, which when executing the computer program, implements the steps of model stability monitoring of the above embodiments.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon which, when executed by a processor, implements the steps of model stability monitoring of the above embodiments.
In an embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of model stability monitoring of the above embodiments.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (10)

1. A method of model stability monitoring, the method comprising:
acquiring a target monitoring model;
acquiring at least two different application related data sets obtained by applying the target monitoring model to at least two different data sets;
obtaining predicted value distribution proportion sequences corresponding to at least two different data sets from the at least two different application related data sets, and obtaining the consistency degree of consistency dimension according to the predicted value distribution proportion sequences corresponding to the at least two different data sets;
Performing stability evaluation on the target monitoring model according to an evaluation result of at least one stability dimension; the evaluation result of the at least one stability evaluation dimension comprises a consistency degree of the consistency evaluation dimension;
and when the stability evaluation result is that the stability of the target monitoring model is abnormal, carrying out stability abnormality prompt on the target monitoring model.
2. The method of claim 1, wherein the different data sets are data sets of different application times; the consistency dimension includes a cross-time consistency dimension;
the obtaining the predicted value distribution proportion sequence corresponding to at least two different data sets from the at least two different application related data sets, and obtaining the consistency degree of the consistency dimension according to the predicted value distribution proportion sequence corresponding to the at least two different data sets includes:
acquiring predicted value distribution proportion sequences corresponding to data sets of at least two different application times from the at least two different application related data sets; obtaining the cross-time consistency degree of the target monitoring model in the cross-time consistency evaluation dimension according to the predicted value distribution proportion sequences corresponding to the data sets of at least two different application times.
3. The method of claim 1, wherein the different data sets are data sets of different sample distributions; the consistency dimension includes a cross-sample consistency dimension;
the obtaining the predicted value distribution proportion sequence corresponding to at least two different data sets from the at least two different application related data sets, and obtaining the consistency degree of the consistency dimension according to the predicted value distribution proportion sequence corresponding to the at least two different data sets includes:
and obtaining predicted value distribution proportion sequences corresponding to the data sets distributed by at least two different samples from the at least two different application related data sets, and obtaining the cross-sample consistency degree in the cross-sample consistency evaluation dimension according to the predicted value distribution proportion sequences corresponding to the data sets distributed by at least two different samples.
4. The method of claim 2, wherein the at least two different application-related data sets include predicted output values corresponding to respective different data sets; the consistency dimension further includes a cross-sample consistency dimension;
the method comprises the steps of obtaining predicted value distribution proportion sequences corresponding to at least two different data sets from the at least two different application related data sets, obtaining the consistency degree of consistency dimension according to the predicted value distribution proportion sequences corresponding to the at least two different data sets, and further comprising:
sample distribution analysis is carried out on the data sets of the at least two different application times, so as to obtain sample distribution labels of all data in each data set;
analyzing the predicted output values of different data sets according to the sample distribution labels to determine the predicted output value distribution of different sample distributions;
and determining a predicted value distribution proportion sequence of different sample distributions according to the predicted value distributions of the different sample distributions, and obtaining the cross-sample consistency degree in the cross-sample consistency evaluation dimension according to the predicted value distribution proportion sequence of the different sample distributions.
5. The method of claim 1, wherein the at least two different application-related data sets include predicted input data and predicted output values corresponding to different data sets;
the stability assessment dimension includes a coverage dimension, the method further comprising:
obtaining predicted input data and predicted output values of at least one of the data sets from the at least two different application-related data sets; and obtaining the coverage of the target monitoring model according to the ratio of the number of the predicted output values to the number of the predicted input data.
6. The method of claim 1, wherein the stability assessment dimension further comprises: accuracy assessment dimension, coverage assessment dimension; the consistency dimension comprises a cross-time consistency dimension and a cross-sample consistency dimension;
and performing stability evaluation on the target monitoring model according to the evaluation result of at least one stability dimension comprises: weighting the cross-time consistency degree, the cross-sample consistency degree, the coverage and the accuracy of the target monitoring model to obtain a stability evaluation result of the target monitoring model.
7. The method according to any one of claims 1 to 6, further comprising:
when the stability evaluation result is that the stability of the target monitoring model is abnormal, determining an abnormal evaluation dimension;
and outputting a corresponding improvement prompt according to the index value of the abnormal evaluation dimension.
8. The method of claim 4, wherein the predicted output value is a classification category, and wherein when the direct output of the model is a probability value, the method further comprises: and mapping the probability value directly output by the model into a classification category.
9. A model stability monitoring device, the device comprising:
the target acquisition module is used for acquiring a target monitoring model;
the data acquisition module is used for acquiring at least two different application related data sets obtained by applying the target monitoring model to at least two different data sets;
The consistency dimension evaluation module is used for acquiring predicted value distribution proportion sequences corresponding to at least two different data sets from the at least two different application related data sets, and obtaining consistency degree of consistency dimension according to the predicted value distribution proportion sequences corresponding to the at least two different data sets;
the overall evaluation module is used for evaluating the stability of the target monitoring model according to the evaluation result of at least one stability dimension; the evaluation result of the at least one stability evaluation dimension comprises a consistency degree of the consistency evaluation dimension;
and the monitoring module is used for prompting the stability abnormality of the target monitoring model when the stability evaluation result is that the stability of the target monitoring model is abnormal.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
CN202310258052.8A 2023-03-08 2023-03-08 Model stability monitoring method and device and computer equipment Pending CN116975621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310258052.8A CN116975621A (en) 2023-03-08 2023-03-08 Model stability monitoring method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN116975621A true CN116975621A (en) 2023-10-31

Family

ID=88473774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310258052.8A Pending CN116975621A (en) 2023-03-08 2023-03-08 Model stability monitoring method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN116975621A (en)


Legal Events

Date Code Title Description
PB01 Publication