CN115757035A - Data model monitoring method and device, processor and electronic equipment - Google Patents

Info

Publication number
CN115757035A
Authority
CN
China
Prior art keywords
data
model
sample
monitoring
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211476936.2A
Other languages
Chinese (zh)
Inventor
王中晴
易厚梅
蒋李灵
吴心坪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211476936.2A
Publication of CN115757035A
Legal status: Pending

Landscapes

  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application discloses a data model monitoring method and device, a processor and electronic equipment, and relates to the field of cloud computing. The method comprises the following steps: acquiring the models required in the process of processing a target service to obtain a plurality of data models, and determining a data source of each data model; storing the data in the data sources of the plurality of data models to a target data lake; extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating the data model; calculating an evaluation index for a model to be evaluated among the plurality of data models according to the preset element; and calculating a monitoring index of the model to be evaluated according to the evaluation index, and monitoring the model to be evaluated through the monitoring index. The method and the device solve the problem in the related art that computing resources and storage resources are wasted because monitoring indexes must be computed from a large amount of detailed data when a data model is monitored.

Description

Data model monitoring method and device, processor and electronic equipment
Technical Field
The application relates to the field of cloud computing, in particular to a monitoring method and device of a data model, a processor and electronic equipment.
Background
With the development of big data computing and storage architectures, artificial intelligence (AI) technology built on these architectures has become increasingly mature, and intelligent models are therefore applied more and more widely. When an intelligent model is used, its life cycle needs to be managed, and the main part of life cycle management is evaluating and monitoring the effect of the model. In the related art, the relevant evaluation indexes of a model are calculated from detailed model data in order to monitor the model. Although this monitoring method can calculate the evaluation indexes stably and effectively, it still has the following problems: the number of models to be managed can reach hundreds to thousands; as business develops, the data volume of each model keeps growing, and the model data of a larger business can even reach hundreds of millions of records; the stored detailed data therefore occupies a large amount of storage space, which wastes storage resources.
In addition, managing models by computing over big data with big data computing engines such as Hadoop, Spark and Hive occupies a large amount of CPU and memory resources. Most computation logic built on big data frameworks runs as offline batch computation, which is time-consuming: analysis results cannot be viewed in real time, and the computed results are inflexible and cannot be displayed in real time.
No effective solution has yet been proposed for the problem in the related art that computing resources and storage resources are wasted because monitoring indexes must be computed from a large amount of detailed data when a data model is monitored.
Disclosure of Invention
The present application mainly aims to provide a method, an apparatus, a processor and an electronic device for monitoring a data model, so as to solve the problem that computational resources and storage resources are wasted because a large amount of detailed data are required to calculate a monitoring index when the data model is monitored in the related art.
To achieve the above object, according to one aspect of the present application, there is provided a monitoring method of a data model. The method comprises the following steps: obtaining models required in the process of processing target services, obtaining various data models, and determining a data source of each data model; storing data in data sources of various data models to a target data lake; extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating the data model; calculating an evaluation index for a model to be evaluated in the multiple data models according to a preset element; and calculating a monitoring index of the model to be evaluated according to the evaluation index, and monitoring the model to be evaluated through the monitoring index.
Optionally, storing data in the data sources of the plurality of data models to the target data lake comprises: judging whether the operation mode of each data model is a real-time operation mode; under the condition that the operation mode of the data model is a non-real-time operation mode, storing data in a data source of the data model into the target data lake; under the condition that the operation mode of the data model is a real-time operation mode, storing data in a data source of the data model to a cloud server, wherein the cloud server is in communication connection with the target data lake.
Optionally, extracting at least one preset element from the target data lake comprises: determining at least one preset element, wherein the at least one preset element comprises at least one of the following elements: sample value, sample frequency, model output result and actual result related to the model output result; and storing at least one preset element into the intermediate table, and compressing and storing other elements except the preset element in the target data lake into the intermediate table.
Optionally, under a condition that the preset elements at least include a sample value, a sample frequency, a model output result, and an actual result associated with the model output result, calculating an evaluation index for a model to be evaluated in the multiple data models according to the preset elements includes: determining a calculation mode of the evaluation index and the grouping number of the model output result, wherein the calculation mode comprises an equal frequency mode and an equidistant mode, and the grouping number is the number of different evaluation intervals divided for the model output result; determining a plurality of model output results contained in each evaluation interval according to the calculation mode and the grouping number; determining a positive sample, a first proportion of the positive sample in the evaluation interval, a negative sample and a second proportion of the negative sample in the evaluation interval according to the model output result and the actual result; and determining the positive sample, the negative sample, the first ratio and the second ratio as evaluation indexes.
Optionally, determining a plurality of model output results included in each evaluation interval according to the calculation mode and the grouping number includes: under the condition that the calculation mode is the equal frequency mode, determining the sum of the frequency of each sample of the model to be evaluated as the total number of the samples; calculating a first ratio of the total number of samples to the grouping number, and determining a plurality of sample values contained in each evaluation interval according to the first ratio; determining the difference value between the maximum value and the minimum value of the model output result under the condition that the calculation mode is the equidistant mode; and calculating a second ratio of the difference value to the grouping number, and determining a plurality of sample values contained in each evaluation interval according to the second ratio.
Optionally, determining the positive sample, the first proportion of the positive sample in the evaluation interval, the negative sample and the second proportion of the negative sample in the evaluation interval according to the model output result and the actual result comprises: judging whether the model output result corresponding to each sample value is the same as the actual result or not; determining the sample value as a positive sample under the condition that the model output result is the same as the actual result; determining the sample value as a negative sample under the condition that the model output result is different from the actual result; calculating the ratio of the total number of the positive samples to the total number of the sample values in the evaluation interval in each evaluation interval to obtain a first ratio; and calculating the ratio of the total number of the negative samples to the total number of the sample values in the evaluation interval in each evaluation interval to obtain a second ratio.
Optionally, the calculating the monitoring index of the model to be evaluated according to the evaluation index includes: and inputting the evaluation index into a preset formula to obtain a monitoring index, wherein the preset formula at least comprises one of a positive and negative sample discrimination index calculation formula and a sample stability index calculation formula.
In order to achieve the above object, according to another aspect of the present application, there is provided a monitoring apparatus of a data model. The device includes: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring models required in the process of processing target services, acquiring various data models and determining a data source of each data model; the storage unit is used for storing data in the data sources of the various data models into a target data lake; the extraction unit is used for extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating the data model; the calculation unit is used for calculating an evaluation index of a model to be evaluated in the multiple data models according to preset elements; and the monitoring unit is used for calculating a monitoring index of the model to be evaluated according to the evaluation index and monitoring the model to be evaluated through the monitoring index.
By the application, the following steps are adopted: obtaining models required in the process of processing target services, obtaining various data models, and determining a data source of each data model; storing data in data sources of various data models to a target data lake; extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating the data model; calculating an evaluation index for a model to be evaluated in the multiple data models according to a preset element; the monitoring index of the model to be evaluated is calculated according to the evaluation index, and the model to be evaluated is monitored through the monitoring index, so that the problem that computing resources and storage resources are wasted due to the fact that a large amount of detailed data are needed to calculate the monitoring index when the data model is monitored in the related technology is solved. By extracting preset elements in data sources of various data models to a target data lake, calculating evaluation indexes of different models according to the preset elements and monitoring various models in a unified manner, the effect of saving calculation resources and storage resources when monitoring the data models is achieved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for monitoring a data model provided according to an embodiment of the application;
FIG. 2 is a flow chart of a manner of calculating a monitoring index provided according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a monitoring device for a data model provided according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device provided according to an embodiment of the present application.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
The present application is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a monitoring method for a data model provided in an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, obtaining models required in the process of processing the target service, obtaining various data models, and determining the data source of each data model.
Specifically, the target service may be any service that needs to use a data model, for example product recommendation marketing, user risk assessment, or user churn prediction. The data model is built from the data of a big data platform so that the business target can be determined more accurately through the data model. The data models may come from different development platforms. A plurality of data models is collected, and the data source on which each data model runs is determined.
It should be noted that the running modes and running log files of the different models are learned by collecting the characteristics and running mechanisms of the different model development platforms. The data corresponding to data models with different operation modes is then stored into the target data lake.
Step S102, storing data in the data sources of the multiple data models into a target data lake.
Specifically, because the targets and primary keys of different data models are inconsistent, and in order to improve the efficiency with which different data sources are accessed for big data computation, the data obtained from the data sources of the different data models is uniformly written into the data lake and then templated. The functions and targets of the different data models are determined, as are the primary keys used by the models and the model results. For example, the primary key of data model A is $Key_i = (key_1, key_2, \ldots, key_n)$; it is normalized to the primary key form Target_Key and Score and stored to the target data lake.
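As a minimal sketch of the templating step described above, the following Python snippet collapses a model-specific composite key into a single Target_Key and keeps the model result as Score. The function name `normalize_record`, the example field names (`cust_id`, `card_no`, `risk`) and the `|` separator are assumptions made for illustration; the application does not specify how the key components are combined.

```python
from typing import Any, Dict, List


def normalize_record(record: Dict[str, Any],
                     key_fields: List[str],
                     score_field: str) -> Dict[str, Any]:
    """Map one model-specific row to the unified (Target_Key, Score) form.

    key_fields lists the components (key_1 ... key_n) of the model's own
    primary key; joining them with "|" is an illustrative assumption.
    """
    target_key = "|".join(str(record[field]) for field in key_fields)
    return {"Target_Key": target_key, "Score": record[score_field]}


# Hypothetical rows from one model's data source.
rows = [{"cust_id": 17, "card_no": "A9", "risk": 0.82}]
normalized = [normalize_record(r, ["cust_id", "card_no"], "risk") for r in rows]
```

Because every model is reduced to the same two columns, later steps can treat all models uniformly regardless of how their original keys were structured.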
Step S103, at least one preset element is extracted from the target data lake, wherein the preset element is an element required for evaluating the data model.
Specifically, when various data models are managed, the common elements required for model evaluation, namely the preset elements, are extracted from the standardized data in the target data lake. For example, an intermediate table T is extracted whose main elements are the sample value, the sample frequency, the model output result and the actual result associated with the model output result. Monitoring and management of the various data models is realized through this intermediate table, and because only the preset elements are extracted to manage the data models, storage space is greatly saved. Taking a credit card risk model as an example, compressing the preset elements into the intermediate table reduces nearly one hundred million detailed records to fewer than 2000 records, a compression rate of nearly 99.9%, which greatly saves data storage resources.
Step S104, calculating evaluation indexes of the model to be evaluated in the multiple data models according to preset elements.
Specifically, when each data model is monitored and managed, a model to be evaluated which needs to be monitored and managed is determined, and an evaluation index is calculated through a preset element corresponding to the model to be evaluated in the intermediate table, so that the model to be evaluated is monitored.
Step S105, calculating a monitoring index of the model to be evaluated according to the evaluation index, and monitoring the model to be evaluated through the monitoring index.
Specifically, the monitoring index may be a PSI value (Population Stability Index), a KS value (Kolmogorov-Smirnov), or the like. Model monitoring refers to monitoring the stability and the discrimination of the model. Taking whether the model target can be obtained as the dividing point, monitoring is divided into front-end monitoring and back-end monitoring. Front-end monitoring refers to monitoring before the model target is obtained, and mainly checks indexes such as model stability and operating condition, for example PSI. Back-end monitoring refers to monitoring after the model target is obtained, and mainly checks whether the effect of the model has degraded and whether the model needs iteration, for example through the KS value.
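To make the two monitoring indexes concrete, the sketch below uses the commonly used textbook definitions: PSI sums (actual - expected) * ln(actual / expected) over the evaluation intervals, and KS is the maximum gap between the cumulative positive and cumulative negative rates. These definitions are an assumption here, since this part of the description does not spell out the preset formulas, and the application's own formulas may differ. The function names and the `eps` floor are illustrative.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index over per-interval proportions.

    expected: baseline proportion of samples in each interval.
    actual:   current proportion of samples in each interval.
    eps floors empty intervals so the logarithm stays defined (assumption).
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def ks(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """KS value from per-interval counts, intervals ordered by model score.

    Assumes at least one positive and one negative sample overall.
    """
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    cum_pos = cum_neg = 0
    best = 0.0
    for p, n in zip(pos_counts, neg_counts):
        cum_pos += p
        cum_neg += n
        best = max(best, abs(cum_pos / pos_total - cum_neg / neg_total))
    return best
```

Both functions need only per-interval counts or proportions, which is why a compressed intermediate table is sufficient input and the detailed records are not required.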
According to the monitoring method of the data model, multiple data models are obtained by obtaining the models required in the process of processing the target service, and the data source of each data model is determined; storing data in data sources of various data models to a target data lake; extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating the data model; calculating an evaluation index of a model to be evaluated in the multiple data models according to a preset element; the monitoring index of the model to be evaluated is calculated according to the evaluation index, and the model to be evaluated is monitored through the monitoring index, so that the problem that computing resources and storage resources are wasted due to the fact that a large amount of detailed data are needed to calculate the monitoring index when the data model is monitored in the related technology is solved. By extracting preset elements in data sources of various data models to a target data lake, calculating evaluation indexes of different models according to the preset elements and monitoring various models in a unified manner, the effect of saving calculation resources and storage resources when monitoring the data models is achieved.
Optionally, in the monitoring method of a data model provided in the embodiment of the present application, storing data in data sources of multiple data models in a target data lake includes: judging whether the operation mode of each data model is a real-time operation mode or not; under the condition that the operation mode of the data model is a non-real-time operation mode, storing data in a data source of the data model into a target data lake; under the condition that the operation mode of the data model is a real-time operation mode, data in a data source of the data model are stored in a cloud server, wherein the cloud server is in communication connection with a target data lake.
Specifically, the real-time operation mode is one in which the data accessed by the data model must be updated in real time while the data model is maintained, and the non-real-time operation mode is one in which the data can be managed offline. The corresponding data is stored according to the operation mode of the data model. Real-time data of a data model in the real-time operation mode is stored in real time to a big data platform, namely the cloud server, through Flink (a big data real-time computing engine) or MySQL (a database management system). Data of a data model in the non-real-time operation mode is written to the target data lake, after offline batch computation, uniformly using Hive SQL (the SQL dialect of the Hive data warehouse) or Spark (a big data computing engine). Storing the data corresponding to data models with different operation modes into the target data lake makes data extraction convenient during subsequent management of the data models and ensures a uniform data entry point.
After the data is stored in the target data lake, monitoring the data model by extracting preset elements in the target data lake, optionally, in the monitoring method of the data model provided in the embodiment of the present application, extracting at least one preset element from the target data lake includes: determining at least one preset element, wherein the at least one preset element comprises at least one of the following: sample value, sample frequency, model output result and actual result related to the model output result; and storing at least one preset element into the intermediate table, and compressing and storing other elements except the preset element in the target data lake into the intermediate table.
Specifically, the sample value is a sample value corresponding to the data model; the sample frequency is the frequency with which each sample value occurs in the sample set corresponding to the data model; the model output result is the output result of the data model, for example the predicted business condition of a service; and the actual result is the actual business condition corresponding to the sample value. The common elements required for evaluating the data model, namely the preset elements, are extracted from the target data lake, and the evaluation indexes of the data model are calculated from them. Compared with the prior art, in which the data model is monitored through detailed data, the present application takes the model output result as the primary key, generates an intermediate table T whose main elements are the sample value, the sample frequency, the model output result and the actual result associated with the model output result, and monitors the data model through this intermediate table, which greatly saves storage space. Taking a credit card risk model as an example, extracting the preset elements from the data corresponding to the model compresses nearly one hundred million detailed records into fewer than 2000 records, a compression rate of nearly 99.9%, which greatly saves data storage resources. The compressed storage of the preset elements in the target data lake into the intermediate table is realized through the following code.
The sample value field is defined as {Target_Key}, the sample frequency field as {Target_Key}, the model output result field as {Target_Key and Score}, and the actual result field as {Target_Key and Target}; the intermediate table T is designed with the structure {Score, Weight, Target}, where Weight represents the number of occurrences of each combination of Score and Target. For example, this part may be implemented in SQL, and the pseudo code is as follows:
[Figures BDA0003960308020000061 and BDA0003960308020000071: SQL pseudo code, published as images and not reproduced in this text]
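The SQL pseudo code itself is only available as figures, so the following Python sketch is not the application's code; it merely illustrates the grouping idea behind the intermediate table T with structure {Score, Weight, Target}, where Weight counts how often each (Score, Target) combination occurs. The function name `compress_to_intermediate` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def compress_to_intermediate(
    details: Iterable[Tuple[float, int]]
) -> List[Tuple[float, int, int]]:
    """Collapse detailed (Score, Target) rows into (Score, Weight, Target) rows.

    Equivalent in spirit to a GROUP BY Score, Target with COUNT(*) as Weight.
    """
    counts = Counter((score, target) for score, target in details)
    return sorted((score, weight, target) for (score, target), weight in counts.items())


# Hypothetical detailed rows: (model output score, actual result).
details = [(30, 0), (30, 0), (90, 1), (90, 0), (30, 0)]
T = compress_to_intermediate(details)
# T == [(30, 3, 0), (90, 1, 0), (90, 1, 1)]
```

The table size depends on the number of distinct (Score, Target) combinations rather than on the number of detailed records, which is what makes the compression described above possible when scores take a limited set of values.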
Optionally, in the monitoring method of a data model provided in this embodiment of the present application, in a case that the preset element at least includes a sample value, a sample frequency, a model output result, and an actual result associated with the model output result, calculating an evaluation index for a model to be evaluated in a plurality of data models according to the preset element includes: determining a calculation mode of the evaluation index and the grouping number of the model output result, wherein the calculation mode comprises an equal frequency mode and an equidistant mode, and the grouping number is the number of different evaluation intervals divided for the model output result; determining a plurality of model output results contained in each evaluation interval according to the calculation mode and the grouping number; determining a positive sample, a first proportion of the positive sample in the evaluation interval, a negative sample and a second proportion of the negative sample in the evaluation interval according to the model output result and the actual result; the positive sample, the negative sample, the first proportion and the second proportion are determined as evaluation indexes.
Specifically, the calculation mode refers to the way in which evaluation intervals are divided for the model output results. The evaluation index is calculated from the data of the compressed intermediate table T; offline batch computation with a big data engine such as Hive SQL or Spark is not used, and a general-purpose development language such as Java, C++ or Python is used instead so that results can be calculated and viewed in real time. FIG. 2 is a flowchart of a calculation method of a monitoring index according to an embodiment of the present application. As shown in FIG. 2, the desired number of groups (Bins) and the calculation mode (equal frequency or equidistant) are input. Different calculation logic is selected according to the input calculation mode, and the model output results contained in the different evaluation intervals are determined. The model output results are sorted in order to determine the evaluation intervals. A loop statement is then used to calculate, for each evaluation interval, the positive samples, the first proportion of the positive samples in the evaluation interval, the negative samples and the second proportion of the negative samples in the evaluation interval. After the evaluation index has been calculated from the preset elements, the monitoring index is determined from the evaluation index.
Optionally, in the monitoring method for a data model provided in this embodiment of the present application, determining a plurality of model output results included in each evaluation interval according to the calculation mode and the grouping number includes: under the condition that the calculation mode is the equal frequency mode, determining the sum of the frequency of each sample of the model to be evaluated as the total number of the samples; calculating a first ratio of the total number of samples to the grouping number, and determining a plurality of sample values contained in each evaluation interval according to the first ratio; determining the difference value between the maximum value and the minimum value of the model output result under the condition that the calculation mode is the equidistant mode; and calculating a second ratio of the difference value to the grouping number, and determining a plurality of sample values contained in each evaluation interval according to the second ratio.
Specifically, the equal frequency mode divides the evaluation intervals so that each contains the same number of data items. For example, suppose a set of model output results takes the values 1, 3, 7 and 9, with repetitions, for a total of 10 data items. If the grouping number is 2, every 5 data items form one evaluation interval: the first evaluation interval contains model output results with the values 1 and 3, and the second contains model output results with the values 3, 7 and 9. The equidistant mode divides the evaluation intervals equally over the range between the maximum and minimum values. For example, suppose a set of model output results takes the values 1, 3, 7 and 9. If the grouping number is 2, the difference between the maximum value and the minimum value is first calculated as 9 - 1 = 8, and dividing this difference by the grouping number gives 4, so each evaluation interval has a width of 4. The first evaluation interval therefore covers model output results between 1 and 5, namely 1 and 3, and the second covers model output results between 5 and 9, namely 7 and 9. Determining the evaluation indexes according to different calculation modes allows the evaluation intervals to be divided in a way suited to the data model, so that the evaluation indexes better match the target of the data model.
In addition, if the calculation mode is the equal frequency mode, the total number of samples may not be evenly divisible by the grouping number. If it is not, the remainder is recorded as M; then either the last evaluation interval contains M more model output results than each of the other evaluation intervals, or one additional data item is assigned to each of the first M evaluation intervals.
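The equidistant example above (values 1, 3, 7, 9 with a grouping number of 2) can be sketched in Python as follows. The function names are hypothetical, and the choice to place a value lying exactly on an interior boundary (such as 5) into the upper interval is an assumption, since the text does not say which interval owns the boundary.

```python
from typing import List, Sequence


def equidistant_edges(scores: Sequence[float], bins: int) -> List[float]:
    """Return bins + 1 equally spaced edges from min(scores) to max(scores)."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins  # the "second ratio": range divided by grouping number
    return [lo + i * width for i in range(bins + 1)]


def assign_interval(score: float, edges: Sequence[float]) -> int:
    """Return the 0-based index of the evaluation interval containing score."""
    bins = len(edges) - 1
    width = edges[1] - edges[0]
    if width == 0:  # all scores identical: everything falls in the first interval
        return 0
    index = int((score - edges[0]) / width)
    return min(index, bins - 1)  # the maximum value belongs to the last interval


edges = equidistant_edges([1, 3, 7, 9], bins=2)
intervals = [assign_interval(s, edges) for s in [1, 3, 7, 9]]
# edges == [1.0, 5.0, 9.0]; intervals == [0, 0, 1, 1]
```

Only the minimum, the maximum and the grouping number are needed to fix the edges, so the intervals can be derived directly from the Score column of the intermediate table.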
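A minimal Python sketch of the equal frequency split, including both remainder strategies described above, might look as follows. The function names, the `extra_to_last` flag and the sample weights are hypothetical; expanding each score by its weight is done here only for readability, and an implementation working on a large table would more likely walk the cumulative weights instead.

```python
from typing import List, Sequence, Tuple


def equal_frequency_sizes(total: int, bins: int, extra_to_last: bool = True) -> List[int]:
    """Number of data items per evaluation interval.

    base is the "first ratio" (total // bins) and m is the remainder M.
    extra_to_last=True puts all M extra items in the last interval;
    False gives one extra item to each of the first M intervals.
    """
    base, m = divmod(total, bins)
    sizes = [base] * bins
    if extra_to_last:
        sizes[-1] += m
    else:
        for i in range(m):
            sizes[i] += 1
    return sizes


def split_equal_frequency(table: Sequence[Tuple[float, int]], bins: int,
                          extra_to_last: bool = True) -> List[List[float]]:
    """Split (score, weight) rows into intervals holding equal item counts."""
    expanded = sorted(score for score, weight in table for _ in range(weight))
    sizes = equal_frequency_sizes(len(expanded), bins, extra_to_last)
    result, start = [], 0
    for size in sizes:
        result.append(expanded[start:start + size])
        start += size
    return result


# Hypothetical weights chosen so that the total is 10 data items.
groups = split_equal_frequency([(1, 3), (3, 3), (7, 2), (9, 2)], bins=2)
# groups == [[1, 1, 1, 3, 3], [3, 7, 7, 9, 9]]
```

With these hypothetical weights the value 3 appears in both intervals, which matches the earlier example where the first interval holds values 1 and 3 and the second holds 3, 7 and 9.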
Optionally, in the monitoring method of a data model provided in the embodiment of the present application, determining the positive sample, the first proportion of the positive sample in the evaluation interval, the negative sample, and the second proportion of the negative sample in the evaluation interval according to the model output result and the actual result includes: judging whether the model output result corresponding to each sample value is the same as the actual result; in the case that the model output result is the same as the actual result, determining the sample value as a positive sample; in the case that the model output result is different from the actual result, determining the sample value as a negative sample; calculating, in each evaluation interval, the ratio of the total number of positive samples to the total number of sample values in the evaluation interval to obtain the first proportion; and calculating, in each evaluation interval, the ratio of the total number of negative samples to the total number of sample values in the evaluation interval to obtain the second proportion.
For example, a model for predicting customer churn has two evaluation intervals: model output results in the first evaluation interval range from 0 to 50, and those in the second evaluation interval range from 50 to 100. A model output result between 0 and 50 indicates that the customer is unlikely to churn, and a model output result between 50 and 100 indicates that the customer is likely to churn. The positive samples and negative samples in each evaluation interval are determined by comparing the churn condition indicated by the model output result with the customer's actual churn condition. For example, if a customer with a model output result of 90 has not actually churned, the actual result differs from the model output result, so the corresponding sample is a negative sample; if a customer with a model output result of 30 has not actually churned, the actual result is the same as the model output result, so the sample is a positive sample. The total number of positive samples and the total number of negative samples in each evaluation interval are then determined. The proportion of the positive samples in an evaluation interval to all samples in that evaluation interval is the first proportion, and the proportion of the negative samples in an evaluation interval to all samples in that evaluation interval is the second proportion.
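The churn example can be worked through in a short sketch. The function name `interval_proportions`, the sample data, and the rule that a score of exactly 50 counts as "likely to churn" are assumptions made for illustration; the application only states the ranges 0 to 50 and 50 to 100.

```python
from typing import Dict, List, Tuple

# Each sample is (model score, whether the customer actually churned).
Sample = Tuple[float, bool]


def interval_proportions(samples: List[Sample], threshold: float = 50.0) -> Dict[str, float]:
    """Return the positive and negative proportions for one evaluation interval."""
    if not samples:
        return {"positive": 0.0, "negative": 0.0}
    # A sample is positive when the predicted outcome matches the actual one.
    # Assumption: a score at or above the threshold predicts churn.
    positives = sum(1 for score, churned in samples if (score >= threshold) == churned)
    total = len(samples)
    return {"positive": positives / total, "negative": (total - positives) / total}


# Illustrative values only (not taken from the application).
low_interval = [(30, False), (10, False), (45, True), (20, False)]
high_interval = [(90, False), (75, True), (60, True), (55, True)]
first = interval_proportions(low_interval)    # the score-45 customer churned: negative sample
second = interval_proportions(high_interval)  # the score-90 customer stayed: negative sample
```

In both intervals three of the four samples match the actual outcome, so the first proportion is 0.75 and the second proportion is 0.25.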
Optionally, in the monitoring method of a data model provided in the embodiment of the present application, calculating the monitoring index of the model to be evaluated according to the evaluation index includes: inputting the evaluation index into a preset formula to obtain the monitoring index, wherein the preset formula includes at least one of a positive and negative sample discrimination index calculation formula and a sample stability index calculation formula.
Specifically, the positive and negative sample discrimination index calculation formula is the KS value calculation formula; the KS value indicates whether the prediction performance of the model to be evaluated has degraded and whether the model needs to be updated and iterated. The sample stability index calculation formula is the PSI value calculation formula; the PSI value indicates the stability and operating condition of the model to be evaluated.
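The application does not spell out the KS and PSI formulas at this point. The sketch below therefore uses the commonly used definitions as an assumption: KS is the largest gap between the cumulative shares of positive and negative samples across the ordered evaluation intervals, and PSI is the sum over intervals of (actual share - expected share) × ln(actual share / expected share). The function names, the `eps` floor for empty intervals, and the input numbers are hypothetical.

```python
import math
from typing import List


def ks_value(pos_counts: List[int], neg_counts: List[int]) -> float:
    """Largest gap between the cumulative positive and negative distributions.

    Assumes both lists are ordered by evaluation interval and that each
    class has at least one sample overall.
    """
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    cum_pos = cum_neg = 0
    best = 0.0
    for pos, neg in zip(pos_counts, neg_counts):
        cum_pos += pos
        cum_neg += neg
        best = max(best, abs(cum_pos / total_pos - cum_neg / total_neg))
    return best


def psi_value(expected: List[float], actual: List[float], eps: float = 1e-6) -> float:
    """Population stability index between two per-interval share distributions."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor the shares to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


# Illustrative per-interval counts and shares (not taken from the application).
ks = ks_value(pos_counts=[10, 20, 30, 40], neg_counts=[40, 30, 20, 10])
psi_stable = psi_value(expected=[0.25, 0.25, 0.25, 0.25], actual=[0.25, 0.25, 0.25, 0.25])
psi_shifted = psi_value(expected=[0.5, 0.5], actual=[0.6, 0.4])
```

Both functions take only per-interval counts or shares as input, which is consistent with the stated aim of computing monitoring indexes from aggregated evaluation indexes rather than from detailed records.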
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The embodiment of the present application further provides a monitoring device for a data model, and it should be noted that the monitoring device for a data model according to the embodiment of the present application may be used to execute the monitoring method for a data model according to the embodiment of the present application. The following describes a monitoring apparatus for a data model provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a monitoring apparatus for a data model provided according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
the acquisition unit 10 is configured to acquire a model required in a process of processing a target service, obtain multiple data models, and determine a data source of each data model;
the storage unit 20 is used for storing data in data sources of various data models into a target data lake;
the extraction unit 30 is configured to extract at least one preset element from the target data lake, where the preset element is an element required for evaluating the data model;
the calculation unit 40 is configured to calculate an evaluation index for a model to be evaluated in the multiple data models according to a preset element;
and the monitoring unit 50 is configured to calculate a monitoring index of the model to be evaluated according to the evaluation index, and monitor the model to be evaluated through the monitoring index.
In the monitoring device for a data model provided by the embodiment of the present application, the acquisition unit 10 acquires the models required in the process of processing the target service, obtains multiple data models, and determines the data source of each data model; the storage unit 20 stores the data in the data sources of the multiple data models into the target data lake; the extraction unit 30 extracts at least one preset element from the target data lake, where the preset element is an element required for evaluating the data model; the calculation unit 40 calculates an evaluation index for a model to be evaluated among the multiple data models according to the preset element; and the monitoring unit 50 calculates a monitoring index of the model to be evaluated according to the evaluation index and monitors the model to be evaluated through the monitoring index. This solves the problem in the related art that computing resources and storage resources are wasted because monitoring indexes must be calculated from a large amount of detailed data when a data model is monitored.
Optionally, in the monitoring apparatus for a data model provided in the embodiment of the present application, the storage unit 20 includes: a judging module, configured to judge whether the operation mode of each data model is a real-time operation mode; a first storage module, configured to store data in the data source of the data model into the target data lake in the case that the operation mode of the data model is a non-real-time operation mode; and a second storage module, configured to store data in the data source of the data model to a cloud server in the case that the operation mode of the data model is a real-time operation mode, wherein the cloud server is in communication connection with the target data lake.
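The routing performed by the judging module and the two storage modules can be sketched as below. Everything here is a hypothetical stand-in: the `Stores` class models the target data lake and the cloud server as in-memory dictionaries, and `store_model_data` and the model names are invented for illustration. How the cloud server later communicates with the data lake is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stores:
    """In-memory stand-ins for the target data lake and the cloud server."""
    data_lake: Dict[str, List[dict]] = field(default_factory=dict)
    cloud_server: Dict[str, List[dict]] = field(default_factory=dict)


def store_model_data(stores: Stores, model_name: str, rows: List[dict], real_time: bool) -> str:
    """Route a model's source data by its operation mode and return the destination."""
    # Real-time models go to the cloud server; non-real-time models go to the data lake.
    target = stores.cloud_server if real_time else stores.data_lake
    target.setdefault(model_name, []).extend(rows)
    return "cloud_server" if real_time else "data_lake"


stores = Stores()
batch_dest = store_model_data(stores, "churn_model", [{"score": 30}], real_time=False)
stream_dest = store_model_data(stores, "fraud_model", [{"score": 90}], real_time=True)
```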
Optionally, in the monitoring apparatus for a data model provided in the embodiment of the present application, the extracting unit 30 includes: a first determining module, configured to determine at least one preset element, where the at least one preset element includes at least one of: sample value, sample frequency, model output result and actual result related to the model output result; and the third storage module is used for storing at least one preset element into the intermediate table and compressing and storing other elements except the preset element in the target data lake into the intermediate table.
Optionally, in the monitoring apparatus for a data model provided in this embodiment of the present application, in the case that the preset element at least includes a sample value, a sample frequency, a model output result, and an actual result associated with the model output result, the calculation unit 40 includes: a second determining module, configured to determine a calculation mode of the evaluation index and the number of groups of the model output results, wherein the calculation mode includes an equal-frequency mode and an equidistant mode, and the number of groups is the number of different evaluation intervals into which the model output results are divided; a third determining module, configured to determine the plurality of model output results contained in each evaluation interval according to the calculation mode and the number of groups; a fourth determining module, configured to determine, in each evaluation interval, the positive sample, the first proportion of the positive sample in the evaluation interval, the negative sample, and the second proportion of the negative sample in the evaluation interval according to the model output result and the actual result; and a fifth determining module, configured to determine the positive sample, the negative sample, the first proportion, and the second proportion as the evaluation indexes.
Optionally, in the monitoring apparatus for a data model provided in an embodiment of the present application, the third determining module includes: a first determining submodule, configured to determine the sum of the frequencies of the samples of the model to be evaluated as the total number of samples in the case that the calculation mode is the equal-frequency mode; a first calculation submodule, configured to calculate a first ratio of the total number of samples to the number of groups and determine the plurality of sample values contained in each evaluation interval according to the first ratio; a second determining submodule, configured to determine the difference between the maximum value and the minimum value of the model output results in the case that the calculation mode is the equidistant mode; and a second calculation submodule, configured to calculate a second ratio of the difference to the number of groups and determine the plurality of sample values contained in each evaluation interval according to the second ratio.
Optionally, in the monitoring apparatus for a data model provided in the embodiment of the present application, the fourth determining module includes: a judging submodule, configured to judge whether the model output result corresponding to each sample value is the same as the actual result; a third determining submodule, configured to determine the sample value as a positive sample in the case that the model output result is the same as the actual result; a fourth determining submodule, configured to determine the sample value as a negative sample in the case that the model output result is different from the actual result; a third calculation submodule, configured to calculate, in each evaluation interval, the ratio of the total number of positive samples to the total number of sample values in the evaluation interval to obtain the first proportion; and a fourth calculation submodule, configured to calculate, in each evaluation interval, the ratio of the total number of negative samples to the total number of sample values in the evaluation interval to obtain the second proportion.
Optionally, in the monitoring apparatus for a data model provided in the embodiment of the present application, the monitoring unit 50 includes: and the input module is used for inputting the evaluation index into a preset formula to obtain a monitoring index, wherein the preset formula at least comprises one of a positive and negative sample discrimination index calculation formula and a sample stability index calculation formula.
The monitoring device of the data model comprises a processor and a memory, wherein the acquisition unit 10, the storage unit 20, the extraction unit 30, the calculation unit 40, the monitoring unit 50, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and by adjusting the kernel parameters the data model is monitored while computing resources and storage resources are saved.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium, on which a program is stored, the program implementing a monitoring method of a data model when being executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein a monitoring method of a data model is executed when the program runs.
As shown in fig. 4, an embodiment of the present invention provides an electronic device, where the device 401 includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor executes the program to implement the following steps: a method for monitoring a data model. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application also provides a computer program product adapted to perform a program for initializing the following method steps when executed on a data processing device: a method for monitoring a data model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (10)

1. A method for monitoring a data model, comprising:
obtaining models required in the process of processing target services, obtaining various data models, and determining a data source of each data model;
storing data in the data sources of the multiple data models to a target data lake;
extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating a data model;
calculating an evaluation index for a model to be evaluated in the multiple data models according to the preset elements;
and calculating a monitoring index of the model to be evaluated according to the evaluation index, and monitoring the model to be evaluated through the monitoring index.
2. The method of claim 1, wherein storing data in the data sources of the plurality of data models to a target data lake comprises:
judging whether the operation mode of each data model is a real-time operation mode;
under the condition that the operation mode of the data model is a non-real-time operation mode, storing data in a data source of the data model into the target data lake;
and under the condition that the operation mode of the data model is a real-time operation mode, storing data in a data source of the data model to a cloud server, wherein the cloud server is in communication connection with the target data lake.
3. The method of claim 1, wherein extracting at least one preset element from the target data lake comprises:
determining at least one preset element, wherein the at least one preset element comprises at least one of the following elements: sample value, sample frequency, model output result and actual result related to the model output result;
and storing the at least one preset element into an intermediate table, and compressing and storing other elements except the preset element in the target data lake into the intermediate table.
4. The method of claim 1, wherein in the case that the preset elements at least include sample values, sample frequencies, model output results, and actual results associated with the model output results, calculating an evaluation index for a model to be evaluated in the plurality of data models according to the preset elements comprises:
determining a calculation mode of the evaluation index and a grouping number of the model output result, wherein the calculation mode comprises an equal frequency mode and an equidistant mode, and the grouping number is the number of different evaluation intervals for dividing the model output result;
determining a plurality of model output results contained in each evaluation interval according to the calculation mode and the grouping number;
determining a positive sample, a first proportion of the positive sample in the evaluation interval, a negative sample and a second proportion of the negative sample in the evaluation interval according to the model output result and the actual result;
determining the positive sample, the negative sample, the first proportion, and the second proportion as the evaluation index.
5. The method of claim 4, wherein determining a plurality of model output results included in each evaluation interval according to the calculation method and the number of groupings comprises:
determining the sum of the frequency of each sample of the model to be evaluated as the total number of samples under the condition that the calculation mode is the equal frequency mode;
calculating a first ratio of the total number of samples to the number of groups, and determining a plurality of sample values contained in each evaluation interval according to the first ratio;
under the condition that the calculation mode is an equidistant mode, determining the difference value between the maximum value and the minimum value of the model output result;
and calculating a second ratio of the difference value to the number of groups, and determining a plurality of sample values contained in each evaluation interval according to the second ratio.
6. The method of claim 4, wherein determining a positive sample, a first proportion of the positive sample in the evaluation interval, a negative sample, and a second proportion of the negative sample in the evaluation interval according to the model output result and the actual result comprises:
judging whether the model output result corresponding to each sample value is the same as the actual result or not;
determining the sample value as a positive sample when the model output result is the same as the actual result;
determining the sample value as a negative sample if the model output result is different from the actual result;
calculating, in each evaluation interval, the ratio of the total number of the positive samples to the total number of the sample values in the evaluation interval to obtain the first proportion;
and calculating, in each evaluation interval, the ratio of the total number of the negative samples to the total number of the sample values in the evaluation interval to obtain the second proportion.
7. The method of claim 1, wherein calculating the monitoring index of the model to be evaluated according to the evaluation index comprises:
and inputting the evaluation index into a preset formula to obtain the monitoring index, wherein the preset formula comprises at least one of a positive and negative sample discrimination index calculation formula and a sample stability index calculation formula.
8. An apparatus for monitoring a data model, comprising:
the acquisition unit is used for acquiring models required in the process of processing target services, obtaining multiple data models and determining a data source of each data model;
the storage unit is used for storing the data in the data sources of the multiple data models into a target data lake;
the extraction unit is used for extracting at least one preset element from the target data lake, wherein the preset element is an element required for evaluating a data model;
the calculation unit is used for calculating an evaluation index of a model to be evaluated in the multiple data models according to the preset elements;
and the monitoring unit is used for calculating a monitoring index of the model to be evaluated according to the evaluation index and monitoring the model to be evaluated through the monitoring index.
9. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of monitoring a data model of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and memory storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of monitoring of a data model of any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211476936.2A CN115757035A (en) 2022-11-23 2022-11-23 Data model monitoring method and device, processor and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211476936.2A CN115757035A (en) 2022-11-23 2022-11-23 Data model monitoring method and device, processor and electronic equipment

Publications (1)

Publication Number Publication Date
CN115757035A true CN115757035A (en) 2023-03-07

Family

ID=85336256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211476936.2A Pending CN115757035A (en) 2022-11-23 2022-11-23 Data model monitoring method and device, processor and electronic equipment

Country Status (1)

Country Link
CN (1) CN115757035A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091181A (en) * 2023-04-06 2023-05-09 神州数码融信云技术服务有限公司 Detection method and device, computer equipment and computer readable storage medium
CN116610308A (en) * 2023-07-13 2023-08-18 支付宝(杭州)信息技术有限公司 Code management method and device, electronic equipment and storage medium
CN116610308B (en) * 2023-07-13 2023-11-03 支付宝(杭州)信息技术有限公司 Code management method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination