CN112183644B

CN112183644B - Index stability monitoring method and device, computer equipment and medium

Info

Publication number: CN112183644B
Application number: CN202011056363.9A
Authority: CN
Inventors: 罗健; 陈远波
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2024-05-03
Anticipated expiration: 2040-09-29
Also published as: CN112183644A

Abstract

The invention relates to the field of data processing, and discloses a method, a device, computer equipment and a medium for monitoring index stability. The method comprises the following steps: obtaining historical data features of a first preset period as an initial training set; performing stability detection on the initial training set to obtain a detection result; determining a target training set according to the detection result; obtaining data features in a prediction set according to a second preset period, wherein the second preset period is smaller than the first preset period; calculating, in a preset manner, the stability of the data features in the prediction set relative to the data features in the target training set as a target stability; and determining a monitoring result of the prediction set indexes based on the target stability and a preset stability threshold.

Description

Index stability monitoring method and device, computer equipment and medium

Technical Field

The present invention relates to the field of data processing, and in particular, to a method and apparatus for monitoring stability of an index, a computer device, and a medium.

Background

With the rapid development of computer technology, more and more data processing relies on machine learning and artificial intelligence. Before machine learning or artificial intelligence can be used for data processing, a good model must first be trained, so that data can then be processed quickly with that model.

In model training, the quality of the indexes fed into the model directly determines the accuracy of the training result. Good indexes (small fluctuation, strong predictive power) are irreplaceable for the model, but real data changes constantly and never exists in an ideal form. The stability of the indexes therefore needs to be monitored in advance, so as to stabilize the model's performance and ensure model quality.

Existing index monitoring schemes are index calculation tools that calculate the stability of each index and give an evaluation. In practical scenarios, however, many indexes are involved, and calculating each of them directly with such tools is inefficient. An efficient index stability monitoring method is therefore needed.

Disclosure of Invention

The embodiment of the invention provides a method, a device, computer equipment and a medium for monitoring index stability, so as to improve the efficiency of monitoring the index stability.

In order to solve the above technical problems, an embodiment of the present application provides a method for monitoring index stability, including:

Acquiring historical data characteristics of a first preset period as an initial training set;

Performing stability detection on the initial training set to obtain a detection result;

determining a target training set according to the detection result;

acquiring data characteristics in a prediction set according to a second preset period, wherein the second preset period is smaller than the first preset period;

Calculating the stability of the data features in the prediction set relative to the data features in the target training set in a preset mode, and taking the stability as target stability;

and determining a monitoring result of the prediction set index based on the target stability and a preset stability threshold.

Optionally, the acquiring the historical data features of the first preset period as an initial training set includes:

Performing data cleaning and normalization processing on the historical data features to obtain initial data;

if continuous data exist in the initial data, discretizing the continuous data to obtain discrete data;

And performing one-hot encoding on the discrete data, and taking the one-hot encoded data as the initial training set.

Optionally, the determining the target training set according to the detection result includes:

If the detection result is that unstable data exists in the initial training set, adding the stable data into a target training set, and taking the unstable data as abnormal data;

Acquiring an unstable type corresponding to the abnormal data, and repairing the abnormal data according to a repairing scheme corresponding to the unstable type;

if the repair is successful, the repaired abnormal data is added into the target training set, and if the repair is failed, the abnormal data is removed.

Optionally, the calculating, in a preset manner, the stability of the data features in the prediction set relative to the data features in the target training set as the target stability includes:

Binning the data features in the prediction set according to a preset binning manner to obtain first bins, and binning the data features in the target training set to obtain second bins;

For each bin in the first bins, calculating the stability index between the data features in that bin and the data features in the corresponding bin of the second bins to obtain a basic stability;

And accumulating all the basic stabilities to obtain the target stability corresponding to the data feature.
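The three steps above can be illustrated with a minimal sketch. It assumes that both sets have already been binned with the same bin labels and that each bin's contribution takes the commonly used population stability index form (p - q) * ln(p / q); the function name `psi` and the `eps` smoothing for empty bins are illustrative assumptions, not details given in the text.

```python
import math
from collections import Counter

def psi(pred_bins, train_bins, eps=1e-6):
    """Accumulate per-bin stability contributions between two binned samples.

    pred_bins / train_bins: one bin label per sample (hypothetical input format).
    eps: assumed floor for empty-bin proportions, to keep the logarithm finite.
    """
    pred_counts = Counter(pred_bins)
    train_counts = Counter(train_bins)
    total = 0.0
    for label in set(pred_bins) | set(train_bins):
        p = max(pred_counts[label] / len(pred_bins), eps)    # share in prediction set
        q = max(train_counts[label] / len(train_bins), eps)  # share in target training set
        total += (p - q) * math.log(p / q)                   # basic stability of this bin
    return total
```

Identical bin distributions give a value of 0, and the value grows as the prediction set drifts away from the target training set.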

Optionally, the binning the data features in the prediction set according to a preset binning manner to obtain the first bins includes:

Acquiring a binning configuration parameter from a preset configuration file, wherein the binning configuration parameter comprises a bin number threshold;

obtaining m feature values contained in the data feature, wherein m is a positive integer greater than 1;

storing the m feature values into a preset feature value set, setting the initial value of the binning round number k to 0, and setting the binning result of round 0 to empty, wherein k ∈ [0, m-1];

for each feature value in the feature value set, taking the feature value as a trial split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k, and calculating the association index value corresponding to the feature value, to obtain m-k association index values;

taking the feature value corresponding to the maximum of the m-k association index values as a target split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k, taking this division as the binning result of round k+1, and removing the feature value from the feature value set;

If the value of k+2 has not reached the preset bin number threshold, returning to the step of taking each feature value in the feature value set as a trial split point and calculating the m-k association index values, and continuing execution; otherwise, stopping binning and determining the binning result of round k+1 as the first bins.

Optionally, the association index value is any one of an IV value, a kunit, and an information entropy.
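The greedy procedure above can be sketched as follows. This is only an illustration under stated assumptions: the feature values are treated as ordered numbers so that a split point partitions them into intervals, and the association index is taken to be the Shannon entropy of the bin occupancy (information entropy is one of the options named above, but this exact scoring is an assumption). The names `entropy_score` and `greedy_binning` are hypothetical.

```python
import math
from bisect import bisect_right

def entropy_score(samples, cuts):
    """Assumed association index: Shannon entropy of the bin occupancy."""
    counts = [0] * (len(cuts) + 1)
    for x in samples:
        counts[bisect_right(cuts, x)] += 1   # a value equal to a cut falls in the right-hand bin
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in counts if c)

def greedy_binning(samples, max_bins, score=entropy_score):
    """Add one split point per round until the bin number threshold is reached."""
    candidates = set(samples)   # the m distinct feature values
    cuts = []                   # round 0: empty binning result
    while candidates and len(cuts) + 1 < max_bins:
        # Try every remaining value as a trial split point and keep the best one.
        best = max(sorted(candidates),
                   key=lambda v: score(samples, sorted(cuts + [v])))
        cuts = sorted(cuts + [best])   # binning result of round k+1
        candidates.discard(best)       # remove the chosen value from the set
    return cuts
```

For example, with samples 1, 1, 2, 2, 3, 3, 4, 4 and a bin number threshold of 2, the sketch selects the single split point 3, which divides the samples evenly. Any other scoring function with the same signature can be passed as `score`.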

In order to solve the above technical problem, an embodiment of the present application further provides a device for monitoring index stability, including:

the first data acquisition module is used for acquiring historical data characteristics of a first preset period and taking the historical data characteristics as an initial training set;

the first stability detection module is used for carrying out stability detection on the initial training set to obtain a detection result;

the target training set determining module is used for determining a target training set according to the detection result;

The second data acquisition module is used for acquiring data characteristics in the prediction set according to a second preset period, wherein the second preset period is smaller than the first preset period;

the second stability detection module is used for calculating the stability of the data features in the prediction set relative to the data features in the target training set in a preset mode and taking the stability as the target stability;

and the result determining module is used for determining the monitoring result of the prediction set index based on the target stability and a preset stability threshold value.

Optionally, the first data acquisition module includes:

The data preprocessing unit is used for carrying out data cleaning and normalization processing on the historical data characteristics to obtain initial data;

The data discretization unit is used for discretizing the continuous data to obtain discrete data if the continuous data exist in the initial data;

And the one-hot encoding unit is used for performing one-hot encoding on the discrete data and taking the one-hot encoded data as the initial training set.

Optionally, the target training set determining module includes:

The abnormal data determining unit is used for adding the stable data into a target training set and taking the unstable data as abnormal data if the detection result is that the unstable data exists in the initial training set;

The abnormal data repairing unit is used for acquiring an unstable type corresponding to the abnormal data and repairing the abnormal data according to a repairing scheme corresponding to the unstable type;

the abnormal data classifying unit is used for adding the repaired abnormal data into the target training set if the repair is successful, and removing the abnormal data if the repair is failed.

Optionally, the second stability detection module includes:

The data feature binning unit is used for binning the data features in the prediction set according to a preset binning manner to obtain first bins, and binning the data features in the target training set to obtain second bins;

the basic stability calculation unit is used for calculating, for each bin in the first bins, the stability index between the data features in that bin and the data features in the corresponding bin of the second bins to obtain a basic stability;

And the target stability determining unit is used for accumulating all the basic stabilities to obtain the target stability corresponding to the data feature.

Optionally, the data feature binning unit comprises:

a parameter obtaining subunit, configured to obtain a binning configuration parameter from a preset configuration file, where the binning configuration parameter includes a bin number threshold;

A feature value obtaining subunit, configured to obtain m feature values included in the data feature, where m is a positive integer greater than 1;

The initialization unit is used for storing the m feature values into a preset feature value set, setting the initial value of the binning round number k to 0, and setting the binning result of round 0 to empty, wherein k ∈ [0, m-1];

The association index value calculation unit is used for taking, for each feature value in the feature value set, the feature value as a trial split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k, and calculating the association index value corresponding to the feature value, to obtain m-k association index values;

the binning result determining unit is used for taking the feature value corresponding to the maximum of the m-k association index values as a target split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k, taking this division as the binning result of round k+1, and removing the feature value from the feature value set;

and the loop iteration unit is used for, if the value of k+2 has not reached the preset bin number threshold, returning to the step of taking each feature value in the feature value set as a trial split point and calculating the m-k association index values, and continuing execution; otherwise, stopping binning and determining the binning result of round k+1 as the first bins.

In order to solve the above technical problems, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the method for monitoring the stability of the indicator when executing the computer program.

In order to solve the above technical problem, an embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps of the method for monitoring the stability of the index.

According to the method, the device, the computer equipment and the medium for monitoring the index stability, the historical data characteristics of the first preset period are obtained and used as an initial training set, stability detection is carried out on the initial training set to obtain a detection result, the target training set is determined according to the detection result, so that the data stability can be detected quickly according to the target training set as a reference, the accuracy of the subsequent data stability identification is improved, meanwhile, the data characteristics in the prediction set are obtained according to the second preset period, wherein the second preset period is smaller than the first preset period, the stability of the data characteristics in the prediction set relative to the data characteristics in the target training set is calculated in a preset mode to serve as target stability, the monitoring result of the index of the prediction set is determined based on the target stability and a preset stability threshold, the stability monitoring of the data characteristics in the prediction set relative to the data characteristics in the target training set is achieved through calculation, and the efficiency of the data stability detection is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a method of monitoring index stability of the present application;

FIG. 3 is a schematic diagram of one embodiment of a monitoring device for index stability according to the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.

Description of the embodiments

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and so on.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the method for monitoring the index stability provided by the embodiment of the application is executed by the server, and accordingly, the device for monitoring the index stability is arranged in the server.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements, and the terminal devices 101, 102, 103 in the embodiment of the present application may specifically correspond to application systems in actual production.

Referring to fig. 2, fig. 2 shows a method for monitoring index stability according to an embodiment of the present invention. The method is described below as applied to the server in fig. 1, in detail as follows:

S201: Acquiring the historical data features of the first preset period as an initial training set.

Specifically, in actual business, the prediction model needs to be iterated periodically, and each iteration involves screening the data features, because the quality of the data features fed into the model (that is, the model-entering indexes) directly affects the model's prediction results. In this embodiment, at intervals of the first preset period, the historical data features within that period are acquired as the initial training set, and indexes are subsequently screened from this initial training set.

The historical data features refer to data features used in the historical service, and specifically, different data features exist according to different services, which is not limited herein.

Preferably, in this embodiment, the first preset period is one month.

For example, in a specific embodiment, the preset model is a traffic situation trend prediction model, the first preset period is 1 month, and the historical data characteristics of about 1 month are obtained from the service library and used as the initial training set.

S202: Performing stability detection on the initial training set to obtain a detection result.

Specifically, after the initial training set is obtained, the stability of the data features in the initial training set needs to be detected, and unstable data features in the initial training set need to be repaired. To detect the stability of the data features, this embodiment may calculate the population stability index (PSI) between the data features in the initial training set and the historical data features.

The PSI (Population Stability Index) is used to measure the stability of each feature in the initial training set. In this embodiment, the stability of the initial training set may specifically be detected as follows: the data features in the initial training set are preprocessed and one-hot encoded to obtain numeric data; similarity is then calculated between the numeric data, and numeric data whose similarity exceeds a preset threshold are treated as peer data; the group containing the most peer data is taken as the reference peer group, and the mean of the numeric data in the reference peer group is calculated; the PSI between each piece of numeric data and this mean is calculated as the stability value of that numeric data; and the detection result is then determined by comparing the stability value with a preset stability threshold.
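The detection procedure just described might be sketched as below. Several details are assumptions made only for illustration: each piece of numeric data is represented as a vector of non-negative proportions, cosine similarity is used as the similarity measure, the peer group of a vector is every vector whose similarity to it exceeds the threshold, and the default thresholds 0.95 and 0.25 are placeholders. The function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def psi_vec(p, q, eps=1e-6):
    """PSI between two proportion vectors, with an assumed floor eps for zeros."""
    return sum((max(a, eps) - max(b, eps)) * math.log(max(a, eps) / max(b, eps))
               for a, b in zip(p, q))

def detect_unstable(vectors, sim_threshold=0.95, psi_threshold=0.25):
    """Return one flag per vector: True if its PSI against the reference mean is too high."""
    # Peer group of each vector: all vectors sufficiently similar to it.
    groups = [[w for w in vectors if cosine(v, w) > sim_threshold] for v in vectors]
    reference = max(groups, key=len)                            # largest peer group
    mean = [sum(col) / len(reference) for col in zip(*reference)]
    return [psi_vec(v, mean) > psi_threshold for v in vectors]  # stability value vs threshold
```

With the vectors [0.5, 0.5], [0.5, 0.5], [0.48, 0.52] and [0.95, 0.05], the first three form the reference peer group and only the last one is flagged as unstable.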

In this embodiment, the detection result is either that the data features in the initial training set are stable, or that some data features in the initial training set are unstable, where the instability of some data features includes, but is not limited to: logic anomalies, periodic index anomalies, surge anomalies, and the like.

S203: Determining a target training set according to the detection result.

Specifically, after the detection result is obtained, if the detection result is that the data features in the initial training set are stable, the initial training set is determined to be the target training set; if some data features in the initial training set are unstable, the unstable data features are repaired, and the stable data features together with the successfully repaired data features are used as the target training set.

Further, different instability types and the repair method for each type can be preset, so that as much of the source data as possible is retained. This avoids the loss of accuracy in subsequent stability monitoring that would result if all data of a certain type were rejected because of a data attribute problem. For the specific data repair procedure, reference may be made to the description of the following embodiments; to avoid repetition, details are not repeated here.
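As an illustration of presetting a repair method per instability type, the sketch below dispatches each unstable record to a repair function and drops the record when repair fails. The function `build_target_training_set`, the `classify` callback, and the convention that a repair function returns `None` on failure are all assumptions made for this example.

```python
def build_target_training_set(records, classify, repairers):
    """Keep stable records, repair unstable ones where possible, drop the rest.

    classify(record) -> None if the record is stable, else an instability type
                        such as "logic", "periodic" or "surge" (hypothetical labels).
    repairers: dict mapping an instability type to a repair function that
               returns the repaired record, or None if repair fails.
    """
    target = []
    for record in records:
        kind = classify(record)
        if kind is None:
            target.append(record)            # stable data goes straight in
            continue
        repair = repairers.get(kind)
        fixed = repair(record) if repair else None
        if fixed is not None:
            target.append(fixed)             # repair succeeded
        # repair failed or no scheme is configured: the abnormal data is removed
    return target
```

Because the repair schemes live in a dictionary, a new instability type can be supported by registering one more function without changing the loop.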

S204: Acquiring the data features in the prediction set according to a second preset period, wherein the second preset period is smaller than the first preset period.

Specifically, after a target training set with stable data features is obtained, the target training set is used to periodically monitor the stability of the data features in the prediction set. In this embodiment, the data features in the prediction set are acquired at intervals of a second preset period, and the second preset period is smaller than the first preset period.

The prediction set is a set of data features that need to be input into the model to predict the service processing result according to the service requirement, and it should be understood that if the data features input into the model are not stable enough, the accuracy of the model prediction result may be reduced, so that the stability of the data features in the prediction set needs to be monitored through stable and reliable data features.

Preferably, when the first preset period is 1 month, the second preset period is 1 day. That is, each time a month passes, the historical data features of the previous month are acquired and the target training set is derived from them, and during the following month the stability of each day's real-time data is evaluated against that target training set.

S205: Calculating, in a preset manner, the stability of the data features in the prediction set relative to the data features in the target training set as the target stability.

Specifically, the data features in the target training set are all stable, so the stability of the data features in the prediction set is determined by evaluating how much they fluctuate relative to the data features in the target training set. In this embodiment, after the data features of the second preset period are acquired, the stability of the data features in the prediction set relative to the data features in the target training set is calculated in the preset manner to obtain the target stability.

The preset manners of stability evaluation include, but are not limited to: precision (Precision), recall (Recall), F-value (F-Measure), rank correlation, the singular value decomposition (Singular Value Decomposition, SVD) algorithm, and the like. The manner may be set according to actual requirements and is not specifically limited here.

S206: Determining a monitoring result of the prediction set indexes based on the target stability and a preset stability threshold.

Specifically, the monitoring result of the prediction set indexes is determined based on the target stability and the preset stability threshold: when the target stability does not exceed the preset stability threshold, the monitoring result is confirmed to be normal; when the target stability exceeds the preset stability threshold, the monitoring result is confirmed to be insufficiently stable.

Preferably, the preset stability threshold is 0.25.
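The comparison in S206 amounts to a single threshold check per index. A minimal sketch follows, assuming the target stabilities are held in a dictionary keyed by index name; the function name `monitoring_results` and the result labels "normal" and "unstable" are illustrative choices.

```python
def monitoring_results(target_stabilities, threshold=0.25):
    """Map each index name to a monitoring result based on its target stability."""
    return {
        name: "normal" if value <= threshold else "unstable"  # "does not exceed" means <=
        for name, value in target_stabilities.items()
    }
```

A value exactly equal to the threshold is reported as normal, since the text treats only values that exceed the threshold as insufficiently stable.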

In this embodiment, the historical data features of the first preset period are obtained as an initial training set, stability detection is performed on the initial training set to obtain a detection result, a target training set is determined according to the detection result, so that subsequent detection of data stability can be rapidly performed according to the target training set as a reference, which is favorable for improving accuracy of subsequent data stability identification, and meanwhile, the data features in the prediction set are obtained according to the second preset period, wherein the second preset period is smaller than the first preset period, stability of the data features in the prediction set relative to the data features in the target training set is calculated in a preset manner and is used as a target stability, and a monitoring result of a prediction set index is determined based on the target stability and a preset stability threshold, so that stability monitoring is performed on the data features in the prediction set relative to the data features in the target training set through calculation, which is favorable for improving efficiency of data stability detection.

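In some optional implementations of this embodiment, in step S201, the acquiring the historical data features of the first preset period as an initial training set includes:

Performing data cleaning and normalization processing on the historical data features to obtain initial data;

If continuous data exist in the initial data, discretizing the continuous data to obtain discrete data;

And performing one-hot encoding on the discrete data, and taking the one-hot encoded data as the initial training set.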

Specifically, after the historical data features are obtained, data cleaning and normalization processing are performed on them to obtain initial data. Each piece of initial data comprises a plurality of attribute features, and each attribute feature is of either the continuous type or the discrete type. The feature type of the initial data is then identified; if the feature type is continuous, the continuous data features are discretized and converted into discrete data, and the discrete data are one-hot encoded to obtain the initial training set.

The data cleaning refers to removing data with incomplete attributes, redundant data, and data outside a preset range; the normalization processing refers to unifying the data in terms of form, attributes, range, and the like.

An attribute feature is one specific item of a piece of initial data. In the financial field, one piece of data often contains a plurality of attribute features; for example, one piece of initial data is user information data, which contains a user name, a user gender, a contact method, transacted business, and the like, and each item is one attribute feature.

The continuous attribute features are attribute features which can be arbitrarily valued in a certain interval, the values of the continuous attribute features are continuous, two adjacent values can be infinitely divided, and infinite values can be taken, for example, the specification and the size of a production part, the height, the weight, the chest circumference and the like of a human body are continuous attribute features, and the values can only be obtained by a measuring or metering method.

Wherein, the discrete attribute features refer to data with feature values listed one by one in a certain order, and the feature values are usually valued in integer bits. Such as the number of workers, the number of factories, the number of machines, etc., the numerical value of the discrete attribute features is obtained by a counting method.

In this embodiment, null value filling is performed on discrete attribute features with missing values, the null value being filled with the special string "NA". This prevents the subsequent stability calculation from being affected by attribute features in the initial data that have no corresponding feature value.
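The discretization and null value filling described above can be sketched as follows. The interval-based discretization with explicit edges is an assumption (the text does not fix a discretization scheme), as are the function names and the treatment of both `None` and the empty string as missing.

```python
from bisect import bisect_right

def discretize(value, edges):
    """Return the index of the interval containing value, given ascending edges."""
    return bisect_right(edges, value)

def fill_missing(values, placeholder="NA"):
    """Replace missing discrete values with the placeholder string."""
    return [placeholder if v is None or v == "" else v for v in values]
```

For instance, with edges 160, 170 and 180, a height of 172.5 falls into interval 2, so a continuous height becomes one of four discrete interval indexes.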

Optionally, when the historical data features are more, the dimension reduction processing is performed on the historical data features, so that excessive unimportant features are prevented from participating in operation, the operation amount and the occupation of system resources are reduced, and the data processing efficiency is improved.

Further, for each piece of initial data, if an attribute feature has m different feature values, m binary features are obtained by one-hot encoding. The feature values are mutually exclusive and only one is activated at a time: the activated feature value is set to 1 and the remaining, non-activated feature values are set to 0, which finally yields the basic digital code corresponding to each feature value of the attribute feature.

It should be understood that one-hot encoding turns data in its original state into sparse data, which helps data mining handle the classification of samples with attribute features and, to some extent, also expands the features. Here, data in its original state refers to the initial data and the value ranges of its attribute features.

For example, when the attribute feature is "gender", its feature values take the two values "male" and "female", that is, gender = ["male", "female"]; the digital code corresponding to "male" is gender = [1, 0], and the digital code corresponding to "female" is gender = [0, 1].
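The gender example above can be reproduced with a short Python sketch. The function name `one_hot` is hypothetical and chosen only for illustration; the ordering of the categories is assumed to be fixed in advance.

```python
def one_hot(value: str, categories: list[str]) -> list[int]:
    """Return the one-hot code of `value` over an ordered list of categories.

    Exactly one position is set to 1 (the activated feature value);
    all other positions are 0.
    """
    if value not in categories:
        raise ValueError(f"unknown feature value: {value!r}")
    return [1 if value == c else 0 for c in categories]


gender = ["male", "female"]
male_code = one_hot("male", gender)      # [1, 0]
female_code = one_hot("female", gender)  # [0, 1]
```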

It is worth noting that differences in how attribute features take values and in their value ranges can affect the stability evaluation. Applying one-hot encoding to the feature values of different attribute features turns the feature values in their original state into sparse data, which avoids the negative influence of differing value conventions on the stability calculation and improves its accuracy.

In this embodiment, data cleaning, normalization, discretization and one-hot encoding are performed on the historical data features, which reduces the dimension and complexity of the data, reduces the amount of computation in subsequent processing and improves data processing efficiency; at the same time, data with different representations are quantized, which improves the accuracy of the subsequent stability detection.

In some optional implementations of the present embodiment, in step S203, determining the target training set according to the detection result includes:

if the detection result is that unstable data exists in the initial training set, adding the stable data into the target training set, and taking the unstable data as abnormal data;

Specifically, if the detection result is that unstable data exists in the initial training set, the stable data is added into the target training set and the unstable data is taken as abnormal data; the instability type of the abnormal data is then determined, and the abnormal data is repaired according to the repair scheme corresponding to that instability type.

The unstable data features are repaired with different strategies depending on the cause of the instability, specifically as follows:

For instability caused by a logic abnormality, the processing logic is reworked to repair the logic, and the training set data is refreshed to complete the repair;

For abnormality in which an index fluctuates upward or downward because the overall index fluctuates, normalization is performed so as to ensure the stability of the index (data feature);

For instability caused by periodic indexes, certain indexes are regarded as changing periodically over time and are therefore left unprocessed;

For data features whose overall fluctuation is normal but in which one or more bins fluctuate abnormally, the data feature is binned before being fed into the model.

In this embodiment, the abnormal data whose detection result is unstable is repaired, so that the accuracy of the data in the target training set is ensured, and the accuracy of the subsequent target stability calculation is ensured.

In some optional implementations of this embodiment, in step S205, calculating, in a preset manner, the stability of the data features in the prediction set relative to the data features in the target training set comprises:

accumulating all the basic stabilities to obtain the target stability corresponding to the data feature.

Specifically, because the prediction set and the target training set contain many data features, the data features are binned and the stability is calculated on the bins, so that the stability of the data features in the prediction set is determined more quickly and accurately.

Data binning (also known as discrete binning or segmentation) is a data preprocessing technique for reducing the effect of minor observation errors; it groups many continuous values into a smaller number of "bins".

The preset binning manner may be selected according to actual requirements; common binning manners include, but are not limited to, equal-frequency binning, equal-width binning and binning based on k-means clustering.
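To make the difference between two of the listed binning manners concrete, the following pure-Python sketch computes bin edges for equal-width and equal-frequency binning. The function names and the rank-based choice of equal-frequency edges are assumptions made for illustration; the patent does not prescribe a particular implementation.

```python
def equal_width_edges(values: list[float], n_bins: int) -> list[float]:
    """Edges that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def equal_frequency_edges(values: list[float], n_bins: int) -> list[float]:
    """Edges chosen by rank so each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(i * n // n_bins, n - 1)] for i in range(n_bins + 1)]


def assign_bin(x: float, edges: list[float]) -> int:
    """Index of the bin containing x; the last bin includes its right edge."""
    for i in range(len(edges) - 2):
        if x < edges[i + 1]:
            return i
    return len(edges) - 2


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_edges = equal_width_edges(values, 3)     # one outlier stretches the bins
freq_edges = equal_frequency_edges(values, 2)  # edges follow the sample ranks
```

With a skewed sample such as this one, equal-width binning puts nine of the ten values into the first bin, whereas equal-frequency binning splits the sample evenly; which behaviour is preferable depends on the index being monitored.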

In this embodiment, the data features are binned and the stability is calculated on the resulting bins to obtain the target stability, which helps improve the efficiency of the stability calculation.
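The passage above does not spell out how the per-bin basic stability is computed. As a hedged sketch, assume it is the per-bin term of the population stability index (PSI), which the claims name for the training-set check, so that accumulating the terms gives the target stability. The function name `psi`, its parameters and the `eps` floor for empty bins are illustrative assumptions.

```python
import math


def psi(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions.

    expected_counts: per-bin sample counts in the target training set
    actual_counts:   per-bin sample counts in the prediction set
    eps:             floor on bin proportions to avoid log(0) (assumption)
    """
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / exp_total, eps)
        a_pct = max(a / act_total, eps)
        # Per-bin "basic stability" term; the terms are accumulated over all bins.
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


same = psi([10, 20, 30], [10, 20, 30])  # identical distributions
shifted = psi([50, 50], [80, 20])       # mass moved between the two bins
```

Under this assumption, the accumulated value would then be compared with the preset stability threshold to decide the monitoring result.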

In some optional implementations of this embodiment, performing binning on the data features in the prediction set according to a preset binning manner to obtain the first bin comprises:

acquiring a binning configuration parameter from a preset configuration file, wherein the binning configuration parameter comprises a bin number threshold;

obtaining m feature values contained in the data feature, wherein m is a positive integer greater than 1;

storing the m feature values into a preset feature value set, setting the initial value of the binning round number k to 0, and setting the binning result of round 0 to empty, wherein k ∈ [0, m-1];

for each feature value in the feature value set, taking the feature value as a trial split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k, and calculating the association index value corresponding to the feature value, so as to obtain m-k association index values;

taking the feature value corresponding to the maximum of the m-k association index values as the target split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k as the binning result of round k+1, and removing that feature value from the feature value set;

if k+2 has not reached the preset bin number threshold, returning to the step of taking each feature value in the feature value set as a trial split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k and calculating the corresponding association index value to obtain m-k association index values, and continuing execution; otherwise, stopping binning and determining the binning result of round k+1 as the first bin.

Specifically, the preset configuration parameters and the m feature values contained in the data feature are obtained; the target split point is then determined by calculating association index values and the split is performed; splitting stops when the number of bins reaches the bin number threshold, yielding the first bin.

It should be understood that the preset configuration parameters include the bin number threshold, that is, the number of bins finally required, which may be configured according to actual needs and is not limited here.

The association index value is any one of the IV value (information value), the Gini coefficient and the information entropy.

In this embodiment, split points are determined by calculating association index values, which reduces the risk of model overfitting; discretized features are easy to add and remove, which facilitates rapid iteration of the model and improves the efficiency and accuracy of binning the data features.
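The round-by-round splitting procedure described above can be sketched as a greedy loop. The sketch makes several assumptions that the text does not state: the feature values have a fixed order, each sample carries a binary label, the association index is the IV value with additive smoothing, and a split placed after the last value is skipped because it would create no new bin. All function names are hypothetical.

```python
import math


def bins_from_splits(ordered_values, splits):
    """Cut the ordered value list immediately after each split value."""
    bins, current = [], []
    for v in ordered_values:
        current.append(v)
        if v in splits:
            bins.append(current)
            current = []
    if current:
        bins.append(current)
    return bins


def information_value(bins, samples, eps=0.5):
    """IV of a binning; samples are (feature_value, label) pairs, label in {0, 1}."""
    pos_total = sum(1 for _, y in samples if y == 1)
    neg_total = len(samples) - pos_total
    iv = 0.0
    for b in bins:
        pos = sum(1 for v, y in samples if v in b and y == 1)
        neg = sum(1 for v, y in samples if v in b and y == 0)
        # Additive smoothing (assumption) keeps the logarithm finite for pure bins.
        p = (pos + eps) / (pos_total + eps * len(bins))
        q = (neg + eps) / (neg_total + eps * len(bins))
        iv += (p - q) * math.log(p / q)
    return iv


def greedy_binning(ordered_values, samples, max_bins):
    """Each round adds the split point with the highest association index value."""
    candidates = list(ordered_values[:-1])  # splitting after the last value adds no bin
    splits = set()
    bins = [list(ordered_values)]
    while len(bins) < max_bins and candidates:
        scored = [
            (information_value(bins_from_splits(ordered_values, splits | {c}), samples), c)
            for c in candidates
        ]
        _, best = max(scored, key=lambda t: t[0])
        splits.add(best)          # target split point of this round
        candidates.remove(best)   # remove it from the feature value set
        bins = bins_from_splits(ordered_values, splits)
    return bins


values = ["A", "B", "C", "D"]
samples = [("A", 0)] * 10 + [("B", 0)] * 10 + [("C", 1)] * 10 + [("D", 1)] * 10
two_bins = greedy_binning(values, samples, max_bins=2)
```

Here `max_bins` plays the role of the bin number threshold read from the configuration file; the loop stops as soon as that many bins exist or no candidate split points remain.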

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.

Fig. 3 shows a schematic block diagram of an index stability monitoring apparatus in one-to-one correspondence with the index stability monitoring method of the above embodiment. As shown in fig. 3, the monitoring apparatus for index stability includes a first data acquisition module 31, a first stability detection module 32, a target training set determination module 33, a second data acquisition module 34, a second stability detection module 35, and a result determination module 36. The functional modules are described in detail as follows:

A first data obtaining module 31, configured to obtain a historical data feature of a first preset period as an initial training set;

a first stability detection module 32, configured to perform stability detection on the initial training set to obtain a detection result;

a target training set determining module 33, configured to determine a target training set according to the detection result;

A second data obtaining module 34, configured to obtain data features in the prediction set according to a second preset period, where the second preset period is smaller than the first preset period;

the second stability detection module 35 is configured to calculate, in a preset manner, a stability of the data features in the prediction set relative to the data features in the target training set, as a target stability;

the result determining module 36 is configured to determine a monitoring result of the prediction set index based on the target stability and a preset stability threshold.

Optionally, the first data acquisition module 31 includes:

a one-hot encoding unit, configured to perform one-hot encoding on the discrete data and take the one-hot encoded data as the initial training set.

Optionally, the target training set determining module 33 includes:

the abnormal data determining unit is used for adding stable data into the target training set and taking the unstable data as abnormal data if the detection result is that the unstable data exists in the initial training set;

Optionally, the second stability detection module 35 includes:

Optionally, the data feature binning unit comprises:

a parameter acquisition subunit, configured to acquire a binning configuration parameter from a preset configuration file, wherein the binning configuration parameter comprises a bin number threshold;

an initialization subunit, configured to store the m feature values into a preset feature value set, set the initial value of the binning round number k to 0 and set the binning result of round 0 to empty, wherein k ∈ [0, m-1];

a binning result determining subunit, configured to take the feature value corresponding to the maximum of the m-k association index values as the target split point, divide the nominal variable into k+2 bins on the basis of the binning result of round k as the binning result of round k+1, and remove that feature value from the feature value set;

a loop iteration subunit, configured to: if k+2 has not reached the preset bin number threshold, return to the step of taking each feature value in the feature value set as a trial split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k and calculating the corresponding association index value to obtain m-k association index values, and continue execution; otherwise, stop binning and determine the binning result of round k+1 as the first bin.

For specific limitations of the index stability monitoring device, reference may be made to the limitations of the index stability monitoring method above, which are not repeated here. Each module in the index stability monitoring device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 that are communicatively connected to each other via a system bus. It is noted that only the computer device 4 having the components memory 41, processor 42 and network interface 43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used for storing an operating system and various application software installed on the computer device 4, such as program codes for controlling electronic files, etc. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute a program code stored in the memory 41 or process data, such as a program code for executing control of an electronic file.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

The present application also provides another embodiment, namely, a computer readable storage medium storing an interface display program, where the interface display program is executable by at least one processor, so that the at least one processor performs the steps of the method for monitoring the stability of an index as described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit the scope of the patent. The application may be embodied in many different forms; rather, these embodiments are provided so that the disclosure will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any equivalent structure made using the content of the specification and drawings of the application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the application.

Claims

1. The method for monitoring the index stability is characterized by comprising the following steps:

determining a target training set according to the detection result;

determining a monitoring result of the prediction set index based on the target stability and a preset stability threshold;

The step of performing stability detection on the initial training set to obtain a detection result comprises the following steps:

preprocessing and one-hot encoding the data features in the initial training set to obtain digital data;

performing similarity calculation on the data, and taking the digital data with the similarity exceeding a preset threshold value as homologous data;

Acquiring the family with the most homologous data as a reference family;

Acquiring an average value of the digital data in the reference family, and respectively calculating PSI of each digital data and the average value as a stable value of the digital data;

determining a detection result according to a comparison result of the stable value and a preset stable threshold value;

wherein, according to the detection result, determining the target training set includes:

If the repair is successful, adding the repaired abnormal data into the target training set, and if the repair is failed, removing the abnormal data;

the repairing of the abnormal data according to the repairing scheme corresponding to the unstable type comprises the following steps:

for the abnormality caused by upward or downward fluctuation of the index due to the fluctuation of the overall index, the processing mode is to perform normalization processing, so that the stability of the index is ensured;

for data features in which the data feature is abnormal but one or more bins fluctuate normally, performing binning and model-entry processing;

wherein calculating, in a preset manner, the stability of the data features in the prediction set relative to the data features in the target training set comprises:

2. The method for monitoring the stability of an index according to claim 1, wherein obtaining the historical data features of the first preset period as the initial training set comprises:

3. The method for monitoring the stability of an index according to claim 1, wherein performing binning on the data features in the prediction set according to a preset binning manner to obtain a first bin comprises:

obtaining a binning configuration parameter from a preset configuration file, wherein the binning configuration parameter comprises a bin number threshold;

for each feature value in the feature value set, taking the feature value as a trial split point, dividing the nominal variable into k+2 bins on the basis of the binning result of round k, and calculating the association index value corresponding to the feature value, so as to obtain m-k association index values;

4. The method for monitoring the stability of an index according to claim 3, wherein the association index value is any one of an IV value, a Gini coefficient, and an information entropy.

5. An index stability monitoring device, characterized by comprising:

the result determining module is used for determining a monitoring result of the prediction set index based on the target stability and a preset stability threshold;

The first stability detection module is further used for preprocessing and one-hot encoding the data features in the initial training set to obtain digital data; performing similarity calculation on the data, and taking the digital data with the similarity exceeding a preset threshold value as homologous data; acquiring the family with the most homologous data as a reference family; acquiring an average value of the digital data in the reference family, and respectively calculating PSI of each digital data and the average value as a stable value of the digital data; determining a detection result according to a comparison result of the stable value and a preset stable threshold value;

Wherein, the target training set determining module comprises:

The abnormal data classifying unit is used for adding the repaired abnormal data into the target training set if the repair is successful, and removing the abnormal data if the repair is failed;

The abnormal data restoration unit is further configured to: for instability caused by a logic abnormality, rework the processing logic to repair the logic, and refresh the training set data to complete the repair; for abnormality in which an index fluctuates upward or downward because the overall index fluctuates, perform normalization so as to ensure the stability of the index; for instability caused by periodic indexes, regard certain indexes as changing periodically over time and leave them unprocessed; for data features in which the data feature is abnormal but one or more bins fluctuate normally, perform binning and model-entry processing;

Wherein the second stability detection module comprises:

6. Computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method for monitoring the stability of an indicator according to any of claims 1 to 4 when the computer program is executed by the processor.

7. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of monitoring the stability of an index according to any one of claims 1 to 4.