CN113641525A

CN113641525A - Variable exception recovery method, apparatus, medium, and computer program product

Info

Publication number: CN113641525A
Application number: CN202110926804.4A
Authority: CN
Inventors: 要卓; 陈婷; 吴三平; 庄伟亮
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2021-11-12

Abstract

The application discloses a variable anomaly repairing method, device, medium, and computer program product. The variable anomaly repairing method comprises: acquiring a data set to be predicted; predicting a missing variable in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set; and repairing the missing variable based on the data prediction result to obtain a variable repair result. The application solves the technical problem of low model prediction accuracy.

Description

Variable exception recovery method, apparatus, medium, and computer program product

Technical Field

The present application relates to the field of machine learning techniques for financial technology (Fintech), and in particular, to a method, device, medium, and computer program product for variable anomaly recovery.

Background

With the continuous development of financial technology, especially internet technology, more and more technologies (such as distributed technology and artificial intelligence) are applied in the financial field. The financial industry in turn places higher requirements on these technologies, for example on the distribution of backlog items in the financial industry.

With the development of computer technology, federated learning is applied more and more widely. Currently, when a risk model is in use and the data source corresponding to a variable in the model becomes abnormal, the value of that variable goes missing, and the missing variable cannot be restored in a short time. To weaken the model deviation caused by the missing data, the value of the missing variable can be repaired by filling. At present, filling methods usually use the mean of historical data, a default value, or the last non-missing value. However, filling with the historical mean or a default value still leaves a large noise error when the missing data is highly important, and filling with the last non-missing value ignores changes caused by recent data updates. Both shortcomings result in less accurate model predictions.
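For reference, the three prior-art filling strategies named above can be sketched as follows. This is a minimal illustration only: the function names, the use of `None` to mark a missing value, and the sample series are assumptions made here, not part of the application.

```python
from typing import List, Optional


def fill_mean(history: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed values."""
    observed = [v for v in history if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in history]


def fill_default(history: List[Optional[float]], default: float = 0.0) -> List[float]:
    """Replace each missing entry with a fixed default value."""
    return [default if v is None else v for v in history]


def fill_last_observed(history: List[Optional[float]], default: float = 0.0) -> List[float]:
    """Replace each missing entry with the most recent non-missing value."""
    filled, last = [], default
    for v in history:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical series whose two most recent values are missing.
history = [1.0, 2.0, 6.0, None, None]
mean_filled = fill_mean(history)           # gaps become 3.0
default_filled = fill_default(history)     # gaps become 0.0
last_filled = fill_last_observed(history)  # gaps become 6.0
```

None of the three uses the other variables of the same client, which is the gap the prediction model described below is meant to address.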

Disclosure of Invention

The present application mainly aims to provide a variable anomaly repairing method, device, medium, and computer program product, and aims to solve the technical problem of low accuracy of model prediction in the prior art.

In order to achieve the above object, the present application provides a variable anomaly repairing method, including:

acquiring a data set to be predicted;

and predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance.

The present application also provides a variable abnormality repairing apparatus, which is a virtual apparatus, the variable abnormality repairing apparatus including:

the acquisition module is used for acquiring a data set to be predicted;

the prediction module is used for predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;

and the repairing module is used for repairing the missing variable based on the data prediction result to obtain a variable repairing result.

The present application further provides a variable anomaly repairing apparatus, which is an entity apparatus and includes: a memory, a processor, and a variable anomaly repairing program stored in the memory, wherein the variable anomaly repairing program, when executed by the processor, implements the steps of the variable anomaly repairing method described above.

The present application further provides a medium, which is a readable storage medium, where a variable exception recovery program is stored on the readable storage medium, and when executed by a processor, the variable exception recovery program implements the steps of the variable exception recovery method as described above.

The present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the variable anomaly repair method as described above or performs the steps of the data prediction method as described above.

In the prior art, missing data is filled and repaired using the mean of historical data, a default value, or the latest non-missing value. By contrast, the variable anomaly repairing method of the present application first acquires a data set to be predicted, and then predicts the missing variable in the data set to be predicted through a missing data prediction model to obtain a data prediction result. The missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set, so that, based on these two data sets, the model learns the weight difference between the missing variable in the current missing data set and each complete variable in the complete data set. The missing variable is then repaired based on the data prediction result to obtain a variable repair result. Repairing the missing variable in this way reduces the model deviation caused by missing data, and the trained missing data prediction model predicts the missing data more accurately. This overcomes the technical defect in the prior art whereby filling with the mean of historical data or a default value leaves a large noise error when the missing data is highly important, and filling with the latest non-missing value ignores changes caused by recent data updates, both of which lower the accuracy of model prediction. The accuracy of model prediction is thereby improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.

FIG. 1 is a schematic flow chart diagram illustrating a variable anomaly restoration method according to a first embodiment of the present application;

FIG. 2 is a schematic flow chart of a variable anomaly repairing method according to a second embodiment of the present application;

FIG. 3 is a flowchart illustrating a variable anomaly repairing method according to a third embodiment of the present application;

FIG. 4 is a schematic flow chart of a variable anomaly repairing method according to a fourth embodiment of the present application;

FIG. 5 is a schematic flow chart of a missing data prediction model obtained by iterative training optimization in the variable anomaly restoration method of the present application;

FIG. 6 is a schematic device structure diagram of a hardware operating environment related to a variable anomaly repairing method in an embodiment of the present application.

The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In a first embodiment of the variable anomaly repairing method, referring to fig. 1, the variable anomaly repairing method includes:

step S10, acquiring a data set to be predicted;

in this embodiment, it should be noted that the data set to be predicted includes a history value set corresponding to a missing variable with missing data and a history value set corresponding to each complete variable without missing data.

Step S20, predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;

in this embodiment, it should be noted that the predicting refers to predictively restoring the missing values of the missing variable in the data set to be predicted. The missing data set is the data set corresponding to the missing variable, which has missing data, in the historical data set of the sample client; the complete data set is the data set corresponding to each complete variable, which has no missing data, in the historical data set of the sample client; and the historical data set is the data set corresponding to all variables of the sample client.

The missing variable in the data set to be predicted is predicted through the missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set. Specifically, the missing data prediction model to be trained, which comprises a long short-term memory (LSTM) network model and a fully connected neural network model, is iteratively trained and optimized on the complete data set and the missing data set, and it is judged whether the optimized model meets a preset training end condition, wherein the preset training end condition includes convergence of the loss function, reaching a maximum iteration number threshold, and the like. If so, the missing data prediction model is obtained; if not, execution returns to the step of iteratively training and optimizing the missing data prediction model to be trained on the complete data set and the missing data set. The data set to be predicted is then input into the missing data prediction model, that is, the historical values of the missing variable and the historical values of the other complete variables in the data set to be predicted are input into the model. The LSTM network model in the missing data prediction model performs dimensionality reduction on the historical values of the other complete variables to obtain their characteristic information results. These characteristic information results, together with the historical values of the missing variable, are input into the fully connected neural network model in the missing data prediction model, which outputs the data prediction result for the values of the missing variable in the data set to be predicted.

Wherein the missing data prediction model comprises a long short-term memory (LSTM) network model and a fully connected neural network model,

the step of predicting the missing variables in the data set to be predicted through the missing data prediction model to obtain the data prediction result comprises the following steps:

step S21, inputting the historical values of all the complete variables in the data set to be predicted into the long-short term memory network model, and outputting the characteristic information results corresponding to all the complete variables in the data set to be predicted;

in this embodiment, it should be noted that the historical values of the complete variables are values corresponding to the complete variables in a preset time.

Specifically, the historical values of the complete variables in the data set to be predicted are input into the long-short term memory network model, which performs dimensionality reduction on this multidimensional time series data to obtain the characteristic information results corresponding to the complete variables in the data set to be predicted.

Step S22, inputting the characteristic information result corresponding to each complete variable in the data set to be predicted and the historical value of the missing variable in the data set to be predicted into the fully-connected neural network model, and outputting the data prediction result;

in this embodiment, it should be noted that the historical value of the missing variable is a value corresponding to the missing variable in a preset time.

The characteristic information result corresponding to each complete variable in the data set to be predicted and the historical values of the missing variable in the data set to be predicted are input into the fully-connected neural network model, which outputs the data prediction result. Specifically, the historical values of the missing variable in the data set to be predicted are abstracted into a multi-dimensional feature vector: the first dimension of the vector is set to the number of missing values of the missing variable in the data set to be predicted, the empty data of the missing variable is set to a preset value, and the remaining dimensions are set to the historical values corresponding to the missing variable in the data set to be predicted. The characteristic information result corresponding to each complete variable and the multi-dimensional feature vector are then input together into the fully-connected neural network model, which outputs the data prediction result for the values of the missing variable in the data set to be predicted, such that the data prediction result is as close as possible to the true value.
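The construction of the multi-dimensional feature vector can be sketched as below. This is an illustration under assumptions: the first dimension is read here as the count of missing values, missing entries are marked with `None`, and `PRESET_VALUE = -1.0` is a hypothetical choice, since the application does not state the preset value.

```python
from typing import List, Optional

# Assumption: the application does not say which preset value replaces empty data.
PRESET_VALUE = -1.0


def build_missing_feature_vector(
    history: List[Optional[float]],
    preset_value: float = PRESET_VALUE,
) -> List[float]:
    """Return [missing count, v_1, ..., v_s] with empty entries set to the preset value."""
    missing_count = sum(1 for v in history if v is None)
    values = [preset_value if v is None else float(v) for v in history]
    return [float(missing_count)] + values


# Five-day history of the missing variable; the two most recent days are empty.
vector = build_missing_feature_vector([0.4, 0.5, 0.7, None, None])
```

The resulting vector would then be concatenated with the characteristic information result from the LSTM before entering the fully connected network.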

And step S30, repairing the missing variable based on the data prediction result to obtain a variable repairing result.

In this embodiment, the missing variable is repaired based on the data prediction result to obtain a variable repair result. Specifically, the missing data of the missing variable is filled with the data prediction result to obtain the variable repair result, which reduces the model deviation caused by missing data and allows the risk model to make better risk predictions.
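Step S30 amounts to writing the predicted values back into the empty positions. A minimal sketch, assuming missing entries are marked with `None` and that the model returns one prediction per missing entry in chronological order (both are assumptions of this illustration, and `repair_missing` is a hypothetical name):

```python
from typing import List, Optional, Sequence


def repair_missing(
    history: List[Optional[float]],
    predictions: Sequence[float],
) -> List[Optional[float]]:
    """Fill each missing entry (None), in order, with the next predicted value."""
    missing_positions = [i for i, v in enumerate(history) if v is None]
    if len(missing_positions) != len(predictions):
        raise ValueError("need exactly one prediction per missing entry")
    repaired = list(history)
    for position, predicted in zip(missing_positions, predictions):
        repaired[position] = predicted
    return repaired


# Hypothetical prediction result for the two missing days.
repaired = repair_missing([0.4, 0.5, 0.7, None, None], [0.72, 0.75])
```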

In the prior art, missing data is filled and repaired using the mean of historical data, a default value, or the latest non-missing value. By contrast, the variable anomaly repairing method of this embodiment first acquires a data set to be predicted, and then predicts the missing variable in the data set to be predicted through a missing data prediction model to obtain a data prediction result. The missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set, so that, based on these two data sets, the model learns the weight difference between the missing variable in the current missing data set and each complete variable in the complete data set. The missing variable is then repaired based on the data prediction result to obtain a variable repair result. Repairing the missing variable in this way reduces the model deviation caused by missing data, and the trained missing data prediction model predicts the missing data more accurately. This overcomes the technical defect in the prior art whereby filling with the mean of historical data or a default value leaves a large noise error when the missing data is highly important, and filling with the latest non-missing value ignores changes caused by recent data updates, both of which lower the accuracy of model prediction. The accuracy of model prediction is thereby improved.

Further, referring to fig. 2, based on the first embodiment in the present application, in another embodiment in the present application, before the missing data prediction model is obtained by performing iterative training optimization on a missing data prediction model to be trained through a missing data set and a complete data set collected in advance, the variable anomaly repairing method includes:

step A10, acquiring a historical data set of a sample client;

in this embodiment, it should be noted that the historical data set is a data set corresponding to all characteristic variables of the sample client.

And acquiring a historical data set of the sample client, specifically, selecting the sample client, and acquiring the historical data set corresponding to the sample client.

Step A20, performing missing processing on the historical data set of the sample client to obtain a missing data set in the historical data set and a complete data set in the historical data set;

in this embodiment, it should be noted that the missing processing is a processing mode in which the historical data set is set according to a preset missing degree, the preset missing degree is the number of missing data corresponding to missing variables in the historical data set, the missing data set is a data set corresponding to missing variables having missing data in the historical data set corresponding to the sample client, and the complete data set is a data set corresponding to each complete variable having no missing data in the historical data set corresponding to the sample client.

Specifically, if the historical data set meets a preset value condition, a time point set corresponding to the historical data set is acquired, wherein the preset value condition is that every characteristic variable in the historical data set has a corresponding value within the preset observation duration, and the time point set is the set of time points corresponding to the historical data set. A time point random number is randomly generated within the time point set to obtain a target time point; it should be noted that a plurality of time point random numbers may also be randomly generated within the time point set to obtain a plurality of target time points. Further, a missing degree random number is randomly generated within the preset observation duration to obtain the missing degree corresponding to the missing variable in the historical data set, wherein the preset observation duration is a preset length of time closest to the target time point. With the target time point as a reference, the values of each complete variable in the historical data set are set as the complete data set, and the values of the missing variable in the historical data set are set according to the missing degree to obtain the missing data set. For example, assume that the preset observation duration is 5 days, the target time point is June 6, 2021, and the missing degree is 2. Taking the target time point as a reference, the values of the missing variable from June 4, 2021 to June 5, 2021 are set to null, and the values of the missing variable from June 1, 2021 to June 3, 2021 are set to the true values of the missing variable in the historical data set for June 1, 2021 to June 3, 2021. The values of the missing variable from June 1, 2021 to June 5, 2021 together form the missing data set, and the values of each complete variable in the historical data set from June 1, 2021 to June 5, 2021 form the complete data set.
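The worked example above (observation duration of 5 days, target time point June 6, 2021, missing degree 2) can be reproduced with a short sketch. The function name `observation_window`, the use of `None` for null, and the sample values are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def observation_window(
    series: Dict[date, float],
    target: date,
    observation_days: int,
    missing_degree: int,
) -> Dict[date, Optional[float]]:
    """Return the window before `target`, nulling the most recent `missing_degree` days."""
    window = [target - timedelta(days=offset) for offset in range(observation_days, 0, -1)]
    first_missing = target - timedelta(days=missing_degree)
    return {day: (None if day >= first_missing else series[day]) for day in window}


# Hypothetical true values of one variable for June 1 to June 6, 2021.
history = {date(2021, 6, d): float(d) for d in range(1, 7)}
target = date(2021, 6, 6)

# Missing variable: June 4 and June 5 are set to null, June 1 to June 3 keep true values.
missing_data_set = observation_window(history, target, observation_days=5, missing_degree=2)

# Complete variable: the same window with nothing nulled (missing degree 0).
complete_data_set = observation_window(history, target, observation_days=5, missing_degree=0)
```

Using a missing degree of 0 for the complete variables keeps one code path for both data sets; that is a convenience of this sketch, not something the application prescribes.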

And A30, performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model.

In this embodiment, it should be noted that the missing data prediction model to be trained includes a long-term and short-term memory network model and a fully-connected neural network model, and the missing data prediction model is a model for predicting missing data.

The missing data prediction model to be trained is iteratively trained and optimized on the missing data set and the complete data set to obtain the missing data prediction model. Specifically, the complete data set is input into the long short-term memory (LSTM) network model to obtain a characteristic information result, and the fully connected neural network model is then iteratively trained and optimized according to the missing data set and the characteristic information result. It is judged whether the optimized fully connected neural network model meets a preset training end condition, wherein the preset training end condition includes convergence of the loss function, reaching a maximum iteration threshold, and the like. If so, the missing data prediction model is obtained; if not, execution returns to the step of iteratively training and optimizing the missing data prediction model to be trained on the missing data set and the complete data set.

Wherein the missing data prediction model to be trained comprises a long short-term memory (LSTM) network model and a fully connected neural network model,

the step of iteratively training the missing data prediction model to be trained by using the missing data set and the complete data set to obtain the missing data prediction model comprises the following steps:

step A31, inputting the complete data set into the long-short term memory network model, and outputting a characteristic information result;

in this embodiment, the complete data set is input into the long short-term memory (LSTM) network model and a characteristic information result is output. Specifically, the LSTM network model carries information across the time series data while avoiding the vanishing gradient problem during training. For example, if the preset observation duration is s and the complete data set contains data for n variables, the data vector corresponding to the complete data set is an s × n-dimensional vector, and the LSTM network model converts this s × n-dimensional vector into a one-dimensional vector, namely the characteristic information result, which also avoids the exploding gradient problem during training.
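To make the shape change concrete, the following sketch runs an s × n sequence through a single LSTM layer and keeps the final hidden state as the one-dimensional characteristic information result. It is written in plain Python with untrained random weights purely to show the data flow; the class name `TinyLSTM`, the hidden size of 4, and the choice of the final hidden state as the output are assumptions, since the application does not specify the network configuration.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """A single LSTM layer with untrained random weights, for shape illustration only."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _gate(self, gate: str, joined: List[float]) -> List[float]:
        return [
            sum(w * x for w, x in zip(row, joined)) + b
            for row, b in zip(self.weights[gate], self.biases[gate])
        ]

    def encode(self, sequence: List[List[float]]) -> List[float]:
        """Run s time steps of n values each and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for step in sequence:
            joined = list(step) + h
            i = [sigmoid(v) for v in self._gate("i", joined)]
            f = [sigmoid(v) for v in self._gate("f", joined)]
            o = [sigmoid(v) for v in self._gate("o", joined)]
            g = [math.tanh(v) for v in self._gate("g", joined)]
            # Cell state mixes the old memory (forget gate) with new candidates (input gate).
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


s, n = 5, 3  # observation duration of 5 time steps, 3 complete variables
complete_data = [[0.1 * (t + k) for k in range(n)] for t in range(s)]
features = TinyLSTM(input_size=n, hidden_size=4).encode(complete_data)
```

In practice a deep learning framework's LSTM layer would replace this hand-written cell; the point here is only that 15 input values become one fixed-length vector.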

Step A32, performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model.

In this embodiment, the fully-connected neural network model is iteratively trained and optimized according to the missing data set and the characteristic information result to obtain the missing data prediction model. Specifically, based on the missing data set and the characteristic information result output by the long-short term memory network model, the fully-connected neural network model is iteratively trained with a gradient descent algorithm according to a preset loss function, so that the parameters of the fully-connected neural network model are adjusted and the model is optimized. It is then judged whether the optimized fully-connected neural network model meets a preset training end condition, wherein the preset training end condition includes convergence of the loss function, reaching a maximum iteration threshold, and the like. If so, the missing data prediction model is obtained; if not, execution returns to the step of iteratively training and optimizing the fully-connected neural network model according to the missing data set and the characteristic information result.
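The training loop with its two end conditions can be sketched as follows. To stay short, a single linear layer stands in for the fully connected neural network, and mean squared error stands in for the preset loss function; the learning rate, tolerance, iteration cap, and toy data are likewise assumptions of this illustration rather than values from the application.

```python
from typing import List, Tuple


def train_linear_layer(
    inputs: List[List[float]],
    targets: List[float],
    learning_rate: float = 0.1,
    max_iterations: int = 5000,
    tolerance: float = 1e-10,
) -> Tuple[List[float], float, float, int]:
    """Fit y = w . x + b by batch gradient descent on mean squared error."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    previous_loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        errors = [
            sum(w * x for w, x in zip(weights, row)) + bias - y
            for row, y in zip(inputs, targets)
        ]
        loss = sum(e * e for e in errors) / len(errors)
        # End condition 1: the loss function has converged.
        if abs(previous_loss - loss) < tolerance:
            break
        previous_loss = loss
        # Gradient of the mean squared error with respect to each parameter.
        for j in range(len(weights)):
            grad = 2.0 * sum(e * row[j] for e, row in zip(errors, inputs)) / len(errors)
            weights[j] -= learning_rate * grad
        bias -= learning_rate * 2.0 * sum(errors) / len(errors)
    # End condition 2 is the loop bound: training stops at the maximum iteration threshold.
    return weights, bias, loss, iteration


# Toy data generated from y = 2*x1 + 1*x2 + 0.5; each row stands for the
# concatenated LSTM features and missing-variable feature vector.
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.5, 2.5, 1.5, 3.5]
weights, bias, final_loss, iterations_used = train_linear_layer(inputs, targets)
```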

The embodiment of the application provides a variable anomaly repairing method in which a historical data set of a sample client is first acquired, and missing processing is then performed on it to obtain a missing data set and a complete data set from the historical data set. Performing missing processing on the historical data set yields training sample data with different missing degrees, namely the complete data set and the missing data set, so that the training sample data covers as many different missing degrees as possible, which reduces the model deviation caused by missing data. The missing data prediction model to be trained is then iteratively trained and optimized on the missing data set and the complete data set to obtain the missing data prediction model. Because it is trained on sample data with different missing degrees, the resulting missing data prediction model predicts missing data more accurately. This lays a foundation for overcoming the technical defect in the prior art whereby filling with the mean of historical data or a default value leaves a large noise error when the missing data is highly important, and filling with the latest non-missing value ignores changes caused by recent data updates, both of which lower the accuracy of model prediction.

Further, referring to fig. 3, in another embodiment of the present application, based on the first embodiment of the present application, the step of performing missing processing on the historical data set of the sample client to obtain a missing data set and a complete data set in the historical data set includes:

step B10, acquiring a time point set corresponding to the historical data set;

in this embodiment, a time point set corresponding to the historical data set is obtained, specifically, if each feature variable of the historical data set has corresponding data, that is, has no missing data, the time point set corresponding to the historical data set is obtained.

Step B20, selecting a time point random number from the time point set to obtain a target time point;

in this embodiment, a time point random number is selected from the time point set to obtain a target time point. Specifically, a preset number of time point random numbers are randomly generated within the time point set to obtain the target time points corresponding to those random numbers. For example, if the time point set spans 1/2020 to 6/30/2021, one or more time point random numbers are randomly generated using the new Random() function or the Math.random() function, and if a generated time point random number is 6/2020, the target time point is set to 6/2020.
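A possible way to draw the target time points is sketched below. The text names Java-style random functions; this sketch swaps in Python's standard `random` module. The date range is read here as January 1, 2020 to June 30, 2021, which is an assumption because the start date in the example is only partially given, and the function name, seed, and sampling without replacement are illustrative choices.

```python
import random
from datetime import date, timedelta
from typing import List


def pick_target_time_points(start: date, end: date, count: int, seed: int = 0) -> List[date]:
    """Draw `count` distinct target time points uniformly from [start, end]."""
    rng = random.Random(seed)
    total_days = (end - start).days + 1
    offsets = rng.sample(range(total_days), count)
    return [start + timedelta(days=offset) for offset in sorted(offsets)]


targets = pick_target_time_points(date(2020, 1, 1), date(2021, 6, 30), count=3)
```

Fixing the seed makes the draw reproducible across runs, which can help when comparing trained models.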

Step B30, selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set;

in this embodiment, based on the target time point, a value corresponding to the missing variable in the historical data set is selected to obtain the missing data set, specifically, a random number of a missing degree is selected in a preset observation time period to obtain the missing degree corresponding to the missing variable in the historical data set, and then the value corresponding to the missing variable in the historical data set is set according to the missing degree by using the target time point as a reference to obtain the missing data set.

Selecting a value corresponding to a missing variable in the historical data set based on the target time point, wherein the step of obtaining the missing data set comprises:

step B31, selecting a random number of the missing degree in a preset observation time length to obtain the missing degree corresponding to the missing variable in the historical data set;

in this embodiment, it should be noted that the missing degree is the number of missing data corresponding to a missing variable in the historical data set, and the preset observation duration is a preset length of time closest to the target time point, for example, 5 days or 10 days.

Selecting a random number of the missing degree from a preset observation time length to obtain the missing degree corresponding to the missing variable in the historical data set, specifically, randomly generating a random number in the preset observation time length, and determining the missing degree corresponding to the missing variable in the historical data set based on the random number.

And step B32, setting the value of the corresponding missing variable in the historical data set according to the missing degree based on the target time point, and obtaining the missing data set.

In this embodiment, based on the target time point, the values of the corresponding missing variable in the historical data set are set according to the missing degree to obtain the missing data set. Specifically, taking the target time point as a reference, the number of missing data in the missing variable is set according to the missing degree, and the missing data set is thereby obtained. For example, assume that the preset observation duration is 5 days, the target time point is June 6, and the missing degree is 2, that is, the number of missing data is 2. The data from June 4 to June 5 is then set to null, the values from June 1 to June 3 are set to the real data of the missing variable in the historical data set for June 1 to June 3, and the values of the missing variable from June 1 to June 5 together form the missing data set.
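Steps B31 and B32 can be combined into one small routine that draws a missing degree and nulls that many of the most recent values. In this sketch the missing degree is drawn from 1 to one less than the window length so that at least one true value remains; that range, the function name, and the use of `None` for null are assumptions, as the application only states that the random number is generated within the preset observation duration.

```python
import random
from typing import List, Optional, Tuple


def mask_recent_values(
    window: List[float],
    rng: random.Random,
) -> Tuple[int, List[Optional[float]]]:
    """Draw a missing degree and set that many most recent values to None."""
    # Assumption: keep at least one true value, so the degree is in [1, len(window) - 1].
    missing_degree = rng.randint(1, len(window) - 1)
    kept = len(window) - missing_degree
    return missing_degree, window[:kept] + [None] * missing_degree


rng = random.Random(42)
window = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values for the 5 observed days
# Repeating the draw yields training samples with different missing degrees.
samples = [mask_recent_values(window, rng) for _ in range(4)]
```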

And step B40, setting the value of each complete variable in the historical data set as the complete data set based on the target time point.

In this embodiment, based on the target time point, the values of each complete variable in the historical data set are set as the complete data set. Specifically, the values of each complete variable in the historical data set are taken with the target time point as a reference. For example, if the preset observation duration is 5 days and the target time point is June 6, the complete data set is the data corresponding to each complete variable in the historical data set from June 1 to June 5.

The embodiment of the application provides a variable anomaly repairing method. A time point set corresponding to the historical data set is obtained, and a preset number of time point random numbers are selected from the time point set to obtain target time points, which avoids interference specific to any single time point. The values of the missing variable in the historical data set are then selected based on the target time point to obtain the missing data set, and the values of each complete variable in the historical data set are set as the complete data set based on the target time point. Because the target time point and the missing degree are determined by drawing time point random numbers and missing-degree random numbers, the complete data set and missing data sets with different missing degrees are extracted at random from the historical data set. The model can therefore learn, under different missing degrees, the difference in weights between the current missing variable and each complete variable, which reduces the model deviation caused by missing data. This lays a foundation for overcoming the technical defects of the prior art: filling with the mean of historical data or with a default value still leaves a large noise error when the missing data is important, and filling with the last non-missing value ignores changes caused by recent data updates, so that the accuracy of model prediction is low.
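The random sampling described above might look roughly like the following sketch. It assumes the history is stored as plain lists indexed by time point, that the missing degree is drawn uniformly from 1 to the observation length, and that missing values sit at the end of the window; the function name `sample_windows` and the dictionary keys are hypothetical.

```python
import random

def sample_windows(missing_series, complete_series, observation_len, n_samples, seed=0):
    """Draw random target time points and missing degrees, then cut the
    matching missing-variable window and complete-variable windows."""
    rng = random.Random(seed)
    # Only targets that have a full observation window before them qualify.
    eligible = range(observation_len, len(missing_series))
    samples = []
    for target in rng.sample(eligible, n_samples):
        degree = rng.randint(1, observation_len)  # assumed range of the missing degree
        start = target - observation_len
        keep = observation_len - degree
        # Real values first, then `degree` nulls next to the target time point.
        masked = missing_series[start:start + keep] + [None] * degree
        complete = {name: values[start:target] for name, values in complete_series.items()}
        samples.append({"target": target, "degree": degree,
                        "missing": masked, "complete": complete})
    return samples

# Hypothetical data: one missing variable and one complete variable, 20 time points.
samples = sample_windows(list(range(20)), {"x": list(range(100, 120))}, 5, 4, seed=1)
```

Each sample pairs a masked window of the missing variable with unmasked windows of the complete variables over the same time points, so one historical data set yields training examples at several missing degrees.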

Further, referring to fig. 4, based on the first embodiment of the present application, in another embodiment of the present application, before the step of performing iterative training optimization on the fully-connected neural network model according to the missing data set and the feature information result to obtain the missing data prediction model, the variable anomaly repairing method further includes:

step C10, obtaining the missing number corresponding to the missing data in the missing data set;

In this embodiment, the missing number corresponding to the missing data in the missing data set is obtained. Specifically, the missing degree corresponding to the missing variable in the historical data set is obtained by selecting a missing-degree random number within the preset observation duration, where the missing degree is the missing number, that is, the number of missing values in the missing data set.

Step C20, extracting the missing data set into a multi-dimensional feature vector;

and step C30, setting the multidimensional characteristic vector according to a preset numerical rule based on the missing number to obtain the missing data dimensional characteristic vector.

In this embodiment, it should be noted that when the missing data set is input into the fully-connected neural network model and contains empty data, the model cannot detect the empty data and therefore cannot learn the missing degree corresponding to the missing data set. The preset numerical rule is a rule by which, according to the missing number, the positions of the multi-dimensional feature vector that correspond to missing data are set to a preset value, such as -1 or 0.

The multi-dimensional feature vector is set according to the preset numerical rule based on the missing number to obtain the missing data dimension feature vector. Specifically, the first dimension of the multi-dimensional feature vector is set to the missing degree of the missing data set. The dimensions that correspond to missing data are set to the preset value, so that the model can detect the missing data, and the remaining dimensions are set to the real values in the missing data set. For example, suppose the preset observation duration is 5, the missing degree is 2, and the missing data set is (NULL, NULL, 2, 3, 4), where NULL indicates that the data is empty. The first dimension of the corresponding multi-dimensional feature vector is the missing degree, with value 2; the second and third dimensions are both set to -1; and the other 3 dimensions are 2, 3, and 4 respectively. The resulting missing data dimension feature vector is (2, -1, -1, 2, 3, 4), so that the missing data prediction model to be trained can learn the weight difference between the current missing variable and each complete variable under different missing degrees.
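The encoding rule can be illustrated with a short sketch. The function name `encode_missing_window` and the use of Python `None` for NULL are assumptions made for illustration; the preset value defaults to -1, one of the values named in the text.

```python
def encode_missing_window(window, fill_value=-1):
    """Prefix the window with its missing count and replace each missing
    entry (None) with the preset fill value."""
    degree = sum(value is None for value in window)
    return [degree] + [fill_value if value is None else value for value in window]

# A window of five observations whose first two entries are missing.
vector = encode_missing_window([None, None, 2, 3, 4])
```

Here `vector` is `[2, -1, -1, 2, 3, 4]`: the leading 2 is the missing degree, and the two missing positions carry the preset value, so the network receives a numeric signal for both how much data is missing and where.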

The embodiment of the application provides a variable anomaly repairing method. The missing number corresponding to the missing data in the missing data set is obtained, the missing data set is extracted into a multi-dimensional feature vector, and the multi-dimensional feature vector is set according to the preset numerical rule based on the missing number to obtain the missing data dimension feature vector. Setting the missing data to a preset value ensures that the model can detect the missing data, and the model learns the weight difference between the current missing variable and each complete variable under different missing degrees. This lays a foundation for overcoming the technical defects of the prior art: filling with the mean of historical data or with a default value still leaves a large noise error when the missing data is important, and filling with the last non-missing value ignores changes caused by recent data updates, so that the accuracy of model prediction is low.

Further, referring to fig. 5, fig. 5 is a schematic flowchart of obtaining the missing data prediction model by iterative training optimization in the variable anomaly repairing method of the present application, where original data set B is the complete data set, original data set A is the missing data set, the LSTM layer is the long short-term memory network model, and the DNN part is the fully-connected neural network model.
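The data flow of fig. 5 can be sketched as plain function composition. This is a wiring illustration only: `lstm_encoder` and `dense_head` are hypothetical callables standing in for the trained LSTM layer and DNN part, the two stub functions below are not neural networks, and joining the two inputs by concatenation is an assumption.

```python
from typing import Callable, List, Sequence

def predict_missing_value(
    complete_history: Sequence[Sequence[float]],
    missing_features: Sequence[float],
    lstm_encoder: Callable[[Sequence[Sequence[float]]], List[float]],
    dense_head: Callable[[List[float]], float],
) -> float:
    """Pass data set B through the sequence encoder, join its output with the
    encoded data set A, and let the dense part produce the prediction."""
    feature_result = lstm_encoder(complete_history)  # data set B -> LSTM layer
    joined = list(feature_result) + list(missing_features)  # assumed concatenation
    return dense_head(joined)  # joined input -> DNN part

# Stand-in stubs so the sketch runs; a real system would use trained models.
def stub_encoder(history):
    return [sum(series) / len(series) for series in history]  # per-variable mean

def stub_head(inputs):
    return float(sum(inputs))

prediction = predict_missing_value(
    [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]],  # two complete variables
    [1, -1, 5.0, 6.0],                   # encoded missing-variable window
    stub_encoder,
    stub_head,
)
```

Keeping the two parts as separate callables mirrors the figure, where the complete variables and the missing variable enter the model through different branches.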

Referring to fig. 6, fig. 6 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.

As shown in fig. 6, the variable abnormality repairing apparatus may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.

Optionally, the variable anomaly repairing device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).

Those skilled in the art will appreciate that the variable anomaly repair device structure shown in fig. 6 does not constitute a limitation of the variable anomaly repair device, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.

As shown in fig. 6, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, and a variable exception repair program. The operating system is a program for managing and controlling hardware and software resources of the variable exception recovery device and supports the running of the variable exception recovery program and other software and/or programs. The network communication module is used for realizing communication among the components in the memory 1005 and communication with other hardware and software in the variable exception recovery system.

In the variable exception recovery apparatus shown in fig. 6, the processor 1001 is configured to execute a variable exception recovery program stored in the memory 1005, and implement the steps of the variable exception recovery method described in any one of the above.

The specific implementation of the variable anomaly repairing device of the present application is basically the same as that of each embodiment of the variable anomaly repairing method, and is not described herein again.

The present application also provides a variable abnormality repairing apparatus, including:

the acquisition module is used for acquiring a data set to be predicted;

Optionally, the prediction module is further configured to:

inputting the historical values of all the complete variables in the data set to be predicted into the long-short term memory network model, and outputting the characteristic information results corresponding to all the complete variables in the data set to be predicted;

and inputting the characteristic information result corresponding to each complete variable in the data set to be predicted and the historical value of the missing variable in the data set to be predicted into the fully-connected neural network model, and outputting the data prediction result of the missing variable.

Optionally, the variable anomaly repairing apparatus is further configured to:

acquiring a historical data set of a sample client;

performing missing processing on the historical data set of the sample client to obtain a missing data set in the historical data set and a complete data set in the historical data set;

and performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model.

Optionally, the variable anomaly repairing apparatus is further configured to:

inputting the complete data set into the long-short term memory network model, and outputting a characteristic information result;

and performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model.

Optionally, the variable anomaly repairing apparatus is further configured to:

acquiring a time point set corresponding to the historical data set;

selecting a preset number of time point random numbers from the time point set to obtain target time points;

selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set;

and setting the value of each complete variable in the historical data set as the complete data set based on the target time point.

Optionally, the variable anomaly repairing apparatus is further configured to:

selecting a random number of the missing degree in a preset observation time length to obtain the missing degree corresponding to the missing variable in the historical data set;

and setting the value of the corresponding missing variable in the historical data set according to the missing degree based on the target time point to obtain the missing data set.

Optionally, the variable anomaly repairing apparatus is further configured to:

acquiring the number of missing data corresponding to the missing data in the missing data set;

extracting the missing data set into a multi-dimensional feature vector;

and setting the multidimensional feature vector according to a preset numerical rule based on the missing number to obtain the missing data dimension feature vector.

The specific implementation of the variable anomaly repairing apparatus of the present application is substantially the same as the embodiments of the variable anomaly repairing method, and is not described herein again.

The present application provides a medium, which is a readable storage medium, and the readable storage medium stores one or more programs, and the one or more programs are further executable by one or more processors for implementing the steps of the variable exception repairing method described in any one of the above.

The specific implementation of the readable storage medium of the present application is substantially the same as the embodiments of the variable anomaly repairing method, and is not described herein again.

The embodiment of the present application provides a computer program product, and the computer program product includes one or more computer programs, which can also be executed by one or more processors for implementing the steps of any one of the above variable exception repairing methods.

The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the variable anomaly repairing method, and is not described herein again.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims

1. A variable exception recovery method, comprising:

acquiring a data set to be predicted;

predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;

and repairing the missing variable based on the data prediction result to obtain a variable repairing result.

2. The variable anomaly repairing method according to claim 1, wherein before the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set collected in advance, the variable anomaly repairing method comprises:

acquiring a historical data set of a sample client;

3. The variable abnormality repairing method according to claim 2, wherein the missing data prediction model to be trained includes a long-short term memory network model and a fully-connected neural network model,

the step of performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model comprises the following steps:

4. The variable anomaly repair method according to claim 2, wherein the step of performing missing processing on the historical data set of the sample client to obtain a missing data set and a complete data set in the historical data set comprises:

acquiring a time point set corresponding to the historical data set;

5. The method according to claim 4, wherein the step of selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set comprises:

6. The variable anomaly repairing method according to claim 3, wherein before the step of performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model, the variable anomaly repairing method further comprises:

extracting the missing data set into a multi-dimensional feature vector;

7. The variable anomaly restoration method according to claim 1, wherein the missing data prediction model includes a long-short term memory network model and a fully-connected neural network model,

8. A variable abnormality repairing apparatus characterized by comprising: memory, a processor, and a variable exception repair program stored on the memory,

the variable exception recovery program being executed by the processor to implement the steps of the variable exception recovery method as claimed in any one of claims 1 to 7.

9. A medium which is a readable storage medium, characterized in that a variable exception recovery program is stored on the readable storage medium, the variable exception recovery program being executed by a processor to implement the steps of the variable exception recovery method according to any one of claims 1 to 7.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the variable exception repair method according to any one of claims 1 to 7.