CN113641525A - Variable exception recovery method, apparatus, medium, and computer program product - Google Patents


Info

Publication number
CN113641525A
Authority
CN
China
Prior art keywords
missing
data set
variable
missing data
complete
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110926804.4A
Other languages
Chinese (zh)
Inventor
要卓
陈婷
吴三平
庄伟亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN202110926804.4A
Publication of CN113641525A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application discloses a variable anomaly repair method, device, medium, and computer program product. The method comprises: acquiring a data set to be predicted; predicting a missing variable in the data set to be predicted through a missing data prediction model to obtain a data prediction result, where the missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set; and repairing the missing variable based on the data prediction result to obtain a variable repair result. The application addresses the technical problem of low model prediction accuracy.

Description

Variable exception recovery method, apparatus, medium, and computer program product
Technical Field
The present application relates to the field of machine learning in financial technology (Fintech), and in particular to a variable anomaly repair method, device, medium, and computer program product.
Background
With the continuous development of financial technology, and of internet technology in particular, more and more technologies (such as distributed computing and artificial intelligence) are being applied in the financial field. The financial industry in turn places higher demands on these technologies, for example on how backlog items are distributed.
With the development of computer technology, federated learning is being applied ever more widely. In risk-model applications, an abnormal data source behind one of the model's variables can leave that variable's value missing. When the missing variable cannot be restored quickly, its value can be filled in to weaken the model deviation caused by the missing data. Existing filling methods typically use the mean of the historical data, a default value, or the last non-missing value. Filling with the historical mean or a default value still leaves a large noise error when the missing data is important, while filling with the last non-missing value ignores changes caused by recent data updates. Both lead to less accurate model predictions.
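For comparison, the three prior-art filling strategies named above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the default value of 0.0 are hypothetical, and None stands for a missing entry in one variable's history.

```python
from statistics import mean
from typing import List, Optional

# Illustrative prior-art filling strategies for one variable's history.
# None marks a missing value; the default of 0.0 is a hypothetical choice.


def fill_with_mean(history: List[Optional[float]]) -> List[float]:
    """Replace every gap with the mean of the observed historical values."""
    observed = [v for v in history if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in history]


def fill_with_default(history: List[Optional[float]], default: float = 0.0) -> List[float]:
    """Replace every gap with a fixed default value."""
    return [default if v is None else v for v in history]


def fill_with_last_observed(history: List[Optional[float]], default: float = 0.0) -> List[float]:
    """Carry the most recent non-missing value forward (default before the first one)."""
    filled, last = [], default
    for v in history:
        if v is not None:
            last = v
        filled.append(last)
    return filled


history = [3.0, 5.0, None, None]
print(fill_with_mean(history))           # [3.0, 5.0, 4.0, 4.0]
print(fill_with_default(history))        # [3.0, 5.0, 0.0, 0.0]
print(fill_with_last_observed(history))  # [3.0, 5.0, 5.0, 5.0]
```

None of these strategies looks at the other variables, which is the information the method described below tries to exploit.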
Disclosure of Invention
The main purpose of the present application is to provide a variable anomaly repair method, device, medium, and computer program product that solve the technical problem of low model prediction accuracy in the prior art.
To achieve the above object, the present application provides a variable anomaly repair method, including:
acquiring a data set to be predicted;
predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a missing data set and a complete data set collected in advance;
and repairing the missing variables based on the data prediction result to obtain a variable repair result.
The present application also provides a variable anomaly repair apparatus, which is a virtual apparatus and includes:
the acquisition module is used for acquiring a data set to be predicted;
the prediction module is used for predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;
and the repairing module is used for repairing the missing variable based on the data prediction result to obtain a variable repairing result.
The present application further provides a variable anomaly repair device, which is a physical device and includes a memory, a processor, and a variable anomaly repair program stored in the memory; when executed by the processor, the program implements the steps of the variable anomaly repair method described above.
The present application further provides a medium, which is a readable storage medium storing a variable anomaly repair program; when executed by a processor, the program implements the steps of the variable anomaly repair method described above.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the variable anomaly repair method as described above or performs the steps of the data prediction method as described above.
Compared with the prior-art approach of filling missing data with the mean of the historical data, a default value, or the most recent non-missing value, the variable anomaly repair method of the present application first acquires a data set to be predicted and then predicts the missing variable in that data set through a missing data prediction model to obtain a data prediction result. Because the missing data prediction model is obtained by iteratively training and optimizing a model to be trained on a pre-collected missing data set and complete data set, it learns the weight differences between the missing variable in the missing data set and each complete variable in the complete data set. The missing variable is then repaired based on the data prediction result to obtain a variable repair result. Repairing the missing variable in this way reduces the model deviation caused by missing data and lets the missing data be predicted more accurately. This overcomes the prior-art defect that model predictions are less accurate because filling either leaves a large noise error when the missing data is important or ignores changes caused by recent data updates, and thereby improves the accuracy of model prediction.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings required for describing them are briefly introduced below. Obviously, those of ordinary skill in the art can derive other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flowchart of a variable anomaly repair method according to a first embodiment of the present application;
FIG. 2 is a schematic flowchart of a variable anomaly repair method according to a second embodiment of the present application;
FIG. 3 is a schematic flowchart of a variable anomaly repair method according to a third embodiment of the present application;
FIG. 4 is a schematic flowchart of a variable anomaly repair method according to a fourth embodiment of the present application;
FIG. 5 is a schematic flowchart of obtaining the missing data prediction model through iterative training and optimization in the variable anomaly repair method of the present application;
FIG. 6 is a schematic diagram of the device structure of the hardware operating environment involved in the variable anomaly repair method according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the variable anomaly repair method, referring to FIG. 1, the method includes:
Step S10, acquiring a data set to be predicted;
In this embodiment, the data set to be predicted includes the set of historical values of the missing variable, whose data is missing, and the sets of historical values of each complete variable, whose data is not missing.
Step S20, predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;
In this embodiment, prediction means predicting and restoring the values of the missing variable whose data is missing in the data set to be predicted. The historical data set is the data set of all variables of a sample client; the missing data set is the part of the historical data set that belongs to the missing variable with missing data, and the complete data set is the part that belongs to the complete variables with no missing data.
Specifically, the missing data prediction model to be trained, which comprises a long short-term memory network model and a fully-connected neural network model, is iteratively trained and optimized on the complete data set and the missing data set. After each round, it is checked whether the optimized model meets a preset training end condition, such as convergence of the loss function or reaching a maximum number of iterations. If it does, the missing data prediction model is obtained; if not, training returns to the step of iteratively training and optimizing the model to be trained on the complete data set and the missing data set.
The data set to be predicted is then input into the missing data prediction model; that is, the historical values of the missing variable and the historical values of the other, complete variables in the data set to be predicted are input into the model. The long short-term memory network model reduces the dimensionality of the historical values of the complete variables to produce their feature information results. These feature information results, together with the historical values of the missing variable, are input into the fully-connected neural network model, which outputs the data prediction result for the values of the missing variable in the data set to be predicted.
The missing data prediction model comprises a long short-term memory network model and a fully-connected neural network model, and the step of predicting the missing variables in the data set to be predicted through the missing data prediction model to obtain the data prediction result comprises the following steps:
Step S21, inputting the historical values of each complete variable in the data set to be predicted into the long short-term memory network model, and outputting the feature information results corresponding to each complete variable in the data set to be predicted;
In this embodiment, the historical values of a complete variable are its values within a preset time period.
Specifically, the historical values of the complete variables in the data set to be predicted are input into the long short-term memory network model, which reduces the dimensionality of this multidimensional time-series data and yields the feature information results corresponding to the complete variables.
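One common way to realize this dimensionality reduction, offered here as an assumption rather than as the application's own implementation, is to run the window of historical values (s time steps by n complete variables) through an LSTM and keep only the final hidden state as the feature vector. The pure-Python sketch below uses random, untrained weights; the class name LSTMEncoder, the hidden size of 4 and the sample values are hypothetical. In practice the weights would be learned, typically with a deep learning framework.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W x + b for a dense matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class LSTMEncoder:
    """Single-layer LSTM mapping an s x n sequence to an h-dimensional vector (hypothetical)."""

    GATES = ("input", "forget", "candidate", "output")

    def __init__(self, n_inputs: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden = hidden
        cols = n_inputs + hidden  # every gate sees [x_t, h_{t-1}]
        self.w: Dict[str, Matrix] = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden)]
            for g in self.GATES
        }
        self.b: Dict[str, Vector] = {g: [0.0] * hidden for g in self.GATES}

    def encode(self, sequence: List[Vector]) -> Vector:
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for x_t in sequence:  # one row per time step (values of the n complete variables)
            z = list(x_t) + h
            i = [sigmoid(v) for v in affine(self.w["input"], self.b["input"], z)]
            f = [sigmoid(v) for v in affine(self.w["forget"], self.b["forget"], z)]
            g = [math.tanh(v) for v in affine(self.w["candidate"], self.b["candidate"], z)]
            o = [sigmoid(v) for v in affine(self.w["output"], self.b["output"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # final hidden state used as the feature information result


# s = 5 days of n = 3 complete variables -> one 4-dimensional feature vector.
encoder = LSTMEncoder(n_inputs=3, hidden=4)
window = [[1.0, 0.5, -0.2], [1.1, 0.4, -0.1], [0.9, 0.6, 0.0], [1.2, 0.5, 0.1], [1.0, 0.7, 0.2]]
features = encoder.encode(window)
print(len(features))  # 4
```

Keeping only the final hidden state is what turns the two-dimensional window into a single vector whose length (the hidden size) does not depend on s.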
Step S22, inputting the feature information results corresponding to each complete variable in the data set to be predicted, together with the historical values of the missing variable in the data set to be predicted, into the fully-connected neural network model, and outputting the data prediction result;
In this embodiment, the historical values of the missing variable are its values within a preset time period.
Specifically, the historical values of the missing variable in the data set to be predicted are abstracted into a multidimensional feature vector. The first dimension of this vector is set to the number of the missing variable in the data set to be predicted, empty entries of the missing variable are set to a preset value, and the remaining dimensions hold the historical values of the missing variable. The feature information results of the complete variables and this multidimensional feature vector are input together into the fully-connected neural network model, which outputs the data prediction result for the values of the missing variable in the data set to be predicted, with the aim of bringing the prediction as close as possible to the true value.
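The input assembly described in step S22 could be sketched as follows. Several details are assumptions not fixed by the text: the "number of the missing variable" is treated as a numeric identifier (var_number), the preset value for empty entries is -1.0, the fully-connected network has two layers with a ReLU in between, and it emits one value per missing day. All names here are hypothetical and the weights are untrained.

```python
import random
from typing import List, Optional

PRESET_VALUE = -1.0  # hypothetical placeholder written into empty entries


def build_missing_vector(var_number: int, history: List[Optional[float]],
                         preset: float = PRESET_VALUE) -> List[float]:
    """[variable number, v_1, ..., v_s] with empty entries set to the preset value."""
    return [float(var_number)] + [preset if v is None else v for v in history]


class FullyConnected:
    """Two dense layers with a ReLU in between (randomly initialised, untrained)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x: List[float]) -> List[float]:
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * v for w, v in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]


def predict_missing(features: List[float], var_number: int,
                    history: List[Optional[float]], net: FullyConnected) -> List[float]:
    """Concatenate the complete-variable features with the missing-variable vector."""
    x = list(features) + build_missing_vector(var_number, history)
    return net.forward(x)


features = [0.12, -0.05, 0.30, 0.08]      # stand-in for the LSTM feature vector
history = [10.0, 11.0, 12.0, None, None]  # two missing days
net = FullyConnected(n_in=len(features) + 1 + len(history), n_hidden=8, n_out=2)
print(len(predict_missing(features, 7, history, net)))  # 2
```

Writing a fixed preset value into the gaps keeps the input length constant, so the same network can be applied whatever the missing degree is.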
And step S30, repairing the missing variable based on the data prediction result to obtain a variable repairing result.
In this embodiment, the missing data in the missing variable is filled with the data prediction result to obtain the variable repair result. This reduces the model deviation caused by missing data, so that the model performs risk prediction better.
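Assuming the model returns one predicted value per missing entry, in time order (an assumption; the text does not fix the output layout), the repair step reduces to writing those predictions into the gaps. The function name repair is hypothetical.

```python
from typing import List, Optional


def repair(history: List[Optional[float]], predictions: List[float]) -> List[float]:
    """Write the predictions, in time order, into the missing (None) positions."""
    gaps = [i for i, v in enumerate(history) if v is None]
    if len(gaps) != len(predictions):
        raise ValueError("expected exactly one prediction per missing entry")
    repaired = list(history)
    for i, p in zip(gaps, predictions):
        repaired[i] = p
    return repaired


print(repair([10.0, 11.0, 12.0, None, None], [12.6, 13.1]))
# [10.0, 11.0, 12.0, 12.6, 13.1]
```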
Compared with the prior-art approach of filling missing data with the mean of the historical data, a default value, or the most recent non-missing value, the variable anomaly repair method of this embodiment first acquires a data set to be predicted and then predicts the missing variable in it through a missing data prediction model to obtain a data prediction result. Because that model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set, it learns the weight differences between the missing variable in the missing data set and each complete variable in the complete data set. The missing variable is then repaired based on the data prediction result to obtain a variable repair result, which reduces the model deviation caused by missing data and allows the missing data to be predicted more accurately. This overcomes the prior-art defect that model predictions are less accurate because filling either leaves a large noise error when the missing data is important or ignores changes caused by recent data updates, and thus improves the accuracy of model prediction.
Further, referring to FIG. 2, in another embodiment of the present application based on the first embodiment, before the missing data prediction model is obtained by iteratively training and optimizing a missing data prediction model to be trained on a pre-collected missing data set and complete data set, the variable anomaly repair method includes:
Step A10, acquiring a historical data set of a sample client;
In this embodiment, the historical data set is the data set of all feature variables of the sample client.
Specifically, a sample client is selected and the historical data set corresponding to that client is acquired.
Step A20, performing missing processing on the historical data set of the sample client to obtain a missing data set in the historical data set and a complete data set in the historical data set;
In this embodiment, missing processing means setting values in the historical data set as missing according to a preset missing degree, where the missing degree is the number of missing entries of the missing variable in the historical data set. The missing data set is the part of the sample client's historical data set that belongs to the missing variable with missing data, and the complete data set is the part that belongs to each complete variable with no missing data.
Specifically, if the historical data set meets a preset value condition, namely that every feature variable in it has a value within the preset observation duration, the time point set of the historical data set (the set of all its time points) is acquired. A time point random number is generated within the time point set to obtain a target time point; several such random numbers may also be generated to obtain several target time points. A missing-degree random number is then generated within the preset observation duration to obtain the missing degree of the missing variable in the historical data set, where the preset observation duration is the preset period immediately preceding the target time point. Taking the target time point as the reference, the values of each complete variable in the historical data set are taken as the complete data set, and the values of the missing variable are set according to the missing degree to obtain the missing data set.
For example, suppose the preset observation duration is 5 days, the target time point is June 6, 2021, and the missing degree is 2. Taking the target time point as the reference, the values of the missing variable from June 4 to June 5, 2021 are set to null, and its values from June 1 to June 3, 2021 are kept as the true values recorded in the historical data set. The values of the missing variable from June 1 to June 5, 2021 together form the missing data set, and the values of the complete variables in the historical data set over the same period form the complete data set.
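The missing processing and the worked example above can be reproduced with a short sketch. Assumptions, all hypothetical: daily time points, a dictionary of per-variable series keyed by date, the variable names "x" and "y", and returning the masked true values as training labels (the text only says the prediction should approach the true value). The function raises KeyError if the preset value condition is not met, i.e. if any day in the window lacks a value.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

Series = Dict[date, float]


def make_training_pair(history: Dict[str, Series], missing_var: str, target: date,
                       observation_days: int, missing_degree: int):
    """Mask the last `missing_degree` days of `missing_var` before `target`.

    Returns the observation window, the missing data set (None = masked),
    the masked true values (usable as training labels) and the complete data set.
    """
    # Window of `observation_days` days ending the day before the target time point.
    window = [target - timedelta(days=k) for k in range(observation_days, 0, -1)]
    cutoff = observation_days - missing_degree  # index of the first masked day
    missing: List[Optional[float]] = [
        None if idx >= cutoff else history[missing_var][day]
        for idx, day in enumerate(window)
    ]
    labels = [history[missing_var][day] for day in window[cutoff:]]
    complete = {var: [series[day] for day in window]
                for var, series in history.items() if var != missing_var}
    return window, missing, labels, complete


# Worked example from the text: 5-day window, target June 6, 2021, missing degree 2.
days = [date(2021, 6, d) for d in range(1, 7)]
history = {
    "x": {d: float(i + 1) for i, d in enumerate(days)},         # variable to mask
    "y": {d: float(10 * (i + 1)) for i, d in enumerate(days)},  # a complete variable
}
window, missing, labels, complete = make_training_pair(history, "x", date(2021, 6, 6), 5, 2)
print(window[0], window[-1])  # 2021-06-01 2021-06-05
print(missing)                # [1.0, 2.0, 3.0, None, None]
print(labels)                 # [4.0, 5.0]
print(complete)               # {'y': [10.0, 20.0, 30.0, 40.0, 50.0]}
```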
Step A30, performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model.
In this embodiment, the missing data prediction model to be trained includes a long short-term memory network model and a fully-connected neural network model, and the missing data prediction model is a model for predicting missing data.
Specifically, the complete data set is input into the long short-term memory network model to obtain a feature information result, and the fully-connected neural network model is iteratively trained and optimized on the missing data set and the feature information result. It is then checked whether the optimized fully-connected neural network model meets a preset training end condition, such as convergence of the loss function or reaching a maximum iteration threshold. If it does, the missing data prediction model is obtained; if not, training returns to the step of iteratively training and optimizing the missing data prediction model to be trained on the missing data set and the complete data set.
The missing data prediction model to be trained comprises a long short-term memory network model and a fully-connected neural network model, and the step of iteratively training the missing data prediction model to be trained on the missing data set and the complete data set to obtain the missing data prediction model comprises the following steps:
Step A31, inputting the complete data set into the long short-term memory network model, and outputting a feature information result;
In this embodiment, the long short-term memory network model propagates information across the time-series data, which avoids the vanishing-gradient problem during training, and outputs the feature information result. For example, if the preset observation duration is s and the complete data set contains the data of n variables, the complete data set corresponds to an s × n matrix, which the long short-term memory network model converts into a one-dimensional vector, namely the feature information result, thereby avoiding gradient explosion during training.
Step A32, performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model.
In this embodiment, based on the missing data set and the feature information result output by the long short-term memory network model, the fully-connected neural network model is iteratively trained with a gradient descent algorithm under a preset loss function, adjusting its parameters to optimize it. It is then checked whether the optimized fully-connected neural network model meets a preset training end condition, such as convergence of the loss function or reaching a maximum iteration threshold. If it does, the missing data prediction model is obtained; if not, training returns to the step of iteratively training and optimizing the fully-connected neural network model on the missing data set and the feature information result.
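The training loop with its two end conditions can be sketched as plain gradient descent on mean squared error. To keep the gradients short, a single linear layer stands in for the fully-connected network, and the loss, learning rate, tolerance and function name are all hypothetical choices. In the described method, each input x would be the feature information result concatenated with the missing-variable vector, and y the masked true value.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]


def train_linear_layer(samples: List[Sample], lr: float = 0.05,
                       max_iters: int = 10_000, tol: float = 1e-9):
    """Fit y ~ w.x + b by full-batch gradient descent on mean squared error.

    Training stops when the per-iteration loss improvement drops below `tol`
    (treated as convergence) or when `max_iters` is reached.
    """
    n, m = len(samples[0][0]), len(samples)
    w, b = [0.0] * n, 0.0
    prev_loss = float("inf")
    for iteration in range(1, max_iters + 1):
        errors = [sum(wj * xj for wj, xj in zip(w, x)) + b - y for x, y in samples]
        loss = sum(e * e for e in errors) / m
        if prev_loss - loss < tol:  # end condition: loss has converged
            break
        prev_loss = loss
        # Gradients of the mean squared error with respect to w and b.
        grad_w = [2.0 / m * sum(e * x[j] for e, (x, _) in zip(errors, samples))
                  for j in range(n)]
        grad_b = 2.0 / m * sum(errors)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, loss, iteration


# Toy data generated from y = 2x + 1.
data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
w, b, loss, iters = train_linear_layer(data)
print(round(w[0], 2), round(b, 2))  # 2.0 1.0
```

Checking the loss improvement before each update plays the role of the "loss function convergence" condition, and the loop bound plays the role of the maximum iteration threshold.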
This embodiment of the application provides a variable anomaly repair method: a historical data set of a sample client is acquired and subjected to missing processing, yielding a missing data set and a complete data set within the historical data set. Because the missing processing produces training samples (the complete data set and the missing data set) with different missing degrees, the training samples cover as many missing-degree situations as possible, which reduces the model deviation caused by missing data. The missing data prediction model to be trained is then iteratively trained and optimized on the missing data set and the complete data set to obtain the missing data prediction model. Trained on samples with different missing degrees, this model predicts missing data more accurately. This lays the foundation for overcoming the prior-art defect that model predictions are less accurate because filling with the historical mean or a default value leaves a large noise error when the missing data is important, while filling with the latest non-missing value ignores changes caused by recent data updates.
Further, referring to FIG. 3, in another embodiment of the present application based on the first embodiment, the step of performing missing processing on the historical data set of the sample client to obtain a missing data set and a complete data set in the historical data set includes:
Step B10, acquiring a time point set corresponding to the historical data set;
In this embodiment, if every feature variable in the historical data set has corresponding data, that is, no data is missing, the time point set corresponding to the historical data set is acquired.
Step B20, selecting a time point random number from the time point set to obtain a target time point;
In this embodiment, a preset number of time point random numbers are generated within the time point set, and the time points they correspond to are taken as target time points. For example, if the time point set runs from 1/2020 to 6/30/2021, one or more time point random numbers can be generated with a function such as new Random() or Math.random(); if the time point random number is 6/2020, the target time point is set to 6/2020.
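In Python, an instance of random.Random can play the role of the new Random() call mentioned above. The sketch assumes daily time points and draws distinct target time points without replacement; the function name, the seed and the example date range are hypothetical and do not reproduce the range in the text.

```python
import random
from datetime import date, timedelta
from typing import List, Optional


def pick_target_time_points(start: date, end: date, k: int = 1,
                            seed: Optional[int] = None) -> List[date]:
    """Draw k distinct target time points uniformly from the days in [start, end]."""
    rng = random.Random(seed)
    span = (end - start).days + 1         # number of time points in the set
    offsets = rng.sample(range(span), k)  # distinct random offsets
    return sorted(start + timedelta(days=o) for o in offsets)


points = pick_target_time_points(date(2021, 1, 1), date(2021, 6, 30), k=3, seed=42)
print(points)
```

Sampling without replacement is one design choice; drawing with replacement would allow the same target time point to be used more than once.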
Step B30, selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set;
In this embodiment, values corresponding to the missing variable in the historical data set are selected based on the target time point to obtain the missing data set. Specifically, a random missing degree is drawn within a preset observation time length to obtain the missing degree of the missing variable in the historical data set; then, taking the target time point as the reference, the values of the missing variable in the historical data set are set according to the missing degree to obtain the missing data set.
The step of selecting values corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set comprises:
step B31, selecting a random number of the missing degree in a preset observation time length to obtain the missing degree corresponding to the missing variable in the historical data set;
In this embodiment, it should be noted that the missing degree is the number of missing data points of a missing variable in the historical data set, and the preset observation time length is a preset length of time immediately preceding the target time point, for example, 5 days or 10 days.
A random missing degree is selected within the preset observation time length to obtain the missing degree of the missing variable in the historical data set. Specifically, a random number is generated within the range of the preset observation time length, and the missing degree of the missing variable in the historical data set is determined from that random number.
Step B32, setting the values of the corresponding missing variable in the historical data set according to the missing degree based on the target time point to obtain the missing data set.
In this embodiment, taking the target time point as the reference, the number of missing data points in the missing variable is set according to the missing degree to obtain the missing data set. For example, assume that the preset observation time length is 5 days, the target time point is June 6, and the missing degree is 2, that is, 2 data points are missing. The data from June 4 to June 5 are set to null, the values from June 1 to June 3 are kept as the real data of the missing variable on those dates in the historical data set, and the values of the missing variable from June 1 to June 5 are taken as the missing data set.
Step B40, setting the value of each complete variable in the historical data set as the complete data set based on the target time point.
In this embodiment, taking the target time point as the reference, the values of each complete variable in the historical data set are set as the complete data set. For example, if the preset observation time length is 5 days and the target time point is June 6, the complete data set consists of the data of each complete variable in the historical data set from June 1 to June 5.
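Steps B31 and B32 can be sketched together as follows. This is a minimal illustration, assuming the missing variable is stored as a mapping from date to value and that the missing degree is drawn uniformly from 1 to the window length (the text does not fix the distribution). The helper names and the layout of `series` are hypothetical, and the values in the usage example are synthetic (each day's value equals its day number).

```python
import random
from datetime import date, timedelta


def sample_missing_degree(window: int, rng: random.Random) -> int:
    # Assumption: missing degree drawn uniformly from 1..window.
    return rng.randint(1, window)


def build_missing_window(series, target, window, degree):
    """Return the dates and values of the `window` days before `target`,
    with the most recent `degree` values replaced by None (treated as missing).

    series: hypothetical layout, a dict mapping date -> value of the missing variable.
    """
    if not 0 <= degree <= window:
        raise ValueError("degree must lie between 0 and window")
    # Days target-window .. target-1, oldest first (June 1..June 5 for June 6).
    days = [target - timedelta(days=k) for k in range(window, 0, -1)]
    values = [series[d] for d in days]
    for i in range(window - degree, window):
        values[i] = None
    return days, values


# Reproduces the worked example: window 5, target June 6, missing degree 2.
series = {date(2020, 6, d): float(d) for d in range(1, 31)}
days, values = build_missing_window(series, date(2020, 6, 6), window=5, degree=2)
print(values)  # [1.0, 2.0, 3.0, None, None]
```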
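Step B40 slices the same observation window for every complete variable. A minimal sketch, assuming the complete variables are stored as a mapping from variable name to a date-indexed series; the function name, the variable names `var_a` and `var_b`, and the values are hypothetical.

```python
from datetime import date, timedelta


def build_complete_window(table, target, window):
    """Return, for every complete variable, its values on the `window` days before `target`.

    table: hypothetical layout, a dict mapping variable name -> {date: value}.
    """
    days = [target - timedelta(days=k) for k in range(window, 0, -1)]
    return {name: [series[d] for d in days] for name, series in table.items()}


table = {
    "var_a": {date(2020, 6, d): d for d in range(1, 31)},
    "var_b": {date(2020, 6, d): 10 * d for d in range(1, 31)},
}
complete = build_complete_window(table, date(2020, 6, 6), window=5)
print(complete["var_a"])  # [1, 2, 3, 4, 5]
```

Using the same target time point and window for both the missing and complete variables keeps the two data sets aligned day by day, which is what the example with June 1 to June 5 describes.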
This embodiment of the application provides a variable anomaly repairing method. A time point set corresponding to the historical data set is obtained, and a preset number of time point random numbers are selected from it to obtain target time points, which avoids interference specific to any single time point. Values of the missing variable in the historical data set are then selected based on the target time point to obtain the missing data set, and the values of each complete variable in the historical data set are set as the complete data set based on the same target time point. Because the target time point and the missing degree are both determined by random draws, the complete data set and missing data sets with different missing degrees are sampled randomly from the historical data set. The model can therefore learn, under different missing degrees, the difference in weights between the current missing variable and each complete variable, which reduces the model bias caused by missing data. This lays a foundation for overcoming the prior-art defect of low model prediction accuracy caused by filling missing values directly with the mean of historical data, which still leaves a large noise error when the missing data are important, or with the latest non-missing value, which ignores changes caused by recent data updates.
Further, referring to fig. 4, in another embodiment of the present application based on the first embodiment, before the step of performing iterative training optimization on the fully-connected neural network model according to the missing data set and the feature information result to obtain the missing data prediction model, the variable anomaly repairing method further includes:
step C10, obtaining the missing number corresponding to the missing data in the missing data set;
In this embodiment, the missing number of the missing data in the missing data set is obtained. Specifically, the missing number is the missing degree of the missing variable, obtained by drawing a random missing degree within the preset observation time length; it equals the number of missing data points in the missing data set.
Step C20, extracting the missing data set into a multi-dimensional feature vector;
Step C30, setting the multidimensional feature vector according to a preset numerical rule based on the missing number to obtain the missing data dimension feature vector.
In this embodiment, it should be noted that when the missing data set is input into the fully-connected neural network model and contains empty values, the model cannot detect those empty values and therefore cannot learn the missing degree of the missing data set. The preset numerical rule sets, according to the missing number, the entries of the multidimensional feature vector that correspond to missing data to a preset value, such as -1 or 0.
Setting the multidimensional feature vector according to the preset numerical rule based on the missing number yields the missing data dimension feature vector. Specifically, the first dimension of the multidimensional feature vector is set to the missing degree of the missing data set; the dimensions corresponding to missing data are set to the preset value, so that the model can detect the missing data; and the remaining dimensions are set to the real values in the missing data set. For example, if the preset observation time length is 5, the missing degree is 2, and the missing data set is (NULL, NULL, 2, 3, 4), where NULL denotes an empty value, then the first dimension of the feature vector is the missing degree 2, the second and third dimensions are both set to -1, and the remaining three dimensions are 2, 3, and 4. The resulting missing data dimension feature vector is (2, -1, -1, 2, 3, 4), so that the missing data prediction model to be trained can learn the weight difference between the current missing variable and each complete variable under different missing degrees.
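The preset numerical rule in step C30 can be sketched as below. This is a minimal illustration, assuming missing entries are represented as `None` and the preset value defaults to -1; the function name `encode_missing_window` is hypothetical. It reproduces the worked example above.

```python
def encode_missing_window(values, fill=-1.0):
    """Prepend the missing degree, then replace each missing (None) entry with `fill`."""
    degree = sum(v is None for v in values)  # missing degree = count of None entries
    return [float(degree)] + [fill if v is None else float(v) for v in values]


vec = encode_missing_window([None, None, 2, 3, 4])
print(vec)  # [2.0, -1.0, -1.0, 2.0, 3.0, 4.0]
```

A sentinel such as -1 can collide with a genuine value of -1; the leading missing-degree entry partly disambiguates this, and a separate binary mask per position is a common alternative, though not the one the text describes.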
This embodiment of the application provides a variable anomaly repairing method. The missing number of the missing data in the missing data set is obtained, the missing data set is extracted into a multidimensional feature vector, and the multidimensional feature vector is set according to a preset numerical rule based on the missing number to obtain the missing data dimension feature vector. Setting the missing data to a preset value avoids the situation in which the model cannot detect missing data, so the model can learn the weight difference between the current missing variable and each complete variable under different missing degrees. This lays a foundation for overcoming the prior-art defect of low model prediction accuracy caused by filling missing values directly with the mean of historical data or a default value, which still leaves a large noise error when the missing data are important, or with the latest non-missing value, which ignores changes caused by recent data updates.
Further, referring to fig. 5, fig. 5 is a schematic flowchart of obtaining the missing data prediction model through iterative training optimization in the variable anomaly repairing method of the present application, where original data set B is the complete data set, original data set A is the missing data set, the LSTM layer is the long short-term memory network model, and the DNN part is the fully-connected neural network model.
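The data flow of fig. 5 and claim 7 (complete variables through an LSTM, its output concatenated with the encoded missing variable, then a fully connected part) can be sketched as a forward pass. This is a minimal, untrained illustration in pure Python, assuming a single LSTM layer whose final hidden state serves as the "feature information result", one ReLU hidden layer in the DNN part, and randomly initialized weights; all sizes, class names, and input values are hypothetical. A real implementation would use a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, x):
    # Dense matrix-vector product on nested lists.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Single-layer LSTM over the complete variables; gates act on [x_t, h_{t-1}]."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias vector per gate: input, forget, candidate, output.
        self.w = {gate: rand_matrix(hidden_size, n, rng) for gate in "ifco"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifco"}

    def _gate(self, gate, z, act):
        return [act(a + b) for a, b in zip(matvec(self.w[gate], z), self.b[gate])]

    def last_hidden(self, sequence):
        """Run over time steps (oldest first) and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x_t in sequence:
            z = list(x_t) + h
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            cand = self._gate("c", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class DenseHead:
    """Fully connected part: one ReLU hidden layer and a scalar output."""

    def __init__(self, input_size, hidden_size, rng):
        self.w1 = rand_matrix(hidden_size, input_size, rng)
        self.b1 = [0.0] * hidden_size
        self.w2 = rand_matrix(1, hidden_size, rng)[0]
        self.b2 = 0.0

    def __call__(self, x):
        hidden = [max(0.0, a + b) for a, b in zip(matvec(self.w1, x), self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def predict_missing(lstm, head, complete_history, missing_vector):
    # LSTM features of the complete variables, concatenated with the encoded
    # history of the missing variable, then passed through the dense head.
    features = lstm.last_hidden(complete_history)
    return head(features + list(missing_vector))


rng = random.Random(0)
lstm = TinyLSTM(input_size=3, hidden_size=4, rng=rng)
missing_vec = [2.0, -1.0, -1.0, 2.0, 3.0, 4.0]
head = DenseHead(input_size=4 + len(missing_vec), hidden_size=8, rng=rng)
history = [[1.0, 0.5, 0.2], [1.1, 0.4, 0.3], [0.9, 0.6, 0.1], [1.2, 0.5, 0.2], [1.0, 0.7, 0.3]]
print(predict_missing(lstm, head, history, missing_vec))  # untrained, so the value is arbitrary
```

Here `history` holds one row per day of the observation window, with one column per complete variable, matching the complete data set of step B40.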
Referring to fig. 6, fig. 6 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 6, the variable abnormality repairing apparatus may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the variable anomaly repairing device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).
Those skilled in the art will appreciate that the variable anomaly repair device structure shown in fig. 6 does not constitute a limitation of the variable anomaly repair device, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in fig. 6, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, and a variable exception repair program. The operating system is a program that manages and controls the hardware and software resources of the variable exception recovery device and supports the running of the variable exception recovery program and other software and/or programs. The network communication module is used for communication among the components in the memory 1005 and with other hardware and software in the variable exception recovery system.
In the variable exception recovery apparatus shown in fig. 6, the processor 1001 is configured to execute a variable exception recovery program stored in the memory 1005, and implement the steps of the variable exception recovery method described in any one of the above.
The specific implementation of the variable anomaly repairing device of the present application is basically the same as that of each embodiment of the variable anomaly repairing method, and is not described herein again.
The present application also provides a variable abnormality repairing apparatus, including:
the acquisition module is used for acquiring a data set to be predicted;
the prediction module is used for predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;
and the repairing module is used for repairing the missing variable based on the data prediction result to obtain a variable repairing result.
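The repairing module's final step can be sketched as filling each missing slot with a model prediction. This is a minimal illustration, assuming the model returns one predicted value per missing position in chronological order; the function name is hypothetical and the prediction values in the example are placeholders, not model output.

```python
def repair_missing(values, predictions):
    """Return a copy of `values` with each None replaced by the next prediction."""
    missing_positions = [i for i, v in enumerate(values) if v is None]
    if len(missing_positions) != len(predictions):
        raise ValueError("expected exactly one prediction per missing value")
    repaired = list(values)
    for pos, pred in zip(missing_positions, predictions):
        repaired[pos] = pred
    return repaired


print(repair_missing([1.0, 2.0, 3.0, None, None], [3.8, 4.1]))
# [1.0, 2.0, 3.0, 3.8, 4.1]
```

Checking the count up front makes a mismatch between the model output and the number of gaps fail loudly instead of silently leaving values unrepaired.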
Optionally, the prediction module is further configured to:
inputting the historical values of all the complete variables in the data set to be predicted into the long short-term memory network model, and outputting the characteristic information results corresponding to all the complete variables in the data set to be predicted;
and inputting the characteristic information result corresponding to each complete variable in the data set to be predicted and the historical value of the missing variable in the data set to be predicted into the fully-connected neural network model, and outputting the data prediction result of the missing variable.
Optionally, the variable anomaly repairing apparatus is further configured to:
acquiring a historical data set of a sample client;
performing missing processing on the historical data set of the sample client to obtain a missing data set in the historical data set and a complete data set in the historical data set;
and performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model.
Optionally, the variable anomaly repairing apparatus is further configured to:
inputting the complete data set into the long short-term memory network model, and outputting a characteristic information result;
and performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model.
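The text does not specify the loss function or optimizer used for "iterative training optimization". As a deliberately simplified sketch, the loop below trains only a linear output layer by full-batch gradient descent on mean squared error; each feature row stands in for the concatenated LSTM features and encoded missing-variable vector, and the target is the true value that was masked out. The function name, learning rate, epoch count, and the synthetic data (generated from y = 2·x0 − x1 + 0.5) are all assumptions for illustration.

```python
def train_linear_head(features, targets, lr=0.01, epochs=200):
    """Full-batch gradient descent on mean squared error for y ≈ w·x + b.

    Returns the weights, the bias, and the loss recorded at the start of each epoch.
    """
    n = len(features[0])
    m = len(features)
    w = [0.0] * n
    b = 0.0
    history = []
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            loss += err * err
            for j in range(n):
                grad_w[j] += 2.0 * err * x[j]
            grad_b += 2.0 * err
        history.append(loss / m)
        # Average the gradients over the batch and take one descent step.
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b, history


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [2.5, -0.5, 1.5, 3.5]  # synthetic: y = 2*x0 - x1 + 0.5
w, b, history = train_linear_head(features, targets, lr=0.05, epochs=500)
print(w, b)
```

In the described method the sampled missing degree and target time point would change from one training sample to the next, so the feature rows would be regenerated from the historical data set rather than fixed as here.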
Optionally, the variable anomaly repairing apparatus is further configured to:
acquiring a time point set corresponding to the historical data set;
selecting a preset number of time point random numbers from the time point set to obtain target time points;
selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set;
and setting the value of each complete variable in the historical data set as the complete data set based on the target time point.
Optionally, the variable anomaly repairing apparatus is further configured to:
selecting a random number of the missing degree in a preset observation time length to obtain the missing degree corresponding to the missing variable in the historical data set;
and setting the value of the corresponding missing variable in the historical data set according to the missing degree based on the target time point to obtain the missing data set.
Optionally, the variable anomaly repairing apparatus is further configured to:
acquiring the number of missing data corresponding to the missing data in the missing data set;
extracting the missing data set into a multi-dimensional feature vector;
and setting the multidimensional feature vector according to a preset numerical rule based on the missing number to obtain the missing data dimension feature vector.
The specific implementation of the variable anomaly repairing apparatus of the present application is substantially the same as the embodiments of the variable anomaly repairing method, and is not described herein again.
The present application provides a medium, which is a readable storage medium, and the readable storage medium stores one or more programs, and the one or more programs are further executable by one or more processors for implementing the steps of the variable exception repairing method described in any one of the above.
The specific implementation of the readable storage medium of the present application is substantially the same as the embodiments of the variable anomaly repairing method, and is not described herein again.
The embodiment of the present application provides a computer program product, and the computer program product includes one or more computer programs, which can also be executed by one or more processors for implementing the steps of any one of the above variable exception repairing methods.
The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the variable anomaly repairing method, and is not described herein again.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A variable exception recovery method, comprising:
acquiring a data set to be predicted;
predicting the missing variables in the data set to be predicted through a missing data prediction model to obtain a data prediction result, wherein the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set which are collected in advance;
and repairing the missing variable based on the data prediction result to obtain a variable repairing result.
2. The variable anomaly repairing method according to claim 1, wherein before the missing data prediction model is obtained by performing iterative training optimization on the missing data prediction model to be trained through a missing data set and a complete data set collected in advance, the variable anomaly repairing method comprises:
acquiring a historical data set of a sample client;
performing missing processing on the historical data set of the sample client to obtain a missing data set in the historical data set and a complete data set in the historical data set;
and performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model.
3. The variable abnormality repairing method according to claim 2, wherein the missing data prediction model to be trained includes a long short-term memory network model and a fully-connected neural network model,
the step of performing iterative training optimization on the missing data prediction model to be trained through the missing data set and the complete data set to obtain the missing data prediction model comprises the following steps:
inputting the complete data set into the long short-term memory network model, and outputting a characteristic information result;
and performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model.
4. The variable anomaly repair method according to claim 2, wherein the step of performing missing processing on the historical data set of the sample client to obtain a missing data set and a complete data set in the historical data set comprises:
acquiring a time point set corresponding to the historical data set;
selecting a preset number of time point random numbers from the time point set to obtain target time points;
selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set;
and setting the value of each complete variable in the historical data set as the complete data set based on the target time point.
5. The method according to claim 4, wherein the step of selecting a value corresponding to the missing variable in the historical data set based on the target time point to obtain the missing data set comprises:
selecting a random number of the missing degree in a preset observation time length to obtain the missing degree corresponding to the missing variable in the historical data set;
and setting the value of the corresponding missing variable in the historical data set according to the missing degree based on the target time point to obtain the missing data set.
6. The variable anomaly repairing method according to claim 3, wherein before the step of performing iterative training optimization on the fully-connected neural network model according to the missing data set and the characteristic information result to obtain the missing data prediction model, the variable anomaly repairing method further comprises:
acquiring the number of missing data corresponding to the missing data in the missing data set;
extracting the missing data set into a multi-dimensional feature vector;
and setting the multidimensional feature vector according to a preset numerical rule based on the missing number to obtain the missing data dimension feature vector.
7. The variable anomaly restoration method according to claim 1, wherein the missing data prediction model includes a long short-term memory network model and a fully-connected neural network model,
the step of predicting the missing variables in the data set to be predicted through the missing data prediction model to obtain the data prediction result comprises the following steps:
inputting the historical values of all the complete variables in the data set to be predicted into the long short-term memory network model, and outputting the characteristic information results corresponding to all the complete variables in the data set to be predicted;
and inputting the characteristic information result corresponding to each complete variable in the data set to be predicted and the historical value of the missing variable in the data set to be predicted into the fully-connected neural network model, and outputting the data prediction result of the missing variable.
8. A variable abnormality repairing apparatus, characterized by comprising: a memory, a processor, and a variable exception repair program stored on the memory,
the variable exception recovery program being executed by the processor to implement the steps of the variable exception recovery method as claimed in any one of claims 1 to 7.
9. A medium which is a readable storage medium, characterized in that a variable exception recovery program is stored on the readable storage medium, the variable exception recovery program being executed by a processor to implement the steps of the variable exception recovery method according to any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the variable exception repair method according to any one of claims 1 to 7.
CN202110926804.4A 2021-08-12 2021-08-12 Variable exception recovery method, apparatus, medium, and computer program product Pending CN113641525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110926804.4A CN113641525A (en) 2021-08-12 2021-08-12 Variable exception recovery method, apparatus, medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110926804.4A CN113641525A (en) 2021-08-12 2021-08-12 Variable exception recovery method, apparatus, medium, and computer program product

Publications (1)

Publication Number Publication Date
CN113641525A true CN113641525A (en) 2021-11-12

Family

ID=78421200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110926804.4A Pending CN113641525A (en) 2021-08-12 2021-08-12 Variable exception recovery method, apparatus, medium, and computer program product

Country Status (1)

Country Link
CN (1) CN113641525A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662611A (*) 2022-04-07 2022-06-24 中科三清科技有限公司 Method and device for restoring particulate component data, electronic equipment and storage medium
CN115983495A (*) 2023-02-20 2023-04-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) RFR-Net based global neutral atmospheric temperature density prediction method and device
CN115983495B (*) 2023-02-20 2023-08-11 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Global neutral atmospheric temperature density prediction method and equipment based on RFR-Net


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination