CN116304589A - Abnormal data detection and restoration method based on multi-constraint collaboration and related products - Google Patents

Info

Publication number
CN116304589A
Authority
CN
China
Prior art keywords
repair
variable
sequence
constraint
time interval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310156625.6A
Other languages
Chinese (zh)
Inventor
黄建平
王红凯
冯珺
彭梁英
潘司晨
赵帅
陈浩
李钟煦
丁小欧
王宏志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
State Grid Zhejiang Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Harbin Institute of Technology
State Grid Zhejiang Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology, State Grid Zhejiang Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd filed Critical Harbin Institute of Technology
Priority to CN202310156625.6A priority Critical patent/CN116304589A/en
Publication of CN116304589A publication Critical patent/CN116304589A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method for detecting and repairing abnormal data based on multi-constraint collaboration, and related products, applicable to the technical field of data repair. The method comprises: determining a time interval to be repaired in a multivariate time series; establishing a dependency network according to the time interval to be repaired; obtaining a variable feature prediction order and a corresponding prediction model dictionary according to the dependency network; determining candidate repair values for the variable features according to the variable feature prediction order and the prediction model dictionary; and updating the multivariate time series according to the candidate repair values, thereby repairing the time series data. The method thus partitions the multivariate time series and repairs it with a constraint-based algorithm, so that no new constraint violations are introduced during repair, and the repair speed for erroneous data is increased while the accuracy and reliability of the repair result are improved.

Description

Abnormal data detection and restoration method based on multi-constraint collaboration and related products
Technical Field
The application relates to the technical field of data restoration, in particular to a method for detecting and restoring abnormal data based on multi-constraint cooperation and a related product.
Background
In the industrial field, the acquisition of time series data is mainly completed by sensors, and different sensors distributed on different machines can monitor the running condition of the machines in real time. These sensors typically collect data at a frequency of seconds, and therefore the amount of data collected is quite large.
Sensors can be affected by many factors, which makes the quality of the data stream unstable. Multivariate time series therefore often contain a large amount of erroneous data, which greatly complicates subsequent data management and analysis, so the time series data must be repaired promptly and effectively. In industrial big data, data from different sensors belonging to the same system are strongly correlated; that is, the different features of a multivariate time series are strongly correlated with one another. These time series data are also usually continuous, so repair methods based on functional dependencies from traditional relational databases become less applicable, and the continuity of the data makes low-quality data harder to repair.
Therefore, how to increase the repair speed of the error data on the premise of increasing the accuracy and reliability of the repair result of the error data is a problem which needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above problems, the application provides a method for detecting and repairing abnormal data based on multi-constraint collaboration, and related products. The multivariate time series is partitioned and then repaired with a constraint-based algorithm, so that no new constraint violations are introduced during repair. This addresses the low accuracy and reliability of the repair results, and the low repair speed, of existing repair methods.
In a first aspect, an embodiment of the present application provides a method for repairing time-series data based on multi-constraint collaboration, including:
determining a time interval to be repaired in the multivariate time series;
establishing a dependency network according to the time interval to be repaired;
obtaining a variable feature prediction order and a corresponding prediction model dictionary according to the dependency network;
determining candidate repair values for the variable features according to the variable feature prediction order and the prediction model dictionary;
and updating the multivariate time series according to the candidate repair values to repair the time series data.
Optionally, determining the time interval to be repaired in the multivariate time series includes:
performing compliance checks on the constraints in the constraint set to determine the violated constraints;
determining an abnormal time interval in the multivariate time series according to the violated constraints;
determining the time interval to be repaired in the multivariate time series according to the abnormal time interval;
wherein the time interval to be repaired is larger than the abnormal time interval.
Optionally, establishing a dependency network according to the time interval to be repaired includes:
acquiring the variable feature set in the time interval to be repaired;
and establishing a directed graph of the variable feature dependency network according to the dependency relationships among the variable features in the variable feature set, thereby establishing the dependency network.
Optionally, obtaining the variable feature prediction order and the corresponding prediction model dictionary according to the dependency network includes:
analyzing the directed graph of the variable feature dependency network;
obtaining the variable feature prediction order according to the directed graph;
and learning variable feature prediction models according to the directed graph to obtain the prediction model dictionary corresponding to the variable feature prediction order.
Optionally, determining the candidate repair values for the variable features according to the variable feature prediction order and the prediction model dictionary includes:
acquiring, from the prediction model dictionary, the machine learning model information corresponding to the variable feature prediction order;
and determining the candidate repair values for the variable features by using the machine learning model information and the variable feature prediction order.
Optionally, before the multivariate time series is updated according to the candidate repair values to repair the time series data, the method further includes:
evaluating the candidate repair values for the variable features with a cleaning cost function according to constraint information and original data information;
and determining the best candidate repair value according to the evaluation result.
Optionally, updating the multivariate time series according to the candidate repair values to repair the time series data includes:
selecting the best candidate repair value from the candidate repair values;
and writing the best candidate repair value into the multivariate time series to repair the time series data.
In a second aspect, an embodiment of the present application provides a time-series data restoration device based on multi-constraint collaboration, including:
a first determining module, configured to determine a time interval to be repaired in the multivariate time series;
a building module, configured to establish a dependency network according to the time interval to be repaired;
an obtaining module, configured to obtain a variable feature prediction order and a corresponding prediction model dictionary according to the dependency network;
a second determining module, configured to determine candidate repair values for the variable features according to the variable feature prediction order and the prediction model dictionary;
and a repair module, configured to write the candidate repair values into the multivariate time series to repair the time series data.
In a third aspect, an embodiment of the present application provides a time-series data repair device based on multi-constraint collaboration, including:
a memory for storing a computer program;
a processor for implementing the steps of the multi-constraint collaboration-based time series data restoration method as described in any one of the above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a multi-constraint collaboration-based time series data restoration method as described in any of the above.
From the above technical solution, compared with the prior art, the present application has the following advantages:
The present application first determines a time interval to be repaired in the multivariate time series, then establishes a dependency network according to that interval, and obtains a variable feature prediction order and a corresponding prediction model dictionary from the dependency network. Candidate repair values for the variable features are then determined according to the prediction order and the prediction model dictionary. Finally, the multivariate time series is updated according to the candidate repair values, which repairs the time series data. The multivariate time series is thus partitioned and repaired with a constraint-based algorithm, so that no new constraint violations are introduced during repair, and the repair speed for erroneous data is increased while the accuracy and reliability of the repair result are improved.
Drawings
FIG. 1 is a flow chart of a method for detecting and repairing abnormal data based on multi-constraint collaboration provided by the application;
fig. 2 is a schematic structural diagram of a device for detecting and repairing abnormal data based on multi-constraint collaboration.
Detailed Description
As described above, existing time series data repair methods cannot guarantee both the accuracy and reliability of the repair result and the repair speed for erroneous data. Specifically, in industrial big data, data from different sensors belonging to the same system are strongly correlated; that is, the different features of a multivariate time series are strongly correlated with one another. These time series data are also usually continuous, so repair methods based on functional dependencies from traditional relational databases become less applicable, and the continuity of the data makes low-quality data harder to repair.
To solve the above problems, an embodiment of the present application provides a time series data repair method based on multi-constraint collaboration. The method first determines a time interval to be repaired in the multivariate time series, then establishes a dependency network according to that interval, and obtains a variable feature prediction order and a corresponding prediction model dictionary from the dependency network. Candidate repair values for the variable features are then determined according to the prediction order and the prediction model dictionary. Finally, the multivariate time series is updated according to the candidate repair values, which repairs the time series data.
The multivariate time series is thus partitioned and repaired with a constraint-based algorithm, so that no new constraint violations are introduced during repair, and the repair speed for erroneous data is increased while the accuracy and reliability of the repair result are improved.
It should be noted that the time sequence data restoration method based on multi-constraint cooperation and the related products provided by the application can be applied to the technical field of data restoration. The foregoing is merely an example, and is not intended to limit the application field of the method for repairing time series data based on multi-constraint collaboration and related products provided in the present application.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Fig. 1 is a flowchart of a method for detecting and repairing abnormal data based on multi-constraint collaboration according to an embodiment of the present application. Referring to fig. 1, a time sequence data repairing method based on multi-constraint collaboration provided in an embodiment of the present application may include:
S101: Determine a time interval to be repaired in the multivariate time series.
In practice, the multivariate time series $X$ is a collection of $n$ univariate time series. Each univariate time series $x^{f}$ is uniquely identified by its superscript $f$, where $f$ denotes a feature. The feature set of $X$ is $F = \{f_1, f_2, \ldots, f_n\}$. The timestamp set of the multivariate time series is $T = \{t_1, t_2, \ldots, t_K\}$, where $K$ is a positive integer; it is the set of timestamps of all data points in $X$. For a timestamp $t \in T$, the observation of $X$ is $X_t = \{x_t^{f_1}, x_t^{f_2}, \ldots, x_t^{f_n}\}$, where $x_t^{f_i}$ denotes the data value of feature $f_i$ at timestamp $t$. A time interval $T_{i,j}$ is defined as a contiguous set of timestamps $T_{i,j} = \{t_i, \ldots, t_j\}$ with $T_{i,j} \subseteq T$. The set of all data points of $X$ in the time interval $T_{i,j}$ is:
$$X_{T_{i,j}} = \{X_t \mid t \in T_{i,j}\}$$
This set is called a subsequence of the multivariate time series $X$; it comprises $n$ univariate subsequences that share the same start time. In the present application, to reduce the amount of data processed and increase the processing speed, the multivariate time series $X$ is first divided into several repair blocks. A repair block $B$ can be represented by a triple $B = \langle T_B, T_{abnormal}, F_F \rangle$, where $T_B$ is the time interval of repair block $B$, $T_{abnormal}$ is its abnormal time interval, and $F_F$ is its variable feature set; the features in $F_F$ are called variable features. For any timestamp in the abnormal time interval, the corresponding observation in the multivariate time series causes some constraint to be violated. The multivariate time series can therefore be divided into several repair blocks according to these time intervals, which determines the time intervals to be repaired in the multivariate time series.
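The notation above can be mirrored by a small data model. The following is a minimal sketch, assuming timestamps are represented by integer indices and each feature by a list of values; the names `MultivariateSeries`, `RepairBlock` and `subsequence` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Assumed representation: feature name -> list of values, one per timestamp index.
MultivariateSeries = dict[str, list[float]]


@dataclass(frozen=True)
class RepairBlock:
    """Triple B = <T_B, T_abnormal, F_F>, with timestamps given as index ranges."""
    t_block: range                      # T_B: time interval covered by the block
    t_abnormal: range                   # T_abnormal: abnormal interval inside T_B
    variable_features: frozenset[str]   # F_F: features that the repair may modify


def subsequence(series: MultivariateSeries, interval: range) -> MultivariateSeries:
    """Restrict every univariate series to the same interval T_{i,j}."""
    return {feature: [values[t] for t in interval]
            for feature, values in series.items()}
```

For example, `subsequence(X, block.t_abnormal)` returns the `n` univariate subsequences that share the abnormal interval of a block.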
Because the time interval to be repaired can be determined in more than one way, one possible way is described below.
In this case, S101 (determining the time interval to be repaired in the multivariate time series) specifically comprises the following steps:
performing compliance checks on the constraints in the constraint set to determine the violated constraints;
determining an abnormal time interval in the multivariate time series according to the violated constraints;
determining the time interval to be repaired in the multivariate time series according to the abnormal time interval;
wherein the time interval to be repaired is larger than the abnormal time interval.
In practical applications, the violated constraints can be determined by performing compliance checks on the constraints in the constraint set. For any timestamp $t \in T_{abnormal}$, the observation $X_t$ of the multivariate time series $X$ causes some constraint to be violated; for any feature $f \in F_F$, there exists a timestamp $t \in T_{abnormal}$ such that the data point $x_t^{f}$ contributes to a constraint violation. The abnormal time interval $T_{abnormal}$ in the multivariate time series can therefore be determined from the relationship between the multivariate time series and the violated constraints. Note that, to support the computation, the selected time interval to be repaired must contain a set of reliable timestamps, defined as $T_R = T_B - T_{abnormal}$. In other words, the time interval to be repaired should be slightly larger than the abnormal time interval, but not so large that the computation speed suffers.
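The compliance check and the widening of the abnormal interval can be illustrated as follows. This is a sketch under stated assumptions: a constraint is modelled as a hypothetical predicate over one observation, and the fixed `margin` used to widen the interval is an illustrative choice, since the application only requires the interval to be repaired to be slightly larger than the abnormal one.

```python
from typing import Callable

Observation = dict[str, float]
Constraint = Callable[[Observation], bool]  # assumed: True means "complies"


def violating_timestamps(series: dict[str, list[float]],
                         constraints: list[Constraint]) -> list[int]:
    """Indices t whose observation X_t violates at least one constraint."""
    length = len(next(iter(series.values())))
    bad = []
    for t in range(length):
        obs = {f: values[t] for f, values in series.items()}
        if any(not c(obs) for c in constraints):
            bad.append(t)
    return bad


def abnormal_intervals(bad: list[int]) -> list[tuple[int, int]]:
    """Group consecutive violating indices into closed intervals (start, end)."""
    intervals: list[tuple[int, int]] = []
    for t in bad:
        if intervals and t == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], t)
        else:
            intervals.append((t, t))
    return intervals


def interval_to_repair(abnormal: tuple[int, int], margin: int,
                       length: int) -> tuple[int, int]:
    """T_B: the abnormal interval widened by `margin` on both sides (assumption)."""
    return (max(0, abnormal[0] - margin), min(length - 1, abnormal[1] + margin))


def reliable_timestamps(block: tuple[int, int],
                        abnormal: tuple[int, int]) -> list[int]:
    """T_R = T_B - T_abnormal."""
    return [t for t in range(block[0], block[1] + 1)
            if not abnormal[0] <= t <= abnormal[1]]
```

A larger `margin` gives the later model training more reliable points but enlarges the block, which is the trade-off the paragraph above describes.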
S102: Establish a dependency network according to the time interval to be repaired.
In practical applications, after the time interval to be repaired in the multivariate time series has been determined, a dependency network needs to be established according to that interval in order to repair the time series data. Note that this step mainly establishes the directed graph part of the dependency network.
Because the dependency network can be established in more than one way, one possible way is described below.
In this case, S102 (establishing a dependency network according to the time interval to be repaired) specifically comprises the following steps:
acquiring the variable feature set in the time interval to be repaired;
and establishing a directed graph of the variable feature dependency network according to the dependency relationships among the variable features in the variable feature set, thereby establishing the dependency network.
In practical applications, $F_F$ is the variable feature set of repair block $B$, and the features in $F_F$ are called variable features. Because dependency relationships exist among the variable features, the dependency relationships among the variable features in repair block $B$ (that is, among the variable features in the variable feature set of the time interval to be repaired) are acquired first. A directed graph of the variable feature dependency network is then built from these dependency relationships, which establishes the dependency network.
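The directed graph can be pictured as an adjacency mapping. The sketch below assumes the dependency relationships are already available as `(source, target)` pairs, meaning that `target` depends on `source`; how the application derives these pairs is not stated in this section, and the function name is hypothetical.

```python
def build_dependency_graph(variable_features: set[str],
                           dependencies: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Directed graph over variable features: source -> set of dependent targets."""
    graph: dict[str, set[str]] = {feature: set() for feature in variable_features}
    for source, target in dependencies:
        # Keep only edges whose endpoints are both variable features of the block.
        if source in graph and target in graph:
            graph[source].add(target)
    return graph
```

Edges that touch a feature outside the variable feature set are dropped here for simplicity; a fuller version might keep them as read-only predictors.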
S103: Obtain a variable feature prediction order and a corresponding prediction model dictionary according to the dependency network.
In practical applications, after the dependency network has been established, a variable feature prediction order and a corresponding prediction model dictionary need to be obtained from it so that the time series data can subsequently be repaired. For example, this can be done by creating a global model dictionary, looping over all variable features and training the corresponding models, creating a node that holds the variable feature and the model dictionary, and then invoking a recursive algorithm over the dependency network.
Because the variable feature prediction order and the corresponding prediction model dictionary can be obtained in more than one way, one possible way is described below.
In this case, S103 (obtaining a variable feature prediction order and a corresponding prediction model dictionary according to the dependency network) specifically comprises the following steps:
analyzing the directed graph of the variable feature dependency network;
obtaining the variable feature prediction order according to the directed graph;
and learning variable feature prediction models according to the directed graph to obtain the prediction model dictionary corresponding to the variable feature prediction order.
In practical application, the variable feature-dependent network directed graph can be analyzed first, then the variable feature prediction sequence is obtained according to the variable feature-dependent network directed graph, the variable feature prediction model is learned, and finally the prediction model dictionary corresponding to the variable feature prediction sequence is obtained. The specific algorithm is as follows:
1. [Figure BDA0004092640440000071]
2. F_F ← variable feature set of repair block B
3. T_B ← time interval of repair block B
4. T_abnormal ← abnormal time interval of repair block B
5. T_R ← T_B − T_abnormal
6. for f in F_F
7. [Figure BDA0004092640440000072]
8. root ← newNode(NULL, globalModelDict)
9. PredictionOrderTree(root, G, T_R)
10. orderList, modelDictList ← root.GetTreePath()
11. return (orderList, modelDictList)
To sum up, a global model dictionary is built first, and the variable feature set $F_F$ of repair block $B$, its time interval $T_B$, its abnormal time interval $T_{abnormal}$ and its reliable timestamp set $T_R = T_B - T_{abnormal}$ are obtained. The relevant machine learning models are trained from the input variable feature dependency network $G$, the related feature dictionary RF and the repair block $B$. A root node holding the variable feature and the model dictionary is then created and the recursive algorithm is invoked, which yields the variable feature prediction order and the corresponding prediction model dictionary.
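As an illustration of this step, the sketch below derives one prediction order and one model dictionary. It is not the recursive `PredictionOrderTree` procedure named above: it swaps in a plain topological sort from the Python standard library and assumes the dependency graph is acyclic (`TopologicalSorter` raises `CycleError` otherwise). The per-edge least-squares line is likewise a stand-in for whatever machine learning model is actually trained, and all function names are hypothetical. Models are fitted only on the reliable timestamps $T_R$.

```python
from graphlib import TopologicalSorter


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # constant predictor: fall back to the mean of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def prediction_order_and_models(series: dict[str, list[float]],
                                predecessors: dict[str, set[str]],
                                reliable: list[int]):
    """Return (order, model_dict).

    predecessors maps each feature to the features it depends on.
    model_dict[f] is a list of (source, slope, intercept), one per predecessor.
    """
    order = list(TopologicalSorter(predecessors).static_order())
    model_dict = {}
    for target in order:
        model_dict[target] = [
            (source, *fit_line([series[source][t] for t in reliable],
                               [series[target][t] for t in reliable]))
            for source in sorted(predecessors.get(target, ()))
        ]
    return order, model_dict
```

Training on $T_R$ only keeps the abnormal observations from contaminating the models, which is why the repair interval must contain reliable timestamps.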
S104: Determine candidate repair values for the variable features according to the variable feature prediction order and the prediction model dictionary.
In practical applications, after the variable feature prediction order and the corresponding prediction model dictionary have been obtained, a search tree over variable feature values is traversed using the prediction order and the prediction model dictionary, which yields the candidate repair values for the variable features.
Because the candidate repair values for the variable features can be determined in more than one way, one possible way is described below.
In this case, S104 (determining candidate repair values for the variable features according to the variable feature prediction order and the prediction model dictionary) specifically comprises the following steps:
acquiring, from the prediction model dictionary, the machine learning model information corresponding to the variable feature prediction order;
and determining the candidate repair values for the variable features by using the machine learning model information and the variable feature prediction order.
In practice, the prediction model dictionary records all the machine learning model information for the variable feature at the corresponding position in the variable feature prediction order. The machine learning model information corresponding to the prediction order can therefore be read from the prediction model dictionary, and the candidate repair values for the variable features can then be determined from that model information and the prediction order. The specific algorithm is as follows:
1. if i = order.size()
2. [Figure BDA0004092640440000081]
3. return
4. models ← modelDict[order[i]]
5. [Figure BDA0004092640440000091]
6. for model_{F,f} in models
7. [Figure BDA0004092640440000092]
8. [Figure BDA0004092640440000093] [Figure BDA0004092640440000097]
9. [Figure BDA0004092640440000094]
10. [Figure BDA0004092640440000095]
In summary, given a variable feature prediction order, the corresponding prediction model dictionary modelDict and a multivariate time subsequence, the $i$-th variable feature in the prediction order currently being traversed is predicted to obtain a candidate repair value for that variable feature, while the original value of the variable feature is retained.
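The recursion can be sketched for a single observation as follows. Several steps of the listing above are not legible in this text, so the sketch fills them in from the prose only: at position `i` it keeps the original value on one branch and tries each model's prediction on another, and it records a candidate once `i` reaches the end of the order. The model format `(source, slope, intercept)` and the function name are assumptions made for illustration.

```python
def candidate_repairs(order, model_dict, observation, i=0, current=None, out=None):
    """Enumerate candidate repairs of one observation along the prediction order."""
    current = dict(observation) if current is None else current
    out = [] if out is None else out
    if i == len(order):              # every variable feature has been decided
        out.append(dict(current))
        return out
    feature = order[i]
    # Branch 1: retain the original value of this variable feature.
    candidate_repairs(order, model_dict, observation, i + 1, current, out)
    # Branch 2: one branch per model recorded for this feature.
    for source, slope, intercept in model_dict.get(feature, []):
        saved = current[feature]
        current[feature] = slope * current[source] + intercept
        candidate_repairs(order, model_dict, observation, i + 1, current, out)
        current[feature] = saved     # undo before trying the next model
    return out
```

Because features are visited in prediction order, a prediction for a later feature can use a value that was already repaired earlier on the same branch.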
S105: Update the multivariate time series according to the candidate repair values to repair the time series data.
In practical application, after the candidate repair value is obtained, the multi-element time sequence can be updated according to the candidate repair value, so that the error time sequence data in the multi-element time sequence can be repaired.
Because the less data has to be repaired, the shorter the repair takes, one possible way of shortening the repair time is described below.
In this case, before the candidate repair values are written into the multivariate time series to repair the time series data, the method further comprises the following steps:
evaluating the candidate repair values for the variable features with a cleaning cost function according to constraint information and original data information;
and determining the best candidate repair value according to the evaluation result.
In practical applications, if the target feature values (that is, the candidate repair values) were simply predicted with the corresponding prediction model dictionary in the variable feature prediction order, and the time series were then updated with those candidate repair values, every variable feature value in the new time series would be modified. Such a repair result is not ideal: the original data of a variable feature may be correct, so the data value of every variable feature should not be modified indiscriminately. The present application therefore considers the benefit and the cost of a candidate repair value together: the candidate repair values for the variable features are evaluated with a cleaning cost function according to the constraint information and the original data information, and the best candidate repair value is determined from the evaluation result. Note that the fewer constraint violations a candidate repair value causes, the greater its benefit; the more a candidate repair value modifies the original data, the higher its cost. The candidate repair value that offers a large benefit at a small cost is therefore selected from all candidate repairs and written into the multivariate time series as the repair result; this is the best candidate repair value. The cleaning cost function is as follows:
Figure BDA0004092640440000101
where VRRD denotes the cleaning cost function, which combines the constraint information (the constraint violations that remain under a candidate repair) with the original data information (the extent to which the candidate repair modifies the original data), and k is a constant greater than 0 that controls the relative weight of the two.
Because the multivariate time series can be updated in more than one way, one possible way is described below.
In this case, S105 (updating the multivariate time series according to the candidate repair values to repair the time series data) specifically comprises the following steps:
selecting the best candidate repair value from the candidate repair values;
and writing the best candidate repair value into the multivariate time series to repair the time series data.
In practice, the goal is to lower the cost and raise the benefit of a candidate repair. The candidate repair values corresponding to the variable features are evaluated with the cleaning cost function, and the candidate repair value with the smallest VRRD value is selected as the best candidate repair value. The best candidate repair value is then written into the multivariate time series to repair the time series data.
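The selection step can be illustrated with a minimal Python sketch. The exact VRRD formula appears only in the figure and is not reproduced here, so the cost below (violation rate plus k times the mean absolute change) is an assumed stand-in, and the names `cleaning_cost`, `select_best_candidate`, and the example constraints are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]              # one timestamp: feature name -> value
Constraint = Callable[[Row], bool]  # returns True when the row satisfies it


def cleaning_cost(original: Row, candidate: Row,
                  constraints: List[Constraint], k: float = 1.0) -> float:
    """Assumed stand-in for VRRD: violation rate + k * mean absolute change."""
    violation_rate = sum(not c(candidate) for c in constraints) / max(len(constraints), 1)
    change = sum(abs(candidate[f] - original[f]) for f in original) / max(len(original), 1)
    return violation_rate + k * change


def select_best_candidate(original: Row, candidates: List[Row],
                          constraints: List[Constraint], k: float = 1.0) -> Row:
    """Return the candidate with the smallest cost (the best candidate repair value)."""
    return min(candidates, key=lambda c: cleaning_cost(original, c, constraints, k))


# Hypothetical example: the current reading violates a value-range constraint.
original = {"voltage": 220.0, "current": 95.0}
constraints = [lambda r: 0.0 <= r["current"] <= 50.0,
               lambda r: 200.0 <= r["voltage"] <= 240.0]
candidates = [
    {"voltage": 220.0, "current": 95.0},  # unchanged: zero cost of change, one violation
    {"voltage": 220.0, "current": 48.0},  # small change, no violation
    {"voltage": 210.0, "current": 10.0},  # larger change, no violation
]
best = select_best_candidate(original, candidates, constraints, k=0.01)
```

In this sketch a small k lets constraint violations dominate the cost, while a large k favors staying close to the original data.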
In summary, the present application first determines a time interval to be repaired in a multivariate time series, then establishes a dependent network according to the time interval to be repaired, and obtains a variable feature prediction order and a corresponding prediction model dictionary from the dependent network. Candidate repair values corresponding to the variable features are then determined according to the variable feature prediction order and the prediction model dictionary. Finally, the multivariate time series is updated according to the candidate repair values to repair the time series data. Because the multivariate time series is partitioned and repaired using the algorithm based on the constraints described above, the repair process introduces no new constraint violations, and erroneous data are repaired faster while the accuracy and reliability of the repair result are improved.
Based on the method for detecting and repairing abnormal data based on multi-constraint collaboration provided by the foregoing embodiment, the present application also provides an apparatus for detecting and repairing abnormal data based on multi-constraint collaboration, which is described below with reference to the embodiments and the drawings.
Fig. 2 is a schematic structural diagram of an apparatus for detecting and repairing abnormal data based on multi-constraint collaboration according to an embodiment of the present application. As shown in Fig. 2, the apparatus 200 includes:
a first determining module 201, configured to determine a time interval to be repaired in the multivariate time sequence;
a building module 202, configured to build a dependent network according to the time interval to be repaired;
the obtaining module 203 is configured to obtain a variable feature prediction order and a corresponding prediction model dictionary according to the dependency network;
a second determining module 204, configured to determine candidate repair values corresponding to the variable features according to the variable feature prediction order and the prediction model dictionary;
and the repair module 205 is configured to update the candidate repair value to the multivariate time sequence, so as to implement repair of time-series data.
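How the five modules hand data to one another can be sketched as follows. This is only an illustrative wiring under stated assumptions: the class name `RepairApparatus`, the field names, the data shapes, and the stand-in lambdas are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RepairApparatus:
    """Hypothetical wiring of modules 201 to 205; each field is a plain callable."""
    determine_interval: Callable[[Any], Any]        # first determining module 201
    build_network: Callable[[Any, Any], Any]        # building module 202
    obtain_order_and_models: Callable[[Any], Any]   # obtaining module 203
    determine_candidates: Callable[..., Any]        # second determining module 204
    repair: Callable[[Any, Any, Any], Any]          # repair module 205

    def run(self, series):
        interval = self.determine_interval(series)
        network = self.build_network(series, interval)
        order, models = self.obtain_order_and_models(network)
        candidates = self.determine_candidates(series, interval, order, models)
        return self.repair(series, interval, candidates)


# Toy series: feature "b" is assumed to equal 2 * "a"; row 1 is corrupted.
series = [{"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 99.0}, {"a": 3.0, "b": 6.0}]

apparatus = RepairApparatus(
    determine_interval=lambda s: (1, 2),                   # rows [1, 2), end exclusive
    build_network=lambda s, iv: {"a": [], "b": ["a"]},     # "b" depends on "a"
    obtain_order_and_models=lambda net: (["a", "b"], {"b": lambda row: 2.0 * row["a"]}),
    determine_candidates=lambda s, iv, order, models: {
        t: {"b": models["b"](s[t])} for t in range(*iv)
    },
    repair=lambda s, iv, cands: [{**row, **cands.get(t, {})} for t, row in enumerate(s)],
)
repaired = apparatus.run(series)
```

Keeping each module as an independent callable mirrors the module-per-step structure of the apparatus and lets any single step be replaced without touching the others.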
As an embodiment, the first determining module 201 is specifically configured to:
performing a compliance check on the constraints in the constraint compliance set to determine the violated constraints;
determining an abnormal time interval in the multivariate time series according to the violated constraints;
determining a time interval to be repaired in the multivariate time series according to the abnormal time interval;
wherein the time interval to be repaired is larger than the abnormal time interval.
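The steps above can be sketched in Python for a single variable. The two constraint types (a value range and a maximum change between consecutive points), the fixed `margin` used to widen the interval, and all function names are assumptions made for illustration; the application does not specify them in this passage.

```python
from typing import Callable, List, Optional, Tuple

Constraint = Callable[[List[float], int], bool]  # (values, t) -> satisfied?


def abnormal_interval(values: List[float],
                      constraints: List[Constraint]) -> Optional[Tuple[int, int]]:
    """Smallest index range [start, end] covering every violating timestamp."""
    bad = [t for t in range(len(values))
           if any(not c(values, t) for c in constraints)]
    return (bad[0], bad[-1]) if bad else None


def interval_to_repair(interval: Tuple[int, int], length: int,
                       margin: int = 2) -> Tuple[int, int]:
    """Widen the abnormal interval by `margin` points on each side (assumed rule)."""
    start, end = interval
    return max(0, start - margin), min(length - 1, end + margin)


def in_range(v: List[float], t: int) -> bool:
    """Hypothetical value-range constraint."""
    return 0.0 <= v[t] <= 100.0


def slow_change(v: List[float], t: int) -> bool:
    """Hypothetical speed constraint on consecutive points."""
    return t == 0 or abs(v[t] - v[t - 1]) <= 20.0


values = [10.0, 12.0, 11.0, 250.0, 13.0, 12.0, 14.0, 15.0]
abnormal = abnormal_interval(values, [in_range, slow_change])
to_repair = interval_to_repair(abnormal, len(values), margin=2)
```

Here the spike at index 3 violates both constraints and the drop at index 4 violates the speed constraint, so the abnormal interval is (3, 4) and the widened interval to be repaired is (1, 6).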
As an embodiment, the building module 202 is specifically configured to:
acquiring a variable feature set in the time interval to be repaired;
and establishing a directed graph of the variable feature dependent network according to the dependency relationships among the variable features in the variable feature set, thereby establishing the dependent network.
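A directed graph of this kind can be held in a simple mapping from each feature to the features it depends on. The sketch below assumes the dependency pairs are already known; how they are derived is not covered in this passage, and the function name and the example features are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_dependency_graph(features: Iterable[str],
                           dependencies: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return parents[v]: the features that v depends on (one edge per pair u -> v)."""
    parents: Dict[str, List[str]] = {f: [] for f in features}
    for source, target in dependencies:
        if source not in parents or target not in parents:
            raise KeyError(f"unknown feature in edge {source!r} -> {target!r}")
        parents[target].append(source)
    return parents


# Hypothetical feature set inside the time interval to be repaired.
features = ["voltage", "current", "power"]
dependencies = [("voltage", "power"), ("current", "power")]  # power depends on both
graph = build_dependency_graph(features, dependencies)
```

Storing parents rather than children makes it direct to look up which inputs a prediction model for a given feature would need.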
As an embodiment, to obtain the variable feature prediction order and the corresponding prediction model dictionary according to the dependent network, the obtaining module 203 is specifically configured to:
analyzing the variable feature-dependent network directed graph;
obtaining a variable feature prediction sequence according to the variable feature dependence network directed graph;
and learning a variable feature prediction model according to the variable feature dependence network directed graph, and obtaining a prediction model dictionary corresponding to the variable feature prediction sequence.
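One way to realize these steps is a topological sort of the directed graph followed by fitting one model per dependent feature. The sketch below uses Kahn's algorithm for the order and, as a deliberately simple stand-in for the learned prediction model, a least-squares line fitted on the sum of a feature's parents. The function names, the data layout, and the choice of model are assumptions, not the application's method.

```python
from collections import deque
from typing import Dict, List, Tuple


def prediction_order(parents: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: every feature appears after all features it depends on."""
    indegree = {f: len(ps) for f, ps in parents.items()}
    children: Dict[str, List[str]] = {f: [] for f in parents}
    for f, ps in parents.items():
        for p in ps:
            children[p].append(f)
    queue = deque(sorted(f for f, d in indegree.items() if d == 0))
    order = []
    while queue:
        f = queue.popleft()
        order.append(f)
        for child in children[f]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("dependency graph contains a cycle")
    return order


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # assumes xs are not all identical
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def learn_model_dictionary(parents, data):
    """Map each dependent feature to (slope, intercept) fitted on the sum of its parents."""
    models = {}
    for f, ps in parents.items():
        if ps:
            xs = [sum(row[p] for p in ps) for row in data]
            models[f] = fit_line(xs, [row[f] for row in data])
    return models


# Hypothetical clean training rows: b = 2a + 1 and c = a + b.
parents = {"a": [], "b": ["a"], "c": ["a", "b"]}
data = [{"a": 1.0, "b": 3.0, "c": 4.0},
        {"a": 2.0, "b": 5.0, "c": 7.0},
        {"a": 3.0, "b": 7.0, "c": 10.0}]
order = prediction_order(parents)
models = learn_model_dictionary(parents, data)
```

Features with no parents get no entry in the dictionary, since there is nothing to predict them from in this sketch.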
As an embodiment, to determine the candidate repair values corresponding to the variable features according to the variable feature prediction order and the prediction model dictionary, the second determining module 204 is specifically configured to:
acquiring machine learning model information corresponding to the variable feature prediction sequence from the prediction model dictionary;
and determining candidate repair values corresponding to the variable features by using the machine learning model information and the variable feature prediction sequence.
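The prediction step can be sketched as a single pass over the features in prediction order, looking up each feature's model in the dictionary. The model here is a hand-written callable rather than a trained machine learning model, and the relation power = voltage * current, like all names in the block, is a hypothetical example.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Model = Callable[[Row], float]


def candidate_repair_values(row: Row, order: List[str],
                            models: Dict[str, Model]) -> Row:
    """Predict features in dependency order so each model sees already-predicted inputs."""
    candidate = dict(row)  # copy, so the original observation is left untouched
    for feature in order:
        model = models.get(feature)
        if model is not None:  # features without a model keep their observed value
            candidate[feature] = model(candidate)
    return candidate


order = ["voltage", "current", "power"]
models = {"power": lambda r: r["voltage"] * r["current"]}  # hypothetical relation
observed = {"voltage": 220.0, "current": 2.0, "power": 9999.0}
candidate = candidate_repair_values(observed, order, models)
```

Following the prediction order matters when a predicted feature is itself an input to a later model, because the later model then works from the repaired value rather than the erroneous one.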
As an embodiment, the apparatus 200 for detecting and repairing abnormal data based on multi-constraint collaboration further includes: an evaluation module;
the evaluation module is used for evaluating the candidate repair values corresponding to the variable features by using a cleaning cost function according to constraint information and original data information;
and determining the best candidate repair value according to the evaluation result.
As an embodiment, to update the multivariate time series according to the candidate repair values, the repair module 205 is specifically configured to:
selecting the best candidate repair value from the candidate repair values;
and updating the best candidate repair value into the multivariate time sequence to realize the repair of time sequence data.
In summary, the present application first determines a time interval to be repaired in a multivariate time series, then establishes a dependent network according to the time interval to be repaired, and obtains a variable feature prediction order and a corresponding prediction model dictionary from the dependent network. Candidate repair values corresponding to the variable features are then determined according to the variable feature prediction order and the prediction model dictionary. Finally, the multivariate time series is updated according to the candidate repair values to repair the time series data. Because the multivariate time series is partitioned and repaired using the algorithm based on the constraints described above, the repair process introduces no new constraint violations, and erroneous data are repaired faster while the accuracy and reliability of the repair result are improved.
In addition, the present application also provides a time series data repair device based on multi-constraint collaboration, comprising:
a memory for storing a computer program;
a processor configured to implement, when executing the computer program, the steps of any of the multi-constraint collaboration-based time series data repair methods described above.
In addition, the present application further provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the multi-constraint collaboration-based time series data repair methods described above.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting and repairing abnormal data based on multi-constraint collaboration, the method comprising:
determining a time interval to be repaired in a multivariate time series;
establishing a dependent network according to the time interval to be repaired;
obtaining a variable feature prediction order and a corresponding prediction model dictionary according to the dependent network;
determining candidate repair values corresponding to the variable features according to the variable feature prediction order and the prediction model dictionary;
and updating the multivariate time series according to the candidate repair values to repair the time series data.
2. The method of claim 1, wherein the determining the time interval to repair in the multivariate time series comprises:
performing a compliance check on the constraints in the constraint compliance set to determine the violated constraints;
determining an abnormal time interval in the multivariate time series according to the violated constraints;
determining a time interval to be repaired in the multivariate time series according to the abnormal time interval;
wherein the time interval to be repaired is larger than the abnormal time interval.
3. The method of claim 1, wherein the establishing a dependent network according to the time interval to be repaired comprises:
acquiring a variable feature set in the time interval to be repaired;
and establishing a directed graph of the variable feature dependent network according to the dependency relationships among the variable features in the variable feature set, thereby establishing the dependent network.
4. A method according to claim 3, wherein said obtaining a variable feature prediction order and corresponding prediction model dictionary from said dependent network comprises:
analyzing the variable feature-dependent network directed graph;
obtaining a variable feature prediction sequence according to the variable feature dependence network directed graph;
and learning a variable feature prediction model according to the variable feature dependence network directed graph, and obtaining a prediction model dictionary corresponding to the variable feature prediction sequence.
5. The method of claim 1, wherein said determining candidate repair values for variable features from the variable feature prediction order and the prediction model dictionary comprises:
acquiring machine learning model information corresponding to the variable feature prediction sequence from the prediction model dictionary;
and determining candidate repair values corresponding to the variable features by using the machine learning model information and the variable feature prediction sequence.
6. The method of claim 1, wherein, before the updating of the multivariate time series according to the candidate repair values to repair the time series data, the method further comprises:
evaluating the candidate repair values corresponding to the variable features by using a cleaning cost function according to constraint information and original data information;
and determining the best candidate repair value according to the evaluation result.
7. The method of claim 6, wherein updating the multivariate time series based on the candidate repair values to effect repair of time series data comprises:
selecting the best candidate repair value from the candidate repair values;
and updating the best candidate repair value into the multivariate time sequence to realize the repair of time sequence data.
8. An apparatus for detecting and repairing abnormal data based on multi-constraint collaboration, comprising:
the first determining module is used for determining a time interval to be repaired in a multivariate time series;
the building module is used for building a dependent network according to the time interval to be repaired;
the obtaining module is used for obtaining a variable feature prediction sequence and a corresponding prediction model dictionary according to the dependence network;
the second determining module is used for determining candidate repair values corresponding to the variable features according to the variable feature prediction sequence and the prediction model dictionary;
and the repair module is used for updating the candidate repair value into the multivariate time sequence to realize the repair of the time sequence data.
9. An apparatus for detecting and repairing abnormal data based on multi-constraint collaboration, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for detecting and repairing abnormal data based on multi-constraint collaboration according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, wherein a computer program is stored on the readable storage medium, which when executed by a processor, implements the steps of the method for detecting and repairing abnormal data based on multi-constraint collaboration according to any one of claims 1 to 7.
CN202310156625.6A 2023-02-17 2023-02-17 Abnormal data detection and restoration method based on multi-constraint collaboration and related products Pending CN116304589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310156625.6A CN116304589A (en) 2023-02-17 2023-02-17 Abnormal data detection and restoration method based on multi-constraint collaboration and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310156625.6A CN116304589A (en) 2023-02-17 2023-02-17 Abnormal data detection and restoration method based on multi-constraint collaboration and related products

Publications (1)

Publication Number Publication Date
CN116304589A true CN116304589A (en) 2023-06-23

Family

ID=86825007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310156625.6A Pending CN116304589A (en) 2023-02-17 2023-02-17 Abnormal data detection and restoration method based on multi-constraint collaboration and related products

Country Status (1)

Country Link
CN (1) CN116304589A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination