CN114253959B - Data complement method based on dynamics principle and time difference - Google Patents

Info

Publication number
CN114253959B
Authority
CN
China
Prior art keywords
matrix
data
complement
observation
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111570817.9A
Other languages
Chinese (zh)
Other versions
CN114253959A (en)
Inventor
侯修全
冯守渤
马艺鸣
韩敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202111570817.9A
Publication of CN114253959A
Application granted
Publication of CN114253959B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Quality & Reliability (AREA)
  • Complex Calculations (AREA)

Abstract

A data completion method based on a dynamics principle and time difference comprises three parts: latent-dimension analysis of multivariate time series data, a completion model, and an iterative optimization algorithm. The latent-dimension analysis uses singular value decomposition to estimate the number of principal components of the data and thereby determines the dimension of the latent variables. The completion model is derived from the basic differential equation of a dynamical system; assuming that the data admit a low-dimensional representation and that the noise is sparse, it completes the data using time-difference regularization. The optimization algorithm solves the model iteratively using gradients and proximal operators. Addressing the missing values that commonly arise when sampling time series, the invention provides an effective completion method that takes latent information into account. It offers good completion accuracy, fast and simple operation, strong robustness and wide applicability; it suits time series fields that conform to the dynamics principle and addresses the unavoidable problems of data loss and noise.

Description

Data complement method based on dynamics principle and time difference
Technical Field
The invention belongs to the fields of control science and computer applications. It relates to a data completion method based on a dynamics principle and time difference, and provides an effective completion method for the data loss and noise problems that are common in multivariate time series sampling.
Background
A multivariate time series is a set of data sequences obtained by sampling at intervals. Analyzing and modeling it reveals the internal dynamics of a system, and it has broad application prospects in fields such as meteorology and economics. During sampling, a multivariate time series is affected by sensor errors, manual misoperation, inherent noise, unexpected faults and the like, so the data may contain missing values and noise. To ensure that the information is complete and usable, the data loss problem must be solved. Existing approaches fall broadly into four groups: filling methods based on statistical principles and neighboring data, interpolation methods based on fitted functions, optimization algorithms based on low-rank matrix decomposition, and deep learning methods based on neural networks. Matrix-decomposition-based methods assume that the time series are highly correlated; they try to find the spatial correlation among the multidimensional sequences, or adopt regularization based on graphs or temporal correlation to preserve temporal dependence, as represented by Temporal Regularized Matrix Factorization (TRMF). Neural-network-based methods are data driven; they include using autoencoders to express deep features of the data (e.g., VAEs), using generative adversarial networks to generate better data (e.g., GAIN), and using autoregressive models to learn the relationships between sampling points (e.g., RNN- and GRU-based methods).
To be useful in practice, a data completion method needs to preserve the original data, capture as much of the correlation among the data as possible, and keep the computational complexity of the algorithm low. As data dimension and volume grow and the demands on completion accuracy rise, simple filling and interpolation methods can no longer meet the requirements. Deep-learning-based methods have higher computational complexity, and with batch training it is difficult to guarantee that the overall characteristics of the data are captured and completed well. By contrast, completion methods for multivariate time series based on low-rank matrix decomposition have lower computational complexity. However, most existing methods are statistical in spirit: they assume only that the data are low rank and that outliers are sparse, and do not consider the inherent variation characteristics and dynamical laws of the multivariate time series. It is therefore necessary to establish a completion method that is simple, efficient, fast and lightweight, and that captures the dynamical characteristics of a multivariate time series.
Disclosure of Invention
To improve the completion accuracy and reduce the computational complexity, the invention establishes a multivariate time series completion method based on a dynamics principle and time difference. It targets three problems: matrix-decomposition-based methods make insufficient use of the characteristics of multivariate time series, traditional interpolation and fitting methods complete data poorly, and deep neural network methods train slowly. Starting from the dynamics principle that a multivariate time series satisfies, the invention builds a model that incorporates the basic idea of low-rank matrix decomposition, makes full use of the variation of latent features in the data, and uses time-difference regularization to account for the relationship between sampling points of the series, thereby filtering noise effectively and improving the completion accuracy.
To achieve these purposes, the invention adopts the following technical scheme:
A data completion method based on a dynamics principle and time difference comprises the following steps:
Step 1, acquire the multivariate time series to be completed. This is generally data obtained by actual sampling, such as temperature or pollutant-content measurements over time. Convert it into a two-dimensional matrix, the observation matrix $M \in \mathbb{R}^{n \times s}$, where the number of rows n is the number of sampling locations and the number of columns s is the number of sampling times. Each row of M is a one-dimensional time series.
Step 2, preprocess the observation matrix M before constructing the model: mark invalid elements and missing values in M as 0. To distinguish the missing and non-missing parts of M, first generate a mask matrix W with the same dimensions as M. If the element $M_{ij}$ in row i and column j of M is not missing, set the corresponding element $W_{ij}$ of the mask matrix to 1; if $M_{ij}$ is missing, set $W_{ij}$ to 0. To keep differing data scales from affecting the completion result, normalize each row of M, that is, all data from the same sampling location, according to formula (1).
Record the maximum and minimum values of each row of M for use in the inverse normalization of the completion result. The normalized observation matrix is still denoted M, and every later reference to M means the normalized observation matrix.
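Step 2 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the patent's formula (1), which is not reproduced in this text: missing or invalid entries are assumed to arrive as NaN, the normalization is assumed to be row-wise min-max scaling computed over the observed entries only, every row is assumed to contain at least one observed value, and the function names `build_mask_and_normalize` and `denormalize` are hypothetical.

```python
import numpy as np

def build_mask_and_normalize(M_raw):
    """Return the normalized observation matrix, the mask W, and per-row min/max."""
    M = np.asarray(M_raw, dtype=float)
    W = np.isfinite(M).astype(float)          # 1 = observed, 0 = missing or invalid
    observed = np.where(W == 1, M, np.nan)    # hide missing entries from min/max
    row_min = np.nanmin(observed, axis=1, keepdims=True)
    row_max = np.nanmax(observed, axis=1, keepdims=True)
    # Guard against constant rows, where max == min (assumption: scale by 1 then).
    span = np.where(row_max > row_min, row_max - row_min, 1.0)
    M_zeroed = np.where(W == 1, M, 0.0)       # missing values are marked as 0
    M_norm = W * (M_zeroed - row_min) / span  # missing entries stay 0 after scaling
    return M_norm, W, row_min, row_max

def denormalize(Y, row_min, row_max):
    """Undo the row-wise min-max scaling on a completed matrix Y."""
    return Y * (row_max - row_min) + row_min
```

Keeping `row_min` and `row_max` as column vectors lets the same arrays be broadcast again when the completed matrix is mapped back to the original scale.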
Step 3, to obtain the latent feature dimension d of the data, perform singular value decomposition (SVD) on the observation matrix M normalized in Step 2, which yields three matrices U, Σ and V, as shown in formula (2):
$$M = U \Sigma V^{\top} \tag{2}$$
where U and V are the left and right singular matrices, respectively, which are not used in subsequent operations. Σ is a diagonal matrix whose diagonal elements are the singular values $\sigma_1, \sigma_2, \ldots, \sigma_m$ of the observation matrix M, as shown in formula (3):
$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m) \tag{3}$$
The singular values are arranged from largest to smallest, i.e. $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \ldots \ge \sigma_m$, and the magnitude of a singular value represents the importance of the corresponding information. The number of singular values is m = min(n, s).
The latent feature dimension d is selected with reference to the following two criteria:
1) Cumulatively sum the singular values $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_m$ and find the first k singular values $\sigma_1, \sigma_2, \ldots, \sigma_k$ whose sum accounts for more than a chosen proportion (such as 90%) of the sum of all m singular values; this k is taken as the latent feature dimension d, as shown in formula (4):
$$\frac{\sum_{i=1}^{k} \sigma_i}{\sum_{i=1}^{m} \sigma_i} > 90\% \tag{4}$$
2) Find the first k singular values such that, from the (k+1)-th singular value onward, the singular values drop significantly in order of magnitude, for example $\sigma_{k+1}$ falls below 1/10 of $\sigma_k$; this k is taken as the latent feature dimension d, as shown in formula (5):
$$\sigma_{k+1} < \frac{1}{10}\,\sigma_k \tag{5}$$
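The two selection criteria of Step 3 can be sketched as follows. The function names `latent_dim_by_energy` and `latent_dim_by_gap` are hypothetical, and returning `None` when no order-of-magnitude drop exists is an illustrative choice, not something the text prescribes.

```python
import numpy as np

def latent_dim_by_energy(M, ratio=0.9):
    """Smallest k whose leading singular values sum to more than `ratio` of the total."""
    sigma = np.linalg.svd(M, compute_uv=False)   # singular values, largest first
    cumulative = np.cumsum(sigma) / np.sum(sigma)
    # side="right" returns the first index whose cumulative share strictly exceeds ratio.
    k = int(np.searchsorted(cumulative, ratio, side="right")) + 1
    return min(k, sigma.size)

def latent_dim_by_gap(M, factor=0.1):
    """First k with sigma_{k+1} < factor * sigma_k, or None if no such drop exists."""
    sigma = np.linalg.svd(M, compute_uv=False)
    drops = np.nonzero(sigma[1:] < factor * sigma[:-1])[0]
    return int(drops[0]) + 1 if drops.size else None
```

When the second function returns `None`, the singular values decay without a clear gap and only the cumulative-proportion criterion is informative.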
Step 4, after the latent feature dimension d is determined, determine the dimensions of each matrix of the completion model. The matrices used in the model are the reconstruction completion matrix $Y \in \mathbb{R}^{n \times s}$, the latent feature matrix $X \in \mathbb{R}^{d \times s}$, the feature mapping matrix $C \in \mathbb{R}^{n \times d}$, the noise matrix $S \in \mathbb{R}^{n \times s}$, and the observation matrix M and mask matrix W from Step 2.
The construction of the completion model comprises the following two substeps:
Step 4.1 To ensure that the reconstruction completion matrix Y does not alter the existing data of the observation matrix M while filtering noise, a relationship between Y and M must be established. The observation matrix M contains noise and missing values, and the noise exists only in the non-missing part of M; the reconstruction completion matrix Y contains no noise. Constraint equation (6) is introduced to express the relationship between Y and M.
The operator in equation (6) denotes the Hadamard (element-wise) product, and the equation states that the non-missing parts of the real observation matrix M and of the reconstruction completion matrix Y differ only by the sparse noise S. The noise S is removed from the observation matrix M in a low-rank-plus-sparse separation so that a valid reconstruction completion matrix Y is retained. The $l_1$ norm of the matrix measures the sparsity of the noise matrix S, as shown in formula (7):
$$\|S\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{s} |s_{ij}| \tag{7}$$
where $S = \{s_{ij}\}$ and $s_{ij}$ is an element of the noise matrix. The $l_1$ norm is defined as the sum of the absolute values of all elements of the matrix, and constraining the $l_1$ norm of S makes S sparse.
Introducing the $l_1$-norm constraint in Step 4.1 filters noise, which most existing completion methods cannot do.
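Constraint (6) itself is not reproduced in this text. One plausible form, stated here purely as an assumption consistent with the description above, writes the Hadamard product as $\odot$ and requires the observed entries of M to equal the observed entries of the clean reconstruction plus sparse noise:

```latex
% Assumed reconstruction of constraint (6); the symbol \odot and the exact
% arrangement of terms are illustrative, not taken from the source.
W \odot M = W \odot (Y + S)
```

Under this reading, entries where $W_{ij} = 0$ impose no constraint on Y, which is what leaves the model free to fill them in.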
Step 4.2 Following the idea of low-rank completion, Y is assumed to be a linear combination of the latent features X, so the index in formula (8), the squared Frobenius norm of the difference between Y and CX, measures the low-rank character of the data.
The squared Frobenius norm is defined as the sum of the squares of the absolute values of all elements of the matrix; the smaller its value, the smaller the difference between Y and CX. Formula (8) adopts the idea of low-rank matrix decomposition, and the reconstruction completion matrix Y is the completion result.
In addition, because the observed multivariate time series is the manifestation of the evolution of an actual complex system, the state of the system obeys a dynamical equation and its change is Lipschitz continuous, so each row x of the latent feature matrix X is smooth. To preserve this smoothness, and thus ensure that the completed result still conforms to the dynamical equation, the difference between adjacent time points of each row x of X is constrained by formula (9); the smaller its value, the smoother the change of each row of X.
In formula (9), $F_1$ is a first-order difference matrix whose entries are 1 and -1 only at positions near the diagonal and 0 elsewhere, as shown in formula (10).
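The time-difference term can be illustrated with a small NumPy sketch. Formulas (9) and (10) are not reproduced in this text, so the shape of $F_1$ (here s rows by s-1 columns, so that right-multiplying X takes differences along time) and the penalty form $\|X F_1\|_F^2$ are assumptions; the function names are hypothetical.

```python
import numpy as np

def first_order_difference_matrix(s):
    """s x (s-1) matrix F1 such that (X @ F1)[:, t] = X[:, t+1] - X[:, t]."""
    F1 = np.zeros((s, s - 1))
    cols = np.arange(s - 1)
    F1[cols, cols] = -1.0       # entry on the diagonal
    F1[cols + 1, cols] = 1.0    # entry just below the diagonal
    return F1

def smoothness_penalty(X):
    """Sum of squared differences between adjacent time points of every row of X."""
    F1 = first_order_difference_matrix(X.shape[1])
    return float(np.sum((X @ F1) ** 2))
```

A constant row contributes nothing to the penalty, while rapid jumps between neighboring samples are penalized quadratically, which is what pushes the latent features toward smooth trajectories.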
Combining formulas (7), (8), (9) and (10), the constraint relationship between the reconstruction completion matrix Y and the observation matrix M can be described by the constrained optimization problem shown in formula (11).
Here λ and β are the coefficients of the $l_1$-norm regularization and the time-difference regularization, respectively; they balance the low-rank character of the data, the proportion of noise, and the smoothness of the data's change.
In Step 4.2, time-difference regularization is introduced alongside the low-rank constraint on the reconstruction completion matrix Y, which improves the completion accuracy of the invention for multivariate time series.
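Formula (11) is not reproduced in this text. Assembling the three terms described above gives one plausible form, shown here as an assumption; the constant factors, the symbol $\odot$ for the Hadamard product, and the orientation of $F_1$ are illustrative.

```latex
% Assumed reconstruction of problem (11): low-rank fit + sparse noise + temporal smoothness.
\min_{Y,\,C,\,X,\,S} \; \|Y - CX\|_F^2 \;+\; \lambda \|S\|_1 \;+\; \beta \|X F_1\|_F^2
\quad \text{s.t.} \quad W \odot M = W \odot (Y + S)
```

Read this way, a larger λ makes the model more reluctant to attribute observed values to noise, and a larger β favors smoother latent trajectories at the expense of fit.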
Step 5, after the model is constructed, solve formula (11) by optimization. The constrained optimization problem can be solved with the augmented Lagrangian method. The corresponding augmented Lagrangian function is constructed as shown in formula (12).
In formula (12), ρ is the coefficient of the augmentation term and Λ is the augmented Lagrange multiplier matrix. The alternating direction method of multipliers (ADMM) is then used to find the optimal solution of the augmented Lagrangian function. First, initialize the hyperparameters λ and ρ, together with the reconstruction completion matrix Y, the mapping matrix C, the latent feature matrix X and the noise matrix S; Y, S, X and C are initialized randomly, and the multiplier matrix Λ is initialized to zero. Then set the number of iterations; in each iteration, formulas (13), (14), (15), (16) and (17) update Y, X, C, S and Λ, respectively.
After multiple iterations, the solution gradually approaches the overall optimum. The reconstruction completion matrix Y is then the completed multivariate time series, and inverse normalization of Y gives the final completion result.
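The update formulas (13) to (17) are not reproduced in this text, so the following is not the patent's algorithm. It is a minimal NumPy sketch of ADMM-style alternating updates for an assumed problem: minimize $\tfrac12\|Y - CX\|_F^2 + \lambda\|S\|_1 + \tfrac{\beta}{2}\|X F_1\|_F^2$ subject to $W \odot (Y + S - M) = 0$, with each update derived by minimizing the corresponding augmented Lagrangian over one block. The function names, the 1/2 scaling, the small ridge term and the default hyperparameters are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * l1 norm (element-wise shrinkage)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def complete_admm(M, W, d, lam=1.0, rho=1.0, beta=0.1, n_iter=200, seed=0):
    """Sketch of the Y, X, C, S, Lambda update cycle for the assumed problem."""
    rng = np.random.default_rng(seed)
    n, s = M.shape
    # Random initialization for Y, S, C, X and zeros for the multiplier matrix.
    Y = rng.random((n, s))
    S = 0.01 * rng.standard_normal((n, s))
    C = rng.random((n, d))
    X = rng.random((d, s))
    Lam = np.zeros((n, s))
    # First-order difference matrix: (X @ F1)[:, t] = X[:, t+1] - X[:, t].
    F1 = np.zeros((s, s - 1))
    cols = np.arange(s - 1)
    F1[cols, cols] = -1.0
    F1[cols + 1, cols] = 1.0
    # Eigendecomposition of beta * F1 F1^T, reused to solve the X subproblem.
    mu, Q = np.linalg.eigh(beta * (F1 @ F1.T))
    mu = np.clip(mu, 0.0, None)
    eye_d = np.eye(d)
    ridge = 1e-8 * eye_d  # keeps the small linear systems well conditioned
    for _ in range(n_iter):
        # Y-step: element-wise minimizer of the augmented Lagrangian.
        Y = (C @ X + W * (rho * (M - S) - Lam)) / (1.0 + rho * W)
        # X-step: solve C^T C X + beta X F1 F1^T = C^T Y column by column
        # in the eigenbasis Q of F1 F1^T (a Sylvester equation).
        A = C.T @ C + ridge
        R = (C.T @ Y) @ Q
        lhs = A[None, :, :] + mu[:, None, None] * eye_d[None, :, :]
        X_rot = np.linalg.solve(lhs, R.T[:, :, None])[:, :, 0].T
        X = X_rot @ Q.T
        # C-step: least-squares fit of Y by C X.
        C = np.linalg.solve(X @ X.T + ridge, X @ Y.T).T
        # S-step: soft-thresholding on observed entries, zero elsewhere.
        S = W * soft_threshold(M - Y - Lam / rho, lam / rho)
        # Multiplier step: ascent on the constraint residual.
        Lam = Lam + rho * W * (Y + S - M)
    return Y, C, X, S
```

The returned `Y` is on the normalized scale; mapping it back with the recorded row minima and maxima would give the final completion. Because the assumed problem is not convex in C and X jointly, the result depends on the initialization and the sketch carries no convergence guarantee.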
The beneficial effects of the invention are as follows:
The invention provides a multivariate time series completion method that addresses the high computational complexity and poor completion accuracy of current methods by constructing a matrix decomposition model based on a dynamics principle and time difference. Compared with existing methods, it takes full account of temporal dependence and data correlation, and has low computational complexity and good completion accuracy. In addition, the method is implemented with matrix operations alone, so it is lightweight and flexible, needs few dependencies, can be conveniently integrated into various data processing pipelines, and has wide application scenarios.
Drawings
FIG. 1 is a flow chart of a data complement method based on the dynamics principle and time difference.
Fig. 2 is a graph of Beijing temperature data (excluding missing values).
Fig. 3 (a) is a Beijing temperature data scatter plot (containing 30% missing).
Fig. 3 (b) is a Beijing temperature data line graph (with 30% missing).
Fig. 4 is a graph of the Beijing temperature data complement results.
Fig. 5 is a Beijing temperature data latent feature map.
Detailed Description
To make the problem addressed, the scheme adopted and the effect achieved by the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and an example. The specific embodiment described here merely illustrates the invention and does not limit it. For convenience of description, the drawings show only some, not all, of the matters related to the invention.
1. Data and operating environment
This example uses part of the data from the Beijing air temperature monitoring stations, with 30% of the data removed at random artificially, as shown in Fig. 2 and Fig. 3.
Any computer with a programming language or computing environment capable of matrix operations can read a given multivariate time series and carry out the steps of the method, as shown in Fig. 1.
2. Implementation steps
Step 1: This example uses the Beijing temperature data, which covers 18 locations in Beijing, such as Chaoyang, Mentougou and Pinggu, sampled at 1 h intervals from January 30, 2017 to January 30, 2018. The data at the first 1000 sampling times are selected, so the data dimension is 18×1000, with n=18 and s=1000, as shown in formula (18).
To make it easy to compare the completion result with the true values, the selected data set itself contains no missing values; 30% of the entries are set as missing at random, the missing entries are set to 0, and the observation matrix M is obtained, as shown in formula (19).
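The artificial removal of 30% of the entries can be sketched as follows. The text does not say how the random positions were drawn, so removing an exact count of entries chosen uniformly without replacement is an assumption, and the function name `drop_entries_at_random` is hypothetical.

```python
import numpy as np

def drop_entries_at_random(data, missing_rate=0.3, seed=0):
    """Zero out a fixed fraction of entries; return the observation matrix and its mask."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_missing = int(round(missing_rate * data.size))
    W = np.ones(data.size)
    # Choose exactly n_missing distinct positions to mark as missing.
    W[rng.choice(data.size, size=n_missing, replace=False)] = 0.0
    W = W.reshape(data.shape)
    return W * data, W   # missing entries are set to 0
```

Keeping the complete data alongside the returned mask makes it possible to score the completion afterwards on exactly the entries that were removed.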
Step 2: In this example, M contains no invalid elements such as NaN, so no replacement with 0 is needed. Only the mask matrix W is generated from the observation matrix M. W has the same dimensions as M: if an element of M is missing, the element at the corresponding position of W is 0; if it is not missing, the corresponding element of W is 1. The mask matrix W generated in this example is shown in formula (20).
Each row of the observation matrix M is then normalized according to formula (1). The normalized observation matrix M in this example is shown in formula (21).
The maximum and minimum values of each row of M before normalization are recorded so that the result can be inversely normalized. The recorded maxima $M_{max}$ and minima $M_{min}$ each contain 18 values, as shown in formula (22).
The observation matrix M referred to in the later steps is always the normalized observation matrix.
Step 3: Determine the latent feature dimension d of the data. Perform SVD on the observation matrix M normalized in Step 2 to obtain three matrices U, Σ and V. U and V are the left and right singular matrices, respectively; Σ is a diagonal matrix whose diagonal elements are the singular values $\sigma_1, \sigma_2, \ldots, \sigma_{18}$ of M, arranged from largest to smallest. The Σ matrix obtained in this example is shown in formula (23).
The latent feature dimension d is then determined from Σ in order to construct the matrix-decomposition-based model. d is selected with reference to the following two criteria:
1) Cumulatively sum the singular values $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_m$ and find the first k singular values $\sigma_1, \sigma_2, \ldots, \sigma_k$ whose sum accounts for more than a chosen proportion (e.g., 90%) of the sum of all m singular values; this k is the latent feature dimension d, as shown in formula (4). In this example, setting d=15 makes the first d singular values account for more than 90% of the total, so the latent feature dimension d is determined to be 15.
2) Find the first k singular values such that, from the (k+1)-th singular value onward, the singular values drop significantly in order of magnitude, e.g., $\sigma_{k+1}$ falls below 1/10 of $\sigma_k$; this k is the latent feature dimension d, as shown in formula (5). From the singular values of M in formula (23), the largest singular value is 593.83 and the smallest is 81.52, so there is no significant change in order of magnitude and this criterion does not apply.
Combining the two criteria, the latent feature dimension d of the data in this example is determined to be 15.
Step 4: After the latent feature dimension d is determined, the reconstruction completion matrix $Y \in \mathbb{R}^{18 \times 1000}$, the mapping matrix $C \in \mathbb{R}^{18 \times 15}$, the latent feature matrix $X \in \mathbb{R}^{15 \times 1000}$, the noise matrix $S \in \mathbb{R}^{18 \times 1000}$ and the augmented Lagrange multiplier matrix $\Lambda \in \mathbb{R}^{18 \times 1000}$ are initialized; Y, S, X and C are initialized randomly, and Λ is initialized to zero. The initial hyperparameters in formula (12) are set as follows: the $l_1$-norm regularization coefficient λ=39.0, the augmentation term coefficient ρ=36.17, and the time-difference regularization coefficient β=3.82.
Step 5: The number of iterations is set to 150. In each iteration, the matrices Y, X, C, S and Λ are updated using formulas (13), (14), (15), (16) and (17), respectively. Over the iterations the model gradually converges, and when the iterations are complete the reconstruction completion matrix Y is the completed result. Y is inversely normalized using the normalization parameters recorded in formula (22) to obtain the final completion result. The execution of the overall optimization algorithm is shown in formula (24).
Fig. 4 shows the reconstruction completion matrix Y, and Fig. 5 shows the latent features X in the model. Fig. 5 makes clear that some of the latent features vary consistently with the observation matrix, which indicates that the model is effective.
Finally, it should be noted that: the above examples are given solely for the purpose of illustrating the embodiments of the present application and are not intended to limit the scope of the present application, since it will be apparent to those skilled in the art after reading the present disclosure that various changes and modifications can be made therein without departing from the spirit of the application, and that various equivalent modifications of the application fall within the scope of the appended claims.

Claims (2)

1. A data completion method based on a dynamics principle and time difference, characterized by comprising the following steps:
Step 1, taking temperature data obtained by actual sampling as the multivariate time series to be completed, and converting it into a two-dimensional matrix represented by the observation matrix $M \in \mathbb{R}^{n \times s}$, where the number of rows n and the number of columns s represent the number of sampling locations and the number of sampling times, respectively, and each row of M is a one-dimensional time series;
Step 2, preprocessing the observation matrix M before constructing the model, and marking invalid elements and missing values in M as 0; to distinguish the missing and non-missing parts of M, generating a corresponding mask matrix W from M; the dimensions of W are the same as those of M: if the element $M_{ij}$ in row i and column j of M is not missing, the element $W_{ij}$ in row i and column j of W is set to 1; if $M_{ij}$ is missing, $W_{ij}$ is set to 0;
Step 3, to obtain the latent feature dimension d of the data, performing SVD on the observation matrix M normalized in Step 2 to obtain three matrices U, Σ and V, as shown in formula (2):
$$M = U \Sigma V^{\top} \tag{2}$$
where U and V are the left and right singular matrices, respectively, which are not used in subsequent operations; Σ is a diagonal matrix whose diagonal elements are the singular values $\sigma_1, \sigma_2, \ldots, \sigma_m$ of the observation matrix M, as shown in formula (3):
$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m) \tag{3}$$
the singular values are arranged from largest to smallest, i.e. $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \ldots \ge \sigma_m$, and the number of singular values is m = min(n, s);
the latent feature dimension d is selected with reference to the following two criteria:
1) cumulatively summing the singular values $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_m$ and finding the first k singular values $\sigma_1, \sigma_2, \ldots, \sigma_k$ whose sum accounts for more than 90% of the sum of all m singular values, this k being taken as the latent feature dimension d, as shown in formula (4):
$$\frac{\sum_{i=1}^{k} \sigma_i}{\sum_{i=1}^{m} \sigma_i} > 90\% \tag{4}$$
2) finding the first k singular values such that, from the (k+1)-th singular value onward, the singular values drop significantly in order of magnitude, e.g., $\sigma_{k+1}$ falls below 1/10 of $\sigma_k$, this k being taken as the latent feature dimension d, as shown in formula (5):
$$\sigma_{k+1} < \frac{1}{10}\,\sigma_k \tag{5}$$
Step 4, after the latent feature dimension d is determined, determining the dimensions of each matrix of the completion model; the matrices used in the model are the reconstruction completion matrix $Y \in \mathbb{R}^{n \times s}$, the latent feature matrix $X \in \mathbb{R}^{d \times s}$, the feature mapping matrix $C \in \mathbb{R}^{n \times d}$, the noise matrix $S \in \mathbb{R}^{n \times s}$, and the observation matrix M and mask matrix W from Step 2;
the construction of the completion model comprises the following two substeps:
Step 4.1, to ensure that the reconstruction completion matrix Y does not alter the existing data of the observation matrix M while filtering noise, a relationship between Y and M must be established; the observation matrix M contains noise and missing values, the noise exists only in the non-missing part of M, and the reconstruction completion matrix Y contains no noise; constraint equation (6) is introduced to express the relationship between Y and M;
the operator in equation (6) denotes the Hadamard (element-wise) product, and the equation states that the non-missing parts of the real observation matrix M and of the reconstruction completion matrix Y differ only by the sparse noise S; the noise S is removed from the observation matrix M in a low-rank-plus-sparse separation so that a valid reconstruction completion matrix Y is retained; the $l_1$ norm of the matrix measures the sparsity of the noise matrix S, as shown in formula (7):
$$\|S\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{s} |s_{ij}| \tag{7}$$
where $S = \{s_{ij}\}$ and $s_{ij}$ is an element of the noise matrix; the $l_1$ norm is defined as the sum of the absolute values of all elements of the matrix, and constraining the $l_1$ norm of S makes S sparse;
step 4.2 uses the idea of low rank complement, assuming Y is a linear combination of potential features X, so index (8) is used to measure the low rank characteristics of the data:
Wherein, F norm is defined as the square sum of absolute values of all elements of the matrix, and the smaller the value is, the smaller the difference between Y and CX is represented; the formula (8) adopts the idea of low-rank matrix decomposition, and the reconstructed complement matrix Y is the complement result;
To ensure such smooth features of X, it is ensured that the result after completion of the latent feature X still conforms to the kinetic equation, the difference between adjacent moments of each row of data X of the latent feature X is constrained by equation (9), the smaller the value of which indicates the smoother the change of each row of data X of the latent feature X:
where F1 is a first-order difference matrix whose only nonzero entries, 1 and -1, lie at positions adjacent to the diagonal, all remaining entries being 0;
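A first-order difference matrix of this kind can be built and checked in a few lines. This is a sketch under the assumption that X holds one latent feature per row and one time step per column, so that F1 has shape T x (T-1) and multiplies X from the right; the function name and example values are hypothetical.

```python
import numpy as np

def first_difference_matrix(T):
    """Return a T x (T-1) matrix with -1 on the diagonal and +1 just below it,
    so that (X @ F1)[:, t] == X[:, t + 1] - X[:, t]."""
    F1 = np.zeros((T, T - 1))
    idx = np.arange(T - 1)
    F1[idx, idx] = -1.0
    F1[idx + 1, idx] = 1.0
    return F1

# Two hypothetical latent features over four time steps.
X = np.array([[1.0, 2.0, 4.0, 7.0],
              [0.0, 0.0, 0.0, 0.0]])

F1 = first_difference_matrix(X.shape[1])
D = X @ F1                       # differences between adjacent time steps
penalty = float(np.sum(D ** 2))  # squared F norm of the differences
print(D)
print("smoothness penalty:", penalty)
```

A constant row contributes nothing to the penalty, while a rapidly changing row contributes a large value, which is why minimizing this term favours smooth latent trajectories.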
Combining the above with the constraint relating the reconstruction complement matrix Y to the observation matrix M, the constrained optimization problem is described by equation (11):
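Combining the low-rank, sparsity and time-difference terms described in step 4 with the constraint between Y and M, equation (11) plausibly reads as follows. This is a reconstruction from the surrounding text rather than the original formula: $\Omega$ is an assumed 0/1 indicator of observed entries, and the placement of constant factors may differ.

```latex
\min_{Y,\,C,\,X,\,S} \;
\| Y - C X \|_F^2 + \lambda \| S \|_1 + \beta \| X F_1 \|_F^2
\quad \text{s.t.} \quad
\Omega \odot M = \Omega \odot (Y + S)
```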
where λ and β are the coefficients of the l1-norm regularization and the time-difference regularization, respectively, and balance the emphasis placed during completion on the low-rank property of the data, the proportion of noise, and the smoothness of data variation;
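To make the role of λ and β concrete, the sketch below evaluates an objective of the kind described in step 4 together with the residual of the constraint on observed entries. The exact form of the objective, the NaN encoding of missing values and all names (`objective`, `constraint_residual`, `lam`, `beta`, `mask`) are assumptions for illustration.

```python
import numpy as np

def objective(Y, C, X, S, lam, beta):
    """Assumed objective: ||Y - CX||_F^2 + lam * ||S||_1 + beta * ||X F1||_F^2."""
    low_rank = np.sum((Y - C @ X) ** 2)
    sparsity = np.abs(S).sum()
    smoothness = np.sum(np.diff(X, axis=1) ** 2)  # adjacent-time differences per row
    return float(low_rank + lam * sparsity + beta * smoothness)

def constraint_residual(M, Y, S, mask):
    """M - Y - S on observed entries, zero on missing ones."""
    return np.where(mask, M - Y - S, 0.0)

# Hypothetical rank-1 example with one noisy entry and one missing entry.
C = np.array([[1.0], [2.0]])
X = np.array([[1.0, 2.0, 3.0]])
Y = C @ X
S = np.zeros_like(Y)
S[0, 1] = 0.5
M = Y + S
M[1, 2] = np.nan                  # missing observation
mask = ~np.isnan(M)

value = objective(Y, C, X, S, lam=0.1, beta=0.5)
residual = constraint_residual(M, Y, S, mask)
print("objective:", value)
print("max constraint violation:", np.abs(residual).max())
```

Raising `lam` makes it more expensive to explain observations as noise, while raising `beta` penalizes rapid changes in the latent features more strongly.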
Step 5: after the model is constructed, equation (11) is solved by optimization; the constrained optimization problem is solved using the augmented Lagrangian multiplier method, and the corresponding augmented Lagrangian function is constructed as shown in equation (12):
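For an equality constraint of this kind, the standard augmented Lagrangian adds a multiplier term and a quadratic penalty on the constraint residual. Assuming the objective is the sum of the low-rank, l1 and time-difference terms described in step 4, and writing $\Omega$ for an assumed 0/1 indicator of observed entries, equation (12) would take a form such as:

```latex
\mathcal{L}_{\rho}(Y, C, X, S, \Lambda) =
\| Y - C X \|_F^2 + \lambda \| S \|_1 + \beta \| X F_1 \|_F^2
+ \left\langle \Lambda,\; \Omega \odot (M - Y - S) \right\rangle
+ \frac{\rho}{2} \left\| \Omega \odot (M - Y - S) \right\|_F^2
```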
where ρ is the coefficient of the augmentation term and Λ is the augmented Lagrangian multiplier matrix; the optimal solution of the augmented Lagrangian function is found using the alternating direction method of multipliers;
First, the hyperparameters λ and ρ are initialized, together with the reconstruction complement matrix Y, the mapping matrix C, the latent feature matrix X and the noise matrix S, where Y, S, X and C are initialized randomly and the augmented Lagrangian multiplier matrix Λ is initialized to zero; the number of iterations is set, and in each iteration Y, X, C, S and Λ are updated using the iterative update equations (13), (14), (15), (16) and (17), respectively;
After multiple iterations the solution gradually approaches the overall optimum; the reconstruction complement matrix Y is the completed multivariate time series, and inverse normalization of the obtained matrix Y yields the final multivariate time series completion result.
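The iteration described above can be sketched in NumPy. Equations (13) to (17) are not reproduced in this text, so the closed-form updates below are derived here from an assumed objective, ||Y - CX||_F^2 + lam * ||S||_1 + beta * ||X F1||_F^2 subject to Y + S agreeing with M on observed entries, and may differ from the original formulas. The function name `complete`, the NaN encoding of missing values, the small ridge `eps` and all default values are likewise assumptions.

```python
import numpy as np

def soft_threshold(R, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)

def complete(M, rank=2, lam=0.1, beta=1.0, rho=1.0, n_iter=200, eps=1e-8, seed=0):
    """ADMM-style sketch; missing entries of M are encoded as NaN (assumption)."""
    rng = np.random.default_rng(seed)
    n, T = M.shape
    mask = ~np.isnan(M)
    M0 = np.where(mask, M, 0.0)
    F1 = np.diff(np.eye(T), axis=1)              # T x (T-1) first-order differences
    d, U = np.linalg.eigh(beta * (F1 @ F1.T))    # fixed across iterations

    # Random initialization of Y, S, X, C; zero initialization of the multiplier.
    Y = rng.random((n, T))
    S = 0.01 * rng.standard_normal((n, T))
    X = rng.random((rank, T))
    C = rng.random((n, rank))
    Lam = np.zeros((n, T))

    for _ in range(n_iter):
        P = C @ X
        # Y: closed-form minimizer on observed entries, low-rank fit elsewhere.
        Y = np.where(mask, (2 * P + Lam + rho * (M0 - S)) / (2 + rho), P)
        # X: solve C^T C X + beta X F1 F1^T = C^T Y via two eigendecompositions.
        a, V = np.linalg.eigh(C.T @ C)
        Q = V.T @ (C.T @ Y) @ U
        X = V @ (Q / (a[:, None] + d[None, :] + eps)) @ U.T
        # C: ridge-stabilized least squares for Y ~ C X.
        G = X @ X.T + eps * np.eye(rank)
        C = np.linalg.solve(G, X @ Y.T).T
        # S: soft-thresholding on observed entries, zero on missing ones.
        S = np.where(mask, soft_threshold(M0 - Y + Lam / rho, lam / rho), 0.0)
        # Multiplier: dual ascent on the constraint residual.
        Lam = Lam + rho * np.where(mask, M0 - Y - S, 0.0)
    return Y, C, X, S

# Hypothetical smooth rank-2 data with one spike and about 20% missing entries.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 40)
M = rng.random((6, 2)) @ np.vstack([np.sin(t), np.cos(t)])
M[0, 5] += 3.0
M[rng.random(M.shape) < 0.2] = np.nan

Y, C, X, S = complete(M)
print("completed matrix shape:", Y.shape)
```

Because the factorization CX is bilinear, the problem is nonconvex and this sketch carries no convergence guarantee; in practice the rows of M would first be normalized as described in claim 2, and Y would be mapped back to the original scale afterwards.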
2. The data complement method based on dynamics principle and time difference according to claim 1, wherein in step 2, to prevent differing data scales from affecting the completion result, each row of the observation matrix M, i.e. all data from the same sampling location, is further normalized as shown in equation (1):
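Since the maximum and minimum of each row are recorded for the inverse transform, equation (1) is most plausibly row-wise min-max scaling. The form below is an assumption made on that basis, with the extrema taken over the observed entries of row i.

```latex
M_{ij} \leftarrow \frac{M_{ij} - \min_{k} M_{ik}}{\max_{k} M_{ik} - \min_{k} M_{ik}}
```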
The maximum and minimum of each row of the observation matrix M are recorded for use in the inverse normalization of the completion result; the normalized observation matrix is still denoted M, and M in all operations refers to the normalized observation matrix.
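The normalization and its inverse can be sketched as follows, assuming row-wise min-max scaling and missing values encoded as NaN. Both assumptions, as well as the function names, are illustrative and not taken from the source.

```python
import numpy as np

def normalize_rows(M):
    """Scale each row to [0, 1] using its observed minimum and maximum."""
    lo = np.nanmin(M, axis=1, keepdims=True)
    hi = np.nanmax(M, axis=1, keepdims=True)
    scale = np.where(hi > lo, hi - lo, 1.0)   # guard against constant rows
    return (M - lo) / scale, lo, scale

def denormalize_rows(Y, lo, scale):
    """Map a completed matrix back to the original scale of each row."""
    return Y * scale + lo

# Two hypothetical sampling locations with one missing value each.
M = np.array([[10.0, np.nan, 30.0, 20.0],
              [1.0, 2.0, np.nan, 3.0]])

M_norm, lo, scale = normalize_rows(M)
restored = denormalize_rows(M_norm, lo, scale)
print(M_norm)
```

Keeping `lo` and `scale` alongside the normalized matrix is what allows the completed matrix Y to be returned in the original units.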
CN202111570817.9A 2021-12-21 2021-12-21 Data complement method based on dynamics principle and time difference Active CN114253959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111570817.9A CN114253959B (en) 2021-12-21 2021-12-21 Data complement method based on dynamics principle and time difference

Publications (2)

Publication Number Publication Date
CN114253959A CN114253959A (en) 2022-03-29
CN114253959B true CN114253959B (en) 2024-07-12

Family

ID=80793721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111570817.9A Active CN114253959B (en) 2021-12-21 2021-12-21 Data complement method based on dynamics principle and time difference

Country Status (1)

Country Link
CN (1) CN114253959B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115145906B (en) * 2022-09-02 2023-01-03 之江实验室 Preprocessing and completion method for structured data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133930A (en) * 2017-04-30 2017-09-05 天津大学 Ranks missing image fill method with rarefaction representation is rebuild based on low-rank matrix
CN108010320A (en) * 2017-12-21 2018-05-08 北京工业大学 A kind of complementing method of the road grid traffic data based on adaptive space-time constraint low-rank algorithm

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8935308B2 (en) * 2012-01-20 2015-01-13 Mitsubishi Electric Research Laboratories, Inc. Method for recovering low-rank matrices and subspaces from data in high-dimensional matrices
CN108492561B (en) * 2018-04-04 2020-06-19 北京工业大学 Road network traffic state space-time characteristic analysis method based on matrix decomposition
CN109241491A (en) * 2018-07-28 2019-01-18 天津大学 The structural missing fill method of tensor based on joint low-rank and rarefaction representation

Also Published As

Publication number Publication date
CN114253959A (en) 2022-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant