CN114253959B

CN114253959B - Data completion method based on dynamics principle and time difference

Info

Publication number: CN114253959B
Application number: CN202111570817.9A
Authority: CN
Inventors: 侯修全; 冯守渤; 马艺鸣; 韩敏
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2021-12-21
Filing date: 2021-12-21
Publication date: 2024-07-12
Anticipated expiration: 2041-12-21
Also published as: CN114253959A

Abstract

A data completion method based on dynamics principles and time differences comprises three parts: latent dimension analysis of the multivariate time series data, a completion model, and an iterative optimization algorithm. In operation, the latent dimension analysis uses singular value decomposition to estimate the number of principal components of the data and thereby determine the dimension of the latent variables. The completion model is derived from the basic differential equations of a dynamical system; assuming that the data can be represented by a low-dimensional structure plus sparse noise, it completes the data using time-difference regularization. The optimization algorithm solves the model iteratively using quantities such as gradients and proximal operators. Addressing the missing values that commonly arise when sampling time series, the invention provides an effective data completion method that takes latent information into account. It offers good completion accuracy, fast and simple operation, strong robustness and wide applicability, suits time series from any field that obeys dynamics principles, and addresses the unavoidable problems of missing data and noise.

Description

Data completion method based on dynamics principle and time difference

Technical Field

The invention belongs to the fields of control science and computer applications and relates to a data completion method based on dynamics principles and time differences. It provides an effective completion method for the missing data and noise that commonly arise when sampling multivariate time series.

Background

A multivariate time series is a set of data sequences obtained by sampling at intervals. Analyzing and modeling it reveals the internal laws governing how a system changes, and it has broad application prospects in fields such as meteorology and economics. During sampling, a multivariate time series is affected by sensor errors, manual misoperation, inherent noise, unexpected faults and the like, so the data may contain missing values and noise. To keep the information complete and usable, the missing-data problem must be solved. Existing approaches fall broadly into four groups: filling methods based on statistical principles and neighboring data, interpolation methods based on fitted functions, optimization algorithms based on low-rank matrix decomposition, and deep learning methods based on neural networks. Matrix-decomposition methods assume that the time series are highly correlated; they either try to find the spatial correlation among the multidimensional sequences or use graph-based or temporal regularization to preserve time dependence, a representative example being Temporal Regularized Matrix Factorization (TRMF). Neural-network methods are data-driven: they use autoencoders to express deep features of the data (e.g., VAEs), use generative adversarial networks to generate better data (e.g., GAIN), or use autoregressive models to learn the relationships between sampling points (e.g., RNN- and GRU-based methods).

To be both novel and fit for purpose, a data completion scheme must leave the original data unaltered, capture as much of the correlation among the data as possible, and keep the computational complexity of the algorithm low. As data dimension and volume grow and expectations for completion accuracy rise, simple filling and interpolation methods no longer suffice. Deep learning methods have high computational complexity, and with batch training it is hard to guarantee that they capture the global characteristics of the data or complete it well. By contrast, completion methods for multivariate time series based on low-rank matrix decomposition have lower computational complexity. However, most existing methods of this kind are statistical in spirit: they assume only that the data are low-rank and that outliers are sparse, and they do not take into account the inherent variation characteristics and dynamics of multivariate time series. It is therefore necessary to establish a completion method that is simple, efficient, fast and lightweight and that captures the dynamic characteristics of multivariate time series.

Disclosure of Invention

To improve completion accuracy and reduce computational complexity, the invention establishes a multivariate time series completion method based on dynamics principles and time differences. It targets three problems: matrix-decomposition methods make insufficient use of the characteristics of multivariate time series, traditional interpolation and fitting methods complete data poorly, and deep neural network methods train slowly. Starting from the dynamics principles that multivariate time series satisfy, the invention builds a model on the basic idea of low-rank matrix decomposition, makes full use of the variation of latent features in the data, and uses time-difference regularization to account for the relationship between sampling points, thereby filtering noise effectively and improving completion accuracy.

To achieve these aims, the invention adopts the following technical scheme:

A data completion method based on dynamics principles and time differences comprises the following steps:

Step 1, acquire the multivariate time series to be completed. This is generally data obtained by actual sampling, such as the variation of temperature or pollutant content over time. Convert it into a two-dimensional matrix, the observation matrix M ∈ R^(n×s), where the number of rows n is the number of sampling locations and the number of columns s is the number of sampling times. Each row of M is a one-dimensional time series.

Step 2, before constructing the model, preprocess the observation matrix M by setting its invalid elements and missing values to 0. To distinguish the missing and non-missing parts of M, first generate a mask matrix W from M. W has the same dimensions as M: if the element M_ij in row i and column j of M is not missing, the element W_ij in row i and column j of W is set to 1; if M_ij is missing, W_ij is set to 0. To prevent differing data scales from affecting the completion, normalize the data in each row of M, that is, all data from the same sampling location, as shown in formula (1):

M_ij ← (M_ij − min_i) / (max_i − min_i) (1)

where max_i and min_i are the maximum and minimum values of row i of M. These values are recorded for each row so that the completion result can be inverse-normalized. The normalized observation matrix is still denoted M, and every later reference to M means the normalized observation matrix.
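The preprocessing in step 2 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: it assumes missing entries arrive as NaN, computes each row's minimum and maximum over observed entries only, and the function names `build_mask`, `normalize_rows` and `denormalize_rows` are hypothetical.

```python
import numpy as np

def build_mask(M_raw):
    """Mask W: 1 where an entry is observed, 0 where it is missing (NaN here)."""
    return (~np.isnan(M_raw)).astype(float)

def normalize_rows(M_raw, W):
    """Row-wise min-max normalization over observed entries; missing entries become 0."""
    observed = W.astype(bool)
    row_min = np.where(observed, M_raw, np.inf).min(axis=1, keepdims=True)
    row_max = np.where(observed, M_raw, -np.inf).max(axis=1, keepdims=True)
    # Guard against constant rows, where max == min (assumption: leave them unscaled).
    span = np.where(row_max > row_min, row_max - row_min, 1.0)
    M = np.where(observed, (M_raw - row_min) / span, 0.0)
    return M, row_min, row_max

def denormalize_rows(Y, row_min, row_max):
    """Inverse of normalize_rows, applied to a completed matrix Y."""
    span = np.where(row_max > row_min, row_max - row_min, 1.0)
    return Y * span + row_min
```

Keeping `row_min` and `row_max` alongside the normalized matrix mirrors the text's requirement that the per-row extremes be recorded for the final inverse normalization.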

Step 3, to obtain the latent feature dimension d of the data, perform singular value decomposition (SVD) on the observation matrix M normalized in step 2, obtaining three matrices U, Σ and V, as shown in formula (2):

M = UΣVᵀ (2)

where U and V are the left and right singular matrices, which are not used in subsequent operations. Σ is a diagonal matrix whose diagonal elements are the singular values σ₁, σ₂, …, σ_m of the observation matrix M, as shown in equation (3):

Σ = diag(σ₁, σ₂, …, σ_m) (3)

The singular values are arranged from largest to smallest, i.e. σ₁ > σ₂ > σ₃ > ... > σ_m, and the size of a singular value indicates the importance of the information it carries. The number of singular values is m = min(n, s).

The latent feature dimension d is selected according to the following two criteria:

1) Cumulatively sum the singular values σ₁, σ₂, σ₃, ..., σ_m and find the first k singular values σ₁, σ₂, ..., σ_k whose sum accounts for more than a given proportion (such as 90%) of the sum of all m singular values; this k is taken as the latent feature dimension d, as shown in formula (4):

(σ₁ + σ₂ + … + σ_k) / (σ₁ + σ₂ + … + σ_m) > 90% (4)

2) Find the first k singular values such that, starting from the (k+1)-th, the singular values drop markedly in order of magnitude, for example σ_(k+1) falls below 1/10 of σ_k; this k is taken as the latent feature dimension d, as shown in equation (5):

σ_(k+1) < σ_k / 10 (5)
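The two selection criteria can be illustrated with a short NumPy sketch. The helper names `latent_dim_by_energy` and `latent_dim_by_gap` are hypothetical, and the 90% share and factor of 10 are simply the example values quoted in the text, exposed here as parameters.

```python
import numpy as np

def latent_dim_by_energy(sigma, ratio=0.90):
    """Criterion 1: smallest k whose leading singular values exceed `ratio` of the total."""
    share = np.cumsum(sigma) / np.sum(sigma)
    # side="right" finds the first index whose cumulative share is strictly above ratio.
    return int(np.searchsorted(share, ratio, side="right") + 1)

def latent_dim_by_gap(sigma, factor=10.0):
    """Criterion 2: first k with sigma_(k+1) < sigma_k / factor, or None if no such drop."""
    for k in range(1, len(sigma)):
        if sigma[k] < sigma[k - 1] / factor:  # sigma[k] is sigma_(k+1) in 1-based terms
            return k
    return None

def singular_values(M):
    """Singular values of M in descending order, as returned by NumPy's SVD."""
    return np.linalg.svd(M, compute_uv=False)
```

When `latent_dim_by_gap` returns `None`, no order-of-magnitude drop exists and only the first criterion is informative.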

Step 4, once the latent feature dimension d has been determined, fix the dimensions of every matrix in the completion model. The matrices used in the model are the reconstruction completion matrix Y ∈ R^(n×s), the latent feature matrix X ∈ R^(d×s), the feature mapping matrix C ∈ R^(n×d) and the noise matrix S ∈ R^(n×s), together with the observation matrix M and the mask matrix W from step 2.

The completion model is constructed in the following two substeps:

Step 4.1: to ensure that the reconstruction completion matrix Y does not alter the existing data in the observation matrix M while noise is filtered out, a relationship between Y and M must be established. M contains noise and missing values, and the noise exists only in the non-missing part of M; Y contains no noise. Constraint equation (6) is introduced to express the relationship between Y and M:

where ⊙ denotes the Hadamard (element-wise) product. Equation (6) states that, on their non-missing parts, the real observation matrix M and the reconstruction completion matrix Y differ only by the sparse noise S. The noise S is removed from M by a low-rank plus sparse separation, leaving the effective reconstruction completion matrix Y. The l₁ norm of the matrix is used to measure the sparsity of the noise matrix S, as shown in formula (7):

‖S‖₁ = Σ_i Σ_j |S_ij| (7)

where S_ij denotes an element of the noise matrix S. The l₁ norm is defined as the sum of the absolute values of all elements of the matrix, and constraining the l₁ norm of S makes S sparse.

Introducing the l₁-norm constraint in step 4.1 filters out noise, an effect that most existing completion methods lack.
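An l₁-norm penalty is commonly handled through its proximal operator, element-wise soft-thresholding, which is what drives small entries of S exactly to zero. The sketch below shows that standard operator; whether the patent's own update for S takes exactly this form is an assumption, and the name `soft_threshold` is illustrative.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||.||_1: shrink every entry toward zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
```

Entries whose magnitude is at most `tau` become zero, and larger entries are reduced in magnitude by `tau`, which is why the resulting noise matrix is sparse.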

Step 4.2: following the idea of low-rank completion, Y is assumed to be a linear combination of the latent features X, so the term in equation (8) is used to measure the low-rank property of the data:

where the squared F-norm (Frobenius norm) is the sum of the squares of the absolute values of all elements of the matrix; the smaller its value, the smaller the difference between Y and CX. Equation (8) adopts the idea of low-rank matrix decomposition, and the reconstruction completion matrix Y is the completion result.

In addition, the observed multivariate time series is in practice a manifestation of the evolution of a complex system whose state obeys a dynamical equation and whose changes are Lipschitz continuous, so each row x of the latent feature matrix X is smooth. To preserve this smoothness, and so ensure that the completed latent features X still conform to the dynamical equation, the differences between adjacent time points in each row x of X are constrained by equation (9); the smaller its value, the smoother the variation of each row x of X:

where F₁ is a first-order difference matrix whose only nonzero values, 1 and −1, lie at positions adjacent to the diagonal; all remaining entries are 0, as shown in formula (10):
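A first-order difference matrix of this kind can be sketched in NumPy. The orientation is an assumption: here F₁ has shape s×(s−1) and multiplies X from the right, so that each column of X F₁ is the difference between two adjacent time points. The names `first_difference_matrix` and `temporal_penalty` are hypothetical.

```python
import numpy as np

def first_difference_matrix(s):
    """s x (s-1) matrix F1 with (X @ F1)[:, t] == X[:, t+1] - X[:, t]."""
    F1 = np.zeros((s, s - 1))
    t = np.arange(s - 1)
    F1[t, t] = -1.0      # entry on the diagonal
    F1[t + 1, t] = 1.0   # entry just below the diagonal
    return F1

def temporal_penalty(X, F1):
    """Squared Frobenius norm of the adjacent-time differences of each row of X."""
    return float(np.sum((X @ F1) ** 2))
```

A constant row of X gives a penalty of zero, and the penalty grows as neighbouring samples differ more, which matches the smoothness interpretation in the text.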

Combining equations (7), (8), (9) and (10), the constraint relationship between the reconstruction completion matrix Y and the observation matrix M can be described by the constrained optimization problem shown in equation (11):

where λ and β are the coefficients of the l₁-norm regularization and the time-difference regularization, respectively. They balance the low-rank property of the data, the proportion of noise, and the smoothness of data variation.
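For orientation, one optimization problem consistent with the verbal descriptions of equations (6) to (10) is written out below. It is a reconstruction from the surrounding text and not a quotation of equation (11): the exact form of the constraint, the absence of constant factors such as 1/2, and the orientation of F₁ (taken as s×(s−1), acting on the columns of X) are all assumptions.

```latex
\begin{aligned}
\min_{Y,\,C,\,X,\,S}\quad & \|Y - CX\|_F^2 \;+\; \lambda \|S\|_1 \;+\; \beta \|X F_1\|_F^2 \\
\text{s.t.}\quad & W \odot M \;=\; W \odot (Y + S)
\end{aligned}
```

The first term is the low-rank fit of step 4.2, the second is the sparsity measure of step 4.1, the third is the time-difference regularizer, and the constraint says that Y and M differ only by S on observed entries.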

As a novel feature, step 4.2 introduces time-difference regularization alongside the low-rank constraint on the reconstruction completion matrix Y, which improves the accuracy of the invention when completing multivariate time series.

Step 5, after the model has been constructed, solve equation (11) by optimization. The constrained optimization problem can be solved with the augmented Lagrangian multiplier method. The corresponding augmented Lagrangian function is constructed as shown in equation (12):

where ρ is the coefficient of the augmentation term and Λ is the augmented Lagrangian multiplier matrix. The alternating direction method of multipliers (ADMM) is then used to find the optimal solution of the augmented Lagrangian function. First, initialize the hyperparameters λ and ρ, the reconstruction completion matrix Y, the mapping matrix C, the latent feature matrix X and the noise matrix S; Y, S, X and C are initialized randomly, and the augmented Lagrangian multiplier matrix Λ is initialized to zero. Then set the number of iterations; in each iteration, update Y, X, C, S and Λ by solving equations (13), (14), (15), (16) and (17), respectively.

After a number of iterations the solution gradually approaches the overall optimum. The reconstruction completion matrix Y is then the completed multivariate time series, and inverse-normalizing Y yields the final completion result.
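The iteration can be sketched in NumPy as an ADMM-style loop. This is not the patent's equations (13) to (17), which the text does not reproduce: the updates below are derived for an assumed objective, ‖Y − CX‖²_F + λ‖S‖₁ + β‖X F₁‖²_F subject to W ⊙ (Y + S − M) = 0. The function name `complete`, its default hyperparameters and the small ridge term `eps` are hypothetical.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def complete(M, W, d, lam=1.0, beta=0.1, rho=5.0, iters=200, seed=0, eps=1e-6):
    """ADMM-style sketch for the assumed objective described in the lead-in."""
    n, s = M.shape
    rng = np.random.default_rng(seed)
    # Random initialization for Y, S, C, X; zero initialization for the multiplier.
    Y, S = rng.random((n, s)), rng.random((n, s))
    C, X = rng.random((n, d)), rng.random((d, s))
    Lam = np.zeros((n, s))

    # beta * F1 F1^T is fixed, so its eigendecomposition is computed once.
    F1 = np.zeros((s, s - 1))
    t = np.arange(s - 1)
    F1[t, t], F1[t + 1, t] = -1.0, 1.0
    lap_vals, lap_vecs = np.linalg.eigh(beta * (F1 @ F1.T))

    for _ in range(iters):
        # Y: entry-wise closed form; missing entries (W == 0) fall back to C X.
        P = C @ X
        Y = (2.0 * P + W * (rho * (M - S) - Lam)) / (2.0 + rho * W)
        # X: solve the Sylvester equation (C^T C) X + X (beta F1 F1^T) = C^T Y
        # by diagonalizing both symmetric coefficient matrices.
        a_vals, a_vecs = np.linalg.eigh(C.T @ C + eps * np.eye(d))
        Q = a_vecs.T @ (C.T @ Y) @ lap_vecs
        X = a_vecs @ (Q / (a_vals[:, None] + lap_vals[None, :])) @ lap_vecs.T
        # C: ridge-stabilized least-squares fit of Y by C X.
        C = np.linalg.solve(X @ X.T + eps * np.eye(d), X @ Y.T).T
        # S: soft-thresholding on observed entries only.
        S = W * soft_threshold(M - Y - Lam / rho, lam / rho)
        # Lam: dual ascent on the constraint residual.
        Lam = Lam + rho * W * (Y + S - M)
    return Y, C, X, S
```

A call such as `Y, C, X, S = complete(M, W, d=15)` returns the completed matrix in normalized units, so the recorded row minima and maxima are still needed for inverse normalization. Because the objective here is only an assumed form, hyperparameter values tuned for the patent's formulation may not carry over unchanged.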

The beneficial effects of the invention are as follows:

The invention provides a multivariate time series completion method. To address the high computational complexity and poor completion accuracy of current methods, it constructs a matrix decomposition model based on dynamics principles and time differences. Compared with existing methods, it takes full account of time dependence and data correlation, has low computational complexity, and completes data well. Moreover, the method is implemented entirely with matrix operations, so it is lightweight and flexible, needs few dependencies, can be conveniently integrated into a variety of data processing pipelines, and has a wide range of application scenarios.

Drawings

FIG. 1 is a flow chart of a data complement method based on the dynamics principle and time difference.

Fig. 2 is a graph of Beijing temperature data (excluding missing values).

Fig. 3 (a) is a Beijing temperature data scatter plot (containing 30% missing).

Fig. 3 (b) is a Beijing temperature data line graph (with 30% missing).

Fig. 4 is a graph of the Beijing temperature data complement results.

Fig. 5 is a Beijing temperature data latent feature map.

Detailed Description

To make the problem solved, the technical scheme adopted and the effect achieved by the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and an example. It is to be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, the drawings show only some, not all, of the matters related to the invention.

1. Data and operating environment

This example uses part of the data from air temperature monitoring stations in Beijing, with 30% of the values removed at random by hand, as shown in Figs. 2 and 3.

Any computer with a programming language or computing environment capable of matrix operations can read a given multivariate time series and carry out the steps of the method, as shown in Fig. 1.

2. Implementation steps

Step 1: this example uses Beijing temperature data covering 18 locations in Beijing, including Chaoyang, Mentougou and Pinggu, sampled at 1 h intervals from January 30, 2017 to January 30, 2018. The first 1000 sampling time points are selected, so the data dimension is 18×1000, with n=18 and s=1000, as shown in formula (18):

To make comparison with the completed result straightforward, the selected data set itself contains no missing values. 30% of its entries are removed at random by hand and set to 0, giving the observation matrix M, as shown in formula (19):
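Artificially removing a fixed share of entries, as done in this example, can be sketched as follows. The function name `drop_at_random` and the seed are hypothetical, and each entry is dropped independently, so the realized missing share is only approximately 30%.

```python
import numpy as np

def drop_at_random(data, missing_rate=0.30, seed=0):
    """Return (M, W): data with roughly `missing_rate` of entries set to 0, and the 0/1 mask."""
    rng = np.random.default_rng(seed)
    W = (rng.random(data.shape) >= missing_rate).astype(float)
    return data * W, W
```

Keeping the untouched `data` alongside `M` and `W` makes it possible to score the completion on exactly the entries that were removed.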

Step 2: in this example M contains no invalid elements such as NaN, so no replacement by 0 is needed. Only the mask matrix W has to be generated from the observation matrix M. W has the same dimensions as M: where an element of M is missing, the element at the corresponding position of W is 0; where it is not missing, the corresponding element of W is 1. The mask matrix W generated in this example is shown in equation (20):

Each row of the observation matrix M is then normalized as in formula (1). The normalized observation matrix M in this example is shown in formula (21):

The maximum and minimum values of each row of M before normalization are recorded so that the result can be inverse-normalized. The recorded maxima M_max and minima M_min each contain 18 values, as shown in formula (22):

Every observation matrix M referred to in the later steps is the normalized observation matrix M.

Step 3: determine the latent feature dimension d of the data. Perform SVD on the observation matrix M normalized in step 2, obtaining three matrices U, Σ and V, where U and V are the left and right singular matrices and Σ is a diagonal matrix whose diagonal elements are the singular values σ₁, σ₂, …, σ₁₈ of M, arranged from largest to smallest. The Σ matrix obtained by SVD in this example is given by equation (23).

The latent feature dimension d is then determined from Σ in order to construct the matrix-decomposition-based model. d is selected according to the following two criteria:

1) Cumulatively sum the singular values σ₁, σ₂, σ₃, ..., σ_m and find the first k singular values σ₁, σ₂, ..., σ_k whose sum accounts for more than a given proportion (e.g., 90%) of the sum of all m singular values; this k is the latent feature dimension d, as shown in equation (4). In this example, setting d=15 makes the cumulative share of the leading singular values exceed 90%, so the latent feature dimension d is determined to be 15.

2) Find the first k singular values such that, starting from the (k+1)-th, the singular values drop markedly in order of magnitude, for example σ_(k+1) falls below 1/10 of σ_k; this k is the latent feature dimension d, as shown in equation (5). From the singular values of the observation matrix M in (23), the largest is 593.83 and the smallest is 81.52; there is no marked change in order of magnitude, so this criterion does not apply here.

Combining the two criteria, the latent feature dimension d of the data in this example is determined to be 15.
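The reason the second criterion cannot fire here follows from the two quoted extremes alone. Because the singular values are sorted in decreasing order, the ratio of any two consecutive values is bounded by the ratio of the largest to the smallest:

```latex
\frac{\sigma_k}{\sigma_{k+1}} \;\le\; \frac{\sigma_1}{\sigma_{18}}
  \;=\; \frac{593.83}{81.52} \;\approx\; 7.28 \;<\; 10,
  \qquad k = 1, \dots, 17 .
```

No consecutive pair can therefore differ by a factor of 10, so only the cumulative-share criterion determines d in this example.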

Step 4: once the latent feature dimension d has been determined, the reconstruction completion matrix Y, the mapping matrix C, the latent feature matrix X, the noise matrix S and the augmented Lagrangian multiplier matrix Λ can be initialized; Y, S, X and C are initialized randomly and Λ is initialized to zero. The initial hyperparameters in equation (12) are set as follows: l₁-norm regularization coefficient λ=39.0, augmentation term coefficient ρ=36.17, and time-difference regularization coefficient β=3.82.

Step 5: the number of iterations is set to 150. In each iteration the matrices Y, X, C, S and Λ are updated using equations (13), (14), (15), (16) and (17), respectively. Over the iterations the model gradually converges, and once they are complete the reconstruction completion matrix Y is the completed result. Y is inverse-normalized using the normalization parameters recorded in formula (22) to obtain the final completion result. The overall optimization algorithm executes as shown in formula (24):

Fig. 4 shows the reconstruction completion matrix Y, and Fig. 5 shows the latent features X in the model. Fig. 5 makes clear that some of the latent features follow the variation of the observation matrix, which indicates that the model is effective.

Finally, it should be noted that the above examples are given solely to illustrate embodiments of the present application and are not intended to limit its scope. It will be apparent to those skilled in the art, after reading this disclosure, that various changes and modifications can be made without departing from the spirit of the application, and all such equivalent modifications fall within the scope of the appended claims.

Claims

1. A data completion method based on dynamics principles and time differences, characterized by comprising the following steps:

Step 1, taking temperature data obtained by actual sampling as the multivariate time series to be completed, and converting it into a two-dimensional matrix, the observation matrix M ∈ R^(n×s), where the number of rows n and the number of columns s are the number of sampling locations and the number of sampling times, respectively, and each row of M is a one-dimensional time series;

Step 2, before constructing the model, preprocessing the observation matrix M by setting its invalid elements and missing values to 0; to distinguish the missing and non-missing parts of M, generating a mask matrix W from M; W has the same dimensions as M: if the element M_ij in row i and column j of M is not missing, the element W_ij in row i and column j of W is set to 1; if M_ij is missing, W_ij is set to 0;

M = UΣVᵀ (2)

where U and V are the left and right singular matrices, respectively, and are not used in subsequent operations; Σ is a diagonal matrix whose diagonal elements are the singular values σ₁, σ₂, …, σ_m of the observation matrix M, as shown in equation (3):

and the singular values are arranged from largest to smallest, i.e. σ₁ > σ₂ > σ₃ > ... > σ_m, the number of singular values being m = min(n, s);

1) cumulatively summing the singular values σ₁, σ₂, σ₃, ..., σ_m and finding the first k singular values σ₁, σ₂, ..., σ_k whose sum accounts for more than 90% of the sum of all m singular values, this k being taken as the latent feature dimension d, as shown in formula (4):

2) finding the first k singular values such that, starting from the (k+1)-th, the singular values drop markedly in order of magnitude, for example σ_(k+1) falls below 1/10 of σ_k, this k being taken as the latent feature dimension d, as shown in equation (5):

Step 4, once the latent feature dimension d has been determined, fixing the dimensions of every matrix in the completion model; the matrices used in the model are the reconstruction completion matrix Y ∈ R^(n×s), the latent feature matrix X ∈ R^(d×s), the feature mapping matrix C ∈ R^(n×d) and the noise matrix S ∈ R^(n×s), together with the observation matrix M and the mask matrix W from step 2;

the completion model is constructed in the following two substeps:

Step 4.1, to ensure that the reconstruction completion matrix Y does not alter the existing data in the observation matrix M while noise is filtered out, establishing a relationship between Y and M; M contains noise and missing values, the noise exists only in the non-missing part of M, and Y contains no noise; constraint equation (6) is introduced to express the relationship between Y and M:

where ⊙ denotes the Hadamard (element-wise) product, and equation (6) states that, on their non-missing parts, the real observation matrix M and the reconstruction completion matrix Y differ only by the sparse noise S; the noise S is removed from M by a low-rank plus sparse separation, leaving the effective reconstruction completion matrix Y; the l₁ norm of the matrix is used to measure the sparsity of the noise matrix S, as shown in formula (7):

where S_ij denotes an element of the noise matrix S, the l₁ norm is defined as the sum of the absolute values of all elements of the matrix, and constraining the l₁ norm of S makes S sparse;

where the squared F-norm (Frobenius norm) is the sum of the squares of the absolute values of all elements of the matrix, and the smaller its value, the smaller the difference between Y and CX; equation (8) adopts the idea of low-rank matrix decomposition, and the reconstruction completion matrix Y is the completion result;

to preserve the smoothness of X, and so ensure that the completed latent features X still conform to the dynamical equation, the differences between adjacent time points in each row x of the latent feature matrix X are constrained by equation (9), the smaller its value, the smoother the variation of each row x of X:

where F₁ is a first-order difference matrix whose only nonzero values, 1 and −1, lie at positions adjacent to the diagonal, all remaining entries being 0;

the constraint relationship between the reconstruction completion matrix Y and the observation matrix M is described by the constrained optimization problem of formula (11):

where λ and β are the coefficients of the l₁-norm regularization and the time-difference regularization, respectively, and balance the emphasis placed during completion on the low-rank property of the data, the proportion of noise and the smoothness of data variation;

Step 5, after the model has been constructed, solving equation (11) by optimization; the constrained optimization problem is solved with the augmented Lagrangian multiplier method; the corresponding augmented Lagrangian function is constructed as shown in equation (12):

where ρ is the coefficient of the augmentation term and Λ is the augmented Lagrangian multiplier matrix, the optimal solution of the augmented Lagrangian function being found by the alternating direction method of multipliers (ADMM);

first, initializing the hyperparameters λ and ρ, the reconstruction completion matrix Y, the mapping matrix C, the latent feature matrix X and the noise matrix S, where Y, S, X and C are initialized randomly and the augmented Lagrangian multiplier matrix Λ is initialized to zero; then setting the number of iterations and, in each iteration, updating Y, X, C, S and Λ by solving equations (13), (14), (15), (16) and (17), respectively;

2. The data completion method based on dynamics principles and time differences according to claim 1, characterized in that, in step 2, to prevent differing data scales from affecting the completion, each row of the observation matrix M, i.e. all data from the same sampling location, is further normalized, as shown in formula (1):

the maximum and minimum values of each row of M are recorded for inverse normalization of the completion result; the normalized observation matrix is still denoted M, and every reference to M in the operations means the normalized observation matrix.