CN117033352A - Data restoration method and device, terminal equipment and storage medium - Google Patents

Data restoration method and device, terminal equipment and storage medium

Info

Publication number
CN117033352A
Authority
CN
China
Prior art keywords
matrix
data
initial
objective function
correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310802230.9A
Other languages
Chinese (zh)
Other versions
CN117033352B (en)
Inventor
李林超
崔孝冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202310802230.9A priority Critical patent/CN117033352B/en
Publication of CN117033352A publication Critical patent/CN117033352A/en
Application granted granted Critical
Publication of CN117033352B publication Critical patent/CN117033352B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Remote Sensing (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Algebra (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Monitoring And Testing Of Transmission In General (AREA)

Abstract

The application belongs to the technical field of data processing and provides a data repair method, an apparatus, a terminal device and a storage medium. The method comprises: acquiring N segments of time series data collected by N monitoring stations respectively; constructing a first matrix from the N segments; optimizing an objective function with the correlation among the data as its constraint condition; and decomposing the first matrix, according to the optimization result, into the sum of a second matrix and a third matrix. The second matrix is the product of a fourth matrix and a fifth matrix; the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension. A predicted value corresponding to each missing value is determined according to the third matrix, the fourth matrix and the fifth matrix, and the missing value is filled with the predicted value. The application considers both the temporal correlation within a single time series and the spatial correlation among the time series of different stations, thereby improving the accuracy of data repair.

Description

Data restoration method and device, terminal equipment and storage medium
Technical Field
The present application belongs to the technical field of data processing, and in particular, relates to a data repairing method, a device, a terminal device, and a storage medium.
Background
The Global Navigation Satellite System (GNSS) can monitor time series data of targets through its monitoring stations and provide data for early warning of and research on natural disasters such as earthquakes and volcanic eruptions. However, because of receiver failures, power interruptions and similar causes, the time series data monitored by GNSS may contain missing values.
Common methods for repairing missing data currently include interpolation, mean-value filling, matrix completion and decomposition methods. The results of these methods deviate considerably from the true values, so the accuracy of data repair is low.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a data repairing method, apparatus, terminal device, and storage medium, so as to solve the technical problem of low accuracy of the existing data repairing.
In a first aspect, an embodiment of the present application provides a data repair method, including:
acquiring N segments of time series data respectively acquired by N monitoring stations; wherein at least one of the N segments of time series data has a missing value, and N ≥ 2;
constructing a first matrix according to the N segments of time series data;
optimizing the objective function by taking the correlation between the data as a constraint condition of the objective function, and decomposing the first matrix into the sum of a second matrix and a third matrix according to the optimized result; the second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension;
determining a predicted value corresponding to the missing value according to the third matrix, the fourth matrix and the fifth matrix;
filling the missing value by using the predicted value.
Optionally, the objective function is created by:
constructing a main term expression of the objective function, wherein the main term expression is used for estimating errors of the first matrix and the sum of the second matrix and the third matrix obtained by decomposition;
constructing a first constraint condition of the objective function, wherein the first constraint condition is used for capturing time correlation among time series data of different monitoring sites;
constructing a second constraint condition of the objective function, wherein the second constraint condition is used for capturing the spatial correlation between time series data of different monitoring sites;
and creating the objective function according to the main term expression, the first constraint condition and the second constraint condition.
Optionally, the creating the objective function according to the main term expression, the first constraint condition and the second constraint condition includes:
constructing a third constraint for balancing the first constraint and the second constraint;
and creating and obtaining the objective function according to the main term expression, the first constraint condition, the second constraint condition and the third constraint condition.
Optionally, the optimizing the objective function with the correlation between the data as the constraint condition of the objective function, and decomposing the first matrix into a sum of a second matrix and a third matrix according to the result of the optimizing, includes:
acquiring a first initial matrix corresponding to the third matrix, a second initial matrix corresponding to the fourth matrix and a third initial matrix corresponding to the fifth matrix, which are obtained through initialization;
performing iterative update processing on the first initial matrix, the second initial matrix and the third initial matrix by using a stochastic gradient descent method in combination with the objective function and the first matrix, so that the reconstruction error between the first matrix and the sum of the second matrix and the third matrix is minimized;
determining the first initial matrix after the iterative update processing as the third matrix, determining the second initial matrix after the iterative update processing as the fourth matrix, and determining the third initial matrix after the iterative update processing as the fifth matrix.
Optionally, the obtaining the first initial matrix corresponding to the third matrix, the second initial matrix corresponding to the fourth matrix, and the third initial matrix corresponding to the fifth matrix obtained through initialization includes:
initializing by a principal component analysis method to obtain the second initial matrix and the third initial matrix, and initializing by a random initialization method to obtain the first initial matrix;
the performing iterative update processing on the first initial matrix, the second initial matrix and the third initial matrix by using a stochastic gradient descent method in combination with the objective function and the first matrix comprises:
setting the learning rate of the iterative update process;
selecting sample data from the first matrix;
calculating a target gradient of the objective function on the sample data;
and carrying out iterative updating processing on the first initial matrix, the second initial matrix and the third initial matrix according to the learning rate of the iterative updating processing and the target gradient.
Optionally, the constructing a first matrix according to the N segments of time series data includes:
converting each segment of time series data into a Hankel matrix to obtain N Hankel matrices;
and combining the N Hankel matrices to obtain the first matrix.
Optionally, the determining, according to the third matrix, the fourth matrix, and the fifth matrix, the predicted value corresponding to the missing value includes:
constructing a sixth matrix according to the third matrix, the fourth matrix and the fifth matrix;
and determining a predicted value corresponding to the missing value according to the sixth matrix.
In a second aspect, an embodiment of the present application provides a data repair apparatus, including:
the acquisition unit is used for acquiring N segments of time series data respectively acquired by N monitoring stations; wherein at least one of the N segments of time series data has a missing value, and N ≥ 2;
the construction unit is used for constructing a first matrix according to the N segments of time series data;
the decomposition unit is used for optimizing the objective function by taking the correlation between the data as the constraint condition of the objective function, and decomposing the first matrix into the sum of a second matrix and a third matrix according to the optimized result; the second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension;
the determining unit is used for determining a predicted value corresponding to the missing value according to the third matrix, the fourth matrix and the fifth matrix;
and the filling unit is used for filling the missing value by using the predicted value.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the data repair method according to any one of the first aspects described above.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data repair method according to any one of the first aspects above.
In a fifth aspect, an embodiment of the present application provides a computer program product for, when run on a terminal device, causing the terminal device to perform the steps of the data repair method as set forth in any one of the first aspects above.
The data repair method, apparatus, terminal device and storage medium provided by the embodiments of the application have the following beneficial effects:
according to the data restoration method provided by the application, N sections of time series data respectively acquired by N monitoring stations are acquired, a first matrix is constructed according to the N sections of time series data, then the correlation among the data is used as a constraint condition of an objective function to optimize the objective function, and the first matrix is decomposed into the sum of a second matrix and a third matrix according to the optimized result; the second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, the fifth matrix represents the correlation in the time dimension, then the predicted value corresponding to the missing value is determined according to the third matrix, the fourth matrix and the fifth matrix, and the missing value is filled with the predicted value. According to the data restoration method provided by the embodiment of the application, the first matrix constructed by the N time series data is decomposed into the third matrix, the fourth matrix and the fifth matrix, wherein the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension, so that the time correlation of single time series data and the space correlation among time series data of different sites can be considered at the same time, and the accuracy of data restoration is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a data repair method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of creating an objective function according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of decomposing a first matrix into a sum of a second matrix and a third matrix according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data repairing apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
It is to be understood that the terminology used in the embodiments of the application is for the purpose of describing particular embodiments of the application only, and is not intended to be limiting of the application. In the description of the embodiments of the present application, unless otherwise indicated, "a plurality" means two or more, and "at least one", "one or more" means one, two or more. The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a definition of "a first", "a second" feature may explicitly or implicitly include one or more of such features.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The data restoration method provided by the embodiment of the application may be executed by a terminal device. The terminal device may be an electronic device such as a mobile phone, a tablet computer, a notebook computer or a desktop computer.
The data restoration method provided by the embodiment of the application can be applied to restoring the missing data in the time sequence data monitored by each monitoring station in the GNSS. Specifically, when a user wants to perform data restoration on missing data in time series data monitored by a certain monitoring station, the steps of the data restoration method provided by the embodiment of the application can be executed through terminal equipment, so that the missing data in the time series data can be subjected to data restoration.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a data repair method according to an embodiment of the present application, where the data repair method may include S101 to S105, which are described in detail as follows:
In S101, N segments of time series data respectively acquired by N monitoring stations are acquired.
Wherein at least one of the N segments of time series data has a missing value, and N ≥ 2.
The specific value of N can be set according to actual demands, and the time series data comprise a plurality of moments and monitoring data corresponding to the moments.
In the embodiment of the application, when the time series data of a monitoring station A has a missing value, N segments of time series data respectively acquired by N monitoring stations including the monitoring station A may be acquired. Each of the N segments may contain the same number of moments and the same number of data values.
In S102, a first matrix is constructed from the N segments of time series data.
In the embodiment of the application, after the terminal device acquires the N segments of time series data, it may construct a first matrix from them; the first matrix contains the missing values that need to be repaired. For example, given the characteristics of the time series data acquired by GNSS monitoring stations, the first matrix may be constructed as a Hankel matrix.
In one possible implementation manner, S102 may be implemented through steps a to b, which are described in detail as follows:
In step a, each segment of time series data is converted into a Hankel matrix to obtain N Hankel matrices.
In this implementation, a sliding window size t may be selected for each segment of time series data. Because GNSS time series data are generally periodic, the selected sliding window size t may in practice be an integer multiple of the period of the GNSS time series data. After the sliding window size t is selected, the time series data may be divided into a plurality of sub-sequences of length t according to the sliding window size t, and the sub-sequences may be spliced row by row to form a Hankel matrix. Performing these steps on each of the N segments of time series data yields N Hankel matrices.
In step b, the N Hankel matrices are combined to obtain the first matrix.
In this implementation, after obtaining the N Hankel matrices, the terminal device may combine them to obtain the first matrix. The specific combination method may be set according to the practical application and is not limited herein. For example, the N Hankel matrices may be spliced row-wise in a preset order to form the first matrix.
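Steps a and b can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the window slides one sample at a time (the text does not state the stride), that missing values are marked with NaN, and the names hankel_from_series and build_first_matrix are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def hankel_from_series(x, t):
    """Stack every length-t sliding window of x as one row (stride 1 is an assumption)."""
    x = np.asarray(x, dtype=float)
    if not 1 <= t <= x.size:
        raise ValueError("window size t must satisfy 1 <= t <= len(x)")
    # sliding_window_view returns a read-only view; copy to get a writable matrix.
    return sliding_window_view(x, t).copy()


def build_first_matrix(series_list, t):
    """Row-wise concatenation of the N per-station Hankel matrices."""
    return np.vstack([hankel_from_series(x, t) for x in series_list])


# Two stations, five moments each; np.nan marks a missing value (assumed convention).
station_a = [1.0, 2.0, np.nan, 4.0, 5.0]
station_b = [1.1, 2.1, 3.1, 4.1, 5.1]
H_P = build_first_matrix([station_a, station_b], t=3)
```

With a stride of 1, a single missing sample appears once in each window that covers it, so it occupies an anti-diagonal of its station's block in the first matrix.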
In S103, the objective function is optimized with the correlation between the data as the constraint condition of the objective function, and the first matrix is decomposed into the sum of the second matrix and the third matrix according to the result of the optimization.
The second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension.
In the embodiment of the application, the terminal device may create the objective function in advance; the objective function is used to approximately decompose the first matrix into the sum of the second matrix and the third matrix. Specifically, the first matrix may be denoted H_P, the second matrix may be represented by the product of a fourth matrix and a fifth matrix, where the fourth matrix may be denoted U and the fifth matrix may be denoted V, and the third matrix may be denoted S. On this basis, the objective function is used to approximately decompose the first matrix H_P into U × V + S.
In one possible implementation manner, the objective function may be created through S201 to S204 as shown in fig. 2, and fig. 2 is a schematic flow chart for creating the objective function according to an embodiment of the present application, which is described in detail below:
In S201, a main term expression of the objective function is constructed.
The main term expression is used for estimating the error between the first matrix and the sum of the second matrix and the third matrix obtained through decomposition.
In this implementation, the principal term expression of the objective function may be:
In S202, a first constraint of the objective function is constructed.
Wherein the first constraint is used to capture time correlation between time series data of different monitoring sites.
In this implementation, the first constraint of the objective function may be:
λ_T C(U, V)
wherein, the definition of the first constraint condition may be:
In S203, a second constraint of the objective function is constructed.
Wherein the second constraint is used to capture spatial correlation between time series data of different monitoring sites.
In this implementation, the second constraint of the objective function may be:
λ_S C(S)
wherein, the definition of the second constraint condition may be:
In S204, an objective function is created based on the main term expression, the first constraint, and the second constraint.
In this implementation, after the main term expression, the first constraint condition, and the second constraint condition are constructed, a third constraint condition for balancing the first constraint condition and the second constraint condition may also be constructed.
The third constraint may be:
λ_O C(U, V, S)
wherein, the definition of the third constraint condition may be:
Based on this, the terminal device may create the objective function according to the main term expression, the first constraint condition, the second constraint condition, and the third constraint condition.
The objective function thus created may be:
where, in the formula, Ω is the index set corresponding to the measured values in H_P; h_{pn,t} and ĥ_{pn,t} are the (pn, t) elements of H_P and Ĥ, respectively; C(U, V), C(S) and C(U, V, S) are the three constraints for temporal correlation, spatial correlation and avoidance of overfitting, respectively; λ_T, λ_S and λ_O are the corresponding coefficients; and ‖·‖_F denotes the F-norm.
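The symbol definitions above suggest an objective of the following shape. The formula itself is not reproduced in this text, so what follows is a hedged reconstruction rather than the patent's own expression: the squared-error form of the main term, the symbol names Ω (index set of measured entries) and Ĥ (reconstructed matrix), and the identity Ĥ = UV + S are assumptions consistent with the description. The definitions of the three constraints are not available here and are left abstract.

```latex
\min_{U,V,S}\; f(U,V,S)
  = \sum_{(pn,t)\in\Omega} \bigl(h_{pn,t} - \hat{h}_{pn,t}\bigr)^{2}
  + \lambda_T\, C(U,V) + \lambda_S\, C(S) + \lambda_O\, C(U,V,S),
\qquad \hat{H} = UV + S .
```

Summing only over the measured entries means that missing values contribute nothing to the fit; their predictions come entirely from the learned U, V and S.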
In the embodiment of the present application, after the objective function is constructed, the objective function is optimized, and the first matrix is decomposed into the sum of the second matrix and the third matrix according to the result of the optimization; that is, the first matrix H_P is approximately decomposed into U × V + S.
Referring to fig. 3, fig. 3 is a schematic flow chart of decomposing a first matrix into a sum of a second matrix and a third matrix according to an embodiment of the present application, and as shown in fig. 3, a method for decomposing the first matrix into a sum of the second matrix and the third matrix includes S301 to S303, which are described in detail as follows:
In S301, a first initial matrix corresponding to the third matrix, a second initial matrix corresponding to the fourth matrix, and a third initial matrix corresponding to the fifth matrix are obtained through initialization.
In the embodiment of the application, the terminal device may obtain the first initial matrix corresponding to the third matrix, the second initial matrix corresponding to the fourth matrix and the third initial matrix corresponding to the fifth matrix by initialization.
Specifically, the first initial matrix corresponding to the third matrix S may be obtained by a random initialization method, the second initial matrix corresponding to the fourth matrix U may be obtained by a principal component analysis method, and the third initial matrix corresponding to the fifth matrix V may also be obtained by the principal component analysis method.
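The initialization in S301 can be sketched as below. Several details are assumptions not stated in the text: missing entries are replaced by column means before the principal component analysis, PCA is computed as a truncated SVD of the column-centered matrix, the singular values are split evenly between the two factors, and the random initial matrix for S uses small Gaussian values. The function name initialize_factors and the rank parameter are hypothetical.

```python
import numpy as np


def initialize_factors(H_P, rank, seed=0):
    """Return (S0, U0, V0): random S0, PCA-based U0 and V0."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(H_P)

    # Assumption: impute missing entries with the column mean before PCA.
    sums = np.where(observed, H_P, 0.0).sum(axis=0)
    counts = np.maximum(observed.sum(axis=0), 1)
    filled = np.where(observed, H_P, sums / counts)

    # PCA via SVD of the column-centered matrix, truncated to `rank` components.
    centered = filled - filled.mean(axis=0)
    left, sing, right_t = np.linalg.svd(centered, full_matrices=False)
    r = min(rank, sing.size)
    U0 = left[:, :r] * np.sqrt(sing[:r])             # second initial matrix (for U)
    V0 = np.sqrt(sing[:r])[:, None] * right_t[:r]    # third initial matrix (for V)

    # First initial matrix (for S): small random values, same shape as H_P.
    S0 = 0.01 * rng.standard_normal(H_P.shape)
    return S0, U0, V0


H_demo = np.array([[1.0, 2.0, 3.0, 4.0],
                   [2.0, np.nan, 4.0, 5.0],
                   [3.0, 4.0, 5.0, 6.0],
                   [4.0, 5.0, np.nan, 7.0]])
S0, U0, V0 = initialize_factors(H_demo, rank=2)
```

Starting U and V from principal components places the low-rank part near the dominant structure of the data, which typically shortens the subsequent iterative updates compared with a fully random start.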
In S302, iterative update processing is performed on the first initial matrix, the second initial matrix, and the third initial matrix by using a stochastic gradient descent method in combination with the objective function and the first matrix, so that the reconstruction error between the first matrix and the sum of the second matrix and the third matrix is minimized.
In the embodiment of the application, a user may set the learning rate of the iterative update processing through the terminal device. Sample data are then randomly selected from the first matrix, and the target gradient of the objective function on the selected sample data is calculated. With the learning rate set and the target gradient calculated, the first initial matrix, the second initial matrix and the third initial matrix may be iteratively updated according to the learning rate and the target gradient. During the iterative update processing, the sum of the product of the second initial matrix and the third initial matrix and the first initial matrix gets closer and closer to the first matrix.
The first initial matrix, the second initial matrix and the third initial matrix may be iteratively updated respectively by the following formulas:
where f is the objective function, and u_{ik}, v_{ik} and s_k are elements of the second initial matrix, the third initial matrix, and the first initial matrix, respectively.
After each iteration update processing is performed on the first initial matrix, the second initial matrix and the third initial matrix, whether a preset stop condition is reached or not can be judged, and if the preset stop condition is reached, the iteration update processing can be stopped on the first initial matrix, the second initial matrix and the third initial matrix.
For example, the preset stop condition may be that the difference between the first matrix and the sum of the product of the second initial matrix and the third initial matrix and the first initial matrix is smaller than a preset stop value. The preset stop value may be 0.01; that is, when this difference is smaller than 0.01, the iterative update processing of the first initial matrix, the second initial matrix and the third initial matrix is stopped.
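The iterative update and stop condition of S302 can be sketched as follows. The definitions of the constraints C(U, V), C(S) and C(U, V, S) are not reproduced in this text, so a plain L2 penalty with a single coefficient lam stands in for them; that substitution, the use of the root-mean-square error on measured entries as the "difference", and the names sgd_decompose and rmse are all assumptions made for illustration.

```python
import numpy as np


def rmse(H_P, U, V, S):
    """Root-mean-square reconstruction error on the measured (non-NaN) entries."""
    obs = ~np.isnan(H_P)
    return float(np.sqrt(np.mean((H_P - (U @ V + S))[obs] ** 2)))


def sgd_decompose(H_P, U, V, S, lr=0.02, lam=0.01, tol=0.01, max_epochs=200, seed=0):
    """Fit H_P ~ U @ V + S by per-entry stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    U, V, S = U.copy(), V.copy(), S.copy()
    rows, cols = np.nonzero(~np.isnan(H_P))
    for _ in range(max_epochs):
        for k in rng.permutation(rows.size):      # randomly ordered sample data
            i, j = rows[k], cols[k]
            err = H_P[i, j] - (U[i] @ V[:, j] + S[i, j])
            u_old = U[i].copy()
            # Gradient step on 0.5*err**2 plus the assumed L2 penalty.
            U[i] += lr * (err * V[:, j] - lam * U[i])
            V[:, j] += lr * (err * u_old - lam * V[:, j])
            S[i, j] += lr * (err - lam * S[i, j])
        if rmse(H_P, U, V, S) < tol:              # preset stop condition
            break
    return U, V, S


rng = np.random.default_rng(1)
truth = rng.standard_normal((12, 2)) @ rng.standard_normal((2, 8))
H_P = truth.copy()
H_P[rng.random(H_P.shape) < 0.2] = np.nan         # simulate missing values
U0 = 0.1 * rng.standard_normal((12, 2))
V0 = 0.1 * rng.standard_normal((2, 8))
S0 = np.zeros((12, 8))
error_before = rmse(H_P, U0, V0, S0)
U, V, S = sgd_decompose(H_P, U0, V0, S0)
error_after = rmse(H_P, U, V, S)
```

Because S has one free value per entry, an unpenalized S could absorb every residual on the measured entries without helping the missing ones; the penalty on S (here the assumed L2 term) is what pushes shared structure into U and V.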
In S303, the first initial matrix after the iterative update processing is determined as a third matrix, the second initial matrix after the iterative update processing is determined as a fourth matrix, and the third initial matrix after the iterative update processing is determined as a fifth matrix.
In the embodiment of the application, after a preset stopping condition is reached, the first initial matrix obtained by the latest iteration update process can be determined as the third matrix, the second initial matrix obtained by the latest iteration update process can be determined as the fourth matrix, and the third initial matrix obtained by the latest iteration update process can be determined as the fifth matrix.
In S104, a predicted value corresponding to the missing value is determined based on the third matrix, the fourth matrix, and the fifth matrix.
In the embodiment of the present application, after the first matrix is decomposed into the sum of the second matrix and the third matrix, that is, into the sum of the product of the fourth matrix and the fifth matrix and the third matrix, the terminal device may construct a sixth matrix according to the third matrix, the fourth matrix and the fifth matrix. The sixth matrix may be the sum of the product of the fourth matrix and the fifth matrix and the third matrix. The sixth matrix differs from the first matrix in that the first matrix contains missing values, whereas the sixth matrix, obtained by decomposing and reconstructing the first matrix, contains none.
After the sixth matrix is constructed, the predicted value corresponding to the missing value may be determined according to the sixth matrix, specifically, the corresponding row and the corresponding column on the sixth matrix may be determined according to the row and the column where the missing value exists on the first matrix, and the predicted value corresponding to the missing value may be determined according to the data on the corresponding row and the corresponding column on the sixth matrix.
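The construction of the sixth matrix and the lookup of predicted values can be sketched as follows. The sketch assumes that missing values in the first matrix are encoded as NaN; the function name predict_missing and the variable names are hypothetical.

```python
import numpy as np

def predict_missing(X, S, W, H):
    """Build the sixth matrix and read predictions at the missing positions.

    X : first matrix with NaN at missing positions (assumed encoding)
    S : third matrix, W : fourth matrix, H : fifth matrix
    """
    X6 = W @ H + S                          # sixth matrix, no missing values
    rows, cols = np.nonzero(np.isnan(X))    # positions missing in the first matrix
    predictions = {(int(r), int(c)): float(X6[r, c]) for r, c in zip(rows, cols)}
    return X6, predictions
```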
In S105, the missing value is filled with the predicted value.
In the embodiment of the application, after the terminal device determines the predicted value corresponding to the missing value, the predicted value can be used to fill the missing value. Specifically, the predicted value may be filled into the corresponding position of the first matrix, and the filled first matrix is converted back into time-series data, thereby repairing the time-series data that contained the missing value.
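The filling and back-conversion can be sketched as follows. The sketch assumes NaN-encoded missing values and a Hankel block whose entry (i, j) holds sample i + j of the series, and it recovers the series by averaging each anti-diagonal. The averaging rule and the names fill_missing and hankel_to_series are assumptions made for illustration, not details stated in the application.

```python
import numpy as np

def fill_missing(X, X6):
    """Copy predictions from the sixth matrix into the missing positions of X."""
    return np.where(np.isnan(X), X6, X)

def hankel_to_series(Hk):
    """Recover a series from one Hankel block by averaging its anti-diagonals.

    Assumes Hk[i, j] corresponds to sample i + j, so row i covers samples
    i .. i + cols - 1 of the series.
    """
    rows, cols = Hk.shape
    total = np.zeros(rows + cols - 1)
    count = np.zeros(rows + cols - 1)
    for i in range(rows):
        total[i:i + cols] += Hk[i]
        count[i:i + cols] += 1
    return total / count
```

Because one sample of the series appears at several positions of a Hankel block, the filled entries for the same sample may differ slightly; averaging is one simple way to reconcile them.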
As can be seen from the above, in the data repair method provided by the application, the first matrix is constructed according to N segments of time-series data acquired by N monitoring stations respectively, the objective function is optimized with the correlation between data as its constraint condition, and the first matrix is decomposed into the sum of the second matrix and the third matrix according to the optimization result. The second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension. The predicted value corresponding to the missing value is then determined according to the third matrix, the fourth matrix and the fifth matrix, and the missing value is filled with the predicted value. Because the first matrix constructed from the N segments of time-series data is decomposed into the third matrix, the fourth matrix and the fifth matrix, where the third matrix represents the correlation in the space dimension and the fifth matrix represents the correlation in the time dimension, the method considers both the temporal correlation within the time-series data and the spatial correlation among the time-series data of different stations, which improves the accuracy of data repair.
Based on the data repair method provided by the above embodiment, the embodiment of the present application further provides a data repair device for implementing the above method embodiment. Referring to fig. 4, fig. 4 is a schematic structural diagram of a data repair device provided by the embodiment of the present application. As shown in fig. 4, the data repair device 4 may include an acquisition unit 41, a construction unit 42, a decomposition unit 43, a determination unit 44, and a filling unit 45, wherein:
The acquisition unit 41 is configured to acquire N segments of time-series data acquired by N monitoring stations respectively, wherein at least one segment of the N segments of time-series data has a missing value, and N is greater than or equal to 2.
The construction unit 42 is configured to construct a first matrix according to the N-segment time-series data.
The decomposition unit 43 is configured to optimize an objective function with a correlation between data as a constraint condition of the objective function, and decompose the first matrix into a sum of the second matrix and the third matrix according to a result of the optimization; the second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension.
The determining unit 44 is configured to determine a predicted value corresponding to the missing value according to the third matrix, the fourth matrix, and the fifth matrix.
The filling unit 45 is used to fill the missing values using the predicted values.
Optionally, the data repair device 4 may further include an objective function creation unit, wherein:
the objective function creation unit is specifically configured to construct a main term expression of the objective function, where the main term expression is used to estimate an error between the first matrix and a sum of the second matrix and the third matrix obtained by decomposition;
constructing a first constraint condition of an objective function, wherein the first constraint condition is used for capturing time correlation among time series data of different monitoring sites;
constructing a second constraint condition of the objective function, wherein the second constraint condition is used for capturing the spatial correlation between time series data of different monitoring sites;
and creating and obtaining an objective function according to the main term expression, the first constraint condition and the second constraint condition.
Optionally, the objective function creating unit is specifically further configured to construct a third constraint condition for balancing the first constraint condition and the second constraint condition;
and creating and obtaining an objective function according to the main term expression, the first constraint condition, the second constraint condition and the third constraint condition.
Optionally, the decomposing unit 43 is specifically configured to obtain a first initial matrix corresponding to the third matrix, a second initial matrix corresponding to the fourth matrix, and a third initial matrix corresponding to the fifth matrix, which are obtained through initialization;
performing iterative update processing on the first initial matrix, the second initial matrix and the third initial matrix by using a stochastic gradient descent method in combination with the objective function and the first matrix, so that the reconstruction error between the first matrix and the sum of the second matrix and the third matrix is minimized;
the first initial matrix after the iterative updating process is determined as a third matrix, the second initial matrix after the iterative updating process is determined as a fourth matrix, and the third initial matrix after the iterative updating process is determined as a fifth matrix.
Optionally, the decomposition unit 43 is further specifically configured to initialize the second initial matrix and the third initial matrix by a principal component analysis method, and initialize the first initial matrix by a random initialization method.
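One possible way to carry out this initialization is sketched below. It assumes that missing values are NaN, that they are temporarily replaced by column means, and that the principal component analysis step is realized by a truncated singular value decomposition without mean centering. The rank, the scale of the random values and the name initialize are hypothetical choices not taken from the application.

```python
import numpy as np

def initialize(X, rank, seed=0):
    """Return (S0, W0, H0): random S0 and SVD-based W0, H0.

    X    : first matrix with NaN at missing positions (assumed encoding);
           every column is assumed to hold at least one observed value
    rank : number of latent components kept (hypothetical parameter)
    """
    rng = np.random.default_rng(seed)
    col_mean = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_mean, X)    # temporary mean fill
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    root = np.sqrt(s[:rank])
    W0 = U[:, :rank] * root                        # second initial matrix
    H0 = root[:, None] * Vt[:rank]                 # third initial matrix
    S0 = 0.01 * rng.standard_normal(X.shape)       # first initial matrix
    return S0, W0, H0
```

Splitting the singular values evenly between W0 and H0 keeps the two factors at a similar scale, which is a common convention rather than a requirement.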
Optionally, the decomposing unit 43 is specifically further configured to set a learning rate of the iterative update process;
selecting sample data from the first matrix;
calculating a target gradient of the objective function on the sample data;
and carrying out iterative updating processing on the first initial matrix, the second initial matrix and the third initial matrix according to the learning rate and the target gradient of the iterative updating processing.
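The update mechanics of these steps can be sketched as follows. The sketch uses only a squared reconstruction error on the sampled entries as the objective; the temporal, spatial and balancing constraint terms of the objective function are deliberately omitted because their exact forms are not reproduced here. The names sgd_step, lr and batch_size are hypothetical.

```python
import numpy as np

def sgd_step(X, S, W, H, lr, batch_size, rng):
    """One stochastic gradient step on sampled observed entries of X.

    Minimizes 0.5 * (X[r, c] - (W[r] @ H[:, c] + S[r, c])) ** 2 per sample.
    Constraint terms of the full objective are NOT included (assumption).
    S, W and H are updated in place and also returned.
    """
    obs_r, obs_c = np.nonzero(~np.isnan(X))        # observed positions only
    n_pick = min(batch_size, obs_r.size)
    pick = rng.choice(obs_r.size, size=n_pick, replace=False)
    for r, c in zip(obs_r[pick], obs_c[pick]):
        err = X[r, c] - (W[r] @ H[:, c] + S[r, c])
        w_old = W[r].copy()                        # keep pre-update value for H
        W[r] += lr * err * H[:, c]
        H[:, c] += lr * err * w_old
        S[r, c] += lr * err
    return S, W, H
```

Without the constraint terms the entry-wise matrix S can absorb the whole residual, so this sketch illustrates only how the learning rate and the sampled gradient drive the updates, not the quality of the resulting decomposition.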
Optionally, the construction unit 42 is specifically configured to convert each segment of time-series data into a Hankel matrix, to obtain N Hankel matrices;
and combining the N Hankel matrices to obtain a first matrix.
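A minimal sketch of this construction is given below. It assumes that all segments have the same length, that each Hankel matrix has a fixed number of rows (here called window), and that the N Hankel matrices are combined by vertical stacking. The combination rule and the names to_hankel and build_first_matrix are assumptions for illustration.

```python
import numpy as np

def to_hankel(series, window):
    """Hankel matrix with `window` rows; entry (i, j) is series[i + j]."""
    x = np.asarray(series, dtype=float)
    cols = x.size - window + 1
    return np.stack([x[i:i + cols] for i in range(window)])

def build_first_matrix(segments, window):
    """Stack the Hankel matrices of all segments vertically (assumed rule)."""
    return np.vstack([to_hankel(s, window) for s in segments])
```

For example, to_hankel([1, 2, 3, 4, 5], 2) yields the rows [1, 2, 3, 4] and [2, 3, 4, 5]; a NaN in a segment is carried into every position of the Hankel matrix that refers to that sample.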
Optionally, the determining unit 44 is specifically configured to construct a sixth matrix according to the third matrix, the fourth matrix, and the fifth matrix;
and determining a predicted value corresponding to the missing value according to the sixth matrix.
It should be noted that, because the content of information interaction and execution process between the above units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to the method embodiment specifically, and will not be described herein again.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device 5 provided in this embodiment may include: a processor 50, a memory 51 and a computer program 52 stored in the memory 51 and executable on the processor 50, such as a program corresponding to the data repair method. When the processor 50 executes the computer program 52, the steps of the data repair method embodiments described above are implemented, such as S101 to S105 shown in fig. 1, S201 to S204 shown in fig. 2, and S301 to S303 shown in fig. 3. Alternatively, when executing the computer program 52, the processor 50 may implement the functions of the modules/units in the data repair device embodiment described above, such as the functions of the units 41 to 45 shown in fig. 4.
By way of example, the computer program 52 may be partitioned into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to implement the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing a specific function, the instruction segments being used to describe the execution of the computer program 52 in the terminal device 5. For example, the computer program 52 may be divided into the acquisition unit 41, the construction unit 42, the decomposition unit 43, the determination unit 44 and the filling unit 45; the specific functions of the respective units are described in the embodiment corresponding to fig. 4 and are not repeated here.
It will be appreciated by those skilled in the art that fig. 5 is merely an example of the terminal device 5 and does not constitute a limitation of the terminal device 5, which may include more or fewer components than shown, may combine certain components, or may have different components.
The processor 50 may be a central processing unit (CPU), but may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used to store computer programs and other programs and data required by the terminal device. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units is illustrated, and in practical application, the above-described functional allocation may be performed by different functional units according to needs, i.e. the internal structure of the data repair device is divided into different functional units, so as to perform all or part of the above-described functions. The functional units in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present application. The specific working process of the units in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, which when executed by a processor, performs the steps of the respective method embodiments described above.
The embodiments of the present application provide a computer program product for causing a terminal device to carry out the steps of the respective method embodiments described above when the computer program product is run on the terminal device.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of data repair, comprising:
acquiring N segments of time-series data respectively acquired by N monitoring stations; wherein at least one segment of the N segments of time-series data has a missing value, and N is greater than or equal to 2;
constructing and obtaining a first matrix according to the N-section time sequence data;
optimizing the objective function by taking the correlation between the data as a constraint condition of the objective function, and decomposing the first matrix into the sum of a second matrix and a third matrix according to the optimized result; the second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension;
determining a predicted value corresponding to the missing value according to the third matrix, the fourth matrix and the fifth matrix;
filling the missing value by using the predicted value.
2. The data repair method of claim 1 wherein the objective function is created by:
constructing a main term expression of the objective function, wherein the main term expression is used for estimating errors of the first matrix and the sum of the second matrix and the third matrix obtained by decomposition;
constructing a first constraint condition of the objective function, wherein the first constraint condition is used for capturing time correlation among time series data of different monitoring sites;
constructing a second constraint condition of the objective function, wherein the second constraint condition is used for capturing the spatial correlation between time series data of different monitoring sites;
and creating the objective function according to the main term expression, the first constraint condition and the second constraint condition.
3. The method of claim 2, wherein creating the objective function based on the main term expression, the first constraint, and the second constraint comprises:
constructing a third constraint for balancing the first constraint and the second constraint;
and creating and obtaining the objective function according to the main term expression, the first constraint condition, the second constraint condition and the third constraint condition.
4. The method of claim 1, wherein optimizing the objective function with the correlation between data as the constraint condition of the objective function, and decomposing the first matrix into a sum of a second matrix and a third matrix according to the result of the optimization, comprises:
acquiring a first initial matrix corresponding to the third matrix, a second initial matrix corresponding to the fourth matrix and a third initial matrix corresponding to the fifth matrix, which are obtained through initialization;
performing iterative update processing on the first initial matrix, the second initial matrix and the third initial matrix by using a stochastic gradient descent method in combination with the objective function and the first matrix, so that a reconstruction error between the first matrix and the sum of the second matrix and the third matrix is minimized;
determining the first initial matrix after the iterative update processing as the third matrix, determining the second initial matrix after the iterative update processing as the fourth matrix, and determining the third initial matrix after the iterative update processing as the fifth matrix.
5. The method for repairing data according to claim 4, wherein the obtaining the first initial matrix corresponding to the third matrix, the second initial matrix corresponding to the fourth matrix, and the third initial matrix corresponding to the fifth matrix obtained by initialization includes:
initializing by a principal component analysis method to obtain the second initial matrix and the third initial matrix, and initializing by a random initialization method to obtain the first initial matrix;
the iterative update processing of the first initial matrix, the second initial matrix and the third initial matrix by using a stochastic gradient descent method in combination with the objective function and the first matrix comprises the following steps:
setting the learning rate of the iterative update process;
selecting sample data from the first matrix;
calculating a target gradient of the objective function on the sample data;
and carrying out iterative updating processing on the first initial matrix, the second initial matrix and the third initial matrix according to the learning rate of the iterative updating processing and the target gradient.
6. The method of claim 1, wherein the constructing a first matrix according to the N-segment time-series data includes:
converting each segment of time-series data into a Hankel matrix respectively to obtain N Hankel matrices;
and combining the N Hankel matrices to obtain the first matrix.
7. The method of any one of claims 1 to 6, wherein determining the predicted value corresponding to the missing value according to the third matrix, the fourth matrix, and the fifth matrix includes:
constructing a sixth matrix according to the third matrix, the fourth matrix and the fifth matrix;
and determining a predicted value corresponding to the missing value according to the sixth matrix.
8. A data repair device, comprising:
the acquisition unit is used for acquiring N segments of time-series data acquired by N monitoring stations respectively; wherein at least one segment of the N segments of time-series data has a missing value, and N is greater than or equal to 2;
the construction unit is used for constructing and obtaining a first matrix according to the N sections of time sequence data;
the decomposition unit is used for optimizing the objective function by taking the correlation between the data as the constraint condition of the objective function, and decomposing the first matrix into the sum of a second matrix and a third matrix according to the optimized result; the second matrix is the product of a fourth matrix and a fifth matrix, the third matrix represents the correlation in the space dimension, the fourth matrix represents the potential correlation, and the fifth matrix represents the correlation in the time dimension;
the determining unit is used for determining a predicted value corresponding to the missing value according to the third matrix, the fourth matrix and the fifth matrix;
and the filling unit is used for filling the missing value by using the predicted value.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data restoration method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the data restoration method according to any one of claims 1 to 7.
CN202310802230.9A 2023-07-03 2023-07-03 Data restoration method and device, terminal equipment and storage medium Active CN117033352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310802230.9A CN117033352B (en) 2023-07-03 2023-07-03 Data restoration method and device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117033352A true CN117033352A (en) 2023-11-10
CN117033352B CN117033352B (en) 2024-08-16

Family

ID=88641883

Country Status (1)

Country Link
CN (1) CN117033352B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190059088A (en) * 2017-11-22 2019-05-30 한국전자통신연구원 Method and apparatus for minimizing deviation of pronunciation score using decorrelation of low-rank matrix
CN109754008A (en) * 2018-12-28 2019-05-14 上海理工大学 The estimation method of the symmetrical sparse network missing information of higher-dimension based on matrix decomposition
WO2022110640A1 (en) * 2020-11-27 2022-06-02 平安科技(深圳)有限公司 Model optimization method and apparatus, computer device and storage medium
CN114168574A (en) * 2021-10-27 2022-03-11 清华大学 Industrial load oriented data missing processing method and device
CN115618208A (en) * 2022-10-24 2023-01-17 招联消费金融有限公司 Data restoration method and device, computer equipment and storage medium
CN116089788A (en) * 2023-03-23 2023-05-09 深圳市大数据研究院 Online missing data processing method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
何丹丹;王立娟;: "分布式数据库用户丢失数据恢复重构仿真", 计算机仿真, no. 06, 15 June 2018 (2018-06-15) *
朱芳;符欲梅;陈得宝;: "基于SVR的桥梁健康监测系统缺失数据在线填补研究", 传感技术学报, no. 05, 12 June 2018 (2018-06-12) *
袁卫华;王红;杜向华;: "结合非负矩阵填充及子集划分的协同推荐算法", 小型微型计算机系统, no. 12, 15 December 2017 (2017-12-15) *

Also Published As

Publication number Publication date
CN117033352B (en) 2024-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant