CN113392139B - Environment monitoring data completion method and system based on association fusion - Google Patents

Info

Publication number
CN113392139B
Authority
CN
China
Prior art keywords
monitoring data
time
attribute
data
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110624648.6A
Other languages
Chinese (zh)
Other versions
CN113392139A (en)
Inventor
刘财政
刘盛华
沈华伟
程学旗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202110624648.6A priority Critical patent/CN113392139B/en
Publication of CN113392139A publication Critical patent/CN113392139A/en
Application granted granted Critical
Publication of CN113392139B publication Critical patent/CN113392139B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an environment monitoring data completion method and system based on association fusion, comprising: acquiring environment monitoring data with missing data and its corresponding marking matrix; obtaining association coefficients among the attributes according to each attribute at each time point in the environment monitoring data to construct a graph G, wherein nodes in the graph G correspond to the attributes and edges between the nodes correspond to the attribute association coefficients between the attributes; multiplying the graph G and the matrix to be completed element-wise to obtain an intermediate matrix, and performing time-sequence processing on the intermediate matrix through a neural network to obtain the hidden state of each time point in the environment monitoring data; calculating a time correlation coefficient of the environment monitoring data according to the hidden state of each time point; multiplying the time correlation coefficient and the hidden state of each time point element-wise to obtain the intermediate state of each time point in the environment monitoring data; and applying a nonlinear transformation to the intermediate state to obtain reconstruction completion data for the environment monitoring data.

Description

Environment monitoring data completion method and system based on association fusion
Technical Field
The invention relates to the field of data mining, and in particular to an environment monitoring data completion method and system based on association fusion.
Background
In today's world, where information technology is highly developed and widely applied, social life is increasingly digitized and people depend more and more on the Internet. Multivariate time series data is a set of random variables ordered by time, typically the result of observing some underlying process at a given sampling rate over equally spaced time intervals. Such data essentially reflects how one or more random variables change over time; the core of multivariate time series prediction is to mine this regularity from the data and use it to estimate future data. Multivariate time series analysis is widely applied in biology, medicine, weather, stock trading and similar scenarios. For example, in stock trading, the trading information includes time, stock price, trading volume, trader, etc., and each such attribute is called a variable (an element of the multivariate series).
Multivariate time series exhibit complex temporal and association relationships, specifically: (1) Attribute correlation: there are associations among the dimensions at each time point of the multivariate time series data. (2) Time correlation: the multivariate time series changes dynamically over time, which is mainly reflected in autocorrelation and trends. For example, weather observations taken every hour, every day, or every week change dynamically over time.
Data loss is a common problem in multivariate time series analysis in most scientific research fields, as in fig. 1. In some data sets the missing rate can reach 90%, which makes the data difficult to use. Causes of data loss include time series data coming from different sources, inadequate data processing, low signal-to-noise ratio, measurement errors, non-response or deletion of abnormal values, and human operating errors.
Data loss has been recognized as a major problem in scientific research and enterprise production; even the most carefully designed and executed studies produce missing values. Missing data hinders the ability to interpret and understand the phenomena under study, and because findings depend largely on analysis of the observations, missing data challenges the validity of scientific research. Missing values can severely compromise the analysis of multivariate time series, such as classification or regression, sequential data integration, and prediction tasks, which creates a strong demand for data completion. In response, researchers have conducted a great deal of research and proposed many models and methods. Current methods mainly fall into the following categories:
(1) Statistical analysis-based methods: these can be divided into two types. The first type directly deletes missing values, including simple deletion, pairwise deletion, column deletion and the like. Deletion-based methods suffer a sharp drop in analysis performance when the missing rate is high. The second type fills missing values with statistics, including the mean, median, mode, and last valid observation.
(2) Machine learning-based methods: traditional machine learning abstracts the completion problem into matrix decomposition, tensor decomposition, and K-nearest-neighbor problems, and achieves a better filling effect than simple statistical analysis.
(3) Deep learning-based models: deep learning-based methods can be divided into three categories: 1) VAE-based methods; 2) LSTM-based methods; 3) GAN-based methods. The VAE acts as a generative model: an encoder network transforms the real samples into an ideal data distribution, which is then passed to a decoder network to obtain generated samples; if the generated samples are sufficiently close to the real samples, an autoencoder model has been trained. In the LSTM-based approach, each node at each time step receives input from the previous node, which can be represented by a feedback loop. At each time step, the node takes an input $x_i$ and the output $a_{i-1}$ of the previous node, performs its computation, and generates an output $h_i$. This output is passed to the next node, and the process continues until all time steps have been evaluated. The GAN approach comprises two models, a generative model and a discriminative model, which are trained adversarially: the generative model produces data to deceive the discriminative model, and the discriminative model judges whether the data is real or fake. Over the course of training both models become stronger, until a steady state is finally reached.
Traditional machine learning-based methods can often mine only linear relationships and have limited feature extraction capability when facing the nonlinear relationships of multivariate time series. For multivariate time series, both VAE and GAN models have limited expressive capability. Multivariate time series data usually has complex association relationships, yet existing models either do not extract these relationships, do not extract them explicitly, or use only the temporal relationship when completing environment monitoring data. When completing multivariate time series data in which some time points are missing, the calculation at each missing time point carries some error; these errors accumulate as the computed sequence grows longer, and in particular when data is missing over a continuous time period the model cannot complete it accurately.
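As a point of reference for the statistics-based filling described in category (1) above, the following is a minimal sketch of mean filling and last-valid-observation filling. It assumes missing readings are represented as `None`; the function names and the sample temperature series are illustrative only and do not come from the invention.

```python
from statistics import mean


def fill_with_mean(series):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in series if v is not None]
    if not observed:
        return list(series)  # nothing to estimate from
    m = mean(observed)
    return [m if v is None else v for v in series]


def fill_with_last_observation(series):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical hourly temperature readings with gaps.
temperature = [21.0, None, 23.0, None, None, 22.0]
print(fill_with_mean(temperature))
print(fill_with_last_observation(temperature))
```

Both fills treat each attribute in isolation and ignore relationships between attributes, which is the limitation the background attributes to simple statistical methods.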
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an environment monitoring data completion method based on association fusion, comprising the following steps:
Step 1, acquiring a multivariate time series with missing data, and marking the positions of the missing data in the multivariate time series to obtain a marking matrix;
Step 2, obtaining association coefficients among the attributes according to each attribute at each time point in the multivariate time series to construct a graph G, wherein nodes in the graph G correspond to the attributes, and edges between the nodes correspond to the attribute association coefficients between the attributes;
Step 3, multiplying the graph G and the matrix to be completed element-wise to obtain an intermediate matrix X', and performing time-sequence processing on the intermediate matrix X' through a neural network to obtain the hidden state of each time point in the multivariate time series;
Step 4, calculating the time correlation coefficient of the multivariate time series according to the hidden state of each time point, and obtaining an intermediate state H' of each time point in the multivariate time series by multiplying the time correlation coefficient and the hidden state of each time point element-wise;
Step 5, applying a generative nonlinear transformation to the intermediate state H' to obtain reconstruction completion data of the multivariate time series.
The environment monitoring data completion method based on association fusion further comprises:
Step 6, calculating the reconstruction loss of the missing values according to the marking matrix and the reconstruction completion data by the following formula:
where M is the marking matrix, x is the missing data in the multivariate time series, and the remaining term is the reconstruction completion data with which the missing data in the multivariate time series is completed;
Step 7, judging whether the reconstruction loss meets a preset condition; if so, outputting the reconstruction completion data as the final result; otherwise, executing Step 2 again.
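As a minimal sketch of Step 1, the marking matrix can be derived directly from the raw readings. The sketch assumes missing readings are represented as `None` and uses the convention stated in the detailed description (0 for not missing, 1 for missing). The function name `marking_matrix` and the sample values are hypothetical.

```python
def marking_matrix(X):
    """Return M with 1 where X is missing (None) and 0 where it is observed."""
    return [[1 if v is None else 0 for v in row] for row in X]


# Rows are time points; columns are attributes (temperature, air pressure).
X = [
    [21.0, 1013.2],
    [None, 1012.8],
    [22.4, None],
]
M = marking_matrix(X)
print(M)
```

Keeping M separate from X lets later steps both exclude missing entries from the attention calculation and restrict the loss to the marked positions.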
In the environment monitoring data completion method based on association fusion, Step 2 comprises:
obtaining a first association coefficient by adopting a graph attention mechanism:
$e_{ij} = \tanh(w \cdot [x_i \mid x_j] + b)$
where $x_i$ and $x_j$ are the values of attribute i and attribute j, $e_{ij}$ is the first association coefficient between attribute i and its adjacent node attribute j, and w and b are parameters learned by the neural network;
the attribute association coefficient $a_{ij}$ is obtained by regularizing over the adjacent node attributes j of every attribute i:
where $a_{ij}$ is the value in the i-th row and j-th column of the graph G, representing the attribute association coefficient between attribute i and attribute j, and N is the total number of attributes.
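The two-stage calculation above can be sketched as follows. The first stage follows the stated formula for $e_{ij}$. The second stage is an assumption: the regularization formula is not reproduced in the text, so the sketch uses a softmax over the N attributes, as is common in graph attention mechanisms. It also assumes each attribute value is a scalar and that w has two components, one per concatenated value; the input values are illustrative.

```python
import math


def attribute_association(x, w, b):
    """x: attribute values at one time point; w: (w1, w2); b: bias."""
    n = len(x)
    # e_ij = tanh(w . [x_i | x_j] + b), with [x_i | x_j] the concatenated pair.
    e = [[math.tanh(w[0] * x[i] + w[1] * x[j] + b) for j in range(n)]
         for i in range(n)]
    # Assumed normalization: softmax over j for each attribute i.
    a = []
    for row in e:
        exps = [math.exp(v) for v in row]
        total = sum(exps)
        a.append([v / total for v in exps])
    return a


# Hypothetical normalized readings for three attributes.
G = attribute_association([0.2, -0.5, 1.0], w=(0.7, -0.3), b=0.1)
for row in G:
    print([round(v, 3) for v in row])
```

Under the softmax assumption every row of the result sums to 1, so each attribute distributes a fixed amount of attention across the attributes it is connected to.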
In the environment monitoring data completion method based on association fusion, Step 3 comprises performing time-sequence processing on the intermediate matrix X' based on the following formula:
$H = \mathrm{LSTM}(X')$, where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the hidden state of each time point in the multivariate time series;
Step 4 comprises:
calculating the time correlation coefficient according to the differences in time correlation:
$\beta_{mn} = \mathrm{sigmoid}(w \cdot [h_m \mid h_n] + \delta)$
where $h_m$ and $h_n$ are the hidden states computed by the LSTM at time step m and time step n, $\beta_{mn}$ is the time correlation coefficient, and w and $\delta$ are parameters learned by the neural network.
In the environment monitoring data completion method based on association fusion, the multivariate time series is environment monitoring data, and the attributes of each time point in the environment monitoring data include temperature and air pressure.
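The time correlation coefficient can be sketched as below. The sketch assumes each hidden state is a vector of length d and that w is a weight vector of length 2d applied to the concatenation of $h_m$ and $h_n$. The hidden states here are hard-coded placeholders rather than real LSTM outputs, and the parameter values are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def time_correlation(H, w, delta):
    """H: list of t hidden-state vectors of size d; w: weights of size 2*d."""
    t = len(H)
    A = [[0.0] * t for _ in range(t)]
    for m in range(t):
        for n in range(t):
            concat = H[m] + H[n]  # [h_m | h_n]
            z = sum(wk * ck for wk, ck in zip(w, concat)) + delta
            A[m][n] = sigmoid(z)  # beta_mn
    return A


# Placeholder hidden states for three time steps (d = 2).
H = [[0.1, 0.4], [0.3, -0.2], [0.0, 0.5]]
A = time_correlation(H, w=[0.5, -0.5, 0.25, 0.25], delta=0.0)
for row in A:
    print([round(v, 3) for v in row])
```

Because the sigmoid is applied to each pair independently, every coefficient lies strictly between 0 and 1 and the rows are not constrained to sum to 1.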
The invention also provides an environment monitoring data completion system based on association fusion, comprising:
Module 1, configured to acquire a multivariate time series with missing data, and mark the positions of the missing data in the multivariate time series to obtain a marking matrix;
Module 2, configured to obtain association coefficients among the attributes according to each attribute at each time point in the multivariate time series so as to construct a graph G, wherein nodes in the graph G correspond to the attributes, and edges between the nodes correspond to the attribute association coefficients between the attributes;
Module 3, configured to multiply the graph G and the matrix to be completed element-wise to obtain an intermediate matrix X', and perform time-sequence processing on the intermediate matrix X' through a neural network to obtain the hidden state of each time point in the multivariate time series;
Module 4, configured to calculate the time correlation coefficient of the multivariate time series according to the hidden state of each time point, and obtain an intermediate state H' of each time point in the multivariate time series by multiplying the time correlation coefficient and the hidden state of each time point element-wise;
Module 5, configured to apply a generative nonlinear transformation to the intermediate state H' to obtain reconstruction completion data of the multivariate time series.
The environment monitoring data completion system based on association fusion further comprises:
Module 6, configured to calculate the reconstruction loss of the missing values according to the marking matrix and the reconstruction completion data by the following formula:
where M is the marking matrix, x is the missing data in the multivariate time series, and the remaining term is the reconstruction completion data with which the missing data in the multivariate time series is completed;
Module 7, configured to judge whether the reconstruction loss meets a preset condition; if so, output the reconstruction completion data as the final result; otherwise, invoke Module 2 again.
In the environment monitoring data completion system based on association fusion, Module 2 comprises:
obtaining a first association coefficient by adopting a graph attention mechanism:
$e_{ij} = \tanh(w \cdot [x_i \mid x_j] + b)$
where $x_i$ and $x_j$ are the values of attribute i and attribute j, $e_{ij}$ is the first association coefficient between attribute i and its adjacent node attribute j, and w and b are parameters learned by the neural network;
the attribute association coefficient $a_{ij}$ is obtained by regularizing over the adjacent node attributes j of every attribute i:
where $a_{ij}$ is the value in the i-th row and j-th column of the graph G, representing the attribute association coefficient between attribute i and attribute j, and N is the total number of attributes.
In the environment monitoring data completion system based on association fusion, Module 3 performs time-sequence processing on the intermediate matrix X' based on the following formula:
$H = \mathrm{LSTM}(X')$, where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the hidden state of each time point in the multivariate time series;
Module 4 comprises:
calculating the time correlation coefficient according to the differences in time correlation:
$\beta_{mn} = \mathrm{sigmoid}(w \cdot [h_m \mid h_n] + \delta)$
where $h_m$ and $h_n$ are the hidden states computed by the LSTM at time step m and time step n, $\beta_{mn}$ is the time correlation coefficient, and w and $\delta$ are parameters learned by the neural network.
In the environment monitoring data completion system based on association fusion, the multivariate time series is environment monitoring data, and the attributes of each time point in the environment monitoring data include temperature and air pressure.
The advantages of the invention are as follows:
(1) Attribute association fusion: the attribute correlations of the multivariate time series are fused, so that they are modeled better and more association information is extracted for sequence completion.
(2) Time correlation fusion: the time correlations of the multivariate time series are fused, and a generative nonlinear transformation is applied to the fused data to obtain the reconstructed output of the sequence, avoiding the propagation of gradually accumulated errors.
(3) New data reconstruction method: by calculating the reconstruction loss only on the missing values by means of the marking matrix, the influence of existing data on reconstruction is avoided and the model focuses better on completing the missing data.
Drawings
FIG. 1 is a sample of input data for a multivariate time series;
FIG. 2 is a general architecture diagram of the method of the present invention;
FIG. 3 is a schematic diagram of the attribute association module of the method of the present invention;
FIG. 4 is a schematic diagram of the time correlation module of the method of the present invention;
FIG. 5 is a flow chart of an implementation of the method of the present invention.
Detailed Description
In order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The flow of the invention is shown in fig. 2. The input data X is the data to be completed, and the marking matrix M indicates which entries are missing: 0 indicates not missing and 1 indicates missing. The overall process comprises attribute association and time association. The attribute association module is divided into attribute association coefficient calculation and attribute association coefficient assignment, and the time association module is divided into time correlation coefficient calculation and time correlation coefficient assignment. Specifically:
(1) Attribute association coefficient calculation: multivariate time series data has association relationships among its attributes, such as temperature and air pressure in environment monitoring data, or stock price and trading volume in stock trading. The attribute association module constructs a graph G in which the attributes are the nodes and the association coefficients between attributes are the edges. The method uses a graph attention mechanism to calculate the association coefficients, as shown in fig. 3, specifically:
$e_{ij} = \tanh(w \cdot [x_i \mid x_j] + b)$    Formula (1)
where $x_i$ and $x_j$ are the values of attribute i and attribute j in the multivariate time series, $e_{ij}$ is the association coefficient between attribute i and attribute j, and w and b are parameters learned by the neural network.
Whether an input value is missing is determined through the marking matrix M. When a value is missing, borrowing the idea of Dropout, the neurons of w corresponding to the missing value are masked during calculation so that they do not participate in the attention calculation. Calculation efficiency is improved by regularizing over the adjacent node attributes j of every attribute i, with the following formula:
where $a_{ij}$ is the value in the i-th row and j-th column of the graph G, representing the regularized association coefficient between attribute i and attribute j, and N is the total number of attributes. Attribute association coefficient calculation yields the association-fused graph G, whose dimensions are the same as those of the input data X.
(2) Attribute association coefficient assignment: the association-fused graph G obtained from the attribute association coefficient calculation has the same dimensions as the input data X. The graph G and the input data X are multiplied element-wise to obtain X', calculated as:
$X' = G \odot X$    Formula (3)
X' incorporates the assigned attribute association coefficients while preserving the temporal order of the series.
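The assignment step is an element-wise product of two equally shaped matrices, which can be sketched as follows. The sketch takes the statement that G and X have the same dimensions at face value and assumes missing entries of X have already been replaced by a numeric placeholder such as 0.0. The function name and sample values are illustrative.

```python
def elementwise_product(G, X):
    """Multiply two equally shaped matrices entry by entry."""
    if len(G) != len(X) or any(len(g) != len(x) for g, x in zip(G, X)):
        raise ValueError("G and X must have the same shape")
    return [[g * x for g, x in zip(g_row, x_row)]
            for g_row, x_row in zip(G, X)]


# Hypothetical coefficients and readings; the 0.0 in X stands in for a missing value.
G = [[0.5, 0.5], [0.25, 0.75]]
X = [[2.0, 4.0], [8.0, 0.0]]
X_prime = elementwise_product(G, X)
print(X_prime)  # [[1.0, 2.0], [2.0, 0.0]]
```

Because each entry is scaled in place, the row (time) ordering of X is unchanged, which matches the statement that X' preserves the temporal order.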
(3) Time correlation coefficient calculation: the method designs a time correlation module based on an attention mechanism, as shown in fig. 4. The module takes the output X' of the attribute association fusion module as input and applies a standard LSTM for time-sequence processing, calculated as $H = \mathrm{LSTM}(X')$, where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the hidden state of the LSTM (parameters learned by the LSTM) at each time point.
(4) The time correlation coefficient is calculated according to the differences in time correlation, as follows:
$\beta_{mn} = \mathrm{sigmoid}(w \cdot [h_m \mid h_n] + \delta)$    Formula (4)
where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the hidden state of the LSTM at each time point, $h_m$ and $h_n$ are the hidden states computed by the LSTM at time step m and time step n, $\beta_{mn}$ is the time correlation coefficient, and w and $\delta$ are parameters learned by the neural network.
$A = \{\beta_{mn}\}$    Formula (5)
(5) Time correlation coefficient assignment: the time correlation coefficients are assigned according to the time correlation coefficients and the hidden state of the LSTM at each time point, calculated as:
$H' = A \times H$    Formula (6)
where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the hidden state of the LSTM at each time point, $h_i$ ($i = 1, 2, 3, \ldots, t$) is the element of H at each time point, and H' is the output of the time correlation module.
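One possible reading of the assignment step is sketched below, with the caveat that the text is ambiguous: the summary describes an element-wise multiplication, while A is indexed by pairs of time steps (t by t) and H holds one d-dimensional vector per time step (t by d). The sketch assumes a matrix product, so that each output state is a coefficient-weighted sum of all hidden states. This interpretation, the function name, and the sample values are assumptions, not something the source confirms.

```python
def assign_time_correlation(A, H):
    """Assumed reading of H' = A x H: h'_m = sum over n of beta_mn * h_n."""
    t, d = len(H), len(H[0])
    if any(len(row) != t for row in A) or len(A) != t:
        raise ValueError("A must be t x t where t is the number of time steps")
    return [[sum(A[m][n] * H[n][k] for n in range(t)) for k in range(d)]
            for m in range(t)]


# Hypothetical coefficients and hidden states for two time steps.
A = [[1.0, 0.0], [0.5, 0.5]]
H = [[2.0, 4.0], [6.0, 8.0]]
H_prime = assign_time_correlation(A, H)
print(H_prime)  # [[2.0, 4.0], [4.0, 6.0]]
```

Under this reading, every reconstructed time point can draw on all other time points at once, which is consistent with the stated goal of avoiding step-by-step error accumulation.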
(6) New data reconstruction method: a nonlinear transformation is applied to H' over the whole multivariate time series to obtain the reconstructed output of the sequence, which avoids the propagation of gradually accumulated errors.
(7) Reconstruction loss function calculation: in environment monitoring data completion, the reconstruction loss function is designed to calculate the reconstruction loss only on the missing values by means of the marking matrix, with the following formula:
where M is the marking matrix, x is the non-missing data, and the remaining term is the reconstructed value of the missing data.
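A masked reconstruction loss of the kind described in item (7) can be sketched as follows. The exact formula is not reproduced in the text, so the form used here (mean squared error over the positions where M is 1) is an assumption. The sketch also assumes reference values are available at the marked positions, for example because they were held out deliberately during training. All names and numbers are illustrative.

```python
def masked_reconstruction_loss(M, X, X_hat):
    """Mean squared error over positions where M == 1 (assumed loss form)."""
    total, count = 0.0, 0
    for m_row, x_row, xh_row in zip(M, X, X_hat):
        for m, x, xh in zip(m_row, x_row, xh_row):
            if m == 1:  # only marked (missing) positions contribute
                total += (x - xh) ** 2
                count += 1
    return total / count if count else 0.0


M = [[0, 1], [1, 0]]
X = [[1.0, 2.0], [3.0, 4.0]]      # hypothetical reference values
X_hat = [[9.0, 2.5], [2.0, 9.0]]  # hypothetical reconstruction
loss = masked_reconstruction_loss(M, X, X_hat)
print(loss)  # 0.625
```

The large errors at the unmarked positions (9.0 against 1.0 and 4.0) do not affect the result, which illustrates how the marking matrix keeps existing data from influencing the loss.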
In this embodiment, the implementation flow is shown in fig. 5 and described in detail below:
Step 1: Data input: the marking matrix M is obtained from the input data X, as in fig. 1, and X and M are used as the inputs of the method.
Step 2: Attribute association fusion module: X and M are the inputs of this module. Attribute associations are calculated using Formula (1) and Formula (2); borrowing the idea of Dropout, the neurons of w corresponding to missing values are masked during calculation so that they do not participate in the attention calculation. Formula (3) is then used to assign the attribute association coefficients, yielding X'.
Step 3: Time correlation fusion module: taking X' from the attribute association module as input, Formula (4) and Formula (5) are used to calculate the time correlation coefficients of the multivariate time series; Formula (6) is used to assign the time correlation coefficients, yielding H'.
Step 4: Data reconstruction and loss calculation: H' is used as input to generate the reconstructed data as a whole; the reconstruction uses an MLP or a nonlinear fully connected layer. The loss is calculated by the method of Formula (7).
Step 5: Algorithm output: the value of the loss function is randomly initialized at the start; the algorithm ends when the value of the loss function falls below a threshold or no longer decreases, and the reconstructed data is output.
The following is a system embodiment corresponding to the above method embodiment, and the two may be implemented in cooperation. The technical details mentioned in the above embodiment remain valid in this embodiment and are not repeated here to reduce repetition. Correspondingly, the technical details mentioned in this embodiment also apply to the above embodiment.
The invention also provides an environment monitoring data complement system based on the association fusion, which comprises the following steps:
the module 1 is used for acquiring a multi-element time sequence with missing data, and marking the missing data position of the multi-element time sequence to obtain a marking matrix;
the module 2 is used for obtaining association coefficients among the attributes according to each attribute of each time point in the multi-element time sequence so as to construct a graph G, wherein the graph G corresponds to the attributes, and the edges among the nodes correspond to the attribute association coefficients among the attributes;
the module 3 is configured to obtain an intermediate matrix X 'by multiplying the graph G and the matrix to be complemented by bits, and perform time sequence processing on the intermediate matrix X' through a neural network to obtain a hidden state of each time point in the multi-element time sequence;
the module 4 is used for calculating the time relevance coefficient of the multi-element time sequence according to the hidden state of each time point; obtaining an intermediate state H' of each time point in the multi-element time sequence by multiplying the time correlation coefficient and the hidden state of each time point by bits;
and a module 5, configured to obtain reconstruction complement data of the multiple time series by adopting a nonlinear transformation of a generated formula to the intermediate state H' in the multiple time series.
The environment monitoring data complement system based on the association fusion further comprises:
a module 6, configured to calculate a reconstruction loss of the missing value according to the marking matrix and the reconstruction complement data by:
where M is the marker matrix, x is the missing data in the multivariate time series,the missing data in the multivariate time series is complemented by the reconstruction complement data;
and a module 7, configured to determine whether the reconstruction loss meets a preset condition, if yes, output the reconstruction complement data as a final result, otherwise, execute the module 2 again.
The environmental monitoring data completion system based on association fusion, wherein module 2 comprises:
obtaining a first association coefficient by adopting a graph attention mechanism:
$e_{ij} = \tanh(w \cdot [x_i \mid x_j] + b)$
where $x_i$ and $x_j$ are the values of attribute i and attribute j, $e_{ij}$ is the first association coefficient between attribute i and its adjacent-node attribute j, and w and b are parameters learned by the neural network;
obtaining the attribute association coefficient $a_{ij}$ by normalizing over the adjacent-node attributes j of each attribute i:
where $a_{ij}$ is the value in the i-th row and j-th column of the graph G and represents the attribute association coefficient of attribute i and attribute j, and N is the total number of attributes.
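The construction of the graph G in module 2 can be sketched in Python as follows. The normalization formula is not reproduced in this text; the sketch assumes a softmax over the N attributes j, which is the usual choice in graph attention mechanisms. Treating each attribute's value as its whole series, the function name attention_graph, the third attribute and the placeholder weights are likewise assumptions for illustration.

```python
import math


def attention_graph(columns, w, b):
    """Return an N x N matrix of normalized association coefficients a_ij.

    columns[i] is the series of values of attribute i (length T).
    w has length 2*T and b is a scalar; both would be learned in practice.
    """
    n = len(columns)
    graph = []
    for i in range(n):
        # e_ij = tanh(w . [x_i | x_j] + b) for every attribute j
        scores = []
        for j in range(n):
            concat = columns[i] + columns[j]  # list concatenation [x_i | x_j]
            dot = sum(wk * xk for wk, xk in zip(w, concat))
            scores.append(math.tanh(dot + b))
        # Assumed normalization: softmax over j, so each row sums to 1.
        exps = [math.exp(s) for s in scores]
        z = sum(exps)
        graph.append([e / z for e in exps])
    return graph


temperature = [0.2, 0.4, 0.6]
pressure = [0.9, 0.8, 0.7]
humidity = [0.5, 0.5, 0.4]  # hypothetical third attribute
w = [0.1, -0.2, 0.3, 0.05, 0.0, -0.1]  # placeholder weights, not learned
G = attention_graph([temperature, pressure, humidity], w, b=0.0)
```

Because tanh bounds each score to (-1, 1), the softmax weights in a row cannot differ by more than a factor of e squared under these assumptions.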
The environmental monitoring data completion system based on association fusion, wherein module 3 comprises performing time-series processing on the intermediate matrix X' based on the following formula:
$H = \mathrm{LSTM}(X')$, where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the set of hidden states of the time points in the multivariate time series;
module 4 comprises:
calculating the time correlation coefficient according to the differences in temporal relevance:
$\beta_{mn} = \mathrm{sigmoid}(w \cdot [h_m \mid h_n] + \delta)$
where $h_m$ and $h_n$ are the hidden states computed by the LSTM at time step m and time step n, $\beta_{mn}$ is the time correlation coefficient, and w and $\delta$ are parameters learned by the neural network.
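The time correlation coefficient of module 4 can be sketched in Python as follows. The hidden states here are fixed stand-in vectors rather than the output of a trained LSTM, and the function name temporal_coefficients and the numeric values of w and δ are assumptions for illustration. The sketch stops at the coefficients and does not attempt the subsequent combination with the hidden states.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def temporal_coefficients(hidden, w, delta):
    """beta[m][n] = sigmoid(w . [h_m | h_n] + delta) for every pair of steps."""
    beta = []
    for h_m in hidden:
        row = []
        for h_n in hidden:
            concat = h_m + h_n  # list concatenation [h_m | h_n]
            dot = sum(wk * hk for wk, hk in zip(w, concat))
            row.append(sigmoid(dot + delta))
        beta.append(row)
    return beta


# Stand-in hidden states for three time steps, each of dimension 2.
hidden = [[0.1, -0.3], [0.4, 0.2], [-0.2, 0.5]]
w = [0.5, -0.5, 0.25, 0.75]  # placeholder weights, not learned
beta = temporal_coefficients(hidden, w, delta=0.0)
```

Each coefficient lies strictly between 0 and 1, so it acts as a soft gate on the relationship between time step m and time step n.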
The environment monitoring data completion system based on association fusion, wherein the multivariate time series is environment monitoring data, and the attributes of each time point in the environment monitoring data include temperature and air pressure.
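For environment monitoring data with temperature and air pressure attributes, the marking of missing positions can be sketched in Python as follows. Representing a missing reading as NaN, using 1 for a present reading and 0 for a missing one, and the function name marking_matrix are assumptions for illustration.

```python
import math


def marking_matrix(series):
    """Return M with 1 where a reading is present and 0 where it is missing."""
    return [[0 if math.isnan(v) else 1 for v in row] for row in series]


nan = float("nan")
# Rows are time points; columns are (temperature, air pressure).
readings = [
    [21.5, 1013.2],
    [nan, 1012.8],
    [22.1, nan],
]
M = marking_matrix(readings)
```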
Although specific embodiments of the present invention and the accompanying drawings are disclosed for illustrative purposes, to aid in understanding and practicing the invention, those skilled in the art will understand that corresponding methods and tools may be implemented on other platforms without departing from the spirit and scope of the present invention and the appended claims. Accordingly, the invention should not be limited to what is disclosed in the embodiments and drawings.

Claims (8)

1. An environment monitoring data completion method based on association fusion, characterized by comprising the following steps:
step 1, acquiring environment monitoring data with missing data, and marking the positions of the missing data in the environment monitoring data to obtain a marking matrix, wherein the attributes of each time point in the environment monitoring data include temperature and air pressure;
step 2, obtaining association coefficients among the attributes from each attribute of each time point in the environment monitoring data to construct a graph G, wherein nodes in the graph G correspond to the attributes, and edges between the nodes correspond to the attribute association coefficients between the attributes;
step 3, obtaining an intermediate matrix X' by element-wise multiplication of the graph G and the matrix to be complemented, and performing time-series processing on the intermediate matrix X' through a neural network to obtain the hidden state of each time point in the environment monitoring data;
step 4, calculating a time correlation coefficient of the environment monitoring data from the hidden state of each time point, and obtaining an intermediate state H' of each time point in the environment monitoring data by element-wise multiplication of the time correlation coefficient and the hidden state of each time point;
and step 5, applying a generative nonlinear transformation to the intermediate state H' in the environment monitoring data to obtain reconstruction complement data of the environment monitoring data.
2. The association fusion-based environmental monitoring data completion method of claim 1, further comprising:
step 6, calculating a reconstruction loss of the missing values from the marking matrix and the reconstruction complement data by the following formula:
where M is the marking matrix, x is the data not missing from the environmental monitoring data, and $\hat{x}$ is the missing data in the environmental monitoring data as complemented by the reconstruction complement data;
and step 7, judging whether the reconstruction loss meets a preset condition and, if so, outputting the reconstruction complement data as the final result, and otherwise executing step 2 again.
3. The association fusion-based environmental monitoring data completion method of claim 1 or 2, wherein step 2 comprises:
obtaining a first association coefficient by adopting a graph attention mechanism:
$e_{ij} = \tanh(w \cdot [x_i \mid x_j] + b)$
where $x_i$ and $x_j$ are the values of attribute i and attribute j, $e_{ij}$ is the first association coefficient between attribute i and its adjacent-node attribute j, and w and b are parameters learned by the neural network;
obtaining the attribute association coefficient $a_{ij}$ by normalizing over the adjacent-node attributes j of each attribute i:
where $a_{ij}$ is the value in the i-th row and j-th column of the graph G and represents the attribute association coefficient of attribute i and attribute j, and N is the total number of attributes.
4. The association fusion-based environmental monitoring data completion method of claim 3, wherein step 3 comprises performing time-series processing on the intermediate matrix X' based on the following formula:
$H = \mathrm{LSTM}(X')$, where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the set of hidden states of the time points in the environmental monitoring data;
step 4 comprises:
calculating the time correlation coefficient according to the differences in temporal relevance:
$\beta_{mn} = \mathrm{sigmoid}(w \cdot [h_m \mid h_n] + \delta)$
where $h_m$ and $h_n$ are the hidden states computed by the LSTM at time step m and time step n, $\beta_{mn}$ is the time correlation coefficient, and w and $\delta$ are parameters learned by the neural network.
5. An environmental monitoring data completion system based on association fusion, comprising:
module 1, configured to acquire environment monitoring data with missing data, and to mark the positions of the missing data in the environment monitoring data to obtain a marking matrix, wherein the attributes of each time point in the environment monitoring data include temperature and air pressure;
module 2, configured to obtain association coefficients among the attributes from each attribute of each time point in the environment monitoring data so as to construct a graph G, wherein nodes in the graph G correspond to the attributes, and edges between the nodes correspond to the attribute association coefficients between the attributes;
module 3, configured to obtain an intermediate matrix X' by element-wise multiplication of the graph G and the matrix to be complemented, and to perform time-series processing on the intermediate matrix X' through a neural network to obtain the hidden state of each time point in the environment monitoring data;
module 4, configured to calculate a time correlation coefficient of the environment monitoring data from the hidden state of each time point, and to obtain an intermediate state H' of each time point in the environment monitoring data by element-wise multiplication of the time correlation coefficient and the hidden state of each time point;
and module 5, configured to apply a generative nonlinear transformation to the intermediate state H' in the environment monitoring data to obtain reconstruction complement data of the environment monitoring data.
6. The association fusion-based environmental monitoring data completion system of claim 5, further comprising:
module 6, configured to calculate a reconstruction loss of the missing values from the marking matrix and the reconstruction complement data by the following formula:
where M is the marking matrix, x is the data not missing from the environmental monitoring data, and $\hat{x}$ is the missing data in the environmental monitoring data as complemented by the reconstruction complement data;
and module 7, configured to determine whether the reconstruction loss meets a preset condition and, if so, to output the reconstruction complement data as the final result, and otherwise to execute module 2 again.
7. The association fusion-based environmental monitoring data completion system of claim 5 or 6, wherein module 2 comprises:
obtaining a first association coefficient by adopting a graph attention mechanism:
$e_{ij} = \tanh(w \cdot [x_i \mid x_j] + b)$
where $x_i$ and $x_j$ are the values of attribute i and attribute j, $e_{ij}$ is the first association coefficient between attribute i and its adjacent-node attribute j, and w and b are parameters learned by the neural network;
obtaining the attribute association coefficient $a_{ij}$ by normalizing over the adjacent-node attributes j of each attribute i:
where $a_{ij}$ is the value in the i-th row and j-th column of the graph G and represents the attribute association coefficient of attribute i and attribute j, and N is the total number of attributes.
8. The association fusion-based environmental monitoring data completion system of claim 7, wherein module 3 comprises performing time-series processing on the intermediate matrix X' based on the following formula:
$H = \mathrm{LSTM}(X')$, where $H = \{h_1, h_2, h_3, \ldots, h_t\}$ is the set of hidden states of the time points in the environmental monitoring data;
module 4 comprises:
calculating the time correlation coefficient according to the differences in temporal relevance:
$\beta_{mn} = \mathrm{sigmoid}(w \cdot [h_m \mid h_n] + \delta)$
where $h_m$ and $h_n$ are the hidden states computed by the LSTM at time step m and time step n, $\beta_{mn}$ is the time correlation coefficient, and w and $\delta$ are parameters learned by the neural network.
CN202110624648.6A 2021-06-04 2021-06-04 Environment monitoring data completion method and system based on association fusion Active CN113392139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110624648.6A CN113392139B (en) 2021-06-04 2021-06-04 Environment monitoring data completion method and system based on association fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110624648.6A CN113392139B (en) 2021-06-04 2021-06-04 Environment monitoring data completion method and system based on association fusion

Publications (2)

Publication Number Publication Date
CN113392139A CN113392139A (en) 2021-09-14
CN113392139B true CN113392139B (en) 2023-10-20

Family

ID=77618237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110624648.6A Active CN113392139B (en) 2021-06-04 2021-06-04 Environment monitoring data completion method and system based on association fusion

Country Status (1)

Country Link
CN (1) CN113392139B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090558A (en) * 2018-01-03 2018-05-29 华南理工大学 A kind of automatic complementing method of time series missing values based on shot and long term memory network
CN108228832A (en) * 2018-01-04 2018-06-29 南京大学 A kind of time series data complementing method based on distance matrix
CN110837888A (en) * 2019-11-13 2020-02-25 大连理工大学 Traffic missing data completion method based on bidirectional cyclic neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034378B (en) * 2018-09-04 2023-03-31 腾讯科技(深圳)有限公司 Network representation generation method and device of neural network, storage medium and equipment

Also Published As

Publication number Publication date
CN113392139A (en) 2021-09-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant