CN116541667A - Interpolation method and system for buoy time sequence data missing value - Google Patents


Info

Publication number
CN116541667A
Authority
CN
China
Prior art keywords
data
interpolation
missing
time
reverse
Prior art date
Legal status
Granted
Application number
CN202310782657.7A
Other languages
Chinese (zh)
Other versions
CN116541667B (en)
Inventor
张彩云
林晨旭
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202310782657.7A
Publication of CN116541667A
Application granted
Publication of CN116541667B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06F2123/00 Data types
    • G06F2123/02 Data types in the time domain, e.g. time-series data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an interpolation method and system for missing values in buoy time series data. The method comprises: acquiring sample data; using the Mann-Kendall test to determine, according to the missing time period, the data participating in interpolation; performing wavelet decomposition on the front section and the rear section of the interpolation data with a mother wavelet to obtain components; building forward and reverse data sets from the components in the forward and reverse time directions using a preset time step; training a forward model and a reverse model for each component with a long short-term memory network; using the models to interpolate the missing time period one point at a time, where each interpolated value is added to the input time vector for the next interpolation step, until the whole period is filled; and summing the forward model results to obtain a forward interpolation result, summing the reverse model results to obtain a reverse interpolation result, and combining the two results with weights to obtain the final interpolation result. The method can effectively interpolate missing buoy time series data.

Description

Interpolation method and system for buoy time sequence data missing value
Technical Field
The invention relates to the technical field of data interpolation, in particular to an interpolation method and system for a buoy time sequence data missing value.
Background
When buoy data are used to study regional environmental variables, instrument replacement or damage can leave data missing over a period of time, creating a gap in the time series. When the gap falls within the period of interest, it must be interpolated before the data can be used. However, the commonly used interpolation methods often fail to reflect how the data actually change and merely fit an assumed mathematical relationship. With the rapid development of computer software and artificial intelligence, computational models can identify the internal connections in data and learn their patterns of change, which gives them distinctive advantages for problems that are nonlinear or whose mechanisms are unclear. The long short-term memory network addresses the long-term dependency problem of neural networks: it carries useful information effectively along long sequences, passing past information forward in time. Meanwhile, wavelet analysis overcomes the weakness of the Fourier transform in handling abrupt signals. Combining the two allows the detailed signals in the original time series to be learned more closely.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an interpolation method and system for missing values in buoy time series data, which can effectively interpolate gaps in buoy time series data so that the recovered data are close to the original data.
The invention adopts the following technical scheme:
In one aspect, a method for interpolating missing values in buoy time series data includes:
S1, acquiring ocean observation time series data with missing values collected by a buoy; the data are sampled at equal time intervals at a certain sampling frequency and contain missing time periods;
S2, taking the logarithm of the ocean observation time series data to obtain processed ocean observation time series data;
S3, using the Mann-Kendall test to determine, or preselect, the data participating in interpolation according to the missing time period; the data participating in interpolation comprise front-section data and rear-section data of the missing values and are the log-transformed ocean observation time series data;
S4, performing wavelet decomposition on the front-section data and the rear-section data with a mother wavelet to obtain components;
S5, processing the obtained components in the forward time direction and in the reverse time direction according to a preset time step to obtain a forward data set and a reverse data set;
S6, using a long short-term memory network model, dividing the forward data set and the reverse data set of each component into a training set and a validation set according to a preset ratio, and training with set training parameters to obtain a forward model and a reverse model;
S7, using the forward model and the reverse model to interpolate the missing time period point by point, where each interpolated value is added to the input time vector for the next interpolation step, until the whole missing time period is interpolated;
S8, summing the results of the forward models and applying the exponential operation to obtain a forward interpolation result, summing the results of the reverse models and applying the exponential operation to obtain a reverse interpolation result, and multiplying the forward and reverse interpolation results by their corresponding weights and adding them to obtain the interpolation result.
Preferably, S3 specifically includes:
performing an MK test on the f days before the missing period to obtain its $UF_k$ and $UB_k$ curves, and an MK test on the b days after the missing period to obtain its $UF_k$ and $UB_k$ curves, where f and b are preselected numbers of days and $UF_k$ and $UB_k$ are statistic sequences computed in time order;
for an ocean observation time series $X = (x_1, x_2, \ldots, x_n)$, constructing the rank series $s_k = \sum_{i=1}^{k} r_i$ ($k = 2, 3, \ldots, n$), where $r_i$ is the number of moments $j < i$ at which $x_i > x_j$, i.e. the count is accumulated whenever the value at moment i is larger than the value at moment j;
under the assumption that the time series is random, defining the statistic $UF_k = \frac{s_k - E(s_k)}{\sqrt{\mathrm{Var}(s_k)}}$, $k = 1, 2, \ldots, n$, where $UF_1 = 0$ and $E(s_k)$ and $\mathrm{Var}(s_k)$ are the mean and variance of $s_k$; when $x_1, x_2, \ldots, x_n$ are mutually independent and have the same continuous distribution, they follow from $E(s_k) = \frac{k(k-1)}{4}$ and $\mathrm{Var}(s_k) = \frac{k(k-1)(2k+5)}{72}$;
then repeating the above steps for the ocean observation time series in reverse order, $X' = (x_n, x_{n-1}, \ldots, x_1)$, and letting $UB_k = -UF_k$ ($k = n, n-1, \ldots, 1$) to obtain $UB_k$;
when $UF_k$ and $UB_k$ exceed the significance level, if the two curves intersect and the intersection point lies within the critical lines determined by the significance level, the intersection point is a trend mutation point of the ocean observation data, and the moment of the intersection point is the start time of the mutation;
searching for a mutation point starting from the moment closest to the missing time period; once a mutation point is confirmed, continuing to search for a moment after the mutation at which the trend is significant, and checking whether that moment is at least a preset number of days m away from the missing time period; if so, confirming it as the required mutation moment of the ocean observation time series; if not, continuing the search;
if a moment meeting the requirement is found within the f days before the gap, selecting the data from f days before the missing time period up to the start of the missing values as the front-section data; otherwise, selecting the data from m days before the missing time period up to the start of the missing values as the front-section data;
if a moment meeting the requirement is found within the b days after the gap, selecting the data from the end of the missing values up to b days after the missing time period as the rear-section data; otherwise, selecting the data from the end of the missing values up to m days after the missing time period as the rear-section data.
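The sequential Mann-Kendall statistics described above can be sketched in Python as follows. This is a minimal illustration assuming plain Python lists as input; the function names (`mk_uf`, `mk_ub`, `crossings`) are hypothetical, and the critical value 1.96 (two-sided significance level 0.05) is an assumed choice, since the text does not state the significance level. The O(n²) rank count favours clarity over speed.

```python
import math

def mk_uf(x):
    """Forward sequential Mann-Kendall statistic UF_k, with UF_1 = 0."""
    n = len(x)
    uf = [0.0] * n
    s = 0  # rank series s_k
    for k in range(1, n):  # 0-based position k holds the (k+1)-th value
        # r_k: number of earlier values smaller than the current value
        s += sum(1 for j in range(k) if x[k] > x[j])
        m = k + 1  # series length so far
        mean = m * (m - 1) / 4.0
        var = m * (m - 1) * (2 * m + 5) / 72.0
        uf[k] = (s - mean) / math.sqrt(var)
    return uf

def mk_ub(x):
    """Backward statistic: UF of the reversed series, negated, put back in time order."""
    uf_rev = mk_uf(list(reversed(x)))
    return [-u for u in reversed(uf_rev)]

def crossings(uf, ub, critical=1.96):
    """Positions where the UF and UB curves cross inside the +/-critical band.

    critical=1.96 (alpha = 0.05, two-sided) is an ASSUMED significance level.
    """
    idx = []
    for k in range(1, len(uf)):
        d0, d1 = uf[k - 1] - ub[k - 1], uf[k] - ub[k]
        if (d1 == 0 or d0 * d1 < 0) and abs(uf[k]) < critical:
            idx.append(k)
    return idx
```

The indices returned by `crossings` are only candidate mutation points; the follow-up checks on trend significance after the mutation and on the distance m from the gap are left to the caller.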
Preferably, S4 further includes:
normalizing each component separately with the formula $X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$,
where $X^{*}$ is the normalized ocean observation time series data; X is a component obtained by wavelet decomposition of the ocean observation time series data; $X_{\min}$ is the minimum value of the ocean observation time series data; and $X_{\max}$ is the maximum value of the ocean observation time series data.
Preferably, the step S6 specifically includes:
dividing the obtained forward data set and reverse data set of each component into a training set and a validation set according to a preset ratio; merging the two forward data sets of the same component from the front-section data and the rear-section data, training the long short-term memory network with the set training parameters, and updating the weights until the network converges, to obtain a plurality of forward models; merging the two reverse data sets of the same component from the front-section data and the rear-section data and training the long short-term memory network in the same way until it converges, to obtain a plurality of reverse models; the number of forward models and of reverse models depends on the number of wavelet decomposition levels.
Preferably, before S8, the method further includes:
inverse-normalizing the forward model results and the reverse model results with the formula $Y = Y^{*}(X_{\max} - X_{\min}) + X_{\min}$, where $Y^{*}$ is a model result, Y is the inverse-normalized result, $X_{\min}$ is the minimum value of the ocean observation time series data, and $X_{\max}$ is the maximum value of the ocean observation time series data.
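The normalization of S4 and the inverse normalization before S8 are exact inverses only if the same minimum and maximum are used in both directions. A minimal sketch, assuming each wavelet component is normalized with its own minimum and maximum and that this pair is stored for the inverse step (the text says only "the minimum and maximum of the ocean observation time series data"); the helper names are hypothetical.

```python
def minmax_normalize(x):
    """Scale values to [0, 1]; also return (lo, hi) for the inverse step."""
    lo, hi = min(x), max(x)
    span = hi - lo
    if span == 0:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / span for v in x], lo, hi

def minmax_denormalize(y, lo, hi):
    """Map model outputs back to the original scale with the stored lo/hi."""
    return [v * (hi - lo) + lo for v in y]
```

Storing `lo` and `hi` alongside each component is the design choice that matters: predictions for the gap must be de-normalized with the statistics of the data the model was trained on.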
Preferably, S8 specifically includes:
multiplying the forward interpolation result by the forward interpolation weight and adding the reverse interpolation result multiplied by the reverse interpolation weight: $\hat{Y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_s)$, $\hat{y}_t = w_t^{f}\,\hat{y}_t^{f} + w_t^{b}\,\hat{y}_t^{b}$; where $\hat{Y}$ is the vector of interpolated values at all missing moments, s is the number of missing points and $\hat{y}_s$ is the s-th interpolated value; $\hat{y}_t$ is the interpolation result at missing time point t obtained by weighting the forward and reverse interpolation results; $w_t^{f}$ and $w_t^{b}$ are the forward and reverse interpolation weights at time point t; $\hat{y}_t^{f}$ is the forward interpolation result obtained by adding all inverse-normalized forward model results; and $\hat{y}_t^{b}$ is the reverse interpolation result obtained by adding all inverse-normalized reverse model results;
and then performing the exponential operation to obtain the interpolated data.
Preferably, the forward interpolation weight $w_t^{f}$ is as follows:
the reverse interpolation weight $w_t^{b}$ is as follows:
where t is the interpolation time point and L is the length of the missing ocean observation period.
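The weight formulas themselves are not reproduced in this text, so the sketch below uses a hypothetical linear ramp in which the forward weight falls from about 1 to about 0 across the gap and the reverse weight is its complement; substitute the patent's actual weights where they are available. It follows the order given in S8 (sum the de-normalized component results per direction, exponentiate to undo the logarithm of S2, then weight and add); the preferred embodiment above applies the exponential after weighting instead, which does not give identical results, so swap the two steps to follow that variant. `linear_weights` and `fuse` are hypothetical names.

```python
import math

def linear_weights(L):
    """HYPOTHETICAL weights: forward weight decreases linearly over the gap,
    reverse weight is its complement, so each pair sums to 1."""
    wf = [(L + 1 - t) / (L + 1) for t in range(1, L + 1)]
    wb = [1.0 - w for w in wf]
    return wf, wb

def fuse(forward_components, reverse_components, weights=linear_weights):
    """Combine per-component gap predictions into final interpolated values.

    Each argument is a list of per-component predictions over the gap,
    already inverse-normalized and still in the log domain.
    """
    L = len(forward_components[0])
    # Sum the components of each direction, then undo the log transform.
    fwd = [math.exp(sum(c[t] for c in forward_components)) for t in range(L)]
    rev = [math.exp(sum(c[t] for c in reverse_components)) for t in range(L)]
    wf, wb = weights(L)
    return [wf[t] * fwd[t] + wb[t] * rev[t] for t in range(L)]
```

With this ramp, points near the start of the gap lean on the forward models (which extrapolate from the front-section data) and points near the end lean on the reverse models.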
In another aspect, an interpolation system for missing values in buoy time series data includes:
an ocean observation time series data acquisition module, configured to acquire ocean observation time series data with missing values collected by a buoy; the data are sampled at equal time intervals at a certain sampling frequency and contain missing time periods;
a logarithm processing module, configured to take the logarithm of the ocean observation time series data to obtain processed ocean observation time series data;
an interpolation data determination module, configured to use the Mann-Kendall test to determine, or preselect, the data participating in interpolation according to the missing time period; the data participating in interpolation comprise front-section data and rear-section data of the missing values;
a wavelet decomposition module, configured to perform wavelet decomposition on the front-section data and the rear-section data with a mother wavelet to obtain components;
a forward and reverse data set acquisition module, configured to process the obtained components in the forward time direction and in the reverse time direction according to a preset time step to obtain a forward data set and a reverse data set;
a forward and reverse model training module, configured to use the long short-term memory network model to divide the forward data set and the reverse data set of each component into a training set and a validation set according to a preset ratio, and to train with set training parameters to obtain a forward model and a reverse model;
an interpolation processing module, configured to use the forward model and the reverse model to interpolate the missing time period point by point, where each interpolated value is added to the input time vector for the next interpolation step, until the whole missing time period is interpolated;
a forward and reverse interpolation fusion module, configured to sum the results of the forward models and apply the exponential operation to obtain a forward interpolation result, sum the results of the reverse models and apply the exponential operation to obtain a reverse interpolation result, and multiply the forward and reverse interpolation results by their corresponding weights and add them.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention decomposes the front-section data and the rear-section data with wavelet analysis, trains forward and reverse models with a long short-term memory network, and combines the two sets of results with weights, so it can learn the patterns of change in the original time series and effectively interpolate the missing data;
2. the invention uses the Mann-Kendall test to determine the front-section and rear-section data participating in interpolation according to the missing time period, so the data participating in interpolation are chosen more reasonably and the recovered data are closer to the original data.
Drawings
FIG. 1 is a flowchart of an interpolation method for missing values in buoy time series data according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the $UF_k$ and $UB_k$ curves according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a long short-term memory network according to an embodiment of the present invention;
FIG. 4 is a graph of the forward interpolation weights and reverse interpolation weights according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of interpolation results according to an embodiment of the present invention;
FIG. 6 is a block diagram of an interpolation system for missing values in buoy time series data according to an embodiment of the invention.
Detailed Description
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The term "comprising" in the description of the invention and the claims and in the above figures and any variants thereof is intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Referring to FIG. 1, the interpolation method for missing values in buoy time series data according to this embodiment includes the following steps:
S1, acquiring ocean observation time series data with missing values collected by a buoy; the data are sampled at equal time intervals at a certain sampling frequency and contain a missing time period.
Specifically, sea surface temperature data from the buoy for the period 2017-05-1:05:00 to 2017-06-10:05:00 are acquired, in which the data from 2017-05-22:00 to 2017-05-23:05:00 are missing.
S2, taking the logarithm of the marine observation time series data to obtain the processed marine observation time series data.
Specifically, let the log-transformed sea surface temperature data be $T = \ln T_0$, where $T_0$ is the original collected sea surface temperature value.
S3, determining data participating in interpolation or pre-selecting the data participating in interpolation according to the missing time period by utilizing Mann-Kendall test; the data participating in interpolation comprises front section data and rear section data of the missing value, and the data participating in interpolation is marine observation time sequence data after logarithmic processing.
Specifically, an MK test is performed on the f days before the missing period to obtain its $UF_k$ and $UB_k$ curves, and an MK test is performed on the b days after the missing period to obtain its $UF_k$ and $UB_k$ curves, where f and b are preselected numbers of days and $UF_k$ and $UB_k$ are statistic sequences computed in time order;
for an ocean observation time series $X = (x_1, x_2, \ldots, x_n)$, constructing the rank series $s_k = \sum_{i=1}^{k} r_i$ ($k = 2, 3, \ldots, n$), where $r_i$ is the number of moments $j < i$ at which $x_i > x_j$, i.e. the count is accumulated whenever the value at moment i is larger than the value at moment j;
under the assumption that the time series is random, defining the statistic $UF_k = \frac{s_k - E(s_k)}{\sqrt{\mathrm{Var}(s_k)}}$, $k = 1, 2, \ldots, n$, where $UF_1 = 0$ and $E(s_k)$ and $\mathrm{Var}(s_k)$ are the mean and variance of $s_k$; when $x_1, x_2, \ldots, x_n$ are mutually independent and have the same continuous distribution, they follow from $E(s_k) = \frac{k(k-1)}{4}$ and $\mathrm{Var}(s_k) = \frac{k(k-1)(2k+5)}{72}$;
then repeating the above steps for the ocean observation time series in reverse order, $X' = (x_n, x_{n-1}, \ldots, x_1)$, and letting $UB_k = -UF_k$ ($k = n, n-1, \ldots, 1$) to obtain $UB_k$;
when $UF_k$ and $UB_k$ exceed the significance level, the rising or falling trend of the ocean observation data is significant; if the two curves intersect and the intersection point lies within the critical lines determined by the significance level, the intersection point is a trend mutation point of the ocean observation data, and the moment of the intersection point is the start time of the mutation;
searching for a mutation point starting from the moment closest to the missing time period; once a mutation point is confirmed, continuing to search for a moment after the mutation at which the trend is significant, and checking whether that moment is at least a preset number of days m away from the missing time period; if so, confirming it as the required mutation moment of the ocean observation time series; if not, continuing the search;
if a moment meeting the requirement is found within the f days before the gap, selecting the data from f days before the missing time period up to the start of the missing values as the front-section data; otherwise, selecting the data from m days before the missing time period up to the start of the missing values as the front-section data;
if a moment meeting the requirement is found within the b days after the gap, selecting the data from the end of the missing values up to b days after the missing time period as the rear-section data; otherwise, selecting the data from the end of the missing values up to m days after the missing time period as the rear-section data.
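Once candidate moments are available, the selection rule above reduces to a small decision. A minimal sketch, assuming candidates are given as distances in days from the edge of the missing period (nearest first, in search order) and that "found" means a candidate lies within the f-day (or b-day) window and at least m days from the gap; `choose_segment_days` and `segment_bounds` are hypothetical helpers, and the manual fallback mentioned in the embodiment is not modelled.

```python
from datetime import datetime, timedelta

def choose_segment_days(candidate_days, window_days, min_days):
    """Return how many days of data to take on one side of the gap.

    candidate_days: distances (days) from the gap to moments that passed the
    mutation / significant-trend check, in search order (nearest first).
    """
    for d in candidate_days:
        if d > window_days:
            break  # searched past the f (or b) day window without success
        if d >= min_days:
            return window_days  # requirement met inside the window: use f (or b) days
    return min_days  # otherwise fall back to the preset m days

def segment_bounds(gap_start, gap_end, front_days, rear_days):
    """(start, end) of the front-section and rear-section data."""
    front = (gap_start - timedelta(days=front_days), gap_start)
    rear = (gap_end, gap_end + timedelta(days=rear_days))
    return front, rear
```

For instance, with illustrative dates, `segment_bounds(datetime(2017, 5, 22), datetime(2017, 5, 23), 14, 5)` gives a front section starting on 2017-05-08, fourteen days before the gap.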
For the missing period from 2017-05-22:00 to 2017-05-23:05:00, MK tests were performed on the sea surface temperature data of the 14 days before and the 14 days after the missing period, giving the $UF_k$ and $UB_k$ curves shown in FIG. 2, where the abscissa is the time point and the ordinate is the computed $UF_k$ and $UB_k$.
When $UF_k$ and $UB_k$ exceed the significance level, the rising or falling trend of the sea surface temperature is significant; if the two curves intersect and the intersection point lies within the critical lines determined by the significance level, the intersection point is a sea surface temperature mutation point and its moment is the start time of the mutation. The search for the mutation point starts from the moment closest to the missing time period; once a mutation point is confirmed, the search continues for a moment after the mutation at which the trend is significant, and it is then checked whether that moment is more than 5 days from the missing time period; if so, it is taken as the sea surface temperature mutation moment; if not, the search continues. If no moment meeting the requirement can be found within the f days, the moment 5 days (the preset number of days) from the missing time period is selected. If no satisfactory moment can be selected in this way, the moment is selected manually.
After the above operation, sea surface temperature data with a period of time from 2017-05-08:00 to 2017-05-22:00 and from 2017-05-23:00 to 2017-05-29:10:30 are selected as data participating in interpolation.
Wherein 2017-05-08:00 to 2017-05-22:00 are recorded as a first time period, sea surface temperature data of the first time period are recorded as front-stage data, 2017-05-23:00 to 2017-05-29:30 are recorded as a second time period, and sea surface temperature data of the second time period are recorded as back-stage data.
S4, carrying out wavelet decomposition on the front-stage data and the rear-stage data by using a mother wavelet to obtain components.
Specifically, the front-section data and the rear-section data participating in interpolation are each decomposed with a preset mother wavelet and a preset number of decomposition levels to obtain components, which are then normalized.
For example, the log-transformed sea surface temperature data of the first and second time periods selected in S3 are each decomposed by wavelet analysis with db4 as the mother wavelet and 3 decomposition levels, giving 4 components per period, and all components are normalized with $X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$. The resulting components are $D1_1$, $D2_1$, $D3_1$, $A3_1$ for the first time period and $D1_2$, $D2_2$, $D3_2$, $A3_2$ for the second time period. Here X is a component obtained by wavelet decomposition of the ocean observation time series data, $X_{\min}$ is the minimum value of the ocean observation time series data, and $X_{\max}$ is the maximum value of the ocean observation time series data.
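One possible way to obtain four same-length components is sketched below with PyWavelets (`pywt`), which is an assumption: the text names no library and does not say how wavelet coefficients are mapped back to the time domain. Each component is rebuilt from a single coefficient band with the other bands set to zero, so D1 + D2 + D3 + A3 reproduces the input, which is what allows the component predictions to be summed again in S8. `wavelet_components` is a hypothetical helper name.

```python
import numpy as np
import pywt  # PyWavelets (assumed; the text names no library)

def wavelet_components(x, wavelet="db4", level=3):
    """Return same-length components [D1, ..., D_level, A_level] whose sum is x."""
    x = np.asarray(x, dtype=float)
    coeffs = pywt.wavedec(x, wavelet, level=level)  # [cA_level, cD_level, ..., cD1]
    comps = []
    for i in range(len(coeffs)):
        # Keep one coefficient band, zero all others, and reconstruct it alone.
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        # waverec may return one extra sample for odd-length input; trim it.
        comps.append(pywt.waverec(kept, wavelet)[: len(x)])
    # comps is [A3, D3, D2, D1]; reorder to D1, D2, D3, A3 as named in the text.
    return comps[1:][::-1] + comps[:1]
```

Applied to the log-transformed front-section data, the four returned arrays would play the roles of $D1_1$, $D2_1$, $D3_1$ and $A3_1$ before normalization.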
S5, according to a preset time step, the obtained components are respectively processed in the forward time direction and the reverse time direction to obtain a forward data set and a reverse data set.
Specifically, the input time step is set to 48. For the $D1_1$ component of the first time period obtained in S4, which contains 659 values $(x_1, x_2, \ldots, x_{659})$, the samples $(x_1, x_2, \ldots, x_{48} \mid x_{49})$, $(x_2, x_3, \ldots, x_{49} \mid x_{50})$, …, $(x_{611}, x_{612}, \ldots, x_{658} \mid x_{659})$ form the forward data set, where the values before "|" are the model input and the value after "|" is the prediction target; the samples $(x_{659}, x_{658}, \ldots, x_{612} \mid x_{611})$, $(x_{658}, x_{657}, \ldots, x_{611} \mid x_{610})$, …, $(x_{49}, x_{48}, \ldots, x_2 \mid x_1)$ form the reverse data set. The same is done for the remaining three components, so that each component yields one forward and one reverse data set:
forward data sets: one each for $D1_1$, $D2_1$, $D3_1$ and $A3_1$;
reverse data sets: one each for $D1_1$, $D2_1$, $D3_1$ and $A3_1$.
The same process is performed for the second time period to obtain the following data sets:
forward data sets: one each for $D1_2$, $D2_2$, $D3_2$ and $A3_2$;
reverse data sets: one each for $D1_2$, $D2_2$, $D3_2$ and $A3_2$.
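The windowing of S5 can be sketched as follows, assuming each component is a plain Python list; `make_datasets` is a hypothetical name. Indices are 0-based in the code, so the window built from values 1 to 48 in the text is `seq[0:48]`.

```python
def make_datasets(series, steps=48):
    """Build (input window, target) pairs in both time directions."""
    seq = list(series)

    def windows(s):
        # Each sample: `steps` consecutive values -> the value that follows them.
        return [(s[i:i + steps], s[i + steps]) for i in range(len(s) - steps)]

    # For a 659-value component:
    forward = windows(seq)        # (x1..x48 | x49), ..., (x611..x658 | x659)
    reverse = windows(seq[::-1])  # (x659..x612 | x611), ..., (x49..x2 | x1)
    return forward, reverse
```

A 659-value component therefore yields 611 samples in each direction.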
S6, utilizing the long-short-term memory network model, dividing the forward data set and the reverse data set of each component into a training set and a verification set according to a preset proportion, and training according to set training parameters to obtain a forward model and a reverse model.
Referring to FIG. 3, the long short-term memory network (Long Short-Term Memory, LSTM) was proposed to address the long-term dependency problem in machine learning by introducing a forget gate, a memory gate and an output gate.
The forget gate decides, from the new input and the output at the previous moment, which earlier information to discard, so that the network retains information that is important over long periods; the memory gate decides how much of the new input is stored in memory; and the output gate produces the final output.
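For reference, one common formulation of the gates just described is given below. It is the standard LSTM cell rather than anything specific to this patent: $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, $[h_{t-1}, x_t]$ concatenates the previous output and the current input, $c_t$ is the memory (cell) state, and $W_\ast$, $b_\ast$ are learned weights and biases. Here t indexes time steps within the 48-step input window.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{memory (input) gate} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory update} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{output}
\end{aligned}
```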
In S6, the forward and reverse data sets of each component are divided into a training set and a validation set according to a preset ratio, the LSTM network is trained with the set training parameters, and the weights are updated until the network converges, giving a forward model and a reverse model.
For example, S5 yields four forward data sets and four reverse data sets for the first time period, and four forward data sets and four reverse data sets for the second time period. The two forward data sets of the D1 component (one from the first time period and one from the second) are merged; 80% of the merged data is used as the training set and 20% as the verification set, and LSTM training yields a model for forward prediction of the D1 component. Likewise, the two reverse data sets of the D1 component are merged, split 80%/20% into training and verification sets, and trained with LSTM to obtain a model for backward prediction of the D1 component.
The same operation is performed on D2, D3 and A3, each yielding one forward model and one reverse model.
Finally, 4 forward models and 4 reverse models, i.e. 8 models in total, are obtained.
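The merge-and-split step described for the D1 component can be sketched as below, assuming each data set is an `(X, y)` pair of NumPy arrays. The helper name `merge_and_split` is hypothetical, and the text does not state whether the 80%/20% split is chronological or shuffled, so the sketch keeps the merged order by default and shuffles only when a seed is given.

```python
import numpy as np


def merge_and_split(set_a, set_b, train_frac=0.8, seed=None):
    """Merge two (X, y) data sets of the same component and split them.

    set_a, set_b: e.g. the D1 forward data sets of the first and second
    time periods. With seed=None the merged order is kept; otherwise the
    samples are shuffled with that seed before the split (an assumption).
    """
    X = np.concatenate([set_a[0], set_b[0]])
    y = np.concatenate([set_a[1], set_b[1]])
    idx = np.arange(len(y))
    if seed is not None:
        np.random.default_rng(seed).shuffle(idx)
    n_train = int(round(train_frac * len(y)))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])
```

Running it once per component and direction gives the eight training/verification pairs behind the 4 forward and 4 reverse models.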
S7, the missing time period is interpolated point by point with the forward model and with the reverse model separately; each interpolated value is appended to the input time vector used for the next interpolation step, until the entire missing time period has been interpolated.
Specifically, the model trained on the D1 forward data set in S6 is used to interpolate the missing times. The time vector formed by the last 48 values of the D1 component of the first time period ($D1_1$) is input to the model to obtain the interpolated D1 value at the first missing time; this value is then appended to the input (the oldest value being dropped) to form a new input vector, which is fed to the model to obtain the interpolated D1 value at the second missing time; and so on, until forward D1 interpolation values are obtained for every missing time;
Then the model trained on the D1 reverse data set in S6 is used to interpolate the missing times from the other end. The time vector formed by the first 48 values of the D1 component of the second time period ($D1_2$), taken in reverse time order, is input to the model to obtain the interpolated D1 value at the last missing time; this value is then appended to the input to form a new input vector, which is fed to the model to obtain the interpolated D1 value at the second-to-last missing time; and so on, until reverse D1 interpolation values are obtained for every missing time;
The same operation is carried out on the components D2, D3 and A3, giving forward model results and reverse model results for each of the four components over the missing time period.
As described above, the proportion of original observed data in the input vector decreases continuously as interpolation proceeds, so the prediction accuracy becomes progressively worse.
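The point-by-point interpolation of S7 can be sketched as follows, with any callable standing in for a trained forward or reverse model. The name `recursive_fill` is hypothetical. The window length stays fixed at the time step: each prediction is appended and the oldest value dropped, which is exactly why observed values gradually disappear from the input.

```python
from typing import Callable

import numpy as np


def recursive_fill(model: Callable[[np.ndarray], float], seed_window, n_missing: int) -> np.ndarray:
    """Interpolate n_missing points one step at a time with a one-step model.

    seed_window: the `step` values that border the gap, ordered so that the
    last element is the value adjacent to the first point to be predicted.
    Each prediction is appended and the oldest value dropped, so the window
    length stays fixed while observed values are gradually replaced.
    """
    window = [float(v) for v in seed_window]
    preds = []
    for _ in range(n_missing):
        y_hat = float(model(np.asarray(window)))
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # slide: drop oldest, append newest prediction
    return np.asarray(preds)
```

For a gap of length `L`, the forward pass would be `recursive_fill(fwd_model, before_gap[-48:], L)` and the reverse pass `recursive_fill(rev_model, after_gap[:48][::-1], L)[::-1]`, the final flip putting the reverse predictions back into time order (`fwd_model`, `rev_model`, `before_gap` and `after_gap` are illustrative names).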
S8, adding results obtained by the forward models and performing exponential operation to obtain forward interpolation results, adding reverse model results and performing exponential operation to obtain reverse interpolation results, and multiplying the forward interpolation results and the reverse interpolation results by corresponding weights respectively to obtain interpolation results.
Specifically, the forward model results and the reverse model results are inverse-normalized using $x = x^{*}(x_{\max} - x_{\min}) + x_{\min}$, where $x_{\min}$ and $x_{\max}$ are the values used for normalization in S4, $x^{*}$ is the model result, and $x$ is the inverse-normalized result. Adding all the inverse-normalized forward model results gives the forward interpolation result for the (log-transformed) sea surface temperature, and adding all the inverse-normalized reverse model results gives the reverse interpolation result for the sea surface temperature.
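A minimal sketch of this step, assuming each component's model output is a NumPy array over the missing period and that the `(x_min, x_max)` pair saved during normalization in S4 is available per component. The function names and the dictionary layout are hypothetical.

```python
import numpy as np


def inverse_minmax(y_norm, x_min, x_max):
    """Undo min-max normalization: x = x* (x_max - x_min) + x_min."""
    return np.asarray(y_norm, dtype=float) * (x_max - x_min) + x_min


def combine_components(model_outputs, scales):
    """Sum inverse-normalized component predictions (e.g. D1, D2, D3, A3).

    model_outputs: dict name -> normalized prediction array over the gap
    scales: dict name -> (x_min, x_max) used for that component in S4
    """
    return sum(inverse_minmax(model_outputs[k], *scales[k]) for k in model_outputs)
```

Calling `combine_components` once on the four forward outputs and once on the four reverse outputs gives the forward and reverse interpolation results in the log domain.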
As S7 shows, because the buoy's true sea surface temperature data in the input vector are gradually replaced by predicted data, forward interpolation becomes less reliable the later the time point, and reverse interpolation becomes less reliable the earlier the time point. The two results are therefore combined using weights.
The forward interpolation weight is defined as follows:
The reverse interpolation weight is defined as follows:
where t denotes the interpolation time point and L denotes the length of the missing period of the ocean observations.
Using these formulas, the forward and reverse interpolation weights are obtained for every time point of the missing period, as shown in fig. 4, where the abscissa represents the time point within the missing period and the ordinate represents the weight.
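The forward interpolation result multiplied by the forward interpolation weight is added to the reverse interpolation result multiplied by the reverse interpolation weight, $y_t = w_f(t)\,y^{f}_t + w_r(t)\,y^{r}_t$, where $y^{f}_t$ and $y^{r}_t$ are the forward and reverse interpolation results at missing time point $t$, and $w_f(t)$ and $w_r(t)$ are the corresponding weights. This completes the interpolation of the log-transformed sea surface temperature; to obtain the sea surface temperature itself, the exponential operation is applied, i.e. $\mathrm{SST}_t = e^{y_t}$.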
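The weighted fusion and back-transformation can be sketched as follows, following the order given in the paragraph above (weight in the log domain, then exponentiate). The function name is hypothetical, and the weights are passed in as arrays because this sketch does not implement the patent's weight formulas.

```python
import numpy as np


def fuse_and_exponentiate(y_fwd, y_rev, w_fwd, w_rev):
    """Weighted fusion in the log domain followed by exp().

    y_t = w_f(t) * y_f(t) + w_r(t) * y_r(t); SST_t = exp(y_t).
    All arguments are arrays over the missing time points t = 1..L.
    """
    y_log = (np.asarray(w_fwd, dtype=float) * np.asarray(y_fwd, dtype=float)
             + np.asarray(w_rev, dtype=float) * np.asarray(y_rev, dtype=float))
    return np.exp(y_log)
```

As a usage illustration only, hypothetical linear weights `w_f = (L + 1 - t) / (L + 1)` and `w_r = t / (L + 1)` for `t = 1..L` (not the formulas defined above) sum to 1 at every point, so forward and reverse results that both equal `log(20)` yield 20 at every missing point.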
Finally, the interpolated sea surface temperature values are filled into the missing entries, completing the interpolation; the result is shown in fig. 5.
In summary, in the interpolation method for missing values of buoy time series data, the original data are decomposed into detailed components through wavelet analysis, models are obtained by long short-term memory network training, the missing time period is interpolated by both forward and reverse interpolation, and the weighted combination of the two results makes the interpolation consistent with the variation pattern of the original data.
Referring to fig. 6, the embodiment further discloses an interpolation system for the missing values of the buoy time series data, which includes:
the ocean observation time series data acquisition module 601 is used for acquiring ocean observation time series data with missing values, which are acquired by the buoy; the ocean observation time series data with the missing values are data of equal time intervals obtained at a certain sampling frequency, and the missing time periods exist in the data;
the logarithm processing module 602 is configured to take the logarithm of the marine observation time series data to obtain processed marine observation time series data;
an interpolation data determining module 603, configured to determine data participating in interpolation or pre-select data participating in interpolation according to a missing period by using a Mann-Kendall test; the data participating in interpolation comprises front-section data and rear-section data of the missing value;
a wavelet decomposition module 604, configured to perform wavelet decomposition on the front-segment data and the back-segment data by using a mother wavelet to obtain components;
a forward and reverse data set obtaining module 605, configured to process the obtained components in a forward time direction and a reverse time direction according to a preset time step, so as to obtain a forward data set and a reverse data set;
the forward and reverse model training module 606 is configured to divide the forward data set and the reverse data set of each component into a training set and a verification set according to a preset proportion and, using the long short-term memory network model, train according to the set training parameters to obtain a forward model and a reverse model;
the interpolation processing module 607 is configured to interpolate the missing time period point by point using the forward model and the reverse model respectively, each interpolated value being appended to the input time vector for the next interpolation step, until the entire missing time period has been interpolated;
the forward and reverse interpolation fusion module 608 is configured to add the results obtained by the forward models and perform an exponential operation to obtain a forward interpolation result, add the results of the reverse models and perform an exponential operation to obtain a reverse interpolation result, and multiply the forward interpolation result and the reverse interpolation result by the corresponding weights respectively to obtain an interpolation result.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it will be obvious that the term "comprising" does not exclude other elements or that the singular does not exclude a plurality. Multiple units or systems as set forth in the system claims may also be implemented by means of one unit or system in software or hardware.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (8)

1. A method for interpolating a missing value of time series data of a buoy, comprising:
s1, acquiring ocean observation time series data with missing values, which are acquired by a buoy; the ocean observation time series data with the missing values are data of equal time intervals obtained at a certain sampling frequency, and the missing time periods exist in the data;
s2, taking the logarithm of the marine observation time series data to obtain processed marine observation time series data;
s3, determining data participating in interpolation or pre-selecting the data participating in interpolation according to the missing time period by utilizing Mann-Kendall test; the data participating in interpolation comprises front section data and rear section data of a missing value, and the data participating in interpolation is marine observation time sequence data after logarithmic processing;
s4, carrying out wavelet decomposition on the front-stage data and the rear-stage data by using a mother wavelet to obtain components;
s5, according to a preset time step, the obtained components are respectively processed in a forward time direction and a reverse time direction to obtain a forward data set and a reverse data set;
s6, utilizing a long short-term memory network model, dividing the forward data set and the reverse data set of each component into a training set and a verification set according to a preset proportion, and training according to set training parameters to obtain a forward model and a reverse model;
s7, interpolating the missing time period point by point with the forward model and the reverse model respectively, each interpolated value being appended to the input time vector for the next interpolation step, until the entire missing time period has been interpolated;
s8, adding results obtained by the forward models and performing exponential operation to obtain forward interpolation results, adding reverse model results and performing exponential operation to obtain reverse interpolation results, and multiplying the forward interpolation results and the reverse interpolation results by corresponding weights respectively to obtain interpolation results.
2. The method for interpolating a missing value of time series data of a buoy according to claim 1, wherein S3 specifically comprises:
an MK test is performed on the f days before the missing period to obtain a $UF_k$/$UB_k$ graph, and an MK test is performed on the b days after the missing period to obtain another $UF_k$/$UB_k$ graph, where f and b are numbers of days selected in advance, and $UF_k$ and $UB_k$ are statistic sequences computed in time order;
for a marine observation time series $X = (x_1, x_2, \dots, x_n)$, construct the rank series $s_k = \sum_{i=1}^{k} r_i$, $k = 2, 3, \dots, n$, where $r_i = 1$ if $x_i > x_j$ and $r_i = 0$ otherwise, for $j = 1, 2, \dots, i$; that is, $s_k$ accumulates the number of times the value at moment $i$ is larger than the value at an earlier moment $j$;
under the assumption that the time series is random, define the statistic $UF_k = \frac{s_k - E(s_k)}{\sqrt{\mathrm{Var}(s_k)}}$, $k = 1, 2, \dots, n$, with $UF_1 = 0$, where $E(s_k)$ and $\mathrm{Var}(s_k)$ are the mean and variance of $s_k$; when $x_1, x_2, \dots, x_n$ are mutually independent and have the same continuous distribution, they are given by $E(s_k) = \frac{k(k-1)}{4}$ and $\mathrm{Var}(s_k) = \frac{k(k-1)(2k+5)}{72}$;
then the above process is repeated for the reverse-ordered ocean observation time series $(x_n, x_{n-1}, \dots, x_1)$, giving $UB_k = -UF_k$, $k = n, n-1, \dots, 1$, with $UB_1 = 0$;
when $UF_k$ and $UB_k$ exceed the significance level, if the two curves intersect and the intersection lies between the critical lines determined by the significance level, the intersection is a trend mutation point of the ocean observation data, and the moment corresponding to the intersection is the start time of the mutation;
the search for a mutation point starts from the moment closest to the missing time period; once a mutation point is confirmed, the search continues for the moment at which the trend becomes significant after the mutation, and it is judged whether the found moment is at least a preset number of days m away from the missing time period; if so, it is confirmed as the mutation moment of the required ocean observation time series; if not, the search continues;
if a moment meeting the requirement is found within the previous f days, the data from f days before the missing time period up to the moment the missing values begin are selected as the front-section data; if no such moment is found within the previous f days, the data from m days before the missing time period up to the moment the missing values begin are selected as the front-section data;
if a moment meeting the requirement is found within the following b days, the data from the moment the missing values end up to b days after the missing time period are selected as the rear-section data; if no such moment is found within the following b days, the data from the moment the missing values end up to m days after the missing time period are selected as the rear-section data.
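The sequential Mann-Kendall statistics of claim 2 can be sketched in NumPy as follows. `mk_uf`, `mk_sequential` and `crossings` are hypothetical names, and the critical value 1.96 (the two-sided 5% level of the standard normal distribution) is an assumed default, since the claim does not fix the significance level. The day-based search rules for f, b and m are not reproduced here.

```python
import numpy as np


def mk_uf(x):
    """Forward sequential Mann-Kendall statistic UF_k, with UF_1 = 0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    uf = np.zeros(n)
    s = 0.0
    for k in range(1, n):               # 0-based index; series length so far is k + 1
        s += np.sum(x[k] > x[:k])       # r_k: earlier values exceeded by x_k
        m = k + 1
        mean = m * (m - 1) / 4.0                 # E(s_k)
        var = m * (m - 1) * (2 * m + 5) / 72.0   # Var(s_k)
        uf[k] = (s - mean) / np.sqrt(var)
    return uf


def mk_sequential(x):
    """Return (UF, UB); UB comes from the reversed series, negated and re-reversed."""
    uf = mk_uf(x)
    ub = -mk_uf(np.asarray(x, dtype=float)[::-1])[::-1]
    return uf, ub


def crossings(uf, ub, crit=1.96):
    """Indices where UF and UB change order while UF lies inside |U| < crit."""
    d = np.asarray(uf) - np.asarray(ub)
    idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0]
    return [int(i) for i in idx if abs(uf[i]) < crit]
```

Applying `mk_sequential` separately to the f days before and the b days after the gap gives the two graphs described in the claim; `crossings` then lists candidate mutation points to be checked against the preset distance m.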
3. The method of interpolation of missing values of time series data of a buoy according to claim 1, wherein S4 further comprises:
all components are normalized separately, the normalization formula being $x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$,
where $x^{*}$ denotes the normalized marine observation time series data; $x$ is a component obtained by wavelet decomposition of the marine observation time series data; $x_{\min}$ denotes the minimum value of the marine observation time series data; and $x_{\max}$ denotes the maximum value of the marine observation time series data.
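A minimal sketch of the per-component min-max normalization of claim 3, assuming NumPy. The helper name `minmax_normalize` is hypothetical; it returns the `(x_min, x_max)` pair as well, because the same values are needed later for inverse normalization. A constant component would make the denominator zero, so the sketch rejects it explicitly.

```python
import numpy as np


def minmax_normalize(x):
    """x* = (x - x_min) / (x_max - x_min); also return (x_min, x_max) for inversion."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = float(x.min()), float(x.max())
    if x_max == x_min:
        raise ValueError("constant component cannot be min-max normalized")
    return (x - x_min) / (x_max - x_min), (x_min, x_max)
```

Storing the returned pair per component (D1, D2, D3, A3) lets the model outputs be mapped back with $x = x^{*}(x_{\max} - x_{\min}) + x_{\min}$.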
4. The method for interpolating a missing value of time series data of a buoy according to claim 1, wherein S6 specifically comprises:
dividing the obtained forward data set and reverse data set of each component into a training set and a verification set according to a preset proportion; integrating the two forward data sets of the same component from the front-section data and the rear-section data, training the long short-term memory network model with the set training parameters, and continuously updating the weights until the network converges, to obtain a plurality of forward models; integrating the two reverse data sets of the same component from the front-section data and the rear-section data, training the long short-term memory network model with the set training parameters, and continuously updating the weights until the network converges, to obtain a plurality of reverse models; the number of forward models and reverse models is related to the number of wavelet decomposition levels.
5. The method of interpolation of missing values of time series data of a buoy according to claim 1, further comprising, prior to S8:
the forward model result and the reverse model result are inverse-normalized, the inverse normalization formula being $x = x^{*}(x_{\max} - x_{\min}) + x_{\min}$, where $x^{*}$ is the model result, $x$ is the inverse-normalized result, $x_{\min}$ denotes the minimum value of the marine observation time series data, and $x_{\max}$ denotes the maximum value of the marine observation time series data.
6. The method for interpolation of missing values of time series data of a buoy according to claim 5, wherein S8 specifically comprises:
multiplying the forward interpolation result by the forward interpolation weight and adding the reverse interpolation result multiplied by the reverse interpolation weight: $Y = (y_1, y_2, \dots, y_s)$, $y_t = w_f(t)\,y^{f}_t + w_r(t)\,y^{r}_t$; where $Y$ is the vector of interpolated values at all missing moments, $s$ is the number of missing values, and $y_s$ is the $s$-th interpolated value; $y_t$ is the interpolation result at missing time point $t$ obtained by weighting the forward interpolation result and the reverse interpolation result; $w_f(t)$ is the forward interpolation weight at time point $t$ and $w_r(t)$ is the reverse interpolation weight at time point $t$; $y^{f}_t$ is the forward interpolation result obtained by adding all inverse-normalized forward model results; $y^{r}_t$ is the reverse interpolation result obtained by adding all inverse-normalized reverse model results;
and performing an exponential operation to obtain the interpolation data.
7. The method of interpolation of missing values of time series data of a buoy of claim 6, wherein the forward interpolation weight is as follows:
the reverse interpolation weight is as follows:
where t denotes the interpolation time point and L denotes the length of the missing period of the ocean observations.
8. an interpolation system for a missing value of time series data of a buoy, comprising:
the ocean observation time sequence data acquisition module is used for acquiring ocean observation time sequence data with missing values, which are acquired by the buoy; the ocean observation time series data with the missing values are data of equal time intervals obtained at a certain sampling frequency, and the missing time periods exist in the data;
the logarithmic processing module is used for taking the logarithm of the marine observation time series data to obtain processed marine observation time series data;
the interpolation data determining module is used for determining data participating in interpolation or pre-selecting the data participating in interpolation according to the missing time period by utilizing Mann-Kendall test; the data participating in interpolation comprises front-section data and rear-section data of the missing value;
the wavelet decomposition module is used for carrying out wavelet decomposition on the front-section data and the rear-section data by utilizing a mother wavelet to obtain components;
the forward and reverse data set acquisition module is used for processing the obtained components in the forward time direction and the reverse time direction respectively according to a preset time step to obtain a forward data set and a reverse data set;
the forward and reverse model training module is used for dividing the forward data set and the reverse data set of each component into a training set and a verification set according to a preset proportion by utilizing the long short-term memory network model, and training according to set training parameters to obtain a forward model and a reverse model;
the interpolation processing module is used for interpolating the missing time period point by point with the forward model and the reverse model respectively, each interpolated value being appended to the input time vector for the next interpolation step, until the whole missing time period has been interpolated;
the forward and reverse interpolation fusion module is used for adding the results obtained by the forward models and performing an exponential operation to obtain a forward interpolation result, adding the reverse model results and performing an exponential operation to obtain a reverse interpolation result, and multiplying the forward interpolation result and the reverse interpolation result by the corresponding weights respectively to obtain an interpolation result.
CN202310782657.7A 2023-06-29 2023-06-29 Interpolation method and system for buoy time sequence data missing value Active CN116541667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310782657.7A CN116541667B (en) 2023-06-29 2023-06-29 Interpolation method and system for buoy time sequence data missing value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310782657.7A CN116541667B (en) 2023-06-29 2023-06-29 Interpolation method and system for buoy time sequence data missing value

Publications (2)

Publication Number Publication Date
CN116541667A true CN116541667A (en) 2023-08-04
CN116541667B CN116541667B (en) 2023-11-03

Family

ID=87450958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310782657.7A Active CN116541667B (en) 2023-06-29 2023-06-29 Interpolation method and system for buoy time sequence data missing value

Country Status (1)

Country Link
CN (1) CN116541667B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117609706A (en) * 2023-10-20 2024-02-27 北京师范大学 Method for interpolating data of carbon water flux
CN117609706B (en) * 2023-10-20 2024-06-04 北京师范大学 Method for interpolating data of carbon water flux

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882157A (en) * 2020-06-24 2020-11-03 东莞理工学院 Demand prediction method and system based on deep space-time neural network and computer readable storage medium
CA3177585A1 (en) * 2021-04-16 2022-10-16 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for digital product network systems and biology-based value chain networks
CN115935139A (en) * 2023-01-09 2023-04-07 吉林大学 Space field interpolation method for ocean observation data
CN116245018A (en) * 2023-01-12 2023-06-09 南京信息工程大学 Sea wave missing measurement data forecasting method based on bivariate long-short-term memory algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
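Jiang Hao; Zhao Zhongkuo; Fan Wei; Song Jinbao: "Estimating the cutoff time scale in eddy-correlation calculation of air-sea fluxes based on empirical mode decomposition and wavelet decomposition", Oceanologia et Limnologia Sinica (海洋与湖沼), no. 06, pages 12 - 24 *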

Also Published As

Publication number Publication date
CN116541667B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
US20200272905A1 (en) Artificial neural network compression via iterative hybrid reinforcement learning approach
CN110427654B (en) Landslide prediction model construction method and system based on sensitive state
CN111985523A (en) Knowledge distillation training-based 2-exponential power deep neural network quantification method
CN113723007A (en) Mechanical equipment residual life prediction method based on DRSN and sparrow search optimization BilSTM
CN111461445B (en) Short-term wind speed prediction method and device, computer equipment and storage medium
CN110633859B (en) Hydrologic sequence prediction method integrated by two-stage decomposition
CN111967183A (en) Method and system for calculating line loss of distribution network area
CN114707712A (en) Method for predicting requirement of generator set spare parts
CN116451848A (en) Satellite telemetry data prediction method and device based on space-time attention mechanism
CN116484747A (en) Sewage intelligent monitoring method based on self-adaptive optimization algorithm and deep learning
CN112070272A (en) Method and device for predicting icing thickness of power transmission line
CN114694379B (en) Traffic flow prediction method and system based on self-adaptive dynamic graph convolution
CN115879369A (en) Coal mill fault early warning method based on optimized LightGBM algorithm
CN116541667B (en) Interpolation method and system for buoy time sequence data missing value
CN111144473B (en) Training set construction method, training set construction device, electronic equipment and computer readable storage medium
CN115326397B (en) Method and related device for establishing crankshaft bearing wear degree prediction model and prediction method
CN111626472B (en) Scene trend judgment index computing system and method based on depth hybrid cloud model
RU2744041C1 (en) Method and a system for predicting time series values using an artificial neural network
CN114638421A (en) Method for predicting requirement of generator set spare parts
CN113902187A (en) Time-of-use electricity price prediction method and device and terminal equipment
CN111126659A (en) Power load prediction method and system
CN111210877A (en) Method and device for deducing physical property parameters
CN111242379A (en) Nuclear recursive maximum correlation entropy time sequence online prediction method based on random Fourier features
CN110990766A (en) Data prediction method and storage medium
Krifa et al. Parametric complexity reduction of discrete-time linear systems having a slow initial onset or delay

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant