CN117972604A - Ocean observation data anomaly detection method based on adjacent site space-time correlation

Publication number
CN117972604A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410124766.4A
Other languages
Chinese (zh)
Inventor
宋晓
刘玉龙
徐珊珊
岳心阳
苗庆生
杨杨
郑兵
韦广昊
丁峰
李维禄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NATIONAL MARINE DATA AND INFORMATION SERVICE
Original Assignee
NATIONAL MARINE DATA AND INFORMATION SERVICE

Landscapes

  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The invention discloses a method for detecting anomalies in ocean observation data based on the space-time correlation of adjacent sites, which comprises the following steps: constructing an ocean observation standard data set, adaptively capturing the dynamic correlation among all neighbor sites in the spatial dimension with an attention mechanism, forming a multi-dimensional long time series of multiple ocean elements with a sliding time window mechanism, and dividing the data set; constructing an anomaly detection model, setting the model training parameters, training the model, obtaining the anomaly score of each training sample, and determining a threshold for the anomaly score; filling in the missing data of the test samples, inputting the test data into the anomaly detection model to obtain anomaly scores, identifying anomalous points, determining and marking the specific anomalous elements, and evaluating the detection performance for each observation element. The invention comprehensively considers the space-time correlation between ocean observation sites and uses a deep learning method to detect outliers in real-time ocean observation data quickly and accurately.

Description

Ocean observation data anomaly detection method based on adjacent site space-time correlation
Technical Field
The invention relates to the technical field of ocean information, in particular to an ocean observation data anomaly detection method based on adjacent site space-time correlation.
Background
Because marine environment observation data are long-term, continuous and real-time, they are an important data basis for studying marine phenomena. However, observation equipment is affected by human and non-human factors, site relocation, platform drift, instrument changes, changes in observation time or calculation method, and the like, so the observation data sometimes deviate, and researchers must manage data quality and identify outliers. Meanwhile, as marine observation instruments develop, the types and volume of marine observation data are growing extremely fast. Faced with multi-source observation data of diverse origins, different formats and huge volume, carrying out timely and effective data quality control is a great challenge for marine data processors.
At present, the traditional quality control methods in China mainly comprise date rationality checks, on-land position checks, range checks, correlation checks, increment checks and the like. These methods can identify part of the abnormal data, but they process massive data inefficiently, take a long time to detect anomalies, and do not consider the space-time correlation between adjacent ocean sites or between ocean elements. With the development of big data technology, machine learning has proven good at finding latent rules among elements in massive observation data, and some domestic scholars have proposed and tried association rules and data mining methods for anomaly detection of ocean observation data; for example, a support vector machine algorithm has been used for quality control of multi-element ocean station data, and an association rule mining algorithm based on an interestingness model has been used for quality control of ocean drifting buoy data. However, these methods do not use the spatial correlation features of adjacent sites as influence factors when detecting anomalies in the observation elements of an ocean site, nor do they consider the time-dimension correlation among multiple elements, so their detection accuracy still needs to be improved.
Disclosure of Invention
In view of the above, the invention provides a method for detecting anomalies in ocean observation data based on the space-time correlation of adjacent sites, which solves at least some of the above technical problems. The method comprehensively considers the space-time correlation between ocean observation sites and uses a deep learning method to detect outliers in real-time ocean observation data quickly and accurately, without manual intervention.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
An embodiment of the invention provides a method for detecting anomalies in ocean observation data based on the space-time correlation of adjacent sites, which comprises the following steps:
S1, constructing an ocean observation standard data set, adaptively capturing the dynamic correlation among all neighbor sites in the spatial dimension with an attention mechanism, forming a multi-dimensional long time series data set of multiple ocean observation elements with a sliding time window mechanism, and dividing it into training samples and test samples;
S2, constructing an anomaly detection model based on the PyTorch deep learning framework, initially setting the model training parameters, training the model, obtaining the anomaly score of each training sample with the LSTM network and the variational autoencoder (VAE) of the model, and determining a threshold for the anomaly score;
S3, filling in the missing data of the test samples, inputting the test data into the anomaly detection model to obtain anomaly scores, identifying anomalous points, determining and marking the specific anomalous elements, and evaluating the detection performance for each ocean observation element.
Optionally, the step S1 specifically includes:
S11, standardization processing of observed data: a plurality of ocean observation element values which are acquired through a sensor and are continuous in time are subjected to normalization processing, so that the dimension difference of each variable is eliminated, and an ocean observation standard data set is constructed;
S12, extracting space information: according to a time sequence, utilizing an attention mechanism method to adaptively capture dynamic relevance among all neighbor stations in a space dimension, extracting effective information, and setting different influence weights for adjacent stations;
S13, extracting sliding time window information: selecting a time window with the length w from an observation standard data set of a marine station to be detected, and constructing a multi-dimensional long-time sequence data set consisting of a plurality of marine observation elements;
S14, data set division: dividing the multi-dimensional long time series data sets of the adjacent sites with the hold-out method, in time order, into training samples and test samples according to a preset proportion.
Optionally, the step S2 specifically includes:
S21, building a model: constructing an anomaly detection model on the PyTorch deep learning framework, wherein the model comprises an LSTM network and a variational autoencoder (VAE); the LSTM network extracts the temporal dependencies among the different time series in a sample, and its output is used as the input of the VAE; the VAE maps the input data to random variables and reconstructs the input sample probabilistically to obtain the anomaly score;
S22, setting model parameters: selecting a loss function and an optimizer, setting the time window, the number of iterations, the batch size and the learning rate, and updating the model parameters by stochastic gradient descent;
S23, model training: taking the training samples of step S1 as the model input, training the constructed anomaly detection model, continuously optimizing it, and saving the model parameters when the loss function reaches its minimum;
S24, threshold selection: obtaining the anomaly score of each sample through model training, forming a univariate time series from all anomaly scores, and automatically selecting the threshold according to extreme value theory.
Optionally, the step S3 specifically includes:
S31, filling in missing data: supplementing the missing data in the test samples with a Markov chain Monte Carlo (MCMC) algorithm;
S32, anomalous point judgment: feeding the test samples into the anomaly detection model to obtain anomaly scores, and identifying anomalous points according to the threshold for the anomaly score;
S33, marking anomalous elements: normalizing the anomaly score of each ocean observation element, and marking the anomalous elements according to a preset rule;
S34, inspection and evaluation: calculating the detection performance for each ocean observation element separately with a plurality of indexes, and evaluating the model results.
Optionally, the marine observation element comprises: surface water temperature, surface salinity, air temperature, air pressure, relative humidity, wind speed and wind direction.
Optionally, in step S12, the dynamic correlation among all neighbor sites in the spatial dimension is adaptively captured with an attention mechanism in time order; the input is the observation data set of the detection site and its neighbor sites, and the output is a spatial encoding obtained as an attention-weighted sum; the calculation formulas of the spatial encoding comprise:
$$w_j = \frac{\exp\big(d(q, k_j)\big)}{\sum_{z=1}^{N} \exp\big(d(q, k_z)\big)}$$
$$c_i^{(t)} = \sum_{j=1}^{N} w_j \, b_{i,j}^{(t)}$$
$$r_i^{(t)} = a_i^{(t)} + c_i^{(t)}$$
wherein $r_i^{(t)}$ represents the spatial encoding of detection site $i$ at time $t$, which contains the spatial correlation feature information and provides effective information in the time dimension for anomaly detection; $a_i^{(t)}$ represents the original observation data of detection site $i$ at time $t$; $c_i^{(t)}$ represents the attention-weighted sum over all neighbor sites; $w_j$ represents the attention weight of neighbor site $j$; $b_{i,j}^{(t)}$ represents the value vector of the $j$-th neighbor site of detection site $i$ at time $t$; $N$ represents the number of neighbor sites; $q$ represents the query vector; $k_j$ represents the key vector of the $j$-th neighbor site; $k_z$ represents the key vector of the $z$-th neighbor site; and $d$ is the attention scoring function.
Optionally, in step S22, the loss function is the root mean square error function, and the optimizer is the Adam optimizer.
Optionally, in step S34, the multiple indexes include: accuracy, precision, recall, and F1 score.
Compared with the prior art, the invention has at least the following beneficial effects:
1. The invention provides a method for detecting anomalies in ocean observation data based on the space-time correlation of adjacent sites. Addressing the low detection efficiency and high manual labeling cost of existing ocean data anomaly detection techniques, it comprehensively considers the space-time correlation between ocean observation sites and uses a deep learning method to detect outliers in real-time ocean observation data quickly and accurately, without manual intervention.
2. Because the elements observed at multiple ocean sites are correlated in space and time, the method captures the dynamic correlation among all neighbor sites in the spatial dimension with an adaptive attention mechanism and, at the same time, extracts the temporal dependencies among the different time series of the ocean observation elements with an LSTM network, so that outliers of ocean observation elements can be determined efficiently and accurately. The method improves the speed and accuracy of outlier detection for ocean observation elements and provides strong technical support for operational anomaly detection of ocean observation data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
Fig. 1 is a schematic flow chart of a method for detecting anomaly of marine observation data based on space-time correlation of adjacent sites according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a spatial information extraction flow according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. Moreover, various numbers and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the invention provides a method for detecting ocean observation data anomalies based on adjacent site space-time correlation, which mainly comprises the following steps:
S1, constructing an ocean observation standard data set, adaptively capturing the dynamic correlation among all neighbor sites in the spatial dimension with an attention mechanism, forming a multi-dimensional long time series data set of multiple ocean observation elements with a sliding time window mechanism, and dividing it into training samples and test samples;
S2, constructing an anomaly detection model based on the PyTorch deep learning framework, initially setting the model training parameters, training the model, obtaining the anomaly score of each training sample with the LSTM network and the variational autoencoder (VAE) of the model, and determining a threshold for the anomaly score;
S3, filling in the missing data of the test samples, inputting the test data into the anomaly detection model to obtain anomaly scores, identifying anomalous points, determining and marking the specific anomalous elements, and evaluating the detection performance for each ocean observation element.
The following describes in detail specific embodiments of the process according to the invention:
1. Data set preparation
① Standardization of observed data: the M time-continuous observation element values collected by the various sensors, such as surface water temperature, surface salinity, air temperature and air pressure, are normalized to eliminate the dimensional differences between variables, and are then output in the standard format of time, longitude, latitude, surface water temperature, surface salinity, air temperature, air pressure, relative humidity, wind speed and wind direction.
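The normalization step can be illustrated with a minimal sketch. It assumes min-max scaling applied to each element independently, which matches the [0,1] mapping used in the worked example later in this description; the function name `min_max_normalize`, the column names and the sample values are hypothetical.

```python
def min_max_normalize(columns):
    """Scale each observation element (column) to [0, 1] independently.

    `columns` maps an element name to its list of raw values.
    Returns the scaled columns and the (min, max) pair of each element,
    so the same scaling can be reused on later data.
    """
    scaled, ranges = {}, {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
        scaled[name] = [(v - lo) / span if span else 0.0 for v in values]
        ranges[name] = (lo, hi)
    return scaled, ranges


# Made-up values, for illustration only.
raw = {
    "surface_water_temp": [18.0, 20.0, 22.0],  # degrees Celsius
    "air_pressure": [1000.0, 1010.0, 1005.0],  # hPa
}
scaled, ranges = min_max_normalize(raw)
print(scaled["surface_water_temp"])  # [0.0, 0.5, 1.0]
```

Keeping the per-element ranges makes it possible to apply the training-set scaling unchanged to the test samples.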
② Spatial information extraction: assume that N observation sites in a given sea area observe the same M ocean elements; sites in the same sea area are neighbor sites of one another. For the ocean observation data of the neighbor sites, an attention mechanism adaptively captures the dynamic correlation among all neighbor sites in the spatial dimension in time order, extracts the useful information, and assigns a different influence weight to each neighbor site. The neighbor-site attention mechanism extracts the spatial features of the observation data at each moment, yields a spatial encoding of the detection site, and merges this encoding into the data of the site to be detected. In this embodiment, the spatial information extraction flow is shown in fig. 2. The input is the observation data set of the detection site and its neighbor sites, and the output is the spatial encoding obtained as an attention-weighted sum. Taking observation site $i$ as an example, the data $X_i$ at time $t$ consist of the observation data $A_i$ of site $i$ and the observation data $B_i$ of all sites in the area, i.e., $X_i = (A_i, B_i)$. Note that $B_i$ includes the data $A_i$ of detection site $i$ itself.
$$A_i = \{a_i^{(t-T+1)}, a_i^{(t-T+2)}, \ldots, a_i^{(t)}\}$$
where $a_i^{(t)}$ represents the ocean element data observed at site $i$ at time $t$.
$$B_i = \{B_{i,1}, B_{i,2}, \ldots, B_{i,N}\}$$
where $B_i$ is the set of observation data of the $N$ sites in the area for site $i$, and $N$ represents the number of neighbor sites.
$$B_{i,j} = \{b_{i,j}^{(t-T+1)}, b_{i,j}^{(t-T+2)}, \ldots, b_{i,j}^{(t)}\}$$
where $B_{i,j}$ represents the observation data of the $j$-th neighbor site of site $i$, and $T$ represents the length of the history time window. $b_{i,j}^{(t)}$ represents the ocean element data (value vector) observed by neighbor site $j$ at time $t$.
The observation data of detection site $i$ are passed through a neural network $Q$ to extract information and used as the query vector $q = Q(a_i^{(t)})$. The observation data of the neighbor sites are passed through a fully connected network $K$ to extract information and used as the tuple of key vectors, wherein the key vector of the $j$-th neighbor site is $k_j = K(b_{i,j}^{(t)})$ and the value vector of the $j$-th neighbor site is $b_{i,j}^{(t)}$. For the query vector $q$, its dot product with each key is calculated, and the attention distribution is obtained through a Softmax layer; the calculation formula is as follows:
$$w_j = \frac{\exp\big(d(q, k_j)\big)}{\sum_{z=1}^{N} \exp\big(d(q, k_z)\big)}$$
where $w_j$ represents the attention weight of neighbor site $j$; $q$ represents the query vector; $k_j$ represents the key vector of the $j$-th neighbor site; $k_z$ represents the key vector of the $z$-th neighbor site; and $d$ is the attention scoring function.
The value vectors of all neighbor sites are summed with the attention weights; the calculation formula is as follows:
$$c_i^{(t)} = \sum_{j=1}^{N} w_j \, b_{i,j}^{(t)}$$
where $b_{i,j}^{(t)}$ is the value vector of the $j$-th neighbor site.
Therefore, for each time step, a spatial encoding $r_i^{(t)}$ can be constructed for detection site $i$. $r_i^{(t)}$ contains the spatial correlation feature information and can provide effective information in the time dimension for anomaly detection; the calculation formula is as follows:
$$r_i^{(t)} = a_i^{(t)} + c_i^{(t)}$$
where $a_i^{(t)}$ represents the original observation data and $c_i^{(t)}$ is the attention-weighted sum over all neighbor sites.
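The spatial encoding for a single time step can be sketched as follows. This is a minimal illustration under stated assumptions: the networks Q and K of the description are replaced by identity placeholders (in the method they are learned), the scoring function d is the dot product mentioned in the text, and all function names and input values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product, used here as the attention scoring function d."""
    return sum(a * b for a, b in zip(u, v))


def spatial_encoding(a_i, neighbors, Q=lambda x: x, K=lambda x: x):
    """Return (r_i, weights) for one time step.

    a_i       -- observation vector of the detection site
    neighbors -- value vectors b_ij of the N neighbor sites
    Q, K      -- query/key mappings; identity placeholders (assumption)
    """
    q = Q(a_i)
    keys = [K(b) for b in neighbors]
    weights = softmax([dot(q, k) for k in keys])       # w_j
    c_i = [sum(w * b[m] for w, b in zip(weights, neighbors))
           for m in range(len(a_i))]                   # sum_j w_j * b_ij
    r_i = [a + c for a, c in zip(a_i, c_i)]            # r_i = a_i + c_i
    return r_i, weights


# Two neighbors; the first resembles the detection site and gets the larger weight.
r_i, weights = spatial_encoding([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)
print(r_i)
```

Because the weights come from a softmax, they are positive and sum to 1, so $c_i^{(t)}$ is a convex combination of the neighbor value vectors.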
③ Sliding time window information extraction: a time window of length w is selected from the multi-element observation standard data set of the ocean site to be detected, and the multiple ocean elements within this fixed time window form a multi-dimensional long time series.
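A sliding window over a multi-element series can be sketched as below. The stride of one time step is an assumption, since the description only fixes the window length w; the function name and the data are hypothetical.

```python
def sliding_windows(series, w):
    """Cut a multi-element time series into overlapping windows of length w.

    `series` is a list of time steps, each a list of M element values.
    A stride of one time step is assumed.
    """
    if w <= 0 or w > len(series):
        raise ValueError("window length must be between 1 and len(series)")
    return [series[start:start + w] for start in range(len(series) - w + 1)]


# Five time steps of two made-up elements.
series = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]
windows = sliding_windows(series, w=3)
print(len(windows))  # 3
```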
④ Data set partitioning: the multi-element data sets observed at the adjacent ocean sites are divided with the hold-out method in time order, and the training samples and test samples are split in the ratio 7:3.
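A chronological hold-out split can be sketched in a few lines. The sketch assumes the samples are already in time order and are not shuffled, so the test samples are strictly later than the training samples; `holdout_split` is a hypothetical helper name.

```python
def holdout_split(samples, train_ratio=0.7):
    """Split time-ordered samples into (train, test) without shuffling."""
    cut = round(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


samples = list(range(10))          # stand-ins for ten time-ordered windows
train, test = holdout_split(samples)
print(len(train), len(test))       # 7 3
```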
2. Constructing an anomaly detection model and a training model
① Building a model: the model is built on the PyTorch deep learning framework. The sequences processed by the spatial attention mechanism are cut by the sliding time window and used as the input of the LSTM layer; the LSTM extracts the temporal dependencies among the different time series in a sample, and its output is used as the input of the variational autoencoder (VAE). The VAE layer maps the input data to random variables and reconstructs the input multi-element ocean time series samples probabilistically to obtain the anomaly score, which completes the initial construction of the LSTM-VAE model (the anomaly detection model).
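How a probabilistic reconstruction can be turned into a score is sketched below. The description does not state the likelihood used by the VAE decoder, so this sketch assumes a diagonal Gaussian whose mean and standard deviation per element are given as inputs (hypothetical values standing in for decoder outputs); the function names are also hypothetical. Under this assumption a lower score means a less probable, more suspicious observation.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log density of a one-dimensional Gaussian at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def reconstruction_scores(x, recon_mean, recon_std):
    """Return (per-element scores, total score) for one time step.

    recon_mean and recon_std stand in for the decoder outputs (assumption).
    """
    per_element = [gaussian_log_prob(xi, m, s)
                   for xi, m, s in zip(x, recon_mean, recon_std)]
    return per_element, sum(per_element)


# A well reconstructed sample versus one whose second element is far off.
good_per, good_total = reconstruction_scores([0.5, 0.5], [0.5, 0.5], [1.0, 1.0])
bad_per, bad_total = reconstruction_scores([0.5, 3.5], [0.5, 0.5], [1.0, 1.0])
print(good_total > bad_total)  # True
```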
② Setting model parameters: a loss function and an optimizer are selected; the time window, the number of iterations, the batch size, the learning rate and the like are set; and the model parameters are updated by stochastic gradient descent.
③ Model training: the training samples of step S1 are used as the model input to train the initially constructed anomaly detection model; the model is optimized continuously, and when the loss function reaches its minimum the trained model is considered optimal and its parameters are saved.
④ Threshold selection: the anomaly score of each sample is obtained through model training; all anomaly scores form a univariate time series; the threshold is selected automatically according to extreme value theory (EVT); and an observation point whose anomaly score is smaller than the threshold is taken as an anomalous point.
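One way to select such a threshold is the peaks-over-threshold (POT) approach from extreme value theory, sketched below. The description does not say which EVT procedure is used, so every detail here is an assumption: the scores are negated so that the low tail becomes an upper tail, a generalized Pareto distribution is fitted to the excesses over an initial quantile with the method of moments (chosen for brevity in place of a maximum-likelihood fit), and `init_quantile` and `risk` are hypothetical parameters.

```python
import math
import random


def pot_low_threshold(scores, init_quantile=0.98, risk=1e-3):
    """Return (threshold, initial_threshold) for low anomaly scores.

    Scores below `threshold` would be flagged. `risk` is the target
    probability of falling below the threshold and must be smaller than
    the tail fraction left beyond the initial threshold.
    """
    neg = sorted(-s for s in scores)          # low tail -> upper tail
    n = len(neg)
    t = neg[min(n - 1, int(init_quantile * n))]
    excesses = [x - t for x in neg if x > t]
    n_t = len(excesses)
    if n_t < 2:
        raise ValueError("not enough tail samples to fit the tail model")
    mean = sum(excesses) / n_t
    var = sum((e - mean) ** 2 for e in excesses) / (n_t - 1)
    if var == 0.0:
        raise ValueError("tail excesses are constant")
    # Method-of-moments estimates of the GPD shape (xi) and scale (sigma).
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    r = risk * n / n_t
    if abs(xi) < 1e-9:
        z = t - sigma * math.log(r)           # limit of the formula as xi -> 0
    else:
        z = t + (sigma / xi) * (r ** (-xi) - 1.0)
    return -z, -t


# Synthetic scores, for illustration only.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]
threshold, initial = pot_low_threshold(scores)
print(threshold, initial)
```

The final threshold lies further out in the tail than the initial quantile, so only scores that the fitted tail model considers rarer than `risk` are flagged.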
3. Model test and verification assessment
① Filling in missing data: the missing data in the test samples are supplemented with a Markov chain Monte Carlo (MCMC) algorithm. This avoids the bias that missing data may introduce into the hidden vectors mapped by the encoder and keeps the sliding-window data complete for testing.
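The accept/reject loop at the heart of an MCMC sampler can be illustrated with a toy imputation. This is not the procedure of the description, which is not detailed; it is a random-walk Metropolis sampler over a deliberately simple target, a Gaussian centered on the mean of the two neighboring observations with a spread taken from the observed first differences. The function name, the target and the data are all assumptions.

```python
import math
import random


def metropolis_impute(series, index, n_samples=2000, burn_in=500, seed=0):
    """Estimate the missing value series[index] as the mean of MCMC samples."""
    left, right = series[index - 1], series[index + 1]
    if left is None or right is None:
        raise ValueError("this toy example needs both neighbors observed")
    mu = (left + right) / 2.0
    # Spread of the target: RMS of differences between consecutive observed values.
    diffs = [b - a for a, b in zip(series, series[1:])
             if a is not None and b is not None]
    sigma = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 1.0
    sigma = sigma or 1.0

    def log_target(x):
        return -0.5 * ((x - mu) / sigma) ** 2

    rng = random.Random(seed)
    x, kept = mu, []
    for step in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, sigma)          # symmetric random-walk proposal
        delta = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, delta)):  # Metropolis acceptance rule
            x = proposal
        if step >= burn_in:
            kept.append(x)
    return sum(kept) / len(kept)


series = [10.0, 10.4, None, 11.0, 11.2]   # made-up water temperatures with one gap
filled = metropolis_impute(series, 2)
print(filled)
```

With a target this simple the value could be sampled directly; the sampler is shown only because the same loop applies when the target is defined through a trained model.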
② Anomalous point judgment: the observed multi-element test samples are fed into the anomaly detection model to obtain anomaly scores. Because the model detects anomalies in several ocean elements simultaneously, the anomaly score it produces is the sum of the anomaly scores of all elements. The anomaly score of the observed sample at a given moment is compared with the threshold, and the sample is judged to be an anomalous point if its score is smaller than the threshold.
③ Anomalous element marking: the anomaly scores of the elements are normalized and sorted in ascending order; the top-ranked element is marked as abnormal and the remaining elements are marked as normal, which yields the outlier marks of all elements observed at that moment.
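The judgment and marking steps can be sketched together. The sketch assumes per-element scores are available as a dictionary, uses min-max normalization across the elements of one time step, and flags only the single lowest-ranked element (`top_k=1`); the description says only that the top-ranked element is marked, so these choices and all names and values are hypothetical.

```python
def judge_and_mark(element_scores, threshold, top_k=1):
    """Return (is_anomalous, marks) for the observation at one time step.

    element_scores -- anomaly score per element (lower = more suspicious)
    threshold      -- threshold on the summed score
    """
    total = sum(element_scores.values())
    if total >= threshold:
        return False, {name: "normal" for name in element_scores}
    lo, hi = min(element_scores.values()), max(element_scores.values())
    span = hi - lo
    normalized = {name: (s - lo) / span if span else 0.0
                  for name, s in element_scores.items()}
    ranked = sorted(normalized, key=normalized.get)    # ascending: smallest first
    flagged = set(ranked[:top_k])
    marks = {name: "abnormal" if name in flagged else "normal"
             for name in element_scores}
    return True, marks


# Made-up scores: salinity is reconstructed far worse than the other elements.
scores = {"surface_water_temp": -0.9, "surface_salinity": -8.0, "air_pressure": -1.1}
is_anomalous, marks = judge_and_mark(scores, threshold=-5.0)
print(is_anomalous, marks)
```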
④ Inspection and evaluation: the detection performance for each observed element, such as surface water temperature, surface salinity, air temperature, air pressure, relative humidity, wind speed and wind direction, is calculated separately with four indexes, namely accuracy, precision, recall and F1 score, and the model results are evaluated.
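The four indexes follow from the confusion counts of one element. The sketch below uses the standard definitions, with label 1 for an anomalous point and 0 for a normal one; the function name and the labels are made up for illustration.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary anomaly labels (1 = anomalous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for one element: 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Running the function once per element gives the element-by-element evaluation described in this step.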
The ocean observation data anomaly detection method based on adjacent site space-time correlation is described below, taking the domestic ocean site Dan Pu as an example:
S1, data set preparation
① Standardization of observation data: the data of the domestic ocean site Dan Pu acquired through operational ocean observation, time range: 1/2009 to 12/30/2020, elements: surface water temperature, surface salinity, air temperature, air pressure, relative humidity, wind speed, wind direction and the like, are normalized so that the sample data are uniformly mapped to the [0,1] interval;
② Spatial information extraction: five ocean sites in the sea area near the Dan Pu site, namely Bayuquan (BYQ), Shacheng (SCG), Nanhuang Dao (NHD), Shantou (STO) and Liuheng (LHD), are selected as neighbor sites, and the influence weights of the neighbor sites are set with the attention mechanism;
③ Time information extraction: a time window of length w=12 is selected, and a multi-dimensional long time series data set consisting of multiple ocean elements is constructed;
④ Data set partitioning: the training samples and test samples are split in the ratio 7:3.
S2, constructing an anomaly detection training model
① Constructing the anomaly detection model: the LSTM-VAE model is built on the PyTorch deep learning framework, with the number of LSTM layers set to 1 and the number of hidden units set to 500; the sequences processed by the attention mechanism are used as the input of the LSTM, which completes the initial construction of the model;
② Setting model parameters: the loss function is the root mean square error (RMSE) and the optimizer is Adam; in each round of training the training set is divided into mini-batches of equal size to accelerate model convergence; the batch size is 512, the total number of iterations is 1000, and the learning rate is 0.001;
③ Model training: the training samples of step S1 are used as the model input to train the initially constructed anomaly detection model;
④ Threshold selection: the threshold is selected automatically based on extreme value theory (EVT).
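The settings of this example can be collected in one place, together with the RMSE loss they refer to. The container below is only a convenient sketch: the class name `TrainingConfig`, its field names and the `rmse` helper are hypothetical, while the values are those stated in the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters stated in the worked example (field names are hypothetical)."""
    window: int = 12              # sliding time window length w
    lstm_layers: int = 1
    lstm_hidden_units: int = 500
    batch_size: int = 512
    iterations: int = 1000
    learning_rate: float = 0.001


def rmse(targets, predictions):
    """Root mean square error between two equally long sequences."""
    if len(targets) != len(predictions):
        raise ValueError("sequences must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


config = TrainingConfig()
loss = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(config.batch_size, loss)
```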
S3, model test and inspection evaluation
① Filling in missing data: the missing data in the test samples are supplemented with a Markov chain Monte Carlo (MCMC) algorithm;
② Anomalous point judgment: the observed multi-element test samples are fed into the anomaly detection model to obtain anomaly scores;
③ Anomalous element marking: the anomaly scores of the elements are normalized and sorted in ascending order; the top-ranked element is marked as abnormal and the remaining elements are marked as normal;
④ Inspection and evaluation: the detection performance for each observed element, such as surface water temperature, surface salinity, air temperature, air pressure, relative humidity, wind speed and wind direction, is calculated separately with four indexes, namely accuracy, precision, recall and F1 score, and the model results are evaluated; the evaluation results are shown in Table 1.
Table 1. Test evaluation results
Evaluation index  Surface water temperature  Surface salinity  Air temperature  Air pressure  Relative humidity  Wind direction  Wind speed
Accuracy          0.970254                   0.921571          0.981109         0.952375      0.957525           0.943574        0.965426
Precision         0.985308                   0.95899           0.991126         0.976811      0.983521           0.984219        0.932031
Recall            0.962526                   0.905334          0.973312         0.939904      0.961686           0.947216        0.987035
F1 score          0.977723                   0.944685          0.986092         0.964438      0.971235           0.955721        0.960426
As the table shows, all four indexes exceed 0.90 for every element, so the method can determine the outliers of ocean observation elements efficiently and accurately.
From the description of the above embodiments, those skilled in the art can understand that the invention provides a method for detecting anomalies in ocean observation data based on the space-time correlation of adjacent sites. Addressing the low detection efficiency and high manual labeling cost of existing ocean data anomaly detection techniques, the method uses deep learning to construct an unsupervised anomaly detection model, which reduces the cost of manual intervention and improves the anomaly detection rate. Because the elements observed at multiple ocean sites are correlated in space and time, the method captures the dynamic correlation among all neighbor sites in the spatial dimension with an adaptive attention mechanism and, at the same time, extracts the temporal dependencies among the different time series of the ocean observation elements with an LSTM network, so that outliers of ocean observation elements can be determined efficiently and accurately. The method improves the speed and accuracy of outlier detection for ocean observation elements and provides strong technical support for operational anomaly detection of ocean observation data.
Further, the invention also provides a system for detecting anomalies in ocean observation data based on the space-time correlation of adjacent sites, which applies the method of the above embodiment and comprises the following modules:
The data set preparation module is used for constructing an ocean observation standard data set, adaptively capturing the dynamic correlation among all neighbor sites in the spatial dimension with an attention mechanism, forming a multi-dimensional long time series data set of multiple ocean observation elements with a sliding time window mechanism, and dividing it into training samples and test samples;
The model construction and training module is used for constructing an anomaly detection model based on the PyTorch deep learning framework, initially setting the model training parameters, training the model, obtaining the anomaly score of each training sample with the LSTM network and the variational autoencoder (VAE) of the model, and determining a threshold for the anomaly score;
The model test and inspection evaluation module is used for filling in the missing data of the test samples, inputting the test data into the anomaly detection model to obtain anomaly scores, identifying anomalous points, determining and marking the specific anomalous elements, and evaluating the detection performance for each ocean observation element.
The system provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, parts of the system embodiment that are not mentioned here may be understood by reference to the corresponding content of the method embodiment and are not repeated.
In addition, an embodiment of the present invention further provides a storage medium having stored thereon one or more programs readable by a computing device, the one or more programs including instructions which, when executed by the computing device, cause the computing device to perform the method for detecting ocean observation data anomalies based on adjacent site space-time correlation of the above embodiment.
In an embodiment of the present invention, the storage medium may be, for example, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the storage medium include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices, and any suitable combination of the foregoing.
It will be appreciated by those skilled in the art that embodiments of the invention may be provided as a method, system, or computer program product, or the like. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
It should be noted that the term "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer.
In the present specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by reference to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A method for detecting ocean observation data anomalies based on adjacent site space-time correlation is characterized by comprising the following steps:
S1, constructing a marine observation standard data set, adaptively capturing the dynamic relevance among all neighbor sites in the space dimension by using an attention mechanism method, forming a multi-dimensional long-time sequence data set consisting of multiple marine observation elements by using a sliding time window mechanism, and dividing the multi-dimensional long-time sequence data set into a training sample and a test sample;
S2, constructing an anomaly detection model based on the PyTorch deep learning framework, initially setting model training parameters, carrying out model training, acquiring the anomaly score of each training sample by using the LSTM network and the variational autoencoder (VAE) of the model, and determining a threshold value of the anomaly scores;
S3, filling blank data of the test sample, inputting the test data into an anomaly detection model to obtain anomaly scores, judging anomaly points, determining specific anomaly elements and marking, and performing performance inspection and evaluation on each marine observation element.
2. The method for detecting abnormal ocean observation data based on adjacent site space-time correlation according to claim 1, wherein the step S1 specifically comprises:
S11, standardization processing of observed data: a plurality of temporally continuous ocean observation element values acquired by sensors are normalized to eliminate the differences in units and scale among the variables, and an ocean observation standard data set is constructed;
S12, extracting space information: according to a time sequence, utilizing an attention mechanism method to adaptively capture dynamic relevance among all neighbor stations in a space dimension, extracting effective information, and setting different influence weights for adjacent stations;
S13, extracting sliding time window information: selecting a time window with the length w from an observation standard data set of a marine station to be detected, and constructing a multi-dimensional long-time sequence data set consisting of a plurality of marine observation elements;
S14, data set division: dividing the multi-dimensional long-time sequence data sets of the adjacent sites by the hold-out method in time-sequence order, and splitting them into training samples and test samples according to a preset proportion.
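Steps S11, S13 and S14 can be illustrated with a minimal, standard-library sketch. The function names, the choice of min-max scaling as the normalization, the sample values, the window length `w=3` and the 0.75 training ratio are all assumptions made for illustration; the claims do not fix them.

```python
def min_max_normalize(series):
    """Scale one observation element to [0, 1] (assumed normalization for S11)."""
    lo, hi = min(series), max(series)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in series]


def sliding_windows(rows, w):
    """Return every run of w consecutive time steps (S13)."""
    return [rows[i:i + w] for i in range(len(rows) - w + 1)]


def holdout_split(windows, train_ratio=0.8):
    """Chronological hold-out split: earlier windows train, later windows test (S14)."""
    cut = int(len(windows) * train_ratio)
    return windows[:cut], windows[cut:]


# Hypothetical readings of two elements at six consecutive time steps.
water_temp = [18.0, 18.2, 18.1, 25.0, 18.3, 18.4]
salinity = [31.0, 31.1, 31.0, 31.2, 31.1, 31.3]

columns = [min_max_normalize(water_temp), min_max_normalize(salinity)]
rows = [list(r) for r in zip(*columns)]  # one row per time step
windows = sliding_windows(rows, w=3)
train, test = holdout_split(windows, train_ratio=0.75)
print(len(windows), len(train), len(test))
```

The split is not shuffled, so the test windows always come later in time than the training windows, which matches the time-sequence ordering required by S14.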
3. The method for detecting abnormal ocean observation data based on adjacent site space-time correlation according to claim 2, wherein the step S2 specifically comprises:
S21, building a model: constructing an anomaly detection model based on the PyTorch deep learning framework, wherein the model comprises an LSTM network and a variational autoencoder (VAE); the LSTM network is used for extracting the time-sequence dependencies among different time series in the samples, the output of the LSTM network is used as the input of the VAE, and the VAE is used for mapping the input data to random variables and performing probabilistic reconstruction of the input samples to obtain anomaly scores;
S22, setting model parameters: selecting a loss function and an optimizer, setting the time window, number of iterations, batch size and learning rate, and updating the model parameters by stochastic gradient descent;
S23, model training: taking the training sample in the step S1 as a model input, training the established anomaly detection model, continuously optimizing the model, and storing model parameters when the loss function is minimized;
S24, threshold selection: obtaining the anomaly score of each sample through model training, forming a univariate time series from all the anomaly scores, and automatically selecting the threshold according to extreme value theory.
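The threshold selection of step S24 can be sketched with a peaks-over-threshold approach from extreme value theory. This is only one possible reading of the claim: the function name `pot_threshold`, the initial quantile, the target risk and the use of method-of-moments estimates for the generalized Pareto distribution (instead of a maximum-likelihood fit) are assumptions, not details taken from the patent.

```python
import math
import random


def pot_threshold(scores, init_quantile=0.9, risk=1e-3):
    """Pick an anomaly-score threshold with a peaks-over-threshold sketch.

    Excesses over an initial high quantile are modelled by a generalized
    Pareto distribution whose shape (xi) and scale (sigma) are estimated
    by the method of moments.
    """
    n = len(scores)
    ordered = sorted(scores)
    t = ordered[int(init_quantile * (n - 1))]  # initial threshold
    excesses = [s - t for s in scores if s > t]
    if len(excesses) < 2:
        return t
    mean = sum(excesses) / len(excesses)
    var = sum((e - mean) ** 2 for e in excesses) / (len(excesses) - 1)
    if var == 0:
        return t
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # GPD shape
    sigma = 0.5 * mean * (ratio + 1.0)  # GPD scale
    r = risk * n / len(excesses)
    if abs(xi) < 1e-6:                  # exponential-tail limit
        return t - sigma * math.log(r)
    return t + (sigma / xi) * (r ** (-xi) - 1.0)


# Hypothetical training anomaly scores (synthetic, for illustration only).
rng = random.Random(0)
train_scores = [rng.expovariate(1.0) for _ in range(5000)]
threshold = pot_threshold(train_scores)
flagged = sum(s > threshold for s in train_scores)
print(round(threshold, 3), flagged)
```

The `risk` argument is the target probability that a normal score exceeds the final threshold, so lowering it produces a stricter threshold.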
4. The method for detecting abnormal ocean observation data based on adjacent site space-time correlation according to claim 3, wherein the step S3 specifically comprises:
S31, filling in missing data: imputing the missing data in the test sample by using a Markov chain Monte Carlo (MCMC) algorithm;
S32, anomaly point judgment: feeding the test sample into the anomaly detection model to obtain anomaly scores, and judging anomaly points according to the threshold value of the anomaly scores;
S33, marking anomalous elements: normalizing the anomaly score of each ocean observation element, and marking the anomalous elements according to a preset rule;
S34, checking and evaluating: calculating the detection performance of each marine observation element separately by using a plurality of indexes, and evaluating the model results.
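Steps S32 and S33 can be sketched as follows. The claim leaves the "preset rule" open, so the rule used here is hypothetical: the sample score is taken as the sum of the per-element scores, each element's score is normalized to its share of that sum, and elements whose share reaches `share_limit` are marked. The function name and all numbers are likewise assumptions.

```python
def flag_sample(element_scores, threshold, share_limit=0.3):
    """Judge one test sample (S32) and mark its anomalous elements (S33).

    element_scores maps an observation element name to its anomaly score.
    Returns (is_anomalous, list_of_marked_elements).
    """
    total = sum(element_scores.values())
    if total <= threshold or total <= 0:
        return False, []
    # Normalize per-element scores so that they sum to 1.
    shares = {name: score / total for name, score in element_scores.items()}
    marked = [name for name, share in shares.items() if share >= share_limit]
    return True, marked


# Hypothetical per-element scores for a single time window.
sample = {"air pressure": 4.0, "wind speed": 0.5, "air temperature": 0.5}
print(flag_sample(sample, threshold=2.0))
```

Normalizing by the sample total makes the marking rule independent of the absolute score level, which varies from window to window.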
5. The method for detecting abnormal marine observation data based on space-time correlation of adjacent sites according to claim 1 or 4, wherein the marine observation elements comprise: surface water temperature, surface salinity, air temperature, air pressure, relative humidity, wind speed and wind direction.
6. The method for detecting abnormal ocean observation data based on space-time correlation of adjacent sites according to claim 2, wherein in the step S12, the dynamic correlation between all adjacent sites in the space dimension is adaptively captured by using an attention mechanism method according to a time sequence; the input is an observation data set of the detection site and the neighbor site, and the output is a space code obtained by weighting and summing according to the attention weight; the calculation formula of the space coding comprises:
$$r_i^{(t)} = a_i^{(t)} + c_i^{(t)}$$
$$c_i^{(t)} = \sum_{j=1}^{n} w_j \, b_{i,j}^{(t)}$$
$$w_j = \frac{\exp\left(d(q, k_j)\right)}{\sum_{z=1}^{n} \exp\left(d(q, k_z)\right)}$$
Wherein $r_i^{(t)}$ represents the spatial coding of the detection site $i$ at time $t$, and comprises spatial correlation characteristic information, so that effective information in the time dimension is provided for anomaly detection; $a_i^{(t)}$ represents the original observation data observed by the detection site $i$ at time $t$; $c_i^{(t)}$ represents the attention-weighted sum over all neighbor sites; $w_j$ represents the attention weight of neighbor site $j$; $b_{i,j}^{(t)}$ represents the value vector of the $j$-th neighbor site of the detection site $i$ at time $t$; $n$ represents the number of neighbor sites; $q$ represents a query vector; $k_j$ represents the key vector of the $j$-th neighbor site; $k_z$ represents the key vector of the $z$-th neighbor site; $d$ is an attention scoring function.
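The spatial coding of claim 6 can be sketched in plain Python. The claim does not specify the attention scoring function d, so a scaled dot product is assumed here, and the weights are assumed to be normalized with a softmax over the neighbor sites; the function names and example vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_encoding(a_i, query, keys, values):
    """Return (r_i, weights) for one detection site at one time step.

    a_i    : original observation vector of the detection site
    query  : query vector q
    keys   : one key vector k_j per neighbor site
    values : one value vector b_ij per neighbor site (same length as a_i)
    """
    scale = math.sqrt(len(query))
    # Assumed scoring function d: scaled dot product of q and k_j.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)                      # attention weights w_j
    c_i = [sum(w * v[dim] for w, v in zip(weights, values))
           for dim in range(len(a_i))]             # weighted neighbor sum
    r_i = [a + c for a, c in zip(a_i, c_i)]        # spatial coding
    return r_i, weights


# Two hypothetical neighbor sites with identical keys get equal weights.
r, w = spatial_encoding(
    a_i=[1.0, 2.0],
    query=[1.0, 0.0],
    keys=[[0.5, 0.5], [0.5, 0.5]],
    values=[[2.0, 0.0], [4.0, 2.0]],
)
print(r, w)
```

Because the weights depend on the query and key vectors at each time step, a neighbor site that stops resembling the detection site receives less influence on the coding.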
7. The method for detecting abnormal ocean observation data based on space-time correlation of adjacent sites according to claim 3, wherein in the step S22, the loss function is a root mean square error function and the optimizer is an Adam optimizer.
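A minimal PyTorch sketch of an LSTM feeding a VAE, trained with a root mean square error loss and the Adam optimizer, might look as follows. The class name `LSTMVAE`, the layer sizes, the learning rate, the random input data, the small KL term weighted by `beta`, and the use of per-window reconstruction error as the anomaly score are all assumptions; the claims name only the RMSE function and Adam.

```python
import torch
from torch import nn


class LSTMVAE(nn.Module):
    """LSTM encoder whose output feeds a VAE (illustrative sizes)."""

    def __init__(self, n_features, hidden=32, latent=8):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )

    def forward(self, x):
        h, _ = self.lstm(x)                      # (batch, w, hidden)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization: sample z from N(mu, exp(logvar)).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar


def loss_fn(x, recon, mu, logvar, beta=1e-3):
    """RMSE reconstruction loss plus an assumed, lightly weighted KL term."""
    rmse = torch.sqrt(torch.mean((recon - x) ** 2) + 1e-12)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rmse + beta * kl


torch.manual_seed(0)
x = torch.randn(16, 24, 7)   # 16 windows, w = 24 steps, 7 elements (random data)
model = LSTMVAE(n_features=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(5):           # a few illustrative training iterations
    optimizer.zero_grad()
    recon, mu, logvar = model(x)
    loss = loss_fn(x, recon, mu, logvar)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    recon, _, _ = model(x)
    # Assumed anomaly score: RMSE reconstruction error of each window.
    scores = torch.sqrt(((recon - x) ** 2).mean(dim=(1, 2)))
print(scores.shape)
```

Averaging the squared error over elements only, rather than over both time steps and elements, would give one score per time step and element instead of one per window.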
8. The method for detecting abnormal ocean observation data based on space-time correlation of adjacent sites according to claim 4, wherein in the step S34, the plurality of indexes include: accuracy, precision, recall, and F1 score.
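The four indexes of claim 8 can be computed per observation element from binary labels, as in the sketch below. The function name, the convention that 1 marks an anomaly, and the example labels are assumptions for illustration.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for one observation element.

    y_true and y_pred are equal-length sequences of 0 (normal) and 1 (anomaly).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for one element, e.g. surface water temperature.
truth = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
metrics = detection_metrics(truth, predicted)
print(metrics)
```

Anomalies are usually rare, so accuracy alone can look high even when most anomalies are missed; precision, recall and F1 give a more informative picture for each element.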
CN202410124766.4A 2024-01-30 2024-01-30 Ocean observation data anomaly detection method based on adjacent site space-time correlation Pending CN117972604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410124766.4A CN117972604A (en) 2024-01-30 2024-01-30 Ocean observation data anomaly detection method based on adjacent site space-time correlation

Publications (1)

Publication Number Publication Date
CN117972604A true CN117972604A (en) 2024-05-03

Family

ID=90856764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410124766.4A Pending CN117972604A (en) 2024-01-30 2024-01-30 Ocean observation data anomaly detection method based on adjacent site space-time correlation

Country Status (1)

Country Link
CN (1) CN117972604A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination