CN116739048B - Multivariate Transformer-based long-term lightning prediction model, method and system - Google Patents
Multivariate Transformer-based long-term lightning prediction model, method and system
- Publication number
- CN116739048B (application CN202311027952.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- lightning
- time
- tau
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01W—METEOROLOGY
- G01W1/00—Meteorology
- G01W1/10—Devices for predicting weather conditions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to the technical field of artificial intelligence and lightning early warning, and in particular to a long-term lightning prediction model, method and system based on a multivariate Transformer. The invention provides a long-term lightning prediction model that applies moderation and de-moderation processing to lightning data. The moderation encoding module comprises a moderation layer, a first linear network, a first correlation operation layer and an activation output layer, and encodes the time points with high correlation selected through correlation calculation and screening. The time dependence of the lightning data is better captured by accumulating the lightning data with the time codes row by row; the multidimensional lightning data time series is moderated, the influence of the suddenness of lightning is reduced, and the prediction accuracy is greatly improved. In the invention, lightning parameters and radar echo images are combined into one sample, so that multiple kinds of data are used in concert and complement each other's strengths and weaknesses, further improving the accuracy of lightning prediction and extending the prediction horizon.
Description
Technical Field
The invention relates to the technical field of artificial intelligence and lightning early warning, and in particular to a long-term lightning prediction model, method and system based on a multivariate Transformer.
Background
Lightning is a strong electromagnetic phenomenon that occurs frequently in nature; lightning disasters affect wide areas and cause severe damage. Lightning prediction helps humans prepare for future lightning activity in advance, thereby reducing the impact of lightning disasters.
The main monitoring data for lightning are lightning location information and radar echo. The cloud-to-ground flash location data in lightning location information are relatively discrete. Radar data have good spatio-temporal resolution, but strong echoes appear only after precipitation particles have formed, so the achievable lead time is limited; even when a lightning disaster is predicted, there is not enough time for protection.
Many technical solutions have been proposed in the prior art for the lightning prediction problem, but the following problems remain:
(1) Most existing methods use only a single monitoring source, which suffers from weak correlation with lightning activity trends, poor handling of abnormal data, and similar problems.
(2) Lightning occurs suddenly, yet existing methods rarely account for this, so the predicted results do not reflect this characteristic well.
(3) The temporal patterns of multivariate lightning time-series data are complex, with coupling, correlation and even causal relationships among multiple parameters. The prior art cannot deeply explore the relationships between data and extract their time-series characteristics.
(4) Methods enabling long-term lightning prediction are lacking. Lightning protection requires preparation time, but the accuracy of the prior art in long-horizon lightning prediction is low, which puts time pressure on lightning protection work.
Disclosure of Invention
In order to overcome the shortcomings of short prediction lead time and low accuracy in the prior art, the invention provides a long-term lightning prediction model based on a multivariate Transformer, which improves the accuracy of long-term lightning prediction.
According to the training method of the multivariate-Transformer-based long-term lightning prediction model provided by the invention, a sample library and a base model are constructed from historical data, the base model performs machine learning on the learning samples in the sample library, and the converged base model serves as the long-term lightning prediction model:
the sample library stores learning samples {((A, G), y)}; (A, G) is a time-series sample, A is a multidimensional lightning data time series, and G is a radar echo image sequence; y is a lightning parameter time series; the time series of y immediately follows the time series of A and G;
the input of the base model is a known time-series sample (A, G), and the output is a predicted value Y of the lightning parameter time series; Y has the same data structure as y; the base model encodes and decodes the time-series sample (A, G) to obtain the predicted value Y; a moderation encoding module is provided in the base model to obtain the moderation code of the multidimensional lightning data time series A and feed it into subsequent processing;
the moderation encoding module comprises a moderation layer, a first linear network, a first correlation operation layer and an activation output layer;
the moderation layer is used to obtain the normalized value A' of A; the first linear network applies linear transformations to A and A', transforming A into an N-dimensional time series A_N and A' into an N-dimensional time series A'_N; A_N ∈ R^{tx×N}, A'_N ∈ R^{tx×N}; tx is the number of time points contained in the time-series sample (A, G), and N is the dimension of the data feature at each time point of A_N and A'_N;
the first correlation operation layer accumulates the data at each time point in A'_N with the time code of that time point row by row to form a vector X(A), and accumulates the data at each time point in A_N with the time code of that time point row by row to form a vector X(0); the time code is a multi-row vector, and each row is an N-dimensional feature array; X(A) is transformed into Q(A), K(A) and V(A) by different linear transformations, and X(0) is transformed into Q and K by different linear transformations; a delay τ is set, and the delay vector of a vector P containing multiple time points is defined as the data vector formed by moving the data at the first τ time points of P to the end of the sequence; the first correlation operation layer traverses τ ∈ [1, tx−1] and calculates the correlation R(A, τ) corresponding to each τ, where R(A, τ) is computed by combining, from among Q(A), K(A), V(A), Q and K, at least part of the data themselves and the delay vectors of part of the data; the h delays τ with the largest correlations R(A, τ) are then taken to form a high-correlation delay set [τ1, τ2, …, τh], where h is a set value related to the number of time points in X(A); the first correlation operation layer calculates the moderation code V(A_corr) of A with the following formula;
V(A_corr) = Σ_{i=1}^{h} P(A, τi) × V(A, τi)
where V(A, τi) is the vector obtained by moving the data at the first τi time points of V(A) to the end of the sequence, and P(A, τi) is the activation value of the i-th delay τi in the high-correlation delay set.
Preferably:
R(A, τ) = (1/tx) Σ_{t=1}^{tx} Af(t)
Af(t) = Linear(σ(a)²) ⊙ q(A,t) × k(A,t,τ)^T + μ(Q)^T × k(t,τ)^T + q(t) × μ(K) − μ(Q)^T × μ(K)^T
where Af(t) is an intermediate quantity; q(A,t) denotes the data vector corresponding to the t-th time point in Q(A); k(A,t,τ) denotes the data vector corresponding to the t-th time point in the delay vector of K(A), and k(A,t) denotes the data vector corresponding to the t-th time point in K(A); ^T denotes matrix transposition; μ(Q) is the mean of Q and μ(K) is the mean of K; k(t,τ) denotes the data vector corresponding to the t-th time point in the delay vector of K; q(t) denotes the data vector corresponding to the t-th time point in Q;
σ(a)² is the variance of A; Linear(σ(a)²) ⊙ q(A,t) denotes linearly mapping σ(a)² into the same space as the data structure of q(A,t) and then multiplying the mapped σ(a)² element-wise with q(A,t) at corresponding positions, yielding a result with the same data structure as q(A,t).
Preferably, a first encoding module is provided in the base model to obtain the code V(G_corr) of the radar echo image sequence G and feed it into subsequent processing; the first encoding module comprises: a feature extraction network, a dimension adjustment network, a second linear transformation network, a second correlation operation layer and a second output layer;
the feature extraction network extracts the features of G and feeds them into the dimension adjustment network, which adjusts their dimensions and outputs the adjusted data G' ∈ R^{tx×N};
the data at each time point in G' is accumulated with the time code of that time point row by row to form a vector X(G); the second linear transformation network transforms X(G) into Q(G), K(G) and V(G) by different linear transformations;
the second correlation operation layer obtains the code V(G_corr) based on Q(G), K(G) and V(G); it traverses τ ∈ [1, tx−1] and calculates the correlation R(G, τ) corresponding to each τ, where R(G, τ) is computed by combining Q(G) and K(G); the h delays τ with the largest correlations R(G, τ) are taken to form a high-correlation delay set τ_G = [τ_G1, τ_G2, …, τ_Gh], where h is a set value related to the number of time points in X(G);
the second output layer activates each delay in the high-correlation delay set τ_G and computes the code V(G_corr) with the following formula;
V(G_corr) = Σ_{i=1}^{h} P(G, i) × V(G, τ_Gi)
where V(G, τ_Gi) is the vector obtained by moving the data at the first τ_Gi time points of V(G) to the end of the sequence, and P(G, i) is the activation value of the i-th delay τ_Gi in the high-correlation delay set.
Preferably:
R(G, τ) = (1/tx) Σ_{t=1}^{tx} q(G,t) × k(G,t,τ)^T
where q(G,t) is the data vector corresponding to the t-th time point in Q(G), and k(G,t,τ) is the data vector corresponding to the t-th time point in the delay vector of K(G).
Preferably, the dimension adjustment network comprises a downsampling layer and a fully connected layer; the downsampling layer reduces the resolution of each image g(t) in G to a set value, and the reduced-resolution image features are then mapped into the N-dimensional space through the fully connected layer to form G'.
Preferably, the base model comprises an encoding part and a decoding part, the encoding part comprising: a time encoding module, a moderation encoding module, a first encoding module, a first time decomposition network, a second time decomposition network and a second encoding module; the decoding part comprises: a data reorganization module, a third encoding module, a third time decomposition network, a fourth encoding module, a residual network, a fourth time decomposition network and a de-moderation module;
the time encoding module encodes a time point to obtain the time code corresponding to that time point; the moderation encoding module encodes the multidimensional lightning data time series A to obtain the code V(A_corr); the first encoding module encodes the radar echo image sequence G to obtain the code V(G_corr); the first time decomposition network decomposes the code V(A_corr) into a periodic feature X(AS) and a trend feature X(AT), and the second time decomposition network decomposes the code V(G_corr) into a periodic feature X(GS) and a trend feature X(GT); the second encoding module encodes the periodic features X(AS) and X(GS) to obtain the code V(AG_corr);
the data reorganization module converts the periodic feature X(AS), the trend feature X(AT), the periodic feature X(GS), the trend feature X(GT) and the code V(AG_corr) in the data space R^{tx×N} into X(AS)', X(AT)', X(GS)', X(GT)' and V(AG_corr)' in the data space R^{ty×N}, respectively; the third encoding module encodes X(AS)' and X(GS)' together to obtain the code Q(AG_corr); the third time decomposition network decomposes Q(AG_corr) into a periodic feature X(QS) and a trend feature X(QT); the fourth encoding module encodes the periodic feature X(QS) together with V(AG_corr)' to obtain the code V_corr; the residual network extracts features from V_corr and passes the extracted features to the fourth time decomposition network for decomposition into a periodic feature Y(S) and a trend feature Y(T); the five pieces of data Y(S), X(AT)', X(GT)', the trend feature X(QT) obtained by the decoding part, and Y(T) are superimposed dimension-wise to obtain the code Y(de) ∈ R^{ty×N}; the input of the de-moderation module is Y(de), and the de-moderation module outputs the predicted value Y of the lightning parameter time series y;
the inputs of the time encoding module, the moderation encoding module and the first encoding module are connected to the input of the base model; the output of the de-moderation module serves as the output of the base model;
the second encoding module comprises a third linear transformation network, a third correlation operation layer and a third output layer; the third linear transformation network transforms X(AS) into Q(en) and V(en) by different linear transformations, and transforms X(GS) into K(en) by a linear transformation;
the third encoding module comprises: a fourth linear transformation network, a fourth correlation operation layer and a fourth output layer; the fourth linear transformation network transforms X(AS)' into Q(en_1) and V(en_1) by different linear transformations, and transforms X(GS)' into K(en_1) by a linear transformation;
the fourth encoding module comprises: a fifth linear transformation network, a fifth correlation operation layer and a fifth output layer; the fifth linear transformation network transforms the periodic feature X(QS) into Q(en_2) by a linear transformation, and transforms V(AG_corr)' into V(en_2) and K(en_2) by different linear transformations;
the parameters of the first, second, third and fourth encoding modules are independent of one another; the third, fourth and fifth correlation operation layers have the same network structure as the second correlation operation layer; and the second, third, fourth and fifth output layers have the same network structure.
Preferably, the data reorganization module converts data U in the data space R^{tx×N} into data U' in the data space R^{ty×N} as follows: the data at ta time points are extracted from U to form transition data U0, U0 is extended backwards along the time dimension by zero-filling until the number of time points reaches ty, and the U0 expanded to ty time points serves as the data U'; tx and ty are numbers of time points, and N is the data dimension at a single time point; ta is a set value with ta ≤ ty;
the de-moderation module obtains the predicted value Y of the lightning parameter time series y with the following formula;
Y = FC{ Relu[Linear([σ(a)²])] ⊙ Y(de) + Relu[Linear([μ(a)])] }
where μ(a) is the mean of A and σ(a)² is the variance of A; [σ(a)²] denotes the vector composed of ty copies of σ(a)², and [μ(a)] denotes the vector composed of ty copies of μ(a); μ(a) ∈ R^{1×L}; σ(a)² ∈ R^{1×L}; [σ(a)²] ∈ R^{ty×L}, [μ(a)] ∈ R^{ty×L}; FC denotes a fully connected layer, Relu is the activation function, and ⊙ denotes multiplication of the data at corresponding positions of two matrices with the same data structure; Linear is a linear mapping; Linear([σ(a)²]) denotes the result of linearly mapping [σ(a)²] through a linear network, and Linear([μ(a)]) denotes the result of linearly mapping [μ(a)] through a linear network; Linear([σ(a)²]) ∈ R^{ty×N}, Linear([μ(a)]) ∈ R^{ty×N}.
Preferably, the multidimensional lightning data time series A comprises one or more of the following lightning parameters: latitude of the lightning location, longitude of the lightning location, lightning current, and number of return strokes; the lightning parameter time series y contains lightning intensity parameters, including one or more of lightning current, lightning voltage and lightning electric field.
According to the multivariate-Transformer-based long-term lightning prediction method provided by the invention, the above training method of the multivariate-Transformer-based long-term lightning prediction model is first used to obtain a long-term lightning prediction model; the multidimensional lightning data and radar echo images at the latest tx time points are then acquired to form a known sample (A, G), the known sample (A, G) is input into the long-term lightning prediction model, and the model outputs the lightning parameters at the next ty time points.
The multivariate-Transformer-based long-term lightning prediction system provided by the invention is loaded with a computer program which, when executed, implements the above multivariate-Transformer-based long-term lightning prediction method.
The invention has the advantages that:
(1) The long-term lightning prediction model of the invention better captures the time dependence of the lightning data by accumulating the lightning data with the time codes row by row; the multidimensional lightning data time series is moderated, the influence of the suddenness of lightning is reduced, and the prediction accuracy is greatly improved. In the invention, lightning parameters and radar echo images are combined into one sample, so that multiple kinds of data are used in concert and complement each other's strengths and weaknesses, further improving the accuracy of lightning prediction and extending the prediction horizon.
(2) Through correlation screening, the invention retains only the delays with strong correlation, which not only reduces the amount of computation but also discards delays that are irrelevant to, or even detrimental to, prediction accuracy, thereby ensuring the accuracy of long-term prediction.
(3) By applying time decomposition to the data, the invention can decompose a complex temporal pattern into periodic features and trend features; the periodic features and trend features are predicted separately within the model, and since the periodic features are relatively stable, the model maintains high accuracy in long-horizon prediction.
(4) In order to exploit the correlations among the multiple kinds of data, the invention provides a multi-source data fusion method that interleaves time decomposition and encoding, so that the correlations among the various kinds of data can be considered comprehensively.
(5) In summary, considering the implicit periodicity in the lightning parameters and the suddenness of lightning, a period decomposition method and a time-delay method are embedded in a Transformer, a moderation prediction mechanism is designed, and a lightning prediction model based on multiple features is constructed, improving the long-horizon prediction capability of the Transformer model.
Drawings
FIG. 1 is an architecture diagram of the long-term lightning prediction model;
FIG. 2 is a flow chart of the moderation encoding method;
FIG. 3 is a flow chart of an encoding method;
FIG. 4 is a comparison of the first set of data in the embodiment;
FIG. 5 is a comparison of the second set of data in the embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The multivariate-Transformer-based long-term lightning prediction method provided by this embodiment comprises the following steps:
St1: construct a sample library and a base model based on historical data;
the sample library stores learning samples {((A, G), y)}; A is a multidimensional lightning data time series, G is a radar echo image sequence, and A and G share the same sequence of time points; y is a lightning parameter time series, and the lightning parameters are lightning intensity parameters, including one or more of lightning current, lightning voltage and lightning electric field.
A = [a(1), a(2), …, a(t), …, a(tx)]
where a(t) denotes the multidimensional lightning data at the t-th time point in the learning sample, a(t) ∈ R^L, A ∈ R^{tx×L}; L is the total number of lightning parameters contained in a(t), a(t, n1) is the n1-th lightning parameter at the t-th time point, and tx is the total number of time points contained in A;
1 ≤ t ≤ tx; 1 ≤ n1 ≤ L;
G = [g(1), g(2), …, g(t), …, g(tx)]
where g(t) denotes the radar echo image at the t-th time point in the learning sample, g(t) ∈ R^{H×W×C}, in which H, W and C are the height, width and number of channels of the radar echo image, respectively;
y = [w(tx+1), w(tx+2), …, w(tx+t2), …, w(tx+ty)]
where w(tx+t2) denotes the lightning parameters at the t2-th time point in y, and ty denotes the total number of time points in y; 1 ≤ t2 ≤ ty; the first time point tx+1 in y immediately follows the last time point tx in A;
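For concreteness, the following is a minimal sketch of how one learning sample could be assembled under these definitions (NumPy and the concrete sizes tx, ty, L, H, W, C used here are illustrative assumptions, not values fixed by the patent):

```python
import numpy as np

# Illustrative sizes (tx, ty, L, H, W, C are set values in the text;
# the concrete numbers here are assumptions for the sketch).
tx, ty = 48, 16      # known time points / predicted time points
L = 5                # number of lightning parameters per time point
H, W, C = 64, 64, 1  # radar echo image height, width, channels

# A: multidimensional lightning data time series, A ∈ R^{tx×L}
A = np.zeros((tx, L), dtype=np.float32)   # row t is a(t)

# G: radar echo image sequence sharing A's time points, g(t) ∈ R^{H×W×C}
G = np.zeros((tx, H, W, C), dtype=np.float32)

# y: lightning parameter time series immediately following A and G in time
y = np.zeros((ty,), dtype=np.float32)     # w(tx+1) ... w(tx+ty)

sample = ((A, G), y)  # one learning sample {((A, G), y)}
```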
As shown in FIG. 1, the base model comprises an encoding part and a decoding part, the encoding part comprising: a time encoding module, a moderation encoding module, a first encoding module, a first time decomposition network, a second time decomposition network and a second encoding module.
The time encoding module encodes the time points to obtain the time code S = [s1, s2, s3, s4] corresponding to each time point; s1 is an N-dimensional vector representing the year, s2 is an N-dimensional vector representing the month, s3 is an N-dimensional vector representing the day, s4 is an N-dimensional vector representing the hour, and N is a set value.
The moderation encoding module encodes the multidimensional lightning data time series A to obtain the code V(A_corr);
the first encoding module encodes the radar echo image sequence G to obtain the code V(G_corr);
the first time decomposition network decomposes the code V(A_corr) into a periodic feature X(AS) and a trend feature X(AT), and the second time decomposition network decomposes the code V(G_corr) into a periodic feature X(GS) and a trend feature X(GT);
the second encoding module encodes the periodic features X(AS) and X(GS) to obtain the code V(AG_corr).
The decoding part comprises: a data reorganization module, a third encoding module, a third time decomposition network, a fourth encoding module, a residual network, a fourth time decomposition network and a de-moderation module;
the data reorganization module converts the periodic feature X(AS), the trend feature X(AT), the periodic feature X(GS), the trend feature X(GT) and the code V(AG_corr) from the data space R^{tx×N} into the data space R^{ty×N} to generate X(AS)', X(AT)', X(GS)', X(GT)' and V(AG_corr)';
that is, X(AS) ∈ R^{tx×N}, X(AT) ∈ R^{tx×N}, X(GS) ∈ R^{tx×N}, X(GT) ∈ R^{tx×N}, V(AG_corr) ∈ R^{tx×N}; X(AS)' is the converted X(AS), X(AT)' is the converted X(AT), X(GS)' is the converted X(GS), X(GT)' is the converted X(GT), and V(AG_corr)' is the converted V(AG_corr).
The data reorganization module converts data U in the data space R^{tx×N} into data U' in the data space R^{ty×N} as follows: the data at ta time points are extracted from U to form transition data U0, U0 is extended backwards along the time dimension by zero-filling until the number of time points reaches ty, and the U0 expanded to ty time points serves as the data U'; tx and ty are numbers of time points, N is the data dimension at a single time point; ta is a set value with ta ≤ ty.
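A minimal sketch of this conversion follows; whether the ta extracted points are taken from the start or the end of U is left open by the text, so taking the last ta points is an assumption here:

```python
import torch

def reorganize(U: torch.Tensor, ta: int, ty: int) -> torch.Tensor:
    """Convert U ∈ R^{tx×N} into U' ∈ R^{ty×N}: keep ta time points as
    transition data U0, then zero-pad along the time dimension up to ty."""
    assert ta <= ty
    U0 = U[-ta:, :]                       # assumption: the last ta time points
    pad = torch.zeros(ty - ta, U.shape[1], dtype=U.dtype)
    return torch.cat([U0, pad], dim=0)    # U' ∈ R^{ty×N}
```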
The third encoding module encodes X(AS)' and X(GS)' together to obtain the code Q(AG_corr);
the third time decomposition network decomposes Q(AG_corr) into a periodic feature X(QS) and a trend feature X(QT);
the fourth encoding module encodes the periodic feature X(QS) together with V(AG_corr)' to obtain the code V_corr;
the residual network extracts features from V_corr and passes the extracted features to the fourth time decomposition network for decomposition into a periodic feature Y(S) and a trend feature Y(T);
the five pieces of data Y(S), X(AT)', X(GT)', the trend feature X(QT) obtained by the decoding part, and Y(T) are superimposed dimension-wise to obtain the code Y(de) ∈ R^{ty×N};
the input of the de-moderation module is Y(de), and the de-moderation module obtains the predicted value Y of the lightning parameter time series y with the following formula;
Y = FC{ Relu[Linear([σ(a)²])] ⊙ Y(de) + Relu[Linear([μ(a)])] }
where μ(a) is the mean of A and σ(a)² is the variance of A; [σ(a)²] denotes the vector composed of ty copies of σ(a)², and [μ(a)] denotes the vector composed of ty copies of μ(a); μ(a) ∈ R^{1×L}; σ(a)² ∈ R^{1×L}; [σ(a)²] ∈ R^{ty×L}, [μ(a)] ∈ R^{ty×L}; FC denotes a fully connected layer, Relu is the activation function, and ⊙ denotes multiplication of the data at corresponding positions of two matrices with the same data structure; Linear is a linear mapping; Linear([σ(a)²]) denotes the result of linearly mapping [σ(a)²] through a linear network, and Linear([μ(a)]) denotes the result of linearly mapping [μ(a)] through a linear network; Linear([σ(a)²]) ∈ R^{ty×N}, Linear([μ(a)]) ∈ R^{ty×N}.
Linear([σ(a)²]) ⊙ Y(de) denotes linearly mapping [σ(a)²] into the same space as the data structure of Y(de) and then multiplying the mapped [σ(a)²] element-wise with Y(de) at corresponding positions, yielding a result with the same data structure as Y(de).
The parameters of the two Linear mappings, the parameters of the two Relu functions and the parameters of the fully connected layer in the formula for Y are all parameters to be learned; the two Linear mappings are independent of each other, and the two Relu functions are independent of each other.
The ⊙ operation is explained below with reference to hypothetical parameter vectors B1 and B2:
B1 ∈ R^{ts×N}, B2 ∈ R^{ts×N};
B1 ⊙ B2 multiplies the data at corresponding positions of B1 and B2; note that the ⊙ operation only requires its two operands to have the same data structure. Here ts is the number of time points contained in B1 and B2, and N is the data dimension at each time point of B1 and B2. For example, in the above formula for Y, ts = ty.
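The following sketch mirrors the de-moderation formula Y = FC{Relu[Linear([σ(a)²])] ⊙ Y(de) + Relu[Linear([μ(a)])]}; the module layout, layer widths and the output width of FC are assumptions, only the formula itself comes from the text:

```python
import torch
import torch.nn as nn

class DeModeration(nn.Module):
    def __init__(self, L: int, N: int, ty: int):
        super().__init__()
        self.lin_var = nn.Linear(L, N)    # Linear([σ(a)²]) ∈ R^{ty×N}
        self.lin_mean = nn.Linear(L, N)   # Linear([μ(a)])  ∈ R^{ty×N}
        self.fc = nn.Linear(N, L)         # FC; output width is an assumption
        self.ty = ty

    def forward(self, y_de, mu_a, var_a):
        # y_de ∈ R^{ty×N}; mu_a, var_a ∈ R^{1×L} are the mean/variance of A
        var_rep = var_a.repeat(self.ty, 1)                 # [σ(a)²] ∈ R^{ty×L}
        mu_rep = mu_a.repeat(self.ty, 1)                   # [μ(a)]  ∈ R^{ty×L}
        scaled = torch.relu(self.lin_var(var_rep)) * y_de  # ⊙ = element-wise
        shifted = torch.relu(self.lin_mean(mu_rep))
        return self.fc(scaled + shifted)                   # predicted Y
```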
St2: the base model performs machine learning on the learning samples in the sample library, and the learned base model is taken as the long-term lightning prediction model; the input of the long-term lightning prediction model is a sample (A, G) composed of the lightning data and radar echo images at the latest tx time points, and the output is the predicted lightning parameter time series at the next ty time points.
St3: when predicting lightning, the multidimensional lightning data and radar echo images at the latest tx time points are acquired to form a known sample (A, G); the known sample (A, G) is input into the long-term lightning prediction model, which outputs the lightning parameter time-series predicted value Y composed of the lightning parameters at the next ty time points.
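Put together, St2 and St3 could look like the skeleton below; the stand-in model, the MSE loss, the optimizer and the synthetic batch are all assumptions used only to make the training-then-prediction flow concrete:

```python
import torch
import torch.nn as nn

class StandInModel(nn.Module):
    """Trivial stand-in for the base model, used only to make the
    training/inference skeleton of St2/St3 runnable."""
    def __init__(self, tx, ty, L, feat_dim=32):
        super().__init__()
        self.flatten_a = nn.Linear(tx * L, feat_dim)
        self.head = nn.Linear(feat_dim, ty)

    def forward(self, A, G):              # G is ignored by the stand-in
        z = torch.relu(self.flatten_a(A.reshape(A.shape[0], -1)))
        return self.head(z)               # Y ∈ R^{batch×ty}

tx, ty, L = 48, 16, 5
model = StandInModel(tx, ty, L)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# St2: learn from the sample library (one synthetic batch shown)
A = torch.randn(8, tx, L); G = torch.randn(8, tx, 64, 64, 1); y = torch.randn(8, ty)
for _ in range(10):
    loss = nn.functional.mse_loss(model(A, G), y)   # loss choice is an assumption
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# St3: predict the next ty points from the latest tx points
with torch.no_grad():
    Y_pred = model(A[:1], G[:1])
```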
In this embodiment, the moderation encoding module comprises a moderation layer, a first linear network, a first correlation operation layer and a first output layer.
Referring to FIG. 2, the moderation encoding module performs the following steps St1.0–St1.4 to moderation-encode A and output V(A_corr).
St1.0: the moderation layer moderates A to obtain the normalized value A' of A; the first linear network linearly transforms A into an N-dimensional time series A_N, and linearly transforms A' into an N-dimensional time series A'_N.
The input of the moderation layer is A, and the output is the normalized value A' of A, A' ∈ R^{tx×L};
A' = [a'(1), a'(2), …, a'(t), …, a'(tx)]
a'(t) ∈ R^L; a'(t) is the data at the t-th time point in A', μ(a) is the mean of A, and σ(a) is the standard deviation of A;
μ(a) = (Σ_{t=1}^{tx} a(t)) / tx
σ(a)² = (Σ_{t=1}^{tx} (a(t) − μ(a))²) / tx
a'(t) = (a(t) − μ(a)) / σ(a)
The inputs of the first linear network are A and A', and the outputs are A_N and A'_N; A_N ∈ R^{tx×N}, A'_N ∈ R^{tx×N}.
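A sketch of St1.0 (moderation layer plus first linear network); the tensor shapes follow the text, while the module sizes are assumptions:

```python
import torch
import torch.nn as nn

class ModerationLayer(nn.Module):
    """St1.0: normalize A with its own mean/std, then lift A and A' to N dims."""
    def __init__(self, L: int, N: int):
        super().__init__()
        self.lift_a = nn.Linear(L, N)       # A  -> A_N  ∈ R^{tx×N}
        self.lift_a_norm = nn.Linear(L, N)  # A' -> A'_N ∈ R^{tx×N}

    def forward(self, A: torch.Tensor):
        mu = A.mean(dim=0, keepdim=True)                  # μ(a) ∈ R^{1×L}
        var = A.var(dim=0, unbiased=False, keepdim=True)  # σ(a)² ∈ R^{1×L}
        A_norm = (A - mu) / torch.sqrt(var + 1e-8)        # a'(t) = (a(t)-μ)/σ
        return self.lift_a(A), self.lift_a_norm(A_norm), mu, var
```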
St1.1: the data at each time point in A'_N is accumulated with the time code of that time point row by row to form a vector X(A), and the data at each time point in A_N is accumulated with the time code of that time point row by row to form a vector X(0); X(A) is transformed into Q(A), K(A) and V(A) by different linear transformations, and X(0) is transformed into Q and K by different linear transformations.
Specifically, in this step, A_N ∈ R^{tx×N}, A'_N ∈ R^{tx×N}; the time code is S = [s1, s2, s3, s4]; s1 is an N-dimensional vector representing the year, s2 an N-dimensional vector representing the month, s3 an N-dimensional vector representing the day, s4 an N-dimensional vector representing the hour, and N is a set value. The vector X(0), obtained by superimposing each element in A_N with the time code of the corresponding time point, therefore has the same data structure as A_N, i.e. X(0) ∈ R^{tx×N}; similarly X(A) ∈ R^{tx×N}.
In this embodiment, taking A_N as an example, the formation of the vector X(0) by accumulating the data at each time point with the time code of that time point row by row is explained as follows.
Assume A_N = {a_1; a_2; …; a_i1; …; a_tx}, where a_tx denotes the N-dimensional data feature at the tx-th time point in A_N, a_1 and a_2 denote the N-dimensional data features at the 1st and 2nd time points in A_N, and a_i1 denotes the N-dimensional data feature at the i1-th time point in A_N; 1 ≤ i1 ≤ tx.
Let the time code sequence corresponding to A_N be S_A_N = {S(1), S(2), …, S(i1), …, S(tx)}; S(i1) is the time code of the i1-th time point, namely S(i1) = [s1_i1, s2_i1, s3_i1, s4_i1], where s1_i1, s2_i1, s3_i1 and s4_i1 are the N-dimensional vectors for year, month, day and hour, respectively;
then the vector X(0), formed by accumulating the data at each time point in A_N with the time code of that time point row by row, is expressed as follows:
X(0)={x0_1;x0_2;…;x0_i1;…;x0_tx}
x0_i1={x0_i1_1;x0_i1_2;…;x0_i1_n;…;x0_i1_N}
x0_i1_n=a_i1_n+s1_i1_n+s2_i1_n+s3_i1_n+s4_i1_n
where x0_i1 denotes the data feature at the i1-th time point in X(0), x0_i1_n denotes the feature value in the n-th dimension of x0_i1, 1 ≤ n ≤ N; a_i1_n denotes the feature value in the n-th dimension of a_i1, and s1_i1_n, s2_i1_n, s3_i1_n and s4_i1_n denote the feature values in the n-th dimension of s1_i1, s2_i1, s3_i1 and s4_i1, respectively.
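A sketch of this row-by-row accumulation, with the four time codes realized as learned embeddings (the embedding realization and vocabulary sizes are assumptions; the text only requires four N-dimensional vectors per time point):

```python
import torch
import torch.nn as nn

class TimeCodeAccumulate(nn.Module):
    """x0_i1_n = a_i1_n + s1_i1_n + s2_i1_n + s3_i1_n + s4_i1_n"""
    def __init__(self, N: int):
        super().__init__()
        self.year = nn.Embedding(200, N)   # vocabulary sizes are assumptions
        self.month = nn.Embedding(13, N)
        self.day = nn.Embedding(32, N)
        self.hour = nn.Embedding(24, N)

    def forward(self, A_N: torch.Tensor, stamps: torch.Tensor):
        # A_N ∈ R^{tx×N}; stamps ∈ Z^{tx×4} holds (year index, month, day, hour)
        s = (self.year(stamps[:, 0]) + self.month(stamps[:, 1])
             + self.day(stamps[:, 2]) + self.hour(stamps[:, 3]))
        return A_N + s                     # X(0) ∈ R^{tx×N}; X(A) is analogous
```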
St1.2: a delay τ is set; τ ∈ [1, tx−1] is traversed, and the correlation R(A, τ) corresponding to each τ is calculated by the following formula.
Let P = [p(1), p(2), …, p(t), …, p(tx)].
The delay vector of P is the data vector formed by moving the data at the first τ time points of P to the end of the sequence, i.e. [p(τ+1), p(τ+2), …, p(tx), p(1), p(2), …, p(τ)]; the data at the t-th time point in the delay vector of P is the t-th of the tx entries in [p(τ+1), p(τ+2), …, p(tx), p(1), p(2), …, p(τ)].
R(A, τ) = (1/tx) Σ_{t=1}^{tx} Af(t)
Af(t) = Linear(σ(a)²) ⊙ q(A,t) × k(A,t,τ)^T + μ(Q)^T × k(t,τ)^T + q(t) × μ(K) − μ(Q)^T × μ(K)^T
where Af(t) is an intermediate quantity and σ(a) is the standard deviation of A; q(A,t) denotes the data vector corresponding to the t-th time point in Q(A); k(A,t,τ) denotes the data vector corresponding to the t-th time point in the delay vector of K(A), and k(A,t) denotes the data vector corresponding to the t-th time point in K(A); ^T denotes matrix transposition; μ(Q) is the mean of Q and μ(K) is the mean of K; k(t,τ) denotes the data vector corresponding to the t-th time point in the delay vector of K; q(t) denotes the data vector corresponding to the t-th time point in Q.
σ(a)² is the variance of A; Linear is a linear transformation; Linear(σ(a)²) denotes the result of linearly mapping σ(a)² through a linear network, Linear(σ(a)²) ∈ R^{1×N}; ⊙ denotes multiplication of the data at corresponding positions of two matrices with the same data structure; that is, Linear(σ(a)²) ⊙ q(A,t) denotes linearly mapping σ(a)² into the same space as the data structure of q(A,t) and then multiplying the mapped σ(a)² element-wise with q(A,t), yielding a result with the same data structure as q(A,t).
It should be noted that the three linear transformations in the formula for Y and the two linear transformations in Af(t) are independent of one another; the parameters of these linear transformations are all parameters of the base model to be learned.
Specifically: Q(A) = [q(A,1), q(A,2), …, q(A,t), …, q(A,tx)]
q(A,t) = [q(1,A,t), q(2,A,t), …, q(N,A,t)]
q(A,t) denotes the data vector corresponding to the t-th time point in Q(A); q(1,A,t), q(2,A,t), …, q(N,A,t) denote the data in dimensions 1, 2, …, N of q(A,t), respectively.
K(A) = [k(A,1), k(A,2), …, k(A,t), …, k(A,tx)]
k(A,t) = [k(1,A,t), k(2,A,t), …, k(N,A,t)]
k(A,t) denotes the data vector corresponding to the t-th time point in K(A); k(1,A,t), k(2,A,t), …, k(N,A,t) denote the data in dimensions 1, 2, …, N of k(A,t), respectively; k(A,t,τ) denotes the data vector corresponding to the t-th time point in the delay vector of K(A).
V(A) = [v(A,1), v(A,2), …, v(A,t), …, v(A,tx)]
v(A,t) = [v(1,A,t), v(2,A,t), …, v(N,A,t)]
v(A,t) denotes the data vector corresponding to the t-th time point in V(A); v(1,A,t), v(2,A,t), …, v(N,A,t) denote the data in dimensions 1, 2, …, N of v(A,t), respectively.
Q = [q(1), q(2), …, q(t), …, q(tx)]
q(t) = [q(1,t), q(2,t), …, q(N,t)]
q(t) denotes the data vector corresponding to the t-th time point in Q; q(1,t), q(2,t), …, q(N,t) denote the data in dimensions 1, 2, …, N of q(t), respectively.
K = [k(1), k(2), …, k(t), …, k(tx)]
k(t) = [k(1,t), k(2,t), …, k(N,t)]
k(t) denotes the data vector corresponding to the t-th time point in K; k(1,t), k(2,t), …, k(N,t) denote the data in dimensions 1, 2, …, N of k(t), respectively; k(t,τ) is the data vector corresponding to the t-th time point in the delay vector of K.
St1.3: R(A, τ) is sorted from largest to smallest, and the τ values corresponding to the R(A, τ) ranked in the first h positions form the high-correlation delay set [τ1, τ2, …, τh]; h is a set value related to the number of time points tx contained in the input data A of the moderation encoding module.
St1.4: the first output layer activates τ1, τ2, …, τh through an activation function to obtain the corresponding probability distribution values P(A,τ1), P(A,τ2), …, P(A,τh); the output layer also calculates the moderation code V(A_corr) with the following formula and outputs it.
V(A_corr) = Σ_{i=1}^{h} P(A, τi) × V(A, τi)
where V(A, τi) is the vector obtained by moving the data at the first τi time points of V(A) to the end of the sequence:
V(A) = [v(A,1), v(A,2), …, v(A,t), …, v(A,tx)]
V(A, τi) = [v(A,τi+1), v(A,τi+2), …, v(A,tx), v(A,1), v(A,2), …, v(A,τi)]
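A sketch of St1.2–St1.4 follows; the softmax is an assumed realization of the activation, the helper name is illustrative, and h = tx/6 follows the setting used in the embodiment later:

```python
import torch

def moderated_delay_code(QA, KA, VA, Q, K, var_lin):
    """St1.2-St1.4: compute R(A,τ) for τ ∈ [1, tx-1], keep the h largest,
    softmax-activate them (activation choice is an assumption) and aggregate
    the delayed V(A). All inputs are R^{tx×N}; var_lin = Linear(σ(a)²) ∈ R^{1×N}."""
    tx = QA.shape[0]
    h = max(1, tx // 6)                      # set value related to tx
    muQ = Q.mean(dim=0)                      # μ(Q) ∈ R^N
    muK = K.mean(dim=0)                      # μ(K) ∈ R^N
    scores = []
    for tau in range(1, tx):
        KA_d = torch.roll(KA, -tau, dims=0)  # delay vector of K(A)
        K_d = torch.roll(K, -tau, dims=0)    # delay vector of K
        # Σ_t Af(t), matching R(A,τ) = (1/tx) Σ_t Af(t)
        af = ((var_lin * QA) * KA_d).sum() + (muQ * K_d).sum() \
             + (Q * muK).sum() - tx * (muQ * muK).sum()
        scores.append(af / tx)
    scores = torch.stack(scores)             # R(A,τ), τ = 1 ... tx-1
    top_r, top_idx = torch.topk(scores, h)   # high-correlation delay set
    p = torch.softmax(top_r, dim=0)          # P(A,τi)
    out = torch.zeros_like(VA)
    for pi, idx in zip(p, top_idx):
        out = out + pi * torch.roll(VA, -(int(idx) + 1), dims=0)  # V(A,τi)
    return out                               # moderation code V(A_corr)
```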
In this embodiment, the first encoding module comprises: a feature extraction network, a dimension adjustment network, a second linear transformation network, a second correlation operation layer and a second output layer.
The input of the feature extraction network is G, and its output is the extracted features of G; the extracted features of G are fed into the dimension adjustment network, which adjusts their dimensions and outputs the adjusted data G' ∈ R^{tx×N};
G' = [g'(1), g'(2), …, g'(t), …, g'(tx)]
where g'(t) is the N-dimensional feature data at the t-th time point in G';
specifically, the dimension adjustment network comprises a downsampling layer and a fully connected layer; the downsampling layer reduces the resolution of each image g(t) in G to a set value, and the reduced-resolution image features are then mapped into the N-dimensional space through the fully connected layer to form G'.
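A sketch of the feature extraction and dimension adjustment networks; the convolutional extractor, channel-first layout and adaptive average pooling are assumptions, since the text only requires feature extraction, downsampling to a set resolution, and a fully connected mapping into N dimensions:

```python
import torch
import torch.nn as nn

class RadarDimAdjust(nn.Module):
    """Map each radar echo image g(t) to an N-dim feature; stacking over t
    gives G' ∈ R^{tx×N}."""
    def __init__(self, C: int, N: int, out_res: int = 8):
        super().__init__()
        self.features = nn.Sequential(       # feature extraction network (assumed CNN)
            nn.Conv2d(C, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.down = nn.AdaptiveAvgPool2d(out_res)   # downsample to a set resolution
        self.fc = nn.Linear(16 * out_res * out_res, N)

    def forward(self, G: torch.Tensor):
        # G ∈ R^{tx×C×H×W} (channel-first layout assumed)
        z = self.down(self.features(G))
        return self.fc(z.flatten(1))         # G' ∈ R^{tx×N}
```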
Referring to FIG. 3, the first encoding module performs the following steps St2.1–St2.4 to obtain the high-correlation delay set and the code V(G_corr):
St2.1: the data at each time point in G' is accumulated with the time code of that time point row by row to form a vector X(G), and the second linear transformation network transforms X(G) into Q(G), K(G) and V(G) by different linear transformations.
St2.2: a delay τ is set; the second correlation operation layer traverses τ ∈ [1, tx−1] and calculates the correlation R(G, τ) corresponding to each τ by the following formula;
R(G, τ) = (1/tx) Σ_{t=1}^{tx} q(G,t) × k(G,t,τ)^T
where q(G,t) is the data vector corresponding to the t-th time point in Q(G), and k(G,t,τ) is the data vector corresponding to the t-th time point in the delay vector of K(G).
St2.3: the second correlation operation layer sorts R(G, τ) from largest to smallest, and the τ values corresponding to the R(G, τ) ranked in the first h positions form the high-correlation delay set τ_G = [τ_G1, τ_G2, …, τ_Gh].
St2.4: the second output layer activates τ_G1, τ_G2, …, τ_Gh through an activation function to obtain the corresponding probability distribution values P(G,1), P(G,2), …, P(G,h); the second output layer also calculates the code V(G_corr) with the following formula and outputs it;
V(G_corr) = Σ_{i=1}^{h} P(G, i) × V(G, τ_Gi)
where V(G, τ_Gi) is the vector obtained by moving the data at the first τ_Gi time points of V(G) to the end of the sequence.
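The second to fifth correlation operation layers and output layers all share this structure: a plain correlation score per delay, top-h selection, activation, and aggregation of the delayed V. A sketch of the shared pattern (softmax again assumed as the activation):

```python
import torch

def autocorr_delay_code(Qx, Kx, Vx, h: int):
    """Shared pattern of the second to fifth correlation operation layers:
    score each delay τ by (1/t) Σ_t q(t)·k(t,τ), keep the h best, activate,
    and aggregate the delayed V. Qx, Kx, Vx ∈ R^{t×N}."""
    t = Qx.shape[0]
    scores = torch.stack([
        (Qx * torch.roll(Kx, -tau, dims=0)).sum() / t   # R(·,τ)
        for tau in range(1, t)])
    top_r, top_idx = torch.topk(scores, h)
    p = torch.softmax(top_r, dim=0)                      # P(·,i)
    out = torch.zeros_like(Vx)
    for pi, idx in zip(p, top_idx):
        out = out + pi * torch.roll(Vx, -(int(idx) + 1), dims=0)
    return out   # e.g. V(G_corr) from Q(G), K(G), V(G)
```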
In this embodiment, the second encoding module comprises: a third linear transformation network, a third correlation operation layer and a third output layer;
the second encoding module performs the following steps St3.1–St3.4 to obtain the code V(AG_corr).
St3.1: the third linear transformation network transforms X(AS) into Q(en) and V(en) by different linear transformations, and transforms X(GS) into K(en) by a linear transformation.
St3.2: a delay τ is set; the third correlation operation layer traverses τ ∈ [1, tx−1] and calculates the correlation R(AG, τ) corresponding to each τ by the following formula;
R(AG, τ) = (1/tx) Σ_{t=1}^{tx} q(en,t) × k(en,t,τ)^T
where q(en,t) is the data vector corresponding to the t-th time point in Q(en), and k(en,t,τ) is the data vector corresponding to the t-th time point in the delay vector of K(en).
St3.3: R(AG, τ) is sorted from largest to smallest, and the τ values corresponding to the R(AG, τ) ranked in the first h positions form the high-correlation delay set [τ(1,1), τ(1,2), …, τ(1,h)].
St3.4: the third output layer activates τ(1,1), τ(1,2), …, τ(1,h) through an activation function to obtain the corresponding probability distribution values P(AG,1), P(AG,2), …, P(AG,h); the output layer also calculates the code V(AG_corr) with the following formula and outputs it;
V(AG_corr) = Σ_{i=1}^{h} P(AG, i) × V(en, τi)
where V(en, τi) is the vector obtained by moving the data at the first τi time points of V(en) to the end of the sequence.
In this embodiment, the third encoding module comprises: a fourth linear transformation network, a fourth correlation operation layer and a fourth output layer;
the third encoding module performs the following steps St4.1–St4.4 to obtain the code Q(AG_corr).
St4.1: the fourth linear transformation network transforms X(AS)' into Q(en_1) and V(en_1) by different linear transformations, and transforms X(GS)' into K(en_1) by a linear transformation;
St4.2: a delay τ is set; the fourth correlation operation layer traverses τ ∈ [1, ty−1] and calculates the correlation R(en, τ) corresponding to each τ by the following formula;
R(en, τ) = (1/ty) Σ_{t=1}^{ty} q(en_1,t) × k(en_1,t,τ)^T
where q(en_1,t) is the data vector corresponding to the t-th time point in Q(en_1), and k(en_1,t,τ) is the data vector corresponding to the t-th time point in the delay vector of K(en_1).
St4.3: the fourth correlation operation layer sorts R(en, τ) from largest to smallest, and the τ values corresponding to the R(en, τ) ranked in the first h' positions form the high-correlation delay set [τ(2,1), τ(2,2), …, τ(2,h')]; h' is a set value related to ty.
St4.4: the fourth output layer activates τ(2,1), τ(2,2), …, τ(2,h') through an activation function to obtain the corresponding probability distribution values P(en,1), P(en,2), …, P(en,h'); the output layer also calculates the code Q(AG_corr) with the following formula and outputs it;
Q(AG_corr) = Σ_{i=1}^{h'} P(en, i) × V(en_1, τi)
where V(en_1, τi) is the vector obtained by moving the data at the first τi time points of V(en_1) to the end of the sequence.
In this embodiment, the fourth encoding module comprises: a fifth linear transformation network, a fifth correlation operation layer and a fifth output layer;
the fourth encoding module performs the following steps St5.1–St5.4 to obtain the code V_corr:
St5.1: the fifth linear transformation network transforms the periodic feature X(QS) into Q(en_2) by a linear transformation, and transforms V(AG_corr)' into V(en_2) and K(en_2) by different linear transformations;
St5.2: a delay τ is set; the fifth correlation operation layer traverses τ ∈ [1, ty−1] and calculates the correlation R(co, τ) corresponding to each τ by the following formula;
R(co, τ) = (1/ty) Σ_{t=1}^{ty} q(en_2,t) × k(en_2,t,τ)^T
where q(en_2,t) is the data vector corresponding to the t-th time point in Q(en_2), and k(en_2,t,τ) is the data vector corresponding to the t-th time point in the delay vector of K(en_2).
St5.3: the fifth correlation operation layer sorts R(co, τ) from largest to smallest, and the τ values corresponding to the R(co, τ) ranked in the first h' positions form the high-correlation delay set [τ(3,1), τ(3,2), …, τ(3,h')].
St5.4: the fifth output layer activates τ(3,1), τ(3,2), …, τ(3,h') through an activation function to obtain the corresponding probability distribution values P(co,1), P(co,2), …, P(co,h'); the output layer also calculates the code V_corr with the following formula and outputs it;
V_corr = Σ_{i=1}^{h'} P(co, i) × V(en_2, τi)
where V(en_2, τi) is the vector obtained by moving the data at the first τi time points of V(en_2) to the end of the sequence.
The long-term lightning prediction model obtained by the above training method is verified below with reference to a specific embodiment.
In this embodiment, lightning location information and radar base reflectivity data for a region during 2020–2022 are selected as the study object. To synchronize the lightning location information with the radar base reflectivity in time, 6 minutes is uniformly chosen as the unit time, i.e. the interval between two adjacent time points is 6 minutes.
In this embodiment, the lightning intensity in the above region is predicted using the above long-term lightning prediction model, and the predicted results are compared with the actual values.
In this embodiment, h = tx/6 and h' = ty/4 are set.
In this embodiment, tx = 48 and h = 8. The multidimensional lightning data time series A includes the microsecond-level current, the latitude of the lightning cloud centroid, the longitude of the lightning cloud centroid, the lightning strike current and the ground strike current; the output of the long-term lightning prediction model is a lightning intensity time series, i.e. the lightning intensity at ty future time points; in this embodiment the lightning intensity is taken as the lightning current.
In this embodiment, the long-term lightning prediction model predicts the lightning intensity at the next ty time points from the known latitude of the lightning location, longitude of the lightning location, lightning current, number of return strokes and radar echo images at the latest 48 time points.
In this embodiment, two comparison models are also provided: a conventional LSTM prediction model and a conventional Transformer prediction model.
In this embodiment, ta = ty = 16, h = 8 and h' = 4 are set; the predicted values obtained by the long-term lightning prediction model provided by the invention (abbreviated as the predictions of this method) are compared with the true values, the predictions of the LSTM model (abbreviated as LSTM prediction) and the predictions of the Transformer model (abbreviated as Transformer prediction); the comparison results are shown in FIG. 4.
In this embodiment, ta = ty = 24, h = 8 and h' = 6 are set; the predicted values obtained by the long-term lightning prediction model provided by the invention (abbreviated as the predictions of this method) are compared with the true values, the predictions of the LSTM model (abbreviated as LSTM prediction) and the predictions of the Transformer model (abbreviated as Transformer prediction); the comparison results are shown in FIG. 5.
As can be seen from FIG. 4, when the prediction horizon is within 6 unit times, the predictions of this method show no significant improvement over the LSTM and Transformer predictions; but once the prediction horizon exceeds 6 unit times, the predictions of this method are closer to the true values than the LSTM and Transformer predictions, and the further ahead the prediction, the greater the accuracy improvement of this method.
As can be seen from FIG. 5, when the prediction horizon is within 7 unit times, the predictions of this method, the LSTM predictions and the Transformer predictions all track the true values, showing that conventional models can achieve good results in short-term prediction. However, once the prediction horizon exceeds 7 unit times, the gap between the true values and the LSTM and Transformer predictions grows larger and larger; only the predictions of this method remain close to the true values, and especially when the prediction horizon exceeds 16 unit times, the accuracy of this method improves further and comes extremely close to the true values. Hence, compared with conventional methods, this method still maintains high accuracy in long-term prediction tasks.
It can be seen that in lightning prediction tasks the prediction accuracy of the LSTM model and the conventional Transformer model deteriorates greatly as the prediction horizon grows, whereas the long-term lightning prediction model provided by the invention still performs excellently in long-term lightning prediction tasks.
It will be understood by those skilled in the art that the present invention is not limited to the details of the foregoing exemplary embodiments, but includes other specific forms of the same or similar structures that may be embodied without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only, and those skilled in the art should take the specification as a whole, the technical solutions of the embodiments being combinable as appropriate to form other embodiments apparent to those skilled in the art.
The technology, shapes, and structural parts of the present invention that are not described in detail are known in the art.
Claims (9)
1. A training method of a lightning long-term prediction model based on a multi-element Transformer, characterized by constructing a sample library and a basic model based on historical data, making the basic model perform machine learning on the learning samples in the sample library, and taking the converged basic model as the lightning long-term prediction model, wherein:
the sample library stores learning samples {(A, G), y}; (A, G) is a time-series sample, where A is a multidimensional lightning data time series and G is a radar echo image sequence; y is a lightning parameter time series; the time series of y immediately follows the time series of A and G;
the input of the basic model is a known time-series sample (A, G), and its output is a predicted value Y of the lightning parameter time series; Y has the same data structure as y; the basic model encodes and decodes the time-series sample (A, G) to obtain the predicted value Y; a moderating encoding module is provided in the basic model to obtain the moderated code of the multidimensional lightning data time series A and feed it into subsequent processing;
the moderating encoding module comprises a moderating layer, a first linear network, a first correlation operation layer and an activation output layer;
the moderating layer obtains a normalized value A' of A; the first linear network performs linear transformation on A and A', linearly transforming A into an N-dimensional time series A_N and A' into an N-dimensional time series A'_N; A_N ∈ R^{tx×N}, A'_N ∈ R^{tx×N}; tx is the number of time points contained in the time-series sample (A, G), and N is the dimension of the data feature at each time point of A_N and A'_N;
the first correlation operation layer adds, row by row, the data at each time point in A'_N to the time code of that time point to form a vector X(A), and likewise adds the data at each time point in A_N to the time code of that time point to form a vector X(0); the time code is a multi-row vector in which each row is an N-dimensional feature array; X(A) is transformed into Q(A), K(A) and V(A) by different linear transformations, and X(0) is transformed into Q and K by different linear transformations; a delay τ is set, and the delay vector of a vector P containing multiple time points is defined as the data vector formed by moving the data at the first τ time points of P to the end of the sequence; the first correlation operation layer traverses τ ∈ [1, tx-1] and calculates the correlation R(A, τ) for each τ, combining at least part of Q(A), K(A), V(A), Q and K themselves with the delay vectors of some of them; it then takes the h delays τ with the largest correlations R(A, τ) to form a high-correlation delay set [τ1, τ2, …, τh], where h is a set value related to the number of time points in X(A); the first correlation operation layer calculates the moderated code V(A_corr) of A with the following formula (a sketch of this aggregation follows this claim);
V(A_corr) = Σ_{i=1}^{h} P(A, τi) × V(A, τi)
where V(A, τi) is the vector obtained by moving the data at the first τi time points of V(A) to the end of the sequence, and P(A, τi) is the activation value of the i-th delay τi in the high-correlation delay set;
the basic model includes an encoding portion and a decoding portion; the encoding portion comprises a time encoding module, the moderating encoding module, a first time decomposition network, a second time decomposition network and a second encoding module; the decoding portion comprises a data reorganization module, a third encoding module, a third time decomposition network, a fourth encoding module, a residual network, a fourth time decomposition network and a de-moderating module;
the time encoding module encodes a time point to obtain the time code of that time point; the moderating encoding module encodes the multidimensional lightning data time series A to obtain the code V(A_corr); the first encoding module encodes the radar echo image sequence G to obtain the code V(G_corr); the first time decomposition network decomposes the code V(A_corr) into a periodic characteristic X(AS) and a trend characteristic X(AT), and the second time decomposition network decomposes the code V(G_corr) into a periodic characteristic X(GS) and a trend characteristic X(GT); the second encoding module encodes the periodic characteristics X(AS) and X(GS) to obtain a code V(AG_corr);
the data reorganization module converts the periodic characteristic X(AS), trend characteristic X(AT), periodic characteristic X(GS), trend characteristic X(GT) and code V(AG_corr) from data space R^{tx×N} into X(AS)', X(AT)', X(GS)', X(GT)' and V(AG_corr)' in data space R^{ty×N}, respectively; the third encoding module encodes X(AS)' and X(GS)' together to obtain a code Q(AG_corr); the third time decomposition network decomposes Q(AG_corr) into a periodic characteristic X(QS) and a trend characteristic X(QT); the fourth encoding module encodes the periodic characteristic X(QS) together with V(AG_corr)' to obtain the code V_corr; the residual network performs feature extraction on V_corr and passes the extracted features to the fourth time decomposition network for decomposition into a periodic characteristic Y(S) and a trend characteristic Y(T); the five items Y(S), Y(T), X(AT)', X(GT)' and the trend characteristic X(QT) obtained in the decoding portion are superposed dimension-wise to obtain the code Y(de) ∈ R^{ty×N}; the input of the de-moderating module is Y(de), and it outputs the predicted value Y of the lightning parameter time series y;
the inputs of the time encoding module, the moderating encoding module and the first encoding module are connected to the input of the basic model; the output of the de-moderating module serves as the output of the basic model;
the second encoding module comprises a third linear transformation network, a third correlation operation layer and a third output layer; the third linear transformation network transforms X(AS) into Q(en) and V(en) by different linear transformations, and transforms X(GS) into K(en) by a linear transformation;
the third encoding module comprises a fourth linear transformation network, a fourth correlation operation layer and a fourth output layer; the fourth linear transformation network transforms X(AS)' into Q(en_1) and V(en_1) by different linear transformations, and transforms X(GS)' into K(en_1) by a linear transformation;
the fourth encoding module comprises a fifth linear transformation network, a fifth correlation operation layer and a fifth output layer; the fifth linear transformation network transforms the periodic characteristic X(QS) into Q(en_2) by a linear transformation, and transforms V(AG_corr)' into V(en_2) and K(en_2) by different linear transformations;
the parameters of the first, second, third and fourth encoding modules are mutually independent; the third, fourth and fifth correlation operation layers have the same network structure as the second correlation operation layer; and the second, third, fourth and fifth output layers have the same network structure.
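To make claim 1's moderating layer and delay-based aggregation concrete, here is a minimal NumPy sketch. It assumes per-feature standardization for the moderating layer and a softmax over the top-h correlations for the activation values P(A, τi); the function names and random stand-in weights are ours, not the patent's.

```python
import numpy as np

def stationarize(A):
    """Moderating layer: return a normalized value A' of A (assumed here
    to be per-feature standardization; the claim does not fix the form)."""
    mu = A.mean(axis=0, keepdims=True)
    var = A.var(axis=0, keepdims=True)
    return (A - mu) / np.sqrt(var + 1e-8), mu, var

def aggregate_delays(V, R, h):
    """V(A_corr) = sum_i P(A, tau_i) * V(A, tau_i): keep the h delays tau
    with the largest correlation R[tau], turn those correlations into
    activation values via softmax, and sum the rolled copies of V."""
    taus = np.argsort(R[1:])[::-1][:h] + 1          # top-h delays in [1, tx-1]
    w = np.exp(R[taus] - R[taus].max())
    w = w / w.sum()                                  # P(A, tau_i)
    # V(A, tau_i): move the first tau_i time points to the end of the sequence
    return sum(wi * np.roll(V, -int(t), axis=0) for wi, t in zip(w, taus))

tx, N, h = 48, 64, 8
A = np.random.randn(tx, 5)                   # multidimensional lightning data
A_prime, mu, var = stationarize(A)
X_A = A_prime @ np.random.randn(5, N) + np.random.randn(tx, N)  # + stand-in time code
V = X_A @ np.random.randn(N, N)              # stands in for the V(A) transform
R = np.random.rand(tx)                       # R(A, tau); claim 2 gives the real formula
V_A_corr = aggregate_delays(V, R, h)
print(V_A_corr.shape)                        # (48, 64)
```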
2. The training method of a lightning long-term prediction model based on a multi-element Transformer according to claim 1, wherein:
R(A, τ) = (1/tx) Σ_{t=1}^{tx} Af(t)
Af(t) = Linear(σ(a)²) ⊙ q(A, t) × k(A, t, τ)^T + μ(Q)^T × k(t, τ)^T + q(t) × μ(K) − μ(Q)^T × μ(K)^T
where Af(t) is a transition parameter; q(A, t) denotes the data vector at the t-th time point of Q(A); k(A, t, τ) denotes the data vector at the t-th time point of the delay vector of K(A), and k(A, t) denotes the data vector at the t-th time point of K(A); T denotes matrix transposition; μ(Q) is the mean of Q and μ(K) is the mean of K; k(t, τ) denotes the data vector at the t-th time point of the delay vector of K; q(t) denotes the data vector at the t-th time point of Q;
σ(a)² is the variance of A; Linear(σ(a)²) ⊙ q(A, t) denotes linearly mapping σ(a)² into the same space as the data structure of q(A, t) and then multiplying the mapped σ(a)² element-wise with q(A, t), which yields a result with the same data structure as q(A, t).
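A hedged sketch of claim 2's correlation, under the assumption that every cross term in Af(t) reduces to a dot product over the N feature dimensions (the transposes are ambiguous after translation) and with `lin_var` standing in for the learned Linear(σ(a)²) map:

```python
import numpy as np

def correlation(QA, KA, Q, K, lin_var, tau):
    """R(A, tau) = (1/tx) * sum_t Af(t) with
    Af(t) = (lin_var ⊙ q(A,t)) · k(A,t,tau) + mu(Q) · k(t,tau)
            + q(t) · mu(K) - mu(Q) · mu(K)."""
    KA_tau = np.roll(KA, -tau, axis=0)      # delay vector of K(A)
    K_tau = np.roll(K, -tau, axis=0)        # delay vector of K
    muQ, muK = Q.mean(axis=0), K.mean(axis=0)
    Af = ((lin_var * QA) * KA_tau).sum(axis=1) \
         + K_tau @ muQ + Q @ muK - muQ @ muK
    return Af.mean()

tx, N = 48, 64
QA, KA = np.random.randn(tx, N), np.random.randn(tx, N)   # from X(A)
Q, K = np.random.randn(tx, N), np.random.randn(tx, N)     # from X(0)
lin_var = np.random.randn(N)             # Linear(sigma(a)^2), mapped to N dims
R = np.zeros(tx)
for tau in range(1, tx):                 # traverse tau in [1, tx-1]
    R[tau] = correlation(QA, KA, Q, K, lin_var, tau)
```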
3. The training method of a lightning long-term prediction model based on a multi-element Transformer according to claim 1, wherein a first encoding module is built into the basic model to obtain the code V(G_corr) of the radar echo image sequence G and feed it into subsequent processing; the first encoding module comprises a feature extraction network, a dimension adjustment network, a second linear transformation network, a second correlation operation layer and a second output layer;
the feature extraction network extracts the features of G and feeds them to the dimension adjustment network, which adjusts their dimensions and outputs the adjusted data G' ∈ R^{tx×N};
the data at each time point in G' is added, row by row, to the time code of that time point to form a vector X(G); the second linear transformation network transforms X(G) into Q(G), K(G) and V(G) by different linear transformations;
the second correlation operation layer obtains the code V(G_corr) from Q(G), K(G) and V(G); it traverses τ ∈ [1, tx-1] and calculates the correlation R(G, τ) for each τ from Q(G) and K(G), then takes the h delays τ with the largest correlations R(G, τ) to form a high-correlation delay set τ_G = [τ_G1, τ_G2, …, τ_Gh], where h is a set value related to the number of time points in X(G);
the second output layer activates each delay in the high-correlation delay set τ_G and computes the code V(G_corr) with the following formula;
V(G_corr) = Σ_{i=1}^{h} P(G, i) × V(G, τ_Gi)
where V(G, τ_Gi) is the vector obtained by moving the data at the first τ_Gi time points of V(G) to the end of the sequence, and P(G, i) is the activation value of the i-th delay τ_Gi in the high-correlation delay set.
4. The training method of a lightning long-term prediction model based on a multi-element Transformer according to claim 1, wherein:
R(G, τ) = (1/tx) Σ_{t=1}^{tx} q(G, t) × k(G, t, τ)^T
where q(G, t) is the data vector at the t-th time point of Q(G), and k(G, t, τ) is the data vector at the t-th time point of the delay vector of K(G).
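Claim 4's image-branch correlation is a plain averaged dot product, which a few lines of NumPy capture; QG and KG here are random stand-ins for the transformed image features of claim 3:

```python
import numpy as np

def correlation_G(QG, KG, tau):
    KG_tau = np.roll(KG, -tau, axis=0)       # delay vector of K(G)
    return (QG * KG_tau).sum(axis=1).mean()  # average of per-time dot products

tx, N = 48, 64
QG, KG = np.random.randn(tx, N), np.random.randn(tx, N)
R_G = np.array([correlation_G(QG, KG, t) for t in range(1, tx)])
```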
5. The training method of a lightning long-term prediction model based on a multi-element Transformer according to claim 4, wherein the dimension adjustment network comprises a downsampling layer and a fully connected layer; the downsampling layer reduces the resolution of each image G(t) in G to a set value, and the fully connected layer then maps the reduced-resolution image features into an N-dimensional space to form G'.
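A minimal sketch of claim 5's dimension adjustment network, assuming average pooling for the downsampling layer and a random matrix standing in for the trained fully connected layer; the pool size, image resolution and N are illustrative values, not fixed by the patent:

```python
import numpy as np

def adjust_dims(G, pool=8, N=64, W=None):
    """G: (tx, H, Wd) radar echo frames -> G': (tx, N)."""
    tx, H, Wd = G.shape
    g = G[:, :H - H % pool, :Wd - Wd % pool]       # crop to a pool multiple
    g = g.reshape(tx, H // pool, pool, Wd // pool, pool).mean(axis=(2, 4))
    flat = g.reshape(tx, -1)                       # downsampled features
    if W is None:                                  # stands in for the FC layer
        W = np.random.randn(flat.shape[1], N) / np.sqrt(flat.shape[1])
    return flat @ W

G = np.random.rand(48, 128, 128)                   # tx = 48 radar echo images
G_prime = adjust_dims(G)                           # (48, 64), i.e. R^{tx×N}
```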
6. The training method of a lightning long-term prediction model based on a multi-element Transformer according to claim 5, wherein the data reorganization module converts data U in data space R^{tx×N} into data U' in data space R^{ty×N} as follows: the data at ta time points are extracted from U to form transition data U0; U0 is padded backward with zeros along the time dimension until the number of time points reaches ty, at which point it becomes the data U'; tx and ty denote numbers of time points, and N is the data dimension at a single time point; ta is a set value with ta ≤ ty;
the de-moderating module obtains the predicted value Y of the lightning parameter time series y with the following formula;
Y = FC{ Relu[Linear([σ(a)²])] ⊙ Y(de) + Relu[Linear([μ(a)])] }
where μ(a) is the mean of A and σ(a)² is the variance of A; [σ(a)²] denotes the vector composed of ty copies of σ(a)², and [μ(a)] the vector composed of ty copies of μ(a); μ(a) ∈ R^{1×L}; σ(a)² ∈ R^{1×L}; [σ(a)²] ∈ R^{ty×L}, [μ(a)] ∈ R^{ty×L}; FC denotes the fully connected layer, Relu is the activation function, and ⊙ denotes multiplication of the data at corresponding positions of two matrices with the same data structure; Linear is a linear mapping; Linear([σ(a)²]) denotes the result of linearly mapping [σ(a)²] through a linear network, and Linear([μ(a)]) the result of linearly mapping [μ(a)]; Linear([σ(a)²]) ∈ R^{ty×N}, Linear([μ(a)]) ∈ R^{ty×N}.
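The two operations of claim 6 — the zero-padded reorganization from R^{tx×N} to R^{ty×N} and the de-moderating output map — can be sketched as follows. Taking the first ta time points of U is our assumption (the claim only says data at ta time points are extracted), and FC plus the two Linear maps are random stand-ins for the learned layers:

```python
import numpy as np

def reorganize(U, ta, ty):
    """R^{tx×N} -> R^{ty×N}: extract ta time points (assumed: the first ta),
    then pad backward with zeros until ty time points are reached."""
    U0 = U[:ta]
    return np.concatenate([U0, np.zeros((ty - ta, U.shape[1]))], axis=0)

def demoderate(Y_de, var_a, mu_a, lin_var, lin_mu, Wfc):
    """Y = FC{ Relu[Linear([sigma(a)^2])] ⊙ Y(de) + Relu[Linear([mu(a)])] }."""
    ty = Y_de.shape[0]
    var_rows = np.tile(var_a, (ty, 1))            # [sigma(a)^2] in R^{ty×L}
    mu_rows = np.tile(mu_a, (ty, 1))              # [mu(a)] in R^{ty×L}
    inner = np.maximum(var_rows @ lin_var, 0) * Y_de \
            + np.maximum(mu_rows @ lin_mu, 0)     # Relu = np.maximum(., 0)
    return inner @ Wfc                            # FC to the lightning parameters

tx, ty, ta, N, L = 48, 24, 16, 64, 5
U = np.random.randn(tx, N)
print(reorganize(U, ta, ty).shape)                # (24, 64)
Y_de = np.random.randn(ty, N)
var_a, mu_a = np.random.rand(L), np.random.rand(L)
lin_var, lin_mu = np.random.randn(L, N), np.random.randn(L, N)
Y = demoderate(Y_de, var_a, mu_a, lin_var, lin_mu, np.random.randn(N, 1))
```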
7. The training method of a lightning long-term prediction model based on a multi-element Transformer according to claim 1, wherein the multidimensional lightning data time series A contains one or more of the following lightning parameters: latitude of the lightning location, longitude of the lightning location, lightning current, and number of return strokes; the lightning parameter time series y contains lightning intensity parameters, including one or more of lightning current, lightning voltage and lightning electric field.
8. A lightning long-term prediction method based on a multi-element Transformer, characterized in that the lightning long-term prediction model is first obtained by the training method according to any one of claims 1 to 7; the multidimensional lightning data and radar echo images at the most recent tx time points are then acquired to construct a known sample (A, G), the known sample (A, G) is input to the lightning long-term prediction model, and the model outputs the lightning parameters at the following ty time points.
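A hypothetical usage sketch of the prediction flow in claim 8; `LightningModel` and its `predict` interface are stand-ins we introduce for illustration, not the patent's API:

```python
import numpy as np

tx, ty = 48, 24
A = np.random.randn(tx, 4)        # latitude, longitude, current, return strokes
G = np.random.rand(tx, 128, 128)  # radar echo images at the same tx time points

# model = LightningModel.load("lightning_longterm.pt")  # hypothetical loader
# Y = model.predict((A, G))                             # known sample (A, G) in
# assert Y.shape[0] == ty                               # next ty time points out
```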
9. A lightning long-term prediction system based on a multi-element Transformer, characterized by carrying a computer program which, when executed, implements the lightning long-term prediction method based on a multi-element Transformer according to claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311027952.8A CN116739048B (en) | 2023-08-16 | 2023-08-16 | Multi-element transducer-based lightning long-term prediction model, method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116739048A (en) | 2023-09-12
CN116739048B (en) | 2023-10-20
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |