CN112464567B - Intelligent data assimilation method based on variational assimilation framework - Google Patents

Intelligent data assimilation method based on variational assimilation framework

Info

Publication number
CN112464567B
CN112464567B (application number CN202011420985.5A)
Authority
CN
China
Prior art keywords
assimilation
time
mlp
data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011420985.5A
Other languages
Chinese (zh)
Other versions
CN112464567A (en)
Inventor
黄丽蓝
宋君强
任开军
李小勇
冷洪泽
王东紫
卢竞择
陈睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202011420985.5A priority Critical patent/CN112464567B/en
Publication of CN112464567A publication Critical patent/CN112464567A/en
Application granted granted Critical
Publication of CN112464567B publication Critical patent/CN112464567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/12 - Timing analysis or timing optimisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 - Force analysis or force optimisation, e.g. static or dynamic forces
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an intelligent data assimilation method based on a variational assimilation framework. S1: analyze with the data in the first half of the assimilation time window $[t_1, t_n]$ and correct the analysis field at the middle of the window with the data in the second half, determining the analysis field $x_a^{ct}$ at the current moment. S2: replace the assimilation process with a first multilayer perceptron (MLP) and obtain $x_a^{MLP_1,ct}$ from historical and future data. S3: superpose the previous training data set with $x_a^{MLP_1,ct}$ and train the next multilayer perceptron to obtain $x_b^{MLP_2,ct+1}$, which is equivalent to simulating the prediction model for short-term prediction within a data assimilation period. S4: repeat S2 and S3; in each link before ct, the newly generated $x_b^{MLP}$ replaces the previous $x_b^{3D\text{-}Var}$ to form a new training data set, until the $x_b^{3D\text{-}Var}$ before ct is completely replaced and the background information in the subsequent training data sets is provided only by $x_b^{MLP}$.

Description

Intelligent data assimilation method based on a variational assimilation framework
Technical Field
The invention relates to the technical field of data assimilation, in particular to an intelligent data assimilation method based on a variational assimilation framework.
Background
Numerical Weather Prediction (NWP) is the mainstream weather forecasting approach, defined by Bjerknes as an initial/boundary value problem. The future state of the atmosphere depends sensitively on its detailed Initial Conditions (ICs), certain Boundary Conditions (BCs) and a perfect forecasting model. The prediction model is usually a mathematical-physical model that approximately describes the laws of motion of the Earth system and is typically expressed by differential equations. Generally, given the ICs and BCs of a forecasting model, the numerical solution of the model is the corresponding numerical weather forecast. With the improvement of theory and computing power, the precision of forecasting models and BCs keeps increasing, so the uncertainty of the ICs becomes a major source of NWP prediction error. Data Assimilation (DA) integrates all available data resources, including observations and short-term forecasts (i.e., background fields or initial guesses), to obtain an analysis field, which is expected to be the best estimate of the probability density of the actual atmospheric state. That is, the goal of DA is to produce the most accurate ICs for NWP. Establishing an optimal DA method is therefore of great interest to researchers in the field of meteorology.
Based on Bayes' theorem, the variational assimilation method (Var) has been the predominant operational method for NWP since the 1990s. Basically, to handle uncertainties from different sources of information, the guiding strategy of the variational assimilation approach is to use new observations to correct the initial guess at the specified grid points, i.e., the background field and the observations are combined by minimizing a cost function consisting of two parts: one is a penalty term for the distance to the background mean, and the other is a penalty term for the distance to the observed values. The specific implementation directly assimilates a large amount of observation data by providing a static background error covariance matrix (B). The matrix B is computed once from climate data and cannot represent the real spatio-temporal error statistics well, which means that B may introduce a large background error. Var is divided into 3D-Var, which uses data at a single time, and 4D-Var, which uses data over a time window of given length to estimate the initial state. Although in 4D-Var B implicitly evolves in a flow-dependent manner and a better initial state than in 3D-Var can be obtained, 4D-Var incurs a high computational cost and requires tangent-linear and adjoint versions of the numerical model, which are difficult to build.
As a data-driven approach, Machine Learning (ML) is the study of computer algorithms that improve automatically through experience. In a computer system, experience mainly exists in the form of data, and ML builds a mathematical model from sample data in order to predict and make decisions. ML has enjoyed significant success in computer vision, speech recognition and control systems, as well as in related scientific fields such as physics, chemistry and biology. The development of ML has brought new opportunities for DA, and some researchers have tried to explore suitable ML algorithms to solve local problems of traditional, single DA approaches. As one of the geoscience system problems, DA mainly comprises three stages; errors occur at each stage, and there is no general DA technique that can ideally eliminate them. The first stage of DA is the pre-processing and quality control of the observed data: to eliminate observation errors, researchers usually use an idealized function to invert the indirect observations and use an empirical observation operator (matrix) to interpolate the observations onto the numerical grid, which introduces non-negligible errors. ML can learn the spatio-temporal characteristics and correlations of data, which is of great significance for reducing the errors introduced by the observations. The second stage is objective analysis, integrating the observed data with the background information provided by the numerical model. Generally, the functions of mathematical models that simulate the evolution of the Earth system based on physical theory use empirical parameters and are therefore semi-empirical, so these numerical models can be replaced by ML. ML methods have been shown to be more powerful and flexible than previous mechanistic or semi-empirical modeling methods. Furthermore, compared with the current sequential assimilation of DA, ML is parallelizable, which is very helpful for speeding up the assimilation process. The third stage is initialization: the result of the objective analysis needs to be adjusted appropriately to avoid the subsequent integration failing because of noise produced by an inconsistent prediction model. The output of DA therefore depends on the uncertainty of the numerical model. Correcting model errors with ML methods based on a large amount of Earth-system data, and thereby improving the quality of DA results, has also attracted the attention of some researchers.
The final goal of data assimilation is to obtain the optimal analysis field quickly and accurately and to provide the optimal initial field for the numerical weather prediction model. The existing variational assimilation method is based on Bayesian theory and has the advantages of directly assimilating a large amount of observation data with a simple basic framework. However, it involves the storage and direct computation of the large matrix B, which places high demands on the storage and computing capacity of the computing device. Furthermore, because it rests on a variety of idealized assumptions, the result obtained by the variational assimilation method, i.e., the analysis field, is generally not the best possible analysis field. In other words, existing DA depends directly on the quality of the observation field and the background field: the former is measured by various observation devices and obtained through specific empirical inversion functions, and the latter is provided by the numerical model. Inevitable errors therefore occur and need to be eliminated. Moreover, based on the first-order Markov assumption and Bayes' theorem, the integration and assimilation processes of the numerical model are performed sequentially rather than in parallel, so an assimilation period of the traditional DA method is long and needs to be accelerated.
Disclosure of Invention
To build a general DA model, the historical $y_o$ and $x_b$ and the future $y_o$ are included to optimize the current $x_a$ in the DA system; in addition, different models are used for simulation to accelerate the model prediction process, thereby solving the technical problems in the prior art.
The technical scheme of the invention is as follows: an intelligent data assimilation method based on a variational assimilation framework comprises the following steps:
S1, analyze with the data in the first half of the assimilation window and correct the analysis field at the middle of the assimilation time window $[t_1, t_n]$ with the data in the second half, determining the analysis field $x_a^{ct}$ at the current time by the formula
$$x_a^{ct} = \omega \, x_b^{[1,p]} + v \, y_o^{[0,q]} + \mathrm{bias}$$
where n (n = 1, 2, 3, ...) is the length of the assimilation time window; ct is the middle time of the assimilation time window; [1, p] is the interval of the first half of the data, and [0, q] equals the interval of the time window $[t_1, t_n]$; ω and v are the weight matrices of x and y, respectively; bias is the intercept term; $x_b$ is the background field and $y_o$ is the known observed value;
S2, replace the assimilation process with the first multilayer perceptron and obtain $x_a^{MLP_1,ct}$ from the historical and future data, where $x_a^{MLP_1,ct}$ denotes the analysis field at time ct obtained using the first MLP;
S3, superpose the previous training data set with $x_a^{MLP_1,ct}$ and train the next multilayer perceptron to obtain $x_b^{MLP_2,ct+1}$, which is equivalent to simulating the prediction model for short-term prediction during one data assimilation period, where $x_b^{MLP_2,ct+1}$ denotes the background field at time ct+1 obtained using the second MLP;
S4, carry out iterative training, repeating the processes of S2 and S3; in each link before ct, the newly generated $x_b^{MLP}$ replaces the previous $x_b^{3D\text{-}Var}$ to form a new training data set, until the $x_b^{3D\text{-}Var}$ before ct is completely replaced and the background information in the subsequent training data sets is provided only by $x_b^{MLP}$, where $x_b^{3D\text{-}Var}$ denotes the background field obtained before time ct by the conventional data assimilation method 3D-Var; if ct is 6, then when t ∈ [7, 12] the background field is $x_b^{MLP,t}$; $x_b^{MLP}$ denotes the background field obtained by the MLP method, and $x_b^{MLP_2,t}$ denotes the background field at time t obtained using the second MLP.
Preferably, ct > 3.
Preferably, in S2 and S3 the first input node is a vector obtained by superimposing the background information of the moments before ct and the observed values of the moments before n; the second input is another vector, which contains the previous training data set and one analysis result.
Preferably, one output node of S2 and S3 is $x_a^{MLP_1,t}$ at time t and the other output node is $x_b^{MLP_2,t+1}$ at time t+1, where $x_a^{MLP_1,t}$ denotes the analysis field at time t obtained using the first MLP.
Preferably, each multilayer perceptron in S2 and S3 has 3 hidden layers, and the number of neurons in each hidden layer is determined experimentally.
Preferably, in S2 and S3, ReLU is used as the activation function so that the result is nonlinear.
Preferably, a learning rate with appropriate weight decay is defined in each multilayer perceptron in S2 and S3.
Preferably, the loss function in S2 and S3 is the mean square error (MSE).
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with pure 3D-Var (three-dimensional variational data assimilation), the intelligent data assimilation method provided by the invention effectively improves the quality of the analysis field at a low computational cost.
2. During the iterative training of the models, $x_b^{MLP}$ is continuously updated, which implicitly corrects the background error covariance and provides more accurate background information for the next analysis field.
3. For the Lorenz-63 system, the accuracy is improved by 56.44% relative to 3D-Var; for the Lorenz-96 system, by 22.76%. Regarding computational cost, a major advantage of the invention is that better assimilation results are obtained within an acceptable execution time; compared with the empirical data assimilation method, the speed is improved by roughly 3 times and 80 times, respectively.
Drawings
FIG. 1 is a block diagram of the preparation of an optimized partial data set;
FIG. 2 is a schematic diagram of HDA-MLP optimized 3D-Var;
FIG. 3 is a data processing block diagram;
FIG. 4 shows the RMSE of different models of the first part of HDA-MLP in the Lorenz-63 system at different time steps;
FIG. 5 shows the RMSE of different models of the first part of HDA-MLP in the Lorenz-96 system at different time steps;
FIG. 6 is a RMSE plot of 50 experiments in the Lorenz-63 system for various DA methods;
FIG. 7 shows the RMSE of $x_a$ and $x_b$ in the Lorenz-63 system;
FIG. 8 is a graphical depiction of the trajectory of the state variables in Lorenz-63;
FIG. 9 is a trace schematic of "cold start" in an enlarged view;
FIG. 10 is an enlarged view of a portion of the trace;
FIG. 11 is a RMSE line graph of 50 experiments in a Lorenz-96 system for various DA methods;
FIG. 12 is an RMSE line graph of $x_a$ and $x_b$ in the Lorenz-96 system;
FIG. 13 shows the average $R^2$ and standard deviation of the state variables;
FIG. 14 is a schematic diagram of a sensor structure;
FIG. 15 is a schematic diagram of an example of an MLP;
FIG. 16 is a schematic diagram of an example of the Lorenz-96 model (J = 40);
FIG. 17 is a schematic diagram of a data assimilation analysis cycle.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Experiments have shown that historical data can convey temporal features, which are critical for describing the flow dependence of atmospheric variables.
In addition, 4D-Var introduces an assimilation time window; its core idea is to use data over a time interval $[t_0, t_n]$ within a DA period rather than only data at a single time as in 3D-Var. It uses short-term $y_o$ to correct the initial state at the beginning of the assimilation time window, similar to back propagation. Based on the ideas of Cache-MLP and the assimilation time window, an HDA-MLP model is proposed to optimize the 3D-Var system.
The data preparation structure of the training data set for the optimization part is shown in FIG. 1. The assimilation time window is set to $[t_1, t_n]$: the first half of the data is used for analysis, and the second half is used to correct the analysis field at the middle of the window $[t_1, t_n]$. The current analysis field $x_a^{ct}$ is then optimized by the following formula:
$$x_a^{ct} = \omega \, x_b^{[1,p]} + v \, y_o^{[0,q]} + \mathrm{bias}$$
where n (n = 1, 2, 3, ...) is the length of the assimilation time window; ct is the middle time of the assimilation time window; [1, p] is the interval of the first half of the data, and [0, q] equals the interval of the time window $[t_1, t_n]$; ω and v are the weight matrices of x and y, respectively; bias is the intercept term; $x_b$ is the background field and $y_o$ is the known observed value. In the present invention, $x_b$ always refers to the same quantity even though different expressions are used in different contexts: the prior knowledge is equivalent to the background field and also to the initial guess field; the same holds for $y_o$.
As shown in FIG. 2 (assuming ct > 3), a chained MLP model is constructed to optimize the result of 3D-Var. Its working principle is as follows. In link 1, the first step is to replace the assimilation process with the first MLP, which obtains $x_a^{MLP_1,ct}$ from historical and future data. The next step is to superpose the previous training data set with $x_a^{MLP_1,ct}$ and train the next MLP to obtain $x_b^{MLP_2,ct+1}$; this is equivalent to simulating a prediction model for short-term prediction within a DA period. Next, iterative training is performed with the same steps as in link 1. It should be noted that, in the links before ct, each link generates a new $x_b^{MLP}$ that replaces the previous $x_b^{3D\text{-}Var}$ to form a new training data set, until the $x_b^{3D\text{-}Var}$ before ct is completely replaced and the background information in the subsequent training data sets is provided only by $x_b^{MLP}$. In other words, the MLP model becomes an independent, innovative assimilation system that no longer depends on the data of 3D-Var; 3D-Var only provides the first ct background fields $x_b^{3D\text{-}Var}$ to "cold start" the proposed model. During the iterative training of the models, $x_b^{MLP}$ is continuously updated, which implicitly corrects the background error covariance and provides more accurate background information for the next analysis field.
The optimal configuration of HDA-MLP for 3D-Var is determined by empirical testing; the configurations obtained for the two MLPs are similar because their training data sets are nearly identical in size. Here $x_a^{MLP_1,ct}$ denotes the analysis field at time ct obtained using the first MLP; $x_b^{MLP_2,ct+1}$ denotes the background field at time ct+1 obtained using the second MLP; $x_b^{3D\text{-}Var}$ denotes the background field obtained before time ct by the conventional data assimilation method 3D-Var (if ct is 6, then when t ∈ [7, 12] the background field is $x_b^{MLP,t}$); $x_b^{MLP}$ denotes the background field obtained by the MLP method, written in this way to distinguish it from the background field $x_b^{3D\text{-}Var}$ of conventional 3D-Var; $x_b^{MLP_2,t}$ denotes the background field at time t obtained using the second MLP; and $x_a^{MLP_1,t}$ denotes the analysis field at time t obtained using the first MLP.
One link in the model has the following features; a minimal training sketch in code follows the list:
1) the first input node is a vector obtained by superimposing the background information of the moments before ct and the observed values of the moments before n; the second input is another vector, which contains the previous training data set and one analysis result;
2) one output node is $x_a^{MLP_1,t}$ at time t; the other output node is $x_b^{MLP_2,t+1}$ at time t+1;
3) each MLP has 3 hidden layers, and the number of neurons in each hidden layer is determined experimentally;
4) taking ReLU as an activation function to obtain a nonlinear result;
5) defining a learning rate with an appropriate weight decay in each MLP;
6) the loss function is Mean Square Error (MSE);
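As a concrete illustration of one such link, the following minimal sketch (PyTorch) builds the two multilayer perceptrons with three hidden layers, ReLU activations, an MSE loss and a weight-decayed learning rate as listed above; the tensor sizes, hidden width, Lorenz-63 timestep 6+11 layout and optimizer settings are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

def make_mlp(n_in, n_out, hidden=64):
    # Three hidden layers with ReLU, as described above; "hidden" is an assumed width.
    return nn.Sequential(
        nn.Linear(n_in, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_out),
    )

n_state, n_xb, n_yo = 3, 6, 11                          # assumed Lorenz-63 layout (timestep 6+11)
mlp1 = make_mlp((n_xb + n_yo) * n_state, n_state)       # assimilation surrogate -> x_a at time ct
mlp2 = make_mlp((n_xb + n_yo + 1) * n_state, n_state)   # forecast surrogate -> x_b at time ct+1

loss_fn = nn.MSELoss()                                  # MSE loss
opt = torch.optim.Adam(list(mlp1.parameters()) + list(mlp2.parameters()),
                       lr=1e-3, weight_decay=1e-4)      # learning rate with weight decay (assumed values)

def train_link(inp1, target_xa, inp2_prev, target_xb):
    """One training step of a single link.

    inp1:      (batch, (n_xb+n_yo)*n_state)  stacked historical x_b and historical/future y_o
    inp2_prev: (batch, (n_xb+n_yo)*n_state)  previous training data for the second MLP
    """
    xa_ct = mlp1(inp1)                             # analysis field at time ct from the first MLP
    inp2 = torch.cat([inp2_prev, xa_ct], dim=1)    # superpose previous data with the new analysis
    xb_next = mlp2(inp2)                           # background field at time ct+1 from the second MLP
    loss = loss_fn(xa_ct, target_xa) + loss_fn(xb_next, target_xb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In a full run, one such training step would be repeated for every link of the chain, with the newly produced background fields replacing the 3D-Var ones in the training data set, as described above.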
examples
1. Building a data set
In general, two classical nonlinear chaotic power systems are used for carrying out an observation system simulation experiment to check the effectiveness of the DA method. In the present invention, the proposed HDA-MLP method was validated using the Lorenz-63 and Lorenz-96 systems.
Assume that the Lorenz-63 system is perfect and there is no model error, so the model error covariance matrix Q = 0. Given an initial state $x_0 = 1.508870$, $y_0 = -1.537121$, $z_0 = 25.46091$, the Lorenz-63 model is integrated in time with a fourth-order Runge-Kutta scheme, with time step $d_t = 0.01$ and 111000 simulated integration steps. Then, without assimilation, the true field ($x_t$) is obtained by integrating the model equations (6). In a real atmosphere model, $x_t$ generally denotes the ground truth of the state variables. Errors randomly drawn from a Gaussian distribution are added to $x_t$ to obtain an approximate observation field ($y_o$). Furthermore, a linear observation operator h = I and an observation error covariance matrix R = I are used, where I is the third-order identity matrix.
Similarly, under the same ICs, a pure 3D-Var DA system completes 111000 analysis cycles, generating a background field ($x_b$) and an analysis field ($x_a$). To bring the Lorenz-63 system into a steady state and obtain an approximate simulated atmosphere, a cold start over the first 1000 integration time steps is needed; the data of the middle 100000 steps are used to prepare the HDA-MLP training data set, and the data of the last 10000 steps are the test data. FIG. 3 shows the structure of the data preparation.
Furthermore, to avoid chance and randomness in the experiments, the 3D-Var DA system is set to the same initial state, $x'_0 = 1.508870$, $y'_0 = -1.537121$, $z'_0 = 25.46091$, which is the 101001-th chaotic state obtained by 101000 integration steps from the initial state mentioned in the data preparation process. 10000 analysis cycles were then continued and repeated 50 times to prepare the test data set.
In the Lorenz-96 system, the steps and procedures for generating the simulation data are the same as in the Lorenz-63 system, except that the number of state variables increases from 3 in the Lorenz-63 system to 40 and the corresponding initial states are different.
2. Evaluation index
In order to measure the assimilation and prediction performance of the model, the effect of the novel DA method was evaluated using Root Mean Square Error (RMSE) and Goodness of Fit (Goodness of Fit).
RMSE denotes the sample standard deviation of the difference between the analysis field $x_a$ and the true value $x_t$, i.e., the square root of the ratio between the squared deviations of $x_a$ from $x_t$ and the number of observations N. RMSE is always non-negative, and a value of 0 indicates a perfect fit to the target data. When there is a deviation between $x_a$ and the target $x_t$, an analysis error occurs. The role of RMSE is to combine the errors at different times into a single metric with analytical power. In general, a lower RMSE is better than a higher one, so RMSE is widely used in the DA research literature to compare the performance of different DA methods: the lower the RMSE, the better the performance of the DA method; conversely, the larger the analysis error of the DA method. In the present invention, RMSE is defined as:
$$\mathrm{RMSE}_e = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{a,i}^{\,e} - x_{t,i}^{\,e}\right)^{2}}$$
where i is the ith state variable and e is the e-th set member.
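A minimal NumPy sketch of this RMSE computation (array names are assumptions):

```python
import numpy as np

def rmse(x_a, x_t):
    """Root mean square error between the analysis field x_a and the truth x_t.

    Both arrays have shape (N, n_vars): N analysis times by n_vars state variables.
    """
    return np.sqrt(np.mean((np.asarray(x_a) - np.asarray(x_t)) ** 2))
```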
Goodness of fit refers to how well the regression line fits the observed values. The statistic measuring goodness of fit is the coefficient of determination $R^2$, whose maximum value is 1: the closer $R^2$ is to 1, the better the regression line fits the observed values; conversely, the smaller $R^2$, the worse the fit of the regression line to the observed values.
$$\mathrm{TSS} = \sum_{i}\left(x_{t,i} - \bar{x}_t\right)^{2}, \qquad \mathrm{RSS} = \sum_{i}\left(x_{t,i} - x_{a,i}\right)^{2}, \qquad R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
where the total sum of squares (TSS) is the total sum of squares of the samples, and the residual sum of squares (RSS) is the sum of squared residuals, i.e., the sum of squared errors.
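The same quantities expressed as a small NumPy sketch (array names are assumptions):

```python
import numpy as np

def r_squared(x_a, x_t):
    """Coefficient of determination R^2 between the analysis x_a and the truth x_t."""
    x_a, x_t = np.asarray(x_a), np.asarray(x_t)
    rss = np.sum((x_t - x_a) ** 2)           # residual sum of squares
    tss = np.sum((x_t - x_t.mean()) ** 2)    # total sum of squares
    return 1.0 - rss / tss
```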
3. Selection of time step
From the DA cycle (FIG. 3) it can be seen that $x_a^t$ at time t is computed directly from the coupling of $x_b^t$ and $y_o^t$; in practice, however, $x_b^t$ is the short-term prediction obtained with $x_a^{t-1}$ as the IC of the prediction model, i.e., $x_a^t$ and $x_a^{t-1}$ are correlated, and $x_a^t$ depends on $x_a^{t-1}$ and $y_o^t$. In theory, these correlations also hold over longer periods of the DA cycle. The present invention therefore defines the problem as a time-series problem that relies on historical meteorological data, including observation fields and background fields. It is common to solve time-series problems with ML methods, but the time step (timestep) is an important parameter for the model to learn the temporal information. In general, the longer the time step, the more past and future information can be used for prediction; a key issue, however, is whether the accumulation of errors outweighs the accumulation of useful information. Therefore, preliminary experiments were made to select the optimal time step for the subsequent research. As shown in FIGS. 4 and 5, in both the Lorenz-63 and Lorenz-96 systems, timestep 6+11 is better than the others, where 6 denotes the historical $y_o$ and $x_b$ of the first 6 moments, which provide past information, and the 5 future $y_o$ are then suited to correct the current $x_a$. In total, 6 $x_b$ and 11 $y_o$ together form one training sample, the input data of the neural network used to obtain $x_a$. The construction principle of timestep 4+7 and 8+15 is the same as that of timestep 6+11, but compared with timestep 6+11 the former has less effective information and the latter accumulates more errors, both of which are disadvantageous for the subsequent research. The invention selects timestep 6+11 to construct the training data set of the proposed model.
4. Analysis of experiments
In this section, to prove the validity and effectiveness of the proposed model, the Lorenz-63 system, a simple low-dimensional chaotic system, is taken as the first example. HDA-MLP is used to optimize the analysis field of the traditional DA method 3D-Var, the results are compared with those of the existing pure 3D-Var and Cache-MLP methods, and it is analyzed whether the optimization goal can be achieved; regression curves are drawn and the goodness of fit is calculated to investigate the ability of each method to track the true field of the model and to evaluate the superiority of the proposed model.
4.1 analysis of the results of the experiments in the Lorenz-63 System
For HDA-MLP, first, FIG. 6 compares the 50 analytical fields of the Lorenz-63 system calculated by different DA methods, including pure 3D-Var, Cache-MLP and HDA-MLP. In the figure, the solid line marked with a circle represents pure 3D-Var; the Cache-MLP is represented by a solid line marked by a triangle; solid lines with square marks represent HDA-MLP. Table 1 reports the average RMSE of 50 experiments per method (bold indicates best results). By comparison, the minimum RMSE of 0.2105 can be achieved by HDA-MLP. Although the accuracy of 3D-Var is improved by 39.85% by Cache-MLP, the accuracy can be improved by 55.26% even by using HDA-MLP. This is possible because the HDA-MLP can use a short-term future training data set to extract more temporal features and information.
Table 1: comparison of the results of the experiment of FIG. 6
Method RMSE Improvement
Pure 3D-Var 0.4705 /
Cache-MLP 0.2830 39.85%
HDA-MLP 0.2049 56.44%
Furthermore, it was found that the improvement in the precision of the $x_b$ obtained by HDA-MLP is similar to that of $x_a$ (Table 2). This is because HDA-MLP not only uses an MLP to optimize the assimilation process but also uses another MLP to simulate the model prediction process. The MLP training data set used to simulate model prediction contains historical data for many moments and a more accurate analysis field. In the chained iterative training process, more temporal information can be mined and B can be updated implicitly, which helps to obtain a useful simulator. The experimental results show that the trained model can optimize model prediction quickly and efficiently (FIG. 7).
Table 2: comparison of the results of the experiment of FIG. 7
In addition, the best way to judge whether a DA method can track the true value of the model well is to draw the tracking trajectory. As shown in FIG. 8, a rough qualitative analysis shows that each DA method can track the true values of all three model state variables (x, y, z). The solid line is the model $x_t$, the dotted line is the $x_a$ of pure 3D-Var, the dashed line is the $x_a$ of Cache-MLP, and the dash-dot line is the $x_a$ of HDA-MLP. Table 3 shows that all methods achieve a goodness of fit above 0.99 for each state variable, quantitatively indicating that each method fits the true value of the model well (bold represents the best results). However, for HDA-MLP, $R^2$ is slightly larger for each state variable, which means the model performs better. Some partial sequences of the trajectories are enlarged to reflect the details of FIG. 8. As shown in FIG. 9, the "cold start" period is an important indicator of whether a DA method can bring the assimilation system into a steady state quickly. It can be seen that pure 3D-Var and Cache-MLP perform better than HDA-MLP here, because the former rely on a seemingly better static B, computed once from a large amount of meteorological data, whereas HDA-MLP starts with the static B but updates it implicitly to accommodate future assimilation and therefore needs a longer adaptation phase. Furthermore, in FIG. 10, all DA methods perform well in monotonically increasing or decreasing parts of the trajectory (FIG. 10 (a) and (f)), which is also verified by the $R^2$ values: in Table 4, $R^2$ is much higher than 0.99 (columns a and f). However, at some corners of the trajectory (FIG. 10 (b) and (e)), which represent abrupt changes of the state variables, pure 3D-Var cannot adapt, and Cache-MLP performs slightly better but still has obvious shortcomings. In contrast, HDA-MLP is able to capture the changes and obtains the highest $R^2$ in most cases (bold in Table 4). The results show that the proposed model can learn the changes of the state variables from the time-series training data set and is more robust than pure 3D-Var and numerical models, which are based on idealized assumptions and describe the changes of the state variables by integration.
Table 3: goodness of fit of various method state variables
Table 4: goodness of fit of partial trajectories for different methods
4.2 analysis of the results of the experiments in the Lorenz-96 System
For HDA-MLP, FIG. 11 shows a comparison of the results of 50 experiments among the various DA methods in the Lorenz-96 system (the legend is the same as for the Lorenz-63 system: pure 3D-Var is the solid line marked with circles, Cache-MLP the solid line marked with triangles, and HDA-MLP the solid line marked with squares). The model can improve the assimilation accuracy with a lower RMSE. However, the degree of improvement is relatively small, as shown in Table 5, which may be caused by the complexity of the Lorenz-96 system itself. As shown in FIG. 16, the Lorenz-96 model includes 40 state variables ($x_1, x_2, \ldots, x_{40}$) around the Earth and therefore has more spatial features, and more spatial dependencies between state variables, than the Lorenz-63 system. Here only training data collected over a longer time step are considered in order to obtain more temporal information, and no dedicated model is built to extract the spatial features between variables; this may be the reason why the proposed model improves the accuracy only slightly. As shown in FIG. 12, the proposed model uses temporal information to simulate and optimize the assimilation and model prediction processes of the Lorenz-96 system, which has proven to be effective. Similarly, HDA-MLP achieves a considerable improvement in the background field relative to 3D-Var, with an improvement of 28.61% (Table 6). A more accurate $x_b$ contributes greatly to optimizing the analysis field in the next assimilation. However, the improvement for the Lorenz-96 system is weaker than for the Lorenz-63 system because errors accumulate over more state variables.
Table 5: comparison of the results of the experiment of FIG. 11
Method RMSE Improvement
Pure 3D-Var 0.5283 /
Cache-MLP 0.4283 18.93%
HDA-MLP 0.4080 22.76%
Table 6: comparison of the results of the experiment of FIG. 12
Furthermore, the average RMSE reflects how well the various DA methods assimilate all state variables of the Lorenz-96 system, but $R^2$ can show the degree of fit of every state variable; therefore, the average $R^2$ and standard deviation (STD) of all 40 state variables were calculated and plotted in FIG. 13 (dots: pure 3D-Var; circles on a line: HDA-MLP; crosses: Cache-MLP). Intuitively, HDA-MLP achieves a better $R^2$: the $R^2$ of every state variable is close to 1, i.e., HDA-MLP fits every state variable better. Meanwhile, the STD of HDA-MLP is smaller than that of the other methods (except the static pure 3D-Var), which shows that the HDA-MLP results are more concentrated around the mean and the model is more stable. The proposed model is trained on big data containing a large number of state-variable changes, and the neural network can learn the complex relations and temporal features among the state variables, so HDA-MLP can adapt to more test data sets. In general, compared with pure 3D-Var and Cache-MLP, the proposed model achieves the highest $R^2$ while keeping the DA system stable, and can fit the state variables well.
5. Calculating the cost
DA is an iterative loop of assimilation and short-term prediction. Improving the quality of DA is important, but the speed of the DA process is also an essential factor determining the utility and effectiveness of a hybrid DA method. The ultimate goal of DA is to obtain the best ICs for NWP quickly and accurately. The computing device used in all experiments of the invention was configured with a GeForce RTX 2060 and an Intel Core i7. Table 7 below shows the computational cost of each hybrid method for the different models.
Table 7: computational cost of various hybrid methods to different models
The previous hybrid approaches Cache-MLP and backpacked-MLP take the least time because, after model training is complete, only a forward propagation is needed. However, they rely on the background field of pure 3D-Var, i.e., during testing it is necessary to wait for pure 3D-Var to complete the short-term prediction and provide the background field for further test experiments; this time is not included in Table 7. In contrast, the HDA-MLP used to optimize the 3D-Var results only requires the first ct background fields (a certain number of time steps) for the "cold start". It then obtains background fields through an iterative loop and uses these background fields to predict the analysis field of the next time step. In general, the novel model can yield higher-quality results with an acceptable loss of time for a given number of integration steps.
Inspired by the concept of four-dimensional variational data assimilation, the invention provides an innovative DA method named HDA-MLP. In the present invention, DA is defined as a time-series problem relying on historical meteorological data comprising the observation field and the first-guess field. Furthermore, as in 4D-Var, to obtain a more robust hybrid DA system, short-term prediction is added to HDA-MLP to correct the result at the current time. Guided by the temporal features of the state variables, HDA-MLP indeed optimizes the analysis field of the conventional DA system in two classical nonlinear dynamical models by implicitly updating the background error covariance. Specifically, the accuracy improves over 3D-Var by 56.44% for the Lorenz-63 system and by 22.76% for the Lorenz-96 system.
The principle of the invention is as follows:
1. three-dimensional variation data assimilation (3D-Var)
Based on Bayes' theorem, the variational approach has been the dominant operational approach to NWP since the 1990s. Typically, 3D-Var finds the analysis field by estimating a single initial state of the NWP model through minimization of a cost function L. The cost function L is expressed as:
$$L(x) = \frac{1}{2}\left(x - x_b\right)^{T} B^{-1}\left(x - x_b\right) + \frac{1}{2}\left(y_o - h(x)\right)^{T} R^{-1}\left(y_o - h(x)\right) \qquad (1)$$
where x is the state variable; $x_b$ is the prior knowledge of the state variable (also called the background field); $y_o$ is the known observed value; h is the observation operator, and h(x) computes the values at the observation points from the model variables at the grid points; B is the background error covariance matrix, derived from known laws of atmospheric dynamics and physics and describing the correlations of the background field $x_b$ at the model grid points; R is the observation error covariance matrix, which can be obtained from the instrument observation errors and describes the correlations between the observed values $y_o$ and the model grid points; the superscript T denotes matrix transposition.
The two terms of the cost function L are as follows: $L_b$ measures the distance between x and $x_b$, i.e., how well x fits $x_b$ given the background error covariance B; $L_o$ measures the distance between h(x) and $y_o$, i.e., how well x fits $y_o$ given the observation error covariance R, where h is the observation operator, which performs the necessary interpolation and converts the model variables into observed spatial field values.
The minimum of equation (1) is obtained with the nonlinear conjugate-gradient method, i.e., the gradient of L must vanish, as in equation (2):
$$\nabla L(x_a) = B^{-1}\left(x_a - x_b\right) - H^{T} R^{-1}\left(y_o - h(x_a)\right) = 0 \qquad (2)$$
where H is the tangent-linear operator of the observation operator h, defined as:
$$H = \frac{\partial h}{\partial x} \qquad (3)$$
According to equation (2), $x_a$ is the state that minimizes the function L and is the solution of equation (4):
$$\left[B^{-1} + H^{T} R^{-1} H\right] x_a = B^{-1} x_b + H^{T} R^{-1} y_o \qquad (4)$$
The optimal analysis field is then:
$$\begin{aligned} x_a &= \left[B^{-1} + H^{T} R^{-1} H\right]^{-1}\left[B^{-1} x_b + H^{T} R^{-1} y_o\right] \\ &= \left[B^{-1} + H^{T} R^{-1} H\right]^{-1}\left[B^{-1} x_b + H^{T} R^{-1} h(x_b) - H^{T} R^{-1} h(x_b) + H^{T} R^{-1} y_o\right] \qquad (5) \\ &= x_b + \left[B^{-1} + H^{T} R^{-1} H\right]^{-1} H^{T} R^{-1}\left(y_o - h(x_b)\right) \end{aligned}$$
where $h(x_b)$ performs the necessary interpolation and converts the model variable $x_b$ into the values of the corresponding observed spatial field.
Minimizing the cost function L can be regarded as an inverse problem. The construction and inversion of the error covariance matrices B and R are the difficult points of three-dimensional variational assimilation: inverting the high-dimensional matrices B and R is very hard and requires a large amount of computing resources and storage space. In practice, equation (5) is therefore not solved directly; instead, gradient information of the objective function is obtained through a suitable transformation, and an optimization method (such as the Newton, quasi-Newton, BFGS or L-BFGS method) is used to iteratively obtain an approximate solution of equation (5).
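For small systems such as Lorenz-63, where B and R can be inverted directly, equation (5) with a linear observation operator can be evaluated as in the following NumPy sketch (for high-dimensional systems, the iterative minimization described above would be used instead; the function and variable names are assumptions):

```python
import numpy as np

def analysis_3dvar(x_b, y_o, H, B, R):
    """x_a = x_b + (B^-1 + H^T R^-1 H)^-1 H^T R^-1 (y_o - H x_b), equation (5) with linear h."""
    Binv = np.linalg.inv(B)
    Rinv = np.linalg.inv(R)
    gain = np.linalg.inv(Binv + H.T @ Rinv @ H) @ H.T @ Rinv
    return x_b + gain @ (y_o - H @ x_b)
```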
2. Multilayer perceptron (MLP)
Based on a simulation of the human brain, F. Rosenblatt abstracted biological nerve cells in a simple way and designed the first computational neural network, the perceptron. The perceptron is a linear classifier: for a linearly separable data set, a suitable linear decision function can be fitted by the perceptron through learning. However, the expressive power of a single-neuron perceptron model is insufficient for problems more complex than simple linear regression. The multilayer perceptron has a fully connected structure, and every node except those in the input layer can be regarded as a processing unit with a nonlinear activation function. If every activation function were a linear function, an MLP with any number of layers would reduce to a perceptron. The MLP is a classical deep neural network and belongs to the class of universal approximators: given enough hidden units and an appropriate activation function, an MLP with at least one hidden layer can approximate any measurable function f(x).
Examples of the structure of the perceptron and of an MLP are shown in FIG. 14 and FIG. 15.
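A minimal NumPy sketch of the forward pass of an MLP of the kind shown in FIG. 15, with one hidden layer and ReLU activation; the weights below are random placeholders rather than trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with ReLU followed by a linear output layer."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden units with nonlinear activation
    return h @ W2 + b2                 # linear output

x = rng.normal(size=(4, 3))            # 4 samples, 3 input features
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
y_hat = mlp_forward(x, W1, b1, W2, b2)
```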
3. Classical nonlinear dynamic model
3.1 Lorenz-63 System-Low dimensional chaos model
The Lorenz-63 system is a simplified mathematical model of atmospheric convection. The nonlinear three-variable Lorenz model is simple to compute and exhibits strong nonlinear interactions between the variables. The state evolution of the Lorenz-63 system is given by the following three ordinary differential equations:
$$\dot{x} = \sigma\,(y - x), \qquad \dot{y} = x\,(\rho - z) - y, \qquad \dot{z} = x\,y - \beta\,z \qquad (6)$$
where the dot denotes the derivative with respect to time. Integrating the model with the typical parameter values given in the classical Lorenz paper, σ = 10, β = 8/3 and ρ = 28, yields the chaotic dynamical system.
The equation relates to the behavior of a two-dimensional fluid layer that heats up uniformly from the bottom up and cools down uniformly from the top down. In particular, the equation describes the rate of change of three quantities with time: x is proportional to the convection velocity, y is proportional to the horizontal temperature change, and z is proportional to the vertical temperature change.
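A minimal NumPy sketch of integrating equation (6) with a fourth-order Runge-Kutta scheme and the parameter values given above (σ = 10, β = 8/3, ρ = 28, time step 0.01); the number of spin-up steps shown is illustrative:

```python
import numpy as np

SIGMA, BETA, RHO = 10.0, 8.0 / 3.0, 28.0

def lorenz63(state):
    x, y, z = state
    return np.array([SIGMA * (y - x),
                     x * (RHO - z) - y,
                     x * y - BETA * z])

def rk4_step(f, state, dt=0.01):
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.508870, -1.537121, 25.46091])   # initial state from the text
trajectory = [state]
for _ in range(1000):                                # spin-up ("cold start") steps, illustrative count
    state = rk4_step(lorenz63, state)
    trajectory.append(state)
```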
3.2 Lorenz-96 System-high-dimensional chaos model
Lorenz-96 is a more complex nonlinear dynamical system than Lorenz-63. It is defined by equation (7), in which the first term on the right is an advection term and the second term is a damping term. Lorenz-96 is a periodic system that is commonly used to model the time evolution of atmospheric variables.
$$\frac{dx_j}{dt} = \left(x_{j+1} - x_{j-2}\right)x_{j-1} - x_j + F, \qquad j = 1, \ldots, J \qquad (7)$$
where J is the number of state variables and the indices are periodic. FIG. 16 shows the true states of the 40 state variables of the Lorenz-96 model at a given time for J = 40; the state variables are uniformly distributed along the Earth's equator, and F is an external forcing constant set to 8, a commonly used value that produces chaotic behaviour.
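A minimal NumPy sketch of the Lorenz-96 tendency in equation (7) with J = 40 and F = 8, integrated with the same fourth-order Runge-Kutta step (re-declared so the snippet stands alone); the initial perturbation is an assumption for illustration:

```python
import numpy as np

J, F = 40, 8.0

def lorenz96(x):
    # dx_j/dt = (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F, with periodic indices
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(f, x, dt=0.01):
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = F * np.ones(J)
x[0] += 0.01                      # small perturbation to trigger chaotic behaviour
for _ in range(1000):
    x = rk4_step(lorenz96, x)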
4. Problem definition
In operational systems, the short-term forecast is used as the first-guess field ($x_b$) and combined with the new observations ($y_o$) to obtain an analysis field ($x_a$); this is called a data assimilation analysis cycle (see FIG. 17). Here $y_o$ is obtained by inverting an empirical function applied to the real-time measurements of different devices. The prediction model makes a short-term prediction with the current $x_a$ as IC, giving a new $x_b$. Thus, a complete analysis cycle consists of assimilation and model prediction, which are repeated.
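The analysis cycle just described can be written schematically as the following NumPy sketch, where `forecast` and `assimilate` are assumed callables (for example, the short-term model integration and the 3D-Var update sketched earlier):

```python
import numpy as np

def analysis_cycle(x_a0, y_o_series, forecast, assimilate, n_cycles):
    """Generic DA analysis cycle: forecast -> assimilate -> forecast -> ...

    forecast(x_a)        -> x_b   (short-term prediction from the current analysis)
    assimilate(x_b, y_o) -> x_a   (combination of background and new observation)
    """
    x_a = x_a0
    analyses = []
    for t in range(n_cycles):
        x_b = forecast(x_a)                    # model prediction gives the new background
        x_a = assimilate(x_b, y_o_series[t])   # combine background with the new observation
        analyses.append(x_a)
    return np.array(analyses)
```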
In 3D-Var, the goal is to find the best weights for the two sources of information within a single time t, as in equation (8):
$$x_a^{t} = f\left(x_b^{t},\, y_o^{t}\right) \qquad (8)$$
Based on the first-order Markov assumption, traditional DA can be defined as a regression problem that relies only on variables at a single time. However, the current $x_a$ is subject to historical data at multiple times, which redefines DA as a time-series problem. The invention further takes into account that the future short-term prediction $x_b$ evolves from the previous $x_a$, so that, according to the dynamic nature of DA, the independently measured future short-term $y_o$ and the simultaneously corrected $x_b$ can be used to correct the current $x_a$. An obvious difficulty of the model is finding a way to integrate the historical $x_b$ and $y_o$ and the future $y_o$ to obtain a more accurate $x_a$ for the new HDA-MLP model. Thus, the assimilation process of a single DA method can be defined as a regression problem over multiple time-varying variables, and the model can be represented by equation (9).
$$x_a = f\left(\mathrm{historical}\left(x_b, y_o\right),\ \mathrm{future}\left(y_o\right)\right) \qquad (9)$$
Where f is the DA model learned from the training data set.
In addition, after $x_a$ is obtained, the next process of DA is short-term prediction by the prediction model. Those prediction models are mathematical partial differential equations of a semi-empirical nature, so prediction can also be regarded as a regression problem. However, solving partial differential equations with mathematical physics is difficult and time-consuming. To speed up the model prediction process, the universal approximator MLP is used to simulate and replace the prediction model.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. An intelligent data assimilation method based on a variational assimilation framework, characterized by comprising the following steps:
S1, analyze with the first half of the data of the assimilation window and correct the analysis field at the middle of the assimilation time window $[t_1, t_n]$ with the second half of the data, determining the analysis field $x_a^{ct}$ at the current time by the formula
$$x_a^{ct} = \omega \, x_b^{[1,p]} + v \, y_o^{[0,q]} + \mathrm{bias}$$
wherein n is the length of the assimilation time window, n = 1, 2, 3, ...; ct is the middle time of the assimilation time window; [1, p] is the interval of the first half of the data, and [0, q] equals the interval of the time window $[t_1, t_n]$; ω and v are the weight matrices of x and y, respectively; bias is the intercept term; $x_b$ is the background field and $y_o$ is the known observed value;
S2, replace the assimilation process with the first multilayer perceptron and obtain $x_a^{MLP_1,ct}$ from the historical and future data, wherein $x_a^{MLP_1,ct}$ represents the analysis field at time ct obtained using the first MLP;
S3, superpose the previous training data set with $x_a^{MLP_1,ct}$ and train the next multilayer perceptron to obtain $x_b^{MLP_2,ct+1}$, which is equivalent to simulating a prediction model for short-term prediction during a data assimilation period, wherein $x_b^{MLP_2,ct+1}$ represents the background field at time ct+1 obtained using the second MLP;
s4, carrying out iterative training, repeating the processes of S2 and S3, and generating a new link in each link before ct
Figure FDA0003649071430000019
Before replacement
Figure FDA00036490714300000110
Form a new training data set until ct is ahead
Figure FDA00036490714300000111
Is completely replaced, and the background information in the following training data set is only composed of
Figure FDA00036490714300000112
Is provided, wherein
Figure FDA00036490714300000113
Represents the time before ct by the traditionIf ct is 6, then when t is the [7,12 ] is in the background field obtained by the 3D-Var method of data assimilation]When it is, then
Figure FDA00036490714300000114
Figure FDA00036490714300000115
Represents the background field obtained by the MLP method,
Figure FDA00036490714300000116
representing the background field at time t obtained using the second MLP.
2. The method of claim 1, wherein ct > 3.
3. The method of claim 1, wherein in S2 and S3 the first input node is a vector obtained by superimposing the background information of the moments before ct and the observed values of the moments before n; the second input is another vector, which contains the previous training data set and one analysis result.
4. The method of claim 1, wherein one output node of S2 and S3 is $x_a^{MLP_1,t}$ at time t and the other output node is $x_b^{MLP_2,t+1}$ at time t+1, wherein $x_a^{MLP_1,t}$ represents the analysis field at time t obtained using the first MLP.
5. The method of claim 1, wherein each multilayer perceptron in S2 and S3 has 3 hidden layers, and the number of neurons in each hidden layer is determined experimentally.
6. The method of claim 1 wherein the results of S2 and S3 are nonlinear with ReLU as the activation function.
7. The method of claim 1, wherein a learning rate with appropriate weight decay is defined in each multilayer perceptron in S2 and S3.
8. The method of claim 1 wherein the loss function in S2 and S3 is Mean Square Error (MSE).
CN202011420985.5A 2020-12-08 2020-12-08 Intelligent data assimilation method based on variational assimilation framework Active CN112464567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011420985.5A CN112464567B (en) Intelligent data assimilation method based on variational assimilation framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011420985.5A CN112464567B (en) Intelligent data assimilation method based on variational assimilation framework

Publications (2)

Publication Number Publication Date
CN112464567A CN112464567A (en) 2021-03-09
CN112464567B true CN112464567B (en) 2022-08-12

Family

ID=74801674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011420985.5A Active CN112464567B (en) 2020-12-08 2020-12-08 Intelligent data assimilation method based on variational assimilation framework

Country Status (1)

Country Link
CN (1) CN112464567B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609790B (en) * 2021-10-11 2021-12-03 成都数联云算科技有限公司 Product virtual measuring method, system, device and medium
CN115630566B (en) * 2022-09-28 2024-05-07 中国人民解放军国防科技大学 Data assimilation method and system based on deep learning and dynamic constraint

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101471378B1 (en) * 2013-11-18 2014-12-10 재단법인 한국형수치예보모델개발사업단 Spectral transformation method for three-dimension variational data assimilation of numerical weather prediction model using cubed-sphere grid based on spectral element method and hardware device performing the same
WO2016086329A1 (en) * 2014-12-01 2016-06-09 哈尔滨工程大学 Actually-measured marine environment data assimilation method based on sequence recursive filtering three-dimensional variation
CN110211325A (en) * 2019-06-27 2019-09-06 上海同望信息技术有限公司 A kind of area road icing high precision monitor early warning system based on meteorological big data
CN111783361B (en) * 2020-07-07 2021-03-12 中国人民解放军国防科技大学 Numerical weather forecast mixed data assimilation method based on triple multi-layer perceptron

Also Published As

Publication number Publication date
CN112464567A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN110689179A (en) Water bloom prediction method based on space-time sequence mixed model
CN111461453B (en) Medium-and-long-term runoff ensemble forecasting method based on multi-model combination
CN112464567B (en) Intelligent data assimilation method based on variational assimilation framework
CN111783361B (en) Numerical weather forecast mixed data assimilation method based on triple multi-layer perceptron
CN110276441B (en) Trapezoidal overlapped kernel pulse estimation method based on deep learning
CN113269314B (en) New energy power generation scene data migration method based on generation countermeasure network
CN104091216A (en) Traffic information predication method based on fruit fly optimization least-squares support vector machine
CN112381282B (en) Photovoltaic power generation power prediction method based on width learning system
CN115935834A (en) History fitting method based on deep autoregressive network and continuous learning strategy
CN112818595A (en) Method and system for correcting digital twin model data of evaporation zone of thermal power plant
CN112989711B (en) Aureomycin fermentation process soft measurement modeling method based on semi-supervised ensemble learning
Huang et al. A data-driven method for hybrid data assimilation with multilayer perceptron
CN115270239A (en) Bridge reliability prediction method based on dynamic characteristics and intelligent algorithm response surface method
CN115952685B (en) Sewage treatment process soft measurement modeling method based on integrated deep learning
CN116502539B (en) VOCs gas concentration prediction method and system
CN116303786B (en) Block chain financial big data management system based on multidimensional data fusion algorithm
CN108876038A (en) Big data, artificial intelligence, the Optimization of Material Property method of supercomputer collaboration
CN110852415B (en) Vegetation index prediction method, system and equipment based on neural network algorithm
Springer et al. Robust parameter estimation of chaotic systems
Xiao et al. FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation
CN115796327A (en) Wind power interval prediction method based on VMD (vertical vector decomposition) and IWOA-F-GRU (empirical mode decomposition) -based models
CN115018137A (en) Water environment model parameter calibration method based on reinforcement learning
CN115545159A (en) Average sea surface temperature forecasting method of deep neural network
CN114676887A (en) River water quality prediction method based on graph convolution STG-LSTM
Bakumenko et al. Synthesis method of robust neural network models of systems and processes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant