CN112464567A - Intelligent data assimilation method based on variational and assimilative framework


Info

Publication number
CN112464567A
CN112464567A
Authority
CN
China
Prior art keywords
assimilation
time
mlp
data
model
Prior art date
Legal status
Granted
Application number
CN202011420985.5A
Other languages
Chinese (zh)
Other versions
CN112464567B (en)
Inventor
黄丽蓝
宋君强
任开军
李小勇
冷洪泽
王东紫
卢竞择
陈睿
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202011420985.5A priority Critical patent/CN112464567B/en
Publication of CN112464567A publication Critical patent/CN112464567A/en
Application granted granted Critical
Publication of CN112464567B publication Critical patent/CN112464567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/12 - Timing analysis or timing optimisation
    • G06F 2119/14 - Force analysis or force optimisation, e.g. static or dynamic forces
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses an intelligent data assimilation method based on a variational assimilation framework, comprising: S1, analyzing the data in the first half of the assimilation time window [t_1, t_n] and correcting the analysis field at the middle of the window with the data in the second half, thereby determining the analysis field x^a_{ct} at the current moment; S2, replacing the assimilation process with a first multilayer perceptron (MLP), which is trained on historical and future data to produce x^{a1}_{ct}; S3, superposing x^{a1}_{ct} on the previous training data set and training a second MLP to obtain x^{b2}_{ct+1}, which is equivalent to simulating the prediction model for short-term forecasting within the data assimilation period; S4, repeating S2 and S3, wherein each link before ct generates a new x^{a1}_t that replaces the previous x^b_t to form a new training data set, until the x^b_t before ct are completely replaced and the background information in the subsequent training data sets is provided only by x^{b2}_t.

Description

Intelligent data assimilation method based on variational and assimilative framework
Technical Field
The invention relates to the technical field of data assimilation, in particular to an intelligent data assimilation method based on a variational and assimilative framework.
Background
Numerical Weather Prediction (NWP) is the mainstream approach to weather forecasting and was defined by Bjerknes as an initial/boundary value problem. The future state of the atmosphere depends sensitively on its detailed Initial Conditions (ICs), certain Boundary Conditions (BCs) and a perfect forecasting model. The forecasting model is mostly a mathematical-physical model that approximately describes the laws of motion of the Earth system and is usually expressed by differential equations. Generally, given the ICs and BCs of a forecasting model, the numerical solution of the model represents the corresponding numerical weather forecast. With improvements in theory and computing power, forecasting models and BCs have become increasingly accurate, so the uncertainty of the ICs has become a major source of NWP forecast error. Data Assimilation (DA) integrates all available data resources, including observations and short-term forecasts (i.e. background fields, or initial guesses), to obtain an analysis field, which is expected to be the best estimate of the probability density of the actual atmospheric state. That is, the goal of DA is to produce the most accurate ICs for NWP. Therefore, establishing an optimal DA method is of great interest to researchers in the field of meteorology.
Based on Bayes' theorem, the variational assimilation method (Var) has been the predominant operational method for NWP since the 1990s. Basically, to handle uncertainties from different sources of information, the guiding strategy of the variational assimilation approach is to use new observations to correct the initial guess at the specified grid points, i.e. the background field and the observations are combined by minimizing a cost function that consists of two terms: one penalizes the distance to the background field, the other the distance to the observations. The concrete implementation directly assimilates a large amount of observation data by providing a static background error covariance matrix (B). The matrix B is computed once from climatological data and cannot represent the real spatio-temporal error statistics well, which means that it may introduce a large background error. Var is divided into 3D-Var, which uses data at a single time, and 4D-Var, which uses data over a time window of given length to estimate the initial state. Although in 4D-Var B implicitly evolves in a flow-dependent way and a better initial state than 3D-Var can be obtained, 4D-Var is computationally expensive and requires building tangent linear and adjoint numerical models, which are difficult to construct.
As a data-driven approach, Machine Learning (ML) is the study of computer algorithms that improve automatically through experience. In a computer system, experience mainly exists in the form of data, and ML builds mathematical models from sample data to make predictions and decisions. ML has achieved significant success in computer vision, speech recognition and control systems, as well as in related scientific fields such as physics, chemistry and biology. The development of ML has brought new opportunities for DA, and some researchers have attempted to explore suitable ML algorithms to solve local problems of traditional, single DA approaches. As one of the geoscience system problems, DA mainly comprises three stages, errors occur at each stage, and there is no general DA technique that can eliminate them ideally. The first stage of DA is the pre-processing and quality control of the observed data. To eliminate observation errors, researchers usually use an idealized function to invert indirect observations and use an empirical observation operator (matrix) to interpolate the observations onto the numerical grid, which introduces errors that cannot be ignored. ML can learn the spatio-temporal characteristics and correlations of the data, which is important for reducing the errors introduced by the observations. The second stage is objective analysis, integrating the observed data with the background information provided by the numerical model. Generally, the functions of the mathematical models that simulate the evolution of the Earth system based on physical theory employ empirical parameters and are semi-empirical in nature; therefore, these numerical models can be replaced with ML. ML methods have been shown to be more powerful and flexible than previous mechanistic or semi-empirical modeling methods. Furthermore, compared with the current sequential assimilation of DA, ML is parallelizable, which is very helpful for speeding up the assimilation process. The third stage is initialization: the result of the objective analysis needs to be properly adjusted to avoid the problem that subsequent integration cannot proceed because of noise generated by an inconsistent prediction model, so the output of DA depends on the uncertainty of the numerical model. Correcting the model error with ML methods based on large amounts of Earth-system data, with the aim of improving the quality of the DA result, has also attracted the attention of researchers.
The final goal of data assimilation is to obtain the optimal analysis field quickly and accurately and to provide the optimal initial field for the numerical weather prediction model. The existing variational assimilation method is based on Bayesian theory and has the advantages of directly assimilating a large amount of observation data with a simple basic framework. However, it involves storing and directly operating on the large matrix B, which places high demands on the storage and computing capacity of the computing device. Furthermore, because the variational assimilation method is based on a variety of ideal assumptions, the result it obtains, i.e. the analysis field, is generally not the best possible analysis field. In other words, the existing DA depends directly on the quality of the observation field and the background field: the former is measured by various observation devices and obtained through specific empirical inversion functions, and the latter is provided by the numerical model, so inevitable errors occur that need to be eliminated. In addition, based on the first-order Markov assumption and Bayes' theorem, the integration and assimilation processes of the numerical model are performed sequentially rather than in parallel, so one assimilation period of the traditional DA method is long and needs to be accelerated.
Disclosure of Invention
To build a general DA model, the historical y^o and x^b, together with the future y^o, are used to optimize the current x^a in the DA system; in addition, different models are used for simulation to accelerate the model prediction process. To solve the above technical problems in the prior art, the invention provides an intelligent data assimilation method based on a variational assimilation framework.
The technical scheme of the invention is as follows: an intelligent data assimilation method based on a variational assimilation framework comprises the following steps.
S1, analyzing the data in the first half of the assimilation window and correcting, with the data in the second half, the analysis field at the middle of the assimilation time window [t_1, t_n] by the formula
x^a_{ct} = ω x^b_{[1,p]} + v y^o_{[0,q]} + bias,
thereby determining the analysis field x^a_{ct} at the current moment, where n (n = 1, 2, 3, ...) is the length of the assimilation time window; ct is the middle moment of the assimilation time window; [1, p] is the index interval of the first-half data and [0, q] corresponds to the time window [t_1, t_n]; ω and v are the weight matrices of x and y respectively; bias is the intercept term; x^b is the background field and y^o is the known observation.
S2, replacing the assimilation process with a first multilayer perceptron (MLP), which is trained on historical and future data to produce x^{a1}_{ct}, where x^{a1}_{ct} denotes the analysis field at moment ct obtained with the first MLP.
S3, superposing x^{a1}_{ct} on the previous training data set and training the next multilayer perceptron to obtain x^{b2}_{ct+1}, which is equivalent to simulating the prediction model for short-term forecasting within the data assimilation period, where x^{b2}_{ct+1} denotes the background field at moment ct+1 obtained with the second MLP.
S4, carrying out iterative training by repeating the processes of S2 and S3: each link before ct generates a new x^{a1}_t that replaces the previous x^b_t to form a new training data set, until the x^b_t before ct are completely replaced and the background information in the subsequent training data sets is provided only by x^{b2}_t, where x^b_t denotes the background field obtained by the conventional data assimilation method 3D-Var before moment ct (for example, if ct = 6, then for t ∈ [7, 12] the background field is x^{b2}_t, obtained by the MLP method) and x^{b2}_t denotes the background field at time t obtained with the second MLP.
Preferably, ct > 3.
Preferably, in S2 and S3 the former input node is a vector obtained by superposing the background information before moment ct with the observations before moment n; the latter input node is another vector, which contains the previous training data set and one analysis result.
Preferably, one output node of S2 and S3 is x^{a1}_t at time t and the other output node is x^{b2}_{t+1} at time t+1, where x^{a1}_t denotes the analysis field at time t obtained with the first MLP.
Preferably, each multilayer perceptron in S2 and S3 has 3 hidden layers, and the number of neurons in each hidden layer is determined experimentally.
Preferably, in S2 and S3, ReLU is used as the activation function so that the result is non-linear.
Preferably, a learning rate with appropriate weight decay is defined in each multilayer perceptron in S2 and S3.
Preferably, the loss function in S2 and S3 is the mean square error (MSE).
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with pure 3D-Var (three-dimensional variational data assimilation), the intelligent data assimilation method provided by the invention effectively improves the quality of the analysis field at low computational cost.
2. During the iterative training of the models, the continuously updated x^{b2}_t implicitly corrects the background error covariance, providing more accurate background information for the next analysis field.
3. Compared with the prior art, the accuracy is improved over 3D-Var by 56.44% for the Lorenz-63 system and by 22.76% for the Lorenz-96 system. Regarding computational cost, a major advantage of the invention is that better assimilation results are obtained within an acceptable execution time: compared with the empirical data assimilation method, the speed is improved by about 3 times and 80 times respectively.
Drawings
FIG. 1 is a block diagram of the preparation of an optimized partial data set;
FIG. 2 is a schematic diagram of HDA-MLP optimized 3D-Var;
FIG. 3 is a data processing block diagram;
FIG. 4 shows the RMSE of different models of the first part of HDA-MLP in the Lorenz-63 system at different time steps;
FIG. 5 shows the RMSE of different models of the first part of HDA-MLP in the Lorenz-96 system at different time steps;
FIG. 6 is a RMSE plot of 50 experiments in the Lorenz-63 system for various DA methods;
FIG. 7 is the RMSE plot of x^a and x^b in the Lorenz-63 system;
FIG. 8 is a graphical depiction of the trajectory of the state variables in Lorenz-63;
FIG. 9 is a trace schematic of "cold start" in an enlarged view;
FIG. 10 is an enlarged view of a portion of the trace;
FIG. 11 is a RMSE line graph of 50 experiments in a Lorenz-96 system for various DA methods;
FIG. 12 is the RMSE line graph of x^a and x^b in the Lorenz-96 system;
FIG. 13 shows the average R^2 and standard deviation of the state variables;
FIG. 14 is a schematic diagram of a perceptron structure;
FIG. 15 is a schematic diagram of an example of an MLP;
FIG. 16 is a schematic diagram of an example of the Lorenz-96 model (J = 40);
FIG. 17 is a schematic diagram of a data assimilation analysis cycle.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Experiments have shown that historical data can convey time signatures, which are critical to describing the flow dependence of atmospheric variables.
In addition, 4D-Var introduces an assimilation time window; its core idea is to use the data over a time interval [t_0, t_n] within a DA period rather than, as in 3D-Var, only the data at a particular time. It uses short-term y^o to correct the initial state at the beginning of the assimilation time window, similar to backward propagation. Based on the ideas of Cache-MLP and the assimilation time window, an HDA-MLP model is proposed to optimize the 3D-Var system.
The data preparation structure of the training data set for the optimization part is shown in FIG. 1. The assimilation time window is set as [t_1, t_n], i.e. the first half of the data is used for the analysis and the second half of the data is used to correct the analysis field at the middle of the time window [t_1, t_n]. The analysis field at the current moment, x^a_{ct}, is then determined and optimized by the following formula:
x^a_{ct} = ω x^b_{[1,p]} + v y^o_{[0,q]} + bias,
where n (n = 1, 2, 3, ...) is the length of the assimilation time window; ct is the middle moment of the assimilation time window; [1, p] is the index interval of the first-half data and [0, q] corresponds to the time window [t_1, t_n]; ω and v are the weight matrices of x and y respectively; bias is the intercept term; x^b is the background field and y^o is the known observation. In the present invention x^b is used with a consistent meaning throughout: the prior knowledge is equivalent to the background field and also to the initial guess field, and the different expressions merely suit different contexts; the same holds for y^o.
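Read as a learned affine combination, the formula above can be sketched as follows. This is a minimal illustration only; the array names, shapes and elementwise weighting are assumptions, not taken from the patent.

```python
import numpy as np

def analysis_at_ct(xb_first_half, yo_window, omega, v, bias):
    """Sketch of the S1 combination: the analysis at the window midpoint ct is an
    affine combination of the first-half background fields (indices 1..p) and all
    observations in the window (indices 0..q), weighted by omega and v plus an
    intercept term.  Assumed shapes: xb_first_half (p, d), yo_window (q+1, d),
    omega (p, d), v (q+1, d), bias (d,)."""
    return (omega * xb_first_half).sum(axis=0) + (v * yo_window).sum(axis=0) + bias
```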
As shown in FIG. 2 (assuming ct > 3), a chained MLP model is constructed to optimize the result of 3D-Var. Its working principle is explained as follows. In link 1, the first step is to replace the assimilation process with the first MLP, which is trained on historical and future data to produce x^{a1}_{ct}. The next step is to superpose x^{a1}_{ct} on the previous training data set and train the next MLP to obtain x^{b2}_{ct+1}; this is equivalent to simulating a prediction model for short-term forecasting within a DA period. Iterative training is then carried out, with the same steps as in link 1. It is worth noting that, in the links before ct, each link generates a new x^{a1}_t that replaces the previous x^b_t to form a new training data set, until the x^b_t before ct are completely replaced and the background information in the subsequent training data sets is provided only by x^{b2}_t. In other words, the MLP model becomes an independent, innovative assimilation system that no longer depends on 3D-Var data; 3D-Var only provides the first ct background fields x^b_t to "cold start" the proposed model. During the iterative training of the models, the continuously updated x^{b2}_t implicitly corrects the background error covariance, providing more accurate background information for the next analysis field.
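One link of this chain can be sketched as below. This is an assumed minimal training loop, not the patent's exact procedure: mlp_assim stands for the first MLP (assimilation) and mlp_fcst for the second MLP (short-term forecast); the tensor names, targets, optimizer choice, epoch count and learning-rate settings are illustrative.

```python
import torch
import torch.nn as nn

def train_link(mlp_assim, mlp_fcst, assim_inputs, assim_targets,
               fcst_inputs, fcst_targets, epochs=100, lr=1e-3, weight_decay=1e-5):
    """Sketch of one 'link': train the assimilation MLP against reference analyses,
    then feed its analysis into the training set of the forecast MLP, whose outputs
    would later replace the 3D-Var backgrounds before ct in the next link."""
    loss_fn = nn.MSELoss()
    opt_a = torch.optim.Adam(mlp_assim.parameters(), lr=lr, weight_decay=weight_decay)
    opt_f = torch.optim.Adam(mlp_fcst.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        # Step 1: the first MLP replaces the assimilation process.
        opt_a.zero_grad()
        loss_a = loss_fn(mlp_assim(assim_inputs), assim_targets)
        loss_a.backward()
        opt_a.step()
        # Step 2: the second MLP simulates the short-term forecast; its input is the
        # previous training set superposed with the analysis just produced.
        with torch.no_grad():
            xa_ct = mlp_assim(assim_inputs)
        opt_f.zero_grad()
        loss_f = loss_fn(mlp_fcst(torch.cat([fcst_inputs, xa_ct], dim=1)), fcst_targets)
        loss_f.backward()
        opt_f.step()
    return mlp_assim, mlp_fcst
```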
The optimal configuration of HDA-MLP for optimizing 3D-Var is determined by empirical tests; the configurations obtained for x^{a1}_{ct} and x^{b2}_{ct+1} are similar because their training data sets are nearly identical in size. Here x^{a1}_{ct} denotes the analysis field at moment ct obtained with the first MLP; x^{b2}_{ct+1} denotes the background field at moment ct+1 obtained with the second MLP; x^b_t denotes the background field obtained by the conventional data assimilation method 3D-Var before moment ct (for example, if ct = 6, then for t ∈ [7, 12] the background field is x^{b2}_t, obtained by the MLP method and distinguished from the background field x^b_t of conventional 3D-Var); x^{b2}_t denotes the background field at time t obtained with the second MLP; and x^{a1}_t denotes the analysis field at time t obtained with the first MLP.
One link in the model has the following features (a minimal sketch of such an MLP is given after this list):
1) the former input node is a vector obtained by superposing the background information before moment ct with the observations before moment n; the latter input node is another vector, which contains the previous training data set and one analysis result;
2) one output node is x^{a1}_t at time t and the other output node is x^{b2}_{t+1} at time t+1;
3) each MLP has 3 hidden layers, and the number of neurons in each hidden layer is determined experimentally;
4) ReLU is used as the activation function to obtain a non-linear result;
5) a learning rate with appropriate weight decay is defined in each MLP;
6) the loss function is the mean square error (MSE).
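A minimal PyTorch sketch of one such MLP, reflecting features 3) to 6); the hidden-layer widths are placeholders, since the patent states they are tuned experimentally.

```python
import torch.nn as nn

class HDAMLP(nn.Module):
    """Three hidden layers with ReLU, intended to be trained with an MSE loss and
    an optimizer that applies weight decay (e.g. Adam(weight_decay=...))."""
    def __init__(self, in_dim, out_dim, hidden=(64, 64, 64)):
        super().__init__()
        h1, h2, h3 = hidden  # placeholder sizes; tuned by experiment in the patent
        self.net = nn.Sequential(
            nn.Linear(in_dim, h1), nn.ReLU(),
            nn.Linear(h1, h2), nn.ReLU(),
            nn.Linear(h2, h3), nn.ReLU(),
            nn.Linear(h3, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```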
examples
1. Building a data set
In general, two classical nonlinear chaotic dynamical systems are used to carry out observing-system simulation experiments to check the effectiveness of a DA method. In the present invention, the proposed HDA-MLP method is validated using the Lorenz-63 and Lorenz-96 systems.
The Lorenz-63 system is assumed to be perfect with no model error, so the model error covariance matrix Q = 0. Given the initial state x_0 = 1.508870, y_0 = -1.537121, z_0 = 25.46091, the Lorenz-63 model is integrated in time with a fourth-order Runge-Kutta scheme using a time step dt = 0.01 for 111000 integration steps. Then, without assimilation, the model equations are integrated to obtain the true field (x^t); in a real atmospheric model, x^t generally denotes the ground truth of the state variables. Random errors obeying a Gaussian distribution are added to x^t to obtain an approximate observation field (y^o). Furthermore, a linear observation operator h = I and an observation error covariance matrix R = I are used, where I is the third-order identity matrix.
Similarly, under the same ICs, the pure 3D-Var DA system completes 111000 analysis cycles, generating the background field (x^b) and the analysis field (x^a). In order to reach the steady state of the Lorenz-63 system and obtain an approximately simulated atmosphere, a cold start over the first 1000 integration steps is needed; the data of the middle 100000 steps are used to prepare the training data set of HDA-MLP, and the data of the last 10000 steps are the test data. FIG. 3 shows the structure of the data preparation.
Furthermore, to avoid chance and randomness in the experiments, the same initial state x'_0 = 1.508870, y'_0 = -1.537121, z'_0 = 25.46091 is set for the 3D-Var DA system, which is the 101001-th chaotic state obtained by integrating 101000 steps from the initial state mentioned in the data preparation process. 10000 analysis cycles are then continued and repeated 50 times to prepare the test data set.
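The truth run and synthetic observations described above can be sketched as follows; the observation-noise standard deviation and the random seed are assumptions, since the patent only states that Gaussian errors are added with R = I.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Right-hand side of the Lorenz-63 equations (see Eq. (6) in the principles section).
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    # One fourth-order Runge-Kutta step.
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, n_steps = 0.01, 111000
xt = np.empty((n_steps + 1, 3))
xt[0] = np.array([1.508870, -1.537121, 25.46091])
for k in range(n_steps):                      # pure-Python loop; slow but explicit
    xt[k + 1] = rk4_step(lorenz63, xt[k], dt)

rng = np.random.default_rng(0)                       # seed is an arbitrary choice
yo = xt + rng.normal(scale=1.0, size=xt.shape)       # h = I, Gaussian errors, R = I
```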
In the Lorenz-96 system, the steps and procedures for generating the simulation data are the same as for the Lorenz-63 system, except that the number of state variables is increased from the 3 of the Lorenz-63 system to 40 and the corresponding initial states are different.
2. Evaluation index
In order to measure the assimilation and prediction performance of the model, the effect of the novel DA method was evaluated using Root Mean Square Error (RMSE) and Goodness of Fit (Goodness of Fit).
RMSE denotes the sample standard deviation of the difference between the analysis field x^a and the true value x^t, i.e. the square root of the ratio between the squared deviation of x^a from x^t and the number of observations N. RMSE is always non-negative, and a value of 0 indicates a perfect fit to the target data. When there is a deviation between x^a and the target x^t, an analysis error occurs. The role of RMSE is to combine the errors at different times into a single metric with analysis capability. Generally, a lower RMSE is better than a higher one, so RMSE is widely used in the DA research literature to compare the performance of different DA methods: the lower the RMSE, the better the performance of the DA method; conversely, the larger the analysis error of the DA method. In the present invention, RMSE is defined as:
RMSE_e = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (x^a_{i,e} - x^t_{i,e})^2 },
where i indexes the i-th state variable and e the e-th set member.
Goodness of fit refers to how well the regression line fits the observed values. The statistic measuring goodness of fit is the coefficient of determination R^2. The maximum value of R^2 is 1: the closer R^2 is to 1, the better the regression line fits the observed values; conversely, the smaller R^2, the worse the fit.
R^2 = 1 - \frac{RSS}{TSS},
TSS = \sum_{i} (y_i - \bar{y})^2,
RSS = \sum_{i} (y_i - \hat{y}_i)^2,
where the total sum of squares (TSS) is the total squared deviation of the samples and the residual sum of squares (RSS) is the sum of squared residuals, i.e. the sum of squared errors.
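These two metrics can be computed as in the following sketch; the function names are illustrative.

```python
import numpy as np

def rmse(xa, xt):
    """Root-mean-square error between analysis and truth (lower is better)."""
    xa, xt = np.asarray(xa), np.asarray(xt)
    return np.sqrt(np.mean((xa - xt) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - RSS/TSS (closer to 1 is better)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - rss / tss
```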
3. Selection of time step
From the DA cycle (FIG. 3) it can be seen that x^a_t is computed directly from the coupling of x^b_t and y^o_t at time t. In practice, however, x^b_t is the short-term prediction obtained by taking x^a_{t-1} as the IC of the prediction model, i.e. x^b_t is correlated with x^a_{t-1}, so x^a_t depends on x^a_{t-1} and y^o_t. In theory, these correlations also hold over longer periods of the DA cycle. The invention therefore defines the problem as a time-series problem that relies on historical meteorological data, including the observation fields and the background fields. It is common to solve time-series problems with ML methods, but the time step (timestep) is an important parameter for the model to learn temporal information. In general, the longer the time step, the more past and future information can be used for prediction; the key question is whether the accumulated error becomes larger than the accumulated useful information. Therefore, preliminary experiments were made to select the optimal time step for the subsequent research. As shown in FIGS. 4 and 5, in both the Lorenz-63 and Lorenz-96 systems, timestep 6+11 is better than the others, where 6 represents the historical y^o and x^b of the preceding 6 moments, providing past information, and the 5 future y^o are then used to correct the current x^a. A total of 6 x^b and 11 y^o together form the training data, the input of the neural network used to obtain x^a. The timestep 4+7 and 8+15 configurations are constructed on the same principle as 6+11, but compared with 6+11 the former carries less useful information and the latter accumulates more error, which is unfavorable for the following research. The invention therefore selects timestep 6+11 to construct the training data set of the proposed model.
4. Analysis of experiments
In this section, to demonstrate the validity and effectiveness of the proposed model, the Lorenz-63 system, a simple low-dimensional chaotic system, is taken as an example first. HDA-MLP is used to optimize the analysis field of the traditional DA method 3D-Var; the results are compared with those of the existing pure 3D-Var and Cache-MLP methods to analyze whether the optimization goals can be achieved; regression curves are drawn and the goodness of fit is calculated; the ability of each method to track the true field of the model is investigated; and the superiority of the proposed model is evaluated.
4.1 analysis of the results of the experiments in the Lorenz-63 System
For HDA-MLP, first, FIG. 6 compares the 50 analytical fields of the Lorenz-63 system calculated by different DA methods, including pure 3D-Var, Cache-MLP and HDA-MLP. In the figure, the solid line marked with a circle represents pure 3D-Var; the Cache-MLP is represented by a solid line marked by a triangle; solid lines with square marks represent HDA-MLP. Table 1 reports the average RMSE of 50 experiments per method (bold indicates best results). By comparison, the minimum RMSE of 0.2105 can be achieved by HDA-MLP. Although Cache-MLP improved the accuracy of 3D-Var by 39.85%, it could even be 55.26% higher using HDA-MLP. This is possible because the HDA-MLP can use a short-term future training data set to extract more temporal features and information.
Table 1: comparison of the results of the experiment of FIG. 6
Method RMSE Improvement
Pure 3D-Var 0.4705 /
Cache-MLP 0.2830 39.85%
HDA-MLP 0.2049 56.44%
Furthermore, it was found that, under the same y^o, the precision improvement of the x^b obtained by HDA-MLP is similar to that of x^a (Table 2). This is because HDA-MLP not only uses one MLP to optimize the assimilation process but also uses another MLP to simulate the model prediction process. The MLP training data set used for simulating model prediction contains historical data at many moments together with a more accurate analysis field. In the chained iterative training process, more temporal information can be mined and B can be updated implicitly, which helps to obtain a useful simulator. The experimental results show that the trained model can optimize the model prediction quickly and efficiently (FIG. 7).
Table 2: comparison of the results of the experiment of FIG. 7
Figure BDA0002822358760000121
In addition, the best way to judge whether a DA method can track the true model values well is to plot the tracking trajectories. As shown in FIG. 8, a rough qualitative analysis shows that every DA method can track the true values of all three model state variables (x, y, z). In the figure, the solid line is the model truth x^t, the dotted line is the x^a of pure 3D-Var, the dashed line is the x^a of Cache-MLP, and the dash-dotted line is the x^a of HDA-MLP. Table 3 shows that all methods achieve a goodness of fit better than 0.99 on every state variable, quantitatively indicating that each method fits the true values of the model well (bold represents the best results). However, the R^2 of HDA-MLP is slightly larger for every state variable, which means the model performance is better. Some partial sequences of the trajectories are enlarged to reflect the details of FIG. 8. As shown in FIG. 9, the "cold start" period is an important indicator of whether a DA method can bring the assimilation system into a steady state quickly. It can be seen that pure 3D-Var and Cache-MLP perform better than HDA-MLP here, because the former rely on a seemingly better static B, computed once from a large amount of meteorological data, while HDA-MLP starts from the static B but updates it implicitly to accommodate future assimilation and therefore requires a longer adaptation phase. Furthermore, in FIG. 10, all DA methods perform well on the monotonically increasing or decreasing trajectory portions (FIG. 10 (a) and FIG. 10 (f)), which is also verified by the R^2 values: in Table 4, R^2 is much higher than 0.99 (columns a and f). However, at some corners of the trajectories (FIG. 10 (b) and (e)), which represent abrupt changes of the state variables, pure 3D-Var cannot adapt, and Cache-MLP performs slightly better but still with obvious defects. In contrast, HDA-MLP is able to capture the changes and obtains the highest R^2 in most cases (bold in Table 4). The results show that the proposed model can learn the changes of the state variables from the time-series training data set and is more robust than pure 3D-Var and the numerical model, which are based on ideal assumptions and describe the changes of the state variables by integration.
Table 3: goodness of fit of various method state variables
Figure BDA0002822358760000131
Table 4: goodness of fit of partial trajectories for different methods
Figure BDA0002822358760000141
4.2 analysis of the results of the experiments in the Lorenz-96 System
For HDA-MLP, FIG. 11 shows a comparison of the results of 50 experiments between the various DA methods in the Lorenz-96 system (the notation is the same as for the Lorenz-63 system: pure 3D-Var is represented by the solid line marked with circles, Cache-MLP by the solid line marked with triangles, and HDA-MLP by the solid line marked with squares). The proposed model can improve the assimilation accuracy with a lower RMSE. However, the degree of performance improvement is relatively small, as shown in Table 5, which may be caused by the complexity of the Lorenz-96 system itself. As shown in FIG. 16, the Lorenz-96 model includes 40 state variables (x_1, x_2, ..., x_40) around the Earth, so it has more spatial features and therefore more spatial dependencies between state variables than the Lorenz-63 system. Only training data collected over a longer time step are considered in order to obtain more temporal information, while the construction of a better model for extracting the spatial features between variables is omitted; this may be the reason why the proposed model only slightly improves the accuracy. As shown in FIG. 12, the proposed model uses temporal information to simulate and optimize the assimilation and model prediction processes of the Lorenz-96 system, which has proven to be effective. Moreover, HDA-MLP achieves a considerable improvement of 28.61% in the background field of 3D-Var (Table 6); a more accurate x^b contributes greatly to optimizing the analysis field in the next assimilation. However, the improvement in the Lorenz-96 system is weaker than in the Lorenz-63 system owing to the accumulation of errors over more state variables.
Table 5: comparison of the results of the experiment of FIG. 11
Method RMSE Improvement
Pure 3D-Var 0.5283 /
Cache-MLP 0.4283 18.93%
HDA-MLP 0.4080 22.76%
Table 6: comparison of the results of the experiment of FIG. 12
Figure BDA0002822358760000151
Furthermore, the average RMSE reflects the assimilation ability of the various DA methods over all state variables in the Lorenz-96 system, whereas R^2 shows the degree of fit of each state variable. The average R^2 and the standard deviation (STD) over all 40 state variables are therefore calculated and plotted in FIG. 13 (dots: pure 3D-Var; circles on a solid line: HDA-MLP; crosses: Cache-MLP). Intuitively, HDA-MLP obtains a better R^2, and the R^2 of every state variable is close to 1; that is, HDA-MLP fits each state variable better. Meanwhile, the STD of HDA-MLP is smaller than that of the other methods (except the static pure 3D-Var), which shows that the results of HDA-MLP are more concentrated around the mean and the model is more stable. The proposed model is trained on big data containing a large number of state-variable changes, and the neural network can learn the complex relations and temporal characteristics among the state variables, so HDA-MLP can adapt to more test data sets. In general, compared with pure 3D-Var and Cache-MLP, the proposed model achieves the highest R^2 while keeping the DA system stable and can fit the state variables well.
5. Calculating the cost
DA is an iterative loop of assimilation and short-term prediction. Improving the quality of DA is important, but the speed of the DA process is also an essential factor in determining the utility and effectiveness of a hybrid DA method; the ultimate goal of DA is to obtain the best ICs for NWP quickly and accurately. The computing device used in all experiments of the invention was configured with a GeForce RTX 2060 and an Intel Core i7. Table 7 below shows the computational cost of each hybrid method under the different models.
Table 7: computational cost of various hybrid methods to different models
Figure BDA0002822358760000161
The previous hybrid approaches, Cache-MLP and backpacked-MLP, take the least time because after model training only a forward propagation is performed. However, they rely on the background field of pure 3D-Var, i.e. during testing it is necessary to wait for pure 3D-Var to complete the short-term prediction and supply the background field before further test experiments can proceed; this waiting time is not included in Table 7. In contrast, the HDA-MLP used to optimize the 3D-Var results only requires the first ct background fields (a certain number of time steps) for the "cold start". It then obtains background fields through its iterative loop and uses them to predict the analysis field of the next time step. In general, the proposed novel model can yield higher-quality results with an acceptable loss of time over a given number of integration steps.
Inspired by the concept of four-dimensional variational data assimilation, the invention provides an innovative DA method named HDA-MLP. In the present invention, the DA can be defined as a time series problem, relying on historical weather data comprising the observed field and the first guess field. Furthermore, as with 4D-Var, to obtain a more robust hybrid DA system, a short term prediction is added to the HDA-MLP to correct the results at the current time. Under the guidance of the time characteristics of the state variables, HDA-MLP does optimize the analysis field of the conventional DA system in two classical nonlinear dynamical models by implicitly updating the background error covariance. Specifically, for the Lorenz-63 system, the 3D-Var increased by 56.44%; for the Lorenz-96 system, the 3D-Var increased by 22.76%.
The principle of the invention is as follows:
1. three-dimensional variation data assimilation (3D-Var)
Based on Bayes' theorem, the variational approach has been the dominant operational approach to NWP since the 1990s. Typically, 3D-Var finds the analysis field by estimating a single initial state of the NWP model through minimization of a cost function L. The cost function L is expressed as:
L(x) = (1/2)(x - x^b)^T B^{-1} (x - x^b) + (1/2)(y^o - h(x))^T R^{-1} (y^o - h(x))    (1)
where x represents the state variable; x^b represents the prior knowledge of the state variable (also called the background field); y^o is the known observation; h is the observation operator, and h(x) computes the values at the observation points from the model variables at the grid points; B is the background error covariance matrix, which is derived from the known laws of atmospheric dynamics and physics and describes the correlation of the background field x^b across the model grid points; R is the observation error covariance matrix, which can be obtained from the instrument observation errors and describes the correlation between the observations y^o and the model grid points; and T denotes the matrix transpose.
The two terms of the cost function L are as follows:
L_b measures the distance between x and x^b, i.e. how well x fits x^b given the background error covariance B;
L_o measures the distance between x and y^o, i.e. how well x fits y^o given the observation error covariance R, where h is the observation operator that performs the necessary interpolation and converts the model variables into observation-space values.
The minimum of formula (1), i.e. the point at which the gradient of L vanishes as in formula (2), is obtained by the nonlinear conjugate gradient method:
∇L(x) = B^{-1}(x - x^b) - H^T R^{-1}(y^o - h(x)) = 0    (2)
where H is the tangent linear operator of the observation operator h, defined as:
H = ∂h/∂x    (3)
According to formula (2), x^a is the state that minimizes the function L and is solved from equation (4):
[B^{-1} + H^T R^{-1} H] x^a = B^{-1} x^b + H^T R^{-1} y^o    (4)
the optimal analysis field is then:
x^a = [B^{-1} + H^T R^{-1} H]^{-1} [B^{-1} x^b + H^T R^{-1} y^o]
    = [B^{-1} + H^T R^{-1} H]^{-1} [B^{-1} x^b + H^T R^{-1} h(x^b) - H^T R^{-1} h(x^b) + H^T R^{-1} y^o]    (5)
    = x^b + [B^{-1} + H^T R^{-1} H]^{-1} H^T R^{-1} (y^o - h(x^b))
where h(x^b) performs the necessary interpolation and converts the model variable x^b into the values of the corresponding observation-space field.
Minimizing the cost function L can be regarded as an inverse problem. Constructing and inverting the error covariance matrices B and R is one of the difficulties of three-dimensional variational assimilation: inverting the high-dimensional matrices B and R is very demanding in computing resources and storage space. When this problem is actually solved, equation (5) is therefore not evaluated directly; instead, gradient information of the objective function is obtained through an appropriate transformation, and an optimization method (such as the Newton, quasi-Newton, BFGS or L-BFGS method) is then used to iteratively approximate the solution of equation (5). A minimal sketch of such an iterative solution is given below.
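The following sketch minimizes the cost (1) with its gradient (2) using SciPy's L-BFGS-B routine. It assumes small dense matrices (so B and R can be inverted explicitly) and a caller-supplied tangent linear matrix H, which is exact only for a linear observation operator.

```python
import numpy as np
from scipy.optimize import minimize

def threedvar_analysis(xb, yo, B, R, h, H):
    """Iteratively approximate the 3D-Var analysis instead of evaluating Eq. (5)
    directly: minimize Eq. (1) with the analytic gradient Eq. (2)."""
    B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)

    def cost(x):
        dxb, dy = x - xb, yo - h(x)
        return 0.5 * dxb @ B_inv @ dxb + 0.5 * dy @ R_inv @ dy

    def grad(x):
        return B_inv @ (x - xb) - H.T @ R_inv @ (yo - h(x))

    res = minimize(cost, xb, jac=grad, method="L-BFGS-B")
    return res.x

# Identity operators, as in the Lorenz-63 twin experiment (h = I, R = I).
I3 = np.eye(3)
xa = threedvar_analysis(np.zeros(3), np.ones(3), I3, I3, lambda x: x, I3)
```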
2. Multilayer perceptron (MLP)
Inspired by the human brain, F. Rosenblatt made a simple abstraction of biological nerve cells and designed the first computer neural network, the perceptron. The perceptron is a linear classifier: for a linearly separable data set, a good linear decision function can be fitted by the perceptron through learning. However, the expressive capability of a perceptron model with a single neuron is not sufficient to solve problems more complex than simple linear regression. The multilayer perceptron has a fully connected structure, and every node except those of the input layer can be regarded as a processing unit with a non-linear activation function; if every activation function were linear, the MLP in any layer would reduce to a perceptron. In other words, the MLP is a classical deep neural network and a universal class of approximators: given enough hidden units and an appropriate activation function, an MLP with at least one hidden layer can approximate any measurable function f(x).
An example of the construction of the sensor and MLP is shown in fig. 14 and 15.
3. Classical nonlinear dynamic model
3.1 Lorenz-63 System-Low dimensional chaos model
The Lorenz-63 system is a simplified mathematical model of atmospheric convection. The nonlinear Lorenz three-variable model system has the advantages of simple calculation and strong nonlinear interaction between variables. The state evolution in the Lorenz-63 system can be given by the following three general differential equations:
ẋ = σ (y - x),    ẏ = x (ρ - z) - y,    ż = x y - β z    (6)
where the dots denote derivatives with respect to time. Integrating the model with the typical parameter values described in the classical Lorenz paper, σ = 10, β = 8/3 and ρ = 28, yields the chaotic dynamical system.
The equation relates to the behavior of a two-dimensional fluid layer that heats up uniformly from the bottom up and cools down uniformly from the top down. In particular, the equation describes the rate of change of three quantities with time: x is proportional to the convection velocity, y is proportional to the horizontal temperature change, and z is proportional to the vertical temperature change.
3.2 Lorenz-96 System-high-dimensional chaos model
Lorenz-96 is a more complex nonlinear dynamical system than Lorenz-63. Its governing function can be defined as equation (7), in which the first term on the right is the advection term and the second term is the damping term. Lorenz-96 is a periodic system that is commonly used to model the time evolution of atmospheric variables.
dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F,    j = 1, ..., J    (7)
where J is the number of state variables, which are assumed to be uniformly distributed along the Earth's equator, and F is an external forcing constant set to 8, a commonly used value that produces chaotic behavior. FIG. 16 shows the connected true states of the 40 state variables of the Lorenz-96 model at a given time for J = 40.
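A sketch of the right-hand side of equation (7) with cyclic indexing; the 0.01 perturbation used to start the trajectory is an arbitrary choice for illustration.

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Right-hand side of Eq. (7) for J state variables with periodic indexing."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

x0 = 8.0 * np.ones(40)   # J = 40, rest state x_j = F
x0[0] += 0.01            # small perturbation to trigger chaotic evolution
dxdt = lorenz96(x0)
```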
4. Problem definition
In operational systems, the short-term forecast is used as the initial guess field (x^b) and combined with new observations (y^o) to obtain the analysis field (x^a); this is referred to as the data assimilation analysis cycle (see FIG. 17). Here y^o is obtained by inverting, with empirical functions, the real-time measurements of different instruments, and the prediction model takes the current x^a as the IC for a short-term forecast, giving a new x^b. A complete analysis cycle thus consists of assimilation and model prediction, repeated in turn.
In 3D-Var, the goal is to find the best weights for two sources within a single time t, as in equation (8).
x^a_t = f(x^b_t, y^o_t)    (8)
Based on the first-order Markov assumption, traditional DA can be defined as a regression problem that relies only on the variables at a single time. However, the current x^a is subject to historical data at multiple moments, which redefines DA as a time-series problem. The invention further considers that the future short-term prediction x^b evolves from the previous x^a, so that, according to the dynamic property of DA, the independently measurable future short-term y^o, together with the simultaneously corrected x^b, can be used to correct the current x^a. An obvious difficulty for the model is finding a way to integrate the historical x^b and y^o with the future y^o so that the new HDA-MLP model obtains a more accurate x^a. Thus, the assimilation process in a single DA method can be defined as a regression problem over multiple time-varying variables, and the model can be represented by equation (9).
x^a = f(historical(x^b, y^o), future(y^o))    (9)
Where f is the DA model learned from the training data set.
In addition, after x^a is obtained, the next step of DA is short-term prediction by the prediction model. These prediction models are known to be semi-empirical mathematical partial differential equations and can also be regarded as a regression problem. However, solving partial differential equations with mathematical-physics methods is difficult and time-consuming. In order to speed up the model prediction process, the universal approximator MLP is used to simulate and replace the prediction model.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. An intelligent data assimilation method based on a variational assimilation framework, characterized by comprising the following steps:
S1, analyzing the data in the first half of the assimilation window and correcting, with the data in the second half, the analysis field at the middle of the assimilation time window [t_1, t_n] by the formula
x^a_{ct} = ω x^b_{[1,p]} + v y^o_{[0,q]} + bias,
thereby determining the analysis field x^a_{ct} at the current moment, where q = n - 1; n (n = 1, 2, 3, ...) is the length of the assimilation time window; ct is the middle moment of the assimilation time window; [1, p] is the index interval of the first-half data and [0, q] corresponds to the time window [t_1, t_n]; ω and v are the weight matrices of x and y respectively; bias is the intercept term; x^b is the background field and y^o is the known observation;
S2, replacing the assimilation process with a first multilayer perceptron, which is trained on historical and future data to produce x^{a1}_{ct}, where x^{a1}_{ct} represents the analysis field at moment ct obtained using the first MLP;
S3, superposing x^{a1}_{ct} on the previous training data set and training the next multilayer perceptron to obtain x^{b2}_{ct+1}, which is equivalent to simulating the prediction model for short-term prediction within the data assimilation period, where x^{b2}_{ct+1} represents the background field at moment ct+1 obtained using the second MLP;
S4, carrying out iterative training by repeating the processes of S2 and S3, wherein each link before ct generates a new x^{a1}_t that replaces the previous x^b_t to form a new training data set, until the x^b_t before ct are completely replaced and the background information in the subsequent training data sets is provided only by x^{b2}_t, where x^b_t represents the background field obtained by the conventional data assimilation method 3D-Var before moment ct (for example, if ct = 6, then for t ∈ [7, 12] the background field x^{b2}_t is obtained by the MLP method), and x^{b2}_t represents the background field at time t obtained using the second MLP.
2. The method of claim 1, wherein ct > 3.
3. The method of claim 1, wherein the former input node of S2 and S3 is a vector obtained by superposing the background information before moment ct with the observations before moment n; the latter input node is another vector, which contains the previous training data set and one analysis result.
4. The method of claim 1, wherein one output node of S2 and S3 is x^{a1}_t at time t and the other output node is x^{b2}_{t+1} at time t+1, where x^{a1}_t represents the analysis field at time t obtained using the first MLP.
5. The method of claim 1, wherein each multilayer perceptron in S2 and S3 has 3 hidden layers, and the number of neurons in each hidden layer is obtained through experiments.
6. The method of claim 1 wherein the results of S2 and S3 are nonlinear with ReLU as the activation function.
7. The method of claim 1, wherein a learning rate with appropriate weight decay is defined in each multilayer perceptron in S2 and S3.
8. The method of claim 1 wherein the loss function in S2 and S3 is Mean Square Error (MSE).
CN202011420985.5A 2020-12-08 2020-12-08 Intelligent data assimilation method based on variational and assimilative framework Active CN112464567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011420985.5A CN112464567B (en) 2020-12-08 2020-12-08 Intelligent data assimilation method based on variational and assimilative framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011420985.5A CN112464567B (en) 2020-12-08 2020-12-08 Intelligent data assimilation method based on variational and assimilative framework

Publications (2)

Publication Number Publication Date
CN112464567A true CN112464567A (en) 2021-03-09
CN112464567B CN112464567B (en) 2022-08-12

Family

ID=74801674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011420985.5A Active CN112464567B (en) 2020-12-08 2020-12-08 Intelligent data assimilation method based on variational and assimilative framework

Country Status (1)

Country Link
CN (1) CN112464567B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101471378B1 (en) * 2013-11-18 2014-12-10 재단법인 한국형수치예보모델개발사업단 Spectral transformation method for three-dimension variational data assimilation of numerical weather prediction model using cubed-sphere grid based on spectral element method and hardware device performing the same
US20170338802A1 (en) * 2014-12-01 2017-11-23 Harbin Engineering University Actually-measured marine environment data assimilation method based on sequence recursive filtering three-dimensional variation
CN110211325A (en) * 2019-06-27 2019-09-06 上海同望信息技术有限公司 A kind of area road icing high precision monitor early warning system based on meteorological big data
CN111783361A (en) * 2020-07-07 2020-10-16 中国人民解放军国防科技大学 Numerical weather forecast mixed data assimilation method based on triple multi-layer perceptron

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609790A (en) * 2021-10-11 2021-11-05 成都数联云算科技有限公司 Product virtual measuring method, system, device and medium
CN113609790B (en) * 2021-10-11 2021-12-03 成都数联云算科技有限公司 Product virtual measuring method, system, device and medium
CN115630566A (en) * 2022-09-28 2023-01-20 中国人民解放军国防科技大学 Data assimilation method and system based on deep learning and dynamic constraint

Also Published As

Publication number Publication date
CN112464567B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
CN105205313B (en) Fuzzy Gaussian sum particle filtering method and device and target tracking method and device
CN111461453B (en) Medium-and-long-term runoff ensemble forecasting method based on multi-model combination
CN110689179A (en) Water bloom prediction method based on space-time sequence mixed model
CN111783361B (en) Numerical weather forecast mixed data assimilation method based on triple multi-layer perceptron
CN112464567B (en) Intelligent data assimilation method based on variational and assimilative framework
CN109635245A (en) A kind of robust width learning system
CN113269314B (en) New energy power generation scene data migration method based on generation countermeasure network
CN115935834A (en) History fitting method based on deep autoregressive network and continuous learning strategy
Huang et al. A data-driven method for hybrid data assimilation with multilayer perceptron
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN112989711B (en) Aureomycin fermentation process soft measurement modeling method based on semi-supervised ensemble learning
CN112381282B (en) Photovoltaic power generation power prediction method based on width learning system
CN115952685B (en) Sewage treatment process soft measurement modeling method based on integrated deep learning
CN116502539B (en) VOCs gas concentration prediction method and system
CN108876038A (en) Big data, artificial intelligence, the Optimization of Material Property method of supercomputer collaboration
Springer et al. Robust parameter estimation of chaotic systems
CN110852415B (en) Vegetation index prediction method, system and equipment based on neural network algorithm
CN115796327A (en) Wind power interval prediction method based on VMD (vertical vector decomposition) and IWOA-F-GRU (empirical mode decomposition) -based models
CN115018137A (en) Water environment model parameter calibration method based on reinforcement learning
CN115545159A (en) Average sea surface temperature forecasting method of deep neural network
CN112581311B (en) Method and system for predicting long-term output fluctuation characteristics of aggregated multiple wind power plants
CN115270239A (en) Bridge reliability prediction method based on dynamic characteristics and intelligent algorithm response surface method
Silva et al. GAN for time series prediction, data assimilation and uncertainty quantification
Xiao et al. FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant