CN117809203B

CN117809203B - Multi-task continuous learning cross-sea area tropical cyclone strength estimation method

Info

Publication number: CN117809203B
Application number: CN202410217961.1A
Authority: CN
Inventors: 丁嘉慕; 杭仁龙; 刘青山
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2024-02-28
Filing date: 2024-02-28
Publication date: 2024-05-14
Anticipated expiration: 2044-02-28
Also published as: CN117809203A

Abstract

The invention discloses a multi-task continuous learning method for cross-sea-area tropical cyclone intensity estimation, comprising the following steps: constructing a tropical cyclone intensity estimation model that comprises a sea area aggregation residual module and a wind-pressure task module connected to each other, wherein a single-frame tropical cyclone image formed by concatenating an infrared channel and a water vapor channel is the input of the model, and the maximum sustained wind speed and the minimum air pressure obtained by neural network regression are its outputs; and inputting single-frame infrared and water vapor data together with a designated sea area ID into the trained model, which outputs the corresponding maximum sustained wind speed. The method can be applied to multiple sea areas at the same time and is therefore highly practical.

Description

Multi-task continuous learning cross-sea area tropical cyclone strength estimation method

Technical Field

The invention belongs to the technical field of cross-sea area tropical cyclone analysis, and particularly relates to a multi-task continuous learning cross-sea area tropical cyclone intensity estimation method.

Background

Tropical cyclones can cause many secondary disasters, such as floods, tornadoes, debris flows, and storm surges. Accurate prediction of tropical cyclone intensity is critical to disaster prevention and decision-making. There are currently three mainstream approaches: methods based on mathematical-physical model analysis, methods based on statistical analysis, and methods based on deep learning. Mathematical-physical methods build an assessment model from mathematical modeling and physical laws. Numerical weather prediction (NWP) models dominate this field; ECMWF, GFS, and similar global forecast models play an important role in weather forecasting. However, the results of such methods are strongly affected by the initial values, can be unstable, are slow to start, and consume large amounts of computing resources. The most representative statistical method is the Dvorak technique, which assumes that tropical cyclones of comparable intensity exhibit similar patterns. This method is highly subjective and depends on expert experience: an expert must provide the center location of the tropical cyclone and then examine the intensity change over the past 24 hours to determine the current intensity.

In recent years, with the rapid growth of computing power and data volume, more and more researchers have fed meteorological data into data-driven neural networks and combined them with meteorological expertise to estimate tropical cyclone intensity accurately. Most existing methods assume that the training data and the test data follow the same distribution. However, sea areas differ greatly in geographic location, in the distribution of tropical cyclone intensity categories, and in satellite parameters, so performance degrades significantly when a model is applied to other sea areas.

Disclosure of Invention

Technical problem to be solved: to address this domain shift, the invention provides a multi-task continuous learning method for cross-sea-area tropical cyclone intensity estimation. On the one hand, it combines continuous learning with tropical cyclone intensity estimation so that the estimation model can adapt to the different distributions of different sea areas at the same time. On the other hand, because a strong nonlinear correlation exists between the maximum sustained wind speed and the minimum air pressure, the method designs multi-task learning and uses the parameter-sharing part to fit the relationship between the two, thereby improving the model's ability to extract features from satellite cloud images for intensity estimation.

The technical scheme is as follows:

A multi-task continuous learning cross-sea area tropical cyclone strength estimation method, comprising the steps of:

collecting GridSat satellite data of a plurality of sea areas, cropping the data, and constructing a training set and a test set after labeling the maximum sustained wind speed and the minimum air pressure;

constructing a tropical cyclone intensity estimation model that comprises a sea area aggregation residual module and a wind-pressure task module connected to each other, wherein a single-frame tropical cyclone image formed by concatenating an infrared channel and a water vapor channel is the input of the model, and the maximum sustained wind speed and the minimum air pressure obtained by neural network regression are its outputs; the sea area aggregation residual module serves as the hard parameter sharing part in multi-task learning and comprises a domain-shared layer and a plurality of domain-specific layers, wherein the domain-shared layer extracts the features common to all sea area images; the domain-specific layers correspond to the sea area IDs, and each domain-specific layer extracts the unique features of one sea area; the wind-pressure task module comprises a domain-specific layer;

training and testing the tropical cyclone intensity estimation model with the training set and the test set, respectively;

and inputting single-frame infrared and water vapor data and a designated sea area ID into the trained tropical cyclone intensity estimation model, which outputs the corresponding maximum sustained wind speed.

Further, the process of acquiring and cropping GridSat satellite data for a plurality of sea areas comprises the following steps:

performing linear interpolation on the label data so that it matches the original infrared and water vapor image data in the time dimension;

screening the label data to retain only the labels that contain both the maximum sustained wind speed and the minimum air pressure, and screening the original image data with the retained labels;

extending 64 pixels up, down, left, and right from the cyclone center of the screened original image data, and cropping to form a 128×128 image;

and concatenating the cropped infrared channel and water vapor channel images to form an input image.
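The cropping and concatenation steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the nested-list image representation, the 300×300 synthetic grids, and the choice to raise an error when the window leaves the image are all assumptions made for the example.

```python
def crop_around_center(image, center_row, center_col, half=64):
    """Return the (2*half) x (2*half) window centered on the cyclone center."""
    top, left = center_row - half, center_col - half
    size = 2 * half
    # Assumption: windows that leave the image are rejected rather than padded.
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("crop window falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]


def build_input(ir_image, wv_image, center_row, center_col):
    """Stack the cropped IR and WV windows into a 2 x 128 x 128 nested list."""
    return [
        crop_around_center(ir_image, center_row, center_col),
        crop_around_center(wv_image, center_row, center_col),
    ]


# Hypothetical single-channel grids filled with synthetic values.
ir = [[float(r + c) for c in range(300)] for r in range(300)]
wv = [[float(r - c) for c in range(300)] for r in range(300)]
sample = build_input(ir, wv, center_row=150, center_col=150)
print(len(sample), len(sample[0]), len(sample[0][0]))
```

The first axis of `sample` holds the two channels, so the result has the two-channel 128×128 layout that the method uses as model input.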

Further, the domain-shared layer uses ResNet as its backbone network and comprises a convolution layer, a pooling layer, a skip connection layer, a batch normalization layer, and an activation function connected in sequence; the domain-specific layer uses a parallel residual adapter as its backbone network and consists of a convolution layer, a domain-specific batch normalization (BN) layer, and a domain-specific skip connection layer, wherein the skip connection in the domain-specific layer is a learnable parameter, and each sea area learns its own skip connection parameter to control the parameters of the whole network and thereby reflect the differences between sea areas.

Further, the sea area aggregation residual module extracts high-level semantic features from the input two-channel image, the extracted features comprising the features common to all sea areas and the features unique to each sea area; the sea area aggregation residual module consists of all domain-shared layers and part of the domain-specific layers, and specifically comprises a convolution layer, a channel attention layer, a first shared convolution layer, a first domain-specific skip connection layer, a first domain-specific BN (batch normalization) layer, a second shared convolution layer, a second domain-specific skip connection layer, a second domain-specific BN layer, and a ReLU layer;

the convolution layer extracts features from the input two-channel image, and the shared channel attention layer learns the importance of the different channels for intensity estimation; the features extracted by the convolution layer are fed into the first domain-specific skip connection layer and the first shared convolution layer, which extract the deeper unique features of the current sea area and the common features, respectively, and the first domain-specific BN layer normalizes them to obtain the initial unique features of the current sea area and the common features of the previously learned sea areas;

and the second shared convolution layer and the second domain-specific skip connection layer extract further deep features from the initial unique features of the current sea area and the common features of the previously learned sea areas obtained through the first domain-specific BN layer; after the second domain-specific BN layer normalizes them again, the normalized features of the current sea area are passed to the ReLU activation layer, yielding the high-level semantic information of the current sea area contained in the input two-channel image.

Further, the cross-sea area tropical cyclone strength estimation method further comprises the following steps:

the wind-pressure module outputs two estimated intensity values, and a regression loss $L_{reg}$ is computed between them and the labels $y_t$;

the data of the current sea area is input into each of the previous t-1 old models, and their estimates are used as pseudo labels; a distillation loss $L_{KD}$ is computed between each pseudo label and the estimate of the stage-t model;

and the domain-specific layer and the domain-shared layer are updated with the sum of the two losses, $L_{reg} + L_{KD}$.

The beneficial effects are that:

First, by introducing continuous learning into tropical cyclone intensity estimation, the method enables the tropical cyclone intensity estimation model to adapt to the different distributions of different sea areas at the same time.

Second, the method designs multi-task learning and uses the parameter-sharing part to fit the relationship between the maximum sustained wind speed and the minimum air pressure, thereby improving the model's ability to extract features from satellite cloud images for intensity estimation.

Third, the method can be applied to multiple sea areas at the same time, is practical and innovative, and performs well on multiple sea areas.

Drawings

FIG. 1 is a flowchart of the multi-task continuous learning cross-sea-area tropical cyclone intensity estimation method of the present invention;

FIG. 2 is a diagram of an overall network architecture of a tropical cyclone intensity estimation model of the present invention;

FIG. 3 is a schematic diagram of a sea area aggregation residual module according to the present invention;

FIG. 4 is a diagram showing the comparison result of the CB-TCIE model with other methods;

FIG. 5 shows the results of ablation experiments on the water vapor channel and multi-task learning with the maximum sustained wind speed as the intensity value;

FIG. 6 shows the results of ablation experiments on the water vapor channel and multi-task learning with the minimum air pressure as the intensity value.

Detailed Description

The following examples will provide those skilled in the art with a more complete understanding of the invention, but are not intended to limit the invention in any way.

Referring to fig. 1, the invention discloses a multi-task continuous learning cross-sea area tropical cyclone strength estimation method, which comprises the following steps:

As shown in fig. 1, the invention uses a single-frame tropical cyclone image of size $2 \times H \times W$ as the input of the tropical cyclone intensity estimation model, where 2 is the number of GridSat channels, namely infrared (IR) and water vapor (WV), and $H \times W$ is the height and width of the image, 128 and 128 respectively. The output is the tropical cyclone intensity value. The overall framework has two parts: the left half is the backbone feature extraction part, namely the sea area aggregation residual module, and the right half is the wind-pressure task part. The sea area aggregation residual module consists of domain-specific layers and a domain-shared layer: the domain-specific layers extract the unique features of each sea area, and the domain-shared layer extracts the knowledge common to all sea areas. The model can therefore learn new knowledge while limiting the forgetting of old knowledge. The module as a whole also serves as the hard parameter sharing part in multi-task learning. Because a strong nonlinear physical correlation exists between the maximum sustained wind speed and the minimum air pressure, the nonlinear fitting capacity of a CNN is well suited to modeling this relationship, which improves performance on both tasks.

Preprocessing of the input data: the label data is first linearly interpolated to 3-hour intervals so that it matches the original image data in the time dimension. The label data is then screened to retain only the labels that contain both the maximum sustained wind speed and the minimum air pressure; the original data is screened with the retained labels, and poor-quality images, such as those with many null or abnormal values, are removed. Next, the cyclone center given in the label is located in the satellite image, and the image is extended 4.5 degrees, that is, 64 pixels, up, down, left, and right from that position to form a 128×128 image. Finally, the infrared and water vapor channel data are concatenated to form the input image. In this embodiment, GridSat satellite data is collected and cropped; images from 2000 to 2017 are used as the training set, and images from 2018 to 2021 are used as the test set.
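The label interpolation and the year-based split described above might look like the following sketch. It assumes, purely for illustration, that the source labels arrive at 6-hour intervals with times given in whole hours; the function names and the sample values are hypothetical and not taken from the patent.

```python
def interpolate_to_grid(times_h, values, step_h=3):
    """Linearly interpolate (time, value) labels onto a regular step_h-hour grid."""
    if len(times_h) != len(values) or len(times_h) < 2:
        raise ValueError("need at least two matching time/value pairs")
    result = []
    segment = 0
    t = times_h[0]
    while t <= times_h[-1]:
        # Advance to the label interval that contains time t.
        while times_h[segment + 1] < t:
            segment += 1
        t0, t1 = times_h[segment], times_h[segment + 1]
        weight = (t - t0) / (t1 - t0)
        value = values[segment] + weight * (values[segment + 1] - values[segment])
        result.append((t, value))
        t += step_h
    return result


def assign_split(year):
    """Year ranges from the embodiment: 2000-2017 train, 2018-2021 test."""
    if 2000 <= year <= 2017:
        return "train"
    if 2018 <= year <= 2021:
        return "test"
    return None


# Hypothetical 6-hourly labels: wind speed and pressure at hours 0, 6, 12.
wind_3h = interpolate_to_grid([0, 6, 12], [35.0, 45.0, 65.0])
pressure_3h = interpolate_to_grid([0, 6, 12], [1000.0, 990.0, 970.0])
print(wind_3h)
```

Interpolating both labels on the same grid keeps the wind speed and pressure values aligned, which the later screening step relies on when it keeps only records that contain both.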

The infrared and water vapor channels play different and important roles in tropical cyclone intensity estimation. The infrared channel provides basic information for intensity estimation, such as cloud-top temperature, convective cloud features, and eye structure, so its data indirectly reflects the intensity and nature of the tropical cyclone. The water vapor channel provides information on humidity, convection characteristics, latent heat, and the like, so its data helps researchers understand the atmospheric conditions and the moisture distribution of the tropical cyclone.

To make the tropical cyclone intensity estimation model practical, domain-incremental learning is introduced into tropical cyclone intensity estimation, and a domain-shared layer and a domain-specific layer are designed. The domain-shared layer uses ResNet as its backbone network to extract the high-level semantic features common to all sea area images; it comprises a convolution layer, a pooling layer, a skip connection layer, a batch normalization layer, and an activation function connected in sequence. The domain-specific layer uses the Parallel Residual Adapter (PRA) as its backbone network. Its structure is similar to that of ResNet and consists of a 3×3 convolution layer, a domain-specific batch normalization (BN) layer, and a domain-specific skip connection layer; the model learns a separate set of specific-layer parameters for each sea area to reflect that sea area's unique characteristics. The domain-specific layer differs from the domain-shared layer in that its skip connection is a learnable parameter and its BN layer is specific to the sea area, so each sea area learns its own skip connection parameter to control the parameters of the whole network and its own batch normalization, which captures the differences between sea areas.
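To make the split between shared and sea-area-specific parameters concrete, the following toy sketch applies one shared convolution to a 1-D signal, adds a skip connection scaled by a per-sea-area parameter, and normalizes with per-sea-area scale and shift. It is a deliberately simplified stand-in, not the patented network: the 1-D signal, the class and method names, the sea area identifiers, and the exact way the skip term is combined with the convolution output are all assumptions.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution with a 3-tap kernel (stands in for a 3x3 conv)."""
    padded = [0.0] + list(signal) + [0.0]
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


class SeaAreaBlock:
    """One shared convolution plus per-sea-area skip scale and normalization."""

    def __init__(self, shared_kernel):
        self.shared_kernel = shared_kernel  # shared by every sea area
        self.skip_scale = {}                # one learnable scalar per sea area
        self.norm = {}                      # per-sea-area (gamma, beta)

    def add_sea_area(self, sea_id, skip_scale=1.0, gamma=1.0, beta=0.0):
        self.skip_scale[sea_id] = skip_scale
        self.norm[sea_id] = (gamma, beta)

    def forward(self, signal, sea_id):
        alpha = self.skip_scale[sea_id]
        gamma, beta = self.norm[sea_id]
        conv = conv1d_same(signal, self.shared_kernel)
        # Shared features plus the sea-area-specific scaled skip connection.
        mixed = [c + alpha * x for c, x in zip(conv, signal)]
        mean = sum(mixed) / len(mixed)
        var = sum((m - mean) ** 2 for m in mixed) / len(mixed)
        normed = [gamma * (m - mean) / math.sqrt(var + 1e-5) + beta for m in mixed]
        return [max(0.0, n) for n in normed]  # ReLU


block = SeaAreaBlock(shared_kernel=[0.25, 0.5, 0.25])
block.add_sea_area("northwest_pacific", skip_scale=1.0)
block.add_sea_area("east_pacific", skip_scale=0.0, gamma=1.5, beta=0.1)
signal = [1.0, 2.0, 3.0, 4.0]
out_nwp = block.forward(signal, "northwest_pacific")
out_ep = block.forward(signal, "east_pacific")
```

The same input and the same shared kernel give different outputs for the two sea areas because only the small per-sea-area parameters differ, which is the mechanism the paragraph above relies on to keep the number of added parameters per sea area low.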

The tropical cyclone intensity estimation model is divided into two parts, a sea area aggregation residual module and a wind-pressure task module, as shown in fig. 2. The sea area aggregation residual module consists of a domain-shared layer and domain-specific layers; the wind-pressure task module consists of domain-specific layers. Following the mathematical relationship between the maximum sustained wind speed and the minimum air pressure in meteorology, the nonlinear fitting capacity of the CNN is used to fit this relationship so that the two tasks reinforce each other. The input image data is fed into the sea area aggregation residual module to extract high-level semantic information, which is passed to the wind-pressure task module to output the intensity estimates of the two tasks. The frozen module in fig. 2 is the model obtained in the previous training stage; its parameters are not updated in the current stage, which prevents the domain-shared layer from adapting too quickly to the current sea area and excessively forgetting the knowledge learned from previous sea areas. The training module adds a domain-specific layer for the current sea area on top of the frozen module and acquires the common knowledge and the specific knowledge of the current sea area by training the domain-shared layer and the current domain-specific layer.

The sea area aggregation residual module is shown in fig. 3. It extracts high-level semantic features from the input two-channel image, the extracted features comprising the features common to all sea areas and the features unique to each sea area. The module consists of all domain-shared layers and part of the domain-specific layers, and specifically comprises a convolution layer, a channel attention layer, a first shared convolution layer, a first domain-specific skip connection layer, a first domain-specific BN layer, a second shared convolution layer, a second domain-specific skip connection layer, a second domain-specific BN layer, and a ReLU layer. The convolution layer extracts features from the input two-channel image, and the shared channel attention layer learns the importance of the different channels for intensity estimation. The features extracted by the convolution layer are fed into the first domain-specific skip connection layer and the first shared convolution layer, which extract the deeper unique features of the current sea area and the common features, and the first domain-specific BN layer normalizes them to obtain the unique features of the current sea area and the common features of the previously learned sea areas. The second shared convolution layer and the second domain-specific skip connection layer extract further deep features from the output of the first domain-specific BN layer; after the second domain-specific BN layer normalizes them again, the normalized features of the current sea area are passed to the ReLU activation layer to obtain the high-level semantic information of the current sea area contained in the input two-channel image.

The specific training process of the sea area aggregation residual module is described below, taking the t-th training stage as an example. First, the model is initialized from the model of stage t-1, and the domain-specific layer of stage t is initialized from the domain-specific layer of stage t-1. One domain-specific layer is added each time a sea area is added, so at training stage t there are t-1 trained domain-specific layers whose parameters are frozen (dotted connections), 1 domain-specific layer to be trained (solid connections), and 1 domain-shared layer to be trained (solid connections). The two-channel data is fed into the convolution layer, and channel attention is applied to learn the importance of the different channels for intensity estimation. The features are then fed into the model for the current sea area, which consists of the domain-shared layer $W_s$ and the domain-specific skip connection layer $\alpha^w$, to extract deeper features, and the features of the current sea area are normalized by the sea-area-specific batch normalization (BN) layer $\alpha^b$. Deep features are then extracted further, and the final high-level semantic information is obtained through the ReLU activation layer. The wind-pressure relationship is also fitted through this module. Finally, the high-level semantic information from the sea area aggregation residual module is fed into the wind-pressure task module in fig. 2, which outputs two intensity estimates. The regression loss $L_{reg}$ between these estimates and the labels $y_t$ updates the specific layer and the shared layer, ensuring the model's ability to acquire new knowledge (plasticity). Meanwhile, the data of the current sea area is fed into each of the previous t-1 old models, and their estimates serve as pseudo labels. The distillation loss $L_{KD}$ between each pseudo label and the estimate of the stage-t model ensures that the model does not excessively forget old knowledge while learning new knowledge (stability). Training with the sum of the two losses, $L_{reg} + L_{KD}$, balances plasticity and stability.
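The combination of regression and distillation losses can be illustrated with the sketch below. The text fixes RMSE as the regression loss but the exact form of the distillation term is not recoverable from this text, so the sketch assumes, as a labeled simplification, that it is the RMSE between the current estimates and each old model's pseudo labels, summed over the old models. It handles one output (for example wind speed); the function names and numbers are hypothetical.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def total_loss(new_pred, labels, old_model_preds):
    """Return (L_reg + L_KD, L_reg, L_KD) for one batch of a single output."""
    l_reg = rmse(new_pred, labels)
    # Assumption: one RMSE term per old model, added together.
    l_kd = sum(rmse(new_pred, pseudo) for pseudo in old_model_preds)
    return l_reg + l_kd, l_reg, l_kd


# Hypothetical batch of two samples and two frozen old models (t - 1 = 2).
new_pred = [50.0, 60.0]           # estimates of the stage-t model
labels = [53.0, 63.0]             # ground-truth labels of the current sea area
pseudo_labels = [
    [50.0, 60.0],                 # estimates of old model 1 on the same data
    [52.0, 62.0],                 # estimates of old model 2 on the same data
]
loss, l_reg, l_kd = total_loss(new_pred, labels, pseudo_labels)
print(loss, l_reg, l_kd)
```

The regression term pulls the stage-t model toward the labels of the new sea area, while the distillation term penalizes drifting away from what the frozen old models would have predicted on the same inputs.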

In the inference phase, a model trained on t sea areas consists of t domain-specific layers and 1 domain-shared layer. To estimate the intensity for a particular sea area, only the task ID of that sea area and the two-channel satellite image need to be input to obtain the cyclone intensity value.
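The bookkeeping implied by the training stages and by ID-based inference can be sketched as a small registry: each new sea area freezes the earlier domain-specific layers, starts from a copy of the most recent one, and is later selected by its ID. The class, its methods, the parameter dictionary, and the sea area identifiers are hypothetical names used only for this illustration.

```python
import copy


class DomainLayerBank:
    """Keeps one domain-specific parameter set per sea area ID."""

    def __init__(self):
        self.layers = {}   # sea_id -> {"params": dict, "trainable": bool}
        self.order = []    # sea areas in the order they were learned

    def start_stage(self, sea_id, default_params):
        # Freeze every domain-specific layer learned in earlier stages.
        for layer in self.layers.values():
            layer["trainable"] = False
        if self.order:
            # Initialize from the previous stage's specific layer.
            init = copy.deepcopy(self.layers[self.order[-1]]["params"])
        else:
            init = dict(default_params)
        self.layers[sea_id] = {"params": init, "trainable": True}
        self.order.append(sea_id)

    def select(self, sea_id):
        """Return the parameters used at inference for the given sea area ID."""
        return self.layers[sea_id]["params"]


defaults = {"skip_scale": 1.0, "bn_gamma": 1.0, "bn_beta": 0.0}
bank = DomainLayerBank()
bank.start_stage("northwest_pacific", defaults)
bank.select("northwest_pacific")["skip_scale"] = 0.8   # stands in for training
bank.start_stage("east_pacific", defaults)
bank.select("east_pacific")["skip_scale"] = 0.6        # stands in for training
trainable = {k: v["trainable"] for k, v in bank.layers.items()}
print(trainable)
```

After t stages the registry holds t parameter sets, of which only the newest is trainable; at inference, the sea area ID alone determines which set is combined with the shared layer.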

In terms of practicality and innovation, the invention provides a multi-task continuous learning model for cross-sea-area tropical cyclone intensity estimation. The model adapts to the data distributions of different sea areas simultaneously and achieves good performance on multiple sea areas. The procedure is as follows. First, the GridSat data is preprocessed to obtain the infrared (IR) and water vapor (WV) channel data, which are then concatenated. The IBTrACS labels are linearly interpolated to 3-hour intervals to match the temporal resolution of the satellite data. Then, based on the strong nonlinear correlation between the maximum sustained wind speed and the minimum air pressure, multi-task learning is constructed to fit this wind-pressure relationship. To let the whole model adapt to the data distributions of multiple sea areas, continuous learning is introduced: the model is divided into domain-specific layers and a domain-shared layer. The domain-specific layers extract the unique features of each sea area, and the domain-shared layer extracts the knowledge common to all sea areas. Training takes a single-frame image as input and uses the parameter-sharing part to fit the wind-pressure relationship. Two losses are designed: the distillation loss keeps knowledge of old sea areas from being forgotten, and the regression loss drives the learning of new knowledge while old knowledge is maintained. In the test stage, only a single-frame image and the task ID of the corresponding sea area need to be input to obtain the intensity value.

(1) Experimental configuration and parameter settings.

All models in the present invention are trained with the PyTorch framework (Python 3.7) on an NVIDIA GeForce RTX 3090. The batch size is set to 32, and training runs for 100 epochs with the Adam (Adaptive Moment Estimation) optimizer. Because the domain-shared layer gradually accumulates information from all sea areas, its learning rate should be lower than that of the domain-specific layer. Here the learning rate of the domain-specific layer is set to 0.0005, and the learning rate of the domain-shared layer is set 100 times lower.
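The two learning rates can be written down as a small configuration sketch. The list-of-dictionaries layout with `params` and `lr` keys mirrors the per-parameter-group format that PyTorch optimizers accept, but the sketch does not import PyTorch; the constant names, the helper function, and the placeholder parameter names are assumptions for illustration.

```python
BATCH_SIZE = 32
EPOCHS = 100
SPECIFIC_LR = 0.0005
SHARED_LR = SPECIFIC_LR / 100  # 100 times lower than the specific layer


def build_param_groups(shared_params, specific_params):
    """Return optimizer parameter groups with separate learning rates."""
    return [
        {"params": list(shared_params), "lr": SHARED_LR},
        {"params": list(specific_params), "lr": SPECIFIC_LR},
    ]


# Placeholder parameter names stand in for real tensors.
groups = build_param_groups(
    shared_params=["shared.conv1.weight", "shared.conv2.weight"],
    specific_params=["specific.skip_scale", "specific.bn.weight"],
)
for group in groups:
    print(group["lr"], group["params"])
```

With these values the domain-shared layer is updated with a learning rate of 0.000005, so it drifts slowly while the domain-specific layer of the current sea area adapts quickly.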

Because intensity estimation is a regression task, the model uses RMSE as the regression loss function, which ensures the model's ability to learn new knowledge (plasticity). RMSE measures the difference between the true and predicted tropical cyclone intensity; optimization reduces the RMSE so that the predicted value approaches the true value, and a smaller RMSE indicates better intensity estimation performance.

(2) Experimental results

To verify the intensity estimation performance of the model, the invention uses the root mean square error (RMSE) and the mean absolute error (MAE) as evaluation metrics to measure the difference between the estimated and true values. To measure the effect of continuous learning, the invention uses the forgetting rate (FR) and the average forgetting rate (AFR) as evaluation metrics that represent how much old knowledge the model forgets.
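For reference, the standard definitions of the two error metrics are given below. The symbols are assumptions introduced here: $N$ is the number of test samples, $\hat{y}_n$ the estimated intensity, and $y_n$ the true intensity of sample $n$. The forgetting rate and average forgetting rate are not defined in this text, so no formula is given for them.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{n=1}^{N}\left|\hat{y}_n - y_n\right|
```

Because RMSE squares each error before averaging, it weights large misses more heavily than MAE does, so reporting both shows whether a model's errors are uniformly small or dominated by a few large ones.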

Experiment one compares the method of the invention with other methods on multiple sea areas simultaneously. The single-sea-area method denotes the results obtained by training the backbone model on one sea area at a time without any continuous learning, so the five sea areas correspond to five models fitted to different distributions. Because each such model targets only a single distribution, its result is taken as the best result, that is, the upper bound of the comparison, against which the effectiveness of continuous learning is measured. Generalization denotes the results obtained by testing a model trained on a single sea area directly on the other sea areas, and can be regarded as the lower bound. Fine-tuning denotes fine-tuning the model of the previous sea area on the next sea area. From a practical point of view, because China borders the Northwest Pacific, it is meaningful to take that sea area as the base and add the other sea areas to it. Fig. 4 shows the experimental results when sea areas are added in the order Northwest Pacific, East Pacific, South Indian Ocean, South Pacific, and North Atlantic. The RMSE of the CB-TCIE model is only 2.8% worse than the upper bound and far better than the generalization and fine-tuning methods, which shows that the cross-sea-area continuous learning model can adapt to different data distributions simultaneously.

Experiment two is an ablation experiment on the water vapor channel and multi-task learning, with the maximum sustained wind speed as the intensity and the minimum air pressure as the auxiliary factor. RMSE is used as the metric: the smaller the value, the closer the model's estimate is to the true value and the better the fit. The second row of the table in fig. 5 is the upper bound of the comparison. The third row is the model that takes only the infrared channel image as input. The infrared channel provides only information such as convective cloud structure and temperature, which reflects tropical cyclone intensity indirectly, whereas the water vapor channel provides the atmospheric conditions and moisture distribution around the tropical cyclone, which are also vital for intensity estimation. With a single channel, the estimation performance of the model is 6.6% lower than with two channels. The fourth row is the model without multi-task learning, which extracts only knowledge about the maximum sustained wind speed from the satellite cloud image; it cannot extract knowledge about the minimum air pressure from the image and combine it with the maximum sustained wind speed to improve overall performance. Its performance is therefore 7.8% lower than that of the present invention.

Experiment three shows that introducing multi-task learning improves the maximum sustained wind speed task and the minimum air pressure task at the same time, that is, that the shared parameter part achieves a nonlinear fit of the relationship between the two. It is an ablation experiment on the water vapor channel and multi-task learning, with the minimum air pressure as the intensity and the maximum sustained wind speed as the auxiliary factor. Fig. 6 shows that introducing the maximum sustained wind speed as an auxiliary factor also improves the model's ability to estimate the minimum air pressure as the intensity value. Similarly, adding the WV channel also benefits estimation with the minimum air pressure as the intensity, because it helps the model extract image information more effectively.

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims

1. The multi-task continuous learning cross-sea area tropical cyclone strength estimation method is characterized by comprising the following steps of:

inputting single-frame infrared and water vapor data and a designated sea area ID into a trained tropical cyclone intensity estimation model, and outputting the corresponding maximum sustained wind speed;

The process of collecting and cutting GridSat satellite data of a plurality of sea areas comprises the following steps:

Splicing the image data of the cut infrared channel and the cut water vapor channel to form an input image;

the domain-shared layer uses ResNet as its backbone network and comprises a convolution layer, a pooling layer, a skip connection layer, a batch normalization layer, and an activation function connected in sequence; the domain-specific layer uses a parallel residual adapter as its backbone network and consists of a convolution layer, a batch normalization layer, and a domain-specific skip connection layer, wherein the skip connection of the domain-specific layer is a learnable parameter, and each sea area learns its own skip connection parameter to control the parameters of the whole network and thereby reflect the differences between sea areas;

the sea area aggregation residual module extracts high-level semantic features from the input two-channel image, the extracted features comprising the features common to all sea areas and the features unique to each sea area; the sea area aggregation residual module consists of all domain-shared layers and part of the domain-specific layers, and specifically comprises a convolution layer, a channel attention layer, a first shared convolution layer, a first domain-specific skip connection layer, a first domain-specific BN layer, a second shared convolution layer, a second domain-specific skip connection layer, a second domain-specific BN layer, and a ReLU layer;

the second shared convolution layer and the second domain-specific skip connection layer extract further deep features from the initial unique features of the current sea area and the common features of the previously learned sea areas obtained through the first domain-specific BN layer; after the second domain-specific BN layer normalizes them again, the normalized features of the current sea area are passed to the ReLU activation layer to obtain the high-level semantic information of the current sea area contained in the input two-channel image;

the cross-sea area tropical cyclone strength estimation method further comprises the following steps:

the wind-pressure module outputs two estimated intensity values, and a regression loss $L_{reg}$ is computed between them and the labels $y_t$;

the data of the current sea area is input into each of the previous t-1 old models, and their estimates are used as pseudo labels, indexed by $i$; a distillation loss $L_{KD}$ is computed between each pseudo label and the estimate of the stage-t model;

the domain-specific layer and the domain-shared layer are updated with the sum of the two losses, $L_{reg} + L_{KD}$;

Wherein i is more than or equal to 0 and less than or equal to t-1.