CN116451083A - Unsupervised carbon dioxide emission monitoring method based on deep transfer learning - Google Patents

Unsupervised carbon dioxide emission monitoring method based on deep transfer learning

Info

Publication number
CN116451083A
CN116451083A · Application CN202310454552.9A
Authority
CN
China
Prior art keywords
data
domain
discriminator
loss
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310454552.9A
Other languages
Chinese (zh)
Inventor
Chen Lei (陈磊)
Yang Ling (杨玲)
Xu Wei (徐炜)
Wang Jian (王健)
Guo Cheng (郭诚)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenergy Maanshan Electric Power Generation Co ltd
Original Assignee
Wenergy Maanshan Electric Power Generation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenergy Maanshan Electric Power Generation Co ltd filed Critical Wenergy Maanshan Electric Power Generation Co ltd
Priority to CN202310454552.9A priority Critical patent/CN116451083A/en
Publication of CN116451083A publication Critical patent/CN116451083A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 33/00 Investigating or analysing materials by specific methods not covered by groups G01N 1/00 - G01N 31/00
    • G01N 33/0004 Gaseous mixtures, e.g. polluted air
    • G01N 33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N 33/0027 General constructional details of gas analysers, e.g. portable test equipment concerning the detector
    • G01N 33/0036 General constructional details of gas analysers, e.g. portable test equipment concerning the detector specially adapted to detect a particular component
    • G01N 33/004 CO or CO2
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 33/00 Investigating or analysing materials by specific methods not covered by groups G01N 1/00 - G01N 31/00
    • G01N 33/0004 Gaseous mixtures, e.g. polluted air
    • G01N 33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N 33/0062 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • G01N 33/0067 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display by measuring the rate of variation of the concentration
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 33/00 Investigating or analysing materials by specific methods not covered by groups G01N 1/00 - G01N 31/00
    • G01N 33/0004 Gaseous mixtures, e.g. polluted air
    • G01N 33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N 33/0062 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • G01N 33/0068 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display using a computer specifically programmed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/096 Transfer learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/80 Management or planning
    • Y02P 90/84 Greenhouse gas [GHG] management systems


Abstract

The invention discloses an unsupervised carbon dioxide emission monitoring method based on deep transfer learning, and relates to the technical field of carbon dioxide emission monitoring. The method has the following advantages and effects: in transfer learning, labels for target-domain data are often difficult or very costly to obtain, and a model trained only on labeled source-domain data with an ordinary non-transfer method and applied directly to the target domain usually performs poorly. The deep unsupervised transfer learning method disclosed here can train a model even when target-domain labels are unavailable. Compared with conventional adversarial domain adaptation methods, it adopts a dual-stream structure that accounts for both marginal and conditional distribution discrepancies, and expresses the relative importance of feature transferability and separability through a balance factor.

Description

Unsupervised carbon dioxide emission monitoring method based on deep transfer learning
Technical Field
The invention relates to the technical field of carbon dioxide emission monitoring, in particular to an unsupervised carbon dioxide emission monitoring method based on deep transfer learning.
Background
Carbon trading is an effective path to carbon neutrality, and it presupposes accurate carbon monitoring. For carbon emission devices of different models, the distribution of the carbon emission data can vary greatly, so a carbon dioxide concentration prediction model built directly from randomly collected training samples generalizes poorly: the training sample set and the prediction sample set can differ substantially in data distribution, which degrades the accuracy of carbon dioxide concentration prediction.
In transfer learning, labels for target-domain data are often difficult or very costly to obtain, and a model trained only on labeled source-domain data with an ordinary non-transfer method and applied directly to the target domain usually performs poorly. A deep unsupervised transfer learning method can train a model without target-domain labels; compared with conventional adversarial domain adaptation, the method described below adopts a dual-stream structure, attends to both marginal and conditional distribution discrepancies, and expresses the relative importance of feature transferability and separability through a balance factor.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the shortcomings of the prior art, the invention provides an unsupervised carbon dioxide emission monitoring method based on deep transfer learning, which solves the technical problems described in the background above.
(II) Technical solution
To achieve the above purpose, the invention is realized by the following technical solution. An unsupervised carbon dioxide emission monitoring method based on deep transfer learning comprises the following steps:
S1: collecting data;
S2: preprocessing data;
S3: building the model;
S4: training the model;
S5: testing the model;
In S1, first carbon emission data corresponding to a first carbon emission device and second carbon emission data corresponding to a second carbon emission device are obtained, the two devices being of different models; the first device's data serve as the source-domain data and the second device's data as the target-domain data. Collecting source-domain and target-domain data yields labeled source-domain data ⟨X_s, Y_s⟩ and unlabeled target-domain data X_t, where X denotes the data and Y its corresponding labels. Taking a power plant's carbon emission dataset as an example, measurements such as temperature, humidity, and coal consumption at a given sampling time form a feature vector; one feature vector is one sample x, a d-dimensional row vector, where d is the number of collected features, and y is a scalar label corresponding to the sample, representing the carbon dioxide concentration. Collecting data over a period of time yields a labeled sample set {(x_i, y_i)}; distinguishing boilers of different models yields a labeled sample set for each boiler model;
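As a minimal illustration of the data collection in S1 (the CSV layout, file names, and column names below are assumptions for the sketch, not part of the patent), the source and target datasets might be assembled like this:

```python
import numpy as np
import pandas as pd

FEATURES = ["temperature", "humidity", "coal_consumption"]  # assumed sensor columns
LABEL = "co2_concentration"                                  # assumed label column

def load_domain(csv_path, labeled=True):
    """Load one boiler model's log into (X, Y); Y is None for the unlabeled target domain."""
    df = pd.read_csv(csv_path)
    X = df[FEATURES].to_numpy(dtype=np.float32)   # each row is one d-dimensional sample
    Y = df[LABEL].to_numpy(dtype=np.float32) if labeled else None
    return X, Y

X_s, Y_s = load_domain("boiler_model_A.csv", labeled=True)   # labeled source domain
X_t, _   = load_domain("boiler_model_B.csv", labeled=False)  # unlabeled target domain
```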
S2 comprises normalizing the source-domain and target-domain data, with the aim of eliminating the influence on model training of problems in the raw data samples such as differing orders of magnitude, differing value ranges, and weak data trends, while also improving model accuracy and training speed;
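A minimal normalization sketch for S2, continuing the loading sketch above (the patent does not fix the scheme; min-max scaling fitted on the source domain is an assumption):

```python
def minmax_normalize(X, lo=None, hi=None):
    """Scale each feature to [0, 1]; reuse the source domain's lo/hi for the target domain."""
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    return (X - lo) / (hi - lo + 1e-8), lo, hi

X_s_norm, lo, hi = minmax_normalize(X_s)
X_t_norm, _, _ = minmax_normalize(X_t, lo, hi)  # same scaling applied to the target domain
```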
S3 comprises a model network adopting a dual-stream structure, which contains two feature-extraction neural networks G1 and G2; two label classifiers C1 and C2, where C1 is a preliminary classifier and C2 the final classifier; an adversarial domain discriminator D, where D comprises a global discriminator G_d and local discriminators D_k^l, k = 1, 2, …, K, with K the number of data classes; and an explicit distribution-discrepancy measurement module. The method for building the model comprises the following steps:
S31: Select a suitable network as the feature extractor, and input the labeled source-domain data and unlabeled target-domain data into G1 and G2; the output features through G1 and G2 are fs1, ft1, fs2, ft2, where fs1 and ft1 denote the output features of X_s and X_t through G1 respectively, and fs2 and ft2 denote the output features of X_s and X_t through G2 respectively. G1 and G2 may adopt one of ResNet, VGG, or a multi-layer CNN;
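A minimal PyTorch sketch of the two streams in S31, continuing the sketches above (the MLP width and feature dimension are assumptions; the patent only requires two parallel feature networks such as ResNet, VGG, or a CNN):

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """One stream: maps a d-dimensional sample to a feature vector."""
    def __init__(self, d_in, d_feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, 128), nn.ReLU(),
            nn.Linear(128, d_feat), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

d = len(FEATURES)
G1, G2 = FeatureExtractor(d), FeatureExtractor(d)            # the two streams
fs1, ft1 = G1(torch.tensor(X_s_norm)), G1(torch.tensor(X_t_norm))
fs2, ft2 = G2(torch.tensor(X_s_norm)), G2(torch.tensor(X_t_norm))
```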
S32: C1 and C2 are conventional label classifiers, such as neural networks or support vector machines, used to classify the data. The label classifier is trained with the labeled source-domain data; if it is trained with a cross-entropy loss, the general expression of the label classifier loss is as follows:
where D_s denotes the source-domain data, n_s the number of source-domain samples, ŷ_i^k the predicted probability that x_i belongs to class k, C_y the label classifier, and G_f the feature extractor;
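The formula image itself is not reproduced in this text; a standard cross-entropy form consistent with the symbols above (a reconstruction, not the patent's exact typography) is:

L_c(\theta_f, \theta_y) = -\frac{1}{n_s} \sum_{x_i \in D_s} \sum_{k=1}^{K} \mathbb{1}[y_i = k] \, \log C_y\big(G_f(x_i)\big)_k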
S33: Take fs1 and ft1 as the input of the adversarial domain discriminator D; by discriminating whether the input features come from the source domain or the target domain, D can reduce the marginal distribution difference between source-domain and target-domain data. A common domain discriminator consists of a multi-layer perceptron and a Softmax function. The source-domain data are labeled 1 and the target-domain data 0; given an input sample, the discriminator outputs whether the sample comes from the source domain or the target domain, and the loss value of the domain discriminator is computed from the actual and predicted values. If trained with a cross-entropy loss function, the loss of the adversarial domain discriminator can be expressed as:
where x ∈ X_s ∪ X_t, m denotes the number of samples in one batch, d_i the domain label of the i-th sample, D(G1(x_i)) the output of the i-th sample through D, and θ_G1, θ_d the parameters of G1 and D respectively;
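The loss image is likewise omitted; the standard binary cross-entropy domain-adversarial form matching these definitions (a reconstruction under that assumption) is:

L_d(\theta_{G1}, \theta_d) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ d_i \log D\big(G1(x_i)\big) + (1 - d_i) \log\big(1 - D(G1(x_i))\big) \Big]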
The loss of the global domain discriminator G_d can be expressed as:
where D_s denotes the source-domain data, D_t the target-domain data, n_s and n_t the numbers of samples in D_s and D_t respectively, and L_ce the cross-entropy loss serving as the loss function of the domain classifier;
The local domain discriminator is subdivided into K class-wise domain discriminators D_k^l, k = 1, 2, …, K; each class discriminator is responsible for matching the source-domain data associated with class k with the corresponding target-domain data, the partition over the target domain being based on the pseudo-labels generated by the label classifier. The loss function of the local domain discriminator can be calculated as:
where D_k^l is the k-th domain discriminator, L_ce^k the cross-entropy loss of the domain discriminator corresponding to class k, and ŷ_i^k the predicted probability that x_i belongs to class k;
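The global and local loss images are omitted; standard global and class-weighted local forms consistent with the definitions above (a reconstruction) are:

L_g = \frac{1}{n_s + n_t} \sum_{x_i \in D_s \cup D_t} L_{ce}\big(G_d(G1(x_i)), d_i\big)

L_l = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_s + n_t} \sum_{x_i \in D_s \cup D_t} \hat{y}_i^k \, L_{ce}^k\big(D_k^l(G1(x_i)), d_i\big)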
A distance estimate derived from the discriminator losses is used to measure the importance of the domain discriminators; for the global domain discriminator it is expressed as:
and for the local domain discriminators as:
where D_s^k and D_t^k denote the samples of class k in the source domain and the target domain respectively, and L_l^k denotes the loss of the local sub-domain discriminator on class k. Finally, the dynamic adversarial factor ω is expressed as:
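One standard formulation of these quantities, following the proxy A-distance and dynamic adversarial factor used in dynamic adversarial adaptation networks (an assumption about the intended form; the patent's formula images are not reproduced), is:

d_{A,g}(D_s, D_t) = 2\,(1 - 2 L_g), \qquad d_{A,k}(D_s^k, D_t^k) = 2\,(1 - 2 L_l^k)

\omega = \frac{d_{A,g}(D_s, D_t)}{d_{A,g}(D_s, D_t) + \frac{1}{K} \sum_{k=1}^{K} d_{A,k}(D_s^k, D_t^k)}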
In the adversarial domain-adaptation structure described above, the final learning objective can be expressed as:
where θ_f, θ_y, θ_d, θ_d^k denote the parameters of G1, C1, G_d, and D_k^l respectively, and the value of ω is computed by the network itself;
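A common form of this minimax objective consistent with the notation above (a reconstruction; λ is an assumed trade-off coefficient) is:

\min_{\theta_f, \theta_y} \max_{\theta_d, \theta_d^k} \; L_c(\theta_f, \theta_y) - \lambda \big[ (1 - \omega) L_g + \omega L_l \big]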
S34: The domain discriminators ensure the transferability contained in the features, but paying too much attention to the transferability of the data can reduce the separability of the classes in the data; a balance factor is introduced to balance transferability against separability:
The maximum mean discrepancy MMD(D_s, D_t) is a common estimator of the degree of alignment between the data distributions of two domains and is used here to measure domain transferability; the separability of the classes within a domain is measured with a discriminability criterion max J(W) based on linear discriminant analysis, defined as follows:
where S_b is the between-class scatter matrix and S_w the within-class scatter matrix; clearly, a larger max J(W) means better separability;
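The criterion image is omitted; the standard Fisher criterion from linear discriminant analysis, matching the scatter-matrix definitions above, is:

J(W) = \frac{\lvert W^{\top} S_b W \rvert}{\lvert W^{\top} S_w W \rvert}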
Since the estimates given by the two criteria are usually not on the same order of magnitude, they need to be further normalized;
The balance factor is then defined as follows:
where a smaller normalized MMD estimate indicates better domain alignment, and a smaller normalized inverse criterion indicates better class separability;
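A minimal RBF-kernel MMD estimator that could serve as the transferability measure in S34, continuing the earlier sketches (a biased estimate with a fixed bandwidth; both choices are assumptions, since the patent does not fix the kernel):

```python
import torch

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X and Y under an RBF kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2               # pairwise squared Euclidean distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

mmd_hat = rbf_mmd2(fs1, ft1)  # domain-alignment measure on G1's features
```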
S35: Combining S31, S32, S33, and S34 above, the loss of the final upper-layer structure is defined as:
where τ and ω are both parameters computed by the network itself;
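A plausible form of this upper-layer loss, obtained by replacing the fixed trade-off λ above with the learned balance factor τ (a reconstruction, not confirmed by the patent text), is:

L_{upper} = L_c - \tau \big[ (1 - \omega) L_g + \omega L_l \big]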
S36: In the lower-layer structure, drawing on the advantages of the maximum-mean-discrepancy method, the Hilbert-space embedding of the joint distribution is chosen to measure the difference between two joint distributions P and Q; the distributions are mapped into a reproducing kernel Hilbert space (RKHS), and the joint probability distribution loss is obtained by directly computing the MMD distance between the source domain and the target domain in the RKHS:
where P_S(x^s, y^s), P_T(x^t, y^t) denote the joint probability distributions of the source and target domains respectively; φ(x_i^s), φ(x_i^t) denote the RKHS features corresponding to the i-th data of D_s, D_t, and y_i^s, ŷ_j^t the class labels corresponding to the i-th and j-th data of D_s, D_t;
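The loss image is omitted; a joint-distribution MMD of the standard form (a reconstruction, with φ and ψ denoting assumed kernel feature maps for inputs and labels) is:

L_{jmmd} = \Big\lVert \frac{1}{n_s} \sum_{i=1}^{n_s} \phi(x_i^s) \otimes \psi(y_i^s) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi(x_j^t) \otimes \psi(\hat{y}_j^t) \Big\rVert_{\mathcal{H}}^2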
S4 includes step S41: in the upper-layer structure, X_s and X_t are taken as the input of G1, and G1 and D are trained with adversarial training to obtain the optimal parameters. Because the target domain contains no labels, C1 is trained with the source-domain data only; the trained C1 is used to predict the target-domain data categories, and the output of C1 is taken as the pseudo-labels Ŷ_t of the target-domain data. The training loss of C1 is as follows:
Combining S35 yields the loss of the upper-layer structure:
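A compact sketch of the S41 adversarial step using a gradient reversal layer, continuing the earlier sketches (the gradient reversal layer is one standard way to realize the minimax objective; the loop details are assumptions):

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity on the forward pass; multiplies the gradient by -lam on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

D_disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))  # global discriminator
opt = torch.optim.Adam([*G1.parameters(), *D_disc.parameters()], lr=1e-3)
bce = nn.BCEWithLogitsLoss()

feats = torch.cat([G1(torch.tensor(X_s_norm)), G1(torch.tensor(X_t_norm))])
dom = torch.cat([torch.ones(len(X_s_norm), 1), torch.zeros(len(X_t_norm), 1)])  # 1 = source, 0 = target
loss_d = bce(D_disc(GradReverse.apply(feats, 1.0)), dom)  # adversarial domain loss
opt.zero_grad(); loss_d.backward(); opt.step()
```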
S42: In the lower-layer structure, X_s and X_t are taken as the input of G2 to obtain the features Z_s and Z_t extracted by G2, where Z_s and Z_t are the output features of X_s and X_t through G2 respectively; L_jmmd is then computed with ⟨Z_s, Y_s⟩ and ⟨Z_t, Ŷ_t⟩;
S43: to integrate the migration ability of G1, G2 after training, X is calculated s The outputs of G1 and G2 are fused, and the fused features are used as the input of C2 for training, and the training loss of C2 is expressed as follows:
S44: Based on the network losses discussed in S41, S42, and S43, the optimization objective of the proposed model can be expressed as:
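The objective image is omitted; a plausible combined objective summing the upper-layer, joint-MMD, and C2 losses defined above (a reconstruction, not confirmed by the patent text) is:

\min \; L_{upper} + L_{jmmd} + L_{C2}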
S5: after the model training is finished at S4, the test data is predicted using the feature extractors G1, G2 and the classification network C2.
(III) Beneficial effects
The invention provides an unsupervised carbon dioxide emission monitoring method based on deep transfer learning, with the following beneficial effects:
The method can train a model without target-domain labels. Compared with conventional adversarial domain adaptation methods, it adopts a dual-stream structure, attends to both marginal and conditional distribution discrepancies, and expresses the relative importance of feature transferability and separability through a balance factor.
Drawings
Figure 1 is a schematic diagram of the model building of the present invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
The invention discloses an unsupervised carbon dioxide emission monitoring method based on deep transfer learning, which comprises the following steps:
1) Collecting data;
2) Preprocessing data;
3) Building the model;
4) Training the model;
5) Testing the model;
In step 1), first carbon emission data corresponding to a first carbon emission device and second carbon emission data corresponding to a second carbon emission device are obtained, the two devices being of different models; the first device's data serve as the source-domain data and the second device's data as the target-domain data. Collecting source-domain and target-domain data yields labeled source-domain data ⟨X_s, Y_s⟩ and unlabeled target-domain data X_t, where X denotes the data and Y its corresponding labels. Taking a power plant's carbon emission dataset as an example, measurements such as temperature, humidity, and coal consumption at a given sampling time form a feature vector; one feature vector is one sample x, a d-dimensional row vector, where d is the number of collected features, and y is a scalar label representing the carbon dioxide concentration. Collecting data over a period of time yields a labeled sample set {(x_i, y_i)}; distinguishing boilers of different models yields a labeled sample set for each boiler model.
Step 2) comprises normalizing the source-domain and target-domain data, so as to eliminate the influence on model training of problems in the raw data samples such as differing orders of magnitude, differing value ranges, and weak data trends, and to improve model accuracy and training speed.
The step 3) comprises the following parts:
The model network adopts a dual-stream structure and mainly comprises two feature-extraction neural networks G1 and G2; two label classifiers C1 and C2, where C1 is a preliminary classifier and C2 the final classifier; an adversarial domain discriminator D, where D comprises a global discriminator G_d and local discriminators D_k^l, k = 1, 2, …, K, with K the number of data classes; and an explicit distribution-discrepancy measurement module. The method for building the model comprises the following steps:
31) Select a suitable network as the feature extractor, and input the labeled source-domain data and unlabeled target-domain data into G1 and G2; the output features through G1 and G2 are fs1, ft1, fs2, ft2, where fs1 and ft1 denote the output features of X_s and X_t through G1 respectively, and fs2 and ft2 denote the output features of X_s and X_t through G2 respectively. G1 and G2 may adopt ResNet, VGG, a multi-layer CNN, or the like.
32) C1 and C2 are conventional label classifiers, such as neural networks or support vector machines, used to classify the data. The label classifier is trained with the labeled source-domain data; if it is trained with a cross-entropy loss, the general expression of the label classifier loss is as follows:
where D_s denotes the source-domain data, n_s the number of source-domain samples, ŷ_i^k the predicted probability that x_i belongs to class k, C_y the label classifier, and G_f the feature extractor.
33) Take fs1 and ft1 as the input of the adversarial domain discriminator D; by discriminating whether the input features come from the source domain or the target domain, D can reduce the marginal distribution difference between source-domain and target-domain data. A common domain discriminator consists of a multi-layer perceptron and a Softmax function. The source-domain data are labeled 1 and the target-domain data 0; given an input sample, the discriminator outputs whether the sample comes from the source domain or the target domain, and the loss value of the domain discriminator is computed from the actual and predicted values. If trained with a cross-entropy loss function, the loss of the adversarial domain discriminator can be expressed as:
where x ∈ X_s ∪ X_t, m denotes the number of samples in one batch, d_i the domain label of the i-th sample, D(G1(x_i)) the output of the i-th sample through D, and θ_G1, θ_d the parameters of G1 and D respectively.
Wherein the global domain discriminator G d The loss can be expressed as:
D s representing source domain data, D t Representing target domain data, n s ,n t Respectively represent D s ,D t Data, L ce Representing cross entropy loss as a loss function of the domain classifier,
The local domain discriminator is subdivided into K class-wise domain discriminators D_k^l, k = 1, 2, …, K; each class discriminator is responsible for matching the source-domain data associated with class k with the corresponding target-domain data, the partition over the target domain being based on the pseudo-labels generated by the label classifier. The loss function of the local domain discriminator can be calculated as:
where D_k^l is the k-th domain discriminator, L_ce^k the cross-entropy loss of the domain discriminator corresponding to class k, and ŷ_i^k the predicted probability that x_i belongs to class k.
A distance estimate derived from the discriminator losses is used to measure the importance of the domain discriminators; for the global domain discriminator it is expressed as:
and for the local domain discriminators as:
where D_s^k and D_t^k denote the samples of class k in the source domain and the target domain respectively, and L_l^k denotes the loss of the local sub-domain discriminator on class k. Finally, the dynamic adversarial factor ω is expressed as:
In the adversarial domain-adaptation structure described above, the final learning objective can be expressed as:
where θ_f, θ_y, θ_d, θ_d^k denote the parameters of G1, C1, G_d, and D_k^l respectively, and the value of ω is computed by the network itself.
34) The domain discriminators ensure the transferability contained in the features, but paying too much attention to the transferability of the data can reduce the separability of the classes in the data; a balance factor is introduced to balance transferability against separability:
The maximum mean discrepancy MMD(D_s, D_t) is a common estimator of the degree of alignment between the data distributions of two domains and is used here to measure domain transferability; the separability of the classes within a domain is measured with a discriminability criterion max J(W) based on linear discriminant analysis, defined as follows:
where S_b is the between-class scatter matrix and S_w the within-class scatter matrix. Clearly, a larger max J(W) means better separability.
Since the estimates given by the two criteria are usually not on the same order of magnitude, they need to be further normalized.
The balance factor is then defined as follows:
where a smaller normalized MMD estimate indicates better domain alignment, and a smaller normalized inverse criterion indicates better class separability.
35) Combining 31), 32), 33), and 34) above, the loss of the final upper-layer structure is defined as:
where τ and ω are both parameters computed by the network itself.
36) In the lower-layer structure, we choose the Hilbert-space embedding of the joint distribution to measure the difference between two joint distributions P and Q; the distributions are mapped into a reproducing kernel Hilbert space (RKHS), and the joint probability distribution loss is obtained by directly computing the MMD distance between the source and target domains in the RKHS:
where P_S(x^s, y^s), P_T(x^t, y^t) denote the joint probability distributions of the source and target domains respectively; φ(x_i^s), φ(x_i^t) denote the RKHS features corresponding to the i-th data of D_s, D_t, and y_i^s, ŷ_j^t the class labels corresponding to the i-th and j-th data of D_s, D_t.
The step 4) includes:
41) In the upper-layer structure, X_s and X_t are taken as the input of G1, and G1 and D are trained with adversarial training to obtain the optimal parameters; because the target domain contains no labels, C1 is trained with the source-domain data only, the trained C1 is used to predict the target-domain data categories, and the output of C1 is taken as the pseudo-labels Ŷ_t of the target-domain data.
The training loss for C1 is as follows:
Combining 35) yields the loss of the upper-layer structure:
42) In the lower-layer structure, X_s and X_t are taken as the input of G2 to obtain the features Z_s and Z_t extracted by G2, where Z_s and Z_t are the output features of X_s and X_t through G2 respectively; L_jmmd is then computed with ⟨Z_s, Y_s⟩ and ⟨Z_t, Ŷ_t⟩.
43) To integrate the transfer ability of the trained G1 and G2, the outputs of G1 and G2 for X_s are fused, and the fused features are used as the input of C2 for training; the training loss of C2 is expressed as follows:
44) Based on the network losses discussed in 41), 42), and 43) above, the optimization objective of the proposed model can be expressed as
In step 5), after training is finished, the feature extractors G1 and G2 and the classification network C2 are used to predict the test data.
It should be noted that, in the description of the present invention, orientational or positional relations indicated by terms such as "upper", "lower", "left", "right", "front", and "rear" are based on the orientations shown in the drawings and are used merely for convenience of description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the invention.
The terms "first" and "second" in this solution merely distinguish identical or similar structures, or corresponding structures performing similar functions; they do not rank the importance of these structures, order them, or compare their sizes, and carry no other meaning.
In addition, unless explicitly stated and limited otherwise, the terms "mounted", "connected", and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through an intermediate medium; or an internal communication between two structures. The specific meanings of these terms in this application can be understood by those skilled in the art in light of the general inventive concept.

Claims (1)

1. An unsupervised carbon dioxide emission monitoring method based on deep transfer learning comprises the following steps:
S1: collecting data;
S2: preprocessing data;
S3: building the model;
S4: training the model;
S5: testing the model;
the method is characterized in that:
S1 comprises obtaining first carbon emission data corresponding to a first carbon emission device and second carbon emission data corresponding to a second carbon emission device, the two devices being of different models; the first device's data serve as the source-domain data and the second device's data as the target-domain data; collecting source-domain and target-domain data yields labeled source-domain data ⟨X_s, Y_s⟩ and unlabeled target-domain data X_t, where X denotes the data and Y its corresponding labels; taking a power plant's carbon emission dataset as an example, measurements such as temperature, humidity, and coal consumption at a given sampling time form a feature vector, one feature vector being one sample x, a d-dimensional row vector, where d is the number of collected features, and y being a scalar label representing the carbon dioxide concentration; collecting data over a period of time yields a labeled sample set {(x_i, y_i)}, and distinguishing boilers of different models yields a labeled sample set for each boiler model;
S2 comprises normalizing the source-domain and target-domain data, with the aim of eliminating the influence on model training of problems in the raw data samples such as differing orders of magnitude, differing value ranges, and weak data trends, while also improving model accuracy and training speed;
S3 comprises a model network adopting a dual-stream structure, which contains two feature-extraction neural networks G1 and G2; two label classifiers C1 and C2, where C1 is a preliminary classifier and C2 the final classifier; an adversarial domain discriminator D, where D comprises a global discriminator G_d and local discriminators D_k^l, k = 1, 2, …, K, with K the number of data classes; and an explicit distribution-discrepancy measurement module. The method for building the model comprises the following steps:
S31: Select a suitable network as the feature extractor, and input the labeled source-domain data and unlabeled target-domain data into G1 and G2; the output features through G1 and G2 are fs1, ft1, fs2, ft2, where fs1 and ft1 denote the output features of X_s and X_t through G1 respectively, and fs2 and ft2 denote the output features of X_s and X_t through G2 respectively; G1 and G2 may adopt one of ResNet, VGG, or a multi-layer CNN;
S32: C1 and C2 are conventional label classifiers, such as neural networks or support vector machines, used to classify the data. The label classifier is trained with the labeled source-domain data; if it is trained with a cross-entropy loss, the general expression of the label classifier loss is as follows:
where D_s denotes the source-domain data, n_s the number of source-domain samples, ŷ_i^k the predicted probability that x_i belongs to class k, C_y the label classifier, and G_f the feature extractor;
S33: Take fs1 and ft1 as the input of the adversarial domain discriminator D; by discriminating whether the input features come from the source domain or the target domain, D can reduce the marginal distribution difference between source-domain and target-domain data. A common domain discriminator consists of a multi-layer perceptron and a Softmax function. The source-domain data are labeled 1 and the target-domain data 0; given an input sample, the discriminator outputs whether the sample comes from the source domain or the target domain, and the loss value of the domain discriminator is computed from the actual and predicted values. If trained with a cross-entropy loss function, the loss of the adversarial domain discriminator can be expressed as:
where x ∈ X_s ∪ X_t, m denotes the number of samples in one batch, d_i the domain label of the i-th sample, D(G1(x_i)) the output of the i-th sample through D, and θ_G1, θ_d the parameters of G1 and D respectively;
The loss of the global domain discriminator G_d can be expressed as:
where D_s denotes the source-domain data, D_t the target-domain data, n_s and n_t the numbers of samples in D_s and D_t respectively, and L_ce the cross-entropy loss serving as the loss function of the domain classifier;
The local domain discriminator is subdivided into K class-wise domain discriminators D_k^l, k = 1, 2, …, K; each class discriminator is responsible for matching the source-domain data associated with class k with the corresponding target-domain data, the partition over the target domain being based on the pseudo-labels generated by the label classifier. The loss function of the local domain discriminator can be calculated as:
where D_k^l is the k-th domain discriminator, L_ce^k the cross-entropy loss of the domain discriminator corresponding to class k, and ŷ_i^k the predicted probability that x_i belongs to class k;
A distance estimate derived from the discriminator losses is used to measure the importance of the domain discriminators; for the global domain discriminator it is expressed as:
and for the local domain discriminators as:
where D_s^k and D_t^k denote the samples of class k in the source domain and the target domain respectively, and L_l^k denotes the loss of the local sub-domain discriminator on class k; finally, the dynamic adversarial factor ω is expressed as:
In the adversarial domain-adaptation structure described above, the final learning objective can be expressed as:
where θ_f, θ_y, θ_d, θ_d^k denote the parameters of G1, C1, G_d, and D_k^l respectively, and the value of ω is computed by the network itself;
S34: The domain discriminators ensure the transferability contained in the features, but paying too much attention to the transferability of the data can reduce the separability of the classes in the data; a balance factor is introduced to balance transferability against separability:
The maximum mean discrepancy MMD(D_s, D_t) is a common estimator of the degree of alignment between the data distributions of two domains and is used here to measure domain transferability; the separability of the classes within a domain is measured with a discriminability criterion max J(W) based on linear discriminant analysis, defined as follows:
where S_b is the between-class scatter matrix and S_w the within-class scatter matrix; clearly, a larger max J(W) means better separability;
Since the estimates given by the two criteria are usually not on the same order of magnitude, they need to be further normalized;
The balance factor is then defined as follows:
where a smaller normalized MMD estimate indicates better domain alignment, and a smaller normalized inverse criterion indicates better class separability;
S35: Combining S31, S32, S33, and S34 above, the loss of the final upper-layer structure is defined as:
where τ and ω are both parameters computed by the network itself;
S36: In the lower-layer structure, drawing on the advantages of the maximum-mean-discrepancy method, the Hilbert-space embedding of the joint distribution is chosen to measure the difference between two joint distributions P and Q; the distributions are mapped into a reproducing kernel Hilbert space (RKHS), and the joint probability distribution loss is obtained by directly computing the MMD distance between the source domain and the target domain in the RKHS:
where P_S(x^s, y^s), P_T(x^t, y^t) denote the joint probability distributions of the source and target domains respectively; φ(x_i^s), φ(x_i^t) denote the RKHS features corresponding to the i-th data of D_s, D_t, and y_i^s, ŷ_j^t the class labels corresponding to the i-th and j-th data of D_s, D_t;
S4 includes step S41: in the upper-layer structure, X_s and X_t are taken as the input of G1, and G1 and D are trained with adversarial training to obtain the optimal parameters; because the target domain contains no labels, C1 is trained with the source-domain data only, the trained C1 is used to predict the target-domain data categories, and the output of C1 is taken as the pseudo-labels Ŷ_t of the target-domain data; the training loss of C1 is as follows:
Combining S35 yields the loss of the upper-layer structure:
S42: In the lower-layer structure, X_s and X_t are taken as the input of G2 to obtain the features Z_s and Z_t extracted by G2, where Z_s and Z_t are the output features of X_s and X_t through G2 respectively; L_jmmd is then computed with ⟨Z_s, Y_s⟩ and ⟨Z_t, Ŷ_t⟩;
S43: to integrate the migration ability of G1, G2 after training, X is calculated s The outputs of G1 and G2 are fused, and the fused features are used as the input of C2 for training, and the training loss of C2 is expressed as follows:
S44: Based on the network losses discussed in S41, S42, and S43, the optimization objective of the proposed model can be expressed as:
S5: after the model training is finished at S4, the test data is predicted using the feature extractors G1, G2 and the classification network C2.
CN202310454552.9A 2023-04-25 2023-04-25 Unsupervised carbon dioxide emission monitoring method based on deep transfer learning Pending CN116451083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310454552.9A CN116451083A (en) 2023-04-25 2023-04-25 Unsupervised carbon dioxide emission monitoring method based on deep transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310454552.9A CN116451083A (en) 2023-04-25 2023-04-25 Unsupervised carbon dioxide emission monitoring method based on deep transfer learning

Publications (1)

Publication Number Publication Date
CN116451083A true CN116451083A (en) 2023-07-18

Family

ID=87125450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310454552.9A Pending CN116451083A (en) 2023-04-25 2023-04-25 Unsupervised carbon dioxide emission monitoring method based on deep transfer learning

Country Status (1)

Country Link
CN (1) CN116451083A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination