CN117805658A

CN117805658A - Data-driven electric vehicle battery remaining life prediction method

Info

Publication number: CN117805658A
Application number: CN202410225854.3A
Authority: CN
Inventors: 毕远国; 李莹; 付饶
Original assignee: 东北大学
Priority date: 2024-02-29
Filing date: 2024-02-29
Publication date: 2024-04-02
Anticipated expiration: 2044-02-29

Abstract

The invention belongs to the technical field of computer applications and discloses a data-driven method for predicting the remaining life of an electric vehicle battery, comprising a dual-task battery remaining life prediction model and a cross-structure knowledge distillation network based on adversarial learning. The dual-task battery remaining life prediction model comprises a backbone network and a regression-classification dual-task branch network; the backbone network has a parallel structure and comprises a time dimension feature extraction module and a parameter dimension feature extraction module, wherein the output of the time dimension feature extraction module is fused with the output of the parameter dimension feature extraction module; the regression-classification dual-task branch network comprises attention feature selection sub-networks and a joint loss function based on homoscedastic uncertainty; the attention feature selection sub-networks are a remaining useful life (RUL) prediction branch network and a working-condition recognition branch network. The method has potential application prospects in the field of electric vehicle battery remaining life prediction, can improve the accuracy and efficiency of prediction, and provides support for the intelligent management and control of electric vehicles.

Description

Data-driven electric vehicle battery remaining life prediction method

Technical Field

The invention relates to the technical field of computer application, in particular to a data-driven method for predicting the residual life of a battery of an electric automobile.

Background

In recent years, as the number of electric vehicles continues to increase, the number of safety accidents related to electric vehicles also continues to increase. As the battery ages and irreversibly decays, conditions of reduced available capacity, increased internal resistance, reduced peak power capability, and deteriorated electrical performance occur. The result of the electric automobile battery remaining life prediction research can help to improve the state monitoring and safety control of the battery system, quickly evaluate the reliability and safety of the battery pack, further prolong the service life of the battery system and improve the utilization efficiency of energy sources. In addition, the accurate and reliable battery residual life prediction can improve the service efficiency of the battery, reduce the maintenance cost and improve the economical efficiency of the electric automobile. The complexity of the battery degradation mechanism and the influence of the operation conditions present challenges for the accuracy and generalization of the residual life prediction model. In addition, the complexity of the model is also a problem to consider in order to obtain real-time predictions.

Battery charge-discharge cycle data are multivariate time series data. Most existing studies extract feature information only in the parameter dimension or only in the time dimension, but the features of both dimensions are essential for mining the degradation information in battery data. Features in the parameter dimension reflect the interactions among the battery parameters, and features in the time dimension reflect how each battery parameter changes over time. Therefore, a deep learning model can better learn the degradation patterns in battery data and predict battery life by jointly considering the features of the parameter dimension and the time dimension. The problem is analyzed below from three perspectives: multi-task learning, time dimension feature extraction, and parameter dimension feature extraction.

(1) Multi-task learning is a machine learning method that processes several tasks simultaneously; through training it learns both the features shared by multiple related tasks and the features unique to each task, and it has been applied in many fields, including health management. For example, "Liu R, Yang B, Hauptmann A G. Simultaneous bearing fault recognition and remaining useful life prediction using joint-loss convolutional neural network [J]. IEEE Transactions on Industrial Informatics, 2019, 16(1): 87-96." proposes a joint-loss convolutional neural network architecture that achieves bearing fault recognition and remaining life prediction by sharing parameters and part of the network, and introduces penalty factors in the loss function to control the weights of the two tasks; "Liu Z, Wang H, Liu J, et al. Multitask learning based on lightweight 1DCNN for fault diagnosis of wheelset bearings [J]. IEEE Transactions on Instrumentation and Measurement, 2020, 10010-10023" proposes a one-dimensional convolutional neural network model that handles three tasks simultaneously (fault diagnosis, speed recognition, and load recognition), and introduces two weight coefficients into the loss function to form a weighted sum of the three task losses and thereby balance the three tasks.

(2) For time dimension feature extraction, the main challenge is whether temporal dependencies of different ranges can be captured successfully. Time series often exhibit both short-term and long-term repetitive patterns, and taking these patterns into account is critical for accurate prediction; long-range dependencies are the more difficult to handle. Compared with models based on long short-term memory (LSTM), the self-attention-based Transformer model shows great potential for handling long-term dependencies in time series data, but few studies have tried to introduce the Transformer into the health management field. "Chen D, Hong W, Zhou X. Transformer network for remaining useful life prediction of lithium-ion batteries [J]. IEEE Access, 2022, 10: 19621-19628." proposes a remaining life prediction model based on the Transformer and a denoising encoder to address the noise and long-term dependence of battery capacity sequences; "Liu L, Song X, Zhou Z. Aircraft engine remaining useful life estimation via a double attention-based data-driven architecture [J]. Reliability Engineering & System Safety, 2022, 221: 108330." proposes a dual-attention-based aero-engine life prediction framework that uses a Transformer to focus attention on key time-step features.

(3) For parameter dimension feature extraction, "Jiang Y, Dai P, Fang P, et al. Electrical-STGCN: An electrical spatio-temporal graph convolutional network for intelligent predictive maintenance [J]. IEEE Transactions on Industrial Informatics, 2022, 18(12): 8509-8518." demonstrates that mining the internal relationships between electrical properties such as voltage and current facilitates predictive maintenance; "Liu H, Liu Z, Jia W, et al. A novel deep learning-based encoder-decoder model for remaining useful life prediction [C]//2019 International Joint Conference on Neural Networks, 2019: 1-8." proposes a deep-neural-network-based encoder-decoder model that uses convolutional neural networks to model the dependencies among the parameters of different sensors.

Based on the above analysis, research on data-driven prediction of the remaining life of electric vehicle batteries mainly faces the following three problems:

(1) Balance problem between tasks in a dual task framework: although there may be many shared features between the two tasks, the differences between them are not negligible. The output of the battery remaining life prediction task is a real value representing the number of cycles, while the output of the condition recognition task is a class label, the two task loss scales are different, which may result in a greater bias towards one of the tasks during training. Most multitasking models directly combine the loss functions of subtasks by using a simple weighting mode, the model performance is greatly dependent on the value of the weight hyper-parameter, and the experimental link needs to perform grid search to determine the value, which leads to a large number of repeated experiments.

(2) Long-term dependency problem in long-sequence data time-dependent relation feature extraction: the current state of health of the battery is affected by the historical state of health, i.e. depends on real-time data and historical data, and therefore the state of health of the battery cannot be analyzed by only a single cycle of data, but rather a relatively long sequence of cycles is required for analysis, which means that the long-term dependency of the state of health of the battery in the time dimension is not negligible.

(3) The relation characteristic among the multiple sequence data parameters is difficult to extract: the existing feature extraction research aiming at the dimension of the multi-element time sequence parameters mostly focuses on how to calculate the importance of each parameter, but ignores the interaction influence among different parameters, and the relation among the attributes of the battery becomes more complex along with the time, so that the modeling is difficult to carry out by a common deep learning method.

Disclosure of Invention

Aiming at the problems, the invention provides a data-driven method for predicting the residual life of the battery of the electric automobile.

The technical scheme of the invention is as follows: a data-driven method for predicting the remaining life of an electric vehicle battery comprises a dual-task battery remaining life prediction model and a cross-structure knowledge distillation network based on adversarial learning;

The dual-task battery remaining life prediction model comprises a backbone network for extracting dual-task shared features and a regression-classification dual-task branch network; the backbone network for extracting the dual-task shared features has a parallel structure and comprises a time dimension feature extraction module based on a dual-attention Transformer and a parameter dimension feature extraction module based on a graph attention network, wherein the time dimension feature extraction module based on the dual-attention Transformer comprises position coding, a scaled dot-product global self-attention mechanism and a local self-attention mechanism; the input of the time dimension feature extraction module passes sequentially through the position coding, the scaled dot-product global self-attention mechanism and the local self-attention mechanism to obtain the output of the time dimension feature extraction module; the parameter dimension feature extraction module based on the graph attention network comprises three graph attention network layers and a residual connection module; the output features of the parameter dimension feature extraction module based on the graph attention network and the output of the time dimension feature extraction module are fused through a feature fusion layer;

the regression-classification dual-task branch network comprises attention feature selection sub-networks and a joint loss function based on homoscedastic uncertainty; the attention feature selection sub-networks are a remaining life prediction branch network and a working-condition recognition branch network, which respectively mine the shared features and the unique features of the remaining life prediction task and the working-condition recognition task; the joint loss function based on homoscedastic uncertainty is used to balance the losses of the remaining life prediction branch network and the working-condition recognition branch network during training;

The cross-structure knowledge distillation network based on adversarial learning comprises a teacher network and a student network; after training of the dual-task battery remaining life prediction model is completed, the time dimension feature extraction module, the parameter dimension feature extraction module, the feature fusion layer and the remaining life prediction branch network are used as the teacher network; a fully connected layer and a feature extraction module based on dilated causal convolution are established as the student network, wherein the student network obtained through distillation is a lightweight prediction model, and the parameters in the lightweight prediction model are further reduced through a parameter sharing mechanism; the student network parameters are updated through adversarial learning so as to align the feature distributions of the dual-task battery remaining life prediction model and the lightweight prediction model, and a contrastive loss is added to promote feature distribution alignment; and the prediction results of the lightweight prediction model are aligned, through knowledge distillation, with the prediction results of the dual-task battery remaining life prediction model and with the true remaining life values.

The time dimension feature extraction module based on the dual-attention Transformer operates as follows:

for the input data $X$, relative position marks are injected, where the position codes use sine and cosine functions of different frequencies as coding functions;

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad X' = X + PE \tag{1}$$

where $pos$ represents the position of the battery charge-discharge cycle within the total cycles of battery life decay, $d$ represents the length of one charge-discharge cycle data, $i \in [0, d/2)$, the indices $2i$ and $2i+1$ represent the position of an element in one charge-discharge cycle data, and $X'$ represents the position-coded input data;

the scaled dot-product global self-attention mechanism comprises dot-product and normalization operations; the position-coded input data are stacked into a matrix, which is copied to form three matrices named $Q$, $K$ and $V$; first, the dot product of matrices $Q$ and $K$ is calculated, then the $\mathrm{softmax}$ function is used to normalize and scale the attention weight coefficients, and finally the normalized attention weight coefficients are multiplied by matrix $V$ to obtain the scaled dot-product global self-attention output; the calculation is expressed as follows, where $d_k$ is the dimension of the rows of $K$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{2}$$

the output of the local self-attention mechanism is calculated as follows:

（3）

where the formula depends on the distance between each pair of cycles and on a scale factor.
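The position coding and the scaled dot-product global self-attention steps described above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the standard sinusoidal coding (base 10000) and the usual softmax(QK^T / sqrt(d_k))V form; the function names, the base constant and the toy input values are illustrative assumptions and are not taken from the patent. The local self-attention branch is not included.

```python
import math


def positional_encoding(num_cycles, d_model, base=10000.0):
    """Sinusoidal position codes: sine on even indices, cosine on odd indices."""
    table = []
    for pos in range(num_cycles):
        row = []
        for j in range(d_model):
            # Index pairs (2i, 2i+1) share one frequency.
            angle = pos / (base ** ((2 * (j // 2)) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(k[0])
    output = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights = softmax(scores)
        output.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                       for c in range(len(v[0]))])
    return output


# Toy input (made-up values): 4 charge-discharge cycles, 6 values per cycle.
x = [[0.1 * (cycle + col) for col in range(6)] for cycle in range(4)]
pe = positional_encoding(num_cycles=4, d_model=6)
x_pos = [[a + b for a, b in zip(x_row, pe_row)] for x_row, pe_row in zip(x, pe)]

# Q, K and V are copies of the position-coded input, as the text describes.
attended = scaled_dot_product_attention(x_pos, x_pos, x_pos)
```

Each output row is a convex combination of the rows of V, so every attended value stays within the range of the corresponding input column.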

The parameter dimension feature extraction module based on the graph attention network is specifically as follows:

an undirected graph $G = (V, A, X)$ is used to represent a battery data sample, where $A$ is the adjacency matrix, which contains the neighbor sets of all nodes in the undirected graph; $V = \{v_1, \dots, v_N\}$ represents the set of all parameters in a battery charge-discharge cycle data sample, the number of parameters is $N$, and the $i$-th monitoring parameter is denoted $v_i$; the edge between vertices $v_i$ and $v_j$ represents the dependency between parameters $v_i$ and $v_j$, and $X$ represents the vertex feature matrix of the undirected graph $G$;

the construction of the graph data is divided into candidate relation selection and vertex similarity calculation; in candidate relation selection, the prior information for each vertex $v_i$ is represented as a candidate relation set, which contains the other parameters that interact with vertex $v_i$:

（4）

in the vertex similarity calculation, the neighbors of vertex $v_i$ are obtained by the following calculation:

（5）

where the result is the neighbor set of vertex $v_i$; a selection function returns the neighbor set of the vertex using a selection radius and a similarity metric function; the neighbor sets of all nodes are integrated to form the adjacency matrix of the undirected graph.
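The two-stage graph construction (candidate relation selection followed by vertex similarity calculation) can be illustrated with a small sketch. The similarity metric and the selection rule are not recoverable from the text, so the sketch assumes cosine similarity and reads the selection radius as a similarity threshold; `cosine_similarity`, `build_adjacency`, the parameter names and the toy series are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Assumed similarity metric between two parameter sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_adjacency(series, candidates, radius):
    """Keep candidate edges whose similarity is at least `radius` (assumed rule)."""
    names = list(series)
    index = {name: i for i, name in enumerate(names)}
    adjacency = [[0] * len(names) for _ in names]
    for name in names:
        for other in candidates[name]:
            if cosine_similarity(series[name], series[other]) >= radius:
                i, j = index[name], index[other]
                adjacency[i][j] = adjacency[j][i] = 1  # undirected edge
    return names, adjacency


# Toy monitoring parameters over four time steps (made-up values).
series = {
    "voltage": [4.2, 4.0, 3.8, 3.6],
    "current": [-1.0, -1.0, -1.0, -1.0],
    "temperature": [25.0, 26.0, 27.5, 29.0],
}
# Prior information: here every other parameter is a candidate neighbor.
candidates = {name: [m for m in series if m != name] for name in series}
names, adjacency = build_adjacency(series, candidates, radius=0.9)
```

Restricting `candidates` is where domain knowledge would enter: a parameter that is known not to interact with another can simply be left out of its candidate list.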

The calculation of the three graph attention network layers is described by formula (6):

$$H^{(1)} = g_1(X, A; W_1),\qquad H^{(2)} = g_2(H^{(1)}, A; W_2),\qquad H^{(3)} = g_3(H^{(2)}, A; W_3) \tag{6}$$

where $W_1$, $W_2$ and $W_3$ represent the parameter matrices of the three graph attention network layers, and $g_1$, $g_2$ and $g_3$ are the attention computation functions of the three graph attention network layers; the calculation of the residual connection module is described by formula (7):

$$F_p = H^{(1)} + H^{(3)} \tag{7}$$

where $H^{(1)}$ represents the output of the first graph attention network layer, $H^{(3)}$ represents the output of the third graph attention network layer, and $F_p$ represents the output features of the graph-attention-network-based parameter dimension feature extraction module.

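The graph attention network layer implements the convolution operation on the undirected graph through a graph attention mechanism; the representation of each vertex is calculated as a weighted sum of the representations of its neighboring vertices, where the weights are given by attention coefficients. During forward propagation of the $l$-th layer, the output $h_i^{(l)}$ of any vertex $v_i$ is expressed as follows:

$$h_i^{(l)} = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W h_j^{(l-1)}\right) \tag{8}$$

where $\sigma$ represents the activation function, $j \in \mathcal{N}(i)$ indexes the neighbor vertices of vertex $v_i$, $W$ is the parameter matrix of the layer, and $\alpha_{ij}$ is the attention score measuring the contribution of neighbor vertex $v_j$ to vertex $v_i$, calculated as follows:

$$\alpha_{ij} = \frac{\exp\!\left(\mathrm{LeakyReLU}\!\left(a^{\top}\left[W h_i \,\|\, W h_j\right]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\!\left(\mathrm{LeakyReLU}\!\left(a^{\top}\left[W h_i \,\|\, W h_k\right]\right)\right)} \tag{9}$$

where $\|$ represents the concatenation of $W h_i$ and $W h_j$, $a$ is a column vector of learnable parameters, and LeakyReLU is a nonlinear activation function.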
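The attention-weighted neighbor aggregation described above can be sketched for a single vertex. This is a minimal pure-Python illustration of the standard graph attention update (LeakyReLU scores over concatenated transformed features, softmax over the neighbors, weighted sum); the outer activation (ELU here), the LeakyReLU slope, the function names and the toy values are assumptions rather than details from the patent.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    """Outer activation; ELU is an assumption, the text does not name it."""
    return x if x > 0 else math.exp(x) - 1.0


def matvec(w, h):
    """Multiply matrix w (list of rows) by vector h."""
    return [sum(w_rc * h_c for w_rc, h_c in zip(row, h)) for row in w]


def gat_vertex_update(i, features, neighbors, w, a):
    """Return (attention coefficients, new representation) for vertex i."""
    wh = [matvec(w, h) for h in features]
    # Unnormalized score: LeakyReLU(a^T [W h_i || W h_j]) for each neighbor j.
    scores = [leaky_relu(sum(a_k * z_k for a_k, z_k in zip(a, wh[i] + wh[j])))
              for j in neighbors[i]]
    # Softmax over the neighbors of vertex i.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the transformed neighbor features.
    aggregated = [sum(alpha * wh[j][c] for alpha, j in zip(alphas, neighbors[i]))
                  for c in range(len(wh[i]))]
    return alphas, [elu(value) for value in aggregated]


# Toy graph (made-up values): vertex 0 has neighbors 1 and 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1, 2]}
w = [[1.0, 0.0], [0.0, 1.0]]   # identity transform keeps the example readable
a = [0.5, -0.5, 0.25, 0.75]    # attention vector over the concatenation
alphas, h0_new = gat_vertex_update(0, features, neighbors, w, a)
```

In this toy case vertex 2 receives the larger coefficient because its concatenated feature has the higher score under `a`.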

The fused output of the time dimension feature extraction module based on the dual-attention Transformer and the parameter dimension feature extraction module based on the graph attention network is called the shared feature $F$; the regression-classification dual-task branch network takes the shared feature $F$ as input and produces the RUL regression prediction result $\hat{y}_r = f_r(F; \theta_r)$ and the working-condition classification result $\hat{y}_c = f_c(F; \theta_c)$, where $f_r$ represents the network propagation process of the RUL prediction branch network, $f_c$ represents the network propagation process of the working-condition recognition branch network, $\theta_r$ represents the parameters of the RUL prediction branch network, and $\theta_c$ represents the parameters of the working-condition recognition branch network;

the network structure of the attention feature selection sub-network comprises a feature selection layer, a fully connected layer and an output layer; based on the shared features, the attention feature selection sub-network generates an attention weight matrix and weights each of the shared features, as expressed by formulas (10)-(12):

（10）

（11）

（12）

where the formulas involve the attention-weighted global feature, the attention weight matrix, the normalized attention weight matrix, and two learnable parameter matrices;

the probability function for the condition recognition and the log likelihood thereof are as follows:

（13）

where the probability function gives the probability that the output of the working-condition recognition task belongs to a given class;

the global joint loss function based on homoscedastic uncertainty is defined as follows:

（14）

where the formula involves the parameters of the dual-task battery remaining life prediction model, the uncertainty variance of the RUL prediction task, the output of the RUL prediction task, the output of the working-condition recognition task, the true label value, and a scaling factor;

the global joint loss function is defined as the maximum likelihood estimate of the dual-task joint distribution:

（15）

Wherein,is in the RUL prediction taskRoot mean square error between predicted tag and real tag, +.>Cross entropy of the task for identifying the working condition, +.>Representing the relative weights of the RUL prediction tasks, < ->Representing the relative weights of the condition recognition tasks, +.>Is a regular term.
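The balancing of the two task losses can be illustrated with a short sketch. It assumes the form commonly used for homoscedastic-uncertainty multi-task losses, with weights 1/(2σ₁²) and 1/σ₂² and a log(σ₁σ₂) regularizer, and it parameterizes each task by its log-variance; the function names, this parameterization and the toy numbers are assumptions, not details confirmed by the patent.

```python
import math


def rmse(predicted, actual):
    """Root mean square error of the RUL regression task."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def cross_entropy(class_probs, labels):
    """Mean cross entropy of the working-condition classification task."""
    return -sum(math.log(probs[label]) for probs, label in zip(class_probs, labels)) / len(labels)


def joint_loss(loss_rul, loss_cond, log_var_rul, log_var_cond):
    """Uncertainty-weighted sum; log_var_* = log(sigma^2) would be learnable."""
    weight_rul = 0.5 * math.exp(-log_var_rul)          # 1 / (2 * sigma_1^2)
    weight_cond = math.exp(-log_var_cond)              # 1 / sigma_2^2
    regularizer = 0.5 * (log_var_rul + log_var_cond)   # log(sigma_1 * sigma_2)
    return weight_rul * loss_rul + weight_cond * loss_cond + regularizer


# Toy batch (made-up values): three samples, two working-condition classes.
rul_pred, rul_true = [100.0, 80.0, 60.0], [110.0, 80.0, 50.0]
cond_probs, cond_true = [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]], [0, 1, 0]

loss_rul = rmse(rul_pred, rul_true)
loss_cond = cross_entropy(cond_probs, cond_true)
total = joint_loss(loss_rul, loss_cond, log_var_rul=0.0, log_var_cond=0.0)
```

Because the uncertainty terms are trained together with the network weights, no grid search over fixed task weights is needed; the regularizer keeps the variances from growing without bound just to shrink the weighted losses.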

The cross-structure knowledge distillation network based on adversarial learning is specifically as follows:

during distillation, the parameters of the teacher network remain unchanged, knowledge is transferred from the teacher network to the student network, and the parameters of the student network are continuously updated; a discriminator network is designed to carry out the adversarial learning;

the teacher network and the student network are each divided into a feature extractor and a regressor; the teacher feature extractor $E_T$ is the backbone network for extracting the dual-task shared features, and the teacher regressor $R_T$ is the RUL prediction branch sub-network; the student feature extractor $E_S$ is the feature extraction module based on dilated causal convolution, and the student regressor $R_S$ is a fully connected layer; given a training set of $N$ samples, the features extracted from the $i$-th input sample $x_i$ by the teacher feature extractor and the student feature extractor are denoted $E_T(x_i)$ and $E_S(x_i)$ respectively, the output of the teacher network RUL prediction task is denoted $\hat{y}_i^{T}$, and the output of the student network RUL prediction task is denoted $\hat{y}_i^{S}$;

The student network learning process is divided into characteristic knowledge distillation and response knowledge distillation;

the feature knowledge distillation calculates two loss terms, an adversarial loss and a contrastive loss, to align the feature extraction capability of the student feature extractor with that of the teacher feature extractor; the adversarial loss represents the classification loss of the discriminator network during the game between the student feature extractor and the discriminator network; the contrastive loss is obtained by calculating the distance between each feature and its positive and negative samples;

the response knowledge distillation uses two loss terms: the first aligns the feature extractor of the student network with the feature extractor of the teacher network, and the second aligns the regressor of the student network with the regressor of the teacher network; the first loss term represents the loss between the output of the teacher network and the output of the student network, and the second loss term represents the loss between the output of the student network and the true label.
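The contrastive term of the feature knowledge distillation and the two response terms can be sketched as follows. The exact loss formulas are not recoverable from the text, so this sketch assumes mean squared error for both response terms, a mixing weight `alpha`, and an InfoNCE-style contrastive loss in which the teacher feature of the same sample is the positive and the teacher features of the other samples are the negatives; all names, defaults and toy values are hypothetical. The discriminator loss is omitted.

```python
import math


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def response_loss(student_out, teacher_out, labels, alpha=0.5):
    """Mix the student-vs-teacher loss with the student-vs-label loss."""
    soft = mse(student_out, teacher_out)   # follow the teacher's predictions
    hard = mse(student_out, labels)        # follow the true remaining life
    return alpha * soft + (1.0 - alpha) * hard


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contrastive_loss(student_feats, teacher_feats, temperature=0.1):
    """InfoNCE-style loss: pull sample i toward teacher feature i, away from others."""
    total = 0.0
    for i, s_feat in enumerate(student_feats):
        logits = [cosine(s_feat, t_feat) / temperature for t_feat in teacher_feats]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(student_feats)


# Toy features and outputs (made-up values).
teacher_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], teacher_feats)
swapped = contrastive_loss([[0.0, 1.0], [1.0, 0.0]], teacher_feats)
distill = response_loss([95.0, 70.0], [100.0, 72.0], [98.0, 75.0])
```

The contrastive loss is small when each student feature matches the teacher feature of the same sample and large when the features are mismatched, which is the sample-level alignment that a discriminator acting on whole distributions does not enforce.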

The student network structure comprises an input layer, a convolution layer and an output layer, wherein the convolution layer is the feature extractor of the student network, comprises two residual blocks, and is used to align the student network with the teacher network; the forward propagation process of the convolution layer is expressed as follows:

（16）

Each residual block comprises two dilated causal convolution layers, two gated activation layers, a batch normalization layer, a Dropout layer and a convolution layer; the one-dimensional dilated causal convolution layers are applied along the time dimension of the two-dimensional battery charge-discharge cycle data, and each parameter sequence is processed separately by a one-dimensional convolution kernel; at the same time, a parameter sharing mechanism is adopted, so that the convolution weights of the same stage are shared among the one-dimensional dilated causal convolution layers; during training, after the loss function is calculated, the weights and biases are updated by back-propagation.
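The dilated causal convolution with a kernel shared across parameter sequences can be illustrated with a minimal sketch. It shows only the convolution itself (no gating, normalization or residual path), uses zero left-padding, and applies one kernel to every parameter sequence as a simple reading of the parameter sharing mechanism; the function names and toy values are assumptions.

```python
def dilated_causal_conv1d(sequence, kernel, dilation):
    """y[t] = sum_k kernel[k] * sequence[t - k * dilation], zero-padded on the left."""
    output = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only the current and past time steps contribute
                acc += weight * sequence[idx]
        output.append(acc)
    return output


def shared_kernel_layer(sequences, kernel, dilation):
    """Apply the same one-dimensional kernel to each parameter sequence."""
    return [dilated_causal_conv1d(seq, kernel, dilation) for seq in sequences]


# Toy data (made-up values): two parameter sequences over four cycles.
sequences = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]
features = shared_kernel_layer(sequences, kernel=[1.0, 1.0], dilation=2)
```

With kernel size 2 and dilation 2, each output at time t combines the inputs at t and t - 2, so stacking layers with growing dilation widens the receptive field without adding weights, and sharing the kernel keeps the parameter count independent of the number of monitored parameters.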

The invention has the following beneficial effects: the dual-task battery remaining life prediction model adopts a backbone network with a parallel structure and, by combining a time dimension feature extraction module based on a dual-attention Transformer with a parameter dimension feature extraction module based on a graph attention network, can efficiently extract and fuse features of different dimensions. The state of the battery can therefore be analyzed more fully, which improves the accuracy and reliability of remaining life prediction. The invention effectively mines the shared and unique features of the remaining life prediction task and the working-condition recognition task through the attention feature selection sub-networks. At the same time, a joint loss function based on homoscedastic uncertainty balances the losses of the remaining life prediction branch network and the working-condition recognition branch network during training, which improves the generalization capability and robustness of the model. The complex dual-task battery remaining life prediction model is converted into a lightweight student network model through the cross-structure knowledge distillation network based on adversarial learning. This not only reduces the computational burden of the model but also facilitates its deployment and practical application, especially in environments with limited computing resources. In the lightweight prediction model, the number of parameters is further reduced by introducing a dilated causal convolution network and a parameter sharing mechanism. This improves the running efficiency of the model, helps to reduce the risk of overfitting, and improves the generalization capability of the model.

Drawings

Fig. 1 is an overall network architecture diagram of a dual-tasked battery remaining life prediction model.

Fig. 2 is a network configuration diagram of the time dimension feature extraction module.

Fig. 3 is a network configuration diagram of the parameter dimension feature extraction module.

Fig. 4 is a diagram of a knowledge distillation process.

Fig. 5 is a diagram of a student network structure.

Detailed Description

In order to improve the accuracy and generalization of the model under different working conditions, the invention adopts a dual-task battery remaining life prediction model based on a graph attention network and a dual-attention Transformer. First, a working-condition recognition task is introduced and a regression-classification dual-task framework is constructed to mine the characteristics of different working conditions. The framework uses a backbone network and a regression-classification dual-task branch network to mine both the shared and the unique features of the two tasks, and balances the two tasks with a joint loss function based on homoscedastic uncertainty. Second, in order to fully mine the degradation information in battery data and improve the accuracy of the prediction model, the backbone network is designed as a parallel structure that extracts features from the parameter dimension and the time dimension of the battery data simultaneously; it comprises a time dimension feature extraction module based on a dual-attention Transformer and a parameter dimension feature extraction module based on a graph attention network. In order to obtain a lightweight and easy-to-deploy battery remaining life prediction model, the invention provides a cross-structure knowledge distillation method based on adversarial learning, which transfers the prediction capability of the model based on the graph attention network and the dual-attention Transformer to a simpler target model. The method comprises two steps: first, the feature distributions of the dual-task battery remaining life prediction model and the lightweight prediction model are aligned through adversarial learning, and a contrastive loss is added to promote sample-level feature alignment; then, the predictions of the target model are aligned with those of the original model and with the true remaining life values through knowledge distillation. In addition, the invention designs a feature extraction module based on dilated causal convolution and uses a parameter sharing mechanism to further reduce the network parameters. The method has potential application prospects in the field of electric vehicle battery remaining life prediction, can improve the accuracy and efficiency of prediction, and provides support for the intelligent management and control of electric vehicles.

The invention provides a data-driven method for predicting the remaining life of an electric vehicle battery; the overall network structure is shown in Fig. 1. First, the invention analyzes the existing problem that current mainstream battery remaining life prediction algorithms cannot simultaneously meet the requirements of accuracy and generalization under different working conditions, and that they ignore the limits on memory or computing capacity in realistic deployment scenarios. Model-based battery remaining life prediction methods can achieve a certain accuracy, but suffer from low dynamic accuracy and poor universality. Fusion methods that combine models and data can greatly improve prediction performance, but they have limitations such as high computational cost and fusion uncertainty. Therefore, on the premise of meeting the accuracy and generalization requirements of the prediction model under different working conditions, the invention makes the model as lightweight and easy to deploy as possible and reduces the network parameters, so as to adapt to the limits on memory or computing capacity in actual deployment scenarios.

The invention provides a dual-task battery remaining life prediction model. First, a regression-classification dual-task framework is constructed as shown in Fig. 1. A backbone network for extracting the dual-task shared features and branch networks for performing the working-condition recognition task and the remaining life prediction task respectively are designed. In order to learn the features specific to each subtask, an attention selection module is added to each branch network as an attention feature selection sub-network to screen the global features of the backbone network; in addition, a joint loss function based on homoscedastic uncertainty is designed to balance the two tasks automatically. Next, as shown in Fig. 2, a time dimension feature extraction module based on a dual-attention Transformer is designed, which uses the advantages of the Transformer in processing time series data to extract features in the time dimension. Cycles that are close in time are more closely related in battery data, but the scaled dot-product self-attention mechanism used by the classical Transformer framework does not give sufficient consideration to local dependencies, so the invention designs a dual-attention Transformer time dimension feature extraction module that combines the scaled dot-product self-attention mechanism with a local self-attention mechanism. Finally, as shown in Fig. 3, a parameter dimension feature extraction module based on a graph attention network is designed. The dependencies between battery parameters are unstructured, and conventional deep learning methods are not well suited to such unstructured data. Therefore, the invention constructs graph data from the original battery charge-discharge cycle data and uses the advantages of graph neural networks in processing unstructured data to extract features in the parameter dimension.

The invention designs a cross-structure knowledge distillation (KD) method based on adversarial learning, as shown in Fig. 4. The feature distributions of the student network and the teacher network are aligned automatically using the idea of generative adversarial learning. However, this process can only align the global feature distributions and does not consider fine-grained features; to alleviate this problem, a contrastive loss is used to measure the instance-level alignment between the student network and the teacher network, so that the feature distributions of the teacher network and the student network are aligned as closely as possible. A student network is then designed, as shown in Fig. 5, comprising a fully connected layer and a feature extraction module based on a dilated causal convolution TCN. The development of technical tools such as CUDA, TensorRT and NCNN means that CNNs are well supported by hardware on both servers and edge devices. CNNs can process time series data efficiently through simple network structures and parameter sharing. Compared with an ordinary CNN, a TCN has further advantages in processing time series data, so the invention adopts a feature extraction module based on a dilated causal convolution TCN, processes each parameter sequence separately with a one-dimensional convolution kernel, and uses a parameter sharing mechanism to further reduce the number of model parameters.

Finally, the proposed model is analyzed experimentally. The proposed remaining life prediction model is compared with several comparison models on three metrics: mean absolute error, root mean square error and a scoring function. The experimental results show that the proposed model performs well in both prediction performance and generalization. The effectiveness of the cross-structure knowledge distillation method based on adversarial learning is also verified: it compresses the model while maintaining the prediction capability of the remaining life prediction model.

The following describes the present invention in detail.

The environment of the present embodiment is as follows: the operating system is Windows 10 and the deep learning framework is PyTorch.

Step one: implementing each innovative component.

First, the invention constructs a regression-classification dual-task framework for battery remaining life prediction. The framework comprises a backbone network that extracts the features shared by the two tasks and two branch networks that perform the working condition identification task and the remaining life prediction task, respectively. To learn subtask-specific features, an attention selection module is introduced in the branch networks to screen the global features of the backbone network. To balance the two tasks automatically, a joint loss function based on homoscedastic uncertainty is designed. To extract features along the time dimension, a time dimension feature extraction module based on a dual-attention Transformer (DA-Transformer) is designed, using the strength of the Transformer in processing time series data. To better capture the relationship between adjacent cycles in battery data, the DA-Transformer module combines the scaled dot-product self-attention mechanism with a local self-attention mechanism. For feature extraction along the parameter dimension, a parameter dimension feature extraction module based on a graph attention network (GAT) is designed. Because the dependencies between battery parameters are unstructured, conventional deep learning methods have difficulty processing such data efficiently. The invention therefore constructs graph data from the battery charge-discharge cycle data and uses a graph neural network to extract features along the parameter dimension. Together, these modules address several challenges in battery remaining life prediction, including feature sharing and balancing between tasks and feature extraction along the time dimension and the parameter dimension.
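As an illustration of how a joint loss based on homoscedastic uncertainty can balance a regression task and a classification task, the following minimal Python sketch uses the commonly used log-variance form of uncertainty weighting. The function name, the log-variance parameterization and the numeric values are assumptions made for illustration and are not taken from the formulas of the invention.

```python
import math


def joint_uncertainty_loss(rul_loss, cls_loss, log_var_rul, log_var_cls):
    """Combine a regression loss and a classification loss.

    log_var_rul and log_var_cls stand for log(sigma^2) of each task
    (hypothetical names). In training they would be learnable parameters.
    """
    # Regression term: L_reg / (2 * sigma^2) + 0.5 * log(sigma^2)
    reg_term = 0.5 * math.exp(-log_var_rul) * rul_loss + 0.5 * log_var_rul
    # Classification term: L_cls / sigma^2 + 0.5 * log(sigma^2)
    cls_term = math.exp(-log_var_cls) * cls_loss + 0.5 * log_var_cls
    return reg_term + cls_term


# With both log-variances at zero the weights reduce to 0.5 and 1.0.
print(joint_uncertainty_loss(2.0, 1.0, 0.0, 0.0))
```

A larger log-variance lowers the weight of the corresponding task loss, while the additive log term keeps the variance from growing without bound.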

Second, the invention designs a cross-structure knowledge distillation method based on adversarial learning to transfer knowledge between different network structures. The method adopts a generative adversarial learning mechanism: by setting up an adversarial training process between the student network and the teacher network, the student network gradually distills knowledge from the teacher network and improves its own prediction capability. To account for the alignment of fine-grained features, the method introduces a contrastive loss that measures the similarity between student and teacher features at the sample level. The contrastive loss encourages the student network to imitate the teacher network's feature representation more closely, so that the teacher's knowledge is transferred more accurately. The process thus aligns the global feature distribution and also keeps the student and teacher features consistent at the sample level, which improves the effect of cross-structure knowledge distillation.
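The sample-level alignment can be sketched with an InfoNCE-style contrastive loss, in which the teacher feature of the same sample is the positive and the teacher features of the other samples are the negatives. This is only a sketch under stated assumptions: cosine similarity, the temperature value and the use of all other samples as negatives are illustrative choices, not details confirmed by the invention.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_loss(student_feats, teacher_feats, temperature=0.1):
    """Mean InfoNCE loss; row i of each list belongs to the same sample."""
    n = len(student_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(student_feats[i], teacher_feats[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over positive and negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], teacher)
swapped = contrastive_loss([[0.0, 1.0], [1.0, 0.0]], teacher)
print(aligned < swapped)
```

The loss is small when each student feature is closest to the teacher feature of its own sample and large when it is closer to another sample's feature.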

Third, the invention adopts a TCN-based student network that, through a parameter sharing mechanism, processes each parameter series separately with a one-dimensional convolution kernel, further reducing the number of model parameters. Using a TCN as the basis of the student network exploits its strengths in time series processing: a TCN has a large receptive field and can capture long-range temporal dependencies, so the student network can model and predict time series data better. In addition, because of the parameter sharing mechanism, the student network processes every parameter series efficiently without duplicating parameters, which markedly reduces the parameter count of the whole model and makes the model more lightweight.
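To make the dilated causal convolution and the parameter sharing idea concrete, the following plain-Python sketch applies one shared one-dimensional kernel to every parameter series. The function names, the zero padding before the first time step and the example values are assumptions for illustration; a real student network would use a deep learning framework and stack several such layers in residual blocks.

```python
def dilated_causal_conv1d(series, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], zeros before t = 0.

    The output at time t depends only on inputs at time t or earlier.
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * series[idx]
        out.append(acc)
    return out


def shared_kernel_conv(multivariate, kernel, dilation=1):
    """Apply the same kernel to each parameter series (parameter sharing)."""
    return [dilated_causal_conv1d(s, kernel, dilation) for s in multivariate]


# Two parameter series processed with one shared kernel of size 2.
print(shared_kernel_conv([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]],
                         [1.0, 1.0], dilation=2))
```

Because the kernel is shared, the number of convolution weights does not grow with the number of monitored parameters.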

Step two: experimental data set.

The dataset is used to verify the performance of the algorithm. To allow comparison with existing research, the invention uses the public 'Battery Data Set' provided by the NASA Prognostics Center of Excellence (PCoE) for experimental analysis. It is one of the most widely used datasets in the field of battery remaining life prediction. The data were collected from 18650 lithium cobalt oxide lithium-ion batteries with a rated capacity of 2 Ah. Such batteries offer high energy density, long service life and a low self-discharge rate, as well as good safety performance and a small volume, so they can be conveniently integrated as basic units of an electric vehicle battery pack. To analyze the experimental results, the invention uses three performance metrics to evaluate the remaining life prediction model: mean absolute error (Mean Absolute Error, MAE), root mean square error (Root Mean Square Error, RMSE) and a scoring function (Scoring Function, SF).
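The three metrics can be computed as in the following sketch. MAE and RMSE follow their standard definitions. The scoring function is not defined in this passage, so the sketch assumes the asymmetric exponential score commonly used in remaining life prediction work, which penalizes late predictions more than early ones; the constants 13 and 10 are part of that assumption.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def scoring_function(y_true, y_pred, a_early=13.0, a_late=10.0):
    """Assumed asymmetric score: d = predicted - true remaining life."""
    score = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        if d < 0:
            score += math.exp(-d / a_early) - 1.0  # early prediction
        else:
            score += math.exp(d / a_late) - 1.0    # late prediction
    return score


print(mae([10.0, 20.0], [12.0, 17.0]))
```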

Step three: the model is trained.

First, the data extracted from the dataset files cannot be used directly for training and testing the model and must be preprocessed in five steps: 1) acquiring the raw data from the battery files; 2) selecting parameter features; 3) normalizing the data; 4) obtaining sequence data with a sliding window and setting the remaining life labels; 5) dividing the data into a training set and a test set.
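Steps 3) and 4) above can be sketched as follows. Min-max normalization and labeling each window with the number of cycles left after its last cycle are assumptions made for illustration, since the text does not specify the normalization scheme or the exact label definition; the function names are hypothetical.

```python
def min_max_normalize(column):
    """Scale one parameter column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def sliding_windows(cycles, window):
    """Cut ordered per-cycle feature vectors into overlapping windows.

    Each window is labeled with the number of cycles remaining between
    its last cycle and the final recorded cycle of the battery.
    """
    last = len(cycles) - 1
    windows, labels = [], []
    for start in range(0, last - window + 2):
        end = start + window - 1
        windows.append(cycles[start:end + 1])
        labels.append(last - end)
    return windows, labels


capacity = min_max_normalize([2.0, 1.9, 1.8, 1.7, 1.6])
windows, labels = sliding_windows([[c] for c in capacity], window=3)
print(labels)
```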

Second, model verification experiments for the data-driven electric vehicle battery remaining life prediction method are performed. Random deactivation (Dropout) and an early stopping strategy are used. Dropout randomly drops network nodes during training, which gives the network a degree of sparsity and improves the generalization and robustness of the model. The early stopping strategy monitors the loss on the validation set during training: when the validation loss has not improved for more than a set number of consecutive iterations, training stops. Stopping in time prevents the model from overfitting the training samples, and the optimal network parameters are saved. Four-fold cross-validation experiments are performed. In each experiment, one battery group from the three working conditions is held out as test data, and the model parameters and results of all iteration rounds are saved. The model is trained in a multi-task manner: working condition identification and remaining life prediction are carried out simultaneously, and the model parameters are updated using the combined loss functions of the two tasks.
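The early stopping strategy can be sketched as a small helper that tracks the best validation loss. The class name, the patience value and the example losses are hypothetical; in a real training loop the model parameters would be checkpointed whenever a new best loss is recorded.

```python
class EarlyStopping:
    """Stop training when validation loss stops improving."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # allowed epochs without improvement
        self.min_delta = min_delta    # minimum decrease that counts
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # a checkpoint would be saved here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.7]):
    if stopper.step(epoch, loss):
        print("stopped at epoch", epoch, "best epoch", stopper.best_epoch)
        break
```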

Third, to verify the effectiveness of each module in the model of the data-driven electric vehicle battery remaining life prediction method, the parameter dimension feature extraction module, the time dimension feature extraction module and the working condition identification sub-network are removed from the model in turn, giving models M1, M2 and M3, respectively, and the contribution of each module is analyzed through ablation experiments. After the effectiveness of the model has been verified through visual analysis and the ablation experiments, the model is compared with other classical models, including DeTransformer, IDMFFN and Auto-CNN-LSTM, to further demonstrate its advantages.

Fourth, experiments on the cross-structure knowledge distillation method based on adversarial learning are performed. To determine the optimal hyperparameter values, a grid search is performed during the experiments over the weighting hyperparameter in the feature knowledge distillation joint loss function, the hyperparameters in the response knowledge distillation loss function, and the number of negative samples selected.
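A grid search of this kind can be sketched as follows. The hyperparameter names, the candidate values and the stand-in evaluation function are all hypothetical; in practice the evaluation function would train the student network with the given settings and return a validation error.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in grid and keep the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Stand-in for training and validating a student network."""
    return (params["feature_weight"] - 0.5) ** 2 + (params["num_negatives"] - 4) ** 2


grid = {"feature_weight": [0.1, 0.5, 0.9], "num_negatives": [2, 4, 8]}
print(grid_search(toy_evaluate, grid))
```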

Fifth, the prediction performance of the student model obtained by distillation is analyzed in detail to verify whether, at low complexity, the student model can match the performance of the teacher model, and thereby to demonstrate the effectiveness of the knowledge distillation method. To analyze the prediction performance of the student model more intuitively, the values predicted by the student model are plotted against the true remaining life values of the battery, and, for comparison with the teacher model, the prediction results of the student model and the teacher model are plotted together. To further verify the effectiveness of each step in the knowledge distillation method, the remaining life prediction results are analyzed for student models obtained by removing the contrast loss in the feature knowledge distillation of the original distillation method (denoted as F1), by removing the contrast loss in the feature knowledge distillation of the original distillation method (denoted as F2), and without any knowledge distillation process (denoted as F3).

Sixth, after the effectiveness of the cross-structure knowledge distillation method based on adversarial learning has been verified through the visual analysis and the ablation experiments, the method is compared with classical methods from existing research, including the MDER, KDnet-RUL and Fitnets methods, to further demonstrate its advantages. Compared with the three comparison methods, the student model obtained by the cross-structure knowledge distillation method based on adversarial learning achieves the best result on all three metrics, which indicates that the proposed knowledge distillation method automatically learns richer knowledge through adversarial learning.

Claims

1. A data-driven method for predicting the remaining life of an electric vehicle battery, characterized by comprising a dual-task battery remaining life prediction model and a cross-structure knowledge distillation network based on adversarial learning;

the dual-task battery remaining life prediction model comprises a backbone network for extracting dual-task shared features and a regression-classification dual-task branch network; the backbone network for extracting the dual-task shared features has a parallel structure and comprises a time dimension feature extraction module based on a dual-attention Transformer and a parameter dimension feature extraction module based on a graph attention network; the time dimension feature extraction module based on the dual-attention Transformer comprises a position encoding module, a scaled dot-product global self-attention mechanism and a local self-attention mechanism; the input of the time dimension feature extraction module based on the dual-attention Transformer passes in sequence through the position encoding, the scaled dot-product global self-attention mechanism and the local self-attention mechanism to obtain the output of the time dimension feature extraction module; the parameter dimension feature extraction module based on the graph attention network comprises three graph attention network layers and a residual connection module; the output features of the parameter dimension feature extraction module based on the graph attention network and the output of the time dimension feature extraction module are fused through a feature fusion layer;

the regression-classification dual-task branch network comprises attention feature selection sub-networks and a joint loss function based on homoscedastic uncertainty; the attention feature selection sub-networks are an RUL prediction branch network and a working condition identification branch network, which respectively mine the shared features and the task-specific features of the RUL prediction task and the working condition identification task; the joint loss function based on homoscedastic uncertainty is used to balance the losses of the RUL prediction branch network and the working condition identification branch network during training;

the cross-structure knowledge distillation network based on adversarial learning comprises a teacher network and a student network; after training of the dual-task battery remaining life prediction model is completed, the time dimension feature extraction module, the parameter dimension feature extraction module, the feature fusion layer and the RUL prediction branch network serve as the teacher network; a fully connected layer and a feature extraction module based on dilated causal convolution are established as the student network, the student network obtained through distillation being a lightweight prediction model whose parameters are further reduced through a parameter sharing mechanism; the feature distributions of the dual-task battery remaining life prediction model and the lightweight prediction model are aligned through adversarial learning, and a contrastive loss is added when updating the student network parameters to promote feature distribution alignment; and the prediction result of the lightweight prediction model is aligned, through knowledge distillation, with the prediction result of the dual-task battery remaining life prediction model and with the true remaining life value.

2. The data-driven method for predicting the remaining life of an electric vehicle battery according to claim 1, wherein the time dimension feature extraction module based on the dual-attention Transformer injects relative position marks into the input data, the position encoding using sine and cosine functions of different frequencies as encoding functions;

（1）

wherein the symbols denote, in order: the position of the battery charge-discharge cycle within the total cycles of battery life decay; the length of one charge-discharge cycle data sample; the position of an element within one charge-discharge cycle data sample; and the position-encoded input data;

（2）

the output of the local self-attention mechanism is calculated as follows:

（3）

wherein the symbols denote, in order: the distance between cycles; and a scale factor.

3. The method for predicting remaining battery life according to claim 1, wherein the parameter dimension feature extraction module based on the graph attention network is specifically as follows:

an undirected graph is used to represent a battery data sample, the graph being described by: an adjacency matrix containing the neighbor sets of all nodes in the undirected graph; the set of all parameters in a battery charge-discharge cycle data sample, in which the number of parameters and each individual monitoring parameter have their own symbols; edges, where the edge between two vertices represents the dependency between the two corresponding parameters; and the vertex feature matrix of the undirected graph;

（4）

（5）

wherein the symbols denote, in order: the neighbor set of a vertex; the function that returns the neighbor set of a vertex; the selection radius; and the similarity metric function; the neighbor sets of all nodes are integrated to form the adjacency matrix of the undirected graph;

（6）

wherein the symbols denote the parameter matrices of the three graph attention network layers and the attention computation functions corresponding to the three graph attention network layers; the residual connection module is computed according to formula (7):

（7）

wherein the symbols denote, in order: the output of the first graph attention network layer; the output of the third graph attention network layer; and the output features of the parameter dimension feature extraction module based on the graph attention network.

4. The method for predicting remaining battery life as recited in claim 3, wherein the graph attention network layer performs convolution on the undirected graph through a graph attention mechanism; the representation of each vertex is calculated as a weighted sum of the representations of its neighboring vertices, the weights being given by the attention coefficients; during forward propagation through a given layer, the output of any vertex is expressed as follows:

（8）

wherein the symbols denote, in order: the activation function; a neighbor vertex of the given vertex; and the attention score measuring the contribution of that neighbor vertex to the given vertex, which is calculated as follows:

（9）

5. The method according to claim 1 or 4, wherein the output of the time dimension feature extraction module based on the dual-attention Transformer and the output of the parameter dimension feature extraction module based on the graph attention network are together referred to as the shared feature; the regression-classification dual-task branch network takes the shared feature as input and outputs the RUL regression prediction result and the working condition classification result, which are produced by the network propagation process of the RUL prediction branch network and the network propagation process of the working condition identification branch network, with the parameters of the RUL prediction branch network and the parameters of the working condition identification branch network, respectively;

the network structure of the attention feature selection sub-network comprises a feature selection layer, a fully connected layer and an output layer; the attention feature selection sub-network generates an attention weight matrix from the shared feature and weights each of the shared features, as expressed by formulas (10)-(12):

（10）

（11）

（12）

wherein the symbols denote, in order: the attention-weighted global feature; the attention weight matrix; the normalized attention weight matrix; and the learnable parameter matrices;

（13）

（14）

wherein the symbols denote, in order: the parameters of the dual-task battery remaining life prediction model; the uncertainty variance of the RUL prediction task; the output of the RUL prediction task; the output of the working condition identification task; the true label value; and a scaling factor;

（15）

wherein the symbols denote, in order: the root mean square error between the predicted label and the true label in the RUL prediction task; the cross entropy of the working condition identification task; the relative weight of the RUL prediction task; the relative weight of the working condition identification task; and a regularization term.

6. The method for predicting remaining battery life as claimed in claim 5, wherein the cross-structure knowledge distillation network based on adversarial learning is specifically as follows:

during distillation, the parameters of the teacher network remain unchanged; knowledge is transferred from the teacher network to the student network, whose parameters are continuously updated; a discriminator network is designed to carry out the adversarial learning;

the teacher network and the student network are each divided into a feature extractor and a regressor; the feature extractor of the teacher network is the backbone network for extracting the dual-task shared features, and the regressor of the teacher network is the RUL prediction branch sub-network; the feature extractor of the student network is the feature extraction module based on dilated causal convolution, and the regressor of the student network is a fully connected layer; given a training set of samples, the teacher network feature extractor and the student network feature extractor each extract a feature from every input sample, and the output of the RUL prediction task of the teacher network and the output of the RUL prediction task of the student network are each denoted by their own symbol;

the feature knowledge distillation computes two loss terms to align the feature extraction capability of the teacher network feature extractor and the student network feature extractor; the first loss term represents the classification loss of the discriminator network during the game between the student feature extractor and the discriminator network; the second loss term represents the contrastive loss obtained by calculating the distance between each feature and its positive and negative samples;

the response knowledge distillation uses one loss term to align the feature extractor of the student network with the feature extractor of the teacher network and another loss term to align the regressor of the student network with the regressor of the teacher network; one loss term represents the loss between the output of the teacher network and the output of the student network, and the other represents the loss between the output of the student network and the true label.

7. The method of claim 6, wherein the student network structure comprises an input layer, a convolution layer and an output layer, the convolution layer being the feature extractor of the student network, comprising two residual blocks and being used to align the student network and the teacher network; the forward propagation process of the convolution layer is represented as follows:

（16）