CN110596594A

CN110596594A - Method for predicting SOE of rail-traffic lithium battery through big data

Info

Publication number: CN110596594A
Application number: CN201910901034.0A
Authority: CN
Inventors: 常伟; 余捷全
Original assignee: Guangdong Yuxiu Technology Co Ltd
Current assignee: Guangdong Yuxiu Technology Co Ltd
Priority date: 2019-09-23
Filing date: 2019-09-23
Publication date: 2019-12-20

Abstract

The invention relates to the technical field of rail transit maintenance, and in particular to a method for predicting the SOE of a rail transit lithium battery through big data. The method comprises the steps of data preparation, data arrangement, data characterization, target determination, data calculation, training and verification, and algorithm evaluation. Hidden noise data are found through a dedicated cleaning procedure, which yields a good cleaning effect and high accuracy. In addition, models are trained and evaluated: by introducing data and using different machine learning models, different algorithms are selected, matched, verified and released, so that the model becomes a structured product whose prediction accuracy can improve continuously as time accumulates and data are enriched.

Description

Method for predicting SOE of rail-traffic lithium battery through big data

Technical Field

The invention relates to the technical field of rail transit maintenance, and in particular to a method for predicting the SOE of a rail transit lithium battery through big data.

Background

The state of energy (SOE) of a battery is defined as the percentage of the battery's remaining energy relative to its total available energy. More generally, it refers to the ratio of the actual value to the nominal value of certain directly measurable or indirectly calculated performance parameters after the battery has been used for a period of time under given conditions, and it is used to judge the health and usage condition of the battery. The SOE depends not only on the electrochemical system of the battery and its manufacturing process, but also on the vehicle's driving conditions and the working environment inside the battery pack.

In conventional rail transit, routine data such as voltage and current are measured by sensors for monitoring. However, battery capacity decays continuously as the number of charge-discharge cycles and the mileage increase. The battery is a typical dynamic, nonlinear electrochemical system whose internal parameters are difficult to measure in online applications, so identifying the degradation state and estimating the state of the battery remain major challenges.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a method that relies on the long-term data acquisition of rail transit to mine implicit battery health-state information and its evolution law from the battery's rated information and state monitoring data (voltage, current, temperature, SOC and the like), thereby predicting the SOE of the battery.

The technical scheme of the invention is as follows:

A method for predicting the SOE of a rail transit lithium battery through big data comprises the following steps:

S001, a data preparation step: acquiring data related to the use of the rail transit battery.

In this step, the rail transit battery data include rail transit monitoring data, which are collected every ten seconds (or at another collection frequency depending on actual conditions) and are generated in different vehicle states, such as driving and charging. The battery monitoring data comprise the battery's own data and the rail transit state data related to the battery in normal use, and the total number of data variables exceeds 200.

The battery usage data are time-series streaming data and include the current, voltage, temperature, state of charge (SOC) and other quantities relevant to machine learning. The contents of the relevant data are shown in the following table.

S002, a data arrangement step: cleaning the data related to the use of the rail transit battery and organizing the cleaned data on the basis of time units.

The data cleaning method is as follows: a database is established for the acquired data, and abnormal data are collected to obtain an original database; the attributes to be cleaned are located in the data set from the distribution of noise data in the abnormal data set; a tensor whose dimension can be expanded is sought, and high-order dimension expansion is performed on the attribute tensor to obtain a high-order tensor attribute set; the expanded attribute tensor is used to clean and repair the abnormal data attributes; and the cleaned data are written into a new database to obtain the target database.

S003, a data characterization step: summarizing and extracting the data obtained in the data arrangement step to obtain characterized data.

Because the data are processed and calculated in the subsequent steps, the arranged data are first characterized so that their various features become explicit and are easier to calculate and identify.

In this step, the summarizing and extracting of data includes rolling aggregation. Rolling aggregation means setting a time window and calculating an aggregate value of a predetermined variable within that window; the aggregate value can be the sum, the mean or the standard deviation of the data. As shown in FIG. 4, taking the t1 node as an example with the time window set to 3, the rolling aggregation calculates the sum, mean or standard deviation over the t1 node and the 3 nodes within its window.

To give the learning algorithm better, or even additional, learning and prediction capability, more multivariate data are required. The invention therefore summarizes and extracts from the time-series battery data, thereby expanding the feature variables obtained in S001. For example, when there are 126 feature variables in step S001, the expanded data in this example are mainly of two types: the first type adds 126 - 2 = 124 variables computed as the rolling-aggregation mean of the initial feature variables; the second type adds 126 - 2 = 124 variables computed as the rolling-aggregation standard deviation of the initial feature variables. The final number of variables is thus 126 + 124 + 124 = 374. This provides more multivariate data and improves the learning and prediction ability of the algorithm.
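The rolling aggregation and feature expansion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the trailing-window convention, the column names, the sample values and the helper names `rolling_aggregate` and `expand_features` are all assumptions, as is the choice of which columns are left unaggregated.

```python
from statistics import mean, pstdev

def rolling_aggregate(values, window, agg):
    """Apply `agg` to each trailing window of `window` samples.

    Positions with fewer than `window` samples available yield None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(agg(values[i + 1 - window : i + 1]))
    return out

def expand_features(table, window, columns):
    """Return a copy of `table` with rolling mean and std columns added."""
    expanded = dict(table)
    for name in columns:
        expanded[f"{name}_roll_mean"] = rolling_aggregate(table[name], window, mean)
        expanded[f"{name}_roll_std"] = rolling_aggregate(table[name], window, pstdev)
    return expanded

# Hypothetical sample: three monitored variables at five time nodes.
# The timestamp column is left unaggregated (an assumption).
table = {
    "voltage": [3.70, 3.68, 3.66, 3.65, 3.61],
    "current": [10.0, 12.0, 11.0, 9.0, 13.0],
    "timestamp": [0, 10, 20, 30, 40],
}
features = expand_features(table, window=3, columns=["voltage", "current"])
```

Here three initial variables grow to seven, in the same way that 126 variables grow to 374 in the example above when 124 of them receive a rolling mean and a rolling standard deviation.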

S004, a target determination step: calculating the SOE value used for learning, and capturing specific points for verification.

For each collected record of battery data, the target value, i.e. the SOE, must be calculated after characterization.

The first step: obtaining the basic battery data used to calculate the SOE in the second step.

The basic data, which may also be called factory data, include: the nominal capacity of the battery (Cap_BOL), the nominal energy of the battery (E_rated), their correspondence table with temperature, and a correspondence table of cycle count versus capacity and energy fade under the ideal working condition of the battery.

The correspondence table between battery energy and temperature can be provided by the battery factory, since batteries are generally labeled with such a table. If it cannot be provided, the relation between temperature T and battery energy is learned from data (with the SOC going from 20% to 100% during charging). In the correspondence table of cycle count versus energy fade under the ideal working condition, the ideal condition is that the battery is discharged at 1C and charged at 0.5C (where C is the discharge rate of the battery) in a 25 °C environment, is discharged to 0% SOC, and one full charge and discharge counts as one cycle.

The second step: capturing the SOE at time t during a discharge that starts from 100% SOC.

The throughput is counted from the time the battery leaves the factory (formula not reproduced in this text), where Δt is the sampling time interval, the count covers all charging and discharging processes, and I_t is the current during charging and discharging: I_t is negative during charging and positive during discharging. Because the battery does not actually operate under the ideal condition of 25 °C, 1C discharge, 0.5C charge, and full charge and full discharge, an attenuation coefficient P is obtained by looking up the capacity-temperature correspondence table of the first step according to the current actual SOC (state of charge), T (temperature) and C (battery discharge rate). The actual throughput and the equivalent number of charge-discharge cycles of the battery in the ideal state, N_t, are then computed (formulas not reproduced in this text). Finally, the total battery energy E_rated corresponding to N_t is looked up from the cycle count versus energy fade curve (Energy Fade Curve).

Take one active capture as an example. Information at the start of discharge: time t₀, SOC₀ (100%). Information after discharging to time t: time t₁, SOC₁, temperature T_SOC, voltage U_SOC. The calculation (formulas not reproduced in this text) determines E_w, the energy consumed by the battery, and from it obtains SOE_t.

Through the above steps, the SOE is calculated for each battery data record, and the SOE_t obtained in the second step is then used as the learning target.
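As a rough illustration of the second step, the sketch below counts throughput and consumed energy from sampled current and voltage and converts the result to an SOE percentage. Every expression in it is an assumption built only from the definitions given in the text (SOE as remaining energy over total available energy, E_w as consumed energy, positive discharge current); it omits the attenuation coefficient P and the fade-curve lookup, and the function names and sample values are hypothetical.

```python
def throughput_ah(currents_a, dt_s):
    """Accumulated charge throughput in Ah, counting charge and discharge alike.

    Assumption: throughput sums the magnitude of the current over time.
    """
    return sum(abs(i) for i in currents_a) * dt_s / 3600.0

def consumed_energy_wh(voltages_v, currents_a, dt_s):
    """Energy drawn from the battery in Wh (discharge current is positive).

    Assumption: E_w is the time integral of voltage times current.
    """
    return sum(u * i for u, i in zip(voltages_v, currents_a)) * dt_s / 3600.0

def soe_percent(e_rated_wh, e_w_wh):
    """Remaining energy as a percentage of the total available energy."""
    return 100.0 * (e_rated_wh - e_w_wh) / e_rated_wh

# Hypothetical capture: one hour of discharge sampled every ten seconds.
dt_s = 10
voltages = [600.0] * 360   # pack voltage in V
currents = [50.0] * 360    # discharge current in A (positive)

e_w = consumed_energy_wh(voltages, currents, dt_s)
soe_t = soe_percent(e_rated_wh=120000.0, e_w_wh=e_w)
```

In a fuller version, `e_rated_wh` would come from the fade-curve lookup for the current cycle count rather than being passed in as a constant.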

S005, a data calculation step: establishing a battery SOE prediction model based on the characterized data.

For the problem of battery SOE prediction, this embodiment uses a nonlinear mixed effect model, a survival model and a random forest model to establish the battery SOE prediction model.

The model determines the mathematical relationships between variables from a set of sample data, performs statistical tests on the reliability of those relationships, and identifies which of the variables affecting a particular variable have a significant influence and which do not.

The SOE at time t, SOE_t, is taken as Y, and each data record is time-stamped. The data obtained after steps S001, S002 and S003 are taken as x, and a model Y = f(x) is established, where f() is the model learned by the machine from big data. During actual rail transit operation, the battery SOE is difficult to monitor in real time. Conventional methods estimate the SOE roughly, mainly from empirical formulas; their main drawbacks are that the SOE cannot be calculated in real time, that accuracy is low, and that the SOE of each individual cell cannot be predicted well because of cell-to-cell differences. Models built on big data can address these problems. The input of the model is the time t and the data collected at time t, and the output is the battery SOE at time t, SOE_t. During real-time rail transit operation, SOE_t can be accurately inferred by the model from the collected data x.

The nonlinear mixed effect model is an extension of the linear mixed effect model in which both the fixed effect part and the random effect part can enter the model in nonlinear form. Compared with the normality assumption of the linear model, the nonlinear model places no special requirement on the distribution of the data, which may be normal, binomial or Poisson, and it is also more robust in handling missing data. The model is Y = f(x + Φ) + e, where f() is a nonlinear function and Φ = Aβ + Bb; A and B are design matrices, β is the fixed effect parameter vector, b is the random effect parameter vector, and e is the error vector. Here β corresponds to the fixed effect data in the input x that are relevant to battery SOE prediction, and b to the random effect data that are not relevant to SOE prediction. The parameters β and b can be estimated by iterating between two steps, a pseudo-data step and a linear mixed effect step, which can be solved with the Gauss-Newton iteration method and the EM algorithm respectively. Because the battery capacity decays continuously and changes dynamically and nonlinearly during daily rail transit use, the nonlinear function in the nonlinear mixed effect model can fit this behavior well. And because some of the collected battery parameters are correlated with battery capacity while others are uncorrelated and randomly distributed, the fixed effect term and the random effect term of the model describe these two kinds of parameters well.

Survival analysis studies the distribution of survival time and its relationship with relevant factors, and it analyzes and infers the survival time of organisms, people and the like from data obtained by tests or surveys. It focuses on predicting the probability of response, the survival probability, and the mean life span. The main methods are the descriptive method, the nonparametric method, the parametric method and the semiparametric method. The descriptive method uses formulas to calculate the survival function, death function, risk function and so on at each time point or time interval directly from the sample observations, and presents the distribution of survival time as a table or a plot. The nonparametric method places no requirement on the distribution of survival time when estimating the survival function, and it is also used when testing the influence of risk factors on survival time. The parametric method estimates the parameters of an assumed distribution model from the sample observations to obtain a probability distribution model of survival time. The semiparametric method needs no assumption about the distribution of survival time, but can still analyze that distribution and the influence of risk factors on survival time through a model.

In the survival model algorithm, t is the service time of the battery, x is the data collected as a time series, f(x) is the probability density function of the survival time distribution of the study object, and S(t) is the probability that the survival time of the study object is longer than t. The algorithm model of the SOE is Y = f(S(t), x), where f() is the survival algorithm model.

Because the SOE of the battery is discharged from an initial 100%, which is equivalent to a process from birth through survival to death, the battery life based on the SOE parameter can be predicted well from the probability density distribution function obtained in the big data modeling process.
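The survival function formula itself does not appear in the text. A standard form consistent with the definitions of f(x) and S(t) given here is shown below, offered as an assumption rather than as the patent's own expression; the symbol T for the random survival time is likewise introduced only for illustration.

```latex
S(t) = P(T > t) = \int_{t}^{\infty} f(x)\,\mathrm{d}x
```

Under this form S(0) = 1 and S(t) decreases toward 0, which matches the picture of a battery discharging from an initial 100%.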

The random forest model is a forest made up of many decision trees, and the classification result of the algorithm is obtained by a vote among the trees. Randomness is introduced in both the row direction and the column direction while each tree is generated: in the row direction, the training data are obtained by sampling with replacement (bootstrapping); in the column direction, a feature subset is obtained by random sampling without replacement, and the optimal split point is then found on that subset. The random forest is an ensemble model that is still based on decision trees internally; because it classifies through the votes of many decision trees, the algorithm is less prone to overfitting.
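The two sources of randomness and the voting step can be sketched as below. This is only an illustration of the sampling scheme: tree construction is left out, and the function names and feature names are hypothetical.

```python
import random

def sample_for_tree(n_rows, feature_names, n_features, rng):
    """Pick the training rows and feature subset for one decision tree."""
    # Row direction: bootstrap, i.e. sampling with replacement.
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Column direction: random feature subset, sampling without replacement.
    cols = rng.sample(feature_names, n_features)
    return row_idx, cols

def majority_vote(predictions):
    """Combine per-tree class predictions by voting."""
    return max(set(predictions), key=predictions.count)

rng = random.Random(0)
rows, cols = sample_for_tree(10, ["voltage", "current", "temperature", "soc"], 2, rng)
label = majority_vote(["high", "low", "high"])
```

Because rows are drawn with replacement, some records appear more than once in a tree's training set while others are left out, which is what makes the trees differ from one another.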

In this embodiment, the nonlinear mixed effect model, the survival model and the random forest model are run in parallel; the most suitable model is selected according to the results of the final step S007, and this selection is adjusted dynamically.

S006, a training and verification step: training and verifying the model to optimize the adaptive model.

Once the model has been established, training and verification are needed to optimize it and improve its accuracy.

In this embodiment, the training and verification step preferably includes cross-validation and minority-class sampling.

In cross-validation, the parameter frameworks of all models are optimized. The reliability of an algorithm depends on its parameter framework, that is, on which battery data contribute most effectively to the results produced.

In this embodiment, to improve the quality of the parameter framework, the original data are first randomly divided into K parts. One of the K parts is selected as test data and the remaining K-1 parts are used as training data to obtain an experimental result. Then another part is selected as test data with the remaining K-1 parts as training data, and so on, until the cross test has been repeated K times, so that each of the K parts serves as test data exactly once. Finally, the K experimental results are averaged. The experimental result can be the difference between the predicted value and the check value, so the smaller the difference the better; the best classification is determined in this way, which accomplishes the training of the model. In this application, the acquired rail transit data can be randomly divided into K parts: K-1 parts are first used to establish an SOE prediction model, and the newly established model is then checked against the remaining part, and so on.
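The K-part procedure can be sketched in plain Python as follows. The mean-predicting stand-in model, the sample data and the function names are hypothetical and serve only to make the loop runnable; the sketch assumes K does not exceed the number of samples.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Randomly split sample indices into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k, fit, predict):
    """Average absolute prediction error over k train/test rounds."""
    folds = k_fold_indices(len(xs), k)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_err = [abs(predict(model, xs[i]) - ys[i]) for i in test_idx]
        errors.append(sum(fold_err) / len(fold_err))
    return sum(errors) / k

# Hypothetical stand-in model: always predict the mean training target.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

xs = list(range(10))
ys = [80.0] * 10  # constant SOE, so the stand-in model is exact
score = cross_validate(xs, ys, k=5, fit=fit_mean, predict=predict_mean)
```

Each fold is used as test data exactly once, and the returned score is the average of the K per-fold differences between predicted and check values.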

Minority-class sampling is used when the data set is unbalanced, that is, when one class of data has only a small number of training samples. In that case, this embodiment can train the model by synthesizing new minority-class samples from the few failure samples available. For example, if only a small number of samples are collected during battery data collection, data synthesis is needed to generate more data for machine learning from that small amount. Specifically, for each minority-class sample A, a sample B is randomly selected from its nearest neighbors, where the distance is calculated in the time-variable graph, and a point is then randomly selected on the line connecting A and B as the newly synthesized minority-class sample. Through repeated synthesis, the small sample set A becomes an enlarged sample set A+, which meets the data requirement of battery SOE prediction and avoids the overfitting or distortion that data imbalance would otherwise cause in the calculation.
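A minimal sketch of this synthesis is given below, with samples represented as (time, variable) points. The use of Euclidean distance, the neighbor count `k`, the function name `synthesize` and the sample values are assumptions; the text specifies only that distance is measured in the time-variable graph.

```python
import math
import random

def synthesize(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest neighbors of a among the other minority samples
        # (Euclidean distance is an assumption).
        others = [p for p in minority if p is not a]
        neighbors = sorted(others, key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbors)
        t = rng.random()  # position along the segment from a to b
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

# Hypothetical minority-class samples as (time, variable) points.
minority = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.5), (3.0, 3.0)]
augmented = minority + synthesize(minority, n_new=5)
```

Every synthetic point lies on a segment between two real minority samples, so the enlarged set stays within the region the original samples already cover.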

S007, an algorithm evaluation step: evaluating the prediction results obtained from the data under different algorithms, and selecting the optimal algorithm based on the evaluation.

In battery SOE prediction, different algorithms give different results depending on the prediction target or the data source, so the better algorithm must be selected for each situation.

In general, the difference between the predicted value and the check value from S004 can be used to evaluate the prediction result and to compare the results obtained by different algorithms under different conditions, so as to select the optimal algorithm.

The difference measures how far the battery SOE predicted by the model is from the check value; in general, the lower the difference, the better.
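The selection rule can be sketched as follows. The text says only that a lower difference is better; using the mean absolute difference as the score is an assumption, and the model names and numbers are hypothetical.

```python
def mean_abs_difference(predicted, check):
    """Average absolute gap between predicted SOE and check values."""
    return sum(abs(p - c) for p, c in zip(predicted, check)) / len(check)

def select_best(predictions_by_model, check):
    """Return the model name with the smallest difference, plus all scores."""
    scores = {name: mean_abs_difference(pred, check)
              for name, pred in predictions_by_model.items()}
    return min(scores, key=scores.get), scores

# Hypothetical predictions (SOE in percent) against S004 check values.
check = [90.0, 75.0, 60.0]
predictions = {
    "nonlinear_mixed_effect": [88.0, 77.0, 61.0],
    "survival": [85.0, 70.0, 66.0],
    "random_forest": [91.0, 75.5, 59.0],
}
best, scores = select_best(predictions, check)
```

Re-running the selection as new check values arrive is one way to realize the dynamic adjustment of the chosen model.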

The beneficial effects of the invention are as follows: hidden noise data are found through a dedicated cleaning procedure, which yields a good cleaning effect and high accuracy; in addition, models are trained and evaluated, and by introducing data and using different machine learning models, different algorithms are selected, matched, verified and released, so that the model becomes a structured product whose prediction accuracy can improve continuously as time accumulates and data are enriched.

Drawings

FIG. 1 shows an implementation of rail transit battery SOE prediction;

FIG. 2 is a block diagram of the system architecture of the present invention;

FIG. 3 is a block diagram of big data machine learning in the present invention;

FIG. 4 is a schematic diagram of rolling aggregation in the present invention.

Detailed Description

The embodiments of the present invention are further described below with reference to the accompanying drawings, FIGS. 1 to 4.


The foregoing embodiments and description have been presented only to illustrate the principles and preferred embodiments of the invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention as hereinafter claimed.

Claims

1. A method for predicting the SOE of a rail transit lithium battery through big data, characterized by comprising the following steps:

S001, a data preparation step: acquiring data related to the use of the rail transit battery;

S002, a data arrangement step: cleaning the data related to the use of the rail transit battery and organizing the cleaned data on the basis of time units;

wherein the data cleaning method is as follows: establishing a database for the acquired data, and collecting abnormal data to obtain an original database; locating the attributes to be cleaned in the data set from the distribution of noise data in the abnormal data set; seeking a tensor whose dimension can be expanded, and performing high-order dimension expansion on the attribute tensor to obtain a high-order tensor attribute set; using the expanded attribute tensor to clean and repair the abnormal data attributes; and writing the cleaned data into a new database to obtain a target database;

S003, a data characterization step: summarizing and extracting the data obtained in the data arrangement step to obtain characterized data;

S004, a target determination step: calculating the SOE value used for learning, and capturing specific points for verification;

S005, a data calculation step: establishing a battery SOE prediction model based on the characterized data;

S006, a training and verification step: training and verifying the model to optimize the adaptive model;

2. The method for predicting the SOE of a rail transit lithium battery through big data according to claim 1, characterized in that: the rail transit battery data in S001 are the battery usage data in the Internet-of-Vehicles data of rail transit; the battery usage data comprise the battery's own data and the rail transit state data related to the battery in normal use; the battery usage data are time-series streaming data; and the SOE_t at time t is calculated according to an empirical formula.

3. The method for predicting the SOE of a rail transit lithium battery through big data according to claim 2, characterized in that: the summarizing and extracting of data in S003 includes rolling aggregation, where rolling aggregation means setting a time window and calculating an aggregate value of a predetermined variable within the time window, the aggregate value being the sum, the mean or the standard deviation of the data; the summarizing and extracting further comprise expanding the feature variables, wherein the expanding comprises adding a corresponding number of variables to the initial feature variables according to the rolling-aggregation mean and adding a corresponding number of variables according to the rolling-aggregation standard deviation.

4. The method for predicting the SOE of a rail transit lithium battery through big data according to claim 3, characterized in that: S005 further comprises minority-class sampling to train the model; when one class of data in the samples has only a small number of training samples, the model is trained by synthesizing new minority-class samples from the small amount of sample data; for each minority-class sample A, a sample B is randomly selected from its nearest neighbors, where the distance is calculated in the time-variable graph, and a point is then randomly selected on the line connecting A and B as the newly synthesized minority-class sample; through repeated synthesis, the small sample set A becomes an enlarged sample set A+.