CN110596594A - Method for predicting SOE of rail-traffic lithium battery through big data - Google Patents


Info

Publication number
CN110596594A
Authority
CN
China
Prior art keywords
data
battery
soe
model
rail
Legal status
Pending
Application number
CN201910901034.0A
Other languages
Chinese (zh)
Inventor
常伟
余捷全
Current Assignee
Guangdong Yuxiu Technology Co Ltd
Original Assignee
Guangdong Yuxiu Technology Co Ltd
Application filed by Guangdong Yuxiu Technology Co Ltd
Priority to CN201910901034.0A
Publication of CN110596594A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/005 Testing of electric installations on transport means
    • G01R31/008 Testing of electric installations on transport means on air- or spacecraft, railway rolling stock or sea-going vessels
    • G01R31/36 Arrangements for testing, measuring or monitoring the electrical condition of accumulators or electric batteries, e.g. capacity or state of charge [SoC]
    • G01R31/367 Software therefor, e.g. for battery testing using modelling or look-up tables
    • G01R31/382 Arrangements for monitoring battery or accumulator variables, e.g. SoC
    • G01R31/392 Determining battery ageing or deterioration, e.g. state of health
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Secondary Cells (AREA)

Abstract

The invention relates to the technical field of rail transit maintenance, and in particular to a method for predicting the SOE of rail transit lithium batteries through big data. The method comprises the steps of data preparation, data arrangement, data characterization, target determination, data calculation, training and verification, and algorithm evaluation. Hidden noise data are found through a dedicated cleaning procedure, giving effective cleaning and high accuracy. In addition, the data are used to train and evaluate several machine learning models, and the best-matching algorithm is selected, verified and released, so that the model becomes a structured product whose prediction accuracy can keep improving as time passes and data accumulate.

Description

Method for predicting SOE of rail-traffic lithium battery through big data
Technical Field
The invention relates to the technical field of rail transit maintenance, and in particular to a method for predicting the SOE of rail transit lithium batteries through big data.
Background
The state of energy (SOE) of a battery is defined as the percentage of the battery's remaining energy relative to its total available energy. More generally, it refers to the ratio between the actual value and the nominal value of certain directly measurable or indirectly calculated performance parameters after the battery has been used for a period of time under given conditions, and it is used to judge the health and usage condition of the battery. The SOE depends not only on the battery's electrochemical system and manufacturing process, but also on the vehicle's driving conditions and the working environment inside the battery pack.
In conventional rail transit, routine quantities such as voltage and current are measured by sensors for monitoring. However, the battery capacity decays continuously as the number of charge/discharge cycles and the mileage increase. The battery is a typical dynamic, nonlinear electrochemical system whose internal parameters are difficult to measure in online applications, so identifying its degradation state and estimating its state remain very challenging.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a method that relies on long-term rail transit data acquisition to mine the implicit battery state-of-health information and its evolution from the battery's rated information and condition-monitoring data (voltage, current, temperature, SOC, etc.), so as to predict the battery SOE.
The technical scheme of the invention is as follows:
A method for predicting the SOE of a rail-traffic lithium battery through big data comprises the following steps:
S001, data preparation step: acquire data related to the use of the rail transit battery.
In this step, the rail transit battery data include rail transit monitoring data, collected every ten seconds (or at another collection frequency suited to actual conditions) and generated in different vehicle states, such as while driving and while charging. The battery monitoring data comprise the battery's own data during normal use and rail traffic state data, with more than 200 data variables in total.
The battery usage data are time-series streaming data and include current, voltage, temperature, state of charge (SOC) and other quantities relevant to machine learning. The contents of the relevant data are shown in the following table.
S002, data arrangement step: clean the data related to the use of the rail transit battery, and organize the cleaned data by time unit.
The data cleaning method comprises the following steps: build a database of the acquired data and collect the abnormal data to obtain an original database; identify the attributes to be cleaned in the data set from the distribution of noise data in the abnormal data set; search for an attribute tensor whose dimension can be expanded, and perform high-order dimension expansion on it to obtain a high-order tensor attribute set; use the expanded attribute tensor to clean the abnormal attributes and restore the data; and write the cleaned data into a new database to obtain the target database.
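The tensor-based cleaning in S002 is described only at a high level, so the sketch below does not implement it. As a much simpler stand-in, it shows the general flag-and-repair pattern on a single monitored variable: samples are flagged with a median/MAD robust z-score (an assumed outlier rule with an assumed threshold of 3.5) and repaired by linear interpolation between the nearest unflagged neighbours. The function name `clean_series` is hypothetical.

```python
import statistics


def clean_series(values, threshold=3.5):
    """Flag outliers in one monitored variable and repair them.

    Stand-in technique (NOT the patent's tensor-based cleaning):
    - outliers: robust z-score 0.6745 * |v - median| / MAD above `threshold`;
    - repair: linear interpolation between the nearest unflagged neighbours
      (or a copy of the nearest neighbour at either end of the series).
    Returns (cleaned_values, flagged_indices).
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values), []  # no spread, so nothing can be flagged
    flagged = [i for i, v in enumerate(values)
               if 0.6745 * abs(v - med) / mad > threshold]
    flagged_set = set(flagged)
    good = [i for i in range(len(values)) if i not in flagged_set]
    cleaned = list(values)
    for i in flagged:
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            cleaned[i] = values[right]
        elif right is None:
            cleaned[i] = values[left]
        else:
            w = (i - left) / (right - left)  # relative position between neighbours
            cleaned[i] = values[left] + w * (values[right] - values[left])
    return cleaned, flagged
```

For example, `clean_series([3.70, 3.71, 3.72, 9.99, 3.74, 3.75])` flags index 3 and replaces the spike with the interpolated value 3.73.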
S003, data characterization step: summarize and extract the data obtained in the data arrangement step to obtain characterized data.
Because the data must be processed and computed in the subsequent steps, the arranged data are first characterized so that their various characteristics become visible, which facilitates calculation and identification.
In this step, summarization and extraction include rolling aggregation. Rolling aggregation sets a time window and computes an aggregate value of a predetermined variable within that window; the aggregate can be the sum, mean or standard deviation of the data. As shown in Fig. 4, for node t1 with a time window of 3, rolling aggregation computes the sum, mean or standard deviation over node t1 and the preceding nodes within the window.
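Rolling aggregation can be sketched with the Python standard library. The sketch assumes a trailing window that includes the current node (the same convention as pandas `rolling`); the text does not pin down whether node t1 itself counts toward the window of 3, so treat that convention as an assumption. `rolling_aggregate` is a hypothetical helper name.

```python
import statistics


def rolling_aggregate(series, window, how="mean"):
    """Trailing-window aggregate of `series`.

    Assumed convention: the window ends at (and includes) the current node,
    so position i covers series[i - window + 1 : i + 1]. Positions with
    fewer than `window` points of history return None. "std" needs window >= 2.
    """
    aggregators = {
        "sum": sum,
        "mean": statistics.fmean,
        "std": statistics.stdev,  # sample standard deviation
    }
    agg = aggregators[how]
    result = []
    for i in range(len(series)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            result.append(agg(series[i + 1 - window : i + 1]))
    return result
```

For instance, `rolling_aggregate([1, 2, 3, 4, 5], 3, "sum")` returns `[None, None, 6, 9, 12]`.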
To give the learning algorithm better, and even additional, learning and prediction capability, more multivariate data are required, so the invention summarizes and extracts features from the time-series battery data, thereby expanding the feature variables of the initial step S001. For example, when step S001 yields 126 feature variables, the expanded data in this example fall into two main categories: the first adds 126 − 2 = 124 variables from the rolling-aggregation mean of the initial 126 feature variables; the second adds another 126 − 2 = 124 variables from the rolling-aggregation standard deviation. The final number of variables is therefore 126 + 124 + 124 = 374. This provides more multivariate data and improves the predictive ability of the learning algorithm.
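The variable count in this example can be checked with a short sketch. The text does not say which two of the 126 variables are left out of the aggregation, so the sketch simply assumes two hypothetical identifier columns; all column names are placeholders.

```python
def expand_features(base_features, excluded):
    """Original features plus one rolling-mean and one rolling-std feature per
    aggregated variable; `excluded` columns are kept but not aggregated."""
    aggregated = [f for f in base_features if f not in excluded]
    means = [f"{f}_roll_mean" for f in aggregated]
    stds = [f"{f}_roll_std" for f in aggregated]
    return list(base_features) + means + stds


# Hypothetical column names standing in for the 126 S001 variables.
base = [f"var_{i}" for i in range(126)]
# Assumption: two non-numeric columns (e.g. an ID and a timestamp) are not aggregated.
features = expand_features(base, excluded={"var_0", "var_1"})
print(len(features))  # 126 + 124 + 124 = 374
```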
S004, target determination step: calculate the SOE values used for learning, and capture specific points for verification.
For each collected battery data record, the target value, i.e. the SOE, is calculated after characterization.
Step 1: obtain the basic battery data used to calculate the SOE in step 2.
The basic data, which may also be called factory data, include: the nominal battery capacity (Cap_BOL), the nominal battery energy (E_rated), a table relating battery energy to temperature, and a table relating cycle number to capacity fade and energy fade under ideal battery operating conditions.
The battery-energy versus temperature table is normally provided by the battery manufacturer, since batteries are generally labelled with such a table. If it cannot be provided, the relationship between temperature T and battery energy is learned from the data (using charging segments in which the SOC rises from 20% to 100%). In the table of cycle number versus energy fade, the ideal operating condition is 1C discharge and 0.5C charge (where C denotes the battery's charge/discharge rate), with discharge down to 0% SOC at 25 °C; one charge plus one discharge is counted as one cycle.
Step 2: capture the SOE at time t while the SOC discharges from 100%.
The charge throughput is accumulated from the moment the battery leaves the factory, where Δt is the sampling interval; all charging and discharging processes are included, and I_t is the current during charging and discharging, negative while charging and positive while discharging. In actual operation the battery is not at the ideal condition of 25 °C, 1C discharge, 0.5C charge and full charge/discharge, so an attenuation coefficient P is looked up in the capacity-temperature table from step 1 according to the current actual SOC, temperature T and C-rate, and the actual throughput is computed using P. From the actual throughput, the equivalent number of charge/discharge cycles under ideal conditions, N_t, is obtained. Then, using the cycle-number versus energy-fade curve (Energy Fade Curve), the total battery energy corresponding to N_t is found relative to the rated energy E_rated.
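The throughput and cycle-count formulas for step 2 are not reproduced in this text, so the following sketch fills them in under explicit assumptions rather than reproducing the patent's equations: throughput sums |I_t|·Δt so that charging and discharging both count, each sample is scaled by its attenuation coefficient P_t, one ideal cycle corresponds to a throughput of 2·Cap_BOL (one full discharge plus one full charge), and the energy fade curve is linearly interpolated. All function names and the example curve values are hypothetical.

```python
from bisect import bisect_right


def equivalent_cycles(currents_a, dt_h, p_factors, cap_bol_ah):
    """Equivalent ideal full cycles N_t from logged current samples.

    ASSUMPTIONS (not the patent's own formulas):
    - throughput sums |I_t| * dt, so charging (I_t < 0) and discharging
      (I_t > 0) both add to it;
    - each sample is scaled by its attenuation coefficient P_t, looked up
      from SOC, temperature and C-rate;
    - one ideal cycle = one full discharge + one full charge = 2 * Cap_BOL.
    """
    q_eq_ah = sum(p * abs(i) * dt_h for i, p in zip(currents_a, p_factors))
    return q_eq_ah / (2.0 * cap_bol_ah)


def energy_from_fade_curve(n_cycles, curve, e_rated_wh):
    """Total energy after n_cycles, from a hypothetical fade curve given as
    (cycle_count, remaining_energy_fraction) pairs sorted by cycle count.
    Linear interpolation inside the table, clamped at both ends."""
    xs = [c for c, _ in curve]
    ys = [f for _, f in curve]
    if n_cycles <= xs[0]:
        return ys[0] * e_rated_wh
    if n_cycles >= xs[-1]:
        return ys[-1] * e_rated_wh
    k = bisect_right(xs, n_cycles)  # first table entry strictly above n_cycles
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    fraction = y0 + (y1 - y0) * (n_cycles - x0) / (x1 - x0)
    return fraction * e_rated_wh
```

With currents of 100 A and −50 A over two 1 h samples, P = 1 and Cap_BOL = 100 Ah, `equivalent_cycles` gives 0.75 cycles. On the hypothetical curve `[(0, 1.0), (1000, 0.9), (2000, 0.8)]`, 500 cycles maps to 95% of E_rated.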
Take one capture as an example. At the start of discharge: time t0, SOC0 = 100%. At discharge time t: time t1, SOC1, temperature T_SOC, voltage U_SOC. The calculation steps are as follows:
where E_w is the energy consumed by the battery, from which it follows that:
Through the above steps, the SOE is calculated for each battery data record, and the SOE_t obtained in step 2 is used as the learning target.
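Because SOE is defined as remaining energy over total available energy and the capture starts at SOC0 = 100%, one plausible reading of the calculation in step 2 is SOE_t = (E_total − E_w) / E_total, with E_w accumulated as U·I·Δt over the discharge from t0 to t1. The sketch below implements that reading; it is an assumption, not the patent's formula, and `soe_at_t` is a hypothetical name. E_total would come from the fade-curve lookup of step 2.

```python
def soe_at_t(voltages_v, currents_a, dt_h, e_total_wh):
    """SOE (%) at the end of a discharge segment that began at 100% SOC.

    ASSUMED formula: E_w = sum(U_k * I_k * dt) with discharge current
    positive, and SOE = (E_total - E_w) / E_total, following the definition
    of SOE as remaining energy over total available energy.
    The result is clamped to the physical range 0..100 %.
    """
    e_w_wh = sum(u * i * dt_h for u, i in zip(voltages_v, currents_a))
    soe = (e_total_wh - e_w_wh) / e_total_wh * 100.0
    return max(0.0, min(100.0, soe))
```

Two 1 h samples at 100 V and 10 A against E_total = 10 kWh consume 2 kWh, giving an SOE of 80%.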
S005, data calculation step: establish a battery SOE prediction model based on the characterized data.
For battery SOE prediction, this embodiment uses a nonlinear mixed-effects model, a survival model and a random forest model to build the prediction model.
Such a model determines the mathematical relationships between variables from a set of sample data, performs statistical tests on how reliable those relationships are, and identifies, among the variables that influence a given variable, which have a significant effect and which do not.
The SOE at time t, SOE_t, is taken as Y, and every record is time-stamped. The data obtained after steps S001, S002 and S003 are taken as x, and the model Y = f(x) is established, where f() is the model learned by the machine from big data. In actual rail transit operation, the battery SOE is difficult to monitor in real time. Conventional methods estimate it roughly using empirical formulas; their main shortcomings are that the SOE cannot be calculated in real time, the accuracy is low, and differences between individual cells mean that the SOE of each individual cell cannot be predicted well. A model built on big data addresses these problems. The model input is the data x_t collected at time t, and the output is the battery SOE_t at time t, so during real-time rail transit operation the model can infer the SOE accurately from the collected data x_t.
The nonlinear mixed-effects model extends the linear mixed-effects model: both its fixed-effect part and its random-effect part can enter the model in nonlinear form. Compared with the normality assumption of the linear model, the nonlinear model places no special requirement on the data distribution, which may be normal, binomial or Poisson, and the nonlinear mixed-effects model is also more robust in handling missing data. The model is Y = f(x + Φ) + e, where f() is a nonlinear function and Φ = Aβ + Bb, in which A and B are design matrices, β is the fixed-effect parameter vector, b is the random-effect parameter vector and e is the error vector; here β captures the fixed-effect part of the input data x that is relevant to battery SOE prediction, and b captures the random-effect part that is not. The parameters β and b can be estimated by iterating between two steps, a pseudo-data step and a linear mixed-effects step, solved with the Gauss-Newton iteration method and the EM algorithm respectively. Because the battery capacity decays continuously, dynamically and nonlinearly during daily rail transit use, the nonlinear function in the model can fit this capacity change well; and because some of the collected battery parameters are correlated with the battery capacity while others are uncorrelated and random, the fixed-effect and random-effect terms of the model describe these two types of parameters well.
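The pseudo-data step of the nonlinear mixed-effects fit is said to be solved by Gauss-Newton iteration. The sketch below illustrates only that ingredient: a Gauss-Newton least-squares fit of a single nonlinear curve, using an assumed exponential capacity-fade shape y = a·exp(−k·n), where n is a cycle count. It has no random effects and no linear mixed-effects (EM) step, so it is not a full NLME estimator; the model form and the name `gauss_newton_exp` are hypothetical.

```python
import math


def gauss_newton_exp(ns, ys, a, k, iters=50, tol=1e-12):
    """Fit y = a * exp(-k * n) by Gauss-Newton least squares.

    Fixed-effect curve only: random effects are not modelled here.
    Each iteration solves the 2x2 normal equations (J^T J) delta = J^T r.
    """
    for _ in range(iters):
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for n, y in zip(ns, ys):
            e = math.exp(-k * n)
            r = y - a * e              # residual at the current estimate
            j = (e, -a * n * e)        # partial derivatives w.r.t. a and k
            for p in range(2):
                jtr[p] += j[p] * r
                for q in range(2):
                    jtj[p][q] += j[p] * j[q]
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        da = (jtj[1][1] * jtr[0] - jtj[0][1] * jtr[1]) / det
        dk = (jtj[0][0] * jtr[1] - jtj[1][0] * jtr[0]) / det
        a, k = a + da, k + dk
        if abs(da) < tol and abs(dk) < tol:
            break
    return a, k
```

Fitting noise-free points generated with a = 1.0 and k = 0.0002 over 0 to 2000 cycles, starting from a = 0.95 and k = 0.00015, recovers the generating values.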
Survival analysis studies the distribution of survival times and the relationship between survival time and related factors, analyzing and inferring the survival time of organisms, people and so on from experimental or survey data. It focuses on predicting response probability, survival probability and mean life span. The main methods are descriptive, nonparametric, parametric and semiparametric. The descriptive method uses formulas to compute the survival function, death function, hazard function and so on directly from the sample observations at each time point or interval, and presents the distribution of survival time as tables or plots. The nonparametric method makes no assumption about the survival-time distribution when estimating the survival function, and is used when testing the influence of risk factors on survival time. The parametric method estimates the parameters of an assumed distribution model from the sample observations to obtain a probability distribution model of survival time. The semiparametric method does not need to assume a survival-time distribution, but can still analyze, through a model, the distribution of survival time and the influence of risk factors on it. In the survival model, $S(t) = P(T > t) = \int_t^{\infty} f(u)\,du$, where t is the service time of the battery, x is the data collected as a time series, f is the probability density function of the survival-time distribution of the study object, and S(t) is the probability that the survival time of the study object exceeds t. The SOE model is Y = f(S(t), x), where f() is the survival algorithm model.
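Among the nonparametric methods mentioned, the Kaplan-Meier estimator is a standard way to estimate S(t) = P(T > t) from data that include censored units (units still in service when observation ends). The sketch below is a plain-Python Kaplan-Meier estimator; treating a battery's "death" as, for example, its SOE falling below some end-of-life threshold is a hypothetical choice, not one stated in the text.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t) = P(T > t).

    durations: time each unit was followed (e.g. battery service time);
    observed:  True if the event happened at that time, False if censored.
    Returns (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all units sharing this time point.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            s *= 1.0 - events / n_at_risk  # conditional survival past t
            curve.append((t, s))
        n_at_risk -= removed  # events and censored units leave the risk set
    return curve
```

With durations `[1, 2, 2, 3, 4]` and events `[True, True, False, True, False]`, the estimate steps down at t = 1, 2, 3 to about 0.8, 0.6 and 0.3.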
Since the battery SOE starts at 100% and decreases through discharge, which is analogous to a process from birth through survival to death, the battery life in terms of the SOE parameter can be predicted well from the probability density function obtained during big-data modeling.
A random forest is a forest of many decision trees whose classification result is obtained by voting among the trees. Randomness is added to tree generation in both the row and column directions: in the row direction, the training data for each tree are obtained by sampling with replacement (bootstrapping); in the column direction, a feature subset is obtained by random sampling without replacement, from which the optimal split point is found. The forest is an ensemble model that is still built on decision trees internally, and because it classifies by the votes of many trees, the algorithm is less prone to overfitting.
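The two sources of randomness described for the random forest (bootstrapped rows and a feature subset drawn without replacement) and the voting step can be sketched as below. The sketch covers only sampling and aggregation, not tree induction. Here the feature subset is drawn once per tree, as the text describes, whereas Breiman's original random forest redraws it at every split. For a numeric target such as SOE, averaging tree outputs is the usual regression counterpart of voting. All names are hypothetical.

```python
import random
from collections import Counter


def draw_tree_sample(n_rows, n_features, m_features, rng):
    """Randomness injected into one tree of the forest:
    rows are bootstrapped (sampled with replacement) and a feature
    subset of size m_features is drawn without replacement."""
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    features = rng.sample(range(n_features), m_features)
    return rows, features


def forest_vote(tree_predictions):
    """Combine per-tree class labels by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_average(tree_predictions):
    """Combine per-tree numeric outputs (e.g. SOE in %) by averaging."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)
rows, features = draw_tree_sample(n_rows=10, n_features=8, m_features=3, rng=rng)
```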
In this embodiment, the nonlinear mixed-effects model, the survival model and the random forest run in parallel, the most suitable model is selected according to the evaluation in step S007, and this selection is adjusted dynamically.
S006, training and verification step: train and verify the model to optimize it.
Once the model is established, it must be trained and verified to optimize it and improve its accuracy.
In this embodiment, the training and verification step preferably includes cross-validation and minority-class sampling.
Cross-validation is used to optimize the parameter framework of every model. The reliability of the algorithm depends on this parameter framework, that is, on which battery data are most effective for the results produced.
To improve the quality of the parameter framework, this embodiment first divides the original data randomly into K parts. One part is selected as test data and the remaining K − 1 parts are used as training data to obtain an experimental result. Another part is then selected as test data, with the remaining K − 1 parts as training data, and so on, so that the cross test is repeated K times. In each experiment a different part is used as test data, ensuring that each of the K parts serves as test data once while the other K − 1 parts serve as training data. Finally, the K experimental results are averaged. The experimental result can be the difference between the predicted value and the check value, so a smaller difference is better; the best result is determined in this way, which completes the training of the model. In this application, the collected rail transit data can be randomly divided into K parts: K − 1 parts are first used to build an SOE prediction model, which is then used to verify whether the remaining part fits the model, and so on.
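The K-fold procedure can be sketched as follows. The sketch uses the mean absolute difference between predicted and check values as the per-fold result (an assumption; the text only says "difference"), and takes the model as a pair of hypothetical `fit`/`predict` callables so that any of the three S005 models could be plugged in. It requires at least K records.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, predict, seed=0):
    """K-fold cross-validation: every fold is the test set exactly once while
    the other k-1 folds train the model. Returns the average over folds of the
    mean absolute difference between predicted and check values.
    Requires len(xs) >= k so that no fold is empty."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        diffs = [abs(predict(model, xs[i]) - ys[i]) for i in test]
        fold_errors.append(sum(diffs) / len(diffs))
    return sum(fold_errors) / len(fold_errors)


# Usage with a hypothetical baseline that always predicts the mean training SOE.
def fit_mean(X, Y):
    return sum(Y) / len(Y)


def predict_mean(model, x):
    return model


soe = [82.0, 80.5, 79.0, 77.5, 76.0, 74.5, 73.0, 71.5, 70.0, 68.5]
print(cross_validate(list(range(10)), soe, k=5, fit=fit_mean, predict=predict_mean))
```

A real model would replace `fit_mean`/`predict_mean`; the baseline only demonstrates the fold bookkeeping.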
Minority-class sampling is used when the data set is unbalanced, i.e. when one class of data has only a small number of training samples. In that case, this embodiment can train the model by synthesizing new minority-class samples from the few available failure samples. For example, when only a few battery samples are collected, data synthesis is needed to generate more data for machine learning from them. Specifically, for each minority-class sample A, a sample B is randomly selected from its nearest neighbors, with distance measured in the time and variable space, and a point is then randomly selected on the line segment between A and B as a newly synthesized minority-class sample. By repeating this synthesis, a small set of samples A can be expanded into a larger set A+, which meets the data requirements for predicting battery SOE, so that overfitting or distortion caused by data imbalance does not arise in the calculation.
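The synthesis procedure described is essentially SMOTE (Synthetic Minority Over-sampling Technique). A minimal standard-library sketch follows. It measures distance as Euclidean distance between feature vectors, an assumption standing in for the "time and variable" distance in the text; `smote`, `per_sample` and `k` are hypothetical names.

```python
import math
import random


def smote(samples, per_sample, k=5, seed=0):
    """SMOTE-style oversampling of a minority class.

    For each minority sample A, pick a random neighbour B among its k nearest
    neighbours (Euclidean distance, an assumption) and create `per_sample`
    new points at random positions on the segment from A to B.
    Requires at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, a in enumerate(samples):
        neighbours = sorted(
            (s for j, s in enumerate(samples) if j != i),
            key=lambda s: math.dist(a, s),
        )[:k]
        for _ in range(per_sample):
            b = rng.choice(neighbours)
            gap = rng.random()  # position along A->B, in [0, 1)
            synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic
```

Each synthetic point lies between two real minority samples, so the new data stay inside the region the minority class already occupies.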
S007, algorithm evaluation step: evaluate the prediction results of the data under the different algorithms, and select the optimal algorithm based on this evaluation.
In battery SOE prediction, different algorithms give different results for different prediction targets or data sources, so a suitable algorithm must be selected for each situation.
In general, the prediction result can be evaluated by the difference between the predicted value and the check value from step S004, and the results obtained with different algorithms under different conditions are compared to select the optimal algorithm.
This difference measures how far the model's predicted battery SOE is from the check value; in general, the lower the difference, the better.
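The S007 selection rule can be sketched as choosing the model whose SOE predictions are closest to the S004 check values. The sketch assumes mean absolute difference as the score; the model names and numbers in the example are hypothetical.

```python
def mean_abs_difference(predicted, checked):
    """Mean |predicted - check| over paired SOE values."""
    return sum(abs(p - c) for p, c in zip(predicted, checked)) / len(checked)


def select_best_model(predictions_by_model, checked):
    """Return the model with the lowest mean absolute difference, plus all scores."""
    scores = {
        name: mean_abs_difference(pred, checked)
        for name, pred in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


checked = [80.0, 70.0, 60.0]  # hypothetical S004 check values (SOE, %)
best, scores = select_best_model(
    {
        "nlme": [81.0, 69.0, 60.0],
        "survival": [85.0, 75.0, 65.0],
        "random_forest": [80.0, 70.0, 61.0],
    },
    checked,
)
print(best)  # random_forest
```

Because the three models run in parallel, rerunning this comparison as new check values arrive is one way to realize the dynamic selection described for S005.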
The beneficial effects of the invention are as follows: hidden noise data are found through a dedicated cleaning procedure, giving effective cleaning and high accuracy. In addition, the data are used to train and evaluate several machine learning models, and the best-matching algorithm is selected, verified and released, so that the model becomes a structured product whose prediction accuracy can keep improving as time passes and data accumulate.
Drawings
FIG. 1 is a rail transit battery SOE prediction implementation;
FIG. 2 is a block diagram of the system architecture of the present invention;
FIG. 3 is a big data machine learning block diagram of the present invention;
FIG. 4 is a schematic diagram of rolling aggregation in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings:
As shown in Figs. 1 to 4:
A method for predicting the SOE of a rail-traffic lithium battery through big data comprises the following steps:
S001, data preparation step: acquire data related to the use of the rail transit battery.
In this step, the rail transit battery data include rail transit monitoring data, collected every ten seconds (or at another collection frequency suited to actual conditions) and generated in different vehicle states, such as while driving and while charging. The battery monitoring data comprise the battery's own data during normal use and rail traffic state data, with more than 200 data variables in total.
The battery usage data are time-series streaming data and include current, voltage, temperature, state of charge (SOC) and other quantities relevant to machine learning. The contents of the relevant data are shown in the following table.
S002, data arrangement step: clean the data related to the use of the rail transit battery, and organize the cleaned data by time unit.
The data cleaning method comprises the following steps: build a database of the acquired data and collect the abnormal data to obtain an original database; identify the attributes to be cleaned in the data set from the distribution of noise data in the abnormal data set; search for an attribute tensor whose dimension can be expanded, and perform high-order dimension expansion on it to obtain a high-order tensor attribute set; use the expanded attribute tensor to clean the abnormal attributes and restore the data; and write the cleaned data into a new database to obtain the target database.
S003, data characterization step: summarize and extract the data obtained in the data arrangement step to obtain characterized data.
Because the data must be processed and computed in the subsequent steps, the arranged data are first characterized so that their various characteristics become visible, which facilitates calculation and identification.
In this step, summarization and extraction include rolling aggregation. Rolling aggregation sets a time window and computes an aggregate value of a predetermined variable within that window; the aggregate can be the sum, mean or standard deviation of the data. As shown in Fig. 4, for node t1 with a time window of 3, rolling aggregation computes the sum, mean or standard deviation over node t1 and the preceding nodes within the window.
To give the learning algorithm better, and even additional, learning and prediction capability, more multivariate data are required, so the invention summarizes and extracts features from the time-series battery data, thereby expanding the feature variables of the initial step S001. For example, when step S001 yields 126 feature variables, the expanded data in this example fall into two main categories: the first adds 126 − 2 = 124 variables from the rolling-aggregation mean of the initial 126 feature variables; the second adds another 126 − 2 = 124 variables from the rolling-aggregation standard deviation. The final number of variables is therefore 126 + 124 + 124 = 374. This provides more multivariate data and improves the predictive ability of the learning algorithm.
And S004 target determination step, calculating an SOE value for learning, and capturing specific points for verification.
For each collection record of battery data, after characterization, calculation of a target value, i.e., SOE, is required.
The first step is as follows: obtaining battery basic data for calculating SOE of the second step
The basic data, which may also be referred to as factory data, includes: nominal capacity of battery (Cap _ BOL), nominal energy of battery (E)rated) A corresponding relation table with temperature and a corresponding relation table of cycle times, capacity and energy attenuation under the ideal working condition of the battery.
The battery energy can be provided by a battery factory, because general batteries can label a corresponding relation table of the battery energy and the temperature, and also can be provided by the battery factory, if the general batteries cannot be provided, the relation table of the temperature T and the battery energy is learned through data (the SOC is from 20% to 100% during charging); the ideal condition in the corresponding relation table of the cycle number and the energy attenuation under the ideal working condition of the battery is that the battery 1C is discharged, 0.5C is charged (wherein C is the discharge rate of the battery), the battery is discharged to 0% SOC under the environment of 25 ℃, and one charge-discharge calculation cycle is performed once.
The second step is that: capture of SOE at time t during SOC discharge from 100%
Counting throughput from battery factory startWhere Δ t is the sampling time interval, includingSome charging and discharging processes, ItIs current during charging and discharging, during charging ItIs negative, at discharge time ItIs positive. Because the battery is not at 25 ℃, 1C discharging, 0.5C charging and full-charge discharging in the actual operation process, the attenuation coefficient P is obtained by checking the capacity and temperature corresponding relation table in the first step according to the current actual SOC (state of charge), T (temperature) and C (battery discharging rate), so the actual throughput isThe number of charge and discharge cycles of the battery in an ideal state isThen, the N is found out according to the cycle times and the energy attenuation Engergy Fade CurvetCorresponding to total energy E of the batteryrated.
Take one active capture as an example. At the start of discharge the recorded information is time t_0 and SOC_0 (100%); at discharge time t it is time t_1, SOC_1, temperature T_SOC and voltage U_SOC. From these quantities the energy consumed by the battery, E_w, is calculated, and SOE_t is obtained from it.
Through the above steps, the SOE is calculated for each battery data record, and the SOE_t obtained in the second step is then used as the learning target.
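The capture calculation can be illustrated with a minimal Python sketch. Because the source formulas are not reproduced here, it assumes that E_w is the sum of U·I·Δt over the samples between t_0 and t_1 and that SOE_t = (E_total - E_w) / E_total, with E_total the total energy read from the fade curve; the function name is hypothetical.

```python
import numpy as np


def soe_from_capture(voltage_v, current_a, dt_s, total_energy_wh):
    """SOE_t for one capture: discharge from 100% SOC at t_0 to the sample at t_1.

    Assumptions: E_w is integrated as sum(U * I * dt) over the samples between
    t_0 and t_1 (discharge current positive), and
    SOE_t = (E_total - E_w) / E_total, where E_total is the fade-curve energy.
    """
    u = np.asarray(voltage_v, dtype=float)
    i = np.asarray(current_a, dtype=float)
    e_w_wh = float(np.sum(u * i * dt_s)) / 3600.0  # energy consumed, Wh
    return (total_energy_wh - e_w_wh) / total_energy_wh
```

Each (x_t, SOE_t) pair produced this way becomes one labelled training example.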
S005, data calculation step: establish a battery SOE prediction model based on the characterized data.
For the battery SOE prediction problem, this embodiment uses a nonlinear mixed-effects model, a survival model and a random forest model to build the battery SOE prediction model.
Each model determines the mathematical relationships between variables from a set of sample data, performs statistical tests on how reliable those relationships are, and identifies which of the variables affecting a given variable have a significant effect and which do not.
The SOE at time t, SOE_t, is taken as Y, and each data record is time-stamped. Taking the data obtained after steps S001, S002 and S003 as x, a model Y = f(x) is established, where f() is the model learned by the machine from big data. In actual rail-transit operation, the battery SOE is difficult to monitor in real time. Conventional methods estimate SOE only roughly, mainly from empirical formulas; their main drawbacks are that SOE cannot be calculated in real time, accuracy is low, and, because individual cells differ, the SOE of each cell cannot be predicted well. A model built on big data addresses these problems. The model input is the data collected at time t, and the output is the battery SOE at time t, SOE_t. During real-time rail-transit operation, the model can therefore infer SOE_t accurately from the collected data x_t.
The nonlinear mixed-effects model is an extension of the linear mixed-effects model: both its fixed-effect part and its random-effect part can enter the model in nonlinear form. Unlike the linear model with its normality assumption, the nonlinear model places no special requirement on the distribution of the data, which may be normal, binomial or Poisson, and it is also more robust in handling missing data. The model is Y = f(x + Φ) + e, where f() is a nonlinear function and Φ = Aβ + Bb, in which A and B are design matrices, β is the fixed-effect parameter vector, b is the random-effect parameter vector and e is the error vector; β corresponds to the parts of the input data x that are relevant to battery SOE prediction, and b to the parts that are not. The parameters β and b can be estimated by alternating between a pseudo-data step and a linear mixed-effects step, solved with the Gauss-Newton iteration method and the EM algorithm, respectively. Because battery capacity decays continuously and changes dynamically and nonlinearly during daily rail-transit use, the nonlinear function of the model can fit this change well; and because some of the collected battery parameters are correlated with battery capacity while others are not and follow a random distribution, the fixed-effect and random-effect terms describe these two types of parameters well.
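The fixed/random-effect split can be illustrated on a hypothetical capacity-fade curve y = exp(-(β + b_i)·t), with one fixed effect β shared by all batteries and one random effect b_i per battery. This Python sketch is not the two-step pseudo-data/linear-mixed-effects procedure described above: as a simplification, the random effects are treated as penalized parameters (a Gaussian prior with an assumed variance ratio `lam`) and the whole objective is minimized by Gauss-Newton iteration. The model form, starting value and `lam` are assumptions.

```python
import numpy as np


def fit_decay_nlme(t, y, group, n_groups, lam=1.0, beta0=0.1, n_iter=50):
    """Fit y = exp(-(beta + b[group]) * t) + e by penalized Gauss-Newton.

    beta is the fixed effect shared by all batteries, b[i] the random effect of
    battery i. The random effects are shrunk towards zero by the penalty
    lam * sum(b**2), i.e. a Gaussian prior with an assumed variance ratio lam.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group, dtype=int)
    n = len(y)
    rows = np.arange(n)
    theta = np.zeros(n_groups + 1)  # theta[0] = beta, theta[1:] = b
    theta[0] = beta0                # rough starting value (assumption)
    for _ in range(n_iter):
        k = theta[0] + theta[1:][group]  # per-observation decay rate
        pred = np.exp(-k * t)
        # Residuals: data misfit followed by the random-effect penalty rows.
        r = np.concatenate([y - pred, np.sqrt(lam) * theta[1:]])
        J = np.zeros((n + n_groups, n_groups + 1))
        dr_dk = t * pred                 # d(y - exp(-k t)) / dk
        J[:n, 0] = dr_dk
        J[rows, 1 + group] = dr_dk
        J[n:, 1:] = np.sqrt(lam) * np.eye(n_groups)
        # Gauss-Newton step: least-squares solution of J @ step = -r.
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        theta += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return theta[0], theta[1:]
```

With several batteries observed over time, β captures the common fade rate and each b_i the battery-specific deviation from it.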
Survival analysis studies the distribution of survival times and how survival time relates to relevant factors, analyzing and inferring the survival time of organisms, people and so on from data obtained from experiments or surveys. It focuses on predicting response probability, survival probability and mean lifetime. The main methods are the descriptive, nonparametric, parametric and semiparametric methods. The descriptive method computes the survival function, death function, hazard function and so on directly from the sample observations at each time point or interval, and presents the distribution of survival times as tables or plots. The nonparametric method makes no assumption about the distribution of survival time when estimating the survival function, and is used when testing the influence of risk factors on survival time. The parametric method estimates the parameters of an assumed distribution model from the sample observations to obtain a probability distribution model of survival time. The semiparametric method makes no assumption about the distribution of survival time, yet can still analyze, through a model, both the distribution of survival time and the influence of risk factors on it. In the survival model algorithm, $S(t) = \int_{t}^{\infty} f(u)\,du$, where t is the service time of the battery, x is the data collected as a time series, f(·) is the probability density function of the survival-time distribution of the study object, and S(t) is the probability that the survival time of the study object exceeds t. The SOE algorithm model is Y = f(S(t), x), where f() is the survival algorithm model.
Since the battery's SOE discharging from an initial 100% is analogous to a process from birth, through survival, to death, the battery life in terms of the SOE parameter can be predicted well in big-data modeling from the probability density function obtained by modeling.
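The nonparametric route can be illustrated with the Kaplan-Meier estimator of S(t), a standard nonparametric survival estimator chosen here for illustration. In this minimal Python sketch each battery contributes a duration and a flag saying whether the "death" event was observed (otherwise the record is censored); how that event is defined, for example SOE falling below a cut-off, is an assumption, as is the function name.

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t) = P(T > t) at each observed event time.

    durations: time each battery was followed; observed: True if the event was
    seen at that time, False if the record is censored.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    times = np.unique(durations[observed])
    surv = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)             # still "alive" just before t
        deaths = np.sum((durations == t) & observed)
        s *= 1.0 - deaths / at_risk                  # product-limit update
        surv.append(s)
    return times, np.array(surv)
```

The estimate is a step function: S(t) keeps the last value in `surv` whose time is not later than t, and equals 1 before the first event.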
The random forest model is a forest of many decision trees whose classification result is obtained by the trees' vote. Randomness is introduced into tree generation in both the row and the column direction: in the row direction, the training data for each tree are obtained by sampling with replacement (bootstrapping); in the column direction, a feature subset is obtained by random sampling without replacement, and the optimal split point is found within it. The forest is an ensemble model that is still built on decision trees internally, and because it classifies by the vote of many trees, the algorithm is less prone to overfitting.
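The row and column randomization above can be sketched from scratch in Python. Since SOE is a continuous target, this sketch is written for regression, so the trees' outputs are averaged rather than voted. Tree depth, minimum leaf size, the number of trees and the per-split feature-subset size (a third of the features by default) are assumed values, and the function names are hypothetical.

```python
import numpy as np


def grow_tree(X, y, rng, depth, min_leaf, n_feats):
    """Grow a regression tree; a node is (feature, threshold, left, right) or a leaf mean."""
    if depth == 0 or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    # Column direction: random feature subset drawn without replacement.
    feats = rng.choice(X.shape[1], size=n_feats, replace=False)
    best = None  # (sum of squared errors, feature, threshold)
    for j in feats:
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:
        return float(y.mean())
    _, j, thr = best
    left = X[:, j] <= thr
    return (j, thr,
            grow_tree(X[left], y[left], rng, depth - 1, min_leaf, n_feats),
            grow_tree(X[~left], y[~left], rng, depth - 1, min_leaf, n_feats))


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(X, y, n_trees=20, depth=6, min_leaf=3, n_feats=None, seed=0):
    rng = np.random.default_rng(seed)
    n_feats = n_feats or max(1, X.shape[1] // 3)
    forest = []
    for _ in range(n_trees):
        # Row direction: bootstrap sample drawn with replacement.
        idx = rng.integers(0, len(y), size=len(y))
        forest.append(grow_tree(X[idx], y[idx], rng, depth, min_leaf, n_feats))
    return forest


def predict_forest(forest, X):
    # Regression counterpart of voting: average the trees' outputs.
    return np.array([np.mean([predict_tree(tree, x) for tree in forest]) for x in X])
```

Averaging many decorrelated trees is what limits overfitting here; a single deep tree trained on the same data would follow the noise much more closely.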
In this embodiment, the nonlinear mixed-effects model, the survival model and the random forest are run in parallel; the most suitable model is selected according to the evaluation in the final step, S007, and this selection is adjusted dynamically.
S006, training and verification step: train and verify the model to optimize the adaptive model.
Once the model has been established, it needs to be trained and verified in order to optimize it and improve its accuracy.
In this embodiment, the training and verification step preferably includes cross-validation and minority-class sampling.
Cross-validation is used to optimize the parameter framework of every model. The reliability of the algorithm depends on the parameter framework, that is, on which battery data are most effective in producing the results.
In this embodiment, to improve the quality of the parameter framework, the original data are first randomly divided into K parts. One part is selected as test data and the remaining K-1 parts are used as training data to obtain an experimental result. Another part is then selected as test data, with the remaining K-1 parts as training data, and so on, so that the cross test is repeated K times. Each experiment uses a different one of the K parts as test data, which ensures that every part serves as test data once while the remaining K-1 parts serve as training data. Finally, the K experimental results are averaged. The experimental result can be the difference between the predicted value and the check value, so the smaller the difference, the better; the best result is identified in this way, completing the training of the model. In this application, the collected rail-transit data can be randomly divided into K parts: K-1 parts are first used to build an SOE prediction model, the newly built model is then used to check whether the remaining part fits it, and so on.
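The K-fold procedure can be sketched in Python as below. The `fit`/`predict` callables are hypothetical stand-ins for whichever of the three models is being tuned, and the mean absolute difference between prediction and check value is assumed as the per-fold result.

```python
import numpy as np


def k_fold_error(X, y, fit, predict, k=5, seed=0):
    """Mean absolute prediction error averaged over K folds.

    fit(X, y) -> model and predict(model, X) -> predictions are supplied by the
    caller; every sample is used as test data exactly once.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random split into K parts
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                  # train on K-1 parts
        diff = predict(model, X[test]) - y[test]          # check on the held-out part
        errors.append(float(np.mean(np.abs(diff))))
    return float(np.mean(errors)), errors
```

For instance, `fit` could be a least-squares solve with `np.linalg.lstsq` and `predict` a matrix product; comparing the returned mean error across candidate parameter settings picks the best one.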
Minority-class sampling is used when the data set is imbalanced, that is, when one class of data has only a small number of training samples. In that case, this embodiment trains the model on new minority-class samples synthesized from the few existing failure samples. For example, when only a small number of samples are collected during battery data collection, data synthesis is needed to generate more data for machine learning from the small amount available. Specifically, for each minority-class sample A, a sample B is randomly selected from its nearest neighbours, with distance measured in the time-and-variable space, and a point is then chosen at random on the line segment between A and B as a newly synthesized minority-class sample. By repeating this synthesis, the small sample set A is expanded into a larger set A+, meeting the data requirements of battery SOE prediction without the overfitting or distortion that data imbalance would otherwise cause.
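The synthesis rule above is essentially the SMOTE oversampling scheme. A minimal numpy sketch follows; it assumes that the time and variable columns have already been scaled so that Euclidean distance is meaningful, and the neighbour count `k` and the number of synthetic points per sample are illustrative choices.

```python
import numpy as np


def smote(minority, per_sample=2, k=5, seed=0):
    """Return A+: the minority samples A followed by SMOTE-style synthetic samples.

    minority: (n, d) array with n >= 2, whose columns (time and other variables)
    are assumed to be scaled so that Euclidean distance is meaningful.
    per_sample: synthetic points generated from each sample (>= 1).
    """
    a = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(a) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for i in range(len(a)):
        for _ in range(per_sample):
            j = neighbours[i, rng.integers(k)]  # random near neighbour B of A
            gap = rng.random()                  # random position on segment A-B
            synthetic.append(a[i] + gap * (a[j] - a[i]))
    return np.vstack([a, np.array(synthetic)])
```

Because every new point is an interpolation between two real minority samples, the synthetic data stay inside the region already occupied by the minority class.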
S007, algorithm evaluation step: evaluate the prediction results of the data under the different algorithms and select the optimal algorithm based on the evaluation.
In battery SOE prediction, different algorithms give different results depending on the prediction target or the data source, so a suitable algorithm must be selected for each situation.
In general, the prediction results can be evaluated by the difference between the predicted value and the check value from S004, comparing the results obtained with the different algorithms under different conditions in order to select the optimal one.
The difference measures how far the model's predicted battery SOE is from the check value; in general, the lower the difference, the better.
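The selection rule can be sketched as follows, assuming the mean absolute difference between predicted SOE and the S004 check values as the "difference"; the dictionary keys are illustrative names for the three parallel models.

```python
import numpy as np


def select_algorithm(predictions, check_values):
    """Return the algorithm with the smallest mean |predicted - check|, plus all scores.

    predictions: dict mapping an algorithm name to its predicted SOE values.
    """
    check = np.asarray(check_values, dtype=float)
    scores = {name: float(np.mean(np.abs(np.asarray(pred, dtype=float) - check)))
              for name, pred in predictions.items()}
    return min(scores, key=scores.get), scores
```

Re-running this on each new batch of check values is one simple way to make the choice among the parallel models dynamic.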
The foregoing embodiments and description have been presented only to illustrate the principles and preferred embodiments of the invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention as hereinafter claimed.

Claims (4)

1. A method for predicting the SOE of a rail-traffic lithium battery through big data, characterized by comprising the following steps:
S001, data preparation step: acquiring data related to the use of the rail-transit battery;
S002, data arrangement step: cleaning the data related to the use of the rail-transit battery and structuring the cleaned data by time unit;
the data cleaning method comprises: building a database of the acquired data; collecting the abnormal data to obtain an original database; identifying the attributes to be cleaned in the data set from the distribution of noise in the abnormal data set; searching for a tensor whose dimension can be expanded and performing a high-order dimension expansion on the attribute tensor to obtain a high-order tensor attribute set; using the expanded attribute tensor to clean the abnormal data attributes and restore the data; and writing the cleaned data to a new database to obtain a target database;
S003, data characterization step: summarizing and extracting the data obtained in the data arrangement step to obtain characterized data;
S004, target determination step: calculating SOE values for learning and capturing specific points for verification;
S005, data calculation step: building a battery SOE prediction model based on the characterized data;
S006, training and verification step: training and verifying the model to optimize the adaptive model;
and S007, algorithm evaluation step: evaluating the prediction results of the data under different algorithms and selecting the optimal algorithm based on the evaluation.
2. The method for predicting the SOE of a rail-traffic lithium battery through big data according to claim 1, wherein the rail-transit battery data in S001 are the battery usage data contained in the rail-transit Internet-of-Vehicles data; the battery usage data comprise the battery's own data and the rail-traffic state data related to the battery in normal use; the battery usage data are time-series streaming data; and the SOE_t at time t is calculated according to an empirical formula.
3. The method for predicting the SOE of a rail-traffic lithium battery through big data according to claim 2, wherein the summarizing and extracting of data in S003 comprises rolling aggregation, in which a time window is set and an aggregate value of a predetermined variable is calculated within the window, the aggregate value being the sum, mean or standard deviation of the data; the summarizing and extracting further comprise expanding the feature variables, which includes adding a corresponding number of initial feature variables based on the rolling-aggregation mean and a corresponding number based on the rolling-aggregation standard deviation.
4. The method for predicting the SOE of a rail-traffic lithium battery through big data according to claim 3, wherein S005 further comprises minority-class sampling to train the model: when one class of data in the samples has only a small number of training samples, the model is trained on new minority-class samples synthesized from the existing ones; for each minority-class sample A, a sample B is randomly selected from its nearest neighbours, with distance measured in the time-and-variable space, and a point is randomly selected on the line segment between A and B as a newly synthesized minority-class sample; by continuous synthesis, the small sample set A becomes a larger sample set A+.
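The rolling aggregation and feature expansion of claim 3 can be sketched as follows. This minimal Python sketch assumes a trailing window counted in samples and the population standard deviation; the window length and the exact set of expanded columns (original value, rolling mean, rolling standard deviation) are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_features(x, window):
    """Rolling sum, mean and standard deviation over a trailing window.

    Each output row is aligned with the last sample of its window, so there are
    len(x) - window + 1 rows.
    """
    w = sliding_window_view(np.asarray(x, dtype=float), window)
    return {"sum": w.sum(axis=1), "mean": w.mean(axis=1), "std": w.std(axis=1)}


def expand_features(x, window):
    """Original variable plus its rolling mean and rolling standard deviation."""
    f = rolling_features(x, window)
    aligned = np.asarray(x, dtype=float)[window - 1:]
    return np.column_stack([aligned, f["mean"], f["std"]])
```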
CN201910901034.0A 2019-09-23 2019-09-23 Method for predicting SOE of rail-traffic lithium battery through big data Pending CN110596594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910901034.0A CN110596594A (en) 2019-09-23 2019-09-23 Method for predicting SOE of rail-traffic lithium battery through big data

Publications (1)

Publication Number Publication Date
CN110596594A (en) 2019-12-20

Family

ID=68862426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910901034.0A Pending CN110596594A (en) 2019-09-23 2019-09-23 Method for predicting SOE of rail-traffic lithium battery through big data

Country Status (1)

Country Link
CN (1) CN110596594A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951662A (en) * 2015-07-16 2015-09-30 中国科学院广州能源研究所 Method for estimating SOE (State of Energy) of lithium iron phosphate battery
CN105468658A (en) * 2014-09-26 2016-04-06 中国移动通信集团湖北有限公司 Data cleaning method and apparatus
CN106168799A (en) * 2016-06-30 2016-11-30 常伟 A kind of method carrying out batteries of electric automobile predictive maintenance based on big data machine learning
CN107122594A (en) * 2017-04-10 2017-09-01 湖南中车时代电动汽车股份有限公司 A kind of health forecast method and system of new energy vehicle battery
CN107677965A (en) * 2017-09-21 2018-02-09 中国科学院广州能源研究所 A kind of lithium battery energy state evaluation method
CN109839602A (en) * 2019-02-02 2019-06-04 爱驰汽车(上海)有限公司 Power battery performance and Valuation Method, device, electronic equipment
CN109934294A (en) * 2019-03-18 2019-06-25 常伟 A method of batteries of electric automobile SOH prediction is carried out based on big data machine learning
CN109932663A (en) * 2019-03-07 2019-06-25 清华四川能源互联网研究院 Cell health state appraisal procedure, device, storage medium and electronic device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111965562A (en) * 2020-10-20 2020-11-20 江苏慧智能源工程技术创新研究院有限公司 Method for predicting residual cycle life of lithium battery based on random forest model
CN111965562B (en) * 2020-10-20 2020-12-29 江苏慧智能源工程技术创新研究院有限公司 Method for predicting residual cycle life of lithium battery based on random forest model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination