Specific Embodiments
Specific embodiments of this patent are described in detail below with reference to the accompanying drawings. It should be noted that these specific embodiments merely illustrate preferred technical solutions of the present invention and shall not be construed as limiting its scope of protection.
Fig. 1 shows the steps of a predictive maintenance method for electric vehicle batteries according to one specific embodiment of this patent, wherein:
Step S001, data preparation step: obtain data related to the use of the electric vehicle battery.
In this step, the data of the electric vehicle battery include fault and maintenance data and battery usage data. The fault and maintenance data include the data records preceding a battery failure and/or the maintenance data of the battery. The battery usage data include the data of the battery itself and the battery-related vehicle condition data during normal use.
Both the fault and maintenance data and the battery usage data are time-series-based streaming data, including but not limited to voltage, current, remaining capacity (SOC), and so on. An illustrative, non-exhaustive example of the data content is shown in the table.
Step S002, data organization step: clean the data related to the use of the electric vehicle battery, and construct the cleaned data on the basis of a time unit.
Because the present embodiment is implemented mainly through data processing, high-quality data help improve the accuracy of the results, so the acquired data must be organized (step S002). Data organization begins with data cleaning: the present invention defines cleaning rules that convert low-quality data into data meeting the data-quality requirements.
The cleaning rules include:
Missing-value assignment: errors easily occur while battery data are being transmitted, causing variables to be missing. In the present invention, a missing variable is mainly assigned the mean or the median of that variable over one trip, or a value obtained by neighbor interpolation.
Erroneous-value removal: a reasonable value range, i.e., a threshold, is set for each variable of the battery usage data; the data are checked against it, and data outside the normal range are deleted or corrected.
Cross-checking: mutual constraints and dependencies are set among the battery usage data, and data that are logically unreasonable or mutually contradictory are deleted or corrected.
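The three cleaning rules can be illustrated with a minimal Python sketch. It is not the patented implementation: the variable names, the ranges in VALID_RANGE, and the cross-check rule (a charging vehicle must be stationary) are hypothetical and chosen only for illustration, and records failing a check are deleted rather than corrected.

```python
from statistics import mean, median
from typing import Optional

# Hypothetical plausible ranges per variable (illustrative values, not from the source).
VALID_RANGE = {"voltage": (0.0, 500.0), "soc": (0.0, 100.0)}


def fill_missing(values: list[Optional[float]], method: str = "mean") -> list[float]:
    """Missing-value assignment for one variable over one trip (None marks a gap)."""
    known_idx = [i for i, v in enumerate(values) if v is not None]
    if not known_idx:
        raise ValueError("no observed values to fill from")
    known = [values[i] for i in known_idx]
    if method == "mean":
        fill = mean(known)
        return [fill if v is None else v for v in values]
    if method == "median":
        fill = median(known)
        return [fill if v is None else v for v in values]
    if method == "neighbor":
        # Copy the closest observed sample; ties go to the earlier one.
        return [values[min(known_idx, key=lambda j: abs(j - i))]
                for i in range(len(values))]
    raise ValueError(f"unknown method: {method}")


def in_range(record: dict) -> bool:
    """Erroneous-value check: each listed variable must lie inside its range."""
    return all(lo <= record[name] <= hi
               for name, (lo, hi) in VALID_RANGE.items() if name in record)


def consistent(record: dict) -> bool:
    """Cross-check with a hypothetical rule: a charging vehicle must be stationary."""
    return not (record.get("charging", False) and record.get("speed", 0.0) > 0.0)


def clean(records: list[dict]) -> list[dict]:
    """Delete records failing either check (correcting them is the alternative)."""
    return [r for r in records if in_range(r) and consistent(r)]
```

For example, fill_missing([1.0, None, 3.0], "mean") fills the gap with the trip mean 2.0.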
After the data are cleaned, data construction is performed on the basis of a time unit; that is, the various kinds of collected data are integrated in chronological order. The time unit may be the millisecond, second, minute, and so on, and need not coincide with the collection frequency.
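As a rough sketch of time-unit-based construction, the Python function below merges samples from different sources into one record per time unit. The input layout (timestamp in milliseconds, variable name, value) and the rule that a later sample in the same unit overwrites an earlier one are assumptions made for this example.

```python
from collections import defaultdict


def build_by_time_unit(samples, unit_ms=1000):
    """Integrate (timestamp_ms, variable, value) samples into one record per time unit.

    Assumption: if a variable is sampled more than once within a unit,
    the latest sample wins.
    """
    buckets = defaultdict(dict)
    for ts, name, value in sorted(samples, key=lambda s: s[0]):
        start = ts // unit_ms * unit_ms  # start of the time unit containing ts
        buckets[start][name] = value
    return [{"t": start, **values} for start, values in sorted(buckets.items())]
```

With unit_ms=1000 the records are aligned to whole seconds even when the sources report at different frequencies.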
After data construction is complete, the data constructed on the basis of the time unit need to be evaluated and corrected. The evaluation includes screening out erroneous data, i.e., data that are themselves in error, including but not limited to missing values, outliers, time-period errors, and calculation-basis errors. After the evaluation, the erroneous data are corrected. For missing values, any null value is set to 0 to fill in the missing data; for outliers, negative values are set to 0 to avoid errors during training; for values with a time-period error, the correct time period should be determined and the data adjusted and rerun; for values with a calculation-basis error, the calculation basis is clarified and adjusted and the data rerun.
Step S003, data characterization step: summarize and extract from the data obtained in step S002 to obtain characterized data.
Because the data must be processed and computed in the subsequent steps, the processed data first need to be characterized so that their various features are exposed, which makes them easier to compute with and to identify.
In this step, the summarizing and extracting of data include rolling aggregation. Rolling aggregation means setting a time window and computing, for a predetermined variable, an aggregate value within that window; the aggregate value may be the sum, the mean, or the standard deviation of the data. As shown in Fig. 4, taking node t1 with a time window of 3 as an example, its rolling aggregate is the sum, mean, or standard deviation computed over the 3-node window associated with node t1.
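A minimal sketch of rolling aggregation in Python follows. It assumes a trailing window, i.e., the window of a node consists of that node and the nodes immediately before it, with shorter windows at the start of the series; the source does not state how the window is positioned. The population standard deviation is used so that a single-point window yields 0.

```python
from statistics import mean, pstdev


def rolling(values, window, agg):
    """Apply agg to a trailing window ending at each node (shorter at the start)."""
    return [agg(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


series = [1.0, 2.0, 3.0, 4.0]
rolling_sum = rolling(series, 3, sum)      # [1.0, 3.0, 6.0, 9.0]
rolling_mean = rolling(series, 3, mean)    # [1.0, 1.5, 2.0, 3.0]
rolling_std = rolling(series, 3, pstdev)   # population standard deviation per window
```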
In this step, more variables are needed in order to give the learning algorithm better, or even additional, learning and predictive capability. The invention summarizes and extracts from the time-series-based battery data, thereby extending the initial feature variables of step S001. For example, when step S001 has 65 feature variables, the extended data in this example fall into two main classes: the first class adds 65-2=63 variables from the rolling-aggregated mean of the initial 65 feature variables; the second class adds 65-2=63 variables from the rolling-aggregated standard deviation of the initial 65 feature variables. The number of variables finally obtained is thus 65+63+63=191. This provides more variable data and helps the learning algorithm deliver better predictive capability.
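The expansion from 65 to 191 variables can be sketched as follows. The source does not say which two variables are left out of the rolling aggregation, so the excluded set is passed in as a parameter; the column names and the trailing window of 3 are likewise assumptions for illustration.

```python
from statistics import mean, pstdev


def expand_features(columns: dict[str, list[float]],
                    excluded: set[str],
                    window: int = 3) -> dict[str, list[float]]:
    """Add a rolling mean and a rolling standard deviation for every non-excluded column."""
    out = dict(columns)
    for name, series in columns.items():
        if name in excluded:
            continue
        windows = [series[max(0, i - window + 1): i + 1] for i in range(len(series))]
        out[f"{name}_roll_mean"] = [mean(w) for w in windows]
        out[f"{name}_roll_std"] = [pstdev(w) for w in windows]
    return out


# 65 hypothetical initial variables, two of which are not aggregated.
initial = {f"var{i}": [1.0, 2.0, 3.0] for i in range(65)}
expanded = expand_features(initial, excluded={"var0", "var1"})
```

Here len(expanded) is 65 + 63 + 63 = 191, matching the count given in the text.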
Step S004, data calculation step: establish an adaptive battery predictive maintenance model based on the characterized data.
The battery predictive maintenance problem can be decomposed into two sub-problems: the first is whether the battery is about to fail; the second is how long it will be before the battery fails. Different models and algorithms can be used to make predictions for the different problems.
For the question of whether the battery is about to fail, the present embodiment uses a binary classification model to establish the adaptive battery predictive maintenance model.
Specifically, let the input battery data be x, and let the target, i.e., whether the battery is about to fail, be y. Then y takes only two values: y=1 means a failure will occur, and y=0 means no failure will occur.
The binary classification model is then y=f(x), where f is the specific algorithm that maps the battery data x to the target y.
When the above model is trained with initial training data, the initial training data set must be labelled: data in which a failure occurs are taken as positive (label 1) and data from normal operation as negative (label 0). This establishes a model y=f(x) of whether the next cycle is likely to be faulty or normal, where y is whether the battery is about to fail, x is the battery data, and f is the specific algorithm.
The specific algorithm f may be selected from: logistic regression, boosted decision tree, decision forest, and neural network.
The logistic regression algorithm assumes that the instances of the classes are linearly separable and obtains the final prediction model by directly estimating the parameters of the discriminant. For the electric vehicle predictive maintenance data, consider a vector of p independent variables $x' = (x_1, x_2, \ldots, x_p)$, and let the conditional probability $P(Y=1 \mid x) = p$ be the probability that the event occurs given the observations. Like linear regression, logistic regression requires a hypothesis function; this algorithm introduces the Sigmoid function $\pi(x) = \frac{1}{1+e^{-x}}$, whose domain is $(-\infty, +\infty)$ and whose range is $(0, 1)$. According to the above definitions, the formula used by the logistic regression algorithm is:
$$P(Y=1 \mid x) = \pi(g(x)) = \frac{1}{1+e^{-g(x)}}, \qquad g(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
where $\beta_0, \beta_1, \ldots, \beta_p$ are the parameters to be estimated.
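The following Python sketch shows one way a logistic regression model of this kind could be fitted and applied. It assumes the standard form in which the Sigmoid function is applied to a linear combination of the inputs; the names weights and bias, the plain batch gradient descent on the log loss, and the toy data are assumptions for illustration, not details given in the source.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid function: maps (-inf, +inf) to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Estimated P(Y=1 | x) for one feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit(X, y, lr=0.5, epochs=2000):
    """Estimate the parameters by batch gradient descent on the log loss."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data: one feature, label 1 = failure, label 0 = normal.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = fit(X, y)
will_fail = predict_proba([3.0], weights, bias) > 0.5
```

A probability above 0.5 is read here as y=1 (failure); the 0.5 cut-off is a conventional choice, not one stated in the source.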
The boosted decision tree algorithm uses the hierarchical data structure of the decision tree, which follows a divide-and-conquer strategy, to generate classification rules from an initial classification. In each round it raises the weight of the data misclassified in the previous round and classifies again, iterating in this way until the target result is obtained.
Let D be the partition of the training tuples by class; the entropy of D is then expressed as:
$$info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
where $p_i$ is the probability that the i-th class occurs in the whole set of training tuples, which can be estimated as the number of elements belonging to that class divided by the total number of elements in the training tuples. In practical terms, the entropy is the average amount of information needed to identify the class label of a tuple in D. For this prediction method, D is the battery fault condition, which has two states, faulty and normal, so m=2.
Suppose the training tuples D are partitioned by attribute A, where A is one feature of the characterized battery data. The expected information of the partition of D by A is then:
$$info_A(D) = \sum_{j=1}^{V} \frac{|D_j|}{|D|}\, info(D_j)$$
where j denotes a particular value of attribute A, V is the total number of distinct values of A, and $D_j$ is the subset of D taking the j-th value. The information gain of attribute A is the difference between the two: gain(A) = info(D) - info_A(D). At every split, the information gain of each attribute in the battery training tuples is computed, and the attribute with the largest gain is selected for the split; in this way a decision tree capable of electric vehicle predictive maintenance is formed.
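The entropy, the expected information after a split, and the information gain can be computed with a few lines of Python. This is a sketch assuming the usual base-2 entropy and a categorical attribute; the function names mirror the notation info(D), info_A(D), and gain(A), and the example attribute values are made up.

```python
import math
from collections import Counter


def info(labels):
    """Entropy of the class labels: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_after_split(attr_values, labels):
    """Expected information after partitioning by attribute A: sum(|D_j|/|D| * info(D_j))."""
    groups = {}
    for a, label in zip(attr_values, labels):
        groups.setdefault(a, []).append(label)
    n = len(labels)
    return sum(len(g) / n * info(g) for g in groups.values())


def gain(attr_values, labels):
    """Information gain of attribute A: info(D) - info_A(D)."""
    return info(labels) - info_after_split(attr_values, labels)


labels = [1, 1, 0, 0]                             # m = 2: faulty (1) and normal (0)
separating = ["high", "high", "low", "low"]       # attribute that splits the classes cleanly
uninformative = ["a", "b", "a", "b"]              # attribute unrelated to the class
best = max([separating, uninformative], key=lambda a: gain(a, labels))
```

With two equally frequent classes the entropy is 1 bit, so the cleanly separating attribute has gain 1 and the unrelated one has gain 0.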
A decision forest is a forest composed of multiple decision trees; the classification result of the algorithm is obtained by voting among these trees. While the trees are generated, randomness is introduced in both the row direction and the column direction: in the row direction, the training data for each tree are obtained by sampling with replacement (bootstrapping); in the column direction, a feature subset is obtained by random sampling without replacement, and the optimal split point is determined accordingly. The decision forest is an ensemble model whose internals are still decision trees. Unlike classification with a single decision tree, the decision forest classifies according to the votes of multiple decision trees, so the algorithm is less prone to overfitting.
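The row sampling, column sampling, and voting of a decision forest can be sketched as below. To keep the example short, each tree is reduced to a one-level tree (a decision stump) chosen by fewest training errors instead of information gain; that simplification, the parameter names, and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """One-level tree: best (feature, threshold, left_label, right_label) by training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # row direction: with replacement
        feats = rng.sample(range(p), n_features)      # column direction: without replacement
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data: two features, class 0 = normal, class 1 = faulty.
X = [[i, 2 * i] for i in range(10)] + [[i, 2 * i] for i in range(20, 30)]
y = [0] * 10 + [1] * 10
forest = fit_forest(X, y)
```

Because every tree sees a different bootstrap sample and feature subset, individual trees differ, and the vote smooths out their individual errors.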
A neural network uses its algorithmic characteristics to simulate the second mode of human thinking. It is a nonlinear dynamical system; although the structure of a single neuron is extremely simple, the network is capable of parallel, cooperative processing. In a neural network, the output layer corresponds to different cost functions in different scenarios. In this method the output layer consists of K logistic regressions, and the cost function of the whole network is the sum of the cost functions of these K logistic regression models. Electric vehicle battery failure can be predicted by means of this cost function, and the cost function is evaluated according to the algorithm evaluation of step S006.
For the question of how long it will be before the battery fails, the present embodiment uses a regression model to establish the adaptive battery predictive maintenance model.
A regression model determines the relationships between variables from a set of sample data, applies various statistical tests to the credibility of these relationships, and identifies, among the many variables affecting a particular variable, which have a significant influence and which do not.
Taking the time at which the failure occurs as the reference, each battery data record is labelled with Y, the time remaining until the failure. For example, when the battery has been in use for 5 days and the failure time is 300, the remaining time represented by the label is 300-5=295; likewise, when the battery has been in use for 10 days and the failure time is 280, the remaining time represented by the label is 280-10=270. In this way every sample has a remaining usable time. The specific labels are shown in the table:
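The labelling arithmetic can be written as a small Python helper. It is only a sketch: the function name and the assumption that usage time and failure time are expressed in the same unit (days) are not stated in the source.

```python
def remaining_time_labels(days_in_use: list[int], failure_day: int) -> list[int]:
    """Label Y for each record: time remaining until the failure (same unit as the inputs)."""
    return [failure_day - d for d in days_in_use]


# The two examples from the text.
label_a = remaining_time_labels([5], failure_day=300)[0]    # 300 - 5
label_b = remaining_time_labels([10], failure_day=280)[0]   # 280 - 10
```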
Let the input battery data be x; the model of the regression algorithm is Y=f(x). The specific algorithm f used by the regression model may be decision forest regression, boosted decision tree regression, Poisson regression, or neural network regression.
Boosted decision tree regression and decision forest regression are each composed of one or several decision trees, i.e., they are combinations of decision trees. As with the decision-tree-related algorithms used for the question of whether the battery is about to fail, the regression model for how long it will be before the battery fails also uses the information gain to judge the quality of boosted decision tree regression and decision forest regression, through the difference:
gain(A) = info(D) - info_A(D).
For Poisson regression, modeling uses the Poisson regression model that is widely documented in the prior art.
The neural network is an algorithm, widely documented in the prior art, that simulates human thinking. In a neural network, the output layer corresponds to different cost functions in different scenarios. In this method the output layer may consist of K logistic regressions, and the cost function of the whole network is the sum of the cost functions of these K logistic regression models.
Step S005, training and validation step: train and validate the adaptive model in order to optimize it.
Once the above models have been established, training and validation are needed to optimize the models and thereby improve their accuracy.
In this embodiment, the training and validation step preferably includes cross-validation and minority-class sampling.
The cross-validation optimizes the parameter framework of each model. For the classification models mentioned above (logistic regression, boosted decision tree, decision forest, and neural network) and the regression models (decision forest regression, boosted decision tree regression, Poisson regression, and neural network regression), the reliability of the algorithms depends on the parameter framework, that is, on which battery data are most effective for producing the result.
In this embodiment, in order to improve the quality of the parameter framework, the original data are first randomly divided into K parts. One of the K parts is selected as test data and the remaining K-1 parts are used as training data, yielding the corresponding experimental result. Then another part is selected as test data with the remaining K-1 parts as training data, and so on, until the cross-check has been repeated K times. Each experiment selects a different part as test data, so that each of the K parts serves as test data exactly once while the remaining K-1 parts serve as training data. Finally, the K experimental results are averaged; the experimental results may include precision, recall, a comprehensive evaluation index, and so on. Depending on the purpose of each predictive maintenance task, a choice is made among the mean values of precision, recall, and the comprehensive evaluation index, thereby determining the optimal classification and completing the training of the model.
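The K-fold procedure can be sketched in Python as follows. The helper names and the evaluate callback, which stands in for training a model on the training indices and scoring it on the test indices, are hypothetical; only the splitting and averaging logic is shown.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split range(n) into k parts; yield (train, test) once per part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validate(n, k, evaluate, seed=0):
    """Average the k experimental results returned by evaluate(train, test)."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / k


splits = list(k_fold_indices(10, 5))
```

In practice evaluate would return a metric such as precision, recall, or the F-measure for the model trained on that fold.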
Minority-class sampling is used when one class of data has only a small number of training samples, i.e., when the data set is imbalanced. In the present embodiment, when a class of data has only a few training samples, new minority-class samples can be synthesized from the few fault samples and used to train the model. For example, in the battery data collection only a small number of fault records may be found; in order to generate more data for machine learning from this small amount of fault data, data synthesis is needed. Specifically, for each minority-class sample A, a sample B is selected at random from its nearest neighbors, where the distance is computed from the distance in time and in the variables; a point is then selected at random on the line segment between A and B as a newly synthesized minority-class sample. By repeating this synthesis, the small sample set A becomes a larger sample set A+ that meets the data requirements of predictive maintenance, so that the calculation does not suffer from overfitting or distortion caused by data imbalance.
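The synthesis step resembles the well-known SMOTE technique and can be sketched as below. The Euclidean distance over the feature vector stands in for the distance over time and variables mentioned in the text, and the parameters k (number of nearest neighbors) and seed are assumptions for this example.

```python
import math
import random


def synthesize(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # The k nearest neighbors of a among the other minority samples.
        others = [p for p in minority if p is not a]
        neighbors = sorted(others, key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbors)
        u = rng.random()  # position on the segment from a to b
        synthetic.append([ai + u * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


fault_samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_samples = synthesize(fault_samples, n_new=5)
```

Each synthetic point lies between two existing fault samples, so the enlarged set stays inside the region already occupied by the minority class.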
Step S006, algorithm evaluation step: evaluate the prediction results obtained from the data under different algorithms, and select the optimal algorithm based on the evaluation.
In battery predictive maintenance, different prediction targets or different data sources lead to different results under different algorithms, so the better algorithm has to be selected for each situation.
In electric vehicle battery predictive maintenance, precision (Precision), recall (Recall), or a comprehensive evaluation index (F1-Measure) can usually be used to assess the prediction results, compare whether the results obtained with different algorithms in different situations are optimal, and thereby select the optimal algorithm.
Here, precision is the proportion of the samples predicted by the model to fail that actually do fail; usually the higher the better. Recall is the proportion of the samples that actually fail that are correctly predicted; usually the higher the better.
In battery predictive maintenance, the two usually conflict. To make the selection of the better algorithm more sound, this embodiment preferably uses the F-Measure comprehensive evaluation index, which is a weighted harmonic mean of precision and recall; the higher its value the better. The formula is $F = \frac{(\alpha^2 + 1) P R}{\alpha^2 P + R}$, where P is the precision and R is the recall. When the parameter $\alpha = 1$, this is the most common F1, namely $F_1 = \frac{2 P R}{P + R}$. The F or F1 values obtained with different algorithms are used to judge the superiority of the algorithms in different settings. For example, for a particular set of data and a particular prediction target, comparison after calculation may show that, for such data and target, the boosted decision tree algorithm gives the best result among the classification models and the neural network regression algorithm gives the best result among the regression models.
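Precision, recall, and the F-measure can be computed from true and predicted labels as in the Python sketch below. It assumes label 1 marks a failure and uses the weighted form F = (alpha^2 + 1)PR / (alpha^2 P + R), which reduces to F1 = 2PR / (P + R) when alpha = 1; the function name and the example labels are made up for illustration.

```python
def precision_recall_f(y_true, y_pred, alpha=1.0):
    """Return (precision, recall, F) with label 1 as the positive (failure) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = alpha ** 2 * precision + recall
    f = (alpha ** 2 + 1) * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Hypothetical outputs of two algorithms on the same five samples.
y_true = [1, 1, 1, 0, 0]
pred_a = [1, 1, 0, 1, 0]
pred_b = [1, 1, 1, 0, 0]
scores = {name: precision_recall_f(y_true, pred)[2]
          for name, pred in {"algorithm_a": pred_a, "algorithm_b": pred_b}.items()}
best_algorithm = max(scores, key=scores.get)
```

Selecting the algorithm with the highest F value, as in the last line, is one simple way to apply the comparison described above.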