CN110456799A

CN110456799A - Online incremental learning method for an autonomous vehicle control model

Info

Publication number: CN110456799A
Application number: CN201910777721.6A
Authority: CN
Inventors: 张卫忠
Original assignee: Hefei Yunjia Intelligent Technology Co Ltd
Current assignee: Hefei Yunjia Intelligent Technology Co Ltd
Priority date: 2019-08-22
Filing date: 2019-08-22
Publication date: 2019-11-15

Abstract

The invention discloses an online incremental learning method for an autonomous vehicle control model. A cloud online learning system uses the data uploaded by the vehicle end to perform online learning on the existing control model and obtain an updated model, which is then verified on a validation data set. The accuracy of the updated model on the validation set is compared with that of the existing model, and if the accuracy error is less than a predefined threshold, the updated model is sent down to the vehicle end. The vehicle-end data acquisition system compares, in real time, the output of the current model with the driving actions of the human driver; if the difference exceeds a predefined threshold, the sensor data and driving action sequence of fixed duration stored in the data cache are uploaded to the cloud, triggering online learning of the current control model. The invention can use real-time data from human driving to incrementally update a machine-learning-based control model and so promote continuous optimization of the control model; it is a novel and practical approach.

Description

Online incremental learning method for an autonomous vehicle control model

Technical field

The invention relates to the technical field of autonomous vehicles and their related applications, and in particular to training-data acquisition and an online incremental learning algorithm for a machine-learning-based autonomous vehicle control model.

Background technique

Autonomous vehicles have great application prospects in fields such as the national economy and national defense. Conventional unmanned vehicles use a layered perception, planning, and control architecture in which the key technologies are tightly coupled and technically complex, which has created major obstacles to practical deployment.

In recent years, rapid progress in artificial intelligence technologies such as deep learning has brought rare development opportunities for autonomous vehicles. However, current deep learning technology needs large amounts of labeled data for supervised learning, and acquiring such labeled data is extremely costly. In addition, most current deep learning models are trained once and then deployed, which makes it hard for them to adapt to continually changing driving scenes and environments.

Summary of the invention

The object of the invention is to remedy the shortcomings of the prior art by providing an online incremental learning method for an autonomous vehicle control model. The method addresses the high cost of labeling the large amounts of data required by conventional deep learning, and the difficulty a learned control model has in adapting to changes in environment and scene.

The present invention is achieved by the following technical solutions:

An online incremental learning method for an autonomous vehicle control model involves an autonomous vehicle control model, a cloud online learning system, and a vehicle-end data acquisition system. The autonomous vehicle control model is an end-to-end model based on deep learning. The cloud online learning system uses the training data collected by the vehicle-end data acquisition system to update the current control model with an online learning algorithm, then verifies the updated control model and, by comparing the accuracy of the updated control model with that of the control model before the update, decides whether to send the updated control model down to the vehicle. After each round of learning, the updated control model replaces the original control model, so that the control model is continuously updated. The vehicle-end data acquisition system runs the control model while a driver controls the vehicle's actions, and computes the difference between the output of the control model and the driving actions output by the driver in real time. If the difference exceeds a predefined threshold, the critical data stored in the data cache, namely a fixed-duration sequence of multi-sensor data and driving actions, are uploaded to the cloud, triggering online learning of the current control model.

The autonomous vehicle control model is a time-varying model. The control model $M_t$ is expressed as:

$$M_t = f(I_{1:t}, \theta_t)$$

where $I_t$ denotes the training data set at time $t$ and $\theta_t$ denotes the parameters of the control model at time $t$.

The training data collected by the vehicle-end data acquisition system consist of multi-sensor data and a driving action sequence over a certain period of time:

$$I_t = \{d_1, d_2, \ldots, d_n\}$$

where $I_t$ denotes the training data sequence uploaded by the vehicle end at time $t$, $d_i$ denotes one training sample, $n$ is the number of samples, and $i$ is the sample index. Each sample $d_i$ is expressed as:

$$d_i = (s_1, s_2, \ldots, s_m, \mathrm{throttle}, \mathrm{steering}, \mathrm{brake})$$

where $s_1, s_2, \ldots, s_m$ denote the data from multiple sensors, throttle denotes the recorded vehicle throttle value, steering denotes the recorded vehicle steering value, brake denotes the recorded vehicle braking value, and $m$ is the number of sensors.
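The sample layout above can be sketched as a small data structure. This is a minimal illustration only: the class name `Sample`, the helper `make_batch`, and the choice to store sensor readings as a flat tuple of floats are assumptions, since the source does not specify sensor types or encodings.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """One training sample d_i = (s_1..s_m, throttle, steering, brake)."""
    sensors: Tuple[float, ...]  # m sensor readings s_1..s_m (flat floats are an assumption)
    throttle: float             # recorded throttle value
    steering: float             # recorded steering value
    brake: float                # recorded braking value


def make_batch(frames: Sequence[Sample]) -> List[Sample]:
    """Package n consecutive samples as one upload I_t = {d_1, ..., d_n}."""
    return list(frames)


# Hypothetical values, for illustration only.
batch = make_batch([
    Sample((0.1, 0.2), throttle=0.3, steering=-0.05, brake=0.0),
    Sample((0.1, 0.25), throttle=0.35, steering=-0.02, brake=0.0),
])
```

Because the driver's own throttle, steering, and brake values serve as the labels, no separate annotation step appears in this structure.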

The online learning algorithm trains by batch gradient descent. The network weight update formula is:

$$W^{t+1} = W^{t} - \eta \nabla L(W^{t})$$

where $W$ denotes the network weights, $L(W^{t})$ is the loss function, $\eta$ is the learning rate, and $\nabla$ is the gradient operator. The loss function is defined in terms of the predicted output of the current control model, the ground-truth control output $c_h$ of the current training data, and $p$, the batch size of the training data uploaded by each vehicle end.
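The weight update can be illustrated with a small pure-Python sketch. The loss formula itself is not reproduced in this text, so the sketch assumes a mean squared error over the batch, L(W) = (1/p) Σ_h (ĉ_h − c_h)², and a linear model standing in for the deep network; the names `mse_loss_and_grad` and `gradient_step` are hypothetical.

```python
def mse_loss_and_grad(w, xs, cs):
    """Loss and gradient for a linear stand-in model c_hat = w . x.

    ASSUMPTION: mean squared error over the p samples of the batch;
    the patent's actual loss definition is not available in this text.
    """
    p = len(xs)
    loss = 0.0
    grad = [0.0] * len(w)
    for x, c in zip(xs, cs):
        c_hat = sum(wi * xi for wi, xi in zip(w, x))  # model prediction
        err = c_hat - c                               # prediction minus ground truth c_h
        loss += err * err / p
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / p
    return loss, grad


def gradient_step(w, xs, cs, eta):
    """One update W^{t+1} = W^t - eta * grad L(W^t) over the whole uploaded batch."""
    _, grad = mse_loss_and_grad(w, xs, cs)
    return [wi - eta * gi for wi, gi in zip(w, grad)]


xs = [[1.0, 0.0], [0.0, 1.0]]   # two toy inputs
cs = [1.0, 2.0]                 # their ground-truth control outputs
w0 = [0.0, 0.0]
w1 = gradient_step(w0, xs, cs, eta=0.5)
```

Starting each step from the weights left by the previous upload, rather than from scratch, is what makes the procedure incremental.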

The updated control model is verified on a validation data set. The validation data set is itself updated online: a proportion of the data in the new training data set is added to the validation set of the previous moment.

In the update formula, $V_t$ denotes the validation data set at time $t$, $\alpha$ is the proportion sampled from the current training data set, and the two sets are combined by a set-union operation.
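The formula is missing from this text; one form consistent with the definitions is V_t = V_{t−1} ∪ α·I_t, where α·I_t stands for a fraction α of the samples in I_t. The sketch below follows that reading. How the fraction is selected is not stated in the source, so uniform random sampling, the seed parameter, and the function name are assumptions.

```python
import random


def update_validation_set(v_prev, i_t, alpha, seed=0):
    """Return V_t: the previous validation set plus a fraction alpha of I_t.

    ASSUMPTIONS: samples are drawn uniformly at random, and every sample is
    treated as distinct, so the union reduces to list concatenation.
    """
    k = int(alpha * len(i_t))                          # number of new samples to add
    picked = random.Random(seed).sample(list(i_t), k)  # sample without replacement
    return list(v_prev) + picked


v_prev = ["v1", "v2", "v3"]
i_t = [f"d{i}" for i in range(10)]
v_t = update_validation_set(v_prev, i_t, alpha=0.5)
```

Whether the sampled items are also kept in the training batch is not specified in the source and is left open here.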

The accuracy difference is calculated as follows:

Here $e_1$ denotes the difference in loss on the validation set before and after the model update, $|V_t|$ denotes the number of samples in the validation set at time $t$, and $\mathrm{LOSS}_j(M_t)$ denotes the loss of model $M_t$ on the $j$-th validation sample. If $e_1$ is greater than 0 and greater than a specific threshold $TH_1$, the updated model is issued to the vehicle end.
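Since the formula is not reproduced here, the following sketch assumes e₁ is the mean validation loss of the previous model minus the mean validation loss of the updated model, which makes a positive e₁ an improvement and matches the stated push condition. The function names are hypothetical.

```python
def precision_difference(losses_old, losses_new):
    """e1 = mean loss of M_{t-1} on V_t minus mean loss of M_t on V_t.

    ASSUMPTION: this sign convention (old minus new) is inferred from the
    rule that the model is pushed when e1 is positive and above TH1.
    """
    n = len(losses_old)
    return sum(losses_old) / n - sum(losses_new) / n


def should_push(e1, th1):
    """Push the updated model only if e1 > 0 and e1 > TH1."""
    return e1 > 0 and e1 > th1


e1 = precision_difference([0.5, 0.5], [0.25, 0.25])
```

The explicit `e1 > 0` check only matters when TH₁ is set to a negative value; it is kept because the source states both conditions.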

For the control model, the vehicle end allocates two memory spaces: one stores the current control model and the other stores the updated control model sent down from the cloud. Once the control model from the cloud has been completely received, it is assigned to the current model, and the updated-model space is then cleared.
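The two-slot arrangement can be sketched as below. The class and method names are hypothetical, and the model objects are placeholders; the point is only that the running model is never touched until the download is complete.

```python
class ModelSlots:
    """Two memory slots: the running model and a freshly downloaded update."""

    def __init__(self, current):
        self.current = current   # model used for inference right now
        self.pending = None      # update received from the cloud, if any

    def receive(self, updated):
        """Store a fully received update without disturbing the running model."""
        self.pending = updated

    def commit(self):
        """Assign the update to the current slot, then clear the update slot."""
        if self.pending is not None:
            self.current = self.pending
            self.pending = None


slots = ModelSlots(current="M0")
slots.receive("M1")
model_during_download = slots.current   # still the old model at this point
slots.commit()
```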

The difference between the output of the control model and the driving actions output by the driver in real time is calculated as follows:

Here $e_2$ denotes the average difference between the steering, throttle, and brake values output by the control model and the corresponding values output by the driver. $A_1$, $A_2$, and $A_3$ are steering, throttle, and braking respectively, and the formula compares the $k$-th value (steering, throttle, or brake) output by the current model with the $k$-th value output by the human driver. If $e_2$ is greater than a specific threshold $TH_2$, recording of critical data in the data cache begins; when the amount of critical data in the cache reaches a predetermined value, the data are uploaded to the cloud, triggering the cloud online learning mechanism.
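A minimal sketch of this comparison follows. The formula is missing from this text, so the use of an absolute difference averaged over the three channels is an assumption, as is the function name.

```python
def action_difference(model_out, driver_out):
    """e2: mean difference over (steering, throttle, brake).

    ASSUMPTION: the per-channel difference is taken as an absolute value so
    that opposite-signed errors do not cancel; the source formula is missing.
    """
    if len(model_out) != 3 or len(driver_out) != 3:
        raise ValueError("expected (steering, throttle, brake) triples")
    return sum(abs(m - d) for m, d in zip(model_out, driver_out)) / 3


# Hypothetical values: the model steers and brakes differently from the driver.
e2 = action_difference((0.5, 0.25, 0.0), (0.25, 0.25, 0.25))
triggered = e2 > 0.1   # TH2 = 0.1 is an illustrative threshold, not from the source
```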

The data cache is a cache of a specific size allocated on the computer that executes the control model, used to store the multi-sensor data and driving action sequences collected over a certain period of time. The cache uses a fixed-size queue data structure: whenever the computer acquires multi-sensor data and driving action data, they are pushed into the queue and saved.

The critical data are defined relative to the moment at which $e_2$ exceeds the threshold $TH_2$: the N/2 frames of data before that moment, the frame at that moment, and the N/2 frames after it. N is 10 to 20 frames, and N+1 is the number of data frames recorded by the data cache.
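The cache and the critical-data window can be sketched with a bounded queue. This is an illustration under stated assumptions: N is even, at least N/2 frames have been cached before the trigger, and further triggers during the N/2-frame wait are ignored. The class name `FrameCache` and its interface are hypothetical.

```python
from collections import deque


class FrameCache:
    """Fixed-size queue holding the most recent N+1 frames."""

    def __init__(self, n=10):
        self.half = n // 2
        self.frames = deque(maxlen=2 * self.half + 1)  # N+1 slots, oldest dropped first
        self.countdown = None                          # frames still to wait after a trigger

    def push(self, frame, e2, th2):
        """Append a frame; return the N+1 critical frames once the window is full."""
        self.frames.append(frame)
        if self.countdown is None:
            if e2 > th2:
                self.countdown = self.half             # wait for N/2 frames after the trigger
        else:
            self.countdown -= 1
        if self.countdown == 0:
            self.countdown = None
            return list(self.frames)                   # N/2 before, trigger frame, N/2 after
        return None


cache = FrameCache(n=4)
# Frame 5 is the only one whose e2 exceeds the threshold.
results = [cache.push(f, 1.0 if f == 5 else 0.0, 0.5) for f in range(10)]
```

With N = 4, the window is returned when frame 7 arrives and contains frames 3 through 7, centered on the trigger frame.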

The control model is an end-to-end model based on deep learning, and the online incremental learning method comprises two parts: the cloud online learning system and the vehicle-end data acquisition system. The online incremental learning scheme disclosed here is applicable to any end-to-end autonomous driving control model that uses deep learning, that is, a control model that maps directly from sensor perception to vehicle motion control commands. Other deep-learning-based perception, planning, and control models that apply the same method also fall within the scope of the invention.

The training data set can be collected directly from the real-time operating data of an experienced human driver, without manual labeling. In addition, by comparing the outputs of the control model and the human driver in real time, scenes to which the control model has not adapted are identified, and the corresponding scene data are collected and uploaded to the cloud for online learning, so that the control model is incrementally updated. This process effectively realizes online incremental learning of the autonomous vehicle control model and offers a new reference approach for the development and application of key autonomous vehicle technologies.

The advantages of the invention are as follows. On the one hand, it can effectively reduce the amount of data required to train a conventional control model in a single pass. On the other hand, it can use real-time data from human driving to incrementally update a machine-learning-based control model, promoting continuous optimization of the control model until it reaches a human-like level of control. The disclosed method provides a novel and practical approach for current research and development of key autonomous vehicle technologies.

Brief description of the drawings

Fig. 1 is a workflow diagram of the invention.

Fig. 2 is a schematic diagram of the data cache data structure.

Detailed description of the embodiments

As shown in Fig. 1, an online incremental learning method for an autonomous vehicle control model involves an autonomous vehicle control model, a cloud online learning system, and a vehicle-end data acquisition system. The autonomous vehicle control model is an end-to-end model based on deep learning. The cloud online learning system uses the training data collected by the vehicle-end data acquisition system to update the current control model with an online learning algorithm, then verifies the updated control model and, by comparing the accuracy of the updated control model with that of the control model before the update, decides whether to send the updated control model down to the vehicle. After each round of learning, the updated control model replaces the original control model, so that the control model is continuously updated. The vehicle-end data acquisition system runs the control model while a driver controls the vehicle's actions, and computes the difference between the output of the control model and the driving actions output by the driver in real time. If the difference exceeds a predefined threshold, the critical data stored in the data cache, namely a fixed-duration sequence of multi-sensor data and driving actions, are uploaded to the cloud, triggering online learning of the current control model. The online incremental learning scheme disclosed here is applicable to any end-to-end autonomous driving control model that uses deep learning, that is, a control model that maps directly from sensor perception to vehicle motion control commands. Other deep-learning-based perception, planning, and control models that use the same method also fall within the scope of the invention.

The control model of the invention is a time-varying model:

$$M_t = f(I_{1:t}, \theta_t)$$

where $I_t$ denotes the training data set at time $t$ and $\theta_t$ denotes the model parameters at time $t$. This formulation applies not only to end-to-end control models but also to describing the time-varying behavior of machine-learning-based perception, planning, and control models.

The cloud online learning system runs on a cloud server. Using the training data collected at the vehicle end, it updates the current control model with an online learning algorithm and then verifies the model; by comparing the accuracy of the model after the update with that of the model before the update, it decides whether to send the updated model down. After each round of learning, the learned model replaces the original model, so that the model is continuously updated.

The training samples required for cloud online learning consist of the multi-sensor data and driving action sequence collected by the vehicle end over a certain period of time:

$$I_t = \{d_1, d_2, \ldots, d_n\}$$

$$d_i = (s_1, s_2, \ldots, s_m, \mathrm{throttle}, \mathrm{steering}, \mathrm{brake})$$

The cloud online learning method trains by batch gradient descent. The network weight update formula is:

$$W^{t+1} = W^{t} - \eta \nabla L(W^{t})$$

After online learning of the model is completed, the updated model must be verified to confirm whether its performance has improved. Verification is carried out on a validation data set, which is likewise updated online: a proportion of the data in the new training data set is added to the validation set of the previous moment.

In the update formula, $V_t$ denotes the validation data set at time $t$, $\alpha$ is the proportion sampled from the current training data set, and the two sets are combined by a set-union operation.

The performance of the model before and after the update is compared as follows:

Each time a model update is completed, the existing model is replaced by the updated model, so that the next round of training starts from the latest model. This realizes incremental learning updates of the model.
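One cloud-side round can be sketched as a single function that ties the steps together. The sketch takes the training and loss routines as parameters; `cloud_update`, `train_fn`, and `loss_fn` are hypothetical names, and the old-minus-new sign of e₁ is an assumption carried over from the stated push condition. The cloud copy is always replaced by the updated model, while the push to the vehicle is gated separately.

```python
def cloud_update(model, batch, val_set, train_fn, loss_fn, th1):
    """Run one round of online learning; return (new cloud model, push flag)."""
    updated = train_fn(model, batch)                       # incremental training step
    old_loss = sum(loss_fn(model, v) for v in val_set) / len(val_set)
    new_loss = sum(loss_fn(updated, v) for v in val_set) / len(val_set)
    e1 = old_loss - new_loss                               # ASSUMED sign: positive = better
    push = e1 > 0 and e1 > th1                             # gate for sending to the vehicle
    return updated, push                                   # cloud always keeps the update


# Toy stand-ins: the "model" is one number that predicts a constant.
def toy_train(m, batch):
    return m + 0.5 * (sum(batch) / len(batch) - m)         # move halfway to the batch mean


def toy_loss(m, v):
    return (m - v) ** 2                                    # squared error on one sample


new_model, push = cloud_update(0.0, [2.0, 2.0], [2.0], toy_train, toy_loss, th1=0.5)
```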

In the vehicle-mounted data acquisition system, the vehicle end allocates two memory spaces for the control model: one stores the current control model and the other stores the updated control model sent down from the cloud. Once the control model from the cloud has been completely received, it is assigned to the current model, and the updated-model space is then cleared.

In the vehicle-end data acquisition method, critical data frames are determined by computing, in real time, the difference between the output of the vehicle-end control model and the driving actions output by the driver:

The vehicle-end data cache is a cache of a specific size allocated on the computer that executes the control model, used to store the sensor data and driving action sequences collected over a certain period of time. The cache uses a fixed-size queue data structure, as shown in Fig. 2: whenever the computer acquires sensor data and driving action data, they are pushed into the queue and saved.

The critical data frames are defined relative to the moment at which $e_2$ exceeds the threshold $TH_2$: the N/2 frames of data before that moment, the frame at that moment, and the N/2 frames after it. N can generally be set to 10 to 20 frames, and N+1 is the number of data frames the data cache can record.

Claims

1. An online incremental learning method for an autonomous vehicle control model, characterized in that: it involves an autonomous vehicle control model, a cloud online learning system, and a vehicle-end data acquisition system; the autonomous vehicle control model is an end-to-end model based on deep learning; the cloud online learning system uses the training data collected by the vehicle-end data acquisition system to update the current autonomous vehicle control model with an online learning algorithm, then verifies the updated control model and, by comparing the accuracy of the updated control model with that of the control model before the update, decides whether to send the updated control model down; after each round of learning, the updated control model replaces the original control model, so that the control model is continuously updated; the vehicle-end data acquisition system runs the autonomous vehicle control model while a driver controls the vehicle's actions, then computes the difference between the output of the control model and the driving actions output by the driver in real time, and if the difference exceeds a predefined threshold, uploads to the cloud the critical data stored in the data cache, namely a fixed-duration sequence of multi-sensor data and driving actions, triggering online learning of the current control model.

2. The online incremental learning method for an autonomous vehicle control model according to claim 1, characterized in that: the autonomous vehicle control model is a time-varying model, and the control model $M_t$ is expressed as:

$$M_t = f(I_{1:t}, \theta_t)$$

3. The online incremental learning method for an autonomous vehicle control model according to claim 2, characterized in that: the training data collected by the vehicle-end data acquisition system consist of multi-sensor data and a driving action sequence over a certain period of time:

$$I_t = \{d_1, d_2, \ldots, d_n\}$$

where $I_t$ denotes the training data sequence uploaded by the vehicle end at time $t$, $d_i$ denotes one training sample, $n$ is the number of samples, and $i$ is the sample index; $d_i$ is expressed as:

$$d_i = (s_1, s_2, \ldots, s_m, \mathrm{throttle}, \mathrm{steering}, \mathrm{brake})$$

where $s_1, s_2, \ldots, s_m$ denote the data from multiple sensors, throttle denotes the recorded vehicle throttle value, steering denotes the recorded vehicle steering value, brake denotes the recorded vehicle braking value, and $m$ is the number of sensors.

4. The online incremental learning method for an autonomous vehicle control model according to claim 3, characterized in that: the online learning algorithm trains by batch gradient descent, and the network weight update formula is:

$$W^{t+1} = W^{t} - \eta \nabla L(W^{t})$$

where $W$ denotes the network weights, $L(W^{t})$ is the loss function, $\eta$ is the learning rate, and $\nabla$ is the gradient operator; the loss function is defined in terms of the predicted output of the current control model, the ground-truth control output $c_h$ of the current training data, and $p$, the batch size of the training data uploaded by each vehicle end.

5. The online incremental learning method for an autonomous vehicle control model according to claim 4, characterized in that: the updated control model is verified on a validation data set; the validation data set is updated online by adding, to the validation set of the previous moment, a proportion of the data in the new training data set, namely:

where $V_t$ denotes the validation data set at time $t$, $\alpha$ is the proportion sampled from the current training data set, and the two sets are combined by a set-union operation.

6. The online incremental learning method for an autonomous vehicle control model according to claim 5, characterized in that: the accuracy difference is calculated as follows:

7. The online incremental learning method for an autonomous vehicle control model according to claim 6, characterized in that: for the control model, the vehicle end allocates two memory spaces, one storing the current control model and the other storing the updated control model sent down from the cloud; once the control model from the cloud has been completely received, it is assigned to the current model, and the updated-model space is then cleared.

8. The online incremental learning method for an autonomous vehicle control model according to claim 7, characterized in that: the difference between the output of the control model and the driving actions output by the driver in real time is calculated as follows:

where $e_2$ denotes the average difference between the steering, throttle, and brake values output by the control model and the corresponding values output by the driver; $A_1$, $A_2$, and $A_3$ are steering, throttle, and braking respectively; the formula compares the $k$-th steering, throttle, or brake value output by the current model with the $k$-th steering, throttle, or brake value output by the driver; if $e_2$ is greater than a specific threshold $TH_2$, recording of critical data in the data cache begins, and when the amount of critical data in the data cache reaches a predetermined value, the data are uploaded to the cloud, triggering the cloud online learning mechanism.

9. The online incremental learning method for an autonomous vehicle control model according to claim 8, characterized in that: the data cache is a cache of a specific size allocated on the computer that executes the control model, used to store the multi-sensor data and driving action sequences collected over a certain period of time; the cache uses a fixed-size queue data structure, and whenever the computer acquires multi-sensor data and driving action data, they are pushed into the queue and saved.

10. The online incremental learning method for an autonomous vehicle control model according to claim 8, characterized in that: the critical data are defined relative to the moment at which $e_2$ exceeds the threshold $TH_2$, and consist of the N/2 frames of data before that moment, the frame at that moment, and the N/2 frames after it; N is 10 to 20 frames, and N+1 is the number of data frames recorded by the data cache.