CN108909833A - Intelligent vehicle steering control method based on policy iteration - Google Patents
- Publication number
- CN108909833A (application CN201810597914.9A)
- Authority
- CN
- China
- Prior art keywords
- transport condition
- condition data
- course changing
- network model
- vehicle
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B62—LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
- B62D—MOTOR VEHICLES; TRAILERS
- B62D6/00—Arrangements for automatically controlling steering depending on driving conditions sensed and responded to, e.g. control circuits
- B62D6/001—Arrangements for automatically controlling steering depending on driving conditions sensed and responded to, e.g. control circuits the torque NOT being among the input parameters
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B62—LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
- B62D—MOTOR VEHICLES; TRAILERS
- B62D15/00—Steering not otherwise provided for
- B62D15/02—Steering position indicators ; Steering position determination; Steering aids
- B62D15/025—Active steering aids, e.g. helping the driver by actively influencing the steering system after environment evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Combustion & Propulsion (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Feedback Control In General (AREA)
- Steering Control In Accordance With Driving Conditions (AREA)
Abstract
The invention belongs to the technical field of autonomous vehicle driving and specifically relates to an intelligent vehicle steering control method based on policy iteration. It aims to improve the online autonomous learning capability of driverless intelligent vehicles in steering control. To this end, the method comprises: acquiring the driving state data and vehicle control input of the vehicle; predicting, with a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control input at the current sampling instant; controlling the online training of the steering control network model with a judgment mechanism; taking the steering control network model as the controlled object and implementing a policy iteration algorithm with an evaluation network and an execution network to obtain an optimized execution network; and controlling the steering of the intelligent vehicle with the vehicle control input output by the execution network. The method improves the real-time performance of model training and the adaptability to the current environment.
Description
Technical field
The invention belongs to the technical field of autonomous vehicle driving and specifically relates to an intelligent vehicle steering control method based on policy iteration.
Background art
Driverless operation of an intelligent vehicle is a technology that integrates environment perception, path planning, and autonomous vehicle control. Research shows that driverless technology can bring disruptive improvements in enhancing road safety, relieving traffic congestion, and reducing air pollution. A driverless vehicle perceives changes in the road environment and its own driving state through sensors, and its control system computes the optimal control of the vehicle from that driving state and road environment. Research on intelligent vehicles will reduce the probability of traffic accidents caused by driver negligence and relieve drivers from long periods of concentrated driving. Great progress has already been made abroad in driverless vehicles; domestic research institutions and universities have begun related research and achieved some results, but a gap remains compared with driverless technology abroad.
Autonomous steering of a vehicle on the road is one of the research topics of L2 semi-automated driving among the levels of driving automation. A main research problem of autonomous steering is how to compute the steering angle from the road environment perceived by the detectors and from the driving state of the vehicle, and then drive the steering actuator inside the vehicle to adjust its heading. Vehicle steering control is strongly nonlinear, and a neural network is a tool that realizes such nonlinear mappings well. Training a neural network, however, requires a large amount of data, and a good network model is usually obtained only when the data cover the whole sample-action space. Network training is either offline or online. Online training builds the steering model of the vehicle from continuously collected steering control information; its drawbacks are that, in the initial stage, the lack of data makes it difficult to initialize the steering control of the vehicle, and that the continuously growing training data degrade the real-time performance of building the neural network model. Offline training uses fixed training data and therefore sometimes fails to produce effective control in a changing, complex environment; to obtain a good training effect the samples must cover the whole sample-action space as far as possible, which means paying a large training cost.
Summary of the invention
To solve the above problem in the prior art, namely how to improve the online autonomous learning capability of a driverless intelligent vehicle in steering control, the present invention provides an intelligent vehicle steering control method based on policy iteration, comprising:
acquiring the driving state data and vehicle control input of the vehicle;
predicting, with a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control input at the current sampling instant, the steering control network model being a neural network model trained offline on a preset training dataset;
calculating a contrast function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model;
if the contrast function value is greater than a preset threshold, adding the driving state data and vehicle control inputs collected within the preset time period to the preset training dataset, and training the steering control network model online on that training dataset;
taking the steering control network model as the controlled object and implementing a policy-iteration adaptive dynamic programming algorithm with an evaluation network and an execution network, to obtain an optimized execution network;
controlling the steering of the intelligent vehicle with the vehicle control input output by the execution network.
Further, the step of "calculating a contrast function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model" includes calculating the contrast function value with the contrast function E shown in the following formula:
where x_s is the collected driving state data, x is the driving state data predicted by the steering control network model, c is the index of the driving state data, a is the set length of the comparison sequence, and b is the index of the comparison data.
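The symbols of E are defined above, but the formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes E is the mean squared difference between measured and predicted states over the last a sampling instants; the function names and that functional form are hypothetical, not taken from the patent.

```python
from typing import Sequence

def contrast_value(measured: Sequence[Sequence[float]],
                   predicted: Sequence[Sequence[float]],
                   a: int) -> float:
    """Assumed form of E: mean squared error between measured states x_s and
    model predictions x over the most recent `a` instants (index b), summed
    over all state components (index c)."""
    if a <= 0 or len(measured) < a or len(predicted) < a:
        raise ValueError("need at least `a` measured and predicted samples")
    total = 0.0
    for b in range(1, a + 1):                  # b: position in the comparison window
        xs, x = measured[-b], predicted[-b]
        total += sum((xs_c - x_c) ** 2 for xs_c, x_c in zip(xs, x))  # c: component
    return total / a

def needs_retraining(measured: Sequence[Sequence[float]],
                     predicted: Sequence[Sequence[float]],
                     a: int, threshold: float) -> bool:
    # Judgment mechanism: retrain only when E exceeds the preset threshold.
    return contrast_value(measured, predicted, a) > threshold
```

A larger window a averages out sensor noise but reacts more slowly to a changed road environment, so the window length is a tuning trade-off.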
Further, the online training of the steering control network model includes:
Step S101: based on the driving state data and vehicle control inputs in the training dataset, predicting, with the steering control network model, the driving state data at the corresponding next sampling instant;
Step S102: calculating the error between the prediction of the steering control network model and the set desired value;
Step S103: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step S104: repeating steps S101-S103 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set desired value falls within a set range.
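Steps S101-S104 describe ordinary error backpropagation. The following pure-Python sketch assumes a one-hidden-layer tanh network mapping [driving state, control input] to the next driving state; the class name `SteeringModel`, the layer sizes, the learning rate and the tolerance are illustrative assumptions, since the text only specifies a BP neural network.

```python
import math
import random
from typing import List, Sequence, Tuple

class SteeringModel:
    """Hypothetical BP network: [driving state, control] -> next driving state."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, z: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train(self, data, lr: float = 0.05, max_iter: int = 2000, tol: float = 1e-4) -> float:
        """Repeat S101-S103 per sample until max_iter epochs or the epoch-average
        squared error drops below tol (S104)."""
        mse = float("inf")
        for _ in range(max_iter):
            mse = 0.0
            for z, target in data:
                h, y = self.forward(z)                          # S101: predict x_{k+1}
                err = [yo - to for yo, to in zip(y, target)]    # S102: prediction error
                mse += sum(e * e for e in err) / len(err)
                # S103: backpropagate; hidden deltas use the output weights before the update
                delta_h = [(1.0 - h[j] ** 2) * sum(err[o] * self.w2[o][j] for o in range(len(err)))
                           for j in range(len(h))]
                for o, e in enumerate(err):
                    for j, hj in enumerate(h):
                        self.w2[o][j] -= lr * e * hj
                    self.b2[o] -= lr * e
                for j, d in enumerate(delta_h):
                    for i, zi in enumerate(z):
                        self.w1[j][i] -= lr * d * zi
                    self.b1[j] -= lr * d
            mse /= len(data)
            if mse < tol:                                       # S104: stopping rule
                break
        return mse
```

The same routine serves offline training (steps S201-S205) when it is run on the initial, preprocessed dataset instead of the augmented one.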
Further, the utility function of the evaluation network is:
$$U(x_k,u_k)=x_k^{T}Qx_k+u_k^{T}Ru_k$$
where Q = I, R = 0.5I, I is the identity matrix, T denotes transposition, x_k is the actual driving state data, and u_k is the actual vehicle control input.
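The definitions of Q, R, T, x_k and u_k point to the usual quadratic utility U(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k. Assuming that form, with Q = I and R = 0.5I the matrix products reduce to weighted sums of squares, as this minimal sketch shows. Which state components enter x_k (here a lateral offset and a heading angle) is an assumption for illustration.

```python
from typing import Sequence

def utility(x: Sequence[float], u: Sequence[float],
            q_scale: float = 1.0, r_scale: float = 0.5) -> float:
    """U(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k with Q = q_scale*I, R = r_scale*I."""
    return q_scale * sum(xi * xi for xi in x) + r_scale * sum(ui * ui for ui in u)

# Hypothetical sample: 0.2 lateral offset, 0.1 heading angle, 0.4 steering input.
cost = utility([0.2, 0.1], [0.4])   # 0.04 + 0.01 + 0.5 * 0.16
```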
The performance index function at the current sampling instant, evaluated by the evaluation network, is:
$$V(x_k)=\sum_{l=0}^{\infty}U\big(x_{k+l},\upsilon_i(x_{k+l})\big)$$
where V(x_k) is the performance index function at the current sampling instant, υ_i(x_{k+l}) is the vehicle control input at sampling instant k+l, x is the driving state data of the vehicle, k is the index of the current sampling instant, and l is the time index; as l → ∞, the vehicle heading becomes parallel to the road direction and U(x_{k+l}, υ_i(x_{k+l})) → 0.
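Because U → 0 once the heading is parallel to the road, the infinite sum defining V(x_k) can be approximated by rolling a model forward until the stage cost becomes negligible. The sketch below assumes the quadratic utility with Q = I and R = 0.5I, a hypothetical two-state kinematic stand-in for the model (lateral offset and heading angle, with made-up gains) and a fixed linear policy; none of these numbers come from the patent.

```python
from typing import Callable, List, Sequence

def utility(x: Sequence[float], u: Sequence[float]) -> float:
    # Assumed quadratic utility with Q = I and R = 0.5 I.
    return sum(xi * xi for xi in x) + 0.5 * sum(ui * ui for ui in u)

def toy_model(x: Sequence[float], u: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for F(x_k, u_k); x = [lateral offset, heading angle].
    return [x[0] + 0.5 * x[1], x[1] + 0.1 * u[0]]

def toy_policy(x: Sequence[float]) -> List[float]:
    # Hypothetical fixed linear steering policy.
    return [-(1.0 * x[0] + 3.0 * x[1])]

def performance_index(x0: Sequence[float], policy: Callable, model: Callable,
                      horizon: int = 200, tol: float = 1e-10) -> float:
    """Approximate V(x_k) = sum over l >= 0 of U(x_{k+l}, v(x_{k+l}))."""
    total, x = 0.0, list(x0)
    for _ in range(horizon):
        u = policy(x)
        stage = utility(x, u)
        total += stage
        if stage < tol:          # U -> 0: heading (almost) parallel to the road
            break
        x = model(x, u)
    return total
```

The truncation is only valid when the policy actually drives the state to zero; for this toy closed loop the eigenvalue modulus is about 0.87, so the terms shrink geometrically.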
Further, in implementing the policy-iteration adaptive dynamic programming algorithm with the evaluation network and the execution network, the execution network is optimized based on the functions shown in the following formulas:
$$V_i(x_k)=U\big(x_k,\upsilon_i(x_k)\big)+V_i(x_{k+1})$$
$$\upsilon_{i+1}(x_k)=\arg\min_{u_k}\big\{U(x_k,u_k)+V_i(x_{k+1})\big\},\quad x_{k+1}=F(x_k,u_k)$$
where V_i(x_k) is the performance index function under the control policy, U(x_k, υ_i(x_k)) is the utility function at the current sampling instant, V_i(x_{k+1}) is the performance index function after the control network iteration, υ_i(x_k) is the vehicle control input under the current control policy at the current sampling instant, υ_{i+1}(x_k) is the vehicle control input at the current sampling instant after the control network iteration, u_k is the actual vehicle control input, υ_k is the vehicle control input at the current sampling instant, i is the iteration number, x_k is the actual driving state data, and F(x_k, u_k) is the abstract function of the steering control network model.
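Policy iteration alternates policy evaluation (solving V_i(x_k) = U(x_k, υ_i(x_k)) + V_i(x_{k+1}) for the current policy) and policy improvement (choosing υ_{i+1}(x_k) to minimize U(x_k, u_k) + V_i(x_{k+1}) with x_{k+1} = F(x_k, u_k)). The patent realizes both with neural networks; the sketch below instead assumes a linear model F(x, u) = Ax + Bu with hypothetical A and B, a quadratic critic V_i(x) = x^T P x, a linear actor u = -Kx, and the quadratic utility with Q = I, R = 0.5I. Under those assumptions each step has a closed form (the classical LQR form of policy iteration, Hewer's algorithm), which makes the structure easy to check. As with the execution network, the iteration must start from an admissible policy: here the initial gain K0 must stabilize the model.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def mat_add(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical linear stand-in for F(x, u) = A x + B u,
# state x = [lateral offset, heading angle], one steering input u.
A: Matrix = [[1.0, 0.5], [0.0, 1.0]]
B: Matrix = [[0.0], [0.1]]
Q: Matrix = [[1.0, 0.0], [0.0, 1.0]]   # Q = I
R: Matrix = [[0.5]]                    # R = 0.5 I

def evaluate_policy(K: Matrix, max_iter: int = 10_000, tol: float = 1e-12) -> Matrix:
    """Policy evaluation: with u = -Kx and V(x) = x^T P x the recursion
    V(x_k) = U(x_k, u_k) + V(x_{k+1}) becomes P = Q + K^T R K + Acl^T P Acl,
    solved by fixed-point iteration (converges when Acl = A - BK is stable)."""
    acl = mat_add(A, mat_mul(B, K), sign=-1.0)
    stage = mat_add(Q, mat_mul(transpose(K), mat_mul(R, K)))
    P: Matrix = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(max_iter):
        P_next = mat_add(stage, mat_mul(transpose(acl), mat_mul(P, acl)))
        change = max(abs(P_next[i][j] - P[i][j]) for i in range(2) for j in range(2))
        P = P_next
        if change < tol:
            break
    return P

def improve_policy(P: Matrix) -> Matrix:
    """Policy improvement: argmin_u U(x, u) + V(Ax + Bu) gives u = -Kx with
    K = (R + B^T P B)^{-1} B^T P A (a scalar inverse, since there is one input)."""
    btp = mat_mul(transpose(B), P)
    scale = 1.0 / (R[0][0] + mat_mul(btp, B)[0][0])
    return [[scale * v for v in mat_mul(btp, A)[0]]]

def policy_iteration(K0: Matrix, n_iter: int = 20) -> Tuple[Matrix, Matrix, List[float]]:
    K: Matrix = K0
    P: Matrix = []
    history: List[float] = []
    for _ in range(n_iter):
        P = evaluate_policy(K)
        history.append(P[0][0] + P[1][1])   # trace(P): should not increase
        K = improve_policy(P)
    return K, P, history
```

For example, `policy_iteration([[1.0, 3.0]])` starts from a gain whose closed loop has eigenvalue modulus about 0.87. Each improvement step never increases the cost matrix P, so the recorded traces are non-increasing, and the gain stops changing once the Riccati equation is satisfied.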
Further, the offline training of the steering control network model includes:
Step S201: acquiring and preprocessing the driving state data and vehicle control inputs of the vehicle to obtain the training dataset;
Step S202: based on the driving state data and vehicle control inputs in the training dataset, predicting, with the steering control network model, the driving state data at the corresponding next sampling instant;
Step S203: calculating the error between the prediction of the steering control network model and the set desired value;
Step S204: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step S205: repeating steps S202-S204 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set desired value falls within a set range.
Further, the preprocessing includes denoising, screening out abnormal data, and repeated-state information processing;
the repeated-state information processing includes: calculating the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset, and comparing the difference value with a set threshold; if the difference value is greater than the set threshold, the collected driving state data and vehicle control input are added to the training dataset.
Further, the step of "calculating the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset" calculates the difference value with the function shown in the following formula:
where x_s^j is the collected vehicle speed, x_t^j is the vehicle speed in the training dataset, x_os is the collected angle between the vehicle heading and the road direction, x_ot is the angle between the vehicle heading and the road direction in the training dataset, u_s is the collected vehicle control input, u_t is the vehicle control input in the training dataset, and m is the index of the training data.
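The formula for the difference value is not reproduced in this text, only its symbols. As a hedged sketch, the code below assumes a weighted sum of absolute differences in speed, heading angle and control input, and treats a sample as new only when it differs from every stored entry m by more than the threshold; the weights, the `Sample` type and the "compare against every entry" rule are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Sample:
    speed: float          # x^j: vehicle speed
    heading_angle: float  # x_o: angle between vehicle heading and road direction
    control: float        # u: vehicle control input

def difference(s: Sample, t: Sample, w: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # Hypothetical form of the difference value: weighted L1 distance.
    return (w[0] * abs(s.speed - t.speed)
            + w[1] * abs(s.heading_angle - t.heading_angle)
            + w[2] * abs(s.control - t.control))

def add_if_new(sample: Sample, training_set: List[Sample], threshold: float) -> bool:
    """Repeated-state processing: keep the sample only if it differs from every
    stored entry m by more than the threshold; otherwise discard it."""
    if all(difference(sample, t) > threshold for t in training_set):
        training_set.append(sample)
        return True
    return False
```

Speed and angle have very different scales, so in practice the weights would need tuning or the features would be normalized first; equal weights are used here only for readability.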
Further, the driving state data include the vehicle speed, the acceleration, the lateral offset from the road, and the angle between the vehicle heading and the road direction.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
The intelligent vehicle steering control method based on policy iteration of the present invention realizes steering control of an intelligent vehicle with a neural network model. The method combines the advantages of offline and online training: network training is achieved without paying a huge training cost, and the network is initialized simply and effectively. It also overcomes the drawback of online training, namely the need to continually introduce training data and retrain the network. While ensuring that the samples cover the current environment's state-action space as far as possible, it also avoids generating redundant data as far as possible, which improves the real-time performance of network training and the adaptability to the current environment.
Description of the drawings
Fig. 1 is a flow diagram of the policy-iteration-based adaptive dynamic programming algorithm according to an embodiment of the present invention;
Fig. 2 is a flow diagram of the decision device mechanism according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principles of the invention and are not intended to limit its scope of protection.
Traditional intelligent vehicle steering control methods require a mathematical model of vehicle steering control. Building it means determining all the variables involved in steering control, such as the vehicle's own characteristics, its current speed and acceleration, and road parameters (road curvature, road inclination, friction coefficient, and so on), and then deriving a multi-input nonlinear equation by experimental fitting or from physical laws; the form of the equation and the value of each coefficient require many experiments and measurements. Neural networks have good nonlinear mapping capability and can be trained online or offline. Online training, however, requires continuously collecting data and repeatedly training to obtain the neural network model, so it cannot initialize the vehicle's control output well. Moreover, because sensor data are fed into the neural network model without screening, a large amount of invalid redundant data is produced, which hurts the real-time performance of model training; when the environment changes these operations must be repeated, further amplifying the problems of poor initialization and data redundancy. Offline training uses fixed training data and therefore sometimes cannot produce effective control in a changing, complex environment, and obtaining a good training effect requires the samples to cover the whole sample-action space as far as possible, which means paying a large training cost.
The intelligent vehicle steering control method based on policy iteration of the present invention combines the advantages of offline and online training: steering control uses a model trained offline, so an initial network model is obtained simply and effectively. At the same time, a judgment mechanism is added while the vehicle is driving to control the online training of the model, which overcomes the drawback of online training that training data must be continually introduced and the network continually retrained. The intelligent vehicle steering control method based on policy iteration provided by the present invention is described below with reference to the accompanying drawings.
The present invention represents the driving state of the vehicle by its driving state data, which include the vehicle speed, the acceleration, the lateral offset from the road, and the angle between the vehicle heading and the road direction. The vehicle control input is the input to the vehicle control signal: in driverless mode, the intelligent vehicle continuously calculates the control input for the next sampling instant and feeds it to its control system, thereby realizing steering control in driverless mode. It should be noted that in this embodiment the driving state data and vehicle control input are acquired at equal time intervals, and the specific interval can be adjusted according to actual conditions.
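Because the driving state and control input are sampled at a fixed interval, a logged drive can be turned directly into training pairs for the steering control network model: the input at instant k is the driving state plus the control input, and the target is the driving state at k+1. A minimal sketch, with a hypothetical `DrivingState` record whose units are assumed:

```python
from dataclasses import astuple, dataclass
from typing import List, Sequence, Tuple

@dataclass
class DrivingState:
    speed: float            # assumed m/s
    acceleration: float     # assumed m/s^2
    lateral_offset: float   # assumed m, offset from the road reference line
    heading_angle: float    # assumed rad, angle between heading and road direction

def build_training_pairs(states: Sequence[DrivingState], controls: Sequence[float]
                         ) -> List[Tuple[List[float], List[float]]]:
    """Turn a fixed-interval log into ([x_k, u_k], x_{k+1}) pairs."""
    if len(states) != len(controls):
        raise ValueError("expected one control input per sampled state")
    pairs = []
    for k in range(len(states) - 1):
        model_input = list(astuple(states[k])) + [controls[k]]
        target = list(astuple(states[k + 1]))
        pairs.append((model_input, target))
    return pairs
```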
An intelligent vehicle steering control method based on policy iteration according to this embodiment includes:
acquiring the driving state data and vehicle control input of the vehicle;
predicting, with a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control input at the current sampling instant, the steering control network model being a neural network model trained offline on a preset training dataset;
calculating a contrast function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model;
if the contrast function value is greater than a preset threshold, adding the driving state data and vehicle control inputs collected within the preset time period to the training dataset, and training the steering control network model online on that training dataset;
taking the steering control network model as the controlled object and implementing a policy-iteration adaptive dynamic programming algorithm with an evaluation network and an execution network, to obtain an optimized execution network;
controlling the steering of the intelligent vehicle with the vehicle control input output by the execution network.
Further, the steering control network model is built on a BP neural network, and a judgment mechanism is introduced to control its online training. Specifically, the driving state data and vehicle control input of the vehicle are acquired and fed into the steering control network model, which predicts the driving state data at the corresponding next sampling instant. The contrast function value E between the driving state data collected within the set time length and the corresponding next-instant driving state data predicted by the model is calculated and compared with a preset threshold T1. If E is greater than T1, the driving state data and vehicle control inputs collected within the set time are added to the preset training dataset and the steering control network model is trained online on it; otherwise the steering control model keeps its current state. Taking the steering control network model as the controlled object, a policy iteration algorithm is implemented with the evaluation network and the execution network to obtain an optimized execution network, and automatic steering control of the intelligent vehicle is realized with the vehicle control input output by the execution network.
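The judgment mechanism can be sketched as a small buffer that holds the most recent a samples, evaluates E over them, and, only when E exceeds T1, moves them into the training set and triggers retraining. In the sketch below the class name, the `retrain` callback and the mean-squared form of E are hypothetical; the patent does not reproduce E's formula here.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[List[float], List[float]]

class DecisionDevice:
    def __init__(self, a: int, threshold: float, retrain: Callable[[List[Pair]], None],
                 training_set: Optional[List[Pair]] = None):
        self.a = a
        self.threshold = threshold                  # T1
        self.retrain = retrain                      # hypothetical hook running steps S101-S104
        self.window = deque(maxlen=a)               # (model input, predicted, measured)
        self.training_set: List[Pair] = list(training_set or [])  # offline dataset, if any

    def observe(self, model_input: Sequence[float], predicted: Sequence[float],
                measured: Sequence[float]) -> bool:
        """Record one sampling instant; return True if online retraining was triggered."""
        self.window.append((list(model_input), list(predicted), list(measured)))
        if len(self.window) < self.a:
            return False                            # window not full yet
        # Assumed form of E: mean squared prediction error over the window.
        e = sum(sum((m - p) ** 2 for m, p in zip(meas, pred))
                for _, pred, meas in self.window) / self.a
        if e <= self.threshold:
            return False                            # model still fits: keep current state
        self.training_set.extend((inp, meas) for inp, _, meas in self.window)
        self.window.clear()
        self.retrain(self.training_set)
        return True
```

Clearing the window after a trigger prevents the same samples from being added twice and gives the retrained model a fresh comparison window.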
The intelligent vehicle steering control method based on policy iteration of this embodiment overcomes the drawback of online training that data must be collected and the neural network model retrained continuously. It also solves the problem of the traditional approach in which driving state data are fed into the neural network model without screening, producing a large amount of invalid redundant data that hurts the real-time performance of neural network training.
Further, the step of "calculating a contrast function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model" includes calculating the contrast function value with the function shown in formula (1):
where x_s is the collected driving state data, x is the driving state data predicted by the steering control network model, c is the index of the driving state data, a is the set length of the comparison sequence, and b is the index of the comparison data. It should be noted that a should be neither too large nor too small: if a is too small, the decision device becomes sensitive to noise; if a is too large, the decision device becomes insensitive to environmental changes.
Further, when the judgment mechanism of the method requires the steering control network model to be trained online, the training method is:
Step Sa11: based on the driving state data and vehicle control inputs in the training dataset, predicting, with the steering control network model, the driving state data at the corresponding next sampling instant;
Step Sa12: calculating the error between the prediction of the steering control network model and the set desired value;
Step Sa13: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step Sa14: repeating steps Sa11-Sa13 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set desired value falls within a set range.
Further, in this embodiment the steering control network model is a neural network model trained offline on a preset training dataset. Because steering control is based on the offline-trained model, the intelligent vehicle steering control method based on policy iteration of the present invention avoids the difficulty of initializing the steering control output in the initial stage of online training, when data are lacking. In this embodiment of the invention, the steering control network model is trained offline as follows:
Step Sb11: acquiring and preprocessing the driving state data and vehicle control inputs of the vehicle to obtain the training dataset;
Step Sb12: based on the driving state data and vehicle control inputs in the training dataset, predicting, with the steering control network model, the driving state data at the corresponding next sampling instant;
Step Sb13: calculating the error between the prediction of the steering control network model and the set desired value;
Step Sb14: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step Sb15: repeating steps Sb12-Sb14 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set desired value falls within a set range.
Further, when the driving state data and vehicle control inputs of the intelligent vehicle are acquired in step Sb11, data covering many different states can be obtained by continuously varying the driving state of the vehicle. The acquired driving state data include vehicle-related data and road-related data. The vehicle-related data include the speed and acceleration; the road-related data include the lateral offset from the road and the angle between the vehicle heading and the road direction. Road-related data are acquired by one or a combination of a camera, a lidar and GPS; vehicle-related data are acquired by the vehicle's IMU and/or a wheel encoder.
The acquired driving state data and vehicle control inputs are preprocessed. Preprocessing specifically includes denoising, screening out abnormal data, and repeated-state information processing. It is needed because noise and outliers inevitably arise during data collection owing to the sensors and the vehicle itself; the data are therefore denoised according to the characteristics of the noise source, outliers are screened out, and repeated-state information is processed. Repeated-state information processing works as follows: the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset is calculated and compared with a set threshold; if the difference value is greater than the threshold, the collected driving state data and vehicle control input are added to the training dataset, otherwise they are discarded.
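The description says only that denoising follows the characteristics of the noise source. As one hedged possibility, each signal could first have outliers screened out with a median/MAD (modified z-score) test and then be smoothed with a centered moving average; both techniques and their parameters (window width 3, cutoff 3.5) are assumptions, not the patent's stated method.

```python
import statistics
from typing import List, Sequence

def moving_average(signal: Sequence[float], width: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def outlier_mask(signal: Sequence[float], k: float = 3.5) -> List[bool]:
    """Flag samples whose modified z-score (median/MAD based) exceeds k."""
    med = statistics.median(signal)
    mad = statistics.median(abs(v - med) for v in signal)
    if mad == 0:
        return [False] * len(signal)   # no spread to estimate from: keep everything
    return [0.6745 * abs(v - med) / mad > k for v in signal]

def clean(signal: Sequence[float], width: int = 3, k: float = 3.5) -> List[float]:
    # Screen out abnormal samples first, then smooth what remains.
    kept = [v for v, bad in zip(signal, outlier_mask(signal, k)) if not bad]
    return moving_average(kept, width)
```

In a real pipeline an abnormal reading would more likely remove the whole sample (state and control at that instant) rather than one channel, so that input/target pairs stay aligned in time.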
Specifically, the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset is calculated with the function shown in formula (2):
where x_s^j is the collected vehicle speed, x_t^j is the vehicle speed in the training dataset, x_os is the collected angle between the vehicle heading and the road direction, x_ot is the angle between the vehicle heading and the road direction in the training dataset, u_s is the collected vehicle control input, u_t is the vehicle control input in the training dataset, and m is the index of the training data.
Referring to Fig. 1, which shows the flow of the policy-iteration-based adaptive dynamic programming algorithm of an embodiment of the present invention: in this embodiment the steering control network model is taken as the controlled object, and a policy iteration algorithm is implemented with the evaluation network and the execution network to obtain an optimized execution network. The input of the steering network model is the driving state data and vehicle control input at the current sampling instant, and its output is the driving state data at the next sampling instant. The evaluation network is built on a neural network and assesses the performance index at the current sampling instant; its inputs are the output of the steering network model and the output of the execution network. The execution network must be initialized before the algorithm is run (setting its weights to small positive numbers is sufficient). Using the output of the evaluation network as feedback, the weights of the execution network are updated iteratively with stochastic gradient descent or a similar algorithm so as to reduce the output of the evaluation network.
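The execution-network update described here (use the evaluation network's output as feedback and run stochastic gradient descent on the execution network's weights) can be sketched under strong simplifications: a linear actor u = -Wx, a fixed quadratic critic V(x) = x^T P x with a hypothetical P, a hypothetical linear model, and the quadratic utility with Q = I, R = 0.5I. Each SGD step lowers U(x, u) + V(F(x, u)) for a randomly sampled state, and the weights start as small positive numbers, as the text suggests. These choices are illustrative assumptions, not the patent's network design.

```python
import random
from typing import List

A = [[1.0, 0.5], [0.0, 1.0]]   # hypothetical linear stand-in for F(x, u) = A x + B u
B = [0.0, 0.1]                 # single steering input
R = 0.5                        # R = 0.5 I (scalar for one input)
P = [[2.0, 0.5], [0.5, 3.0]]   # hypothetical critic V(x) = x^T P x from a previous evaluation

def actor(W: List[float], x: List[float]) -> float:
    # Linear actor u = -W x (the patent uses a neural network here).
    return -(W[0] * x[0] + W[1] * x[1])

def actor_gradient(W: List[float], x: List[float]) -> List[float]:
    """Gradient of U(x, u) + V(F(x, u)) with respect to W; the x^T Q x term
    does not depend on W and drops out."""
    u = actor(W, x)
    x_next = [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u for i in range(2)]
    # dJ/du = 2 R u + 2 B^T P x_next, and du/dW_j = -x_j
    dj_du = 2 * R * u + 2 * sum(B[i] * P[i][j] * x_next[j] for i in range(2) for j in range(2))
    return [-dj_du * x[0], -dj_du * x[1]]

def train_actor(steps: int = 5000, lr: float = 0.5, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    W = [0.01, 0.01]                                   # small positive initial weights
    for _ in range(steps):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]   # sampled driving state
        g = actor_gradient(W, x)
        W = [w - lr * gj for w, gj in zip(W, g)]
    return W
```

Because the optimal control for every state is -W*x with W* = (R + B^T P B)^{-1} B^T P A, the SGD iterate converges to that closed-form gain, which gives a convenient correctness check.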
Further, in this embodiment shown in the Efficiency Function such as formula (3) of evaluation network:
Wherein, Q=I, R=0.5I, I are unit matrix, and T is transposition symbol, xkFor actual transport condition data, ukFor reality
The vehicle control amount on border;And then the performance indicator that network can be based on formula (4) evaluation current acquisition moment is evaluated,
where $V(x_k)$ is the performance index function at the current sampling instant, $\upsilon_i(x_{k+l})$ is the vehicle control quantity given by the current control policy $\upsilon_i$ at state $x_{k+l}$, $x$ is the driving state data of the vehicle, $k$ is the index of the current sampling instant, and $l$ is the time index. As $l \to \infty$, the vehicle heading becomes parallel to the road direction and $U(x_{k+l}, \upsilon_i(x_{k+l})) \to 0$.
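The utility and performance index can be computed directly from recorded data. The sketch below assumes the quadratic utility $U(x_k,u_k) = x_k^{T} Q x_k + u_k^{T} R u_k$ implied by the symbols listed for formula (3) ($Q = I$, $R = 0.5I$, transposition $T$), and approximates the infinite sum of formula (4) by a finite recorded horizon. Because $U \to 0$ as the heading aligns with the road, the truncation error shrinks for trajectories that settle. Function names and the example trajectory are hypothetical.

```python
def utility(x, u, r_scale=0.5):
    """Assumed quadratic utility U(x, u) = x^T Q x + u^T R u, with Q = I and R = r_scale * I."""
    return sum(xi * xi for xi in x) + r_scale * sum(ui * ui for ui in u)


def performance_index(states, controls):
    """Finite-horizon approximation of V(x_k): sum of U along a recorded trajectory."""
    if len(states) != len(controls):
        raise ValueError("states and controls must have the same length")
    return sum(utility(x, u) for x, u in zip(states, controls))


# Hypothetical trajectory: [lateral offset, heading angle] states and one steering control
states = [[0.2, 0.1], [0.1, 0.05], [0.0, 0.0]]
controls = [[0.4], [0.2], [0.0]]
total = performance_index(states, controls)  # 0.13 + 0.0325 + 0.0 = 0.1625
```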
In the embodiment of the present invention, while the policy-iteration adaptive dynamic programming algorithm is executed, the execution network is optimized based on the functions shown in formulas (5) and (6):
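$$V_i(x_k) = U\big(x_k, \upsilon_i(x_k)\big) + V_i(x_{k+1}) \tag{5}$$
$$\upsilon_{i+1}(x_k) = \arg\min_{u_k} \big\{ U(x_k, u_k) + V_i(x_{k+1}) \big\}, \quad x_{k+1} = F(x_k, u_k) \tag{6}$$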
where $V_i(x_k)$ is the performance index function under control policy $\upsilon_i$, $U(x_k, \upsilon_i(x_k))$ is the utility function at the current sampling instant, $V_i(x_{k+1})$ is the performance index function at the next state under the same policy, $\upsilon_i(x_k)$ is the vehicle control quantity under the current control policy at the current sampling instant, $\upsilon_{i+1}(x_k)$ is the vehicle control quantity at the current sampling instant after the control network has been iterated, $u_k$ is the actual vehicle control quantity, $\upsilon_k$ is the vehicle control quantity at the current sampling instant, $i$ is the iteration number, $x_k$ is the actual driving state data, and $F(x_k, u_k)$ is the abstract function of the steering control network model.
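Formulas (5) and (6) are the policy-evaluation and policy-improvement halves of policy iteration. To make the alternation concrete, the sketch below swaps the neural networks for a hypothetical scalar linear model $x_{k+1} = a x_k + b u_k$ with a linear policy $u_k = -K x_k$ and a quadratic value $V_i(x) = p_i x^2$, where both steps have closed forms. It assumes policy improvement takes the usual argmin over $U(x_k,u_k) + V_i(F(x_k,u_k))$; the coefficients are illustrative, not from the patent.

```python
def evaluate_policy(K, a, b, q, r):
    """Policy evaluation (5): V_i(x) = U(x, -K x) + V_i(a x + b u) with V_i(x) = p x^2.

    Matching coefficients gives p = q + r K^2 + p (a - b K)^2, which has a
    finite solution only when the closed loop is stable (|a - b K| < 1).
    """
    closed_loop = a - b * K
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is not stabilizing; start from an admissible policy")
    return (q + r * K * K) / (1.0 - closed_loop * closed_loop)


def improve_policy(p, a, b, r):
    """Policy improvement (6): argmin_u r u^2 + p (a x + b u)^2 gives u = -K x."""
    return p * a * b / (r + p * b * b)


def policy_iteration(K0, a=1.0, b=0.5, q=1.0, r=0.5, iters=50, tol=1e-12):
    """Alternate evaluation and improvement until the gain stops changing."""
    K = K0
    for _ in range(iters):
        p = evaluate_policy(K, a, b, q, r)
        K_next = improve_policy(p, a, b, r)
        if abs(K_next - K) < tol:
            return K_next, p
        K = K_next
    return K, evaluate_policy(K, a, b, q, r)


K_star, p_star = policy_iteration(K0=0.5)  # converges to K = 1.0, p = 2.0
```

Starting from the stabilizing gain $K_0 = 0.5$, the iteration converges to $K = 1$, $p = 2$, the solution of the scalar discrete-time Riccati equation for these coefficients. An unstable starting gain raises an error, reflecting the usual requirement that policy iteration begin from an admissible policy.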
Fig. 2 is a flowchart of the decision-device mechanism of an embodiment of the present invention. With reference to the decision-device mechanism shown in Fig. 2, the specific steps of another embodiment of the policy-iteration-based intelligent vehicle steering control method are described below:
Collect the driving state data and vehicle control quantities of the intelligent vehicle;
Denoise the collected data, screen out outliers, and perform repeated-state information processing;
Construct a training data pool from the processed driving data;
Train the steering control network model offline using the data in the training data pool;
While the vehicle is driving, collect the vehicle's driving data and the output data of the vehicle steering control network model;
Introduce the decision-device mechanism: compute the contrast function value E between the driving state data collected within a set time length and the driving state data at the corresponding next sampling instants predicted by the steering control network model, and preset a trigger threshold T for the decision device. When E is greater than T, add the driving state data and vehicle control quantities measured during this period to the training data pool, and train the steering control network model online on that pool; otherwise, the steering control network model keeps its current state;
Taking the steering control network model as the controlled object, construct an evaluation network and an execution network to implement the policy-iteration adaptive dynamic programming algorithm, obtain the optimized execution network, and control the steering of the intelligent vehicle with the vehicle control quantity output by the execution network.
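The decision-device step can be sketched as follows. The code assumes $E$ is the mean squared error between measured and predicted states over the window; this is an illustrative choice, not the patent's definition of the contrast function. The function names, the pool layout (a list of (state, control) pairs) and the example numbers are hypothetical.

```python
def contrast_value(measured, predicted):
    """Assumed E: mean squared error between measured and predicted states over a window."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty windows of equal length")
    total = 0.0
    for x_meas, x_pred in zip(measured, predicted):
        total += sum((s - p) ** 2 for s, p in zip(x_meas, x_pred))
    return total / len(measured)


def decision_device(measured, controls, predicted, pool, threshold):
    """If E > T, append the window's (state, control) pairs to the pool and request retraining."""
    if contrast_value(measured, predicted) > threshold:
        pool.extend(zip(measured, controls))
        return True  # caller should now run online training on the pool
    return False  # model keeps its current state


pool = []
measured = [[1.0, 0.0], [1.1, 0.1]]   # recorded states over a window of two samples
predicted = [[1.0, 0.0], [1.0, 0.0]]  # model predictions for the same instants
controls = [[0.2], [0.3]]
retrain = decision_device(measured, controls, predicted, pool, threshold=0.005)  # E = 0.01 > T
```

Keeping the check separate from the training routine means the model is only retrained when its predictions drift, which matches the on-demand online training described in the steps above.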
It should be noted that the entire algorithm in the embodiment of the present invention is implemented on the computing device on board the unmanned vehicle, and the decision device decides whether the detected data are added to the training data pool and whether training of the model network is restarted.
Those skilled in the art should recognize that the method steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in electronic hardware or in software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process or method comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process or method.
The technical solution of the present invention has now been described in connection with the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the present invention.
Claims (9)
1. An intelligent vehicle steering control method based on policy iteration, characterized by comprising:
Collecting the driving state data and vehicle control quantities of the vehicle;
Predicting, by a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control quantity at the current sampling instant, the steering control network model being a neural network model trained offline on a preset training dataset;
Calculating a contrast function value from each piece of driving state data collected within a predetermined time period and the corresponding driving state data at the next sampling instant predicted by the steering control network model;
If the contrast function value is greater than a preset threshold, adding the driving state data and vehicle control quantities collected within the predetermined time period to the preset training dataset, and training the steering control network model online based on the training dataset;
Taking the steering control network model as the controlled object, implementing the policy-iteration adaptive dynamic programming algorithm based on an evaluation network and an execution network to obtain an optimized execution network;
Controlling the steering of the intelligent vehicle based on the vehicle control quantity output by the execution network.
2. The intelligent vehicle steering control method based on policy iteration according to claim 1, characterized in that the step of "calculating a contrast function value from each piece of driving state data collected within the predetermined time period and the corresponding driving state data at the next sampling instant predicted by the steering control network model" comprises calculating the contrast function value according to the contrast function E shown below:
where $x_s$ is the collected driving state data, $x$ is the driving state data predicted by the steering control network model, $c$ is the index of the driving state data, $a$ is the set length of the comparison sequence, and $b$ is the index of the compared data.
3. The intelligent vehicle steering control method based on policy iteration according to claim 2, characterized in that the online training of the steering control network model comprises:
Step S101: based on the driving state data and vehicle control quantities in the training dataset, predict, by the steering control network model, the driving state data at the corresponding next sampling instants;
Step S102: calculate the error between the prediction results of the steering control network model and the set expected values;
Step S103: optimize the weights of the steering control network model using the error backpropagation algorithm;
Step S104: repeat steps S101-S103 until the maximum number of iterations is reached or the error between the prediction results of the steering control network model and the set expected values is within a certain range.
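Steps S101 to S104 describe an ordinary supervised training loop. The sketch below shows one way to write it in plain Python, assuming a hypothetical one-hidden-layer tanh network that maps the concatenated state and control at instant $k$ to the predicted state at $k+1$, trained by per-sample gradient descent on the squared error (error backpropagation). Layer sizes, learning rate, iteration cap, tolerance and the example data are illustrative values, not taken from the patent.

```python
import math
import random


class TinyModel:
    """Hypothetical one-hidden-layer tanh network mapping (x_k, u_k) to x_{k+1}."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, z):
        h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def backprop(self, z, target, lr):
        """One gradient step on 0.5 * ||y - target||^2 (step S103)."""
        h, y = self.forward(z)
        err = [yo - to for yo, to in zip(y, target)]  # step S102: prediction error
        # Hidden-layer deltas, computed with the output weights before they change.
        dh = [(1.0 - h[j] ** 2) * sum(err[o] * self.w2[o][j] for o in range(len(err)))
              for j in range(len(h))]
        for o, e in enumerate(err):
            for j in range(len(h)):
                self.w2[o][j] -= lr * e * h[j]
            self.b2[o] -= lr * e
        for j, d in enumerate(dh):
            for i in range(len(z)):
                self.w1[j][i] -= lr * d * z[i]
            self.b1[j] -= lr * d


def evaluate(model, dataset):
    """Mean squared prediction error over (x_k, u_k, x_{k+1}) samples."""
    total = 0.0
    for x, u, x_next in dataset:
        _, y = model.forward(list(x) + list(u))  # step S101: predict next state
        total += sum((yo - to) ** 2 for yo, to in zip(y, x_next))
    return total / len(dataset)


def train(model, dataset, lr=0.05, max_iters=500, tol=1e-4):
    """Repeat S101-S103 until max_iters or the error falls below tol (step S104)."""
    mse = evaluate(model, dataset)
    for _ in range(max_iters):
        if mse < tol:
            break
        for x, u, x_next in dataset:
            model.backprop(list(x) + list(u), x_next, lr)
        mse = evaluate(model, dataset)
    return mse


# Hypothetical data from x_{k+1} = 0.9 x_k + 0.1 u_k (scalar state and control)
data = [([x], [u], [0.9 * x + 0.1 * u])
        for x in (-1.0, -0.5, 0.0, 0.5, 1.0) for u in (-1.0, 0.0, 1.0)]
model = TinyModel(n_in=2, n_hidden=4, n_out=1, seed=0)
before = evaluate(model, data)
after = train(model, data)
```

The same loop covers the offline training of claim 6 (steps S202 to S205); only the dataset it is run on differs.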
4. The intelligent vehicle steering control method based on policy iteration according to claim 3, characterized in that the utility function of the evaluation network is:
$$U(x_k, u_k) = x_k^{T} Q x_k + u_k^{T} R u_k$$
where $Q = I$, $R = 0.5I$, $I$ is the identity matrix, $T$ denotes transposition, $x_k$ is the actual driving state data, and $u_k$ is the actual vehicle control quantity;
the performance index function at the current sampling instant, evaluated by the evaluation network, is:
$$V(x_k) = \sum_{l=0}^{\infty} U\big(x_{k+l}, \upsilon_i(x_{k+l})\big)$$
where $V(x_k)$ is the performance index function at the current sampling instant, $\upsilon_i(x_{k+l})$ is the vehicle control quantity given by the current control policy at state $x_{k+l}$, $x$ is the driving state data of the vehicle, $k$ is the index of the current sampling instant, and $l$ is the time index; as $l \to \infty$, the vehicle heading becomes parallel to the road direction and $U(x_{k+l}, \upsilon_i(x_{k+l})) \to 0$.
5. The intelligent vehicle steering control method based on policy iteration according to claim 4, characterized in that, while the policy-iteration adaptive dynamic programming algorithm is implemented with the evaluation network and the execution network, the execution network is optimized based on the functions shown below:
$$V_i(x_k) = U\big(x_k, \upsilon_i(x_k)\big) + V_i(x_{k+1})$$
$$\upsilon_{i+1}(x_k) = \arg\min_{u_k} \big\{ U(x_k, u_k) + V_i(x_{k+1}) \big\}, \quad x_{k+1} = F(x_k, u_k)$$
where $V_i(x_k)$ is the performance index function under control policy $\upsilon_i$, $U(x_k, \upsilon_i(x_k))$ is the utility function at the current sampling instant, $V_i(x_{k+1})$ is the performance index function at the next state under the same policy, $\upsilon_i(x_k)$ is the vehicle control quantity under the current control policy at the current sampling instant, $\upsilon_{i+1}(x_k)$ is the vehicle control quantity at the current sampling instant after the control network has been iterated, $u_k$ is the actual vehicle control quantity, $\upsilon_k$ is the vehicle control quantity at the current sampling instant, $i$ is the iteration number, $x_k$ is the actual driving state data, and $F(x_k, u_k)$ is the abstract function of the steering control network model.
6. The intelligent vehicle steering control method based on policy iteration according to claim 1, characterized in that the offline training of the steering control network model comprises:
Step S201: collect and preprocess the driving state data and vehicle control quantities of the vehicle to obtain the training dataset;
Step S202: based on the driving state data and vehicle control quantities in the training dataset, predict, by the steering control network model, the driving state data at the corresponding next sampling instants;
Step S203: calculate the error between the prediction results of the steering control network model and the set expected values;
Step S204: optimize the weights of the steering control network model using the error backpropagation algorithm;
Step S205: repeat steps S202-S204 until the maximum number of iterations is reached or the error between the prediction results of the steering control network model and the set expected values is within a certain range.
7. The intelligent vehicle steering control method based on policy iteration according to claim 6, characterized in that the preprocessing comprises: denoising, screening out abnormal data, and repeated-state information processing;
the repeated-state information processing comprises: calculating a difference value between the collected driving state data and vehicle control quantity and the driving state data and vehicle control quantities in the training dataset, and comparing the difference value with a set threshold; if the difference value is greater than the set threshold, adding the collected driving state data and vehicle control quantity to the training dataset.
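The repeated-state check in claim 7 keeps a new sample only if it differs enough from what the training set already holds. The sketch below assumes each sample is a (speed, heading-to-road angle, control) triple, using the quantities named in claim 8, and assumes the difference value is the smallest L1 distance to any stored sample; the actual formula of claim 8 may weight or normalize these terms differently. Function names and numbers are hypothetical.

```python
def difference_value(sample, dataset):
    """Assumed metric: smallest L1 distance from the new sample to any stored sample."""
    if not dataset:
        return float("inf")  # an empty training set treats every sample as new
    return min(sum(abs(a - b) for a, b in zip(sample, stored)) for stored in dataset)


def add_if_novel(sample, dataset, threshold):
    """Add the sample only when its difference value exceeds the set threshold."""
    if difference_value(sample, dataset) > threshold:
        dataset.append(sample)
        return True
    return False


# Hypothetical (speed, heading-to-road angle, control) samples
training_set = [(10.0, 0.05, 0.1)]
add_if_novel((10.1, 0.05, 0.1), training_set, threshold=0.5)  # near-duplicate, skipped
add_if_novel((12.0, 0.0, 0.2), training_set, threshold=0.5)   # new state, stored
```

Filtering this way keeps the training set from filling up with near-identical straight-line driving samples, so retraining time is spent on states the model has not seen.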
8. The intelligent vehicle steering control method based on policy iteration according to claim 7, characterized in that, in the step of "calculating a difference value between the collected driving state data and vehicle control quantity and the driving state data and vehicle control quantities in the training dataset", the difference value is calculated by the function shown below:
where $x_s^j$ is the collected vehicle speed, $x_t^j$ is the speed in the training dataset, $x_{os}$ is the collected angle between the vehicle heading and the road direction, $x_{ot}$ is the angle between the vehicle heading and the road direction in the training dataset, $u_s$ is the collected vehicle control quantity, $u_t$ is the vehicle control quantity in the training dataset, and $m$ is the index of the training data.
9. The intelligent vehicle steering control method based on policy iteration according to any one of claims 1 to 8, characterized in that the driving state data comprise the speed, the acceleration, the lateral offset from the road, and the angle between the vehicle heading and the road direction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810597914.9A CN108909833B (en) | 2018-06-11 | 2018-06-11 | Intelligent automobile steering control method based on strategy iteration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108909833A (en) | 2018-11-30 |
CN108909833B CN108909833B (en) | 2020-07-28 |
Family
ID=64418858
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810597914.9A Active CN108909833B (en) | 2018-06-11 | 2018-06-11 | Intelligent automobile steering control method based on strategy iteration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108909833B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110481536A (en) * | 2019-07-03 | 2019-11-22 | 中国科学院深圳先进技术研究院 | A kind of control method and equipment applied to hybrid vehicle |
WO2022115987A1 (en) * | 2020-12-01 | 2022-06-09 | 浙江吉利控股集团有限公司 | Method and system for automatic driving data collection and closed-loop management |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103097213A (en) * | 2010-09-09 | 2013-05-08 | 大陆-特韦斯贸易合伙股份公司及两合公司 | Determination of steering angle for a motor vehicle |
CN104392212A (en) * | 2014-11-14 | 2015-03-04 | 北京工业大学 | Method for detecting road information and identifying forward vehicles based on vision |
CN105243461A (en) * | 2015-11-20 | 2016-01-13 | 江苏省电力公司 | Short-term load forecasting method based on artificial neural network improved training strategy |
CN107092256A (en) * | 2017-05-27 | 2017-08-25 | 中国科学院自动化研究所 | A kind of unmanned vehicle rotating direction control method |
CN107203134A (en) * | 2017-06-02 | 2017-09-26 | 浙江零跑科技有限公司 | A kind of front truck follower method based on depth convolutional neural networks |
CN107438873A (en) * | 2017-07-07 | 2017-12-05 | 驭势科技(北京)有限公司 | A kind of method and apparatus for being used to control vehicle to travel |
Also Published As
Publication number | Publication date |
---|---|
CN108909833B (en) | 2020-07-28 |
Similar Documents
Publication | Title |
---|---|
CN113805572B (en) | Method and device for motion planning |
WO2020052587A1 (en) | System and method for hierarchical planning in autonomous vehicles |
US11237562B2 (en) | System and method for avoiding contact between autonomous and manned vehicles caused by loss of traction |
CN105700538B (en) | Track follower method based on neural network and pid algorithm |
EP3782143B1 (en) | Method and system for multimodal deep traffic signal control |
EP3588226B1 (en) | Method and arrangement for generating control commands for an autonomous road vehicle |
CN107697070A (en) | Driving behavior Forecasting Methodology and device, unmanned vehicle |
CN107264534A (en) | Intelligent driving control system and method, vehicle based on driver experience's model |
CN108909833A (en) | Intelligent automobile rotating direction control method based on Policy iteration |
EP3800521A1 (en) | Deep learning based motion control of a vehicle |
CN111391831B (en) | Automobile following speed control method and system based on preceding automobile speed prediction |
CN107092256A (en) | A kind of unmanned vehicle rotating direction control method |
CN110456634A (en) | A kind of unmanned vehicle control parameter choosing method based on artificial neural network |
CN111930112A (en) | Intelligent vehicle path tracking control method and system based on MPC |
CN110879595A (en) | Unmanned mine card tracking control system and method based on deep reinforcement learning |
WO2017212508A1 (en) | Control objective integration system, control objective integration method and control objective integration program |
CN113465625B (en) | Local path planning method and device |
Yang et al. | An intelligent predictive control approach to path tracking problem of autonomous mobile robot |
Moghadam et al. | A deep reinforcement learning approach for long-term short-term planning on frenet frame |
Mirchevska et al. | Amortized Q-learning with model-based action proposals for autonomous driving on highways |
Han et al. | Reinforcement learning guided by double replay memory |
DE112021006148T5 (en) | Method and system for determining a motion model for motion prediction when controlling autonomous vehicles |
CN109712424A (en) | A kind of automobile navigation method based on Internet of Things |
US20230177405A1 (en) | Ensemble of narrow ai agents |
Tan et al. | A real-world application of lane-guidance technologies—Automated snowblower |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |