CN108909833A - Intelligent vehicle steering control method based on policy iteration - Google Patents


Info

Publication number
CN108909833A
Authority
CN
China
Legal status
Granted
Application number
CN201810597914.9A
Other languages
Chinese (zh)
Other versions
CN108909833B (en)
Inventor
汤淑明
卢晓昀
朱海兵
杜清秀
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201810597914.9A
Publication of CN108909833A
Application granted
Publication of CN108909833B
Legal status: Active

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B62 LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62D MOTOR VEHICLES; TRAILERS
    • B62D6/00 Arrangements for automatically controlling steering depending on driving conditions sensed and responded to, e.g. control circuits
    • B62D6/001 Arrangements for automatically controlling steering depending on driving conditions sensed and responded to, e.g. control circuits the torque NOT being among the input parameters
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B62 LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62D MOTOR VEHICLES; TRAILERS
    • B62D15/00 Steering not otherwise provided for
    • B62D15/02 Steering position indicators; Steering position determination; Steering aids
    • B62D15/025 Active steering aids, e.g. helping the driver by actively influencing the steering system after environment evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention belongs to the technical field of automatic vehicle driving and specifically relates to an intelligent vehicle steering control method based on policy iteration. It aims to solve the problem of how to improve the online autonomous learning ability of driverless intelligent vehicles in steering control. To this end, the method comprises: collecting the driving state data and vehicle control inputs of the vehicle; predicting, with a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control input at the current sampling instant; controlling the online training of the steering control network model with a decision mechanism; taking the steering control network model as the controlled object and implementing a policy iteration algorithm with an evaluation (critic) network and an execution (actor) network to obtain an optimized execution network; and controlling the steering of the intelligent vehicle with the vehicle control input output by the execution network. The method improves the real-time performance of model training and the adaptability to the current environment.

Description

Intelligent vehicle steering control method based on policy iteration
Technical field
The invention belongs to the technical field of automatic vehicle driving and specifically relates to an intelligent vehicle steering control method based on policy iteration.
Background art
Driverless operation of intelligent vehicles is a technology that encompasses environment perception, path planning, and autonomous vehicle control. Research shows that driverless technology can bring disruptive improvements in areas such as highway safety, traffic congestion, and air pollution. A driverless vehicle perceives changes in the road environment and in its own driving state through sensors and, using driverless vehicle control technology, computes the optimal vehicle control from the driving state and road environment. Research on intelligent vehicles will reduce the probability of traffic accidents caused by driver negligence and free drivers from long periods of concentrated driving. At present, considerable progress has been made abroad in driverless vehicles; domestic research institutions and universities have begun related research and achieved some results, but a gap remains compared with driverless technology abroad.
Autonomous steering of a vehicle on the road is one of the research topics of Level 2 (partial) automated driving within the levels of driving automation. A main research topic in autonomous steering is how to compute the steering angle from the road environment perceived by detectors and from the vehicle's driving state, and then to control the steering actuator inside the vehicle to adjust its heading. Vehicle steering control is strongly nonlinear, and neural networks are a tool well suited to realizing nonlinear mappings. Training a neural network, however, requires a large amount of data, and a good model network is generally obtained only when the data cover the entire state-action space. Network training methods fall into two kinds: offline training and online training. Online training trains the vehicle's steering model by continuously collecting its steering control information. Its drawbacks are that, in the initial stage, the lack of data makes it difficult to initialize the steering control of the vehicle, and that the continuously growing training data degrade the real-time performance of building the neural network model. Offline training, because its training data are fixed, sometimes cannot provide effective control in complex, changing environments; to obtain a good training effect, the samples must cover the entire state-action space as far as possible, which means paying a high training cost.
Summary of the invention
To solve the above problem in the prior art, namely how to improve the online autonomous learning ability of driverless intelligent vehicles in steering control, the present invention provides an intelligent vehicle steering control method based on policy iteration, comprising:
collecting the driving state data and vehicle control inputs of the vehicle;
predicting, with a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control input at the current sampling instant, wherein the steering control network model is a neural network model trained offline on a preset training dataset;
computing a comparison function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model;
if the comparison function value is greater than a preset threshold, adding the driving state data and vehicle control inputs collected within the preset time period to the preset training dataset, and training the steering control network model online on that training dataset;
taking the steering control network model as the controlled object, implementing a policy-iteration adaptive dynamic programming algorithm with an evaluation network and an execution network to obtain an optimized execution network;
controlling the steering of the intelligent vehicle with the vehicle control input output by the execution network.
Further, the step of 'computing a comparison function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model' comprises computing the comparison function value with the comparison function E given by the following formula:
where x_s is the collected driving state data, x is the driving state data predicted by the steering control network model, c is the index of the driving state data, a is the set length of the comparison sequence, and b is the comparison data index.
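The formula for the comparison (contrast) function E is not reproduced in this text. Purely as an illustration, the sketch below assumes one plausible form: the squared prediction error summed over the state components c and averaged over a comparison window of a samples indexed by b. The function names `contrast_value` and `needs_online_training` are hypothetical, and the threshold rule is the one stated in the method steps.

```python
from typing import Sequence

State = Sequence[float]  # one driving-state vector, e.g. speed, acceleration, lateral offset, heading angle


def contrast_value(measured: Sequence[State], predicted: Sequence[State]) -> float:
    """Hypothetical comparison (contrast) function E.

    ASSUMPTION: E = (1/a) * sum_b sum_c (x_s[b][c] - x[b][c])**2, i.e. squared error
    over the state components c, averaged over a window of a samples.
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("both windows must be non-empty and have the same length a")
    a = len(measured)
    total = 0.0
    for x_s, x_hat in zip(measured, predicted):  # b = 1..a
        if len(x_s) != len(x_hat):
            raise ValueError("state vectors must have the same dimension")
        total += sum((s - h) ** 2 for s, h in zip(x_s, x_hat))  # sum over c
    return total / a


def needs_online_training(measured: Sequence[State], predicted: Sequence[State],
                          threshold: float) -> bool:
    """Decision rule: retrain online only if E exceeds the preset threshold."""
    return contrast_value(measured, predicted) > threshold


if __name__ == "__main__":
    measured = [[1.0, 0.0], [2.0, 0.0]]
    predicted = [[1.0, 0.0], [1.0, 0.0]]
    print(contrast_value(measured, predicted))              # 0.5
    print(needs_online_training(measured, predicted, 0.4))  # True
```

Under this assumed form, a longer window a averages out sensor noise but reacts more slowly when the environment changes, which is the trade-off the detailed description attributes to the choice of a.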
Further, the online training of the steering control network model comprises:
Step S101: the steering control network model predicts the driving state data at the corresponding next sampling instant from the driving state data and vehicle control inputs in the training dataset;
Step S102: computing the error between the prediction of the steering control network model and the set expectation;
Step S103: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step S104: repeating steps S101-S103 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expectation falls within a given range.
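Steps S101-S104 can be sketched with NumPy as follows, assuming a one-hidden-layer tanh network that maps [x_k, u_k] to x_{k+1} and a mean-squared-error criterion. The network size, learning rate, tolerance, function names, and the linear toy dynamics used to generate data are illustrative assumptions, not values from the patent.

```python
import numpy as np


def train_steering_model(X, Y, hidden=16, lr=0.05, max_iters=5000, tol=1e-4, seed=0):
    """Backpropagation training of a hypothetical steering network model.

    X: (n, state_dim + control_dim) array of [x_k, u_k]; Y: (n, state_dim) array of x_{k+1}.
    Returns the parameters and the per-iteration mean-squared-error history.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    n_out = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(max_iters):
        # S101: predict the next driving state from the training data.
        H = np.tanh(X @ W1 + b1)
        Y_hat = H @ W2 + b2
        # S102: error between the prediction and the desired (measured) next state.
        err = Y_hat - Y
        losses.append(float(np.mean(err ** 2)))
        # S104: stop once the error is within the tolerance.
        if losses[-1] < tol:
            break
        # S103: backpropagate the MSE gradient and take a gradient-descent step.
        d_out = 2.0 * err / err.size
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ d_hidden
        db1 = d_hidden.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


if __name__ == "__main__":
    # Toy data (ASSUMPTION): state = [lateral offset, heading angle], control = steering.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, (200, 3))
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Y = X[:, :2] @ A.T + X[:, 2:] @ B.T
    params, losses = train_steering_model(X, Y)
    print(len(losses), losses[0], losses[-1])
```

The same routine would serve for the offline training steps, since they differ only in how the training dataset is assembled.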
Further, the utility function of the evaluation network is:
$$U(x_k,u_k)=x_k^{T}Qx_k+u_k^{T}Ru_k$$
where Q = I and R = 0.5I, I is the identity matrix, T denotes transposition, x_k is the actual driving state data, and u_k is the actual vehicle control input;
the performance index at the current sampling instant, evaluated by the evaluation network, is:
$$V(x_k)=\sum_{l=0}^{\infty}U\big(x_{k+l},\upsilon_i(x_{k+l})\big)$$
where V(x_k) is the performance index function at the current sampling instant, υ_i(x_{k+l}) is the vehicle control input given by the i-th control policy at sampling instant k+l, x is the driving state data of the vehicle, k is the index of the current sampling instant, and l is the time offset; as l → ∞, the vehicle heading becomes parallel to the road direction and U(x_{k+l}, υ_i(x_{k+l})) → 0.
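The quadratic utility implied by Q = I and R = 0.5I, U(x_k, u_k) = x_kᵀQx_k + u_kᵀRu_k, and a performance index truncated to a finite recorded trajectory can be sketched as follows. Truncating the infinite sum is an illustrative assumption; it is reasonable only because U tends to 0 as the heading becomes parallel to the road. The function names are hypothetical.

```python
import numpy as np


def utility(x, u, Q=None, R=None):
    """Quadratic utility U(x, u) = x^T Q x + u^T R u (defaults: Q = I, R = 0.5 I)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    Q = np.eye(x.size) if Q is None else np.asarray(Q, dtype=float)
    R = 0.5 * np.eye(u.size) if R is None else np.asarray(R, dtype=float)
    return float(x @ Q @ x + u @ R @ u)


def performance_index(states, controls):
    """V(x_k) approximated by summing U over a finite recorded trajectory (ASSUMPTION: truncation)."""
    if len(states) != len(controls):
        raise ValueError("states and controls must have the same length")
    return sum(utility(x, u) for x, u in zip(states, controls))


if __name__ == "__main__":
    # x = [lateral offset, heading angle], u = [steering]; values are made up for illustration.
    print(utility([0.2, 0.1], [0.4]))  # U = 0.04 + 0.01 + 0.5 * 0.16
    traj_x = [[0.2, 0.1], [0.1, 0.05], [0.0, 0.0]]
    traj_u = [[0.4], [0.2], [0.0]]
    print(performance_index(traj_x, traj_u))
```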
Further, in implementing the policy-iteration adaptive dynamic programming algorithm with the evaluation network and the execution network, the execution network is optimized based on the following functions:
$$V_i(x_k)=U\big(x_k,\upsilon_i(x_k)\big)+V_i(x_{k+1})$$
$$\upsilon_{i+1}(x_k)=\arg\min_{u_k}\Big\{U(x_k,u_k)+V_i(x_{k+1})\Big\},\qquad x_{k+1}=F(x_k,u_k)$$
where V_i(x_k) is the performance index function under the i-th control policy, U(x_k, υ_i(x_k)) is the utility function at the current sampling instant, V_i(x_{k+1}) is the performance index function at the next sampling instant, υ_i(x_k) is the vehicle control input under the current control policy at the current sampling instant, υ_{i+1}(x_k) is the vehicle control input at the current sampling instant after the control network is iterated, u_k is the actual vehicle control input, υ_k is the vehicle control input at the current sampling instant, i is the iteration number, x_k is the actual driving state data, and F(x_k, u_k) is the abstract function of the steering control network model.
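To make the policy-evaluation and policy-improvement steps concrete, the sketch below swaps the neural networks for a scalar linear-quadratic toy problem in which both steps have closed forms. ASSUMPTIONS: the steering model is F(x, u) = a·x + b·u, the utility is q·x² + r·u² (the scalar analogue of Q = I, R = 0.5I), the policy is linear, υ_i(x) = −K_i·x, and V_i(x) = p_i·x². This is classical policy iteration for the discrete-time LQR problem, not the patent's neural-network implementation, and the parameter values are made up.

```python
def policy_iteration(a=1.0, b=0.1, q=1.0, r=0.5, k0=1.0, max_iters=50, tol=1e-12):
    """Policy iteration for the toy problem x_{k+1} = a*x_k + b*u_k, U = q*x^2 + r*u^2.

    Returns (p, k, history): V(x) = p*x^2 of the last evaluated policy, the final gain k
    (control u = -k*x), and the (p_i, k_{i+1}) pairs produced at each iteration.
    """
    k = k0
    history = []
    for _ in range(max_iters):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            # V_i would be infinite: policy iteration needs a stabilizing initial policy.
            raise ValueError("current policy does not stabilize the model")
        # Policy evaluation, V_i(x) = U(x, v_i(x)) + V_i(F(x, v_i(x))):
        # p*x^2 = q*x^2 + r*k^2*x^2 + p*(a - b*k)^2*x^2
        p = (q + r * k * k) / (1.0 - closed_loop ** 2)
        # Policy improvement, v_{i+1}(x) = argmin_u [q*x^2 + r*u^2 + p*(a*x + b*u)^2]:
        # setting the derivative in u to zero gives u = -(p*a*b / (r + p*b^2)) * x
        k_new = p * a * b / (r + p * b * b)
        history.append((p, k_new))
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    return p, k, history


if __name__ == "__main__":
    p, k, history = policy_iteration()
    print(len(history), p, k)
    # For a = 1, b = 0.1, q = 1, r = 0.5 the Riccati solution is p* = (1 + sqrt(201)) / 2.
```

In this toy setting the iteration reaches the Riccati solution within a handful of steps; with neural networks, each closed-form step is instead replaced by training the evaluation and execution networks.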
Further, the offline training of the steering control network model comprises:
Step S201: collecting and preprocessing the driving state data and vehicle control inputs of the vehicle to obtain a training dataset;
Step S202: the steering control network model predicts the driving state data at the corresponding next sampling instant from the driving state data and vehicle control inputs in the training dataset;
Step S203: computing the error between the prediction of the steering control network model and the set expectation;
Step S204: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step S205: repeating steps S202-S204 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expectation falls within a given range.
Further, the preprocessing comprises denoising, screening out abnormal data, and repeated-state information processing;
the repeated-state information processing comprises: computing the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset, and comparing the difference value with a set threshold; if the difference value is greater than the set threshold, the collected driving state data and vehicle control input are added to the training dataset.
Further, in 'computing the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset', the difference value is computed with the function shown in the following formula:
where x_s^j is the collected vehicle speed, x_t^j is the vehicle speed in the training dataset, x_os is the collected angle between the vehicle heading and the road direction, x_ot is the angle between the vehicle heading and the road direction in the training dataset, u_s is the collected vehicle control input, u_t is the vehicle control input in the training dataset, and m is the index of the training data.
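The exact difference function is not reproduced in this text. The sketch below assumes, purely for illustration, a sum of absolute differences over speed, heading angle relative to the road, and control input, and treats a new sample as redundant if it lies within the threshold of any sample already in the training set; that is, it is added only if its difference from every stored sample m exceeds the threshold. The names and threshold value are hypothetical, and denoising and outlier screening are not shown.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    speed: float          # vehicle speed
    heading_angle: float  # angle between vehicle heading and road direction
    control: float        # vehicle control input (steering)


def difference(s: Sample, t: Sample) -> float:
    """ASSUMED difference value: sum of absolute differences of the three quantities."""
    return (abs(s.speed - t.speed)
            + abs(s.heading_angle - t.heading_angle)
            + abs(s.control - t.control))


def add_if_novel(pool: List[Sample], sample: Sample, threshold: float) -> bool:
    """Repeated-state processing: keep the sample only if it differs from every stored one."""
    if all(difference(sample, t) > threshold for t in pool):
        pool.append(sample)
        return True
    return False  # near-duplicate of an existing state: discard


if __name__ == "__main__":
    pool: List[Sample] = []
    print(add_if_novel(pool, Sample(10.0, 0.00, 0.0), 0.1))  # True (empty pool)
    print(add_if_novel(pool, Sample(10.0, 0.01, 0.0), 0.1))  # False (difference 0.01)
    print(add_if_novel(pool, Sample(12.0, 0.00, 0.0), 0.1))  # True (difference 2.0)
    print(len(pool))  # 2
```

Because speed and angle have different units, a practical version would likely normalize or weight each term before summing; that choice is not specified here.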
Further, the driving state data include the vehicle speed, acceleration, lateral offset from the road, and angle between the vehicle heading and the road direction.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
The intelligent vehicle steering control method based on policy iteration of the present invention uses a neural network model to achieve steering control of the intelligent vehicle. The method combines the advantages of offline and online training: the network can be trained without paying a huge training cost and initialized simply and effectively. It also overcomes the drawback of online training, namely that training data must be introduced continuously and the network retrained continuously. While ensuring that the samples cover the state-action space of the current environment as far as possible, it also avoids generating redundant data as far as possible, which improves the real-time performance of network training and the adaptability to the current environment.
Description of the drawings
Fig. 1 is a flow diagram of the policy-iteration-based adaptive dynamic programming algorithm according to an embodiment of the present invention;
Fig. 2 is a flow diagram of the decision-device mechanism according to an embodiment of the present invention.
Detailed description of embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principles of the invention and are not intended to limit its scope.
Traditional intelligent vehicle steering control methods require a mathematical model of vehicle steering control. To build it, all variables that affect steering control must be determined, such as the vehicle's own performance, its current speed and acceleration, and road parameters (including road curvature, road inclination, and friction coefficient). A multivariable nonlinear equation is then derived by experimental fitting or from physical laws, and determining the structure of the equation and each of its coefficients requires extensive experiments and measurements. Neural networks have good nonlinear mapping capability and can be trained online or offline. Online training, however, requires continuously collecting data and continuously training to obtain the neural network model, so it cannot properly initialize the vehicle's control output. Moreover, because sensor data are added to the neural network model without screening, a large amount of invalid, redundant data is produced, which degrades the real-time performance of model training; when the environment changes, these operations must be repeated, further amplifying the drawbacks of poor initialization of the control output and data redundancy. Offline training, because its training data are fixed, sometimes cannot provide effective control in complex, changing environments, and to obtain a good training effect the samples must cover the entire state-action space as far as possible, which entails a high training cost.
The intelligent vehicle steering control method based on policy iteration of the present invention combines the advantages of offline and online training: steering control is performed with a model trained offline, which gives a simple and effective way to obtain an initialized model network. At the same time, a decision mechanism is introduced while the vehicle is driving to control the online training of the model, which overcomes the drawback of online training that training data must be introduced continuously and the network retrained continuously. The method is described below with reference to the accompanying drawings.
The present invention represents the driving state of the vehicle by its driving state data, which include the vehicle speed, acceleration, lateral offset from the road, and angle between the vehicle heading and the road direction. The vehicle control input is the input of the vehicle control signal: in driverless mode, the intelligent vehicle continuously computes the control input for the next sampling instant and feeds it to its control system, thereby achieving steering control in driverless mode. Note that in this embodiment the driving state data and vehicle control input are sampled at equal time intervals; the specific interval can be adjusted to the actual situation.
An intelligent vehicle steering control method based on policy iteration according to one embodiment comprises:
collecting the driving state data and vehicle control inputs of the vehicle;
predicting, with the steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control input at the current sampling instant, the steering control network model being a neural network model trained offline on a preset training dataset;
computing a comparison function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model;
if the comparison function value is greater than a preset threshold, adding the driving state data and vehicle control inputs collected within the preset time period to the training dataset, and training the steering control network model online on that training dataset;
taking the steering control network model as the controlled object, implementing the policy-iteration adaptive dynamic programming algorithm with the evaluation network and the execution network to obtain an optimized execution network;
controlling the steering of the intelligent vehicle with the vehicle control input output by the execution network.
Further, the steering control network model is built on a BP (backpropagation) neural network, and a decision mechanism is introduced to control its online training. Specifically, the driving state data and vehicle control input of the vehicle are collected and fed to the steering control network model, which predicts the driving state data at the corresponding next sampling instant. The comparison function value E between the driving state data collected within the set time length and the corresponding next-instant driving state data predicted by the model is computed and compared with a preset threshold T1. If E is greater than T1, the driving state data and vehicle control inputs collected within the set time are added to the preset training dataset and the steering control network model is trained online on that dataset; otherwise, the steering control model keeps its current state. Taking the steering control network model as the controlled object, a policy iteration algorithm is implemented with the evaluation network and the execution network to obtain an optimized execution network, and automatic steering control of the intelligent vehicle is realized with the vehicle control input output by the execution network.
The intelligent vehicle steering control method based on policy iteration of this embodiment removes the need, during online training, to continuously collect data and continuously train the neural network model. It also solves the problem in conventional approaches that driving state data are fed into the neural network model without screening, producing large amounts of invalid, redundant data that degrade the real-time performance of training.
Further, the step of 'computing a comparison function value from each item of driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model' comprises computing the comparison function value with the function shown in formula (1):
where x_s is the collected driving state data, x is the driving state data predicted by the steering control network model, c is the index of the driving state data, a is the set length of the comparison sequence, and b is the comparison data index. The set value of a should be neither too large nor too small: if a is too small, the decision device becomes sensitive to noise; if a is too large, the decision device becomes insensitive to environmental changes.
Further, if the decision mechanism in the method of the present invention requires the steering control network model to be trained online, the training proceeds as follows:
Step Sa11: the steering control network model predicts the driving state data at the corresponding next sampling instant from the driving state data and vehicle control inputs in the training dataset;
Step Sa12: computing the error between the prediction of the steering control network model and the set expectation;
Step Sa13: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step Sa14: repeating steps Sa11-Sa13 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expectation falls within a given range.
Further, in this embodiment the steering control network model is a neural network model trained offline on a preset training dataset. Because steering control is based on the offline-trained model, the method overcomes the difficulty, in the initial stage of online training, of initializing the steering control output for lack of data. In this embodiment, the steering control network model is trained offline as follows:
Step Sb11: collecting and preprocessing the driving state data and vehicle control inputs of the vehicle to obtain a training dataset;
Step Sb12: the steering control network model predicts the driving state data at the corresponding next sampling instant from the driving state data and vehicle control inputs in the training dataset;
Step Sb13: computing the error between the prediction of the steering control network model and the set expectation;
Step Sb14: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step Sb15: repeating steps Sb12-Sb14 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expectation falls within a given range.
Further, when the driving state data and vehicle control inputs of the intelligent vehicle are collected in step Sb11, diverse driving state data and vehicle control inputs can be obtained by continuously varying the driving state of the vehicle. The collected driving state data include vehicle-related data and road-related data. Vehicle-related data include speed and acceleration; road-related data include the lateral offset from the road and the angle between the vehicle heading and the road direction. Road-related data are acquired with one or a combination of a camera, lidar, and GPS; vehicle-related data are acquired with the vehicle's IMU and/or a wheel encoder.
The collected driving state data and vehicle control inputs are preprocessed. Preprocessing specifically comprises denoising, screening out abnormal data, and repeated-state information processing. It is needed because, during data collection, the sensors and the vehicle itself inevitably introduce noise and outliers; the data are therefore denoised according to the characteristics of the noise source, outliers are screened out, and repeated-state information is processed. Repeated-state information processing works as follows: compute the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset, and compare it with a set threshold. If the difference value is greater than the threshold, add the collected driving state data and vehicle control input to the training dataset; otherwise, discard them.
Specifically, the difference value between the collected driving state data and vehicle control input and the driving state data and vehicle control inputs in the training dataset is computed with the function shown in formula (2):
where x_s^j is the collected vehicle speed, x_t^j is the vehicle speed in the training dataset, x_os is the collected angle between the vehicle heading and the road direction, x_ot is the angle between the vehicle heading and the road direction in the training dataset, u_s is the collected vehicle control input, u_t is the vehicle control input in the training dataset, and m is the index of the training data.
Referring to Fig. 1, which shows a flow diagram of the policy-iteration-based adaptive dynamic programming algorithm of one embodiment: in this embodiment, the steering control network model is taken as the controlled object, and a policy iteration algorithm is implemented with the evaluation network and the execution network to obtain an optimized execution network. The input of the steering network model is the driving state data and vehicle control input at the current sampling instant, and its output is the driving state data at the next sampling instant. The evaluation network, built on a neural network, assesses the performance index at the current sampling instant; its inputs are the outputs of the steering network model and of the execution network. The execution network must be initialized before the algorithm is applied (setting its weights to small positive numbers suffices). Using the output of the evaluation network as feedback, the execution network weights are updated iteratively with an algorithm such as stochastic gradient descent so as to reduce the output of the evaluation network.
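The execution-network update described for Fig. 1, in which its weights are adjusted by stochastic gradient descent to reduce the evaluation network's output, can be sketched on a scalar toy problem. ASSUMPTIONS: the execution network is a single linear gain w (control u = −w·x), the evaluation network has already been fitted as V(x) = p·x², the steering model is x_{k+1} = a·x_k + b·u_k, and the per-sample loss is r·u² + p·x_{k+1}² (the state term of the utility does not depend on w). The function name and all values are hypothetical.

```python
import random


def train_actor(p, a=1.0, b=0.1, r=0.5, w0=0.1, lr=0.5, steps=2000, seed=0):
    """SGD on a linear execution network u = -w*x against a fixed quadratic critic V(x) = p*x^2."""
    rng = random.Random(seed)
    w = w0  # small positive initial weight, as suggested for the execution network
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)   # sampled driving state (scalar toy)
        u = -w * x                    # execution (actor) network output
        x_next = a * x + b * u        # steering model prediction F(x, u)
        # Gradient w.r.t. w of r*u^2 + p*x_next^2, using du/dw = -x and dx_next/dw = -b*x.
        grad = 2.0 * r * u * (-x) + 2.0 * p * x_next * (-b * x)
        w -= lr * grad                # step that reduces the critic-based cost
    return w


if __name__ == "__main__":
    p = (1.0 + 201 ** 0.5) / 2.0      # converged critic coefficient for a = 1, b = 0.1, q = 1, r = 0.5
    w = train_actor(p)
    print(w, p * 1.0 * 0.1 / (0.5 + p * 0.01))  # learned gain vs closed-form minimizer
```

Every sampled state shares the same minimizer w* = p·a·b / (r + p·b²), so in this toy case the stochastic updates settle on that gain rather than fluctuating around it.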
Further, in this embodiment the utility function of the evaluation network is given by formula (3):
$$U(x_k,u_k)=x_k^{T}Qx_k+u_k^{T}Ru_k \qquad (3)$$
where Q = I and R = 0.5I, I is the identity matrix, T denotes transposition, x_k is the actual driving state data, and u_k is the actual vehicle control input. The evaluation network can then evaluate the performance index at the current sampling instant with formula (4):
$$V(x_k)=\sum_{l=0}^{\infty}U\big(x_{k+l},\upsilon_i(x_{k+l})\big) \qquad (4)$$
where V(x_k) is the performance index function at the current sampling instant, υ_i(x_{k+l}) is the vehicle control input given by the i-th control policy at sampling instant k+l, x is the driving state data of the vehicle, k is the index of the current sampling instant, and l is the time offset; as l → ∞, the vehicle heading becomes parallel to the road direction and U(x_{k+l}, υ_i(x_{k+l})) → 0.
In this embodiment, while implementing the policy-iteration adaptive dynamic programming algorithm, the execution network is optimized based on the functions in formulas (5) and (6):
$$V_i(x_k)=U\big(x_k,\upsilon_i(x_k)\big)+V_i(x_{k+1}) \qquad (5)$$
$$\upsilon_{i+1}(x_k)=\arg\min_{u_k}\Big\{U(x_k,u_k)+V_i(x_{k+1})\Big\},\qquad x_{k+1}=F(x_k,u_k) \qquad (6)$$
where V_i(x_k) is the performance index function under the i-th control policy, U(x_k, υ_i(x_k)) is the utility function at the current sampling instant, V_i(x_{k+1}) is the performance index function at the next sampling instant, υ_i(x_k) is the vehicle control input under the current control policy at the current sampling instant, υ_{i+1}(x_k) is the vehicle control input at the current sampling instant after the control network is iterated, u_k is the actual vehicle control input, υ_k is the vehicle control input at the current sampling instant, i is the iteration number, x_k is the actual driving state data, and F(x_k, u_k) is the abstract function of the steering control network model.
Referring to Fig. 2, which illustrates the flow of the decision-device mechanism of an embodiment of the present invention, the specific steps of another policy-iteration-based steering control method for an intelligent vehicle according to an embodiment of the invention are described below with reference to the decision-device mechanism shown in Fig. 2:
Acquire the driving state data and vehicle control amount of the intelligent vehicle;
Preprocess the acquired data: denoise it, screen out outliers, and process repeated state information;
Build a training data pool from the processed driving data;
Train the steering control network model offline using the data in the training data pool;
While the vehicle is travelling, acquire the vehicle's driving data and the output data of the steering control network model;
Introduce the decision-device mechanism: compute the contrast function value E between the driving state data collected within a set time length and the driving state data at the corresponding next sampling instants predicted by the steering control network model, and preset an activation threshold T for the decision device. When E is greater than T, add the driving state data and vehicle control amounts measured during that period to the training data pool and train the steering control network model online on this pool; otherwise, the steering control network model keeps its current state;
Taking the steering control network model as the controlled object, build an evaluation network and an execution network to implement the policy-iteration adaptive dynamic programming algorithm, obtain the optimized execution network, and control the steering of the intelligent vehicle with the vehicle control amount output by the execution network.
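The decision-device step can be sketched as follows. The form of the contrast function E is not given in this passage, so the sketch assumes E is the mean squared difference between the measured states and the model's one-step predictions over the window; `contrast_value`, `decision_device` and the $(x_k, u_k, x_{k+1})$ layout of the training pool are hypothetical names and choices.

```python
import numpy as np


def contrast_value(measured_next, predicted_next):
    """Assumed contrast function E: mean squared prediction error over the window."""
    diff = np.asarray(measured_next, dtype=float) - np.asarray(predicted_next, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def decision_device(states, controls, predicted_next, pool, threshold):
    """Add the window to the training pool and request retraining when E > T.

    states:         (a + 1, n) driving states measured over the window
    controls:       (a, m) vehicle control amounts applied at the first a states
    predicted_next: (a, n) model predictions of states[1:]
    """
    states = np.asarray(states, dtype=float)
    controls = np.asarray(controls, dtype=float)
    E = contrast_value(states[1:], predicted_next)
    if E > threshold:
        # Store (x_k, u_k, x_{k+1}) triples so the model can be retrained online.
        pool.extend(zip(states[:-1], controls, states[1:]))
        return True
    return False  # the model keeps its current state


pool = []
states = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
controls = [[0.1], [0.1]]
print(decision_device(states, controls, [[1.0, 0.0], [2.0, 0.0]], pool, threshold=0.5))  # False
print(decision_device(states, controls, [[1.0, 1.0], [2.0, 1.0]], pool, threshold=0.5))  # True
print(len(pool))  # 2
```

Storing next states alongside each pair keeps every pool entry usable directly as a supervised target when the model is retrained.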
It should be noted that the entire algorithm in this embodiment is implemented on a computing device on board the unmanned vehicle, and the decision device decides whether detection data are added to the training data pool and whether training of the model network is restarted.
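Restarting training of the model network follows steps S101 to S104 of claim 3 (steps S202 to S205 of claim 6 in the offline case). A minimal NumPy sketch is given below, assuming a one-hidden-layer tanh network and a mean-squared-error objective; the architecture, learning rate, tolerance and the names `ModelNetwork` and `train_model` are assumptions, not details taken from the patent.

```python
import numpy as np


class ModelNetwork:
    """Hypothetical one-hidden-layer network: (x_k, u_k) -> predicted x_{k+1}."""

    def __init__(self, n_state, n_control, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_state + n_control, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_state))
        self.b2 = np.zeros(n_state)

    def forward(self, X, U):
        Z = np.hstack([X, U])                   # network input: state and control
        H = np.tanh(Z @ self.W1 + self.b1)      # hidden activations
        return H @ self.W2 + self.b2, Z, H


def train_model(net, X, U, X_next, lr=0.05, max_iter=2000, tol=1e-4):
    """Gradient descent with backpropagation until max_iter or MSE < tol."""
    for _ in range(max_iter):
        pred, Z, H = net.forward(X, U)          # S101: predict next states
        err = pred - X_next                     # S102: error against expected values
        if np.mean(err ** 2) < tol:             # S104: stop once the error is small
            break
        g_out = 2.0 * err / err.size            # d(MSE)/d(pred)
        g_hid = (g_out @ net.W2.T) * (1.0 - H ** 2)  # back through tanh
        net.W2 -= lr * (H.T @ g_out)            # S103: weight updates
        net.b2 -= lr * g_out.sum(axis=0)
        net.W1 -= lr * (Z.T @ g_hid)
        net.b1 -= lr * g_hid.sum(axis=0)
    pred, _, _ = net.forward(X, U)
    return float(np.mean((pred - X_next) ** 2))


# Toy data from a made-up linear system, used only to exercise the loop.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))
U = rng.uniform(-1.0, 1.0, (200, 1))
X_next = X @ np.array([[1.0, 0.0], [0.1, 1.0]]) + U @ np.array([[0.0, 0.1]])
net = ModelNetwork(n_state=2, n_control=1)
initial = float(np.mean((net.forward(X, U)[0] - X_next) ** 2))
final = train_model(net, X, U, X_next)
print(initial, final)
```

Full-batch gradient descent is used here for brevity; the stochastic gradient descent mentioned for the execution network would update on mini-batches drawn from the training data pool instead.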
Those skilled in the art should recognize that the method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The term "include", or any similar term, is intended to cover a non-exclusive inclusion, so that a process or method that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process or method.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions will fall within the scope of protection of the present invention.

Claims (9)

1. A policy-iteration-based steering control method for an intelligent vehicle, characterized by comprising:
acquiring the driving state data and vehicle control amount of the vehicle;
predicting, by a steering control network model, the driving state data at the next acquisition moment from the driving state data and vehicle control amount at the current acquisition moment, wherein the steering control network model is a neural network model trained offline on a preset training data set;
calculating a contrast function value from each item of driving state data collected within a predetermined time period and the driving state data at the corresponding next acquisition moment predicted by the steering control network model;
if the contrast function value is greater than a preset threshold, adding the driving state data and vehicle control amounts collected within the predetermined time period to the preset training data set, and training the steering control network model online on the training data set;
taking the steering control network model as the controlled object, implementing a policy-iteration adaptive dynamic programming algorithm with an evaluation network and an execution network to obtain an optimized execution network;
controlling the steering of the intelligent vehicle based on the vehicle control amount output by the execution network.
2. The policy-iteration-based steering control method for an intelligent vehicle according to claim 1, characterized in that the step of "calculating a contrast function value from each item of driving state data collected within a predetermined time period and the driving state data at the corresponding next acquisition moment predicted by the steering control network model" comprises calculating the contrast function value with the contrast function E shown below:
where $x_s$ is the collected driving state data, $x$ is the driving state data predicted by the steering control network model, $c$ is the index of the driving state data, $a$ is the length of the set comparison sequence, and $b$ is the index of the comparison data.
3. The policy-iteration-based steering control method for an intelligent vehicle according to claim 2, characterized in that the online training of the steering control network model comprises:
Step S101: predicting, by the steering control network model, the driving state data at the corresponding next acquisition moment from the driving state data and vehicle control amounts in the training data set;
Step S102: calculating the error between the prediction result of the steering control network model and the set expected value;
Step S103: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step S104: repeating steps S101 to S103 until the maximum number of iterations is reached or the error between the prediction result of the steering control network model and the set expected value falls within a given range.
4. The policy-iteration-based steering control method for an intelligent vehicle according to claim 3, characterized in that the utility function of the evaluation network is:

$$U(x_k, u_k) = x_k^{T} Q x_k + u_k^{T} R u_k$$

where $Q = I$, $R = 0.5I$, $I$ is the identity matrix, $T$ denotes transposition, $x_k$ is the actual driving state data and $u_k$ is the actual vehicle control amount;
the performance index function at the current acquisition moment, as evaluated by the evaluation network, is:

$$V(x_k) = \sum_{l=0}^{\infty} U\big(x_{k+l}, \upsilon_i(x_{k+l})\big)$$

where $V(x_k)$ is the performance index function at the current acquisition moment, $\upsilon_i(x_{k+l})$ is the vehicle control amount given by the $i$-th control policy at acquisition moment $k+l$, $x$ is the driving state data of the vehicle, $k$ is the index of the current acquisition moment and $l$ is the time index; as $l \to \infty$, the vehicle heading becomes parallel to the road direction and $U(x_{k+l}, \upsilon_i(x_{k+l})) \to 0$.
5. The policy-iteration-based steering control method for an intelligent vehicle according to claim 4, characterized in that, while the policy-iteration adaptive dynamic programming algorithm is implemented with the evaluation network and the execution network, the execution network is optimized based on the functions shown in the following formulas:

$$V_i(x_k) = U\big(x_k, \upsilon_i(x_k)\big) + V_i(x_{k+1})$$

$$\upsilon_{i+1}(x_k) = \arg\min_{u_k} \big\{ U(x_k, u_k) + V_i(x_{k+1}) \big\}, \qquad x_{k+1} = F(x_k, u_k)$$

where $V_i(x_k)$ is the performance index function under the $i$-th control policy, $U(x_k, \upsilon_i(x_k))$ is the utility function at the current acquisition moment, $V_i(x_{k+1})$ is the performance index function evaluated at the next state, $\upsilon_i(x_k)$ is the vehicle control amount under the current control policy at the current acquisition moment, $\upsilon_{i+1}(x_k)$ is the vehicle control amount at the current acquisition moment after the control network has been iterated, $u_k$ is the actual vehicle control amount, $\upsilon_k$ is the vehicle control amount at the current acquisition moment, $i$ is the iteration number, $x_k$ is the actual driving state data, and $F(x_k, u_k)$ is the abstract function of the steering control network model.
6. The policy-iteration-based steering control method for an intelligent vehicle according to claim 1, characterized in that the offline training of the steering control network model comprises:
Step S201: acquiring and preprocessing the driving state data and vehicle control amounts of the vehicle to obtain the training data set;
Step S202: predicting, by the steering control network model, the driving state data at the corresponding next acquisition moment from the driving state data and vehicle control amounts in the training data set;
Step S203: calculating the error between the prediction result of the steering control network model and the set expected value;
Step S204: optimizing the weights of the steering control network model with the error backpropagation algorithm;
Step S205: repeating steps S202 to S204 until the maximum number of iterations is reached or the error between the prediction result of the steering control network model and the set expected value falls within a given range.
7. The policy-iteration-based steering control method for an intelligent vehicle according to claim 6, characterized in that the preprocessing comprises denoising, screening out abnormal data, and processing repeated state information;
the processing of repeated state information comprises: calculating a difference value between the collected driving state data and vehicle control amount and the driving state data and vehicle control amounts in the training data set, and comparing the difference value with a set threshold; if the difference value is greater than the set threshold, the collected driving state data and vehicle control amount are added to the training data set.
8. The policy-iteration-based steering control method for an intelligent vehicle according to claim 7, characterized in that, in the step of "calculating a difference value between the collected driving state data and vehicle control amount and the driving state data and vehicle control amounts in the training data set", the difference value is calculated with the function shown below:
where $x_s^j$ is the collected vehicle speed, $x_t^j$ is the vehicle speed in the training data set, $x_{os}$ is the collected angle between the vehicle heading and the road direction, $x_{ot}$ is the angle between the vehicle heading and the road direction in the training data set, $u_s$ is the collected vehicle control amount, $u_t$ is the vehicle control amount in the training data set, and $m$ is the index of the training data.
9. The policy-iteration-based steering control method for an intelligent vehicle according to any one of claims 1 to 8, characterized in that the driving state data include the speed, the acceleration, the lateral offset from the road, and the angle between the vehicle heading and the road direction.
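The preprocessing of claims 7 to 9 can be sketched as a small pipeline. The state layout follows claim 9; the moving-average denoiser, the median-absolute-deviation outlier screen and the difference value used for repeated-state processing are all assumptions, since the claims do not reproduce their formulas here. In particular, the difference value below (the smallest summed absolute difference in speed, heading angle and control against the training set) is a hypothetical stand-in for the function of claim 8, and every function name is illustrative.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DrivingState:
    """Driving state data per claim 9 (units are assumptions)."""
    speed: float           # m/s
    acceleration: float    # m/s^2
    lateral_offset: float  # m, lateral offset from the road
    heading_angle: float   # rad, angle between vehicle heading and road direction

    def to_vector(self):
        return np.array([self.speed, self.acceleration,
                         self.lateral_offset, self.heading_angle])


def denoise(data, window=3):
    """Centered moving average per column, edge-padded (window must be odd)."""
    pad = window // 2
    kernel = np.ones(window) / window
    padded = np.pad(data, ((pad, pad), (0, 0)), mode="edge")
    return np.column_stack([np.convolve(padded[:, j], kernel, mode="valid")
                            for j in range(data.shape[1])])


def screen_outliers(data, threshold=3.5):
    """Drop rows whose robust z-score (median/MAD) exceeds the threshold in any column."""
    med = np.median(data, axis=0)
    mad = np.median(np.abs(data - med), axis=0)
    # If a column has MAD 0, any deviation from its median is treated as an outlier.
    mad = np.where(mad == 0, np.finfo(float).eps, mad)
    z = 0.6745 * np.abs(data - med) / mad
    return data[np.all(z <= threshold, axis=1)]


def add_if_not_repeated(sample, train_set, threshold):
    """Repeated-state processing on (speed, heading angle, control) samples."""
    sample = np.asarray(sample, dtype=float)
    if train_set:
        diffs = np.abs(np.asarray(train_set) - sample).sum(axis=1)
        if diffs.min() <= threshold:
            return False  # too close to existing data: treated as a repeated state
    train_set.append(sample)
    return True


train = [np.array([10.0, 0.05, 0.0])]
print(add_if_not_repeated([10.0, 0.05, 0.01], train, threshold=0.5))  # False
print(add_if_not_repeated([12.0, 0.05, 0.0], train, threshold=0.5))   # True
```

A robust (median-based) screen is used rather than a mean/standard-deviation z-score because a single large spike inflates the standard deviation and can hide itself in small windows.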
CN201810597914.9A 2018-06-11 2018-06-11 Intelligent automobile steering control method based on strategy iteration Active CN108909833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810597914.9A CN108909833B (en) 2018-06-11 2018-06-11 Intelligent automobile steering control method based on strategy iteration

Publications (2)

Publication Number Publication Date
CN108909833A true CN108909833A (en) 2018-11-30
CN108909833B CN108909833B (en) 2020-07-28

Family

ID=64418858

Country Status (1)

Country Link
CN (1) CN108909833B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110481536A (en) * 2019-07-03 2019-11-22 中国科学院深圳先进技术研究院 A kind of control method and equipment applied to hybrid vehicle
WO2022115987A1 (en) * 2020-12-01 2022-06-09 浙江吉利控股集团有限公司 Method and system for automatic driving data collection and closed-loop management

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103097213A (en) * 2010-09-09 2013-05-08 大陆-特韦斯贸易合伙股份公司及两合公司 Determination of steering angle for a motor vehicle
CN104392212A (en) * 2014-11-14 2015-03-04 北京工业大学 Method for detecting road information and identifying forward vehicles based on vision
CN105243461A (en) * 2015-11-20 2016-01-13 江苏省电力公司 Short-term load forecasting method based on artificial neural network improved training strategy
CN107092256A (en) * 2017-05-27 2017-08-25 中国科学院自动化研究所 A kind of unmanned vehicle rotating direction control method
CN107203134A (en) * 2017-06-02 2017-09-26 浙江零跑科技有限公司 A kind of front truck follower method based on depth convolutional neural networks
CN107438873A (en) * 2017-07-07 2017-12-05 驭势科技(北京)有限公司 A kind of method and apparatus for being used to control vehicle to travel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant