CN108909833A - Intelligent vehicle steering control method based on policy iteration

Info

Publication number: CN108909833A
Application number: CN201810597914.9A
Authority: CN
Inventors: 汤淑明; 卢晓昀; 朱海兵; 杜清秀
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2018-06-11
Filing date: 2018-06-11
Publication date: 2018-11-30
Anticipated expiration: 2038-06-11
Also published as: CN108909833B

Abstract

The invention belongs to the technical field of automatic vehicle driving, and in particular relates to an intelligent vehicle steering control method based on policy iteration, which aims to improve the online self-learning capability of an unmanned intelligent vehicle in steering control. To this end, the method includes: acquiring driving state data and vehicle control quantities of the vehicle; predicting, by a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control quantity at the current sampling instant; controlling the online training of the steering control network model by means of a decision mechanism; taking the steering control network model as the controlled object and implementing a policy iteration algorithm with an evaluation network and an execution network to obtain an optimized execution network; and controlling the steering of the intelligent vehicle based on the vehicle control quantity output by the execution network. The method improves the real-time performance of model training and the adaptability to the current environment.

Description

Intelligent vehicle steering control method based on policy iteration

Technical field

The invention belongs to the technical field of automatic vehicle driving, and in particular relates to an intelligent vehicle steering control method based on policy iteration.

Background

Autonomous driving of intelligent vehicles is a technology that encompasses environment perception, path planning, and autonomous vehicle control. Studies indicate that autonomous driving can bring disruptive improvements in areas such as enhancing highway safety, relieving traffic congestion, and reducing air pollution. An autonomous vehicle perceives changes in the road environment and its own driving state through sensors, and uses autonomous control technology to compute the optimal control for the vehicle from the driving state and the road environment. Research on intelligent vehicles will reduce the probability of traffic accidents caused by driver negligence and will free drivers from long periods of concentrated driving. At present, substantial progress has been made abroad on driverless vehicles. Domestic research institutes and universities have also begun related research and achieved some results, but a gap remains compared with autonomous driving technology abroad.

Automatic steering of a vehicle on the road is one of the research topics of Level 2 (L2) partially automated driving among the levels of driving automation. One of its main research questions is how to compute the steering angle from the road environment perceived by the sensors and from the vehicle's driving state, and then control the steering actuator inside the vehicle to adjust the steering. Vehicle steering control exhibits strongly nonlinear dynamics, and a neural network is a tool well suited to realizing nonlinear mappings. Training a neural network requires a large amount of data, and the data generally has to cover the entire state-action space before a good model network is obtained. Model networks can be trained offline or online. Online training continuously collects the vehicle's steering control information and trains the vehicle's steering model on it. Its drawbacks are that in the initial stage the lack of data makes it difficult to initialize the steering control of the vehicle, and that the continuously growing training set degrades the real-time performance of building the neural network model. Offline training uses a fixed training set, so it sometimes cannot produce effective control in a complex, changing environment. To obtain a good training result, the samples must cover the entire state-action space as far as possible, which implies a high training cost.

Summary of the invention

To solve the above problem in the prior art, namely how to improve the online self-learning capability of an unmanned intelligent vehicle in steering control, the present invention provides an intelligent vehicle steering control method based on policy iteration, including:

Acquiring driving state data and vehicle control quantities of the vehicle;

Predicting, by a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control quantity at the current sampling instant, the steering control network model being a neural network model trained offline on a preset training data set;

Calculating a comparison function value from the driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model;

If the comparison function value is greater than a preset threshold, adding the driving state data and vehicle control quantities collected within the preset time period to the preset training data set, and training the steering control network model online on that training data set;

Taking the steering control network model as the controlled object, and implementing a policy-iteration adaptive dynamic programming algorithm with an evaluation network and an execution network, to obtain an optimized execution network;

Controlling the steering of the intelligent vehicle based on the vehicle control quantity output by the execution network.

Further, the step of "calculating a comparison function value from the driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model" includes calculating the comparison function value according to the comparison function E shown in the following formula:

Where x_s is the collected driving state data, x is the driving state data predicted by the steering control network model, c is the index of the driving state data, a is the set length of the comparison sequence, and b is the index of the comparison data.
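The formula for E is not reproduced in this text, so the following Python sketch only illustrates one form that is consistent with the symbol definitions above: the squared difference between collected and predicted driving states, summed over the state components and averaged over the last a samples. The function name `comparison_value` and the squared-error form are assumptions made for illustration, not the formula of the patent.

```python
from typing import Sequence


def comparison_value(collected: Sequence[Sequence[float]],
                     predicted: Sequence[Sequence[float]],
                     a: int) -> float:
    """Assumed form of E: mean squared gap over the last `a` samples."""
    if a <= 0 or len(collected) < a or len(predicted) < a:
        raise ValueError("need at least `a` collected and predicted samples")
    total = 0.0
    # Walk over the comparison sequence of length a (most recent samples).
    for x_s, x in zip(collected[-a:], predicted[-a:]):
        # Sum the squared gap over the components of the driving state.
        total += sum((xs_c - x_c) ** 2 for xs_c, x_c in zip(x_s, x))
    return total / a
```

A larger `a` averages over more samples and reacts more slowly to a change in the environment, while a smaller `a` reacts faster but is more sensitive to noise in individual samples.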

Further, the online training of the steering control network model includes:

Step S101: Based on the driving state data and vehicle control quantities in the training data set, the steering control network model predicts the driving state data at the corresponding next sampling instant;

Step S102: Calculate the error between the prediction of the steering control network model and the set expected value;

Step S103: Optimize the weights of the steering control network model using the error back-propagation algorithm;

Step S104: Repeat steps S101-S103 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expected value falls within a set range.

Further, the utility function of the evaluation network is:

Where Q = I, R = 0.5I, I is the identity matrix, T denotes the transpose, x_k is the actual driving state data, and u_k is the actual vehicle control quantity;

The performance index function at the current sampling instant, as evaluated by the evaluation network, is:

Where V(x_k) is the performance index function at the current sampling instant, υ_i(x_{k+l}) is the vehicle control quantity for the state at sampling instant k+l, x is the driving state data of the vehicle, k is the index of the current sampling instant, and l is the instant offset. As l → ∞, the vehicle's direction of travel becomes parallel to the road direction and U(x_{k+l}, υ_i(x_{k+l})) → 0.
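Neither formula is reproduced in this text. The Python sketch below assumes the quadratic utility that the symbols Q, R and the transpose suggest, U(x, u) = x^T Q x + u^T R u with Q = I and R = 0.5I, and approximates the performance index by a finite sum of utilities along a trajectory. The function names and the `policy` and `model` callables are hypothetical stand-ins for the execution network and the steering control network model.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def utility(x: Vector, u: Vector) -> float:
    """Assumed quadratic utility x^T Q x + u^T R u with Q = I, R = 0.5 I."""
    return sum(v * v for v in x) + 0.5 * sum(v * v for v in u)


def performance_index(x0: Vector,
                      policy: Callable[[Vector], Vector],
                      model: Callable[[Vector, Vector], List[float]],
                      horizon: int) -> float:
    """Sum of U(x_{k+l}, policy(x_{k+l})) for l = 0 .. horizon-1.

    The infinite sum is truncated here; this is reasonable only when the
    utility decays toward zero along the trajectory.
    """
    total = 0.0
    x = list(x0)
    for _ in range(horizon):
        u = policy(x)
        total += utility(x, u)
        x = model(x, u)  # predicted driving state at the next instant
    return total
```

For example, with `model = lambda x, u: [0.5 * v for v in x]` and a policy that always returns `[0.0]`, three steps from `[1.0]` accumulate 1 + 0.25 + 0.0625.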

Further, while the policy-iteration adaptive dynamic programming algorithm is implemented with the evaluation network and the execution network, the execution network is optimized based on the functions shown in the following formulas:

V_i(x_k) = U(x_k, υ_i(x_k)) + V_i(x_{k+1})

Where V_i(x_k) is the performance index function under the control policy, U(x_k, υ_i(x_k)) is the utility function at the current sampling instant, V_i(x_{k+1}) is the performance index function after the control network has been iterated, υ_i(x_k) is the vehicle control quantity under the current control policy at the current sampling instant, υ_{i+1}(x_k) is the vehicle control quantity at the current sampling instant after the control network has been iterated, u_k is the actual vehicle control quantity, υ_k is the vehicle control quantity at the current sampling instant, i is the iteration number, x_k is the actual driving state data, and F(x_k, u_k) is the abstract function of the steering control network model.
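As a rough illustration of how policy evaluation and policy improvement alternate, the Python sketch below runs policy iteration on a scalar linear model x_{k+1} = a*x_k + b*u_k with the utility U = x^2 + 0.5*u^2. Several things here are assumptions: the linear model replaces the steering control network F, the closed-form quadratic V_i(x) = p*x^2 replaces the evaluation network, the linear law u = -k*x replaces the execution network, and the improvement step (minimizing U(x_k, u) + V_i(F(x_k, u)) over u) is the standard policy-improvement rule, since the second formula is not reproduced in this text.

```python
from typing import Tuple


def policy_iteration(a: float, b: float, k0: float,
                     iterations: int = 20) -> Tuple[float, float]:
    """Policy iteration for x_{k+1} = a*x_k + b*u_k, U = x^2 + 0.5*u^2.

    Returns the feedback gain k (policy u = -k*x) and the value
    coefficient p (V(x) = p*x^2) after the given number of iterations.
    """
    k = k0
    p = 0.0
    for _ in range(iterations):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("policy must be stabilizing for V to be finite")
        # Policy evaluation: solve V_i(x) = U(x, v_i(x)) + V_i(x_{k+1})
        # for V_i(x) = p*x^2, i.e. p = 1 + 0.5*k^2 + p*(a - b*k)^2.
        p = (1.0 + 0.5 * k * k) / (1.0 - closed_loop ** 2)
        # Policy improvement (assumed rule): minimize
        # x^2 + 0.5*u^2 + p*(a*x + b*u)^2 over u, which gives u = -k*x.
        k = 2.0 * p * a * b / (1.0 + 2.0 * p * b * b)
    return k, p
```

With a = 1.0, b = 0.5 and an initial stabilizing gain of 0.5, the iteration settles at k = 1 and p = 2, which satisfies the evaluation equation and is unchanged by a further improvement step.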

Further, the offline training of the steering control network model includes:

Step S201: Acquire and preprocess the driving state data and vehicle control quantities of the vehicle to obtain a training data set;

Step S202: Based on the driving state data and vehicle control quantities in the training data set, the steering control network model predicts the driving state data at the corresponding next sampling instant;

Step S203: Calculate the error between the prediction of the steering control network model and the set expected value;

Step S204: Optimize the weights of the steering control network model using the error back-propagation algorithm;

Step S205: Repeat steps S202-S204 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expected value falls within a set range.

Further, the preprocessing includes denoising, screening out abnormal data, and duplicate state information processing;

The duplicate state information processing includes: calculating the difference value between the collected driving state data and vehicle control quantity and the driving state data and vehicle control quantities in the training data set, comparing the difference value with a set threshold, and, if the difference value is greater than the set threshold, adding the collected driving state data and vehicle control quantity to the training data set.

Further, the "difference value between the collected driving state data and vehicle control quantity and the driving state data and vehicle control quantities in the training data set" is calculated by the function shown in the following formula:

Where x_s^j is the collected vehicle speed, x_t^j is the speed in the training data set, x_s^o is the collected angle between the vehicle's direction of travel and the road direction, x_t^o is the angle between the vehicle's direction of travel and the road direction in the training data set, u_s is the collected vehicle control quantity, u_t is the vehicle control quantity in the training data set, and m is the index of the training data.
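The difference formula is not reproduced in this text. The Python sketch below illustrates the duplicate-state filter under two assumptions: the difference value is the Euclidean distance over speed, heading angle and control quantity, and a new sample is kept only if its difference to every sample already in the training data set exceeds the threshold. The names `difference_value` and `add_if_novel` are hypothetical.

```python
import math
from typing import List, Tuple

# (speed, angle between heading and road direction, control quantity)
Sample = Tuple[float, float, float]


def difference_value(s: Sample, t: Sample) -> float:
    """Assumed difference measure: Euclidean distance between two samples."""
    return math.sqrt(sum((si - ti) ** 2 for si, ti in zip(s, t)))


def add_if_novel(sample: Sample, training_set: List[Sample],
                 threshold: float) -> bool:
    """Append `sample` if it differs enough from all stored samples.

    Returns True when the sample was added, False when it was discarded
    as a near-duplicate of an existing training sample.
    """
    if all(difference_value(sample, t) > threshold for t in training_set):
        training_set.append(sample)
        return True
    return False
```

In practice the three components have different units, so some scaling before the distance is taken would probably be needed; the sketch leaves that out.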

Further, the driving state data includes speed, acceleration, lateral offset on the road, and the angle between the vehicle's direction of travel and the road direction.

Compared with the closest prior art, the above technical solution has at least the following beneficial effects:

The intelligent vehicle steering control method based on policy iteration of the present invention uses a neural network model to control the steering of an intelligent vehicle. The method combines the advantages of offline training and online training: network training can be achieved without a huge training cost, and the network is initialized simply and effectively. It also overcomes the drawback of online training, namely the need to keep introducing training data and keep retraining the network. While ensuring that the samples cover the current environment's state-action space as far as possible, it also avoids generating redundant data as far as possible, which improves the real-time performance of network training and the adaptability to the current environment.

Brief description of the drawings

Fig. 1 is a flow diagram of the adaptive dynamic programming algorithm based on policy iteration according to an embodiment of the present invention;

Fig. 2 is a flow diagram of the decision mechanism according to an embodiment of the present invention.

Detailed description of the embodiments

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that these embodiments serve only to explain the technical principle of the present invention and are not intended to limit its scope of protection.

Traditional intelligent vehicle steering control methods require a mathematical model of vehicle steering control. To build it, all the variables involved in steering control must be determined, such as the vehicle's own performance, its current speed and acceleration, and road parameters including road curvature, road inclination, and friction coefficient. A multivariable nonlinear equation is then obtained by experimental fitting or derived from physical laws, and the complexity of the equation and the determination of each coefficient require many experiments and measurements. Neural networks have good nonlinear mapping capability, and they can be trained online or offline. Online training, however, has to collect data continuously and train continuously to obtain the neural network model, so it cannot properly initialize the vehicle control output. Moreover, sensor data is fed into the neural network model without screening, which produces a large amount of invalid redundant data and degrades the real-time performance of model training. These operations have to be repeated whenever the environment changes, which further amplifies both drawbacks: the inability to initialize the vehicle control output, and the data redundancy. Offline training uses a fixed training set, so it sometimes cannot produce effective control in a complex, changing environment; to obtain a good training result, the samples must cover the entire state-action space as far as possible, which requires a high training cost.

The intelligent vehicle steering control method based on policy iteration of the present invention combines the advantages of offline and online training. It performs steering control with a model trained offline, which yields an initial model network simply and effectively. A decision mechanism is also added during vehicle travel to control the online training of the model, which overcomes the drawback that online training has to keep introducing training data and keep retraining the network. The method is described below with reference to the accompanying drawings.

The present invention represents the driving state of the vehicle by its driving state data, which includes speed, acceleration, lateral offset on the road, and the angle between the vehicle's direction of travel and the road direction. The control quantity of the vehicle is the input of the vehicle control signal. In driverless mode, the intelligent vehicle continuously computes the control quantity for the next sampling instant and feeds it to its control system, thereby achieving steering control in driverless mode. It should be noted that in this embodiment the driving state data and vehicle control quantities are sampled at equal time intervals, and the specific interval can be adjusted according to the actual situation.

An intelligent vehicle steering control method based on policy iteration in this embodiment includes:

Acquiring driving state data and vehicle control quantities of the vehicle;

Predicting, by a steering control network model, the driving state data at the next sampling instant from the driving state data and vehicle control quantity at the current sampling instant, the steering control network model being a neural network model trained offline on a preset training data set;

If the comparison function value is greater than a preset threshold, adding the driving state data and vehicle control quantities collected within the preset time period to the training data set, and training the steering control model online on that training data set;

Controlling the steering of the intelligent vehicle based on the vehicle control quantity output by the execution network.

Further, the steering control network model is built on a BP (back-propagation) neural network, and a decision mechanism is introduced to control its online training. Specifically: the driving state data and vehicle control quantities of the vehicle are acquired; the driving state data and vehicle control quantities are input into the steering control network model, which predicts the driving state data at the corresponding next sampling instant; the comparison function value E between the driving state data collected within a set time length and the corresponding next-instant driving state data predicted by the steering control network model is calculated; and E is compared with a preset threshold T1. If E is greater than T1, the driving state data and vehicle control quantities collected within the set time are added to the preset training data set and the steering control network model is trained online on that training data set; otherwise the steering control model keeps its current state. The steering control network model is taken as the controlled object, a policy iteration algorithm is implemented with the evaluation network and the execution network to obtain the optimized execution network, and automatic steering control of the intelligent vehicle is achieved based on the vehicle control quantity output by the execution network.
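The decision mechanism described above can be sketched in Python as a small gate that buffers the most recent samples and reports when retraining is warranted. The class name `DecisionMechanism`, the default comparison `mean_squared_gap`, and the choice to clear the buffer after a trigger are assumptions made for illustration; the patent text does not specify them.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]


def mean_squared_gap(measured: List[State], predicted: List[State]) -> float:
    """Assumed comparison function: mean squared gap over the window."""
    return sum((m - p) ** 2
               for ms, ps in zip(measured, predicted)
               for m, p in zip(ms, ps)) / len(measured)


class DecisionMechanism:
    """Gate that decides whether collected data should trigger retraining."""

    def __init__(self, window: int, threshold: float,
                 compare: Callable[[List[State], List[State]], float]
                 = mean_squared_gap) -> None:
        self.records = deque(maxlen=window)  # keeps the last `window` steps
        self.threshold = threshold           # plays the role of T1
        self.compare = compare

    def step(self, measured: State, control: float,
             predicted: State) -> List[Tuple[State, float]]:
        """Return samples to add to the training set, or [] to keep the model."""
        self.records.append((measured, control, predicted))
        if len(self.records) < self.records.maxlen:
            return []  # window not full yet
        e = self.compare([r[0] for r in self.records],
                         [r[2] for r in self.records])
        if e <= self.threshold:
            return []  # prediction still matches the environment
        samples = [(r[0], r[1]) for r in self.records]
        self.records.clear()
        return samples
```

A caller would append the returned samples to the training data set and start online training only when the list is non-empty, so the model stays untouched while its predictions track the measurements.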

The intelligent vehicle steering control method based on policy iteration of this embodiment overcomes the drawback that online training has to collect data and train the neural network model continuously. It also solves the problem in the traditional approach that driving state data is fed into the neural network model without screening, which generates a large amount of invalid redundant data and degrades the real-time performance of neural network training.

Further, the step of "calculating a comparison function value from the driving state data collected within a preset time period and the corresponding next-instant driving state data predicted by the steering control network model" includes calculating the comparison function value by the function shown in formula (1):

Where x_s is the collected driving state data, x is the driving state data predicted by the steering control network model, c is the index of the driving state data, a is the set length of the comparison sequence, and b is the index of the comparison data. It should be noted that the value of a should be neither too large nor too small: if a is too small the decision mechanism becomes sensitive to noise, and if a is too large the decision mechanism becomes insensitive to environmental change.

Further, if the decision mechanism in the method of the present invention determines that the steering control network model needs online training, the training method is:

Step Sa11: Based on the driving state data and vehicle control quantities in the training data set, the steering control network model predicts the driving state data at the corresponding next sampling instant;

Step Sa12: Calculate the error between the prediction of the steering control network model and the set expected value;

Step Sa13: Optimize the weights of the steering control network model using the error back-propagation algorithm;

Step Sa14: Repeat steps Sa11-Sa13 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expected value falls within a set range.

Further, in this embodiment the steering control network model is a neural network model trained offline on a preset training data set. Because the method of the present invention performs steering control with a model that has already been trained offline, it overcomes the difficulty of initializing the steering control output in the initial stage of online training, when data is lacking. In the embodiment of the present invention, the offline training method of the steering control network model is:

Step Sb11: Acquire and preprocess the driving state data and vehicle control quantities of the vehicle to obtain a training data set;

Step Sb12: Based on the driving state data and vehicle control quantities in the training data set, the steering control network model predicts the driving state data at the corresponding next sampling instant;

Step Sb13: Calculate the error between the prediction of the steering control network model and the set expected value;

Step Sb14: Optimize the weights of the steering control network model using the error back-propagation algorithm;

Step Sb15: Repeat steps Sb12-Sb14 until the maximum number of iterations is reached or the error between the prediction of the steering control network model and the set expected value falls within a set range.
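The predict, error, back-propagate, repeat loop of steps Sb12-Sb15 can be sketched in pure Python with a small one-hidden-layer network. The network size, the tanh activation, the learning rate, the per-sample updates and the mean-squared-error stopping test are all assumptions made for illustration; the patent text only states that a BP neural network is trained until an iteration limit or an error range is reached.

```python
import math
import random
from typing import Callable, Sequence, Tuple

# ((state components..., control quantity), one next-state component)
Sample = Tuple[Sequence[float], float]


def train_bp(data: Sequence[Sample], hidden: int = 4, lr: float = 0.05,
             max_iters: int = 2000, tol: float = 1e-4, seed: int = 0
             ) -> Tuple[Callable[[Sequence[float]], float], float]:
    """Train a one-hidden-layer tanh network by error back-propagation."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x: Sequence[float]):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    mse = float("inf")
    for _ in range(max_iters):                 # stop at the iteration limit
        sq_err = 0.0
        for x, target in data:
            h, y = forward(x)                  # predict the next state
            err = y - target                   # error against expectation
            sq_err += err * err
            for j in range(hidden):            # back-propagate the error
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
        mse = sq_err / len(data)
        if mse < tol:                          # or when the error is in range
            break
    return (lambda x: forward(x)[1]), mse
```

The same loop would serve for online training (steps Sa11-Sa14); the only difference is that the training data set has been extended by the decision mechanism before the loop is run again.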

Further, when the driving state data and vehicle control quantities of the intelligent vehicle are acquired in step Sb11 above, data covering many driving states can be obtained by continually changing the driving state of the vehicle. The acquired driving state data includes vehicle-related data and road-related data. The vehicle-related data includes speed and acceleration. The road-related data includes the lateral offset on the road and the angle between the vehicle's direction of travel and the road direction. The road-related data is acquired by one or more of a camera, a lidar, and GPS. The vehicle-related data is acquired by the vehicle's IMU and/or wheel encoders.

The acquired driving state data and vehicle control quantities are preprocessed. The preprocessing specifically includes denoising, screening out abnormal data, and duplicate state information processing. Preprocessing is needed because, during data collection, the sensors and the vehicle itself inevitably introduce noise and outliers, so the data is denoised according to the characteristics of the noise source, outliers are screened out, and duplicate state information is processed. Duplicate state information is processed as follows: the difference value between the collected driving state data and vehicle control quantity and the driving state data and vehicle control quantities in the training data set is calculated and compared with a set threshold. If the difference value is greater than the set threshold, the collected driving state data and vehicle control quantity are added to the training data set; otherwise they are discarded.

Specifically, the difference value between the collected driving state data and vehicle control quantity and the driving state data and vehicle control quantities in the training data set is calculated by the function shown in formula (2):

Referring to Fig. 1, which illustrates the flow of the adaptive dynamic programming algorithm based on policy iteration according to an embodiment of the present invention: in this embodiment the steering control network model is taken as the controlled object, and a policy iteration algorithm is implemented with the evaluation network and the execution network to obtain the optimized execution network. The input of the steering network model is the driving state data and vehicle control quantity at the current sampling instant, and its output is the driving state data at the next sampling instant. The evaluation network is built on a neural network and assesses the performance index at the current sampling instant; its inputs are the output of the steering network model and the output of the execution network. The execution network needs to be initialized before the algorithm is used (setting its weights to small positive numbers is sufficient). With the output of the evaluation network as feedback, the weights of the execution network are updated iteratively with an algorithm such as stochastic gradient descent so as to reduce the output of the evaluation network.

Further, in this embodiment the utility function of the evaluation network is as shown in formula (3):

Where Q = I, R = 0.5I, I is the identity matrix, T denotes the transpose, x_k is the actual driving state data, and u_k is the actual vehicle control quantity. The evaluation network can then evaluate the performance index at the current sampling instant based on formula (4):

In the embodiment of the present invention, while the policy-iteration adaptive dynamic programming algorithm is executed, the execution network is optimized based on the functions shown in formulas (5) and (6):

V_i(x_k) = U(x_k, υ_i(x_k)) + V_i(x_{k+1})    (5)

Where V_i(x_k) is the performance index function under the control policy, U(x_k, υ_i(x_k)) is the utility function at the current sampling instant, V_i(x_{k+1}) is the performance index function after the control network has been iterated, υ_i(x_k) is the vehicle control quantity under the current control policy at the current sampling instant, υ_{i+1}(x_k) is the vehicle control quantity at the current sampling instant after the control network has been iterated, u_k is the actual vehicle control quantity, υ_k is the vehicle control quantity at the current sampling instant, i is the iteration number, x_k is the actual driving state data, and F(x_k, u_k) is the abstract function of the steering control network model.

Referring to Fig. 2, which illustrates the flow of the decision mechanism according to an embodiment of the present invention, the specific steps of the intelligent vehicle steering control method based on policy iteration according to another embodiment of the present invention are described below with reference to the decision mechanism shown in Fig. 2:

Acquire the driving state data and vehicle control quantities of the intelligent vehicle;

Denoise the acquired data, screen out outliers, and process duplicate state information;

Build a training data pool from the processed driving data;

Train the steering control network model offline using the data in the training data pool;

During vehicle travel, acquire the vehicle's driving data and the output data of the vehicle steering control network model;

Introduce the decision mechanism: calculate the comparison function value E between the driving state data collected within a set time length and the corresponding next-instant driving state data predicted by the steering control network model, and preset a trigger threshold T for the decision mechanism. When E is greater than T, add the driving state data and vehicle control quantities measured within that time length to the training data pool, and train the steering control network model online on the training data pool. Otherwise, the steering control network model keeps its current state;

Taking the steering control network model as the controlled object, build the evaluation network and the execution network to implement the policy-iteration adaptive dynamic programming algorithm, obtain the optimized execution network, and control the steering of the intelligent vehicle based on the vehicle control quantity output by the execution network.

It should be noted that the entire algorithm in the embodiments of the present invention is implemented on the computing device on board the unmanned vehicle, and the decision mechanism decides whether the detected data is added to the training data pool and whether model network training is restarted.

Those skilled in the art will recognize that the method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of electronic hardware and software, the composition and steps of each example have been described above in general terms according to their function. Whether these functions are performed in electronic hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.

The term "include" and any similar term are intended to cover a non-exclusive inclusion, so that a process or method that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process or method.

The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. Those skilled in the art will readily understand, however, that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the present invention.

Claims

1. An intelligent vehicle steering control method based on policy iteration, characterized by comprising:

acquiring driving state data and vehicle control quantities of a vehicle;

predicting, by a steering control network model, the driving state data at the next acquisition moment from the driving state data and vehicle control quantity at the current acquisition moment, the steering control network model being a neural network model trained offline on a preset training data set;

calculating a comparison function value from each item of driving state data acquired within a predetermined time period and the corresponding driving state data at the next acquisition moment predicted by the steering control network model;

if the comparison function value is greater than a preset threshold, adding the driving state data and vehicle control quantities acquired within the predetermined time period to the preset training data set, and training the steering control network model online based on the training data set;

taking the steering control network model as the control target, implementing an adaptive dynamic algorithm with policy iteration based on an evaluation network and an execution network, to obtain the optimized execution network;

2. The intelligent vehicle steering control method based on policy iteration according to claim 1, characterized in that the step of "calculating a comparison function value from each item of driving state data acquired within a predetermined time period and the corresponding driving state data at the next acquisition moment predicted by the steering control network model" comprises calculating the comparison function value according to the comparison function $E$ shown in the following formula:

where $x_s$ is the acquired driving state data, $x$ is the driving state data predicted by the steering control network model, $c$ is the index of the driving state data, $a$ is the set length of the comparison sequence, and $b$ is the index of the comparison data.
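The comparison-and-threshold mechanism of claims 1 and 2 can be sketched as follows. This is a minimal illustration only: the mean squared error used here is an assumed stand-in for the comparison function $E$, and the names `comparison_value`, `maybe_extend_training_set`, `predict`, and the scalar state representation are hypothetical.

```python
def comparison_value(observed, predicted):
    """Mean squared difference between observed and predicted states.
    Assumption: a stand-in for the comparison function E, whose exact
    form may differ from this."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def maybe_extend_training_set(training_set, window, predict, threshold):
    """window: list of (state, control, next_state) tuples acquired within
    one predetermined time period. predict(state, control) plays the role
    of the steering control network model (hypothetical callable).

    Returns True when the window was added to the training set, i.e. when
    online training of the model should be triggered."""
    observed = [nxt for _, _, nxt in window]
    predicted = [predict(s, u) for s, u, _ in window]
    if comparison_value(observed, predicted) > threshold:
        training_set.extend(window)
        return True
    return False
```

With this arrangement the model is retrained only when its predictions drift from the acquired data, so a window that the model already predicts well leaves the training set unchanged.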

3. The intelligent vehicle steering control method based on policy iteration according to claim 2, characterized in that the step of training the steering control network model online comprises:

Step S101: based on the driving state data and vehicle control quantities in the training data set, the steering control network model predicts the driving state data at the corresponding next acquisition moment;

Step S104: repeating steps S101-S103 until the maximum number of iterations is reached or the error between the prediction result of the steering control network model and the set expectation falls within a certain range.

4. The intelligent vehicle steering control method based on policy iteration according to claim 3, characterized in that the utility function of the evaluation network is:

where $Q = I$, $R = 0.5I$, $I$ is the identity matrix, $T$ is the transpose symbol, $x_k$ is the actual driving state data, and $u_k$ is the actual vehicle control quantity;

where $V(x_k)$ is the performance index function at the current acquisition moment, $\upsilon_i(x_{k+l})$ is the vehicle control quantity at the current acquisition moment, $x$ is the driving state data of the vehicle, $k$ is the index of the current acquisition moment, and $l$ is the time index; as $l \to \infty$, the vehicle's heading becomes parallel to the road direction and $U(x_{k+l}, \upsilon_i(x_{k+l})) \to 0$.
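For orientation, a quadratic utility and an infinite-horizon cost of the kind commonly used in adaptive dynamic programming would be consistent with the symbols defined in claim 4. The exact expressions below are an assumption for illustration, not the claimed formulas:

```latex
\begin{align}
U(x_k, u_k) &= x_k^{T} Q\, x_k + u_k^{T} R\, u_k, \qquad Q = I,\; R = 0.5I \\
V(x_k) &= \sum_{l=0}^{\infty} U\bigl(x_{k+l},\, \upsilon_i(x_{k+l})\bigr)
\end{align}
```

Under this assumed form, the sum can only be finite if $U(x_{k+l}, \upsilon_i(x_{k+l})) \to 0$ as $l \to \infty$, which matches the condition stated in the claim.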

5. The intelligent vehicle steering control method based on policy iteration according to claim 4, characterized in that, in the process of implementing the adaptive dynamic algorithm with policy iteration using the evaluation network and the execution network, the execution network is optimized based on the functions shown below:

$$V_i(x_k) = U(x_k, \upsilon_i(x_k)) + V_i(x_{k+1})$$

where $V_i(x_k)$ is the performance index function under the control strategy, $U(x_k, \upsilon_i(x_k))$ is the utility function at the current acquisition moment, $V_i(x_{k+1})$ is the performance index function after the control network iterates, $\upsilon_i(x_k)$ is the vehicle control quantity under the current control strategy at the current acquisition moment, $\upsilon_{i+1}(x_k)$ is the vehicle control quantity at the current acquisition moment after the control network iterates, $u_k$ is the actual vehicle control quantity, $\upsilon_k$ is the vehicle control quantity at the current acquisition moment, $i$ is the iteration number, $x_k$ is the actual driving state data, and $F(x_k, u_k)$ is the abstract function of the steering control network model.
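The recursion in claim 5 can be illustrated with a small policy-iteration sketch. Everything concrete here is an assumption: the scalar linear function `F` is a hypothetical stand-in for the steering control network model, the utility `U` is a scalar quadratic analogue of $Q = I$, $R = 0.5I$, a rollout and a grid search replace the evaluation and execution networks, and the improvement step (choosing the control that minimises $U(x_k, u_k) + V_i(F(x_k, u_k))$) is the textbook policy-improvement rule rather than a formula taken from the claims.

```python
def F(x, u):
    """Hypothetical stand-in for the steering control network model F(x_k, u_k)."""
    return 0.9 * x + 0.5 * u


def U(x, u):
    """Assumed quadratic utility with Q = 1 and R = 0.5 (scalar case)."""
    return x * x + 0.5 * u * u


def evaluate(policy, x, horizon=100):
    """Approximate V_i(x) by unrolling
    V_i(x_k) = U(x_k, v_i(x_k)) + V_i(x_{k+1}) for a fixed number of steps."""
    total = 0.0
    for _ in range(horizon):
        u = policy(x)
        total += U(x, u)
        x = F(x, u)
    return total


def improve(policy, candidates):
    """Return a new policy that, at each state, picks the candidate control
    minimising U(x, u) + V_i(F(x, u)), with V_i evaluated under `policy`."""
    def new_policy(x):
        return min(candidates, key=lambda u: U(x, u) + evaluate(policy, F(x, u)))
    return new_policy


# One policy-iteration step starting from a "do nothing" policy.
zero_policy = lambda x: 0.0
candidates = [i / 10 for i in range(-20, 21)]  # assumed control grid
better_policy = improve(zero_policy, candidates)
```

Because the candidate grid contains the control chosen by the initial policy, the improved policy can never score worse under this cost, so `evaluate(better_policy, 1.0)` comes out lower than `evaluate(zero_policy, 1.0)`.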

6. The intelligent vehicle steering control method based on policy iteration according to claim 1, characterized in that the step of training the steering control network model offline comprises:

Step S201: acquiring the driving state data and vehicle control quantities of the vehicle and preprocessing them to obtain the training data set;

Step S202: based on the driving state data and vehicle control quantities in the training data set, the steering control network model predicts the driving state data at the corresponding next acquisition moment;

Step S205: repeating steps S202-S204 until the maximum number of iterations is reached or the error between the prediction result of the steering control network model and the set expectation falls within a certain range.

7. The intelligent vehicle steering control method based on policy iteration according to claim 6, characterized in that the preprocessing comprises: denoising, screening out abnormal data, and duplicate-state information processing;

the duplicate-state information processing comprises: calculating the difference value between the acquired driving state data and vehicle control quantity and the driving state data and vehicle control quantity in the training data set, comparing the difference value with a set threshold, and, if the difference value is greater than the set threshold, adding the acquired driving state data and vehicle control quantity to the training data set.

8. The intelligent vehicle steering control method based on policy iteration according to claim 7, characterized in that, in "calculating the difference value between the acquired driving state data and vehicle control quantity and the driving state data and vehicle control quantity in the training data set", the difference value is calculated according to the function shown in the following formula:

where $x_s^j$ is the acquired vehicle speed, $x_t^j$ is the speed in the training data set, $\mathit{xo}_s$ is the acquired angle between the vehicle's heading and the road direction, $\mathit{xo}_t$ is the angle between the vehicle's heading and the road direction in the training data set, $u_s$ is the acquired vehicle control quantity, $u_t$ is the vehicle control quantity in the training data set, and $m$ is the index of the training data.
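The duplicate-state filtering of claims 7 and 8 can be sketched as below. The difference measure (sum of squared differences over speed, heading angle, and control quantity) and the rule that a sample must differ from every stored sample are assumptions made for illustration, and the names `difference` and `add_if_novel` are hypothetical.

```python
def difference(sample, stored):
    """Assumed difference measure between two (speed, heading_angle, control)
    tuples: sum of squared component differences. The claimed formula may
    combine these terms differently."""
    return sum((a - b) ** 2 for a, b in zip(sample, stored))


def add_if_novel(training_set, sample, threshold):
    """Append `sample` only if its difference from every stored sample
    exceeds `threshold`; near-duplicates of existing data are skipped.
    Returns True when the sample was added."""
    if all(difference(sample, stored) > threshold for stored in training_set):
        training_set.append(sample)
        return True
    return False
```

Filtering in this way keeps the training set from filling up with repeated states, for example long stretches of straight driving at constant speed.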

9. The intelligent vehicle steering control method based on policy iteration according to any one of claims 1 to 8, characterized in that the driving state data comprises speed, acceleration, lateral offset relative to the road, and the angle between the vehicle's heading and the road direction.