CN107885086A - Autonomous navigation device control parameter on-line adjustment method based on MCMC-optimized Q-learning - Google Patents
Autonomous navigation device control parameter on-line adjustment method based on MCMC-optimized Q-learning
- Publication number
- CN107885086A (application CN201711144395.2A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses an on-line adjustment method for the control parameters of an autonomous navigation device based on MCMC-optimized Q-learning, comprising the following steps. First, the possible changes of the ROV PID control parameters are enumerated according to the actual situation to obtain the set of parameter-adjustment actions, and the PID control parameters are initialized from the ROV's control experience. Then one action is selected at random and applied to the autonomous navigation device; according to the value-function value Q* of each action obtained by the Q-learning algorithm, the action to be taken at the next moment is obtained by sampling with the MCMC algorithm, and the learning factor l of the Q-learning algorithm is adjusted over time with the SPSA step-size adjustment algorithm. Finally, through repeated adjustment of the control parameters, the optimal control parameters under the current environment are obtained. The invention solves the overshoot and delay problems of the autonomous navigation device during navigation, so that the device adapts rapidly to changes in the environment and reaches its destination quickly and stably.
Description
Technical field
The invention belongs to the field of on-line tuning of autonomous navigation device control parameters, and specifically relates to a method for adjusting the control parameters of an autonomous navigation device.
Background technology
Autonomous navigation of an ROV means that the ROV is given a destination on the water surface, plans a suitable path by itself, and finally reaches the destination through continuous self-adjustment. This has important application value in water-quality inspection, surface operations and similar tasks.
At present, traditional autonomous navigation devices use the fixed-PID-parameter method: the ROV control parameters are fixed values obtained from extensive engineering experience with ROV autonomous navigation. When the fixed control parameters do not suit the current environment, they cause overshoot and response-delay problems during autonomous navigation. Especially when the environment is changeable, fixed control parameters may respond well to individual environmental states but cannot satisfy all of them, and the ROV control parameters must be changed manually whenever the environment changes, which is inconvenient for the use of the ROV.
There are also methods that adjust the ROV control parameters with fuzzy algorithms or annealing algorithms. These methods introduce a self-correcting mechanism for the control parameters to some extent, but because they are not intelligent control algorithms in themselves, they still cannot quickly adjust the control parameters of the autonomous navigation device to the optimal values when the environment is changeable.
The content of the invention
To overcome the above shortcomings of the prior art, the present invention provides an autonomous navigation device control parameter on-line adjustment method based on MCMC-optimized Q-learning, so as to solve the overshoot and time-delay problems of the autonomous navigation device during navigation and let it adapt rapidly to changes in the environment and reach its destination quickly and stably.
In order to achieve the above object, the technical solution adopted by the present invention is as follows.
The autonomous navigation device control parameter on-line adjustment method based on MCMC-optimized Q-learning of the present invention comprises the following steps:
Step 1: according to the control accuracy α of the autonomous navigation device, obtain the adjustment parameters Δk_p, Δk_i and Δk_d of the three PID control parameters k_p, k_i and k_d of the autonomous navigation device using formula (1):

  Δk_p ∈ {αX_p, 0, -αX_p},  Δk_i ∈ {αX_i, 0, -αX_i},  Δk_d ∈ {αX_d, 0, -αX_d}   (1)

In formula (1), X_p, X_i and X_d denote the threshold ranges of the three PID control parameters k_p, k_i and k_d of the autonomous navigation device.
Step 2: combine the adjustment parameters Δk_p, Δk_i and Δk_d to obtain the parameter-variation action set of the autonomous navigation device, denoted A = {a_1, a_2, ..., a_n, ..., a_N}, where a_n denotes the n-th control-parameter adjustment action in the parameter-variation action set, a_n = (Δk_p^n, Δk_i^n, Δk_d^n); Δk_p^n denotes the proportional adjustment parameter of the n-th action, Δk_i^n the integral adjustment parameter of the n-th action, and Δk_d^n the differential adjustment parameter of the n-th control-parameter adjustment action; n = 1, 2, ..., N.
Step 3: set time t = 1 and randomly select a control-parameter adjustment action to act on the autonomous navigation device.
Initialize the relevant parameters of the Q-learning algorithm: the learning factor l_t at time t and the discount factor γ, with l_t > 0 and γ ∈ [0, 1].
Initialize the three PID control parameters k_p, k_i and k_d according to the control experience of the autonomous navigation device.
Initialize the value-function estimate Q'(e_{t-1}, Δe_{t-1}, a_n^{t-1}) of the Q-learning algorithm at time t-1, where e_{t-1} denotes the error of the autonomous navigation device at time t-1 and Δe_{t-1} denotes its error rate at time t-1; e_{t-1} and Δe_{t-1} together form the environment state at time t-1.
Step 4: according to the number N of control-parameter adjustment actions in the parameter-variation action set A of the autonomous navigation device, initialize the transition matrix p_{nm}^{t-1} of the decision process in the Q-learning algorithm using formula (2):

  p_{nm}^{t-1} = [ p(a_m^{t-1} | a_n^{t-1}) ]   (2)

In formula (2), p(a_m^{t-1} | a_n^{t-1}) denotes the probability of transferring from control-parameter adjustment action a_n^{t-1} to control-parameter adjustment action a_m^{t-1} at time t-1, and at t = 1 all transition probabilities are initialised to the same value.
Step 5: use MCMC to optimize the Q-learning algorithm and obtain the decision of time t.
Step 5.1: calculate the value-function value Q*(e_t, Δe_t, a_n^t) of the n-th control-parameter adjustment action a_n^t of time t under the environment state using formula (3):

  Q*(e_t, Δe_t, a_n^t) = Σ_{j=1}^{nh} w_j(t-1) · y_j(t-1)   (3)

In formula (3), w_j(t-1) denotes the weight of the j-th hidden-layer node of the BP neural network at time t-1, j = 1, 2, ..., nh; nh denotes the number of hidden-layer nodes of the BP neural network; y_j(t-1) denotes the output of the j-th hidden-layer node of the BP neural network at time t-1, with:

  y_j(t-1) = (1 - e^{o_j(t-1)}) / (1 + e^{o_j(t-1)})   (4)

In formula (4), o_j(t-1) denotes the input of the j-th hidden-layer node of the BP neural network at time t-1, with:

  o_j(t-1) = Σ_{i=1}^{ni} w_{ij}(t-1) · x_i(t-1)   (5)

In formula (5), w_{ij}(t-1) denotes the weight from the i-th input-layer node to the j-th hidden-layer node of the BP neural network at time t-1, x_i(t-1) denotes the input of the i-th input-layer node at time t-1, i = 1, 2, ..., ni, and ni denotes the number of input-layer nodes of the BP neural network.
Step 5.2: sample with the MCMC algorithm to obtain the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t.
Step 5.2.1: according to the value-function value Q*(e_t, Δe_t, a_n^t) of the n-th control-parameter adjustment action a_n^t of time t under the environment state and the action chosen at time t-1, update the transition probability matrix p_{nm}^t of the decision process using formula (6): the row of p_{nm}^t corresponding to the action chosen at time t-1 is replaced by the normalised value-function values,

  p(a_m^t | a_n^t) = Q*_m(t) / Σ_{k=1}^{N} Q*_k(t),  m = 1, 2, ..., N,   (6)

while the remaining rows keep their previous entries. In formula (6), Q*_n(t) denotes the value-function value of the n-th control-parameter adjustment action a_n^t of time t, i.e. Q*_n(t) = Q*(e_t, Δe_t, a_n^t); Σ_k Q*_k(t) denotes the sum of the value-function values of all actions at time t, n = 1, 2, ..., N; p(a_m^t | a_n^t) denotes the probability of transferring from the n-th control-parameter adjustment action a_n^t to the m-th control-parameter adjustment action a_m^t at time t.
Step 5.2.2: set the sampling count c = 0, 1, 2, ..., C.
Step 5.2.3: draw the (c+1)-th sample from the transition probability matrix p_{nm}^t of time t, and obtain the acceptance rate α_{c+1}(a_n'^t, a_m'^t) of the (c+1)-th sample of time t in the MCMC algorithm using formula (7):

  α_{c+1}(a_n'^t, a_m'^t) = min{ [p_c(a_m'^t) · p(a_n'^t | a_m'^t)] / [p_c(a_n'^t) · p(a_m'^t | a_n'^t)] , 1 }   (7)

In formula (7), p_c(a_m'^t) denotes the probability value of the action a_m'^t obtained by the (c+1)-th sample of time t, and p_c(a_n'^t) denotes the probability value of the action a_n'^t obtained by the c-th sample of time t. When c = 0, the probability distribution p_c(a_n'^t) of the action obtained by the c-th sample of time t is set to the equal-probability distribution, i.e. 1/N.
Step 5.2.4: draw a random acceptance value u from the uniform distribution Uniform[0, 1] and compare it with the acceptance rate α_{c+1}(a_n'^t, a_m'^t). If u ≤ α_{c+1}(a_n'^t, a_m'^t), accept the action a_m'^t obtained by the (c+1)-th sample; otherwise, do not accept the action obtained by the (c+1)-th sample and assign a_n'^t to a_m'^t.
Step 5.2.5: update the probability distribution p_{c+1}(a_n'^t) of the action obtained by the (c+1)-th sample of time t using formula (8):

  p_{c+1}(a_n'^t) = (d_{n,c}^t + 1) / (σ_c^t + 1)   if a_n'^t = a_m'^t
  p_{c+1}(a_n'^t) = d_{n,c}^t / (σ_c^t + 1)         if a_n'^t ≠ a_m'^t   (8)

In formula (8), σ_c^t denotes the denominator of the probability distribution p_c(a_n'^t) of the action obtained by the c-th sample of time t, and d_{n,c}^t denotes its numerator; when c = 0, let d_{n,0}^t = 1 and σ_0^t = N, n = 1, 2, ..., N.
Step 5.2.6: assign c+1 to c and judge whether c > C holds. If so, execute step 5.2.7; otherwise, return to step 5.2.3 and continue in order.
Step 5.2.7: draw the (C+1)-th sample from the transition probability matrix p_{nm}^t of time t to obtain the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t, and let the value-function estimate Q'(e_t, Δe_t, a_n''^t) of time t be the value-function value Q*(e_t, Δe_t, a_n''^t) of the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t.
Step 6: obtain the behaviour return value r(e_t, Δe_t, a_n''^t) of the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t using formula (9):

  r(e_t, Δe_t, a_n''^t) = α × (e_t - e_{t-1}) + β × (Δe_t - Δe_{t-1})   (9)

In formula (9), α and β denote the error return parameter and the error-rate return parameter respectively, with 0 < α < 1, 0 < β < 1 and α + β = 1.
Step 7: update the value-function estimate Q'(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) of time t-1 into the final value-function value Q(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) of time t-1 using formula (10):

  Q(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) = Q'(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) + l_t · ΔQ(e_{t-1}, Δe_{t-1}, a_n''^{t-1})   (10)

In formula (10), ΔQ(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) denotes the final value-function difference, with:

  ΔQ(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) = r(e_t, Δe_t, a_n''^t) + γ · Q'(e_t, Δe_t, a_n''^t) - Q'(e_{t-1}, Δe_{t-1}, a_n''^{t-1})   (11)

Step 8: assign t+1 to t and judge whether t > t_max holds. If so, execute step 9; otherwise, adjust the learning factor l_t over time t according to the SPSA step-size adjustment algorithm using formula (12), where t_max denotes the maximum number of iterations that has been set:

  l_t = 1 / (t + μ)^λ   (12)

In formula (12), l is the learning-factor value at time t = 1, and μ and λ are the non-negative constants of the SPSA step-size adjustment algorithm.
Step 9: judge whether the final value-function values of two consecutive moments differ by less than ε. If so, the adjustment of the PID control parameters of the autonomous navigation device is finished; jump to step 11. Otherwise, execute step 10.
Step 10: judge whether t exceeds the stipulated time. If it does, jump to step 3, reselect an initial control-parameter adjustment action and adjust the PID control parameters of the autonomous navigation device; otherwise, jump to step 5 and continue adjusting the PID control parameters of the autonomous navigation device.
Step 11: let t = 1.
Step 12: the autonomous navigation device collects the environment state e_t and Δe_t of time t and judges whether |e_t| > |e_min| or |Δe_t| > |Δe_min| holds. If so, execute step 13; otherwise, return to step 11. Here e_min and Δe_min denote the minimum environment-state error and error rate allowed by the autonomous navigation device.
Step 13: assign t+1 to t and judge whether t > T holds. If so, execute step 3; otherwise, return to step 12. Here T denotes the time constant with which the autonomous navigation device adapts to the speed of environmental change.
Compared with the prior art, the beneficial effects of the present invention are:
1. The present invention uses the Q-learning algorithm to adjust the ROV autonomous-navigation control parameters on line, and introduces the MCMC sampling algorithm and the SPSA step-size adjustment algorithm into the Q-learning algorithm. This lets the ROV adapt to changes of the environment during autonomous navigation and anticipate the navigation conditions of the next moment in advance, solves the ROV's overshoot and time-delay problems, and makes the navigation process smoother, with particularly rapid parameter adjustment when the weather changes; the method therefore has broad application prospects in the field of ROV autonomous navigation.
2. The present invention introduces the Q-learning algorithm, which associates the ROV's control effect with the environment state. The quality of each parameter-adjustment action is judged through the return value fed back by the environment, and the parameter adjustment gradually approaches the direction that improves behaviour. This solves the overshoot and response-delay problems of the device during navigation and changes the control parameters quickly towards the optimal values as the environment changes, so that the device adapts rapidly to environmental change.
3. The present invention introduces the MCMC sampling algorithm into the traditional Q-learning algorithm for optimization. The policy for selecting the parameter-adjustment action at the current moment no longer simply takes the single action with the maximum behaviour value-function value; instead, the overall probability distribution is estimated through the transition probabilities between behaviour actions. This solves the problem that the Q-learning algorithm becomes trapped in local optima when selecting actions, and yields the optimal adjustment action policy during the navigation of the autonomous navigation device.
4. In the MCMC sampling algorithm, the present invention sets the action probability distribution of the initial sampling moment to the equal-probability distribution, so that the MCMC sampling algorithm samples the action behaviours generally in the early stage of operation. In the later stage the action probability distribution is updated with every sample, and the probability share of the action obtained by each sample is increased, which improves the correctness of the action sampling at every moment.
5. For the change of the learning factor l of the traditional Q-learning algorithm, the present invention adopts the SPSA step-size adjustment algorithm. Through the setting of the parameters of the SPSA step-size adjustment algorithm, the speed with which the learning factor l changes and the interval over which it changes are defined, so that the change of the learning factor l during the Q-learning algorithm has a certain regularity and the parameter adjustment of the autonomous navigation device is more accurate.
Brief description of the drawings
Fig. 1 is the principle block diagram of the autonomous navigation device control parameter on-line adjustment method of the present invention based on MCMC-optimized Q-learning;
Fig. 2 is a diagram of the MCMC optimization steps in the Q-learning algorithm of the present invention;
Fig. 3 is the flow chart of the autonomous navigation device control parameter on-line adjustment method of the present invention based on MCMC-optimized Q-learning;
Fig. 4 is a schematic diagram of solving the action behaviour value function with the BP neural network;
Fig. 5 compares the experimental results for the time consumed by the autonomous navigation device during navigation under different experiments, for the method of the invention and the traditional fixed-PID-parameter method;
Fig. 6 compares the experimental results for the real-time error e_t of the method of the invention and the traditional fixed-PID-parameter method when the environment does not change during the navigation of the autonomous navigation device;
Fig. 7 compares the experimental results for the real-time error e_t of the method of the invention and the traditional fixed-PID-parameter method when the environment changes during the navigation of the autonomous navigation device;
Fig. 8 compares the experimental results for the real-time error e_t of the method of the invention and the traditional fixed-PID-parameter method after the environment has changed during the navigation of the autonomous navigation device.
Embodiment
In the present embodiment, the principle of the autonomous navigation device control parameter on-line adjustment method based on MCMC-optimized Q-learning is shown in Fig. 1. The autonomous navigation device receives the error e_t and error rate Δe_t of the current environment in real time, the MCMC-optimized Q-learning algorithm decides in real time the parameter-adjustment action a_n of the next moment, and finally, when the final value-function value of the Q-learning algorithm no longer changes, the optimal control-parameter values under the current environment are obtained. The MCMC optimization steps within the Q-learning algorithm are shown in Fig. 2. The method is applied to the field of on-line tuning of autonomous navigation device control parameters, and adapts to the current environment by changing the control parameters of the autonomous navigation device.
As shown in Fig. 3, the autonomous navigation device control parameter on-line adjustment method is carried out as follows:
Step 1: the PID control parameters comprise the proportional parameter k_p, the integral parameter k_i and the differential parameter k_d. The proportional parameter k_p speeds up the response of the system and improves its regulation accuracy; the integral parameter k_i eliminates the steady-state error of the system; the differential parameter k_d improves the dynamic characteristics of the system.
According to the control accuracy α of the autonomous navigation device, the adjustment parameters Δk_p, Δk_i and Δk_d of the three PID control parameters k_p, k_i and k_d of the autonomous navigation device are obtained using formula (1). In formula (1), X_p, X_i and X_d denote the threshold ranges of the three PID control parameters k_p, k_i and k_d of the autonomous navigation device.
For example, with α = 0.1, X_p ∈ [10, 20], X_i ∈ [1, 6] and X_d ∈ [1, 2], formula (1) gives the possible changes of Δk_p as a positive increase of 1, no change, or a reverse decrease of 1; the possible changes of Δk_i and Δk_d are obtained similarly.
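For illustration only, the following minimal Python sketch evaluates formula (1) with the example values above; taking X_p = 10, X_i = 1 and X_d = 1 as the representative magnitudes of the threshold ranges is an assumption made for the example, not part of the method itself.

```python
# Sketch of formula (1): each PID parameter may be increased, held, or decreased
# by alpha times its threshold magnitude. Values follow the example above.
alpha = 0.1
X_p, X_i, X_d = 10.0, 1.0, 1.0              # assumed representative magnitudes

delta_kp = [alpha * X_p, 0.0, -alpha * X_p]  # e.g. +1, 0, -1
delta_ki = [alpha * X_i, 0.0, -alpha * X_i]
delta_kd = [alpha * X_d, 0.0, -alpha * X_d]
print(delta_kp, delta_ki, delta_kd)
```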
Traditional autonomous navigation devices use the fixed-PID-parameter method. Because of the uncertainty of the environment, this method brings overshoot and response-delay problems to the autonomous navigation device during navigation, and the PID parameters have to be modified manually to suit different environments. To address these problems, the Q-learning algorithm is introduced here to adjust the PID control parameters on line in real time.
The Q-learning algorithm is an intelligent learning algorithm proposed by Chris Watkins in 1989, which combines TD algorithms with dynamic programming; Watkins' work advanced the rapid development of reinforcement learning. Q-learning is a value-iteration reinforcement learning algorithm that is independent of the real system model. It beneficially combines the theory of dynamic programming with the psychology of animal learning, and is used to solve sequential optimal decision problems with delayed returns.
Step 2: in Q-learning, the control parameters of the autonomous navigation device have to be changed through decisions. If the adjustment of the PID parameters were split into three separate actions, the computational complexity of the Q-learning algorithm would increase; therefore the adjustment parameters Δk_p, Δk_i and Δk_d are combined to obtain the parameter-variation action set of the autonomous navigation device, denoted A = {a_1, a_2, ..., a_n, ..., a_N}, where a_n denotes the n-th control-parameter adjustment action in the parameter-variation action set, a_n = (Δk_p^n, Δk_i^n, Δk_d^n); Δk_p^n denotes the proportional adjustment parameter of the n-th action, Δk_i^n the integral adjustment parameter, and Δk_d^n the differential adjustment parameter of the n-th control-parameter adjustment action; n = 1, 2, ..., N. A minimal sketch of building this action set is given below.
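The sketch below builds the action set A from the three adjustment parameters; the numeric magnitudes are the assumed example values from step 1, and with three choices per parameter the set contains N = 27 actions.

```python
from itertools import product

# Sketch of step 2: each action a_n is one combination (dkp, dki, dkd) of the
# three adjustment parameters, so the action set A has N = 3 * 3 * 3 = 27 entries.
alpha, X_p, X_i, X_d = 0.1, 10.0, 1.0, 1.0           # assumed example values
delta_kp = [alpha * X_p, 0.0, -alpha * X_p]
delta_ki = [alpha * X_i, 0.0, -alpha * X_i]
delta_kd = [alpha * X_d, 0.0, -alpha * X_d]

A = list(product(delta_kp, delta_ki, delta_kd))      # A = {a_1, ..., a_N}
N = len(A)                                           # N = 27
```

Each a_n in A is then applied as (k_p, k_i, k_d) <- (k_p + Δk_p^n, k_i + Δk_i^n, k_d + Δk_d^n).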
Step 3: set time t = 1, randomly select a control-parameter adjustment action and apply it to the autonomous navigation device.
Initialize the relevant parameters of the Q-learning algorithm: the learning factor l_t at time t and the discount factor γ, with l_t > 0 and γ ∈ [0, 1].
The learning factor l_t of the Q-learning algorithm changes over time t. In the early stage of Q-learning, larger learning values need to be obtained from the sample data, so the initial learning factor l_t is a larger positive number; as t increases, the autonomous navigation device no longer needs very large learning values, so l_t is gradually reduced. The discount factor γ controls how much the autonomous navigation device weighs short-term against long-term results. Consider the two extreme cases: when γ = 0 the autonomous navigation device only considers the return value of the current environment, and when γ = 1 it only looks at the return value of future environments. The discount factor is therefore set according to the actual demand of the autonomous navigation device, and γ = 0.5 is typically taken so that the current moment and future moments are both considered.
Initialize the three PID control parameters k_p, k_i and k_d according to the control experience of the autonomous navigation device; for example, this experimental system initially sets the three control parameters to k_p = 2.5, k_i = 0.5 and k_d = 0.2.
Initialize the value-function estimate Q'(e_{t-1}, Δe_{t-1}, a_n^{t-1}) of the Q-learning algorithm at time t-1, where e_{t-1} denotes the error of the autonomous navigation device at time t-1 and Δe_{t-1} denotes its error rate; e_{t-1} and Δe_{t-1} form the environment state at time t-1. At time t = 1 the value-function estimate is set to 0, the error e_{t-1} = 0 and the error rate Δe_{t-1} = 0.
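As an illustration only, the initialisation of this experimental example can be written as the following sketch; the starting value l0 of the learning factor is an assumption, chosen only to satisfy l_t > 0.

```python
# Sketch of the step-3 initialisation with the example values given above.
kp, ki, kd = 2.5, 0.5, 0.2   # initial PID parameters from control experience
gamma = 0.5                  # discount factor: weighs current vs. future return
l0 = 1.0                     # assumed initial learning factor (l_t > 0)
Q_prev = 0.0                 # value-function estimate Q'(e_{t-1}, de_{t-1}, a) at t = 1
e_prev, de_prev = 0.0, 0.0   # error and error rate at time t-1
```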
In step 4 of the Q-learning algorithm, the autonomous navigation device must not only select the action with the maximum value-function value in order to obtain the maximum immediate return; it must also select different actions as far as possible, taking account of all actions so as to obtain the optimal policy. If the autonomous navigation device always selected the action with the highest value-function value, the following drawback would arise: if, in the early stage of acquiring experience, the autonomous navigation device has not yet learned the optimal strategy, then the later learning stages can never obtain the optimal strategy.
Therefore the MCMC sampling algorithm is introduced into the Q-learning algorithm to decide the action chosen at every moment. By sampling the action transition matrix, the MCMC sampling algorithm obtains sampled values that satisfy the action probability distribution, and can accurately sample the action chosen at every moment even when the probability distribution is unknown.
According to the number N of control-parameter adjustment actions in the parameter-variation action set A of the autonomous navigation device, the transition matrix p_{nm}^{t-1} of the decision process in the Q-learning algorithm is initialized using formula (2). In formula (2), p(a_m^{t-1} | a_n^{t-1}) denotes the probability of transferring from control-parameter adjustment action a_n^{t-1} to control-parameter adjustment action a_m^{t-1} at time t-1, and at t = 1 all transition probabilities are initialised to the same value.
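A minimal sketch of this initialisation is given below; initialising every transition probability to the same value 1/N is an assumption consistent with the equal-probability initial distributions used elsewhere in the method.

```python
import numpy as np

# Sketch of step 4 / formula (2): at t = 1 every transition probability
# p(a_m | a_n) between parameter-adjustment actions gets the same value.
N = 27                              # number of actions in A (see step 2)
P = np.full((N, N), 1.0 / N)        # transition matrix of the decision process
```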
Step 5: use MCMC to optimize the Q-learning algorithm and obtain the decision of time t.
Step 5.1: the BP neural network has the ability to approximate arbitrary nonlinear functions and plays an important role in solving generalization problems over large and continuous state spaces; the principle of solving the action behaviour value function with the BP neural network is shown in Fig. 4. The value-function value Q*(e_t, Δe_t, a_n^t) of the n-th control-parameter adjustment action a_n^t of time t under the environment state is calculated using formula (3). In formula (3), w_j(t-1) denotes the weight of the j-th hidden-layer node of the BP neural network at time t-1, j = 1, 2, ..., nh, and nh denotes the number of hidden-layer nodes of the BP neural network; y_j(t-1) denotes the output of the j-th hidden-layer node at time t-1 and is given by formula (4). In formula (4), o_j(t-1) denotes the input of the j-th hidden-layer node and is given by formula (5). In formula (5), w_{ij}(t-1) denotes the weight from the i-th input-layer node to the j-th hidden-layer node at time t-1, x_i(t-1) denotes the input of the i-th input-layer node at time t-1, i = 1, 2, ..., ni, and ni denotes the number of input-layer nodes of the BP neural network.
For example, ni = 3 means the BP neural network has 3 input-layer nodes, namely the error e_{t-1}, the error rate Δe_{t-1} and the action input; nh = 5 means it contains five hidden-layer nodes. In general, more hidden-layer nodes give higher calculation accuracy but greater computational complexity. At time t = 1 the hidden-layer weights are set to w_j(t-1) = 1, j = 1, 2, ..., nh, and the input-layer weights to w_{ij}(t-1) = 0.8, i = 1, 2, ..., ni.
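For illustration, a minimal Python sketch of the forward pass of formulas (3)-(5) with the example sizes and weights above is given below; the function name q_value and the sample input values are assumptions made only for the example.

```python
import numpy as np

def q_value(x, w_in, w_hidden):
    """Sketch of formulas (3)-(5): BP-network estimate of Q*(e_t, de_t, a_n).
    x        : ni inputs (error, error rate, action input)
    w_in     : (ni, nh) input-to-hidden weights w_ij(t-1)
    w_hidden : (nh,)    hidden-layer weights w_j(t-1)
    """
    o = x @ w_in                                  # formula (5): hidden inputs o_j
    y = (1.0 - np.exp(o)) / (1.0 + np.exp(o))     # formula (4): hidden outputs y_j
    return float(w_hidden @ y)                    # formula (3): sum_j w_j * y_j

# Example with the values given above: ni = 3, nh = 5, w_ij = 0.8, w_j = 1.
ni, nh = 3, 5
w_in = np.full((ni, nh), 0.8)
w_hidden = np.ones(nh)
print(q_value(np.array([0.2, 0.05, 1.0]), w_in, w_hidden))  # assumed sample inputs
```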
Step 5.2: sample with the MCMC algorithm to obtain the control-parameter adjustment action of the autonomous navigation device at time t.
Step 5.2.1: according to the value-function value Q*(e_t, Δe_t, a_n^t) of the n-th control-parameter adjustment action a_n^t of time t under the environment state and the action chosen at time t-1, update the transition probability matrix p_{nm}^t of the decision process using formula (6). In formula (6), Q*_n(t) denotes the value-function value of the n-th control-parameter adjustment action a_n^t of time t, i.e. Q*_n(t) = Q*(e_t, Δe_t, a_n^t); ΣQ*_n(t) denotes the sum of the value-function values of all actions at time t, n = 1, 2, ..., N; and p(a_m^t | a_n^t) denotes the probability of transferring from the n-th control-parameter adjustment action a_n^t to the m-th control-parameter adjustment action a_m^t at time t.
Step 5.2.2: set the sampling count c = 0, 1, 2, ..., C.
Step 5.2.3: draw the (c+1)-th sample from the transition probability matrix p_{nm}^t of time t, and obtain the acceptance rate of the (c+1)-th sample of time t in the MCMC algorithm using formula (7). In formula (7), p_c(a_m'^t) denotes the probability value of the action a_m'^t obtained by the (c+1)-th sample of time t, and p_c(a_n'^t) denotes the probability value of the action a_n'^t obtained by the c-th sample of time t. When c = 0, the probability distribution of the action obtained by the c-th sample of time t is set to the equal-probability distribution, i.e. 1/N.
From formula (7) it can be seen that, since p_c(a_n'^t) and the transition probabilities are fixed at time t, the larger the probability value corresponding to the action obtained by the (c+1)-th sample, the larger the acceptance rate of the sample; otherwise the acceptance rate is smaller.
Because the MCMC sampling algorithm obtains sampled values that satisfy the action probability distribution p(a_n) by sampling the action transition probability matrix, the action probability distribution p(a_n) can be set arbitrarily when the MCMC sampling algorithm starts. Setting the probability distribution of the action to the equal-probability distribution at the start of sampling gives the ROV the same sampling probability for every action, which guarantees the correctness of the sampling of the action at every moment in the Q-learning algorithm.
Step 5.2.4: draw a random acceptance value u from the uniform distribution Uniform[0, 1] and compare it with the acceptance rate α_{c+1}(a_n'^t, a_m'^t). If u ≤ α_{c+1}(a_n'^t, a_m'^t), accept the action a_m'^t obtained by the (c+1)-th sample; otherwise, do not accept the action obtained by the (c+1)-th sample and assign a_n'^t to a_m'^t.
For example, with a random acceptance value u = 0.5: if the sampling acceptance rate obtained from formula (7) is smaller than u, the sample is considered to have failed and the sampled action value remains unchanged; if the sampling acceptance rate obtained from formula (7) is not smaller than u, the sample is considered successful and the sampled action value becomes a_m'^t.
Step 5.2.5: update the probability distribution p_{c+1}(a_n'^t) of the action obtained by the (c+1)-th sample of time t using formula (8). In formula (8), σ_c^t denotes the denominator of the probability distribution p_c(a_n'^t) of the action obtained by the c-th sample of time t, and d_{n,c}^t denotes its numerator; when c = 0, let d_{n,0}^t = 1 and σ_0^t = N.
Step 5.2.6: assign c+1 to c and judge whether c > C holds. If so, execute step 5.2.7; otherwise, return to step 5.2.3 and continue in order.
Step 5.2.7: draw the (C+1)-th sample from the transition probability matrix p_{nm}^t of time t to obtain the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t, and let the value-function estimate Q'(e_t, Δe_t, a_n''^t) of time t be the value-function value of the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t.
According to the MCMC algorithm, the probability distribution of the sampled action is essentially stationary when the number of samples c reaches 100, so C = 100 is generally set; the number of samples C can also be set according to the accuracy of the vehicle system.
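The sampling loop of step 5.2 can be sketched as follows. This is a Metropolis-Hastings-style reading of formulas (7) and (8); the function name, the random-number generator and the exact bookkeeping of the count-based distribution update are illustrative assumptions rather than the literal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcmc_select_action(P, a_prev, C=100):
    """Sketch of step 5.2: MCMC selection of the next parameter-adjustment action.
    P      : (N, N) row-stochastic transition matrix of the decision process
    a_prev : index of the action chosen at the previous time step
    """
    N = P.shape[0]
    num = np.ones(N)                  # numerators d_{n,c} of the action distribution
    den = float(N)                    # shared denominator sigma_c
    p = num / den                     # equal-probability distribution at c = 0
    a_cur = a_prev
    for _ in range(C):
        a_cand = rng.choice(N, p=P[a_cur])                 # step 5.2.3: draw candidate
        ratio = (p[a_cand] * P[a_cand, a_cur]) / (p[a_cur] * P[a_cur, a_cand])
        accept = min(ratio, 1.0)                           # formula (7): acceptance rate
        if rng.uniform() < accept:                         # step 5.2.4: accept or keep
            a_cur = a_cand
        num[a_cur] += 1.0                                  # formula (8): raise the share
        den += 1.0                                         # of the retained action
        p = num / den                                      # step 5.2.5: updated p_c+1
    return int(rng.choice(N, p=P[a_cur]))                  # step 5.2.7: final sample
```

With C = 100 samples per time step, the empirical distribution p essentially stabilises, matching the setting described above.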
Step 6: obtain the behaviour return value r(e_t, Δe_t, a_n''^t) of the control-parameter adjustment action a_n''^t of the autonomous navigation device at time t using formula (9). In formula (9), α and β denote the error return parameter and the error-rate return parameter respectively, with 0 < α < 1, 0 < β < 1 and α + β = 1.
The behaviour return value r(e_t, Δe_t, a_n''^t) describes the running situation of the ROV after the parameter-adjustment action of time t has acted on the autonomous navigation device. If the returned environment state becomes worse, the behaviour return value is a negative number, representing punishment; if the returned environment state improves, the behaviour return value is a positive number, representing reward; if the returned environment state does not change, the behaviour return value is zero, representing holding. The environment state of the autonomous navigation device comprises the error e_t and the error rate Δe_t, so the environment-state return parameters α and β are introduced according to their different importance to determine the degree of influence of the different states; typically α = 0.8 and β = 0.2 are set.
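A one-line sketch of formula (9) with the typical weights α = 0.8 and β = 0.2 is given below; the function name is an assumption made for the example.

```python
# Sketch of formula (9): the return of the action taken at time t, weighting the
# change in error and in error rate (alpha = 0.8, beta = 0.2 as above).
def action_return(e_t, de_t, e_prev, de_prev, alpha=0.8, beta=0.2):
    return alpha * (e_t - e_prev) + beta * (de_t - de_prev)
```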
Step 7: update the value-function estimate Q'(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) of time t-1 into the final value-function value Q(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) of time t-1 using formula (10). In formula (10), ΔQ(e_{t-1}, Δe_{t-1}, a_n''^{t-1}) denotes the final value-function difference given by formula (11).
Step 8: assign t+1 to t and judge whether t > t_max holds. If so, execute step 9; otherwise, adjust the learning factor l_t over time t according to the SPSA step-size adjustment algorithm using formula (12), where t_max denotes the maximum number of iterations that has been set. In formula (12), l is the learning-factor value at time t = 1, and μ and λ are the non-negative constants of the SPSA step-size adjustment algorithm.
Introducing the SPSA step-size adjustment algorithm gives the change of the learning factor l_t in Q-learning a certain regularity. Through the setting of the non-negative parameters μ and λ of the SPSA step-size adjustment algorithm, the speed with which the learning factor l_t changes and the interval over which it changes are defined, which makes the parameter adjustment of the ROV more accurate; typically t_max = 30, μ = 0.3 and λ = 1.2 are set.
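For illustration, formulas (10)-(12) can be sketched as the following pair of functions; the function names are assumptions made for the example.

```python
# Sketch of formulas (10)-(12): one temporal-difference update of the stored value
# function, and the SPSA-style decay of the learning factor l_t.
def spsa_learning_factor(t, mu=0.3, lam=1.2):
    return 1.0 / (t + mu) ** lam                 # formula (12)

def q_update(Q_prev, Q_curr, r, l_t, gamma=0.5):
    dQ = r + gamma * Q_curr - Q_prev             # formula (11): value-function difference
    return Q_prev + l_t * dQ                     # formula (10): final value-function value
```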
Step 9: judge whether the final value-function values of two consecutive moments differ by less than ε. If so, the adjustment of the PID control parameters of the autonomous navigation device is finished; jump to step 11. Otherwise, execute step 10.
ε is a very small positive number used to judge whether the adjustment of the PID control parameters is finished, and is related to the control accuracy of the ROV. The smaller ε is, the higher the accuracy of ROV autonomous navigation and the closer the obtained ROV PID control parameters are to the optimal values; typically ε = 0.2 is set.
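A minimal sketch of this stopping test, under the reading that the final value-function values of two consecutive moments must differ by less than ε, is given below; the function name is an assumption.

```python
# Sketch of step 9: the PID adjustment is taken to be finished when the final
# value-function values of two consecutive moments differ by less than eps.
def adjustment_finished(Q_t, Q_t_minus_1, eps=0.2):
    return abs(Q_t - Q_t_minus_1) < eps
```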
Step 10: judge whether t exceeds the stipulated time. If it does, jump to step 3, reselect an initial control-parameter adjustment action and adjust the PID control parameters of the autonomous navigation device; otherwise, jump to step 5 and continue adjusting the PID control parameters of the autonomous navigation device.
Step 11: let t = 1.
Step 12: the autonomous navigation device collects the environment state e_t and Δe_t of time t and judges whether |e_t| > |e_min| or |Δe_t| > |Δe_min| holds. If so, execute step 13; otherwise, return to step 11. Here e_min and Δe_min denote the minimum environment-state error and error rate allowed by the autonomous navigation device; typical settings are e_min = 0.1 and Δe_min = 0.05.
Step 13: assign t+1 to t and judge whether t > T holds. If so, execute step 3; otherwise, return to step 12. Here T denotes the time constant with which the autonomous navigation device adapts to the speed of environmental change.
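The monitoring loop of steps 11-13 can be sketched as follows; read_state and retune are hypothetical callables standing for the environment-state acquisition and for the jump back to step 3, and the bounded loop is an assumption made only so the sketch terminates.

```python
# Sketch of steps 11-13: after tuning, watch the environment and trigger a new
# round of parameter adjustment (back to step 3) when the deviation persists.
def monitor(read_state, retune, e_min=0.1, de_min=0.05, T=30, max_steps=10_000):
    t = 1                                      # step 11
    for _ in range(max_steps):
        e_t, de_t = read_state()               # step 12: current error and error rate
        if abs(e_t) > e_min or abs(de_t) > de_min:
            t += 1                             # step 13: deviation detected
            if t > T:                          # persisted beyond the time constant T
                retune()                       # re-enter the tuning loop at step 3
                t = 1
        else:
            t = 1                              # deviation gone: back to step 11
```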
Experimental results:
The method of this patent and the traditional fixed-PID-parameter method were applied to autonomous navigation devices simultaneously, and several groups of comparative experiments were carried out; in each experiment the two autonomous navigation devices started from the same starting point at the same time and had to reach the same end point. Fig. 5 compares the time consumed during the navigation process; Fig. 6, Fig. 7 and Fig. 8 compare the real-time error e_t during the navigation process.
In the time-consumption comparison, three groups of comparative experiments were taken; each group was run 50 times and the results were averaged. The first group compares the arrival times of the two autonomous navigation devices when the current environment is stable; the second group compares their arrival times when the environment changes suddenly during navigation; the third group compares their arrival times after the environment has changed. As can be seen from Fig. 5, in the initially stable environment the PID control parameters used by the autonomous navigation device with the fixed-PID-parameter method are close to the optimal parameters, so its elapsed time is roughly the same as that of the autonomous navigation device using the method of this patent. When the environment changes suddenly during navigation, the arrival times of both autonomous navigation devices become longer, but the time consumed by the autonomous navigation device using the method of this patent is clearly much smaller than that of the device using the traditional method, and its extra time occurs mainly while the control parameters are being adjusted. After the environment has changed, the autonomous navigation device using the method of this patent has adjusted its control parameters to the optimal values under the current environment, so its consumed time returns to the same level as before the environmental change, whereas the control parameters of the device using the traditional method are no longer optimal under the new environment, so its consumed time keeps growing; when the environmental change is violent, the autonomous navigation device using the traditional method may even fail to reach the specified destination.
In the real-time-error comparison, the same three groups of comparative experiments were taken; each group was run 50 times and the results were averaged. Fig. 6 is the comparison with an unchanged initial environment: the real-time error e_t of the two autonomous navigation devices changes in roughly the same way. Fig. 7 is the comparison when the environment changes suddenly at the 7th second of the navigation process: at the sudden change the real-time error e_t of both autonomous navigation devices increases greatly, but after a period of navigation-parameter adjustment the real-time error e_t of the device using the method of this patent is quickly reduced to close to 0 again, whereas the real-time error e_t of the device using the traditional method cannot be reduced to 0 and keeps fluctuating within an error range. Fig. 8 is the comparison after the environment has changed: the real-time error e_t of the device using the method of this patent follows essentially the same pattern as before the environmental change, whereas the real-time error e_t of the device using the traditional method cannot be reduced to 0 and keeps fluctuating within an error range.
Combining the two kinds of comparison results under the three groups of experiments shows that, compared with the traditional fixed-PID-control-parameter method, the method of this patent achieves a better autonomous navigation effect under changeable environments, and at the same time solves the overshoot and response-delay problems of the autonomous navigation device caused by control parameters that are not optimal under the current environment.
Claims (1)
- A kind of 1. autonomous navigation device control parameter on-line control method based on MCMC optimization Q study, it is characterised in that:Including with Lower step:Step 1, the control accuracy α according to autonomous navigation device, tri- control ginsengs of autonomous navigation device PID are respectively obtained using formula (1) Number kp、kiAnd kdAdjustment parameter Δ kp、ΔkiWith Δ kd:<mrow> <mtable> <mtr> <mtd> <mrow> <msub> <mi>&Delta;k</mi> <mi>p</mi> </msub> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <msub> <mi>&alpha;X</mi> <mi>p</mi> </msub> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mo>-</mo> <msub> <mi>&alpha;X</mi> <mi>p</mi> </msub> </mrow> </mtd> </mtr> </mtable> </mfenced> </mrow> </mtd> <mtd> <mrow> <msub> <mi>&Delta;k</mi> <mi>i</mi> </msub> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <msub> <mi>&alpha;X</mi> <mi>i</mi> </msub> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mo>-</mo> <msub> <mi>&alpha;X</mi> <mi>i</mi> </msub> </mrow> </mtd> </mtr> </mtable> </mfenced> </mrow> </mtd> <mtd> <mrow> <msub> <mi>&Delta;k</mi> <mi>d</mi> </msub> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <msub> <mi>&alpha;X</mi> <mi>d</mi> </msub> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mo>-</mo> <msub> <mi>&alpha;X</mi> <mi>d</mi> </msub> </mrow> </mtd> </mtr> </mtable> </mfenced> </mrow> </mtd> </mtr> </mtable> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow>In formula (1), Xp、Xi、XdDescribed three pid control parameter k of autonomous navigation device are represented respectivelyp、kiAnd kdThreshold range;Step 2, utilize the adjustment parameter Δ kp、ΔkiWith Δ kdCombination draws the Parameters variation action of the autonomous navigation device Set, is designated as A={ a1,a2,···,an,···,aN, wherein, anRepresent in the Parameters variation set of actions n-th Control parameter regulation acts, and Represent the corresponding proportion adjustment ginseng of n-th of action Number,The corresponding integral adjustment parameter of n-th of action is represented,Represent n-th of control parameter regulation action Corresponding differential adjustment parameter, n=1,2 ..., N;Step 3, setting time t=1, randomly choose a control parameter regulation actionAct on the autonomous navigation device;Initialize the relevant parameter in Q learning algorithms:T Studying factors ltWith discount factor γ, lt> 0, γ ∈ [0,1];Described tri- control parameter k of PID are initialized according to the control experience of the autonomous navigation devicep、kiAnd kd;By the value function estimate at t-1 moment in the Q learning algorithmsInitialized, wherein, et-1Table Show error of the autonomous navigation device at the t-1 moment, Δ et-1Error rate of the autonomous navigation device at the t-1 moment is represented, And by et-1With Δ et-1Form the ambient condition at t-1 moment;Step 4, the number N acted according to control parameter regulation in the Parameters variation set of actions A of the autonomous navigation device, are utilized Transfer matrix of the formula (2) to the decision process in Q learning algorithmsInitialized:<mrow> <msubsup> <mi>p</mi> <mrow> <mi>n</mi> <mi>m</mi> </mrow> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>=</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> 
<mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> 
<mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>2</mn> <mo>)</mo> </mrow> </mrow>In formula (2),Represent that the t-1 moment acts from control parameter regulationIt is transferred to control parameter regulation actionTransition probability, and as t=1,Step 5, optimize the decision process that Q learning algorithms obtain t using MCMC;Step 5.1, n-th of control parameter regulation of t action is calculated using formula (3)Value function value under ambient condition<mrow> <msup> <mi>Q</mi> <mo>*</mo> </msup> <mrow> <mo>(</mo> <msub> <mi>e</mi> <mi>t</mi> </msub> <mo>,</mo> <msub> <mi>&Delta;e</mi> <mi>t</mi> </msub> <mo>,</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <munderover> <mo>&Sigma;</mo> <mrow> <mi>j</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mi>n</mi> <mi>h</mi> </mrow> </munderover> <msub> <mi>w</mi> <mi>j</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> <msub> <mi>y</mi> <mi>j</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>3</mn> <mo>)</mo> </mrow> </mrow>In formula (3), wj(t-1) weights of j-th of hidden layer of t-1 moment in BP neural network, j=1,2 ..., nh are represented;Nh tables Show the number of BP neural network hidden layer;yj(t-1) output of j-th of hidden layer of t-1 moment in BP neural network is represented, and Have:<mrow> <msub> <mi>y</mi> <mi>j</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <mn>1</mn> <mo>-</mo> <msup> <mi>e</mi> <mrow> <msub> <mi>o</mi> <mi>j</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </msup> </mrow> <mrow> <mn>1</mn> <mo>+</mo> <msup> <mi>e</mi> <mrow> <msub> <mi>o</mi> <mi>j</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow>In formula (4), oj(t-1) input of j-th of hidden layer of t in BP neural network is represented, and is had:<mrow> <msub> <mi>o</mi> <mi>j</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> 
<mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mi>n</mi> <mi>i</mi> </mrow> </munderover> <msub> <mi>w</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> <msub> <mi>x</mi> <mi>i</mi> </msub> <mrow> <mo>(</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>5</mn> <mo>)</mo> </mrow> </mrow>In formula (5), wij(t-1) represent that i-th of input layer of t-1 moment is to the weights of j-th of hidden layer, x in BP neural networki (t-1) represent BP neural network in i-th of input layer of t-1 moment input, i=1,2 ..., ni, ni represent BP neural network The number of input layer;Step 5.2, the control parameter regulation action for drawing autonomous navigation device described in t is sampled using MCMC algorithmsStep 5.2.1, acted according to n-th of control parameter regulation of tValue function value under ambient conditionThe action chosen with the t-1 momentUtilize the transition probability matrix of formula (6) renewal decision process<mrow> <msubsup> <mi>p</mi> <mrow> <mi>n</mi> <mi>m</mi> </mrow> <mi>t</mi> </msubsup> <mo>=</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> </mtr> <mtr> <mtd> <mfrac> <msubsup> <mi>Q</mi> <mrow> <mn>1</mn> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> <mrow> <mo>&Sigma;</mo> <msubsup> <mi>Q</mi> <mrow> <mi>n</mi> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> </mrow> </mfrac> </mtd> <mtd> <mfrac> <msubsup> <mi>Q</mi> <mrow> <mn>2</mn> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> <mrow> <mo>&Sigma;</mo> <msubsup> <mi>Q</mi> <mrow> <mi>n</mi> 
<mi>t</mi> </mrow> <mo>*</mo> </msubsup> </mrow> </mfrac> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mfrac> <msubsup> <mi>Q</mi> <mrow> <mi>m</mi> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> <mrow> <mo>&Sigma;</mo> <msubsup> <mi>Q</mi> <mrow> <mi>n</mi> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> </mrow> </mfrac> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mfrac> <msubsup> <mi>Q</mi> <mrow> <mi>N</mi> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> <mrow> <mo>&Sigma;</mo> <msubsup> <mi>Q</mi> <mrow> <mi>n</mi> <mi>t</mi> </mrow> <mo>*</mo> </msubsup> </mrow> </mfrac> </mtd> </mtr> <mtr> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mn>...</mn> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>0</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mn>1</mn> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>....</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> <mtd> <mn>...</mn> </mtd> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>N</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow>In formula (6),Represent n-th of control parameter regulation of t actionValue function value, i.e., Represent the summation of the value function value of t everything, n=1,2 ..., N;Represent t from n-th of control Parameter regulation action processedIt is transferred to m-th of control parameter regulation actionTransition probability;Step 5.2.2, sampling number c=0,1,2C is set;Step 5.2.3, to the transition probability matrix of tC sampling is carried out, and the t in MCMC algorithms is obtained using formula (7) The receptance of the c+1 times sampling of moment<mrow> <msub> <mi>&alpha;</mi> <mrow> <mi>c</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>,</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> 
</mrow> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mi>min</mi> <mo>{</mo> <mfrac> <mrow> <msub> <mi>p</mi> <mi>c</mi> </msub> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>&times;</mo> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>p</mi> <mi>c</mi> </msub> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>&times;</mo> <mi>p</mi> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>|</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>,</mo> <mn>1</mn> <mo>}</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow>In formula (7),Represent the action obtained by the c+1 times sampling of tProbable value,Represent t the Action obtained by c samplingProbable value;As c=0, the action obtained by the c times sampling of t is madeProbability DistributionTo wait general distribution, i.e.,Step 5.2.4, sampled to obtain random receptance u from being uniformly distributed in Uniform [0,1], by random receptance u and The receptanceIt is compared, ifThen receive the action obtained by the c+1 times sampling Otherwise the action obtained by the c+1 times sampling is not receivedAnd willIt is assigned toStep 5.2.5, the action obtained by the c+1 times sampling of t is updated using formula (8)Probability distribution<mrow> <msub> <mi>p</mi> <mrow> <mi>c</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mfrac> <mrow> <msubsup> <mi>d</mi> <mrow> <mi>n</mi> <mo>,</mo> <mi>c</mi> </mrow> <mi>t</mi> </msubsup> <mo>+</mo> <mn>1</mn> </mrow> <mrow> <msubsup> <mi>&sigma;</mi> <mi>c</mi> <mi>t</mi> </msubsup> <mo>+</mo> <mn>1</mn> </mrow> </mfrac> </mtd> <mtd> <mrow> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>=</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> </mrow> </mtd> </mtr> <mtr> <mtd> <mfrac> <msubsup> <mi>d</mi> <mrow> <mi>n</mi> <mo>,</mo> <mi>c</mi> </mrow> <mi>t</mi> </msubsup> <mrow> <msubsup> <mi>&sigma;</mi> <mi>c</mi> <mi>i</mi> </msubsup> <mo>+</mo> <mn>1</mn> </mrow> </mfrac> </mtd> <mtd> <mrow> <msubsup> <mi>a</mi> <mi>n</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> <mo>&NotEqual;</mo> <msubsup> <mi>a</mi> <mi>m</mi> <mrow> <mo>&prime;</mo> <mi>t</mi> </mrow> </msubsup> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow>In formula (8),Represent the action obtained by the c times sampling of tProbability distributionDenominator;Represent t Action obtained by the c times sampling of momentProbability distributionMolecule;As c=0, ordern =1,2 ..., N;Step 5.2.6, make c+1 be assigned to c, and judge whether c > C set up, if so, step 5.2.7 is then performed, otherwise, is returned Step 5.2.3 orders are returned to perform;Step 5.2.7, to the transition probability matrix of tThe C+1 times sampling is carried out, 
Step 5.2.7, carry out the (C+1)-th sampling of the transition probability matrix of time t to obtain the control parameter adjustment action $a_{n}^{\prime\prime t}$ of the autonomous navigation device at time t, and take the value function value $Q^{*}$ of that action as the value function estimate $Q^{\prime}(e_{t},\Delta e_{t},a_{n}^{\prime\prime t})$ of time t.

Step 6, obtain the behaviour return value of the control parameter adjustment action $a_{n}^{\prime\prime t}$ of the autonomous navigation device at time t using formula (9):

$$r\left(e_{t},\Delta e_{t},a_{n}^{\prime\prime t}\right)=\alpha\times\left(e_{t}-e_{t-1}\right)+\beta\times\left(\Delta e_{t}-\Delta e_{t-1}\right)\qquad(9)$$

In formula (9), α and β represent the error return parameter and the error-rate return parameter respectively, 0 < α < 1, 0 < β < 1, and α + β = 1.

Step 7, update the value function estimate $Q^{\prime}(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1})$ of time t−1 to the final value function value $Q(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1})$ of time t−1 using formula (10):

$$Q\left(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1}\right)=Q^{\prime}\left(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1}\right)+l_{t}\,\Delta Q\left(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1}\right)\qquad(10)$$

In formula (10), $\Delta Q(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1})$ represents the final value function difference, and:

$$\Delta Q\left(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1}\right)=r\left(e_{t},\Delta e_{t},a_{n}^{\prime\prime t}\right)+\gamma\,Q^{\prime}\left(e_{t},\Delta e_{t},a_{n}^{\prime\prime t}\right)-Q^{\prime}\left(e_{t-1},\Delta e_{t-1},a_{n}^{\prime\prime t-1}\right)\qquad(11)$$

Step 8, assign t+1 to t and judge whether t > t_max holds; if so, perform step 9; otherwise, adjust the learning factor $l_{t}$ as t changes according to the SPSA step-size adjustment algorithm, using formula (12), where t_max represents the maximum number of iterations that has been set:

$$l_{t}=\frac{1}{\left(t+\mu\right)^{\lambda}}\qquad(12)$$

In formula (12), l is the learning factor value at time t = 1, and μ and λ are nonnegative constants in the SPSA step-size adjustment algorithm.

Step 9, judge whether the final value function values of two consecutive moments are equal; if so, the adjustment of the autonomous navigation device PID control parameters is finished, and jump to step 11; otherwise, perform step 10.

Step 10, judge whether t exceeds the specified time; if it does, jump to step 3, reselect the initial control parameter adjustment action and adjust the autonomous navigation device PID control parameters; otherwise, jump to step 5 and continue the adjustment of the autonomous navigation device PID control parameters.

Step 11, let t = 1.

Step 12, the autonomous navigation device collects the environment states $e_{t}$ and $\Delta e_{t}$ at time t, and judges whether $|e_{t}|>|e_{\min}|$ or $|\Delta e_{t}|>|\Delta e_{\min}|$ holds; if so, perform step 13; otherwise, return to step 11; where $e_{\min}$ and $\Delta e_{\min}$ represent the minimum environment state error and error rate allowed by the autonomous navigation device, respectively.

Step 13, assign t+1 to t and judge whether t > T holds; if so, perform step 3; otherwise, return to step 12; where T represents the time constant with which the autonomous navigation device adapts to the speed of environmental change.
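For illustration only, the following is a minimal Python sketch of the reward, Q-value update, and SPSA-style learning-factor schedule of formulas (9)–(12) in steps 6–8. The state is represented here as a discretised (error, error-rate) pair, and the parameter values (α = β = 0.5, γ = 0.9, μ = 1.0, λ = 0.602) are placeholders chosen within the ranges stated above; all names are illustrative assumptions, not part of the patent.

```python
# Illustrative sketch (not from the patent): reward, Q-value update and
# SPSA-style learning-factor schedule of formulas (9)-(12).

def reward(e_t, de_t, e_prev, de_prev, alpha=0.5, beta=0.5):
    """Formula (9): behaviour return from the change in error and error rate,
    with 0 < alpha, beta < 1 and alpha + beta = 1 (placeholder values)."""
    return alpha * (e_t - e_prev) + beta * (de_t - de_prev)

def learning_factor(t, mu=1.0, lam=0.602):
    """Formula (12): l_t = 1 / (t + mu)**lam, with mu and lam nonnegative
    constants (placeholder values)."""
    return 1.0 / (t + mu) ** lam

def update_q(Q, prev_key, curr_key, r_t, t, gamma=0.9):
    """Formulas (10)-(11): update the value of the previous state-action pair.
    Q is a dict keyed by (state, action); states are discretised (e, delta_e)."""
    dQ = r_t + gamma * Q.get(curr_key, 0.0) - Q.get(prev_key, 0.0)   # formula (11)
    Q[prev_key] = Q.get(prev_key, 0.0) + learning_factor(t) * dQ     # formula (10)
    return Q[prev_key]

# Example: one update step with hypothetical discretised states and action indices.
Q = {}
r_t = reward(e_t=0.2, de_t=0.05, e_prev=0.5, de_prev=0.10)
update_q(Q, prev_key=((0.5, 0.10), 3), curr_key=((0.2, 0.05), 1), r_t=r_t, t=2)
```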
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711144395.2A CN107885086B (en) | 2017-11-17 | 2017-11-17 | Autonomous navigation device control parameter on-line control method based on MCMC optimization Q study |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711144395.2A CN107885086B (en) | 2017-11-17 | 2017-11-17 | Autonomous navigation device control parameter on-line control method based on MCMC optimization Q study |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107885086A true CN107885086A (en) | 2018-04-06 |
CN107885086B CN107885086B (en) | 2019-10-25 |
Family
ID=61777810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711144395.2A Active CN107885086B (en) | 2017-11-17 | 2017-11-17 | Autonomous navigation device control parameter on-line control method based on MCMC optimization Q study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885086B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710289A (en) * | 2018-05-18 | 2018-10-26 | 厦门理工学院 | A method of the relay base quality optimization based on modified SPSA |
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
CN111830822A (en) * | 2019-04-16 | 2020-10-27 | 罗伯特·博世有限公司 | System for configuring interaction with environment |
CN114237267A (en) * | 2021-11-02 | 2022-03-25 | 中国人民解放军海军航空大学航空作战勤务学院 | Flight maneuver decision auxiliary method based on reinforcement learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110208377A1 (en) * | 2007-08-14 | 2011-08-25 | Propeller Control Aps | Efficiency optimizing propeller speed control for ships |
CN102819264A (en) * | 2012-07-30 | 2012-12-12 | 山东大学 | Path planning Q-learning initial method of mobile robot |
CN105700526A (en) * | 2016-01-13 | 2016-06-22 | 华北理工大学 | On-line sequence limit learning machine method possessing autonomous learning capability |
CN106950956A (en) * | 2017-03-22 | 2017-07-14 | 合肥工业大学 | The wheelpath forecasting system of fusional movement model and behavior cognitive model |
CN107038477A (en) * | 2016-08-10 | 2017-08-11 | 哈尔滨工业大学深圳研究生院 | A kind of neutral net under non-complete information learns the estimation method of combination with Q |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110208377A1 (en) * | 2007-08-14 | 2011-08-25 | Propeller Control Aps | Efficiency optimizing propeller speed control for ships |
CN102819264A (en) * | 2012-07-30 | 2012-12-12 | 山东大学 | Path planning Q-learning initial method of mobile robot |
CN105700526A (en) * | 2016-01-13 | 2016-06-22 | 华北理工大学 | On-line sequence limit learning machine method possessing autonomous learning capability |
CN107038477A (en) * | 2016-08-10 | 2017-08-11 | 哈尔滨工业大学深圳研究生院 | A kind of neutral net under non-complete information learns the estimation method of combination with Q |
CN106950956A (en) * | 2017-03-22 | 2017-07-14 | 合肥工业大学 | The wheelpath forecasting system of fusional movement model and behavior cognitive model |
Non-Patent Citations (1)
Title |
---|
CHRISTOPHE ANDRIEU et al.: "An Introduction to MCMC for Machine Learning", Machine Learning *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710289A (en) * | 2018-05-18 | 2018-10-26 | 厦门理工学院 | A method of the relay base quality optimization based on modified SPSA |
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
CN109696830B (en) * | 2019-01-31 | 2021-12-03 | 天津大学 | Reinforced learning self-adaptive control method of small unmanned helicopter |
CN111830822A (en) * | 2019-04-16 | 2020-10-27 | 罗伯特·博世有限公司 | System for configuring interaction with environment |
CN114237267A (en) * | 2021-11-02 | 2022-03-25 | 中国人民解放军海军航空大学航空作战勤务学院 | Flight maneuver decision auxiliary method based on reinforcement learning |
CN114237267B (en) * | 2021-11-02 | 2023-11-24 | 中国人民解放军海军航空大学航空作战勤务学院 | Flight maneuver decision assisting method based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN107885086B (en) | 2019-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107885086A (en) | Autonomous navigation device control parameter on-line control method based on MCMC optimization Q study | |
CN110427261A (en) | A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree | |
CN109828552B (en) | Intermittent process fault monitoring and diagnosing method based on width learning system | |
CN104616060A (en) | Method for predicating contamination severity of insulator based on BP neural network and fuzzy logic | |
CN103971160B (en) | particle swarm optimization method based on complex network | |
CN106056127A (en) | GPR (gaussian process regression) online soft measurement method with model updating | |
CN109218744B (en) | A kind of adaptive UAV Video of bit rate based on DRL spreads transmission method | |
CN109214579B (en) | BP neural network-based saline-alkali soil stability prediction method and system | |
CN112766603B (en) | Traffic flow prediction method, system, computer equipment and storage medium | |
WO2023035727A1 (en) | Industrial process soft-measurement method based on federated incremental stochastic configuration network | |
CN105843189A (en) | Simplified simulation model based high efficient scheduling rule choosing method for use in semiconductor production lines | |
Hu et al. | Adaptive exploration strategy with multi-attribute decision-making for reinforcement learning | |
CN111582567B (en) | Wind power probability prediction method based on hierarchical integration | |
Mellios et al. | A multivariate analysis of the daily water demand of Skiathos Island, Greece, implementing the artificial neuro-fuzzy inference system (ANFIS) | |
Liu et al. | Accelerate mini-batch machine learning training with dynamic batch size fitting | |
Li et al. | Hyper-parameter tuning of federated learning based on particle swarm optimization | |
Remmerswaal et al. | Combined MPC and reinforcement learning for traffic signal control in urban traffic networks | |
Han et al. | Multi-step prediction for the network traffic based on echo state network optimized by quantum-behaved fruit fly optimization algorithm | |
Li et al. | Graph reinforcement learning-based cnn inference offloading in dynamic edge computing | |
Al-Lawati et al. | Anytime minibatch with stale gradients | |
CN111796519B (en) | Automatic control method of multi-input multi-output system based on extreme learning machine | |
CN109636609A (en) | Stock recommended method and system based on two-way length memory models in short-term | |
Cui | On asymptotics of t-type regression estimation in multiple linear model | |
Zhou et al. | Decentralized adaptive optimal control for massive multi-agent systems using mean field game with self-organizing neural networks | |
Yin et al. | FedSCS: Client selection for federated learning under system heterogeneity and client fairness with a Stackelberg game approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |