CN114667852A - Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning - Google Patents


Info

Publication number
CN114667852A
CN114667852A
Authority
CN
China
Prior art keywords
trimming
hedge
function
hedgerow
mechanical arm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210248923.3A
Other languages
Chinese (zh)
Other versions
CN114667852B (en)
Inventor
蒙艳玫
李科
缪祥烜
韦锦
韩冰
武豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Priority to CN202210248923.3A priority Critical patent/CN114667852B/en
Publication of CN114667852A publication Critical patent/CN114667852A/en
Application granted granted Critical
Publication of CN114667852B publication Critical patent/CN114667852B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01G HORTICULTURE; CULTIVATION OF VEGETABLES, FLOWERS, RICE, FRUIT, VINES, HOPS OR SEAWEED; FORESTRY; WATERING
    • A01G3/00 Cutting implements specially adapted for horticultural purposes; Delimbing standing trees
    • A01G3/04 Apparatus for trimming hedges, e.g. hedge shears
    • A01G3/0435 Machines specially adapted for shaping plants, e.g. topiaries
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01G HORTICULTURE; CULTIVATION OF VEGETABLES, FLOWERS, RICE, FRUIT, VINES, HOPS OR SEAWEED; FORESTRY; WATERING
    • A01G3/00 Cutting implements specially adapted for horticultural purposes; Delimbing standing trees
    • A01G3/04 Apparatus for trimming hedges, e.g. hedge shears
    • A01G3/0426 Machines for pruning vegetation on embankments and road-sides

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ecology (AREA)
  • Forests & Forestry (AREA)
  • Environmental Sciences (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses an intelligent cooperative control method for a hedge trimming robot based on deep reinforcement learning, comprising the following steps: establishing an MDP deep reinforcement learning model of the hedge trimming robot; building a deep neural network framework; designing the policy network objective function and the value function network objective function of an improved PPO algorithm; training the deep neural network with the improved PPO algorithm, by maximizing the policy network objective reward function and minimizing the mean square error of the value function network objective function; and optimizing the objective functions with an Adam adaptive gradient algorithm with improved adaptive learning rate. The optimal policy of the hedge trimming robot training model is obtained through repeated update iterations; by inputting the latest state data, the optimal action can be predicted and output, yielding control instructions for the mobile chassis and the trimming mechanical arm. The method requires no physical modeling of the hedge trimming robot, avoids control errors caused by an inaccurate model, prevents the algorithm from falling into a local optimum, accelerates algorithm updates, and improves the generalization capability of the control algorithm.

Description

Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of control, and in particular to an intelligent cooperative control method, based on deep reinforcement learning, for a hedge trimming robot used in highway hedge trimming.
Background
With the steadily growing volume of trimming work on expressway green isolation belts, and given the low efficiency of manual trimming, automated equipment such as automatic greenery maintenance vehicles, vehicle-mounted hedge trimming vehicles, and unmanned trimming robots has been adopted in the industry. An unmanned trimming robot mainly consists of an unmanned chassis and a trimming manipulator; during trimming operations the two must be controlled cooperatively in real time, so that the robot can adjust its posture to approach and accurately position the hedge and trim it.
At present, some research results exist in the field of cooperative control methods for mobile manipulators, for example:
Chinese patent application No. CN109176525A discloses a mobile manipulator adaptive control method based on a radial basis function (RBF) neural network. The method performs dynamic modeling of the mobile manipulator, builds an RBF neural network of the robot dynamic model, designs a mobile-manipulator trajectory tracking method using the neural network, and performs online compensation and identification of unknown dynamic parameters, thereby realizing cooperative control of the mobile platform and the manipulator and improving the dynamic performance of the mobile manipulator and the trajectory tracking accuracy in joint space.
Chinese patent application publication No. CN201510269642.6 discloses a method for controlling a mobile manipulator based on GPS and binocular vision positioning. The mobile manipulator moves to a position near a target object according to position information of the target obtained by GPS; binocular vision obtains three-dimensional information of the object; the end effector of the mobile manipulator approaches the target quickly based on vision feed-forward; and based on vision feedback, the manipulator is controlled to center the object and the end effector to grasp it. The vision feed-forward and feedback control enables the mobile manipulator to approach the target quickly and grasp it after accurate positioning, improving control efficiency.
The various existing mathematical-analysis methods based on a system model (inverse Jacobian matrix, model predictive control, and the like), used individually or in combination, suffer from drawbacks such as difficulty in establishing the system model, model inaccuracy, complex and tedious calculation, slow system response, large accumulated position error, poor adaptive capacity of the control system, poor anti-interference capability, and poor robustness. Meanwhile, existing solutions based on intelligent algorithms (such as neural networks and reinforcement learning) suffer from incomplete data information, unclear and incomplete data sources, inaccurate environment models, long system response times, and low learning efficiency, and cannot fully guarantee the positioning accuracy, smoothness of joint operation, response speed, and safety requirements of the robot system.
Disclosure of Invention
The invention aims to provide an intelligent cooperative control method for a hedge trimming robot based on deep reinforcement learning, which omits the tedious manual modeling and calculation of traditional methods, can update the control policy in real time, improves the dynamic response characteristics of the system, accelerates algorithm updates, avoids falling into local optima, and improves the generalization capability of the control algorithm.
The hedgerow trimming robot comprises a movable chassis and a trimming mechanical arm fixed on the movable chassis, wherein a vision detection module is installed on the hedgerow trimming robot; the vision detection module comprises a hedge cross section detection camera arranged at the tail end of the trimming mechanical arm, a hedge height and distance detection camera arranged at the base of the trimming mechanical arm, and a front side lane line detection camera arranged on the moving chassis;
the invention provides an intelligent cooperative control method of a hedge trimming robot based on deep reinforcement learning, which comprises the following steps:
step one, establishing a Markov decision process (MDP) deep reinforcement learning model of the hedge trimming robot;
step two, building a deep neural network framework;
step three, designing the policy network objective function and the value function network objective function of the improved PPO algorithm;
step four, training the deep neural network with the improved PPO algorithm, by maximizing the policy network objective reward function and minimizing the mean square error of the value function network objective function;
step five, optimizing the objective functions with an Adam adaptive gradient algorithm with improved adaptive learning rate, obtaining the optimal policy of the hedge trimming robot training model through repeated update iterations, predicting and outputting the optimal action from the latest input state data, and outputting control instructions for the mobile chassis and the trimming mechanical arm.
Preferably, in the MDP deep reinforcement learning model of the hedge trimming robot established in step one: the Markov decision process (MDP) of the hedge trimming robot is described by a quintuple (S, A, P, R, γ), where S represents the state set, A the action set, P the state transition probability (valued from 0 to 1), R the reward function, and γ the reward discount factor (valued from 0 to 1), used to calculate the cumulative reward obtained during the interaction between the agent and the environment; the agent is the vehicle-arm cooperative control module of the hedge trimming robot, and the environment comprises the hedge trimming robot, the hedge, and the lane lines. The policy model of the hedge trimming robot receives the state S_t of the environment at the current moment and selects and performs action A_t; according to the environment model it then enters a new state S_{t+1} with probability P(S_{t+1}|S_t, A_t) and earns a reward R_{t+1}; the policy model in turn receives S_{t+1} and R_{t+1}, continuously generating and executing control instructions for the hedge trimming robot. The policy model is continuously optimized and adjusted toward the maximum reward obtained in this process, until a termination condition is met and the interaction between the agent and the environment ends;
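For illustration, the interaction loop just described can be sketched as follows; the env and policy objects and their methods are hypothetical stand-ins for the robot's perception and control interfaces, not part of the invention:

    GAMMA = 0.99  # reward discount factor gamma, valued between 0 and 1

    def run_episode(env, policy, max_steps=1000):
        # S_0: initial 25-dimensional state from the vision detection module
        state = env.reset()
        cumulative_reward, discount = 0.0, 1.0
        for t in range(max_steps):
            action = policy.act(state)               # select A_t given S_t
            state, reward, done = env.step(action)   # S_{t+1}, R_{t+1}
            cumulative_reward += discount * reward   # accumulate gamma-discounted reward
            discount *= GAMMA
            if done:                                 # termination condition met
                break
        return cumulative_reward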
wherein the state set comprises the following eight parts: the position P_C = (X_c, Y_c) of the center of the hedge cross section in the pixel coordinate system of the hedge cross-section detection camera; the hedge height H; the distance L from the center of the hedge longitudinal section to the coordinate origin of the trimming mechanical arm; the position of the lane line in the coordinate system of the front lane line detection camera; the position of the hedge in the vehicle coordinate system; the pose of each joint of the trimming mechanical arm; the position of the hedge in the coordinate system of the trimming mechanical arm end effector; and the pose of the mobile chassis in the world coordinate system;
wherein the position P_C = (X_c, Y_c) of the center of the hedge cross section in the pixel coordinate system of the hedge cross-section detection camera is acquired as follows: a hedge image is captured by the hedge cross-section detection camera and features are extracted to obtain a feature template of the hedge cross-section shape; when the end of the trimming mechanical arm moves directly above the hedge, the hedge cross-section shape is detected and matched against the feature template; once the matching coincidence degree exceeds 80%, the hedge cross-section feature recognition is considered valid and the cross-section center coordinate value P_C = (X_c, Y_c) is output;
the hedge height H and the distance L from the center of the hedge longitudinal section to the coordinate origin of the trimming mechanical arm are acquired as follows: a hedge image is captured by the hedge height and distance detection camera, from which the hedge height H and the distance L are obtained;
the method for acquiring the position of the lane line under the coordinate system of the front lane line detection camera comprises the following steps: the front lane line detection camera takes a picture of a front lane, and obtains a left front (x) line and a right front (x) line of a lane line in a front ROI area from the pictureflf,yflf) Left rear (x)flb,yflb)、Front right (x)frf,yfrf) Right rear (x)frb,yfrb) The coordinate values of (2).
Wherein the action set comprises the following three parts: the motion velocity component of each joint of the trimming mechanical arm, the longitudinal speed of the mobile chassis, and the front wheel slip angle;
wherein the reward function comprises the following parts: a reward for the running smoothness of each joint of the trimming mechanical arm, a reward for the distance between the hedge and the trimming mechanical arm end effector, and a reward for the lane-line tracking accuracy of the mobile chassis;
wherein the running-smoothness reward of each joint of the trimming mechanical arm is expressed as:

    r_arms = Σ_{i=0..3} [ a_1·(dω_ik)² + b_1·dω_ik ]

where i = 0, 1, 2, 3; dω_ik is the angular-velocity differential of each joint servo motor of the trimming mechanical arm output by the system, characterizing the running smoothness of the motor; a_1, b_1 are constants with a_1 < 0 and b_1 > 0;
wherein the distance reward between the hedge and the trimming mechanical arm end effector is expressed as:

    r_dist = a_2·(d_dist)² + c

where a_2 and c are constants with a_2 < 0 and c > 0; d_dist is the distance between the hedge and the coordinate origin of the trimming mechanical arm end effector; the closer the end effector of the trimming mechanical arm is to the hedge, the greater the reward value the agent obtains;
wherein the lane-line tracking accuracy reward of the mobile chassis is expressed as:

    r_track = a_3·x² + b_2·x,  K_1 ≤ x ≤ K_2

where a_3, b_2 are constants with a_3 < 0 and b_2 > 0; K_1, K_2 are constants representing the boundaries of the coordinate range of the lane center line; x represents the detected lane center-line position; the closer the value of x is to the center line, the larger the agent's reward value, and the smaller otherwise;
In summary, the reward function is expressed as:

    R = ω_1·r_arms + ω_2·r_dist + ω_3·r_track

where ω_1, ω_2, ω_3 are the weights the trimming robot system assigns to the respective parts of the reward function, valued from 0 to 1; the larger a weight, the better the trained prediction model performs in that respect.
Preferably, in step two, fully-connected neural network models are used to respectively build the deep neural network framework of the policy network and the deep neural network framework of the value function network.
Preferably, the policy network objective function of the improved PPO algorithm in step three is expressed as:

    J_clip(θ) = E_{τ∈N_k} [ min( ρ_t(θ)·Â_t, clip(ρ_t(θ), 1−ε, 1+ε)·Â_t ) ]

in the formula, π_θ and π_θ′ respectively represent the original policy and the updated policy within a single round; N_k is the set of trajectories obtained after executing the policy in the environment, N_k = {τ_i}, where τ denotes a state-action sequence {S_0, A_0, R_0, …, S_t, A_t, R_t}; ρ_t(θ) = π_θ′(A_t|S_t)/π_θ(A_t|S_t) represents the ratio between the old and new policies; ε is the hyper-parameter truncation factor, decaying adaptively as ε = ε_0/(1 + n/N); and Â_t is the advantage function using generalized advantage estimation (GAE), defined as

    Â_t = Σ_{l≥0} (γλ)^l·δ_{t+l},  δ_t = R_t + γ·V(S_{t+1}) − V(S_t)

in the formula, V(S_t) is the estimated state value function; λ is an introduced hyper-parameter valued from 0 to 1; ε_0 is the initial truncation factor; n is the number of training iterations already completed; and N represents the set total number of training iterations;

the value function network objective function of the improved PPO algorithm in step three is expressed as:

    L(φ) = E_{τ∈N_k} [ ( R̂_t − V_φ(S_t) )² ]

in the formula, V_φ(S_t) is the state value function and R̂_t is the discounted cumulative return target.
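For illustration, a compact PyTorch sketch of the two objectives, assuming the GAE advantages and return targets have already been computed (tensor names are illustrative):

    import torch
    import torch.nn.functional as F

    def ppo_losses(logp_new, logp_old, advantage, value_pred, value_target, eps):
        # probability ratio between the updated and original policies
        ratio = torch.exp(logp_new - logp_old)
        # clipped surrogate objective (to be maximized)
        surrogate = torch.min(ratio * advantage,
                              torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage).mean()
        # value-function objective: mean squared error (to be minimized)
        value_loss = F.mse_loss(value_pred, value_target)
        return surrogate, value_loss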
Preferably, the learning rate in the optimizer of the Adam adaptive gradient algorithm described in step five is expressed as:

    Δα = clip( α/(√v̂_t + ε̂), Down_bdy_a, Up_bdy_a )·m̂_t

wherein α is the initial step size; β_1, β_2 are the exponential decay rates of the moment estimates; ε̂ is a small constant for numerical stability, usually 10⁻⁸; m_t and v_t are respectively the biased first-moment estimate and the biased second-moment estimate, computed from the objective function gradient, with m̂_t and v̂_t their bias-corrected values; and Down_bdy_a and Up_bdy_a respectively represent the lower bound and the upper bound of the learning rate, defined as piecewise functions of the current training cycle number n and the preset total number of training cycles N, as follows: in the process of optimizing the objective function, the lower bound of the learning rate is set to the learning-rate value at which the policy network objective function starts to rise, or at which the value function network objective starts to fall; the upper bound of the learning rate is set to the learning-rate value at which the policy network objective function starts to fall, or at which the value function network objective starts to rise. By adaptively truncating the learning rate within this specified interval (keeping the upper bound unchanged in the early stage of training while continuously raising the lower bound, then keeping the lower bound unchanged in the later stage of training while continuously reducing the upper bound), a relatively large learning rate can be obtained early in training so that the objective function jumps out of a local optimum, while the learning rate decreases monotonically late in training so that the objective function converges monotonically and does not diverge;

the policy network objective function and the value function network objective function are respectively optimized by this optimizer, and the network parameters are updated; the update process is expressed as:

    θ′ = θ + Δα   (policy network, gradient ascent)
    φ′ = φ − Δα   (value function network, gradient descent)
an Adam adaptive gradient algorithm for improving the adaptive learning rate is adopted to optimize the objective function, the optimal strategy of the hedge trimming robot training model is obtained through repeated updating iteration, the optimal action can be predicted and output through inputting the latest state data, and the control instructions of the mobile chassis and the trimming mechanical arm are output.
The invention has the following advantages:
(1) The invention provides an intelligent cooperative control method for a hedge trimming robot based on deep reinforcement learning. The deep neural network model is trained with an improved proximal policy optimization (PPO) algorithm, which avoids the tedious manual parameter tuning of the conventional PPO algorithm and improves the performance and generalization capability of the algorithm while preserving its convergence speed. The deep neural network objective functions are optimized with an improved Adam adaptive gradient algorithm that adaptively adjusts the learning rate, allowing the algorithm to jump out of local optima, accelerating convergence, and improving generalization.
(2) Because the expressway working environment is dangerous, the invention recognizes the lane line with the camera installed at the front side of the mobile chassis, detecting the coordinate values of the straight lines on the left and right sides of the lane line in the camera coordinate system and confining the lane line within a certain coordinate range. This realizes lane-line tracking, keeps the hedge trimming robot working in a safe area, and satisfies the independent motion path of the mobile chassis while coordinating the motion of the trimming mechanical arm, ensuring high operability and end-of-arm tracking accuracy.
(3) Compared with existing control methods, the intelligent cooperative control method for the hedge trimming robot based on deep reinforcement learning omits the tedious manual modeling and calculation of traditional methods, avoids control errors caused by an inaccurate model, is not tied to the constraint limitations of a particular mobile platform, and thus has higher universality; it enables automatic operation of the hedge trimming robot and raises the level of automation and intelligence.
Drawings
Fig. 1 illustrates a flow of steps of an intelligent cooperative control method for a hedge trimming robot according to the present invention;
FIG. 2 illustrates a schematic diagram of the Markov decision process (MDP) deep reinforcement learning model of the hedge trimming robot according to the present invention;
fig. 3 shows a hedge trimming robot configuration and hedge height H and distance L detection scheme according to the present invention;
FIG. 4 shows a schematic diagram of the detection principle of the center of the cross section of the hedgerow;
FIG. 5 is a schematic view showing a principle of lane line recognition detection;
In the figures: 1 - end effector pixel coordinate system; 2 - feature template of the hedge cross-section shape; 3 - center position of the hedge cross section; 4 - hedge cross section; 5 - lane line; 6 - mobile chassis; 7 - front lane line detection camera; 8 - trimming mechanical arm base; 9 - hedge height and distance detection camera; 10 - hedge cross-section detection camera; 11 - saw blade of the manipulator end effector; 12 - hedge; 13 - highway guardrail; 14 - front lane line detection pixel coordinate system; 15 - ROI area; 16 - front-left coordinate of the left target straight line (x_flf, y_flf); 17 - rear-left coordinate of the left target straight line (x_flb, y_flb); 18 - front-right coordinate of the right target straight line (x_frf, y_frf); 19 - rear-right coordinate of the right target straight line (x_frb, y_frb); 20 - lane line.
Detailed Description
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention. Throughout the specification and claims, unless explicitly stated otherwise, the term "comprise" or variations such as "comprises" or "comprising", etc., will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, methods, instrumentalities, and elements well known to those skilled in the art have not been described in detail so as not to obscure the present invention.
Embodiment: an intelligent cooperative control method for a hedge trimming robot based on deep reinforcement learning.
The hedge trimming robot performs hedge trimming operation on the expressway, the lane line width is 120mm, and the hedge height and distance are known;
as shown in fig. 3, the hedge trimming robot comprises a mobile chassis 6 and a 4-degree-of-freedom trimming mechanical arm fixed on the mobile chassis, wherein a saw blade 11 is arranged on an end effector (a cutter) of the trimming mechanical arm and is used for trimming hedges; install visual detection module on the hedge trimming robot, include: a hedge cross section detection camera 10 installed at the tail end of the trimming mechanical arm, a hedge height and distance detection camera 9 installed at a base 8 of the trimming mechanical arm, and a front lane line detection camera 7 installed at the front side of the moving chassis 6; the vision sensor chip is arranged in each camera, and an algorithm is embedded in the vision sensor chip, so that the functions of RGB image acquisition, image processing and recognition and positioning can be realized.
As shown in fig. 1, the intelligent cooperative control method for the hedge trimming robot based on the deep reinforcement learning includes the following steps:
Step one, establishing a Markov decision process (MDP) deep reinforcement learning model of the hedge trimming robot;
As shown in fig. 2, the Markov decision process (MDP) of the hedge trimming robot is described by a quintuple (S, A, P, R, γ), where S represents the state set, A the action set, P the state transition probability (valued from 0 to 1), R the reward function, and γ the reward discount factor (valued from 0 to 1), used to calculate the cumulative reward obtained during the interaction between the agent and the environment; the agent is the vehicle-arm cooperative control module of the hedge trimming robot, and the environment comprises the hedge trimming robot, the hedge, and the lane lines. The policy model of the hedge trimming robot receives the state S_t of the environment at the current moment and selects and performs action A_t; according to the environment model it then enters a new state S_{t+1} with probability P(S_{t+1}|S_t, A_t) and earns a reward R_{t+1}; the policy model in turn receives S_{t+1} and R_{t+1}, continuously generating and executing control instructions for the hedge trimming robot; the policy model is continuously optimized and adjusted toward the maximum reward obtained in this process, until a termination condition is met and the interaction between the agent and the environment ends;
Wherein the state set comprises the following eight parts: position P of center of hedgerow cross section under pixel coordinate system of hedgerow cross section detection cameraC=(Xc,Yc) The height H of the hedge, the distance L from the center of the longitudinal section of the hedge to the origin of coordinates of the trimming mechanical arm, the position of the lane line under the front lane line detection camera coordinate system, and the position of the hedge under the vehicle coordinate system
Figure BDA0003546191460000081
Figure BDA0003546191460000082
Pose of each joint of trimming mechanical arm
Figure BDA0003546191460000083
Position of hedgerow under coordinate system of end effector of mechanical arm of the trimming
Figure BDA0003546191460000084
Pose of mobile chassis under world coordinate system
Figure BDA0003546191460000085
In this embodiment, the state set is the vector formed by concatenating the eight parts above, with state space dimension |S| = 25;
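For illustration, the 25-dimensional state vector might be assembled as follows; the 2+2+8+3+4+3+3 grouping is an assumption that happens to match the stated dimension:

    import numpy as np

    def build_state(p_c, H, L, lane_pts, hedge_vehicle, joint_poses, hedge_ee, chassis_pose):
        # p_c: (Xc, Yc) cross-section center               -> 2 values
        # H, L: hedge height and distance                  -> 2 values
        # lane_pts: 4 lane-line corner points (x, y)       -> 8 values
        # hedge_vehicle: hedge position in vehicle frame   -> 3 values
        # joint_poses: poses of the 4 arm joints           -> 4 values
        # hedge_ee: hedge position in end-effector frame   -> 3 values
        # chassis_pose: chassis pose in the world frame    -> 3 values
        state = np.concatenate([
            np.asarray(p_c, dtype=np.float64),
            [H, L],
            np.asarray(lane_pts, dtype=np.float64).ravel(),
            hedge_vehicle, joint_poses, hedge_ee, chassis_pose,
        ])
        assert state.size == 25  # matches the state space dimension |S| = 25
        return state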
As shown in FIG. 4, the position P_C = (X_c, Y_c) of the center of the hedge cross section in the pixel coordinate system of the hedge cross-section detection camera is acquired as follows: a hedge image is captured by the hedge cross-section detection camera and features are extracted to obtain a feature template of the hedge cross-section shape; when the end of the trimming mechanical arm moves directly above the hedge, the hedge cross-section shape is detected and matched against the feature template; once the matching coincidence degree exceeds 80%, the hedge cross-section feature recognition is considered valid and the cross-section center coordinate value P_C = (X_c, Y_c) is output;
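By way of illustration, the template-matching step can be sketched with OpenCV's normalized cross-correlation, reading the 80% coincidence degree as a 0.8 score threshold (the actual matching method of the invention is not specified):

    import cv2

    def detect_cross_section_center(frame_gray, template_gray, threshold=0.8):
        # Match the hedge cross-section template; return (Xc, Yc) or None.
        result = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val < threshold:        # coincidence degree must exceed 80%
            return None
        h, w = template_gray.shape
        return (max_loc[0] + w // 2, max_loc[1] + h // 2)   # P_C = (Xc, Yc)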
As shown in fig. 3, the hedge height H and the distance L from the center of the hedge longitudinal section to the coordinate origin of the trimming mechanical arm are acquired as follows: a hedge image is captured by the hedge height and distance detection camera, from which the hedge height H and the distance L are obtained;
As shown in fig. 5, the position of the lane line in the coordinate system of the front lane line detection camera is acquired as follows: the front lane line detection camera photographs the lane ahead and, within the front ROI area of the image, obtains the coordinate values of the lane line's front-left (x_flf, y_flf), rear-left (x_flb, y_flb), front-right (x_frf, y_frf), and rear-right (x_frb, y_frb) points;
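One plausible way to extract these four endpoint coordinates is edge detection plus a probabilistic Hough transform inside the ROI; this sketch is an assumed detection pipeline for illustration, not the patented method:

    import cv2
    import numpy as np

    def detect_lane_endpoints(frame_bgr, roi):
        # roi = (x0, y0, x1, y1); returns the longest left/right segments, or None
        x0, y0, x1, y1 = roi
        gray = cv2.cvtColor(frame_bgr[y0:y1, x0:x1], cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                                minLineLength=60, maxLineGap=10)
        if lines is None:
            return None
        # split segments into left/right halves of the ROI, keep the longest of each
        mid = (x1 - x0) / 2
        left = [l[0] for l in lines if (l[0][0] + l[0][2]) / 2 < mid]
        right = [l[0] for l in lines if (l[0][0] + l[0][2]) / 2 >= mid]
        longest = lambda segs: max(segs, key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]))
        return (longest(left) if left else None, longest(right) if right else None)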
Wherein the action set comprises the following three parts: the motion velocity component of each joint of the trimming mechanical arm, the longitudinal speed of the mobile chassis, and the front wheel slip angle;
In this embodiment, the action set consists of the four joint velocity components of the trimming mechanical arm, the longitudinal speed of the mobile chassis, and the front wheel slip angle, with action space dimension |A| = 6;
wherein the reward function comprises the following parts: a reward for the running smoothness of each joint of the trimming mechanical arm, a reward for the distance between the hedge and the trimming mechanical arm end effector, and a reward for the lane-line tracking accuracy of the mobile chassis;
wherein the running-smoothness reward of each joint of the trimming mechanical arm is expressed as:

    r_arms = Σ_{i=0..3} [ a_1·(dω_ik)² + b_1·dω_ik ]

where i = 0, 1, 2, 3; dω_ik is the angular-velocity differential of each joint servo motor of the trimming mechanical arm output by the system, characterizing the running smoothness of the motor; a_1 and b_1 are respectively the quadratic-term coefficient and the first-order-term coefficient of this quadratic reward function, constants with a_1 < 0 and b_1 > 0;
In this embodiment, the smoothness reward function of each joint of the trimming mechanical arm takes this form with fixed numerical coefficients (the instantiated expression is given as a formula in the original);
wherein the distance reward between the hedge and the trimming mechanical arm end effector is expressed as:

    r_dist = a_2·(d_dist)² + c

where a_2 is the quadratic-term coefficient, a constant with a_2 < 0, and c is a constant term with c > 0; d_dist is the distance between the hedge and the coordinate origin of the trimming mechanical arm end effector; the closer the end effector of the trimming mechanical arm is to the hedge, the greater the reward value the agent obtains;
In this embodiment, the distance reward function between the hedge and the trimming mechanical arm end effector is expressed as: r_dist = −(d_dist)² + 100;
wherein the lane-line tracking accuracy reward of the mobile chassis is expressed as:

    r_track = a_3·x² + b_2·x,  K_1 ≤ x ≤ K_2

where a_3 and b_2 are constants, respectively the quadratic-term coefficient (a_3 < 0) and the first-order-term coefficient (b_2 > 0); K_1 and K_2 are constants representing the boundaries of the coordinate range of the lane center line; x represents the detected lane center-line position; the closer the value of x is to the center line, the larger the agent's reward value, and the smaller otherwise;
In this embodiment, the lane-line tracking accuracy reward function of the mobile chassis takes this form with fixed numerical coefficients (the instantiated expression is given as a formula in the original);
In summary, the reward function is expressed as:

    R = ω_1·r_arms + ω_2·r_dist + ω_3·r_track

where ω_1, ω_2, ω_3 are the weights the trimming robot system assigns to the respective parts of the reward function, valued from 0 to 1; the larger a weight, the better the trained prediction model performs in that respect.
Step two, building a deep neural network framework;
When deep reinforcement learning is used for cooperative control training of the hedge trimming robot, two deep neural network models are involved: a policy network and a value function network. In this training task, because the dimensionality of the state input and action output is low, a structurally simple fully-connected neural network (MLP) model can meet the data processing requirements; the deep neural network frameworks of the policy network and the value function network are therefore each built from fully-connected neural network models.
In this embodiment, the policy network adopts a structure of 2 hidden layers and 1 output layer. Since the state space dimension of the input layer is 25 and the action space dimension of the output layer is 6, the demand on network information capacity is small, and the number of neurons in each hidden layer can be set to 128. In this network structure, because the output actions (the expected motor angular velocities, the longitudinal speed of the mobile chassis, and the front wheel angle) can be positive or negative, the output layer uses a TanH function to normalize the output actions, and the hidden layers use ReLU as the activation function to improve training efficiency. The value function network adopts a structure of 3 hidden layers and 1 output layer; its inputs are the state and the action, and its output is the Q value function, with the input action fed into the network at the second fully-connected hidden layer. Because the input and output dimensionality is larger, more neurons are needed to increase data processing capacity, so the 1st hidden layer is set to 128 neurons, the 2nd and 3rd hidden layers to 256 neurons each, and ReLU is used as the activation function to improve training efficiency.
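For illustration, the two networks described above can be sketched in PyTorch as follows; the exact wiring of the action injection at the critic's second hidden layer is an assumption consistent with the description:

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):           # 25 -> 128 -> 128 -> 6, TanH output
        def __init__(self, state_dim=25, action_dim=6, hidden=128):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.out = nn.Linear(hidden, action_dim)

        def forward(self, s):
            return torch.tanh(self.out(self.body(s)))   # normalized actions

    class ValueNet(nn.Module):            # state in layer 1, action injected at layer 2
        def __init__(self, state_dim=25, action_dim=6):
            super().__init__()
            self.fc1 = nn.Linear(state_dim, 128)
            self.fc2 = nn.Linear(128 + action_dim, 256)
            self.fc3 = nn.Linear(256, 256)
            self.q = nn.Linear(256, 1)

        def forward(self, s, a):
            h = torch.relu(self.fc1(s))
            h = torch.relu(self.fc2(torch.cat([h, a], dim=-1)))
            h = torch.relu(self.fc3(h))
            return self.q(h)              # Q value of the state-action pair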
Step three, designing the policy network objective function and the value function network objective function of the improved PPO algorithm;
In the policy network, the goal is to obtain the optimal parameters θ of the policy π_θ(A_t|S_t) after training; this is usually done by maximizing the policy network objective reward function, and the PPO-clip objective with a clip term can be used as the policy network optimization objective, that is:

    J_clip(θ) = E_{τ∈N_k} [ min( ρ_t(θ)·Â_t, clip(ρ_t(θ), 1−ε, 1+ε)·Â_t ) ]

in the formula, N_k = {τ_i} is the set of trajectories obtained after executing the policy in the environment, where τ denotes a state-action sequence {S_0, A_0, R_0, …, S_t, A_t, R_t}; ρ_t(θ) = π_θ′(A_t|S_t)/π_θ(A_t|S_t) represents the ratio between the old and new policies; ε is the hyper-parameter truncation factor; and Â_t is the advantage function using generalized advantage estimation (GAE), defined as

    Â_t = Σ_{l≥0} (γλ)^l·δ_{t+l},  δ_t = R_t + γ·V(S_{t+1}) − V(S_t)

in the formula, V(S_t) is the estimated state value function and λ is an introduced hyper-parameter valued from 0 to 1;
In the policy network objective function, the hyper-parameter truncation factor ε is normally hand-designed before training. Its value limits the size of the objective function's trust region: PPO-clip requires that the updated policy not differ too much from the old policy. If the value is too small, the performance of the policy optimization process improves steadily but training time increases; if too large, policy network updates can become unstable. When training the hedge trimming robot model, the value should both allow the policy network to converge quickly and satisfy the algorithm stability requirement. The hyper-parameter truncation factor is therefore designed as:

    ε = ε_0 / (1 + n/N)

where ε_0 is the initial truncation factor, n is the number of training iterations already completed, N represents the set total number of training iterations, and the 1 prevents the meaningless case of a zero denominator. The truncation factor thus decays adaptively and nonlinearly as training proceeds, which lets the policy network take larger update steps early in training while guaranteeing stable convergence late in training;
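Written as code, the reconstructed decay schedule is a one-liner (the published formula appears only as an image in the original, so this exact form is an assumption):

    def truncation_factor(n, N, eps0=0.3):
        # nonlinear adaptive decay of the PPO clip factor;
        # the +1 prevents a zero denominator at n = 0
        return eps0 / (1.0 + n / N)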
In summary, the policy network objective function of the improved PPO algorithm is the clipped objective above with the truncation factor replaced by its adaptively decaying value ε = ε_0/(1 + n/N). In this embodiment, the initial truncation factor is ε_0 = 0.3 and the set total number of training iterations is N = 10⁶.
In the value function network, training typically minimizes the mean square error; the value function network objective function of the improved PPO algorithm is therefore expressed as:

    L(φ) = E_{τ∈N_k} [ ( R̂_t − V_φ(S_t) )² ]

in the formula, V_φ(S_t) is the state value function and R̂_t is the discounted cumulative return target.
Step four, training the deep neural network with the improved PPO algorithm, by maximizing the policy network objective reward function and minimizing the mean square error of the value function network objective function.
Step five, optimizing the objective functions with the Adam adaptive gradient algorithm with improved adaptive learning rate; obtaining the optimal policy of the hedge trimming robot training model through repeated update iterations; predicting and outputting the optimal action from the latest input state data; and outputting control instructions for the mobile chassis and the trimming mechanical arm. In the optimizer of the Adam adaptive gradient algorithm, the learning rate is expressed as:

    Δα = clip( α/(√v̂_t + ε̂), Down_bdy_a, Up_bdy_a )·m̂_t

wherein α is the initial step size; β_1, β_2 are the exponential decay rates of the moment estimates; ε̂ is a small constant for numerical stability, usually 10⁻⁸; and m_t and v_t are respectively the biased first-moment estimate and the biased second-moment estimate, computed from the objective function gradient. In this embodiment, the initial step size is α = 0.001 and the moment-estimate exponential decay rates are β_1 = 0.9, β_2 = 0.999. Down_bdy_a and Up_bdy_a respectively represent the lower bound and the upper bound of the learning rate, piecewise functions of the current training cycle number n and the preset total number of training cycles N; in this embodiment, the preset total number of training cycles is N = 10⁶.
In the process of optimizing the objective function, the lower bound of the learning rate is set to the learning-rate value at which the policy network objective function starts to rise, or at which the value function network objective starts to fall; the upper bound of the learning rate is set to the learning-rate value at which the policy network objective function starts to fall, or at which the value function network objective starts to rise. By adaptively truncating the learning rate within this specified interval (keeping the upper bound unchanged in the early stage of training while continuously raising the lower bound, then keeping the lower bound unchanged in the later stage of training while continuously reducing the upper bound), a relatively large learning rate can be obtained early in training so that the objective function jumps out of a local optimum, while the learning rate decreases monotonically late in training so that the objective function converges monotonically and does not diverge;
In this embodiment, the rising period of the lower bound and the falling period of the upper bound each last 0.4·N, i.e. 0.4×10⁶ cycles, and the final monotone convergence period lasts 0.2·N, i.e. 0.2×10⁶ cycles.
The policy network objective function and the value function network objective function are respectively optimized by the optimizer, and the network parameters are updated; the update process can be expressed as:

    θ′ = θ + Δα   (policy network, gradient ascent)
    φ′ = φ − Δα   (value function network, gradient descent)
an Adam adaptive gradient algorithm for improving the adaptive learning rate is adopted to optimize the objective function, the optimal strategy of the hedge trimming robot training model is obtained through repeated updating iteration, the optimal action can be predicted and output through inputting the latest state data, and the control instructions of the mobile chassis and the trimming mechanical arm are output.
Through the intelligent cooperative control method for the hedge trimming robot based on deep reinforcement learning described above, the tedious manual modeling and calculation of traditional methods is avoided, as are control errors caused by an inaccurate model; the method is not tied to the constraint limitations of a particular mobile platform and thus has higher universality, enabling automatic operation of the hedge trimming robot and raising the level of automation and intelligence. At the same time, training the deep neural network model with the improved proximal policy optimization (PPO) algorithm avoids the tedious manual parameter tuning of the conventional PPO algorithm and improves the performance and generalization capability of the algorithm while preserving its convergence speed; optimizing the deep neural network objective functions with the improved Adam adaptive gradient algorithm adaptively adjusts the learning rate, lets the algorithm jump out of local optima, accelerates convergence, and improves generalization.

Claims (6)

1. An intelligent cooperative control method of a hedgerow trimming robot based on deep reinforcement learning is characterized by comprising the following steps:
the hedge trimming robot comprises a movable chassis and a trimming mechanical arm fixed on the movable chassis, and a vision detection module is mounted on the hedge trimming robot; the vision detection module comprises a hedge cross section detection camera arranged at the tail end of the trimming mechanical arm, a hedge height and distance detection camera arranged at the base of the trimming mechanical arm, and a front side lane line detection camera arranged on the moving chassis;
the intelligent cooperative control method for the hedge trimming robot comprises the following steps:
step one, establishing a Markov decision process (MDP) deep reinforcement learning model of the hedge trimming robot;
step two, building a deep neural network framework;
thirdly, designing a strategy network objective function and a value function network objective function of the improved PPO algorithm;
step four, training the deep neural network with the improved PPO algorithm, by maximizing the policy network objective reward function and minimizing the mean square error of the value function network objective function;
and fifthly, optimizing a target function by adopting an Adam adaptive gradient algorithm for improving the adaptive learning rate, obtaining an optimal strategy of the hedge trimming robot training model through repeated updating iteration, predicting and outputting optimal actions by inputting latest state data, and outputting control instructions of the mobile chassis and the trimming mechanical arm.
2. The intelligent cooperative control method for the hedge trimming robot according to claim 1, characterized in that:
in the MDP deep reinforcement learning model of the hedge trimming robot established in step one, the Markov decision process (MDP) is described by a quintuple (S, A, P, R, γ), where S represents the state set, A the action set, P the state transition probability (valued from 0 to 1), R the reward function, and γ the reward discount factor (valued from 0 to 1), used to calculate the cumulative reward obtained during the interaction between the agent and the environment; the agent is the vehicle-arm cooperative control module of the hedge trimming robot, and the environment comprises the hedge trimming robot, the hedge, and the lane lines; the policy model of the hedge trimming robot receives the state S_t of the environment at the current moment and selects and performs action A_t; according to the environment model it then enters a new state S_{t+1} with probability P(S_{t+1}|S_t, A_t) and earns a reward R_{t+1}; the policy model in turn receives S_{t+1} and R_{t+1}, continuously generating and executing control instructions for the hedge trimming robot; the policy model is continuously optimized and adjusted toward the maximum reward obtained in this process, until a termination condition is met and the interaction between the agent and the environment ends.
3. The intelligent cooperative control method for the hedge trimming robot according to claim 2, characterized in that:
The state set includes the following eight parts: the position P_C = (X_c, Y_c) of the center of the hedge cross section in the pixel coordinate system of the hedge cross-section detection camera; the hedge height H; the distance L from the center of the hedge longitudinal section to the coordinate origin of the trimming mechanical arm; the position of the lane line in the coordinate system of the front lane line detection camera; the position of the hedge in the vehicle coordinate system; the pose of each joint of the trimming mechanical arm; the position of the hedge in the coordinate system of the trimming mechanical arm end effector; and the pose of the mobile chassis in the world coordinate system;
wherein the position P_C = (X_c, Y_c) of the center of the hedge cross section in the pixel coordinate system of the hedge cross-section detection camera is acquired as follows: a hedge image is captured by the hedge cross-section detection camera and features are extracted to obtain a feature template of the hedge cross-section shape; when the end of the trimming mechanical arm moves directly above the hedge, the hedge cross-section shape is detected and matched against the feature template; once the matching coincidence degree exceeds 80%, the hedge cross-section feature recognition is considered valid and the cross-section center coordinate value P_C = (X_c, Y_c) is output;
the hedge height H and the distance L from the center of the hedge longitudinal section to the coordinate origin of the trimming mechanical arm are acquired as follows: a hedge image is captured by the hedge height and distance detection camera, from which the hedge height H and the distance L are obtained;
the position of the lane line in the coordinate system of the front lane line detection camera is acquired as follows: the front lane line detection camera photographs the lane ahead and, within the front ROI area of the image, obtains the coordinate values of the lane line's front-left (x_flf, y_flf), rear-left (x_flb, y_flb), front-right (x_frf, y_frf), and rear-right (x_frb, y_frb) points.
Wherein the action set comprises the following three parts: the motion velocity component of each joint of the trimming mechanical arm, the longitudinal speed of the mobile chassis, and the front wheel slip angle;
wherein the reward function comprises the following parts: a reward for the running smoothness of each joint of the trimming mechanical arm, a reward for the distance between the hedge and the trimming mechanical arm end effector, and a reward for the lane-line tracking accuracy of the mobile chassis;
wherein the running-smoothness reward of each joint of the trimming mechanical arm is expressed as:

    r_arms = Σ_{i=0..3} [ a_1·(dω_ik)² + b_1·dω_ik ]

where i = 0, 1, 2, 3; dω_ik is the angular-velocity differential of each joint servo motor of the trimming mechanical arm output by the system, characterizing the running smoothness of the motor; a_1, b_1 are constants with a_1 < 0 and b_1 > 0;
wherein the distance reward between the hedge and the trimming mechanical arm end effector is expressed as:

    r_dist = a_2·(d_dist)² + c

where a_2 and c are constants with a_2 < 0 and c > 0; d_dist is the distance between the hedge and the coordinate origin of the trimming mechanical arm end effector;
when the end effector of the trimming mechanical arm is closer to the hedgerow, the intelligent agent obtains a larger reward value;
wherein the lane-line tracking accuracy reward of the mobile chassis is expressed as:

    r_track = a_3·x² + b_2·x,  K_1 ≤ x ≤ K_2

where a_3, b_2 are constants with a_3 < 0 and b_2 > 0; K_1, K_2 are constants representing the boundaries of the coordinate range of the lane center line; x represents the detected lane center-line position; the closer the value of x is to the center line, the larger the agent's reward value, and the smaller otherwise;
in summary, the reward function is expressed as:

    R = ω_1·r_arms + ω_2·r_dist + ω_3·r_track

where ω_1, ω_2, ω_3 are respectively the weights the trimming robot system assigns to each part of the reward function, valued from 0 to 1; the larger a weight, the better the trained prediction model performs in that respect.
4. The intelligent cooperative control method for the hedge trimming robot according to claim 1, characterized in that: in step two, fully-connected neural network models are used to respectively build the deep neural network framework of the policy network and the deep neural network framework of the value function network.
5. The intelligent cooperative control method for the hedge trimming robot according to claim 1, characterized in that:
in step three, the policy network objective function of the improved PPO algorithm is expressed as:

    J_clip(θ) = E_{τ∈N_k} [ min( ρ_t(θ)·Â_t, clip(ρ_t(θ), 1−ε, 1+ε)·Â_t ) ]

in the formula, π_θ and π_θ′ respectively represent the original policy and the updated policy within a single round; N_k is the set of trajectories obtained after executing the policy in the environment, N_k = {τ_i}, where τ denotes a state-action sequence {S_0, A_0, R_0, …, S_t, A_t, R_t}; ρ_t(θ) = π_θ′(A_t|S_t)/π_θ(A_t|S_t) represents the ratio between the old and new policies; ε is the hyper-parameter truncation factor, decaying adaptively as ε = ε_0/(1 + n/N); and Â_t is the advantage function using generalized advantage estimation (GAE), defined as

    Â_t = Σ_{l≥0} (γλ)^l·δ_{t+l},  δ_t = R_t + γ·V(S_{t+1}) − V(S_t)

in the formula, V(S_t) is the estimated state value function; λ is an introduced hyper-parameter valued from 0 to 1; ε_0 is the initial truncation factor; n is the number of training iterations already completed; and N represents the set total number of training iterations;
the value function network objective function of the improved PPO algorithm is expressed as:

    L(φ) = E_{τ∈N_k} [ ( R̂_t − V_φ(S_t) )² ]

in the formula, V_φ(S_t) is the state value function and R̂_t is the discounted cumulative return target.
6. The intelligent cooperative control method for the hedge trimming robot according to claim 1, characterized in that:
in step five, the learning rate in the optimizer of the Adam adaptive gradient algorithm is expressed as:

    Δα = clip( α/(√v̂_t + ε̂), Down_bdy_a, Up_bdy_a )·m̂_t

in the formula, α is the initial step size; β_1, β_2 are the exponential decay rates of the moment estimates; ε̂ is a small constant for numerical stability, usually 10⁻⁸; m_t and v_t are respectively the biased first-moment estimate and the biased second-moment estimate, computed from the objective function gradient, with m̂_t and v̂_t their bias-corrected values; and Down_bdy_a and Up_bdy_a respectively represent the lower bound and the upper bound of the learning rate, defined as piecewise functions of the current training cycle number n and the preset total number of training cycles N, as follows:
in the process of optimizing the objective function, the lower bound of the learning rate is set to the learning-rate value at which the policy network objective function starts to rise, or at which the value function network objective starts to fall; the upper bound of the learning rate is set to the learning-rate value at which the policy network objective function starts to fall, or at which the value function network objective starts to rise; by adaptively truncating the learning rate within this specified interval (keeping the upper bound unchanged in the early stage of training while continuously raising the lower bound, then keeping the lower bound unchanged in the later stage of training while continuously reducing the upper bound), a relatively large learning rate is obtained in the early stage of training so that the objective function can jump out of a local optimum, and the learning rate decreases monotonically in the later stage of training so that the objective function converges monotonically without diverging;
the policy network objective function and the value function network objective function are respectively optimized by the optimizer, and the network parameters are updated; the update process is expressed as:

    θ′ = θ + Δα   (policy network, gradient ascent)
    φ′ = φ − Δα   (value function network, gradient descent)
the objective functions are optimized with the Adam adaptive gradient algorithm with improved adaptive learning rate; the optimal policy of the hedge trimming robot training model is obtained through repeated update iterations; the optimal action can be predicted and output from the latest input state data, and control instructions for the mobile chassis and the trimming mechanical arm are output.
CN202210248923.3A 2022-03-14 2022-03-14 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning Active CN114667852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210248923.3A CN114667852B (en) 2022-03-14 2022-03-14 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114667852A true CN114667852A (en) 2022-06-28
CN114667852B CN114667852B (en) 2023-04-14

Family

ID=82074131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210248923.3A Active CN114667852B (en) 2022-03-14 2022-03-14 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114667852B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117162102A (en) * 2023-10-30 2023-12-05 南京邮电大学 Independent proximal policy optimization training acceleration method for robot joint actions

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389102A (en) * 2018-11-23 2019-02-26 合肥工业大学 The system of method for detecting lane lines and its application based on deep learning
CN110619442A (en) * 2019-09-26 2019-12-27 浙江科技学院 Vehicle berth prediction method based on reinforcement learning
US20200128757A1 (en) * 2018-10-29 2020-04-30 Advanced Intelligent Systems Inc. Method and apparatus for performing pruning operations using an autonomous vehicle
CN111149536A (en) * 2019-12-31 2020-05-15 广西大学 Unmanned hedge trimmer and control method thereof
CN111251294A (en) * 2020-01-14 2020-06-09 北京航空航天大学 Robot grabbing method based on visual pose perception and deep reinforcement learning
CN112528552A (en) * 2020-10-23 2021-03-19 洛阳银杏科技有限公司 Mechanical arm control model construction method based on deep reinforcement learning
CN112868415A (en) * 2021-01-11 2021-06-01 福建思特电子有限公司 Control system of machine vision gardens flower nursery pruning equipment based on 5G network
CN113199474A (en) * 2021-04-25 2021-08-03 广西大学 Robot walking and operation intelligent cooperative motion planning method
CN113487902A (en) * 2021-05-17 2021-10-08 东南大学 Reinforced learning area signal control method based on vehicle planned path
CN113705115A (en) * 2021-11-01 2021-11-26 北京理工大学 Ground unmanned vehicle chassis motion and target striking cooperative control method and system
CN113821045A (en) * 2021-08-12 2021-12-21 浙江大学 Leg and foot robot reinforcement learning action generation system
CN113829351A (en) * 2021-10-13 2021-12-24 广西大学 Collaborative control method of mobile mechanical arm based on reinforcement learning
CN113910221A (en) * 2021-09-28 2022-01-11 广州杰赛科技股份有限公司 Mechanical arm autonomous motion planning method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114667852B (en) 2023-04-14

Similar Documents

Publication Publication Date Title
CN111413966B (en) Progressive model prediction unmanned planning tracking cooperative control method
CN111624992B (en) Path tracking control method of transfer robot based on neural network
CN110320809B (en) AGV track correction method based on model predictive control
CN109300144B (en) Pedestrian trajectory prediction method integrating social force model and Kalman filtering
Melon et al. Reliable trajectories for dynamic quadrupeds using analytical costs and learned initializations
JPH10254505A (en) Automatic controller
CN110989597A (en) Adaptive path tracking method of integrated fuzzy neural network
CN116533249A (en) Mechanical arm control method based on deep reinforcement learning
CN114667852B (en) Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
Fröhlich et al. Contextual tuning of model predictive control for autonomous racing
CN115416024A (en) Moment-controlled mechanical arm autonomous trajectory planning method and system
CN116460860A (en) Model-based robot offline reinforcement learning control method
CN109249393B (en) Multi-parameter robot real-time behavior correction method based on empirical control
Fröhlich et al. Model learning and contextual controller tuning for autonomous racing
CN112965487A (en) Mobile robot trajectory tracking control method based on strategy iteration
CN116755323A (en) Multi-rotor unmanned aerial vehicle PID self-tuning method based on deep reinforcement learning
CN114779661B (en) Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm
CN113829351B (en) Cooperative control method of mobile mechanical arm based on reinforcement learning
CN111413974B (en) Automobile automatic driving motion planning method and system based on learning sampling type
CN115918377A (en) Control method and control device of automatic tree fruit picking machine and automatic tree fruit picking machine
CN115542733A (en) Self-adaptive dynamic window method based on deep reinforcement learning
CN114839878A (en) Improved PPO algorithm-based biped robot walking stability optimization method
CN110703792B (en) Underwater robot attitude control method based on reinforcement learning
CN110502857B (en) Terrain roughness online estimation method for quadruped robot
JPH0635525A (en) Robot arm control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant