CN116237935A - Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium - Google Patents

Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium Download PDF

Info

Publication number
CN116237935A
CN116237935A CN202310106945.0A CN202310106945A CN116237935A CN 116237935 A CN116237935 A CN 116237935A CN 202310106945 A CN202310106945 A CN 202310106945A CN 116237935 A CN116237935 A CN 116237935A
Authority
CN
China
Prior art keywords
momentum
order
mechanical arm
fractional
gradient descent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310106945.0A
Other languages
Chinese (zh)
Other versions
CN116237935B (en
Inventor
赵东东
吴思敏
赵志立
孙卫国
孙万胜
张国华
阎石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou University
Original Assignee
Lanzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou University filed Critical Lanzhou University
Priority to CN202310106945.0A priority Critical patent/CN116237935B/en
Publication of CN116237935A publication Critical patent/CN116237935A/en
Application granted granted Critical
Publication of CN116237935B publication Critical patent/CN116237935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J18/00Arms
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a method, a system, a mechanical arm and a storage medium for collaborative grabbing of the mechanical arm, wherein the method for collaborative grabbing of the mechanical arm comprises the following steps: acquiring a target image; based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and the momentum information. The mechanical arm collaborative grabbing method provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger rewarding rewards, and the mechanical arm grabbing success rate is improved.

Description

Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium
Technical Field
The invention relates to the technical field of mechanical arm control, in particular to a method and a system for collaborative grabbing of a mechanical arm, the mechanical arm and a storage medium.
Background
With the rapid development of robotics and artificial intelligence, machines may be used instead of manpower to perform a wide variety of tasks. In order to realize that the machine replaces manpower to perform various works, a mechanical arm is required to perform machine learning (such as deep learning and reinforcement learning) so as to interact with the external environment to realize various grabbing tasks. However, the model effect is different due to different optimizing effects of different optimizing algorithms, so that the grabbing success rate of the mechanical arm is different.
At present, a gradient descent method is often used for training the neural network, and the essence is that the neural network model is subjected to weight updating, and the neural network converges after multiple iterations to obtain an optimal model. However, the gradient descent method has the following problems: the learning rate determines the reliability degree and risk degree of the optimizer, the too high learning rate setting may cause the optimizer to ignore the global minimum and easily fall into a local optimal solution, and the too low learning rate may cause the running crash, spending a lot of time and having a slow training speed. The gradient descent method takes a long time to obtain a convergence solution, and each step calculates and adjusts the direction of the next step. When applied to large data sets, each input sample needs to update its parameters, and each iteration needs to traverse all samples. Once the model falls into the saddle point, the gradient is zero, the model parameters are not updated, the stability of model training is not guaranteed, and the grabbing success rate of the mechanical arm is low.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides the collaborative grabbing method for the mechanical arm, which can improve the stability of model training, ensure that the mechanical arm obtains larger rewarding rewards and improve the grabbing success rate of the mechanical arm.
The invention also provides a mechanical arm collaborative grabbing system and a computer readable storage medium.
According to an embodiment of the first aspect of the invention, the mechanical arm collaborative grabbing method comprises the following steps:
acquiring a target image;
based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp a target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
The mechanical arm collaborative grabbing method provided by the embodiment of the invention has at least the following beneficial effects:
the momentum information is introduced into the fractional gradient descent algorithm, so that the momentum fractional gradient descent algorithm with the momentum information can be obtained, the momentum fractional gradient descent algorithm is used for optimizing a loss function in the neural network, the stability of model training can be improved, and a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to carry out reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain larger return rewards, and the grasping success rate of the mechanical arm is improved. The mechanical arm collaborative grabbing method provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger rewarding rewards, and the mechanical arm grabbing success rate is improved.
According to some embodiments of the invention, the momentum information comprises a first order momentum and a second order momentum; the momentum fractional order gradient descent algorithm is obtained by the following steps:
and introducing the first-order momentum and the second-order momentum into the fractional order gradient descent algorithm to obtain the momentum fractional order gradient descent algorithm, wherein the first-order momentum is an average value of gradient directions at all moments, and the second-order momentum is a linear combination of squares of gradients at all moments in the past.
According to some embodiments of the invention, the constraint formula of the momentum fractional gradient descent algorithm is:
Figure BDA0004075514720000021
wherein w is a parameter to be optimized, mu is a learning rate, k is the iteration number, alpha is the order of the fractional order, and m k For the first order momentum, v k Epsilon is the minimum constant and delta is the weight decay parameter for the second order momentum.
According to some embodiments of the invention, the learning rate is 2e-3, the first order momentum is 0.9, the second order momentum is 0.999, the order of the fractional order is 0.999, the minimum constant is 1e-7, and the weight decay parameter is 5e-3.
According to some embodiments of the invention, the constraint formula of the first order momentum is:
m k =β 1 m k-1 +(1-β 1 )g k
the constraint formula of the second-order momentum is as follows:
v k =β 2 v k-1 +(1-β 2 )g k 2
wherein beta is 1 And beta 2 G is the momentum factor k Is the gradient at time k.
According to some embodiments of the present invention, after the controlling the mechanical arm according to the target image to perform reinforcement learning training to grasp the target object, the method further includes the following steps:
the grasping result and the rewarding rewards are determined.
According to some embodiments of the invention, the reinforcement learning model employs a DQN algorithm.
According to a second aspect of the present invention, a robot arm co-gripping system includes:
a target image acquisition unit configured to acquire a target image;
the grabbing control unit is used for controlling the mechanical arm to carry out reinforcement learning training according to the target image based on the reinforcement learning model so as to grab the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
The mechanical arm collaborative grabbing system provided by the embodiment of the invention has at least the following beneficial effects:
the momentum information is introduced into the fractional gradient descent algorithm, so that the momentum fractional gradient descent algorithm with the momentum information can be obtained, the momentum fractional gradient descent algorithm is used for optimizing a loss function in the neural network, the stability of model training can be improved, and a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to carry out reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain larger return rewards, and the grasping success rate of the mechanical arm is improved. The mechanical arm collaborative grabbing system provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger return rewards, and the mechanical arm grabbing success rate is improved.
An embodiment of the third aspect of the present invention provides a manipulator, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the manipulator collaborative grabbing method according to the embodiment of the first aspect when executing the computer program. The control device adopts all the technical schemes of the mechanical arm collaborative grabbing method in the embodiment, so that the control device has at least all the beneficial effects brought by the technical schemes in the embodiment.
According to a fourth aspect of the present invention, a computer readable storage medium is provided, which stores computer executable instructions for performing the robot arm co-grasping method according to the first aspect of the present invention. The computer readable storage medium adopts all the technical schemes of the mechanical arm collaborative grabbing method in the above embodiments, so that the method has at least all the beneficial effects brought by the technical schemes in the above embodiments.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of an implementation of a momentum fractional gradient descent algorithm according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for collaborative gripping of a robotic arm according to an embodiment of the present invention;
FIG. 3 is a graph of the grasping results according to an embodiment of the invention;
FIG. 4 is a graph of rewards in accordance with an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, the description of first, second, etc. is for the purpose of distinguishing between technical features only and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be determined reasonably by a person skilled in the art in combination with the specific content of the technical solution.
The following description will be made in detail and detail on the method for collaborative gripping by a mechanical arm according to the first embodiment of the present invention with reference to fig. 1 to 4, and it is apparent that the embodiments described below are some, but not all, embodiments of the present invention.
According to the embodiment of the first aspect of the invention, the mechanical arm collaborative grabbing method comprises the following steps of;
acquiring a target image;
based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and the momentum information. The momentum fractional gradient descent algorithm can be deduced according to definition of the fractional gradient descent algorithm and momentum information, and can converge to a real extreme point w * The loss function of the reinforcement learning model neural network can be optimized by a momentum fractional gradient descent algorithm.
The parameters involved in the momentum fractional gradient descent algorithm are: the method comprises the steps of learning rate, first-order momentum, second-order momentum, fractional order, minimum constant and weight attenuation parameters, determining specific values of the parameters according to past experience and through multiple experiments, controlling a mechanical arm to perform reinforcement learning training according to a target image based on a reinforcement learning model after the parameters are set, grabbing a target object, and optimizing a loss function through a momentum fractional order gradient descent algorithm.
The model effect of the reinforcement learning model is verified by training the mechanical arm for a plurality of times and testing the grabbing success rate of the mechanical arm and the return rewarding magnitude obtained by the inverted pendulum in reinforcement learning. As shown in fig. 3 and 4, fig. 3 is a graph of the grasping result and fig. 4 is a graph of the rewards. In fig. 3, the x-axis is the training frequency, the y-axis is the grabbing result, the solid line represents the success rate of the mechanical arm grabbing the target object under the traditional Adam optimizer, and the dotted line represents the success rate of the mechanical arm grabbing the target object under the Adam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the invention. In fig. 4, the x-axis is the training number, the y-axis is the rewarding reward, the solid line represents the rewarding reward obtained by the inverted pendulum under the traditional Adam optimizer, and the dotted line represents the rewarding reward obtained by the inverted pendulum under the foam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the present invention. As can be seen from fig. 3 and fig. 4, the foam not only gives a larger rewarding prize to reinforcement learning, but also increases the success rate of the robot arm grasping the target object after the same round of training as compared with Adam.
It should be noted that, the inverted pendulum, rewarding reward, etc. in reinforcement learning are all common knowledge known to those skilled in the art, and the specific principles and meanings thereof will not be explained in detail herein. In addition, how to determine the grasping result and the rewarding rewards is not limited herein, and the determining means may be selected according to actual situations, and this part of content is known in the art by those skilled in the art, and will not be described herein.
According to the mechanical arm collaborative grabbing method provided by the embodiment of the invention, the momentum information is introduced into the fractional gradient descent algorithm, so that the momentum fractional gradient descent algorithm with the momentum information can be obtained, the momentum fractional gradient descent algorithm is used for optimizing the loss function in the neural network, the stability of model training can be improved, and a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to carry out reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain larger return rewards, and the grasping success rate of the mechanical arm is improved. The mechanical arm collaborative grabbing method provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger rewarding rewards, and the mechanical arm grabbing success rate is improved.
In some embodiments of the invention, the momentum information includes a first order momentum and a second order momentum; the momentum fractional order gradient descent algorithm is obtained by the following steps:
and introducing first-order momentum and second-order momentum into a fractional order gradient descent algorithm to obtain a momentum fractional order gradient descent algorithm, wherein the first-order momentum is an average value of gradient directions at all moments, and the second-order momentum is a linear combination of squares of gradients at all the moments in the past.
The constraint formula of the fractional order gradient descent algorithm is obtained by definition of fractional order, and is specifically defined as:
Figure BDA0004075514720000051
wherein alpha is the order of fractional order, m-1 < alpha < m,
Figure BDA0004075514720000052
t 0 is a preset initial value.
The definition of Gamma function Γ (χ) is:
Figure BDA0004075514720000053
the definition of the fractional order gradient is:
Figure BDA0004075514720000054
wherein w is a parameter to be optimized, mu is a learning rate, k is iteration times, and alpha is the order of a fractional order.
Obtaining a fractional step update expression by combining the formula (1) and the formula (3):
Figure BDA0004075514720000061
the first order momentum is the average value of the gradient directions at each moment, namely the descending direction at the moment t, and is determined not only by the gradient direction of the current point, but also by the descending direction accumulated before.
The constraint formula of the first order momentum is:
m k =β 1 m k-1 +(1-β 1 )g k (5)
The constraint formula of the second-order momentum is:
v k =β 2 v k-1 +(1-β 2 )g k 2 (6)
Wherein beta is 1 And beta 2 As momentum factor, beta 1 ,β 2 ∈(0,1),g k Is the gradient at time k.
Formula (5) can be simplified as:
Figure BDA0004075514720000062
formula (6) can be simplified as:
Figure BDA0004075514720000063
wherein m is 0 And v 0 As the preset initial momentum, the initial momentum is generally set to 0, and thus, the equation (7) can be simplified as:
Figure BDA0004075514720000064
equation (8) can be simplified as:
Figure BDA0004075514720000065
in order to weaken the influence of the initial momentum of 0, deviation correction is introduced to obtain:
Figure BDA0004075514720000066
Figure BDA0004075514720000067
by analogy to equation (4), an expression of the momentum fraction order gradient can be obtained:
Figure BDA0004075514720000071
in order to make the mode denominator be 0, a minimum constant epsilon is introduced, and according to the formula (9), the formula (10) and the formula (11), a momentum fraction gradient update expression can be obtained:
Figure BDA0004075514720000072
if equation (14) can converge to the real extreme point w * Then it can be used to optimize the loss function in the neural network and it is demonstrated that equation (14) can converge to the real extreme point.
Can be demonstrated by the countercheck method, assuming w of formula (14) k Converging to the real extreme point w, while w is not equal to w * And lim|w is present k -w|=0. According to the definition of convergence, when k-1 > N, there is always a sufficiently large number for any sufficiently small ε
Figure BDA0004075514720000073
(/>
Figure BDA0004075514720000074
Is a natural number set) such that |w k-1 -w|<ε<|w * -w| holds. Therefore, the expression (15) is also established.
Figure BDA0004075514720000075
By combining (14) and (15), the following inequality is obtained:
Figure BDA0004075514720000081
wherein, formula (16) always exists epsilon such that
Figure BDA0004075514720000082
The following inequality holds:
Figure BDA0004075514720000083
equation (17) can be simplified as:
d>|w k -w k-1 | α (18)
By the formulas (16) and (18), the following inequality holds:
|w k+1 -w k |>|w k -w k-1 i type (19)
Formula (19) shows w k Not converged at w, and converged to w due to the countercheck method * And (5) a dot.
The weight decay constant delta is added in the formula (13), so the formula (13) can simplify the constraint formula for obtaining the momentum fractional order gradient descent algorithm:
Figure BDA0004075514720000084
wherein w is a parameter to be optimized, mu is a learning rate, k is the iteration number, alpha is the order of the fractional order, and m k Is the first order momentum, v k Is the second order momentum, epsilon is the minimum constant, and delta is the weight decay parameter.
Therefore, the momentum fractional gradient descent algorithm can converge to a real extreme point and can be used for optimizing the loss function of the neural network when w k+1 -w k When=0, the gradient no longer drops, the optimal point is reached, and the optimization is finished.
In some embodiments of the invention, referring to FIG. 1, the learning rate is 2e-3, the first order momentum is 0.9, the second order momentum is 0.999, the fractional order is 0.999, the minimum constant is 1e-7, and the weight decay parameter is 5e-3. The specific values of the above parameters can be selected according to the actual situation, and are not to be construed as limiting the invention.
In some embodiments of the present invention, referring to fig. 3 and 4, after the robot arm is controlled to perform reinforcement learning training according to the target image to grasp the target object, the method further includes the steps of: the grasping result and the rewarding rewards are determined. The model effect of the reinforcement learning model optimized by adopting the momentum fraction order gradient descent algorithm can be verified by testing the success rate of the mechanical arm grabbing and the return rewards. As shown in fig. 3 and 4, fig. 3 is a graph of the grasping result and fig. 4 is a graph of the rewards. In fig. 3, the x-axis is the training frequency, the y-axis is the grabbing result, the solid line represents the success rate of the mechanical arm grabbing the target object under the traditional Adam optimizer, and the dotted line represents the success rate of the mechanical arm grabbing the target object under the Adam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the invention. In fig. 4, the x-axis is the training number, the y-axis is the rewarding reward, the solid line represents the rewarding reward obtained by the inverted pendulum under the traditional Adam optimizer, and the dotted line represents the rewarding reward obtained by the inverted pendulum under the foam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the present invention. As can be seen from fig. 3 and fig. 4, the foam not only gives a larger rewarding prize to reinforcement learning, but also increases the success rate of the robot arm grasping the target object after the same round of training as compared with Adam.
It should be noted that, the inverted pendulum, rewarding reward, etc. in reinforcement learning are all common knowledge known to those skilled in the art, and the specific principles and meanings thereof will not be explained in detail herein. In addition, how to determine the grasping result and the rewarding rewards is not limited herein, and the determining means may be selected according to actual situations, and this part of content is known in the art by those skilled in the art, and will not be described herein.
In some embodiments of the invention, the reinforcement learning model employs the DQN algorithm. The algorithm adopted by the reinforcement learning model is a DQN (Deep Q-network) algorithm based on a value, the algorithm architecture is an DQN algorithm based on OpenAI, an Adam optimizer is adopted originally to optimize a loss function, and the momentum fractional gradient descent algorithm is adopted to optimize the loss function in the implementation of the reinforcement learning model, so that the stability of model training can be improved, and a better reinforcement learning model can be obtained. It should be noted that other algorithms may be used for reinforcement learning model, and should not be construed as limiting the present invention.
In addition, the convergence of the DQN network can be tested in a reinforcement learning classical gym environment using the momentum fractional gradient descent algorithm FoAdam. The gym includes a plurality of test environments, such as "CartPole-v1", "mountain Car-v0", etc. In some embodiments of the present invention, the convergence speed of the DQN network can be tested in a "carthole-v 1" environment, respectively, using a conventional Adam and a momentum fractional gradient descent algorithm FoAdam proposed by embodiments of the present invention. After training for a plurality of rounds, the DQN networks based on Adam and FoAdam are converged, the convergence speed difference is not large, but when training is about 3000 rounds, the success rate of grabbing a target object by a mechanical arm based on FoAdam is higher, and the obtained rewards are larger. It should be noted that how many specific training wheels can be selected according to practical situations, and the present invention is not limited thereto.
The embodiment of the second aspect of the invention also provides a mechanical arm collaborative grabbing system. The mechanical arm collaborative grabbing system according to the embodiment of the second aspect of the invention comprises a target image acquisition unit and a grabbing control unit.
A target image acquisition unit configured to acquire a target image;
the grabbing control unit is used for controlling the mechanical arm to carry out reinforcement learning training according to the target image based on the reinforcement learning model so as to grab the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and the momentum information.
The momentum fractional gradient descent algorithm can be deduced according to definition of the fractional gradient descent algorithm and momentum information, and can converge to a real extreme point w * Therefore, the strengthening learning model spirit can be optimized through a momentum fractional order gradient descent algorithmLoss function via the network. The parameters involved in the momentum fractional gradient descent algorithm are: the method comprises the steps of learning rate, first-order momentum, second-order momentum, fractional order, minimum constant and weight attenuation parameters, determining specific values of the parameters according to past experience and through multiple experiments, controlling a mechanical arm to perform reinforcement learning training according to a target image based on a reinforcement learning model after the parameters are set, grabbing a target object, and optimizing a loss function through a momentum fractional order gradient descent algorithm.
The model effect of the reinforcement learning model is verified by training the mechanical arm for a plurality of times and testing the grabbing success rate of the mechanical arm and the return rewarding magnitude obtained by the inverted pendulum in reinforcement learning. As shown in fig. 3 and 4, fig. 3 is a graph of the grasping result and fig. 4 is a graph of the rewards. In fig. 3, the x-axis is the training frequency, the y-axis is the grabbing result, the solid line represents the success rate of the mechanical arm grabbing the target object under the traditional Adam optimizer, and the dotted line represents the success rate of the mechanical arm grabbing the target object under the Adam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the invention. In fig. 4, the x-axis is the training number, the y-axis is the rewarding reward, the solid line represents the rewarding reward obtained by the inverted pendulum under the traditional Adam optimizer, and the dotted line represents the rewarding reward obtained by the inverted pendulum under the foam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the present invention. As can be seen from fig. 3 and fig. 4, the foam not only gives a larger rewarding prize to reinforcement learning, but also increases the success rate of the robot arm grasping the target object after the same round of training as compared with Adam.
The convergence speed of the neural network is respectively tested by using a traditional Adam and a momentum fractional order gradient descent algorithm FoAdam provided by the embodiment of the invention. After training for a plurality of rounds, the neural networks based on Adam and Foadam are converged, the convergence speed difference is not large, but when training for about 3000 rounds, the success rate of grabbing a target object by a mechanical arm based on Foadam is higher, and the obtained return rewards are larger. It should be noted that how many specific training wheels can be selected according to practical situations, and the present invention is not limited thereto.
It should be noted that, the inverted pendulum, rewarding reward, etc. in reinforcement learning are all common knowledge known to those skilled in the art, and the specific principles and meanings thereof will not be explained in detail herein. In addition, how to determine the grasping result and the rewarding rewards is not limited herein, and the determining means may be selected according to actual situations, and this part of content is known in the art by those skilled in the art, and will not be described herein.
According to the mechanical arm collaborative grabbing system provided by the embodiment of the invention, the momentum information is introduced into the fractional gradient descent algorithm, so that the momentum fractional gradient descent algorithm with the momentum information can be obtained, the momentum fractional gradient descent algorithm is used for optimizing the loss function in the neural network, the stability of model training can be improved, and a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to carry out reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain larger return rewards, and the grasping success rate of the mechanical arm is improved. The mechanical arm collaborative grabbing system provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger return rewards, and the mechanical arm grabbing success rate is improved.
In addition, an embodiment of the third aspect of the present invention further provides a mechanical arm, where the control device includes: memory, a processor, and a computer program stored on the memory and executable on the processor. The processor and memory may be connected by bus 700 or otherwise.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software program and instructions required to implement the robotic arm co-fetching method of the above embodiments are stored in a memory, which when executed by a processor, performs the robotic arm co-fetching method of the above embodiments.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Further, the fourth aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions that are executed by a processor or a controller, for example, by one of the processors in the control apparatus, so that the processor performs the robot arm collaborative gripping method in the above embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention.

Claims (10)

1. The mechanical arm collaborative grabbing method is characterized by comprising the following steps of:
acquiring a target image;
based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp a target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
2. The robotic arm co-grasping method of claim 1, wherein the momentum information comprises a first order momentum and a second order momentum; the momentum fractional order gradient descent algorithm is obtained by the following steps:
and introducing the first-order momentum and the second-order momentum into the fractional order gradient descent algorithm to obtain the momentum fractional order gradient descent algorithm, wherein the first-order momentum is an average value of gradient directions at all moments, and the second-order momentum is a linear combination of squares of gradients at all moments in the past.
3. The mechanical arm collaborative grabbing method according to claim 2, wherein the constraint formula of the momentum fractional order gradient descent algorithm is:
Figure FDA0004075514710000011
wherein w is a parameter to be optimized, mu is a learning rate, k is the iteration number, alpha is the order of the fractional order, and m k For the first order momentum, v k Epsilon is the minimum constant and delta is the weight decay parameter for the second order momentum.
4. The method according to claim 3, wherein the learning rate is 2e-3, the first order momentum is 0.9, the second order momentum is 0.999, the order of the fractional order is 0.999, the minimum constant is 1e-7, and the weight attenuation parameter is 5e-3.
5. A manipulator co-gripping method according to claim 2 or 3, wherein the constraint formula of the first order momentum is:
m k =β 1 m k-1 +(1-β 1 )g k
the constraint formula of the second-order momentum is as follows:
v k =β 2 v k-1 +(1-β 2 )g k 2
wherein beta is 1 And beta 2 G is the momentum factor k Is the gradient at time k.
6. The method for collaborative gripping according to claim 1, further comprising the steps of, after the controlling the robotic arm to perform reinforcement learning training based on the target image to grip the target object:
the grasping result and the rewarding rewards are determined.
7. The robotic arm collaborative gripping method according to claim 1, wherein the reinforcement learning model employs a DQN algorithm.
8. A robotic arm co-grasping system, comprising:
a target image acquisition unit configured to acquire a target image;
the grabbing control unit is used for controlling the mechanical arm to carry out reinforcement learning training according to the target image based on the reinforcement learning model so as to grab the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
9. A robotic arm comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the robotic arm co-fetching method of any of claims 1-7 when executing the computer program.
10. A computer-readable storage medium storing computer-executable instructions for performing the robot arm co-grasping method according to any one of claims 1 to 7.
CN202310106945.0A 2023-02-03 2023-02-03 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium Active CN116237935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310106945.0A CN116237935B (en) 2023-02-03 2023-02-03 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310106945.0A CN116237935B (en) 2023-02-03 2023-02-03 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium

Publications (2)

Publication Number Publication Date
CN116237935A true CN116237935A (en) 2023-06-09
CN116237935B CN116237935B (en) 2023-09-15

Family

ID=86629103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310106945.0A Active CN116237935B (en) 2023-02-03 2023-02-03 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium

Country Status (1)

Country Link
CN (1) CN116237935B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089589A1 (en) * 2016-09-27 2018-03-29 Fanuc Corporation Machine learning device and machine learning method for learning optimal object grasp route
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
CN111738408A (en) * 2020-05-14 2020-10-02 平安科技(深圳)有限公司 Method, device and equipment for optimizing loss function and storage medium
KR20210012672A (en) * 2019-07-26 2021-02-03 한국생산기술연구원 System and method for automatic control of robot manipulator based on artificial intelligence
CN114939870A (en) * 2022-05-30 2022-08-26 兰州大学 Model training method and device, strategy optimization method, equipment and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089589A1 (en) * 2016-09-27 2018-03-29 Fanuc Corporation Machine learning device and machine learning method for learning optimal object grasp route
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
KR20210012672A (en) * 2019-07-26 2021-02-03 한국생산기술연구원 System and method for automatic control of robot manipulator based on artificial intelligence
CN111738408A (en) * 2020-05-14 2020-10-02 平安科技(深圳)有限公司 Method, device and equipment for optimizing loss function and storage medium
CN114939870A (en) * 2022-05-30 2022-08-26 兰州大学 Model training method and device, strategy optimization method, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭明霄, 等.: "基于动量分数阶梯度的卷积神经网络优化方法", 计算机工程与应用, pages 80 - 86 *

Also Published As

Publication number Publication date
CN116237935B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
US20220363259A1 (en) Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device
CN108051999B (en) Accelerator beam orbit control method and system based on deep reinforcement learning
Kaushik et al. Fast online adaptation in robotics through meta-learning embeddings of simulated priors
WO2018017546A1 (en) Training machine learning models on multiple machine learning tasks
US11709462B2 (en) Safe and efficient training of a control agent
US8521678B2 (en) Learning control system and learning control method
CN112001501B (en) Parameter updating method, device and equipment of AI distributed training system
WO2019222745A1 (en) Sample-efficient reinforcement learning
JP2016158485A (en) System and method for stopping train within predetermined position range
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN116237935B (en) Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium
JP2019074947A (en) Learning device, learning method, and learning program
JP2005078516A (en) Device, method and program for parallel learning
CN114371729B (en) Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback
CN108073072B (en) Parameter self-tuning method of SISO (Single input Single output) compact-format model-free controller based on partial derivative information
CN114527642B (en) Method for automatically adjusting PID parameters by AGV based on deep reinforcement learning
CN116047888A (en) Control method of self-balancing vehicle based on BP neural network PID
Gao et al. Cooperative learning control of unknown nonlinear multi-agent systems with time-varying output constraints using neural networks and barrier lyapunov functions via backstepping design
CN112162404B (en) Design method of free-form surface imaging system
CN113485099A (en) Online learning control method of nonlinear discrete time system
CN113052312A (en) Deep reinforcement learning model training method and device, medium and electronic equipment
CN117872731A (en) Aircraft load shedding guidance method, device and storage medium
KR20230091821A (en) System and method for performing reinforcement learning by setting hindsight goal ranking for replay buffer
Pu et al. Context-based soft actor critic for environments with non-stationary dynamics
Hofer et al. Online reinforcement learning for real-time exploration in continuous state and action markov decision processes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant