CN116237935A - Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium - Google Patents
- Publication number
- CN116237935A CN116237935A CN202310106945.0A CN202310106945A CN116237935A CN 116237935 A CN116237935 A CN 116237935A CN 202310106945 A CN202310106945 A CN 202310106945A CN 116237935 A CN116237935 A CN 116237935A
- Authority
- CN
- China
- Prior art keywords
- momentum
- order
- mechanical arm
- fractional
- gradient descent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1679—Programme controls characterised by the tasks executed
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J18/00—Arms
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a mechanical arm collaborative grabbing method, a system, a mechanical arm and a storage medium. The mechanical arm collaborative grabbing method comprises the following steps: acquiring a target image; and, based on a reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp the target object. During model training of the reinforcement learning model, the loss function is optimized by a momentum fractional gradient descent algorithm, which is obtained by combining a fractional gradient descent algorithm with momentum information. The mechanical arm collaborative grabbing method provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger return rewards and the grasping success rate of the mechanical arm is improved.
Description
Technical Field
The invention relates to the technical field of mechanical arm control, in particular to a method and a system for collaborative grabbing of a mechanical arm, the mechanical arm and a storage medium.
Background
With the rapid development of robotics and artificial intelligence, machines can be used in place of manpower to perform a wide variety of tasks. To achieve this, a mechanical arm must undergo machine learning (such as deep learning and reinforcement learning) so that it can interact with the external environment and accomplish various grasping tasks. However, different optimization algorithms optimize the model to different degrees, so the resulting models differ in quality and the grasping success rate of the mechanical arm varies accordingly.
At present, the gradient descent method is often used to train the neural network: in essence, the weights of the neural network model are updated iteratively until the network converges to an optimal model. However, the gradient descent method has the following problems. The learning rate determines how reliable and how risky the optimizer is: too high a learning rate may cause the optimizer to skip over the global minimum and easily fall into a local optimum, while too low a learning rate may cause training to stall, spending a great deal of time with a slow training speed. The gradient descent method takes a long time to reach a convergent solution, since every step recomputes the gradient to choose the direction of the next step. When applied to large data sets, the parameters must be updated for every input sample, and every iteration must traverse all samples. Moreover, once the model falls into a saddle point, the gradient is zero and the model parameters are no longer updated; the stability of model training is therefore not guaranteed, and the grasping success rate of the mechanical arm is low.
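As a toy illustration of the learning-rate sensitivity described above (a quadratic example of my own, not part of the patent): plain gradient descent converges when the learning rate is small and diverges when it is too high.

```python
def gradient_descent(grad, w0, lr, steps):
    """Plain gradient descent: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Quadratic loss f(w) = w^2 with gradient 2w; the true minimum is w* = 0.
grad = lambda w: 2.0 * w

w_small = gradient_descent(grad, w0=1.0, lr=0.1, steps=100)  # shrinks by 0.8 each step
w_large = gradient_descent(grad, w0=1.0, lr=1.5, steps=100)  # multiplies by -2 each step

print(abs(w_small) < 1e-6)  # converged
print(abs(w_large) > 1.0)   # diverged: the step overshoots the minimum
```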
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a mechanical arm collaborative grabbing method which can improve the stability of model training, enable the mechanical arm to obtain larger return rewards, and improve the grasping success rate of the mechanical arm.
The invention also provides a mechanical arm collaborative grabbing system, a mechanical arm, and a computer readable storage medium.
According to an embodiment of the first aspect of the invention, the mechanical arm collaborative grabbing method comprises the following steps:
acquiring a target image;
based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp a target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
The mechanical arm collaborative grabbing method provided by the embodiment of the invention has at least the following beneficial effects:
the momentum information is introduced into the fractional gradient descent algorithm, so that the momentum fractional gradient descent algorithm with the momentum information can be obtained, the momentum fractional gradient descent algorithm is used for optimizing a loss function in the neural network, the stability of model training can be improved, and a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to carry out reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain larger return rewards, and the grasping success rate of the mechanical arm is improved. The mechanical arm collaborative grabbing method provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger rewarding rewards, and the mechanical arm grabbing success rate is improved.
According to some embodiments of the invention, the momentum information comprises a first order momentum and a second order momentum; the momentum fractional order gradient descent algorithm is obtained by the following steps:
and introducing the first-order momentum and the second-order momentum into the fractional order gradient descent algorithm to obtain the momentum fractional order gradient descent algorithm, wherein the first-order momentum is an average value of gradient directions at all moments, and the second-order momentum is a linear combination of squares of gradients at all moments in the past.
According to some embodiments of the invention, the constraint formula of the momentum fractional gradient descent algorithm is:
where w is the parameter to be optimized, μ is the learning rate, k is the iteration number, α is the fractional order, m_k is the first-order momentum, v_k is the second-order momentum, ε is a minimum constant, and δ is the weight decay parameter.
According to some embodiments of the invention, the learning rate is 2e-3, the first-order momentum factor is 0.9, the second-order momentum factor is 0.999, the fractional order is 0.999, the minimum constant is 1e-7, and the weight decay parameter is 5e-3.
According to some embodiments of the invention, the constraint formula of the first order momentum is:
m_k = β₁·m_{k−1} + (1 − β₁)·g_k
the constraint formula of the second-order momentum is as follows:
v_k = β₂·v_{k−1} + (1 − β₂)·g_k²
where β₁ and β₂ are momentum factors and g_k is the gradient at time k.
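The two momenta above, together with the Adam-style bias correction for their zero initialization, can be sketched as follows (a minimal illustration; the function names are mine):

```python
def update_momenta(m_prev, v_prev, g, beta1=0.9, beta2=0.999):
    """Exponential moving averages of the gradient and of its square."""
    m = beta1 * m_prev + (1.0 - beta1) * g       # first-order momentum
    v = beta2 * v_prev + (1.0 - beta2) * g * g   # second-order momentum
    return m, v

def bias_correct(m, v, k, beta1=0.9, beta2=0.999):
    """Offset the influence of the zero initialization m0 = v0 = 0."""
    return m / (1.0 - beta1 ** k), v / (1.0 - beta2 ** k)

m, v = 0.0, 0.0
for k, g in enumerate([1.0, 1.0, 1.0], start=1):
    m, v = update_momenta(m, v, g)
    m_hat, v_hat = bias_correct(m, v, k)

# With a constant gradient g = 1, the corrected estimates recover g and g^2.
print(m_hat, v_hat)
```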
According to some embodiments of the present invention, after controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp the target object, the method further includes the following step:
the grasping result and the rewarding rewards are determined.
According to some embodiments of the invention, the reinforcement learning model employs a DQN algorithm.
According to an embodiment of the second aspect of the present invention, a mechanical arm collaborative grabbing system includes:
a target image acquisition unit configured to acquire a target image;
the grabbing control unit is used for controlling the mechanical arm to carry out reinforcement learning training according to the target image based on the reinforcement learning model so as to grab the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
The mechanical arm collaborative grabbing system provided by the embodiment of the invention has at least the following beneficial effects:
the momentum information is introduced into the fractional gradient descent algorithm, so that the momentum fractional gradient descent algorithm with the momentum information can be obtained, the momentum fractional gradient descent algorithm is used for optimizing a loss function in the neural network, the stability of model training can be improved, and a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to carry out reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain larger return rewards, and the grasping success rate of the mechanical arm is improved. The mechanical arm collaborative grabbing system provided by the embodiment of the invention can improve the stability of model training, so that the mechanical arm obtains larger return rewards, and the mechanical arm grabbing success rate is improved.
An embodiment of the third aspect of the present invention provides a mechanical arm, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the mechanical arm collaborative grabbing method according to the embodiment of the first aspect when executing the computer program. Since the mechanical arm adopts all the technical schemes of the mechanical arm collaborative grabbing method of the above embodiment, it has at least all the beneficial effects brought by those technical schemes.
According to an embodiment of the fourth aspect of the present invention, a computer readable storage medium is provided, which stores computer executable instructions for performing the mechanical arm collaborative grabbing method according to the first aspect of the present invention. Since the computer readable storage medium adopts all the technical schemes of the mechanical arm collaborative grabbing method of the above embodiment, it has at least all the beneficial effects brought by those technical schemes.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of an implementation of a momentum fractional gradient descent algorithm according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for collaborative gripping of a robotic arm according to an embodiment of the present invention;
FIG. 3 is a graph of the grasping results according to an embodiment of the invention;
FIG. 4 is a graph of rewards in accordance with an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, terms such as first and second are used only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In the description of the present invention, it should be understood that any orientation or positional relationship described, such as up or down, is based on the orientation or positional relationship shown in the drawings. It is used only for convenience in describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation; it should therefore not be construed as limiting the present invention.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, and connection should be construed broadly, and the specific meaning of these terms in the present invention can be determined reasonably by a person skilled in the art in combination with the specific content of the technical solution.
The mechanical arm collaborative grabbing method according to the embodiment of the first aspect of the present invention is described in detail below with reference to fig. 1 to 4. It is apparent that the embodiments described below are some, but not all, of the embodiments of the present invention.
According to the embodiment of the first aspect of the invention, the mechanical arm collaborative grabbing method comprises the following steps:
acquiring a target image;
based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and the momentum information. The momentum fractional gradient descent algorithm can be deduced from the definition of the fractional gradient descent algorithm together with the momentum information, and it can converge to the real extreme point w*, so the loss function of the neural network of the reinforcement learning model can be optimized by the momentum fractional gradient descent algorithm.
The parameters involved in the momentum fractional gradient descent algorithm are the learning rate, the first-order momentum, the second-order momentum, the fractional order, the minimum constant, and the weight decay parameter. The specific values of these parameters are determined from past experience and through repeated experiments. After the parameters are set, the mechanical arm is controlled, based on the reinforcement learning model, to perform reinforcement learning training according to the target image so as to grasp the target object, and the loss function is optimized by the momentum fractional gradient descent algorithm.
The model effect of the reinforcement learning model is verified by training the mechanical arm many times and testing the grasping success rate of the mechanical arm and the magnitude of the return reward obtained by the inverted pendulum in reinforcement learning. As shown in fig. 3 and fig. 4, fig. 3 is a graph of the grasping results and fig. 4 is a graph of the return rewards. In fig. 3, the x-axis is the number of training rounds and the y-axis is the grasping result; the solid line represents the success rate of the mechanical arm grasping the target object under the traditional Adam optimizer, and the dotted line represents the success rate under the FoAdam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the invention. In fig. 4, the x-axis is the number of training rounds and the y-axis is the return reward; the solid line represents the return reward obtained by the inverted pendulum under the traditional Adam optimizer, and the dotted line represents the return reward obtained under the FoAdam optimizer provided by the embodiment of the invention. As can be seen from fig. 3 and fig. 4, compared with Adam, FoAdam not only brings a larger return reward to reinforcement learning but also increases the success rate with which the mechanical arm grasps the target object after the same number of training rounds.
It should be noted that the inverted pendulum, the return reward, etc. in reinforcement learning are common knowledge to those skilled in the art, and their specific principles and meanings are not explained in detail here. In addition, how to determine the grasping result and the return reward is not limited here; the determining means can be selected according to the actual situation. This content is well known to those skilled in the art and is not described further.
According to the mechanical arm collaborative grabbing method provided by the embodiment of the invention, the momentum information is introduced into the fractional gradient descent algorithm, so a momentum fractional gradient descent algorithm carrying momentum information can be obtained. Using the momentum fractional gradient descent algorithm to optimize the loss function in the neural network can improve the stability of model training and yield a better reinforcement learning model. The momentum fractional gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model the mechanical arm is controlled to perform reinforcement learning training according to the target image so as to grasp the target object, so that the mechanical arm in reinforcement learning can obtain a larger return reward and its grasping success rate is improved. The mechanical arm collaborative grabbing method provided by the embodiment of the invention can therefore improve the stability of model training, enable the mechanical arm to obtain larger return rewards, and improve the grasping success rate of the mechanical arm.
In some embodiments of the invention, the momentum information includes a first order momentum and a second order momentum; the momentum fractional order gradient descent algorithm is obtained by the following steps:
and introducing first-order momentum and second-order momentum into a fractional order gradient descent algorithm to obtain a momentum fractional order gradient descent algorithm, wherein the first-order momentum is an average value of gradient directions at all moments, and the second-order momentum is a linear combination of squares of gradients at all the moments in the past.
The constraint formula of the fractional order gradient descent algorithm is obtained by definition of fractional order, and is specifically defined as:
The Gamma function Γ(χ) is defined as Γ(χ) = ∫₀^∞ t^(χ−1)·e^(−t) dt.
the definition of the fractional order gradient is:
where w is the parameter to be optimized, μ is the learning rate, k is the iteration number, and α is the fractional order.
Obtaining a fractional step update expression by combining the formula (1) and the formula (3):
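The fractional step can be sketched numerically. This sketch assumes the form commonly used in the fractional gradient descent literature, ∇^α f(w_k) ≈ g_k·|w_k − w_{k−1}|^(1−α)/Γ(2−α) — an assumption for illustration, not necessarily the patent's exact expression:

```python
import math

def fractional_step(w_k, w_prev, g_k, mu=2e-3, alpha=0.999):
    """One fractional-order gradient descent update (assumed literature form:
    the ordinary gradient g_k is scaled by |w_k - w_{k-1}|^(1-alpha)/Gamma(2-alpha))."""
    frac = abs(w_k - w_prev) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return w_k - mu * g_k * frac

# Minimise f(w) = w^2 for a few steps; each update moves w toward 0.
w_prev, w = 1.0, 0.9
for _ in range(5):
    w_next = fractional_step(w, w_prev, g_k=2.0 * w)  # gradient of w^2
    w_prev, w = w, w_next
print(w)
```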
the first order momentum is the average value of the gradient directions at each moment, namely the descending direction at the moment t, and is determined not only by the gradient direction of the current point, but also by the descending direction accumulated before.
The constraint formula of the first order momentum is:
m_k = β₁·m_{k−1} + (1 − β₁)·g_k (5)
The constraint formula of the second-order momentum is:
v_k = β₂·v_{k−1} + (1 − β₂)·g_k² (6)
where β₁ and β₂ are momentum factors, β₁, β₂ ∈ (0, 1), and g_k is the gradient at time k.
Formula (5) can be simplified as:
formula (6) can be simplified as:
where m₀ and v₀ are the preset initial momenta and are generally set to 0; thus, equation (7) can be simplified as:
equation (8) can be simplified as:
In order to weaken the influence of the 0 initial momentum, a bias correction is introduced, obtaining:
by analogy to equation (4), an expression of the momentum fraction order gradient can be obtained:
In order to prevent the denominator from being 0, a minimum constant ε is introduced; according to formula (9), formula (10) and formula (11), the momentum fractional gradient update expression can be obtained:
If equation (14) converges to the real extreme point w*, then it can be used to optimize the loss function in the neural network; it is demonstrated below that equation (14) does converge to the real extreme point.
This can be demonstrated by contradiction. Assume that w_k in formula (14) converges to a point w while w is not equal to w*, i.e. lim_{k→∞} |w_k − w| = 0. According to the definition of convergence, for any sufficiently small ε there is always a sufficiently large N ∈ ℕ (where ℕ is the set of natural numbers) such that, when k − 1 > N, |w_{k−1} − w| < ε < |w* − w| holds. Therefore, expression (15) also holds.
By combining (14) and (15), the following inequality is obtained:
equation (17) can be simplified as:
d > |w_k − w_{k−1}|^α (18)
By the formulas (16) and (18), the following inequality holds:
|w_{k+1} − w_k| > |w_k − w_{k−1}| (19)
Formula (19) shows that w_k does not converge to w; hence, by contradiction, w_k converges to the point w*.
The weight decay constant δ is added to formula (13), so formula (13) can be simplified to obtain the constraint formula of the momentum fractional gradient descent algorithm:
where w is the parameter to be optimized, μ is the learning rate, k is the iteration number, α is the fractional order, m_k is the first-order momentum, v_k is the second-order momentum, ε is a minimum constant, and δ is the weight decay parameter.
Therefore, the momentum fractional gradient descent algorithm can converge to the real extreme point and can be used to optimize the loss function of the neural network. When w_{k+1} − w_k = 0, the gradient no longer descends, the optimal point has been reached, and the optimization ends.
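Collecting the derivation above, one momentum fractional gradient descent step can be sketched as follows. Assumptions of mine, since the rendered formulas are not reproduced here: the fractional factor |w_k − w_{k−1}|^(1−α)/Γ(2−α) multiplies the bias-corrected Adam direction, ε guards the denominators, and the weight decay δ enters as an additive −μδw term.

```python
import math

def foadam_step(w, w_prev, g, m, v, k,
                mu=2e-3, beta1=0.9, beta2=0.999,
                alpha=0.999, eps=1e-7, delta=5e-3):
    """One assumed FoAdam-style update: Adam direction scaled by a
    fractional-order factor, plus weight decay (a sketch, not the patent's
    exact formula)."""
    m = beta1 * m + (1.0 - beta1) * g            # first-order momentum, eq. (5)
    v = beta2 * v + (1.0 - beta2) * g * g        # second-order momentum, eq. (6)
    m_hat = m / (1.0 - beta1 ** k)               # bias correction
    v_hat = v / (1.0 - beta2 ** k)
    frac = (abs(w - w_prev) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    w_next = w - mu * (m_hat / (math.sqrt(v_hat) + eps)) * frac - mu * delta * w
    return w_next, m, v

# Minimise f(w) = (w - 3)^2; the iterate should approach the extreme point 3.
w_prev, w, m, v = 0.0, 0.1, 0.0, 0.0
for k in range(1, 2001):
    g = 2.0 * (w - 3.0)
    w_next, m, v = foadam_step(w, w_prev, g, m, v, k)
    w_prev, w = w, w_next
print(w)
```

The hyperparameter defaults follow the values stated in the embodiment (μ = 2e-3, β₁ = 0.9, β₂ = 0.999, α = 0.999, ε = 1e-7, δ = 5e-3).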
In some embodiments of the invention, referring to FIG. 1, the learning rate is 2e-3, the first-order momentum factor is 0.9, the second-order momentum factor is 0.999, the fractional order is 0.999, the minimum constant is 1e-7, and the weight decay parameter is 5e-3. The specific values of these parameters can be selected according to the actual situation and are not to be construed as limiting the invention.
In some embodiments of the present invention, referring to fig. 3 and 4, after the mechanical arm is controlled to perform reinforcement learning training according to the target image to grasp the target object, the method further includes the step of determining the grasping result and the return reward. The model effect of the reinforcement learning model optimized with the momentum fractional gradient descent algorithm can be verified by testing the grasping success rate of the mechanical arm and the return reward. As shown in fig. 3 and fig. 4, fig. 3 is a graph of the grasping results and fig. 4 is a graph of the return rewards. In fig. 3, the x-axis is the number of training rounds and the y-axis is the grasping result; the solid line represents the success rate of the mechanical arm grasping the target object under the traditional Adam optimizer, and the dotted line represents the success rate under the FoAdam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the invention. In fig. 4, the x-axis is the number of training rounds and the y-axis is the return reward; the solid line represents the return reward obtained by the inverted pendulum under the traditional Adam optimizer, and the dotted line represents the return reward obtained under the FoAdam optimizer provided by the embodiment of the invention. As can be seen from fig. 3 and fig. 4, compared with Adam, FoAdam not only brings a larger return reward to reinforcement learning but also increases the success rate with which the mechanical arm grasps the target object after the same number of training rounds.
It should be noted that the inverted pendulum, the return reward, etc. in reinforcement learning are common knowledge to those skilled in the art, and their specific principles and meanings are not explained in detail here. In addition, how to determine the grasping result and the return reward is not limited here; the determining means can be selected according to the actual situation. This content is well known to those skilled in the art and is not described further.
In some embodiments of the invention, the reinforcement learning model employs the DQN algorithm. The algorithm adopted by the reinforcement learning model is the value-based DQN (Deep Q-Network) algorithm; the architecture is based on OpenAI's DQN implementation, which originally uses an Adam optimizer to optimize the loss function. In this embodiment, the momentum fractional gradient descent algorithm is used instead to optimize the loss function, which can improve the stability of model training and yield a better reinforcement learning model. It should be noted that the reinforcement learning model may also use other algorithms, which should not be construed as limiting the present invention.
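Since the reinforcement learning model is a value-based DQN, the loss the optimizer (Adam or FoAdam) minimizes is the squared temporal-difference error. A hedged sketch, with a linear Q-function standing in for the deep network (shapes and names are illustrative, not from the patent):

```python
import numpy as np

def dqn_loss_and_grad(W, W_target, states, actions, rewards, next_states, dones,
                      gamma=0.99):
    """TD loss for a linear Q-network Q(s) = s @ W:
    loss = mean (r + gamma * max_a' Q_target(s', a') - Q(s, a))^2."""
    q = states @ W                                     # (batch, n_actions)
    q_sa = q[np.arange(len(actions)), actions]
    q_next = (next_states @ W_target).max(axis=1)
    target = rewards + gamma * (1.0 - dones) * q_next  # no bootstrap at episode end
    td_error = q_sa - target
    loss = np.mean(td_error ** 2)
    # Gradient w.r.t. W (the target network is treated as a constant).
    grad = np.zeros_like(W)
    for i, a in enumerate(actions):
        grad[:, a] += 2.0 * td_error[i] * states[i] / len(actions)
    return loss, grad

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2)); W_target = W.copy()
s = rng.normal(size=(8, 4)); s2 = rng.normal(size=(8, 4))
a = rng.integers(0, 2, size=8); r = rng.normal(size=8); d = np.zeros(8)
loss, grad = dqn_loss_and_grad(W, W_target, s, a, r, s2, d)
print(loss >= 0.0, grad.shape)
```

In the full method, `grad` would feed the momentum fractional gradient descent update instead of an Adam step.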
In addition, the convergence of the DQN network using the momentum fractional gradient descent algorithm FoAdam can be tested in classical reinforcement learning gym environments. gym includes multiple test environments, such as "CartPole-v1" and "MountainCar-v0". In some embodiments of the present invention, the convergence speed of the DQN network is tested in the "CartPole-v1" environment using the traditional Adam and the FoAdam momentum fractional gradient descent algorithm proposed by the embodiment of the invention, respectively. After a number of training rounds, the DQN networks based on Adam and FoAdam both converge, with little difference in convergence speed; however, at around 3000 training rounds, the FoAdam-based mechanical arm grasps the target object with a higher success rate and obtains a larger return reward. It should be noted that the specific number of training rounds can be selected according to the actual situation, and the present invention is not limited in this respect.
The embodiment of the second aspect of the invention also provides a mechanical arm collaborative grabbing system. The mechanical arm collaborative grabbing system according to the embodiment of the second aspect of the invention comprises a target image acquisition unit and a grabbing control unit.
A target image acquisition unit configured to acquire a target image;
the grabbing control unit is used for controlling the mechanical arm to carry out reinforcement learning training according to the target image based on the reinforcement learning model so as to grab the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and the momentum information.
The momentum fractional gradient descent algorithm can be deduced from the definition of the fractional gradient descent algorithm together with the momentum information, and it can converge to the real extreme point w*; therefore, the loss function of the neural network of the reinforcement learning model can be optimized by the momentum fractional gradient descent algorithm. The parameters involved in the momentum fractional gradient descent algorithm are the learning rate, the first-order momentum, the second-order momentum, the fractional order, the minimum constant, and the weight decay parameter. The specific values of these parameters are determined from past experience and through repeated experiments. After the parameters are set, the mechanical arm is controlled, based on the reinforcement learning model, to perform reinforcement learning training according to the target image so as to grasp the target object, and the loss function is optimized by the momentum fractional gradient descent algorithm.
The model effect of the reinforcement learning model is verified by training the mechanical arm many times and testing the grasping success rate of the mechanical arm and the magnitude of the return reward obtained by the inverted pendulum in reinforcement learning. As shown in fig. 3 and fig. 4, fig. 3 is a graph of the grasping results and fig. 4 is a graph of the return rewards. In fig. 3, the x-axis is the number of training rounds and the y-axis is the grasping result; the solid line represents the success rate of the mechanical arm grasping the target object under the traditional Adam optimizer, and the dotted line represents the success rate under the FoAdam (momentum fractional gradient descent algorithm) optimizer provided by the embodiment of the invention. In fig. 4, the x-axis is the number of training rounds and the y-axis is the return reward; the solid line represents the return reward obtained by the inverted pendulum under the traditional Adam optimizer, and the dotted line represents the return reward obtained under the FoAdam optimizer provided by the embodiment of the invention. As can be seen from fig. 3 and fig. 4, compared with Adam, FoAdam not only brings a larger return reward to reinforcement learning but also increases the success rate with which the mechanical arm grasps the target object after the same number of training rounds.
The convergence speed of the neural network is tested with both the traditional Adam optimizer and the momentum fractional order gradient descent algorithm FoAdam provided by the embodiment of the invention. After training for a number of rounds, the neural networks based on Adam and FoAdam both converge, and the difference in convergence speed is small; however, at about 3000 training rounds, the success rate of the FoAdam-based mechanical arm grasping the target object is higher, and the return reward obtained is larger. It should be noted that the specific number of training rounds can be selected according to the actual situation, and the present invention is not limited thereto.
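The comparison above tracks the grasping success rate as training proceeds. A minimal helper for computing such a rate over a trailing window of episodes might look like the following; the function name, the window size, and the 0/1 encoding of grasp outcomes are illustrative conventions, not details taken from the patent:

```python
def windowed_success_rate(results, window=100):
    """Fraction of successful grasps over the trailing `window` episodes.

    `results` is a list of grasp outcomes encoded as 1 (success) or
    0 (failure), ordered by training episode. Returns 0.0 when no
    episodes have been recorded yet.
    """
    recent = results[-window:]
    return sum(recent) / len(recent) if recent else 0.0
```

Plotting this quantity against the episode index for each optimizer produces curves of the kind shown in fig. 3.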
It should be noted that the inverted pendulum, the return reward, and the like in reinforcement learning are common knowledge to those skilled in the art, and their specific principles and meanings are not explained in detail herein. In addition, how the grasping result and the return reward are determined is not limited herein; the determination means can be selected according to the actual situation. This content is known in the art and is not described further herein.
According to the mechanical arm collaborative grabbing system provided by the embodiment of the invention, momentum information is introduced into the fractional order gradient descent algorithm to obtain a momentum fractional order gradient descent algorithm carrying momentum information. Using this algorithm to optimize the loss function in the neural network improves the stability of model training, so that a better reinforcement learning model can be obtained. The momentum fractional order gradient descent algorithm is applied to the reinforcement learning model, and based on the reinforcement learning model, the mechanical arm is controlled to perform reinforcement learning training according to the target image so as to grasp the target object. As a result, the mechanical arm in reinforcement learning obtains a larger return reward, and its grasping success rate is improved.
In addition, an embodiment of the third aspect of the present invention further provides a mechanical arm whose control device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor. The processor and the memory may be connected by a bus 700 or by other means.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the mechanical arm collaborative grabbing method of the above embodiments are stored in the memory and, when executed by the processor, perform the mechanical arm collaborative grabbing method of the above embodiments.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, i.e., they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Further, an embodiment of the fourth aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example by a processor in the above control device, cause the processor to perform the mechanical arm collaborative grabbing method in the above embodiments.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention.
Claims (10)
1. The mechanical arm collaborative grabbing method is characterized by comprising the following steps of:
acquiring a target image;
based on the reinforcement learning model, controlling the mechanical arm to perform reinforcement learning training according to the target image so as to grasp a target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
2. The robotic arm co-grasping method of claim 1, wherein the momentum information comprises a first order momentum and a second order momentum; the momentum fractional order gradient descent algorithm is obtained by the following steps:
and introducing the first-order momentum and the second-order momentum into the fractional order gradient descent algorithm to obtain the momentum fractional order gradient descent algorithm, wherein the first-order momentum is an average value of gradient directions at all moments, and the second-order momentum is a linear combination of squares of gradients at all moments in the past.
3. The mechanical arm collaborative grabbing method according to claim 2, wherein the constraint formula of the momentum fractional order gradient descent algorithm is:
wherein w is the parameter to be optimized, μ is the learning rate, k is the iteration number, α is the fractional order, m_k is the first-order momentum, v_k is the second-order momentum, ε is the minimum constant, and δ is the weight decay parameter.
4. The method according to claim 3, wherein the learning rate is 2e-3, the first-order momentum is 0.9, the second-order momentum is 0.999, the fractional order is 0.999, the minimum constant is 1e-7, and the weight decay parameter is 5e-3.
5. The mechanical arm collaborative grabbing method according to claim 2 or 3, wherein the constraint formula of the first-order momentum is:
m_k = β_1 m_(k-1) + (1 - β_1) g_k
the constraint formula of the second-order momentum is as follows:
v_k = β_2 v_(k-1) + (1 - β_2) g_k^2
wherein β_1 and β_2 are momentum factors, and g_k is the gradient at iteration k.
6. The mechanical arm collaborative grabbing method according to claim 1, further comprising, after the controlling the mechanical arm to perform reinforcement learning training according to the target image to grasp the target object, the following step:
the grasping result and the rewarding rewards are determined.
7. The robotic arm collaborative gripping method according to claim 1, wherein the reinforcement learning model employs a DQN algorithm.
8. A robotic arm co-grasping system, comprising:
a target image acquisition unit configured to acquire a target image;
the grabbing control unit is used for controlling the mechanical arm to carry out reinforcement learning training according to the target image based on the reinforcement learning model so as to grab the target object; and optimizing a loss function by adopting a momentum fractional gradient descent algorithm in the model training process of the reinforcement learning model, wherein the momentum fractional gradient descent algorithm is obtained according to the fractional gradient descent algorithm and momentum information.
9. A mechanical arm comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the mechanical arm collaborative grabbing method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing computer-executable instructions for performing the mechanical arm collaborative grabbing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310106945.0A CN116237935B (en) | 2023-02-03 | 2023-02-03 | Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116237935A true CN116237935A (en) | 2023-06-09 |
CN116237935B CN116237935B (en) | 2023-09-15 |
Family
ID=86629103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310106945.0A Active CN116237935B (en) | 2023-02-03 | 2023-02-03 | Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116237935B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180089589A1 (en) * | 2016-09-27 | 2018-03-29 | Fanuc Corporation | Machine learning device and machine learning method for learning optimal object grasp route |
CN110400345A (en) * | 2019-07-24 | 2019-11-01 | 西南科技大学 | Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting |
CN111738408A (en) * | 2020-05-14 | 2020-10-02 | 平安科技(深圳)有限公司 | Method, device and equipment for optimizing loss function and storage medium |
KR20210012672A (en) * | 2019-07-26 | 2021-02-03 | 한국생산기술연구원 | System and method for automatic control of robot manipulator based on artificial intelligence |
CN114939870A (en) * | 2022-05-30 | 2022-08-26 | 兰州大学 | Model training method and device, strategy optimization method, equipment and medium |
Non-Patent Citations (1)
Title |
---|
郭明霄, et al.: "基于动量分数阶梯度的卷积神经网络优化方法" (Convolutional neural network optimization method based on momentum fractional-order gradient), 计算机工程与应用 (Computer Engineering and Applications), pages 80-86 *
Also Published As
Publication number | Publication date |
---|---|
CN116237935B (en) | 2023-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220363259A1 (en) | Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device | |
CN108051999B (en) | Accelerator beam orbit control method and system based on deep reinforcement learning | |
Kaushik et al. | Fast online adaptation in robotics through meta-learning embeddings of simulated priors | |
WO2018017546A1 (en) | Training machine learning models on multiple machine learning tasks | |
US11709462B2 (en) | Safe and efficient training of a control agent | |
US8521678B2 (en) | Learning control system and learning control method | |
CN112001501B (en) | Parameter updating method, device and equipment of AI distributed training system | |
WO2019222745A1 (en) | Sample-efficient reinforcement learning | |
JP2016158485A (en) | System and method for stopping train within predetermined position range | |
CN112286218B (en) | Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient | |
CN116237935B (en) | Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium | |
JP2019074947A (en) | Learning device, learning method, and learning program | |
JP2005078516A (en) | Device, method and program for parallel learning | |
CN114371729B (en) | Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback | |
CN108073072B (en) | Parameter self-tuning method of SISO (Single input Single output) compact-format model-free controller based on partial derivative information | |
CN114527642B (en) | Method for automatically adjusting PID parameters by AGV based on deep reinforcement learning | |
CN116047888A (en) | Control method of self-balancing vehicle based on BP neural network PID | |
Gao et al. | Cooperative learning control of unknown nonlinear multi-agent systems with time-varying output constraints using neural networks and barrier lyapunov functions via backstepping design | |
CN112162404B (en) | Design method of free-form surface imaging system | |
CN113485099A (en) | Online learning control method of nonlinear discrete time system | |
CN113052312A (en) | Deep reinforcement learning model training method and device, medium and electronic equipment | |
CN117872731A (en) | Aircraft load shedding guidance method, device and storage medium | |
KR20230091821A (en) | System and method for performing reinforcement learning by setting hindsight goal ranking for replay buffer | |
Pu et al. | Context-based soft actor critic for environments with non-stationary dynamics | |
Hofer et al. | Online reinforcement learning for real-time exploration in continuous state and action markov decision processes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||