CN113232019A - Mechanical arm control method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113232019A (Application number CN202110521680.1A)
- Authority
- CN
- China
- Prior art keywords
- determining
- mechanical arm
- path
- pose information
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
Abstract
The application provides a mechanical arm control method and device, an electronic device, and a storage medium. A target object image corresponding to an object to be taken is acquired; type information and pose information of the object to be taken are determined according to the target object image; a grabbing path is determined according to the type information and the pose information, the grabbing path being a moving path from the current position of the mechanical arm to the position of the object to be taken; and each joint of the mechanical arm is controlled to perform angle adjustment according to the grabbing path so as to grab the object to be taken. The vision sensor is combined with the mechanical arm, and the grabbing of objects whose structure is not fixed is completed under visual guidance; the grabbing precision is high and stable, the restriction conditions are few, the adaptability is strong, and the method is relatively flexible.
Description
Technical Field
The present disclosure relates to the field of robot arm grabbing control, and in particular, to a robot arm control method and apparatus, an electronic device, and a storage medium.
Background
As technology continues to advance, industrial robots are moving into factories to replace human work.
At present, a mechanical arm grabs a planar object with a fixed structure according to a set control program so as to improve the working efficiency.
However, when the structure of the planar object changes even slightly, the mechanical arm cannot continue to operate, and the control program must be reset.
Disclosure of Invention
The application provides a mechanical arm control method, a mechanical arm control device, electronic equipment and a storage medium, which are used for solving the problem that a mechanical arm cannot continue to work when the structure of a planar object changes.
In a first aspect, the present application provides a method for controlling a robot arm, the method including:
acquiring a target object image corresponding to an object to be taken;
determining type information and pose information of the object to be taken according to the target object image;
determining a grabbing path according to the type information and the pose information, wherein the grabbing path is a moving path from the current position of the mechanical arm to the position of the object to be taken;
and controlling each joint of the mechanical arm to perform angle adjustment according to the grabbing path so as to grab the object to be taken.
Optionally, determining type information of the object to be taken according to the target object image includes:
obtaining a classification result by using a pre-trained classification model according to the target object image;
and determining the type information of the object to be taken according to the classification result.
Optionally, determining pose information of the object to be taken according to the target object image includes:
determining the characteristics of a target image according to the target object image;
and determining the pose information of the object to be taken by using a pose calculation model according to the characteristics of the target image.
Optionally, determining a grab path according to the type information and the pose information includes:
acquiring current pose information of each joint of the mechanical arm;
and determining a grabbing path by using a decision model according to the current pose information, the type information and the pose information of the object to be taken.
Optionally, determining the grabbing path by using the decision model according to the current pose information and the type information and pose information of the object to be taken includes:
the decision model is Q; the number of iterations is assumed to be Rounds, where Rounds is a positive integer; the batch_size for batch gradient descent is m; and the maximum size of the experience replay pool is n;
taking the current pose information and the type information and pose information of the object to be taken as the state vector φ(S) in the state S, wherein the state S is an initialization state;
executing the current action A in the state S to obtain the next state S′, the feature vector φ(S′) corresponding to the next state S′, the reward R, and the termination flag is_end;
adding the quintuple (φ(S), A, R, φ(S′), is_end) to the experience replay pool; if the size of the experience replay pool is larger than m, sampling a batch from the experience replay pool and updating the network parameters in the decision model; and if the size of the experience replay pool is larger than n, removing the earliest-added quintuple from the experience replay pool before adding the new quintuple;
updating the state S to the state S′;
judging whether is_end indicates the final state; if not, continuing the loop of randomly taking samples from the experience replay pool; if so, ending the loop to obtain the final decision model;
and determining the grabbing path according to the final decision model.
Optionally, the state vector φ(S) further comprises a specific scene, wherein the specific scene comprises a scene in which the structure of the object to be taken is not fixed.
Optionally, according to the grabbing path, each joint of the mechanical arm is controlled to perform angle adjustment so as to grab the object to be taken, including:
obtaining the motion trail of the mechanical arm by using a smooth trail interpolation method according to the grabbing path;
and controlling each joint of the mechanical arm to adjust the angle according to the motion trail so as to grab the object to be picked.
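The patent names a "smooth trajectory interpolation method" without specifying which scheme is used, so the following is only a hedged sketch: it assumes cubic time scaling between joint-angle waypoints, which gives zero velocity at the endpoints of each segment. All waypoint values are made up for illustration.

```python
# Hypothetical sketch of the smooth-trajectory-interpolation step; cubic time
# scaling between joint-angle waypoints is an assumption, not taken from the patent.

def cubic_blend(s):
    """Cubic time scaling 3s^2 - 2s^3: zero velocity at both segment endpoints."""
    return 3 * s**2 - 2 * s**3

def interpolate_joint_path(waypoints, steps_per_segment=10):
    """Expand a sparse list of joint-angle tuples into a denser motion trail."""
    trail = []
    for start, end in zip(waypoints, waypoints[1:]):
        for k in range(steps_per_segment):
            s = cubic_blend(k / steps_per_segment)
            trail.append(tuple(a + s * (b - a) for a, b in zip(start, end)))
    trail.append(tuple(waypoints[-1]))          # include the final waypoint exactly
    return trail

# Three hypothetical waypoints for a 3-joint arm (radians)
path = [(0.0, 0.0, 0.0), (0.3, 0.6, -0.2), (0.5, 1.0, 0.1)]
trail = interpolate_joint_path(path, steps_per_segment=5)
```

Each joint then tracks its own interpolated angle sequence, so all joints start and stop together without velocity jumps at the waypoints.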
Optionally, the method further comprises:
and according to the target object image, if the type of the target object cannot be determined, acquiring the target object image again through the vision sensor.
In a second aspect, the present application provides an arm control apparatus, the apparatus comprising:
the acquisition module is used for acquiring a target object image corresponding to a to-be-taken object;
the processing module is used for determining the type information and the pose information of the object to be taken according to the target object image;
the processing module is further used for determining a grabbing path according to the type information and the pose information, wherein the grabbing path is a moving path from the current position of the mechanical arm to the position of the object to be taken;
and the processing module is also used for controlling each joint of the mechanical arm to carry out angle adjustment according to the grabbing path so as to grab the object to be grabbed.
In a third aspect, the present application provides an electronic device, comprising: a memory, a processor;
a memory for storing processor-executable instructions; and
a processor for implementing the robot arm control method according to the first aspect and the alternative aspects, according to executable instructions stored in a memory.
In a fourth aspect, the present application provides a computer-readable storage medium having computer-executable instructions stored thereon, where the computer-executable instructions are executed by a processor to implement the robot arm control method according to the first aspect and the alternative.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when executed by a processor, implement the robot arm control method of the first aspect and the alternatives.
The application provides a mechanical arm control method and device, an electronic device, and a storage medium. A target object image corresponding to an object to be taken is acquired; type information and pose information of the object to be taken are determined according to the target object image; a grabbing path is determined according to the type information and the pose information, the grabbing path being a moving path from the current position of the mechanical arm to the position of the object to be taken; and each joint of the mechanical arm is controlled to perform angle adjustment according to the grabbing path so as to grab the object to be taken. The vision sensor is combined with the mechanical arm, and the grabbing of objects whose structure is not fixed is completed under visual guidance; the grabbing precision is high and stable, the restriction conditions are few, the adaptability is strong, and the method is relatively flexible.
Drawings
FIG. 1 is a schematic view of a robotic arm control system shown herein according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method for controlling a robotic arm according to an exemplary embodiment of the present application;
FIG. 3 is a flow diagram illustrating a method of controlling a robotic arm according to another exemplary embodiment of the present application;
FIG. 4 is a schematic diagram illustrating the construction of a robot arm control apparatus according to an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As technology continues to advance, industrial robots are moving into factories to replace human work. A robotic arm is a mechanical structure that mimics a human hand, such as a planar multi-joint robot, a palletizer, and the like. The mechanical arm generally has a plurality of joint arms and an execution end arranged on the last joint arm, various execution components are arranged on the execution end, and the execution end is moved to a specified coordinate in space through automatic control to realize functions provided by the execution components, such as writing, grabbing, testing and the like.
At present, a mechanical arm grabs a planar object with a fixed structure according to a set control program so as to improve the working efficiency.
However, when the structure of the planar object changes even slightly, the mechanical arm cannot continue to operate, and the control program must be reset.
In order to solve the above problems, the application provides a mechanical arm control method: a target object image corresponding to an object to be taken is obtained through a vision sensor, and the type information and pose information of the object to be taken in the target object image are obtained using deep learning and computer image processing methods. The pose information is then converted into pose information under a universal coordinate system, the grabbing path of the mechanical arm is automatically calculated using a decision model, each joint of the mechanical arm is controlled to adjust its angle according to the grabbing path, and the mechanical arm is controlled to grab the object to be taken. The vision sensor is combined with the mechanical arm, and the grabbing of objects whose structure is not fixed is completed under visual guidance; the grabbing precision is high and stable, the restriction conditions are few, the adaptability is strong, and the method is relatively flexible.
FIG. 1 is a schematic view of a robotic arm control system according to an exemplary embodiment of the present application. As shown in fig. 1, the robot arm control system provided in this embodiment includes: a vision sensor 110, a master server 120, and a robotic arm 130. The vision sensor 110 is configured to acquire a target object image corresponding to an object to be acquired, and send the target object image to the main control server 120. The main control server 120 receives the target object image sent by the vision sensor 110, determines a motion path from the current position of the mechanical arm 130 to the position of the object to be taken according to the target object image, controls each joint of the mechanical arm 130 to perform angle adjustment according to the motion path, and sends a control signal to the mechanical arm 130. The mechanical arm 130 is configured to receive a control signal sent by the main control server 120, and perform angle adjustment on each joint according to the control signal to grab an object to be picked.
Fig. 2 is a flow chart diagram illustrating a robot arm control method according to an exemplary embodiment of the present application. As shown in fig. 2, the robot arm control method provided in this embodiment is based on the robot arm control system shown in fig. 1, and includes the following steps:
s101, obtaining a target object image corresponding to the object to be acquired.
More specifically, the target object image is an RGB picture in a three-dimensional space. One or more vision sensors shoot the object to be taken to obtain a target object image at a corresponding angle, and the target object image is sent to the master control server. Wherein the vision sensor comprises an RGB video camera or an industrial camera. The vision sensor can shoot and acquire target object images from one or more angles, and the multi-angle target object images can determine the type information and the pose information of the object to be taken from multiple angles, so that the mechanical arm can grab the object more accurately. The master control server receives a target object image corresponding to an object to be taken.
And S102, determining the type information and the pose information of the object to be taken according to the target object image.
More specifically, the master control server determines type information of the object to be taken by using the classification model according to the target object image information sent by the vision sensor, wherein the type information comprises a type number. And inputting the image information of the target object into a classification model, and outputting the type number of the object to be taken by the classification model so as to determine the type information of the object to be taken. The master control server obtains relative coordinate positions of at least four three-dimensional space points according to target object image information sent by the vision sensor, and the pose calculation model is used for determining the poses of the three-dimensional space points under the vision sensor. And converting the pose under the vision sensor into pose information under a universal coordinate system. The pose information comprises spatial position information and direction information of the object to be taken. The spatial position information of the object to be taken is the spatial coordinates of a preset number of points on the surface of the object to be taken, and the preset number of points comprises at least four three-dimensional spatial points.
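Step S102 mentions converting the pose under the vision sensor into pose information under a universal coordinate system. A minimal sketch of that conversion, assuming the camera-to-base extrinsic transform is already known from calibration, applies a 4×4 homogeneous transform to each three-dimensional space point; the extrinsic matrix below is a made-up calibration result, not taken from the patent.

```python
# Illustrative camera-frame to base-frame conversion; the extrinsic matrix is
# a hypothetical calibration result used only for demonstration.

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3-D point p, returning (x, y, z)."""
    x, y, z = p
    v = (x, y, z, 1.0)
    return tuple(sum(T[i][k] * v[k] for k in range(4)) for i in range(3))

# Example extrinsics: camera rotated 180 degrees about x, offset 0.5 m along base z
T_base_from_cam = [
    [1.0,  0.0,  0.0, 0.0],
    [0.0, -1.0,  0.0, 0.0],
    [0.0,  0.0, -1.0, 0.5],
    [0.0,  0.0,  0.0, 1.0],
]

p_cam = (0.1, 0.2, 0.3)                     # a surface point in the camera frame
p_base = transform_point(T_base_from_cam, p_cam)
```

In practice each of the at least four surface points would be transformed this way, giving the spatial position information of the object to be taken in the universal coordinate system.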
S103, determining a grabbing path according to the type information and the pose information, wherein the grabbing path is a moving path from the current position of the mechanical arm to the position of the object to be fetched.
More specifically, the grabbing path is a moving path from the current position of the mechanical arm to the position of the object to be taken, and the moving path also comprises moving paths of all joints of the mechanical arm. And determining the motion path of each joint of the mechanical arm at each moment by adopting a reinforcement learning algorithm according to the type information and the pose information until the mechanical arm successfully grabs the object to be taken.
For a multi-joint mechanical arm, many joint-angle configurations allow the arm to grab the object to be taken, so the grabbing path has countless solutions. The traditional approach generally performs path planning with sampling-based planning, a method that does not seek an optimal solution but quickly finds a feasible one; finding the optimal solution takes more time because there are countless paths. A feasible solution, however, may not be the optimal one, meaning that when the machine grabs the object along the calculated path, a non-optimal grabbing path is not the shortest, i.e., a single joint of the mechanical arm, or several joints, must rotate through unnecessary angles. Meanwhile, because the method is sampling-based, when interpolation sampling has errors the sampled plan is sometimes an invalid path for the actual mechanical arm, i.e., a path that cannot be executed. Therefore, this embodiment adopts a reinforcement learning algorithm to complete end-to-end grabbing-path planning for the real-time scene, determining each decision action of each joint of the mechanical arm so as to obtain an optimal complete path.
And S104, controlling each joint of the mechanical arm to adjust the angle according to the grabbing path so as to grab the object to be grabbed.
More specifically, assuming the time taken by the mechanical arm from the start of grabbing to its completion is t, the grabbing path includes the movement path of the mechanical arm at times 1, 2, …, t. At times 1, 2, …, t−1, each joint of the mechanical arm is controlled to adjust its angle according to the grabbing path at the corresponding time; at time t, while the joint angles are adjusted, the execution end of the mechanical arm is controlled to grab the object to be taken.
In the method provided by this embodiment, a target object image corresponding to an object to be taken is acquired; type information and pose information of the object to be taken are determined according to the target object image; a grabbing path is determined according to the type information and the pose information, the grabbing path being a moving path from the current position of the mechanical arm to the position of the object to be taken; and each joint of the mechanical arm is controlled to perform angle adjustment according to the grabbing path so as to grab the object to be taken. The vision sensor is combined with the mechanical arm, and the grabbing of objects whose structure is not fixed is completed under visual guidance; the grabbing precision is high and stable, the restriction conditions are few, the adaptability is strong, and the method is relatively flexible.
Fig. 3 is a flow chart diagram illustrating a robot arm control method according to another exemplary embodiment of the present application. As shown in fig. 3, the method for controlling a robot arm provided in this embodiment includes the following steps:
s201, obtaining a target object image corresponding to the object to be taken.
Step S201 is similar to the step S101 in the embodiment of fig. 2, and this embodiment is not described herein again.
S202, obtaining a classification result by using a pre-trained classification model according to the target object image; and determining the type information of the object to be taken according to the classification result.
More specifically, the pre-trained classification model may be a Yolo model, a Convolutional Neural Network (CNN) model, a Mask R-CNN (Region-based CNN) model, a Fast/Faster R-CNN model, or the like.
Taking the CNN model as an example, high-dimensional features of the target object image are extracted through convolution; different types of objects to be taken have different features, i.e., the features can be extracted from the RGB images captured by the vision sensor using the convolution operations of the neural network. The fully connected layer of the model then classifies these features, and each judgment is rewarded or penalized against prior knowledge (from manual labeling performed in advance), so that the convolutional neural network learns under supervision and is continuously optimized, i.e., adjusted to an optimal parameter state, yielding the pre-trained classification model. The target object image is input into the pre-trained classification model, which outputs a classification result, and the type information of the object to be taken is determined according to the classification result.
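The CNN itself is not reproduced here; the following hedged sketch shows only the final step described above, turning the classifier's output scores into the type number of the object to be taken. The score vector and type table are hypothetical placeholders standing in for the network's output and the factory's part catalogue.

```python
# Hypothetical post-processing of classifier scores; the scores and type table
# are placeholders, not values from the patent.
from math import exp

def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable form)."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def type_from_scores(scores, type_table):
    """Pick the highest-probability class and look up its type information."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return type_table[best], probs[best]

TYPE_TABLE = {0: "type-01 bolt", 1: "type-02 bracket", 2: "type-03 gear"}
type_info, confidence = type_from_scores([0.2, 2.9, 0.4], TYPE_TABLE)
```

A low maximum probability here would correspond to the optional branch in the method where the type cannot be determined and the image is reacquired.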
S203, determining the characteristics of the target image according to the target object image; and determining the pose information of the object to be taken by using a pose calculation model according to the characteristics of the target image.
More specifically, the pose calculation model may be a Perspective-N-Point (PNP) model. The fully connected layers of the classification model in step S202 are removed, and the remaining convolutional layers are used as a feature extraction model. The target object image is input into the feature extraction model, which outputs the target image features. These features are compared with the features extracted from the previously hand-labeled RGB images so that the two sets can be matched one by one. The matched target image features comprise the relative coordinate positions of at least four feature points in three-dimensional space. The matched target image features are input into the PNP model to determine the poses of the feature points in three-dimensional space under the vision sensor, and the pose under the vision sensor is then converted into pose information under the universal coordinate system. The pose information comprises the spatial position information and direction information of the object to be taken; the spatial position information is the spatial coordinates of a preset number of points on the surface of the object to be taken, the preset number comprising at least four three-dimensional space points.
In this embodiment, the steps S202 and S203 are not limited by the described operation sequence, and the steps S202 and S203 may be performed in other sequences or simultaneously.
And S204, acquiring current pose information of each joint of the mechanical arm.
More specifically, the current pose information includes spatial position information and orientation information where the mechanical arm is currently located.
And S205, determining a grabbing path by using a decision model according to the current pose information, the type information and the pose information of the object to be taken.
More specifically, the decision model includes a Deep Reinforcement Learning (DRL) algorithm. DRL-related algorithms include the Deep Q-Network (DQN) algorithm and the Q-Learning (QL) algorithm. The DQN algorithm is one of the deep reinforcement learning algorithms; it combines deep learning with reinforcement learning to realize end-to-end learning from perception to action. The QL algorithm is a classical reinforcement learning algorithm, but because it requires a huge Q table, it occupies enormous memory in a high-dimensional space and does not converge easily. This embodiment therefore uses the DQN algorithm. Unlike most previous collision detection algorithms, the DQN algorithm is model-free and does not need a model of each scene. The grabbing path is determined with the DQN algorithm according to the current pose information and the type information and pose information of the object to be taken, so as to realize end-to-end control of the multi-joint mechanical arm.
The action value function of the DQN algorithm is approximated by a neural network, which is a nonlinear approximation; a network structure of three convolutional layers and two fully connected layers is adopted. The decision model is formulated as Q(φ(S), A, θ). Updating the network is in fact updating the parameter θ; once θ is determined, the network parameters are determined.
The main characteristic of the DQN algorithm is the introduction of experience replay: the quintuple (φ(S), A, R, φ(S′), is_end) is added to an experience replay pool, which is later used to update the network parameter θ in the decision model Q(φ(S), A, θ). Here φ(S) and φ(S′) are both tensors, the action A and reward R are scalars, and is_end is a Boolean value.
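The experience replay pool described above can be sketched directly: quintuples are stored up to a maximum size n, after which the earliest-added quintuple is removed before a new one is added, and mini-batches of size m are drawn at random. A `collections.deque` with `maxlen` gives exactly this eviction behavior; the tiny example states below are placeholders.

```python
# Sketch of the experience replay pool; the stored states are placeholder tuples.
from collections import deque
import random

class ReplayPool:
    def __init__(self, max_size):
        # deque with maxlen automatically evicts the oldest entry when full
        self.pool = deque(maxlen=max_size)

    def add(self, phi_s, a, r, phi_s_next, is_end):
        """Store one quintuple (phi(S), A, R, phi(S'), is_end)."""
        self.pool.append((phi_s, a, r, phi_s_next, is_end))

    def sample(self, m):
        """Random mini-batch of m quintuples (requires len(pool) >= m)."""
        return random.sample(list(self.pool), m)

    def __len__(self):
        return len(self.pool)

pool = ReplayPool(max_size=3)
for step in range(5):                 # add 5 quintuples to a pool of capacity 3
    pool.add((step,), 0, 1.0, (step + 1,), False)
```

After the loop the pool holds only the three most recent quintuples, matching the "remove the earliest-added quintuple" rule in the method.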
Optionally, determining a grabbing path by using the decision model according to the current pose information, the type information of the object to be taken, and the pose information, including:
the decision model is Q, and the number of iterations is assumed to be Rounds, where Rounds is a positive integer, batch _ size when the batch gradient is decreasing is m, and the empirical playback pool maximum size n.
Taking the current pose information, the type information of the object to be taken and the pose information as state vectors in the state SWherein, the state S is an initialization state,tensors are formed by the current pose information of each joint of the mechanical arm, the type information of the object to be taken and the pose information.
The state vector φ(S) is input into the decision model Q to obtain the current action A. Executing the current action A in the state S yields the next state S', the feature vector φ(S') corresponding to the next state S', the reward R, and the termination flag is_end.
If the size of the experience replay pool is larger than m, batch samples are drawn from the experience replay pool and the network parameters in the decision model are updated, which specifically includes:
Step 1: randomly take m samples (φ(S_j), A_j, R_j, φ(S'_j), is_end_j), j = 1, 2, 3, …, m, from the experience replay pool, and calculate the target value y_j:

y_j = R_j, if is_end_j is true;
y_j = R_j + γ · max_{A'} Q(φ(S'_j), A', θ), otherwise,

where y_j denotes the target value of the jth sample, R_j denotes the reward of the jth sample, is_end_j indicates whether the jth sample terminated, γ denotes the attenuation coefficient, Q(φ(S'_j), A', θ) denotes the decision model evaluated on the jth sample, φ(S'_j) denotes the feature vector of the jth sample, A' denotes a candidate action, and θ denotes the network parameter.
Step 2: update the network parameter θ in the decision model Q using the mean square error loss function L(θ) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(S_j), A_j, θ))².
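Steps 1 and 2 can be illustrated with a small numerical sketch. The toy `q_value` function stands in for the trained network, and all sample values, γ, and θ below are assumptions for demonstration only:

```python
# Hedged sketch of the DQN target value y_j and the mean square error loss
# over a mini-batch of quintuples (phi_s, action, reward, phi_s_next, is_end).
GAMMA = 0.9  # attenuation (discount) coefficient gamma, assumed value

def q_value(phi_s, action, theta):
    # toy stand-in for Q(phi(S), A; theta); the real model is the
    # convolution + fully-connected network described above
    return theta * (sum(phi_s) + action)

def target_value(reward, is_end, phi_s_next, actions, theta):
    # y_j = R_j for terminal samples, otherwise
    # y_j = R_j + gamma * max_{A'} Q(phi(S'_j), A'; theta)
    if is_end:
        return reward
    return reward + GAMMA * max(q_value(phi_s_next, a, theta) for a in actions)

def mse_loss(batch, actions, theta):
    # L(theta) = (1/m) * sum_j (y_j - Q(phi(S_j), A_j; theta))^2
    total = 0.0
    for phi_s, a, r, phi_s_next, is_end in batch:
        y_j = target_value(r, is_end, phi_s_next, actions, theta)
        total += (y_j - q_value(phi_s, a, theta)) ** 2
    return total / len(batch)

batch = [([1.0], 0, 1.0, [2.0], False),
         ([2.0], 1, 0.5, [0.0], True)]   # second sample is terminal
loss = mse_loss(batch, actions=[0, 1], theta=0.5)
```

In a real implementation θ would then be adjusted by gradient descent on this loss; here the sketch only shows how the terminal and non-terminal branches of y_j feed into L(θ).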
If the size of the experience replay pool is larger than n, the earliest-added quintuple is removed from the experience replay pool and the new quintuple is added.
The state S is updated to the state S'.
Whether is_end is a final state is then judged; if not, samples continue to be randomly drawn from the experience replay pool in a loop, and if so, the loop ends and the final decision model is obtained.
The grabbing path is determined according to the final decision model. The optimal grabbing path obtained through the above steps avoids, to a certain extent, unnecessary rotation of a single joint or of multiple joints of the mechanical arm, thereby reducing wear on the joints of the mechanical arm.
Optionally, the state vector φ(S) further includes a specific scene, where the specific scene includes a scene in which the structure of the object to be taken is not fixed.
More specifically, the specific scene may be a scene in which the structure and size of the object are changing. The specific scene, the current pose information, and the type information and pose information of the object to be taken are used as the state vector φ(S) in the state S, where the state S is an initialization state.
S206, obtaining the motion trail of the mechanical arm by using a smooth trajectory interpolation method according to the grabbing path; and controlling each joint of the mechanical arm to adjust its angle according to the motion trail so as to grab the object to be taken.
More specifically, the smooth trajectory interpolation method includes a polynomial curve method, which makes the motion of the mechanical arm more continuous and smooth and reduces noise.
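A polynomial-curve interpolation of this kind can be sketched for a single joint; the cubic with zero start and end velocity, the waypoint angles, and the duration below are illustrative assumptions, not parameters specified in this application:

```python
# Sketch of smooth trajectory interpolation for one joint using a cubic
# polynomial with zero boundary velocities, so the joint accelerates and
# decelerates smoothly between two waypoints of the grabbing path.
def cubic_interpolate(q_start, q_end, duration, t):
    """Joint angle at time t for a cubic trajectory q(t) with
    q(0) = q_start, q(T) = q_end, q'(0) = q'(T) = 0."""
    s = t / duration                # normalized time in [0, 1]
    blend = 3 * s**2 - 2 * s**3     # smoothstep polynomial
    return q_start + (q_end - q_start) * blend

# sample a 0° -> 90° joint motion over an assumed 2-second segment
angles = [cubic_interpolate(0.0, 90.0, 2.0, t * 0.5) for t in range(5)]
```

The blend term guarantees zero angular velocity at both endpoints, which is what makes the concatenated segments of the motion trail continuous and smooth.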
Optionally, according to the target object image, if the type of the target object cannot be determined, the target object image is obtained again through the vision sensor.
More specifically, when the pre-trained classification model cannot identify a classification result from the target object image captured by the vision sensor, the main control server sends an instruction to the vision sensor to re-acquire the target object image; after receiving the instruction, the vision sensor re-acquires the target image and sends it to the main control server.
In the method provided by this embodiment, the path is planned in real time based on a deep reinforcement learning algorithm, and end-to-end real-time path planning can also be performed for a specific scene. Each decision action of the mechanical arm in the specific scene is obtained by training the decision model, from which the optimal complete path is obtained. In practical application, the target object image acquired by the visual sensor is input into the trained decision model to obtain the path information for the mechanical arm's movement. Robustness is ensured while dependence on the scene is reduced.
Fig. 4 is a schematic structural diagram of a robot arm control device according to an exemplary embodiment of the present application. As shown in fig. 4, the present application provides a robot arm control apparatus 40, the apparatus 40 including:
the obtaining module 41 is configured to obtain a target object image corresponding to the object to be taken.
And the processing module 42 is configured to determine type information and pose information of the object to be taken according to the target object image.
And the processing module 42 is further configured to determine a grabbing path according to the type information and the pose information, where the grabbing path is a moving path from the current position of the robot arm to the position of the object to be fetched.
And the processing module 42 is further configured to control each joint of the mechanical arm to perform angle adjustment according to the grabbing path, so as to grab the object to be grabbed.
Specifically, for this embodiment, reference may be made to the above method embodiments; the principles and technical effects are similar and are not repeated here.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 5, the electronic apparatus 50 of the present embodiment includes: a processor 51 and a memory 52, wherein the memory 52 is used for storing processor-executable instructions.
The processor 51 is configured to implement the robot arm control method in the above embodiments according to executable instructions stored in the memory. Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 52 may be separate or integrated with the processor 51.
When the memory 52 is provided separately, the electronic device 50 further includes a bus 53 for connecting the memory 52 and the processor 51.
The present application also provides a computer readable storage medium, in which computer instructions are stored, and the computer instructions are executed by a processor to implement the methods provided by the above-mentioned various embodiments.
The computer-readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, a computer readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the computer readable storage medium. Of course, the computer readable storage medium may also be integral to the processor. The processor and the computer-readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the computer-readable storage medium may also reside as discrete components in a communication device.
The computer-readable storage medium may be implemented by any type of volatile or nonvolatile Memory device or combination thereof, such as Static Random-Access Memory (SRAM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The present application also provides a computer program product comprising execution instructions stored in a computer readable storage medium. The at least one processor of the device may read the execution instructions from the computer-readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (12)
1. A method of controlling a robot arm, the method comprising:
acquiring a target object image corresponding to an object to be acquired;
determining the type information and the pose information of the object to be acquired according to the target object image;
determining a grabbing path according to the type information and the pose information, wherein the grabbing path is a moving path from the current position of the mechanical arm to the position of the object to be fetched;
and controlling each joint of the mechanical arm to carry out angle adjustment according to the grabbing path so as to grab the object to be fetched.
2. The method according to claim 1, wherein the determining the type information of the object to be taken according to the target object image comprises:
obtaining a classification result by using a pre-trained classification model according to the target object image;
and determining the type information of the object to be taken according to the classification result.
3. The method according to claim 1, wherein the determining pose information of the object to be fetched according to the target object image comprises:
determining the characteristics of a target image according to the target object image;
and determining the pose information of the object to be acquired by using a pose calculation model according to the characteristics of the target image.
4. The method of claim 1, wherein determining a grab path from the type information and pose information comprises:
acquiring current pose information of each joint of the mechanical arm;
and determining the grabbing path by using a decision model according to the current pose information, the type information and the pose information of the object to be taken.
5. The method of claim 4, wherein determining the grasp path using a decision model based on the current pose information, the type information of the object to be taken, and the pose information comprises:
the decision model is Q; assuming the number of iterations is Rounds, wherein Rounds is a positive integer, the batch size batch_size for mini-batch gradient descent is m, and the maximum size of the experience replay pool is n;
taking the current pose information and the type information and pose information of the object to be taken as a state vector φ(S) in a state S, wherein the state S is an initialization state;
inputting the state vector φ(S) into the decision model Q to obtain a current action A;
executing the current action A in the state S to obtain a next state S', a feature vector φ(S') corresponding to the next state S', a reward R, and a termination flag is_end;
adding the quintuple (φ(S), A, R, φ(S'), is_end) to an experience replay pool; if the size of the experience replay pool is larger than m, sampling in batches from the experience replay pool and updating network parameters in the decision model; and if the size of the experience replay pool is larger than n, removing the earliest-added quintuple from the experience replay pool and adding a new quintuple;
updating the state S to a state S';
judging whether is_end is a final state; if not, continuing to loop and randomly draw samples from the experience replay pool; and if so, ending the loop to obtain a final decision model;
and determining the grabbing path according to the final decision model.
7. The method according to claim 1, wherein the controlling each joint of the mechanical arm to perform angle adjustment according to the grabbing path to grab the object to be taken comprises:
obtaining the motion track of the mechanical arm by using a smooth track interpolation method according to the grabbing path;
and controlling each joint of the mechanical arm to adjust the angle according to the motion track so as to grab the object to be fetched.
8. The method of any one of claims 1-7, further comprising:
and according to the target object image, if the type of the target object cannot be determined, acquiring the target object image again through the visual sensor.
9. An apparatus for controlling a robot arm, comprising:
the acquisition module is used for acquiring a target object image corresponding to a to-be-taken object;
the processing module is used for determining the type information and the pose information of the object to be acquired according to the target object image;
the processing module is further used for determining a grabbing path according to the type information and the pose information, wherein the grabbing path is a moving path from the current position of the mechanical arm to the position of the object to be fetched;
and the processing module is also used for controlling each joint of the mechanical arm to carry out angle adjustment according to the grabbing path so as to grab the object to be fetched.
10. An electronic device, comprising: a memory, a processor;
a memory; a memory for storing the processor-executable instructions;
a processor for implementing the robot arm control method of any one of claims 1 to 8 in accordance with executable instructions stored in the memory.
11. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the robot arm control method of any one of claims 1 to 8.
12. A computer program product comprising instructions which, when executed by a processor, carry out the robot arm control method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110521680.1A CN113232019A (en) | 2021-05-13 | 2021-05-13 | Mechanical arm control method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110521680.1A CN113232019A (en) | 2021-05-13 | 2021-05-13 | Mechanical arm control method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113232019A true CN113232019A (en) | 2021-08-10 |
Family
ID=77133957
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110521680.1A Pending CN113232019A (en) | 2021-05-13 | 2021-05-13 | Mechanical arm control method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113232019A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113753562A (en) * | 2021-08-24 | 2021-12-07 | 深圳市长荣科机电设备有限公司 | Carrying method, system and device based on linear motor and storage medium |
CN113942009A (en) * | 2021-09-13 | 2022-01-18 | 苏州大学 | Robot bionic hand grabbing method and system |
CN114523470A (en) * | 2021-12-30 | 2022-05-24 | 浙江图盛输变电工程有限公司 | Robot operation path planning method based on bearing platform linkage |
CN114683251A (en) * | 2022-03-31 | 2022-07-01 | 上海节卡机器人科技有限公司 | Robot grabbing method and device, electronic equipment and readable storage medium |
CN115648232A (en) * | 2022-12-30 | 2023-01-31 | 广东隆崎机器人有限公司 | Mechanical arm control method and device, electronic equipment and readable storage medium |
CN115847488A (en) * | 2023-02-07 | 2023-03-28 | 成都秦川物联网科技股份有限公司 | Industrial Internet of things system for cooperative robot monitoring and control method |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107883929A (en) * | 2017-09-22 | 2018-04-06 | 中冶赛迪技术研究中心有限公司 | Monocular vision positioner and method based on multi-joint mechanical arm |
CN109176521A (en) * | 2018-09-19 | 2019-01-11 | 北京因时机器人科技有限公司 | A kind of mechanical arm and its crawl control method and system |
CN109333536A (en) * | 2018-10-26 | 2019-02-15 | 北京因时机器人科技有限公司 | A kind of robot and its grasping body method and apparatus |
CN109483554A (en) * | 2019-01-22 | 2019-03-19 | 清华大学 | Robotic Dynamic grasping means and system based on global and local vision semanteme |
CN109521774A (en) * | 2018-12-27 | 2019-03-26 | 南京芊玥机器人科技有限公司 | A kind of spray robot track optimizing method based on intensified learning |
CN109531584A (en) * | 2019-01-31 | 2019-03-29 | 北京无线电测量研究所 | A kind of Mechanical arm control method and device based on deep learning |
CN110315258A (en) * | 2019-07-24 | 2019-10-11 | 广东工业大学 | A kind of welding method based on intensified learning and ant group algorithm |
CN110977967A (en) * | 2019-11-29 | 2020-04-10 | 天津博诺智创机器人技术有限公司 | Robot path planning method based on deep reinforcement learning |
KR20200059111A (en) * | 2018-11-20 | 2020-05-28 | 한양대학교 산학협력단 | Grasping robot, grasping method and learning method for grasp based on neural network |
CN111251294A (en) * | 2020-01-14 | 2020-06-09 | 北京航空航天大学 | Robot grabbing method based on visual pose perception and deep reinforcement learning |
CN111275063A (en) * | 2018-12-04 | 2020-06-12 | 广州中国科学院先进技术研究所 | Robot intelligent grabbing control method and system based on 3D vision |
CN111383263A (en) * | 2018-12-28 | 2020-07-07 | 阿里巴巴集团控股有限公司 | System, method and device for grabbing object by robot |
CN111496770A (en) * | 2020-04-09 | 2020-08-07 | 上海电机学院 | Intelligent carrying mechanical arm system based on 3D vision and deep learning and use method |
CN111618847A (en) * | 2020-04-22 | 2020-09-04 | 南通大学 | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements |
CN111923039A (en) * | 2020-07-14 | 2020-11-13 | 西北工业大学 | Redundant mechanical arm path planning method based on reinforcement learning |
CN112605983A (en) * | 2020-12-01 | 2021-04-06 | 浙江工业大学 | Mechanical arm pushing and grabbing system suitable for intensive environment |
2021-05-13: CN CN202110521680.1A patent/CN113232019A/en active Pending
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107883929A (en) * | 2017-09-22 | 2018-04-06 | 中冶赛迪技术研究中心有限公司 | Monocular vision positioner and method based on multi-joint mechanical arm |
CN109176521A (en) * | 2018-09-19 | 2019-01-11 | 北京因时机器人科技有限公司 | A kind of mechanical arm and its crawl control method and system |
CN109333536A (en) * | 2018-10-26 | 2019-02-15 | 北京因时机器人科技有限公司 | A kind of robot and its grasping body method and apparatus |
KR20200059111A (en) * | 2018-11-20 | 2020-05-28 | 한양대학교 산학협력단 | Grasping robot, grasping method and learning method for grasp based on neural network |
CN111275063A (en) * | 2018-12-04 | 2020-06-12 | 广州中国科学院先进技术研究所 | Robot intelligent grabbing control method and system based on 3D vision |
CN109521774A (en) * | 2018-12-27 | 2019-03-26 | 南京芊玥机器人科技有限公司 | A kind of spray robot track optimizing method based on intensified learning |
CN111383263A (en) * | 2018-12-28 | 2020-07-07 | 阿里巴巴集团控股有限公司 | System, method and device for grabbing object by robot |
CN109483554A (en) * | 2019-01-22 | 2019-03-19 | 清华大学 | Robotic Dynamic grasping means and system based on global and local vision semanteme |
CN109531584A (en) * | 2019-01-31 | 2019-03-29 | 北京无线电测量研究所 | A kind of Mechanical arm control method and device based on deep learning |
CN110315258A (en) * | 2019-07-24 | 2019-10-11 | 广东工业大学 | A kind of welding method based on intensified learning and ant group algorithm |
CN110977967A (en) * | 2019-11-29 | 2020-04-10 | 天津博诺智创机器人技术有限公司 | Robot path planning method based on deep reinforcement learning |
CN111251294A (en) * | 2020-01-14 | 2020-06-09 | 北京航空航天大学 | Robot grabbing method based on visual pose perception and deep reinforcement learning |
CN111496770A (en) * | 2020-04-09 | 2020-08-07 | 上海电机学院 | Intelligent carrying mechanical arm system based on 3D vision and deep learning and use method |
CN111618847A (en) * | 2020-04-22 | 2020-09-04 | 南通大学 | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements |
CN111923039A (en) * | 2020-07-14 | 2020-11-13 | 西北工业大学 | Redundant mechanical arm path planning method based on reinforcement learning |
CN112605983A (en) * | 2020-12-01 | 2021-04-06 | 浙江工业大学 | Mechanical arm pushing and grabbing system suitable for intensive environment |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113753562A (en) * | 2021-08-24 | 2021-12-07 | 深圳市长荣科机电设备有限公司 | Carrying method, system and device based on linear motor and storage medium |
CN113753562B (en) * | 2021-08-24 | 2023-07-25 | 深圳市长荣科机电设备有限公司 | Linear motor-based carrying method, system, device and storage medium |
CN113942009A (en) * | 2021-09-13 | 2022-01-18 | 苏州大学 | Robot bionic hand grabbing method and system |
CN114523470A (en) * | 2021-12-30 | 2022-05-24 | 浙江图盛输变电工程有限公司 | Robot operation path planning method based on bearing platform linkage |
CN114523470B (en) * | 2021-12-30 | 2024-05-17 | 浙江图盛输变电工程有限公司 | Robot operation path planning method based on bearing platform linkage |
CN114683251A (en) * | 2022-03-31 | 2022-07-01 | 上海节卡机器人科技有限公司 | Robot grabbing method and device, electronic equipment and readable storage medium |
CN115648232A (en) * | 2022-12-30 | 2023-01-31 | 广东隆崎机器人有限公司 | Mechanical arm control method and device, electronic equipment and readable storage medium |
CN115847488A (en) * | 2023-02-07 | 2023-03-28 | 成都秦川物联网科技股份有限公司 | Industrial Internet of things system for cooperative robot monitoring and control method |
CN115847488B (en) * | 2023-02-07 | 2023-05-02 | 成都秦川物联网科技股份有限公司 | Industrial Internet of things system for collaborative robot monitoring and control method |
US11919166B2 (en) | 2023-02-07 | 2024-03-05 | Chengdu Qinchuan Iot Technology Co., Ltd. | Industrial internet of things for monitoring collaborative robots and control methods, storage media thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113232019A (en) | Mechanical arm control method and device, electronic equipment and storage medium | |
Dasari et al. | Robonet: Large-scale multi-robot learning | |
TWI776113B (en) | Object pose estimation method, device and computer readable storage medium thereof | |
CN110076772B (en) | Grabbing method and device for mechanical arm | |
Sadeghi et al. | Sim2real viewpoint invariant visual servoing by recurrent control | |
Finn et al. | Guided cost learning: Deep inverse optimal control via policy optimization | |
CN111203878B (en) | Robot sequence task learning method based on visual simulation | |
WO2022100363A1 (en) | Robot control method, apparatus and device, and storage medium and program product | |
US20180290298A1 (en) | Apparatus and methods for training path navigation by robots | |
CN113076615B (en) | High-robustness mechanical arm operation method and system based on antagonistic deep reinforcement learning | |
Bohez et al. | Sensor fusion for robot control through deep reinforcement learning | |
Chen et al. | Combining reinforcement learning and rule-based method to manipulate objects in clutter | |
CN112164112B (en) | Method and device for acquiring pose information of mechanical arm | |
CN114387513A (en) | Robot grabbing method and device, electronic equipment and storage medium | |
CN114564009A (en) | Surgical robot path planning method and system | |
CN114789454B (en) | Robot digital twin track completion method based on LSTM and inverse kinematics | |
CN115781685A (en) | High-precision mechanical arm control method and system based on reinforcement learning | |
Luo et al. | Balance between efficient and effective learning: Dense2sparse reward shaping for robot manipulation with environment uncertainty | |
Hu et al. | Grasping living objects with adversarial behaviors using inverse reinforcement learning | |
Liu et al. | Sim-and-real reinforcement learning for manipulation: A consensus-based approach | |
Xu et al. | Deep reinforcement learning for parameter tuning of robot visual servoing | |
CN111015676B (en) | Grabbing learning control method, system, robot and medium based on hand-free eye calibration | |
Hwang et al. | Image base visual servoing base on reinforcement learning for robot arms | |
Xu et al. | A fast and straightforward hand-eye calibration method using stereo camera | |
Zhao et al. | A robot demonstration method based on LWR and Q-learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210810 |