Disclosure of Invention
An object of embodiments of the present invention is to provide a robot grabbing method, a terminal, and a computer-readable storage medium, which can ensure the stability of a predicted grabbing position and improve the probability of a successful grab.
In order to solve the above technical problem, an embodiment of the present invention provides a robot grabbing method, including: acquiring first grabbing position information of an object to be grabbed in a first image and second grabbing position information of the object to be grabbed in a second image, wherein the second image is acquired within a preset radius range centered on the acquisition position of the first image; judging whether the first grabbing position and the second grabbing position are at the same position according to the first grabbing position information and the second grabbing position information; and if the first grabbing position and the second grabbing position are at the same position, performing a grabbing operation.
An embodiment of the present invention further provides a terminal, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the robot grabbing method.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the robot grabbing method.
Compared with the prior art, embodiments of the present invention obtain the first grabbing position from the first image and the second grabbing position from the second image and judge whether the two grabbing positions are located at the same position. If they are at the same position, the determined first and second grabbing positions are accurate, which in turn guarantees that the grabbing operation performed according to the first grabbing position grabs the object to be grabbed accurately, improving the grabbing success rate. Meanwhile, the second image is acquired after the first image, and its acquisition position lies within a preset radius range centered on the acquisition position of the first image; that is, the first image and the second image are acquired from different viewing angles, so the predicted grabbing position is confirmed from two viewpoints.
In addition, if it is determined that the first grabbing position and the second grabbing position are not at the same position, the robot grabbing method further comprises: updating the first image and the second image; and re-acquiring the first grabbing position information and the second grabbing position information until the re-acquired first grabbing position and second grabbing position are at the same position. Updating the first image and the second image updates the first and second grabbing positions; the images are updated repeatedly until the two grabbing positions are at the same position, so that continued judgment guarantees that the determined grabbing position is accurate, thereby improving the grabbing success rate.
In addition, judging, according to the first grabbing position information and the second grabbing position information, whether the first grabbing position and the second grabbing position are at the same position specifically includes: determining the similarity between the image corresponding to the first grabbing position and the image corresponding to the second grabbing position according to the first grabbing position information and the second grabbing position information; and comparing the similarity with a preset similarity threshold; if the similarity is greater than or equal to the similarity threshold, determining that the first grabbing position and the second grabbing position are at the same position, and otherwise determining that they are at different positions. When the image of the first grabbing position and the image of the second grabbing position are highly similar, the two grabbing positions are the same; therefore, whether the two grabbing positions coincide can be determined quickly by judging the similarity between the two images.
In addition, determining the similarity between the image corresponding to the first grabbing position and the image corresponding to the second grabbing position specifically includes: calculating, according to the first grabbing position information and the second grabbing position information, a two-dimensional similarity between the two-dimensional images of the first and second grabbing positions and a three-dimensional similarity between the three-dimensional images of the first and second grabbing positions; and fusing the two-dimensional similarity and the three-dimensional similarity according to their respective weights, taking the fused similarity as the similarity between the image corresponding to the first grabbing position and the image corresponding to the second grabbing position. Because the images comprise both two-dimensional and three-dimensional images, using the two-dimensional similarity alone or the three-dimensional similarity alone reduces the accuracy of the similarity; combining the similarity of the three-dimensional images with that of the two-dimensional images effectively improves the accuracy of the calculated similarity.
In addition, calculating the two-dimensional similarity between the two-dimensional image of the first grabbing position and the two-dimensional image of the second grabbing position specifically includes: determining a first feature vector of the central point of the two-dimensional image of the first grabbing position and a second feature vector of the central point of the two-dimensional image of the second grabbing position; determining a first included angle between the first feature vector and the second feature vector; and determining the two-dimensional similarity according to the first included angle. If the two grabbing positions are at the same position, their central points are the same; the larger the included angle between the feature vectors of the two central points, the greater the difference between the first grabbing position and the second grabbing position, and the higher the probability that they are different positions. On this basis, the included angle between the feature vectors of the two central points quickly reflects the two-dimensional similarity between the images of the two grabbing positions.
In addition, calculating the three-dimensional similarity between the three-dimensional image of the first grabbing position and the three-dimensional image of the second grabbing position specifically includes: determining a third feature vector of the central point of the three-dimensional image at the first grabbing position and a fourth feature vector of the central point of the three-dimensional image at the second grabbing position; determining a second included angle between the third feature vector and the fourth feature vector; and determining the three-dimensional similarity according to the second included angle. The three-dimensional similarity between the images of the two grabbing positions is determined on the same principle as the two-dimensional similarity, which makes the two-dimensional and three-dimensional similarities convenient to fuse and reduces the fusion deviation.
In addition, updating the first image and the second image specifically includes: acquiring the first acquisition position at which the second image was acquired; selecting a second acquisition position within the preset radius range centered on the first acquisition position; acquiring a third image at the second acquisition position; and taking the second image as the updated first image and the third image as the updated second image. The updated second image is thus obtained after moving the acquisition position, and a new grabbing position is re-acquired from images at different viewing angles, so that an accurate grabbing position can be found quickly.
In addition, acquiring the first grabbing position information of the object to be grabbed in the first image and the second grabbing position information of the object to be grabbed in the second image specifically includes: inputting the first image into a preset grabbing position determination model to obtain the first grabbing position information, wherein the grabbing position determination model is trained from training image data and the grabbing position information of the object to be grabbed in each piece of training image data; and inputting the second image into the preset grabbing position determination model to obtain the second grabbing position information. The grabbing position can thus be determined quickly through the grabbing position determination model.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and various changes and modifications may be made based on the following embodiments.
A first embodiment of the present invention relates to a robot grabbing method. The method is applied to a robot, which may be a single robot arm or an intelligent robot having a grabbing arm. The specific flow of the robot grabbing method is shown in fig. 1 and includes:
step 101: acquiring first grabbing position information of an object to be grabbed in a first image and second grabbing position information of the object to be grabbed in a second image, wherein the second image is acquired within a preset radius range taking the acquisition position of the first image as a center.
Specifically, a first image including the object to be grabbed is acquired, and a second image is then acquired within a preset radius range centered on the acquisition position of the first image. To ensure the accuracy of judging whether the first grabbing position and the second grabbing position are the same position, the preset radius should not be too large; for example, the preset radius may be 1 cm, in which case the preset radius range is the sphere of radius 1 cm centered on the acquisition position of the first image.
The second image may be acquired within the preset radius range either by randomly moving the acquisition position within that range and capturing an image at the new position, or by keeping the acquisition position unchanged, i.e., capturing another image at the acquisition position of the first image. In this embodiment, to avoid the situation where the difference between the second image and the first image is so large that it affects the judgment of the grabbing position, the second image is acquired after randomly moving the acquisition position within the preset radius range.
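As a minimal sketch of this acquisition step (the function name and metric units are assumptions; the 1 cm radius follows the example above):

```python
import numpy as np

def sample_second_acquisition_position(first_position, radius=0.01, rng=None):
    """Sample a new acquisition position uniformly inside a sphere of
    `radius` (metres) centred on `first_position` (a 3-vector)."""
    rng = rng or np.random.default_rng()
    while True:
        offset = rng.uniform(-radius, radius, size=3)  # rejection sampling
        if np.linalg.norm(offset) <= radius:
            return np.asarray(first_position) + offset

# Example: move the camera within 1 cm of the first acquisition position.
p2 = sample_second_acquisition_position([0.40, 0.10, 0.55])
```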
In a specific implementation, the first image is input into a preset grabbing position determination model to obtain the first grabbing position information, the grabbing position determination model being trained from training image data and the grabbing position information of the object to be grabbed in each piece of training image data; and the second image is input into the preset grabbing position determination model to obtain the second grabbing position information.
Specifically, the grabbing position determination model may be trained by deep learning, for example with a Convolutional Neural Network (CNN). The training image data may be color (RGB) images and depth images of different objects acquired from different viewing angles, or RGB images and depth images rendered by projecting stored textured 3D models of different objects from multiple angles.
It should be noted that the grabbing position information consists of the position information of a grabbing quadrangle and the grabbing angle, and may generally be expressed as a quintuple (x, y, w, h, θ), where (x, y) are the coordinates of the center of the grabbing quadrangle, w is the length of the grabbing quadrangle, i.e., the length of the parallel opening of the grabbing component (e.g., a gripper), h is the width of the grabbing quadrangle, and θ is the angle between the grabbing quadrangle and the horizontal axis. Fig. 2 shows a representation of the grabbing position information of an object to be grabbed, where the square in fig. 2 is the grabbing quadrangle and reference numeral 10 denotes the object to be grabbed.
Each training image in the training image data is annotated with the most suitable grabbing quadrangle, i.e., its center position and size (x, y, w, h) and the included angle θ between the grabbing component and the horizontal axis, and the annotated training image data are used to train the grabbing position determination model. The model may adopt the neural network structure shown in fig. 3, with 7 network layers; at run time, an RGB image of the object to be grabbed is scaled to a preset size (for example, 227 × 227 pixels) and input to the model, which predicts the grabbing position information of the object to be grabbed in the current image.
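As an illustrative sketch of this prediction step (the trained network itself is not reproduced here; the helper name and the callable-model interface are assumptions, while the 227 × 227 input size and the (x, y, w, h, θ) quintuple follow the description above):

```python
import numpy as np
import cv2  # OpenCV, assumed available

def predict_grasp(model, rgb_image):
    """Predict a grasp quintuple (x, y, w, h, theta) from an RGB image.

    `model` is assumed to be any callable (e.g. a trained CNN) mapping a
    227x227x3 float array to a 5-vector, as described above.
    """
    resized = cv2.resize(rgb_image, (227, 227)).astype(np.float32) / 255.0
    x, y, w, h, theta = model(resized)
    return {"center": (x, y), "length": w, "width": h, "angle": theta}
```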
Step 102: judging whether the first grabbing position and the second grabbing position are at the same position or not according to the first grabbing position information and the second grabbing position information; if the first grabbing position and the second grabbing position are determined to be at the same position, step 103 is executed, and if the first grabbing position and the second grabbing position are determined to be at different positions, step 104 is executed.
In a specific implementation, according to the first grabbing position information and the second grabbing position information, determining the similarity between the image corresponding to the first grabbing position and the image corresponding to the second grabbing position; and comparing the similarity with a preset similarity threshold, if the similarity is determined to be greater than or equal to the similarity threshold, determining that the first grabbing position and the second grabbing position are in the same position, otherwise, determining that the first grabbing position and the second grabbing position are in different positions.
Specifically, the preset similarity threshold may be set according to actual needs; for example, it may be set to 90%. Generally, to grab an object accurately, the first image and the second image each include an RGB image and a depth image. Thus, the first grabbing position information includes two-dimensional position information and three-dimensional position information of the first grabbing position, and the second grabbing position information includes two-dimensional position information and three-dimensional position information of the second grabbing position.
In a specific implementation, the specific process of determining the similarity between the image corresponding to the first grabbing position and the image corresponding to the second grabbing position is as follows: calculating, according to the first grabbing position information and the second grabbing position information, the two-dimensional similarity between the two-dimensional images of the first and second grabbing positions and the three-dimensional similarity between the three-dimensional images of the first and second grabbing positions; and fusing the two-dimensional similarity and the three-dimensional similarity according to their respective weights, taking the fused similarity as the similarity between the image corresponding to the first grabbing position and the image corresponding to the second grabbing position.
Specifically, the first image and the second image each include a two-dimensional image and a three-dimensional image. If only the two-dimensional similarity between the two-dimensional images of the first and second grabbing positions is considered, or only the three-dimensional similarity between the corresponding three-dimensional images is calculated, then a single similarity is used to represent the similarity between the images of the two grabbing positions, and its accuracy is low. In this embodiment, the two-dimensional similarity and the three-dimensional similarity are therefore fused, which effectively improves the accuracy of the similarity.
In this embodiment, the similarity may be fused as shown in formula (1):

sim(p1, p2) = α · sim2D + β · sim3D    (1)
where p1 denotes the first grabbing position, p2 denotes the second grabbing position, sim2D denotes the two-dimensional similarity, sim3D denotes the three-dimensional similarity, α denotes the weight of the two-dimensional similarity, and β denotes the weight of the three-dimensional similarity, with the two weights satisfying α + β = 1. In the present embodiment, α is set to 0.5 and β is set to 0.5.
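A minimal sketch of formula (1) and the threshold comparison of step 102 (the 0.5/0.5 weights and the 90% threshold follow the values given above):

```python
def fused_similarity(sim2d_value, sim3d_value, alpha=0.5, beta=0.5):
    """Formula (1): weighted fusion of 2D and 3D similarities (alpha + beta = 1)."""
    return alpha * sim2d_value + beta * sim3d_value

def same_position(sim2d_value, sim3d_value, threshold=0.9):
    """Judge whether the two grasp positions coincide (step 102)."""
    return fused_similarity(sim2d_value, sim3d_value) >= threshold
```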
The following describes a two-dimensional similarity determination process and a three-dimensional similarity determination process, respectively.
The calculation of the two-dimensional similarity includes the substeps shown in fig. 4.
Substep 1021: and respectively determining a first feature vector of the central point of the two-dimensional image of the first grabbing position and a second feature vector of the central point of the two-dimensional image of the second grabbing position.
Specifically, a feature descriptor of the central point of the two-dimensional image at the first grabbing position is computed. The descriptor may be a Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), or Oriented FAST and Rotated BRIEF (ORB) feature; in this embodiment, the feature descriptor of the central point is obtained by SIFT. The feature descriptor of the central point of the two-dimensional image at the first grabbing position is the first feature vector, denoted Vec2d_p1. Similarly, the second feature vector can be obtained in the same manner and is denoted Vec2d_p2.
Substep 1022: a first angle between the first eigenvector and the second eigenvector is determined.
Specifically, the magnitude of the first included angle may be measured by a trigonometric function, for example by the cosine formula, to obtain cos(θ); the magnitude of the first included angle can then be expressed by the value of cos(θ).
Substep 1023: and determining two-dimensional similarity according to the first included angle.
Specifically, if Vec2d_p1 and Vec2d_p2 are identical, cos(θ) equals 1. Since cos(θ) ∈ [−1, 1], the closer the value of cos(θ) is to 1, the more similar the first feature vector and the second feature vector are. Therefore, in the present embodiment, the value of cos(θ) is taken as the value of the two-dimensional similarity; that is, sim2D = cos(θ).
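A sketch of substeps 1021 to 1023 under the SIFT choice above (OpenCV is an assumption; the patent does not name a library):

```python
import cv2
import numpy as np

def center_sift_descriptor(gray_image, cx, cy, patch_size=31.0):
    """SIFT descriptor of the single keypoint at the grasp centre (cx, cy)."""
    keypoint = cv2.KeyPoint(float(cx), float(cy), patch_size)
    _, descriptors = cv2.SIFT_create().compute(gray_image, [keypoint])
    return descriptors[0]  # 128-dimensional feature vector

def sim2d(gray1, center1, gray2, center2):
    """Two-dimensional similarity: cosine of the first included angle between
    the two centre-point feature vectors (substeps 1021-1023)."""
    v1 = center_sift_descriptor(gray1, *center1)
    v2 = center_sift_descriptor(gray2, *center2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```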
the three-dimensional similarity calculation includes the substeps shown in fig. 5.
Substep 1031: and respectively determining a third feature vector of the central point of the three-dimensional image at the first grabbing position and a fourth feature vector of the central point of the three-dimensional image at the second grabbing position.
Specifically, the depth image is converted into a 3D point cloud. The feature description vector of the central point of the three-dimensional image at the first grabbing position is taken as the third feature vector, and the feature description vector of the central point of the three-dimensional image at the second grabbing position is taken as the fourth feature vector. The feature descriptor of a 3D point may be determined using features such as the Point Feature Histogram (PFH), Spin Images, or Signature of Histograms of Orientations (SHOT); in the present embodiment, the PFH feature extraction algorithm is used to determine the third and fourth feature vectors.

The PFH characterizes the distribution of points in the neighborhood of a 3D point p. All points contained in a sphere of radius r centered at point p are selected, and the normal vector of each point is calculated. For each pair of points p_i and p_j (i ≠ j) in the neighborhood of p, one is chosen as the source point p_s and the other as the target point p_t, such that the included angle between the normal of the source point and the line connecting the two points is the smaller one. The local frame formed by the source point and the target point is shown in fig. 6, which uses a UVW coordinate system; n_s and n_t are the normal vectors of the source point and the target point, respectively.

The relation between the source point p_s and the target point p_t can then be described by the following 3 features:

α = v · n_t,
φ = u · (p_t − p_s) / ||p_t − p_s||,
θ = arctan(w · n_t, u · n_t).

These three features are computed for every pair of points in the neighborhood of point p. The value range of each feature is divided into b bucket intervals, so the three features together form b^3 bucket intervals. The three features computed for each point pair fall into exactly one bucket; the proportion of the point pairs falling into each bucket, relative to the number of all point pairs, is calculated as the description of that bucket, and the bucket descriptions are combined into a b^3-dimensional vector, which serves as the PFH feature description of the point. With b = 4, this yields the usual 64-dimensional feature descriptor.

Through the PFH algorithm, the 3D descriptor of the central point of the three-dimensional image at the first grabbing position is taken as the third feature vector, denoted Vec3d_p1, and the fourth feature vector is denoted Vec3d_p2.
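The following simplified numpy sketch illustrates the PFH computation described above; it is an illustration of the bucketing scheme, not the PCL implementation, and the frame-orientation convention and helper names are assumptions (point normals are taken as given):

```python
import numpy as np

def pfh_descriptor(points, normals, b=4):
    """Simplified PFH of the neighborhood of a point: `points` is an (N, 3)
    array of neighbors, `normals` an (N, 3) array of unit normals.
    Returns the b**3-dimensional bucket histogram described above."""
    hist = np.zeros((b, b, b))
    pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = points[j] - points[i]
            # Source point: the one whose normal makes the smaller angle
            # with the line connecting the pair.
            if abs(np.dot(normals[i], d)) >= abs(np.dot(normals[j], d)):
                ps, ns, pt, nt = points[i], normals[i], points[j], normals[j]
            else:
                ps, ns, pt, nt = points[j], normals[j], points[i], normals[i]
            d = pt - ps
            dist = np.linalg.norm(d)
            if dist == 0:
                continue
            u = ns                              # local UVW frame (convention
            v = np.cross(u, d / dist)           # simplified for illustration)
            if np.linalg.norm(v) == 0:
                continue
            v = v / np.linalg.norm(v)
            w = np.cross(u, v)
            # The three PFH features alpha, phi, theta from the text.
            alpha = np.dot(v, nt)                              # in [-1, 1]
            phi = np.dot(u, d / dist)                          # in [-1, 1]
            theta = np.arctan2(np.dot(w, nt), np.dot(u, nt))   # in [-pi, pi]
            ia = min(int((alpha + 1) / 2 * b), b - 1)
            ip = min(int((phi + 1) / 2 * b), b - 1)
            it = min(int((theta + np.pi) / (2 * np.pi) * b), b - 1)
            hist[ia, ip, it] += 1               # one bucket per point pair
            pairs += 1
    return hist.ravel() / max(pairs, 1)         # b**3 = 64 dims for b = 4
```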
Substep 1032: and determining a second included angle between the third feature vector and the fourth feature vector.
Specifically, the magnitude of the second included angle may be measured using a trigonometric function, in the same manner as the first included angle, to obtain cos(θ′), where θ′ denotes the second included angle.
Substep 1033: and determining the three-dimensional similarity according to the second included angle.
Specifically, if Vec3d_p1 and Vec3d_p2 are identical, cos(θ′) equals 1. Since cos(θ′) ∈ [−1, 1], the closer the value of cos(θ′) is to 1, the more similar the third feature vector and the fourth feature vector are. Therefore, in this embodiment, the value of cos(θ′) is taken as the value of the three-dimensional similarity; that is, sim3D = cos(θ′).
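Mirroring the two-dimensional case, a sketch of substeps 1031 to 1033, composing the illustrative pfh_descriptor above (extraction of the point-cloud neighborhood around each grasp center is left to the caller):

```python
import numpy as np

def cosine(v1, v2):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def sim3d(points1, normals1, points2, normals2):
    """Three-dimensional similarity: cosine of the second included angle
    between the PFH descriptors of the two grasp-centre neighborhoods."""
    vec3d_p1 = pfh_descriptor(points1, normals1)  # third feature vector
    vec3d_p2 = pfh_descriptor(points2, normals2)  # fourth feature vector
    return cosine(vec3d_p1, vec3d_p2)
```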
step 103: and executing grabbing operation.
Specifically, the grasping operation can be performed after the first grasping position and the second grasping position are determined to be at the same position.
Step 104: and executing emergency operation corresponding to the grabbing position determination error.
Specifically, the emergency operation may be to end the grabbing process, or to send a prompt message indicating that the grabbing position was determined incorrectly, so that the grabbing position can be set manually.
Compared with the prior art, the present embodiment obtains the first grabbing position from the first image and the second grabbing position from the second image and judges whether the two grabbing positions are located at the same position. If they are at the same position, the determined first and second grabbing positions are accurate, which in turn guarantees that the grabbing operation performed according to the first grabbing position grabs the object to be grabbed accurately, improving the grabbing success rate. Meanwhile, the second image is acquired after the first image, and its acquisition position lies within a preset radius range centered on the acquisition position of the first image; that is, the first image and the second image are acquired from different viewing angles, so the predicted grabbing position is confirmed from two viewpoints.
A second embodiment of the invention relates to a method of robotic grasping. The robot grabbing method comprises the following steps: acquiring first grabbing position information of an object to be grabbed in a first image and second grabbing position information of the object to be grabbed in a second image; judging whether the first grabbing position and the second grabbing position are at the same position or not according to the first grabbing position information and the second grabbing position information; and if the first grabbing position and the second grabbing position are in the same position, carrying out grabbing operation.
The second embodiment is a further improvement of the first embodiment, and the main improvements are as follows: the robot grabbing method further comprises the following steps: and if the first grabbing position and the second grabbing position are determined to be in different positions, the first grabbing position and the second grabbing position are determined again. The specific flow of the robot grabbing method is shown in fig. 7.
Step 201: and acquiring first grabbing position information of the object to be grabbed in the first image and second grabbing position information of the object to be grabbed in the second image.
Step 202: judging whether the first grabbing position and the second grabbing position are at the same position according to the first grabbing position information and the second grabbing position information; if they are determined to be at the same position, step 203 is performed; if they are determined to be at different positions, step 204 is performed.
Step 203: and executing grabbing operation.
Step 204: and updating the first image and the second image, re-acquiring the first capture position information and the second capture position information, and returning to execute the step 202.
Specifically, if after step 202 the first grabbing position and the second grabbing position are determined to be at the same position, step 203 is performed; otherwise, step 204 is repeated until the re-acquired first grabbing position and second grabbing position are at the same position.
The specific process of updating the first image and the second image is as follows: acquiring the first acquisition position at which the second image was acquired; selecting a second acquisition position within the preset radius range centered on the first acquisition position; acquiring a third image at the second acquisition position; and taking the second image as the updated first image and the third image as the updated second image. The preset radius is substantially the same as in the first embodiment, i.e., it should not be set too large, to avoid the difference between the acquired third image and the second image becoming too large. The manner of re-acquiring the first grabbing position information and the second grabbing position information is substantially the same as in the first embodiment: the updated first image is input to the grabbing position determination model to re-acquire the first grabbing position information, and the updated second image is input to the model to re-acquire the second grabbing position information. The flow then returns to step 202 to judge whether the first grabbing position and the second grabbing position are at the same position.
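Putting steps 201 to 204 together, the following hedged sketch shows the verification loop of this embodiment; it composes the illustrative helpers sketched in the first embodiment, and acquire_image, neighborhood, and execute_grasp stand in for camera and robot interfaces that are not specified here:

```python
def grasp_with_verification(model, first_position, max_attempts=10):
    """Second-embodiment loop: slide the viewpoint until two consecutive
    predicted grasp positions agree (steps 201-204)."""
    img1 = acquire_image(first_position)          # assumed camera interface
    pos2 = sample_second_acquisition_position(first_position)
    img2 = acquire_image(pos2)
    for _ in range(max_attempts):
        g1 = predict_grasp(model, img1.rgb)       # step 201
        g2 = predict_grasp(model, img2.rgb)
        s2d = sim2d(img1.gray, g1["center"], img2.gray, g2["center"])
        # `neighborhood()` (assumed) extracts points + normals around a
        # grasp centre from the image's point cloud.
        s3d = sim3d(*neighborhood(img1.cloud, g1["center"]),
                    *neighborhood(img2.cloud, g2["center"]))
        if same_position(s2d, s3d):               # step 202: same position?
            return execute_grasp(g1)              # step 203, assumed robot API
        # Step 204: acquire a third image near the second acquisition
        # position and shift (second -> first, third -> second).
        pos3 = sample_second_acquisition_position(pos2)
        img1, img2, pos2 = img2, acquire_image(pos3), pos3
    raise RuntimeError("grasp position could not be verified")
```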
According to the robot grabbing method provided by this embodiment, updating the first image and the second image updates the first grabbing position and the second grabbing position; the images are updated repeatedly until the two grabbing positions are at the same position, so that continued judgment guarantees that the determined grabbing position is accurate, thereby improving the grabbing success rate.
The steps of the above methods are divided for clarity of description; in implementation, they may be combined into one step, or a step may be split into multiple steps, and all such variants are within the protection scope of this patent as long as the same logical relationship is included. Adding insignificant modifications to the algorithm or flow, or introducing insignificant design changes, without changing the core design of the algorithm or flow, is also within the protection scope of this patent.
A third embodiment of the present invention relates to a terminal having a structure as shown in fig. 8, including: at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; wherein the memory 302 stores instructions executable by the at least one processor 301, the instructions being executable by the at least one processor 301 to enable the at least one processor 301 to perform the method of robot grabbing as in the first embodiment or the second embodiment.
The memory 302 and the processor 301 are connected by a bus, which may include any number of interconnected buses and bridges linking together one or more of the various circuits of the processor 301 and the memory 302. The bus may also link various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 301 is transmitted over a wireless medium through an antenna, and the antenna also receives data and passes it to the processor 301.
The processor 301 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions, while the memory 302 may be used to store data used by the processor 301 in performing operations.
A fourth embodiment of the present invention relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the robot grabbing method of the first or second embodiment.
Those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing related hardware, the program being stored in a storage medium and including several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that in practice various changes in form and details may be made therein without departing from the spirit and scope of the invention.