CN114037753A - Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium

Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium

Info

Publication number
CN114037753A
Authority
CN
China
Prior art keywords: image, target object, camera, rgb, determining
Prior art date
Legal status: Pending
Application number
CN202111284094.6A
Other languages
Chinese (zh)
Inventor
张兴全
Current Assignee
Hangzhou Ezviz Software Co Ltd
Original Assignee
Hangzhou Ezviz Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Ezviz Software Co Ltd
Priority to CN202111284094.6A
Publication of CN114037753A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images

Abstract

The embodiment of the application discloses an object grabbing method applied to intelligent equipment, the intelligent equipment and a storage medium. The scheme is applied to intelligent equipment comprising an equipment body and a mechanical arm for grabbing an object. The method comprises the following steps: determining a first position of a key point on a target object according to a first image shot by a first camera device arranged on the equipment body, the first image comprising the target object; controlling the mechanical arm to move so that a second camera device arranged on the mechanical arm is located at a second position corresponding to the first position; determining pose information corresponding to the target object according to a second image of the target object shot by the second camera device at the second position; and controlling the mechanical arm to move from the second position to a grabbing position corresponding to the target object according to the pose information, and grabbing the target object at the grabbing position. The technical scheme can accurately determine the pose information of the target object, so that the target object can be accurately grabbed based on the pose information.

Description

Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium
Technical Field
The invention relates to the technical field of equipment automation and control, in particular to an object grabbing method applied to intelligent equipment, the intelligent equipment and a storage medium.
Background
Traditional automatic control equipment (such as an intelligent robot) is widely applied in the industrial field owing to its high rigidity, high strength, high precision and high speed. However, when such equipment interacts with complex and changeable objects, these same characteristics of high rigidity, high strength and high precision become shortcomings that make the equipment ill-suited to such tasks. For example, when a robot is used to grasp an object, it is difficult to design and manufacture fingers comparable to a human hand, and the control process for the dozens of joints of a robotic hand is complicated, so grasping an object accurately with a robot remains a great challenge.
Disclosure of Invention
An object of the embodiment of the application is to provide an object grabbing method applied to an intelligent device, the intelligent device and a storage medium, so as to solve the problem that the accuracy of grabbing an object by the existing intelligent device is low.
In order to solve the above technical problem, the embodiment of the present application is implemented as follows:
in one aspect, an embodiment of the present application provides an object grasping method applied to an intelligent device, where the intelligent device includes a device body and a robot arm for grasping an object; a first camera device is mounted on the equipment body, and a second camera device is mounted on the mechanical arm; the method comprises the following steps:
determining a first position of a key point on a target object according to a first image shot by the first camera device; the first image comprises the target object;
controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position;
determining pose information corresponding to the target object according to a second image of the target object shot by the second camera device at the second position;
and controlling the mechanical arm to move from the second position to a grabbing position corresponding to the target object according to the pose information, and grabbing the target object at the grabbing position.
On the other hand, the embodiment of the application provides intelligent equipment, which comprises a control device, an equipment body and a mechanical arm for grabbing an object; a first camera device is mounted on the equipment body, and a second camera device is mounted on the mechanical arm; wherein:
the first camera device is used for shooting a first image and transmitting the first image to the control device; the first image comprises a target object;
the control device is used for determining a first position where a key point on the target object is located according to the first image; controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position;
the second camera device is used for shooting a second image of the target object at the second position and transmitting the second image to the control device;
the control device is further configured to determine pose information corresponding to the target object according to the second image; controlling the mechanical arm to move from the second position to the grabbing position corresponding to the target object according to the pose information, and driving the mechanical arm to execute grabbing actions;
the mechanical arm is used for grabbing the target object under the driving of the control device.
In another aspect, an embodiment of the present application provides an intelligent device, which includes a processor and a memory electrically connected to the processor, where the memory stores a computer program, and the processor is configured to invoke and execute the computer program from the memory to implement the above object grabbing method applied to the intelligent device.
In another aspect, an embodiment of the present application provides a storage medium for storing a computer program, where the computer program is executable by a processor to implement the above object grabbing method applied to a smart device.
By adopting the technical scheme of the embodiment of the application, the first position where the key point on the target object is located is determined according to the first image shot by the first camera device arranged on the equipment body, wherein the first image comprises the target object. The mechanical arm is then controlled to move so that the second camera device installed on the mechanical arm is located at a second position corresponding to the first position, and the pose information corresponding to the target object is determined according to a second image of the target object shot by the second camera device at the second position. Therefore, according to the technical scheme, the first camera device on the equipment body, the second camera device on the mechanical arm and the mechanical arm cooperate with each other, so that the second camera device can be controlled to shoot the target object at an accurate shooting position, and accurate pose information can be acquired. Furthermore, according to the pose information, the mechanical arm is controlled to move from the second position to the grabbing position corresponding to the target object, and the target object is grabbed at the grabbing position, so that the intelligent device can grab the object based on the accurate pose information, and the accuracy of grabbing the object by the intelligent device is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic flowchart of an object grabbing method applied to a smart device according to an embodiment of the present application;
FIG. 2 is a schematic view of a minimum bounding box of a target object according to an embodiment of the present disclosure;
FIG. 3 is a schematic view of a minimum bounding box of a target object according to another embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of an object grabbing method applied to a smart device according to another embodiment of the present application;
FIG. 5 is a schematic block diagram of an intelligent device provided by an embodiment of the present application;
fig. 6 is a schematic block diagram of an intelligent device according to another embodiment of the present application.
Detailed Description
An object of the embodiment of the application is to provide an object grabbing method applied to an intelligent device, the intelligent device and a storage medium, so as to solve the problem that the accuracy of grabbing an object by the existing intelligent device is low.
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an intelligent device, which includes a control device, an equipment body and a mechanical arm for grabbing an object. A first camera device is mounted on the equipment body, and a second camera device is mounted on the mechanical arm. Optionally, if the intelligent device is an intelligent robot, the first camera device may be installed at the head position of the intelligent robot, and the second camera device may be installed at the wrist position of the intelligent robot. The first camera device is used for shooting a first image of a target object and transmitting the first image to the control device. The second camera device is used for shooting a second image of the target object at a second position and transmitting the second image to the control device. The control device is used for determining the pose information of the target object according to the first image and the second image, determining the grabbing position corresponding to the target object according to the pose information, and controlling the mechanical arm to move to the grabbing position. The mechanical arm is used for grabbing the target object under the driving of the control device. How the smart device performs the object grabbing method is described in detail below.
Fig. 1 is a schematic flowchart of an object grabbing method applied to a smart device according to an embodiment of the present application, and as shown in fig. 1, the method applied to the smart device in the above embodiment includes the following steps S102 to S108:
s102, determining a first position where a key point on a target object is located according to a first image shot by a first camera device; wherein the first image comprises a target object.
Optionally, the first camera device comprises an RGB camera and a depth camera. Based on this, the first image captured by the first camera device may include an RGB image captured by the RGB camera and a depth image captured by the depth camera.
In this embodiment, the key point is a point on the target object that meets a preset condition, where the preset condition includes at least one of the following: the point is the point on the target object closest to the first camera device; the point belongs to a specified type of point on the target object. The specified type may be a type that distinguishes the point from most other points on the target object; for example, when the target object is a cube, the point of the specified type may be a vertex of the cube.
And S104, controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position.
The second position is the position at which the second camera device on the mechanical arm is to be located. A preset positional relationship between the first position and the second position may be determined in advance, so that after the first position of the key point of the target object is acquired, the second position can be determined according to the preset positional relationship. When the first position and the second position meet the preset positional relationship, the image effect of the target object shot by the second camera device at the second position is optimal. The method for determining the preset positional relationship will be described in detail in the following embodiments and is not repeated here.
And S106, determining the corresponding pose information of the target object according to the second image of the target object shot by the second camera device at the second position.
The pose information may include attitude (orientation) information and/or centroid position information of the target object.
Optionally, the second camera device comprises an RGB camera and a depth camera, and the second image captured by the second camera device comprises an RGB image captured by the RGB camera and a depth image captured by the depth camera.
And S108, controlling the mechanical arm to move from the second position to the grabbing position corresponding to the target object according to the pose information, and grabbing the target object at the grabbing position.
In this embodiment, the grasping position of the target object may be determined according to the pose information of the target object. Optionally, the grabbing path of the mechanical arm can be planned according to the pose information of the target object, so that the mechanical arm is controlled to grab the target object according to the grabbing path.
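As a purely illustrative sketch (not part of the patent text), the final grasp step might look as follows in Python, where `robot` is a hypothetical arm interface and the pose is given as a rotation matrix plus centroid; a real system would additionally perform collision-aware path planning.

```python
import numpy as np

def grasp_target(robot, rotation, centroid, approach_offset=0.05):
    """Move from the second position to a grasp position derived from the pose
    estimate and close the gripper. `robot` is a hypothetical arm interface."""
    # Approach along the object's local z axis before descending to the grasp pose.
    approach = centroid + rotation @ np.array([0.0, 0.0, approach_offset])
    robot.move_to(position=approach, orientation=rotation)   # pre-grasp pose
    robot.move_to(position=centroid, orientation=rotation)   # grasp pose
    robot.close_gripper()
```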
By adopting the technical scheme of the embodiment of the application, the first position where the key point on the target object is located is determined according to the first image shot by the first camera device arranged on the equipment body, wherein the first image comprises the target object. The mechanical arm is then controlled to move so that the second camera device installed on the mechanical arm is located at a second position corresponding to the first position, and the pose information corresponding to the target object is determined according to a second image of the target object shot by the second camera device at the second position. Therefore, according to the technical scheme, the first camera device on the equipment body, the second camera device on the mechanical arm and the mechanical arm cooperate with each other, so that the second camera device can be controlled to shoot the target object at an accurate shooting position, and accurate pose information can be acquired. Furthermore, according to the pose information, the mechanical arm is controlled to move from the second position to the grabbing position corresponding to the target object, and the target object is grabbed at the grabbing position, so that the intelligent device can grab the object based on the accurate pose information, and the accuracy of grabbing the object by the intelligent device is improved.
In one embodiment, the first camera includes an RGB camera and a depth camera, and the first image includes an RGB image captured by the RGB camera and a depth image captured by the depth camera. Based on this, in performing S102, a first location at which a keypoint on the target object is located may be determined based on the following steps A1-A3:
step a1, determining first point cloud information corresponding to the first image according to the RGB image and the depth image captured by the first imaging device.
Optionally, the first point cloud information corresponding to the first image includes three-dimensional coordinate information corresponding to each point on the target object in the first image. The three-dimensional coordinate information can be three-dimensional coordinate information in a world coordinate system and also can be three-dimensional coordinate information in a camera coordinate system.
Optionally, the RGB image carries RGB information of each pixel point on the target object in the first image, and the depth image carries depth information of each pixel point on the target object in the first image. RGB-D information of each pixel point on the target object in the first image can be obtained by fusing the RGB information and the depth information of each pixel point on the target object, wherein the RGB-D information comprises the RGB information and the depth information of each pixel point on the target object. And then, converting the RGB-D information of each pixel point into three-dimensional coordinate information to obtain first point cloud information of the target object.
Step A2, determining a point on the target object that meets a preset condition as the key point based on the first point cloud information; the preset condition includes at least one of the following: the point is the point on the target object closest to the first camera device; the point belongs to a specified type of point on the target object.
In this step, if the keypoint belongs to a point of a specified type on the target object, the keypoint may be determined according to the definition of the specified type. For example, if the point of the designated type is a vertex, a certain vertex on the target object in the first image is taken as a key point; for example, if the specified type of point is a centroid point, the centroid point of the target object in the first image is calculated, and the centroid point is determined as a key point.
If the key point is the closest point on the target object to the first image capture device, the key point on the target object can be determined in at least two ways.
In the first mode, the key point is determined by the distance between each point on the target object in the first image and the first camera device: according to the first point cloud information corresponding to the target object in the first image, the distance (such as the Euclidean distance) between each point on the target object and the first camera device is calculated, and the point corresponding to the minimum distance value among the calculated distances is selected as the key point.
In the second mode, the key point is determined according to the depth information of each pixel point on the target object in the first image: the point closest to the first camera device is determined from the depth information of each point and taken as the key point.
And A3, determining the first position of the key point according to the three-dimensional coordinate information corresponding to the key point.
In this embodiment, the accuracy of the determined key points is improved by acquiring point cloud information of each point on the target object and determining, according to the point cloud information, a point on the target object closest to the first camera device or a point of a specified type as a key point.
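For illustration only, a minimal Python sketch of the two key-point selection modes described above, assuming the first point cloud is an (N, 3) array expressed in the camera coordinate frame:

```python
import numpy as np

def select_keypoint(points_cam: np.ndarray, mode: str = "euclidean") -> np.ndarray:
    """Return the key point from an (N, 3) camera-frame point cloud: either the
    point with the smallest Euclidean distance to the camera origin (mode 1)
    or the point with the smallest depth value z (mode 2)."""
    if mode == "euclidean":
        idx = np.argmin(np.linalg.norm(points_cam, axis=1))
    else:
        idx = np.argmin(points_cam[:, 2])
    return points_cam[idx]
```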
In one embodiment, to determine the first locations of key points on the target object, first point cloud information for each point on the target object in the first image is obtained. The first point cloud information corresponding to the first image may be determined based on the following steps:
Firstly, a preset target detection algorithm is used to perform image detection on the RGB image, and the minimum bounding box of the target object on the first image is determined according to the detection result.
In this step, before the RGB image in the first image is detected, preprocessing operations such as normalization and mean subtraction are performed on the RGB image. Then, a preset target detection algorithm is used to perform image detection on the preprocessed RGB image, where the target detection algorithm may be any one of the Faster R-CNN algorithm, the YOLO algorithm, the SSD algorithm and the like. Assuming that the target object is a cube, after image detection is performed on the RGB image by using the target detection algorithm, a minimum bounding box of the cube, such as the bounding rectangle 20 shown in fig. 2, is obtained.
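As a hedged illustration of this detection step (the patent does not prescribe a specific framework), the sketch below uses a pre-trained Faster R-CNN from torchvision as the target detection algorithm; the file name is a placeholder, and the model's own transform handles normalization internally:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Example detector only; the patent allows Faster R-CNN, YOLO, SSD, etc.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

rgb = Image.open("first_image_rgb.png").convert("RGB")  # placeholder file name
with torch.no_grad():
    detections = model([to_tensor(rgb)])[0]

# Take the highest-scoring detection as the target object; its box
# (x_min, y_min, x_max, y_max) serves as the minimum bounding box.
best = detections["scores"].argmax()
min_bounding_box = detections["boxes"][best].tolist()
```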
Secondly, the first point cloud information of the target object within the minimum bounding box is determined according to the RGB information of each pixel point carried by the RGB image and the depth information of each pixel point carried by the depth image. The first point cloud information comprises three-dimensional coordinate information corresponding to each point on the target object.
In this embodiment, the RGB information of each pixel point carried by the RGB image and the depth information of each pixel point carried by the depth image are fused to obtain the RGB-D information of the target object within the minimum bounding box. When the first point cloud information of the target object within the minimum bounding box is determined, the RGB-D information of the target object within the minimum bounding box can be converted into three-dimensional coordinate information, thereby obtaining the first point cloud information of the target object.
A world coordinate system is established by taking a certain static object in the environment as a coordinate origin, and the positions of all points on a target object in the environment are represented by three-dimensional coordinates. The conversion process of converting the RGB-D information of the target object within the minimum bounding box to three-dimensional coordinate information in the camera coordinate system is characterized by equation (1) below:
$$ z = \frac{d}{s}, \qquad x = \frac{(u - c_x)\,z}{f_x}, \qquad y = \frac{(v - c_y)\,z}{f_y} \tag{1} $$
wherein d is the depth information of the pixel point on the target object; s is a unit conversion factor that converts the unit of the depth information d output by the depth information sensor in the depth camera into the unit (meters) of the camera coordinate system; assuming the depth information sensor outputs d in millimeters, s is 1000 (i.e., 1 meter is 1000 millimeters); u and v are respectively the abscissa and ordinate of the pixel point in the pixel coordinate system; f_x, f_y, c_x and c_y are the intrinsic parameters of the camera; and x, y and z are the coordinate values along the mutually orthogonal x, y and z axes of the camera coordinate system.
The following represents a conversion process of converting the three-dimensional coordinate information in the camera coordinate system into the three-dimensional coordinate information in the world coordinate system by formula (2):
$$ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = R \begin{bmatrix} x \\ y \\ z \end{bmatrix} + T \tag{2} $$
wherein x', y' and z' are respectively the coordinate values along the mutually orthogonal x, y and z axes of the world coordinate system. R and T represent the transformation relationship between the camera coordinate system and the world coordinate system and are both extrinsic parameters of the camera; specifically, R represents the rotation and carries the attitude, in the world coordinate system, of a coordinate point expressed in the camera coordinate system, and T represents the translation and carries the position, in the world coordinate system, of a coordinate point expressed in the camera coordinate system. The three-dimensional coordinate information (x', y', z') is the coordinate information of the pixel point in the world coordinate system.
The first point cloud information may be three-dimensional coordinate information of each point on the target object in a camera coordinate system, or may be three-dimensional coordinate information of each point on the target object in a world coordinate system, which is not limited in the present application.
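A minimal Python sketch of equations (1) and (2) follows; the intrinsics f_x, f_y, c_x, c_y and the extrinsics R, T shown here are placeholder values that would in practice come from the actual camera calibration:

```python
import numpy as np

def pixel_to_camera(u, v, d, fx, fy, cx, cy, s=1000.0):
    """Back-project pixel (u, v) with raw depth d (millimeters when s = 1000)
    into camera-frame coordinates (x, y, z) in meters, per equation (1)."""
    z = d / s
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def camera_to_world(p_cam, R, T):
    """Transform a camera-frame point into the world frame, per equation (2)."""
    return R @ p_cam + T

# Placeholder calibration values for illustration only.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
R = np.eye(3)                    # rotation from camera frame to world frame
T = np.array([0.0, 0.0, 0.5])    # translation from camera frame to world frame, meters

p_cam = pixel_to_camera(u=400, v=300, d=850, fx=fx, fy=fy, cx=cx, cy=cy)
p_world = camera_to_world(p_cam, R, T)
```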
In this embodiment, the RGB-D information of each pixel point on the target object is obtained by fusing the information of the RGB image and the depth image of each pixel point on the target object, and then the RGB-D information is converted to generate three-dimensional coordinate information (first point cloud information) corresponding to each point on the target object, so that the key point on the target object and the first position corresponding to the key point are calculated according to the first point cloud information.
In one embodiment, in order for the second camera device to capture a more accurate second image of the target object, the mechanical arm is moved so that the second camera device is located at the second position corresponding to the first position of the key point. The preset positional relationship between the first position and the second position can be set in advance, the second position corresponding to the first position is then determined according to this preset positional relationship, and the mechanical arm is controlled to move to the second position.
In this embodiment, the preset positional relationship includes at least one of the following: the second position is located at a preset height above the first position; a preset angle is formed between the line connecting the second position and the first position and the line connecting the first position and the first camera device; the center position of the minimum bounding box of the target object on the second image coincides with the center position of the second image.
Optionally, the preset positional relationship between the first position and the second position is that a preset angle is formed between the line connecting the second position and the first position and the line connecting the first position and the first camera device. For example, the first camera device is provided at the top end (head) position of the equipment body, and the second camera device is provided at the wrist position of the mechanical arm. When the preset angle between the line connecting the second position and the first position and the line connecting the first position of the key point on the target object and the first camera device is 30 degrees, the second position is the best shooting position, and the image effect of the second image shot by the second camera device at the second position is better.
Optionally, the preset positional relationship between the first position and the second position is that the center position of the minimum bounding box of the target object on the second image coincides with the center position of the second image. For example, the second image captured by the second camera device is an RGB image captured by an RGB camera, and the pixel size of the RGB image is the pixel size of the RGB camera in the second camera device. When the minimum bounding box of the target object in the second image is located at the center of the second image, the center position of the second image coincides with the center position of the minimum bounding box of the target object on the second image; at this time, the second position is the best shooting position, and the image effect of the second image shot by the second camera device at the second position is better.
Optionally, the preset positional relationship between the first position and the second position is that the second position is located at a preset height above the first position, a preset angle is formed between the line connecting the second position and the first position and the line connecting the first position and the first camera device, and the center position of the minimum bounding box of the target object on the second image coincides with the center position of the second image.
For example, the preset height is such that the second position is located 10 cm higher than the first position, the preset angle between the line connecting the second position and the first position and the line connecting the first position and the first camera device (which may be located at the top end position of the equipment body) is 35 degrees, and the minimum bounding box of the target object in the second image is located at the center of the second image. In this case the second position is the best shooting position, and the image effect of the second image shot by the second camera device at the second position is better.
In this embodiment, after the first position of the key point is determined, the second position is determined according to the preset position relationship, and the mechanical arm is controlled to accurately and rapidly move so that the second camera device is located at the second position; in addition, the second position determined through the preset position relation is more accurate, so that the image effect of the second image shot by the second camera device is better, and the pose information of the target object can be conveniently determined according to the second image.
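To make the relationship concrete, the sketch below (an illustration, not part of the patent) checks whether a candidate second position satisfies the height and angle conditions, using the example values of 10 cm and 35 degrees from the preceding paragraphs; all points are assumed to be in a common world frame with the z axis pointing up:

```python
import numpy as np

def satisfies_preset_relation(p_key, p_cam1, p_cam2,
                              height=0.10, angle_deg=35.0,
                              height_tol=0.01, angle_tol=1.0):
    """Check that p_cam2 lies `height` meters above the key point p_key and
    that the line p_cam2-p_key makes `angle_deg` degrees with the line
    p_key-p_cam1 (the first camera device position)."""
    v1 = np.asarray(p_cam1, float) - np.asarray(p_key, float)
    v2 = np.asarray(p_cam2, float) - np.asarray(p_key, float)
    height_ok = abs(v2[2] - height) <= height_tol
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    angle_ok = abs(angle - angle_deg) <= angle_tol
    return height_ok and angle_ok
```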
In one embodiment, before determining the first position of the key point on the target object according to the first image captured by the first camera, the preset position relationship between the first position and the second position may be determined by performing an experimental simulation on the sample object, so that when the preset position relationship is satisfied between the first position and the second position, the image effect of the second image captured by the second camera is better. That is, before executing S102, the following steps B1-B5 may be executed to determine the preset positional relationship:
in step B1, a first sample image of a sample object is captured by a first imaging device.
In this step, the first sample image includes an RGB image captured by an RGB camera and a depth image captured by a depth camera. The RGB image carries RGB information of each pixel point on the sample object, and the depth image carries depth information of each pixel point on the sample object.
Step B2, determining a third location of the sample keypoint on the sample object based on the first sample image.
Optionally, the RGB image may be subjected to image detection by using a target detection algorithm, and a minimum bounding box of the sample object in the first sample image is obtained according to the detection result. And then determining sample point cloud information of the sample object in the minimum external frame according to RGB information of each pixel point on the sample object carried by the RGB image and depth information of each pixel point on the sample object carried by the depth image, wherein the sample point cloud information comprises three-dimensional coordinate information corresponding to each point on the sample object.
And step B3, controlling the second camera to move, and shooting the sample object by using the second camera in the moving process of the second camera to obtain a second sample image of the sample object.
And step B4, if the second sample image meets the preset image condition, determining that the position of the second imaging device corresponding to the second sample image is a fourth position.
Alternatively, the preset image condition may be that the sample object is located at the center position of the second sample image; or that the sample object appears at a preset angle; or that the sharpness of the second sample image reaches a preset threshold; the condition may be set as required.
And step B5, determining the preset position relation between the first position and the second position according to the position relation between the third position and the fourth position.
In this embodiment, a sample object is used as a simulation object, a cooperation process of the first camera device, the second camera device and the mechanical arm is simulated in advance, that is, a shooting position of the second camera device is continuously adjusted, a second sample image is continuously shot in the adjustment process, when the shot second sample image meets a preset image condition, it is indicated that the second camera device is located at an accurate shooting position, and at this time, a relationship between a position of the second camera device and a position of a sample key point is determined to be a preset position relationship, so that the second position of the second camera device can be directly determined by using the preset position relationship in a subsequent operation process, and accuracy of positioning the shooting position of the second camera device is improved.
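The following sketch outlines, under heavy assumptions, how such a simulation could search for the fourth position. `robot`, `wrist_camera` and `detect_box` are hypothetical interfaces standing in for the real arm, the second camera device and the target detector, and the stepping logic is a placeholder rather than the patent's procedure:

```python
import numpy as np

def find_fourth_position(robot, wrist_camera, detect_box, sample_keypoint,
                         step=0.01, max_iters=200, tol_px=5.0):
    """Nudge the wrist camera until the sample object's bounding box is centered
    in the image (one possible 'preset image condition'), then return the offset
    between the camera position (fourth position) and the sample key point
    (third position). All interfaces here are hypothetical."""
    for _ in range(max_iters):
        rgb = wrist_camera.capture_rgb()                  # (H, W, 3) image array
        x0, y0, x1, y1 = detect_box(rgb)                  # minimum bounding box
        box_center = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
        img_center = np.array([rgb.shape[1] / 2.0, rgb.shape[0] / 2.0])
        error = box_center - img_center
        if np.linalg.norm(error) < tol_px:
            return robot.wrist_camera_position() - np.asarray(sample_keypoint)
        # Translate the wrist a small step against the pixel error; the sign
        # convention depends on how the camera is mounted (placeholder here).
        robot.translate_wrist([-step * np.sign(error[0]),
                               -step * np.sign(error[1]), 0.0])
    raise RuntimeError("sample object could not be centered in the image")
```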
In one embodiment, the pose information corresponding to the target object may be determined from a second image of the target object captured by the second imaging device at the second location. The second camera device comprises an RGB camera and a depth camera, and the second image comprises an RGB image shot by the RGB camera and a depth image shot by the depth camera.
In this embodiment, when the pose information corresponding to the target object is determined, second point cloud information corresponding to the second image is first determined according to the RGB image and the depth image captured by the second camera device; secondly, the pose information of the target object is determined according to the second point cloud information and a pose estimation algorithm. The pose information comprises attitude information and/or centroid position information of the target object.
Optionally, the second point cloud information may be obtained as follows: firstly, preprocessing operations such as normalization and mean removal are performed on the RGB image, and the RGB image is detected by using a target detection algorithm to obtain the minimum bounding box of the target object; secondly, image segmentation processing is performed on the target object within the minimum bounding box by using an image segmentation algorithm related to the target detection algorithm, such as the Mask R-CNN algorithm or the FCN algorithm, so as to determine a mask image corresponding to the target object; thirdly, the RGB-D information of the target object within the minimum bounding box is obtained according to the RGB information of each pixel point in the mask image corresponding to the target object and the depth information of each pixel point carried by the depth image; and finally, the RGB-D information of each pixel point on the target object is converted into three-dimensional coordinate information, thereby obtaining the second point cloud information of the target object.
Assuming that the target object is a cube, the cube in the RGB image is detected by using the target detection algorithm, and the minimum bounding box of the cube is determined; image segmentation processing is then performed on the cube within the minimum bounding box to obtain a mask image of the cube. As shown in fig. 3, (a) in fig. 3 shows the minimum bounding box 30 of the cube, and (b) in fig. 3 shows the mask image of the cube.
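For illustration (the patent names Mask R-CNN and FCN only as examples), the sketch below uses torchvision's pre-trained Mask R-CNN to segment the target and back-projects only the masked pixels into the second point cloud; the intrinsics are assumed to come from the wrist camera's calibration:

```python
import numpy as np
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

def masked_point_cloud(rgb, depth, fx, fy, cx, cy, s=1000.0, mask_thr=0.5):
    """Segment the highest-scoring object with Mask R-CNN and back-project only
    the masked pixels of the depth image into a camera-frame point cloud.
    `rgb` is an (H, W, 3) uint8 array and `depth` an (H, W) raw depth array."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()
    with torch.no_grad():
        out = model([to_tensor(rgb)])[0]
    best = out["scores"].argmax()
    mask = out["masks"][best, 0].numpy() > mask_thr       # (H, W) boolean mask
    v_idx, u_idx = np.nonzero(mask)                       # masked pixel coordinates
    z = depth[v_idx, u_idx] / s
    x = (u_idx - cx) * z / fx
    y = (v_idx - cy) * z / fy
    return np.stack([x, y, z], axis=1)                    # (N, 3) second point cloud
```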
After the second point cloud information of the target object is determined, the attitude information and/or the centroid position information of the target object are determined by using a pose estimation algorithm and the second point cloud information. Alternatively, the pose estimation algorithm may adopt the DenseFusion algorithm, and the pose information of the target object may be 6D pose information of the target object.
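DenseFusion itself is a learned pose estimator and is not reproduced here; as a rough, purely illustrative stand-in, the centroid and a coarse orientation can be read off the second point cloud with a principal-component analysis:

```python
import numpy as np

def coarse_pose_from_cloud(points):
    """Estimate the centroid and a rotation matrix whose columns are the
    principal axes of an (N, 3) object point cloud. This is a simple PCA
    approximation, not the DenseFusion algorithm referenced in the text."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotation = vt.T                      # columns: principal axes of the object
    if np.linalg.det(rotation) < 0:      # enforce a right-handed frame
        rotation[:, 2] *= -1.0
    return rotation, centroid
```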
In this embodiment, multiple calculations are performed on the target object within the minimum bounding box by adopting several target detection algorithms, so that the determined pixel information of the target object is more accurate and second point cloud information and pose information with higher accuracy can be obtained. On this basis, the grabbing position and grabbing path of the mechanical arm can be accurately predicted, and the grabbing motion can be completed efficiently.
Fig. 4 is a schematic flowchart of an object grabbing method applied to a smart device according to another embodiment of the present application. In this embodiment, the smart device is a smart robot and includes a head camera (disposed at the head position of the smart robot), a wrist camera (disposed at the wrist position of the smart robot), a mechanical arm and a control device. The head camera and the wrist camera are both binocular cameras, each including an RGB camera and a depth camera. The head camera is used for shooting a first image and transmitting the first image to the control device; the wrist camera is used for shooting a second image and transmitting the second image to the control device; the control device is used for determining the pose information of the target object according to the first image and the second image, determining the grabbing position corresponding to the target object according to the pose information, and controlling the mechanical arm to move to the grabbing position. The mechanical arm is used for grabbing the target object under the driving of the control device. How the smart device performs the object grabbing method is described in detail below.
As shown in fig. 4, the method includes the following steps S401 to S410:
s401, shooting a first image including a target object by using a head camera; wherein the first image includes an RGB image captured by an RGB camera and a depth image captured by a depth camera.
S402, image detection is performed on the RGB image by using a preset target detection algorithm to determine the minimum bounding box of the target object on the first image.
S403, first point cloud information of the target object within the minimum bounding box is determined according to the RGB information of each pixel point carried by the RGB image and the depth information of each pixel point carried by the depth image; the first point cloud information comprises three-dimensional coordinate information corresponding to each point on the target object.
S404, determining points meeting preset conditions on the target object as key points based on the first point cloud information.
Wherein the preset condition comprises at least one of the following: the point is the point on the target object closest to the first camera device; the point belongs to a specified type of point on the target object.
S405, determining a first position of the key point according to the three-dimensional coordinate information corresponding to the key point.
S406, determining a second position corresponding to the first position according to a preset position relation between the first position and the second position.
Wherein the preset positional relationship comprises at least one of the following: the second position is located at a preset height above the first position; a preset angle is formed between the line connecting the second position and the first position and the line connecting the first position and the first camera device; the center position of the minimum bounding box of the target object on the second image coincides with the center position of the second image.
The preset positional relationship between the first position and the second position may be predetermined by: shooting a first sample image of a sample object by using a first camera device; determining a third position where a sample key point on the sample object is located according to the first sample image; controlling the second camera device to move, and shooting the sample object by using the second camera device in the moving process of the second camera device to obtain a second sample image of the sample object; if the second sample image meets the preset image condition, determining that the position of a second camera device corresponding to the second sample image is a fourth position; and determining a preset position relation between the first position and the second position according to the position relation between the third position and the fourth position. The detailed implementation of each step has been described in the above embodiments, and is not described herein again.
And S407, controlling the mechanical arm to move so that the wrist camera is located at the second position.
S408, shooting a second image by using the wrist camera, and determining second point cloud information corresponding to the second image according to the RGB image and the depth image shot by the wrist camera.
Wherein the second image comprises an RGB image and a depth image.
In this step, firstly, preprocessing operations such as normalization or mean removal are performed on the RGB image shot by the wrist camera, and the RGB image is detected by using a target detection algorithm to obtain the minimum bounding box of the target object; secondly, image segmentation processing is performed on the target object within the minimum bounding box by using an image segmentation algorithm related to the target detection algorithm to determine a mask image corresponding to the target object; thirdly, the RGB-D information of the target object within the minimum bounding box is obtained according to the RGB information of each pixel point in the mask image corresponding to the target object and the depth information of each pixel point carried in the depth image shot by the wrist camera; and finally, the RGB-D information of each pixel point on the target object within the minimum bounding box is converted into the three-dimensional coordinate information of each point on the target object, so as to obtain the second point cloud information of the target object. The three-dimensional coordinate information includes, but is not limited to, three-dimensional coordinate information in the world coordinate system and in the camera coordinate system.
And S409, determining the pose information of the target object according to the second point cloud information and the pose estimation algorithm.
Wherein the pose information comprises attitude information and/or centroid position information of the target object, and the pose estimation algorithm may adopt the DenseFusion algorithm.
And S410, controlling the mechanical arm to move from the second position to the grabbing position corresponding to the target object according to the pose information, and grabbing the target object at the grabbing position.
By adopting the technical scheme of the embodiment of the application, the first position where the key point on the target object is located is determined according to the first image shot by the first camera device arranged on the equipment body, wherein the first image comprises the target object. And then controlling the mechanical arm to move to a second position corresponding to the first position, and further determining the pose information corresponding to the target object according to a second image of the target object, which is shot at the second position by a second camera device installed on the mechanical arm. Therefore, according to the technical scheme, through mutual cooperation among the first camera device on the equipment body, the second camera device on the mechanical arm and the mechanical arm, the second camera device can be controlled to shoot the target object at an accurate shooting position, a second image with a better image effect is obtained, and more accurate point cloud information and pose information are obtained. Furthermore, according to the pose information, the mechanical arm is controlled to move to the grabbing position corresponding to the target object from the second position, and the target object is grabbed at the grabbing position, so that the intelligent device can grab the object based on the accurate pose information, and the accuracy of grabbing the object by the intelligent device is improved.
In summary, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.
Based on the same idea as the above object grabbing method applied to the intelligent device, an embodiment of the application further provides an intelligent device. Fig. 5 is a schematic block diagram of an intelligent device provided in an embodiment of the present application, including an equipment body 500, a control device 520, and a mechanical arm 540 for grabbing an object; a first camera 510 is arranged on the equipment body, and a second camera 530 is arranged on the mechanical arm 540; wherein:
the first camera 510 is configured to capture a first image and transmit the first image to the control device 520; the first image comprises a target object;
the control device 520 is configured to determine a first position where a key point on the target object is located according to the first image; controlling the mechanical arm 540 to move so that the second camera device 530 is located at a second position corresponding to the first position;
the second camera 530 is used for shooting a second image of the target object at the second position and transmitting the second image to the control device 520;
the control device 520 is further configured to determine pose information corresponding to the target object according to the second image; according to the pose information, controlling the mechanical arm 540 to move from the second position to the grabbing position corresponding to the target object, and driving the mechanical arm 540 to execute grabbing action;
the mechanical arm 540 is used for grabbing the target object under the driving of the control device.
By means of the intelligent equipment, the first position where the key point on the target object is located is determined according to the first image shot by the first camera device installed on the equipment body, wherein the first image comprises the target object. The mechanical arm is then controlled to move to a second position corresponding to the first position, and the pose information corresponding to the target object is determined according to a second image of the target object shot at the second position by the second camera device installed on the mechanical arm. Therefore, according to the technical scheme, the first camera device on the equipment body, the second camera device on the mechanical arm and the mechanical arm cooperate with each other, so that the second camera device can be controlled to shoot the target object at an accurate shooting position, and accurate pose information can be acquired. Furthermore, according to the pose information, the mechanical arm is controlled to move from the second position to the grabbing position corresponding to the target object, and the target object is grabbed at the grabbing position, so that the intelligent device can grab the object based on the accurate pose information, and the accuracy of grabbing the object by the intelligent device is improved.
In one embodiment, the first camera comprises an RGB camera and a depth camera; the first image comprises an RGB image and a depth image; the RGB camera is used for shooting the RGB image; the depth camera is used for shooting the depth image;
the control device 520 is further configured to determine first point cloud information corresponding to the first image according to the RGB image and the depth image captured by the first camera 510, the first point cloud information comprising three-dimensional coordinate information corresponding to each point on the target object; determine a point on the target object meeting a preset condition as the key point based on the first point cloud information, the preset condition comprising at least one of the following: the point is closest in distance to the first camera 510; the point belongs to a specified type of point on the target object; and determine the first position of the key point according to the three-dimensional coordinate information corresponding to the key point.
In one embodiment, the control device 520 is further configured to:
performing image detection on the RGB image by using a preset target detection algorithm to determine the minimum bounding box of the target object on the first image; and determining the first point cloud information of the target object within the minimum bounding box according to the RGB information of each pixel point carried by the RGB image and the depth information of each pixel point carried by the depth image.
In one embodiment, the control device 520 is further configured to:
determining the second position corresponding to the first position according to a preset positional relationship between the first position and the second position; controlling the robotic arm 540 to move to the second position; wherein the preset positional relationship comprises at least one of the following: the second position is located at a preset height above the first position; a preset angle is formed between the line connecting the second position and the first position and the line connecting the first position and the first camera device; the center position of the minimum bounding box of the target object on the second image coincides with the center position of the second image.
In one embodiment, the control device 520 is further configured to:
before determining a first position of a key point on a target object according to a first image shot by the first camera 510, shooting a first sample image of a sample object by the first camera 510; determining a third position where a sample key point on the sample object is located according to the first sample image; controlling the second camera 530 to move, and in the moving process of the second camera 530, shooting the sample object by using the second camera 530 to obtain a second sample image of the sample object; if the second sample image meets a preset image condition, determining that the position of the second camera 530 corresponding to the second sample image is a fourth position; and determining the preset position relation between the first position and the second position according to the position relation between the third position and the fourth position.
In one embodiment, the second camera 530 includes an RGB camera and a depth camera; the second image comprises an RGB image and a depth image; the RGB camera is used for shooting the RGB image; the depth camera is used for shooting the depth image;
the control device 520 is further configured to determine second point cloud information corresponding to the second image according to the RGB image and the depth image captured by the second camera 530, and to determine the pose information of the target object according to the second point cloud information and a pose estimation algorithm; the pose information includes attitude information and/or centroid position information of the target object.
Based on the same idea, an embodiment of the present application further provides an intelligent device, as shown in fig. 6. Smart devices may vary significantly depending on configuration or performance and may include one or more processors 601 and a memory 602, where the memory 602 may store one or more application programs or data. The memory 602 may be transient or persistent storage. The application program stored in the memory 602 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for use in a smart device. Still further, the processor 601 may be configured to communicate with the memory 602 to execute a series of computer-executable instructions in the memory 602 on the smart device. The smart device may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input-output interfaces 605, and one or more keyboards 606.
In particular, in this embodiment, the smart device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the smart device, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:
determining a first position of a key point on a target object according to a first image shot by the first camera device; the first image comprises the target object;
controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position;
determining pose information corresponding to the target object according to a second image of the target object shot by the second camera at the second position;
and controlling the mechanical arm to move from the second position to a grabbing position corresponding to the target object according to the pose information, and grabbing the target object at the grabbing position.
The embodiment of the present application further provides a storage medium, where the storage medium stores one or more computer programs, the one or more computer programs include instructions, and when the instructions are executed by an electronic device including multiple application programs, the electronic device can execute the processes of the above embodiments of the object grabbing method applied to the intelligent device and achieve the same technical effects; to avoid repetition, details are not described here again.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (11)

1. The object grabbing method applied to the intelligent equipment is characterized in that the intelligent equipment comprises an equipment body and a mechanical arm used for grabbing an object; a first camera device is mounted on the equipment body, and a second camera device is mounted on the mechanical arm; the method comprises the following steps:
determining a first position of a key point on a target object according to a first image shot by the first camera device; the first image comprises the target object;
controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position;
determining pose information corresponding to the target object according to a second image of the target object shot by the second camera at the second position;
and controlling the mechanical arm to move from the second position to a grabbing position corresponding to the target object according to the pose information, and grabbing the target object at the grabbing position.
2. The method of claim 1, wherein the first camera comprises an RGB camera and a depth camera; the first image comprises an RGB image captured by the RGB camera and a depth image captured by the depth camera;
the determining a first position where a key point on a target object is located according to a first image shot by the first camera device includes:
determining first point cloud information corresponding to the first image according to the RGB image and the depth image shot by the first camera device; the first point cloud information comprises three-dimensional coordinate information corresponding to each point on the target object;
determining a point on the target object that meets a preset condition as the key point based on the first point cloud information; the preset condition comprises at least one of the following: the point is closest to the first camera device among the points on the target object; the point belongs to a specified type on the target object;
and determining the first position of the key point according to the three-dimensional coordinate information corresponding to the key point.
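As one illustration of the preset condition in claim 2, the key point can be taken as the point on the target object nearest the first camera device. The sketch below assumes the first point cloud is expressed in the first camera device's coordinate frame, so that the camera sits at the origin; the function name is illustrative.

```python
import numpy as np

def closest_keypoint(points: np.ndarray) -> np.ndarray:
    """Pick the point nearest the first camera device as the key point.

    points: (N, 3) first point cloud in the first camera device's frame,
    so the distance to the camera is just the norm of each point.
    Illustrative reading of the 'closest point' preset condition only.
    """
    distances = np.linalg.norm(points, axis=1)
    return points[np.argmin(distances)]
```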
3. The method according to claim 2, wherein the determining the first point cloud information corresponding to the first image according to the RGB image and the depth image captured by the first camera comprises:
performing image detection on the RGB image by using a preset target detection algorithm to determine a minimum external frame of the target object on the first image;
and determining the first point cloud information of the target object in the minimum external frame according to the RGB information of each pixel point carried by the RGB image and the depth information of each pixel point carried by the depth image.
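Claim 3 builds the first point cloud from the pixels inside the minimum external frame by fusing per-pixel RGB with per-pixel depth. A minimal sketch of that back-projection under a pinhole camera model is shown below; it assumes the depth image is registered to the RGB image and that the intrinsics fx, fy, cx, cy are known, and all names are illustrative.

```python
import numpy as np

def point_cloud_in_box(depth, rgb, box, fx, fy, cx, cy):
    """Back-project the pixels inside box = (x0, y0, x1, y1) into 3-D points.

    depth: (H, W) depth in metres, registered to rgb of shape (H, W, 3).
    Returns (N, 3) coordinates and the corresponding (N, 3) colours.
    Illustrative sketch; a real system would also filter noisy depth.
    """
    x0, y0, x1, y1 = box
    us, vs = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))  # pixel grid
    z = depth[y0:y1, x0:x1]
    valid = z > 0                                               # drop missing depth
    u = us[valid].astype(np.float64)
    v = vs[valid].astype(np.float64)
    z = z[valid]
    x = (u - cx) * z / fx                                       # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    colors = rgb[y0:y1, x0:x1][valid]
    return points, colors
```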
4. The method of claim 1, wherein the controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position comprises:
determining the second position corresponding to the first position according to a preset position relation between the first position and the second position;
controlling the mechanical arm to move to the second position;
wherein the preset position relationship comprises at least one of the following: the second position is located at a preset height above the first position; a preset angle is formed between the connecting line of the second position and the first position and the connecting line of the first position and the first camera device; and the center position of the minimum external frame of the target object on the second image coincides with the center position of the second image.
5. The method according to claim 1, wherein before determining the first position of the key point on the target object from the first image captured by the first camera, the method further comprises:
shooting a first sample image of a sample object by using the first camera device;
determining a third position where a sample key point on the sample object is located according to the first sample image;
controlling the second camera device to move, and shooting the sample object by using the second camera device in the moving process of the second camera device to obtain a second sample image of the sample object;
if the second sample image meets a preset image condition, determining that the position of the second camera device corresponding to the second sample image is a fourth position;
and determining the preset position relation between the first position and the second position according to the position relation between the third position and the fourth position.
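Read in its simplest, pure-translation form, the calibration in claim 5 amounts to recording the offset between the third position (where the sample key point was observed) and the fourth position (where the second camera device satisfied the image condition), and reusing that offset at run time as the preset position relationship of claim 4. The sketch below assumes that reading; a real relationship could also carry the preset angle or an image-centering constraint.

```python
import numpy as np

def calibrate_offset(third_position, fourth_position):
    """Preset position relationship as a fixed translation (claim 5)."""
    return np.asarray(fourth_position, dtype=float) - np.asarray(third_position, dtype=float)

def second_position_from_offset(first_position, offset):
    """Apply the calibrated offset to a newly detected first position (claim 4)."""
    return np.asarray(first_position, dtype=float) + offset
```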
6. The method of claim 1, wherein the second camera comprises an RGB camera and a depth camera; the second image comprises an RGB image captured by the RGB camera and a depth image captured by the depth camera;
the determining, according to a second image of the target object captured by the second imaging device at the second position, pose information corresponding to the target object includes:
determining second point cloud information corresponding to the second image according to the RGB image and the depth image shot by the second camera device;
determining pose information of the target object according to the second point cloud information and a pose estimation algorithm; the pose information includes posture information and/or centroid position information of the target object.
7. The intelligent equipment is characterized by comprising a control device, an equipment body and a mechanical arm for grabbing an object; a first camera device is mounted on the equipment body, and a second camera device is mounted on the mechanical arm; wherein:
the first camera shooting device is used for shooting a first image and transmitting the first image to the control device; the first image comprises a target object;
the control device is used for determining a first position where a key point on the target object is located according to the first image; controlling the mechanical arm to move so that the second camera device is located at a second position corresponding to the first position;
the second camera shooting device is used for shooting a second image of the target object at the second position and transmitting the second image to the control device;
the control device is further configured to determine pose information corresponding to the target object according to the second image, control the mechanical arm to move from the second position to the grabbing position corresponding to the target object according to the pose information, and drive the mechanical arm to execute a grabbing action;
the mechanical arm is used for grabbing the target object under the driving of the control device.
8. The smart device of claim 7, wherein the first camera comprises an RGB camera and a depth camera; the first image comprises an RGB image and a depth image; the RGB camera is used for shooting the RGB image; the depth camera is used for shooting the depth image;
the control device is further configured to determine first point cloud information corresponding to the first image according to the RGB image and the depth image captured by the first camera device; the first point cloud information comprises three-dimensional coordinate information corresponding to each point on the target object; determine, based on the first point cloud information, a point on the target object that meets a preset condition as the key point; the preset condition comprises at least one of the following: the point is closest to the first camera device among the points on the target object; the point belongs to a specified type on the target object; and determine the first position of the key point according to the three-dimensional coordinate information corresponding to the key point.
9. The smart device of claim 7, wherein the second camera comprises an RGB camera and a depth camera; the second image comprises an RGB image and a depth image; the RGB camera is used for shooting the RGB image; the depth camera is used for shooting the depth image;
the control device is further used for determining second point cloud information corresponding to the second image according to the RGB image and the depth image shot by the second camera device; determining pose information of the target object according to the second point cloud information and a pose estimation algorithm; the pose information includes posture information and/or centroid position information of the target object.
10. An intelligent device comprising a processor and a memory electrically connected to the processor, the memory storing a computer program, the processor being configured to invoke and execute the computer program from the memory to implement the method of any of claims 1-6.
11. A storage medium for storing a computer program executable by a processor for performing the method of any one of claims 1 to 6.
CN202111284094.6A 2021-11-01 2021-11-01 Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium Pending CN114037753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111284094.6A CN114037753A (en) 2021-11-01 2021-11-01 Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111284094.6A CN114037753A (en) 2021-11-01 2021-11-01 Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114037753A true CN114037753A (en) 2022-02-11

Family

ID=80142495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111284094.6A Pending CN114037753A (en) 2021-11-01 2021-11-01 Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114037753A (en)

Similar Documents

Publication Publication Date Title
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
JP6692107B1 (en) Method and computing system for object identification
CN109752003B (en) Robot vision inertia point-line characteristic positioning method and device
CN108381549B (en) Binocular vision guide robot rapid grabbing method and device and storage medium
CN109955244B (en) Grabbing control method and device based on visual servo and robot
CN111844019A (en) Method and device for determining grabbing position of machine, electronic device and storage medium
CN112509036B (en) Pose estimation network training and positioning method, device, equipment and storage medium
CN111383263A (en) System, method and device for grabbing object by robot
CN113379849A (en) Robot autonomous recognition intelligent grabbing method and system based on depth camera
CN116249607A (en) Method and device for robotically gripping three-dimensional objects
CN111598172A (en) Dynamic target grabbing posture rapid detection method based on heterogeneous deep network fusion
CN116330306B (en) Object grabbing method and device, storage medium and electronic equipment
Buchholz et al. Bin-picking—5 decades of research
US20210156710A1 (en) Map processing method, device, and computer-readable storage medium
JP2018122376A (en) Image processing device, robot control device, and robot
JP2778430B2 (en) Three-dimensional position and posture recognition method based on vision and three-dimensional position and posture recognition device based on vision
CN114037753A (en) Object grabbing method applied to intelligent equipment, intelligent equipment and storage medium
CN113795358A (en) Coordinate system calibration method and device and computer readable medium
CN115272410A (en) Dynamic target tracking method, device, equipment and medium without calibration vision
CN113345023A (en) Positioning method and device of box body, medium and electronic equipment
Fontana et al. Flexible vision based control for micro-factories
Saponaro et al. Towards auto-calibration of smart phones using orientation sensors
Chowdhury et al. Neural Network-Based Pose Estimation Approaches for Mobile Manipulation
TWI834495B (en) Object posture recognition method and system
CN116152167B (en) Sliding detection method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination