WO2022227678A1 - Three-dimensional target detection method and grabbing method, apparatus, and electronic device - Google Patents


Info

Publication number
WO2022227678A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
camera
world
coordinate system
target
Prior art date
Application number
PCT/CN2021/143443
Other languages
French (fr)
Chinese (zh)
Inventor
刘亦芃
杜国光
赵开勇
Original Assignee
达闼机器人股份有限公司
Priority date
Filing date
Publication date
Application filed by 达闼机器人股份有限公司
Publication of WO2022227678A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85 Stereo camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • the embodiments of the present disclosure relate to the technical field of computer vision, and in particular, to a three-dimensional target detection method, a grasping method, an apparatus, and an electronic device.
  • Three-dimensional object detection refers to the technology of detecting the three-dimensional space coordinates of objects.
  • In the field of autonomous driving, 3D object detection can be used to control vehicles and avoid collisions; in the field of service robots, it enables objects to be grasped accurately.
  • 3D target detection generally takes point cloud data as input and outputs the minimum circumscribed rectangle, the category, and the corresponding confidence of the target recognition object.
  • However, 3D target detection in the related art generally requires the camera extrinsic parameters, which are used to convert point cloud data in the camera coordinate system into point cloud data in the world coordinate system.
  • When the camera extrinsic parameters cannot be obtained, the detection accuracy of three-dimensional targets in the related art is low.
  • the embodiments of the present disclosure provide a three-dimensional target detection method, a grasping method, an apparatus, and an electronic device, which are used to solve the problem of low three-dimensional target detection accuracy existing in the prior art.
  • a three-dimensional target detection method, comprising:
  • acquiring a depth image containing a target recognition object;
  • generating a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system;
  • converting the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system;
  • performing target detection on the world point cloud according to a preset target recognition model, so as to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system;
  • generating the minimum circumscribed rectangle of the target recognition object in the camera coordinate system according to its minimum circumscribed rectangle in the world coordinate system.
  • the converting the camera point cloud into the world point cloud includes: registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system; and converting the camera point cloud into the world point cloud according to the transformation matrix.
  • the registering includes: calculating the means of the camera point cloud in three dimensions; constructing a homogeneous transformation matrix according to the means and setting it as the initial value of the iterative closest point algorithm; and generating the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
  • the converting the camera point cloud into the world point cloud according to the transformation matrix includes: determining the rotation matrix corresponding to the transformation matrix; if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generating the world point cloud according to the rotation matrix and the camera point cloud; and if the rotation angle is not greater than 90 degrees, generating the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.
  • the method further includes: constructing a point cloud data training set, where the training set includes multiple sets of world point cloud data and label information corresponding to each set of world point cloud data; and training a preset target recognition algorithm with the training set to generate the target recognition model.
  • the constructing the point cloud data training set includes: constructing a three-dimensional model library that includes three-dimensional models of a plurality of recognized objects; after aligning each recognized object to the world coordinate system, calculating the initial value of its minimum circumscribed rectangle; placing each recognized object in a simulated position and calculating the simulated value of its minimum circumscribed rectangle at that position; randomly generating camera viewpoints and rendering based on them to generate each recognized object's camera point cloud data; converting that camera point cloud data into corresponding world point cloud data; and adding label information to the world point cloud data.
  • a three-dimensional target grasping method including the above-mentioned three-dimensional target detection method, the grasping method further including: determining the spatial position of the target recognition object according to its minimum circumscribed rectangle in the camera coordinate system; and generating a grasping instruction according to the spatial position, so that a grasper grasps the target recognition object according to the grasping instruction.
  • a three-dimensional target detection device comprising:
  • an acquisition module, used to acquire the depth image containing the target recognition object;
  • a first generation module configured to generate a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, where the camera point cloud is a point cloud in a camera coordinate system;
  • a conversion module for converting the camera point cloud into a world point cloud, where the world point cloud is a point cloud in the world coordinate system;
  • a second generation module configured to perform target detection on the world point cloud according to a preset target recognition model, so as to generate a circumscribed minimum rectangle of the target recognition object in the world coordinate system;
  • the third generation module is configured to generate the minimum circumscribed rectangle of the target identifier in the camera coordinate system according to the circumscribed minimum rectangle of the target identifier in the world coordinate system.
  • the conversion module includes:
  • a registration unit for registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system;
  • a conversion unit configured to convert the camera point cloud into a world point cloud according to the transformation matrix.
  • the registration unit is configured to: calculate the means of the camera point cloud in three dimensions; construct a homogeneous transformation matrix according to the means and set it as the initial value of the iterative closest point algorithm; and generate the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
  • the conversion unit is configured to: determine the rotation matrix corresponding to the transformation matrix; if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generate the world point cloud according to the rotation matrix and the camera point cloud; and otherwise, generate the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.
  • the apparatus further includes a training module for:
  • construct a point cloud data training set, where the training set includes multiple sets of world point cloud data and label information corresponding to each set of world point cloud data;
  • a preset target recognition algorithm is trained by using the point cloud data training set to generate the target recognition model.
  • the training module is configured to: construct a three-dimensional model library including three-dimensional models of a plurality of recognized objects; after aligning each recognized object to the world coordinate system, calculate the initial value of its minimum circumscribed rectangle; place each recognized object in a simulated position and calculate the simulated value of its minimum circumscribed rectangle; randomly generate camera viewpoints and render based on them to generate each recognized object's camera point cloud data; convert that camera point cloud data into corresponding world point cloud data; and add label information to the world point cloud data.
  • a three-dimensional target grasping device which is characterized by comprising the above-mentioned three-dimensional target detection device, and the three-dimensional target grasping device further includes:
  • a spatial determination module, configured to determine the spatial position of the target recognition object according to the minimum circumscribed rectangle of the target recognition object in the camera coordinate system;
  • a grasping module configured to generate a grasping instruction according to the spatial position, so that the grasper grasps the target identification object according to the grasping instruction.
  • an electronic device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
  • the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to perform the operations of the above-mentioned three-dimensional target detection method or the above-mentioned three-dimensional target grasping method.
  • a computer-readable storage medium, where at least one executable instruction is stored in the storage medium; when the executable instruction runs on an electronic device, it causes the electronic device to perform the operations of the above-mentioned three-dimensional target detection method or the above-mentioned three-dimensional target grasping method.
  • a computer program comprising instructions that, when executed on a computer, cause the computer to perform operations according to the above-mentioned three-dimensional target detection method or the above-mentioned three-dimensional target grasping method.
  • Through the depth image and the camera intrinsic parameters, a camera point cloud corresponding to the depth image can be generated. After the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system. Further, the minimum circumscribed rectangle of the target recognition object in the camera coordinate system can be generated from its minimum circumscribed rectangle in the world coordinate system, completing the detection of the target recognition object. It can be seen that, without acquiring the camera extrinsic parameters, the embodiments of the present disclosure can still generate the minimum circumscribed rectangle of the target recognition object in the camera coordinate system based on the camera point cloud, which improves the detection accuracy of the target recognition object.
  • FIG. 1 shows a schematic flowchart of a three-dimensional target detection method provided by an embodiment of the present disclosure
  • FIG. 2( a ) shows a schematic diagram of a placement scene of an identification object and a simulated position of a corresponding camera provided by an embodiment of the present disclosure
  • Fig. 2(b) shows a schematic diagram of the rendering effect of the camera in Fig. 2(a);
  • FIG. 3( a ) shows a schematic diagram of another identification object placement scene and a corresponding camera simulation position provided by an embodiment of the present disclosure
  • Fig. 3(b) shows a schematic diagram of the rendering effect of the camera in Fig. 3(a);
  • FIG. 4 shows a schematic flowchart of a three-dimensional target grasping method provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic structural diagram of a three-dimensional target detection apparatus provided by an embodiment of the present disclosure
  • FIG. 6 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 1 shows a flowchart of a three-dimensional target detection method according to an embodiment of the present disclosure, and the method is executed by an electronic device.
  • the memory of the electronic device is used to store at least one executable instruction, and the executable instruction enables the processor of the electronic device to perform the operations of the above-mentioned three-dimensional target detection method.
  • the electronic device can be a robot, a car, a computer or other terminal equipment. As shown in Figure 1, the method includes the following steps:
  • Step 110 Acquire a depth image containing the target identifier.
  • the depth image may be an RGBD image, that is, an RGB color image with a per-pixel depth channel.
  • the target recognition object in the depth image is the recognition object that needs to be detected.
  • the target identifier can be, for example, a water glass, a beverage bottle, a fruit, and the like.
  • a depth image containing the target recognition object can be obtained by photographing the scene containing the target recognition object by the depth camera.
  • Step 120 Generate a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, where the camera point cloud is a point cloud in a camera coordinate system.
  • the camera point cloud corresponding to the depth image can be generated according to the depth image and the camera internal parameters, and the camera point cloud is the point cloud in the camera coordinate system.
  • the camera intrinsic parameters are parameters determined by the characteristics of the camera that captures the depth image, and generally include the camera's focal length, pixel size, and the like.
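As a concrete illustration of this step, the following is a minimal sketch of pinhole back-projection in Python, assuming intrinsic parameters fx, fy (focal lengths in pixels) and cx, cy (principal point), and a depth map stored in millimetres; the function name and the depth scale are illustrative, not taken from the disclosure:

```python
import numpy as np

def depth_to_camera_cloud(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project a depth image into a camera-frame point cloud (pinhole model)."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # pixel row/column grids
    z = depth.astype(np.float64) / depth_scale   # raw depth units -> metres
    x = (u - cx) * z / fx                        # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy                        # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]              # discard pixels with no depth reading
```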
  • Step 130 Convert the camera point cloud into a world point cloud, where the world point cloud is a point cloud in a world coordinate system.
  • the camera point cloud can be registered with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system, and the camera point cloud can be converted into a world point cloud according to the transformation matrix .
  • the mean value of the camera point cloud in the three dimensions can be calculated separately, the homogeneous transformation matrix can be constructed according to the mean value, and the homogeneous transformation matrix is set as the initial value of the iterative closest point algorithm.
  • The transformation matrix from the camera coordinate system to the world coordinate system is then generated according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
  • Next, the rotation matrix corresponding to the transformation matrix is determined. If the rotation angle corresponding to the rotation matrix is greater than 90 degrees, the world point cloud is generated according to the rotation matrix and the camera point cloud; if the rotation angle is not greater than 90 degrees, the world point cloud is generated according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud. For example, if the rotation angle does not exceed 90 degrees, the difference between 180 degrees and the rotation angle is used as the rotation angle of the rotation matrix.
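The registration and gravity-alignment logic described above can be sketched as follows, assuming Open3D for the iterative closest point algorithm and SciPy for the axis-angle decomposition; the plane extent, point spacing, and correspondence distance are placeholder values not given in the disclosure:

```python
import numpy as np
import open3d as o3d
from scipy.spatial.transform import Rotation

def camera_to_world_cloud(cam_pts):
    # Preset plane point cloud perpendicular to the gravity (z) axis.
    xs, ys = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
    plane = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=-1)

    # Initial value for ICP: a homogeneous matrix built from the per-axis means.
    init = np.eye(4)
    init[:3, 3] = -cam_pts.mean(axis=0)

    src, dst = o3d.geometry.PointCloud(), o3d.geometry.PointCloud()
    src.points = o3d.utility.Vector3dVector(cam_pts)
    dst.points = o3d.utility.Vector3dVector(plane)
    icp = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=0.05, init=init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

    R = icp.transformation[:3, :3]
    rotvec = Rotation.from_matrix(R).as_rotvec()
    angle = np.degrees(np.linalg.norm(rotvec))
    if 0 < angle <= 90:
        # Complementary-angle case: rotate by (180 - angle) about the same axis.
        axis = rotvec / np.linalg.norm(rotvec)
        R = Rotation.from_rotvec(axis * np.radians(180 - angle)).as_matrix()
    return cam_pts @ R.T  # rotate every camera-frame point into the world frame
```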
  • Step 140 perform target detection on the world point cloud according to a preset target recognition model, so as to generate a circumscribed minimum rectangle of the target recognition object in the world coordinate system.
  • the target detection can be performed on the world point cloud according to the preset target recognition model, so as to generate the circumscribed minimum rectangle of the target recognition object in the world coordinate system.
  • the minimum circumscribed rectangle, that is, the minimum circumscribed cuboid, also known as the bounding box, is an algorithm for solving the optimal enclosing space of a discrete point set: a slightly larger volume with simple geometry is used in place of a complex geometric object.
  • the minimum circumscribed rectangle of the target recognition object can be, for example, an axis-aligned bounding box (AABB), a bounding sphere, an oriented bounding box (OBB), or a fixed-direction convex hull (FDH).
  • the target recognition algorithm can be trained based on deep learning to generate the target recognition model. The training process of the target recognition algorithm is described in detail below.
  • the target recognition model may be, for example, a Vote Net network (a three-dimensional target detection network).
  • the Vote Net network is an end-to-end 3D object detection network based on the synergy of deep point set networks and Hough voting.
  • the point cloud data training set can be constructed as follows:
  • constructing a 3D model library, which includes the 3D models of multiple recognized objects, and aligning each object to the world coordinate system (x-axis to the right, y-axis forward, z-axis up) so that the object stands upright, with its long axis along the y-axis, its width along the x-axis, and its height along the z-axis.
  • the principal component analysis method can be used to calculate the minimum circumscribed rectangle of each recognized object.
  • an identification object placement scene for simulation is constructed, each identification object is placed in a simulation position under the placement scene, and the circumscribed minimum rectangle of each identification object at the simulation position is calculated.
  • The placement position is the spatial position of each recognized object within a preset spatial range in the world coordinate system. After a recognized object is aligned to the world coordinate system, its initial position is determined, and its placement position is then determined by a translation matrix and a rotation matrix, where the rotation matrix is a rotation about the z-axis. Further, multiple camera viewpoints can be randomly generated, and the world point cloud data can be rendered from each viewpoint to generate the camera point cloud data of each recognized object for that viewpoint; the category of the recognized object corresponding to the camera point cloud data is saved, together with the centroid, length, width, height, and rotation angle about the z-axis of the corresponding minimum circumscribed rectangle.
  • FIG. 2(a) shows a schematic diagram of an object placement scene and a simulated position of a corresponding camera provided by an embodiment of the present disclosure
  • Fig. 2(b) shows a schematic diagram of the rendering effect of the camera in Fig. 2(a). In Fig. 2(a), the camera viewpoint is randomly generated, and the point cloud data of the object in the world coordinate system is rendered from that viewpoint, which gives the rendering effect shown in Fig. 2(b).
  • FIG. 3(a) shows another object placement scene and a schematic diagram of a corresponding camera simulation position provided by an embodiment of the present disclosure
  • FIG. 3(b) shows a schematic diagram of the rendering effect of the camera in FIG. 3(a).
  • the following describes the process of calculating the circumscribed minimum rectangle of the recognized object by using the principal component analysis method.
  • Let M be a 3 × n matrix representing the point cloud coordinates in three-dimensional space, where n is the number of points.
  • Let mean(M) denote the 3 × n matrix formed from the means of M in the three dimensions; that is, the elements within each row of mean(M) are equal, and each row equals the mean of the matrix M in the corresponding dimension.
  • Performing principal component analysis on the centered point cloud M - mean(M) yields an eigenvector matrix V; rearranging the column vectors of V gives the eigenvector matrices V′ corresponding to the six different placement modes of the recognized object.
  • From these, the corrected point clouds M′ of the recognized object in the six different placement states can be obtained.
  • Translating M′ to the origin, that is, M′ ← M′ - mean(M′), the minimum circumscribed rectangle B of the corrected point cloud M′ can then be calculated.
  • Here xmin, ymin, and zmin are the minimum values of the corrected point cloud M′ in the x-axis, y-axis, and z-axis directions, respectively, and xmax, ymax, and zmax are the corresponding maximum values.
  • θ is the rotation angle of the corrected point cloud M′ about the z-axis, and t_x, t_y, and t_z are the translation components of the corrected point cloud M′.
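Under the reconstruction above, a minimal NumPy sketch of this principal component analysis step follows (for a single placement mode; enumerating the six column rearrangements of V is omitted, and the function name is illustrative):

```python
import numpy as np

def pca_min_box(M):
    """M: 3 x n matrix of point coordinates; returns the corrected cloud and box extents."""
    centered = M - M.mean(axis=1, keepdims=True)    # M - mean(M)
    cov = centered @ centered.T / M.shape[1]        # 3 x 3 covariance of the centered cloud
    _, V = np.linalg.eigh(cov)                      # columns of V are the eigenvectors
    M_prime = V.T @ centered                        # corrected point cloud M'
    M_prime -= M_prime.mean(axis=1, keepdims=True)  # translate M' to the origin
    mins = M_prime.min(axis=1)                      # xmin, ymin, zmin
    maxs = M_prime.max(axis=1)                      # xmax, ymax, zmax
    return M_prime, mins, maxs                      # the box B spans mins..maxs per axis
```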
  • the following describes the process of randomly generating the camera perspective and rendering the point cloud in the world coordinate system based on the camera perspective.
  • Randomly generate the position matrix C_p = [x_p, y_p, z_p]^T of the virtual camera
  • and the front-facing matrix C_f = [x_f, y_f, z_f]^T.
  • The camera viewing angle of the virtual camera at the corresponding position can be determined by the front-facing matrix, the top-facing matrix, and the left-facing matrix.
  • T_C is the homogeneous transformation matrix of the camera coordinate system relative to the world coordinate system; its inverse is the camera extrinsic parameter matrix, and its rotation part is the orientation transformation matrix of the camera coordinate system relative to the world coordinate system.
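A sketch of assembling T_C from the randomly generated position and facing directions follows; the column ordering of the orientation matrix is an assumption, since the source's equation is garbled at this point:

```python
import numpy as np

def camera_pose(position, front, up):
    """Homogeneous transform T_C of the camera frame relative to the world frame."""
    f = np.asarray(front, dtype=float)
    f /= np.linalg.norm(f)                        # front-facing direction C_f
    left = np.cross(up, f)
    left /= np.linalg.norm(left)                  # left-facing direction
    top = np.cross(f, left)                       # re-orthogonalised top-facing direction
    T = np.eye(4)
    T[:3, :3] = np.column_stack([left, top, f])   # orientation matrix (assumed column order)
    T[:3, 3] = position                           # camera position C_p
    return T                                      # the extrinsic matrix is np.linalg.inv(T)
```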
  • Vote Net can only predict rotation about a single axis relatively well, so before training the Vote Net network based on deep learning, the camera point cloud of the recognized object must be transformed into the world point cloud, that is, the direction of gravity must be aligned with the -z axis. Further, the camera point cloud of the recognized object can be converted into the recognized object's world point cloud based on the iterative closest point algorithm. The process of transforming the camera point cloud of the recognized object into the world point cloud is described below.
  • First, a homogeneous transformation matrix is constructed as the initial value for the iterative closest point algorithm. Since the background desktop occupies a large proportion of the scene in which the recognized objects are placed, and the point cloud corresponding to the background desktop is correspondingly large, a plane point cloud perpendicular to the z-axis is generated, and the iterative closest point algorithm is used to perform plane registration and compute the transformation matrix from the recognized object's camera point cloud to the plane point cloud. The transformation matrix includes a translation matrix and a rotation matrix, and the rotation angle corresponding to the rotation matrix can then be determined.
  • Normally, the rotation angle of the (0,0,1)^T vector should exceed 90 degrees; if the rotation angle of the (0,0,1)^T vector does not exceed 90 degrees, the difference between 180 degrees and that rotation angle is used as the rotation angle of the rotation matrix. Finally, the camera point cloud is converted into the world point cloud through the rotation matrix, so that the -z axis is consistent with the direction of gravity.
  • the point cloud data training set can be constructed by converting the camera point cloud data corresponding to the camera perspective at each placement position into the world point cloud data, and adding label information to the world point cloud data.
  • the label information may include, for example, the category of the corresponding identifier, and the centroid, length, width, height, and rotation angle around the z-axis of the circumscribed smallest rectangle corresponding to the simulation position.
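For concreteness, one such label record might look as follows in Python; the field names and values are illustrative, not defined by the disclosure:

```python
# One training label for a set of world point cloud data.
label = {
    "category": "beverage_bottle",      # class of the recognized object
    "centroid": (0.12, -0.05, 0.03),    # box centroid in the world frame, metres
    "size_lwh": (0.07, 0.07, 0.21),     # length, width, height of the minimum box
    "yaw_deg": 35.0,                    # rotation of the box about the z-axis
}
```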
  • the Vote Net network takes the world point cloud as the input, and outputs the 3D circumscribed minimum rectangle, confidence and category of the target recognition object in the actual placement scene.
  • Detecting 3D targets with the Vote Net network requires only the coordinate information of the world point cloud; detection does not depend heavily on the density of the world point cloud, and the generalization performance is very good.
  • Although Vote Net has achieved good results in 3D object detection for indoor scenes, it has previously only dealt with real data of large indoor objects.
  • In the embodiments of the present disclosure, Vote Net is trained on simulation data and then used to detect world point clouds obtained from real captured data. Since the geometric features of the simulated data and the real captured data differ little, this makes the embodiments of the present disclosure feasible.
  • the following describes the training of the Vote Net network based on the point cloud data training set.
  • When training the Vote Net network, a 2.5D point cloud is first constructed in the simulated scene at a density similar to that of real data; the scene is then captured by a virtual camera, world point cloud data is generated from the captured camera point cloud data, and the label information of each set of world point cloud data is obtained automatically, which improves the training speed of the target recognition model. The world point cloud data with label information is input into the Vote Net network for training, and the total number of training rounds is determined according to the point cloud volume. After the training of the Vote Net network is complete, 3D target detection is performed on world point clouds processed by the iterative closest point algorithm, yielding the 3D minimum circumscribed rectangle, the confidence, and the category of the recognized object corresponding to the camera point cloud data.
  • Step 150 Generate a minimum circumscribed rectangle of the target identifier in the camera coordinate system according to the smallest circumscribed rectangle of the target identifier in the world coordinate system.
  • the minimum circumscribed rectangle of the target recognition object in the world coordinate system can be converted into the minimum circumscribed rectangle of the target recognition object in the camera coordinate system according to the above-mentioned rotation matrix. Specifically, the matrix of the minimum circumscribed rectangle of the target recognition object in the world coordinate system can be right-multiplied by the rotation matrix to obtain the matrix of the minimum circumscribed rectangle of the target recognition object in the camera coordinate system.
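Since the inverse of a rotation matrix is its transpose, this right-multiplication can be sketched in one line, with the box represented by its eight corner points as the rows of an 8 x 3 matrix (an assumed representation):

```python
import numpy as np

def box_world_to_camera(corners_world, R):
    """Map world-frame box corners back to the camera frame.

    If world points were produced as p_world = R @ p_camera, then for row
    vectors p_camera^T = p_world^T @ R, because R^{-1} = R^T for rotations.
    """
    return np.asarray(corners_world) @ R
```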
  • Through the depth image and the camera intrinsic parameters, a camera point cloud corresponding to the depth image can be generated. After the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system. Further, the minimum circumscribed rectangle of the target recognition object in the camera coordinate system can be generated from its minimum circumscribed rectangle in the world coordinate system, completing the detection of the target recognition object. It can be seen that, without acquiring the camera extrinsic parameters, the embodiments of the present disclosure can still generate the minimum circumscribed rectangle of the target recognition object in the camera coordinate system based on the camera point cloud, which improves the detection accuracy of the target recognition object.
  • FIG. 4 shows a flowchart of a three-dimensional object grasping method according to another embodiment of the present disclosure, and the method is executed by an electronic device.
  • the memory of the electronic device is used to store at least one executable instruction, and the executable instruction enables the processor of the electronic device to perform the operations of the above-mentioned three-dimensional object grasping method.
  • the method includes the following steps:
  • Step 210 Determine the spatial position of the target recognition object according to the circumscribed minimum rectangle of the target recognition object in the camera coordinate system.
  • the spatial position of the target recognition object can be determined according to the circumscribed minimum rectangle of the target recognition object in the camera coordinate system.
  • the spatial position of the target identifier includes the spatial coordinates of the target identifier and the rotation angle of the target identifier in the three-dimensional space.
  • Step 220 Generate a grasping instruction according to the spatial position, so that the grasper grasps the target identification object according to the grasping instruction.
  • a grabbing instruction may be generated according to the spatial position of the target identifier, and the grabbing instruction may be sent to a grabber for grabbing the target identifier.
  • the grasper can determine the grasping path of the target identification object according to the grasping instruction, and grasp the target identification object according to the grasping path.
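As an illustration only, a grasping instruction derived from the spatial position might be sketched as below; the field names and the pre-grasp hover height are hypothetical, since the disclosure does not specify a grasper interface:

```python
def make_grasp_instruction(center, yaw_deg, hover=0.10):
    """Build a simple grasp instruction from the box centroid and its z-rotation."""
    x, y, z = center
    return {
        "pregrasp_xyz": (x, y, z + hover),  # approach from above the object first
        "grasp_xyz": (x, y, z),             # close the gripper at the box centroid
        "yaw_deg": yaw_deg,                 # align the gripper with the box orientation
    }
```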
  • the embodiments of the present disclosure generate the minimum circumscribed rectangle of the target recognition object in the camera coordinate system based on the camera point cloud, determine the spatial position of the target recognition object from that rectangle, and generate a grasping instruction according to the spatial position, so that the grasper can accurately grasp the target recognition object according to the grasping instruction.
  • FIG. 5 shows a schematic structural diagram of a three-dimensional object detection apparatus according to an embodiment of the present disclosure.
  • the apparatus 300 includes: an acquisition module 310 , a first generation module 320 , a conversion module 330 , a second generation module 340 and a third generation module 350 .
  • the obtaining module 310 is used to obtain the depth image including the target recognition object;
  • a first generating module 320 configured to generate a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, where the camera point cloud is a point cloud in a camera coordinate system;
  • a conversion module 330 configured to convert the camera point cloud into a world point cloud, where the world point cloud is a point cloud in a world coordinate system;
  • the second generation module 340 is configured to perform target detection on the world point cloud according to a preset target recognition model, so as to generate the circumscribed minimum rectangle of the target recognition object in the world coordinate system;
  • the third generating module 350 is configured to generate a minimum circumscribed rectangle of the target identifier in the camera coordinate system according to the circumscribed minimum rectangle of the target identifier in the world coordinate system.
  • the conversion module 330 includes:
  • a registration unit for registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system;
  • a conversion unit configured to convert the camera point cloud into a world point cloud according to the transformation matrix.
  • the registration unit is configured to: calculate the means of the camera point cloud in three dimensions; construct a homogeneous transformation matrix according to the means and set it as the initial value of the iterative closest point algorithm; and generate the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
  • the conversion unit is used to: determine the rotation matrix corresponding to the transformation matrix; if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generate the world point cloud according to the rotation matrix and the camera point cloud; and otherwise, generate the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.
  • the apparatus 300 further includes a training module for:
  • construct a point cloud data training set, where the training set includes multiple sets of world point cloud data and label information corresponding to each set of world point cloud data;
  • a preset target recognition algorithm is trained by using the point cloud data training set to generate the target recognition model.
  • the training module is used to: construct a three-dimensional model library including three-dimensional models of a plurality of recognized objects; after aligning each recognized object to the world coordinate system, calculate the initial value of its minimum circumscribed rectangle; place each recognized object in a simulated position and calculate the simulated value of its minimum circumscribed rectangle; randomly generate camera viewpoints and render based on them to generate each recognized object's camera point cloud data; convert that camera point cloud data into corresponding world point cloud data; and add label information to the world point cloud data.
  • Through the depth image and the camera intrinsic parameters, a camera point cloud corresponding to the depth image can be generated. After the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system. Further, the minimum circumscribed rectangle of the target recognition object in the camera coordinate system can be generated from its minimum circumscribed rectangle in the world coordinate system, completing the detection of the target recognition object. It can be seen that, without acquiring the camera extrinsic parameters, the embodiments of the present disclosure can still generate the minimum circumscribed rectangle of the target recognition object in the camera coordinate system based on the camera point cloud, which improves the detection accuracy of the target recognition object.
  • FIG. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, and the specific embodiment of the present disclosure does not limit the specific implementation of the electronic device.
  • the electronic device may include: a processor (processor) 402 , a communication interface (Communications Interface) 404 , a memory (memory) 406 , and a communication bus 408 .
  • the processor 402 , the communication interface 404 , and the memory 406 communicate with each other through the communication bus 408 .
  • the communication interface 404 is used for communicating with network elements of other devices such as clients or other servers.
  • the processor 402 is configured to execute the program 410, and specifically may execute the relevant steps in the foregoing embodiments of the three-dimensional target detection method.
  • program 410 may include program code, which includes computer-executable instructions.
  • the processor 402 may be a central processing unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
  • the one or more processors included in the electronic device may be the same type of processors, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 406 is used to store the program 410 .
  • Memory 406 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • the program 410 can be specifically called by the processor 402 to make the electronic device perform the following operations:
  • acquiring a depth image containing the target recognition object;
  • generating a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system;
  • converting the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system;
  • performing target detection on the world point cloud according to a preset target recognition model, so as to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system;
  • generating the minimum circumscribed rectangle of the target recognition object in the camera coordinate system according to its minimum circumscribed rectangle in the world coordinate system.
  • the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
  • registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system;
  • converting the camera point cloud into a world point cloud according to the transformation matrix.
  • the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
  • calculating the means of the camera point cloud in three dimensions; constructing a homogeneous transformation matrix according to the means and setting it as the initial value of the iterative closest point algorithm; and generating the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
  • the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
  • determining the rotation matrix corresponding to the transformation matrix; if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generating the world point cloud according to the rotation matrix and the camera point cloud; and otherwise, generating the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.
  • the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
  • constructing a point cloud data training set, where the training set includes multiple sets of world point cloud data and label information corresponding to each set of world point cloud data;
  • a preset target recognition algorithm is trained by using the point cloud data training set to generate the target recognition model.
  • the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
  • constructing a three-dimensional model library that includes three-dimensional models of a plurality of recognized objects; after aligning each recognized object to the world coordinate system, calculating the initial value of its minimum circumscribed rectangle; placing each recognized object in a simulated position and calculating the simulated value of its minimum circumscribed rectangle; randomly generating camera viewpoints and rendering based on them to generate each recognized object's camera point cloud data; converting that camera point cloud data into corresponding world point cloud data; and adding label information to the world point cloud data.
  • the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
  • determining the spatial position of the target recognition object according to the minimum circumscribed rectangle of the target recognition object in the camera coordinate system;
  • generating a grasping instruction according to the spatial position, so that the grasper grasps the target recognition object according to the grasping instruction.
  • Through the depth image and the camera intrinsic parameters, a camera point cloud corresponding to the depth image can be generated. After the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system. Further, the minimum circumscribed rectangle of the target recognition object in the camera coordinate system can be generated from its minimum circumscribed rectangle in the world coordinate system, completing the detection of the target recognition object. It can be seen that, without acquiring the camera extrinsic parameters, the embodiments of the present disclosure can still generate the minimum circumscribed rectangle of the target recognition object in the camera coordinate system based on the camera point cloud, which improves the detection accuracy of the target recognition object.
  • An embodiment of the present disclosure provides a computer-readable storage medium, where the storage medium stores at least one executable instruction; when the executable instruction runs on an electronic device, it causes the electronic device to execute the three-dimensional target detection method in any of the foregoing method embodiments.
  • An embodiment of the present disclosure provides a three-dimensional target detection apparatus, which is used for executing the above-mentioned three-dimensional target detection method.
  • An embodiment of the present disclosure provides a computer program, and the computer program can be invoked by a processor to cause an electronic device to execute the three-dimensional target detection method in any of the foregoing method embodiments.
  • An embodiment of the present disclosure provides a computer program product; the computer program product includes a computer program stored on a computer-readable storage medium, and the computer program includes program instructions that, when executed on a computer, cause the computer to execute the three-dimensional target detection method in any of the foregoing method embodiments.
  • The modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment.
  • The modules or units or components in the embodiments may be combined into one module or unit or component, and they may likewise be divided into multiple sub-modules or sub-units or sub-assemblies. All features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Abstract

A three-dimensional target detection method and grabbing method, an apparatus, and an electronic device, which relate to the technical field of computer vision. The detection method comprises: obtaining a depth image comprising a target identification object (110); generating a camera point cloud corresponding to the depth image according to the depth image and a camera intrinsic parameter, the camera point cloud being a point cloud in a camera coordinate system (120); converting the camera point cloud into a world point cloud, the world point cloud being a point cloud in a world coordinate system (130); performing target detection on the world point cloud according to a preset target recognition model, so as to generate an external minimal cuboid for the target identification object in the world coordinate system (140); and generating an external minimal cuboid for the target identification object in the camera coordinate system according to the external minimal cuboid for the target identification object in the world coordinate system (150). The present method improves the detection quality of a three-dimensional target.

Description

Three-dimensional target detection method, grasping method, apparatus, and electronic device

Cross-Reference

This application claims priority to Chinese patent application No. 202110473106.3, filed on April 29, 2021 and entitled "Three-dimensional target detection method, grasping method, apparatus and electronic device", the entire contents of which are incorporated herein by reference.

Technical Field

The embodiments of the present disclosure relate to the technical field of computer vision, and in particular to a three-dimensional target detection method, a grasping method, an apparatus, and an electronic device.

Background

Three-dimensional target detection refers to the technology of detecting the three-dimensional spatial coordinates of objects. In the field of autonomous driving, three-dimensional target detection can be used to control vehicles and avoid collisions; in the field of service robots, it enables objects to be grasped accurately.

Three-dimensional target detection generally takes point cloud data as input and outputs the minimum circumscribed rectangle, the category, and the corresponding confidence of the target recognition object. However, three-dimensional target detection in the related art generally requires the camera extrinsic parameters, which are used to convert point cloud data in the camera coordinate system into point cloud data in the world coordinate system. When the camera extrinsic parameters cannot be obtained, the detection accuracy of the related art is low.

Summary of the Invention

In view of the above problems, the embodiments of the present disclosure provide a three-dimensional target detection method, a grasping method, an apparatus, and an electronic device, to solve the problem of low three-dimensional target detection accuracy in the prior art.
According to an aspect of the embodiments of the present disclosure, a three-dimensional target detection method is provided, the method comprising:

acquiring a depth image containing a target recognition object;

generating a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system;

converting the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system;

performing target detection on the world point cloud according to a preset target recognition model, so as to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system;

generating the minimum circumscribed rectangle of the target recognition object in the camera coordinate system according to the minimum circumscribed rectangle of the target recognition object in the world coordinate system.

In an optional manner, the converting the camera point cloud into a world point cloud includes:

registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system;

converting the camera point cloud into a world point cloud according to the transformation matrix.

In an optional manner, the registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system includes:

calculating the means of the camera point cloud in three dimensions;

constructing a homogeneous transformation matrix according to the means, and setting the homogeneous transformation matrix as the initial value of the iterative closest point algorithm;

generating the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.

In an optional manner, the converting the camera point cloud into a world point cloud according to the transformation matrix includes:

determining the rotation matrix corresponding to the transformation matrix;

if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generating the world point cloud according to the rotation matrix and the camera point cloud;

if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, generating the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.

In an optional manner, the method further includes:

constructing a point cloud data training set, the training set including multiple sets of world point cloud data and the label information corresponding to each set of world point cloud data;

training a preset target recognition algorithm with the point cloud data training set to generate the target recognition model.

In an optional manner, the constructing a point cloud data training set includes:

constructing a three-dimensional model library, the library including three-dimensional models of a plurality of recognized objects;

after aligning each recognized object to the world coordinate system, calculating the initial value of the minimum circumscribed rectangle of each recognized object;

placing each recognized object in a simulated position, and calculating the simulated value of the minimum circumscribed rectangle of each recognized object at that position;

randomly generating camera viewpoints, and rendering based on the camera viewpoints to generate the camera point cloud data of each recognized object;

converting the camera point cloud data of each recognized object into corresponding world point cloud data;

adding label information to the corresponding world point cloud data.

According to another aspect of the embodiments of the present disclosure, a three-dimensional target grasping method is provided, including the above three-dimensional target detection method, the grasping method further comprising:

determining the spatial position of the target recognition object according to the minimum circumscribed rectangle of the target recognition object in the camera coordinate system;

generating a grasping instruction according to the spatial position, so that a grasper grasps the target recognition object according to the grasping instruction.
According to another aspect of the embodiments of the present disclosure, a three-dimensional target detection apparatus is provided, the apparatus comprising:

an acquisition module, configured to acquire a depth image containing the target recognition object;

a first generation module, configured to generate a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system;

a conversion module, configured to convert the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system;

a second generation module, configured to perform target detection on the world point cloud according to a preset target recognition model, so as to generate the minimum circumscribed rectangle of the target recognition object in the world coordinate system;

a third generation module, configured to generate the minimum circumscribed rectangle of the target recognition object in the camera coordinate system according to the minimum circumscribed rectangle of the target recognition object in the world coordinate system.

In an optional manner, the conversion module includes:

a registration unit, configured to register the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system;

a conversion unit, configured to convert the camera point cloud into a world point cloud according to the transformation matrix.

In an optional manner, the registration unit is configured to:

calculate the means of the camera point cloud in three dimensions;

construct a homogeneous transformation matrix according to the means, and set the homogeneous transformation matrix as the initial value of the iterative closest point algorithm;

generate the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.

In an optional manner, the conversion unit is configured to:

determine the rotation matrix corresponding to the transformation matrix;

if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generate the world point cloud according to the rotation matrix and the camera point cloud;

if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, generate the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.

In an optional manner, the apparatus further includes a training module, configured to:

construct a point cloud data training set, the training set including multiple sets of world point cloud data and the label information corresponding to each set of world point cloud data;

train a preset target recognition algorithm with the point cloud data training set to generate the target recognition model.

In an optional manner, the training module is configured to:

construct a three-dimensional model library, the library including three-dimensional models of a plurality of recognized objects;

after aligning each recognized object to the world coordinate system, calculate the initial value of the minimum circumscribed rectangle of each recognized object;

place each recognized object in a simulated position, and calculate the simulated value of the minimum circumscribed rectangle of each recognized object at that position;

randomly generate camera viewpoints, and render based on the camera viewpoints to generate the camera point cloud data of each recognized object;

convert the camera point cloud data of each recognized object into corresponding world point cloud data;

add label information to the corresponding world point cloud data.

According to another aspect of the embodiments of the present disclosure, a three-dimensional target grasping apparatus is provided, comprising the above three-dimensional target detection apparatus, the grasping apparatus further comprising:

a spatial determination module, configured to determine the spatial position of the target recognition object according to the minimum circumscribed rectangle of the target recognition object in the camera coordinate system;

a grasping module, configured to generate a grasping instruction according to the spatial position, so that a grasper grasps the target recognition object according to the grasping instruction.

According to another aspect of the embodiments of the present disclosure, an electronic device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the above three-dimensional target detection method or the above three-dimensional target grasping method.

According to yet another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, in which at least one executable instruction is stored; when the executable instruction runs on an electronic device, it causes the electronic device to perform the operations of the above three-dimensional target detection method or the above three-dimensional target grasping method.

According to yet another aspect of the embodiments of the present disclosure, a computer program is provided, comprising instructions that, when run on a computer, cause the computer to perform the operations of the above three-dimensional target detection method or the above three-dimensional target grasping method.
In the embodiments of the present disclosure, a camera point cloud corresponding to a depth image can be generated from the depth image and the camera intrinsic parameters; after the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system; further, the minimum circumscribed cuboid of the target recognition object in the camera coordinate system can be generated from the minimum circumscribed cuboid in the world coordinate system, completing the detection of the target recognition object. It can be seen that the embodiments of the present disclosure can still generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system based on the camera point cloud without acquiring the camera extrinsic parameters, which improves the detection accuracy of the target recognition object.

The above description is only an overview of the technical solutions of the embodiments of the present disclosure. In order that the technical means of the embodiments of the present disclosure can be understood more clearly and implemented according to the contents of the specification, and in order that the above and other objects, features and advantages of the embodiments of the present disclosure can be more clearly understood, specific embodiments of the present disclosure are given below.
Description of the Drawings

The drawings are for illustrative purposes only and are not to be considered limiting of the present disclosure. Throughout the drawings, the same components are denoted by the same reference numerals. In the drawings:
FIG. 1 shows a schematic flowchart of the three-dimensional target detection method provided by an embodiment of the present disclosure;

FIG. 2(a) shows a schematic diagram of a recognition object placement scene and the corresponding simulated camera position provided by an embodiment of the present disclosure;

FIG. 2(b) shows a schematic diagram of the rendering effect of the camera in FIG. 2(a);

FIG. 3(a) shows a schematic diagram of another recognition object placement scene and the corresponding simulated camera position provided by an embodiment of the present disclosure;

FIG. 3(b) shows a schematic diagram of the rendering effect of the camera in FIG. 3(a);

FIG. 4 shows a schematic flowchart of the three-dimensional target grasping method provided by an embodiment of the present disclosure;

FIG. 5 shows a schematic structural diagram of the three-dimensional target detection apparatus provided by an embodiment of the present disclosure;

FIG. 6 shows a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure.
Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein.

FIG. 1 shows a flowchart of the three-dimensional target detection method according to an embodiment of the present disclosure, which is executed by an electronic device. The memory of the electronic device stores at least one executable instruction, and the executable instruction causes the processor of the electronic device to perform the operations of the three-dimensional target detection method. The electronic device may be a robot, a vehicle, a computer or another terminal device. As shown in FIG. 1, the method includes the following steps:
Step 110: Acquire a depth image containing a target recognition object.

The depth image may be an RGBD image, i.e., an image combining RGB color channels with per-pixel depth. The target recognition object in the depth image is the object on which target detection needs to be performed, and may be, for example, a cup, a beverage bottle or a piece of fruit. Generally, a depth image containing the target recognition object can be obtained by photographing a scene containing the target recognition object with a depth camera.
Step 120: Generate a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system.

The camera point cloud corresponding to the depth image can be generated from the depth image and the camera intrinsic parameters. The intrinsic parameters are parameters related to the characteristics of the camera that captured the depth image, and generally include the focal length, the pixel size, and the like.
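As an illustration of this step, the back-projection below is a minimal Python/NumPy sketch, assuming a pinhole camera model with intrinsics fx, fy, cx, cy and a depth image in meters; the function and variable names are illustrative, not part of the disclosure.

```python
import numpy as np

def depth_to_camera_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, meters) into an N x 3 point cloud
    expressed in the camera coordinate system, using the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column/row indices
    z = depth
    x = (u - cx) * z / fx            # x grows along image columns
    y = (v - cy) * z / fy            # y grows along image rows
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth reading
```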
Step 130: Convert the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system.

In an optional manner, the camera point cloud can be registered with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system, and the camera point cloud is converted into the world point cloud according to the transformation matrix. To obtain the transformation matrix, the means of the camera point cloud in the three dimensions are calculated respectively, a homogeneous transformation matrix is constructed from the means and set as the initial value of the iterative closest point (ICP) algorithm, and the transformation matrix from the camera coordinate system to the world coordinate system is generated according to the ICP algorithm and a plane point cloud perpendicular to the gravity axis.
For example, first compute the mean of the camera point cloud in each dimension of three-dimensional space:

$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i,\qquad \bar{z}=\frac{1}{n}\sum_{i=1}^{n}z_i,$

where n is the number of points. Then construct the homogeneous transformation matrix

$T_0=\begin{bmatrix}I_{3\times 3}&t_0\\ \mathbf{0}^{T}&1\end{bmatrix},$

whose translation component $t_0$ is formed from the means $(\bar{x},\bar{y},\bar{z})$, as the initial value of the iterative closest point algorithm. A plane point cloud perpendicular to the gravity axis (z axis) of the world coordinate system is generated, the transformation matrix from the camera point cloud to this plane point cloud is solved, and the camera point cloud is converted into the world point cloud through this transformation matrix.
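As a concrete illustration of this initialization, the following NumPy sketch builds the homogeneous initial value from the per-dimension means and generates the plane point cloud used as the registration target. The sign convention of the translation component (centering the cloud at the origin) and the plane extent and resolution are assumptions made for illustration.

```python
import numpy as np

def icp_initial_guess(camera_points):
    """4x4 homogeneous matrix built from the per-dimension means of the
    camera point cloud, used as the initial value of the ICP algorithm.
    Sign convention (assumed): translate the cloud centroid to the origin."""
    mean = camera_points.mean(axis=0)   # [x_bar, y_bar, z_bar]
    T0 = np.eye(4)
    T0[:3, 3] = -mean
    return T0

def plane_point_cloud(half_size=1.0, step=0.01):
    """Synthetic point cloud of a plane perpendicular to the gravity (z) axis,
    serving as the registration target (extent and resolution are assumed)."""
    xs = np.arange(-half_size, half_size, step)
    xx, yy = np.meshgrid(xs, xs)
    return np.stack([xx.ravel(), yy.ravel(), np.zeros(xx.size)], axis=-1)
```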
In an optional manner, when converting the camera point cloud into the world point cloud according to the transformation matrix, the rotation matrix corresponding to the transformation matrix is first determined. If the rotation angle corresponding to the rotation matrix is greater than 90 degrees, the world point cloud is generated from the rotation matrix and the camera point cloud; if the rotation angle is not greater than 90 degrees, the world point cloud is generated from the camera point cloud using the supplementary rotation amount corresponding to the rotation matrix. For example, if the rotation angle does not exceed 90 degrees, the difference between 180 degrees and the rotation angle is used as the rotation angle of the rotation matrix.
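The angle test described above can be sketched as follows, assuming the 3x3 rotation block R of the ICP transform is already available. Measuring the angle through the rotated gravity vector and applying the Rodrigues formula are implementation choices of this sketch, not requirements of the disclosure.

```python
import numpy as np

def corrected_rotation(R):
    """Given the 3x3 rotation block of the ICP transform, keep R when the
    gravity (z) axis is rotated by more than 90 degrees; otherwise replace
    the angle by its supplement (180 - theta) about the same axis."""
    z = np.array([0.0, 0.0, 1.0])
    cos_theta = np.clip(np.dot(R @ z, z), -1.0, 1.0)
    theta = np.degrees(np.arccos(cos_theta))  # angle by which z is rotated
    if theta > 90.0:
        return R                              # already viewing from above
    axis = np.cross(z, R @ z)                 # rotation axis for the z vector
    n = np.linalg.norm(axis)
    axis = axis / n if n > 1e-8 else np.array([1.0, 0.0, 0.0])
    phi = np.radians(180.0 - theta)           # supplementary rotation amount
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    # Rodrigues formula: R' = I + sin(phi) K + (1 - cos(phi)) K^2
    return np.eye(3) + np.sin(phi) * K + (1 - np.cos(phi)) * (K @ K)
```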
Step 140: Perform target detection on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system.

Target detection can be performed on the world point cloud according to the preset target recognition model to generate the minimum circumscribed cuboid (the 3D bounding box) of the target recognition object in the world coordinate system. The minimum circumscribed cuboid, also called a bounding box, arises from algorithms for finding the optimal enclosing volume of a discrete point set; the basic idea is to approximate a complex geometric object with a slightly larger geometric body of simple shape. The bounding volume of the target recognition object may be, for example, an axis-aligned bounding box (AABB), a bounding sphere, an oriented bounding box (OBB) or a fixed-direction convex hull (FDH). Before target detection is performed on the world point cloud according to the preset target recognition model, a target recognition algorithm can be trained based on deep learning to generate the target recognition model. The training process of the target recognition algorithm is described in detail below.

Before the target recognition algorithm is trained, a point cloud data training set needs to be constructed, including multiple groups of world point cloud data and the label information corresponding to each group. The preset target recognition algorithm is trained with the point cloud data training set to generate the target recognition model. In one embodiment of the present disclosure, the target recognition model may be, for example, a Vote Net network (a three-dimensional target detection network), an end-to-end 3D object detection network based on the synergy of a deep point set network and Hough voting.

In an optional manner, the point cloud data training set can be constructed as follows:
A three-dimensional model library is constructed, including three-dimensional models of a plurality of recognition objects. Each recognition object is aligned upright to the world coordinate system (x axis pointing right, y axis pointing forward, z axis pointing up), so that when the object stands vertically its length corresponds to the y axis, its width to the x axis, and its height to the z axis. The minimum circumscribed cuboid of each recognition object can then be calculated by principal component analysis. Further, a recognition object placement scene is constructed for simulation, each recognition object is placed at a simulated position in the scene, and the minimum circumscribed cuboid of each recognition object at its simulated position is calculated. If the scene includes multiple recognition objects, collision detection can also be performed to ensure that the objects do not collide. The placement position is the spatial position of each recognition object within a preset spatial range in the world coordinate system; after the recognition object is aligned upright to the world coordinate system, its initial position is determined, and its placement position is determined by a translation matrix and a rotation matrix, where the rotation matrix is a rotation about the z axis. Further, multiple camera viewpoints can be randomly generated, and the world point cloud data is rendered from each viewpoint to generate the camera point cloud data of each recognition object for the corresponding viewpoint; the recognition object category corresponding to the camera point cloud data, together with the centroid, length, width, height and rotation angle about the z axis of the corresponding minimum circumscribed cuboid, is saved, as sketched below.
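The label record saved for each rendered sample might look as follows; the field names and values are assumptions made for illustration, not a format prescribed by the disclosure.

```python
# Illustrative label record for one rendered sample (field names are assumed):
label = {
    "category": "bottle",             # class of the recognition object
    "centroid": [0.12, -0.05, 0.43],  # center of the minimum circumscribed cuboid (m)
    "size_lwh": [0.07, 0.07, 0.21],   # length, width, height (m)
    "yaw": 0.52,                      # rotation angle about the z axis (rad)
}
```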
FIG. 2(a) shows an object placement scene and the corresponding simulated camera position provided by an embodiment of the present disclosure, and FIG. 2(b) shows the rendering effect of the camera in FIG. 2(a): in the placement scene of FIG. 2(a), a camera viewpoint is randomly generated, and the object point cloud data in the world coordinate system is rendered from that viewpoint to obtain the rendering effect of FIG. 2(b). Similarly, FIG. 3(a) shows another object placement scene and the corresponding simulated camera position, and FIG. 3(b) shows the rendering effect of the camera in FIG. 3(a), obtained in the same way. It should be noted that, for any object placement scene, multiple camera viewpoints can be randomly generated, and the world point cloud of the recognition object is rendered from each viewpoint to obtain the camera point cloud for that viewpoint.
The process of calculating the minimum circumscribed cuboid of a recognition object by principal component analysis is described below with formulas.

Suppose M is a 3×n matrix representing the point coordinates in three-dimensional space, where n is the number of points. Let mean(M) denote the 3×n matrix formed by the means of M in the three dimensions, i.e., the elements of each row of mean(M) are equal, and equal the mean of the corresponding dimension of M. Define the centered point cloud

$\tilde{M}=M-\mathrm{mean}(M),$

compute the covariance matrix of $\tilde{M}$,

$\mathrm{Corr}=\frac{1}{n}\tilde{M}\tilde{M}^{T},$

and solve for the eigenvalues A and eigenvectors V of Corr such that $\mathrm{Corr}\,V=AV$. Further, the column vectors of V are rearranged to obtain the eigenvector matrices V′ corresponding to six different placement modes of the recognition object.

By computing M′=V′M, the corrected point clouds M′ of the recognition object in the six placement states are obtained. M′ is translated to the origin, i.e., M′=M′−mean(M′), after which the minimum circumscribed cuboid B of the corrected point cloud M′ can be calculated:

$B=\begin{bmatrix}x_{\min}&x_{\max}\\ y_{\min}&y_{\max}\\ z_{\min}&z_{\max}\end{bmatrix},$

where xmin, ymin and zmin are the minima of the corrected point cloud M′ in the x-, y- and z-axis directions, and xmax, ymax and zmax are the corresponding maxima.

Through the rotation matrix about the z axis,

$R=\begin{bmatrix}\cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\ 0&0&1\end{bmatrix},$

and the translation matrix $t=[t_x,t_y,t_z]^{T}$, the corrected point cloud M′ can be placed randomly, and the corrected point cloud is then updated as M′=RM′+t, where θ is the rotation angle of M′ about the z axis, and t_x, t_y and t_z are the translations of M′ along the x, y and z axes, respectively.
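A minimal NumPy sketch of this principal component analysis step is given below. The 1/n normalization of the covariance and the use of the transposed eigenvector matrix to project onto the eigenbasis follow the common PCA convention and are assumptions of the sketch.

```python
import numpy as np

def pca_bounding_cuboid(M):
    """M: 3 x n matrix of point coordinates. Align the cloud with its
    principal axes and return the 3 x 2 matrix B of per-axis extents,
    i.e. the minimum circumscribed cuboid described above."""
    M_tilde = M - M.mean(axis=1, keepdims=True)  # center each dimension
    corr = (M_tilde @ M_tilde.T) / M.shape[1]    # 3x3 covariance matrix
    eigvals, V = np.linalg.eigh(corr)            # Corr V = A V
    M_prime = V.T @ M_tilde                      # project onto the eigenbasis
    mins = M_prime.min(axis=1)                   # xmin, ymin, zmin
    maxs = M_prime.max(axis=1)                   # xmax, ymax, zmax
    return np.stack([mins, maxs], axis=1)        # B = [[xmin, xmax], ...]
```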
The process of randomly generating a camera viewpoint and rendering the point cloud in the world coordinate system based on the viewpoint is described below with formulas.

The position matrix $C_P=[x_p,y_p,z_p]^{T}$, the forward direction matrix $C_f=[x_f,y_f,z_f]^{T}$ and the upward direction matrix $C_t=[x_t,y_t,z_t]^{T}$ of the virtual camera can be set, from which the leftward direction matrix of the camera is obtained as $C_l=[y_tz_f-z_ty_f,\; z_tx_f-x_tz_f,\; x_ty_f-y_tx_f]^{T}$. The camera viewpoint of the virtual camera at the corresponding position is determined by the forward, upward and leftward direction matrices. Let $T_C$ be the homogeneous transformation matrix of the camera coordinate system relative to the world coordinate system; then

$T_C=\begin{bmatrix}R_C&C_P\\ \mathbf{0}^{T}&1\end{bmatrix},$

where $[R_C\;\;C_P]$ is the extrinsic matrix of the camera, and $R_C$, formed from the direction matrices $C_l$, $C_t$ and $C_f$, is the orientation transformation matrix of the camera coordinate system relative to the world coordinate system.

By solving the above linear equations, $T_C$ is obtained. Further, by inverting $T_C$, the homogeneous transformation matrix of the world coordinate system relative to the camera coordinate system is obtained as $T_C^{-1}$, and the camera point cloud coordinates of the recognition object are then

$M_C=T_C^{-1}M_W,$

where $M_W$ denotes the world point cloud of the recognition object in homogeneous coordinates.
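The construction of the camera pose from its position and direction matrices can be sketched as follows. The leftward direction is the cross product given in the text; the column ordering of $R_C$ and the re-orthogonalization of the up direction are conventions assumed for illustration.

```python
import numpy as np

def camera_pose(C_p, C_f, C_t):
    """Build the homogeneous transform T_C of the camera frame relative to
    the world frame from position C_p, forward direction C_f and upward
    direction C_t. Column convention (left, up, forward) is assumed."""
    C_f = C_f / np.linalg.norm(C_f)
    C_l = np.cross(C_t, C_f)                 # leftward direction, as in the text
    C_l = C_l / np.linalg.norm(C_l)
    C_t = np.cross(C_f, C_l)                 # re-orthogonalized upward direction
    R_c = np.stack([C_l, C_t, C_f], axis=1)  # orientation transformation matrix
    T_c = np.eye(4)
    T_c[:3, :3] = R_c
    T_c[:3, 3] = C_p
    return T_c

def world_to_camera(points_w, T_c):
    """Apply M_C = T_C^{-1} M_W to an N x 3 world point cloud."""
    T_inv = np.linalg.inv(T_c)
    homo = np.hstack([points_w, np.ones((len(points_w), 1))])
    return (T_inv @ homo.T).T[:, :3]
```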
Since the embodiments of the present disclosure choose to train the Vote Net network to obtain the target recognition model, and Vote Net can only predict rotation about a single axis well, the camera point cloud of the recognition object needs to be transformed into the world point cloud before the network is trained based on deep learning, i.e., so that the direction of gravity is aligned with the -z axis. Further, the camera point cloud of the recognition object can be converted into the world point cloud of the recognition object based on the iterative closest point algorithm. The process of transforming the camera point cloud of the recognition object into the world point cloud is described below.
In an optional manner, the means of the camera point cloud of the recognition object in each dimension of three-dimensional space are first calculated:

$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i,\qquad \bar{z}=\frac{1}{n}\sum_{i=1}^{n}z_i.$

Then, based on the means of each dimension, a homogeneous transformation matrix

$T_0=\begin{bmatrix}I_{3\times 3}&t_0\\ \mathbf{0}^{T}&1\end{bmatrix},$

whose translation component $t_0$ is formed from $(\bar{x},\bar{y},\bar{z})$, is constructed as the initial value of the iterative closest point algorithm. Since the background tabletop occupies a large proportion of the scene in which the recognition objects are placed, and the point cloud corresponding to the background tabletop is therefore relatively large, a plane point cloud perpendicular to the z axis is generated, plane registration is performed with the iterative closest point algorithm, and the transformation matrix from the camera point cloud of the recognition object to the plane point cloud is calculated. The transformation matrix includes a translation matrix and a rotation matrix, from which the rotation angle corresponding to the rotation matrix can further be determined.
It should be noted that, since the robot observes from above by default when grasping, the rotation angle of the $(0,0,1)^{T}$ vector should exceed 90 degrees; if the rotation angle of the $(0,0,1)^{T}$ vector does not exceed 90 degrees, the difference between 180 degrees and the rotation angle is used as the rotation angle of the rotation matrix. Finally, the camera point cloud is converted into the world point cloud through the rotation matrix, so that the -z axis is aligned with the direction of gravity.

The camera point cloud data corresponding to the camera viewpoint at each placement position is converted into world point cloud data, and label information is added to the world point cloud data to construct the point cloud data training set. The label information may include, for example, the category of the corresponding recognition object, and the centroid, length, width, height and rotation angle about the z axis of the minimum circumscribed cuboid at the corresponding simulated position.

The Vote Net network takes the world point cloud as input and outputs the 3D minimum circumscribed cuboid, confidence and category of the target recognition object in the actual placement scene. Detecting three-dimensional targets through the Vote Net network requires only the coordinate information of the world point cloud, does not depend heavily on the density of the world point cloud, and generalizes well. Although Vote Net has achieved good results in 3D target detection tasks for indoor scenes, it has so far processed only real data of large indoor objects. In this specification, Vote Net is used to process simulation data, is trained with the simulation data, and then detects world point clouds obtained from real captured data. Since the geometric features of the simulation data differ little from those of the real captured data, the embodiments of the present disclosure are highly feasible.

The training of the Vote Net network based on the point cloud data training set is described below.

When training the Vote Net network, a 2.5D point cloud of the simulated scene is first constructed at a density similar to that of real data and captured by the virtual camera; the world point cloud data is generated from the captured camera point cloud data, and the label information of each piece of world point cloud data is obtained automatically, which speeds up the training of the target recognition model. The world point cloud data with label information is input into the Vote Net network for training, and the total number of training rounds is determined according to the amount of point cloud data. After the training of the Vote Net network is completed, three-dimensional target detection is performed on the world point cloud processed by the iterative closest point algorithm, yielding the 3D minimum circumscribed cuboid, the confidence and the recognition object category corresponding to the camera point cloud data.
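For orientation only, the schematic PyTorch loop below mirrors the training procedure just described. The `VoteNetLike` module, the MSE criterion and the toy tensors are hypothetical placeholders standing in for the Vote Net network, its detection loss and the simulated training set; none of them is the implementation referenced by this disclosure.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class VoteNetLike(nn.Module):
    """Hypothetical stand-in for a Vote Net style detector: maps a
    (B, N, 3) world point cloud to per-sample box/class predictions."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
        self.head = nn.Linear(64, 7 + num_classes)    # centroid(3)+lwh(3)+yaw(1)+scores

    def forward(self, pts):                           # pts: (B, N, 3)
        feat = self.point_mlp(pts).max(dim=1).values  # global max-pool over points
        return self.head(feat)

# Toy training set: 100 clouds of 1024 points with random regression targets.
points = torch.randn(100, 1024, 3)
targets = torch.randn(100, 17)
loader = DataLoader(TensorDataset(points, targets), batch_size=8, shuffle=True)

model = VoteNetLike()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()                              # placeholder detection loss

for epoch in range(5):                                # round count set by data volume
    for pts, tgt in loader:
        loss = criterion(model(pts), tgt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```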
Step 150: Generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system according to the minimum circumscribed cuboid of the target recognition object in the world coordinate system.

The minimum circumscribed cuboid of the target recognition object in the world coordinate system can be converted into the minimum circumscribed cuboid in the camera coordinate system according to the above rotation matrix. Further, the matrix of the minimum circumscribed cuboid of the target recognition object in the world coordinate system can be right-multiplied by the rotation matrix to obtain the matrix of the minimum circumscribed cuboid of the target recognition object in the camera coordinate system.
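This right-multiplication can be sketched as follows, representing the cuboid by its 8 corner points as rows of an n x 3 matrix; with row-vector corners, right-multiplying by R applies the inverse rotation $R^{T}$ to each corner. The corner-matrix representation is an assumption of the sketch.

```python
import numpy as np

def box_world_to_camera(corners_world, R):
    """corners_world: 8 x 3 matrix of cuboid corners in the world frame.
    R: rotation matrix taking the camera point cloud to the world point cloud.
    Right-multiplying the corner rows by R applies R^T to each corner,
    giving the cuboid expressed in the camera coordinate system."""
    return corners_world @ R
```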
In the embodiments of the present disclosure, a camera point cloud corresponding to a depth image can be generated from the depth image and the camera intrinsic parameters; after the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system; further, the minimum circumscribed cuboid of the target recognition object in the camera coordinate system can be generated from the minimum circumscribed cuboid in the world coordinate system, completing the detection of the target recognition object. It can be seen that the embodiments of the present disclosure can still generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system based on the camera point cloud without acquiring the camera extrinsic parameters, which improves the detection accuracy of the target recognition object.
FIG. 4 shows a flowchart of the three-dimensional target grasping method according to another embodiment of the present disclosure, which is executed by an electronic device. The memory of the electronic device stores at least one executable instruction, and the executable instruction causes the processor of the electronic device to perform the operations of the three-dimensional target grasping method. As shown in FIG. 4, the method includes the following steps:

Step 210: Determine the spatial position of the target recognition object according to the minimum circumscribed cuboid of the target recognition object in the camera coordinate system.

The spatial position of the target recognition object can be determined from its minimum circumscribed cuboid in the camera coordinate system, and includes the spatial coordinates of the target recognition object and its rotation angle in three-dimensional space.

Step 220: Generate a grasping instruction according to the spatial position, so that a gripper grasps the target recognition object according to the grasping instruction.

A grasping instruction can be generated according to the spatial position of the target recognition object and sent to the gripper used to grasp it. The gripper can determine a grasping path for the target recognition object from the grasping instruction and grasp the target recognition object along that path.
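A minimal sketch of deriving the spatial position and the grasping instruction from the detected cuboid is given below; the yaw estimate from one box edge and the instruction fields are illustrative assumptions, not a protocol defined by the disclosure.

```python
import numpy as np

def make_grasp_command(corners_cam):
    """Derive the target's spatial position from the 8 x 3 corner matrix of
    its minimum circumscribed cuboid in the camera frame and build a grasp
    command. The yaw estimate and the command fields are assumptions."""
    centroid = corners_cam.mean(axis=0)        # spatial coordinates of the target
    edge = corners_cam[1] - corners_cam[0]     # one bottom edge of the cuboid
    yaw = float(np.arctan2(edge[1], edge[0]))  # rotation angle about the z axis
    return {"position": centroid.tolist(), "yaw": yaw, "action": "grasp"}
```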
The embodiments of the present disclosure generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system based on the camera point cloud, determine the spatial position of the target recognition object according to that cuboid, and generate a grasping instruction according to the spatial position, so that the gripper can accurately grasp the target recognition object according to the grasping instruction.
FIG. 5 shows a schematic structural diagram of the three-dimensional target detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus 300 includes an acquisition module 310, a first generation module 320, a conversion module 330, a second generation module 340 and a third generation module 350.

The acquisition module 310 is configured to acquire a depth image containing a target recognition object;

the first generation module 320 is configured to generate a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system;

the conversion module 330 is configured to convert the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system;

the second generation module 340 is configured to perform target detection on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system; and

the third generation module 350 is configured to generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system according to the minimum circumscribed cuboid of the target recognition object in the world coordinate system.
In an optional manner, the conversion module 330 includes:

a registration unit configured to register the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system; and

a conversion unit configured to convert the camera point cloud into a world point cloud according to the transformation matrix.

In an optional manner, the registration unit is further configured to:

calculate the means of the camera point cloud in the three dimensions respectively;

construct a homogeneous transformation matrix from the means and set the homogeneous transformation matrix as the initial value of the iterative closest point algorithm; and

generate the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
In an optional manner, the conversion unit is further configured to:

determine the rotation matrix corresponding to the transformation matrix;

if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generate the world point cloud from the rotation matrix and the camera point cloud; and

if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, generate the world point cloud from the camera point cloud using the supplementary rotation amount corresponding to the rotation matrix.
In an optional manner, the apparatus 300 further includes a training module configured to:

construct a point cloud data training set, the point cloud data training set including multiple groups of world point cloud data and label information corresponding to each group of world point cloud data; and

train a preset target recognition algorithm with the point cloud data training set to generate the target recognition model.

In an optional manner, the training module is further configured to:

construct a three-dimensional model library, the three-dimensional model library including three-dimensional models of a plurality of recognition objects;

after aligning each recognition object upright to the world coordinate system, calculate an initial value of the minimum circumscribed cuboid of each recognition object;

place each recognition object in a simulated position, and calculate a simulated value of the minimum circumscribed cuboid of each recognition object at the simulated position;

randomly generate camera viewpoints, and render based on the camera viewpoints to generate camera point cloud data of each recognition object;

convert the camera point cloud data of each recognition object into corresponding world point cloud data; and

add label information to the corresponding world point cloud data.
In the embodiments of the present disclosure, a camera point cloud corresponding to a depth image can be generated from the depth image and the camera intrinsic parameters; after the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system; further, the minimum circumscribed cuboid of the target recognition object in the camera coordinate system can be generated from the minimum circumscribed cuboid in the world coordinate system, completing the detection of the target recognition object. It can be seen that the embodiments of the present disclosure can still generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system based on the camera point cloud without acquiring the camera extrinsic parameters, which improves the detection accuracy of the target recognition object.
FIG. 6 shows a schematic structural diagram of the electronic device according to an embodiment of the present disclosure; the specific embodiments of the present disclosure do not limit the specific implementation of the electronic device.

As shown in FIG. 6, the electronic device may include a processor 402, a communications interface 404, a memory 406 and a communication bus 408.

The processor 402, the communications interface 404 and the memory 406 communicate with one another through the communication bus 408. The communications interface 404 is used for communicating with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute a program 410, and may specifically perform the relevant steps of the above embodiments of the three-dimensional target detection method.

Specifically, the program 410 may include program code, and the program code includes computer-executable instructions.

The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure. The one or more processors included in the electronic device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.

The memory 406 is used to store the program 410, and may include a high-speed RAM memory and possibly also a non-volatile memory, such as at least one disk memory.
The program 410 may specifically be invoked by the processor 402 to cause the electronic device to perform the following operations:

acquiring a depth image containing a target recognition object;

generating a camera point cloud corresponding to the depth image according to the depth image and the camera intrinsic parameters, the camera point cloud being a point cloud in the camera coordinate system;

converting the camera point cloud into a world point cloud, the world point cloud being a point cloud in the world coordinate system;

performing target detection on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system; and

generating the minimum circumscribed cuboid of the target recognition object in the camera coordinate system according to the minimum circumscribed cuboid of the target recognition object in the world coordinate system.
In an optional manner, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:

registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system; and

converting the camera point cloud into a world point cloud according to the transformation matrix.

In an optional manner, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:

calculating the means of the camera point cloud in the three dimensions respectively;

constructing a homogeneous transformation matrix from the means and setting the homogeneous transformation matrix as the initial value of the iterative closest point algorithm; and

generating the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.

In an optional manner, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:

determining the rotation matrix corresponding to the transformation matrix;

if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generating the world point cloud from the rotation matrix and the camera point cloud; and

if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, generating the world point cloud from the camera point cloud using the supplementary rotation amount corresponding to the rotation matrix.

In an optional manner, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:

constructing a point cloud data training set, the point cloud data training set including multiple groups of world point cloud data and label information corresponding to each group of world point cloud data; and

training a preset target recognition algorithm with the point cloud data training set to generate the target recognition model.

In an optional manner, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:

constructing a three-dimensional model library, the three-dimensional model library including three-dimensional models of a plurality of recognition objects;

after aligning each recognition object upright to the world coordinate system, calculating an initial value of the minimum circumscribed cuboid of each recognition object;

placing each recognition object in a simulated position, and calculating a simulated value of the minimum circumscribed cuboid of each recognition object at the simulated position;

randomly generating camera viewpoints, and rendering based on the camera viewpoints to generate camera point cloud data of each recognition object;

converting the camera point cloud data of each recognition object into corresponding world point cloud data; and

adding label information to the corresponding world point cloud data.

In an optional manner, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:

determining the spatial position of the target recognition object according to the minimum circumscribed cuboid of the target recognition object in the camera coordinate system; and

generating a grasping instruction according to the spatial position, so that a gripper grasps the target recognition object according to the grasping instruction.
In the embodiments of the present disclosure, a camera point cloud corresponding to a depth image can be generated from the depth image and the camera intrinsic parameters; after the camera point cloud is converted into a world point cloud, target detection can be performed on the world point cloud according to a preset target recognition model to generate the minimum circumscribed cuboid of the target recognition object in the world coordinate system; further, the minimum circumscribed cuboid of the target recognition object in the camera coordinate system can be generated from the minimum circumscribed cuboid in the world coordinate system, completing the detection of the target recognition object. It can be seen that the embodiments of the present disclosure can still generate the minimum circumscribed cuboid of the target recognition object in the camera coordinate system based on the camera point cloud without acquiring the camera extrinsic parameters, which improves the detection accuracy of the target recognition object.

An embodiment of the present disclosure provides a computer-readable storage medium, the storage medium storing at least one executable instruction which, when run on an electronic device, causes the electronic device to perform the three-dimensional target detection method in any of the above method embodiments.

An embodiment of the present disclosure provides a three-dimensional target detection apparatus for performing the above three-dimensional target detection method.

An embodiment of the present disclosure provides a computer program which can be invoked by a processor to cause an electronic device to perform the three-dimensional target detection method in any of the above method embodiments.

An embodiment of the present disclosure provides a computer program product, the computer program product including a computer program stored on a computer-readable storage medium, the computer program including program instructions which, when run on a computer, cause the computer to perform the three-dimensional target detection method in any of the above method embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the above description. Furthermore, the embodiments of the present disclosure are not directed to any particular programming language; it should be understood that various programming languages may be used to implement the content of the present disclosure described herein, and the above descriptions of specific languages are intended to disclose the best mode of the present disclosure.

Numerous specific details are set forth in the description provided herein. It will be understood, however, that the embodiments of the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in the above description of exemplary embodiments of the present disclosure, the various features of the embodiments are sometimes grouped together into a single embodiment, figure or description thereof in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim.

Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from those of the embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, or divided into multiple sub-modules, sub-units or sub-components. Unless at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.

It should be noted that the above embodiments illustrate rather than limit the present disclosure, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present disclosure may be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and the like does not denote any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be construed as limiting the order of execution.

Claims (17)

  1. 一种三维目标检测方法,其特征在于,所述方法包括:A three-dimensional target detection method, characterized in that the method comprises:
    获取包含目标识别物的深度图像;Obtain a depth image containing the target identifier;
    根据所述深度图像以及相机内参生成对应于所述深度图像的相机点云,所述相机点云为相机坐标系下的点云;generating a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, where the camera point cloud is a point cloud in a camera coordinate system;
    将所述相机点云转换为世界点云,所述世界点云为世界坐标系下的点云;converting the camera point cloud into a world point cloud, where the world point cloud is a point cloud in the world coordinate system;
    根据预设的目标识别模型对所述世界点云进行目标检测,以生成世界坐标系下所述目标识别物的外接最小矩体;Perform target detection on the world point cloud according to a preset target recognition model, so as to generate a circumscribed minimum rectangle of the target recognition object in the world coordinate system;
    根据所述世界坐标系下所述目标识别物的外接最小矩体生成相机坐标系下所述目标识别物的外接最小矩体。The minimum circumscribed rectangle of the target identifier in the camera coordinate system is generated according to the circumscribed minimum rectangle of the target identifier in the world coordinate system.
  2. 根据权利要求1所述的方法,其特征在于,所述将所述相机点云转换为世界点云包括:The method according to claim 1, wherein the converting the camera point cloud into a world point cloud comprises:
    将所述相机点云与预设的平面点云进行配准,以生成相机坐标系到世界坐标系的变换矩阵;registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system;
    根据所述变换矩阵将所述相机点云转换为世界点云。Transform the camera point cloud into a world point cloud according to the transformation matrix.
  3. 根据权利要求2所述的方法,其特征在于,所述将所述相机点云与预设的平面点云进行配准,以生成相机坐标系到世界坐标系的变换矩阵包括:The method according to claim 2, wherein the registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system comprises:
    分别计算所述相机点云在三个维度上的均值;Calculate the mean value of the camera point cloud in three dimensions respectively;
    根据所述均值构造齐次变换矩阵,将所述齐次变换矩阵设置为迭代最近点算法的初值;Construct a homogeneous transformation matrix according to the mean value, and set the homogeneous transformation matrix as the initial value of the iterative closest point algorithm;
    根据所述迭代最近点算法以及垂直于重力轴的平面点云生成相机坐标系到世界坐标系的变换矩阵。A transformation matrix from the camera coordinate system to the world coordinate system is generated according to the iterative closest point algorithm and the plane point cloud perpendicular to the gravity axis.
  4. The method according to claim 2 or 3, characterized in that converting the camera point cloud into a world point cloud according to the transformation matrix comprises:
    determining the rotation matrix corresponding to the transformation matrix;
    if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generating the world point cloud according to the rotation matrix and the camera point cloud; and
    if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, generating the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.
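The rotation angle can be recovered from the rotation matrix through its axis-angle representation. The sketch below uses SciPy, and its complementary-angle branch (same axis, 90 degrees minus the original magnitude) is one possible reading of the claim, not the only one:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def select_rotation(T):
        R = T[:3, :3]
        rotvec = Rotation.from_matrix(R).as_rotvec()
        theta = np.linalg.norm(rotvec)  # overall rotation angle, radians
        if np.degrees(theta) > 90.0 or theta == 0.0:
            return R
        # Complementary-angle branch: same axis, magnitude (90 deg - theta).
        axis = rotvec / theta
        return Rotation.from_rotvec(axis * (np.pi / 2.0 - theta)).as_matrix()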
  5. The method according to claim 1, characterized in that the method further comprises:
    constructing a point cloud data training set, the point cloud data training set comprising multiple groups of world point cloud data and the label information corresponding to each group of world point cloud data; and
    training a preset target recognition algorithm with the point cloud data training set to generate the target recognition model.
  6. The method according to claim 5, characterized in that constructing the point cloud data training set comprises:
    constructing a three-dimensional model library, the three-dimensional model library comprising three-dimensional models of a plurality of recognition objects;
    after aligning each recognition object to the world coordinate system, calculating an initial value of the circumscribed minimum rectangle of each recognition object;
    placing each recognition object in a simulation environment, and calculating a simulated value of the circumscribed minimum rectangle of each recognition object at its simulated position;
    randomly generating a camera view angle, and rendering based on the camera view angle to generate camera point cloud data for each recognition object;
    converting the camera point cloud data of each recognition object into corresponding world point cloud data; and
    adding label information to the corresponding world point cloud data.
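For a cloud already aligned to the world coordinate system, the circumscribed minimum rectangle reduces to an axis-aligned bounding box; a minimal sketch follows, in which the (center, extents) return convention is an assumption of the sketch:

    import numpy as np

    def circumscribed_min_box(points):
        # points: (N, 3) array expressed in the world coordinate system.
        lo = points.min(axis=0)
        hi = points.max(axis=0)
        return (lo + hi) / 2.0, hi - lo  # box center and edge lengths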
  7. A three-dimensional target grabbing method, characterized by comprising the three-dimensional target detection method according to any one of claims 1-6, the three-dimensional target grabbing method further comprising:
    determining the spatial position of the target recognition object according to the circumscribed minimum rectangle of the target recognition object in the camera coordinate system; and
    generating a grabbing instruction according to the spatial position, so that a gripper grabs the target recognition object according to the grabbing instruction.
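A hedged sketch of the grabbing step: the world-frame box center is mapped back into the camera frame with the inverse transform (per the last step of claim 1) and handed to a gripper; gripper.grab() is a hypothetical interface, not one defined by this disclosure:

    import numpy as np

    def camera_frame_grab_point(T_camera_to_world, center_world):
        # Invert the camera-to-world transform, then map the box center back.
        T_world_to_camera = np.linalg.inv(T_camera_to_world)
        return T_world_to_camera[:3, :3] @ center_world + T_world_to_camera[:3, 3]

    # target = camera_frame_grab_point(T, center)
    # gripper.grab(target)  # hypothetical gripper interface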
  8. A three-dimensional target detection apparatus, characterized in that the apparatus comprises:
    an acquisition module, configured to acquire a depth image containing a target recognition object;
    a first generation module, configured to generate a camera point cloud corresponding to the depth image according to the depth image and camera intrinsic parameters, the camera point cloud being a point cloud in a camera coordinate system;
    a conversion module, configured to convert the camera point cloud into a world point cloud, the world point cloud being a point cloud in a world coordinate system;
    a second generation module, configured to perform target detection on the world point cloud according to a preset target recognition model, so as to generate a circumscribed minimum rectangle of the target recognition object in the world coordinate system; and
    a third generation module, configured to generate a circumscribed minimum rectangle of the target recognition object in the camera coordinate system according to the circumscribed minimum rectangle of the target recognition object in the world coordinate system.
  9. The apparatus according to claim 8, characterized in that the conversion module comprises:
    a registration unit, configured to register the camera point cloud with a preset plane point cloud to generate a transformation matrix from the camera coordinate system to the world coordinate system; and
    a conversion unit, configured to convert the camera point cloud into a world point cloud according to the transformation matrix.
  10. The apparatus according to claim 9, characterized in that the registration unit is configured to:
    calculate the mean values of the camera point cloud in the three dimensions respectively;
    construct a homogeneous transformation matrix according to the mean values, and set the homogeneous transformation matrix as the initial value of an iterative closest point algorithm; and
    generate the transformation matrix from the camera coordinate system to the world coordinate system according to the iterative closest point algorithm and a plane point cloud perpendicular to the gravity axis.
  11. The apparatus according to claim 9 or 10, characterized in that the conversion unit is configured to:
    determine the rotation matrix corresponding to the transformation matrix;
    if the rotation angle corresponding to the rotation matrix is greater than 90 degrees, generate the world point cloud according to the rotation matrix and the camera point cloud; and
    if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, generate the world point cloud according to the complementary-angle rotation corresponding to the rotation matrix and the camera point cloud.
  12. The apparatus according to claim 8, characterized in that the apparatus further comprises a training module, configured to:
    construct a point cloud data training set, the point cloud data training set comprising multiple groups of world point cloud data and the label information corresponding to each group of world point cloud data; and
    train a preset target recognition algorithm with the point cloud data training set to generate the target recognition model.
  13. The apparatus according to claim 12, characterized in that the training module is configured to:
    construct a three-dimensional model library, the three-dimensional model library comprising three-dimensional models of a plurality of recognition objects;
    after aligning each recognition object to the world coordinate system, calculate an initial value of the circumscribed minimum rectangle of each recognition object;
    place each recognition object in a simulation environment, and calculate a simulated value of the circumscribed minimum rectangle of each recognition object at its simulated position;
    randomly generate a camera view angle, and render based on the camera view angle to generate camera point cloud data for each recognition object;
    convert the camera point cloud data of each recognition object into corresponding world point cloud data; and
    add label information to the corresponding world point cloud data.
  14. A three-dimensional target grabbing apparatus, characterized by comprising the three-dimensional target detection apparatus according to any one of claims 9-13, the three-dimensional target grabbing apparatus further comprising:
    a spatial determination module, configured to determine the spatial position of the target recognition object according to the circumscribed minimum rectangle of the target recognition object in the camera coordinate system; and
    a grabbing module, configured to generate a grabbing instruction according to the spatial position, so that a gripper grabs the target recognition object according to the grabbing instruction.
  15. An electronic device, characterized by comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus; and
    the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform the operations of the three-dimensional target detection method according to any one of claims 1-6 or of the three-dimensional target grabbing method according to claim 7.
  16. A computer-readable storage medium, characterized in that the storage medium stores at least one executable instruction which, when run on an electronic device, causes the electronic device to perform the operations of the three-dimensional target detection method according to any one of claims 1-6 or of the three-dimensional target grabbing method according to claim 7.
  17. A computer program, comprising instructions which, when run on a computer, cause the computer to perform the operations of the three-dimensional target detection method according to any one of claims 1-6 or of the three-dimensional target grabbing method according to claim 7.
PCT/CN2021/143443 2021-04-29 2021-12-30 Three-dimensional target detection method and grabbing method, apparatus, and electronic device WO2022227678A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110473106.3A CN113223091B (en) 2021-04-29 2021-04-29 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN202110473106.3 2021-04-29

Publications (1)

Publication Number Publication Date
WO2022227678A1 (en)

Family

ID=77090035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143443 WO2022227678A1 (en) 2021-04-29 2021-12-30 Three-dimensional target detection method and grabbing method, apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN113223091B (en)
WO (1) WO2022227678A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223091B (en) * 2021-04-29 2023-01-24 达闼机器人股份有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN115222799B (en) * 2021-08-12 2023-04-11 达闼机器人股份有限公司 Method and device for acquiring image gravity direction, electronic equipment and storage medium
CN113689351B (en) * 2021-08-24 2023-10-10 北京石油化工学院 Dangerous chemical storage monitoring method, device and equipment based on depth camera
CN114627239B (en) * 2022-03-04 2024-04-30 北京百度网讯科技有限公司 Bounding box generation method, device, equipment and storage medium
CN114754779B (en) * 2022-04-27 2023-02-14 镁佳(北京)科技有限公司 Positioning and mapping method and device and electronic equipment
CN114643588B (en) * 2022-05-19 2022-08-05 睿驰(深圳)智能有限公司 Control method, system and medium for autonomous mobile disinfection robot
CN115272791B (en) * 2022-07-22 2023-05-26 仲恺农业工程学院 YoloV 5-based multi-target detection and positioning method for tea leaves
CN117689678A (en) * 2024-02-04 2024-03-12 法奥意威(苏州)机器人系统有限公司 Workpiece weld joint identification method, device, equipment and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986161B (en) * 2018-06-19 2020-11-10 亮风台(上海)信息科技有限公司 Three-dimensional space coordinate estimation method, device, terminal and storage medium
CN112446227A (en) * 2019-08-12 2021-03-05 阿里巴巴集团控股有限公司 Object detection method, device and equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157920A1 (en) * 2016-12-01 2018-06-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing obstacle of vehicle
WO2020103427A1 (en) * 2018-11-23 2020-05-28 华为技术有限公司 Object detection method, related device and computer storage medium
US20210042929A1 (en) * 2019-01-22 2021-02-11 Institute Of Automation, Chinese Academy Of Sciences Three-dimensional object detection method and system based on weighted channel features of a point cloud
CN110344621A (en) * 2019-06-13 2019-10-18 武汉大学 A kind of wheel points cloud detection method of optic towards intelligent garage
CN111950426A (en) * 2020-08-06 2020-11-17 东软睿驰汽车技术(沈阳)有限公司 Target detection method and device and delivery vehicle
CN111986232A (en) * 2020-08-13 2020-11-24 上海高仙自动化科技发展有限公司 Target object detection method, target object detection device, robot and storage medium
CN112200851A (en) * 2020-12-09 2021-01-08 北京云测信息技术有限公司 Point cloud-based target detection method and device and electronic equipment thereof
CN113223091A (en) * 2021-04-29 2021-08-06 达闼机器人有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116330306A (en) * 2023-05-31 2023-06-27 之江实验室 Object grabbing method and device, storage medium and electronic equipment
CN116330306B (en) * 2023-05-31 2023-08-15 之江实验室 Object grabbing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113223091A (en) 2021-08-06
CN113223091B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
WO2022227678A1 (en) Three-dimensional target detection method and grabbing method, apparatus, and electronic device
Tóth et al. Automatic LiDAR-camera calibration of extrinsic parameters using a spherical target
CN107679537B (en) A kind of texture-free spatial target posture algorithm for estimating based on profile point ORB characteristic matching
US7680300B2 (en) Visual object recognition and tracking
CN111738261A (en) Pose estimation and correction-based disordered target grabbing method for single-image robot
CN108986161A (en) A kind of three dimensional space coordinate estimation method, device, terminal and storage medium
CN108492333B (en) Spacecraft attitude estimation method based on satellite-rocket docking ring image information
WO2022017131A1 (en) Point cloud data processing method and device, and intelligent driving control method and device
CN110796700B (en) Multi-object grabbing area positioning method based on convolutional neural network
CN111862201A (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN111415420B (en) Spatial information determining method and device and electronic equipment
CN112465903A (en) 6DOF object attitude estimation method based on deep learning point cloud matching
CN113927597B (en) Robot connecting piece six-degree-of-freedom pose estimation system based on deep learning
WO2022021156A1 (en) Method and apparatus for robot to grab three-dimensional object
CN109934165A (en) A kind of joint point detecting method, device, storage medium and electronic equipment
KR102372298B1 (en) Method for acquiring distance to at least one object located in omni-direction of vehicle and vision device using the same
Liu et al. 6d object pose estimation without pnp
US11420334B2 (en) Candidate six dimensional pose hypothesis selection
CN115409949A (en) Model training method, visual angle image generation method, device, equipment and medium
CN112378409B (en) Robot RGB-D SLAM method based on geometric and motion constraint in dynamic environment
Kim et al. Pose initialization method of mixed reality system for inspection using convolutional neural network
Du et al. Pose Measurement Method of Non-cooperative Targets Based on Semantic Segmentation
CN117315018B (en) User plane pose detection method, equipment and medium based on improved PnP
CN110580703B (en) Distribution line detection method, device, equipment and storage medium
Vladimir et al. A lightweight convolutional neural network for pose estimation of a planar model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21939137

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21939137

Country of ref document: EP

Kind code of ref document: A1