WO2022179261A1 - Method and apparatus for grasping an object based on 3D matching, and computing device - Google Patents

Method and apparatus for grasping an object based on 3D matching, and computing device

Info

Publication number
WO2022179261A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
point
matching
objects
pose information
Prior art date
Application number
PCT/CN2021/138473
Other languages
English (en)
Chinese (zh)
Inventor
刘迪一
魏海永
盛文波
李辉
段文杰
丁有爽
邵天兰
Original Assignee
梅卡曼德(北京)机器人科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 梅卡曼德(北京)机器人科技有限公司
Publication of WO2022179261A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/0014 Image feed-back for automatic industrial control, e.g. robot with camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection
    • G06T 2207/30164 Workpiece; Machine component

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, device and computing device for grasping objects based on 3D matching.
  • In the prior art, the pose information of the object to be grasped is not determined accurately enough, which easily leads to grasping errors of the robot: for example, the robot cannot grasp the object successfully, the object falls after being grasped, or a pressed object is grasped by mistake and causes the objects located above it to fall, which affects the realization of industrial automation.
  • the present application is proposed to provide a 3D matching-based object grasping method, apparatus and computing device that overcome the above problems or at least partially solve the above problems.
  • a method for grasping objects based on 3D matching comprising:
  • The stacking relationship between the objects is identified, the target object is determined from the objects according to the stacking relationship, the pose information of the target object is converted into the robot coordinate system, and the converted pose information of the target object is transmitted to the robot, so that the robot can perform a grasping operation on the target object according to the converted pose information of the target object.
  • a 3D matching-based object grasping device comprising:
  • an acquisition module adapted to acquire a scene image of the current scene and a point cloud corresponding to the scene image
  • the instance segmentation module is suitable for inputting the scene image into the trained deep learning segmentation model for instance segmentation processing, so as to obtain the segmentation result of each object in the scene image;
  • the object point cloud determination module is suitable for determining the point cloud corresponding to each object according to the point cloud corresponding to the scene image and the segmentation result of each object;
  • a matching module adapted to match the point cloud corresponding to the object with the preset template point cloud for each object, and determine the pose information of the object
  • the stacking recognition module is suitable for identifying the stacking relationship between objects according to the point cloud corresponding to each object;
  • a processing module, adapted to determine the target object from the objects according to the stacking relationship, convert the pose information of the target object into the robot coordinate system, and transmit the converted pose information of the target object to the robot, so that the robot performs the grasping operation on the target object according to the converted pose information.
  • a computing device comprising: a processor, a memory, a communication interface and a communication bus, and the processor, the memory and the communication interface communicate with each other through the communication bus;
  • the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operations corresponding to the above-mentioned 3D matching-based object grasping method.
  • a computer storage medium where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the above-mentioned 3D matching-based object grasping method.
  • the scene image is segmented by using the deep learning segmentation model, so as to realize the precise segmentation of each object in the scene image.
  • By matching the point cloud corresponding to each object with the preset template point cloud, the pose information of each object is determined accurately, which helps the robot perform the grasping operation accurately and firmly according to the pose information of the object and avoids situations in which the robot cannot grasp the object successfully or the object falls after being grasped.
  • This solution can also determine the target object to be grasped according to the stacking relationship between the objects, avoiding the situation where a pressed object is mistakenly grasped as the target object and the objects located above it fall, which effectively improves object grasping accuracy and optimizes the object grasping method.
  • FIG. 1 shows a schematic flowchart of a method for grasping objects based on 3D matching according to an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of a method for grasping objects based on 3D matching according to another embodiment of the present application
  • FIG. 3 shows a structural block diagram of an object grasping device based on 3D matching according to an embodiment of the present application
  • FIG. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present application.
  • FIG. 1 shows a schematic flowchart of a method for grasping objects based on 3D matching according to an embodiment of the present application. As shown in FIG. 1 , the method includes the following steps:
  • Step S101 Obtain a scene image of the current scene and a point cloud corresponding to the scene image, input the scene image into a trained deep learning segmentation model for instance segmentation processing, and obtain segmentation results of each object in the scene image.
  • The scene image and the depth image of the current scene can be collected by a camera, which may be a 3D camera mounted above the scene.
  • The scene image may be an RGB image.
  • the pixel points of the scene image and the depth image correspond one-to-one.
  • the point cloud includes the pose information of each 3D point.
  • The pose information of each 3D point may specifically include the coordinate values of the 3D point on the X, Y and Z axes in space and the orientation of the 3D point's own X, Y and Z axes.
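As a concrete illustration of this per-point pose layout, the sketch below stores a point cloud as a NumPy array with one row per 3D point: three coordinate values plus the point's own axis directions. The array layout and function name are hypothetical and are not taken from the application.

```python
import numpy as np

# A minimal, hypothetical layout for a point cloud whose 3D points each carry
# pose information: 3 coordinate values plus a 3x3 rotation (the point's own
# X/Y/Z axis directions), flattened to 9 values -> 12 floats per point.
def make_point_cloud(positions, orientations):
    """positions: (N, 3) XYZ coordinates; orientations: (N, 3, 3) per-point axes."""
    positions = np.asarray(positions, dtype=np.float64)
    orientations = np.asarray(orientations, dtype=np.float64)
    return np.hstack([positions, orientations.reshape(len(positions), 9)])

# Example: two points at different heights, both axis-aligned.
cloud = make_point_cloud(
    positions=[[0.10, 0.02, 0.85], [0.12, 0.03, 0.80]],
    orientations=[np.eye(3), np.eye(3)],
)
print(cloud.shape)  # (2, 12): XYZ plus the flattened per-point orientation
```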
  • In step S101, the scene image of the current scene collected by the camera and the point cloud corresponding to the scene image, obtained by processing the scene image and the depth image, may be acquired.
  • Sample scene images can be collected in advance to construct a training sample set, and the segmentation model can be trained on each sample scene image in the training sample set by using a deep learning algorithm to obtain the trained deep learning segmentation model.
  • After the scene image of the current scene is obtained, the scene image can be input into the trained deep learning segmentation model, which performs a series of model calculations to carry out instance segmentation on each object contained in the scene image, so as to obtain the segmentation result of each object in the scene image.
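The application does not specify the segmentation architecture or training details. Purely as an illustration of the "input a scene image, obtain per-object masks" step, the sketch below runs an off-the-shelf Mask R-CNN from torchvision and binarizes its soft masks; the choice of Mask R-CNN, the score threshold and the 0.5 mask cutoff are all assumptions.

```python
import numpy as np
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Illustrative only: a generic instance-segmentation model standing in for the
# trained deep learning segmentation model described in the application.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def segment_objects(rgb_image, score_threshold=0.7):
    """rgb_image: (H, W, 3) uint8 array. Returns a list of binary (H, W) masks,
    one per detected object instance."""
    tensor = torch.from_numpy(rgb_image).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        output = model([tensor])[0]
    masks = []
    for mask, score in zip(output["masks"], output["scores"]):
        if score >= score_threshold:
            # mask is (1, H, W) with soft values; binarize it to a 0/255 image.
            masks.append((mask[0].numpy() > 0.5).astype(np.uint8) * 255)
    return masks
```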
  • Step S102 according to the point cloud corresponding to the scene image and the segmentation result of each object, determine the point cloud corresponding to each object.
  • The point cloud corresponding to the scene image can be matched with the segmentation result of each object obtained by the instance segmentation processing, the 3D points corresponding to each object can be found from the point cloud corresponding to the scene image, and, for each object, all 3D points corresponding to that object form the point cloud corresponding to the object.
  • Step S103 for each object, the point cloud corresponding to the object is matched with the preset template point cloud, and the pose information of the object is determined.
  • a template library containing multiple preset template point clouds is pre-built.
  • A preset template point cloud is the pre-determined point cloud corresponding to a known object that serves as a matching benchmark.
  • the pose information of the object is determined by matching the point cloud corresponding to the object with the preset template point cloud.
  • Step S104 according to the point cloud corresponding to each object, identify the stacking relationship between the objects.
  • Since the objects may be stacked on one another in the actual scene, in order to avoid mistaking a pressed object for the target object to be grasped, in this embodiment the stacking relationship between the objects also needs to be identified according to the point cloud corresponding to each object.
  • The stacking relationship between any two objects reflects which of the two objects is the pressed object lying underneath. Specifically, each 3D point in the point cloud corresponding to each object is projected onto the plane where the camera is located, and the stacking relationship between two objects is determined by judging whether their projection areas on that plane overlap.
  • Step S105 Determine the target object from the objects according to the stacking relationship, convert the pose information of the target object into the robot coordinate system, and transmit the converted pose information of the target object to the robot.
  • According to the stacking relationship, the pressed objects can be screened out from the objects, and the object closest to the plane where the camera is located is selected from the remaining objects; since this is the highest-positioned object in the current scene, it is taken as the target object to be grasped.
  • the pose information determined in step S103 is determined in the camera coordinate system.
  • In order to facilitate locating the target object, the pose information of the target object needs to be converted into the robot coordinate system.
  • The converted pose information of the target object is then transmitted to the robot, so that the robot can perform the grasping operation on the target object according to the converted pose information of the target object.
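A minimal sketch of the coordinate conversion, assuming the target object's pose is available as a 4x4 homogeneous matrix in the camera frame and the camera-to-robot extrinsic transform is known (for example from hand-eye calibration); the application only states that a conversion into the robot coordinate system is performed, so the matrices below are purely illustrative.

```python
import numpy as np

def pose_camera_to_robot(T_robot_camera, T_camera_object):
    """Both arguments are 4x4 homogeneous transforms; the result is the pose
    of the object expressed in the robot base frame."""
    return T_robot_camera @ T_camera_object

# Hypothetical example: camera mounted 1 m above the robot base, looking down.
T_robot_camera = np.array([
    [1,  0,  0, 0.0],
    [0, -1,  0, 0.0],
    [0,  0, -1, 1.0],
    [0,  0,  0, 1.0],
], dtype=float)
T_camera_object = np.eye(4)
T_camera_object[:3, 3] = [0.10, 0.02, 0.80]  # object 0.8 m in front of the camera
print(pose_camera_to_robot(T_robot_camera, T_camera_object)[:3, 3])  # -> [0.1, -0.02, 0.2]
```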
  • the scene image is segmented by using the deep learning segmentation model, so as to realize the precise segmentation of each object in the scene image.
  • The point cloud corresponding to each object is matched with the preset template point cloud to accurately determine the pose information of each object, which helps the robot perform the grasping operation accurately and firmly according to the pose information of the object and avoids situations in which the robot cannot grasp the object successfully or the object falls after being grasped.
  • This solution can also determine the target object to be grasped according to the stacking relationship between the objects, so as to avoid mistakenly grasping a pressed object as the target object and causing the objects located above it to fall; the accuracy of object grasping is thereby effectively improved, and the object grasping method is optimized.
  • FIG. 2 shows a schematic flowchart of a method for grasping objects based on 3D matching according to another embodiment of the present application. As shown in FIG. 2 , the method includes the following steps:
  • Step S201 Obtain a scene image of the current scene and a point cloud corresponding to the scene image, input the scene image into a trained deep learning segmentation model for instance segmentation processing, and obtain segmentation results of each object in the scene image.
  • The scene image and depth image of the current scene can be collected by the camera; the scene image collected by the camera and the point cloud obtained by processing the scene image and the depth image are acquired, and the scene image is then input into the trained deep learning segmentation model, which performs instance segmentation processing on each object contained in the scene image and obtains the segmentation result of each object in the scene image.
  • the segmentation result of each object may include a binarized segmented image of each object.
  • the number of binarized segmented images output by the deep learning segmentation model corresponds to the number of objects contained in the scene image, and the binarized segmented images are in one-to-one correspondence with the objects contained in the scene image.
  • Step S202 according to the point cloud corresponding to the scene image and the segmentation result of each object, determine the point cloud corresponding to each object.
  • The segmentation result of each object may include a binarized segmentation image of the object; for each object, the binarized segmentation image may include the object area where the object is located and the non-object area outside the object area.
  • The object area can be represented by a white region, and the non-object area can be represented by a black region.
  • The point cloud corresponding to the scene image can be projected into the binarized segmentation image of the object, and the 3D points whose projections fall within the object area of the binarized segmentation image are taken as the 3D points corresponding to the object, thereby obtaining the point cloud corresponding to the object. Specifically, all 3D points in the point cloud corresponding to the scene image are projected; if a projected 3D point falls into the white object area, the 3D point is considered to belong to the object, that is, it is a 3D point corresponding to the object, and all 3D points corresponding to the object are collected to obtain the point cloud corresponding to the object. Through this processing, the point cloud corresponding to the object is determined accurately.
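A minimal sketch of this projection test, assuming a pinhole camera with known intrinsic matrix K and a white-on-black (255/0) binarized segmentation image; these details are assumptions, since the application does not spell out how the projection is computed.

```python
import numpy as np

def points_for_object(scene_points, mask, K):
    """scene_points: (N, 3) XYZ in the camera frame; mask: (H, W) uint8 binarized
    segmentation image (255 = object area); K: 3x3 camera intrinsic matrix.
    Returns the 3D points whose projections fall into the white object area."""
    pts = scene_points[scene_points[:, 2] > 0]        # keep points in front of the camera
    uvw = (K @ pts.T).T                               # pinhole projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)  # projections landing on the image
    keep = np.zeros(len(pts), dtype=bool)
    keep[inside] = mask[v[inside], u[inside]] == 255  # ... and inside the object area
    return pts[keep]
```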
  • the pose information of each object can be determined by matching with the preset template point cloud.
  • In this embodiment, the pose information of the object can be determined through two rounds of matching, specifically through the following steps S203 and S204.
  • Step S203 for each object, the pose information of each 3D point in the point cloud corresponding to the object is matched with the pose information of each 3D point in the preset template point cloud for the first time to obtain the first matching result.
  • The point cloud corresponding to the object includes the pose information of each 3D point. For any two 3D points in the point cloud corresponding to the object, a point pair containing the two 3D points is constructed, and a point-pair vector is generated from the pose information of the two 3D points, so that the object corresponds to multiple point pairs.
  • the preset template point cloud is a pre-determined point cloud corresponding to a known object that is used as a matching benchmark.
  • The preset template point cloud also includes the pose information of each 3D point. Point pairs containing two 3D points are likewise constructed in it, and a point-pair vector is generated for each pair according to the pose information of its two 3D points, so the preset template point cloud also corresponds to multiple point pairs. Then, the point-pair vector of each point pair of the object is matched with the point-pair vector of each point pair in the preset template point cloud in the first matching to obtain the first matching result.
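The application states only that a point-pair vector is generated from the pose information of the two points and matched against template point pairs. A common concrete instance of this idea is the point pair feature (distance plus three angles computed from positions and normals) in the style of Drost et al.; the sketch below computes such features and looks up object point pairs against template point pairs through a quantized table. The feature definition and quantization steps are assumptions, and the voting stage that would turn raw pair correspondences into candidate pose transformations is omitted for brevity.

```python
import itertools
from collections import defaultdict

import numpy as np

def _angle(a, b):
    """Angle between two unit vectors."""
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

def pair_feature(p1, n1, p2, n2):
    """4D point-pair feature built from the two points' pose information
    (positions p1, p2 and unit normals n1, n2): distance plus three angles."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    d_unit = d / (dist + 1e-12)
    return np.array([dist, _angle(n1, d_unit), _angle(n2, d_unit), _angle(n1, n2)])

def quantize(f, dist_step=0.01, angle_step=np.deg2rad(12)):
    """Quantize a feature so that similar point pairs hash to the same key."""
    return (int(f[0] / dist_step),) + tuple(int(a / angle_step) for a in f[1:])

def build_pair_table(points, normals):
    """Hash every point pair of a (sub-sampled) template cloud by its feature."""
    table = defaultdict(list)
    for i, j in itertools.permutations(range(len(points)), 2):
        f = pair_feature(points[i], normals[i], points[j], normals[j])
        table[quantize(f)].append((i, j))
    return table

def first_matching(obj_pts, obj_nrm, template_table):
    """For each point pair of the object, return the template point pairs whose
    quantized feature matches (the raw correspondences behind the first matching)."""
    matches = []
    for i, j in itertools.permutations(range(len(obj_pts)), 2):
        key = quantize(pair_feature(obj_pts[i], obj_nrm[i], obj_pts[j], obj_nrm[j]))
        matches.append(((i, j), template_table.get(key, [])))
    return matches
```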
  • the pose information of each 3D point is pre-defined in each preset template point cloud.
  • Each preset template point cloud needs to be transformed into the current scene so that, after the pose transformation, the preset template point cloud overlaps as much as possible with the point cloud corresponding to the object in the current scene; the first matching result is obtained in this way, and it may include multiple pose transformation relationships of the matched preset template point cloud.
  • Step S204 Perform a second matching between the pose information of each 3D point in the point cloud corresponding to the object and the multiple pose transformation relationships of the matched preset template point cloud, and determine the pose information of the object according to the pose transformation relationship with the highest matching score in the second matching result.
  • The pose information of each 3D point in the point cloud corresponding to the object is matched with the multiple pose transformation relationships of the matched preset template point cloud for the second time.
  • A preset evaluation algorithm can be used to calculate the matching scores between the pose information of each 3D point in the point cloud corresponding to the object and the multiple pose transformation relationships of the matched preset template point cloud, so as to obtain the second matching result.
  • Those skilled in the art can select a preset evaluation algorithm according to actual needs, which is not limited here.
  • The preset evaluation algorithm may be, for example, an ICP (Iterative Closest Point) algorithm or a GMM (Gaussian Mixture Model) algorithm.
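As a sketch of the second matching, the snippet below scores each candidate pose transformation from the first matching result by running a few point-to-point ICP iterations against the object's point cloud and measuring the inlier fraction, then keeps the candidate with the highest score. The KD-tree correspondence search, inlier radius and iteration count are assumptions; a GMM-based score could stand in for the ICP-based one.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_score(template_pts, object_pts, T_init, iters=20, inlier_radius=0.005):
    """Refine one candidate 4x4 pose T_init with point-to-point ICP and return
    (refined_pose, matching_score), the score being the inlier fraction."""
    tree = cKDTree(object_pts)
    T = T_init.copy()
    src = (T[:3, :3] @ template_pts.T).T + T[:3, 3]
    for _ in range(iters):
        _, idx = tree.query(src)
        tgt = object_pts[idx]
        # Rigid alignment of the current source to its nearest neighbours (Kabsch/SVD).
        mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - mu_s).T @ (tgt - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:  # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = (R @ src.T).T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T
    dists, _ = tree.query(src)
    return T, float(np.mean(dists < inlier_radius))

def second_matching(template_pts, object_pts, candidate_poses):
    """Keep the candidate pose transformation with the highest matching score."""
    return max((icp_score(template_pts, object_pts, T0) for T0 in candidate_poses),
               key=lambda pose_score: pose_score[1])
```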
  • The second matching further optimizes and corrects the first matching result: the pose transformation relationship of the matched preset template point cloud with the highest matching score in the second matching result is taken as the final match, and the pose information of the object is determined according to this pose transformation relationship.
  • Step S205 according to the point cloud corresponding to each object, identify the stacking relationship between the objects.
  • Each 3D point in the point cloud corresponding to each object can be projected onto the plane where the camera is located to obtain the projection area of each object on that plane, and the distance between each object and the plane can be calculated according to the pose information of the 3D points in the point cloud corresponding to each object; it is then determined whether the projection areas of any two objects overlap. If the projection areas of two objects overlap, one object is lying on top of the other, so it is determined that a stacking relationship exists between the two objects, and the pressed object in the stacking relationship is determined according to the distances between the two objects and the plane.
  • Specifically, the object farther from the plane can be determined to be the pressed object, and the object closer to the plane is the object located above the pressed object; if the projection areas of two objects do not overlap, neither object is pressed on top of the other, and it is determined that there is no stacking relationship between the two objects.
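A minimal sketch of this overlap test, assuming the point clouds are expressed in the camera frame so that the camera plane is the Z = 0 plane (projection then simply drops the Z coordinate, and an object's distance to the plane is approximated by its mean depth); the grid resolution is an assumed parameter.

```python
import numpy as np

def footprint(points, grid=0.005):
    """Project an object's 3D points onto the camera plane (drop Z) and
    rasterize them into a set of occupied grid cells."""
    cells = np.floor(points[:, :2] / grid).astype(int)
    return set(map(tuple, cells))

def stacking_relationship(points_a, points_b, grid=0.005):
    """Return None if the projection areas do not overlap; otherwise return
    'a_pressed' or 'b_pressed' for the object lying underneath, i.e. the one
    farther from the camera plane (larger mean depth)."""
    if not footprint(points_a, grid) & footprint(points_b, grid):
        return None  # no overlap -> no stacking relationship
    dist_a, dist_b = points_a[:, 2].mean(), points_b[:, 2].mean()
    return "a_pressed" if dist_a > dist_b else "b_pressed"
```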
  • Step S206 Determine the target object from the objects according to the stacking relationship, convert the pose information of the target object into the robot coordinate system, and transmit the converted pose information of the target object to the robot.
  • Since a pressed object is not suitable as the current target object to be grasped, the pressed objects can be screened out from the objects according to the stacking relationship, and the object closest to the plane where the camera is located is selected from the remaining objects as the target object. Since the pose information of the target object is determined in the camera coordinate system, a preset conversion algorithm is used to convert it into the robot coordinate system so that the robot can locate the target object, and the converted pose information of the target object is then transmitted to the robot, so that the robot can perform the grasping operation on the target object according to it.
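Putting the screening and selection together, a sketch under the same assumptions (clouds in the camera frame, mean depth as the distance to the camera plane), reusing the stacking_relationship check from the previous sketch:

```python
import itertools

def choose_target(object_clouds, grid=0.005):
    """object_clouds: list of (N_i, 3) arrays in the camera frame, one per object.
    Screen out every object found to be pressed underneath another one, then pick
    the remaining object closest to the camera plane (smallest mean depth)."""
    pressed = set()
    for a, b in itertools.combinations(range(len(object_clouds)), 2):
        relation = stacking_relationship(object_clouds[a], object_clouds[b], grid)
        if relation == "a_pressed":
            pressed.add(a)
        elif relation == "b_pressed":
            pressed.add(b)
    candidates = [i for i in range(len(object_clouds)) if i not in pressed]
    if not candidates:
        return None
    return min(candidates, key=lambda i: object_clouds[i][:, 2].mean())
```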
  • With the solution provided in this embodiment, each object in the scene image is segmented precisely, and the point cloud corresponding to each object is matched with the preset template point cloud twice, so that the pose information of each object is determined accurately. Objects can also be screened effectively according to the stacking relationship between them: the pressed objects are screened out, and from the remaining objects the one closest to the plane where the camera is located is selected as the current target object to be grasped. The robot can therefore perform the grasping operation accurately and firmly according to the pose information of the final target object, which effectively improves object grasping accuracy, reduces robot grasping errors, and further optimizes the object grasping method.
  • Fig. 3 shows a structural block diagram of an object grasping device based on 3D matching according to an embodiment of the present application.
  • The device includes: an acquisition module 301, an instance segmentation module 302, an object point cloud determination module 303, a matching module 304, a stacking identification module 305 and a processing module 306.
  • the obtaining module 301 is adapted to: obtain a scene image of the current scene and a point cloud corresponding to the scene image.
  • the instance segmentation module 302 is adapted to: input the scene image into the trained deep learning segmentation model to perform instance segmentation processing, and obtain segmentation results of each object in the scene image.
  • the object point cloud determination module 303 is adapted to: determine the point cloud corresponding to each object according to the point cloud corresponding to the scene image and the segmentation result of each object.
  • the matching module 304 is adapted to: for each object, match the point cloud corresponding to the object with the preset template point cloud, and determine the pose information of the object.
  • the stacking identification module 305 is adapted to: identify the stacking relationship between the objects according to the point cloud corresponding to each object.
  • The processing module 306 is adapted to: determine the target object from the objects according to the stacking relationship, convert the pose information of the target object into the robot coordinate system, and transmit the converted pose information of the target object to the robot, so that the robot performs the grasping operation on the target object according to the converted pose information.
  • the segmentation result of each object includes: a binarized segmented image of each object.
  • The object point cloud determination module 303 is further adapted to: for each object, project the point cloud corresponding to the scene image into the binarized segmentation image of the object, take the 3D points whose projections fall within the object area of the binarized segmentation image as the 3D points corresponding to the object, and obtain the point cloud corresponding to the object.
  • The matching module 304 is further adapted to: perform the first matching between the pose information of each 3D point in the point cloud corresponding to the object and the pose information of each 3D point in the preset template point cloud to obtain the first matching result, where the first matching result includes multiple pose transformation relationships of the matched preset template point cloud; perform the second matching between the pose information of each 3D point in the point cloud corresponding to the object and the multiple pose transformation relationships of the matched preset template point cloud; and determine the pose information of the object according to the pose transformation relationship with the highest matching score in the second matching result.
  • The matching module 304 is further adapted to: for any two 3D points in the point cloud corresponding to the object, construct a point pair containing the two 3D points and generate a point-pair vector of the point pair; and match the point-pair vector of each point pair of the object with the point-pair vector of each point pair in the preset template point cloud in the first matching to obtain the first matching result.
  • The matching module 304 is further adapted to: use a preset evaluation algorithm to calculate the matching scores between the pose information of each 3D point in the point cloud corresponding to the object and the multiple pose transformation relationships of the matched preset template point cloud.
  • The stacking identification module 305 is further adapted to: project each 3D point in the point cloud corresponding to each object onto the plane where the camera is located to obtain the projection area corresponding to each object, and calculate the distance between each object and the plane; determine whether the projection areas corresponding to any two objects overlap; if so, determine that a stacking relationship exists between the two objects, and determine the pressed object in the stacking relationship according to the distances between the two objects and the plane; if not, determine that there is no stacking relationship between the two objects.
  • The processing module 306 is further adapted to: according to the stacking relationship, screen out the pressed objects from the objects, and select the object closest to the plane where the camera is located from the remaining objects as the target object.
  • With the apparatus provided in this embodiment, each object in the scene image is segmented precisely, and the point cloud corresponding to each object is matched with the preset template point cloud twice, so that the pose information of each object is determined accurately. Objects can also be screened effectively according to the stacking relationship between them: the pressed objects are screened out, and from the remaining objects the one closest to the plane where the camera is located is selected as the current target object to be grasped. The robot can therefore perform the grasping operation accurately and firmly according to the pose information of the final target object, which effectively improves object grasping accuracy, reduces robot grasping errors, and further optimizes the object grasping method.
  • the present application also provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the executable instruction can execute the 3D matching-based object grasping method in any of the above method embodiments.
  • FIG. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present application.
  • the specific embodiment of the present application does not limit the specific implementation of the computing device.
  • the computing device may include: a processor (processor) 402 , a communication interface (Communications Interface) 404 , a memory (memory) 406 , and a communication bus 408 .
  • the processor 402 , the communication interface 404 , and the memory 406 communicate with each other through the communication bus 408 .
  • the communication interface 404 is used for communicating with network elements of other devices such as clients or other servers.
  • the processor 402 is configured to execute the program 410, and specifically may execute the relevant steps in the above-mentioned embodiments of the method for grasping objects based on 3D matching.
  • the program 410 may include program code including computer operation instructions.
  • The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in the computing device may be the same type of processors, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 406 is used to store the program 410 .
  • Memory 406 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • the program 410 may specifically be used to cause the processor 402 to execute the 3D matching-based object grasping method in any of the above method embodiments.
  • For the specific implementation of the steps in the program 410, reference may be made to the corresponding descriptions of the corresponding steps and units in the above embodiments of 3D matching-based object grasping, which will not be repeated here.
  • Those skilled in the art can clearly understand that, for the convenience and brevity of description, for the specific working process of the above-described devices and modules, reference may be made to the corresponding process descriptions in the foregoing method embodiments, which will not be repeated here.
  • modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • The modules or units or components in the embodiments may be combined into one module or unit or component, and may further be divided into multiple sub-modules or sub-units or sub-assemblies. Unless at least some of such features and/or processes or elements are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
  • Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the embodiments of the present application.
  • the present application can also be implemented as an apparatus or apparatus program (eg, computer programs and computer program products) for performing part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from Internet sites, or provided on carrier signals, or in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Robotics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and apparatus for grasping an object based on 3D matching, and a computing device. The method comprises: obtaining a scene image and a point cloud corresponding to the scene image, and inputting the scene image into a deep learning segmentation model to perform instance segmentation processing so as to obtain segmentation results of the objects in the scene image (S101); determining, according to the point cloud corresponding to the scene image and the segmentation results of the objects, the point clouds corresponding to the objects (S102); for each object, matching the point cloud corresponding to the object with a preset template point cloud, and determining pose information of the object (S103); identifying a stacking relationship between the objects according to the point clouds corresponding to the objects (S104); and determining a target object from the objects according to the stacking relationship, and transmitting converted pose information of the target object to a robot (S105).
PCT/CN2021/138473 2021-02-26 2021-12-15 Procédé et appareil de saisie d'objet basé sur un alignement 3d et dispositif informatique WO2022179261A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110217397.XA CN112837371B (zh) 2021-02-26 2021-02-26 基于3d匹配的物体抓取方法、装置及计算设备
CN202110217397.X 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022179261A1 (fr)

Family

ID=75933692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138473 WO2022179261A1 (fr) 2021-02-26 2021-12-15 Procédé et appareil de saisie d'objet basé sur un alignement 3d et dispositif informatique

Country Status (2)

Country Link
CN (1) CN112837371B (fr)
WO (1) WO2022179261A1 (fr)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837371B (zh) * 2021-02-26 2024-05-24 梅卡曼德(北京)机器人科技有限公司 基于3d匹配的物体抓取方法、装置及计算设备
CN113284179B (zh) * 2021-05-26 2022-09-13 吉林大学 一种基于深度学习的机器人多物体分拣方法
CN113307042B (zh) * 2021-06-11 2023-01-03 梅卡曼德(北京)机器人科技有限公司 基于传送带的物体拆垛方法、装置、计算设备及存储介质
CN113351522B (zh) * 2021-06-11 2023-01-31 梅卡曼德(北京)机器人科技有限公司 物品分拣方法、装置及系统
CN113313803B (zh) * 2021-06-11 2024-04-19 梅卡曼德(北京)机器人科技有限公司 垛型分析方法、装置、计算设备及计算机存储介质
CN113511503B (zh) * 2021-06-17 2022-09-23 北京控制工程研究所 一种自主智能的地外探测不确定物体采集与归集装箱方法
CN114049444B (zh) * 2022-01-13 2022-04-15 深圳市其域创新科技有限公司 一种3d场景生成方法及装置
CN114049355B (zh) * 2022-01-14 2022-04-19 杭州灵西机器人智能科技有限公司 一种散乱工件的识别标注方法、系统和装置
CN114952809B (zh) * 2022-06-24 2023-08-01 中国科学院宁波材料技术与工程研究所 工件识别和位姿检测方法、系统及机械臂的抓取控制方法
CN115082559B (zh) * 2022-07-20 2022-11-01 广东工业大学 一种柔性件的多目标智能分拣方法、系统及存储介质
WO2024152235A1 (fr) * 2023-01-18 2024-07-25 中兴通讯股份有限公司 Procédé et dispositif de reconnaissance de position et de pose d'objet cible, procédé et système de fonctionnement d'objet cible, et support lisible par ordinateur
CN117104831A (zh) * 2023-09-01 2023-11-24 中信戴卡股份有限公司 转向节工件的机器人3d识别及加工方法与系统
CN117115228A (zh) * 2023-10-23 2023-11-24 广东工业大学 Sop芯片管脚共面度检测方法及装置
CN117724610B (zh) * 2023-12-13 2024-09-20 广东聚华新型显示研究院 用于头显设备的数据处理方法、装置、头戴设备和介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272411A1 (en) * 2018-03-05 2019-09-05 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Object recognition
CN109927036A (zh) * 2019-04-08 2019-06-25 青岛小优智能科技有限公司 一种三维视觉引导机械手抓取的方法及系统
CN111091062A (zh) * 2019-11-21 2020-05-01 东南大学 一种基于3d视觉聚类和匹配的机器人乱序目标分拣方法
CN110992427A (zh) * 2019-12-19 2020-04-10 深圳市华汉伟业科技有限公司 一种形变物体的三维位姿估计方法及定位抓取系统
CN111563442A (zh) * 2020-04-29 2020-08-21 上海交通大学 基于激光雷达的点云和相机图像数据融合的slam方法及系统
CN112837371A (zh) * 2021-02-26 2021-05-25 梅卡曼德(北京)机器人科技有限公司 基于3d匹配的物体抓取方法、装置及计算设备

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115533902A (zh) * 2022-09-29 2022-12-30 杭州海康机器人股份有限公司 一种基于视觉引导的拆垛方法、装置、电子设备及系统
CN115582827A (zh) * 2022-10-20 2023-01-10 大连理工大学 一种基于2d和3d视觉定位的卸货机器人抓取方法
CN115582840A (zh) * 2022-11-14 2023-01-10 湖南视比特机器人有限公司 无边框钢板工件分拣抓取位姿计算方法、分拣方法及系统
CN118247781A (zh) * 2024-01-31 2024-06-25 九众九机器人有限公司 一种基于深度学习的工业机器人目标识别方法和系统
CN118305809A (zh) * 2024-06-07 2024-07-09 机科发展科技股份有限公司 使用机械臂进行工件抓取的方法、装置、设备和介质
CN118314531A (zh) * 2024-06-07 2024-07-09 浙江聿力科技有限公司 政务服务行为位姿监测管理方法及系统

Also Published As

Publication number Publication date
CN112837371B (zh) 2024-05-24
CN112837371A (zh) 2021-05-25

Similar Documents

Publication Publication Date Title
WO2022179261A1 (fr) Procédé et appareil de saisie d'objet basé sur un alignement 3d et dispositif informatique
WO2020119338A1 (fr) Procédé de détection de la position de préhension d'un robot concernant un objet cible
WO2021143231A1 (fr) Procédé d'apprentissage de modèle de détection cible, ainsi que procédé et appareil de marquage de données
CN111723782A (zh) 基于深度学习的视觉机器人抓取方法及系统
CN112802105A (zh) 对象抓取方法及装置
CN115319739B (zh) 一种基于视觉机械臂抓取工件方法
WO2023092519A1 (fr) Procédé et appareil de commande de préhension, dispositif électronique et support d'enregistrement
CN113284178B (zh) 物体码垛方法、装置、计算设备及计算机存储介质
CN113762159B (zh) 一种基于有向箭头模型的目标抓取检测方法及系统
KR20220089463A (ko) 피킹 로봇을 위한 비젼 분석 장치
CN114310892B (zh) 基于点云数据碰撞检测的物体抓取方法、装置和设备
CN116228854A (zh) 一种基于深度学习的包裹自动分拣方法
CN115284279A (zh) 一种基于混叠工件的机械臂抓取方法、装置及可读介质
Chowdhury et al. Neural Network-Based Pose Estimation Approaches for Mobile Manipulation
Arents et al. Construction of a smart vision-guided robot system for manipulation in a dynamic environment
Wang et al. GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter
Ge et al. Pixel-Level Collision-Free Grasp Prediction Network for Medical Test Tube Sorting on Cluttered Trays
CN113592800B (zh) 基于动态扫描参数的图像扫描方法及装置
CN113284129B (zh) 基于3d包围盒的压箱检测方法及装置
Li et al. RoboCloud: augmenting robotic visions for open environment modeling using Internet knowledge
WO2023082417A1 (fr) Procédé et appareil d'obtention d'informations de point de préhension, dispositif électronique et support de stockage
CN114972495A (zh) 针对纯平面结构的物体的抓取方法、装置及计算设备
Kijdech et al. Pick-and-place application using a dual arm collaborative robot and an RGB-D camera with YOLOv5
CN111470244B (zh) 机器人系统的控制方法以及控制装置
CN112837370A (zh) 基于3d包围盒的物体堆叠判断方法、装置及计算设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927685

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.01.2024)

122 Ep: pct application non-entry in european phase

Ref document number: 21927685

Country of ref document: EP

Kind code of ref document: A1