WO2022116423A1 - Object posture estimation method and apparatus, and electronic device and computer storage medium - Google Patents

Object posture estimation method and apparatus, and electronic device and computer storage medium Download PDF

Info

Publication number
WO2022116423A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
loss value
point set
point
visibility
Prior art date
Application number
PCT/CN2021/083083
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
李泽远
朱星华
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022116423A1 publication Critical patent/WO2022116423A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to an object pose estimation method, apparatus, electronic device, and computer-readable storage medium.
  • the grasping and sorting tasks of industrial robotic arms mainly rely on the pose estimation of the objects to be grasped.
  • the pose estimation methods of objects mainly use point-by-point teaching or 2D visual perception methods.
  • the point-by-point teaching method is complex and time-consuming, and the 2D visual perception method will lead to inaccurate pose estimation of objects due to the cluttered placement of objects and the occlusion between objects.
  • An object pose estimation method provided by this application includes:
  • Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
  • Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object
  • the pose of the target object is calculated.
  • the present application also provides a device for estimating the pose of a target object, the device comprising:
  • a three-dimensional point cloud acquisition module configured to obtain a scene depth map of a target object by using a preset camera device, and calculate a three-dimensional point cloud of the scene depth map according to the pixel points in the scene depth map;
  • a target object point set extraction module used for extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set
  • a visibility loss value calculation module configured to calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
  • a key point loss value calculation module configured to perform Hough voting on the target object point set to obtain a key point set, and calculate the key point loss value of the target object according to the key point set;
  • a semantic loss value calculation module configured to perform semantic segmentation on the pixels of the scene depth map to obtain the semantic loss value of the target object
  • the pose calculation module is configured to calculate the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and the multi-task joint model obtained by pre-training.
  • the present application also provides an electronic device, the electronic device comprising:
  • the processor executes the computer program stored in the memory to implement the method for estimating the pose of an object as described below:
  • Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
  • Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object
  • the pose of the target object is calculated.
  • the present application also provides a computer-readable storage medium, including a storage data area and a storage program area, the storage data area stores created data, and the storage program area stores a computer program; wherein the computer program, when executed by a processor, implements the object pose estimation method described below:
  • Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
  • Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object
  • the pose of the target object is calculated.
  • FIG. 1 is a schematic flowchart of an object pose estimation method provided by an embodiment of the present application
  • FIG. 2 is a schematic block diagram of an object pose estimation apparatus provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an internal structure of an electronic device for implementing an object pose estimation method provided by an embodiment of the present application;
  • the embodiments of the present application provide a method for estimating the pose of an object.
  • the execution subject of the object pose estimation method includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server, a terminal, and the like.
  • the object pose estimation method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
  • the object pose estimation method includes:
  • S1 Use a preset camera device to acquire a scene depth map of a target object, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map.
  • the camera device may be a 3D camera
  • the target object may be a target object to be grasped by a manipulator.
  • the scene depth image is also called a range image, and refers to an image in which the distance (depth) from the camera device to each point of the target object is taken as the pixel value.
  • the scene depth map can be calculated as point cloud data after coordinate transformation.
  • the scene depth map may be stored in a blockchain node.
  • the 3D point cloud of the scene depth map can be calculated according to the pixel points in the scene depth map through the following formula:
  • x, y, z are the coordinates of the point in the three-dimensional point cloud
  • u, v are the row and column where the pixel point is located in the scene depth map
  • c_x and c_y are the two-dimensional coordinates of the pixel point in the scene depth map, and f_x, f_y, d are the focal lengths of the camera device on the x-axis, the y-axis, and the z-axis, respectively.
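For illustration, a minimal NumPy sketch of this back-projection is given below. The patent's exact formula appears only as an image, so the sketch assumes the standard pinhole relation x = (col - c_x)·d/f_x, y = (row - c_y)·d/f_y, z = d for each pixel with depth value d; the function name and the validity filter are illustrative.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    # Back-project a depth map (H x W) into an (H*W) x 3 point cloud using a
    # standard pinhole model (assumed): z = d, x = (col - cx)*z/fx, y = (row - cy)*z/fy.
    rows, cols = np.indices(depth.shape)
    z = depth.astype(np.float64)
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with a valid depth reading
```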
  • the three-dimensional point cloud is the three-dimensional point cloud of the scene depth map of the target object to be grasped by the manipulator. Since there are many objects in the scene of the target object to be grasped, it is necessary to extract target points from the three-dimensional point cloud to obtain a target object point set.
  • the pre-built deep learning network is a convolutional neural network including a convolution layer, a pooling layer, and a fully connected layer.
  • the convolution layer uses a preset function to perform feature extraction on the three-dimensional point cloud, and the pooling layer compresses the data obtained by feature extraction, simplifies the computational complexity, and extracts main feature data.
  • the fully connected layer is:
  • the feature point set is obtained by concatenating all the data obtained by feature extraction.
  • the deep learning network further includes a classifier. Specifically, the classifier learns classification rules for the given categories from known training data, and then classifies the feature point set to obtain the target object point set and the non-target object point set.
  • a deep learning network to extract target points in the three-dimensional point cloud to obtain a target object point set, including:
  • the feature point set is classified into a target point set and a non-target point set by using the classifier in the deep learning network, and the target point set is extracted to obtain a target object point set.
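The patent does not specify layer sizes or the exact architecture; the PyTorch-style sketch below only illustrates the described pipeline (per-point convolutional feature extraction, pooling that compresses the features, fully connected fusion, and a classifier separating target from non-target points). All dimensions and the class labeling are assumptions.

```python
import torch
import torch.nn as nn

class TargetPointExtractor(nn.Module):
    # Sketch of the described network: shared 1x1 convolutions extract per-point
    # features, pooling compresses them into a global descriptor, fully connected
    # layers fuse the features, and a classifier labels each point.
    def __init__(self, num_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, xyz):                                   # xyz: (B, N, 3) point cloud
        feat = self.conv(xyz.transpose(1, 2))                 # (B, 128, N) per-point features
        pooled = feat.max(dim=2, keepdim=True).values         # (B, 128, 1) pooled feature
        fused = torch.cat([feat, pooled.expand_as(feat)], dim=1)  # (B, 256, N)
        return self.classifier(self.fc(fused.transpose(1, 2)))    # (B, N, num_classes) logits

# Usage sketch: keep the points classified as "target" to form the target object point set.
# logits = model(points); target_set = points[0][logits[0].argmax(-1) == 1]
```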
  • visibility is the degree to which a target object can be seen. Some objects are occluded by other objects, which reduces their visibility and produces a visibility loss. Heavily occluded objects are not the objects that the robotic arm grasps first, because they are most likely at the bottom of the pile and do not provide enough information for pose estimation. To reduce the interference caused by these objects, the embodiment of this application calculates a visibility loss value for each object.
  • the visibility loss value of the target object is obtained by weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
  • N_i represents the number of points in the target object point set of target object i
  • N_max represents the number of points of the largest object point set contained in the 3D point cloud
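A minimal sketch of this computation follows. The exact loss form appears only in the patent's formula images, so the use of an absolute difference and a single weight are assumptions.

```python
def visibility_loss(n_points_i, n_points_max, predicted_visibility, weight=1.0):
    # Actual visibility: ratio of the object's point count to the largest object
    # point set in the cloud; loss: weighted gap between actual and predicted
    # visibility (absolute difference assumed).
    actual_visibility = n_points_i / n_points_max
    return weight * abs(actual_visibility - predicted_visibility)
```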
  • performing Hough voting on the target object point set to obtain a key point set including:
  • the target object sampling point set is obtained by sampling the target object point set, and the Euclidean distance offset of the target object sampling point is calculated to obtain the offset;
  • Voting is performed according to the offset, and the set of points whose votes exceed the preset threshold is used as the key point set.
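As a rough illustration of the voting step, the sketch below accumulates votes on a discretised grid. The patent only describes sampling, Euclidean offsets, voting, and a vote-count threshold, so the grid resolution, the vote target (point plus offset), and the returned cell centers are assumptions.

```python
import numpy as np

def hough_vote_keypoints(points, offsets, votes_threshold, grid=0.01):
    # points: (S, 3) sampled object points; offsets: (S, 3) predicted Euclidean
    # offsets from each sampled point toward a key point. Each sampled point
    # casts a vote at points + offsets; grid cells gathering more votes than
    # the threshold are returned as the key point set.
    targets = points + offsets
    cells = np.round(targets / grid).astype(int)           # discretise the vote space
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    return uniq[counts > votes_threshold] * grid           # approximate key point locations
```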
  • the key point set is divided into a common key point set and a central key point, and the key point loss value L_kps of the key point set is calculated with a point-wise feature regression algorithm using the following formula:
  • L_kp represents the loss of the common key points
  • N is the number of points in the target object point set
  • M is the number of common key points
  • L_c represents the loss of the center key point
  • Δx_i is the actual offset from a common key point to the center key point
  • γ_1 is the weight of the common key point loss
  • γ_2 is the weight of the center key point loss.
  • in the semantic segmentation, the semantic loss L_s of the target object is calculated from the pixel points of the scene depth map using the formula L_s = -α(1 - q_i)^γ log(q_i), where:
  • α represents the balance parameter of the camera device
  • γ represents the focus parameter of the camera device
  • q_i represents the confidence that the i-th pixel in the scene depth map belongs to the foreground point or the background point.
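This is the focal-loss style formula stated in the description; a minimal sketch follows. The default α and γ values are common focal-loss choices, not values taken from the patent.

```python
import torch

def semantic_loss(q, alpha=0.25, gamma=2.0):
    # q: tensor of per-pixel confidences that each pixel is assigned to its true
    # class (foreground or background); defaults for alpha / gamma are assumed.
    q = q.clamp(min=1e-7)                                    # avoid log(0)
    return (-alpha * (1.0 - q) ** gamma * torch.log(q)).mean()
```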
  • the pose of the target object refers to a six-dimensional quantity composed of a three-dimensional rotation matrix and a three-dimensional translation matrix.
  • L_kps represents the key point loss value
  • L_s represents the semantic loss value
  • L_v represents the visibility loss value
  • μ_1, μ_2, μ_3 represent the weights obtained by training the multi-task joint model.
  • the embodiment of the present application acquires a scene depth map of the target object, calculates the three-dimensional point cloud of the scene depth map, uses a deep learning network to extract the target object point set from the three-dimensional point cloud, calculates the visibility loss value, key point loss value, and semantic loss value of the target object from the three-dimensional point cloud and the target object point set, and finally obtains the pose of the target object from the visibility loss value, key point loss value, and semantic loss value.
  • the object pose estimation method proposed in the embodiment of the present application performs pose estimation on the target object according to the loss of visibility, key points, and semantics, and therefore, the accuracy of the object pose estimation can be improved.
  • FIG. 2 is a schematic block diagram of the object pose estimation apparatus of the present application.
  • the object pose estimation apparatus 100 described in this application may be installed in an electronic device.
  • the object pose estimation apparatus may include a three-dimensional point cloud acquisition module 101, a target object point set extraction module 102, a visibility loss value calculation module 103, a key point loss value calculation module 104, a semantic loss value calculation module 105, and a pose calculation module 106.
  • the modules described in this application may also be referred to as units, which refer to a series of computer program segments that can be executed by the processor of an electronic device and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the three-dimensional point cloud acquiring module 101 is configured to acquire a scene depth map of a target object by using a preset camera device, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map.
  • the camera device may be a 3D camera
  • the target object may be a target object to be grasped by a manipulator.
  • the scene depth image, also called a range image, refers to an image in which the distance (depth) from the camera device to each point of the target object is taken as the pixel value.
  • the scene depth map can be calculated as point cloud data after coordinate transformation.
  • the 3D point cloud of the scene depth map can be calculated according to the pixel points in the scene depth map through the following formula:
  • x, y, z are the coordinates of the point in the three-dimensional point cloud
  • u, v are the row and column where the pixel point is located in the scene depth map
  • c_x and c_y are the two-dimensional coordinates of the pixel point in the scene depth map, and f_x, f_y, d are the focal lengths of the camera device on the x-axis, the y-axis, and the z-axis, respectively.
  • the target object point set extraction module 102 uses a pre-built deep learning network to extract target points in the three-dimensional point cloud to obtain a target object point set.
  • the three-dimensional point cloud is the three-dimensional point cloud of the scene depth map of the target object to be grasped by the manipulator. Since there are many objects in the scene of the target object to be grasped, it is necessary to extract target points from the three-dimensional point cloud to obtain a target object point set.
  • the pre-built deep learning network is a convolutional neural network, including a convolution layer, a pooling layer, and a fully connected layer.
  • the convolution layer uses a preset function to perform feature extraction on the three-dimensional point cloud, and the pooling layer compresses the data obtained by feature extraction, simplifies the computational complexity, and extracts main feature data.
  • the fully connected layer is:
  • the feature point set is obtained by concatenating all the data obtained by feature extraction.
  • the deep learning network further includes a classifier. Specifically, the classifier learns classification rules for the given categories from known training data, and then classifies the feature point set to obtain the target object point set and the non-target object point set.
  • the target object point set extraction module 102 is specifically used for:
  • the feature point set is classified into a target point set and a non-target object point set by using the classifier in the deep learning network, and the target object point set is extracted.
  • the visibility loss value calculation module 103 is configured to calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set.
  • visibility is the degree to which a target object can be seen. Some objects are occluded by other objects, which reduces their visibility and produces a visibility loss. Heavily occluded objects are not the objects that the robotic arm grasps first, because they are most likely at the bottom of the pile and do not provide enough information for pose estimation. To reduce the interference caused by these objects, the embodiment of this application calculates a visibility loss value for each object.
  • the visibility loss value calculation module 103 is specifically used for:
  • the visibility loss value of the target object is obtained by weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
  • N_i represents the number of points in the target object point set of target object i
  • N_max represents the number of points of the largest object point set contained in the 3D point cloud
  • the key point loss value calculation module 104 is configured to perform Hough voting on the target object point set to obtain a key point set, and calculate the key point loss value of the target object according to the key point set.
  • performing Hough voting on the target object point set to obtain a key point set including:
  • the target object sampling point set is obtained by sampling the target object point set, and the Euclidean distance offset of the target object sampling point is calculated to obtain the offset;
  • Voting is performed according to the offset, and the set of points whose votes exceed the preset threshold is used as the key point set.
  • the key point set is divided into a common key point set and a central key point, and the key point loss value L_kps of the key point set is calculated with a point-wise feature regression algorithm using the following formula:
  • L_kp represents the loss of the common key points
  • N is the number of points in the target object point set
  • M is the number of common key points
  • L_c represents the loss of the center key point
  • Δx_i is the actual offset from a common key point to the center key point
  • γ_1 is the weight of the common key point loss
  • γ_2 is the weight of the center key point loss.
  • the semantic loss value calculation module 105 is configured to perform semantic segmentation on the pixels of the scene depth map to obtain the semantic loss value of the target object.
  • in the semantic segmentation, the semantic loss L_s of the target object is calculated from the pixel points of the scene depth map using the formula L_s = -α(1 - q_i)^γ log(q_i), where:
  • α represents the balance parameter of the camera device
  • γ represents the focus parameter of the camera device
  • q_i represents the confidence that the i-th pixel in the scene depth map belongs to the foreground point or the background point.
  • the pose calculation module 106 is configured to calculate the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and the multi-task joint model obtained by pre-training.
  • the pose of the target object refers to a six-dimensional quantity composed of a three-dimensional rotation matrix and a three-dimensional translation matrix.
  • the pose calculation module 106 uses the following multi-task joint model to calculate the final loss value L_mt of the target object:
  • L_kps represents the key point loss value
  • L_s represents the semantic loss value
  • L_v represents the visibility loss value
  • μ_1, μ_2, μ_3 represent the weights obtained by training the multi-task joint model.
  • the embodiment of the present application further adjusts the predicted rotation matrix and the predicted translation matrix of the target object according to the final loss value to obtain the pose of the target object.
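As an illustration of how the final loss can drive this adjustment, the sketch below performs one gradient step on L_mt against the branch that predicts the pose; the predict_pose callable and the optimizer are hypothetical stand-ins, since the patent does not detail the adjustment procedure.

```python
def refine_pose_step(predict_pose, optimizer, l_kps, l_s, l_v, mu=(1.0, 1.0, 1.0)):
    # One illustrative optimisation step: combine the three losses into L_mt and
    # back-propagate it so that the branch predicting the rotation matrix R and
    # the translation matrix t moves toward the true 6D pose.
    l_mt = mu[0] * l_kps + mu[1] * l_s + mu[2] * l_v
    optimizer.zero_grad()
    l_mt.backward()
    optimizer.step()
    return predict_pose()  # re-evaluate to obtain the adjusted (R, t)
```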
  • the pose calculation module 106 sends the pose of the target object to a pre-built robotic arm, and uses the robotic arm to perform the target object grasping task.
  • FIG. 3 is a schematic structural diagram of an electronic device implementing the object pose estimation method of the present application.
  • the electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an object pose estimation program 12.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium may be volatile or non-volatile.
  • the readable storage medium includes a flash memory, a mobile hard disk, a multimedia card, a card-type memory (eg, SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a removable hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable removable hard disk, a smart media card (SMC), or a secure digital (SD) card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can not only be used to store application software installed in the electronic device 1 and various types of data, such as the code of the object pose estimation program 12, etc., but also can be used to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like.
  • the processor 10 is the control unit of the electronic device; it connects the various components of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the object pose estimation program) and by calling the data stored in the memory 11.
  • the bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 3 only shows an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the object pose estimation program 12 stored in the memory 11 in the electronic device 1 is a combination of multiple computer programs, and when running in the processor 10, it can realize:
  • Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
  • Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object
  • the pose of the target object is calculated.
  • the modules/units integrated in the electronic device 1 may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be volatile or non-volatile.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), and the like.
  • the computer-usable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, and the like, and the stored data area may store the created data, and the like.
  • modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a series of data blocks associated with one another using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A target object posture estimation method, comprising: obtaining a three-dimensional point cloud according to a scene depth map of a target object; extracting a target object point set from the three-dimensional point cloud; calculating a visibility loss value of the target object according to the three-dimensional point cloud and the target object point set; calculating a key point loss value of the target object by means of performing Hough voting on the target object point set; performing semantic segmentation on pixel points of the scene depth map, so as to obtain a semantic loss value of the target object; and calculating a posture of the target object according to the visibility loss value, the key point loss value, the semantic loss value and a multi-task joint model. Further provided are a target object posture estimation apparatus, a device and a storage medium. The method further relates to blockchain technology, and a scene depth map can be stored in a blockchain node. By means of the method, the posture of a target object to be grabbed can be accurately analyzed, thereby improving the grabbing precision of a mechanical arm.

Description

Object pose estimation method, apparatus, electronic device and computer storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on December 01, 2020, with application number 202011385260.7 and the invention title "Object pose estimation method, apparatus, electronic device and computer storage medium", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular, to an object pose estimation method, apparatus, electronic device, and computer-readable storage medium.
Background Art
The inventors realized that, with the continuous development of robotic arms in the industrial field and the in-depth application of intelligent vision systems, robotic arms equipped with intelligent vision systems have begun to undertake complex tasks such as intelligent sorting and flexible manufacturing, becoming industrial machinery that saves human resources.
The grasping and sorting tasks of industrial robotic arms mainly rely on pose estimation of the objects to be grasped. At present, object pose estimation mainly uses point-by-point teaching or 2D visual perception methods. However, in an industrial environment, the point-by-point teaching method is complex and time-consuming, and the 2D visual perception method leads to inaccurate object pose estimation because of the cluttered placement of objects and the occlusion between objects.
Summary of the Invention
An object pose estimation method provided by this application includes:
acquiring a scene depth map of a target object by using a preset camera device, and calculating a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
calculating a visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
performing Hough voting on the target object point set to obtain a key point set, and calculating a key point loss value of the target object according to the key point set;
performing semantic segmentation on the pixels of the scene depth map to obtain a semantic loss value of the target object;
calculating the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and a multi-task joint model obtained by pre-training.
The present application also provides an apparatus for estimating the pose of a target object, the apparatus comprising:
a three-dimensional point cloud acquisition module, configured to acquire a scene depth map of a target object by using a preset camera device, and to calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
a target object point set extraction module, configured to extract target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
a visibility loss value calculation module, configured to calculate a visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
a key point loss value calculation module, configured to perform Hough voting on the target object point set to obtain a key point set, and to calculate a key point loss value of the target object according to the key point set;
a semantic loss value calculation module, configured to perform semantic segmentation on the pixels of the scene depth map to obtain a semantic loss value of the target object;
a pose calculation module, configured to calculate the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and a multi-task joint model obtained by pre-training.
The present application also provides an electronic device, the electronic device comprising:
a memory storing at least one computer program; and
a processor that executes the computer program stored in the memory to implement the following object pose estimation method:
acquiring a scene depth map of a target object by using a preset camera device, and calculating a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
calculating a visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
performing Hough voting on the target object point set to obtain a key point set, and calculating a key point loss value of the target object according to the key point set;
performing semantic segmentation on the pixels of the scene depth map to obtain a semantic loss value of the target object;
calculating the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and a multi-task joint model obtained by pre-training.
The present application also provides a computer-readable storage medium including a storage data area and a storage program area, wherein the storage data area stores created data and the storage program area stores a computer program, and wherein the computer program, when executed by a processor, implements the following object pose estimation method:
acquiring a scene depth map of a target object by using a preset camera device, and calculating a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
calculating a visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
performing Hough voting on the target object point set to obtain a key point set, and calculating a key point loss value of the target object according to the key point set;
performing semantic segmentation on the pixels of the scene depth map to obtain a semantic loss value of the target object;
calculating the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and a multi-task joint model obtained by pre-training.
Description of the Drawings
FIG. 1 is a schematic flowchart of an object pose estimation method provided by an embodiment of the present application;
FIG. 2 is a schematic block diagram of an object pose estimation apparatus provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing an object pose estimation method provided by an embodiment of the present application.
The realization of the objectives, functional characteristics, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description of Embodiments
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
The embodiments of the present application provide an object pose estimation method. The execution subject of the object pose estimation method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the object pose estimation method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to FIG. 1, which is a schematic flowchart of an object pose estimation method provided by an embodiment of the present application, in this embodiment the object pose estimation method includes:
S1. Acquire a scene depth map of a target object by using a preset camera device, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map.
In this embodiment of the present application, the camera device may be a 3D camera, and the target object may be a target object to be grasped by a manipulator. The scene depth image, also called a range image, refers to an image in which the distance (depth) from the camera device to each point of the target object is taken as the pixel value. The scene depth map can be converted into point cloud data after coordinate transformation.
In one embodiment of the present application, the scene depth map may be stored in a blockchain node.
In detail, in this embodiment of the present application, the three-dimensional point cloud of the scene depth map can be calculated from the pixels in the scene depth map using the following formula:
(point cloud conversion formula, image PCTCN2021083083-appb-000001)
where x, y, z are the coordinates of a point in the three-dimensional point cloud, u and v are the row and column of the pixel in the scene depth map, c_x and c_y are the two-dimensional coordinates of the pixel in the scene depth map, and f_x, f_y, d are the focal lengths of the camera device on the x-axis, the y-axis, and the z-axis, respectively.
S2. Use a pre-built deep learning network to extract target points in the three-dimensional point cloud to obtain a target object point set.
As described above, the three-dimensional point cloud is the three-dimensional point cloud of the scene depth map of the target object to be grasped by the manipulator. Since many objects exist in the scene of the target object to be grasped, target points need to be extracted from the three-dimensional point cloud to obtain a target object point set.
In this embodiment of the present application, the pre-built deep learning network is a convolutional neural network including a convolution layer, a pooling layer, and a fully connected layer. The convolution layer uses a preset function to perform feature extraction on the three-dimensional point cloud, the pooling layer compresses the data obtained by feature extraction to reduce computational complexity and extract the main feature data, and the fully connected layer concatenates all the data obtained by feature extraction to obtain a feature point set. Further, in this embodiment of the present application, the deep learning network also includes a classifier. In detail, the classifier learns classification rules for the given categories from known training data, and then classifies the feature point set to obtain the target object point set and the non-target object point set.
In detail, using the deep learning network to extract the target points in the three-dimensional point cloud to obtain the target object point set includes:
extracting a feature point set of the three-dimensional point cloud by using the convolution, pooling, and fully connected layers in the pre-built deep learning network;
classifying the feature point set into a target point set and a non-target point set by using the classifier in the deep learning network, and extracting the target point set to obtain the target object point set.
S3. Calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set.
It can be understood that visibility is the degree to which a target object can be seen. Some objects are occluded by other objects, which reduces their visibility and produces a visibility loss. Heavily occluded objects are not the objects that the robotic arm grasps first, because they are most likely at the bottom of the pile and do not provide enough information for pose estimation. To reduce the interference caused by these objects, the embodiment of this application calculates a visibility loss value for each object.
One embodiment of the present application may calculate the visibility loss value of the target object by the following method:
calculating the actual visibility of the target object according to the ratio of the number of points in the target object point set to the number of points in the largest point set among all objects contained in the three-dimensional point cloud;
obtaining the visibility loss value of the target object by a weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
That is:
(visibility loss formulas, images PCTCN2021083083-appb-000002 and PCTCN2021083083-appb-000003)
where N_i denotes the number of points in the target object point set of target object i, N_max denotes the number of points of the largest object point set contained in the three-dimensional point cloud, and the symbol shown in image PCTCN2021083083-appb-000004 denotes the predicted visibility of target object i, that is, the maximum visibility of target object i without any occlusion.
S4. Perform Hough voting on the target object point set to obtain a key point set, and calculate the key point loss value of the key point set.
In detail, performing Hough voting on the target object point set to obtain a key point set includes:
sampling the target object point set to obtain a target object sampling point set, and calculating the Euclidean distance offsets of the target object sampling points to obtain offsets;
voting according to the offsets, and taking the set of points whose number of votes exceeds a preset threshold as the key point set.
Further, based on the property that there is one and only one central key point and that it is not affected by occlusion, the embodiment of the present application divides the key point set into a common key point set and a central key point, and calculates the key point loss value L_kps of the key point set with a point-wise feature regression algorithm using the following formulas:
(common key point loss L_kp and center key point loss L_c formulas, images PCTCN2021083083-appb-000005 and PCTCN2021083083-appb-000006)
L_kps = γ_1 L_kp + γ_2 L_c
where L_kp represents the loss of the common key points, N is the number of points in the target object point set, M is the number of common key points, the symbols shown in images PCTCN2021083083-appb-000007 and PCTCN2021083083-appb-000008 represent the actual position offset of the target object point set and its predicted position offset, respectively, L_c represents the loss of the center key point, Δx_i is the actual offset from a common key point to the center key point, the symbol shown in image PCTCN2021083083-appb-000009 is the predicted offset from a common key point to the center key point, γ_1 is the weight of the common key point loss, and γ_2 is the weight of the center key point loss.
S5. Perform semantic segmentation on the pixels of the scene depth map to obtain a semantic loss value.
In detail, in the semantic segmentation, the semantic loss L_s of the target object is calculated from the pixels of the scene depth map using the following formula:
L_s = -α(1 - q_i)^γ log(q_i)
where α represents the balance parameter of the camera device, γ represents the focus parameter of the camera device, and q_i represents the confidence that the i-th pixel in the scene depth map belongs to the foreground point or the background point.
S6. Calculate the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and a multi-task joint model obtained by pre-training.
In detail, in this embodiment of the present application, the pose of the target object refers to a six-dimensional quantity composed of a three-dimensional rotation matrix and a three-dimensional translation matrix.
The embodiment of the present application uses the following multi-task joint model to calculate the final loss value L_mt of the target object:
L_mt = μ_1 L_kps + μ_2 L_s + μ_3 L_v
where L_kps represents the key point loss value, L_s represents the semantic loss value, L_v represents the visibility loss value, and μ_1, μ_2, μ_3 represent the weights obtained by training the multi-task joint model.
The predicted rotation matrix and the predicted translation matrix of the target object are adjusted according to the final loss value to obtain the pose of the target object.
The embodiment of the present application acquires a scene depth map of the target object, calculates the three-dimensional point cloud of the scene depth map, uses a deep learning network to extract the target object point set from the three-dimensional point cloud, calculates the visibility loss value, the key point loss value, and the semantic loss value of the target object from the three-dimensional point cloud and the target object point set, and finally obtains the pose of the target object from these loss values. Because the object pose estimation method proposed in the embodiment of the present application estimates the pose of the target object from the visibility, key point, and semantic losses, the accuracy of object pose estimation can be improved.
FIG. 2 is a schematic block diagram of the object pose estimation apparatus of the present application.
The object pose estimation apparatus 100 described in this application may be installed in an electronic device. According to the implemented functions, the object pose estimation apparatus may include a three-dimensional point cloud acquisition module 101, a target object point set extraction module 102, a visibility loss value calculation module 103, a key point loss value calculation module 104, a semantic loss value calculation module 105, and a pose calculation module 106. The modules described in this application may also be referred to as units, which are a series of computer program segments that can be executed by the processor of an electronic device, can perform fixed functions, and are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The three-dimensional point cloud acquisition module 101 is configured to acquire a scene depth map of a target object by using a preset camera device, and to calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map.
In this embodiment of the present application, the camera device may be a 3D camera, and the target object may be a target object to be grasped by a manipulator. The scene depth image, also called a range image, refers to an image in which the distance (depth) from the camera device to each point of the target object is taken as the pixel value. The scene depth map can be converted into point cloud data after coordinate transformation. In detail, in this embodiment of the present application, the three-dimensional point cloud of the scene depth map can be calculated from the pixels in the scene depth map using the following formula:
(point cloud conversion formula, image PCTCN2021083083-appb-000010)
where x, y, z are the coordinates of a point in the three-dimensional point cloud, u and v are the row and column of the pixel in the scene depth map, c_x and c_y are the two-dimensional coordinates of the pixel in the scene depth map, and f_x, f_y, d are the focal lengths of the camera device on the x-axis, the y-axis, and the z-axis, respectively.
所述目标物体点集提取模块102,利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集。The target object point set extraction module 102 uses a pre-built deep learning network to extract target points in the three-dimensional point cloud to obtain a target object point set.
如上述描述可知,所述三维点云是机械手待抓取的目标物体的场景深度图的三维点云。由于所述待抓取的目标物体的场景中会存在很多物体,因此,需要从所述三维点云中提取目标点,得到目标物体点集。As can be seen from the above description, the three-dimensional point cloud is the three-dimensional point cloud of the scene depth map of the target object to be grasped by the manipulator. Since there are many objects in the scene of the target object to be grasped, it is necessary to extract target points from the three-dimensional point cloud to obtain a target object point set.
In the embodiment of the present application, the pre-built deep learning network is a convolutional neural network that includes a convolution layer, a pooling layer and a fully connected layer. The convolution layer performs feature extraction on the three-dimensional point cloud using a preset function; the pooling layer compresses the extracted data to reduce computational complexity and retain the main feature data; and the fully connected layer combines all of the extracted data to obtain a feature point set. Further, in the embodiment of the present application, the deep learning network also includes a classifier. In detail, the classifier learns classification rules for the given categories from known training data and then classifies the feature point set, obtaining the target object point set and the non-target object point set.
详细地,本申请实施例中,所述目标物体点集提取模块102具体用于:In detail, in the embodiment of the present application, the target object point set extraction module 102 is specifically used for:
利用预构建的深度学习网络中的卷积、池化以及全连接层提取所述三维点云的特征点集;Extract the feature point set of the 3D point cloud by using the convolution, pooling and fully connected layers in the pre-built deep learning network;
利用所述深度学习网络中的分类器将所述特征点集分类为目标点集和非目标物体点集,并提取其中的目标物体点集。The feature point set is classified into a target point set and a non-target object point set by using the classifier in the deep learning network, and the target object point set is extracted.
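For illustration only, a rough PointNet-style sketch of such a network is given below; the layer sizes, the max-pooled global feature and the two-way (target / non-target) classifier head are assumptions of the example, not the specific network claimed here.

```python
import torch
import torch.nn as nn

class TargetPointSegmenter(nn.Module):
    """Per-point classifier: target object point vs. non-target (background) point."""
    def __init__(self):
        super().__init__()
        # Point-wise convolution layers shared across all points of the (B, 3, N) cloud.
        self.features = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        # Fully connected head (1x1 convolution applied point-wise) acting as the classifier.
        self.classifier = nn.Conv1d(128 + 128, 2, 1)

    def forward(self, pts):                                    # pts: (B, 3, N)
        local_feat = self.features(pts)                        # (B, 128, N)
        global_feat = torch.max(local_feat, dim=2, keepdim=True).values   # pooling step
        global_feat = global_feat.expand(-1, -1, pts.shape[2])
        logits = self.classifier(torch.cat([local_feat, global_feat], dim=1))
        return logits                                          # (B, 2, N) target / non-target scores

# The points predicted as "target" form the target object point set.
pts = torch.rand(1, 3, 1024)
labels = TargetPointSegmenter()(pts).argmax(dim=1)             # (1, 1024) of 0 / 1
target_object_point_set = pts[:, :, labels[0] == 1]
```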
所述可见度损失值计算模块103,用于根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值。The visibility loss value calculation module 103 is configured to calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set.
It can be understood that visibility is the degree to which a target object can be seen under normal eyesight. Some objects are occluded by other objects or otherwise obscured, which reduces their visibility and produces a visibility loss value. Heavily occluded objects are not the objects that the robotic arm should grasp first, because they are most likely located at the bottom of the pile and there is not enough information to estimate their pose. To reduce the interference caused by such objects, the embodiment of the present application needs to calculate the visibility loss value of the objects.
本申请其中一个实施例,所述可见度损失值计算模块103具体用于:In one of the embodiments of the present application, the visibility loss value calculation module 103 is specifically used for:
根据所述目标物体的目标物体点集的点数与所述三维点云中包含的所有目标物体中的最大点集的点数的比值计算所述目标物体的实际可见度;Calculate the actual visibility of the target object according to the ratio of the number of points of the target object point set of the target object to the number of points of the largest point set among all the target objects included in the three-dimensional point cloud;
通过所述实际可见度与所述目标物体的预测可见度的差的加权计算得到所述目标物体的可见度损失值。The visibility loss value of the target object is obtained by weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
That is:

v_i = N_i / N_max

L_v = λ · (v̂_i - v_i)

where N_i represents the number of points in the target object point set of target object i, N_max represents the number of points in the largest point set among the target objects contained in the three-dimensional point cloud, v_i is the actual visibility of target object i, v̂_i represents the predicted visibility of target object i, that is, the maximum visibility of target object i without any occlusion, and λ is the weight applied to the difference.
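The calculation above can be sketched as follows; this is a minimal example, and the weight value and the array-based interface are illustrative assumptions.

```python
import numpy as np

def visibility_loss(point_counts, predicted_visibility, weight=1.0):
    """point_counts[i]: points extracted for object i;
    predicted_visibility[i]: visibility predicted for object i (1.0 = fully unoccluded)."""
    point_counts = np.asarray(point_counts, dtype=float)
    actual_visibility = point_counts / point_counts.max()       # v_i = N_i / N_max
    # Weighted difference between predicted and actual visibility, per object.
    return weight * (np.asarray(predicted_visibility, dtype=float) - actual_visibility)

# Example: three objects in the scene, the third one heavily occluded.
print(visibility_loss([900, 1000, 150], [1.0, 1.0, 1.0]))       # -> [0.1, 0.0, 0.85]
```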
所述关键点损失值计算模块104,用于对所述目标物体点集进行霍夫投票,得到关键点集,根据所述关键点集计算所述目标物体的关键点损失值。The key point loss value calculation module 104 is configured to perform Hough voting on the target object point set to obtain a key point set, and calculate the key point loss value of the target object according to the key point set.
详细地,所述对所述目标物体点集进行霍夫投票,得到关键点集,包括:Specifically, performing Hough voting on the target object point set to obtain a key point set, including:
从所述目标物体点集采样得到目标物体采样点集,计算所述目标物体采样点的欧式距离偏移,得到偏移量;The target object sampling point set is obtained by sampling the target object point set, and the Euclidean distance offset of the target object sampling point is calculated to obtain the offset;
根据所述偏移量进行投票,将票数超过预设阈值的点的集合作为关键点集。Voting is performed according to the offset, and the set of points whose votes exceed the preset threshold is used as the key point set.
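A toy sketch of this voting step is shown below; in practice the offsets would be predicted by the network, and the bin size and vote threshold are illustrative values only.

```python
import numpy as np

def hough_vote_keypoints(points, offsets, bin_size=0.01, vote_threshold=20):
    """points: (N, 3) sampled object points; offsets: (N, 3) Euclidean offsets from
    each point towards a keypoint. Each point casts one vote at points + offsets."""
    votes = points + offsets
    bins = np.round(votes / bin_size).astype(int)           # discretise the voting space
    uniq, counts = np.unique(bins, axis=0, return_counts=True)
    winners = uniq[counts >= vote_threshold]                 # bins whose votes exceed the threshold
    return winners * bin_size                                # candidate keypoint positions

pts = np.random.rand(500, 3)
offs = np.array([0.5, 0.5, 0.5]) - pts                       # toy offsets pointing at (0.5, 0.5, 0.5)
print(hough_vote_keypoints(pts, offs))                       # ~[[0.5, 0.5, 0.5]]
```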
Further, in the embodiment of the present application, based on the property that there is one and only one center keypoint and that it is not affected by occlusion, the keypoint set is divided into a common keypoint set and a center keypoint, and the keypoint loss value L_kps of the keypoint set is calculated with a point-wise feature regression algorithm using the following formulas:
L_kp = (1/N) · Σ_{i=1..N} Σ_{j=1..M} ‖Δd_i^j - Δd̂_i^j‖

L_c = (1/N) · Σ_{i=1..N} ‖Δx_i - Δx̂_i‖

L_kps = γ_1·L_kp + γ_2·L_c

where L_kp represents the common keypoint loss, N is the number of points in the target object point set, M is the number of common keypoints, Δd_i^j represents the actual position offset of the target object point set (from point i to common keypoint j) and Δd̂_i^j represents the corresponding predicted position offset, L_c represents the center keypoint loss, Δx_i is the actual offset from the common keypoints to the center keypoint and Δx̂_i is the corresponding predicted offset, and γ_1 and γ_2 are the weights of the common keypoint loss and the center keypoint loss, respectively.
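For illustration, the loss above can be sketched in a few lines; the tensor shapes and the equal default weights are assumptions of the example.

```python
import torch

def keypoint_loss(off_gt, off_pred, ctr_gt, ctr_pred, gamma1=1.0, gamma2=1.0):
    """off_*: (N, M, 3) offsets from each of N object points to M common keypoints;
    ctr_*: (N, 3) offsets to the single center keypoint."""
    l_kp = (off_pred - off_gt).norm(dim=-1).sum(dim=1).mean()   # common keypoint loss L_kp
    l_c = (ctr_pred - ctr_gt).norm(dim=-1).mean()               # center keypoint loss L_c
    return gamma1 * l_kp + gamma2 * l_c                         # L_kps

n_points, n_keypoints = 1024, 8
loss = keypoint_loss(torch.rand(n_points, n_keypoints, 3), torch.rand(n_points, n_keypoints, 3),
                     torch.rand(n_points, 3), torch.rand(n_points, 3))
```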
所述语义损失值计算模块105,用于对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值。The semantic loss value calculation module 105 is configured to perform semantic segmentation on the pixels of the scene depth map to obtain the semantic loss value of the target object.
In detail, the semantic segmentation calculates the semantic loss L_s of the target object from the pixels of the scene depth map using the following formula:

L_s = -α · (1 - q_i)^γ · log(q_i)

where α represents the balance parameter of the camera device, γ represents the focus parameter of the camera device, and q_i represents the confidence that the i-th pixel in the scene depth map belongs to a foreground point or a background point.
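A sketch of this focal-style loss follows; the α and γ values and the averaging over pixels are illustrative assumptions.

```python
import torch

def semantic_loss(q, alpha=0.25, gamma=2.0):
    """q: per-pixel confidence, in (0, 1), that the pixel is classified correctly
    as foreground or background. Implements L_s = -alpha * (1 - q)**gamma * log(q)."""
    return (-alpha * (1.0 - q) ** gamma * torch.log(q)).mean()

q = torch.tensor([0.9, 0.6, 0.99])    # confident, uncertain, and very confident pixels
print(semantic_loss(q))               # the uncertain pixel dominates the loss
```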
所述位姿计算模块106,用于根据所述可见度损失值、所述关键点损失值、所述语义损失值,以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿。The pose calculation module 106 is configured to calculate the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value, and the multi-task joint model obtained by pre-training.
详细地,本申请实施例中,所述目标物体的位姿是指三维的旋转矩阵和三维的平移矩阵组成的六维量。In detail, in the embodiment of the present application, the pose of the target object refers to a six-dimensional quantity composed of a three-dimensional rotation matrix and a three-dimensional translation matrix.
详细地,所述位姿计算模块106利用下述多任务联合模型计算所述目标物体的最终损失值L mtSpecifically, the pose calculation module 106 uses the following multi-task joint model to calculate the final loss value L mt of the target object:
L_mt = μ_1·L_kps + μ_2·L_s + μ_3·L_v

where L_kps represents the keypoint loss value, L_s represents the semantic loss value, L_v represents the visibility loss value, and μ_1, μ_2, μ_3 represent the weights obtained by training the multi-task joint model.
The embodiment of the present application further adjusts the predicted rotation matrix and the predicted translation matrix of the target object according to the final loss value, thereby obtaining the pose of the target object.
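A minimal sketch of combining the three losses and letting the final loss drive an update is given below; the placeholder loss values, the equal μ weights and the use of backward() are illustrative assumptions, not the training procedure claimed here.

```python
import torch

def multitask_loss(l_kps, l_s, l_v, mu1=1.0, mu2=1.0, mu3=1.0):
    """Final loss L_mt = mu1*L_kps + mu2*L_s + mu3*L_v; the mu weights are obtained
    by training the multi-task joint model (placeholder values here)."""
    return mu1 * l_kps + mu2 * l_s + mu3 * l_v

# Dummy per-branch losses standing in for the network outputs.
l_kps = torch.tensor(0.42, requires_grad=True)
l_s = torch.tensor(0.10, requires_grad=True)
l_v = torch.tensor(0.05, requires_grad=True)

l_mt = multitask_loss(l_kps, l_s, l_v)
l_mt.backward()   # gradients of L_mt are what adjust the predicted rotation and translation
```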
进一步地,所述位姿计算模块106将所述目标物体位姿发送给预构建的机械臂,利用所述机械臂执行目标物体抓取任务。Further, the pose calculation module 106 sends the pose of the target object to a pre-built robotic arm, and uses the robotic arm to perform the target object grasping task.
如图3所示,是本申请实现物体位姿估计方法的电子设备的结构示意图。As shown in FIG. 3 , it is a schematic structural diagram of an electronic device implementing the object pose estimation method of the present application.
所述电子设备1可以包括处理器10、存储器11和总线,还可以包括存储在所述存储器11中并可在所述处理器10上运行的计算机程序,如物体位姿估计程序12。The electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an object pose estimation program 12.
其中,所述存储器11至少包括一种类型的可读存储介质,所述可读存储介质可以是易失性的,也可以是非易失性的。具体的,所述可读存储介质包括闪存、移动硬盘、多媒体卡、卡型存储器(例如:SD或DX存储器等)、磁性存储器、磁盘、光盘等。所述存储器11在一些实施例中可以是电子设备1的内部存储单元,例如该电子设备1的移动硬盘。所述存储器11在另一些实施例中也可以是电子设备1的外部存储设备,例如电子设备1上配备的插接式移动硬盘、智能存储卡(SmartMediaCard,SMC)、安全数字(SecureDigital,SD)卡、闪存卡(FlashCard)等。进一步地,所述存储器11还可以既包括电子设备1的内部存储单元也包括外部存储设备。所述存储器11不仅可以用于存储安装于电子设备1的应用软件及各类数据,例如物体位姿估计程序12的代码等,还可以用于暂时地存储已经输出或者将要输出的数据。Wherein, the memory 11 includes at least one type of readable storage medium, and the readable storage medium may be volatile or non-volatile. Specifically, the readable storage medium includes a flash memory, a mobile hard disk, a multimedia card, a card-type memory (eg, SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 . In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) equipped on the electronic device 1. card, flash memory card (FlashCard) and so on. Further, the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can not only be used to store application software installed in the electronic device 1 and various types of data, such as the code of the object pose estimation program 12, etc., but also can be used to temporarily store data that has been output or will be output.
所述处理器10在一些实施例中可以由集成电路组成,例如可以由单个封装的集成电路所组成,也可以是由多个相同功能或不同功能封装的集成电路所组成,包括一个或者多个中央处理器(CentralProcessingunit,CPU)、微处理器、数字处理芯片、图形处理器及各种控制芯片的组合等。所述处理器10是所述电子设备的控制核心(ControlUnit),利用各种接口和线路连接整个电子设备的各个部件,通过运行或执行存储在所述存储器11内的程序或者模块(例如执行物体位姿估计程序等),以及调用存储在所述存储器11内的数据,以执行电子设备1的各种功能和处理数据。In some embodiments, the processor 10 may be composed of integrated circuits, for example, may be composed of a single packaged integrated circuit, or may be composed of multiple integrated circuits packaged with the same function or different functions, including one or more integrated circuits. Central processing unit (Central Processing unit, CPU), microprocessor, digital processing chip, graphics processor and combination of various control chips, etc. The processor 10 is the control core (ControlUnit) of the electronic device, and uses various interfaces and lines to connect the various components of the entire electronic device, by running or executing the program or module (for example, executing the object) stored in the memory 11. pose estimation program, etc.), and call the data stored in the memory 11 to perform various functions of the electronic device 1 and process data.
所述总线可以是外设部件互连标准(peripheralcomponentinterconnect,简称PCI)总线或扩展工业标准结构(extendedindustrystandardarchitecture,简称EISA)总线等。该总线可以分为地址总线、数据总线、控制总线等。所述总线被设置为实现所述存储器11以及至少一个处理器10等之间的连接通信。The bus may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (extended industry standard architecture, EISA for short) bus or the like. The bus can be divided into address bus, data bus, control bus and so on. The bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
图3仅示出了具有部件的电子设备,本领域技术人员可以理解的是,图3示出的结构 并不构成对所述电子设备1的限定,可以包括比图示更少或者更多的部件,或者组合某些部件,或者不同的部件布置。FIG. 3 only shows an electronic device with components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, and may include fewer or more components than those shown in the figure. components, or a combination of certain components, or a different arrangement of components.
例如,尽管未示出,所述电子设备1还可以包括给各个部件供电的电源(比如电池),优选地,电源可以通过电源管理装置与所述至少一个处理器10逻辑相连,从而通过电源管理装置实现充电管理、放电管理、以及功耗管理等功能。电源还可以包括一个或一个以上的直流或交流电源、再充电装置、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。所述电子设备1还可以包括多种传感器、蓝牙模块、Wi-Fi模块等,在此不再赘述。For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the various components, preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management The device implements functions such as charge management, discharge management, and power consumption management. The power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
进一步地,所述电子设备1还可以包括网络接口,可选地,所述网络接口可以包括有线接口和/或无线接口(如WI-FI接口、蓝牙接口等),通常用于在该电子设备1与其他电子设备之间建立通信连接。Further, the electronic device 1 may also include a network interface, optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a Bluetooth interface, etc.), which is usually used in the electronic device 1 Establish a communication connection with other electronic devices.
可选地,该电子设备1还可以包括用户接口,用户接口可以是显示器(Display)、输入单元(比如键盘(Keyboard)),可选地,用户接口还可以是标准的有线接口、无线接口。可选地,在一些实施例中,显示器可以是LED显示器、液晶显示器、触控式液晶显示器以及OLED(OrganicLight-EmittingDiode,有机发光二极管)触摸器等。其中,显示器也可以适当的称为显示屏或显示单元,用于显示在电子设备1中处理的信息以及用于显示可视化的用户界面。Optionally, the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like. The display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
应该了解,所述实施例仅为说明之用,在专利申请范围上并不受此结构的限制。It should be understood that the embodiments are only used for illustration, and are not limited by this structure in the scope of the patent application.
所述电子设备1中的所述存储器11存储的物体位姿估计程序12是多个计算机程序的组合,在所述处理器10中运行时,可以实现:The object pose estimation program 12 stored in the memory 11 in the electronic device 1 is a combination of multiple computer programs, and when running in the processor 10, it can realize:
利用预设的摄像装置获取目标物体的场景深度图,根据所述场景深度图中的像素点计算所述场景深度图的三维点云;Use a preset camera to obtain a scene depth map of the target object, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集;Extract target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值;Calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
对所述目标物体点集进行霍夫投票,得到关键点集,根据所述关键点集计算所述目标物体的关键点损失值;Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值;Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object;
根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿。According to the visibility loss value, the key point loss value, the semantic loss value and the multi-task joint model obtained by pre-training, the pose of the target object is calculated.
进一步地,所述电子设备1集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。所述计算机可读存储介质可以是易失性的,也可以是非易失性的。具体的,所述计算机可读存储介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-OnlyMemory)。Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. Specifically, the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read Only Memory) -Only Memory).
进一步地,所述计算机可用存储介质可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序等;存储数据区可存储根据区块链节点的使用所创建的数据等。Further, the computer usable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, and the like; using the created data, etc.
在本申请所提供的几个实施例中,应该理解到,所揭露的设备,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。In the several embodiments provided in this application, it should be understood that the disclosed apparatus, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division manners in actual implementation.
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
另外,在本申请各个实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既 可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。It will be apparent to those skilled in the art that the present application is not limited to the details of the above-described exemplary embodiments, but that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.
因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附关联图表记视为限制所涉及的权利要求。Accordingly, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the application is to be defined by the appended claims rather than the foregoing description, which is therefore intended to fall within the scope of the claims. All changes within the meaning and scope of the equivalents of , are included in this application. Any accompanying reference signs in the claims should not be construed as limiting the involved claims.
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain, essentially a decentralized database, is a series of data blocks associated with cryptographic methods. Each data block contains a batch of network transaction information to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。系统权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第二等词语用来表示名称,而并不表示任何特定的顺序。Furthermore, it is clear that the word "comprising" does not exclude other units or steps and the singular does not exclude the plural. Several units or means recited in the system claims can also be realized by one unit or means by means of software or hardware. Second-class terms are used to denote names and do not denote any particular order.
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application rather than limitations. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application can be Modifications or equivalent substitutions can be made without departing from the spirit and scope of the technical solutions of the present application.

Claims (20)

  1. 一种物体位姿估计方法,其中,所述方法包括:An object pose estimation method, wherein the method includes:
    利用预设的摄像装置获取目标物体的场景深度图,根据所述场景深度图中的像素点计算所述场景深度图的三维点云;Use a preset camera to obtain a scene depth map of the target object, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
    利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集;Extract target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
    根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值;Calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
    对所述目标物体点集进行霍夫投票,得到关键点集,根据所述关键点集计算所述目标物体的关键点损失值;Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
    对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值;Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object;
    根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿。According to the visibility loss value, the key point loss value, the semantic loss value and the multi-task joint model obtained by pre-training, the pose of the target object is calculated.
  2. 如权利要求1所述的物体位姿估计方法,其中,所述根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值,包括:The object pose estimation method according to claim 1, wherein calculating the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set, comprising:
    根据所述目标物体点集的点数与所述三维点云中包含的所有物体中的最大点集的点数的比值计算所述目标物体的实际可见度;Calculate the actual visibility of the target object according to the ratio of the point number of the target object point set to the point number of the largest point set in all objects included in the three-dimensional point cloud;
    通过所述实际可见度与所述目标物体的预测可见度的差的加权计算得到所述目标物体的可见度损失值。The visibility loss value of the target object is obtained by weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
  3. 如权利要求1所述的物体位姿估计方法,其中,所述利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集,包括:The method for estimating object pose and pose according to claim 1, wherein, extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set, comprising:
    利用预构建的深度学习网络中的卷积、池化以及全连接层提取所述三维点云的特征点集;Extract the feature point set of the 3D point cloud by using the convolution, pooling and fully connected layers in the pre-built deep learning network;
    利用所述深度学习网络中的分类器将所述特征点集分类为目标点集和非目标点集,并提取其中的目标点集得到目标物体点集。The feature point set is classified into a target point set and a non-target point set by using the classifier in the deep learning network, and the target point set is extracted to obtain a target object point set.
  4. 如权利要求1所述的物体位姿估计方法,其中,所述对所述目标物体点集进行霍夫投票,得到关键点集,包括:The object pose estimation method according to claim 1, wherein the Hough voting is performed on the target object point set to obtain a key point set, comprising:
    从所述目标物体点集中采样得到采样点集,计算所述采样点集之间的欧式距离偏移,得到偏移量;The sampling point set is obtained by sampling from the target object point set, and the Euclidean distance offset between the sampling point sets is calculated to obtain the offset;
    根据所述偏移量进行投票,将票数超过预设阈值的点的集合作为关键点集。Voting is performed according to the offset, and the set of points whose votes exceed the preset threshold is used as the key point set.
  5. 如权利要求1所述的物体位姿估计方法,其中,所述对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值,包括:The object pose estimation method according to claim 1, wherein the semantic segmentation of the pixels of the scene depth map to obtain the semantic loss value of the target object comprises:
    利用如下公式计算得到所述目标物体的语义损失L sThe semantic loss L s of the target object is obtained by calculating the following formula;
    L_s = -α · (1 - q_i)^γ · log(q_i)
    其中,α表示所述摄像装置的平衡参数,γ表示所述摄像装置的焦点参数,q i代表场景深度图中第i个像素点属于前景点还是背景点的置信度。 Wherein, α represents the balance parameter of the camera, γ represents the focus parameter of the camera, and q i represents the confidence that the ith pixel in the scene depth map belongs to the foreground point or the background point.
  6. 如权利要求1至5中任意一项所述的物体位姿估计方法,其中,所述根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿,包括:The method for estimating object pose and pose according to any one of claims 1 to 5, wherein the multi-task joint based on the visibility loss value, the keypoint loss value, the semantic loss value and the pre-trained model, and calculate the pose of the target object, including:
    利用下述多任务联合模型计算所述目标物体的最终损失值L mtThe final loss value L mt of the target object is calculated using the following multi-task joint model:
    L_mt = μ_1·L_kps + μ_2·L_s + μ_3·L_v
    where L_kps represents the keypoint loss value, L_s represents the semantic loss value, L_v represents the visibility loss value, and μ_1, μ_2, μ_3 represent the weights obtained after training the multi-task joint model;
    根据所述最终损失值调整所述目标物体的预测旋转矩阵和预测平移矩阵,得到所述目标物体的物姿。Adjust the predicted rotation matrix and predicted translation matrix of the target object according to the final loss value to obtain the pose of the target object.
  7. 如权利要求1至5中任意一项所述的物体位姿估计方法,其中,所述对所述目标点 进行多任务联合训练,得到目标物体的位姿之后,还包括:The object pose estimation method according to any one of claims 1 to 5, wherein the multi-task joint training is performed on the target point, and after obtaining the pose of the target object, the method further includes:
    将所述目标物体的位姿发送给预构建的机械臂,利用所述机械臂执行目标物体的抓取任务。The pose of the target object is sent to a pre-built robotic arm, and the robotic arm is used to perform the grasping task of the target object.
  8. 一种物体位姿估计装置,其中,所述装置包括:An object pose estimation device, wherein the device includes:
    三维点云获取模块,用于利用预设的摄像装置获取目标物体的场景深度图,根据所述场景深度图中的像素点计算所述场景深度图的三维点云;a three-dimensional point cloud acquisition module, configured to obtain a scene depth map of a target object by using a preset camera device, and calculate a three-dimensional point cloud of the scene depth map according to the pixel points in the scene depth map;
    目标物体点集提取模块,用于利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集;A target object point set extraction module, used for extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
    可见度损失值计算模块,用于根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值;a visibility loss value calculation module, configured to calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
    关键点损失值计算模块,用于对所述目标物体点集进行霍夫投票,得到关键点集,根据所述关键点集计算所述目标物体的关键点损失值;a key point loss value calculation module, configured to perform Hough voting on the target object point set to obtain a key point set, and calculate the key point loss value of the target object according to the key point set;
    语义损失值计算模块,用于对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值;a semantic loss value calculation module, configured to perform semantic segmentation on the pixels of the scene depth map to obtain the semantic loss value of the target object;
    位姿计算模块,用于根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿。The pose calculation module is configured to calculate the pose of the target object according to the visibility loss value, the key point loss value, the semantic loss value and the multi-task joint model obtained by pre-training.
  9. 一种电子设备,其中,所述电子设备包括:An electronic device, wherein the electronic device comprises:
    至少一个处理器;以及,at least one processor; and,
    与所述至少一个处理器通信连接的存储器;其中,a memory communicatively coupled to the at least one processor; wherein,
    所述存储器存储有可被所述至少一个处理器执行的计算机程序指令,所述计算机程序指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如下所述的物体位姿估计方法:The memory stores computer program instructions executable by the at least one processor, the computer program instructions being executed by the at least one processor to enable the at least one processor to perform an object pose as described below Estimation method:
    利用预设的摄像装置获取目标物体的场景深度图,根据所述场景深度图中的像素点计算所述场景深度图的三维点云;Use a preset camera to obtain a scene depth map of the target object, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
    利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集;Extract target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
    根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值;Calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
    对所述目标物体点集进行霍夫投票,得到关键点集,根据所述关键点集计算所述目标物体的关键点损失值;Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
    对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值;Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object;
    根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿。According to the visibility loss value, the key point loss value, the semantic loss value and the multi-task joint model obtained by pre-training, the pose of the target object is calculated.
  10. 如权利要求9所述的电子设备,其中,所述根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值,包括:The electronic device according to claim 9, wherein calculating the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set comprises:
    根据所述目标物体点集的点数与所述三维点云中包含的所有物体中的最大点集的点数的比值计算所述目标物体的实际可见度;Calculate the actual visibility of the target object according to the ratio of the point number of the target object point set to the point number of the largest point set in all objects included in the three-dimensional point cloud;
    通过所述实际可见度与所述目标物体的预测可见度的差的加权计算得到所述目标物体的可见度损失值。The visibility loss value of the target object is obtained by weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
  11. 如权利要求9所述的电子设备,其中,所述利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集,包括:The electronic device according to claim 9, wherein, extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set, comprising:
    利用预构建的深度学习网络中的卷积、池化以及全连接层提取所述三维点云的特征点集;Extract the feature point set of the 3D point cloud by using the convolution, pooling and fully connected layers in the pre-built deep learning network;
    利用所述深度学习网络中的分类器将所述特征点集分类为目标点集和非目标点集,并提取其中的目标点集得到目标物体点集。The feature point set is classified into a target point set and a non-target point set by using the classifier in the deep learning network, and the target point set is extracted to obtain a target object point set.
  12. 如权利要求9所述的电子设备,其中,所述对所述目标物体点集进行霍夫投票,得到关键点集,包括:The electronic device according to claim 9, wherein, performing Hough voting on the target object point set to obtain a key point set, comprising:
    从所述目标物体点集中采样得到采样点集,计算所述采样点集之间的欧式距离偏移,得到偏移量;The sampling point set is obtained by sampling from the target object point set, and the Euclidean distance offset between the sampling point sets is calculated to obtain the offset;
    根据所述偏移量进行投票,将票数超过预设阈值的点的集合作为关键点集。Voting is performed according to the offset, and the set of points whose votes exceed the preset threshold is used as the key point set.
  13. 如权利要求9所述的电子设备,其中,所述对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值,包括:The electronic device according to claim 9, wherein the semantic segmentation of the pixels of the scene depth map to obtain the semantic loss value of the target object comprises:
    利用如下公式计算得到所述目标物体的语义损失L sThe semantic loss L s of the target object is obtained by calculating the following formula;
    L_s = -α · (1 - q_i)^γ · log(q_i)
    其中,α表示所述摄像装置的平衡参数,γ表示所述摄像装置的焦点参数,q i代表场景深度图中第i个像素点属于前景点还是背景点的置信度。 Wherein, α represents the balance parameter of the camera, γ represents the focus parameter of the camera, and q i represents the confidence that the ith pixel in the scene depth map belongs to the foreground point or the background point.
  14. 如权利要求9至13中任意一项所述的电子设备,其中,所述根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿,包括:The electronic device according to any one of claims 9 to 13, wherein the calculation is performed according to the visibility loss value, the keypoint loss value, the semantic loss value and a pre-trained multi-task joint model Obtain the pose of the target object, including:
    利用下述多任务联合模型计算所述目标物体的最终损失值L mtThe final loss value L mt of the target object is calculated using the following multi-task joint model:
    L_mt = μ_1·L_kps + μ_2·L_s + μ_3·L_v
    where L_kps represents the keypoint loss value, L_s represents the semantic loss value, L_v represents the visibility loss value, and μ_1, μ_2, μ_3 represent the weights obtained after training the multi-task joint model;
    根据所述最终损失值调整所述目标物体的预测旋转矩阵和预测平移矩阵,得到所述目标物体的物姿。Adjust the predicted rotation matrix and predicted translation matrix of the target object according to the final loss value to obtain the pose of the target object.
  15. 一种计算机可读存储介质,包括存储数据区和存储程序区,存储数据区存储创建的数据,存储程序区存储有计算机程序;其中,所述计算机程序被处理器执行时实现如下所述的物体位姿估计方法:A computer-readable storage medium, comprising a storage data area and a storage program area, the storage data area stores created data, and the storage program area stores a computer program; wherein, when the computer program is executed by a processor, the following objects are realized Pose estimation method:
    利用预设的摄像装置获取目标物体的场景深度图,根据所述场景深度图中的像素点计算所述场景深度图的三维点云;Use a preset camera to obtain a scene depth map of the target object, and calculate a three-dimensional point cloud of the scene depth map according to the pixels in the scene depth map;
    利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集;Extract target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set;
    根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值;Calculate the visibility loss value of the target object according to the three-dimensional point cloud and the target object point set;
    对所述目标物体点集进行霍夫投票,得到关键点集,根据所述关键点集计算所述目标物体的关键点损失值;Hough voting is performed on the target object point set to obtain a key point set, and the key point loss value of the target object is calculated according to the key point set;
    对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值;Semantic segmentation is performed on the pixels of the scene depth map to obtain the semantic loss value of the target object;
    根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿。According to the visibility loss value, the key point loss value, the semantic loss value and the multi-task joint model obtained by pre-training, the pose of the target object is calculated.
  16. 如权利要求15所述的计算机可读存储介质,其中,所述根据所述三维点云和所述目标物体点集,计算所述目标物体的可见度损失值,包括:The computer-readable storage medium of claim 15, wherein the calculating a visibility loss value of the target object according to the three-dimensional point cloud and the target object point set comprises:
    根据所述目标物体点集的点数与所述三维点云中包含的所有物体中的最大点集的点数的比值计算所述目标物体的实际可见度;Calculate the actual visibility of the target object according to the ratio of the point number of the target object point set to the point number of the largest point set in all objects included in the three-dimensional point cloud;
    通过所述实际可见度与所述目标物体的预测可见度的差的加权计算得到所述目标物体的可见度损失值。The visibility loss value of the target object is obtained by weighted calculation of the difference between the actual visibility and the predicted visibility of the target object.
  17. 如权利要求15所述的计算机可读存储介质,其中,所述利用预构建的深度学习网络提取所述三维点云中的目标点,得到目标物体点集,包括:The computer-readable storage medium according to claim 15, wherein, extracting target points in the three-dimensional point cloud by using a pre-built deep learning network to obtain a target object point set, comprising:
    利用预构建的深度学习网络中的卷积、池化以及全连接层提取所述三维点云的特征点集;Extract the feature point set of the 3D point cloud by using the convolution, pooling and fully connected layers in the pre-built deep learning network;
    利用所述深度学习网络中的分类器将所述特征点集分类为目标点集和非目标点集,并提取其中的目标点集得到目标物体点集。The feature point set is classified into a target point set and a non-target point set by using the classifier in the deep learning network, and the target point set is extracted to obtain a target object point set.
  18. 如权利要求15所述的计算机可读存储介质,其中,所述对所述目标物体点集进行霍夫投票,得到关键点集,包括:The computer-readable storage medium according to claim 15, wherein the performing Hough voting on the target object point set to obtain a key point set, comprising:
    从所述目标物体点集中采样得到采样点集,计算所述采样点集之间的欧式距离偏移, 得到偏移量;Sampling from the target object point set to obtain a sampling point set, and calculating the Euclidean distance offset between the sampling point sets to obtain an offset;
    根据所述偏移量进行投票,将票数超过预设阈值的点的集合作为关键点集。Voting is performed according to the offset, and the set of points whose votes exceed the preset threshold is used as the key point set.
  19. 如权利要求15所述的计算机可读存储介质,其中,所述对所述场景深度图的像素点进行语义分割,得到所述目标物体的语义损失值,包括:The computer-readable storage medium of claim 15, wherein the semantically segmenting the pixels of the scene depth map to obtain the semantic loss value of the target object comprises:
    利用如下公式计算得到所述目标物体的语义损失L sThe semantic loss L s of the target object is obtained by calculating the following formula;
    L_s = -α · (1 - q_i)^γ · log(q_i)
    其中,α表示所述摄像装置的平衡参数,γ表示所述摄像装置的焦点参数,q i代表场景深度图中第i个像素点属于前景点还是背景点的置信度。 Wherein, α represents the balance parameter of the camera, γ represents the focus parameter of the camera, and q i represents the confidence that the ith pixel in the scene depth map belongs to the foreground point or the background point.
  20. 如权利要求15至19中任意一项所述的计算机可读存储介质,其中,所述根据所述可见度损失值、所述关键点损失值、所述语义损失值以及预先训练得到的多任务联合模型,计算得到所述目标物体的位姿,包括:The computer-readable storage medium according to any one of claims 15 to 19, wherein the multi-task joint based on the visibility loss value, the keypoint loss value, the semantic loss value and pre-trained model, and calculate the pose of the target object, including:
    利用下述多任务联合模型计算所述目标物体的最终损失值L mtThe final loss value L mt of the target object is calculated using the following multi-task joint model:
    L_mt = μ_1·L_kps + μ_2·L_s + μ_3·L_v
    where L_kps represents the keypoint loss value, L_s represents the semantic loss value, L_v represents the visibility loss value, and μ_1, μ_2, μ_3 represent the weights obtained after training the multi-task joint model;
    根据所述最终损失值调整所述目标物体的预测旋转矩阵和预测平移矩阵,得到所述目标物体的物姿。Adjust the predicted rotation matrix and predicted translation matrix of the target object according to the final loss value to obtain the pose of the target object.
PCT/CN2021/083083 2020-12-01 2021-03-25 Object posture estimation method and apparatus, and electronic device and computer storage medium WO2022116423A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011385260.7 2020-12-01
CN202011385260.7A CN112446919B (en) 2020-12-01 2020-12-01 Object pose estimation method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
WO2022116423A1 true WO2022116423A1 (en) 2022-06-09

Family

ID=74740242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083083 WO2022116423A1 (en) 2020-12-01 2021-03-25 Object posture estimation method and apparatus, and electronic device and computer storage medium

Country Status (2)

Country Link
CN (1) CN112446919B (en)
WO (1) WO2022116423A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147488A (en) * 2022-07-06 2022-10-04 湖南大学 Workpiece pose estimation method based on intensive prediction and grasping system
CN115546216A (en) * 2022-12-02 2022-12-30 深圳海星智驾科技有限公司 Tray detection method, device, equipment and storage medium
CN115797565A (en) * 2022-12-20 2023-03-14 北京百度网讯科技有限公司 Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment
CN116630394A (en) * 2023-07-25 2023-08-22 山东中科先进技术有限公司 Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint
CN117226854A (en) * 2023-11-13 2023-12-15 之江实验室 Method and device for executing clamping task, storage medium and electronic equipment

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446919B (en) * 2020-12-01 2024-05-28 平安科技(深圳)有限公司 Object pose estimation method and device, electronic equipment and computer storage medium
CN113012291B (en) * 2021-04-01 2022-11-25 清华大学 Method and device for reconstructing three-dimensional model of object based on manipulator parameters
CN113095205B (en) * 2021-04-07 2022-07-12 北京航空航天大学 Point cloud target detection method based on improved Hough voting
CN113469947B (en) * 2021-06-08 2022-08-05 智洋创新科技股份有限公司 Method for measuring hidden danger and transmission conductor clearance distance suitable for various terrains
CN114399421A (en) * 2021-11-19 2022-04-26 腾讯科技(成都)有限公司 Storage method, device and equipment for three-dimensional model visibility data and storage medium
CN115482279A (en) * 2022-09-01 2022-12-16 北京有竹居网络技术有限公司 Object pose estimation method, device, medium, and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066935A (en) * 2017-01-25 2017-08-18 网易(杭州)网络有限公司 Hand gestures method of estimation and device based on deep learning
US20170330375A1 (en) * 2015-02-04 2017-11-16 Huawei Technologies Co., Ltd. Data Processing Method and Apparatus
CN108665537A (en) * 2018-05-15 2018-10-16 清华大学 The three-dimensional rebuilding method and system of combined optimization human body figure and display model
CN111160280A (en) * 2019-12-31 2020-05-15 芜湖哈特机器人产业技术研究院有限公司 RGBD camera-based target object identification and positioning method and mobile robot
CN112446919A (en) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 Object pose estimation method and device, electronic equipment and computer storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3065100B1 (en) * 2017-04-06 2019-04-12 B<>Com INSTALLATION ESTIMATING METHOD, DEVICE, SYSTEM AND COMPUTER PROGRAM THEREOF
JP2018189510A (en) * 2017-05-08 2018-11-29 株式会社マイクロ・テクニカ Method and device for estimating position and posture of three-dimensional object
CN108961339B (en) * 2018-07-20 2020-10-20 深圳辰视智能科技有限公司 Point cloud object attitude estimation method, device and equipment based on deep learning
CN111489394B (en) * 2020-03-16 2023-04-21 华南理工大学 Object posture estimation model training method, system, device and medium
CN111968129B (en) * 2020-07-15 2023-11-07 上海交通大学 Instant positioning and map construction system and method with semantic perception

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330375A1 (en) * 2015-02-04 2017-11-16 Huawei Technologies Co., Ltd. Data Processing Method and Apparatus
CN107066935A (en) * 2017-01-25 2017-08-18 网易(杭州)网络有限公司 Hand gestures method of estimation and device based on deep learning
CN108665537A (en) * 2018-05-15 2018-10-16 清华大学 The three-dimensional rebuilding method and system of combined optimization human body figure and display model
CN111160280A (en) * 2019-12-31 2020-05-15 芜湖哈特机器人产业技术研究院有限公司 RGBD camera-based target object identification and positioning method and mobile robot
CN112446919A (en) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 Object pose estimation method and device, electronic equipment and computer storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147488A (en) * 2022-07-06 2022-10-04 湖南大学 Workpiece pose estimation method based on intensive prediction and grasping system
CN115546216A (en) * 2022-12-02 2022-12-30 深圳海星智驾科技有限公司 Tray detection method, device, equipment and storage medium
CN115546216B (en) * 2022-12-02 2023-03-31 深圳海星智驾科技有限公司 Tray detection method, device, equipment and storage medium
CN115797565A (en) * 2022-12-20 2023-03-14 北京百度网讯科技有限公司 Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment
CN115797565B (en) * 2022-12-20 2023-10-27 北京百度网讯科技有限公司 Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment
CN116630394A (en) * 2023-07-25 2023-08-22 山东中科先进技术有限公司 Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint
CN116630394B (en) * 2023-07-25 2023-10-20 山东中科先进技术有限公司 Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint
CN117226854A (en) * 2023-11-13 2023-12-15 之江实验室 Method and device for executing clamping task, storage medium and electronic equipment
CN117226854B (en) * 2023-11-13 2024-02-02 之江实验室 Method and device for executing clamping task, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112446919B (en) 2024-05-28
CN112446919A (en) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2022116423A1 (en) Object posture estimation method and apparatus, and electronic device and computer storage medium
JP6745328B2 (en) Method and apparatus for recovering point cloud data
US10832039B2 (en) Facial expression detection method, device and system, facial expression driving method, device and system, and storage medium
CN110363817B (en) Target pose estimation method, electronic device, and medium
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN111723786A (en) Method and device for detecting wearing of safety helmet based on single model prediction
US20220262093A1 (en) Object detection method and system, and non-transitory computer-readable medium
CN110991513A (en) Image target recognition system and method with human-like continuous learning capability
CN112419326B (en) Image segmentation data processing method, device, equipment and storage medium
US20230020965A1 (en) Method and apparatus for updating object recognition model
WO2023083030A1 (en) Posture recognition method and related device
CN110222651A (en) A kind of human face posture detection method, device, terminal device and readable storage medium storing program for executing
CN116778527A (en) Human body model construction method, device, equipment and storage medium
Wang et al. Deep leaning-based ultra-fast stair detection
CN115511779A (en) Image detection method, device, electronic equipment and storage medium
CN112784102B (en) Video retrieval method and device and electronic equipment
Gheitasi et al. Estimation of hand skeletal postures by using deep convolutional neural networks
CN116453222B (en) Target object posture determining method, training device and storage medium
WO2023109086A1 (en) Character recognition method, apparatus and device, and storage medium
CN116309643A (en) Face shielding score determining method, electronic equipment and medium
CN114494857A (en) Indoor target object identification and distance measurement method based on machine vision
CN117036658A (en) Image processing method and related equipment
CN113869218A (en) Face living body detection method and device, electronic equipment and readable storage medium
Zhou et al. Vision sensor‐based SLAM problem for small UAVs in dynamic indoor environments
CN114627535B (en) Coordinate matching method, device, equipment and medium based on binocular camera

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21899467

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21899467

Country of ref document: EP

Kind code of ref document: A1