CN114140517A - Object pose identification method and device, visual processing equipment and readable storage medium - Google Patents
- Publication number
- CN114140517A (application CN202111401324.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- depth
- color
- target
- normalization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL (common to all entries below)
- G06T7/00—Image analysis; G06T7/70—Determining position or orientation of objects or cameras
- G06T3/00—Geometric image transformations in the plane of the image; G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting; G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T2200/00—Indexing scheme for image data processing or generation, in general; G06T2200/32—involving image mosaicing
- G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/10—Image acquisition modality; G06T2207/10024—Color image; G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20—Special algorithmic details; G06T2207/20081—Training; Learning; G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20—Special algorithmic details; G06T2207/20212—Image combination; G06T2207/20221—Image fusion; Image merging
Abstract
The application provides an object pose identification method and device, a visual processing device, and a readable storage medium, and relates to the technical field of robot control. After a color image and a depth image acquired by a grabbing robot, whose pixel contents match at the same pixel positions, are obtained, color normalization is performed on the color image according to color calibration parameters to obtain a first target image, and depth normalization compensation is performed on the depth image to obtain a second target image. Image channel splicing is then performed on the first target image and the second target image to obtain a corresponding target object image, and a pre-stored pose estimation model is called to identify the object pose information of the target object directly from the target object image. Because the object pose information is identified directly after the color image and the depth image of the target object are fused, the real-time performance of pose identification is enhanced, and the application range of the scheme is effectively expanded while the pose identification efficiency is ensured.
Description
Technical Field
The application relates to the technical field of robot control, in particular to an object pose identification method and device, a visual processing device and a readable storage medium.
Background
With the continuous development of science and technology, robot technology has great research and application value and has attracted wide attention across industries. In practical applications, the grabbing action is one of the conventional actions that an intelligent robot can execute. To realize it, the pose of the target object must first be acquired, the grabbing point acting on the target object must be estimated, and the grabbing trajectory of the robot must then be planned and executed. Identifying the specific pose of the target object is therefore a necessary operation for achieving the robot grabbing action.
At present, the mainstream approach in the industry to identifying the pose of a target object is to split the pose into position information and posture information and identify them separately: an object positioning module is arranged to identify the position information of the target object, and an object posture estimation module is arranged to identify its posture information. As a result, this general scheme takes a long time to identify the object pose in practice and has poor overall real-time performance. Meanwhile, neither the existing object positioning algorithms used to implement the object positioning module nor the existing object posture estimation algorithms used to implement the object posture estimation module can strike a balance between the application range of the scheme and the recognition efficiency.
Disclosure of Invention
In view of the above, an object of the present application is to provide an object pose identification method and apparatus, a visual processing device, and a readable storage medium, which directly identify the object pose information of a target object after fusing its color image and depth image, thereby enhancing the real-time performance of pose identification and effectively expanding the application range of the scheme while ensuring pose identification efficiency.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
in a first aspect, the present application provides an object pose identification method, including:
acquiring a color image and a depth image which are acquired by a grabbing robot aiming at a target object, wherein the color image is matched with the depth image in pixel content at the same pixel position;
carrying out color normalization processing on the color image according to color calibration parameters to obtain a first target image;
carrying out depth normalization compensation processing on the depth image to obtain a second target image;
performing image channel splicing processing on the first target image and the second target image to obtain corresponding target object images;
and calling a pre-stored pose estimation model to identify the target object image to obtain object pose information of the target object.
In an optional embodiment, the color calibration parameters include preset normalized pixel mean values and preset normalized pixel standard deviation values of different color channels, and the step of performing color normalization processing on the color image according to the color calibration parameters to obtain a first target image includes:
aiming at the image components of the color image at different color channels, carrying out gray level normalization processing on all pixels of each image component to obtain a corresponding gray level normalized image;
and for each color channel, performing image standardization processing on all pixels of the gray-scale normalized image corresponding to the color channel according to the preset normalized pixel mean value and the preset normalized pixel standard deviation value of the color channel to obtain the image component of the first target image at the color channel.
In an optional embodiment, the step of performing depth normalization compensation processing on the depth image to obtain a second target image includes:
carrying out depth normalization processing on all pixels of the depth image according to the maximum depth value of the depth image to obtain a corresponding depth normalization image;
and carrying out depth compensation processing on the depth-free pixels in the depth normalized image according to a preset normalized depth value to obtain the second target image.
In an alternative embodiment, the method further comprises:
acquiring at least one object sample data set corresponding to different object types, wherein the object sample data set comprises target object images marked with object outlines of corresponding object samples under different real object pose information, and the real object pose information is directly fed back by a mechanical arm end effector of the grabbing robot when the grabbing robot grabs the corresponding object samples;
and carrying out convolutional neural network model training based on the acquired sample data sets of all the objects to obtain the pose estimation model.
In a second aspect, the present application provides an object pose recognition apparatus, the apparatus comprising:
the captured image acquisition module is used for acquiring a color image and a depth image acquired by the grabbing robot aiming at a target object, wherein the color image is matched with the depth image in pixel content at the same pixel position;
the color normalization processing module is used for carrying out color normalization processing on the color image according to the color calibration parameters to obtain a first target image;
the depth normalization compensation module is used for performing depth normalization compensation processing on the depth image to obtain a second target image;
the image channel splicing module is used for carrying out image channel splicing processing on the first target image and the second target image to obtain corresponding target object images;
and the pose estimation and identification module is used for calling a prestored pose estimation model to identify the target object image to obtain the object pose information of the target object.
In an optional embodiment, the color calibration parameters include preset normalized pixel mean values and preset normalized pixel standard deviation values of different color channels, and the color normalization processing module includes a grayscale normalization sub-module and an image normalization sub-module;
the gray scale normalization submodule is used for carrying out gray scale normalization processing on all pixels of each image component aiming at the image components of the color image at different color channels to obtain a corresponding gray scale normalized image;
and the image standardization sub-module is used for carrying out, for each color channel, image standardization processing on all pixels of the gray-scale normalized image corresponding to the color channel according to the preset normalized pixel mean value and the preset normalized pixel standard deviation value of the color channel, so as to obtain the image component of the first target image at the color channel.
In an optional embodiment, the depth normalization compensation module includes a depth normalization sub-module and a depth value compensation sub-module;
the depth normalization submodule is used for carrying out depth normalization processing on all pixels of the depth image according to the maximum depth value of the depth image to obtain a corresponding depth normalization image;
and the depth value compensation submodule is used for carrying out depth compensation processing on the depth-free pixels in the depth normalized image according to a preset normalized depth value to obtain the second target image.
In an alternative embodiment, the apparatus further comprises:
the system comprises a sample data acquisition module, a data acquisition module and a data acquisition module, wherein the sample data acquisition module is used for acquiring at least one object sample data set corresponding to different object types respectively, the object sample data set comprises target object images marked with object outlines of corresponding object samples under different real object pose information, and the real object pose information is directly fed back by a mechanical arm end effector of the grabbing robot when the grabbing robot grabs the corresponding object samples;
and the network model training module is used for carrying out convolutional neural network model training on the basis of the acquired sample data sets of all the objects to obtain the pose estimation model.
In a third aspect, the present application provides a vision processing apparatus comprising a processor and a memory, wherein the memory stores a computer program executable by the processor, and the processor can execute the computer program to realize the object pose identification method according to any one of the foregoing embodiments.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for identifying the pose of an object according to any one of the foregoing embodiments is implemented.
Therefore, the beneficial effects of the embodiment of the application include the following:
after the color image and the depth image acquired by the grabbing robot, whose pixel contents match at the same pixel positions, are obtained, the color image is subjected to color normalization according to the color calibration parameters to obtain a first target image, and the depth image is subjected to depth normalization compensation to obtain a second target image. Image channel splicing is then performed on the first target image and the second target image to obtain the corresponding target object image, and the pre-stored pose estimation model is called to identify the object pose information of the target object directly from the target object image. Thus the object pose information of the target object is identified directly after its color image and depth image are fused, the real-time performance of pose recognition is enhanced, and the application range of the scheme is effectively expanded while the pose recognition efficiency is ensured.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic diagram of a visual processing apparatus according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of an object pose identification method according to an embodiment of the present application;
fig. 3 is an exemplary schematic view of a color image and a depth image acquired by a grabbing robot according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating the sub-steps included in step S220 in FIG. 2;
FIG. 5 is a flowchart illustrating the sub-steps included in step S230 of FIG. 2;
fig. 6 is a second schematic flowchart of an object pose identification method according to an embodiment of the present application;
fig. 7 is a first schematic view of an object pose recognition apparatus provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of the color normalization processing module of FIG. 7;
FIG. 9 is a schematic diagram of the depth normalization compensation module of FIG. 7;
fig. 10 is a second schematic view of the object pose recognition apparatus according to the embodiment of the present application.
Icon: 10-a vision processing device; 11-a memory; 12-a processor; 13-a communication unit; 100-object pose recognition apparatus; 110-captured image acquisition module; 120-color normalization processing module; 130-a depth normalization compensation module; 140-image channel stitching module; 150-pose estimation identification module; 121-grayscale normalization submodule; 122-an image normalization sub-module; 131-a depth normalization submodule; 132-depth value compensation submodule; 160-sample data acquisition module; 170-network model training module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is to be understood that relational terms such as the terms first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Through diligent research, the applicant has found that the solutions used by existing object positioning functions can be roughly divided into three categories. (1) Classification-free object localization algorithms, which only output candidate position information of an object without a class label. They are generally used for robot-arm grabbing in a 2D plane, especially for the object grabbing function of industrial robot arms: in such scenes the viewing angle of the arm is fixed, the target object has a stable geometric shape, and the background of the workspace is simple, so the object localization function can be realized by template matching. However, these algorithms have a low degree of intelligence and poor transferability, and are difficult to apply to complex scenes. (2) Object localization algorithms based on object detection (Object Detection), which output the position and category information of a target object through a rectangular detection frame. They are generally suitable for application scenarios that do not require accurate contour information of the object, but for some fine robot operations (e.g., robot-arm grabbing and dynamic obstacle avoidance) the background information inside the detection frame becomes noise and interferes with the execution of subsequent robot operations. (3) Object localization algorithms based on object instance segmentation (Object Instance Segmentation), which additionally provide pixel masks (Masks) of the objects inside the detection frames. They inherit the advantages of object detection algorithms, the output detection frame has a stable geometric shape, and the output form is more flexible: the output mask can finely depict the object contour and reduces interference with subsequent robot operations. However, the model scale of these algorithms is further increased, which greatly reduces the overall recognition efficiency. Therefore, current implementations of the object positioning function are either only suitable for positioning specific objects or for specific scenes, or cannot guarantee the overall positioning efficiency while expanding the application range and maintaining the positioning effect.
Similarly, the solutions used by existing object posture estimation functions can also be roughly divided into three categories. (1) Object pose estimation algorithms based on feature matching (Correspondence-based Object Pose Estimation), which extract features from the input point clouds and convert the matching between point clouds into matching between features. Their results are relatively robust, but they place high requirements on the surface material of the object and fail when the object surface is smooth and has no obvious texture variation from which features can be extracted. (2) Object pose estimation algorithms based on template matching (Template-based Object Pose Estimation), which can be regarded as solving a local-to-global coarse registration optimization problem (part-to-world coarse registration optimization). They have low requirements on the surface material of the object, but are sensitive to the initial value and prone to multiple solutions: because the target object is in many cases symmetric, the optimization problem becomes non-convex and has multiple local optima. (3) Object pose estimation algorithms based on a voting mechanism (Voting-based Object Pose Estimation), which can be regarded as an ensemble of template-based algorithms: the results of several template-based pose estimation models are ranked, and the optimal solution is obtained by voting. Their accuracy is generally the highest, but because several pose estimation models are integrated, the model scale is generally large and the real-time performance of the whole algorithm is poor. Therefore, like the object positioning function, the existing implementations of the object posture estimation function are either only suitable for estimating the postures of specific objects or for specific scenes, or cannot guarantee the overall estimation efficiency while expanding the application range and maintaining the estimation effect.
Under the circumstance, in order to ensure the real-time performance of object pose identification, effectively maintain the balance between the pose identification efficiency and the pose identification scheme application range and improve the execution feasibility of the robot grabbing action, the embodiment of the application provides an object pose identification method and device, a visual processing device and a readable storage medium to realize the functions.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a vision processing apparatus 10 according to an embodiment of the present disclosure. In this embodiment, the vision processing apparatus 10 may be connected to the grabbing robot through remote communication, or may be integrated with the grabbing robot. It is configured to process object images of different types of target objects collected by the grabbing robot, so as to help the grabbing robot quickly identify the poses of different types of target objects, effectively maintain the balance between pose identification efficiency and the application range of the pose identification scheme, and improve the execution feasibility of robot grabbing actions. The object images collected by the grabbing robot include a color image and a depth image of the corresponding object. The grabbing robot is a robot that is provided with at least one mechanical arm capable of performing grabbing actions and has image acquisition capability; it may be, but is not limited to, a full humanoid robot, an upper-body humanoid robot, and the like.
In the present embodiment, the vision processing apparatus 10 may include a memory 11, a processor 12, a communication unit 13, and an object pose recognition device 100. Wherein, the respective elements of the memory 11, the processor 12 and the communication unit 13 are electrically connected to each other directly or indirectly to realize the transmission or interaction of data. For example, the memory 11, the processor 12 and the communication unit 13 may be electrically connected to each other through one or more communication buses or signal lines.
In this embodiment, the memory 11 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 11 is used for storing a computer program, and the processor 12 can execute the computer program after receiving an execution instruction.
The memory 11 is further configured to store a pose estimation model. The pose estimation model is a convolutional neural network model trained with at least one object sample data set for each of different object classes, and is used to quickly and effectively identify the class to which an object belongs and its current pose. Each object sample data set corresponds to one object sample and is constructed from a number of object poses that the object sample can present (including the 3D position information and 3D rotation information of the corresponding object sample), together with the color images and depth images acquired by the grabbing robot for the object sample in those different poses. In this embodiment, the grabbing robot is provided with an RGB-D vision sensor, which ensures that it can acquire a color image and a depth image of a specific object at the same time and that the pixel contents of the color image and depth image acquired at the same time match at the same pixel positions.
In this embodiment, the processor 12 may be an integrated circuit chip having signal processing capability. The processor 12 may be a general-purpose processor including at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Network Processor (NP), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, and discrete hardware components. The general-purpose processor may be a microprocessor or any conventional processor that implements or executes the methods, steps, and logic blocks disclosed in the embodiments of the present application.
In this embodiment, the communication unit 13 is configured to establish a communication connection between the vision processing apparatus 10 and other electronic apparatuses through a network, and to send and receive data through the network, where the network includes a wired communication network and a wireless communication network. For example, the vision processing apparatus 10 may obtain a control instruction of a remote control apparatus through the communication unit 13, and adjust its working condition according to the control instruction, where the instruction content corresponding to the control instruction may be, but is not limited to, controlling the vision processing apparatus 10 to start, controlling the vision processing apparatus 10 to suspend running, controlling the vision processing apparatus 10 to shut down, and the like.
In the present embodiment, the object pose recognition apparatus 100 includes at least one software functional module that can be stored in the memory 11 in the form of software or firmware, or solidified in the operating system of the vision processing device 10. The processor 12 may be used to execute executable modules stored in the memory 11, such as the software functional modules and computer programs included in the object pose recognition apparatus 100. Through the object pose recognition apparatus 100, the vision processing device 10 fuses the color image and the depth image acquired by the grabbing robot for the target object and then directly recognizes the object pose information of the target object. This ensures real-time object pose recognition and ensures that the object pose recognition scheme provided by the application has good pose recognition efficiency while being applicable to objects of different object types, i.e., the application range of the scheme is effectively expanded, the balance between pose recognition efficiency and the application range of the pose recognition scheme is effectively maintained, and the execution feasibility of the robot grabbing action is improved.
It is to be understood that the block diagram shown in fig. 1 is merely one possible composition of the vision processing apparatus 10; the vision processing apparatus 10 may also include more or fewer components than shown in fig. 1, or have a configuration different from that shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
In the application, in order to ensure that the vision processing device 10 can assist the grabbing robot to quickly identify the pose conditions of different types of target objects, effectively maintain the balance between the pose identification efficiency and the application range of the pose identification scheme, and improve the execution feasibility of the grabbing actions of the robot, the embodiment of the application provides an object pose identification method to achieve the foregoing object. The object pose recognition method provided by the present application is described in detail below.
Referring to fig. 2, fig. 2 is a schematic flow chart of an object pose identification method according to an embodiment of the present application. In the embodiment of the present application, the object pose identification method may include steps S210 to S250, so as to ensure real-time object pose identification, effectively maintain the balance between the pose identification efficiency and the pose identification scheme application range, and effectively expand the scheme application range while ensuring the pose identification efficiency.
Step S210, acquiring a color image and a depth image which are acquired by the grabbing robot aiming at the target object, wherein the color image is matched with the depth image in pixel content at the same pixel position.
In this embodiment, the grabbing robot may, at a selected moment, use the installed RGB-D vision sensor to simultaneously capture a color image and a depth image of a given object. The captured color image and depth image match each other in image size, and the pixel content of each pixel in the color image matches the pixel content at the corresponding pixel position in the depth image.
Taking the color image and the depth image shown in fig. 3 as an example, the color image (a) and the depth image (b) in fig. 3 both describe the scene in which the specific objects (i.e., the CHEEZ-IT packing box and the storage paper box in fig. 3) are located: the image position and apparent size of each object correspond to each other in the two images, the image sizes of the two images match, and the pixel contents of the two images at the same pixel position match each other.
When the current object pose of the target object needs to be identified, the capture robot may be invoked accordingly to acquire the current color image and depth image of the target object by using its own RGB-D vision sensor, and send the acquired color image and depth image to the vision processing device 10.
And step S220, carrying out color normalization processing on the color image according to the color calibration parameters to obtain a first target image.
In this embodiment, the color calibration parameters are used to make the pixel-value distribution of the image obtained after color normalization of the corresponding color image conform to a centered data distribution. The color calibration parameters may include a preset normalized pixel mean value and a preset normalized pixel standard deviation value for each color channel. In one implementation of this embodiment, the color normalization processing covers the image components of the color image in the three color channels R (red), G (green), and B (blue). The preset normalized pixel mean values may be configured as 0.485 for the R channel, 0.456 for the G channel, and 0.406 for the B channel, and the preset normalized pixel standard deviation values may be configured as 0.229 for the R channel, 0.224 for the G channel, and 0.225 for the B channel, thereby ensuring that the color calibration parameters conform to the data distribution rule.
Optionally, referring to fig. 4, fig. 4 is a flowchart illustrating the sub-steps included in step S220 in fig. 2. In this embodiment, the step S220 may include a sub-step S221 and a sub-step S222 to convert the color image into a normalized image conforming to the data distribution rule.
In sub-step S221, for the image components of the color image at different color channels, gray-scale normalization processing is performed on all pixels of each image component to obtain a corresponding gray-scale normalized image.
In this embodiment, for an image component of the color image at a single color channel, the gray-scale normalization processing of the image component may be completed by dividing the pixel value of each pixel in the image component by 255, the value representing the maximum gray-scale value, so as to ensure that each pixel of the resulting gray-scale normalized image corresponding to the image component has a value within the range [0, 1].
In the substep S222, for each color channel, according to the preset normalized pixel mean value and the preset normalized pixel standard deviation value of the color channel, performing image normalization processing on all pixels of the grayscale normalized image corresponding to the color channel to obtain an image component of the first target image at the color channel.
In this embodiment, sub-step S222 is used to form the image components of the first target image at the different color channels, that is, to obtain the first target image. When performing image normalization on the gray-scale normalized image of a certain color channel, the difference between the pixel value of each pixel in the gray-scale normalized image and the preset normalized pixel mean value of that color channel is first calculated; the quotient of that difference and the preset normalized pixel standard deviation value of the color channel is then calculated, and all the resulting quotients are taken, according to the pixel correspondence, as the pixel values of the first target image's image component at that color channel. In this way the color image is converted, channel by channel, into a normalized image that conforms to the data distribution rule.
Therefore, by performing sub-steps S221 to S222, the color image can be converted into a normalized image that conforms to the data distribution rule.
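For illustration only, the following sketch shows one way sub-steps S221 and S222 could be implemented; the NumPy-based helper, its name, and the H x W x 3 RGB array layout are assumptions and are not prescribed by the present application.

```python
import numpy as np

# Assumed helper illustrating sub-steps S221-S222; the array layout (H x W x 3, RGB)
# is an assumption made only for this sketch.
RGB_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # preset normalized pixel means (R, G, B)
RGB_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # preset normalized pixel standard deviations

def normalize_color(color_image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 color image into the first target image."""
    gray_normalized = color_image.astype(np.float32) / 255.0  # sub-step S221: gray-scale normalization to [0, 1]
    return (gray_normalized - RGB_MEAN) / RGB_STD              # sub-step S222: per-channel standardization
```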
And step S230, carrying out depth normalization compensation processing on the depth image to obtain a second target image.
Optionally, referring to fig. 5, fig. 5 is a flowchart illustrating sub-steps included in step S230 in fig. 2. In this embodiment, the step S230 may include substeps S231 to S232, which convert the depth image into a normalized image with complete data.
And a substep S231 of performing depth normalization processing on all pixels of the depth image according to the maximum depth value of the depth image to obtain a corresponding depth normalization image.
In this embodiment, the depth normalization processing of the depth image may be completed by dividing the pixel value of each pixel in the depth image by the maximum depth value of the depth image, so as to ensure that each valid pixel of the resulting depth normalized image has a value within the range [0, 1]. A valid pixel is a pixel that has depth (i.e., whose pixel value is not null) in the depth normalized image.
And a substep S232 of performing depth compensation processing on the depth-free pixels in the depth normalized image according to a preset normalized depth value to obtain a second target image.
In this embodiment, the preset normalized depth value is used to indicate that the corresponding pixel belongs to the no-depth state, and may be configured as -1; in this case, sub-step S232 ensures that the second target image is a normalized image with complete data.
Therefore, by performing sub-steps S231 to S232, the depth image can be converted into a normalized image with complete data.
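A corresponding sketch of sub-steps S231 and S232 is given below; treating zero-valued pixels as depth-free pixels is an assumption made for this example, since the description only characterizes them as pixels whose value is null.

```python
import numpy as np

def normalize_depth(depth_image: np.ndarray, no_depth_value: float = -1.0) -> np.ndarray:
    """Convert an H x W depth map into the second target image (assumed helper)."""
    depth = depth_image.astype(np.float32)
    valid = depth > 0                                    # assumption: depth-free pixels are stored as 0
    second_target = np.full_like(depth, no_depth_value)  # sub-step S232: preset normalized depth value
    if valid.any():
        second_target[valid] = depth[valid] / depth[valid].max()  # sub-step S231: divide by the maximum depth
    return second_target
```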
Step S240, performing image channel stitching on the first target image and the second target image to obtain corresponding target object images.
In this embodiment, the first target image and the second target image have the same image size; the first target image covers several color channels, and the second target image covers a depth channel. Therefore, the image size of the target object image is consistent with that of the first and second target images, and the image channels of the target object image include those color channels plus the depth channel; that is, the target object image is obtained by stacking the first target image and the second target image along the channel dimension.
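A minimal sketch of the channel stitching in step S240, assuming the NumPy helpers above, might look as follows.

```python
import numpy as np

def stitch_channels(first_target: np.ndarray, second_target: np.ndarray) -> np.ndarray:
    """Stack the 3-channel first target image and the 1-channel second target image
    into an H x W x 4 target object image (assumed helper for step S240)."""
    return np.concatenate([first_target, second_target[..., np.newaxis]], axis=-1)
```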
And step S250, calling a prestored pose estimation model to identify the target object image to obtain object pose information of the target object.
In this embodiment, after obtaining the target object image corresponding to the target object, the vision processing apparatus 10 may call the stored pose estimation model and input the target object image into it for recognition, determining the object class to which the target object belongs and the object pose information (including the 3D position information and 3D rotation information of the target object) recognized under that object class. In this way the pose estimation model quickly recognizes the poses of target objects of different classes, which improves the real-time performance of object pose recognition and ensures that the object pose recognition scheme provided by the present application is applicable to objects of different object classes while retaining good pose recognition efficiency; that is, the balance between pose recognition efficiency and the application range of the pose recognition scheme is effectively maintained, and the execution feasibility of the robot grabbing action is improved.
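For illustration, one assumed way of invoking such a pose estimation model is sketched below; the output layout of class logits, 3D position, and 3D rotation is not fixed by the description and is chosen only for the example.

```python
import numpy as np
import torch

def estimate_pose(model: torch.nn.Module, target_object_image: np.ndarray):
    """Run a pre-stored pose estimation model on an H x W x 4 target object image
    (assumed interface: the model returns class logits, 3D position, 3D rotation)."""
    x = torch.from_numpy(target_object_image).permute(2, 0, 1).unsqueeze(0).float()  # 1 x 4 x H x W tensor
    with torch.no_grad():
        class_logits, position_3d, rotation_3d = model(x)
    return class_logits.argmax(dim=1), position_3d, rotation_3d
```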
Therefore, by executing steps S210 to S250, the grabbing robot is assisted in quickly identifying the poses of different types of target objects, the real-time performance of object pose identification is improved, the balance between pose identification efficiency and the application range of the pose identification scheme is effectively maintained, and the execution feasibility of the robot grabbing action is improved.
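Putting the sketches above together, a hypothetical end-to-end use on one RGB-D capture could read as follows; the capture helper and the model object are assumed names, not part of the present application.

```python
# Hypothetical glue code tying steps S210-S250 together (all names assumed).
color_image, depth_image = acquire_rgbd_from_robot()                  # step S210: assumed capture helper
first_target = normalize_color(color_image)                           # step S220
second_target = normalize_depth(depth_image)                          # step S230
target_object_image = stitch_channels(first_target, second_target)    # step S240
object_class, position_3d, rotation_3d = estimate_pose(pose_model, target_object_image)  # step S250
```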
Optionally, referring to fig. 6, fig. 6 is a second schematic flow chart of the object pose identification method according to the embodiment of the present application. In this embodiment of the application, the object pose identification method may further include step S260 and step S270, so as to implement effective training of the pose estimation model, ensure that the pose estimation model is adapted to the grasping robot, effectively reduce data labeling pressure of training sample data, and improve the overall model training efficiency.
Step S260, at least one object sample data set corresponding to different object types is obtained, wherein the object sample data set comprises target object images marked with object outlines of corresponding object samples under different real object pose information, and the real object pose information is directly fed back by a mechanical arm end effector of the grabbing robot when the grabbing robot grabs the corresponding object samples.
In this embodiment, the object sample data sets used for training the pose estimation model may be constructed on the basis of object sample information provided by the grabbing robot, which ensures that the finally trained pose estimation model and the grabbing robot are adapted to each other. The different real object pose information of the object samples in a single object sample data set can be obtained by sensing with the mechanical arm end effector of the grabbing robot while it grabs the corresponding object sample, so researchers do not need to label the real object poses in the model training data; this effectively reduces the data labeling burden of the training sample data and improves the overall model training efficiency. A single object sample data set further includes target object images marked with the object contour of the corresponding object sample under different real object pose information. The target object image for a given real object pose can be obtained by processing, through the above steps S220 to S240, the color image and depth image acquired by the grabbing robot while the corresponding object sample is in that pose, after which a researcher marks the contour of the object sample in the target object image.
And step S270, carrying out convolutional neural network model training based on all the obtained object sample data sets to obtain a pose estimation model.
In this embodiment, on the basis of the acquired sample data sets of all objects adapted to the grasping robot, a back propagation algorithm is used to perform convolutional neural network model training, so as to obtain a corresponding pose estimation model.
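A rough sketch of such a training loop is given below; the optimizer, the loss terms, and the data-loader layout are assumptions, since the description only specifies that a back propagation algorithm is used to train the convolutional neural network on the object sample data sets.

```python
import torch
import torch.nn.functional as F

def train_pose_model(model, data_loader, epochs: int = 10, lr: float = 1e-3):
    """Assumed training sketch for step S270."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, true_positions, true_rotations in data_loader:  # images: N x 4 x H x W fused inputs
            _, pred_positions, pred_rotations = model(images)
            loss = (F.mse_loss(pred_positions, true_positions)
                    + F.mse_loss(pred_rotations, true_rotations))   # assumed pose regression loss
            optimizer.zero_grad()
            loss.backward()   # back propagation over the fused color-depth inputs
            optimizer.step()
    return model
```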
Therefore, by executing steps S260 and S270, the pose estimation model can be trained effectively and kept adapted to the grabbing robot, the data labeling burden of the training sample data is effectively reduced, and the overall model training efficiency is improved.
In the present application, to ensure that the vision processing apparatus 10 can execute the above-described object pose recognition method by the object pose recognition device 100, the present application realizes the aforementioned functions by performing functional block division on the object pose recognition device 100. The following describes the specific components of the object pose recognition apparatus 100 provided in the present application.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating an object pose recognition apparatus 100 according to an embodiment of the present disclosure. In the embodiment of the present application, the object pose recognition apparatus 100 may include an acquisition image acquisition module 110, a color normalization processing module 120, a depth normalization compensation module 130, an image channel stitching module 140, and a pose estimation recognition module 150.
The captured image acquisition module 110 is configured to acquire a color image and a depth image captured by the grabbing robot for the target object, where the color image and the depth image have matched pixel contents at the same pixel position.
And the color normalization processing module 120 is configured to perform color normalization processing on the color image according to the color calibration parameter to obtain a first target image.
And the depth normalization compensation module 130 is configured to perform depth normalization compensation processing on the depth image to obtain a second target image.
And the image channel splicing module 140 is configured to perform image channel splicing processing on the first target image and the second target image to obtain corresponding target object images.
And the pose estimation and identification module 150 is configured to call a pre-stored pose estimation model to identify the target object image, so as to obtain object pose information of the target object.
Optionally, referring to fig. 8, fig. 8 is a schematic composition diagram of the color normalization processing module 120 in fig. 7. In this embodiment, the color calibration parameters include preset normalized pixel mean values and preset normalized pixel standard deviation values of different color channels, and the color normalization processing module 120 may include a grayscale normalization sub-module 121 and an image normalization sub-module 122.
The grayscale normalization sub-module 121 is configured to perform grayscale normalization processing on all pixels of each image component of the color image at different color channels to obtain a corresponding grayscale normalized image.
And the image normalization submodule 122 is configured to, for each color channel, perform image normalization processing on all pixels of the grayscale normalized image corresponding to the color channel according to the preset normalized pixel mean value and the preset normalized pixel standard deviation value of the color channel, so as to obtain an image component of the first target image at the color channel.
Alternatively, referring to fig. 9, fig. 9 is a schematic composition diagram of the depth normalization compensation module 130 in fig. 7. In this embodiment, the depth normalization compensation module 130 may include a depth normalization sub-module 131 and a depth value compensation sub-module 132.
And the depth normalization submodule 131 is configured to perform depth normalization processing on all pixels of the depth image according to the maximum depth value of the depth image, so as to obtain a corresponding depth normalization image.
And the depth value compensation submodule 132 is configured to perform depth compensation processing on the depth-free pixels in the depth normalized image according to a preset normalized depth value, so as to obtain the second target image.
Optionally, referring to fig. 10, fig. 10 is a second schematic view of the object pose recognition apparatus 100 according to the embodiment of the present application. In the embodiment of the present application, the object pose recognition apparatus 100 may further include a sample data acquisition module 160 and a network model training module 170.
The sample data acquisition module 160 is configured to acquire at least one object sample data set corresponding to each of different object types, where the object sample data set includes target object images marked with object outlines of corresponding object samples under different real object pose information, and the real object pose information is directly fed back by a mechanical arm end effector of the grabbing robot when the grabbing robot grabs the corresponding object samples.
And the network model training module 170 is configured to perform convolutional neural network model training based on the acquired sample data sets of all the objects to obtain the pose estimation model.
It should be noted that the basic principle and the resulting technical effect of the object pose recognition apparatus 100 provided in the embodiment of the present application are the same as those of the object pose recognition method described above. For brief description, where not mentioned in this embodiment section, reference may be made to the above description of the object pose recognition method.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part. The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a readable storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned readable storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In summary, in the object pose identification method and apparatus, the visual processing device, and the readable storage medium provided by the present application, after a color image and a depth image acquired by the grabbing robot for a target object, whose pixel contents match at the same pixel positions, are obtained, the color image is color-normalized according to color calibration parameters to obtain a first target image, and the depth image is subjected to depth normalization compensation to obtain a second target image. The first target image and the second target image are then stitched along the image channels to obtain a corresponding target object image, and a pre-stored pose estimation model is called to identify the object pose information of the target object directly from the target object image. In this way the color image and the depth image of the target object are fused before the object pose information is identified directly, so the real-time performance of pose recognition is enhanced, and the application range of the scheme is effectively expanded while the pose recognition efficiency is ensured.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An object pose recognition method, characterized in that the method comprises:
acquiring a color image and a depth image collected by a grabbing robot for a target object, wherein the color image matches the depth image in pixel content at the same pixel positions;
carrying out color normalization processing on the color image according to color calibration parameters to obtain a first target image;
carrying out depth normalization compensation processing on the depth image to obtain a second target image;
performing image channel splicing processing on the first target image and the second target image to obtain corresponding target object images;
and calling a pre-stored pose estimation model to identify the target object image to obtain object pose information of the target object.
2. The method according to claim 1, wherein the color calibration parameters include a preset normalized pixel mean value and a preset normalized pixel standard deviation value of different color channels, and the step of performing color normalization processing on the color image according to the color calibration parameters to obtain the first target image includes:
for the image components of the color image in different color channels, carrying out gray-scale normalization processing on all pixels of each image component to obtain a corresponding gray-scale normalized image;
and for each color channel, performing image standardization processing on all pixels of the gray-scale normalized image corresponding to the color channel according to the preset normalized pixel mean value and the preset normalized pixel standard deviation value of the color channel to obtain the image component of the first target image at the color channel.
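As an illustration of the two-step color normalization in claim 2, a minimal NumPy sketch is given below; the 8-bit input, the divisor of 255 used for gray-scale normalization, and the parameter names are assumptions, not limitations of the claim.

```python
import numpy as np

def color_normalize(color_image, preset_mean, preset_std):
    """Illustrative color normalization: per-channel gray-scale normalization
    to [0, 1], then standardization with the preset calibration values.

    color_image : H x W x 3 uint8 image (one image component per color channel)
    preset_mean : preset normalized pixel mean per channel, shape (3,)
    preset_std  : preset normalized pixel standard deviation per channel, shape (3,)
    """
    components = []
    for c in range(color_image.shape[-1]):
        # Gray-scale normalization of the channel's image component.
        component = color_image[..., c].astype(np.float32) / 255.0
        # Image standardization with that channel's calibration parameters.
        components.append((component - preset_mean[c]) / preset_std[c])
    return np.stack(components, axis=-1)  # first target image
```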
3. The method of claim 1, wherein the step of performing depth normalization compensation processing on the depth image to obtain a second target image comprises:
carrying out depth normalization processing on all pixels of the depth image according to the maximum depth value of the depth image to obtain a corresponding depth normalization image;
and carrying out depth compensation processing on the depth-free pixels in the depth normalized image according to a preset normalized depth value to obtain the second target image.
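A corresponding sketch of the depth normalization compensation in claim 3; it assumes depth-free pixels are encoded as zero in the raw depth map and that the preset normalized depth value is supplied by the caller.

```python
import numpy as np

def depth_normalize_compensate(depth_image, preset_norm_depth=1.0):
    """Depth normalization followed by compensation of depth-free pixels.

    depth_image       : H x W raw depth map; 0 assumed to mark missing depth
    preset_norm_depth : preset normalized depth value written into those pixels
    """
    depth = depth_image.astype(np.float32)
    max_depth = depth.max()
    # Depth normalization: scale every pixel by the image's maximum depth value.
    normalized = depth / max_depth if max_depth > 0 else depth
    # Depth compensation: depth-free pixels receive the preset normalized value.
    normalized[depth_image == 0] = preset_norm_depth
    return normalized  # second target image
```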
4. The method according to any one of claims 1-3, further comprising:
acquiring at least one object sample data set corresponding to different object types, wherein the object sample data set comprises target object images marked with the object outlines of corresponding object samples under different real object pose information, and the real object pose information is directly fed back by the mechanical arm end effector of the grabbing robot when the grabbing robot grabs the corresponding object samples;
and carrying out convolutional neural network model training based on the acquired sample data sets of all the objects to obtain the pose estimation model.
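Claim 4 only fixes the training data (object sample sets labeled with object outlines and with ground-truth poses fed back by the mechanical arm end effector) and the model family (a convolutional neural network); the architecture, loss, and optimizer in the sketch below are placeholders chosen for illustration.

```python
import torch
from torch import nn

# Placeholder CNN regressor: 4-channel fused image in, 6-DoF pose out
# (e.g. 3 translation + 3 rotation components). Not specified by the claim.
model = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 6),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

def train(loader, epochs=10):
    """`loader` yields (target_object_image, true_pose) batches, where
    true_pose is the pose fed back by the end effector during grabbing."""
    model.train()
    for _ in range(epochs):
        for fused_image, true_pose in loader:
            optimizer.zero_grad()
            loss = criterion(model(fused_image), true_pose)
            loss.backward()
            optimizer.step()
```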
5. An object pose recognition apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring a color image and a depth image collected by the grabbing robot for a target object, wherein the color image matches the depth image in pixel content at the same pixel positions;
the color normalization processing module is used for carrying out color normalization processing on the color image according to the color calibration parameters to obtain a first target image;
the depth normalization compensation module is used for performing depth normalization compensation processing on the depth image to obtain a second target image;
the image channel splicing module is used for carrying out image channel splicing processing on the first target image and the second target image to obtain corresponding target object images;
and the pose estimation and identification module is used for calling a prestored pose estimation model to identify the target object image to obtain the object pose information of the target object.
6. The apparatus of claim 5, wherein the color calibration parameters comprise a preset normalized pixel mean value and a preset normalized pixel standard deviation value of different color channels, and the color normalization processing module comprises a gray scale normalization sub-module and an image standardization sub-module;
the gray scale normalization submodule is used for carrying out gray scale normalization processing on all pixels of each image component aiming at the image components of the color image at different color channels to obtain a corresponding gray scale normalized image;
and the image standardization sub-module is used for carrying out image standardization processing on all pixels of the gray-scale normalized image corresponding to each color channel according to the preset normalized pixel mean value and the preset normalized pixel standard deviation value of the color channel so as to obtain the image component of the first target image at the color channel.
7. The apparatus of claim 5, wherein the depth normalization compensation module comprises a depth normalization sub-module and a depth value compensation sub-module;
the depth normalization submodule is used for carrying out depth normalization processing on all pixels of the depth image according to the maximum depth value of the depth image to obtain a corresponding depth normalization image;
and the depth value compensation submodule is used for carrying out depth compensation processing on the depth-free pixels in the depth normalized image according to a preset normalized depth value to obtain the second target image.
8. The apparatus of any one of claims 5-7, further comprising:
the system comprises a sample data acquisition module, a data acquisition module and a data acquisition module, wherein the sample data acquisition module is used for acquiring at least one object sample data set corresponding to different object types respectively, the object sample data set comprises target object images marked with object outlines of corresponding object samples under different real object pose information, and the real object pose information is directly fed back by a mechanical arm end effector of the grabbing robot when the grabbing robot grabs the corresponding object samples;
and the network model training module is used for carrying out convolutional neural network model training on the basis of the acquired sample data sets of all the objects to obtain the pose estimation model.
9. A vision processing apparatus comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being capable of executing the computer program to implement the object pose recognition method according to any one of claims 1 to 4.
10. A readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the object pose recognition method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111401324.2A CN114140517A (en) | 2021-11-19 | 2021-11-19 | Object pose identification method and device, visual processing equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111401324.2A CN114140517A (en) | 2021-11-19 | 2021-11-19 | Object pose identification method and device, visual processing equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114140517A true CN114140517A (en) | 2022-03-04 |
Family
ID=80391280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111401324.2A Pending CN114140517A (en) | 2021-11-19 | 2021-11-19 | Object pose identification method and device, visual processing equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140517A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111055281A (en) * | 2019-12-19 | 2020-04-24 | 杭州电子科技大学 | ROS-based autonomous mobile grabbing system and method |
CN111179340A (en) * | 2019-12-30 | 2020-05-19 | 苏宁云计算有限公司 | Object positioning method and device and computer system |
CN112270249A (en) * | 2020-10-26 | 2021-01-26 | 湖南大学 | Target pose estimation method fusing RGB-D visual features |
CN113284184A (en) * | 2021-05-24 | 2021-08-20 | 湖南大学 | Robot RGBD visual perception oriented 6D pose estimation method and system |
Non-Patent Citations (1)
Title |
---|
曾霞霞; 李佐勇; 林文如: "A random forest-based head pose estimation algorithm" [一种基于随机森林的头部位姿估计算法], Journal of Fujian Normal University (Natural Science Edition), no. 04, 20 July 2016 (2016-07-20) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883998A (en) * | 2023-06-20 | 2023-10-13 | 珠海微度芯创科技有限责任公司 | Article labeling method and device based on millimeter wave image and electronic equipment |
CN116883998B (en) * | 2023-06-20 | 2024-04-05 | 珠海微度芯创科技有限责任公司 | Article labeling method and device based on millimeter wave image and electronic equipment |
Similar Documents
Publication | Title
---|---
JP7539872B2 | SYSTEM AND METHOD FOR TRAINING MACHINE MODELS WITH AUGMENTED DATA
EP2879080B1 | Image processing device and method, and computer readable medium
CN108648194B | Three-dimensional target identification segmentation and pose measurement method and device based on CAD model
US11645851B2 | Method of processing image data in a connectionist network
CN102831382A | Face tracking apparatus and method
CN108573471B | Image processing apparatus, image processing method, and recording medium
Ma et al. | Crlf: Automatic calibration and refinement based on line feature for lidar and camera in road scenes
CN114332214A | Object attitude estimation method and device, electronic equipment and storage medium
US11941892B2 | Method and device for providing data for creating a digital map
CN117557784B | Target detection method, target detection device, electronic equipment and storage medium
CN112070137A | Training data set generation method, target object detection method and related equipment
CN108122245A | A kind of goal behavior describes method, apparatus and monitoring device
CN112785595B | Target attribute detection, neural network training and intelligent driving method and device
CN114140517A | Object pose identification method and device, visual processing equipment and readable storage medium
CN113724329A | Object attitude estimation method, system and medium fusing plane and stereo information
CN113259605A | Video matting method, system and storage medium based on prediction foreground mask prediction
Cela et al. | Lanes detection based on unsupervised and adaptive classifier
Truong et al. | Lane boundaries detection algorithm using vector lane concept
KR20210018114A | Cross-domain metric learning system and method
Sahu et al. | Shape features for image-based servo-control using image moments
EP4002270A1 | Image recognition evaluation program, image recognition evaluation method, evaluation device, and evaluation system
Nam et al. | A robust real-time road detection algorithm using color and edge information
WO2024100866A1 | Learning device, building region classification device, training method, building region classification method, and program
JP7415046B2 | Image processing device and computer readable storage medium
CN113724296B | Material tracking method and device under motion background, storage medium and terminal
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination