CN114708323A - Object posture detection method and device - Google Patents

Object posture detection method and device

Info

Publication number
CN114708323A
Authority
CN
China
Prior art keywords
target object
component
frame
coordinate
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210244369.1A
Other languages
Chinese (zh)
Inventor
石光明
李旭阳
饶承炜
于明轩
谢雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Technology of Xidian University filed Critical Guangzhou Institute of Technology of Xidian University
Priority to CN202210244369.1A
Publication of CN114708323A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting object postures. The method comprises: obtaining image data containing a target object; acquiring, from the image data, the components constituting an object and the component frames corresponding to those components; obtaining the category of the target object based on the corresponding information of the components and component frames in a preset object structure relation library; combining the components into the target object according to the category and the coordinate and size information of the component frames; and processing the combined target object to obtain the posture information of the components constituting it. By splitting and recognizing the objects in an image, the method and device can distinguish the different component parts of a target object and thereby realize 6D posture detection of the target object's components.

Description

Object posture detection method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to a method and a device for detecting object postures.
Background
In computer image processing, the task of object detection is to find all objects of interest in an image and determine their pose; the pose refers to the spatial position (x, y, z) of the object in 3D space and the angles by which the object is rotated about the x, y and z axes.
For a person to grasp a particular object, the pose of that object in space must be known. The same holds for a computer: locating a target in an image also requires an accurate object pose. Target detection has therefore long been among the most challenging problems in the field of computer vision.
Disclosure of Invention
The invention provides a method and a device for detecting object postures, which can distinguish the different component parts of a target object by splitting and recognizing the objects in an image, thereby realizing posture detection of the target object's components.
In order to solve the above technical problem, an embodiment of the present invention provides a method for detecting an object posture, including:
acquiring image data, wherein the image data contains a target object;
acquiring, from the image data, the components constituting an object and the component frames corresponding to the components;
obtaining the category of the target object based on the corresponding information of the components and the component frames in a preset object structure relation library;
combining the components to obtain the target object according to the category and the coordinate and size information of the component frames;
and processing the combined target object to obtain the posture information of the components constituting the target object.
As a preferable scheme, the image data includes an RGB image and a corresponding depth image;
after acquiring the components constituting the object and the component frames corresponding to the components from the image data, the method further includes:
calculating the three-dimensional coordinates of the center point of each component frame in the RGB image, the rotation angle of the component in the imaging plane, and the included angle between the component and the sight-line direction.
As a preferable scheme, the object structure relation library contains standard reference diagrams of various types of objects.
As a preferable scheme, combining the components to obtain the target object according to the category and the coordinate and size information of the component frames specifically includes:
predicting from the component frames to obtain a plurality of prediction frames for forming the target object;
screening the prediction frames to obtain the adjacent frames capable of forming the target object;
and combining the components corresponding to the component frames with the adjacent components corresponding to the adjacent frames to obtain the target object.
As a preferable scheme, processing the combined target object to obtain the posture information of the components constituting the target object specifically includes:
sampling the depth at the positions corresponding to the component frames in the depth image to obtain three-dimensional coordinates reflecting the components constituting the target object;
performing coordinate transformation on the three-dimensional coordinates to obtain transformed three-dimensional coordinates;
and calculating, from the transformed three-dimensional coordinates after normalization, the included angle between each component constituting the target object and the coordinate system.
Another embodiment of the present invention provides an apparatus for detecting an object posture, including:
an image acquisition module, used for acquiring image data, wherein the image data contains a target object;
a component frame module, used for acquiring, from the image data, the components constituting an object and the component frames corresponding to the components;
an object category identification module, used for obtaining the category of the target object according to the corresponding information of the components and the component frames in a preset object structure relation library;
an object combination module, used for combining the components to obtain the target object according to the category and the coordinate and size information of the component frames;
and a posture acquisition module, used for processing the combined target object to obtain the posture information of the components constituting the target object.
As a preferable scheme, the image data includes an RGB image and a corresponding depth image;
the apparatus further comprises:
a coordinate angle calculation module, used for calculating the three-dimensional coordinates of the center point of each component frame in the RGB image, the rotation angle of the component in the imaging plane, and the included angle between the component and the sight-line direction.
As a preferable scheme, the object structure relation library contains standard reference diagrams of various objects.
As a preferable scheme, the object combination module includes:
a prediction frame unit, used for predicting from the component frames to obtain a plurality of prediction frames for forming the target object;
an adjacent frame unit, used for screening the prediction frames to obtain the adjacent frames capable of forming the target object;
and a combination unit, used for combining the components corresponding to the component frames with the adjacent components corresponding to the adjacent frames to obtain the target object.
As a preferable scheme, the posture acquisition module includes:
a sampling unit, used for sampling the depth at the positions corresponding to the component frames in the depth image to obtain three-dimensional coordinates reflecting the components constituting the target object;
a coordinate transformation unit, used for performing coordinate transformation on the three-dimensional coordinates to obtain transformed three-dimensional coordinates;
and a calculation unit, used for calculating, from the transformed three-dimensional coordinates after normalization, the included angle between each component constituting the target object and the coordinate system.
Compared with the prior art, the embodiment of the invention has at least the following advantage. After image data captured by a camera is acquired, the components of every object in the image are first recognized, each component being marked with a component frame. A comparison query is then performed in a preset object structure relation library to obtain the category of the target object, and the target object is obtained by combination according to the category and the coordinate and size information of the component frames, so that the target object is first split and then recombined. Finally, the combined target object is processed to obtain the posture information of the components constituting it. The whole process clarifies the combination relationships among object components, distinguishes the different component parts of the target object, and calculates the posture information of the object in three-dimensional space, such as its placement angle, thereby realizing posture detection for both the whole target object and its component parts.
Drawings
FIG. 1 is a flow chart illustrating a method for detecting object postures in one embodiment of the present invention;
FIG. 2 is a schematic diagram of a camera imaging coordinate system in one embodiment of the present invention;
FIG. 3 is a schematic diagram of a component frame corresponding to a component in one embodiment of the present invention;
FIG. 4 is a schematic diagram of a standard spoon in the object structure relation library in one embodiment of the present invention;
FIG. 5 is a schematic diagram of a standard hand drill in the object structure relation library in one embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating the prediction of an adjacent frame in one embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the described embodiments are only a part of the embodiments of the present invention, not all of them, and are provided to make the disclosure of the present invention more thorough and complete. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the description of the present application, the terms "first", "second", "third", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first," "second," "third," etc. may explicitly or implicitly include one or more of the features. In the description of the present application, the meaning of "a plurality" is two or more unless otherwise specified.
In the description of the present application, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: the connection may be fixed, removable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium, or internal between two elements. The terms "vertical," "horizontal," "left," "right," "up," "down," and the like are for illustrative purposes only and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed or operated in a particular manner, and are not to be construed as limiting the present invention. The term "and/or" includes any and all combinations of one or more of the associated listed items. The specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific context.
In the description of the present application, it should also be noted that, unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein serves only to describe particular embodiments and is not intended to limit the invention; those skilled in the art will understand the specific meaning of each term in its particular context.
An embodiment of the present invention provides a method for detecting an object posture; please refer to fig. 1 to 6. Fig. 1 shows a schematic flow diagram of the method for detecting an object posture in one embodiment of the present invention. Fig. 2 shows a schematic diagram of the imaging coordinate system of a camera in one embodiment of the present invention, where the positive z-axis is the viewing direction of the camera, the positive x-axis is the horizontal rightward direction of the imaging plane, and the positive y-axis is the vertical downward direction of the imaging plane. Fig. 3 shows a schematic diagram of a component frame corresponding to a component in one embodiment of the present invention, where (x, y) are the center coordinates of the component frame, w and h are the width and height of the frame, and θ is the rotation angle of the frame. Fig. 4 shows a schematic diagram of a standard spoon in the object structure relation library in one embodiment of the present invention, fig. 5 shows a schematic diagram of a standard hand drill in the object structure relation library in one embodiment of the present invention, and fig. 6 is a schematic diagram illustrating the prediction of adjacent frames in one embodiment of the present invention. The method for detecting the object posture in the embodiment of the present invention includes steps S1 to S5:
S1, acquiring image data, wherein the image data contains a target object;
S2, acquiring, from the image data, the components constituting an object and the component frames corresponding to the components;
S3, obtaining the category of the target object based on the corresponding information of the components and the component frames in a preset object structure relation library;
S4, combining the components to obtain the target object according to the category and the coordinate and size information of the component frames;
S5, processing the combined target object to obtain the posture information of the components constituting the target object.
It should be noted that, because different objects vary in appearance, shape and posture, and imaging is further disturbed by factors such as illumination and occlusion, target detection has always been a challenging problem in the field of computer vision. Existing target detection methods obtain a parameter model of a target detection network by training on a target detection dataset annotated with upright (axis-aligned) boxes. However, the existing technology recognizes an object as a whole and cannot distinguish its components, so the information of each component is lost during recognition, which hinders subsequent tasks such as pose estimation and manipulation of the object.
In the embodiment of the present invention, for step S1, the RGB image is preferably captured by an RGBD camera; from the detection result on the RGB image, the computer can obtain the object category and the position indicated by an upright box on the screen. With an RGBD camera, the distance between an object and the camera plane can be obtained from the depth image, and, combined with the intrinsic and extrinsic parameters of the camera, the position of the object in a rectangular spatial coordinate system with the camera at the origin can be obtained.
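For illustration only, this depth-to-3D back-projection can be sketched as follows, assuming a pinhole camera model; the parameter names fx, fy, cx, cy are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with measured depth (in meters) into the
    camera coordinate system of fig. 2: origin at the camera, z along the
    viewing direction, assuming a pinhole model with the given intrinsics."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```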
To address the inability of existing methods to recognize the relationships among an object's split components and the object's placement direction, the embodiment of the present invention constructs a preset object structure relation library for different objects. The library records the components of each object, the positional relationships between the components, and the aspect-ratio information of the components, and it is used in combination with the detection results of the object components in the component frames. Here, the depth image given by the depth camera and the intrinsic and extrinsic parameters of the RGBD camera are combined to obtain the 3D coordinate information of the corresponding component, which is not repeated here.
Further, in the above embodiment, the image data includes an RGB image and a corresponding depth image. For step S2, after object component detection is performed on the RGB image returned by the RGBD camera and the components constituting the object and their corresponding component frames are obtained, the method further includes:
combining the detection result of each component frame with the intrinsic and extrinsic parameters of the RGBD camera to calculate the three-dimensional coordinates of the center point of the component frame in the RGB image, the rotation angle β of the object component in the imaging plane of the camera, and the included angle α between the object component and the sight-line direction. These calculations follow from the projection transformation relation of the camera.
Further, for the above step S3, the object structure relation library is preset and contains standard reference diagrams of various objects (a standard spoon is shown in fig. 4, and a standard hand drill in fig. 5). Since this embodiment detects the components of an object rather than the whole object, and an image often contains many object components, the combination relationships among the components cannot be determined easily. Taking a spoon as an example: if several spoons placed close to one another appear in an image, component recognition yields the detection results of multiple spoon heads and spoon handles; at that point the category of the target object is not easy to judge, and it is even harder to decide which spoon handle a given spoon head should be connected to. The combination relationships between the components must then be inferred from the standard spoon in the object structure relation library.
As shown in fig. 4, the standard spoon is normalized to a 1 × 1 × 1 cube and divided by component into a spoon head and a spoon handle; in fig. 5, the standard hand drill is normalized to a 1 × 1 × 1 cube and divided by component into a drill bit, a drill body, and a drill handle. On the basis of component detection, the object structure relation library can thus guide the way components are combined into objects; a minimal sketch of one possible representation of such entries follows.
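For illustration only, such library entries might be represented as follows; every field name and numeric proportion here is an invented assumption, not a value from the patent:

```python
# Hypothetical sketch of the object structure relation library: each object is
# normalized to a 1 x 1 x 1 cube and split into named components; for each
# component we store its normalized width/height (aspect-ratio information)
# and the components it may be adjacent to (positional relationships).
OBJECT_STRUCTURE_LIBRARY = {
    "spoon": {
        "head":   {"normalized_wh": (0.30, 0.25), "neighbors": ["handle"]},
        "handle": {"normalized_wh": (0.10, 0.70), "neighbors": ["head"]},
    },
    "hand_drill": {
        "bit":    {"normalized_wh": (0.08, 0.30), "neighbors": ["body"]},
        "body":   {"normalized_wh": (0.30, 0.40), "neighbors": ["bit", "handle"]},
        "handle": {"normalized_wh": (0.15, 0.30), "neighbors": ["body"]},
    },
}
```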
Further, when the image contains the detection results of multiple components, the components need to be combined according to certain rules to form complete objects. This combination process is step S4 and specifically includes:
S41, predicting from the component frames to obtain a plurality of prediction frames for forming the target object;
S42, screening the prediction frames to obtain the adjacent frames capable of forming the target object;
S43, combining the components corresponding to the component frames with the adjacent components corresponding to the adjacent frames to obtain the target object.
Taking the spoon as an example, starting from a detected spoon head and using the combination relationship between spoon head and spoon handle, a spoon handle may be predicted to lie above, below, to the left of, or to the right of the spoon head, giving four spoon-handle prediction frames.
These four prediction frames are then screened: the three-dimensional coordinates, length, width and angle of each prediction are matched against the detection results of the actual adjacent components, and the detection result closest to the prediction frame in angle and vertex distance is taken as the adjacent frame to combine with the component frame (the distance can be a weighted sum of cosine distance and Euclidean distance). After the component frame and its adjacent frame are obtained, the target object is obtained by combining the component corresponding to the component frame with the adjacent component corresponding to the adjacent frame.
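A minimal sketch of this screening step follows, assuming each frame is summarized as a feature vector (x, y, z, w, h, θ) and using equal weights; the feature layout and weight values are assumptions, not specified by the patent:

```python
import numpy as np

def match_neighbor(prediction, detections, w_cos=0.5, w_euc=0.5):
    """Return the detected frame closest to a predicted neighbor frame.

    Each frame is a feature vector (x, y, z, w, h, theta); candidates are
    scored by a weighted sum of cosine distance and Euclidean distance,
    and the lowest-scoring detection is taken as the adjacent frame."""
    p = np.asarray(prediction, dtype=float)
    best, best_score = None, np.inf
    for det in detections:
        d = np.asarray(det, dtype=float)
        cos_dist = 1.0 - p @ d / (np.linalg.norm(p) * np.linalg.norm(d) + 1e-9)
        euc_dist = np.linalg.norm(p - d)
        score = w_cos * cos_dist + w_euc * euc_dist
        if score < best_score:
            best, best_score = det, score
    return best
```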
Preferably, in the above embodiment, the 3D center coordinates and the length and width of a candidate adjacent frame are predicted based on the coordinates of the detected object component; the specific calculation formulas are as follows:
$$h_{\mathrm{predict}} = h_{\mathrm{part}} \cdot \frac{h_{\mathrm{NormalizedNeighbor}}}{h_{\mathrm{NormalizedCenterPart}}}, \qquad w_{\mathrm{predict}} = w_{\mathrm{part}} \cdot \frac{w_{\mathrm{NormalizedNeighbor}}}{w_{\mathrm{NormalizedCenterPart}}}$$

where $h_{\mathrm{predict}}$ and $w_{\mathrm{predict}}$ are the height and width of the predicted adjacent frame; $h_{\mathrm{part}}$ and $w_{\mathrm{part}}$ are the height and width of the detection result of the central component; $h_{\mathrm{NormalizedNeighbor}}$ and $w_{\mathrm{NormalizedNeighbor}}$ are the height and width of the normalized adjacent frame; and $h_{\mathrm{NormalizedCenterPart}}$ and $w_{\mathrm{NormalizedCenterPart}}$ are the height and width of the normalized center frame.
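This scaling translates directly into code; the following is a minimal sketch under the same assumption as the formulas above, namely that the detected component's size is multiplied by the library's normalized proportions:

```python
def predict_neighbor_size(h_part, w_part,
                          h_norm_neighbor, w_norm_neighbor,
                          h_norm_center, w_norm_center):
    """Predict the height and width of an adjacent frame by scaling the
    detected central component by the ratio of the normalized neighbor size
    to the normalized center-component size from the structure library."""
    h_predict = h_part * h_norm_neighbor / h_norm_center
    w_predict = w_part * w_norm_neighbor / w_norm_center
    return h_predict, w_predict
```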
Further, in the above embodiment, step S5 specifically includes:
s51, sampling the corresponding position depth of the component frame in the depth image to obtain the three-dimensional coordinates of the components which reflect and form the target object;
s52, carrying out coordinate transformation on the three-dimensional coordinate to obtain a transformed three-dimensional coordinate;
and S53, calculating the transformed three-dimensional coordinates, and obtaining the included angle between the part forming the target object and the coordinate system after normalization processing.
Specifically, according to the detection results of the object components, the depth of the depth image at the positions corresponding to the component frames is sampled to obtain the three-dimensional coordinates of the sampling points in the camera coordinate system. Combined with the intrinsic and extrinsic parameters of the camera, a coordinate transformation is then applied to the coordinate representation of the sampled points. The coordinate transformation formula is:
$$P' = P\,T_x\,T_y\,T_z$$

where $P' = [x', y', z']$ are the transformed three-dimensional coordinates, $P = [x, y, z]$ are the three-dimensional coordinates before transformation, and the transformation matrices $T_x$, $T_y$, $T_z$ are the rotations about the three axes, written for row-vector multiplication:

$$T_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos a & \sin a \\ 0 & -\sin a & \cos a \end{bmatrix}, \quad T_y = \begin{bmatrix} \cos b & 0 & -\sin b \\ 0 & 1 & 0 \\ \sin b & 0 & \cos b \end{bmatrix}, \quad T_z = \begin{bmatrix} \cos c & \sin c & 0 \\ -\sin c & \cos c & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where a, b and c are the rotation angles of the camera around the x, y and z axes, respectively.
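As a brief sketch, the transformation can be applied as follows, assuming the row-vector convention and the rotation matrices written above (the sign convention is an assumption, since the source gives the matrices only as images):

```python
import numpy as np

def transform_points(P, a, b, c):
    """Apply P' = P @ Tx @ Ty @ Tz to row-vector points P of shape (N, 3),
    where a, b, c are the camera rotation angles about the x, y, z axes."""
    Tx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(a), np.sin(a)],
                   [0.0, -np.sin(a), np.cos(a)]])
    Ty = np.array([[np.cos(b), 0.0, -np.sin(b)],
                   [0.0, 1.0, 0.0],
                   [np.sin(b), 0.0, np.cos(b)]])
    Tz = np.array([[np.cos(c), np.sin(c), 0.0],
                   [-np.sin(c), np.cos(c), 0.0],
                   [0.0, 0.0, 1.0]])
    return np.asarray(P) @ Tx @ Ty @ Tz
```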
After the three-dimensional coordinates of the object component's center and of the sampling points in the camera's Cartesian coordinate system are obtained, the included angle α between the object component and the z axis of the camera's Cartesian coordinate system can be calculated from the arctangent of the ratio between the z-axis coordinates of the sampling points and their x- and y-axis coordinates, and the included angle β between the object and the x axis can be calculated from the normalized x- and y-coordinate differences between the object's sampling points. The specific calculation formulas are:

$$\alpha = \arctan\frac{z_2 - z_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}}, \qquad \beta = \arctan\frac{y_2 - y_1}{x_2 - x_1}$$

where $x_1, y_1, z_1$ are the three-dimensional coordinates of sampling point 1 in the camera's Cartesian coordinate system, and $x_2, y_2, z_2$ are the three-dimensional coordinates of sampling point 2. Of course, to keep the coordinate calculation accurate, several groups of sampling points with different distributions can be selected and a straight line fitted before calculating the angles, which is not repeated here.
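The angle computation can be sketched as follows, using the arctangent forms given above (themselves a reconstruction, since the source formulas appear only as images):

```python
import numpy as np

def component_angles(p1, p2):
    """Orientation angles of the line through two sampled points p1, p2 in
    camera Cartesian coordinates: alpha from the z-difference versus the
    xy-norm (angle relative to the line of sight), beta in the imaging plane."""
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    alpha = np.arctan2(d[2], np.hypot(d[0], d[1]))
    beta = np.arctan2(d[1], d[0])
    return alpha, beta
```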
Another embodiment of the present invention provides an apparatus for detecting an object posture, including:
an image acquisition module 11, used for acquiring image data, wherein the image data contains a target object;
a component frame module 12, used for acquiring, from the image data, the components constituting an object and the component frames corresponding to the components;
an object category identification module 13, used for obtaining the category of the target object according to the corresponding information of the components and the component frames in a preset object structure relation library;
an object combination module 14, used for combining the components to obtain the target object according to the category and the coordinate and size information of the component frames;
and a posture acquisition module 15, used for processing the combined target object to obtain the posture information of the components constituting the target object.
Further, in the above embodiment, the image data includes an RGB image and a corresponding depth image;
the apparatus further comprises:
a coordinate angle calculation module, used for calculating the three-dimensional coordinates of the center point of each component frame in the RGB image, the rotation angle of the component in the imaging plane, and the included angle between the component and the sight-line direction.
Further, in the above embodiment, the object structure relation library contains standard reference diagrams of various types of objects.
Further, in the above embodiment, the object combination module 14 includes:
a prediction frame unit, used for predicting from the component frames to obtain a plurality of prediction frames for forming the target object;
an adjacent frame unit, used for screening the prediction frames to obtain the adjacent frames capable of forming the target object;
and a combination unit, used for combining the components corresponding to the component frames with the adjacent components corresponding to the adjacent frames to obtain the target object.
Further, in the above embodiment, the posture acquisition module 15 includes:
a sampling unit, used for sampling the depth at the positions corresponding to the component frames in the depth image to obtain three-dimensional coordinates reflecting the components constituting the target object;
a coordinate transformation unit, used for performing coordinate transformation on the three-dimensional coordinates to obtain transformed three-dimensional coordinates;
and a calculation unit, used for calculating, from the transformed three-dimensional coordinates after normalization, the included angle between each component constituting the target object and the coordinate system.
The method and the apparatus for detecting an object posture provided by the embodiments of the present invention offer at least the following beneficial effect.
After image data captured by a camera is acquired, the components of every object in the image are first recognized, each component being marked with a component frame. A comparison query is then performed in the preset object structure relation library to obtain the category of the target object, and the target object is obtained by combination according to the category and the coordinate and size information of the component frames, so that the target object is first split and then recombined. Finally, the combined target object is processed to obtain the posture information of the components constituting it. The whole process clarifies the combination relationships among object components, distinguishes the different component parts of the target object, and calculates the posture information of the object in three-dimensional space, such as its placement angle, thereby realizing posture detection for both the whole target object and its component parts.
The above embodiments express only several implementations of the present invention, and their description is specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for detecting an object posture, comprising:
acquiring image data, wherein the image data contains a target object;
acquiring, from the image data, the components constituting an object and the component frames corresponding to the components;
obtaining the category of the target object based on the corresponding information of the components and the component frames in a preset object structure relation library;
combining the components to obtain the target object according to the category and the coordinate and size information of the component frames;
and processing the combined target object to obtain the posture information of the components constituting the target object.
2. The method for detecting an object posture according to claim 1, wherein the image data comprises an RGB image and a corresponding depth image;
after acquiring the components constituting the object and the component frames corresponding to the components from the image data, the method further comprises:
calculating the three-dimensional coordinates of the center point of each component frame in the RGB image, the rotation angle of the component in the imaging plane, and the included angle between the component and the sight-line direction.
3. The method for detecting an object posture according to claim 1, wherein the object structure relation library contains standard reference diagrams of various objects.
4. The method for detecting an object posture according to claim 1, wherein combining the components to obtain the target object according to the category and the coordinate and size information of the component frames specifically comprises:
predicting from the component frames to obtain a plurality of prediction frames for forming the target object;
screening the prediction frames to obtain the adjacent frames capable of forming the target object;
and combining the components corresponding to the component frames with the adjacent components corresponding to the adjacent frames to obtain the target object.
5. The method for detecting an object posture according to claim 2, wherein processing the combined target object to obtain the posture information of the components constituting the target object specifically comprises:
sampling the depth at the positions corresponding to the component frames in the depth image to obtain three-dimensional coordinates reflecting the components constituting the target object;
performing coordinate transformation on the three-dimensional coordinates to obtain transformed three-dimensional coordinates;
and calculating, from the transformed three-dimensional coordinates after normalization, the included angle between each component constituting the target object and the coordinate system.
6. An apparatus for detecting an object posture, comprising:
an image acquisition module, used for acquiring image data, wherein the image data contains a target object;
a component frame module, used for acquiring, from the image data, the components constituting an object and the component frames corresponding to the components;
an object category identification module, used for obtaining the category of the target object according to the corresponding information of the components and the component frames in a preset object structure relation library;
an object combination module, used for combining the components to obtain the target object according to the category and the coordinate and size information of the component frames;
and a posture acquisition module, used for processing the combined target object to obtain the posture information of the components constituting the target object.
7. The apparatus for detecting an object posture according to claim 6, wherein the image data comprises an RGB image and a corresponding depth image;
the apparatus further comprises:
a coordinate angle calculation module, used for calculating the three-dimensional coordinates of the center point of each component frame in the RGB image, the rotation angle of the component in the imaging plane, and the included angle between the component and the sight-line direction.
8. The apparatus for detecting an object posture according to claim 6, wherein the object structure relation library contains standard reference diagrams of various objects.
9. The apparatus for detecting an object posture according to claim 6, wherein the object combination module comprises:
a prediction frame unit, used for predicting from the component frames to obtain a plurality of prediction frames for forming the target object;
an adjacent frame unit, used for screening the prediction frames to obtain the adjacent frames capable of forming the target object;
and a combination unit, used for combining the components corresponding to the component frames with the adjacent components corresponding to the adjacent frames to obtain the target object.
10. The apparatus for detecting an object posture according to claim 7, wherein the posture acquisition module comprises:
a sampling unit, used for sampling the depth at the positions corresponding to the component frames in the depth image to obtain three-dimensional coordinates reflecting the components constituting the target object;
a coordinate transformation unit, used for performing coordinate transformation on the three-dimensional coordinates to obtain transformed three-dimensional coordinates;
and a calculation unit, used for calculating, from the transformed three-dimensional coordinates after normalization, the included angle between each component constituting the target object and the coordinate system.
CN202210244369.1A 2022-03-10 2022-03-10 Object posture detection method and device Pending CN114708323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210244369.1A CN114708323A (en) 2022-03-10 2022-03-10 Object posture detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210244369.1A CN114708323A (en) 2022-03-10 2022-03-10 Object posture detection method and device

Publications (1)

Publication Number Publication Date
CN114708323A (en) 2022-07-05

Family

ID=82168943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210244369.1A Pending CN114708323A (en) 2022-03-10 2022-03-10 Object posture detection method and device

Country Status (1)

Country Link
CN (1) CN114708323A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717531A (en) * 2018-05-21 2018-10-30 西安电子科技大学 Estimation method of human posture based on Faster R-CNN
CN110570388A (en) * 2018-08-31 2019-12-13 阿里巴巴集团控股有限公司 Method, device and equipment for detecting components of vehicle


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220705