CN113610966A - Three-dimensional attitude adjustment method and device, electronic equipment and storage medium - Google Patents

Three-dimensional attitude adjustment method and device, electronic equipment and storage medium

Info

Publication number
CN113610966A
Authority
CN
China
Prior art keywords
key point
target
information
key
feature information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110929425.0A
Other languages
Chinese (zh)
Inventor
吴思泽
金晟
刘文韬
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110929425.0A priority Critical patent/CN113610966A/en
Publication of CN113610966A publication Critical patent/CN113610966A/en
Priority to PCT/CN2022/083749 priority patent/WO2023015903A1/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30196: Human being; Person
    • G06T 2207/30244: Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The disclosure provides a three-dimensional posture adjustment method and apparatus, an electronic device and a storage medium, wherein the method includes: acquiring three-dimensional coordinates to be adjusted of a plurality of key points of a target object in a target voxel space; determining, based on the three-dimensional coordinates to be adjusted, key point feature information obtained by projecting the plurality of key points into a plurality of target images respectively; and determining three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the plurality of key points in the target images at the plurality of viewing angles. By combining the key point connection relationship information, the connection relationships between the key points can be constrained, so that the determined key point feature information is more accurate, which further improves the precision and accuracy of the three-dimensional posture.

Description

Three-dimensional attitude adjustment method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a method and an apparatus for adjusting a three-dimensional pose, an electronic device, and a storage medium.
Background
Three-dimensional (3D) human body pose estimation refers to estimating the pose of a human target from an image, a video or a point cloud, and is commonly used in fields such as human body reconstruction, human-computer interaction, behavior recognition and game modeling.
In the related art, a 3D human body pose estimation scheme performs multi-view feature extraction based on 3D spatial voxelization and detects key points through a Convolutional Neural Network (CNN). Spatial voxelization divides the 3D space into equal-sized grids at fixed intervals, and the voxelized multi-view image features are used as the input of a 3D convolution.
However, spatial voxelization introduces quantization errors, and in a large 3D scene only a large voxelization step can be chosen, which further increases the quantization errors, so the precision and accuracy of the determined three-dimensional pose are low.
Disclosure of Invention
Embodiments of the present disclosure provide at least a three-dimensional posture adjustment method and apparatus, an electronic device and a storage medium, so as to improve the precision and accuracy of three-dimensional pose estimation.
In a first aspect, an embodiment of the present disclosure provides a method for adjusting a three-dimensional pose, where the method includes:
acquiring three-dimensional coordinates to be adjusted of a plurality of key points of a target object in a target voxel space;
determining, based on the three-dimensional coordinates to be adjusted, key point feature information obtained by projecting the plurality of key points into a plurality of target images respectively; the plurality of target images are obtained by shooting the target object at a plurality of viewing angles;
and determining three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the plurality of key points in the target images at the plurality of viewing angles.
With the above three-dimensional posture adjustment method, once the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space are obtained, the key point feature information obtained by projecting the key points into the plurality of target images can be determined based on those coordinates, and the three-dimensional posture information of the target object is then determined based on the pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the key points at the plurality of viewing angles. In this way, the connection relationships of the plurality of key points at different viewing angles can be exploited through their key point feature information, which helps to determine more accurate key point feature information; in addition, the connection relationships between key points can be constrained by the pre-constructed key point connection relationship information, making the determined key point feature information more accurate and further improving the accuracy and precision of the determined three-dimensional posture information.
In a possible implementation manner, the determining, based on the three-dimensional coordinate to be adjusted, the feature information of the keypoints that is obtained by projecting the plurality of keypoints in the plurality of target images respectively includes:
determining two-dimensional projection point information of the plurality of key points in the plurality of target images respectively based on the three-dimensional coordinate to be adjusted, and extracting image features corresponding to the plurality of target images respectively;
extracting key point feature information matched with the key points from image features respectively corresponding to the target images on the basis of two-dimensional projection point information of the key points in the target images;
and determining the extracted key point feature information matched with the key points as the key point feature information obtained by projection in the plurality of target images.
Here, the key point feature information matching the key points can be determined based on the correspondence between the two-dimensional projected point information and the image features of the key points in the plurality of target images, and the operation is simple.
In one possible embodiment, the two-dimensional projection point information includes image position information of the two-dimensional projection point; the extracting, based on the two-dimensional projection point information of the key points in the plurality of target images, key point feature information matched with the key points from image features respectively corresponding to the plurality of target images includes:
for each target image in the plurality of target images, extracting image features corresponding to the image position information from image features corresponding to the target image based on the image position information of the two-dimensional projection points of the key points in the plurality of target images;
and determining the extracted image features corresponding to the image position information as key point feature information matched with the key points.
In a possible implementation manner, the determining three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and key point feature information of a target image corresponding to a plurality of key points at the plurality of viewing angles respectively includes:
for each key point in the plurality of key points, determining updated key point feature information of the key point at different viewing angles based on the key point feature information of the key point at different viewing angles and the key point feature information of other key points associated with the key point;
and determining the three-dimensional posture information of the target object based on the updated key point characteristic information corresponding to the plurality of key points respectively and the pre-constructed key point connection relation information corresponding to the target object.
Here, the key point feature information of each key point may be updated by using its key point feature information at different viewing angles together with the key point feature information of other key points associated with it; the updated key point feature information thus incorporates, to some extent, the features of other key points within one view as well as the features of the same key point across different views, so that the key point features are more accurate and the determined three-dimensional pose information is more accurate.
In one possible implementation, the determining, based on the keypoint feature information of the keypoint at different viewing angles and the keypoint feature information of other keypoints associated with the keypoint, updated keypoint feature information of the keypoint at different viewing angles includes:
taking each of the plurality of views as a target view, respectively performing the following steps:
performing first updating on the key point feature information of the key point under different viewing angles based on the key point feature information of the key point under different viewing angles and a first connection relation between the two-dimensional projection points of the key point under different viewing angles to obtain first updated key point feature information;
performing second updating on the key point feature information of the key point under the target view angle based on the key point feature information of the key point under the target view angle and the key point feature information of other key points which belong to the target view angle and have a second connection relation with the key point, so as to obtain second updated key point feature information;
and determining updated key point feature information of the key point under the target view angle based on the first updated key point feature information and the second updated key point feature information.
In a possible implementation manner, the determining three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and key point feature information of a target image corresponding to a plurality of key points at the plurality of viewing angles respectively includes:
for each key point in the plurality of key points, fusing key point feature information of the key point under different viewing angles to obtain fused key point feature information corresponding to the key point;
and determining the three-dimensional attitude information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the fusion key point feature information corresponding to the plurality of key points respectively.
Here, through the fusion operation of the key point feature information under different viewing angles, the determined fusion key point feature information can take into account the features of different viewing angles, and the accuracy of the three-dimensional attitude information is further improved.
In one possible implementation, the keypoint feature information includes keypoint feature values of multiple dimensions; the fusing the key point feature information of the key point under different viewing angles to obtain the fused key point feature information corresponding to the key point comprises the following steps:
for each dimension in the plurality of dimensions, determining a plurality of key point characteristic values corresponding to the dimension of the key point under different viewing angles, and determining a fused key point characteristic value corresponding to the dimension based on the determined plurality of key point characteristic values;
and determining fused key point characteristic information corresponding to the key points based on the fused key point characteristic values respectively corresponding to the dimensions.
In a possible implementation manner, the determining, based on the determined plurality of keypoint feature values, a fused keypoint feature value corresponding to the dimension includes one of the following manners:
selecting a key point characteristic value with the largest value from the plurality of key point characteristic values as a fused key point characteristic value corresponding to the dimension;
taking the average value of the plurality of key point characteristic values as a fused key point characteristic value corresponding to the dimensionality;
obtaining weight values corresponding to the plurality of key point characteristic values respectively, and determining the fused key point characteristic values corresponding to the dimensionality based on the weighted sum of the plurality of key point characteristic values and the weight values corresponding to the plurality of key point characteristic values respectively.
In a possible implementation manner, determining three-dimensional pose information of the target object based on pre-constructed keypoint connection relationship information corresponding to the target object and fused keypoint feature information corresponding to each of the plurality of keypoints, includes:
updating the feature information of the fusion key points corresponding to the plurality of key points respectively based on a third connection relation between the key points included in the pre-constructed key point connection relation information corresponding to the target object to obtain updated feature information of the fusion key points;
and determining the three-dimensional attitude information of the target object based on the updated fusion key point characteristic information.
Here, the feature information of the fusion key points corresponding to the plurality of key points respectively may be updated based on the third connection relationship between the key points included in the pre-constructed key point connection relationship information, so as to obtain updated feature information of the fusion key points, that is, the feature information of the fusion key points may be calibrated by using the pre-constructed third connection relationship, so that the determined three-dimensional posture is also more accurate.
In a possible implementation manner, each of the plurality of key points of the target object is taken as a first key point, and each of the key points having the third connection relationship is taken as a second key point;
the second key point is a human skeleton point;
the first key points comprise at least one of human skeleton points and human mark points.
In one possible implementation, the determining three-dimensional pose information of the target object based on the updated fusion keypoint feature information includes:
inputting the updated fusion key point feature information into a pre-trained target posture recognition network, and outputting posture deviation information; the attitude deviation information is used for representing the deviation condition between the current attitude of the target object and the attitude to be adjusted;
determining adjusted three-dimensional coordinates of a plurality of key points of the target object in the target voxel space based on the attitude deviation information and the three-dimensional coordinates of the plurality of key points of the target object to be adjusted in the target voxel space, and determining three-dimensional attitude information of the target object based on the adjusted three-dimensional coordinates.
In a possible embodiment, the obtaining three-dimensional coordinates to be adjusted of a plurality of key points of the target object in the target voxel space includes one of the following manners:
acquiring a plurality of target images obtained by shooting the target object under a plurality of visual angles, and determining three-dimensional coordinates to be adjusted of a plurality of key points of the target object in the target voxel space based on the plurality of target images;
the method comprises the steps of obtaining depth information respectively returned by a plurality of detection rays emitted by radio equipment, and determining three-dimensional coordinates to be adjusted of a plurality of key points of the target object in the target voxel space based on the depth information.
In a possible implementation manner, each of the obtained multiple target images is used as a first target image, and each of the multiple target images for the keypoint projection is used as a second target image;
at least a portion of the first target images is the same as at least a portion of the second target images; alternatively,
the first target image and the second target image do not have the same image.
In a second aspect, the present disclosure also provides an apparatus for three-dimensional pose adjustment, the apparatus comprising:
the acquisition module is used for acquiring three-dimensional coordinates to be adjusted of a plurality of key points of the target object in a target voxel space;
the determining module is used for determining key point characteristic information obtained by projecting the plurality of key points in the plurality of target images respectively based on the three-dimensional coordinate to be adjusted; the target images are obtained by shooting target objects under a plurality of visual angles;
and the adjusting module is used for determining the three-dimensional posture information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the key point characteristic information of the target image corresponding to the plurality of key points in the plurality of visual angles respectively.
In a third aspect, the disclosed embodiments also provide an electronic device, including: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine readable instructions when executed by the processor performing the steps of the method of three-dimensional pose adjustment according to the first aspect and any of its various embodiments.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the method for three-dimensional pose adjustment according to the first aspect and any one of the various embodiments thereof.
For the description of the effects of the above three-dimensional posture adjustment apparatus, electronic device, and computer-readable storage medium, reference is made to the description of the above three-dimensional posture adjustment method, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings incorporated herein form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, since those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a method for adjusting a three-dimensional pose provided by an embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating an application of a method for adjusting a three-dimensional pose provided by an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating an apparatus for three-dimensional pose adjustment provided by an embodiment of the present disclosure;
fig. 4 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research shows that the related art provides a 3D human body pose estimation scheme that performs multi-view feature extraction based on 3D spatial voxelization and detects key points through a CNN. Spatial voxelization divides the 3D space into equal-sized grids at fixed intervals, and the voxelized multi-view image features are used as the input of a 3D convolution.
However, spatial voxelization introduces quantization errors, and in a large 3D scene only a large voxelization step can be chosen, which further increases the quantization errors, so the precision and accuracy of the determined three-dimensional pose are low.
Based on the above research, the present disclosure provides a method and an apparatus for adjusting a three-dimensional pose, an electronic device, and a storage medium, so as to improve the precision and accuracy of three-dimensional pose evaluation.
To facilitate understanding of the present embodiment, first, a method for adjusting a three-dimensional pose disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the method for adjusting a three-dimensional pose provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability, and the electronic device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a vehicle-mounted device, a wearable device, or a server or other processing device. In some possible implementations, the method of three-dimensional pose adjustment may be implemented by a processor invoking computer readable instructions stored in a memory.
Referring to fig. 1, which is a flowchart of a method for adjusting a three-dimensional pose provided in the embodiment of the present disclosure, the method includes steps S101 to S103, where:
S101: acquiring three-dimensional coordinates to be adjusted of a plurality of key points of a target object in a target voxel space;
S102: determining, based on the three-dimensional coordinates to be adjusted, key point feature information obtained by projecting the plurality of key points into a plurality of target images respectively; the plurality of target images are obtained by shooting the target object at a plurality of viewing angles;
S103: determining three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the plurality of key points in the target images at the plurality of viewing angles.
To facilitate understanding of the three-dimensional posture adjustment method provided by the embodiments of the present disclosure, its application scenarios are first briefly described. The method can be applied to any scenario requiring three-dimensional posture adjustment, for example adjusting the three-dimensional posture of a pedestrian in front of an autonomous vehicle in the field of autonomous driving, or adjusting the three-dimensional posture of a road vehicle in the field of intelligent security, which is not specifically limited in the embodiments of the present disclosure. The field of autonomous driving is used as the illustrative example below.
The precision of a three-dimensional posture determined by combining voxelization with a CNN network in the related art is often limited by the quantization error of the voxelization; in addition, even when other in-vehicle devices such as a radio device are used, various adverse factors may make the precision and accuracy of the determined three-dimensional posture information low.
In order to solve the above problem, the embodiment of the present disclosure provides a scheme for adjusting a three-dimensional posture by combining pre-constructed key point connection relationship information and key point feature information of a plurality of key points respectively at different viewing angles, so as to improve the precision and accuracy of the three-dimensional posture, and thus, the method can be better applied to various actual scenes.
The three-dimensional coordinates to be adjusted in the embodiments of the present disclosure may be the initial three-dimensional coordinates of a plurality of key points of the same target object. In a specific application, the three-dimensional coordinates to be adjusted may be determined by voxelizing a plurality of target images and detecting with a CNN network, may be computed from the epipolar geometry of the plurality of target images followed by 3D reconstruction, or may be computed from depth information detected by a synchronously operating radio device; other determination methods may also be used, which is not limited in the present disclosure.
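To illustrate the multi-view reconstruction route, the sketch below triangulates one key point from its 2D detections in several calibrated views by linear least squares (DLT). The projection matrices and pixel coordinates are hypothetical inputs, and the disclosure does not prescribe this particular reconstruction method; it is only a common choice under the assumption of known camera calibration.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of a single 3D key point.

    proj_mats: list of 3x4 camera projection matrices P = K [R | t], one per view
    points_2d: list of matching (u, v) pixel coordinates, one per view
    returns:   (3,) least-squares 3D position in the world / voxel frame
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view gives two linear constraints on the homogeneous point X:
        #   u * (P[2] @ X) = P[0] @ X   and   v * (P[2] @ X) = P[1] @ X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The homogeneous solution is the right singular vector of A with the
    # smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```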
It should be noted that the target images selected for determining the three-dimensional coordinates to be adjusted and the target images used for the subsequent key point projection may all be images obtained by shooting the same target object. In a specific application, the two sets of images may be completely the same, partially the same, or completely different. Each target image selected for determining the three-dimensional coordinates to be adjusted is taken as a first target image, and each target image used for key point projection is taken as a second target image. At least some of the first target images may be the same as at least some of the second target images; the overlap may be partial or complete, where overlapping images are images shot at the same viewing angles and in the same number. Alternatively, the first target images and the second target images may share no image at all; that is, although both are images of the target object in a certain posture, they are shot at different viewing angles.
In the embodiment of the present disclosure, the plurality of key points of the target object may correspond to key nodes of the target object, and taking a human body as the target object as an example, the key points may be human skeleton points corresponding to human skeletons, or may be human mark points capable of identifying the human body.
Under the condition that the three-dimensional coordinate to be adjusted is obtained, the method for adjusting the three-dimensional posture provided by the embodiment of the disclosure may determine two-dimensional projection point information of a plurality of key points in a plurality of target images respectively, and determine key point feature information of the plurality of key points under different viewing angles based on the two-dimensional projection point information.
The plurality of target images for performing two-dimensional projection may be obtained by shooting the same target object at a plurality of viewing angles, that is, one viewing angle may correspond to one target image. In the field of automatic driving, the target images may be obtained by synchronously shooting the same target object by a plurality of cameras installed in a vehicle, where the plurality of cameras may be selected according to different user requirements, for example, three target images shot by three cameras installed on two sides and a center of a vehicle head and corresponding to a pedestrian ahead may be used.
The information about the two-dimensional projection point can be determined based on a conversion relationship between a three-dimensional coordinate system where the three-dimensional coordinate to be adjusted is located and a two-dimensional coordinate system where the target image is located, that is, the key point can be projected onto the target image by using the conversion relationship, so that information such as an image position of the two-dimensional projection point of the key point on the target image is determined.
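A minimal sketch of this projection step, assuming a standard pinhole camera model with per-view intrinsics K and world-to-camera extrinsics R, t; the disclosure only speaks of a conversion relationship between the two coordinate systems, so the exact calibration convention used here is an assumption.

```python
import numpy as np

def project_keypoints(coords_3d, K, R, t):
    """Project J key points from the target voxel space into one target image.

    coords_3d: (J, 3) three-dimensional coordinates to be adjusted (world frame)
    K:         (3, 3) camera intrinsic matrix
    R, t:      (3, 3) rotation and (3,) translation from world to camera frame
    returns:   (J, 2) image positions of the two-dimensional projection points
    """
    cam = coords_3d @ R.T + t        # world coordinates -> camera coordinates
    uvw = cam @ K.T                  # camera coordinates -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division -> (u, v)
```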
Based on the two-dimensional projection point information of the plurality of key points in the plurality of target images, the key point feature information of the plurality of key points at different viewing angles can be determined. The determined key point feature information may fuse features from different viewing angles: for the same target object, a connection relationship exists between the projections of the same key point at different viewing angles, which allows the related key point features to be updated. In addition, at the same viewing angle, a connection relationship also exists between corresponding key points, which likewise supports feature updating, so that the determined key point feature information better fits the actual posture of the target object.
The pre-constructed key point connection relation information can correspond to a target object with a certain posture, and key point characteristic information of a plurality of key points under different visual angles can be restrained by combining the key point connection relation information, so that the determined three-dimensional posture can be more accurate.
The three-dimensional posture information determined based on the key point connection relationship and the key point feature information may be obtained by combining adjusted three-dimensional coordinates obtained by adjusting three-dimensional coordinates to be adjusted of each key point of a plurality of key points of the target object, that is, the adjusted three-dimensional coordinates of the plurality of key points may represent the three-dimensional posture of the target object.
Considering the key role of the determination of the key point feature information of the key point on the three-dimensional pose adjustment, the process of determining the key point feature information may be described in detail next.
The process for determining the key point feature information mainly comprises the following steps:
step one, determining two-dimensional projection point information of the plurality of key points in the plurality of target images respectively based on the three-dimensional coordinates to be adjusted, and extracting image features respectively corresponding to the plurality of target images;
step two, extracting key point feature information matched with the key points from the image features respectively corresponding to the plurality of target images, based on the two-dimensional projection point information of the key points in the plurality of target images;
and step three, determining the extracted key point feature information matched with the key points as the key point feature information obtained by projection in the plurality of target images.
In order to extract the key point feature information matched with the key points, in the three-dimensional posture adjustment method provided by the embodiments of the present disclosure, for each target image, the image feature corresponding to the image position information of the two-dimensional projection point of a key point is extracted from the image features corresponding to that target image, and the extracted image feature is taken as the key point feature information matched with that key point.
The image features corresponding to a target image may be obtained by image processing, extracted by a trained feature extraction network, or determined by other methods capable of extracting information representing the target object, the scene in which it is located, and the like.
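One plausible way to realize "extracting the image feature at the image position of the two-dimensional projection point" is bilinear sampling on a CNN feature map, sketched below with PyTorch's grid_sample; the choice of bilinear sampling and the tensor layout are assumptions, not details given in the disclosure.

```python
import torch
import torch.nn.functional as F

def sample_keypoint_features(feature_map, points_2d, image_size):
    """Bilinearly sample per-key-point features from one view's feature map.

    feature_map: (1, C, Hf, Wf) CNN features of one target image
    points_2d:   (J, 2) float tensor of projected (u, v) pixel positions
    image_size:  (H, W) of the original target image
    returns:     (J, C) key point feature information for this view
    """
    H, W = image_size
    # grid_sample expects coordinates normalized to [-1, 1]
    grid = points_2d.clone()
    grid[:, 0] = 2.0 * points_2d[:, 0] / (W - 1) - 1.0
    grid[:, 1] = 2.0 * points_2d[:, 1] / (H - 1) - 1.0
    grid = grid.view(1, 1, -1, 2)                  # (1, 1, J, 2)
    feats = F.grid_sample(feature_map, grid, mode='bilinear', align_corners=True)
    return feats[0, :, 0, :].t()                   # (J, C)
```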
In order to determine more accurate three-dimensional attitude information of the target object, the three-dimensional attitude information of the target object may be determined by first updating the key point feature information of the key points based on the key point connection relationship, and then determining the three-dimensional attitude information of the target object based on the updated key point feature information and the pre-constructed key point connection relationship information corresponding to the target object, which may specifically be implemented by the following steps:
step one, aiming at each key point in a plurality of key points, determining updated key point feature information of the key point at different view angles based on key point feature information of the key point at different view angles and key point feature information of other key points related to the key point;
and secondly, determining the three-dimensional attitude information of the target object based on the updated key point characteristic information corresponding to the plurality of key points respectively and the pre-constructed key point connection relation information corresponding to the target object.
Here, for each key point, the other key points associated with it may be key points having a connection relationship with it, where that connection relationship mainly refers to connections between key points within the same view; across views, a connection relationship can be determined between the two-dimensional projection points of the same key point at different viewing angles. Taking each of the plurality of viewing angles in turn as a target viewing angle, the key point feature information of the key point at each viewing angle may be updated through the following steps:
step one, performing a first update on the key point feature information of the key point at different viewing angles based on the key point feature information of the key point at different viewing angles and a first connection relationship between the two-dimensional projection points of the key point at different viewing angles, to obtain first updated key point feature information; and performing a second update on the key point feature information of the key point at the target viewing angle based on the key point feature information of the key point at the target viewing angle and the key point feature information of other key points that belong to the target viewing angle and have a second connection relationship with the key point, to obtain second updated key point feature information;
and step two, determining the updated key point feature information of the key point at the target viewing angle based on the first updated key point feature information and the second updated key point feature information.
The first connection relation between the two-dimensional projection points of the key points under different viewing angles is predetermined, and the key point feature information of the key points under one viewing angle can be updated based on the key point feature information of the key points under each viewing angle based on the first connection relation, that is, the first updated key point feature information fuses the key point features of the same key point under other views.
In addition, the key point feature information of the key point can be updated based on the key point feature information of other key points which belong to the target view and have a second connection relation with the key point, wherein the second connection relation can also be predetermined, so that the determined second updated key point feature information fuses the key point features of other key points in the same view.
By combining the first updated key point feature information and the second updated key point feature information, the determined updated key point feature information of the key point at any view angle can be more accurate.
It should be noted that, in the process of updating the key point feature information by combining the first updated key point feature information and the second updated key point feature information, the first update may be performed first, and then the second update may be performed on the basis of the first update; or the second updating can be carried out firstly, and then the first updating is carried out on the basis of the second updating; the first update and the second update may also be performed simultaneously, and then the results of the first update and the second update are fused to realize the update of the feature information of the key point, which is not limited specifically herein.
In practical applications, the above update of the key point feature information can be realized by using a Graph Neural Network (GNN). Before the feature update, a graph model may be constructed based on the first connection relationship and the second connection relationship, and the key point feature information may then be continuously updated by performing graph convolution operations on the graph model.
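A simplified sketch of one message-passing round over such a graph: each node is the feature of one key point in one view, the first connection relation links projections of the same key point across views, and the second connection relation links different key points within a view. The mean aggregation, the linear/ReLU layers and the residual combination are assumed design choices; the disclosure only states that a GNN with convolution operations on the graph model is used.

```python
import torch
import torch.nn as nn

class CrossViewGraphUpdate(nn.Module):
    """One message-passing round over the (view, key point) graph."""

    def __init__(self, feat_dim, intra_view_edges):
        super().__init__()
        # intra_view_edges: list of (i, j) key point index pairs (second relation)
        self.intra_view_edges = intra_view_edges
        self.cross_view = nn.Linear(feat_dim, feat_dim)  # first connection relation
        self.intra_view = nn.Linear(feat_dim, feat_dim)  # second connection relation
        self.act = nn.ReLU()

    def forward(self, feats):
        # feats: (V, J, C) key point features for V views and J key points
        V, J, C = feats.shape

        # First update: aggregate each key point's features over all views.
        cross_msg = feats.mean(dim=0, keepdim=True).expand(V, J, C)
        first = self.act(self.cross_view(cross_msg))

        # Second update: within each view, aggregate features of connected key points.
        neighbor_sum = torch.zeros_like(feats)
        degree = torch.zeros(J, device=feats.device)
        for i, j in self.intra_view_edges:
            neighbor_sum[:, i] += feats[:, j]
            neighbor_sum[:, j] += feats[:, i]
            degree[i] += 1
            degree[j] += 1
        second = self.act(self.intra_view(neighbor_sum / degree.clamp(min=1).view(1, J, 1)))

        # Combine both updates with the original features (residual connection).
        return feats + first + second
```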
The method for adjusting the three-dimensional posture provided by the embodiment of the disclosure can be implemented by firstly fusing the feature information of the key points and then determining the three-dimensional posture information of the target object by combining the pre-constructed key point connection relation information to improve the accuracy of the three-dimensional posture information, and specifically can be implemented by the following steps:
step one, aiming at each key point in a plurality of key points, fusing key point feature information of the key point under different viewing angles to obtain fused key point feature information corresponding to the key point;
and secondly, determining three-dimensional attitude information of the target object based on pre-constructed key point connection relation information corresponding to the target object and fusion key point characteristic information corresponding to the plurality of key points respectively.
Here, the key point feature information under different viewing angles may be fused for the key point, so that the obtained fused key point feature information may give consideration to the posture of the target object under each viewing angle to some extent, and the method specifically includes the following steps:
determining a plurality of key point characteristic values of key points corresponding to dimensions under different viewing angles aiming at each dimension in the plurality of dimensions, and determining a fused key point characteristic value corresponding to the dimension based on the determined plurality of key point characteristic values;
and secondly, determining fused key point characteristic information corresponding to the key points based on the fused key point characteristic values respectively corresponding to the multiple dimensions.
Here, for each dimension of the key point feature information, the key point feature value with the largest value may be selected, from the plurality of key point feature values of one key point at different viewing angles for that dimension, as the fused key point feature value of the dimension, so that the most salient feature of each dimension is preserved.
Alternatively, a weighted summation may be performed using weight values respectively corresponding to the plurality of key point feature values to determine the fused key point feature value, thereby realizing feature fusion for the key point. In practical applications, the weight values may be set manually or determined by a pre-trained weight matching network, which is not limited here.
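The fusion options described above (per-dimension maximum, mean over the views, or a weighted sum) can be written compactly as below; the normalization of the weights is an assumption, since the disclosure leaves open how the weight values are produced.

```python
import torch

def fuse_across_views(feats, mode="max", weights=None):
    """Fuse key point features over the views, dimension by dimension.

    feats:   (V, J, C) key point features for V views, J key points, C dimensions
    weights: (V,) per-view weight values, used only when mode == "weighted"
    returns: (J, C) fused key point feature information
    """
    if mode == "max":       # keep the largest feature value of each dimension
        return feats.max(dim=0).values
    if mode == "mean":      # average each dimension over the views
        return feats.mean(dim=0)
    if mode == "weighted":  # weighted sum of the per-view feature values
        w = weights / weights.sum()
        return (w.view(-1, 1, 1) * feats).sum(dim=0)
    raise ValueError(f"unknown fusion mode: {mode}")
```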
In the process of determining the three-dimensional posture information of the target object, the embodiments of the present disclosure may further update the fused key point feature information based on the pre-constructed key point connection relationship information corresponding to the target object, so as to further improve the accuracy of the determined posture, which may specifically be implemented through the following steps:
updating fused key point feature information corresponding to a plurality of key points respectively based on a third connection relation between the key points included in pre-constructed key point connection relation information corresponding to a target object to obtain updated fused key point feature information;
and secondly, determining the three-dimensional attitude information of the target object based on the updated fusion key point characteristic information.
Here, the information about the connection relationship of the key points constructed in advance may include a third connection relationship between the key points, where the third connection relationship may be a connection relationship formed by sequentially connecting the human skeleton points of the human body according to the human skeleton structure, and the feature information of the fusion key point corresponding to each key point may be calibrated more accurately to a certain extent, so that the determined three-dimensional posture of the target object is also more accurate.
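As a sketch of calibrating the fused features with the third connection relation, the update below runs one graph-convolution step over a row-normalized skeleton adjacency matrix; both the normalization and the residual ReLU layer are assumed design choices rather than details specified in the disclosure.

```python
import torch
import torch.nn as nn

def skeleton_adjacency(num_keypoints, skeleton_edges):
    """Row-normalized adjacency (with self-loops) built from the third connection relation."""
    A = torch.eye(num_keypoints)
    for i, j in skeleton_edges:   # e.g. pairs of human skeleton point indices
        A[i, j] = 1.0
        A[j, i] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class SkeletonGraphConv(nn.Module):
    """Update the fused key point features along the pre-built skeleton connections."""

    def __init__(self, feat_dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)   # (J, J), fixed third connection relation
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, fused_feats):
        # fused_feats: (J, C) -> updated fused key point feature information, (J, C)
        return torch.relu(self.proj(self.A @ fused_feats)) + fused_feats
```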
In the embodiment of the present disclosure, the three-dimensional posture information of the target object may be determined according to the following steps:
step one, inputting updated feature information of the fusion key points into a pre-trained target posture recognition network, and outputting posture deviation information; the attitude deviation information is used for representing the deviation condition between the current attitude of the target object and the attitude to be adjusted;
and secondly, determining the adjusted three-dimensional coordinates of the plurality of key points of the target object in the target voxel space based on the attitude deviation information and the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space, and determining the three-dimensional attitude information of the target object based on the adjusted three-dimensional coordinates.
Here, the target posture recognition network may be used to determine the posture deviation information corresponding to the deviation between the current posture and the posture to be adjusted; based on this posture deviation information and the three-dimensional coordinates to be adjusted, the adjusted three-dimensional coordinates of the target object in the target voxel space can be determined, and thus the three-dimensional posture information of the target object can be determined.
The posture to be adjusted can be obtained by combining the three-dimensional coordinates to be adjusted of the plurality of key points of the target object; the target posture recognition network outputs a coordinate deviation value for each key point of the target object, and each coordinate deviation value is summed with the corresponding three-dimensional coordinate to be adjusted to determine the adjusted three-dimensional coordinates of each key point.
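The final step can then be sketched as a small regression head that maps the updated fused feature of each key point to a coordinate deviation, which is added to the coordinates to be adjusted; the two-layer MLP below is an assumed stand-in for the pre-trained target posture recognition network, whose actual architecture is not disclosed.

```python
import torch
import torch.nn as nn

class PoseOffsetRegressor(nn.Module):
    """Predict a 3D coordinate deviation for every key point and apply it."""

    def __init__(self, feat_dim, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),   # (dx, dy, dz) per key point
        )

    def forward(self, updated_fused_feats, coords_to_adjust):
        # updated_fused_feats: (J, C); coords_to_adjust: (J, 3)
        offsets = self.mlp(updated_fused_feats)   # posture deviation information
        return coords_to_adjust + offsets         # adjusted three-dimensional coordinates
```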
In order to further understand the method for adjusting the three-dimensional posture provided by the embodiments of the present disclosure, the following may be further described with reference to fig. 2.
As shown in fig. 2, for the target object in the posture to be adjusted, the three-dimensional coordinates to be adjusted of a plurality of key points in the target voxel space may be determined based on that posture, the coordinates are projected onto the target images captured by cameras at three viewing angles (Camera #1-Camera #3), and a graph model G = {V, E} may be constructed as shown in the figure.
A node in V corresponds to the image feature of a key point at the image position of its two-dimensional projection point in a target image, and an edge in E corresponds to a relationship between nodes, which may be a connection of the same key point across viewing angles or a connection of different key points within a single viewing angle.
After the graph model is constructed, the feature information of the keypoints under different viewing angles can be updated, and here, the feature updating can be specifically realized by using the GNN. In addition, the fusion of multi-view features can be done based on maximum pooling.
The fused key point feature information obtained by the fusion is then updated using the pre-constructed key point connection relationship information corresponding to the target object, the updated fused key point feature information is input into a regression network to predict a correction value for the posture estimate to be adjusted, and the correction value is summed with the posture to be adjusted to determine the adjusted three-dimensional posture information.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a three-dimensional posture adjustment device corresponding to the three-dimensional posture adjustment method, and since the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the three-dimensional posture adjustment method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 3, a schematic diagram of an apparatus for three-dimensional pose adjustment provided in an embodiment of the present disclosure is shown; the apparatus includes: an acquisition module 301, a determination module 302 and an adjustment module 303; wherein:
an obtaining module 301, configured to obtain three-dimensional coordinates to be adjusted of a plurality of key points of a target object in a target voxel space;
the determining module 302 is configured to determine, based on the three-dimensional coordinates to be adjusted, key point feature information obtained by projecting the plurality of key points into the plurality of target images respectively; the plurality of target images are obtained by shooting the target object at a plurality of viewing angles;
the adjusting module 303 is configured to determine three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and key point feature information of the target image corresponding to the plurality of key points at the plurality of viewing angles, respectively.
With the above three-dimensional posture adjustment apparatus, once the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space are obtained, the key point feature information obtained by projecting the key points into the plurality of target images can be determined based on those coordinates, and the three-dimensional posture information of the target object is then determined based on the pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the key points at the plurality of viewing angles. In this way, the connection relationships of the plurality of key points at different viewing angles can be exploited through their key point feature information, which helps to determine more accurate key point feature information; in addition, the connection relationships between key points can be constrained by the pre-constructed key point connection relationship information, making the determined key point feature information more accurate and further improving the accuracy and precision of the determined three-dimensional posture information.
In a possible implementation manner, the determining module 302 is configured to determine, based on the three-dimensional coordinate to be adjusted, feature information of keypoints obtained by projecting a plurality of keypoints in a plurality of target images respectively, according to the following steps:
determining two-dimensional projection point information of a plurality of key points in a plurality of target images respectively based on the three-dimensional coordinate to be adjusted, and extracting image features corresponding to the target images respectively;
extracting key point feature information matched with key points from image features respectively corresponding to the multiple target images based on two-dimensional projection point information of the key points in the multiple target images;
and determining the extracted key point feature information matched with the key points as the key point feature information obtained by projection in the plurality of target images.
In one possible embodiment, the two-dimensional projection point information includes image position information of the two-dimensional projection point; the determining module 302 is configured to extract, from the image features respectively corresponding to the plurality of target images, key point feature information matched with the key points based on the two-dimensional projection point information of the key points in the plurality of target images, according to the following steps:
extracting image features corresponding to the image position information from the image features corresponding to the target images based on the image position information of the two-dimensional projection points of the key points in the target images aiming at each target image in the target images;
and determining the extracted image features corresponding to the image position information as key point feature information matched with the key points.
In a possible implementation manner, the adjusting module 303 is configured to determine three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and key point feature information of the target image corresponding to a plurality of key points at a plurality of viewing angles, respectively, according to the following steps:
for each key point in the plurality of key points, determining updated key point feature information of the key point at different view angles based on the key point feature information of the key point at different view angles and the key point feature information of other key points associated with the key point;
and determining the three-dimensional attitude information of the target object based on the updated key point characteristic information corresponding to the key points and the pre-constructed key point connection relation information corresponding to the target object.
In a possible implementation manner, the adjusting module 303 is configured to determine, based on the key point feature information of the key point at different viewing angles and the key point feature information of other key points associated with the key point, updated key point feature information of the key point at different viewing angles according to the following steps:
taking each of the plurality of viewing angles as a target viewing angle, respectively executing the following steps:
performing first updating on the key point feature information of the key point under different viewing angles based on the key point feature information of the key point under different viewing angles and a first connection relation between the two-dimensional projection points of the key point under different viewing angles, to obtain first updated key point feature information; and
performing second updating on the key point feature information of the key point at the target view angle based on the key point feature information of the key point at the target view angle and the key point feature information of other key points which belong to the target view angle together with the key point and have a second connection relation with the key point, so as to obtain second updated key point feature information;
and determining the updated key point feature information of the key point under the target view angle based on the first updated key point feature information and the second updated key point feature information.
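One way to realize the first and second updating described above is a simple message-passing scheme over two edge sets: edges linking the same key point across viewing angles (the first connection relation), and edges linking associated key points within a viewing angle (the second connection relation). The sketch below uses plain averaging in place of the learned aggregation a real network would apply; the function name, the edge list, and the combination rule are illustrative assumptions only.

```python
import numpy as np

def update_keypoint_features(features, intra_view_edges):
    """Toy first/second update of keypoint features.

    features:         (V, N, C) keypoint features: V views, N keypoints, C channels.
    intra_view_edges: list of (i, j) keypoint index pairs having the second
                      connection relation within a view (hypothetical topology).
    Returns (V, N, C) updated features combining both updates.
    """
    V, N, C = features.shape

    # First update: aggregate each keypoint's own features across the views
    # (its 2D projection points in different views are connected).
    cross_view = features.mean(axis=0, keepdims=True)       # (1, N, C)
    first_updated = 0.5 * (features + cross_view)            # (V, N, C)

    # Second update: within each view, aggregate features of keypoints
    # connected to this keypoint by the second connection relation.
    second_updated = features.copy()
    degree = np.ones(N)
    for i, j in intra_view_edges:
        second_updated[:, i] += features[:, j]
        second_updated[:, j] += features[:, i]
        degree[i] += 1
        degree[j] += 1
    second_updated /= degree[None, :, None]

    # Combine the two updates (here: a simple average).
    return 0.5 * (first_updated + second_updated)
```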
In a possible implementation manner, the adjusting module 303 is configured to determine three-dimensional posture information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and key point feature information of the target image corresponding to a plurality of key points at a plurality of viewing angles, respectively, according to the following steps:
for each key point in the plurality of key points, fusing key point feature information of the key point under different viewing angles to obtain fused key point feature information corresponding to the key point;
and determining the three-dimensional attitude information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the fusion key point characteristic information respectively corresponding to the plurality of key points.
In one possible implementation, the keypoint feature information comprises keypoint feature values for a plurality of dimensions; the adjusting module 303 is configured to fuse the key point feature information of the key point at different viewing angles according to the following steps to obtain fused key point feature information corresponding to the key point:
for each dimension of the plurality of dimensions, determining a plurality of key point feature values corresponding to the dimension for the key point under different viewing angles, and determining a fused key point feature value corresponding to the dimension based on the determined plurality of key point feature values;
and determining fused key point feature information corresponding to the key point based on the fused key point feature values respectively corresponding to the plurality of dimensions.
In a possible implementation manner, the adjusting module 303 is configured to determine the fused key point feature value corresponding to the dimension based on the determined plurality of key point feature values in one of the following manners:
selecting the key point feature value with the largest value from the plurality of key point feature values as the fused key point feature value corresponding to the dimension; or
taking the average value of the plurality of key point feature values as the fused key point feature value corresponding to the dimension; or
obtaining weight values respectively corresponding to the plurality of key point feature values, and determining the fused key point feature value corresponding to the dimension based on a weighted sum of the plurality of key point feature values and their respective weight values.
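The three fusion manners listed above can be written compactly as shown below; the weight normalization in the weighted variant and all names are assumptions for the sketch.

```python
import numpy as np

def fuse_across_views(features, mode="max", weights=None):
    """Fuse one keypoint's features over the views, dimension by dimension.

    features: (V, C) feature values of one keypoint under V viewing angles.
    mode:     'max', 'mean', or 'weighted' (matching the three manners above).
    weights:  (V,) per-view weights, required when mode == 'weighted'.
    Returns a (C,) fused keypoint feature vector.
    """
    if mode == "max":
        return features.max(axis=0)
    if mode == "mean":
        return features.mean(axis=0)
    if mode == "weighted":
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                          # normalization is an assumption
        return (features * w[:, None]).sum(axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")
```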
In a possible implementation manner, the adjusting module 303 is configured to determine three-dimensional pose information of the target object based on pre-constructed key point connection relationship information corresponding to the target object and fused key point feature information corresponding to each of the plurality of key points, according to the following steps:
updating the feature information of the fusion key points corresponding to the plurality of key points respectively based on a third connection relation between the key points included in the pre-constructed key point connection relation information corresponding to the target object to obtain updated feature information of the fusion key points;
and determining the three-dimensional attitude information of the target object based on the updated fusion key point characteristic information.
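The update based on the third connection relation can be viewed as message passing over the pre-constructed key point connection graph (for example, a human skeleton). The sketch below uses a symmetrically normalized adjacency matrix and omits the learned weight matrices a trained network would contain; it is an illustrative simplification, not the disclosed network.

```python
import numpy as np

def refine_with_skeleton_graph(fused_features, third_edges, num_layers=2):
    """Propagate fused keypoint features over the pre-constructed connection graph.

    fused_features: (N, C) fused keypoint features.
    third_edges:    list of (i, j) keypoint index pairs in the third connection
                    relation (e.g. bones of a human skeleton; hypothetical).
    Returns (N, C) updated fused keypoint features.
    """
    N, _ = fused_features.shape
    # Adjacency with self-loops, symmetrically normalized (standard GCN-style trick).
    A = np.eye(N)
    for i, j in third_edges:
        A[i, j] = A[j, i] = 1.0
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))

    h = fused_features
    for _ in range(num_layers):
        h = np.maximum(A_hat @ h, 0.0)   # message passing followed by ReLU
    return h
```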
In one possible implementation, each of the plurality of key points of the target object is used as a first key point, and each of the key points having the third connection relationship is used as a second key point;
the second key point is a human skeleton point;
the first key points comprise at least one of human skeleton points and human mark points.
In a possible implementation manner, the adjusting module 303 is configured to determine three-dimensional posture information of the target object based on the updated fusion key point feature information according to the following steps:
inputting the updated feature information of the fusion key points into a pre-trained target posture recognition network, and outputting posture deviation information; the attitude deviation information is used for representing the deviation condition between the current attitude of the target object and the attitude to be adjusted;
and determining the adjusted three-dimensional coordinates of the plurality of key points of the target object in the target voxel space based on the attitude deviation information and the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space, and determining the three-dimensional attitude information of the target object based on the adjusted three-dimensional coordinates.
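Applying the posture deviation information then reduces to adding the predicted offsets to the coordinates to be adjusted. In the sketch below, pose_net merely stands in for the pre-trained target posture recognition network and is purely hypothetical.

```python
import numpy as np

def apply_pose_deviation(coords_to_adjust, pose_net, fused_features):
    """Adjust voxel-space keypoint coordinates with predicted pose deviation.

    coords_to_adjust: (N, 3) three-dimensional coordinates to be adjusted.
    pose_net:         callable mapping (N, C) updated fused keypoint features to
                      an (N, 3) deviation (stand-in for the trained network).
    fused_features:   (N, C) updated fused keypoint feature information.
    Returns the adjusted (N, 3) coordinates defining the three-dimensional pose.
    """
    deviation = pose_net(fused_features)   # posture deviation information
    return coords_to_adjust + deviation    # apply the predicted offsets

# Example with a stand-in "network" that predicts zero deviation:
coords = np.random.rand(17, 3)
features = np.random.rand(17, 256)
adjusted = apply_pose_deviation(coords, lambda f: np.zeros((f.shape[0], 3)), features)
```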
In a possible embodiment, the obtaining module 301 is configured to obtain the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space in one of the following manners:
acquiring a plurality of target images obtained by shooting the target object under a plurality of viewing angles, and determining the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space based on the plurality of target images; or
acquiring depth information respectively returned by a plurality of detection rays emitted by radio equipment, and determining the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space based on the depth information.
In one possible implementation, each target image in the acquired multiple target images is used as a first target image, and each target image in the multiple target images for keypoint projection is used as a second target image;
at least part of the image in the first target image is the same as at least part of the image in the second target image;
or, the first target image and the second target image do not have the same image.
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the related description in the above method embodiments; details are not repeated here.
An embodiment of the present disclosure further provides an electronic device. As shown in Fig. 4, which is a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, the electronic device includes: a processor 401, a memory 402, and a bus 403. The memory 402 stores machine-readable instructions executable by the processor 401 (for example, execution instructions corresponding to the obtaining module 301, the determining module 302, and the adjusting module 303 in the apparatus in Fig. 3). When the electronic device runs, the processor 401 and the memory 402 communicate via the bus 403, and when the machine-readable instructions are executed by the processor 401, the following processing is performed:
acquiring three-dimensional coordinates to be adjusted of a plurality of key points of a target object in a target voxel space;
determining key point characteristic information obtained by projecting a plurality of key points in a plurality of target images respectively based on the three-dimensional coordinates to be adjusted; the multiple target images are target images obtained by shooting the target object under multiple viewing angles;
and determining the three-dimensional posture information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the key point characteristic information of the target image corresponding to the plurality of key points at the plurality of visual angles respectively.
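Putting the three processing steps together, an end-to-end flow could look like the sketch below. It reuses the illustrative helpers defined earlier in this description, simplifies by using a single edge list for both the second and third connection relations, and is not the disclosed implementation.

```python
import numpy as np

def estimate_pose(coords_to_adjust, feature_maps, projection_matrices,
                  skeleton_edges, pose_net):
    """Illustrative end-to-end flow mirroring the three processor steps above."""
    # Step 1: the (N, 3) coordinates to be adjusted are assumed already obtained.
    # Step 2: project into each view and gather per-view keypoint features.
    points_2d = project_keypoints(coords_to_adjust, projection_matrices)   # (V, N, 2)
    per_view = np.stack([
        sample_keypoint_features(fm, pts)
        for fm, pts in zip(feature_maps, points_2d)
    ])                                                                      # (V, N, C)
    # Step 3: update, fuse, and refine features, then predict the deviation.
    updated = update_keypoint_features(per_view, skeleton_edges)
    fused = np.stack([fuse_across_views(updated[:, k])
                      for k in range(updated.shape[1])])                    # (N, C)
    refined = refine_with_skeleton_graph(fused, skeleton_edges)
    return apply_pose_deviation(coords_to_adjust, pose_net, refined)
```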
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for adjusting a three-dimensional pose in the above-mentioned method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the method for adjusting a three-dimensional pose in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (16)

1. A method of three-dimensional pose adjustment, the method comprising:
acquiring three-dimensional coordinates to be adjusted of a plurality of key points of a target object in a target voxel space;
determining key point characteristic information obtained by projecting the plurality of key points in a plurality of target images respectively based on the three-dimensional coordinates to be adjusted; the target images are obtained by shooting the target object under a plurality of visual angles;
and determining the three-dimensional posture information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the key point feature information of the target image corresponding to the plurality of key points in the plurality of visual angles respectively.
2. The method according to claim 1, wherein the determining, based on the three-dimensional coordinates to be adjusted, the key point feature information obtained by projecting the plurality of key points in a plurality of target images respectively comprises:
determining two-dimensional projection point information of the plurality of key points in the plurality of target images respectively based on the three-dimensional coordinates to be adjusted, and extracting image features corresponding to the plurality of target images respectively;
extracting key point feature information matched with the key points from image features respectively corresponding to the target images on the basis of two-dimensional projection point information of the key points in the target images;
and determining the extracted key point feature information matched with the key points as the key point feature information obtained by projection in the plurality of target images.
3. The method of claim 2, wherein the two-dimensional projection point information includes image position information of the two-dimensional projection point; the extracting, based on the two-dimensional projection point information of the key points in the plurality of target images, key point feature information matched with the key points from image features respectively corresponding to the plurality of target images includes:
for each target image in the plurality of target images, extracting image features corresponding to the image position information from image features corresponding to the target image based on image position information of two-dimensional projection points of the key points in the target image;
and determining the extracted image features corresponding to the image position information as key point feature information matched with the key points.
4. The method according to any one of claims 1 to 3, wherein the determining the three-dimensional pose information of the target object based on the pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the target image corresponding to a plurality of key points at the plurality of viewing angles respectively comprises:
for each key point in the plurality of key points, determining updated key point feature information of the key point at different viewing angles based on the key point feature information of the key point at different viewing angles and the key point feature information of other key points associated with the key point;
and determining the three-dimensional posture information of the target object based on the updated key point characteristic information corresponding to the plurality of key points respectively and the pre-constructed key point connection relation information corresponding to the target object.
5. The method according to claim 4, wherein determining updated keypoint feature information of the keypoints at different perspectives based on the keypoint feature information of the keypoints at different perspectives and the keypoint feature information of other keypoints associated with the keypoints comprises:
taking each of the plurality of views as a target view, respectively performing the following steps:
performing first updating on the key point feature information of the key point under different viewing angles based on the key point feature information of the key point under different viewing angles and a first connection relation between the two-dimensional projection points of the key point under different viewing angles to obtain first updated key point feature information;
performing second updating on the key point feature information of the key point under the target view angle based on the key point feature information of the key point under the target view angle and the key point feature information of other key points which belong to the target view angle and have a second connection relation with the key point, so as to obtain second updated key point feature information;
and determining updated key point feature information of the key point under the target view angle based on the first updated key point feature information and the second updated key point feature information.
6. The method according to any one of claims 1 to 5, wherein the determining the three-dimensional pose information of the target object based on the pre-constructed key point connection relationship information corresponding to the target object and the key point feature information of the target image corresponding to a plurality of key points at the plurality of viewing angles respectively comprises:
for each key point in the plurality of key points, fusing key point feature information of the key point under different viewing angles to obtain fused key point feature information corresponding to the key point;
and determining the three-dimensional attitude information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the fusion key point feature information corresponding to the plurality of key points respectively.
7. The method of claim 6, wherein the keypoint feature information comprises keypoint feature values for a plurality of dimensions; the fusing the key point feature information of the key point under different viewing angles to obtain the fused key point feature information corresponding to the key point comprises the following steps:
for each dimension in the plurality of dimensions, determining a plurality of key point characteristic values corresponding to the dimension of the key point under different viewing angles, and determining a fused key point characteristic value corresponding to the dimension based on the determined plurality of key point characteristic values;
and determining fused key point characteristic information corresponding to the key points based on the fused key point characteristic values respectively corresponding to the dimensions.
8. The method according to claim 7, wherein the determining the fused keypoint feature value corresponding to the dimension based on the determined keypoint feature values comprises one of the following ways:
selecting a key point characteristic value with the largest value from the plurality of key point characteristic values as a fused key point characteristic value corresponding to the dimension;
taking the average value of the plurality of key point characteristic values as a fused key point characteristic value corresponding to the dimension;
obtaining weight values respectively corresponding to the plurality of key point characteristic values, and determining the fused key point characteristic value corresponding to the dimension based on the weighted sum of the plurality of key point characteristic values and the weight values respectively corresponding to the plurality of key point characteristic values.
9. The method according to any one of claims 6 to 8, wherein the determining the three-dimensional pose information of the target object based on the pre-constructed key point connection relationship information corresponding to the target object and the fused key point feature information corresponding to each of the plurality of key points comprises:
updating the feature information of the fusion key points corresponding to the plurality of key points respectively based on a third connection relation between the key points included in the pre-constructed key point connection relation information corresponding to the target object to obtain updated feature information of the fusion key points;
and determining the three-dimensional attitude information of the target object based on the updated fusion key point characteristic information.
10. The method according to claim 9, wherein each of the plurality of key points of the target object is taken as a first key point, and each of the respective key points having the third connection relationship is taken as a second key point;
the second key point is a human skeleton point;
the first key points comprise at least one of human skeleton points and human mark points.
11. The method according to claim 9 or 10, wherein the determining three-dimensional pose information of the target object based on the updated fusion keypoint feature information comprises:
inputting the updated fusion key point feature information into a pre-trained target posture recognition network, and outputting posture deviation information; the attitude deviation information is used for representing the deviation condition between the current attitude of the target object and the attitude to be adjusted;
determining adjusted three-dimensional coordinates of the plurality of key points of the target object in the target voxel space based on the attitude deviation information and the three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space, and determining three-dimensional attitude information of the target object based on the adjusted three-dimensional coordinates.
12. The method according to any one of claims 1 to 11, wherein the obtaining three-dimensional coordinates to be adjusted of the plurality of key points of the target object in the target voxel space comprises one of the following manners:
acquiring a plurality of target images obtained by shooting the target object under a plurality of visual angles, and determining three-dimensional coordinates to be adjusted of a plurality of key points of the target object in the target voxel space based on the plurality of target images;
the method comprises the steps of obtaining depth information respectively returned by a plurality of detection rays emitted by radio equipment, and determining three-dimensional coordinates to be adjusted of a plurality of key points of the target object in the target voxel space based on the depth information.
13. The method according to claim 12, wherein each of the plurality of target images is acquired as a first target image, and each of the plurality of target images for the keypoint projection is acquired as a second target image;
at least a portion of the image in the first target image is the same as at least a portion of the image in the second target image; or
the first target image and the second target image do not have the same image.
14. An apparatus for three-dimensional pose adjustment, the apparatus comprising:
the acquisition module is used for acquiring three-dimensional coordinates to be adjusted of a plurality of key points of the target object in a target voxel space;
the determining module is used for determining key point characteristic information obtained by projecting the plurality of key points in a plurality of target images respectively based on the three-dimensional coordinates to be adjusted; the target images are obtained by shooting the target object under a plurality of visual angles;
and the adjusting module is used for determining the three-dimensional posture information of the target object based on the pre-constructed key point connection relation information corresponding to the target object and the key point characteristic information of the target image corresponding to the plurality of key points in the plurality of visual angles respectively.
15. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of three-dimensional pose adjustment according to any of claims 1 to 13.
16. A computer-readable storage medium, having stored thereon a computer program for performing, when being executed by a processor, the steps of the method for three-dimensional pose adjustment according to any one of claims 1 to 13.
CN202110929425.0A 2021-08-13 2021-08-13 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium Withdrawn CN113610966A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110929425.0A CN113610966A (en) 2021-08-13 2021-08-13 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium
PCT/CN2022/083749 WO2023015903A1 (en) 2021-08-13 2022-03-29 Three-dimensional pose adjustment method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110929425.0A CN113610966A (en) 2021-08-13 2021-08-13 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113610966A true CN113610966A (en) 2021-11-05

Family

ID=78340658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110929425.0A Withdrawn CN113610966A (en) 2021-08-13 2021-08-13 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113610966A (en)
WO (1) WO2023015903A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114494334A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Method and device for adjusting three-dimensional posture, electronic equipment and storage medium
CN115620094A (en) * 2022-12-19 2023-01-17 南昌虚拟现实研究院股份有限公司 Key point marking method and device, electronic equipment and storage medium
WO2023015903A1 (en) * 2021-08-13 2023-02-16 上海商汤智能科技有限公司 Three-dimensional pose adjustment method and apparatus, electronic device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840500A (en) * 2019-01-31 2019-06-04 深圳市商汤科技有限公司 A kind of 3 D human body posture information detection method and device
US10657659B1 (en) * 2017-10-10 2020-05-19 Slightech, Inc. Visual simultaneous localization and mapping system
CN111582207A (en) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN112528831A (en) * 2020-12-07 2021-03-19 深圳市优必选科技股份有限公司 Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN112767489A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Three-dimensional pose determination method and device, electronic equipment and storage medium
CN112836618A (en) * 2021-01-28 2021-05-25 清华大学深圳国际研究生院 Three-dimensional human body posture estimation method and computer readable storage medium
US20210232858A1 (en) * 2020-01-23 2021-07-29 Seiko Epson Corporation Methods and systems for training an object detection algorithm using synthetic images

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610966A (en) * 2021-08-13 2021-11-05 北京市商汤科技开发有限公司 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657659B1 (en) * 2017-10-10 2020-05-19 Slightech, Inc. Visual simultaneous localization and mapping system
CN109840500A (en) * 2019-01-31 2019-06-04 深圳市商汤科技有限公司 A kind of 3 D human body posture information detection method and device
US20210232858A1 (en) * 2020-01-23 2021-07-29 Seiko Epson Corporation Methods and systems for training an object detection algorithm using synthetic images
CN111582207A (en) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN112528831A (en) * 2020-12-07 2021-03-19 深圳市优必选科技股份有限公司 Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN112836618A (en) * 2021-01-28 2021-05-25 清华大学深圳国际研究生院 Three-dimensional human body posture estimation method and computer readable storage medium
CN112767489A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Three-dimensional pose determination method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023015903A1 (en) * 2021-08-13 2023-02-16 上海商汤智能科技有限公司 Three-dimensional pose adjustment method and apparatus, electronic device, and storage medium
CN114494334A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Method and device for adjusting three-dimensional posture, electronic equipment and storage medium
CN114494334B (en) * 2022-01-28 2023-02-03 北京百度网讯科技有限公司 Method and device for adjusting three-dimensional posture, electronic equipment and storage medium
CN115620094A (en) * 2022-12-19 2023-01-17 南昌虚拟现实研究院股份有限公司 Key point marking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2023015903A1 (en) 2023-02-16

Similar Documents

Publication Publication Date Title
CN113610966A (en) Three-dimensional attitude adjustment method and device, electronic equipment and storage medium
CN110705574B (en) Positioning method and device, equipment and storage medium
CN107742311B (en) Visual positioning method and device
CN111199564B (en) Indoor positioning method and device of intelligent mobile terminal and electronic equipment
CN107735797B (en) Method for determining a movement between a first coordinate system and a second coordinate system
CN111623765B (en) Indoor positioning method and system based on multi-mode data
CN112489126A (en) Vehicle key point information detection method, vehicle control method and device and vehicle
CN111274943A (en) Detection method, detection device, electronic equipment and storage medium
CN113256718B (en) Positioning method and device, equipment and storage medium
CN113160420A (en) Three-dimensional point cloud reconstruction method and device, electronic equipment and storage medium
CN112802081A (en) Depth detection method and device, electronic equipment and storage medium
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
CN112907746A (en) Method and device for generating electronic map, electronic equipment and storage medium
CN113658242A (en) Depth estimation method, depth estimation device, computer equipment and storage medium
CN112734837A (en) Image matching method and device, electronic equipment and vehicle
CN113111513A (en) Sensor configuration scheme determination method and device, computer equipment and storage medium
CN112882576A (en) AR interaction method and device, electronic equipment and storage medium
CN113240806B (en) Information processing method, information processing device, electronic equipment and storage medium
CN112991441A (en) Camera positioning method and device, electronic equipment and storage medium
KR102083293B1 (en) Object reconstruction apparatus using motion information and object reconstruction method using thereof
CN112288813B (en) Pose estimation method based on multi-view vision measurement and laser point cloud map matching
CN112907757A (en) Navigation prompting method and device, electronic equipment and storage medium
CN113011517A (en) Positioning result detection method and device, electronic equipment and storage medium
CN113129378A (en) Positioning method, positioning device, electronic equipment and storage medium
CN112946612A (en) External parameter calibration method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055839

Country of ref document: HK

WW01 Invention patent application withdrawn after publication

Application publication date: 20211105

WW01 Invention patent application withdrawn after publication