CN115937486A - Pose marking method and device, electronic equipment and storage medium - Google Patents

Pose marking method and device, electronic equipment and storage medium

Info

Publication number
CN115937486A
Authority
CN
China
Prior art keywords
model
target
image
rotation
dimensional
Prior art date
Legal status
Pending
Application number
CN202211668238.2A
Other languages
Chinese (zh)
Inventor
张夏杰
史培元
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211668238.2A
Publication of CN115937486A

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the invention discloses a pose marking method and device, electronic equipment and a storage medium. The method comprises the following steps: displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a part image, the three-dimensional model corresponds to a target object which can be worn on a target part, and the part image comprises the target part; generating a transformation matrix in response to a control operation acting on the display screen; projecting the three-dimensional points to the part image again based on the transformation matrix, rendering the two-dimensional points obtained by projection to the part image, and updating and displaying the target image according to the rendering result; and responding to a pose marking completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position. According to the technical scheme of the embodiment of the invention, the high-efficiency and high-quality pose marking process can be realized.

Description

Pose marking method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a pose marking method and device, electronic equipment and a storage medium.
Background
A wrist ornament is an accessory worn on the wrist, such as a watch, a bracelet or a wrist guard. At present, wrist ornaments are mainly experienced in two ways: trying them on in person or watching a model wear them on site. The former is time-consuming and labor-intensive, while the latter cannot convey the real effect of wearing the ornament oneself, which has motivated virtual try-on technology.
In virtual try-on, a camera captures the wrist in a real environment and a wrist ornament is rendered onto the wrist region, so that the user can view the wearing effect on a display screen and thereby try the ornament on virtually. The primary prerequisite of this technology is identifying the pose of the wrist; for schemes that identify the pose with a deep learning model, the wrist poses in a training set need to be annotated so that the model can learn from them.
In the process of implementing the invention, the inventor finds that the following technical problems exist in the prior art: the existing pose marking scheme has the problems of low efficiency and low quality.
Disclosure of Invention
The embodiment of the invention provides a pose marking method and device, electronic equipment and a storage medium, so as to realize a high-efficiency and high-quality pose marking process.
According to an aspect of the invention, a pose labeling method is provided, which may include:
displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a part image, the three-dimensional model corresponds to a target object which can be worn on a target part, and the part image comprises the target part;
generating a transformation matrix in response to a control operation acting on the display screen;
projecting the three-dimensional points onto the part image again based on the transformation matrix, rendering the two-dimensional points obtained by projection onto the part image, and updating and displaying the target image according to the rendering result;
and responding to a pose marking completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position.
According to another aspect of the present invention, there is provided a pose labeling apparatus, which may include:
the target image display module is used for displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a position image, the three-dimensional model corresponds to a target object which can be worn on a target position, and the position image comprises the target position;
a transformation matrix generation module for generating a transformation matrix in response to a control operation acting on the display screen;
the target image updating module is used for projecting the three-dimensional points to the position image again based on the transformation matrix, rendering the two-dimensional points obtained by projection to the position image, and updating and displaying the target image according to a rendering result;
and the pose obtaining module is used for responding to the pose marking finishing instruction, obtaining the position of the projected two-dimensional point in the target image and obtaining the pose of the target part according to the position.
According to another aspect of the present invention, there is provided an electronic device, which may include:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the pose labeling method provided by any of the embodiments of the present invention when executed.
According to another aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer instructions for causing a processor to execute the method for pose annotation provided by any of the embodiments of the present invention.
According to the technical scheme of the embodiment of the invention, a target image is displayed on a display screen, the target image is obtained by projecting and rendering 3D points on a 3D model onto a position image, the 3D model corresponds to a target object which can be worn on a target position, and the position image comprises the target position; generating a transformation matrix which can be used for transforming the 3D model after being controlled and operated from an object coordinate system of the 3D model to an equipment coordinate system of shooting equipment of the position image in response to the control operation acted on the display screen; projecting the 3D points onto the region image again based on the transformation matrix, rendering the projected 2D points onto the region image, and updating and displaying the target image according to a rendering result so that a user can determine whether to continuously control the 3D model; and responding to the pose marking completion instruction, and acquiring the positions of the projected 2D points in the target image so as to obtain the pose of the target part according to the positions. According to the technical scheme, the transformation matrix of the 3D model is generated through control operation triggered in the annotation process, the 3D points on the 3D model are projected onto the position image based on the transformation matrix, and the relative poses among the 3D points are fixed and unchanged, so that when the poses are obtained based on the positions of the projected 2D points in the target image after pose annotation is completed, the corresponding poses of different target images (or the position images) have uniformity, and therefore the annotation quality is guaranteed; meanwhile, for each part image, the positions of the 2D points are not required to be sequentially marked, and the pose marking process can be completed only through limited times of control operation, so that the marking efficiency is ensured.
It should be understood that the statements in this section do not necessarily identify key or critical features of any embodiment of the present invention, nor do they necessarily limit the scope of the present invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a pose marking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a position image and a corresponding target image in a pose labeling method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another pose marking method provided by the embodiment of the invention;
fig. 4 is an exemplary diagram of a sphere model and its associated calculation in another pose labeling method provided by the embodiment of the invention;
FIG. 5 is a flowchart of another pose marking method according to an embodiment of the present invention;
FIG. 6 is an exemplary diagram of a projection of a 3D point and a 2D point in another pose labeling method provided by an embodiment of the invention;
FIG. 7 is a flowchart of another pose marking method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a watch model and a combined model in another pose labeling method provided by an embodiment of the invention from a top view;
FIG. 9 is a schematic diagram of the width-to-thickness ratio of a wrist model represented by a cylinder in another pose labeling method provided by the embodiment of the invention;
fig. 10 is a schematic frame diagram of an alternative example of another pose labeling method according to an embodiment of the present invention;
fig. 11 is a block diagram of a pose marking apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device implementing the pose labeling method according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Terms such as "target" and "original" are used in a similar way and will not be described in detail herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To better understand why the pose marking method explained below is efficient and of high quality, a pose marking scheme currently in common use is first described as an example. At present, pose estimation is usually realized with a deep learning model. It should be noted that, for monocular pose estimation, it is difficult for the model to learn three-dimensional (3D) information directly, that is, it is difficult to learn the pose from 3D information, and converting the 3D information into two-dimensional (2D) information facilitates model learning. Therefore, for a wrist image obtained by photographing the wrist, the positions of feature points of the wrist in the wrist image are usually labeled directly, so that the pose of the wrist can be obtained from those positions. However, the training set contains a large number of wrist images, so the user has to label multiple positions on each wrist image in sequence, and the labeling efficiency is low; moreover, in different wrist images the poses obtained from manually marked positions are inconsistent, so the labeling quality is low.
Fig. 1 is a flowchart of a pose labeling method provided in an embodiment of the present invention. The embodiment can be applied to the situation of pose marking. The method can be executed by the pose marking device provided by the embodiment of the invention, the device can be realized in a software and/or hardware mode, the device can be integrated on electronic equipment, and the equipment can be various user terminals or servers.
Referring to fig. 1, the method of the embodiment of the present invention specifically includes the following steps:
and S110, displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a position image, the three-dimensional model corresponds to a target object which can be worn on a target position, and the position image comprises the target position.
The target image is displayed on the display screen so that the user can check, through the display screen, whether the target object in the target image is worn exactly on the target part. Specifically, the target part may be a part on which the target object can be worn, such as the head, neck, wrist or ankle, and the part image may be an image obtained by photographing the target part. The target object may be an item wearable on the target part, such as a hat or hair clip wearable on the head, a necklace or silk scarf wearable on the neck, a watch or bracelet wearable on the wrist, or an anklet wearable on the ankle. In practical applications, a 3D model corresponding to the target object may be preset, where the 3D model may be the object model of the target object itself in 3D space, or a combined model obtained by combining the object model with a part model of the target part itself in 3D space, and the like, which is not specifically limited here. The 3D points may be spatial points on the 3D model, and the target image obtained by projecting and rendering the 3D points onto the part image may include the directly photographed target part and the rendered target object.
And S120, generating a transformation matrix in response to the control operation acted on the display screen.
The control operation can be an operation acting on the display screen for controlling the 3D model to translate and/or rotate. For example, the user may scroll the mouse wheel to trigger a control operation that zooms the 3D model in or out; drag while holding the right mouse button to trigger a control operation that moves the 3D model up, down, left or right; or drag while holding the left mouse button to trigger a control operation that rotates the 3D model; and so on, which is not specifically limited here. It should be noted that the first two examples can be regarded as translating the 3D model, and the third example can be regarded as rotating the 3D model.
In response to the control operation, a transformation matrix for projecting the 3D points on the translated and/or rotated 3D model onto the part image is generated, i.e., a transformation matrix for transforming the 3D points on the translated and/or rotated 3D model from the object coordinate system in which they are located to the device coordinate system in which the photographing device of the part image is located (e.g., the camera coordinate system in which the camera is located) is generated. In practical applications, the transformation matrix may be represented by an offset vector for representing the translation result of the 3D model and a rotation matrix for representing the rotation result of the 3D model. The pose marking device can respond to the control operation to generate a corresponding transformation matrix when receiving the control operation, and execute the next step based on the transformation matrix.
And S130, projecting the three-dimensional points to the part image again based on the transformation matrix, rendering the two-dimensional points obtained by projection to the part image, and updating and displaying the target image according to the rendering result.
And re-projecting the 3D points on the 3D model onto the position image based on the transformation matrix to obtain 2D points corresponding to the 3D points respectively. Then, the projected 2D point is rendered on the part image, so that the currently displayed target image is updated based on the rendering result (i.e., the rendered part image), and the updated target image is displayed on the display screen. In this way, the user can directly determine whether the target object is worn on the target part after the current annotation (i.e., control operation) through the display screen. If not, the next control operation can be carried out; otherwise, a pose marking completion instruction can be triggered. For example, referring to the position image shown in the left diagram and the target image shown in the right diagram in fig. 2, the target image may show the relative position relationship between the wrist and the watch after the labeling of this time, and the user may visually determine whether the watch is worn on the wrist, so as to perform the next operation.
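For illustration only, the interactive loop described above can be sketched in Python as follows; the helper names (project_points, render_points, show_image, get_event) and the event structure are assumptions standing in for whatever projection, rendering and display facilities a labeling tool actually provides, not part of the original text.

```python
import numpy as np

def annotation_loop(part_image, model_points_3d, K, get_event,
                    project_points, render_points, show_image):
    """Minimal sketch of the interactive labeling loop (S110-S140).

    model_points_3d : (N, 3) array of 3D points on the 3D model
    K               : 3x3 intrinsic matrix of the capture device
    get_event       : yields user events, e.g. ('translate', t), ('rotate', R), ('done',)
    """
    R = np.eye(3)            # current rotation of the 3D model
    t = np.zeros(3)          # current translation of the 3D model

    points_2d = project_points(model_points_3d, R, t, K)
    show_image(render_points(part_image, points_2d))        # initial target image

    for event in get_event():
        if event[0] == 'translate':                          # control operation: translation
            t = t + event[1]
        elif event[0] == 'rotate':                           # control operation: rotation
            R = event[1] @ R
        elif event[0] == 'done':                             # pose labeling completion instruction
            return points_2d                                 # 2D positions used as the pose label
        # re-project, re-render, and update the displayed target image
        points_2d = project_points(model_points_3d, R, t, K)
        show_image(render_points(part_image, points_2d))
```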
And S140, responding to the pose marking completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position.
The pose marking completion instruction may be an instruction triggered by a user to indicate that the pose of the target portion in the currently displayed target image (or the portion image corresponding to the target image) is marked, and in response to the instruction, the positions of the projected 2D points in the target image are obtained, and then the pose of the target portion is obtained based on the positions.
It should be noted that the reason why the pose is obtained by the position of the projected 2D point in the target image, not by the position of the directly shot or rendered 2D point, is that the projected 2D points can be corresponding to the corresponding 3D points on the 3D model, and the relative pose between the 3D points on the 3D model is fixed, so that when the pose is obtained based on the position of the projected 2D point, the corresponding poses of different target images are unified, thereby ensuring the quality of pose labeling.
In practical application, optionally, after the labeling work of the part images in the training set is completed, any one part image and the positions of the 2D points corresponding to the part image can be used as a group of training samples, and model training is performed based on a plurality of groups of training samples, so that a pose estimation model capable of performing pose estimation is obtained.
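As a small illustration of this step (not part of the patent text), each part image and the positions of its projected 2D points could be stored as one training sample; the record layout and field names below are assumptions.

```python
# One training sample per labeled part image: the image plus its labeled 2D positions.
training_samples = [
    {
        "image": "wrist_0001.jpg",                       # hypothetical file name
        "points_2d": [[412.3, 250.1], [405.7, 301.9]],   # (u, v) positions of projected points
    },
    # ... one record per labeled part image
]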
According to the technical scheme of the embodiment of the invention, a target image is displayed on a display screen, the target image is obtained by projecting and rendering 3D points on a 3D model onto a part image, the 3D model corresponds to a target object which can be worn on a target part, and the part image comprises the target part; in response to a control operation acting on the display screen, a transformation matrix is generated, which can be used for transforming the 3D model after the control operation from the object coordinate system of the 3D model to the device coordinate system of the capture device of the part image; the 3D points are projected onto the part image again based on the transformation matrix, the projected 2D points are rendered onto the part image, and the target image is updated and displayed according to the rendering result, so that the user can decide whether to continue controlling the 3D model; and in response to a pose labeling completion instruction, the positions of the projected 2D points in the target image are acquired so as to obtain the pose of the target part from those positions. According to this technical scheme, the transformation matrix of the 3D model is generated through control operations triggered during annotation, the 3D points on the 3D model are projected onto the part image based on the transformation matrix, and the relative poses among the 3D points are fixed, so that when the pose is obtained from the positions of the projected 2D points in the target image after pose annotation is completed, the poses corresponding to different target images (or part images) are consistent, which guarantees annotation quality; meanwhile, for each part image the positions of the 2D points do not need to be marked one by one, and the pose annotation process can be completed with only a limited number of control operations, which guarantees annotation efficiency.
Fig. 3 is a flowchart of another pose labeling method according to an embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, generating the transformation matrix in response to a control operation acting on the display screen may include: obtaining an offset vector and a rotation matrix in response to a control operation acting on a display screen; and generating a transformation matrix according to the offset vector and the rotation matrix. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
Referring to fig. 3, the method of this embodiment may specifically include the following steps:
s210, displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a position image, the three-dimensional model corresponds to a target object which can be worn on a target position, and the position image comprises the target position.
And S220, responding to the control operation acted on the display screen, and obtaining an offset vector and a rotation matrix.
When the control operation is used for controlling the 3D model to translate, the translation result of the time can be represented by the offset vector, namely, the translation of the 3D model can be realized by adjusting the offset vector; when the control operation is used to control the 3D model to rotate, the rotation result of this time can be represented by a rotation matrix, i.e., the rotation of the 3D model can be achieved by adjusting the rotation matrix. In response to the control operation, an offset vector and a rotation matrix may be derived, both of which are key factors for subsequent generation of the transformation matrix.
And S230, generating a transformation matrix according to the offset vector and the rotation matrix.
The offset vector and the rotation matrix are combined to generate the transformation matrix. For example, assuming that the offset vector is denoted by t and the rotation matrix by R, the resulting transformation matrix may be written as T = [R | t].
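Purely as an illustration (not part of the original text), a minimal numpy sketch of composing the 3x4 transformation matrix from a rotation matrix R and an offset vector t, following the notation above:

```python
import numpy as np

def compose_transform(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build T = [R | t]: a 3x4 matrix mapping object coordinates to device coordinates."""
    return np.hstack([R, t.reshape(3, 1)])

# Example: identity rotation, translation of 3 units along z.
T = compose_transform(np.eye(3), np.array([0.0, 0.0, 3.0]))
```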
And S240, projecting the three-dimensional points to the part image again based on the transformation matrix, rendering the two-dimensional points obtained by projection to the part image, and updating and displaying the target image according to the rendering result.
And S250, responding to a pose marking completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position.
According to the technical scheme of the embodiment of the invention, the offset vector used for representing the translation result and the rotation matrix used for representing the rotation result are obtained by responding to the control operation, and the transformation matrix is further generated according to the offset vector and the rotation matrix, so that the effect of accurately generating the transformation matrix is ensured.
In an alternative solution, the control operation includes a translation operation for translating the three-dimensional model, and obtaining the offset vector in response to the control operation acting on the display screen may include: acquiring a translation direction and a translation magnitude in response to the translation operation acting on the display screen; and obtaining the offset vector according to the translation direction and the translation magnitude. The translation direction indicates the direction in which the 3D model is translated, the translation magnitude indicates how far the 3D model is translated, and an accurate offset vector can be obtained from these two factors. Illustratively, assume the offset vector t is represented as:

t = (t_x, t_y, t_z)^T

When the translation direction is the z direction and the translation magnitude is 3 units, the offset vector can be expressed as t = (0, 0, 3)^T.
In another alternative embodiment, when the control operation includes a rotation operation for rotating the three-dimensional model, obtaining a rotation matrix in response to the control operation applied to the display screen may include: obtaining a rotation axis and a rotation angle in response to a rotation operation acting on the display screen; and obtaining a rotation matrix according to the rotation axis and the rotation angle. Wherein the rotation axis may indicate along which axis the 3D model is rotated, and the rotation angle may indicate how many angles the 3D model is rotated along the rotation axis, such as 45 ° along the x-axis. According to the two factors of the rotating shaft and the rotating angle, the rotating matrix can be accurately determined.
On this basis, optionally, the pose labeling method may further include: acquiring a predefined sphere model matched with a target image; obtaining the rotation axis and the rotation angle in response to a rotation operation acting on the display screen may include: acquiring a rotation start point and a rotation end point of the rotation operation on the target image in response to the rotation operation acting on the display screen; determining a starting vector corresponding to the rotation starting point on the sphere model and an ending vector corresponding to the rotation ending point on the sphere model; and obtaining the rotating shaft and the rotating angle according to the starting vector and the ending vector.
In consideration of the fact that the control operation triggered by the user can only operate the 2D space where the 2D screen is located and cannot operate the 3D space where the 3D model is located, it is necessary to convert the position information in the 2D space corresponding to the control operation into the rotation information in the 3D space. To achieve this, a sphere model matched with the target image is predefined, so as to realize the operation of the 2D space on the 3D space based on the sphere model. Specifically, the rotation operation is performed on the display screen, and the target image is displayed on the display screen, so that the rotation start point and the rotation end point of the rotation operation on the target image can be obtained by using the display screen as an intermediate. Furthermore, since the sphere model is matched with the target image and the rotation starting point is a 2D point on the target image, a starting vector corresponding to the rotation starting point on the sphere model can be obtained by using the target image as an intermediate medium. The process of obtaining the end vector is similar and will not be described herein. On the basis, when the 3D model is controlled to rotate based on the rotation operation, the rotation of the 3D model can be understood as the rotation from the start vector to the end vector (i.e. the 3D model originally facing the start vector and facing the end vector after the rotation), and the rotation can be expressed by the shaft angle (i.e. the rotation shaft and the rotation angle), so that the rotation shaft and the rotation angle can be obtained according to the start vector and the end vector, and further the rotation matrix can be obtained according to the rotation shaft and the rotation angle, thereby realizing the effect of accurately determining the rotation matrix corresponding to the rotation operation. In practical applications, optionally, a cross product result of the start vector and the end vector may be used as a rotation axis, and an inner product result of the start vector and the end vector may also be used as a rotation angle, and the like, which is not specifically limited herein.
On the basis, optionally, the origin of the sphere model is located at the center of the target image, the radius of the sphere model comprises a half of the minimum value in the width and the height of the target image, an xyz coordinate system where the sphere model is located is constructed on the basis of the x direction, the y direction and the z direction, and an xy plane formed on the basis of the x direction and the y direction is superposed with a uv plane where the target image is located; determining a corresponding start vector of the rotation start point on the sphere model may include: obtaining an x component according to the coordinate and the width of the rotation starting point in the x direction; obtaining a y component according to the coordinate and the height of the rotation starting point in the y direction; obtaining a z component according to the radius, the x component and the y component; and obtaining a corresponding starting vector of the rotation starting point on the sphere model according to the x component, the y component and the z component. Here, since the xy plane coincides with the uv plane, the rotation start point on the target image (i.e., the uv plane) can also be regarded as the rotation start point on the xy plane. On the basis, by combining the relation between the radius of the spherical model and the width and the height of the target image and the relation between the origin of the spherical model and the center of the target image, each component of the initial vector can be obtained according to the coordinates of the rotation initial point in the x direction and the y direction, the width, the height and the radius, so that the initial vector is obtained according to the components, and the effect of accurately determining the initial vector is achieved. The determination process of the ending vector is similar and is not described herein again.
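A minimal sketch, assuming the conventions stated above (sphere origin at the image center, radius r = min(w, h)/2, xy plane coinciding with the image plane), of mapping a rotation start point (m_x, m_y) on the target image to its start vector on the sphere model. Clamping points that fall outside the sphere's projection onto its silhouette is an assumption here; the original text does not specify this case.

```python
import numpy as np

def screen_point_to_sphere(m_x: float, m_y: float, w: float, h: float) -> np.ndarray:
    """Map a 2D point on the target image to a 3D vector on the sphere model."""
    r = min(w, h) / 2.0                      # sphere radius
    p_x = m_x - w / 2.0                      # signed distance of the point to the y axis
    p_y = m_y - h / 2.0                      # signed distance of the point to the x axis
    d2 = p_x ** 2 + p_y ** 2
    if d2 <= r ** 2:
        p_z = np.sqrt(r ** 2 - d2)           # distance from the sphere surface to the image plane
    else:                                    # outside the sphere: clamp to the silhouette (assumption)
        scale = r / np.sqrt(d2)
        p_x, p_y, p_z = p_x * scale, p_y * scale, 0.0
    return np.array([p_x, p_y, p_z])
```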
To better understand the determination of the rotation matrix as a whole, it is exemplified below with reference to a specific example. Illustratively, as shown in fig. 4, the left diagram shows the projection of the sphere model onto the target image, where the origin of the sphere model is located at the center of the target image, the radius r of the sphere model is half the minimum of the width w and the height h of the target image, i.e. r = min(w, h)/2, the sphere model is located in an xyz coordinate system, the target image is located in a uv coordinate system, and the xy plane of the xyz coordinate system coincides with the uv plane of the uv coordinate system. On this basis, it is assumed that the rotation start point of the rotation operation is located at a point m_1 = (m_x, m_y) on the target image, where m_x represents the coordinate of m_1 in the x direction and m_y represents the coordinate of m_1 in the y direction; meanwhile, the rotation start point also corresponds to the start vector p_1 on the sphere model, as shown in the middle illustration of fig. 4. The components of p_1 are calculated as follows:

p_x = m_x - w/2
p_y = m_y - h/2
p_z = sqrt(r^2 - p_x^2 - p_y^2)
p_1 = (p_x, p_y, p_z)

where p_x represents the distance from the intersection of p_1 with the target image to the y axis, p_y represents the distance from that intersection to the x axis, and p_z represents the distance from the point on the sphere to the target image plane. The end vector p_2 corresponding to the rotation end point m_2 on the target image is determined in a similar way, as shown in the right diagram of fig. 4, and is not described again here. It should be noted that, during rotation of the 3D model, the object coordinate system in which the 3D model is located rotates synchronously with the 3D model, but the xyz coordinate system in which the sphere model is located does not rotate. Further, the rotation axis u and the rotation angle θ can be expressed as:

u = p_1 × p_2
θ = arccos( (p_1 · p_2) / (|p_1| |p_2|) )

where u is the cross product of p_1 and p_2, and θ is obtained from the inner product of p_1 and p_2. On this basis, the rotation matrix R can be expressed through the axis-angle (Rodrigues) formula:

R = cos(θ)·I + (1 − cos(θ))·n·n^T + sin(θ)·[n]_x,  with n = u/|u|

where I is the 3×3 identity matrix and [n]_x is the skew-symmetric (cross-product) matrix of the unit rotation axis n.
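A short sketch, under the same assumptions as the reconstruction above, of turning the start and end vectors into the rotation axis, rotation angle and rotation matrix; the Rodrigues formula is used here as one standard way of converting an axis-angle pair into a rotation matrix.

```python
import numpy as np

def rotation_from_vectors(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Rotation matrix that turns the start vector p1 toward the end vector p2."""
    axis = np.cross(p1, p2)                                   # rotation axis: cross product
    cos_theta = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))          # rotation angle from the inner product
    n = axis / (np.linalg.norm(axis) + 1e-12)                 # unit rotation axis
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])                        # skew-symmetric matrix of n
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
```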
Fig. 5 is a flowchart of another pose labeling method provided in the embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the re-projecting the three-dimensional point onto the part image based on the transformation matrix may include: acquiring a labeling result aiming at the three-dimensional model, and determining a labeled point from the three-dimensional points based on the labeling result; and based on the transformation matrix, the marked points are re-projected onto the part image. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
Referring to fig. 5, the method of this embodiment may specifically include the following steps:
s310, obtaining a labeling result aiming at the three-dimensional model, and determining a labeled point from three-dimensional points of the three-dimensional model based on the labeling result, wherein the three-dimensional model corresponds to a target object which can be worn on a target part.
Since the 2D points obtained by projecting each 3D point on the 3D model onto the part image are not necessarily all the points with obvious features on the target part, the 2D points with unobvious features may affect the model learning effect. Therefore, a user can label the 3D model firstly, and labeled points which are beneficial to model learning are labeled from all the 3D points, so that the pose labeling device can obtain labeling results for the 3D model, and can determine the labeled points from the 3D points on the 3D model based on the labeling results, and then only the labeled points can be projected onto a part image, and the model learning effect is guaranteed.
And S320, displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering the marked points on a part image, and the part image comprises a target part.
And S330, generating a transformation matrix in response to the control operation acted on the display screen.
And S340, based on the transformation matrix, re-projecting the marked points to the part image, rendering the two-dimensional points obtained by projection to the part image, and updating and displaying the target image according to the rendering result.
And S350, responding to the pose labeling completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position.
According to the technical scheme of the embodiment of the invention, the labeling result aiming at the 3D model is obtained, then the labeled points are determined from the 3D points of the 3D model based on the labeling result, the 2D points corresponding to the labeled points are usually the points with obvious characteristics on the target part, and the labeled points are projected on the part image subsequently, so that the model learning effect is ensured.
On the basis of the technical scheme, the practical application scene is considered, optionally, after model training is carried out on the basis of the positions of the 2D points corresponding to the marked points, under the condition that the model training effect is not good, a user can mark the 3D model again, so that the position and pose marking device can acquire the marked points again and execute subsequent steps until the marked points capable of bringing a better model training effect are obtained, and the subsequent model training effect is effectively ensured by the flexible selection mode of the marked points. Optionally, when the 3D model is the combined model set forth above, the marked points may be from the target region, which is more favorable for quick selection of suitable marked points, for example, referring to fig. 6, the points with larger volume (i.e., circled by a dotted line) on the left graph in the diagram represent 10 marked points marked out from all the 3D points on the wrist model, and the right graph in the diagram represents 2D points obtained by projecting the 10 marked points onto the image of the region, and the features of the 2D points on the wrist are more obvious.
On the basis of any of the above technical solutions, optionally, re-projecting the three-dimensional points onto the part image based on the transformation matrix may include: determining the intrinsic parameters of the capture device of the part image according to the width and the height of the part image; and re-projecting the three-dimensional points onto the part image according to the intrinsic parameters and the transformation matrix. For example, the intrinsic matrix K may take the form:

K = [ f  0  w/2
      0  f  h/2
      0  0   1 ]

where w represents the width, h represents the height, the principal point is taken as the image center (w/2, h/2), and the focal length f is derived from the width and height (for example, f = max(w, h)). For another example, the projection of a 3D point can be implemented by the following equation:

Z · (u, v, 1)^T = K · [R | t] · (P_w, 1)^T

where R denotes the rotation matrix, t denotes the offset vector, P_w represents the coordinates of the 3D point in its object coordinate system, Z is the scale factor of the homogeneous coordinates, i.e. the z coordinate of the point after it has been transformed into the device (camera) coordinate system, and (u, v) are the coordinates of the corresponding 2D point in the uv coordinate system.
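A minimal sketch of this projection step, using the intrinsic matrix and transformation matrix in the form reconstructed above; the choice f = max(w, h) is an assumption rather than something stated in the original text.

```python
import numpy as np

def intrinsics_from_size(w: float, h: float) -> np.ndarray:
    """Approximate intrinsic matrix from the image size alone (assumed convention)."""
    f = max(w, h)                               # assumed focal length
    return np.array([[f, 0.0, w / 2.0],
                     [0.0, f, h / 2.0],
                     [0.0, 0.0, 1.0]])

def project(points_3d: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project (N, 3) object-space points to (N, 2) pixel coordinates via Z*(u,v,1)^T = K [R|t] (P_w,1)^T."""
    cam = points_3d @ R.T + t                   # transform into the device (camera) coordinate system
    uvw = cam @ K.T                             # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]             # perspective division by the depth Z
```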
Fig. 7 is a flowchart of another pose labeling method provided in the embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the pose labeling method may further include: acquiring a three-dimensional object model preset for a target object and a three-dimensional part model preset for a target part; and combining the object model and the part model to obtain a three-dimensional model. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
Referring to fig. 7, the method of this embodiment may specifically include the following steps:
s410, a three-dimensional part model preset for the target part and a three-dimensional object model preset for a target object capable of being worn on the target part are acquired.
In order to ensure the accuracy of pose marking, different types of target parts can respectively correspond to respective part models, for example, a wrist model corresponds to a wrist and a head model corresponds to a head; correspondingly, target objects of different types or different models in the same type can also respectively correspond to the respective object models, for example, a brand a watch, a brand B watch and a bracelet respectively correspond to the respective object models.
And S420, combining the object model and the part model to obtain a three-dimensional model.
The object model and the part model are combined or bound to obtain a three-dimensional model, wherein the three-dimensional model is the combined model explained above. Illustratively, referring to fig. 8, taking the target object as a watch and the target location as a wrist as an example, the first row is a schematic view of the watch model in a top view, and the second row is a schematic view of a combined model obtained by combining the wrist model and the watch model in the top view, and the schematic view of the second row can show the effect of wearing the watch on the wrist.
It should be noted that the advantage of using the combined model as the three-dimensional model is that it can reflect the occlusion that occurs when the target object is actually worn on the target part, for example the inner side of the watch being occluded by the wrist. This occlusion effect makes it easier for the user to judge, while annotating, whether the target object is worn exactly on the target part, so that a more realistic pose labeling result can be obtained.
And S430, displaying a target image on the display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on the three-dimensional model onto the part image, and the part image comprises the target part.
S440, generating a transformation matrix in response to the control operation acted on the display screen.
And S450, projecting the three-dimensional points to the region image again based on the transformation matrix, rendering the two-dimensional points obtained by projection to the region image, and updating and displaying the target image according to the rendering result.
And S460, responding to the pose labeling completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position.
According to the technical scheme of the embodiment of the invention, the three-dimensional object model preset for the target object and the three-dimensional part model preset for the target part are combined, and the subsequent steps are executed based on the obtained three-dimensional model, so that the authenticity of the pose marking result obtained subsequently is ensured.
On this basis, optionally, the object coordinate system in which the object model is located and the object coordinate system in which the part model is located have the same origin and orientation. Further, if the object coordinate system of the object model of every target object that can be worn on the target part has the same origin and orientation as the object coordinate system of the part model, then all of these object coordinate systems are identical, and for any one target object the rotation matrix obtained through the above-described steps can also be applied to the other target objects, thereby further improving pose labeling efficiency.
Optionally, when the target part is a wrist, the part model may be represented by a cylinder, and the width-to-thickness ratio of the cylinder is obtained from statistics of wrist size data collected in advance. Since wrist size data differ somewhat from person to person, the wrist size data may be aggregated statistically to reduce this difference, yielding a standardized (or averaged, i.e. applicable to every wrist) wrist model that matches real wrists, thereby ensuring the accuracy of pose labeling. Furthermore, the wearing effect of the wrist ornament is affected by rotation of the wrist, and the apparent width-to-thickness ratio of the wrist varies during rotation; therefore the width-to-thickness ratio is obtained from statistics of the wrist size data, the wrist model is set based on this ratio, and the width and height of the wrist model can be determined from the specific size data of the wrist ornament. For example, see the width-to-thickness ratio of the wrist model shown in fig. 9, where the width is xx times the thickness.
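Purely as an illustration of how such a statistic might be computed (the data values and field meanings are assumptions), the width-to-thickness ratio of the cylinder could be taken as the mean ratio over the collected wrist measurements:

```python
import numpy as np

def wrist_aspect_ratio(widths: np.ndarray, thicknesses: np.ndarray) -> float:
    """Average width-to-thickness ratio over collected wrist size data."""
    return float(np.mean(np.asarray(widths) / np.asarray(thicknesses)))

# Hypothetical measurements in millimetres.
ratio = wrist_aspect_ratio(np.array([55.0, 60.0, 58.0]), np.array([40.0, 44.0, 42.0]))
```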
In order to better understand the above technical solutions as a whole, an example is given below with reference to a specific case. For example, see the schematic framework of a pose labeling system shown in fig. 10, which mainly includes three stages: 3D model design, manual annotation, and 2D projection. In the 3D model design stage, a standardized watch model is obtained from statistics of the collected watch size data, an averaged wrist model is obtained from statistics of the collected wrist size data, and the wrist model and the watch model are then bound to obtain a wrist-watch combined model. In the manual annotation stage, the combined model is projected and rendered onto the wrist image to obtain the rendering shown for the watch in the manual annotation stage of the diagram, and the combined model is then translated and rotated according to control operations triggered by the user. In the 2D projection stage, marked points are selected from the 3D points of the combined model, and the marked points are projected onto the part image through the model-view transformation, projection transformation, perspective division and viewport transformation to obtain 2D points. After pose labeling is finished, the obtained coordinates of the 2D points can be used for subsequent model training. A sketch of these four stages applied to one marked 3D point follows this paragraph.
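For illustration only, a sketch of the four named projection stages for a single marked 3D point; the specific model-view and projection matrices and the pixel-origin convention are assumptions, not taken from the original text.

```python
import numpy as np

def project_marked_point(p_obj: np.ndarray, model_view: np.ndarray,
                         projection: np.ndarray, w: float, h: float) -> np.ndarray:
    """Model-view transform -> projection transform -> perspective division -> viewport transform."""
    p = np.append(p_obj, 1.0)                       # homogeneous object-space point
    p_eye = model_view @ p                          # model-view transformation (4x4)
    p_clip = projection @ p_eye                     # projection transformation (4x4)
    ndc = p_clip[:3] / p_clip[3]                    # perspective division to normalized device coords
    u = (ndc[0] + 1.0) * 0.5 * w                    # viewport transformation to pixel coordinates
    v = (1.0 - (ndc[1] + 1.0) * 0.5) * h            # assumed top-left pixel origin
    return np.array([u, v])
```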
Fig. 11 is a structural block diagram of a pose marking apparatus according to an embodiment of the present invention, which is configured to execute the pose marking method according to any of the embodiments described above. The device and the pose marking method of each embodiment belong to the same inventive concept, and details which are not described in detail in the embodiment of the pose marking device can refer to the embodiment of the pose marking method. Referring to fig. 11, the apparatus may specifically include: a target image display module 510, a transformation matrix generation module 520, a target image update module 530, and a pose derivation module 540.
The target image display module 510 is configured to display a target image on a display screen, where the target image is obtained by projecting and rendering a three-dimensional point on a three-dimensional model onto a region image, the three-dimensional model corresponds to a target object that can be worn on a target region, and the region image includes the target region;
a transformation matrix generation module 520 for generating a transformation matrix in response to a control operation acting on the display screen;
a target image updating module 530, configured to re-project the three-dimensional points onto the region image based on the transformation matrix, render the two-dimensional points obtained by projection onto the region image, and update and display the target image according to the rendering result;
and a pose obtaining module 540, configured to obtain, in response to the pose labeling completion instruction, a position of the two-dimensional point projected in the target image, so as to obtain a pose of the target portion according to the position.
Optionally, the transformation matrix generating module 520 may include:
a rotation matrix obtaining submodule for obtaining an offset vector and a rotation matrix in response to a control operation applied to the display screen;
and the transformation matrix generation submodule is used for generating a transformation matrix according to the offset vector and the rotation matrix.
On this basis, as an alternative, the control operation may include a translation operation for translating the three-dimensional model, and the rotation matrix obtaining sub-module may include:
a translation size acquisition unit configured to acquire a translation direction and a translation size in response to a translation operation acting on the display screen;
and the offset vector obtaining unit is used for obtaining an offset vector according to the translation direction and the translation size.
Alternatively, the control operation includes a rotation operation for rotating the three-dimensional model, and the rotation matrix obtaining sub-module may include:
a rotation angle obtaining unit for obtaining a rotation axis and a rotation angle in response to a rotation operation acting on the display screen;
and the rotation matrix obtaining unit is used for obtaining a rotation matrix according to the rotation axis and the rotation angle.
On this basis, optionally, the rotation matrix obtaining sub-module may further include:
the sphere model acquiring unit is used for acquiring a predefined sphere model matched with the target image;
the rotation angle obtaining unit may include:
a rotation end point acquisition subunit operable to acquire a rotation start point and a rotation end point of the rotation operation on the target image in response to the rotation operation acting on the display screen;
the starting vector determining subunit is used for determining a corresponding starting vector of the rotation starting point on the sphere model;
an end vector determining subunit, configured to determine an end vector corresponding to the rotation end point on the sphere model;
and the rotation angle obtaining subunit is used for obtaining the rotation axis and the rotation angle according to the starting vector and the ending vector.
On the basis, optionally, the origin of the sphere model is located at the center of the target image, the radius of the sphere model comprises half of the minimum value in the width and the height of the target image, an xyz coordinate system in which the sphere model is located is constructed based on the x direction, the y direction and the z direction, and an xy plane formed based on the x direction and the y direction is superposed with a uv plane in which the target image is located;
the start vector determining subunit may be specifically configured to:
obtaining an x component according to the coordinate and the width of the rotation starting point in the x direction;
the y component is obtained according to the coordinate and the height of the rotation starting point in the y direction;
obtaining a z component according to the radius, the x component and the y component;
and obtaining a corresponding starting vector of the rotation starting point on the sphere model according to the x component, the y component and the z component.
Optionally, the target image updating module 530 may include:
the marked point determining unit is used for acquiring a marking result aiming at the three-dimensional model and determining a marked point from the three-dimensional point based on the marking result;
and the marked point projection unit is used for re-projecting the marked points to the position image based on the transformation matrix.
Optionally, the target image updating module 530 may include:
an internal reference determination unit for determining an internal reference of the photographing apparatus of the partial image based on the width and height of the partial image;
and the three-dimensional point projection unit is used for re-projecting the three-dimensional points to the part image according to the internal reference and the transformation matrix.
Optionally, the pose marking apparatus may further include:
a part model acquisition module for acquiring a three-dimensional object model set for a target object in advance and a three-dimensional part model set for a target part in advance;
and the three-dimensional model obtaining module is used for combining the object model and the part model to obtain a three-dimensional model.
Optionally, the origin and orientation of the object coordinate system where the object model is located and of the object coordinate system where the part model is located are the same; and/or,
when the target part is a wrist, the part model is represented by a cylinder, and the width-thickness ratio of the cylinder is obtained based on the statistics of the size data of the wrist collected in advance.
The pose marking device provided by the embodiment of the invention displays a target image on a display screen through a target image display module, wherein the target image is obtained by projecting and rendering 3D points on a 3D model onto a part image, the 3D model corresponds to a target object which can be worn on a target part, and the part image comprises the target part; responding to the control operation acted on the display screen through a transformation matrix generation module, and generating a transformation matrix which can be used for transforming the 3D model after the control operation from an object coordinate system of the 3D model to an equipment coordinate system of shooting equipment of the position image; projecting the 3D points to the region image again through the target image updating module based on the transformation matrix, rendering the projected 2D points to the region image, and then updating and displaying the target image according to a rendering result so that a user can determine whether to continuously control the 3D model; and responding to a pose marking completion instruction through a pose obtaining module, and obtaining the positions of the projected 2D points in the target image so as to obtain the pose of the target part according to the positions. According to the device, the transformation matrix of the 3D model is generated through the control operation triggered in the labeling process, the 3D points on the 3D model are projected onto the position image based on the transformation matrix, and the relative poses among the 3D points are fixed and unchanged, so that when the poses are obtained based on the positions of the projected 2D points in the target image when the pose labeling is completed, the corresponding poses of different target images (or the position images) have uniformity, and the labeling quality is ensured; meanwhile, for each part image, the positions of the 2D points are not required to be sequentially marked, and the pose marking process can be completed only through limited times of control operation, so that the marking efficiency is ensured.
The pose marking device provided by the embodiment of the invention can execute the pose marking method provided by any embodiment of the invention, and has the functional modules corresponding to the executed method together with its beneficial effects.
It should be noted that, in the embodiment of the pose marking apparatus, the included units and modules are divided only according to functional logic, but the division is not limited thereto as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
FIG. 12 illustrates a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 12, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 executes the respective methods and processes described above, such as the pose labeling method.
In some embodiments, the pose annotation method can be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the pose labeling method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the pose annotation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package that runs partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A pose labeling method is characterized by comprising the following steps:
displaying a target image on a display screen, wherein the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a part image, the three-dimensional model corresponds to a target object wearable on a target part, and the part image includes the target part;
generating a transformation matrix in response to a control operation acting on the display screen;
projecting the three-dimensional points onto the part image again based on the transformation matrix, rendering the two-dimensional points obtained by projection onto the part image, and updating and displaying the target image according to a rendering result;
and responding to a pose marking completion instruction, and acquiring the position of the projected two-dimensional point in the target image so as to obtain the pose of the target part according to the position.
2. The method of claim 1, wherein generating a transformation matrix in response to a control operation acting on the display screen comprises:
obtaining an offset vector and a rotation matrix in response to a control operation acting on the display screen;
and generating a transformation matrix according to the offset vector and the rotation matrix.
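As an illustrative sketch (assuming NumPy and a column-vector convention; the function name is not from the claims), the transformation matrix can be assembled from the rotation matrix and the offset vector as follows:

```python
import numpy as np

def make_transform(rotation, offset):
    # 4x4 homogeneous matrix combining a 3x3 rotation matrix and a 3-vector offset
    # (column-vector convention: p_device = T @ p_object).
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = offset
    return T
```

Under this convention, a point in the object coordinate system is mapped toward the device coordinate system by left-multiplying it with the assembled matrix.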
3. The method of claim 2, wherein the control operation comprises a translation operation for translating the three-dimensional model;
the obtaining of the offset vector in response to the control operation acting on the display screen comprises:
acquiring a translation direction and a translation magnitude in response to the translation operation acting on the display screen;
and obtaining an offset vector according to the translation direction and the translation magnitude.
4. The method of claim 2, wherein the control operation comprises a rotation operation for rotating the three-dimensional model;
the obtaining of the rotation matrix in response to the control operation acting on the display screen includes:
obtaining a rotation axis and a rotation angle in response to the rotation operation acting on the display screen;
and obtaining a rotation matrix according to the rotation axis and the rotation angle.
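For illustration, one standard way to obtain a rotation matrix from a rotation axis and a rotation angle is Rodrigues' formula; a sketch assuming NumPy, with the angle in radians:

```python
import numpy as np

def rotation_from_axis_angle(axis, angle):
    # Rodrigues' formula: R = I + sin(angle) * K + (1 - cos(angle)) * K @ K,
    # where K is the skew-symmetric cross-product matrix of the unit axis.
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0,  -k[2],  k[1]],
                  [k[2],  0.0,  -k[0]],
                  [-k[1], k[0],  0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
```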
5. The method of claim 4, further comprising:
acquiring a predefined sphere model matched with the target image;
obtaining a rotation axis and a rotation angle in response to the rotation operation acting on the display screen, including:
acquiring a rotation starting point and a rotation ending point of the rotation operation on the target image in response to the rotation operation acting on the display screen;
determining a starting vector corresponding to the rotation starting point on the sphere model and an ending vector corresponding to the rotation ending point on the sphere model;
and obtaining a rotating shaft and a rotating angle according to the starting vector and the ending vector.
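A sketch of deriving the rotation axis and rotation angle from the starting vector and the ending vector (assuming NumPy; the handling of near-parallel vectors is an added assumption):

```python
import numpy as np

def axis_angle_from_vectors(v_start, v_end):
    # Axis = normalized cross product; angle from the dot product and the cross norm.
    a = v_start / np.linalg.norm(v_start)
    b = v_end / np.linalg.norm(v_end)
    axis = np.cross(a, b)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:                       # vectors (anti)parallel: treat as no rotation
        return np.array([0.0, 0.0, 1.0]), 0.0
    angle = np.arctan2(norm, np.dot(a, b))
    return axis / norm, angle
```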
6. The method of claim 5, wherein:
the origin of the sphere model is located at the center of the target image, the radius of the sphere model is half of the smaller of the width and the height of the target image, an xyz coordinate system where the sphere model is located is constructed on the basis of the x direction, the y direction and the z direction, and an xy plane formed by the x direction and the y direction coincides with the uv plane where the target image is located;
the determining a starting vector corresponding to the rotation starting point on the sphere model comprises:
obtaining an x component according to the coordinate of the rotation starting point in the x direction and the width;
obtaining a y component according to the coordinate of the rotation starting point in the y direction and the height;
obtaining a z component according to the radius, the x component and the y component;
and obtaining a starting vector corresponding to the rotation starting point on the sphere model according to the x component, the y component and the z component.
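An arcball-style mapping consistent with this structure is sketched below (assuming NumPy; the exact normalization and the clamping of points that fall outside the sphere are assumptions, as the claim leaves them open):

```python
import numpy as np

def screen_to_sphere(u, v, width, height):
    # Sphere origin at the image center; radius is half the smaller image dimension.
    r = min(width, height) / 2.0
    x = u - width / 2.0                   # x component from the u coordinate and the width
    y = v - height / 2.0                  # y component from the v coordinate and the height
    d2 = x * x + y * y
    if d2 <= r * r:
        z = np.sqrt(r * r - d2)           # z component from the radius, x and y
    else:
        z = 0.0                           # outside the sphere: clamp onto its equator
        scale = r / np.sqrt(d2)
        x, y = x * scale, y * scale
    return np.array([x, y, z])
```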
7. The method of claim 1, wherein said re-projecting the three-dimensional points onto the part image based on the transformation matrix comprises:
acquiring a labeling result for the three-dimensional model, and determining a labeled point from the three-dimensional points based on the labeling result;
and re-projecting the labeled point onto the part image based on the transformation matrix.
8. The method of claim 1, wherein said re-projecting the three-dimensional points onto the part image based on the transformation matrix comprises:
determining intrinsic parameters of the capturing device of the part image according to the width and the height of the part image;
and re-projecting the three-dimensional points onto the part image according to the intrinsic parameters and the transformation matrix.
9. The method of claim 1, further comprising:
acquiring a three-dimensional object model preset for the target object and a three-dimensional part model preset for the target part;
and combining the object model and the part model to obtain the three-dimensional model.
10. The method according to claim 9, wherein the object coordinate system in which the object model is located and the object coordinate system in which the part model is located have the same origin and orientation; and/or,
when the target part is a wrist, the part model is represented by a cylinder, and the width-thickness ratio of the cylinder is obtained from statistics on wrist size data collected in advance.
11. A pose labeling apparatus, comprising:
a target image display module, configured to display a target image on a display screen, where the target image is obtained by projecting and rendering three-dimensional points on a three-dimensional model onto a part image, the three-dimensional model corresponds to a target object that can be worn on a target part, and the part image includes the target part;
a transformation matrix generation module for generating a transformation matrix in response to a control operation acting on the display screen;
the target image updating module is used for projecting the three-dimensional points to the part image again based on the transformation matrix, rendering the two-dimensional points obtained by projection to the part image, and updating and displaying the target image according to a rendering result;
and the pose obtaining module is used for responding to a pose marking finishing instruction, obtaining the position of the projected two-dimensional point in the target image and obtaining the pose of the target part according to the position.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the pose labeling method according to any one of claims 1 to 10.
13. A computer-readable storage medium characterized in that the computer-readable storage medium stores computer instructions for causing a processor to implement, when executed, the pose annotation method according to any one of claims 1 to 10.
CN202211668238.2A 2022-12-23 2022-12-23 Pose marking method and device, electronic equipment and storage medium Pending CN115937486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211668238.2A CN115937486A (en) 2022-12-23 2022-12-23 Pose marking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211668238.2A CN115937486A (en) 2022-12-23 2022-12-23 Pose marking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115937486A (en) 2023-04-07

Family

ID=86652228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211668238.2A Pending CN115937486A (en) 2022-12-23 2022-12-23 Pose marking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115937486A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination