CN114973304A - Human body posture estimation method and device, terminal equipment and storage medium - Google Patents

Info

Publication number
CN114973304A
CN114973304A (application CN202110204533.1A)
Authority
CN
China
Prior art keywords
human body
posture estimation
training data
body frame
picture
Prior art date
Legal status
Pending
Application number
CN202110204533.1A
Other languages
Chinese (zh)
Inventor
赵晨晨
李渊
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd
Priority: CN202110204533.1A
Publication: CN114973304A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Abstract

The application is applicable to the technical field of image processing, and provides a human body posture estimation method and device, a terminal device and a storage medium. The method comprises the following steps: obtaining a trained posture estimation model, wherein the trained posture estimation model is obtained by training based on normal portrait training data and incomplete portrait training data; and performing human body posture estimation on a picture to be detected through the trained posture estimation model to obtain a human body posture estimation result. The training data are transformed: additional incomplete portrait training data are generated on the basis of the normal portrait training data, without collecting extra training data, and the trained posture estimation model is obtained by training on the transformed data. Because the trained posture estimation model has higher recognition precision and stronger robustness, the poor recognition of incomplete portraits by existing posture estimation models can be remedied without affecting the human body posture estimation of normal portraits.

Description

Human body posture estimation method and device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to a human body posture estimation method and device, a terminal device and a storage medium.
Background
Human skeleton keypoint detection, also called pose estimation, detects keypoint information of the human body, such as the limb joints, head and neck, and describes the human skeleton through these keypoints. Human body posture recognition has a wide range of applications and can be used in fields such as human-computer interaction, film and television production, motion analysis, and gaming and entertainment. By recognizing and locating the motion trajectories of human joint points and recording their motion data, 3D animation can imitate human motion for film and television production, motion can be analysed from the recorded trajectories and data, and human-computer interaction, gaming and entertainment, and the like can be realized.
Human skeleton keypoint detection is a multi-stage task comprising target detection, keypoint detection, segmentation and so on. 2D human skeleton keypoint detection algorithms basically follow one of two approaches: top-down and bottom-up. In 2D multi-person keypoint detection (multi-person pose estimation), the top-down method first performs target human body detection and then performs single-person keypoint detection (single-person pose estimation) on each detected person; the bottom-up method first detects the keypoints of all persons and then groups and associates the keypoints. Generally, the top-down method is more accurate, while the bottom-up method is faster.
The top-down human body posture estimation method achieves high precision at the cost of speed, which makes it unwieldy in engineering use, and its time consumption grows with the number of persons to be estimated. Moreover, in real scenes some pictures contain only incomplete portraits, or even only half of a human body, and the top-down method performs only moderately in such scenes.
Disclosure of Invention
The embodiment of the application provides a human body posture estimation method, a human body posture estimation device, terminal equipment and a storage medium.
In a first aspect, an embodiment of the present application provides a method for estimating a human body pose, including:
obtaining a trained posture estimation model, wherein the trained posture estimation model is obtained by training based on normal portrait training data and incomplete portrait training data;
and carrying out human body posture estimation on the picture to be detected through the trained posture estimation model to obtain a human body posture estimation result of the picture to be detected.
In a second aspect, an embodiment of the present application provides a human body posture estimation device, including:
the acquisition module is used for acquiring a trained posture estimation model, and the trained posture estimation model is obtained by training based on normal portrait training data and incomplete portrait training data;
and the posture estimation module is used for carrying out human posture estimation on the picture to be detected through the trained posture estimation model to obtain a human posture estimation result of the picture to be detected.
In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the human body posture estimation method in the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program implements the human body posture estimation method in the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the human body posture estimation method in any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Compared with the prior art, the embodiments of the application have the following advantages: the training data are transformed, that is, more incomplete portrait training data are generated on the basis of the normal portrait training data without adding extra training data, and the trained posture estimation model is obtained by training on the transformed data. Because the trained posture estimation model has higher recognition precision and stronger robustness, the poor recognition of incomplete portraits by existing posture estimation models can be remedied without affecting the human body posture estimation of normal portraits.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart of a human body posture estimation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an initial body frame in a body posture estimation method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a pose estimation model in a human pose estimation method according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for estimating a human body pose according to another embodiment of the present application;
fig. 5 is a schematic structural diagram of a human body posture estimation device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 shows a flow diagram of the human body posture estimation method provided by the present application. By way of example and not limitation, the method is applicable to various scenarios such as human-computer interaction, film and television production, motion analysis, and gaming and entertainment, and is executed by a corresponding human body posture estimation device, implemented in software and/or hardware and generally integrated in a terminal device such as a game console, a virtual reality device, or a motion data acquisition device.
As shown in fig. 1, the human body posture estimation method includes the following steps:
and S101, obtaining a trained attitude estimation model.
The trained pose estimation model is trained based on normal portrait training data and incomplete portrait training data.
A plurality of training pictures containing human body images are prepared. Each human body image in a picture is marked with a rectangular box to form an initial body frame, and human body keypoints are labeled within each initial body frame; the keypoints include, but are not limited to, the head, neck, hands, feet and major joints.
The initial body frame may contain a complete human body image, or an incomplete one, for example because the body is occluded, was not fully captured, or the initial body frame was labeled inaccurately.
To increase the number of training samples without collecting new picture training data, incomplete-portrait transformation is performed on the labeled initial body frames in the picture training data: a labeled initial body frame is shrunk so that the human body image inside the labeled new body frame is incomplete.
Complete human body images in initial body frames containing all human body keypoints are classified as normal portrait training data; incomplete human body images in new body frames containing only some of the keypoints are classified as incomplete portrait training data.
By increasing the incomplete portrait training data through transformation, the posture estimation model can be made robust to both complete and incomplete portraits, and transforming the labeled initial body frame also appropriately weakens the influence of the body frame on the estimation result of the posture estimation model.
The posture estimation model is constructed as a neural network model. It is first trained with the picture training data comprising the labeled initial body frames and keypoints, and then trained with the new picture training data comprising the labeled new body frames and keypoints, to obtain the trained posture estimation model.
S102, carrying out human body posture estimation on the picture to be detected through the trained posture estimation model to obtain a human body posture estimation result of the picture to be detected.
Target human body detection is completed through a target detection algorithm, and a body frame to be detected is output; candidate target detection models include the YOLO series, the SSD series, the anchor-free series, and so on.
Keypoint detection is performed on the body frame to be detected using the trained posture estimation model, and the coordinates of the human body keypoints are output.
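By way of example and not limitation, the two-stage top-down flow described above (detect body frames, then infer keypoints per frame) can be sketched as follows. The function and parameter names are illustrative, not from the patent; `detect_bodies` and `detect_keypoints` are placeholders standing in for a real person detector and the trained posture estimation model:

```python
def estimate_poses(image, detect_bodies, detect_keypoints):
    """Top-down inference loop: run a person detector to get body
    frames, then run the pose model on each detected frame.
    Returns a list of (body_frame, keypoints) pairs."""
    results = []
    for box in detect_bodies(image):
        results.append((box, detect_keypoints(image, box)))
    return results
```

In this sketch the per-person cost is explicit: total time grows linearly with the number of detected persons, which is the engineering drawback of top-down methods noted in the background section.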
As a specific implementation, the human body posture estimation method is as follows:
s201, incomplete portrait transformation is carried out on the initial body frame marked in the picture training data, and a new marked body frame is obtained.
The labeled initial body frame in the picture training data is cropped randomly. Whether the initial body frame contains a complete or an incomplete human body image, cropping produces a new body frame, so that an incomplete portrait may appear in the human body image within the new body frame.
Wherein, carry out random cropping to the initial human frame that has marked in the picture training data, include: and determining the cutting position and the cutting size through the generated random number, and cutting the initial human body frame according to the cutting position and the cutting size. As a possible implementation manner, whether to crop and the cropping position are determined by a random integer generated within a certain value range, and the cropping size is determined by a random floating point number generated within a certain value range.
Fig. 2 is a schematic diagram of an initial body frame in the human body posture estimation method provided in this embodiment. By way of example and not limitation, referring to Fig. 2, the initial body frame is placed in an XoY Cartesian coordinate system and represented by the coordinates (X_min, Y_min, X_max, Y_max). A random integer is generated in the range [0, 9]: 0 or 1 increases the value of X_min while Y_min, X_max and Y_max are unchanged (the left boundary of the initial body frame shifts right; left keypoints are lost); 2 or 3 decreases the value of X_max while X_min, Y_min and Y_max are unchanged (the right boundary shifts left; right keypoints are lost); 4 or 5 increases the value of Y_min while X_min, X_max and Y_max are unchanged (the upper boundary shifts down; upper keypoints are lost); 6 or 7 decreases the value of Y_max while X_min, Y_min and X_max are unchanged (the lower boundary shifts up; lower keypoints are lost); 8 or 9 corresponds to no cropping, with all coordinate values unchanged. A random floating-point number is generated in the range [0, 0.25]: this number multiplied by the width of the initial body frame equals the change in X_min or X_max, and multiplied by the height of the initial body frame equals the change in Y_min or Y_max; that is, it gives the crop size of the corresponding side of the initial body frame. The new body frame is represented by the coordinates (new_X_min, new_Y_min, new_X_max, new_Y_max).
The value range of the random floating-point number can be enlarged or reduced according to the desired degree of cropping: the larger the value, the more is cropped and the more incomplete the portrait. The top, bottom, left and right of the initial body frame can be cropped at one place or at several places simultaneously; the value range of the random integer can be set as needed, with one digit corresponding to one place, two digits to two places, and so on. The more places are cropped, the more incomplete the portrait. The value range of the random number is determined according to the required recognition precision for incomplete portraits. For example, set the random-integer range to [00, 99] and the random-float range to [0, 0.1]. When the generated random integer is 15 and the random float is 0.05, the values of X_min and Y_min increase while X_max and Y_max are unchanged: the left boundary of the initial body frame shifts right by 0.05 times its width, while the upper boundary shifts down by 0.05 times its height.
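By way of example and not limitation, the single-place variant of the cropping scheme above (a random integer in [0, 9] selecting the side, a random float in [0, 0.25] selecting the size) can be sketched in Python as follows; the function name and defaults are illustrative, not from the patent:

```python
import random

def random_crop_box(box, max_ratio=0.25):
    """Randomly crop one side of a labeled body frame.

    box is (x_min, y_min, x_max, y_max). A random integer in [0, 9]
    picks the side to crop (8 or 9 means no crop), and a random float
    in [0, max_ratio] sets the crop size as a fraction of the frame's
    width or height, mirroring the scheme described above."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    side = random.randint(0, 9)
    ratio = random.uniform(0.0, max_ratio)
    if side in (0, 1):    # left boundary shifts right: left keypoints lost
        x_min += ratio * w
    elif side in (2, 3):  # right boundary shifts left: right keypoints lost
        x_max -= ratio * w
    elif side in (4, 5):  # upper boundary shifts down: upper keypoints lost
        y_min += ratio * h
    elif side in (6, 7):  # lower boundary shifts up: lower keypoints lost
        y_max -= ratio * h
    # side 8 or 9: no crop, all coordinates unchanged
    return (x_min, y_min, x_max, y_max)
```

The multi-digit variant (two digits cropping two places, etc.) follows by drawing one digit and one float per place to crop.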
Alternatively, body parts inside the initial body frame can be cropped randomly: one or more of the head-and-neck region, hand region, foot region and leg region are randomly cropped to generate a new body frame.
A large number of training samples can be obtained by performing one or more incomplete-portrait transformations on each labeled initial body frame.
Furthermore, the labeled new body frame is extracted from the picture training data, and the image inside it is numerically normalized: the value of each channel is divided by 255 so that it falls in the interval [0, 1], which makes the input easier for the posture estimation model to learn.
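By way of example and not limitation, the per-channel normalization described above (each value divided by 255) can be sketched as follows; the nested-list representation is illustrative only:

```python
def normalize_image(channels):
    """Divide each channel value by 255 so all inputs fall in [0, 1],
    as described for the image inside the labeled new body frame."""
    return [[v / 255.0 for v in ch] for ch in channels]
```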
And normalizing the dimension of the marked new human body frame according to the input requirement of the posture estimation model.
And determining a coding rule according to the value range of the random number, and coding the coordinates of the labeled human key points in the labeled new human frame with normalized size to obtain new training data.
Specifically, the coordinates of the body keypoints are expressed as (p1_x, p1_y, p2_x, p2_y, …, pi_x, pi_y). Following the above example, when the random float ranges over [0, 0.25], in the extreme case all four sides of the initial body frame are cropped with a crop size of 0.25, i.e. 1/4 is removed from each of the top, bottom, left and right; the width and height of the resulting new body frame are then half (0.5 times) the width and height of the initial body frame, and the encoding rule is expressed as:
normalize_pi_x = 0.25 + pi_x * 0.5 / (new_X_max - new_X_min),
normalize_pi_y = 0.25 + pi_y * 0.5 / (new_Y_max - new_Y_min).
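By way of example and not limitation, the encoding rule above can be sketched as follows, assuming the extreme case it describes (offset 0.25, scale 0.5); the function name and the generalized `pad`/`scale` parameters are illustrative:

```python
def encode_keypoints(keypoints, new_box, pad=0.25, scale=0.5):
    """Encode keypoint coordinates relative to the cropped body frame.

    Implements normalize_p = pad + p * scale / frame_side for each
    coordinate, where keypoints are (x, y) pairs measured from the
    new body frame's origin and new_box is
    (new_x_min, new_y_min, new_x_max, new_y_max)."""
    nx_min, ny_min, nx_max, ny_max = new_box
    w = nx_max - nx_min
    h = ny_max - ny_min
    return [(pad + x * scale / w, pad + y * scale / h) for x, y in keypoints]
```

With this encoding, keypoints inside the new body frame map into [0.25, 0.75], leaving the outer margins of [0, 1] available for keypoints that lie outside the cropped frame.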
S202, training the posture estimation model by using the new picture training data comprising the labeled new human body frame and the human body key points to obtain the trained posture estimation model.
Fig. 3 is a schematic structural diagram of the posture estimation model in the human body posture estimation method provided in this embodiment. The posture estimation model is designed as a lightweight structure: a MobileNet50 x 0.5 backbone neural network is adopted, a high-resolution feature map and a low-resolution feature map are convolved separately, and fitting training is performed in a regression manner.
The posture estimation model can be trained on complete portraits through the labeled initial body frames and the keypoints inside them, and on incomplete portraits through the new picture training data. After training, the model is quantized and exported in an 8-bit format.
S203, carrying out human body posture estimation on the picture to be detected through the trained posture estimation model to obtain a human body posture estimation result of the picture to be detected.
Target human body detection is completed through a target detection algorithm, and a body frame to be detected, represented by the coordinates (X1, Y1, X2, Y2), is output. At this point an incomplete portrait may exist in the body frame to be detected, either because part of the human body in the image is occluded or because the target detection algorithm is not accurate enough and the frame does not fully enclose the portrait.
After the body frame to be detected is extracted from the picture to be detected, the image inside it is numerically normalized, and the normalized image is passed through the trained posture estimation model to infer the predicted keypoint coordinates (P1_X, P1_Y, P2_X, P2_Y, …, Pn_X, Pn_Y).
And decoding the predicted coordinates according to the coding rule to obtain the actual coordinates of the key points of the human body in the picture to be detected. The decoding process of the actual coordinates is expressed as:
True_Pn_X=(Pn_X-0.25)*((X2-X1)/0.5)+X1;
True_Pn_Y=(Pn_Y-0.25)*((Y2-Y1)/0.5)+Y1。
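By way of example and not limitation, the decoding formula above can be sketched as follows; as in the encoding sketch, the `pad`/`scale` parameters generalize the constants 0.25 and 0.5 and are illustrative:

```python
def decode_keypoints(preds, det_box, pad=0.25, scale=0.5):
    """Map predicted normalized keypoints back to image coordinates.

    det_box is the detected body frame (X1, Y1, X2, Y2); preds are
    the model's normalized (x, y) outputs. Implements
    True_X = (Pn_X - 0.25) * ((X2 - X1) / 0.5) + X1 and the
    analogous formula for Y."""
    x1, y1, x2, y2 = det_box
    return [((px - pad) * ((x2 - x1) / scale) + x1,
             (py - pad) * ((y2 - y1) / scale) + y1) for px, py in preds]
```

Note that a prediction below 0.25 (or above 0.75) decodes to a coordinate outside the detected frame, which is exactly how keypoints cut off by the frame are recovered.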
through decoding, even if the complete human body image is selected out of the frame of the human body to be detected, the coordinates of the human body key points outside the frame of the human body to be detected can be decoded, the coordinates of the human body key points to be located in the human body part shielded in the frame of the human body to be detected can be predicted, and complete posture recognition of the incomplete portrait is achieved. The encoding and decoding method is simpler and the operation speed is higher.
In this embodiment, no extra training data needs to be added: the training samples are increased by transforming the body frames of the existing training data, which also weakens the influence of the body frame on the posture estimation model and improves the estimation of incomplete portraits without affecting the estimation of complete ones. The posture estimation model, based on a lightweight structure, is fitted and trained in a regression manner, so that it is robust to both complete and incomplete portraits.
As a preferred implementation, the above embodiment is optimized; details identical to the above are not repeated. As shown in the flowchart of Fig. 4, the human body posture estimation method includes the following steps:
s301, image transformation is carried out on the picture training data.
To account for the different environments, illumination conditions and angles of real usage scenes, random image transformation is performed on the picture training data. The image transformation comprises at least one of the following data-enhancement operations: color transformation, contrast transformation, brightness transformation and rotation transformation, which increases the corresponding robustness of the posture estimation model.
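By way of example and not limitation, a minimal brightness-and-contrast jitter in the spirit of the transformations above can be sketched as follows; the patent does not specify jitter ranges, so the ones here are illustrative assumptions, and the flat list of pixel values is a simplified stand-in for an image:

```python
import random

def augment_pixels(img, rng=random):
    """Apply a random brightness shift and contrast gain to an image
    given as a flat list of pixel values in [0, 255]. The jitter
    ranges below are illustrative, not taken from the patent."""
    brightness = rng.uniform(-30.0, 30.0)  # additive shift (assumed range)
    contrast = rng.uniform(0.8, 1.2)       # gain about the mean (assumed range)
    mean = sum(img) / len(img)
    out = []
    for v in img:
        v = (v - mean) * contrast + mean + brightness
        out.append(min(255.0, max(0.0, v)))  # clamp back to valid range
    return out
```

Color and rotation transforms would be applied analogously, each drawn at random per training sample so that repeated epochs see varied inputs.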
S302, incomplete portrait transformation is carried out on the initial body frame marked in the picture training data, and a new marked body frame is obtained.
The step S301 and the step S302 may be interchanged, and the image transformation may be performed on the picture training data first, and then the incomplete portrait transformation may be performed on the initial body frame marked in the picture training data after the image transformation; or incomplete portrait transformation can be performed on the initial body frame marked in the picture training data to obtain a new marked body frame, and then image transformation is performed on the image in the new marked body frame.
The labeled new body frame is extracted from the picture training data and the image inside it is numerically normalized. After the picture training data undergo at least one of color, contrast, brightness or rotation transformation, the corresponding normalized values also change, which helps enrich the training samples of the posture estimation model.
The dimensions of the new image-transformed body frame are normalized according to the input requirements of the pose estimation model.
And determining a coding rule according to the value range of the random number, coding the coordinates of the labeled human key points in the new human body frame with the normalized size, and acquiring new training data.
S303, judging whether the loss number of the human body key points in the marked new human body frame is larger than a preset value or not, if so, executing a step S304, otherwise, executing a step S305.
The maximum allowed number of lost keypoints is preset according to the required recognition precision for incomplete portraits. The larger the preset value, the more parts of the portrait in a new body frame may be missing, and the better the trained posture estimation model detects the keypoints of incomplete portraits; of course, model training time or keypoint detection time may lengthen correspondingly.
For example, if a complete human body has 15 keypoints and the preset allowed loss after cropping is 4, then when fewer than 11 keypoints remain in the new body frame the crop is judged excessive: the crop is cancelled, the labeled initial body frame is set as the new body frame, and that frame is used to train the posture estimation model.
S304, canceling the cutting, setting the marked initial human body frame as a marked new human body frame, and continuing to execute the step S305.
If the number of keypoints lost in the labeled new body frame is greater than the preset value, the new picture training data generated by this crop is considered not to meet the training requirement; the crop is cancelled, and the labeled initial body frame is set as the new body frame.
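By way of example and not limitation, the filtering rule of steps S303 and S304 can be sketched as follows; the function name is illustrative, and the defaults of 15 keypoints and 4 allowed losses follow the example given above:

```python
def filter_crop(initial_box, new_box, n_keypoints_in_new,
                n_total=15, max_lost=4):
    """Reject over-aggressive crops.

    If cropping removed more than max_lost of the n_total labeled
    keypoints, the crop is cancelled and the labeled initial body
    frame is used as the new body frame; otherwise the cropped
    frame is kept for training."""
    if n_total - n_keypoints_in_new > max_lost:
        return initial_box  # crop cancelled: fall back to the initial frame
    return new_box
```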
S305, training the posture estimation model by using the new picture training data comprising the labeled new human body frame and the human body key points to obtain the trained posture estimation model.
S306, carrying out human body posture estimation on the picture to be detected through the trained posture estimation model to obtain a human body posture estimation result of the picture to be detected.
On the basis of the above embodiment, this embodiment performs image transformation on the training data, covering color, illumination, rotation and other aspects, so that the trained posture estimation model performs better in many real application scenes. To avoid unnecessary training, this embodiment can also filter out body frames that are cropped too heavily, according to the required posture estimation quality, which brings it closer to practical application scenarios.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 5 shows a block diagram of a human body posture estimation apparatus provided in the embodiment of the present application, and only shows relevant parts in the embodiment of the present application for convenience of description.
Referring to fig. 5, the apparatus includes:
the acquisition module is used for acquiring a trained posture estimation model, and the trained posture estimation model is obtained by training based on normal portrait training data and incomplete portrait training data;
and the posture estimation module is used for carrying out human posture estimation on the picture to be detected through the trained posture estimation model to obtain a human posture estimation result of the picture to be detected.
The obtaining module comprises a sample transformation unit and a model training unit. The sample transformation unit performs incomplete portrait transformation on the labeled initial human body frame in the picture training data to obtain a labeled new human body frame; the model training unit trains the posture estimation model with the new picture training data comprising the labeled new human body frame and the human body key points, to obtain the trained posture estimation model.
The sample transformation unit is specifically configured to randomly crop the labeled initial human body frame in the picture training data, which comprises:

determining a cropping position and a cropping size from generated random numbers, and cropping the labeled initial human body frame in the picture training data according to the cropping position and the cropping size. As one possible implementation, a random integer generated within a certain value range determines whether to crop and where to crop, and a random floating-point number generated within a certain value range determines the cropping size.
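The random-number scheme above can be sketched as follows. The concrete value ranges (an integer in 0 to 5 for the crop decision, a keep fraction in 0.6 to 0.9) are assumptions for this sketch; the patent only says the numbers are drawn "within a certain value range".

```python
import random

def random_crop_box(box, rng=None):
    """Randomly crop one side of a labeled human body box.

    A random integer decides whether to crop and on which side
    (4 or 5 means "no crop"); a random float sets the fraction of
    the box that is kept. Both ranges are illustrative assumptions.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = box
    choice = rng.randint(0, 5)    # 0=top, 1=bottom, 2=left, 3=right, 4/5=no crop
    if choice >= 4:
        return box
    keep = rng.uniform(0.6, 0.9)  # fraction of the box to retain
    w, h = x2 - x1, y2 - y1
    if choice == 0:
        y1 = y2 - h * keep        # cut away part of the top
    elif choice == 1:
        y2 = y1 + h * keep        # cut away part of the bottom
    elif choice == 2:
        x1 = x2 - w * keep        # cut away part of the left
    else:
        x2 = x1 + w * keep        # cut away part of the right
    return (x1, y1, x2, y2)
```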
Further, the apparatus also comprises a transformation filtering module for filtering the result of the sample transformation unit after the labeled new human body frame is obtained. Specifically: if the number of human body key points lost in the labeled new human body frame is greater than a preset value, the cropping is cancelled and the labeled initial human body frame is set as the labeled new human body frame.
The apparatus further comprises a normalization coding module. Before the new picture training data is used to train the posture estimation model, the normalization coding module extracts the labeled new human body frame from the picture training data and performs numerical normalization on the image inside the frame; standardizes the size of the labeled new human body frame according to the input requirement of the posture estimation model; and determines a coding rule according to the value range of the random number, encoding the coordinates of the labeled human body key points in the size-standardized frame to obtain new training data.
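A minimal sketch of this normalization and encoding step, under assumed conventions the patent does not spell out: pixel values are scaled to [0, 1], and keypoint coordinates are encoded as fractions of the standardized model input size so the model regresses values in [0, 1]. Actual image resizing (e.g. with `cv2.resize`) is omitted.

```python
import numpy as np

def normalize_and_encode(crop, keypoints, input_hw=(256, 192)):
    """Normalize a cropped body-frame image and encode its keypoints.

    `crop` is an (H, W, C) uint8 image; `keypoints` is an (N, 2) array
    of (x, y) in crop pixels. The [0, 1] encoding convention and the
    input size are illustrative assumptions.
    """
    h, w = crop.shape[:2]
    img = crop.astype(np.float32) / 255.0                  # numerical normalization
    in_h, in_w = input_hw
    scale = np.array([in_w / w, in_h / h])                 # crop -> model input size
    encoded = keypoints * scale / np.array([in_w, in_h])   # then to fractions in [0, 1]
    return img, encoded
```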
Further, the apparatus comprises an image transformation module. Before the labeled new human body frame is obtained, the image transformation module performs image transformation on the picture training data, the image transformation comprising at least one of color transformation, contrast transformation, brightness transformation and rotation transformation.
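As one concrete instance of these transformations, a random brightness/contrast jitter can be sketched as below; the factor ranges (contrast 0.8 to 1.2, brightness plus or minus 20) are assumptions, not values from the patent.

```python
import numpy as np

def jitter_image(img, rng=None):
    """Apply a random brightness/contrast jitter to a uint8 image.

    A single example of the image transformations mentioned above;
    the jitter ranges are illustrative assumptions.
    """
    rng = rng or np.random.default_rng(0)
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-20.0, 20.0)
    out = img.astype(np.float32) * contrast + brightness
    return np.clip(out, 0.0, 255.0).astype(np.uint8)  # keep valid pixel range
```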
The posture estimation module is specifically configured to: after a human body frame to be detected is extracted from the picture to be detected, obtain the predicted coordinates of the human body key points through the trained posture estimation model; and decode the predicted coordinates according to the coding rule to obtain the actual coordinates of the human body key points in the picture to be detected, these actual coordinates constituting the human body posture estimation result of the picture to be detected.
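The decoding step can be sketched as below, assuming the coding rule expresses predictions as fractions in [0, 1] of the detected body frame; decoding is then the inverse affine map into that frame. The `(x1, y1, x2, y2)` box format is an assumption for this sketch.

```python
import numpy as np

def decode_keypoints(pred, box):
    """Map predicted keypoint coordinates in [0, 1] back into the
    detected body frame, yielding pixel coordinates in the picture.

    `pred` is an (N, 2) array of normalized (x, y) predictions;
    `box` is the detected frame (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = box
    size = np.array([x2 - x1, y2 - y1], dtype=np.float32)
    origin = np.array([x1, y1], dtype=np.float32)
    return pred * size + origin  # inverse of the fraction-of-box encoding
```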
It should be noted that the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application; their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the device is divided into different functional units or modules, so as to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
An embodiment of the present application further provides a terminal device, and with reference to fig. 6, the terminal device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiment does not limit the specific type of the terminal device: the human body posture estimation method provided by the embodiment of the present application can be applied to terminal devices such as a motion sensing game machine, an Augmented Reality (AR)/Virtual Reality (VR) device, and a motion data acquisition device.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps that can be implemented in the above method embodiments.
Embodiments of the present application further provide a computer program product which, when run on the terminal device of the foregoing embodiments, enables the terminal device to implement the steps of the foregoing method embodiments.
The units integrated in the human body posture estimation device provided by the above embodiment may, if implemented in the form of software functional units and sold or used as independent products, be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the method embodiments described above can be completed by a computer program instructing the related hardware; the computer program can be stored in a computer readable storage medium, and when executed by a processor it implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal apparatus, a recording medium, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the above-described apparatus/device embodiments are merely illustrative, and for example, a module or a unit may be divided into only one type of logic function, and may be implemented in other ways, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (11)

1. A human body posture estimation method is characterized by comprising the following steps:
obtaining a trained posture estimation model, wherein the trained posture estimation model is obtained by training based on normal portrait training data and incomplete portrait training data;
and carrying out human body posture estimation on the picture to be detected through the trained posture estimation model to obtain a human body posture estimation result of the picture to be detected.
2. The method of claim 1, wherein the obtaining a trained pose estimation model comprises:
performing incomplete portrait transformation on the marked initial human body frame in the picture training data to obtain a marked new human body frame;
and training a posture estimation model by using the new picture training data comprising the labeled new human body frame and the human body key points to obtain the trained posture estimation model.
3. The method of claim 2, wherein the incomplete portrait transformation of the labeled initial human body frame in the picture training data comprises:
randomly cropping the labeled initial human body frame in the picture training data.
4. The method of claim 3, wherein the randomly cropping the labeled initial human body frame in the picture training data comprises:
determining a cropping position and a cropping size through the generated random number;
and cropping the labeled initial human body frame in the picture training data according to the cropping position and the cropping size.
5. The method of claim 4, wherein after the incomplete portrait transformation is performed on the labeled initial human body frame in the picture training data to obtain a labeled new human body frame, the method further comprises:
and if the number of human body key points lost in the labeled new human body frame is greater than a preset value, canceling the cropping, and setting the labeled initial human body frame as the labeled new human body frame.
6. The method of claim 5, wherein prior to training a pose estimation model with new picture training data comprising the labeled new body box and body keypoints, the method further comprises:
extracting the labeled new human body frame from the picture training data, and carrying out numerical value normalization processing on the image in the labeled new human body frame;
normalizing the size of the labeled new human body frame according to the input requirement of the posture estimation model;
determining a coding rule according to the value range of the random number, and coding the coordinates of the labeled human key points in the labeled new human frame with normalized size to obtain new training data;
the training of the pose estimation model by using the new image training data including the labeled new human body frame and the human body key points to obtain the trained pose estimation model comprises the following steps:
and training the attitude estimation model by using the new training data to obtain the trained attitude estimation model.
7. The method of claim 2, wherein before the incomplete portrait transformation is performed on the labeled initial human body frame in the picture training data to obtain the labeled new human body frame, the method further comprises:
performing image transformation on the picture training data, wherein the image transformation comprises at least one of color transformation, contrast transformation, brightness transformation and rotation transformation.
8. The method of claim 6, wherein the obtaining the human body posture estimation result of the picture to be detected by performing the human body posture estimation on the picture to be detected through the trained posture estimation model comprises:
after a human body frame to be detected is extracted from a picture to be detected, obtaining the predicted coordinates of the key points of the human body through the trained posture estimation model;
and decoding the predicted coordinates according to the coding rule to obtain actual coordinates of key points of the human body in the picture to be detected, wherein the actual coordinates are the human body posture estimation result of the picture to be detected.
9. A human body posture estimation device, characterized by comprising:
the acquisition module is used for acquiring a trained posture estimation model, and the trained posture estimation model is obtained by training based on normal portrait training data and incomplete portrait training data;
and the posture estimation module is used for carrying out human posture estimation on the picture to be detected through the trained posture estimation model to obtain a human posture estimation result of the picture to be detected.
10. A terminal device, characterized in that the terminal device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202110204533.1A 2021-02-24 2021-02-24 Human body posture estimation method and device, terminal equipment and storage medium Pending CN114973304A (en)

Priority Applications (1)

Application Number: CN202110204533.1A (publication CN114973304A); Priority Date / Filing Date: 2021-02-24; Title: Human body posture estimation method and device, terminal equipment and storage medium


Publications (1)

Publication Number: CN114973304A; Publication Date: 2022-08-30

Family ID: 82973154

Family Applications (1)

Application Number: CN202110204533.1A; Title: Human body posture estimation method and device, terminal equipment and storage medium; Priority Date / Filing Date: 2021-02-24; Status: Pending

Country Status (1)

CN: CN114973304A (en)

Similar Documents

Publication Publication Date Title
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
CN111126272B (en) Posture acquisition method, and training method and device of key point coordinate positioning model
CN106650630B (en) A kind of method for tracking target and electronic equipment
Vazquez et al. Virtual and real world adaptation for pedestrian detection
CN110610453B (en) Image processing method and device and computer readable storage medium
CN110490896B (en) Video frame image processing method and device
CN108564120B (en) Feature point extraction method based on deep neural network
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN113312973B (en) Gesture recognition key point feature extraction method and system
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN111626912A (en) Watermark removing method and device
CN111178170B (en) Gesture recognition method and electronic equipment
CN115270184A (en) Video desensitization method, vehicle video desensitization method and vehicle-mounted processing system
CN111353069A (en) Character scene video generation method, system, device and storage medium
CN114419006A (en) Method and system for removing watermark of gray level video characters changing along with background
CN112200230A (en) Training board identification method and device and robot
KR20110087620A (en) Layout based page recognition method for printed medium
CN114973304A (en) Human body posture estimation method and device, terminal equipment and storage medium
CN114943747A (en) Image analysis method and device, video editing method and device, and medium
CN114782592A (en) Cartoon animation generation method, device and equipment based on image and storage medium
CN113159158A (en) License plate correction and reconstruction method and system based on generation countermeasure network
JP2023512359A (en) Associated object detection method and apparatus
CN117474932B (en) Object segmentation method and device, electronic equipment and storage medium
CN112819841B (en) Face region segmentation method and device, computer equipment and storage medium
Ren et al. Multi-scale convolutional feature fusion for 6D pose estimation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination