CN113297995A - Human body posture estimation method and terminal equipment - Google Patents


Info

Publication number
CN113297995A
Authority
CN
China
Prior art keywords
unit
thermodynamic diagram
convolution unit
feature fusion
human body
Prior art date
Legal status
Granted
Application number
CN202110603738.7A
Other languages
Chinese (zh)
Other versions
CN113297995B (en)
Inventor
林灿然
程骏
郭渺辰
邵池
庞建新
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110603738.7A priority Critical patent/CN113297995B/en
Publication of CN113297995A publication Critical patent/CN113297995A/en
Application granted granted Critical
Publication of CN113297995B publication Critical patent/CN113297995B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the field of computer technology, and provides a human body posture estimation method and terminal equipment, wherein the method comprises the following steps: acquiring a first thermodynamic diagram (heat map) according to an input image; inputting the first thermodynamic diagram into a preset feature fusion model for processing to obtain a second thermodynamic diagram; and estimating the human body posture based on the second thermodynamic diagram to obtain the coordinates of the human body key points. By further performing feature fusion on the acquired thermodynamic diagram, a thermodynamic diagram of better quality is obtained, and estimating the human body posture from the fused thermodynamic diagram effectively improves the recognition accuracy of the human body key points.

Description

Human body posture estimation method and terminal equipment
Technical Field
The application belongs to the technical field of robots, and particularly relates to a human body posture estimation method and terminal equipment.
Background
Human body posture estimation refers to the process of recovering the key points of a human body from a given image or video. The key points are commonly determined by directly regressing the coordinate value of each key point from the labelled coordinate values; this method is simple, direct and fast, but the lack of image semantic information can cause large estimation errors. To improve the accuracy, the human body posture can instead be estimated from thermodynamic diagrams (heat maps) of the images; however, the thermodynamic diagrams generated by existing human body posture estimation models are generally of poor quality, which easily leads to errors in the recognition of the human body key points.
Disclosure of Invention
The embodiments of the application provide a human body posture estimation method and terminal equipment, which can solve the problem that the thermodynamic diagrams generated by existing human body posture estimation models are generally of poor quality, so that errors easily occur in the recognition of the human body key points.
In a first aspect, an embodiment of the present application provides a method for estimating a human body pose, including:
acquiring a first thermodynamic diagram according to an input image;
inputting the first thermodynamic diagram into a preset feature fusion model for processing to obtain a second thermodynamic diagram;
and estimating the human body posture based on the second thermodynamic diagram to obtain the coordinates of the human body key points.
In a possible implementation manner of the first aspect, the preset feature fusion model includes a down-sampling module, an up-sampling module, and a transverse connection module;
correspondingly, the inputting the first thermodynamic diagram into a feature fusion model for processing to obtain a second thermodynamic diagram, including:
based on the down-sampling module, the first thermodynamic diagram is subjected to down-sampling processing to obtain a multi-scale image;
performing upsampling processing on the output of the downsampling module based on the upsampling module;
and performing feature fusion on the multi-scale image and the output of the up-sampling module through the transverse connection module to obtain the second thermodynamic diagram.
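The three steps above can be mimicked in a minimal NumPy sketch. This is an illustration only, not the patent's actual implementation: average pooling stands in for the learned convolutional down-sampling units, and nearest-neighbour up-sampling plus pixel-wise addition stands in for the up-sampling and transverse connection modules.

```python
import numpy as np

def downsample2x(x):
    # Stand-in for one strided-convolution unit: 2x2 average pooling.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(x):
    # Nearest-neighbour interpolation, doubling the spatial resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(first_heatmap):
    # Down-sampling path: build a multi-scale pyramid from the first heat map.
    s1 = downsample2x(first_heatmap)     # e.g. 64x64 -> 32x32
    s2 = downsample2x(s1)                # 32x32 -> 16x16
    s3 = downsample2x(s2)                # 16x16 -> 8x8
    # Up-sampling path with transverse (lateral) fusion by pixel-wise addition.
    u1 = s2 + upsample2x(s3)             # fused at 16x16
    u2 = s1 + upsample2x(u1)             # fused at 32x32
    u3 = first_heatmap + upsample2x(u2)  # fused "second" heat map at 64x64
    return u3
```

The output has the same spatial resolution as the input but aggregates information from every scale of the pyramid.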
In a possible implementation manner of the first aspect, the downsampling module includes a first convolution unit, a second convolution unit, a third convolution unit, and a fourth convolution unit;
correspondingly, the downsampling the first thermodynamic diagram based on the downsampling module to obtain a multi-scale image includes:
the first convolution unit carries out downsampling processing on the first thermodynamic diagram to obtain a first scale image;
the second convolution unit carries out downsampling processing on the first scale image to obtain a second scale image;
and the third convolution unit carries out downsampling processing on the second scale image to obtain a third scale image.
In a possible implementation manner of the first aspect, the upsampling module includes a fifth convolution unit, a sixth convolution unit, and a seventh convolution unit, the cross-connection module includes a first cross-connection unit, a second cross-connection unit, and a third cross-connection unit, an output of the first convolution unit is connected to an input of the second convolution unit and a first input of the third cross-connection unit, an output of the second convolution unit is connected to an input of the third convolution unit and a first input of the second cross-connection unit, an output of the third convolution unit is connected to an input of the fourth convolution unit and a first input of the first cross-connection unit, an output of the fourth convolution unit is connected to a second input of the first cross-connection unit, and an output of the first cross-connection unit is connected to an input of the fifth convolution unit, the output of the fifth convolution unit is connected to the second input of the second transverse connection unit, the output of the second transverse connection unit is connected to the input of the sixth convolution unit, the output of the sixth convolution unit is connected to the second input of the third transverse connection unit, and the output of the third transverse connection unit is connected to the input of the seventh convolution unit.
In a possible implementation manner of the first aspect, the first transverse connection unit includes a first transverse convolution unit and a first amplification unit, the second transverse connection unit includes a second transverse convolution unit and a second amplification unit, and the third transverse connection unit includes a third transverse convolution unit and a third amplification unit.
In a possible implementation manner of the first aspect, the method further includes:
constructing a feature fusion model;
and training the feature fusion model based on training data to obtain the preset feature fusion model.
In a possible implementation manner of the first aspect, the training data includes training images and real labels corresponding to the training images; training the feature fusion model based on training data to obtain the preset feature fusion model, including:
inputting a training image into the preset feature fusion model for processing to obtain a thermodynamic diagram corresponding to the training image;
determining the model loss based on the thermodynamic diagram corresponding to the training image and the real label corresponding to the training image;
and adjusting the model parameters of the preset feature fusion model based on the model loss until the loss function of the feature fusion model is converged to obtain the trained preset feature fusion model.
In a second aspect, an embodiment of the present application provides a terminal device, including:
the acquisition module is used for acquiring a first thermodynamic diagram according to the input image;
the characteristic fusion module is used for inputting the first thermodynamic diagram into a preset characteristic fusion model for processing to obtain a second thermodynamic diagram;
and the posture estimation module is used for estimating the posture of the human body based on the second thermodynamic diagram to obtain the coordinates of the key points of the human body.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method according to the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Compared with the prior art, the embodiment of the application has the advantages that: the acquired thermodynamic diagrams can be further subjected to feature fusion to obtain thermodynamic diagrams with better quality, and the human body posture estimation is carried out on the basis of the thermodynamic diagrams after the feature fusion, so that the identification precision of the human body key points can be effectively improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow chart of an implementation of a human body posture estimation method provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a preset feature fusion model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a first transverse connecting unit provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to another embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The human body is a non-rigid structure; compared with rigid structures such as automobiles, tables and chairs, it admits a wide variety of changes. By rotating around its joint points, the human body can perform various complex actions. Thus, by determining the positions of key joint points of the head, neck, trunk and limbs, the posture characteristics of a person can be recognized and used for behavior recognition. Recognition of human body posture characteristics can be applied to various scenarios, such as somatosensory interaction, target tracking and behavior recognition.
Human body pose estimation can determine human body key joint point positions. Specifically, the estimation of the human body posture refers to a process of recovering key points of a human body from a given image or a video.
There are two methods for determining the position coordinates of the human body key points. The first is to directly regress the coordinate value of each key point from the labelled coordinate values; this method has the advantages of being simple, direct and fast. However, it is too simple and lacks some image semantic information. It is better suited to recognizing key points of the human face, which lie close together and vary only slightly; for human body key points, which are far apart and vary greatly in position, direct regression brings a large error.
To reduce these errors, the second method for determining the position coordinates of the human body key points is to determine the coordinates from thermodynamic diagrams (heat maps). A thermodynamic diagram measures the confidence that a key point appears at a given position of the image; it is composed of a series of two-dimensional points, each representing the confidence that the key point appears at that position, and the final position of the key point is defined as the position with the highest confidence. This method can make full use of the semantic information of the image and achieves high precision.
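A heat map of the kind described here is conventionally rendered as a two-dimensional Gaussian centred on the labelled key point. This is a hedged sketch of that convention — the patent does not prescribe how the confidence map is generated, and `sigma` is a hypothetical parameter:

```python
import numpy as np

def keypoint_heatmap(size, center, sigma=2.0):
    """Confidence that the key point appears at each pixel, peaking at `center`.

    `center` is (x, y); the returned array is indexed [y, x].
    """
    ys, xs = np.mgrid[0:size, 0:size]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
```

The argmax of such a map recovers the original key point location, which is exactly the decoding rule the passage describes.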
However, thermodynamic diagrams generated by existing human body posture estimation models are generally poor in quality, and errors are prone to occur in recognition of human body key points.
In order to solve the above problem, embodiments of the present application provide a human body posture estimation method, which can further perform feature fusion on an obtained thermodynamic diagram to obtain a thermodynamic diagram with better quality, and perform human body posture estimation based on the thermodynamic diagram after feature fusion, so as to effectively improve the identification accuracy of human body key points.
Fig. 1 is a schematic flow chart of a human body posture estimation method provided by an embodiment of the present application. In the embodiments of the present application, the terminal device is taken as the execution subject by way of example. As shown in fig. 1, the human body posture estimation method may include steps S101 to S103, detailed as follows:
s101: a first thermodynamic diagram is acquired from an input image.
In this embodiment, the input image may be a frame cut out from a video file, for example, an image extracted from a surveillance video shot by a surveillance camera; the input image may also be a photograph taken directly by an image pickup apparatus.
In the embodiment of the present application, the input image may be a human body image for which pose estimation is required.
In this embodiment of the application, the terminal device may process the input image based on an existing human body posture estimation algorithm (e.g., a Hourglass algorithm, etc.), so as to obtain the first thermodynamic diagram.
S102: and inputting the first thermodynamic diagram into a preset feature fusion model for processing to obtain a second thermodynamic diagram.
In the embodiment of the application, the first thermodynamic diagram is processed through the preset feature fusion model capable of refining the thermodynamic diagrams, so that the second thermodynamic diagram with better quality is obtained, and the accuracy of human body posture estimation can be improved.
In the embodiment of the present application, the preset feature fusion model refers to a feature fusion model that completes training. The preset feature fusion model can be preset in the terminal device. After the first thermodynamic diagram is obtained, the terminal device may automatically invoke the preset feature fusion model, and then input the first thermodynamic diagram into the preset feature fusion model for processing, so as to obtain a second thermodynamic diagram corresponding to the first thermodynamic diagram.
In an embodiment of the application, the preset feature fusion model includes a down-sampling module, an up-sampling module, and a transverse connection module.
Accordingly, S102 may include the steps of:
based on the down-sampling module, the first thermodynamic diagram is subjected to down-sampling processing to obtain a multi-scale image;
performing upsampling processing on the output of the downsampling module based on the upsampling module;
and performing feature fusion on the multi-scale image and the output of the down-sampling module through a transverse connection module to obtain the second thermodynamic diagram.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a preset feature fusion model according to an embodiment of the present application. As shown in fig. 2, in an embodiment of the present application, the downsampling module includes a first convolution unit A1, a second convolution unit A2, a third convolution unit A3, and a fourth convolution unit A4; the up-sampling module includes a fifth convolution unit A5, a sixth convolution unit A6, and a seventh convolution unit A7; and the transverse connection module includes a first transverse connection unit B1, a second transverse connection unit B2, and a third transverse connection unit B3. The output of the first convolution unit A1 is connected with the input of the second convolution unit A2 and the first input of the third transverse connection unit B3 respectively; the output of the second convolution unit A2 is connected with the input of the third convolution unit A3 and the first input of the second transverse connection unit B2 respectively; the output of the third convolution unit A3 is connected with the input of the fourth convolution unit A4 and the first input of the first transverse connection unit B1 respectively; the output of the fourth convolution unit A4 is connected with the second input of the first transverse connection unit B1; the output of the first transverse connection unit B1 is connected with the input of the fifth convolution unit A5; the output of the fifth convolution unit A5 is connected with the second input of the second transverse connection unit B2; the output of the second transverse connection unit B2 is connected with the input of the sixth convolution unit A6; the output of the sixth convolution unit A6 is connected with the second input of the third transverse connection unit B3; and the output of the third transverse connection unit B3 is connected with the input of the seventh convolution unit A7.
On this basis, the downsampling module downsamples the first thermodynamic diagram to obtain a multi-scale image, and the method includes:
the first convolution unit A1 carries out downsampling processing on the first thermodynamic diagram to obtain a first scale image;
the second convolution unit A2 carries out downsampling processing on the first scale image to obtain a second scale image;
the third convolution unit a3 performs downsampling processing on the second scale image to obtain a third scale image.
Exemplarily, assuming that the size (spatial resolution) of the first thermodynamic diagram input into the preset feature fusion model is 64 × 64, the downsampling operation is performed through the first convolution unit a1, the second convolution unit a2, and the third convolution unit A3, respectively, that is, the spatial resolution of the first thermodynamic diagram is reduced and the number of channels is increased by the convolution kernel, and the downsampling operation is performed from 64 × 64 to 32 × 32 (the first scale image output by the first convolution unit a 1); from 32 x 32 to 16 x 16 (second scale image output by the second convolution unit a 2); from 16 x 16 to 8 x 8 (third scale image output by the third convolution unit a 3).
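The resolution/channel trade described here (halve the spatial size while growing the channel count) can be illustrated with a space-to-depth rearrangement. This is only a stand-in sketch: the patent's convolution units use learned strided kernels, and the channel counts below are hypothetical.

```python
import numpy as np

def space_to_depth(x):
    # (C, H, W) -> (4C, H/2, W/2): halve the resolution, quadruple the channels,
    # losslessly moving each 2x2 spatial block into the channel dimension.
    c, h, w = x.shape
    return (x.reshape(c, h // 2, 2, w // 2, 2)
             .transpose(0, 2, 4, 1, 3)
             .reshape(4 * c, h // 2, w // 2))

x = np.zeros((16, 64, 64))   # first heat map: 16 channels (hypothetical), 64x64
s1 = space_to_depth(x)       # 32x32 "first scale image"
s2 = space_to_depth(s1)      # 16x16 "second scale image"
s3 = space_to_depth(s2)      # 8x8   "third scale image"
```

The spatial sizes trace exactly the 64 → 32 → 16 → 8 chain given in the example above.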
The resolution of the thermodynamic diagram is then restored by an up-sampling operation, such as nearest-neighbour or bilinear interpolation, while pixel-by-pixel addition through the transverse connections realizes the feature fusion.
Referring to fig. 3, fig. 3 is a schematic structural diagram illustrating a first transverse connection unit B1 according to an embodiment of the present disclosure. As shown in fig. 3, in an embodiment of the present application, the first transverse connection unit B1 includes a first transverse convolution unit B11 and a first amplification unit B12.
It should be noted that the second transverse connecting unit B2 and the third transverse connecting unit B3 also have the structure of the first transverse connecting unit B1 shown in FIG. 3.
For the image output by the down-sampling path, a 1 × 1 convolution (left to right in the figure) reduces the previously increased number of channels without changing the spatial resolution. The image output by the up-sampling path undergoes a 2× up-sampling operation (top to bottom in the figure), restoring its resolution to twice the size, so that the thermodynamic diagrams to be added have the same spatial resolution; feature fusion is then realized by pixel-by-pixel addition.
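One transverse connection unit can be sketched as follows. This is an assumption-laden illustration: the 1 × 1 convolution is modelled as a per-pixel channel-mixing matrix multiply with a random (untrained) weight, and nearest-neighbour doubling stands in for the "2× up" amplification unit.

```python
import numpy as np

def conv1x1(x, weight):
    # 1x1 convolution = per-pixel channel mixing:
    # weight (C_out, C_in) applied to x (C_in, H, W) -> (C_out, H, W).
    return np.tensordot(weight, x, axes=([1], [0]))

def upsample2x(x):
    # "2x up": nearest-neighbour doubling of the spatial resolution of (C, H, W).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def transverse_unit(skip, deep, weight):
    # First input: down-path feature, channels reduced by the 1x1 convolution.
    # Second input: up-path feature, spatially doubled, then added pixel-by-pixel.
    return conv1x1(skip, weight) + upsample2x(deep)
```

Note that the 1 × 1 convolution changes only the channel count, and the 2× up-sampling only the spatial size, so the two inputs meet at identical shapes before the addition.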
After the above operations are carried out by the preset feature fusion model, a second thermodynamic diagram with the same spatial resolution as the first thermodynamic diagram is obtained; having fused the features of images at different scales, its quality is better than that of the first thermodynamic diagram.
S103: and estimating the human body posture based on the second thermodynamic diagram to obtain the coordinates of the human body key points.
In the embodiment of the application, after the second thermodynamic diagram is obtained, the coordinates of the human body key points can be obtained from it. Specifically, the confidence that each human body key point appears at a given position of the image is determined from the second thermodynamic diagram. The second thermodynamic diagram is composed of a series of two-dimensional points, each representing the confidence that the key point appears at that position, and the final position of the key point is defined as the position with the highest confidence. That is, the position with the highest confidence for each key point is determined from the second thermodynamic diagram, and the coordinates of that position are then taken.
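The decoding rule just described — take the highest-confidence position of each per-key-point map as that key point's coordinate — is a plain argmax, sketched here in NumPy:

```python
import numpy as np

def decode_keypoints(heatmaps):
    """One (H, W) heat map per key point; return [(x, y), ...] argmax locations."""
    coords = []
    for hm in heatmaps:
        # unravel_index converts the flat argmax into a (row, col) = (y, x) pair.
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((int(x), int(y)))
    return coords
```

Each returned pair is a key point coordinate in the heat map's resolution; mapping back to input-image pixels would require rescaling by the ratio of image size to heat map size.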
It should be noted that the key points of the human body generally include 20 key joint points of the human body, such as joint points on the head, neck, torso, and limbs.
In summary, the human body posture estimation method provided by the embodiment of the application can further perform feature fusion on the obtained thermodynamic diagram to obtain a thermodynamic diagram with better quality, and can effectively improve the identification precision of the human body key points by performing human body posture estimation on the basis of the thermodynamic diagram after feature fusion.
In another embodiment of the present application, the above human body posture estimation method further includes the following steps:
constructing a feature fusion model;
and training the feature fusion model based on training data to obtain the preset feature fusion model.
In the embodiment of the present application, the structure of the constructed feature fusion model is the structure shown in fig. 2, which is not described in detail herein again.
In a specific application, the training data comprises training images and real labels corresponding to the training images; training the feature fusion model based on training data to obtain the preset feature fusion model, including:
inputting a training image into the preset feature fusion model for processing to obtain a thermodynamic diagram corresponding to the training image;
determining the model loss based on the thermodynamic diagram corresponding to the training image and the real label corresponding to the training image;
and adjusting the model parameters of the preset feature fusion model based on the model loss until the loss function of the feature fusion model is converged to obtain the trained preset feature fusion model.
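The loop described by these steps — forward pass, loss between predicted and ground-truth heat maps, parameter adjustment until convergence — can be sketched with a toy one-parameter model. The patent does not name a specific loss; mean-squared error, the common choice for heat-map regression, is assumed here, and the model, data, and learning rate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.random((64, 64))      # stand-in for the model's input heat map
target = 3.0 * inp              # stand-in for the "real label" heat map

w = 0.0                         # single model parameter (toy model: pred = w * inp)
lr = 0.5                        # learning rate (hypothetical)
losses = []
for step in range(200):
    pred = w * inp                              # forward pass through the "model"
    loss = np.mean((pred - target) ** 2)        # MSE between the two heat maps
    losses.append(loss)
    grad = np.mean(2.0 * (pred - target) * inp) # dLoss/dw by hand
    w -= lr * grad                              # adjust the model parameter
    if loss < 1e-8:                             # convergence criterion for the loop
        break
```

A real implementation would update all convolution-unit weights by backpropagation, but the control flow — iterate until the loss function converges — is the same.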
In the embodiment of the present application, the feature fusion model after training is the preset feature fusion model in the embodiment of the present application.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the human body posture estimation method described in the foregoing embodiment, fig. 4 shows a structural block diagram of the terminal device provided in the embodiment of the present application; for convenience of description, only the parts related to the embodiment of the present application are shown.
Referring to fig. 4, the terminal device 40 includes: an acquisition module 41, a feature fusion module 42, and an attitude estimation module 43.
The obtaining module 41 is configured to obtain a first thermodynamic diagram according to an input image.
The feature fusion module 42 is configured to input the first thermodynamic diagram into a preset feature fusion model for processing, so as to obtain a second thermodynamic diagram.
And the posture estimation module 43 is configured to perform human body posture estimation based on the second thermodynamic diagram to obtain the human body key point coordinates.
In a possible implementation manner, the terminal device further includes a construction unit and a training unit.
The construction unit is used for constructing the feature fusion model.
The training unit is used for training the feature fusion model based on training data to obtain the preset feature fusion model.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 5 is a schematic structural diagram of a terminal device according to another embodiment of the present application. As shown in fig. 5, the terminal device 5 of this embodiment includes: at least one processor 50 (only one shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, wherein the processor 50 executes the computer program 52 to implement the steps of any of the above-mentioned human body posture estimation method embodiments.
The terminal device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that fig. 5 is only an example of the terminal device 5 and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown, combine certain components, or use different components, such as an input/output device or a network access device.
The processor 50 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may, in some embodiments, be an internal storage unit of the terminal device 5, such as a hard disk or memory of the terminal device 5. In other embodiments, the memory 51 may be an external storage device of the terminal device 5, such as a plug-in hard disk, a SmartMedia Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the above method embodiments.
The embodiments of the present application provide a computer program product which, when run on a terminal device, enables the terminal device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to a terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, according to legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; the division of the modules or units is only one logical division, and other divisions are possible in actual implementation — for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the protection scope of the present application.

Claims (10)

1. A human body posture estimation method is characterized by comprising the following steps:
acquiring a first thermodynamic diagram according to an input image;
inputting the first thermodynamic diagram into a preset feature fusion model for processing to obtain a second thermodynamic diagram;
and estimating the human body posture based on the second thermodynamic diagram to obtain the coordinates of the human body key points.
2. The human pose estimation method of claim 1, wherein the preset feature fusion model comprises a down-sampling module, an up-sampling module and a transverse connection module;
correspondingly, the inputting the first thermodynamic diagram into the preset feature fusion model for processing to obtain a second thermodynamic diagram includes:
based on the down-sampling module, the first thermodynamic diagram is subjected to down-sampling processing to obtain a multi-scale image;
performing upsampling processing on the output of the downsampling module based on the upsampling module;
and performing feature fusion on the multi-scale image and the output of the down-sampling module through a transverse connection module to obtain the second thermodynamic diagram.
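For illustration only (this sketch is not part of the claims), the down-sampling / up-sampling / transverse-connection scheme of claim 2 resembles a feature pyramid: features are repeatedly down-sampled, then up-sampled along a top-down path while being fused with same-scale skip connections. The toy version below uses 1-D lists, pairwise averaging as a stand-in for a strided convolution unit, and nearest-neighbour repetition for up-sampling — all simplifying assumptions, since the patent's units are convolutional:

```python
def downsample(x):
    # Halve resolution by averaging adjacent pairs
    # (stand-in for a strided convolution unit).
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    # Double resolution by nearest-neighbour repetition
    # (stand-in for the amplification units of claim 5).
    return [v for v in x for _ in range(2)]

def lateral_fuse(skip, up):
    # Element-wise addition of a same-scale skip connection with
    # the up-sampled top-down feature (the transverse connection).
    return [a + b for a, b in zip(skip, up)]

def feature_fusion(x):
    # Bottom-up pathway: successively coarser scales.
    d1 = downsample(x)
    d2 = downsample(d1)
    d3 = downsample(d2)
    # Top-down pathway with transverse (lateral) connections.
    u2 = lateral_fuse(d2, upsample(d3))
    u1 = lateral_fuse(d1, upsample(u2))
    return lateral_fuse(x, upsample(u1))
```

With an input of length 8 the three down-sampled scales have lengths 4, 2, and 1, and the fused output is restored to length 8 — mirroring how the claimed model returns a second thermodynamic diagram at the resolution of the first.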
3. The human pose estimation method of claim 2, wherein the downsampling module comprises a first convolution unit, a second convolution unit, a third convolution unit, and a fourth convolution unit;
correspondingly, the downsampling the first thermodynamic diagram based on the downsampling module to obtain a multi-scale image includes:
the first convolution unit carries out downsampling processing on the first thermodynamic diagram to obtain a first scale image;
the second convolution unit carries out downsampling processing on the first scale image to obtain a second scale image;
and the third convolution unit carries out downsampling processing on the second scale image to obtain a third scale image.
4. The human pose estimation method of claim 3, wherein the up-sampling module comprises a fifth convolution unit, a sixth convolution unit and a seventh convolution unit, the cross-connect module comprises a first cross-connect unit, a second cross-connect unit and a third cross-connect unit, an output of the first convolution unit is connected to an input of the second convolution unit and a first input of the third cross-connect unit, respectively, an output of the second convolution unit is connected to an input of the third convolution unit and a first input of the second cross-connect unit, respectively, an output of the third convolution unit is connected to an input of the fourth convolution unit and a first input of the first cross-connect unit, respectively, an output of the fourth convolution unit is connected to a second input of the first cross-connect unit, the output of the first transverse connection unit is connected with the input of the fifth convolution unit, the output of the fifth convolution unit is connected with the second input of the second transverse connection unit, the output of the second transverse connection unit is connected with the input of the sixth convolution unit, the output of the sixth convolution unit is connected with the second input of the third transverse connection unit, and the output of the third transverse connection unit is connected with the input of the seventh convolution unit.
5. The human body pose estimation method of claim 4, wherein the first transverse connection unit comprises a first transverse convolution unit and a first amplification unit, the second transverse connection unit comprises a second transverse convolution unit and a second amplification unit, and the third transverse connection unit comprises a third transverse convolution unit and a third amplification unit.
6. The human pose estimation method of claim 1, further comprising:
constructing a feature fusion model;
and training the feature fusion model based on training data to obtain the preset feature fusion model.
7. The human body pose estimation method of claim 6, wherein the training data comprises training images and real labels corresponding to the training images; training the feature fusion model based on training data to obtain the preset feature fusion model, including:
inputting a training image into the preset feature fusion model for processing to obtain a thermodynamic diagram corresponding to the training image;
determining model loss based on the thermodynamic diagram corresponding to the training image and the real label corresponding to the training image;
and adjusting the model parameters of the preset feature fusion model based on the model loss until the loss function of the feature fusion model is converged to obtain the trained preset feature fusion model.
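To illustrate the training procedure of claims 6 and 7 (an illustrative sketch, not part of the claims), the code below pairs a mean-squared-error heatmap loss — a common choice, though the claims do not fix a particular loss function — with a convergence check on successive loss values. `model_step` is a hypothetical callable standing in for one forward pass, loss computation, and parameter adjustment, returning the current loss:

```python
def heatmap_mse_loss(pred, target):
    """Mean-squared error between a predicted thermodynamic diagram
    (heatmap) and its ground-truth label -- one common model loss;
    the claim does not specify a particular one."""
    n, total = 0, 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            n += 1
    return total / n

def train_until_converged(model_step, max_epochs=1000, tol=1e-6):
    """Repeat training steps until the loss stops changing, i.e. the
    loss function has converged (the stopping rule of claim 7)."""
    prev = float("inf")
    for _ in range(max_epochs):
        loss = model_step()
        if abs(prev - loss) < tol:
            return loss
        prev = loss
    return prev
```

Here convergence is detected when consecutive losses differ by less than a tolerance; real training would additionally monitor a validation set, but the claim only requires iterating until the loss function converges.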
8. A terminal device, comprising:
the acquisition module is used for acquiring a first thermodynamic diagram according to the input image;
the characteristic fusion module is used for inputting the first thermodynamic diagram into a preset characteristic fusion model for processing to obtain a second thermodynamic diagram;
and the posture estimation module is used for estimating the posture of the human body based on the second thermodynamic diagram to obtain the coordinates of the key points of the human body.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110603738.7A 2021-05-31 2021-05-31 Human body posture estimation method and terminal equipment Active CN113297995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110603738.7A CN113297995B (en) 2021-05-31 2021-05-31 Human body posture estimation method and terminal equipment


Publications (2)

Publication Number Publication Date
CN113297995A true CN113297995A (en) 2021-08-24
CN113297995B CN113297995B (en) 2024-01-16

Family

ID=77326515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110603738.7A Active CN113297995B (en) 2021-05-31 2021-05-31 Human body posture estimation method and terminal equipment

Country Status (1)

Country Link
CN (1) CN113297995B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832383A (en) * 2020-05-08 2020-10-27 北京嘀嘀无限科技发展有限公司 Training method of gesture key point recognition model, gesture recognition method and device
CN112200165A (en) * 2020-12-04 2021-01-08 北京软通智慧城市科技有限公司 Model training method, human body posture estimation method, device, equipment and medium
CN112597955A (en) * 2020-12-30 2021-04-02 华侨大学 Single-stage multi-person attitude estimation method based on feature pyramid network
CN112668480A (en) * 2020-12-29 2021-04-16 上海高德威智能交通系统有限公司 Head attitude angle detection method and device, electronic equipment and storage medium
CN112861777A (en) * 2021-03-05 2021-05-28 上海有个机器人有限公司 Human body posture estimation method, electronic device and storage medium


Non-Patent Citations (2)

Title
杨连平; 孙玉波; 张红良; 李封; 张祥德: "Human Key Point Matching Network Based on Encoder-Decoder Residuals" (基于编解码残差的人体关键点匹配网络), Computer Science (计算机科学), no. 06 *
欧攀; 尉青锋; 陈末然: "Research on Hand Pose Recognition Based on Heat Maps" (基于热力图的手部姿态识别研究), Application Research of Computers (计算机应用研究), no. 1 *

Also Published As

Publication number Publication date
CN113297995B (en) 2024-01-16

Similar Documents

Publication Publication Date Title
GB2561277B (en) Forecasting multiple poses based on a graphical image
CN111815754B (en) Three-dimensional information determining method, three-dimensional information determining device and terminal equipment
CN109064428B (en) Image denoising processing method, terminal device and computer readable storage medium
EP3620966A1 (en) Object detection method and apparatus for object detection
CN112528831B (en) Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN108596267B (en) Image reconstruction method, terminal equipment and computer readable storage medium
CN110321761B (en) Behavior identification method, terminal equipment and computer readable storage medium
JP7475772B2 (en) IMAGE GENERATION METHOD, IMAGE GENERATION DEVICE, COMPUTER DEVICE, AND COMPUTER PROGRAM
CN113043267A (en) Robot control method, device, robot and computer readable storage medium
CN111191582A (en) Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium
CN111680573B (en) Face recognition method, device, electronic equipment and storage medium
CN112198878A (en) Instant map construction method and device, robot and storage medium
CN112435223A (en) Target detection method, device and storage medium
CN111161348A (en) Monocular camera-based object pose estimation method, device and equipment
CN110633630B (en) Behavior identification method and device and terminal equipment
CN112258647A (en) Map reconstruction method and device, computer readable medium and electronic device
AU2022241513B2 (en) Transformer-based shape models
CN113297995B (en) Human body posture estimation method and terminal equipment
CN115830414A (en) Face key point regression model training method, device, equipment and storage medium
CN114693919A (en) Target detection method, terminal equipment and storage medium
CN112464753B (en) Method and device for detecting key points in image and terminal equipment
CN112711965B (en) Drawing recognition method, device and equipment
CN114973396B (en) Image processing method, image processing device, terminal equipment and computer readable storage medium
CN107704819B (en) Action identification method and system and terminal equipment
CN115205340A (en) Data processing method, device, equipment, readable storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant