CN111783662B - Attitude estimation method, estimation model training method, device, medium and equipment - Google Patents

Info

Publication number: CN111783662B
Authority: CN (China)
Prior art keywords: human body, body image, sample, image, sample human
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202010622916.6A
Other languages: Chinese (zh)
Other versions: CN111783662A (en)
Inventors: 罗宇轩, 唐堂, 刘钢
Current Assignee: Beijing ByteDance Network Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010622916.6A
Publication of CN111783662A
Application granted; publication of CN111783662B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T3/04

Abstract

The disclosure relates to a pose estimation method, an estimation model training method, a device, a medium, and equipment. The method includes: acquiring a target human body image; and inputting the target human body image into a human body posture estimation model to obtain posture data of the human body in the target human body image. The human body posture estimation model is trained as follows: acquiring a plurality of sample human body images and the posture data of the human body in each sample human body image; erasing the head region in some of the sample human body images; and performing model training on the sample human body images obtained after the erasing, the sample human body images that were not erased, and the posture data of the human body in each sample human body image, to obtain the human body posture estimation model. As a result, the model is more robust and can correctly identify the user's orientation even when facial information is missing from the target human body image, improving the accuracy of human body posture estimation.

Description

Attitude estimation method, estimation model training method, device, medium and equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular to a pose estimation method, an estimation model training method, a device, a medium, and equipment.
Background
Human body posture estimation refers to locating human body key points in a given image or video with a human body posture estimation model, so as to obtain the posture data of the human body in that image or video. When the user's face is poorly visible in the image or video (for example, when the head is tilted), i.e., when facial information is missing, a human body posture estimation model has difficulty distinguishing the user's orientation in the image, i.e., whether the user is facing toward or away from the camera, and the model is not robust. To avoid overfitting the human body posture estimation model and to improve its robustness, various data augmentation methods are commonly introduced during training. Common data augmentation methods at present include color perturbation, random Gaussian noise perturbation, rotation perturbation, horizontal flipping, and the like. Because these existing augmentation methods cannot adequately simulate samples in which the user's face is poorly visible, a model trained even with these perturbations still cannot distinguish the user's orientation.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a human body posture estimation method, including:
acquiring a target human body image;
inputting the target human body image into a human body posture estimation model to obtain posture data of a human body in the target human body image;
the human body posture estimation model is obtained by training in the following mode:
acquiring a plurality of sample human body images and the posture data of the human body in each sample human body image;
erasing the head region in some of the sample human body images; and
performing model training on the sample human body images obtained after the erasing, the sample human body images that were not erased, and the posture data of the human body in each sample human body image, to obtain the human body posture estimation model.
In a second aspect, the present disclosure provides a training method for a human body posture estimation model, including:
acquiring a plurality of sample human body images and the posture data of the human body in each sample human body image; erasing the head region in some of the sample human body images;
and performing model training on the sample human body images obtained after the erasing, the sample human body images that were not erased, and the posture data of the human body in each sample human body image, to obtain a human body posture estimation model.
In a third aspect, a human body posture estimation device is provided, including:
the first acquisition module is used for acquiring a target human body image;
the posture determining module is used for inputting the target human body image acquired by the first acquiring module into a human body posture estimating model to obtain posture data of a human body in the target human body image;
the human body posture estimation model is obtained by training through a training device of the human body posture estimation model, and the training device of the human body posture estimation model comprises:
the second acquisition module is used for acquiring a plurality of sample human body images and posture data of a human body in each sample human body image;
the erasing module is used for erasing the head region in some of the sample human body images;
and the training module is used for performing model training on the sample human body images obtained after the erasing processing by the erasing module, the sample human body images that were not erased, and the posture data of the human body in each sample human body image, so as to obtain the human body posture estimation model.
In a fourth aspect, a training apparatus for a human body posture estimation model is provided, which includes:
the second acquisition module is used for acquiring a plurality of sample human body images and posture data of a human body in each sample human body image;
the erasing module is used for erasing the head region in some of the sample human body images;
and the training module is used for performing model training on the sample human body images obtained after the erasing processing by the erasing module, the sample human body images that were not erased, and the posture data of the human body in each sample human body image, so as to obtain a human body posture estimation model.
In a fifth aspect, a computer readable medium is provided, on which a computer program is stored, which when executed by a processing device, implements the steps of the human body posture estimation method provided by the first aspect of the present disclosure.
In a sixth aspect, a computer readable medium is provided, on which a computer program is stored, which when executed by a processing device, implements the steps of the training method of the human body posture estimation model provided in the second aspect of the present disclosure.
In a seventh aspect, an electronic device is provided, including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the human body posture estimation method provided by the first aspect of the present disclosure.
In an eighth aspect, an electronic device is provided, including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the training method of the human body posture estimation model provided by the second aspect of the present disclosure.
In the above technical solution, during training of the human body posture estimation model, after the plurality of sample human body images and the posture data of the human body in each sample image are acquired, they are not used directly for model training. Instead, the head region in some of the acquired sample images is first erased for data augmentation, making the training samples richer and more varied; model training is then performed on the sample images with the head region erased, the sample images that were not erased, and the posture data of the human body in each sample image. Because a sample image whose head region has been erased contains no head information, the resulting human body posture estimation model does not rely excessively on head features to determine the user's orientation. The model is therefore more robust: it can correctly identify the user's orientation even when facial information is missing from the target human body image, which improves the accuracy of human body posture estimation.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
In the drawings:
FIG. 1 is a flow chart illustrating a method of human pose estimation according to an exemplary embodiment.
FIG. 2 is a flow chart illustrating a method of training a body pose estimation model, according to an exemplary embodiment.
Fig. 3 is a flow chart illustrating a method of head region determination according to an example embodiment.
Fig. 4A-4C are schematic diagrams illustrating a process of performing an erasing process on a head region in a sample human body image according to an exemplary embodiment.
Fig. 5A-5C are schematic diagrams illustrating a process of performing an erasing process on a head region in a sample human body image according to another exemplary embodiment.
FIG. 6 is a flow chart illustrating a method of human pose estimation according to another exemplary embodiment.
Fig. 7 is a block diagram illustrating a human body posture estimation apparatus according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a training apparatus for a human pose estimation model according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than restrictive; those skilled in the art will understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
FIG. 1 is a flow chart illustrating a method of human pose estimation according to an exemplary embodiment. As shown in fig. 1, the method may include S101 and S102.
In S101, a target human body image is acquired.
In this embodiment, the human body posture estimation method may be applied to a server or a terminal device (e.g., a smartphone, a tablet computer, etc.), which may obtain the target human body image from a remote or local source through a wired or wireless connection. The target human body image is the human body image whose corresponding human body posture data is to be estimated. For example, the target human body image may be an image obtained by a camera provided on the server or terminal device shooting a target human body (i.e., a human body within the camera's shooting range). As another example, the target human body image may be a human body image acquired from a preset human body image set (e.g., acquired randomly or in a preset order).
In S102, the target human body image is input into the human body posture estimation model, and posture data of the human body in the target human body image is obtained.
In the present disclosure, the posture data may include rotation quaternion information for each skeletal joint of the human body in the target human body image. The rotation quaternion information describes the rotation of each skeletal joint of the human body in space.
In addition, the human body pose estimation model may be, for example, DensePose, OpenPose, Realtime Multi-Person Pose Estimation, or the like. The model can be obtained through the training steps S201 to S203 shown in fig. 2.
In S201, a plurality of sample human body images and posture data of a human body in each sample human body image are acquired.
In S202, an erasing process is performed on the head region in the partial sample body image.
In the present disclosure, it may be determined in various ways which of the plurality of sample human body images acquired in S201 will have their head region erased. In one embodiment, several sample human body images may be selected at random from the plurality, and the head region in each selected image is erased, while the head regions in the remaining, unselected images are left intact.
In another embodiment, it may be determined whether to perform the erasing process on the head region in each of the plurality of sample human body images acquired in S201 with a certain random probability.
For example, for each sample human body image, a random number in the range [0, 1] may be generated for that image; whether the random number and a preset probability threshold satisfy a preset condition is then checked. If the condition is satisfied, the head region in that sample human body image is erased; otherwise, it is determined that the head region in that sample human body image will not be erased. The preset condition may be, for example, that the random number is less than or equal to the preset probability threshold.
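The per-sample random decision described above can be sketched as follows. This is a minimal sketch, not the patent's implementation; the function name, the fixed seed, and the "less than or equal to the threshold" condition are assumptions taken from the example condition above:

```python
import random

def select_for_erasure(samples, threshold, rng=None):
    """Partition samples into (to_erase, keep_intact).

    For each sample, a random number in [0, 1) is drawn; the preset
    condition assumed here is "random number <= probability threshold".
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    to_erase, keep_intact = [], []
    for sample in samples:
        if rng.random() <= threshold:
            to_erase.append(sample)
        else:
            keep_intact.append(sample)
    return to_erase, keep_intact
```

With a threshold of, say, 0.5, roughly half of the sample images would be routed to head-region erasure while the rest stay intact.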
In S203, model training is performed according to the sample human body image obtained after the erasing process, the sample human body image that is not subjected to the erasing process, and the posture data of the human body in each sample human body image, so as to obtain a human body posture estimation model.
In the above technical solution, during training of the human body posture estimation model, after the plurality of sample human body images and the posture data of the human body in each sample image are acquired, they are not used directly for model training. Instead, the head region in some of the acquired sample images is first erased for data augmentation, making the training samples richer and more varied; model training is then performed on the sample images with the head region erased, the sample images that were not erased, and the posture data of the human body in each sample image. Because a sample image whose head region has been erased contains no head information, the resulting human body posture estimation model does not rely excessively on head features to determine the user's orientation. The model is therefore more robust: it can correctly identify the user's orientation even when facial information is missing from the target human body image, which improves the accuracy of human body posture estimation.
In the present disclosure, the head region in S202 may be determined in various ways. In one embodiment, for each sample human body image determined to undergo head region erasure (i.e., each sample human body image whose head region is to be erased), the head region can be identified by image recognition.
In another embodiment, the above-described head region may be determined by S301 and S302 shown in fig. 3.
In S301, annotation information corresponding to the sample human body image is acquired.
In the present disclosure, the labeling information is used to label the human skeleton key points or the human key parts included in the sample human body image. The human skeletal keypoints are points (e.g., gray points shown in fig. 4A) in the sample human image that are used to characterize a particular part of the human body (e.g., wrist, knee, shoulder, etc.). Illustratively, the sample human body image includes human body skeletal key points such as the head, shoulders, elbows, wrists, hips, knees, and ankles of the human body. Wherein, the key points of the human skeleton can be marked in the form of points (as shown in fig. 4A).
The key part of the human body is a specific part (for example, wrist, knee, shoulder, etc.) of the human body in the sample human body image. Illustratively, the sample human body image includes human body key parts such as the head, shoulders, elbows, wrists, buttocks, knees and ankles of the human body. Wherein, the key parts of the human body can be marked in the form of polygons (e.g., rectangles) (as shown in fig. 5A).
In addition, the labeling information may be text information stored separately, wherein the text information may include pixel coordinates of human skeleton key points in the sample human body image or corner coordinates of polygonal areas corresponding to human body key parts; alternatively, the labeling information may be a point or a polygon directly labeled on the sample human body image.
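As an illustration of the separately stored text form of the labeling information, a hypothetical annotation record might look like the following; all field names and coordinates are invented for the example and are not part of the patent:

```python
# Hypothetical annotation record for one sample image (field names and
# coordinate values are illustrative only).
annotation = {
    "keypoints": {                 # (x, y) pixel coordinates of skeleton key points
        "left_shoulder": (48, 62),
        "right_shoulder": (72, 58),
    },
    "parts": {                     # corner coordinates of polygonal key-part labels
        "head": [(40, 5), (80, 5), (80, 50), (40, 50)],
    },
}

def part_bbox(corners):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a labelled polygon."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))
```

The bounding box of the labelled head polygon is what a later erasure step would operate on.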
In S302, a head region is determined from the sample human body image based on the annotation information.
In one embodiment, the labeling information labels human skeleton key points, including a left shoulder key point and a right shoulder key point. In this case, the region located above a target key point in the sample human body image can be determined as the head region, where the target key point is whichever of the left shoulder key point and the right shoulder key point is closer to the upper edge of the sample human body image.
Illustratively, as shown in fig. 4A, of the left shoulder key point B and the right shoulder key point A, the key point closer to the upper edge of the sample human body image is the right shoulder key point A, i.e., the target key point is the right shoulder key point A. Therefore, the region above the right shoulder key point A in the sample human body image is determined as the head region, i.e., the rectangular region shown by the broken line in fig. 4A.
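In image coordinates with the origin at the top-left corner, "closer to the upper edge" means the smaller y value, so the rule can be sketched as below. Spanning the full image width is an assumption; the patent only says "the region above the target key point":

```python
def head_region(left_shoulder, right_shoulder, img_width):
    """Region above whichever shoulder key point is closer to the top edge.

    Key points are (x, y) pixel coordinates with y growing downward, so the
    target key point is the one with the smaller y. Returns a rectangle
    (x0, y0, x1, y1) assumed to span the full image width.
    """
    target_y = min(left_shoulder[1], right_shoulder[1])
    return (0, 0, img_width, target_y)
```

For the example in fig. 4A, with the right shoulder key point A higher in the image than the left shoulder key point B, the returned rectangle ends at A's y coordinate.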
In another embodiment, the labeling information is used for labeling a human body key part, wherein the human body key part comprises a head. In this way, the region corresponding to the head labeled with the labeling information in the sample human body image can be determined as the head region.
For example, as shown in fig. 5A, the region corresponding to the head labeled with the labeling information in the sample human body image is a rectangle I, and therefore, the region represented by the rectangle I can be determined as the head region.
After the head region in each sample human body image to be erased has been determined, the erasing can be implemented in any of the following three ways:
(1) Replace the head region in the sample human body image with a preset image.
The preset image may be any image that does not contain head information.
Illustratively, the preset image is a solid-color image (i.e., the head region is filled with a single color), such as a solid black image (as shown in figs. 4B and 5B) or a solid gray image.
As another example, the preset image is a texture image (as shown in figs. 4C and 5C).
As a further example, the preset image is an image preset by the user that contains no head information.
(2) Adjust the pixel value of each pixel in the head region of the sample human body image to a target pixel value corresponding to that head region.
The target pixel value corresponding to a head region is the average of the pixel values of all pixels in that region.
(3) Randomly adjust the pixel value of each pixel in the head region of the sample human body image.
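The three erasure modes can be sketched on a plain nested-list grayscale image as follows; the mode names and the list-of-lists representation are illustrative (a real pipeline would operate on image tensors), and integer averaging is assumed for mode (2):

```python
import random

def erase_head(pixels, region, mode="fill", fill=0, rng=None):
    """Erase rectangle region (x0, y0, x1, y1) in a grayscale image (list of rows).

    mode='fill'   -> (1) replace with a preset solid value,
    mode='mean'   -> (2) set every pixel to the region's average value,
    mode='random' -> (3) set every pixel to an independent random value.
    """
    x0, y0, x1, y1 = region
    coords = [(r, c) for r in range(y0, y1) for c in range(x0, x1)]
    if mode == "mean":
        fill = sum(pixels[r][c] for r, c in coords) // len(coords)
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    for r, c in coords:
        pixels[r][c] = rng.randrange(256) if mode == "random" else fill
    return pixels
```

All three modes remove the head's appearance while keeping the image dimensions unchanged, which is what lets erased and non-erased samples be mixed in one training batch.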
It should be noted that S202 may be performed before S203 (as shown in fig. 2), i.e., every sample human body image to be erased is erased first and model training is performed afterwards. Alternatively, S202 may be interleaved with S203: each time a sample human body image is taken out, it is decided whether to erase its head region. If erasure is decided, the head region is erased and model training is then performed with the erased image and the posture data of the human body in that image; otherwise, model training is performed directly with the original image and its posture data.
As described above, the posture data acquired in S102 may include rotation quaternion information for each skeletal joint of the human body in the target human body image. The rotation quaternion information describes the rotation of each skeletal joint in space; the skeletal joints of an avatar can therefore be rotated according to the rotation quaternion information of the corresponding joints of the human body, allowing a user to conveniently and quickly control the avatar's motion through the user's own motion. For example, in a live-streaming scene, the anchor can freely control the corresponding avatar's actions through the anchor's own actions. Specifically, as shown in fig. 6, the method may further include S103.
In S103, the avatar is controlled to perform a corresponding gesture action according to the gesture data of the human body in the target human body image.
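Driving an avatar joint from a rotation quaternion amounts to rotating a bone direction vector by that quaternion. A minimal sketch follows; the (w, x, y, z) component order, the bone vector, and the angle are assumptions for illustration, since the patent does not specify a quaternion convention:

```python
import math

def quat_rotate(q, v):
    """Rotate 3-D vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = (x, y, z)

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # v' = v + 2 * u x (u x v + w * v)
    t = cross(u, v)
    t = (t[0] + w * v[0], t[1] + w * v[1], t[2] + w * v[2])
    c = cross(u, t)
    return (v[0] + 2 * c[0], v[1] + 2 * c[1], v[2] + 2 * c[2])

# Example: a quaternion for a 90-degree rotation about the z axis turns a
# bone direction (1, 0, 0) into (0, 1, 0).
half = math.radians(90) / 2
q_z90 = (math.cos(half), 0.0, 0.0, math.sin(half))
rotated = quat_rotate(q_z90, (1.0, 0.0, 0.0))
```

In practice the avatar engine's own quaternion type would be used instead; the point is only that each estimated joint quaternion maps directly onto the corresponding avatar bone.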
The present disclosure also provides a training method of a human body posture estimation model, as shown in fig. 2, the method includes S201 to S203.
In S201, a plurality of sample human body images and posture data of a human body in each sample human body image are acquired.
In S202, an erasing process is performed on the head region in the partial sample body image.
In S203, model training is performed according to the sample human body image obtained after the erasing process, the sample human body image that is not subjected to the erasing process, and the posture data of the human body in each sample human body image, so as to obtain a human body posture estimation model.
Optionally, the head region is determined by:
acquiring marking information corresponding to the sample human body image, wherein the marking information is used for marking human body bone key points or human body key parts contained in the sample human body image;
and determining a head region from the sample human body image according to the labeling information.
Optionally, the labeling information is used for labeling the human skeleton key points, where the human skeleton key points include a left shoulder key point and a right shoulder key point;
determining a head region from the sample human body image according to the labeling information, including:
determining a region above a target key point in the sample human body image as a head region, wherein the target key point is a key point which is closer to the upper edge of the sample human body image in the left shoulder key point and the right shoulder key point.
Optionally, the labeling information is used for labeling the human body key part, wherein the human body key part includes a head;
determining a head region from the sample human body image according to the labeling information, including:
and determining a region corresponding to the head marked by the marking information in the sample human body image as a head region.
Optionally, the erasing the head region in the part of the sample human body image includes any one of:
replacing a head region in a part of the sample human body image with a preset image, wherein the preset image does not include head information;
adjusting the pixel values of all the pixel points in the head region in part of the sample human body image to be target pixel values corresponding to the corresponding head region respectively, wherein the target pixel values corresponding to the corresponding head region are the average of the pixel values of all the pixel points in the head region;
and respectively randomly adjusting the pixel value of each pixel point in the head region of the partial sample human body image.
Fig. 7 is a block diagram illustrating a human body posture estimation apparatus according to an exemplary embodiment. As shown in fig. 7, the apparatus 700 includes: a first obtaining module 701, configured to obtain a target human body image; a pose determining module 702, configured to input the target human body image acquired by the first acquiring module 701 into a human body pose estimation model, so as to obtain pose data of a human body in the target human body image; as shown in fig. 8, the human body posture estimation model is obtained by training a training apparatus 800 of the human body posture estimation model, where the training apparatus 800 of the human body posture estimation model includes: a second obtaining module 801, configured to obtain a plurality of sample human body images and posture data of a human body in each sample human body image; an erasing module 802, configured to erase a head region in the partial sample human body image; a training module 803, configured to perform model training according to the sample human body images obtained after the erasing processing by the erasing module 802, the sample human body images that are not erased, and the posture data of the human body in each sample human body image, so as to obtain the human body posture estimation model.
In the above technical solution, during training of the human body posture estimation model, after the plurality of sample human body images and the posture data of the human body in each sample image are acquired, they are not used directly for model training. Instead, the head region in some of the acquired sample images is first erased for data augmentation, making the training samples richer and more varied; model training is then performed on the sample images with the head region erased, the sample images that were not erased, and the posture data of the human body in each sample image. Because a sample image whose head region has been erased contains no head information, the resulting human body posture estimation model does not rely excessively on head features to determine the user's orientation. The model is therefore more robust: it can correctly identify the user's orientation even when facial information is missing from the target human body image, which improves the accuracy of human body posture estimation.
It should be noted that the training device 800 of the human body posture estimation model may be provided independently of the human body posture estimation device 700, or may be integrated in the human body posture estimation device 700, and is not particularly limited in this disclosure.
Optionally, the apparatus 700 further comprises: and the control module is used for controlling the virtual image to execute corresponding gesture actions according to the gesture data of the human body in the target human body image.
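Elsewhere in this disclosure the posture data is described as rotation quaternion information for each bone joint of the human body. As an illustrative sketch only, one standard way such a quaternion could drive a bone direction of the virtual image is quaternion rotation of a vector; the function name and the (w, x, y, z) component order are assumptions, not part of the claimed solution:

```python
def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z).

    Computes q * v * q_conjugate, the standard quaternion rotation, which is
    one way a bone joint of an avatar could be driven by the per-joint
    rotation quaternions in the posture data.
    """
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx = 2 * (y * vz - z * vy)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    # v' = v + w * t + cross(q.xyz, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )
```

For example, rotating the unit x-axis by a quaternion encoding a 90-degree rotation about z yields the unit y-axis, which is how a joint's rest direction would be posed by its estimated rotation.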
Optionally, the erasing module 802 includes: an acquisition submodule, configured to acquire labeling information corresponding to the sample human body image, where the labeling information is used for labeling human body skeleton key points or human body key parts contained in the sample human body image; and a determining submodule, configured to determine the head region from the sample human body image according to the labeling information.
Optionally, the labeling information is used for labeling the human skeleton key points, where the human skeleton key points include a left shoulder key point and a right shoulder key point; the determining submodule is used for determining an area, located above a target key point, in the sample human body image as a head area, wherein the target key point is a key point which is closer to the upper edge of the sample human body image in the left shoulder key point and the right shoulder key point.
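The shoulder-based rule just described can be sketched as follows, assuming key points given as (x, y) pixel coordinates with y increasing downward, so that the smaller y value is closer to the upper edge of the image; the function name and the returned box convention are illustrative only:

```python
def head_region_from_shoulders(left_shoulder, right_shoulder, image_width):
    """Determine the head region as everything above the shoulder key point
    that is closer to the upper edge of the sample image.

    Key points are (x, y) tuples with y measured downward from the top edge,
    so the smaller y value is the one closer to the upper edge.
    """
    target_y = min(left_shoulder[1], right_shoulder[1])
    # Region spanning the full image width from the top edge down to the
    # target key point, as (top, bottom, left, right).
    return (0, target_y, 0, image_width)
```

The returned box can then be handed to any of the erasing strategies described below.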
Optionally, the labeling information is used for labeling the human body key part, wherein the human body key part includes a head; the determining submodule is used for determining a region corresponding to the head marked by the marking information in the sample human body image as a head region.
Optionally, the erasing module 802 includes any one of: a replacement submodule, configured to replace the head region in part of the sample human body images with a preset image, where the preset image does not include head information; a first adjusting submodule, configured to adjust the pixel values of the pixel points in the head region of part of the sample human body images to a target pixel value corresponding to the respective head region, where the target pixel value corresponding to a head region is the average of the pixel values of the pixel points in that head region; and a second adjusting submodule, configured to randomly adjust the pixel value of each pixel point in the head region of part of the sample human body images.
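The three alternatives listed above can be sketched as follows. This is an illustrative sketch assuming numpy image arrays; the preset patch value and the 0–255 random range are placeholder assumptions rather than part of the claimed solution:

```python
import numpy as np

def erase_by_replacement(image, box, preset):
    """Replace the head region with a preset patch containing no head information."""
    top, bottom, left, right = box
    out = image.copy()
    out[top:bottom, left:right] = preset  # preset is broadcast over the region
    return out

def erase_by_mean(image, box):
    """Set every pixel in the head region to that region's own mean pixel value."""
    top, bottom, left, right = box
    out = image.copy()
    region = out[top:bottom, left:right]
    out[top:bottom, left:right] = region.mean()
    return out

def erase_by_random(image, box, rng=None):
    """Randomly adjust each pixel value in the head region."""
    rng = rng or np.random.default_rng()
    top, bottom, left, right = box
    out = image.copy()
    out[top:bottom, left:right] = rng.integers(
        0, 256, size=out[top:bottom, left:right].shape
    )
    return out
```

All three leave the rest of the sample image untouched; only the head region loses its original information.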
The present disclosure further provides a training apparatus for a human body posture estimation model. As shown in fig. 8, the training apparatus 800 for the human body posture estimation model includes: a second obtaining module 801, configured to obtain a plurality of sample human body images and posture data of the human body in each sample human body image; an erasing module 802, configured to erase the head region in part of the sample human body images; and a training module 803, configured to perform model training according to the sample human body images obtained after the erasing processing by the erasing module 802, the sample human body images that are not erased, and the posture data of the human body in each sample human body image, so as to obtain the human body posture estimation model.
Optionally, the erasing module 802 includes: an acquisition submodule, configured to acquire labeling information corresponding to the sample human body image, where the labeling information is used for labeling human body skeleton key points or human body key parts contained in the sample human body image; and a determining submodule, configured to determine the head region from the sample human body image according to the labeling information.
Optionally, the labeling information is used for labeling the human skeleton key points, where the human skeleton key points include a left shoulder key point and a right shoulder key point; the determining submodule is used for determining an area, located above a target key point, in the sample human body image as a head area, wherein the target key point is a key point which is closer to the upper edge of the sample human body image in the left shoulder key point and the right shoulder key point.
Optionally, the labeling information is used for labeling the human body key part, wherein the human body key part includes a head; the determining submodule is used for determining a region corresponding to the head marked by the marking information in the sample human body image as a head region.
Optionally, the erasing module 802 includes any one of: a replacement submodule, configured to replace the head region in part of the sample human body images with a preset image, where the preset image does not include head information; a first adjusting submodule, configured to adjust the pixel values of the pixel points in the head region of part of the sample human body images to a target pixel value corresponding to the respective head region, where the target pixel value corresponding to a head region is the average of the pixel values of the pixel points in that head region; and a second adjusting submodule, configured to randomly adjust the pixel value of each pixel point in the head region of part of the sample human body images.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Referring now to FIG. 9, a block diagram of an electronic device 900 (e.g., a client or server) suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device 900 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication device 909 may allow the electronic apparatus 900 to perform wireless or wired communication with other apparatuses to exchange data. While fig. 9 illustrates an electronic device 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing apparatus 901.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to acquire a target human body image; inputting the target human body image into a human body posture estimation model to obtain posture data of a human body in the target human body image; the human body posture estimation model is obtained by training in the following mode: acquiring a plurality of sample human body images and posture data of a human body in each sample human body image; erasing the head area in part of the sample human body image; and performing model training according to the sample human body image obtained after the erasing treatment, the sample human body image which is not subjected to the erasing treatment and the posture data of the human body in each sample human body image to obtain the human body posture estimation model.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a plurality of sample human body images and posture data of a human body in each sample human body image; erasing the head area in part of the sample human body image; and performing model training according to the sample human body image obtained after the erasing treatment, the sample human body image which is not subjected to the erasing treatment and the posture data of the human body in each sample human body image to obtain a human body posture estimation model.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of the module does not in some cases constitute a limitation to the module itself, and for example, the first acquisition module may also be described as a "module that acquires an image of a target human body".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In accordance with one or more embodiments of the present disclosure, example 1 provides a human body posture estimation method, including: acquiring a target human body image; inputting the target human body image into a human body posture estimation model to obtain posture data of a human body in the target human body image; the human body posture estimation model is obtained by training in the following mode: acquiring a plurality of sample human body images and posture data of a human body in each sample human body image; erasing the head area in part of the sample human body image; and performing model training according to the sample human body image obtained after the erasing treatment, the sample human body image which is not subjected to the erasing treatment and the posture data of the human body in each sample human body image to obtain the human body posture estimation model.
Example 2 provides the method of example 1, further comprising, in accordance with one or more embodiments of the present disclosure: and controlling the virtual image to execute corresponding posture action according to the posture data of the human body in the target human body image.
Example 3 provides the method of example 1 or 2, the head region determined by: acquiring marking information corresponding to the sample human body image, wherein the marking information is used for marking human body bone key points or human body key parts contained in the sample human body image; and determining a head region from the sample human body image according to the labeling information.
Example 4 provides the method of example 3, the labeling information being for labeling the human bone keypoints including a left shoulder keypoint and a right shoulder keypoint; determining a head region from the sample human body image according to the labeling information, including: determining a region above a target key point in the sample human body image as a head region, wherein the target key point is a key point which is closer to the upper edge of the sample human body image in the left shoulder key point and the right shoulder key point.
Example 5 provides the method of example 3, the labeling information being for labeling the human critical part, wherein the human critical part includes a head; determining a head region from the sample human body image according to the labeling information, including: and determining a region corresponding to the head marked by the marking information in the sample human body image as a head region.
Example 6 provides the method of example 1 or 2, the erasing a head region in the partial sample human body image, comprising any one of: replacing a head region in a part of the sample human body image with a preset image, wherein the preset image does not include head information; adjusting the pixel values of all the pixel points in the head region in part of the sample human body image to be target pixel values corresponding to the corresponding head region respectively, wherein the target pixel values corresponding to the corresponding head region are the average of the pixel values of all the pixel points in the head region; and respectively randomly adjusting the pixel value of each pixel point in the head region of the partial sample human body image.
Example 7 provides a method of training a human pose estimation model, according to one or more embodiments of the present disclosure, including: acquiring a plurality of sample human body images and posture data of a human body in each sample human body image; erasing the head area in part of the sample human body image; and performing model training according to the sample human body image obtained after the erasing treatment, the sample human body image which is not subjected to the erasing treatment and the posture data of the human body in each sample human body image to obtain a human body posture estimation model.
Example 8 provides the method of example 7, the head region determined by: acquiring marking information corresponding to the sample human body image, wherein the marking information is used for marking human body bone key points or human body key parts contained in the sample human body image; and determining a head region from the sample human body image according to the labeling information.
Example 9 provides the method of example 8, the labeling information to label the human bone keypoints including a left shoulder keypoint and a right shoulder keypoint, according to one or more embodiments of the present disclosure; determining a head region from the sample human body image according to the labeling information, including: determining a region above a target key point in the sample human body image as a head region, wherein the target key point is a key point which is closer to the upper edge of the sample human body image in the left shoulder key point and the right shoulder key point.
Example 10 provides the method of example 8, the labeling information to label the human critical part, wherein the human critical part comprises a head;
determining a head region from the sample human body image according to the labeling information, including:
and determining a region corresponding to the head marked by the marking information in the sample human body image as a head region.
Example 11 provides the method of example 7, the erasing a head region in the partial sample human body image, comprising any one of: replacing a head region in a part of the sample human body image with a preset image, wherein the preset image does not include head information; adjusting the pixel values of all the pixel points in the head region in part of the sample human body image to be target pixel values corresponding to the corresponding head region respectively, wherein the target pixel values corresponding to the corresponding head region are the average of the pixel values of all the pixel points in the head region; and respectively randomly adjusting the pixel value of each pixel point in the head region of the partial sample human body image.
Example 12 provides, in accordance with one or more embodiments of the present disclosure, a human body pose estimation apparatus, including: a first obtaining module, configured to obtain a target human body image; and a pose determining module, configured to input the target human body image obtained by the first obtaining module into a human body posture estimation model to obtain posture data of the human body in the target human body image; where the human body posture estimation model is obtained by training through a training apparatus for the human body posture estimation model, and the training apparatus for the human body posture estimation model includes: a second obtaining module, configured to obtain a plurality of sample human body images and posture data of the human body in each sample human body image; an erasing module, configured to erase the head region in part of the sample human body images; and a training module, configured to perform model training according to the sample human body images obtained after the erasing processing by the erasing module, the sample human body images that are not erased, and the posture data of the human body in each sample human body image, so as to obtain the human body posture estimation model.
Example 13 provides, in accordance with one or more embodiments of the present disclosure, a training apparatus for a human body pose estimation model, including: the second acquisition module is used for acquiring a plurality of sample human body images and posture data of a human body in each sample human body image; the erasing module is used for erasing the head area in part of the sample human body image; and the training module is used for carrying out model training according to the sample human body images obtained after the erasing processing of the erasing module, the sample human body images which are not subjected to the erasing processing and the posture data of the human body in each sample human body image so as to obtain a human body posture estimation model.
Example 14 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-6, in accordance with one or more embodiments of the present disclosure.
Example 15 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 7-11, in accordance with one or more embodiments of the present disclosure.
Example 16 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-6.
Example 17 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 7-11.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by mutually replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (16)

1. A human body posture estimation method is characterized by comprising the following steps:
acquiring a target human body image;
inputting the target human body image into a human body posture estimation model to obtain posture data of a human body in the target human body image, wherein the posture data comprise rotation quaternion information of each bone joint of the human body in the corresponding human body image, and the rotation quaternion information is used for describing the rotation amplitude of each bone joint of the human body in space;
controlling the virtual image to execute corresponding gesture actions according to the gesture data of the human body in the target human body image;
the human body posture estimation model is obtained by training in the following mode:
acquiring a plurality of sample human body images and posture data of a human body in each sample human body image;
erasing the head region in part of the sample human body images;
and performing model training according to the sample human body image obtained after the erasing treatment, the sample human body image which is not subjected to the erasing treatment and the posture data of the human body in each sample human body image to obtain the human body posture estimation model.
2. The method of claim 1, wherein the head region is determined by:
acquiring labeling information corresponding to the sample human body image, wherein the labeling information is used for labeling human body bone key points or human body key parts contained in the sample human body image;
and determining a head region from the sample human body image according to the labeling information.
3. The method according to claim 2, wherein the labeling information is used for labeling the human skeleton key points, which include a left shoulder key point and a right shoulder key point;
determining a head region from the sample human body image according to the labeling information, including:
determining a region above a target key point in the sample human body image as a head region, wherein the target key point is a key point which is closer to the upper edge of the sample human body image in the left shoulder key point and the right shoulder key point.
4. The method according to claim 2, wherein the labeling information is used for labeling the human body key part, wherein the human body key part comprises a head;
determining a head region from the sample human body image according to the labeling information, including:
and determining a region corresponding to the head marked by the marking information in the sample human body image as a head region.
5. The method according to claim 1, wherein the erasing of the head region in part of the sample human body images comprises any one of:
replacing a head region in a part of the sample human body image with a preset image, wherein the preset image does not include head information;
adjusting the pixel values of all the pixel points in the head region in part of the sample human body image to be target pixel values corresponding to the corresponding head region respectively, wherein the target pixel values corresponding to the corresponding head region are the average of the pixel values of all the pixel points in the head region;
and respectively randomly adjusting the pixel value of each pixel point in the head region of the partial sample human body image.
6. A training method for a human body posture estimation model, comprising:
acquiring a plurality of sample human body images and posture data of the human body in each sample human body image, wherein the posture data comprises rotation quaternion information of each bone joint of the human body in the corresponding human body image, and the rotation quaternion information describes the rotation amplitude of each bone joint of the human body in space;
erasing a head region in a part of the sample human body images; and
performing model training according to the sample human body images obtained after the erasing processing, the sample human body images not subjected to the erasing processing, and the posture data of the human body in each sample human body image, to obtain the human body posture estimation model.
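A minimal sketch of the data-preparation step in claim 6: a random subset of the samples has its head box erased, and erased and untouched images are then fed to training together, so the model learns to estimate full-body pose even when the head is hidden. The function name, the (N, num_joints, 4) quaternion layout, and the zero-fill erase are illustrative assumptions:

```python
import numpy as np

def build_training_batch(images, poses, head_boxes, erase_ratio=0.5, rng=None):
    """Erase the head region in a random fraction of the sample images.

    images:     list of (H, W, C) uint8 arrays (all the same size here)
    poses:      (N, num_joints, 4) rotation quaternions, one per bone joint
    head_boxes: list of (x0, y0, x1, y1) head regions, one per image
    Returns the mixed (erased + untouched) images and their pose targets.
    """
    rng = rng or np.random.default_rng(0)
    picks = rng.random(len(images)) < erase_ratio
    mixed = []
    for img, box, do_erase in zip(images, head_boxes, picks):
        if do_erase:
            img = img.copy()
            x0, y0, x1, y1 = box
            img[y0:y1, x0:x1] = 0  # simplest erase: blank out the head
        mixed.append(img)
    return np.stack(mixed), np.asarray(poses, dtype=np.float32)
```

The returned arrays would then be passed to an ordinary supervised training loop with the quaternions as regression targets.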
7. The method of claim 6, wherein the head region is determined by:
acquiring labeling information corresponding to the sample human body image, wherein the labeling information is used for labeling human skeleton key points or human body key parts contained in the sample human body image; and
determining a head region from the sample human body image according to the labeling information.
8. The method according to claim 7, wherein the labeling information is used for labeling the human skeleton key points, and the human skeleton key points comprise a left shoulder key point and a right shoulder key point;
determining a head region from the sample human body image according to the labeling information comprises:
determining a region above a target key point in the sample human body image as the head region, wherein the target key point is the one of the left shoulder key point and the right shoulder key point that is closer to the upper edge of the sample human body image.
9. The method according to claim 7, wherein the labeling information is used for labeling a human body key part, and the human body key part comprises a head;
determining a head region from the sample human body image according to the labeling information comprises:
determining the region in the sample human body image corresponding to the head labeled by the labeling information as the head region.
10. The method according to claim 6, wherein erasing the head region in the part of the sample human body images comprises any one of the following:
replacing the head region in the part of the sample human body images with a preset image, wherein the preset image does not include head information;
adjusting the pixel value of each pixel point in the head region of the part of the sample human body images to a target pixel value corresponding to that head region, wherein the target pixel value is the average of the pixel values of all pixel points in the head region; and
randomly adjusting the pixel value of each pixel point in the head region of the part of the sample human body images.
11. A human body posture estimation device, comprising:
a first acquisition module, configured to acquire a target human body image;
a posture determining module, configured to input the target human body image acquired by the first acquisition module into a human body posture estimation model to obtain posture data of the human body in the target human body image, wherein the posture data comprises rotation quaternion information of each bone joint of the human body in the corresponding human body image, and the rotation quaternion information describes the rotation amplitude of each bone joint of the human body in space; and
a control module, configured to control a virtual avatar to perform a corresponding posture action according to the posture data of the human body in the target human body image;
wherein the human body posture estimation model is obtained by training with a training device for the human body posture estimation model, and the training device comprises:
a second acquisition module, configured to acquire a plurality of sample human body images and posture data of the human body in each sample human body image;
an erasing module, configured to erase a head region in a part of the sample human body images; and
a training module, configured to perform model training according to the sample human body images obtained after the erasing processing by the erasing module, the sample human body images not subjected to the erasing processing, and the posture data of the human body in each sample human body image, to obtain the human body posture estimation model.
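The rotation quaternion information recited throughout these claims encodes, for each bone joint, a rotation axis and a rotation amplitude. A small helper (names and conventions assumed for illustration) shows how the amplitude is recovered from a unit quaternion:

```python
import math

def rotation_quaternion(axis, angle_rad):
    """Unit quaternion (w, x, y, z) for a rotation of angle_rad about axis."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)  # normalize the axis
    s = math.sin(angle_rad / 2.0) / n
    return (math.cos(angle_rad / 2.0), ax * s, ay * s, az * s)

def rotation_amplitude(q):
    """Rotation amplitude in radians encoded by a unit quaternion."""
    w = max(-1.0, min(1.0, q[0]))  # clamp against floating-point drift
    return 2.0 * math.acos(w)
```

For example, a 90-degree rotation of a joint about the z axis gives the quaternion (cos 45°, 0, 0, sin 45°), from which the amplitude π/2 is recovered.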
12. A training device for a human body posture estimation model, comprising:
a second acquisition module, configured to acquire a plurality of sample human body images and posture data of the human body in each sample human body image, wherein the posture data comprises rotation quaternion information of each bone joint of the human body in the corresponding human body image, and the rotation quaternion information describes the rotation amplitude of each bone joint of the human body in space;
an erasing module, configured to erase a head region in a part of the sample human body images; and
a training module, configured to perform model training according to the sample human body images obtained after the erasing processing by the erasing module, the sample human body images not subjected to the erasing processing, and the posture data of the human body in each sample human body image, to obtain a human body posture estimation model.
13. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, carries out the steps of the method according to any one of claims 1 to 5.
14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, carries out the steps of the method according to any one of claims 6 to 10.
15. An electronic device, comprising:
a storage device having a computer program stored thereon; and
a processing device configured to execute the computer program in the storage device to carry out the steps of the method according to any one of claims 1 to 5.
16. An electronic device, comprising:
a storage device having a computer program stored thereon; and
a processing device configured to execute the computer program in the storage device to carry out the steps of the method according to any one of claims 6 to 10.
CN202010622916.6A 2020-06-30 2020-06-30 Attitude estimation method, estimation model training method, device, medium and equipment Active CN111783662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010622916.6A CN111783662B (en) 2020-06-30 2020-06-30 Attitude estimation method, estimation model training method, device, medium and equipment


Publications (2)

Publication Number Publication Date
CN111783662A CN111783662A (en) 2020-10-16
CN111783662B true CN111783662B (en) 2022-02-08

Family

ID=72760483


Country Status (1)

Country Link
CN (1) CN111783662B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489169B (en) * 2020-12-17 2024-02-13 脸萌有限公司 Portrait image processing method and device
CN112613495B (en) * 2021-03-05 2021-06-01 北京世纪好未来教育科技有限公司 Real person video generation method and device, readable storage medium and equipment
CN115937964B (en) * 2022-06-27 2023-12-15 北京字跳网络技术有限公司 Method, device, equipment and storage medium for estimating gesture

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908153A (en) * 2010-08-21 2010-12-08 上海交通大学 Method for estimating head postures in low-resolution image treatment
CN104463089A (en) * 2013-11-25 2015-03-25 安徽寰智信息科技股份有限公司 Human body posture recognizing device
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 A kind of estimation method of human posture and mobile terminal
CN108647663A (en) * 2018-05-17 2018-10-12 西安电子科技大学 Estimation method of human posture based on deep learning and multi-level graph structure model
CN109871801A (en) * 2019-02-15 2019-06-11 苏州纳智天地智能科技有限公司 A kind of head pose estimation method based on the insertion of more subspaces
CN110163046A (en) * 2018-06-19 2019-08-23 腾讯科技(深圳)有限公司 Human posture recognition method, device, server and storage medium
US10529137B1 (en) * 2016-11-29 2020-01-07 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Machine learning systems and methods for augmenting images

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451568A (en) * 2017-08-03 2017-12-08 重庆邮电大学 Use the attitude detecting method and equipment of depth convolutional neural networks
CN110992454B (en) * 2019-11-29 2020-07-17 南京甄视智能科技有限公司 Real-time motion capture and three-dimensional animation generation method and device based on deep learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jie Lei et al.; "Deep learning face representation by fixed erasing in facial landmarks"; Multimedia Tools and Applications; 2019-06-26; pp. 27703-27718 *
Istvan Sarandi et al.; "Synthetic occlusion augmentation with volumetric heatmaps for the 2018 ECCV PoseTrack Challenge on 3D human pose estimation"; arXiv; 2018-11-30; section 4 *
Li Jiachen et al.; "Cervical spine health scoring method based on multi-instance learning"; Journal of Computer-Aided Design & Computer Graphics; 2019-03-20; vol. 31, no. 1, pp. 94-103 *


Similar Documents

Publication Publication Date Title
CN111242881B (en) Method, device, storage medium and electronic equipment for displaying special effects
CN110058685B (en) Virtual object display method and device, electronic equipment and computer-readable storage medium
CN111783662B (en) Attitude estimation method, estimation model training method, device, medium and equipment
CN109584276B (en) Key point detection method, device, equipment and readable medium
CN109993150B (en) Method and device for identifying age
CN110059623B (en) Method and apparatus for generating information
CN109784304B (en) Method and apparatus for labeling dental images
CN110062157B (en) Method and device for rendering image, electronic equipment and computer readable storage medium
CN115409881A (en) Image processing method, device and equipment
CN110211030B (en) Image generation method and device
CN110349107B (en) Image enhancement method, device, electronic equipment and storage medium
CN110059624B (en) Method and apparatus for detecting living body
CN108985421B (en) Method for generating and identifying coded information
CN111783626A (en) Image recognition method and device, electronic equipment and storage medium
CN109981989B (en) Method and device for rendering image, electronic equipment and computer readable storage medium
CN114445269A (en) Image special effect processing method, device, equipment and medium
CN110189364B (en) Method and device for generating information, and target tracking method and device
CN116596748A (en) Image stylization processing method, apparatus, device, storage medium, and program product
CN111915532B (en) Image tracking method and device, electronic equipment and computer readable medium
CN113223012B (en) Video processing method and device and electronic device
CN115358919A (en) Image processing method, device, equipment and storage medium
CN116527993A (en) Video processing method, apparatus, electronic device, storage medium and program product
CN111353470B (en) Image processing method and device, readable medium and electronic equipment
CN114422698A (en) Video generation method, device, equipment and storage medium
CN112233207A (en) Image processing method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant